What is a Hyperplane?
A hyperplane in n-dimensional space is an (n-1)-dimensional subspace; if space is 3-dimensional, then its hyperplanes are the 2-dimensional planes. An n-dimensional space is always spanned by a set of n linearly independent vectors, and it is always possible to find n mutually orthogonal vectors that span the space (for example, via Gram-Schmidt orthogonalization). The first statement is essentially the definition of dimension; a proof of the second can be found in almost any undergraduate linear algebra textbook.
As a result, a hyperplane in n-space is spanned by n-1 linearly independent vectors, and there is an nth vector, not in the hyperplane, orthogonal to it: its normal vector.
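As a quick illustration, here is a minimal NumPy sketch (the specific vectors are arbitrary choices): in 3-space, two independent vectors span a plane, and their cross product gives the orthogonal normal vector.

import numpy as np

# Two linearly independent vectors spanning a hyperplane (a plane) in 3-space
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, -1.0])

# The cross product is orthogonal to both, so it is the plane's normal vector
n = np.cross(u, v)

print(np.dot(n, u))  # 0.0 -- orthogonal to u
print(np.dot(n, v))  # 0.0 -- orthogonal to v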
What is a Support Vector Machine?
The Support Vector Machine (SVM) is a supervised machine learning algorithm for binary classification. Given points of two classes in N-dimensional space, SVM finds an (N-1)-dimensional hyperplane that divides those points into two groups, as shown below:
In the above figure, SVM will choose the red line as the best hyperplane separating the blue and green classes.
Let’s suppose you have two types of points in a plane that are linearly separable. SVM will find a straight line that divides those points into two classes and is as far away from all of them as possible. This line is a hyperplane, and it is chosen to maximize the margin: the distance from the line to the nearest points of each class (the support vectors that give the algorithm its name). If the points cannot be separated by a straight line, SVM uses a kernel transformation to map them into a higher-dimensional space.
The case discussed above was pretty straightforward because the data was separable linearly — as we saw, we could draw a straight line to separate red and blue types of points.
What if the data is not linearly separable? We won’t be able to separate the classes by drawing a straight hyperplane. To tackle this challenge, we add a third dimension to the dataset. We had two dimensions up until now: x and y. We create a new dimension, z, and compute it in a way that is convenient for us: z = x² + y².
This lifts the previous points into a three-dimensional space. As the figure below shows, the points were not linearly separable at first, but after applying the kernel function the classes separate easily. Many kernel functions are available, and you can choose one according to your use case.
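To make this concrete, here is a minimal sketch (assuming scikit-learn’s make_circles toy dataset; any radially separated data would behave the same way). The mapping z = x² + y² turns two concentric rings, inseparable in 2-D, into linearly separable clouds in 3-D:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: not linearly separable in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Lift to 3-D by hand with z = x^2 + y^2
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X3 = np.hstack([X, z])

# A plain linear SVM now separates the classes in the lifted space
clf = SVC(kernel='linear').fit(X3, y)
print(clf.score(X3, y))  # close to 1.0

# In practice a kernel (e.g. RBF) achieves the same effect
# without constructing the extra dimension explicitly
clf_rbf = SVC(kernel='rbf').fit(X, y)
print(clf_rbf.score(X, y))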
Advantages of SVM
- Effective for data where the number of dimensions is greater than the number of data points.
- Good for both classification and regression.
- It is memory-efficient: the decision function uses only a subset of the training points (the support vectors).
- It is relatively robust to outliers, since the soft-margin formulation tolerates some misclassified points.
Disadvantages of SVM
- It is difficult to select a “good” kernel function (a cross-validated search, sketched after this list, is a common workaround).
- Large data sets require a long training time.
- The final model is difficult to understand and interpret: it does not expose per-variable weights or each feature’s individual impact.
- We can’t make small manual calibrations to the model, because the final model isn’t easily inspectable, which makes it difficult to incorporate our business logic.
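For the kernel-selection problem, a common workaround is a cross-validated grid search. Below is a minimal sketch with scikit-learn; the candidate kernels and C values are arbitrary choices, and X_train and y_train stand in for whatever training data you have.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate kernels and regularization strengths to try
param_grid = {
    'kernel': ['linear', 'rbf', 'poly'],
    'C': [0.1, 1, 10],
}

# 5-fold cross-validation over every combination
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)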
Stock Price Directions Prediction Using SVM
Stock market prediction is the task of estimating the future value of a company’s stock, or of another financial instrument traded on an exchange, using fundamental or technical analysis.
The benefit of stock market prediction is that it allows you to invest wisely and profitably.
The first task for this implementation is to import all the libraries and modules our script needs: sklearn to build the model, pandas to handle data frames, and numpy for linear algebra. Below are the required imports:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
The next task is to read the dataset from the file. The file will be in external storage, and you can download the dataset from here.
df = pd.read_csv('RELIANCE.csv')
Assign the datetime as the index of the data frame and drop the “Date” column
df.index = pd.to_datetime(df['Date'])
# drop the column named "Date"
df = df.drop(['Date'], axis='columns')
Assign the input features to a variable
df['Open-Close'] = df.Open - df.Close
df['High-Low'] = df.High - df.Low
# Store all predictor variables in a variable X
X = df[['Open-Close', 'High-Low']]
print(X.head())
Assign the target to another variable: it is 1 when the next day’s close is higher than today’s, and 0 otherwise
y = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
print(y)
Split the dataset into train and test samples. Since the rows are ordered in time, we split chronologically rather than randomly; here the first 80% of rows form the training set (the exact ratio is a choice). The train samples will build up the model, while the test samples will identify the model’s accuracy.
# Use the first 80% of the (time-ordered) rows for training
split = int(0.8 * len(df))

# Train data set
X_train = X[:split]
y_train = y[:split]
# Test data set
X_test = X[split:]
y_test = y[split:]
Create the SVM model now
model = SVC().fit(X_train, y_train)
You can find the accuracy of this model using various metrics.
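For example, using the accuracy_score we imported at the top (a minimal sketch; accuracy on the held-out period is just one reasonable metric):

# Share of test days where the predicted direction was correct
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))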
To predict the direction signal of the stock, apply the trained model to the feature set, as sketched below.
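A minimal sketch (the column name Predicted_Signal is our choice):

# 1 predicts that the next day's close will be higher, 0 that it won't
df['Predicted_Signal'] = model.predict(X)
print(df['Predicted_Signal'].tail())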
Conclusion
This article covered the intuition behind Support Vector Machines along with their advantages, disadvantages, and use cases. SVM is a popular and memory-efficient algorithm for both classification and regression tasks, and it uses geometric principles to solve our problems. We then implemented stock price direction prediction using the SVM algorithm. Stock price prediction is extremely helpful in the business world, and automating it continues to attract a great deal of interest.