
How to Predict Stock Price Using SVM

Machine Learning is an application of Artificial Intelligence that is improving the way the world works in every domain. At its core, it is an algorithm or model that observes patterns in a given data set and then applies the learned patterns to new data. In layman’s terms, it’s the idea that machines learn patterns and adapt through experience to make accurate and repeatable decisions. The Support Vector Machine is a popular ML algorithm that we will use today to predict stock prices. The model has several advantages, which we will discuss before walking through the implementation of the approach.

What is a Hyperplane?

A hyperplane in n-dimensional space is an (n-1)-dimensional subspace; if the space is 3-dimensional, then its hyperplanes are 2-dimensional planes. An n-dimensional space is always spanned by a set of n linearly independent vectors, and it is always possible to find n mutually orthogonal vectors that span the space. This may or may not be part of the definition of a finite-dimensional vector space, but it is a fact whose proof can be found in almost any undergraduate linear algebra textbook.

As a result, a hyperplane in n-space is spanned by n-1 linearly independent vectors and has an nth vector (not in the plane) orthogonal to it.
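
For a concrete illustration, here is a minimal numpy sketch (the vectors are a made-up toy example): in 3-dimensional space, the xy-plane is a hyperplane spanned by two linearly independent vectors, and the third coordinate direction is orthogonal to it.

import numpy as np

# The xy-plane (z = 0) is a 2-dimensional hyperplane in 3-dimensional space
spanning = np.array([[1, 0, 0],
                     [0, 1, 0]])   # two linearly independent vectors that span it
normal = np.array([0, 0, 1])       # the remaining, orthogonal direction

# Every spanning vector is orthogonal to the normal, so the dot products are zero
print(spanning @ normal)           # [0 0]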

What is a Support Vector Machine?

The Support Vector Machine (SVM) is a supervised machine learning binary classification algorithm. Given a set of two types of points in N dimensions, SVM generates an (N-1) dimensional hyperplane to divide those points into two groups as shown below:

In the above figure, SVM will choose the red line as the best hyperplane separating the blue and green classes.

Let’s suppose you have two types of points in a plane that are linearly separable. SVM will find a straight line that divides the points into the two classes and is as far as possible from the closest points of each class. This line is known as a hyperplane, and it is chosen so that the two classes are separated by the widest possible margin. If the points cannot be separated this way, SVM uses a kernel transformation to increase the dimensionality of the points.
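
Here is a minimal sketch of the linearly separable case with scikit-learn’s SVC; the two point clusters are made-up toy data:

from sklearn.svm import SVC
import numpy as np

# Toy, linearly separable data: two well-separated clusters of 2-D points
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [6, 6], [7, 6], [6, 7]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear').fit(X, y)
print(clf.coef_, clf.intercept_)    # w and b of the separating hyperplane w.x + b = 0
print(clf.support_vectors_)         # the closest points, which define the maximum margin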

The case discussed above was pretty straightforward because the data was linearly separable: as we saw, the two classes of points could be separated by drawing a straight line.

What if the data is not linearly separable? We won’t be able to separate the classes by drawing a straight hyperplane. To tackle this challenge, we’re going to add a third dimension to the dataset. We had two dimensions up until now: x and y. We create a new dimension and mandate that it is calculated in a manner that is convenient for us: z = x² + y².

This will create a three-dimensional space from the previous points. We can infer from the figure below that initially the points were not linearly separable, but after applying the kernel function we can easily separate the data points. There are many kernel functions available, and you can choose one according to your use case.
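
Along the same lines, here is a minimal sketch of this lifting trick on made-up ring-shaped data: adding z = x² + y² turns two concentric rings, which no straight line can separate, into two groups that a flat plane separates easily. In practice, a kernel (for example, RBF) performs this kind of lifting implicitly.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two concentric rings: not linearly separable in 2-D
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = (radii > 2).astype(int)

# Lift to 3-D with z = x^2 + y^2; the rings become separable by a flat plane
z = X[:, 0] ** 2 + X[:, 1] ** 2
X3 = np.column_stack([X, z])
print(SVC(kernel='linear').fit(X3, y).score(X3, y))   # close to 1.0

# A kernel does this lifting implicitly, without building the new feature by hand
print(SVC(kernel='rbf').fit(X, y).score(X, y))        # also close to 1.0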

Advantages of SVM

  1. Effective when the number of dimensions is greater than the number of data points.
  2. Works for both classification and regression.
  3. It is memory-efficient, because the decision function uses only a subset of the training points (the support vectors).
  4. It is relatively robust to outliers.

Disadvantages of SVM

  1. It is difficult to select a “good” kernel function (a cross-validated search, sketched after this list, is a common workaround).
  2. Large data sets require a long training time.
  3. The final model is difficult to interpret: it does not provide straightforward variable weights or the individual impact of each feature.
  4. Small calibrations to the model are difficult because the final model is not easily inspectable, which makes it hard to incorporate our business logic.
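
As mentioned in the first point above, one way to work around kernel selection is to let cross-validation choose the kernel and the regularization strength. A minimal sketch with scikit-learn’s GridSearchCV on made-up data:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Made-up classification data standing in for a real data set
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Score each kernel/C combination with 5-fold cross-validation and keep the best one
param_grid = {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # the best-scoring kernel and C
print(search.best_score_)    # its mean cross-validated accuracy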

Stock Price Directions Prediction Using SVM

Stock market prediction is the act of forecasting the future value of a company’s stock, or of another financial instrument traded on an exchange, using fundamental or technical analysis.

The benefit of stock market prediction is that it allows you to invest wisely and profitably.

The first task for this implementation is to import all the required libraries and modules into our script: sklearn will be used to build the model, pandas to handle data frames, and numpy for linear algebra. Below are the required imports:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

The next task is to read the dataset from the file. The file will be in external storage, and you can download the dataset from here.

# Reading the CSV file from external storage
df = pd.read_csv('RELIANCE.csv')

Assign the datetime as the index of the data frame and drop the “Date” column.

# Making date the index column
df.index = pd.to_datetime(df['Date'])

# Drop the column named “Date”
df = df.drop(['Date'], axis='columns')

Assign the input features to a variable

# Create predictor variables
df['Open-Close'] = df.Open - df.Close
df['High-Low'] = df.High - df.Low

# Store all predictor variables in a variable X
X = df[['Open-Close', 'High-Low']]
print(X.head())

Assign the target column to another variable

# Target variable: 1 if the next day's close is higher than today's, else 0
y = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
print(y)

Split the dataset into train and test samples. The training samples will be used to build the model, while the test samples will be used to evaluate the model’s accuracy.

split = int(0.9 * len(df))

# Train data set
X_train = X[:split]
y_train = y[:split]

# Test data set
X_test = X[split:]
y_test = y[split:]

Create the SVM model now

# Support vector classifier
model = SVC().fit(X_train, y_train)

You can find the accuracy of this model using various metrics.
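
For example, using the accuracy_score we imported earlier, a minimal check might look like this (the exact numbers depend on your data and split):

# Accuracy on the training and test sets
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print('Train accuracy:', train_accuracy)
print('Test accuracy:', test_accuracy)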

To predict the signal of the stock, call the model’s predict method as shown below.

# Predicted signal for each day: 1 means the close is expected to rise the next day
df['sig'] = model.predict(X)

Conclusion

This article covered the intuition behind Support Vector Machines, their advantages, and their use cases. SVM is a popular and space-efficient algorithm for both classification and regression tasks, and it uses geometric principles to solve our problems. We then implemented stock price direction prediction using the SVM algorithm. Stock price prediction is extremely helpful in the business world, and automating it makes the problem even more compelling.

About the author

Simran Kaur

Simran works as a technical writer. She holds an MS in Computer Science from the well-known CS hub, aka Silicon Valley, and is also an editor of the website. She enjoys writing about any tech topic, including programming, algorithms, cloud, data science, and AI. Her hobbies include travelling, sketching, and gardening.