Concept of Logistic Regression
Logistic regression is a binary classification algorithm: it learns a decision boundary that separates two classes. It extends linear regression by passing the linear output through an activation function (the sigmoid), which squashes it into the range between 0 and 1. That bounded output can be read as the probability of the positive class, which is why the method suits binary classification problems. The graph of logistic regression looks like the figure below:
We can see that the graph is restricted between 0 and 1. Ordinary linear regression can produce any real number as its target value, but the sigmoid function rules this out for logistic regression. Logistic regression is fitted using Maximum Likelihood Estimation (MLE). Maximum likelihood takes a probability distribution with a given set of parameters and asks, “How likely is it that I would see this data if it had been generated from this distribution?” The answer is computed by evaluating the likelihood of each individual data point and multiplying those likelihoods together. In practice we add the logarithms of the likelihoods instead, because a product of many small numbers quickly underflows, and then choose the parameters that maximize that sum.
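To make the product-versus-sum point concrete, here is a minimal sketch in plain Python. The probabilities and labels are made-up illustration values, and the helper names `likelihood` and `log_likelihood` are hypothetical, not part of any library.

```python
import math

def likelihood(probs, labels):
    """Product of per-point Bernoulli likelihoods."""
    result = 1.0
    for p, y in zip(probs, labels):
        # A point with label 1 contributes p, a point with label 0 contributes 1 - p.
        result *= p if y == 1 else (1.0 - p)
    return result

def log_likelihood(probs, labels):
    """Sum of per-point log-likelihoods; same information, numerically safer."""
    return sum(math.log(p if y == 1 else 1.0 - p) for p, y in zip(probs, labels))

# Hypothetical predicted probabilities of class 1 and the observed labels.
probs = [0.9, 0.2, 0.7, 0.6]
labels = [1, 0, 1, 0]

lik = likelihood(probs, labels)
log_lik = log_likelihood(probs, labels)
print(lik, math.exp(log_lik))  # both expressions describe the same likelihood
```

Exponentiating the summed logarithms recovers the product, so maximizing one is equivalent to maximizing the other.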
To build the model, each data point's features are combined into a weighted sum x1 * w1 + x2 * w2 + … (usually plus a bias term), and that sum is passed through the activation function to yield a value between 0 and 1. If we take 0.5 as the threshold, any result greater than 0.5 is regarded as a 1, while any result below it is regarded as a 0.
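The weighted sum, sigmoid, and threshold can be sketched in a few lines of plain Python. The function name `predict`, the bias argument, and the example weights are assumptions chosen for illustration only.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias=0.0, threshold=0.5):
    """Return the predicted probability and the thresholded class label."""
    # Weighted sum x1 * w1 + x2 * w2 + ... plus an optional bias term.
    z = sum(x * w for x, w in zip(features, weights)) + bias
    prob = sigmoid(z)
    return prob, int(prob > threshold)

# Hypothetical feature vectors and weights.
prob_a, label_a = predict([2.0, 1.0], [0.5, -0.25])   # weighted sum is 0.75
prob_b, label_b = predict([-2.0, 1.0], [0.5, -0.25])  # weighted sum is -1.25
print(prob_a, label_a)
print(prob_b, label_b)
```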
For more than two classes, we can use the One-Vs-All approach. One-Vs-All, also known as One-Vs-Rest, is a strategy for multiclass (and multilabel) classification. It works by training one binary classifier per class, each separating that class from all the others, then running every classifier on a new input and assigning the class whose classifier gives the highest score. If your problem has n classes, One-Vs-All converts your training dataset into n binary classification problems.
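A rough sketch of the One-Vs-All bookkeeping is shown here, without the actual classifier training. The class names, the scores, and the helpers `one_vs_rest_labels` and `predict_class` are all hypothetical.

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multiclass target as 1 for one class and 0 for the rest."""
    return [1 if y == positive_class else 0 for y in labels]

def predict_class(scores_per_class):
    """Pick the class whose binary classifier reported the highest score."""
    return max(scores_per_class, key=scores_per_class.get)

labels = ["cat", "dog", "bird", "dog"]

# One binary problem per distinct class: n classes give n relabeled targets.
binary_problems = {c: one_vs_rest_labels(labels, c) for c in sorted(set(labels))}
print(binary_problems)

# Made-up scores that three trained binary classifiers might return for one input.
scores = {"bird": 0.10, "cat": 0.25, "dog": 0.80}
winner = predict_class(scores)
print(winner)
```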
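The loss function associated with logistic regression is binary cross-entropy, also known as log loss. It measures how far the predicted probabilities are from the true labels and penalizes confident wrong predictions heavily. For N data points with true labels y_i and predicted probabilities p_i, the loss function is given by the equation:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$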
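To make log loss concrete, here is a minimal plain-Python sketch that compares mostly correct predictions with confidently wrong ones. The function name, the `eps` clipping constant, and the sample values are assumptions for illustration.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip the probability so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

labels = [1, 0, 1]
good = binary_cross_entropy([0.9, 0.1, 0.8], labels)  # close to the true labels
bad = binary_cross_entropy([0.1, 0.9, 0.2], labels)   # confidently wrong
print(good, bad)
```

The confidently wrong predictions produce a much larger loss, which is the behavior that pushes the model toward calibrated probabilities during training.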
What is a Loss Function?
A loss function is a mathematical quantity that we want to minimize. We want a model that predicts accurately, and since we know both what the model outputs and what it should output, the loss gives us a direct measure of its performance. We train and improve the model by adjusting its parameters so that this loss goes down. Loss functions vary depending on the type of problem. For linear regression, Mean Squared Error and Mean Absolute Error are popular loss functions, whereas cross-entropy is appropriate for classification problems.
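As a small illustration of the two regression losses named above, the following sketch computes both on made-up values. The function names and numbers are hypothetical.

```python
def mean_squared_error(preds, targets):
    """Average of squared differences; large errors are penalized more strongly."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def mean_absolute_error(preds, targets):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)

# Hypothetical targets and predictions with errors of 1, 0, and 2.
targets = [3.0, 5.0, 2.0]
preds = [2.0, 5.0, 4.0]

mse = mean_squared_error(preds, targets)
mae = mean_absolute_error(preds, targets)
print(mse, mae)
```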
What is an Activation Function?
Activation functions are mathematical functions that transform an input value into a new output. In machine learning this is usually done to introduce non-linearity or to restrict the output to a certain range. Popular activation functions are sigmoid, Rectified Linear Unit (ReLU), tanh, etc.
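The three activation functions mentioned here can be written directly in plain Python. This is only a scalar sketch for intuition; deep learning libraries provide their own vectorized versions.

```python
import math

def sigmoid(z):
    """Output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Output is 0 for negative inputs and the input itself otherwise."""
    return max(0.0, z)

def tanh(z):
    """Output lies in (-1, 1); this simply wraps math.tanh."""
    return math.tanh(z)

# Evaluate each function on a negative, zero, and positive input.
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), relu(z), tanh(z))
```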
What is PyTorch?
PyTorch is a popular open-source deep learning library that grew out of the earlier Torch library. It was created by Facebook’s AI research group, and it can be used in much the same way as other deep learning frameworks. It is used to develop a wide variety of models and is widely applied in natural language processing (NLP) use cases. PyTorch is a good option if you want a user-friendly, lightweight library for building models with limited resources. Its API also feels natural to Python programmers, which makes models easier to write and debug. We will use PyTorch to implement our model for these reasons. However, the algorithm stays the same with alternatives like TensorFlow.
Implementing Logistic Regression in PyTorch
We will use the following steps to implement our model:
- Create a network (here a single linear layer) with parameters that will be updated after each iteration.
- Iterate through the given input data.
- Pass the input through the network using forward propagation.
- Calculate the loss using cross-entropy, the multiclass generalization of binary cross-entropy.
- Update the parameters using gradient descent to minimize the loss.
- Repeat the same steps with the updated parameters.
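Before moving to PyTorch, the steps above can be sketched in plain Python for the simplest case: a binary problem with a single feature. The toy dataset, the learning rate, and the number of steps are assumed values, and the gradients are written out by hand rather than computed automatically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-feature dataset (hypothetical): the label is 1 when x is positive.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0   # parameters that are updated after each iteration
lr = 0.5          # learning rate (assumed value)
losses = []

for step in range(200):
    # Forward propagation: weighted sum followed by the sigmoid.
    probs = [sigmoid(w * x + b) for x in xs]

    # Binary cross-entropy averaged over the dataset.
    loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(probs, ys)
    ) / len(xs)
    losses.append(loss)

    # Gradients of the mean loss with respect to w and b.
    grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(probs, ys)) / len(xs)

    # Gradient descent update.
    w -= lr * grad_w
    b -= lr * grad_b

predictions = [int(sigmoid(w * x + b) > 0.5) for x in xs]
print(losses[0], losses[-1], predictions)
```

In the PyTorch version, the hand-written gradients are replaced by `backward()` and the manual update by the optimizer's `step()`.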
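We will be classifying the handwritten digits of the MNIST dataset. This is a popular deep learning problem taught to beginners. Because MNIST has ten classes, the model outputs ten scores per image and is trained with multiclass cross-entropy rather than the binary form.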
Let’s first import the required libraries and modules.
import torch
import torchvision.transforms as transforms
import torchvision.datasets as dsets
# Variable is deprecated since PyTorch 0.4; plain tensors can be used directly.
from torch.autograd import Variable
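The next step is to load the dataset.
# Training and test splits; download=True fetches the files if they are missing.
train_dataset = dsets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test = dsets.MNIST(root='./data', train=False, transform=transforms.ToTensor(), download=True)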
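Use data loaders to make your data iterable.
batch_size = 100  # assumed batch size
# Shuffle the training data each epoch; keep the test order fixed.
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test, batch_size=batch_size, shuffle=False)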
Define the model.
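class LogisticRegression(torch.nn.Module):
    def __init__(self, inp, out):
        super().__init__()
        # A single linear layer maps the flattened pixels to one score per class.
        self.linear = torch.nn.Linear(inp, out)

    def forward(self, x):
        # Return raw scores (logits); CrossEntropyLoss applies softmax internally.
        outputs = self.linear(x)
        return outputs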
Specify the hyperparameters, optimizer, and loss.
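n_iters = 1500
# Number of passes over the training set needed for roughly n_iters updates.
epochs = n_iters / (len(train_dataset) / batch_size)
inp = 784      # 28 * 28 pixels per image
out = 10       # ten digit classes
alpha = 0.001  # learning rate
model = LogisticRegression(inp, out)
# CrossEntropyLoss combines log-softmax and negative log-likelihood, so it expects raw logits.
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=alpha)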
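The epoch calculation is worth checking by hand. The sketch below repeats the arithmetic in plain Python, using the 60,000 images in the MNIST training set and an assumed batch size of 100; the variable names are illustrative.

```python
n_iters = 1500
dataset_size = 60000   # number of images in the MNIST training set
batch_size = 100       # assumed batch size

# One epoch is one full pass over the data, i.e. dataset_size / batch_size updates.
iters_per_epoch = dataset_size / batch_size
epochs = n_iters / iters_per_epoch

# range(int(epochs)) truncates the fraction, so fewer than n_iters updates are run.
whole_epochs = int(epochs)
actual_iters = whole_epochs * int(iters_per_epoch)
print(epochs, whole_epochs, actual_iters)
```

Under these assumptions the loop runs two full epochs, which is 1200 updates rather than the 1500 requested; rounding up with `math.ceil` would be one way to cover the full count.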
Finally, train the model.
itr = 0
for epoch in range(int(epochs)):
    for i, (images, labels) in enumerate(train_loader):
        # Flatten each 28x28 image into a 784-element vector.
        images = Variable(images.view(-1, 28 * 28))
        labels = Variable(labels)
        # Clear gradients left over from the previous step.
        optimizer.zero_grad()
        outputs = model(images)
        lossFunc = loss(outputs, labels)
        lossFunc.backward()
        optimizer.step()
        itr += 1
        if itr % 500 == 0:
            # Evaluate on the test set; no gradients are needed here.
            correct = 0
            total = 0
            with torch.no_grad():
                for images, labels in test_loader:
                    images = Variable(images.view(-1, 28 * 28))
                    outputs = model(images)
                    # The predicted class is the index of the largest score.
                    _, predicted = torch.max(outputs, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()
            accuracy = 100 * correct / total
            print("Iteration is {}. Loss is {}. Accuracy is {}.".format(itr, lossFunc.item(), accuracy))
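The evaluation step picks, for each image, the class with the largest score and counts how often that matches the label. A plain-Python sketch of the same logic is shown here with three classes instead of ten; the scores, labels, and the `argmax` helper are made up for illustration.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical raw scores for four inputs over three classes, and their true labels.
outputs = [
    [0.1, 2.3, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 4.0],
    [2.0, 2.5, 0.1],
]
labels = [1, 0, 2, 0]

predicted = [argmax(row) for row in outputs]
correct = sum(int(p == y) for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)
print(predicted, accuracy)
```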
Conclusion
We walked through the concepts behind logistic regression and its implementation in PyTorch, a popular library for developing deep learning models. We applied it to the MNIST classification problem, recognizing digits from the pixel values of their images.