Logistic Regression Sklearn

Python is a modern, high-level, object-oriented programming language designed to help programmers write straightforward, easy-to-understand code. Its simplicity makes it a good first language for beginners. Its built-in high-level data structures, together with dynamic typing and dynamic binding, make it an excellent choice for rapid application development. Its simple syntax makes programs more readable, which ultimately reduces the cost of maintaining them. It supports packages and modules, which encourage code reuse and program modularity. Its extensive standard library and several interpreters are available online free of charge. This boost in productivity is a large part of why programmers enjoy the language.

Moreover, the edit, test, and debug cycle is incredibly fast because there is no compilation step involved. Python makes machine learning principles simple to learn and comprehend, and it gives a bird’s-eye view of how to step through a small or large machine learning project. This article explains what logistic regression is and how to build a logistic regression classifier. Let’s start with the fundamentals of logistic regression.

Logistic Regression Definition

Logistic regression is a classification algorithm. It comes from regression analysis and describes data by modeling the relationship between one or more independent variables (interval or ratio level) and a binary dependent variable (nominal or ordinal). It is generally used in statistical models to understand the relationship between dependent and independent variables by predicting the probability of each category of the dependent variable. As the volume of data grows rapidly and computing power and algorithms improve, machine learning and data science are becoming more important. Classification is an essential area of machine learning, and logistic regression is one of its basic methods. By the end of this article, you’ll be able to implement logistic regression on various types of data. One of the common Python packages for logistic regression is sklearn. The step-by-step practical example that follows shows how to implement logistic regression with sklearn in Python.
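To make the idea of predicting probabilities concrete, here is a minimal sketch of the logistic (sigmoid) function that logistic regression applies to a weighted sum of the independent variables. The weights, bias, and feature values are hypothetical and chosen only for illustration; they do not come from any trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the positive class for a single sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights and bias (assumption, for illustration only).
weights = [0.8, -0.4]
bias = 0.1

p = predict_probability([1.0, 2.0], weights, bias)

# A common convention is to predict class 1 when the probability is at least 0.5.
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)
```

Training a model amounts to finding the weights and bias that make these probabilities match the observed labels as closely as possible, which is what sklearn does in the steps that follow.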

Steps to implement logistic regression with sklearn in Python

Step 1: Collect the data

To start any project, small or large, the first thing you need is the data on which you will build the logistic regression model. This example uses a COVID-19 dataset stored in a CSV file, which is loaded in Step 3.

Step 2: Import the necessary packages of python

Once dataprep is installed, the next step is to import the packages needed to implement the logistic regression. This article focuses on the sklearn package, which is used to build the logistic regression model in Python. The following packages need to be imported:

import pandas as pd

import numpy as np

import matplotlib

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.preprocessing import LabelEncoder

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn import metrics

from imblearn.over_sampling import SMOTE

Step 3: Load the data to build a dataframe

The next step is to load the dataset with the following command:

df = pd.read_csv("/content/drive/MyDrive/Covid Dataset.csv")

This imports the data from an external file. Alternatively, you can define the dataset directly in code as an array.
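As a rough sketch of the in-code alternative, the snippet below builds a small DataFrame by hand and converts its text columns to integers with LabelEncoder, since LogisticRegression needs numeric input. The rows and the column names other than 'COVID-19' are invented for illustration, and whether the real CSV stores Yes/No text values is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: only the 'COVID-19' column name comes from the article.
data = {
    "Fever": ["Yes", "No", "Yes", "No"],
    "Dry Cough": ["Yes", "Yes", "No", "No"],
    "COVID-19": ["Yes", "Yes", "No", "No"],
}
df = pd.DataFrame(data)

# Encode every non-numeric column as integers.
# LabelEncoder sorts the labels, so "No" becomes 0 and "Yes" becomes 1.
encoded = df.copy()
for column in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[column]):
        encoded[column] = LabelEncoder().fit_transform(encoded[column])

print(encoded)
```

If the columns in your dataset are already numeric, the loop leaves them unchanged.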

Step 4: Creating the logistic regression after loading the data

After the data has been loaded into the Python application, the next step is to prepare it for logistic regression. In this step, you set the dependent and independent variables. Here is how you can set the variables:

X = df.drop('COVID-19', axis=1)

y = df['COVID-19']

The ‘X’ variable holds the independent variables, and the ‘y’ variable holds the dependent variable. Now apply the train_test_split function to set the sizes of the training and testing sets. Here, 20% of the data is held out for testing.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
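The split above is random, so the results change from run to run. One possible refinement, sketched here on synthetic data from make_classification (a stand-in, not the article's dataset), is to fix random_state for reproducibility and pass stratify=y so both sets keep roughly the same class balance. The seed values are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: about 80% class 0, 20% class 1).
X, y = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=0
)

# random_state makes the split repeatable; stratify=y preserves the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())  # share of class 1 in each set
```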

Step 5: Apply logistic regression

Now fit the logistic regression model, predict on the test set, and compute the accuracy with the commands given below:

model = LogisticRegression()

# Fitting the Model

model.fit(X_train, y_train)

# Predicting labels for the test set

y_pred = model.predict(X_test)

# Accuracy on the test set, as a percentage

acc_logreg = model.score(X_test, y_test) * 100
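Because logistic regression works by predicting probabilities, it can be useful to look at them directly rather than only at the final labels. The sketch below uses synthetic data as a stand-in for the article's dataset, and max_iter=1000 is an illustrative setting rather than a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not the COVID dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# One row per sample, one column per class, in the order of model.classes_.
proba = model.predict_proba(X_test)
positive_proba = proba[:, 1]

# Thresholding the positive-class probability at 0.5 reproduces predict().
manual_pred = (positive_proba >= 0.5).astype(int)

print(model.classes_)
print(np.round(proba[:5], 3))
```

Choosing a threshold other than 0.5 is one way to trade false positives against false negatives.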

Step 6: Plot the confusion matrix

The final part is to plot the confusion matrix, which shows the counts of correct and incorrect predictions for each class (true positives, false positives, true negatives, and false negatives).

confusion_mtx = confusion_matrix(y_test, y_pred)

# plot the confusion matrix

f,ax = plt.subplots(figsize=(8, 8))

sns.heatmap(confusion_mtx, annot=True, linewidths=0.01, cmap="Greens", linecolor="gray", fmt='d', ax=ax)

plt.xlabel("Predicted Label")

plt.ylabel("True Label")

plt.title("Confusion Matrix")

plt.show()

To print the classification report, which lists the precision, recall, F1-score, and accuracy, use the following command:

print(classification_report(y_test, y_pred))
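If you need the individual numbers from the report rather than printed text, classification_report can also return a dictionary through its output_dict parameter. The labels below are made up for illustration and are not results from the article's model.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

report = classification_report(y_true, y_hat, output_dict=True)

# Per-class metrics are keyed by the label as a string.
precision_pos = report["1"]["precision"]
recall_pos = report["1"]["recall"]
accuracy = report["accuracy"]

print(precision_pos, recall_pos, accuracy)
```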

Once you run all the commands, you will get a confusion matrix as well as a classification report. Take a look at the output below.

Confusion matrix:

True positive (tp), false negative (fn), true negative (tn), and false positive (fp) are the four core values in the confusion matrix.
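For a binary problem, these four values can be read straight out of the matrix. A small sketch with made-up labels (not output from the article's model):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (illustration only).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 0, 1, 1, 0]

# sklearn puts true labels on the rows and predicted labels on the columns,
# so flattening a 2x2 matrix gives tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

print(tn, fp, fn, tp)
```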

Classification report:

The classification report includes the accuracy of the trained model, which is calculated using the formula:

Accuracy = (tp + tn) / (tp + tn + fp + fn)
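The formula can be checked against sklearn's own accuracy_score. In this sketch the correct predictions are the diagonal of the confusion matrix, so their sum divided by the total number of samples gives the accuracy. The labels are invented for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (illustration only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_hat)

# Diagonal entries are tn and tp; the sum of all entries is the sample count.
manual_accuracy = np.trace(cm) / cm.sum()
library_accuracy = accuracy_score(y_true, y_hat)

print(manual_accuracy, library_accuracy)
```

The diagonal form also works unchanged when there are more than two classes.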

Conclusion:

This article covered logistic regression and the sklearn library in Python. Logistic regression describes the data and the link between the dependent and independent variables. The sklearn library is widely used with statistical data where a prediction or a probability is required.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write about Linux and other technology-related topics.