Python is a modern, high-level, object-oriented programming language designed to help programmers write clear, straightforward code. It is easy to learn, which makes it a good choice for beginners. Its built-in high-level data structures, together with dynamic typing and dynamic binding, make it an excellent choice for rapid application development. Its simple syntax makes code more readable, which ultimately reduces the cost of maintaining a program. It supports packages and modules, which encourage code reuse and improve a program’s modularity. Its extensive standard library and several interpreters are available free of charge. Python’s productivity gains are a large part of why programmers fall in love with the language.
Moreover, the edit, test, and debug cycle is incredibly fast because there is no compilation step involved. Python also makes machine learning principles simple to learn and comprehend, and it gives a bird’s-eye view of how to step through a small or large machine learning project. This article explains what logistic regression is and how to build a logistic regression classifier. Let’s start with the fundamentals of logistic regression.
Logistic Regression Definition
Logistic regression is a classification algorithm. It is a regression analysis technique from the machine learning family that describes data and explains the relationship between a binary dependent variable (nominal or ordinal) and one or more independent variables (interval or ratio level). Logistic regression is generally used in statistical models to understand the data and the relationship between dependent and independent variables by predicting the probability of each category of the dependent variable. As the volume of data grows rapidly and computing power and algorithms improve, machine learning and data science are becoming more important. Within machine learning, classification has become an essential area, and logistic regression is one of its basic methods. By the end of this article, you’ll be able to implement logistic regression on various types of data. Let us begin applying the appropriate packages, classes, and functions to perform logistic regression in Python. One of the most common Python packages for logistic regression is sklearn (scikit-learn). Here, we walk through a step-by-step practical example to help you understand how to implement logistic regression with sklearn in Python.
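Before moving on to the sklearn steps, it may help to see how a logistic regression model turns inputs into a probability. The sketch below is only an illustration: the intercept, weight, and input values are made up, and a real model would learn its coefficients from data. It applies the logistic (sigmoid) function to a linear combination of one feature and then thresholds the probability at 0.5 to get a class.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration
intercept = -4.0
weight = 0.5

# Made-up values of a single independent variable
x = np.array([4.0, 8.0, 12.0])

# Probability that the dependent variable equals 1 for each input
probabilities = sigmoid(intercept + weight * x)

# Convert probabilities to class labels with a 0.5 threshold
predicted_classes = (probabilities >= 0.5).astype(int)

print(probabilities)
print(predicted_classes)
```

Larger values of the linear combination push the probability toward 1, and smaller values push it toward 0, which is why the output can be read as a class probability.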
Steps to implement logistic regression with sklearn in Python
Step 1: Collect the data
To start any project, small or large, the first thing you need is the data on which you will build the logistic regression model. Here is the command to prepare the model for the dataset.
Step 2: Import the necessary Python packages
Once you have installed dataprep, the next step is to import the packages needed to implement logistic regression. Here, we focus on the sklearn package, which is used to build the logistic regression model in Python. The following packages need to be imported:
# Numerical and data-handling libraries (pandas is assumed here for building the dataframe in Step 3)
import numpy as np
import pandas as pd
# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
# scikit-learn: preprocessing, model, data splitting, and metrics
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
# imbalanced-learn: oversampling for imbalanced classes
from imblearn.over_sampling import SMOTE
Step 3: Load the data to build a dataframe
The next step is to load the dataset, for which you need the following command:
This way, you can import the data from an external file; alternatively, you can define the dataset in the form of an array.
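Both options can be sketched as follows. This is a minimal example under stated assumptions: the column names (hours_studied, previous_score, passed) and all values are hypothetical, and an in-memory string stands in for the external file so the code runs on its own. With a real file, you would pass its path to pd.read_csv instead.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for an external CSV file (hypothetical columns and values)
csv_text = """hours_studied,previous_score,passed
1.0,40,0
2.0,45,0
3.0,50,0
4.0,55,0
5.0,60,1
6.0,65,1
7.0,70,1
8.0,75,1
"""

# Option 1: read the data from a file-like source into a dataframe
df = pd.read_csv(io.StringIO(csv_text))

# Option 2: define the dataset as an array and wrap it in a dataframe
data = np.array([
    [1.0, 40, 0],
    [2.0, 45, 0],
    [7.0, 70, 1],
    [8.0, 75, 1],
])
df_from_array = pd.DataFrame(data, columns=["hours_studied", "previous_score", "passed"])

print(df.head())
print(df_from_array)
```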
Step 4: Creating the logistic regression after loading the data
The next step is to build the logistic regression model in Python once the data has been loaded into the application. In this step, you need to set the dependent and independent variables. Here is how you can set the variables:
The ‘X’ variable represents the independent variables, and the ‘Y’ variable represents the dependent variable. Now apply the train_test_split function to set the training and testing sizes of the dataset.
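A minimal sketch of this step is shown here, assuming a small made-up dataframe with hypothetical column names. The 25% test size and the random_state value are illustrative choices, not values taken from the original example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with hypothetical column names, for illustration only
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "previous_score": [40, 42, 48, 50, 55, 58, 62, 66, 70, 75, 80, 85],
    "passed": [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
})

# Independent variables (features) and dependent variable (target)
X = df[["hours_studied", "previous_score"]]
y = df["passed"]

# Hold out 25% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

Keeping a separate test set means the model is later evaluated on rows it did not see during fitting.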
Step 5: Apply logistic regression
Now apply the logistic regression by following the command given below:
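# Fit the model on the training data, then predict on the test set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)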
Step 6: Plot the confusion matrix
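The final part is to plot the confusion matrix, which shows how many predictions were correct and incorrect for each class, i.e., the true positives, false positives, true negatives, and false negatives.
# compute the confusion matrix from the true and predicted labels
confusion_mtx = confusion_matrix(y_test, y_pred)
# plot the confusion matrix (rows are true labels, columns are predicted labels)
f, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(confusion_mtx, annot=True, linewidths=0.01, cmap="Greens", linecolor="gray", fmt="d", ax=ax)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()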
To print the classification report, which includes the accuracy, use the following command:
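One way this could look is sketched below. So that the example runs on its own, it generates synthetic data with make_classification and trains a fresh model; these are assumptions made for illustration and are not the dataset used in this article. The relevant calls are classification_report and accuracy_score from sklearn.metrics.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, used only so the example is self-contained
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model and predict on the held-out test set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1-score, plus overall accuracy
report = classification_report(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)

print(report)
print("Accuracy:", accuracy)
```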
Once you run all the commands, you will get a confusion matrix as well as a classification report. Take a look at the output below.
True positive (tp), false negative (fn), true negative (tn), and false positive (fp) are the four core values in the confusion matrix.
The classification report provides the accuracy of the trained model, which can be computed from these four values using the formula: accuracy = (tp + tn) / (tp + tn + fp + fn).
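As a quick check, accuracy can be computed by hand as the share of correct predictions, (tp + tn) divided by the total of all four counts. The sketch below uses made-up true and predicted labels, extracts the four counts from sklearn's confusion matrix, and compares the result with accuracy_score.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy is the share of correct predictions among all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("tp, fn, tn, fp:", tp, fn, tn, fp)
print("Accuracy by hand:", accuracy)
print("Accuracy from sklearn:", accuracy_score(y_true, y_pred))
```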
This article covered logistic regression and the sklearn library in Python. Logistic regression describes the data and explains the link between the dependent and independent variables. The sklearn library in Python is widely used for statistical data where a prediction or probability needs to be known.