# Ridge Regression in Sklearn

Ridge Regression is a machine learning technique that regularizes the data used for linear regression. It is an extension of simple linear regression. In this article, we discuss Ridge Regression, its advantages, the problems it solves, and its implementation in sklearn.

## What is Ridge Regression?

Ridge Regression is a statistical technique that shrinks the magnitude of parameter estimates in order to improve prediction accuracy. It works especially well when your dataset contains correlated columns that you use as inputs (independent variables) to a regression model, and ordinary least squares produces unstable or inaccurate results. In other words, Ridge Regression is a model tuning technique used for analyzing multicollinear data. It subjects the data to L2 regularization.

The cost function for Ridge Regression is:

Min(||Y - Xθ||^2 + λ||θ||^2)

where θ is the vector of regression coefficients and λ ≥ 0 controls the strength of the penalty.
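To make the cost function concrete, here is a minimal sketch that evaluates it with NumPy for a small, randomly generated dataset. The coefficient vector `theta` and the penalty strength `lam` are arbitrary example values, not fitted parameters.

```python
import numpy as np

# Example data: 5 samples, 2 features (fixed seed for reproducibility)
rng = np.random.RandomState(0)
X = rng.randn(5, 2)
y = rng.randn(5)

theta = np.array([0.5, -0.3])  # an arbitrary example coefficient vector
lam = 1.0                      # regularization strength (lambda)

# Ridge cost = squared residual norm + lambda * squared coefficient norm
residual = y - X @ theta
ridge_cost = residual @ residual + lam * (theta @ theta)
print('Ridge cost:', ridge_cost)
```

The first term measures how well θ fits the data; the second term penalizes large coefficients, which is what shrinks the estimates.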

## What is Multicollinearity?

Multicollinearity is a concept from statistics: it occurs when your independent variables are highly correlated with one another. Collinearity does not directly involve the response variable; rather, it concerns the relationships among the predictor variables, or features. Multicollinearity can make the estimates of the regression coefficients unreliable. It may also inflate the standard errors of the regression coefficients and weaken any t-tests. Multicollinearity can produce misleading results and p-values, increasing model redundancy and lowering the effectiveness and reliability of the predictions.
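A quick way to see multicollinearity is to construct two predictors where one is nearly a copy of the other and check their correlation. This is a hypothetical illustration; the variable names and noise level are chosen for the example.

```python
import numpy as np

rng = np.random.RandomState(42)
x1 = rng.randn(100)
x2 = x1 + 0.05 * rng.randn(100)  # x2 is almost a copy of x1

# Pearson correlation between the two predictors
corr = np.corrcoef(x1, x2)[0, 1]
print(f'Correlation between x1 and x2: {corr:.3f}')
```

A correlation this close to 1 means ordinary least squares cannot reliably separate the effect of `x1` from `x2`, which is exactly the situation where the ridge penalty helps.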

Ridge Regression offers several advantages:

• It guards against the model becoming overfit.
• It does not require unbiased estimators.
• It introduces only a small amount of bias, so the estimates can still be a quite accurate approximation of the true population values.
• When there is multicollinearity, the ridge estimator is quite useful for improving the least-squares estimate.

## Implementing Ridge Regression in Sklearn

Import the required libraries:

```python
from sklearn.linear_model import Ridge
import numpy as np
```

Create the dataset using the following command:

```python
n_samples, n_features = 20, 4
rng = np.random.RandomState(0)       # fixed seed so the output is reproducible
y = rng.randn(n_samples)             # labels drawn first, to match the output below
X = rng.randn(n_samples, n_features)
print('Features are', X)
print('Labels are', y)
```

Output:

```
Features are [[-2.55298982  0.6536186   0.8644362  -0.74216502]
[ 2.26975462 -1.45436567  0.04575852 -0.18718385]
[ 1.53277921  1.46935877  0.15494743  0.37816252]
[-0.88778575 -1.98079647 -0.34791215  0.15634897]
[ 1.23029068  1.20237985 -0.38732682 -0.30230275]
[-1.04855297 -1.42001794 -1.70627019  1.9507754 ]
[-0.50965218 -0.4380743  -1.25279536  0.77749036]
[-1.61389785 -0.21274028 -0.89546656  0.3869025 ]
[-0.51080514 -1.18063218 -0.02818223  0.42833187]
[ 0.06651722  0.3024719  -0.63432209 -0.36274117]
[-0.67246045 -0.35955316 -0.81314628 -1.7262826 ]
[ 0.17742614 -0.40178094 -1.63019835  0.46278226]
[-0.90729836  0.0519454   0.72909056  0.12898291]
[ 1.13940068 -1.23482582  0.40234164 -0.68481009]
[-0.87079715 -0.57884966 -0.31155253  0.05616534]
[-1.16514984  0.90082649  0.46566244 -1.53624369]
[ 1.48825219  1.89588918  1.17877957 -0.17992484]
[-1.07075262  1.05445173 -0.40317695  1.22244507]
[ 0.20827498  0.97663904  0.3563664   0.70657317]
[ 0.01050002  1.78587049  0.12691209  0.40198936]]

Labels are [ 1.76405235  0.40015721  0.97873798  2.2408932   1.86755799 -0.97727788
0.95008842 -0.15135721 -0.10321885  0.4105985   0.14404357  1.45427351
0.76103773  0.12167502  0.44386323  0.33367433  1.49407907 -0.20515826
0.3130677  -0.85409574]
```

Creating and fitting the model:

```python
# alpha is sklearn's name for the regularization strength (the lambda above)
model = Ridge(alpha=1.0)
model.fit(X, y)
```
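Once the model is fitted, you can inspect the learned coefficients and make predictions. The following self-contained sketch recreates the same random data as above, fits the model, and prints the results; attribute names (`coef_`, `intercept_`) and methods (`score`, `predict`) are standard sklearn estimator API.

```python
from sklearn.linear_model import Ridge
import numpy as np

# Recreate the same data as in the example above
rng = np.random.RandomState(0)
y = rng.randn(20)
X = rng.randn(20, 4)

model = Ridge(alpha=1.0)
model.fit(X, y)

print('Coefficients:', model.coef_)        # one weight per feature
print('Intercept:', model.intercept_)
print('R^2 on training data:', model.score(X, y))
print('Predictions for the first 3 rows:', model.predict(X[:3]))
```

Because the ridge penalty shrinks the coefficients, increasing `alpha` pulls the entries of `coef_` closer to zero.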

## Conclusion

We discussed the Ridge Regression model in machine learning, which applies L2 regularization to the data to prevent overfitting. It is an extension of simple linear regression. We also discussed its implementation using sklearn.