What is Ridge Regression?
Ridge Regression is a statistical technique that shrinks the magnitude of the coefficient estimates in order to improve prediction accuracy. It works especially well when your dataset contains correlated columns that you want to use as inputs (independent variables) to a regression model, but none of your models produce very accurate results. In other words, Ridge Regression is a model-tuning technique used for analyzing multicollinear data; it applies L2 regularization to the model.
The cost function for Ridge Regression is:
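$$J(\beta) = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Here n is the number of samples and p the number of features. The first term is the ordinary least-squares error, and the second term is the L2 penalty: the regularization strength λ (exposed as the alpha parameter in sklearn) controls how strongly the coefficients β are shrunk toward zero.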
What is Multicollinearity?
Multicollinearity is a concept from statistics: it occurs when the independent variables in a model are highly correlated with one another. Collinearity does not directly involve the response variable; it concerns only the relationships among the predictor variables (features). Multicollinearity can make the estimates of the regression coefficients unreliable, inflate their standard errors, and weaken the corresponding t-tests. It can therefore produce misleading results and p-values, increase redundancy in the model, and reduce the effectiveness and reliability of its predictions.
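To make the idea concrete, here is a minimal sketch (the data and variable names are illustrative, not from this tutorial's dataset) that builds two nearly identical predictors and inspects their correlation matrix:

import numpy as np

rng = np.random.RandomState(42)
x1 = rng.randn(100)                  # first predictor
x2 = x1 + 0.05 * rng.randn(100)      # almost a copy of x1, so x1 and x2 are collinear
x3 = rng.randn(100)                  # an unrelated predictor

# Off-diagonal entries close to 1 (or -1) in the correlation matrix
# signal multicollinearity between the corresponding predictors.
print(np.corrcoef([x1, x2, x3]))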
Advantages of Ridge Regression
- It guards against the model becoming overfit.
- It does not require unbiased estimators.
- It introduces only a small amount of bias, so the estimates remain a fairly accurate approximation of the true population values.
- When there is multicollinearity, the ridge estimator is very useful for improving on the least-squares estimate (a short comparison sketch follows this list).
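To illustrate the last point, here is a small comparison sketch (not part of the original example; the data are synthetic) that fits ordinary least squares and Ridge Regression on deliberately collinear inputs and prints their coefficients:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(1)
x1 = rng.randn(50)
x2 = x1 + 0.01 * rng.randn(50)       # nearly identical to x1 (collinear)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.randn(50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# With collinear inputs, OLS tends to spread the weight unstably across
# x1 and x2, while ridge shrinks both coefficients toward a stable value.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)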
Implementing Ridge Regression in Sklearn
Import the required libraries:

import numpy as np
from sklearn.linear_model import Ridge
Create the dataset using the following commands (20 samples with 4 features, to match the output shown below):

n_samples, n_features = 20, 4
data = np.random.RandomState(0)          # seeded random number generator for reproducibility
y = data.randn(n_samples)                # labels: 20 random values
X = data.randn(n_samples, n_features)    # features: a 20 x 4 random matrix
print('Features are', X)
print('Labels are', y)
Output:
[ 2.26975462 -1.45436567 0.04575852 -0.18718385]
[ 1.53277921 1.46935877 0.15494743 0.37816252]
[-0.88778575 -1.98079647 -0.34791215 0.15634897]
[ 1.23029068 1.20237985 -0.38732682 -0.30230275]
[-1.04855297 -1.42001794 -1.70627019 1.9507754 ]
[-0.50965218 -0.4380743 -1.25279536 0.77749036]
[-1.61389785 -0.21274028 -0.89546656 0.3869025 ]
[-0.51080514 -1.18063218 -0.02818223 0.42833187]
[ 0.06651722 0.3024719 -0.63432209 -0.36274117]
[-0.67246045 -0.35955316 -0.81314628 -1.7262826 ]
[ 0.17742614 -0.40178094 -1.63019835 0.46278226]
[-0.90729836 0.0519454 0.72909056 0.12898291]
[ 1.13940068 -1.23482582 0.40234164 -0.68481009]
[-0.87079715 -0.57884966 -0.31155253 0.05616534]
[-1.16514984 0.90082649 0.46566244 -1.53624369]
[ 1.48825219 1.89588918 1.17877957 -0.17992484]
[-1.07075262 1.05445173 -0.40317695 1.22244507]
[ 0.20827498 0.97663904 0.3563664 0.70657317]
[ 0.01050002 1.78587049 0.12691209 0.40198936]]
Labels are [ 1.76405235 0.40015721 0.97873798 2.2408932 1.86755799 -0.97727788
0.95008842 -0.15135721 -0.10321885 0.4105985 0.14404357 1.45427351
0.76103773 0.12167502 0.44386323 0.33367433 1.49407907 -0.20515826
0.3130677 -0.85409574]
Creating and fitting the model:

model = Ridge(alpha=1.0)   # alpha controls the strength of the L2 penalty; 1.0 is sklearn's default
model.fit(X, y)
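After fitting, the learned coefficients, intercept, and training-set R² score can be inspected (the exact numbers depend on the random data generated above):

print('Ridge coefficients are', model.coef_)
print('Ridge intercept is', model.intercept_)
print('R^2 on the training data is', model.score(X, y))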
Conclusion
We discussed the Ridge Regression model in machine learning, which applies L2 regularization to prevent overfitting. It is an extension of simple linear regression. We also discussed its implementation using sklearn.