
AdaBoost in sklearn

“AdaBoost (Adaptive Boosting) is a boosting technique used to convert weak learners into a strong learner. It is widely used on competitive Machine Learning platforms. This article discusses the AdaBoost algorithm, its uses, and its implementation in sklearn.”

What is Ensemble Learning?

Ensemble learning improves machine learning results by combining several models, which typically yields better predictive performance than any single model alone. Ensemble approaches blend multiple machine learning techniques into one predictive model in order to reduce variance (bagging), reduce bias (boosting), or improve predictions (stacking).
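
For context, sklearn's “ensemble” module ships implementations of all three approaches. The snippet below is a minimal sketch; the base estimators and parameter values chosen here are illustrative assumptions, not recommendations.

from sklearn.ensemble import (
    BaggingClassifier,    # bagging: reduces variance
    AdaBoostClassifier,   # boosting: reduces bias
    StackingClassifier,   # stacking: blends heterogeneous models
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Bagging and boosting both resample or reweight a single base learner;
# stacking combines different learners through a final meta-model.
bagging = BaggingClassifier(n_estimators=50)
boosting = AdaBoostClassifier(n_estimators=50)
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("lr", LogisticRegression())],
    final_estimator=LogisticRegression(),
)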

What is AdaBoost?

AdaBoost is an example of “Ensemble Learning”: it combines many learners to build a more effective learning algorithm. AdaBoost works by choosing a base algorithm (such as decision trees) and iteratively improving it by focusing on the examples the current model misclassifies. We start by giving every training example the same weight. On each iteration, the base algorithm is fit to the weighted training set, and the weights of the misclassified examples are increased. Repeating this “n” times produces “n” learners, and the final model is their weighted vote, with each learner's vote scaled by how well it performed.
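
To make the weight-update idea concrete, here is a minimal sketch of discrete AdaBoost built on decision stumps. The function names and the -1/+1 label coding are assumptions made for illustration, and edge cases (such as a weighted error of zero) are not handled.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    # y must be coded as -1/+1 for the weight update below
    n = len(y)
    w = np.full(n, 1.0 / n)                     # equal weights to start
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)        # fit to the weighted set
        pred = stump.predict(X)
        err = w[pred != y].sum() / w.sum()      # weighted error rate
        alpha = 0.5 * np.log((1 - err) / err)   # this learner's vote weight
        w *= np.exp(-alpha * y * pred)          # raise weights of mistakes
        w /= w.sum()                            # renormalize
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # final model: sign of the weighted vote over all rounds
    votes = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(votes)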

Why Do We Use AdaBoost?

Because AdaBoost fits its learners sequentially rather than optimizing all of its parameters at once, it is relatively resistant to overfitting. Applying AdaBoost can raise the accuracy of weak classifiers, as shown in the sketch below. Beyond binary classification, AdaBoost is also used for text and image classification, and it is frequently employed on challenging Machine Learning problems.
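
As a quick illustration of that accuracy gain, this sketch compares a single decision stump against an AdaBoost ensemble of stumps; the dataset parameters are arbitrary choices for demonstration.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)   # a single weak learner
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)

# Cross-validated accuracy of the weak learner vs. the boosted ensemble
print("Stump accuracy:  ", cross_val_score(stump, X, y).mean())
print("Boosted accuracy:", cross_val_score(boosted, X, y).mean())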

Implementing AdaBoost in sklearn

Importing libraries

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

Creating the dataset

X, y = make_classification(n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=0, shuffle=False)

print("Feature data is", X)

print("Label data is", y)

Output

Feature data is [[ 0.44229321 0.08089276 0.54077359 -1.81807763]
[ 1.34699113 1.48361993 -0.04932407 0.2390336 ]
[-0.54639809 -1.1629494 -1.00033035 1.67398571]
...
[ 0.8903941 1.08980087 -1.53292105 -1.71197016]
[ 0.73135482 1.25041511 0.04613506 -0.95837448]
[ 0.26852399 1.70213738 -0.08081161 -0.70385904]]
Label data is [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

Creating the model and making predictions

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print("Output Label is", clf.predict([[1.5, 1, 0.5, -0.5]]))
print("Classification score is", clf.score(X, y))

Output

Output Label is [1]
Classification score is 0.94
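
One caveat: clf.score(X, y) above measures accuracy on the same data the model was trained on, which tends to be optimistic. A more honest estimate holds out a test split, as in the sketch below (the 0.25 test fraction is an arbitrary choice for illustration).

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)   # fit on the training split only
print("Held-out accuracy is", clf.score(X_test, y_test))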

Conclusion

We discussed the AdaBoost algorithm in Machine Learning, including ensemble learning, its advantages, and its implementation in sklearn. It is a helpful algorithm because it uses a set of models to decide the output instead of a single one, converting weak learners into a strong learner. sklearn provides its AdaBoost implementation in the “ensemble” module, where we can pass custom parameters to the model.

About the author

Simran Kaur

Simran works as a technical writer. She holds an MS in Computer Science from a university in Silicon Valley, the well-known CS hub, and is also an editor of the website. She enjoys writing about any tech topic, including programming, algorithms, cloud, data science, and AI. Her hobbies include travelling, sketching, and gardening.