Extreme Gradient Boosting (XGBoost) is an optimized implementation of the gradient boosting framework that includes both a linear model solver and tree learning algorithms. It builds on the gradient boosting algorithm, which produces a prediction model as an ensemble of weak prediction models, typically decision trees. XGBoost features prominently in machine learning competitions, so it is worth understanding why and how the algorithm works so well. In this article we will look at decision trees, gradient boosting, XGBoost, and its implementation.
What are Decision Trees?
A decision tree algorithm builds a decision tree from a supplied dataset. The dataset consists of attributes, also known as features or characteristics, plus a class attribute. The class attribute is also called the target or outcome; it is the value we want to predict. The algorithm produces a decision tree model made up of a root node, leaf nodes, and edges. The algorithm is recursive: it repeatedly splits the data and calls itself on each subset until a stopping condition is satisfied.
Example of Decision Tree from Wikipedia
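To see what such a model looks like in practice, here is a minimal sketch using scikit-learn's DecisionTreeClassifier on the Iris dataset; the dataset choice and the max_depth=2 setting are illustrative assumptions, not part of the XGBoost workflow covered later.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
# load a small labelled dataset: four features plus a class attribute (the target)
iris = load_iris()
# fit a shallow tree so the printed structure stays readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)
# show the root node, internal splits, and leaf nodes as text
print(export_text(tree, feature_names=list(iris.feature_names)))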
What is Gradient Boosting?
For regression and classification problems, gradient boosting is a machine learning technique that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in stages, but it generalizes them by allowing an arbitrary differentiable loss function to be optimized. Gradient boosting involves three essential steps:
- The first step is to optimize a loss function, which must be differentiable. The loss function measures how well the model fits the training data for the problem at hand.
- The second step is to use a weak learner. In gradient boosting the weak learner is a decision tree; specifically, regression trees whose leaves output real values are used, so the output of each successive tree can be added to the ensemble to correct the residual errors in the previous iteration's predictions.
- The third step is to add many weak learners. Decision trees are added one at a time, and a gradient descent procedure is used to minimize the loss as each tree is added; this is the "gradient" in gradient boosting. In machine learning, gradient descent is usually used to find the parameters of a single model that minimize a loss function; here it is applied one tree at a time. A minimal sketch of this additive process follows the list.
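To make these three steps concrete, here is a minimal sketch of gradient boosting for regression with a squared-error loss, where each new tree is fit to the residuals (the negative gradient of the loss) of the current ensemble. The learning rate of 0.1, the tree depth, and the use of scikit-learn regression trees are illustrative assumptions, not any particular library's internals.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
# toy regression data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
learning_rate = 0.1                      # shrinks each tree's contribution
prediction = np.full(y.shape, y.mean())  # start from a constant model
trees = []
for _ in range(100):
    # negative gradient of the squared-error loss = residuals of the current prediction
    residuals = y - prediction
    # weak learner: a shallow regression tree fit to the residuals
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    # add the new tree's (shrunken) output to the ensemble prediction
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
print('Training MSE after boosting:', np.mean((y - prediction) ** 2))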
What is XGBoost?
XGBoost (Extreme Gradient Boosting) is a gradient boosting library with a Python package. Gradient-boosted decision trees serve as its foundation. The algorithm is known for improving model accuracy and robustness, which is why it appears so often on competitive platforms such as Kaggle. Decision-tree-based algorithms are considered among the best for small to medium-sized structured or tabular data, and XGBoost makes building such models fast and efficient.
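For reference, the xgboost Python package exposes the usual boosting knobs through its scikit-learn style estimator; the parameter values in this short sketch are arbitrary examples, not recommended defaults.
import xgboost
# illustrative hyperparameter values, not tuned recommendations
model = xgboost.XGBClassifier(
    n_estimators=100,    # number of boosted trees to fit
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    max_depth=3,         # maximum depth of each weak learner
)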
Where Can we Use XGBoost?
- When the training data contains a large number of observations.
- When the number of features is smaller than the number of observations.
- When model performance metrics are a key consideration.
Implementation with the scikit-learn API
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import xgboost
# creating the dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
print('Features are', X[:20])
print('Labels are', y[:20])
# splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# creating XGBoost instance
xgb_classifier = xgboost.XGBClassifier()
# fitting the model onto the train data
xgb_classifier.fit(X_train, y_train)
# making the predictions
predictions = xgb_classifier.predict(X_test)
# printing 20 predictions
print('Predictions are', predictions[:20])
Output
[-2.9728827 -1.08878294 0.42281857 -3.11685659 0.64445203]
[-0.59614125 -1.37007001 -1.91374267 0.66356158 -0.1540724 ]
[-1.06894674 -1.17505738 1.19361168 -0.09816121 -0.88661426]
[-1.30526888 -0.96592566 -0.14735366 1.05980629 0.02624662]
[-2.18261832 -0.97011387 -0.11433516 0.74355352 0.21035937]
[-1.24797892 -1.13094525 -0.00592741 1.36606007 1.55511403]
[-1.35308792 -1.06633681 0.61332623 -0.28595915 1.49691099]
[-1.13449871 -1.27403448 1.18311956 0.71889717 -1.21607658]
[-0.38457445 -1.08840346 0.1406719 -0.74367217 -0.15901225]
[-1.0106506 -0.52017071 0.24005693 0.10015941 -0.47517511]
[-0.58310155 -1.18236446 1.27295375 -1.69613127 0.73018353]
[-0.29760388 -1.45995253 -1.85748327 0.38259814 -0.88690433]
[-0.86057581 -1.01530275 0.87830376 0.08645252 0.24770638]
[-2.47091771 -1.21696663 -1.01827933 -0.65457013 0.20721739]
[-1.33090082 -1.01316175 0.58356993 2.92909624 0.22285832]
[ 0.74840002 -0.91748674 0.97603753 -1.55693393 -1.32989186]
[-1.05483466 -0.9320408 -0.35549477 -1.1974277 1.48639925]
[-2.19888276 -1.17327072 -0.41021869 1.38218189 1.48678247]
[-0.68305478 -0.94253787 0.04277972 0.50179975 -0.05609947]]
Labels are [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Predictions are [1 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0]
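Since the split above keeps the true test labels in y_test, a natural follow-up (not shown in the original output) is to score the predictions, for example with scikit-learn's accuracy_score, continuing from the variables defined in the listing.
from sklearn.metrics import accuracy_score
# compare the predicted labels with the held-out test labels
print('Accuracy:', accuracy_score(y_test, predictions))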
Conclusion
In this article we discussed the XGBoost algorithm, an implementation of the gradient boosting technique that is widely used in machine learning competitions today. We saw why the algorithm performs well and what it uses under the hood, and finally walked through an implementation of XGBoost in Python.