Understanding AdaBoost: Theory, Scikit‑learn Library, and Practical Implementation in Python
This article introduces the AdaBoost algorithm, explains its boosting principle, describes the AdaBoostClassifier and AdaBoostRegressor classes in scikit‑learn, provides a complete Python example with data loading, model training, prediction, evaluation, and visualisation, and discusses the algorithm’s advantages, disadvantages, and detailed iterative process.
Overview
The article explains the AdaBoost algorithm, covering its introduction, the scikit‑learn library support, a practical code example, the underlying theory, and a summary of its pros and cons.
AdaBoost Algorithm Introduction
AdaBoost belongs to the boosting family of ensemble learning. Boosting iteratively trains weak learners, increasing the weight of mis‑classified samples so that subsequent learners focus on them, and finally combines all weak learners into a strong classifier.
AdaBoost Library Introduction (scikit‑learn)
scikit‑learn provides AdaBoostClassifier for classification and AdaBoostRegressor for regression. Both require a base estimator that supports sample weights. By default, the classifier uses DecisionTreeClassifier and the regressor uses DecisionTreeRegressor .
Key parameters:
base_estimator : the weak learner (any estimator supporting sample weights). Note that this parameter was renamed estimator in scikit‑learn 1.2 and the old name was removed in 1.4.
n_estimators : maximum number of weak learners (default 50). Too few may underfit; too many may overfit.
learning_rate : weight shrinkage factor ν (default 1). Smaller values require more estimators.
Decision‑tree parameters follow the CART settings (see internal documentation).
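As a sketch of how these parameters fit together (the dataset and parameter values below are illustrative, not from the article), a classifier with an explicit weak learner might be constructed like this; the `inspect` check handles the `base_estimator` → `estimator` rename across scikit‑learn versions:

```python
import inspect

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Pick the right keyword for the weak learner: `base_estimator` was
# renamed `estimator` in scikit-learn 1.2 and removed in 1.4.
params = inspect.signature(AdaBoostClassifier.__init__).parameters
kw = "estimator" if "estimator" in params else "base_estimator"

X, y = make_classification(n_samples=300, random_state=0)  # toy data
clf = AdaBoostClassifier(
    n_estimators=100,      # at most 100 boosting rounds
    learning_rate=0.5,     # shrink each learner's contribution
    random_state=0,
    **{kw: DecisionTreeClassifier(max_depth=1)})  # depth-1 stumps
clf.fit(X, y)
print(clf.score(X, y))     # training accuracy
```

A smaller `learning_rate` generally needs a larger `n_estimators` to reach the same training fit, which is the trade‑off the parameter list above describes.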
AdaBoost Practical Example
Import Modules
#coding=utf-8
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
Load Sample Data
# Use the built‑in digits dataset
digits = datasets.load_digits()
Split Training and Test Sets
# Randomly allocate 75% of samples to training
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
Train Model
# Train AdaBoost on decision trees limited to depth 2
# (min_samples_split and min_samples_leaf belong to the tree,
#  not to AdaBoostClassifier)
clf = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(
        max_depth=2, min_samples_split=20, min_samples_leaf=5),
    learning_rate=0.7, n_estimators=200)
clf.fit(x_train, y_train)
Prediction and Validation
# Predict on the test set
y_predict = clf.predict(x_test)
Calculate Accuracy
# Count mis-classified samples and derive the error rate
diff = 0.0
for num in range(len(y_predict)):
    if y_predict[num] != y_test[num]:
        diff += 1
rate = diff / len(y_predict)
print(1 - rate)                      # Test accuracy
print(clf.score(x_train, y_train))   # Training score
Relationship Between n_estimators and Score
# Plot training and testing scores over the number of estimators
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
estimators_num = len(clf.estimators_)
X = range(1, estimators_num + 1)
ax.plot(list(X), list(clf.staged_score(x_train, y_train)), label='Training score')
ax.plot(list(X), list(clf.staged_score(x_test, y_test)), label='Testing score')
ax.set_xlabel('estimator num')
ax.set_ylabel('score')
ax.set_title('AdaBoostClassifier')
plt.show()
AdaBoost Principle
Boosting starts with equal sample weights, trains a weak learner, computes its error, updates sample weights (increasing weights of mis‑classified samples), and repeats the process for T iterations. The final strong classifier is a weighted sum of the weak learners.
AdaBoost Process
1. Initialize all sample weights to 1/m.
2. For each iteration t = 1, …, T:
   - Select the weak classifier with the lowest weighted error e_t.
   - Compute its weight a_t = 0.5 * ln((1 − e_t) / e_t).
   - Update sample weights: increase the weights of mis‑classified samples, decrease those of correctly classified ones, then normalize.
3. Combine the weak classifiers using their weights a_t to form the final strong classifier.
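The steps above can be sketched directly in NumPy. This is a minimal illustration, not scikit‑learn's implementation: labels are assumed to be in {−1, +1}, decision stumps serve as the weak learners, and the tiny 1‑D dataset at the end is made up for demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    m = len(y)
    w = np.full(m, 1.0 / m)          # step 1: uniform sample weights
    stumps, alphas = [], []
    for t in range(T):               # step 2: T boosting rounds
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        e = np.sum(w[pred != y])     # weighted error e_t
        e = min(max(e, 1e-10), 1 - 1e-10)   # guard against e = 0 or 1
        a = 0.5 * np.log((1 - e) / e)       # a_t = 0.5 * ln((1-e_t)/e_t)
        w *= np.exp(-a * y * pred)   # raise weights of mistakes only
        w /= w.sum()                 # normalize
        stumps.append(stump)
        alphas.append(a)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # step 3: sign of the weighted sum of weak classifiers
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)

# Toy usage on a linearly separable 1-D dataset (illustrative only)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
stumps, alphas = adaboost_fit(X, y, T=5)
print((adaboost_predict(X, stumps, alphas) == y).all())
```

Note the sign in the weight update: for a mis‑classified sample, y * pred = −1, so its weight is multiplied by exp(a) > 1, which is exactly the "focus on mistakes" behaviour described above.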
Detailed AdaBoost Example
A toy dataset with 10 points is used. Three simple axis‑aligned decision stumps serve as weak classifiers. The article walks through three iterations, showing weight updates, error calculations, classifier selection, and the resulting combined classifier after each round.
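The round‑by‑round bookkeeping in such a walkthrough can be reproduced numerically. The sketch below uses a hypothetical 10‑point 1‑D dataset (not necessarily the article's data) and enumerates threshold stumps of the form "x < v ⇒ +1 or −1", printing the chosen stump, its weighted error e_t, and its weight a_t for three rounds:

```python
import numpy as np

# Hypothetical 10-point dataset; weak classifiers are threshold stumps.
x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
w = np.full(10, 0.1)                       # initial weights 1/m

def best_stump(w):
    # Exhaustively pick the stump with the lowest weighted error
    best = None
    for v in np.arange(0.5, 9.5):          # candidate thresholds
        for sign in (1, -1):               # direction of the stump
            pred = np.where(x < v, sign, -sign)
            e = w[pred != y].sum()
            if best is None or e < best[0]:
                best = (e, v, sign, pred)
    return best

for t in range(3):
    e, v, sign, pred = best_stump(w)
    a = 0.5 * np.log((1 - e) / e)          # classifier weight a_t
    print(f"round {t+1}: threshold={v}, sign={sign}, e={e:.4f}, a={a:.4f}")
    w = w * np.exp(-a * y * pred)          # reweight the samples
    w = w / w.sum()                        # normalize
```

With these (made‑up) labels the first round selects the stump at v = 2.5 with weighted error 0.3, and the three mis‑classified points gain weight before round two, mirroring the update steps described in the process above.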
Summary
Advantages
High classification accuracy.
Flexibility to use various weak learners.
Weak learners are simple to construct and the resulting model is interpretable.
Relatively resistant to over‑fitting.
Disadvantages
Sensitive to noisy data and outliers.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.