Ensemble Learning
Voting
Model
Aggregate the predictions of several different models and predict the class that gets the most votes.
Thanks to the law of large numbers, even if each classifier is a weak learner (one that performs only slightly better than random guessing), the ensemble can still be a strong learner, provided there are enough weak learners and they are sufficiently diverse.
A voting ensemble works best when the individual predictions are as independent from one another as possible.
-
Hard Voting
Predicts the class that gets the most votes from the individual models
-
Soft Voting
Predicts the class with the highest class probability, averaged over all the individual classifiers. Soft voting is available only when every individual classifier has a
predict_proba()
method
Code
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
voting_clf = VotingClassifier(
estimators=[('lr', log_clf), ('rf', rnd_clf)],
voting='hard' # for soft voting, assign `soft`
)
voting_clf.fit(X_train, Y_train)
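A quick end-to-end sketch of the soft-voting variant. The `make_moons` dataset and the hyperparameters below are illustrative assumptions, not part of the notes; both estimators implement predict_proba(), which is what soft voting requires.

```python
# Illustrative sketch: soft voting on a toy dataset (make_moons is an
# assumption here; the notes' X_train/Y_train could be any dataset).
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(random_state=42)),
                ('rf', RandomForestClassifier(random_state=42))],
    voting='soft',  # averages predict_proba() outputs across estimators
)
voting_clf.fit(X_train, y_train)
score = voting_clf.score(X_test, y_test)
```

Soft voting often edges out hard voting because highly confident votes count for more than marginal ones.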
Bagging & Pasting
Model
Train the same base model on multiple random subsets of the training set. The final prediction is typically the most frequent prediction among the individual models (for classification) or their average (for regression).
Bagging: sampling process is performed with replacement
Pasting: sampling process is performed without replacement
Random Patches: sampling both training instances and features.
Random Subspaces: sampling features but keeping all training instances
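Both variants can be sketched with `BaggingClassifier`'s `max_features` and `bootstrap_features` parameters; the dataset and the exact fractions below are illustrative assumptions.

```python
# Illustrative sketch: Random Patches samples both training instances
# and features. For Random Subspaces, set max_samples=1.0 and
# bootstrap=False (keep all instances) while still sampling features.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=42)

patches_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.7,           # sample 70% of training instances
    bootstrap=True,            # ...with replacement
    max_features=0.5,          # sample 50% of features per estimator
    bootstrap_features=True,   # ...also with replacement
    random_state=42,
)
patches_clf.fit(X, y)
```

Sampling features is especially useful for high-dimensional inputs, since it trades a bit more bias for extra diversity (and thus lower ensemble variance).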
Why is the net result generally better?
Each individual model has a higher bias than if it were trained on the full original training set. However, aggregation gives the ensemble a similar bias but a lower variance than a single model trained on the original training set.
Comparison between Bagging and Pasting:
Sampling with replacement (bagging) introduces more diversity into the subsets each base model is trained on, so bagging ends up with a slightly higher bias than pasting. However, the extra diversity also makes the base models less correlated with one another, so the variance of the ensemble is reduced.
Out-of-Bag Evaluation
Use the training instances that are never sampled during training (the "out-of-bag" instances) to evaluate each base model, so that the ensemble can be evaluated without a separate validation set. With bagging, on average only about 63% of the training instances are sampled for any given base model, leaving the rest out-of-bag.
Code
# If the base model has a `predict_proba` method,
# BaggingClassifier automatically performs soft voting
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
bag_clf = BaggingClassifier(
DecisionTreeClassifier(),
n_estimators=500,
max_samples=100,
bootstrap=True, # for bagging, bootstrap=True; for pasting, bootstrap=False
)
bag_clf.fit(X_train, Y_train)
Out-of-bag evaluation
bag_clf = BaggingClassifier(
n_estimators=500,
max_samples=100,
bootstrap=True, # out-of-bag evaluation requires bagging (bootstrap=True)
oob_score=True,
)
bag_clf.fit(X_train, Y_train)
print(bag_clf.oob_score_)
Random Forests
Model
A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method.
The algorithm introduces randomness when growing trees; instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features.
Code
from sklearn.ensemble import RandomForestClassifier
rnd_clf = RandomForestClassifier()
rnd_clf.fit(X_train, Y_train)
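A handy side effect of the per-node feature sampling: a trained Random Forest exposes `feature_importances_`, measuring how much each feature reduces impurity on average across the trees. The iris dataset below is an illustrative assumption.

```python
# Illustrative sketch: inspecting feature importances of a Random
# Forest (the iris dataset here is an assumption for illustration).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=200, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# importances are normalized to sum to 1
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(name, round(score, 3))
```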
Boosting
AdaBoost
Training (Classification)
- Initialize $w^{(i)} = \frac{1}{m}$ for $i = 1, \dots, m$
- For $t = 1, \dots, T$
- train $model_t$ (using the instance weights) and compute its weighted error rate on the training set: $r_t = \frac{\sum_{i:\, \hat{y}_t^{(i)} \neq y^{(i)}} w^{(i)}}{\sum_{i=1}^{m} w^{(i)}}$, where $\hat{y}_t^{(i)}$ is $model_t$'s prediction for the $i$-th instance
- Compute $model_t$'s weight $\alpha_t = \eta \log\frac{1 - r_t}{r_t}$, where $\eta$ is the learning rate
- Update $w^{(i)} \leftarrow w^{(i)} \exp(\alpha_t)$ for every misclassified instance ($\hat{y}_t^{(i)} \neq y^{(i)}$), leave correctly classified instances unchanged, then normalize all weights so they sum to 1
Model Prediction (Classification)
Predict the class that receives the largest total model weight: $\hat{y}(x) = \underset{k}{\operatorname{argmax}} \sum_{t=1}^{T} \alpha_t \, \mathbb{1}[model_t(x) = k]$
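The training loop can be sketched from scratch with decision stumps as the weak learners; the dataset, learning rate, and number of rounds below are illustrative assumptions.

```python
# Illustrative from-scratch sketch of the AdaBoost (SAMME) weight
# updates, using decision stumps (toy make_moons data assumed).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
m = len(X)
eta = 0.5                    # learning rate
w = np.full(m, 1 / m)        # initialize uniform instance weights
stumps, alphas = [], []

for t in range(50):
    stump = DecisionTreeClassifier(max_depth=1, random_state=t)
    stump.fit(X, y, sample_weight=w)        # train on weighted instances
    pred = stump.predict(X)
    r = w[pred != y].sum() / w.sum()        # weighted error rate
    alpha = eta * np.log((1 - r) / r)       # predictor weight
    w[pred != y] *= np.exp(alpha)           # boost misclassified weights
    w /= w.sum()                            # normalize
    stumps.append(stump)
    alphas.append(alpha)

# prediction: pick the class with the largest total alpha
scores = np.zeros((m, 2))
for stump, alpha in zip(stumps, alphas):
    scores[np.arange(m), stump.predict(X)] += alpha
y_pred = scores.argmax(axis=1)
```

Because each round reweights the training set toward the instances its predecessors got wrong, this sequential scheme cannot be parallelized the way bagging can.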
Code (Classification)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
ada_clf = AdaBoostClassifier(
DecisionTreeClassifier(max_depth=1),
n_estimators=200,
algorithm='SAMME.R', # uses predict_proba(); deprecated in recent scikit-learn versions in favor of 'SAMME'
learning_rate=0.5,
)
ada_clf.fit(X_train, Y_train)
Gradient Boosting
Training (Regression)
For t (step) = 1 ... T
- if t == 1:
- fit $model_1$ on the training dataset
- else:
- fit $model_t$ on the residual errors $\epsilon^{(i)}$ left by the previous models
- calculate the residual errors $\epsilon^{(i)} = y^{(i)} - \sum_{s=1}^{t} model_s(x^{(i)})$
Model prediction (Regression)
Sum the predictions of all the models: $\hat{y}(x) = \sum_{t=1}^{T} model_t(x)$
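The residual-fitting loop can be sketched manually with a few regression trees; the quadratic toy dataset below is an illustrative assumption.

```python
# Illustrative sketch: each tree is fit to the residual errors left
# by the sum of all previous trees (toy quadratic data assumed).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(200, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=200)

trees, residual = [], y.copy()
for t in range(3):
    tree = DecisionTreeRegressor(max_depth=2, random_state=t)
    tree.fit(X, residual)         # t == 0: fit the targets themselves
    residual -= tree.predict(X)   # leftover error for the next tree
    trees.append(tree)

# prediction: sum the predictions of all the trees
y_pred = sum(tree.predict(X) for tree in trees)
```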
Code (Regression)
from sklearn.ensemble import GradientBoostingRegressor
gb_reg = GradientBoostingRegressor()
gb_reg.fit(X_train, Y_train)
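Because the trees are added sequentially, `staged_predict()` can replay the ensemble one tree at a time, which makes it easy to pick the number of trees with the lowest validation error. The toy dataset below is an illustrative assumption.

```python
# Illustrative sketch: using staged_predict() to find the best
# number of trees on a validation set (toy quadratic data assumed).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(400, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=400)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

gb_reg = GradientBoostingRegressor(n_estimators=200, random_state=42)
gb_reg.fit(X_train, y_train)

# validation MSE after each successive tree
errors = [np.mean((y_val - y_pred) ** 2)
          for y_pred in gb_reg.staged_predict(X_val)]
best_n = int(np.argmin(errors)) + 1

# retrain with only the best number of trees
best_reg = GradientBoostingRegressor(n_estimators=best_n, random_state=42)
best_reg.fit(X_train, y_train)
```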