Ensemble Techniques in Machine Learning - A Practical Guide to Bagging, Boosting, Stacking, Blending, and Bayesian Model Averaging
There are several types of ensemble techniques in machine learning, including bagging, boosting, stacking, blending, bootstrapped ensembles, and Bayesian model averaging.
Bagging
Bagging (short for bootstrap aggregating) is an ensemble technique that involves training multiple models on different subsets of the training data and then averaging the predictions of the individual models to make the final prediction. This can be done with decision trees, neural networks, or any other type of model.
One of the main advantages of bagging is that it can reduce overfitting, which is when a model performs well on the training data but poorly on new, unseen data. By training multiple models on different subsets of the data, bagging can help to reduce the variance of the model, which can lead to improved generalization and better performance on new data.
To understand how bagging works, let's consider a simple example using decision trees. Suppose we have a dataset with 100 samples and 10 features. We can create 10 different subsets of the data by sampling with replacement from the original dataset. This means that some samples may appear in multiple subsets, while others may not appear at all. We can then train a decision tree model on each of the 10 subsets, resulting in 10 different models.
To make a prediction for a new sample, we can pass it through each of the 10 models and combine their outputs: a majority vote for classification, or an average for regression. For example, if four of the models predict that the sample is positive and six predict that it is negative, the final prediction is negative.
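To make these mechanics concrete, here is a minimal hand-rolled sketch of the same idea, assuming X_train, y_train, and X_test are NumPy arrays and the labels are 0/1. The bootstrap sampling and the majority vote are exactly the parts that the scikit-learn BaggingClassifier shown below automates for you:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_models = 10
models = []
rng = np.random.default_rng(42)

# Train each model on a bootstrap sample (drawn with replacement)
for _ in range(n_models):
    indices = rng.integers(0, len(X_train), size=len(X_train))
    model = DecisionTreeClassifier(max_depth=4)
    model.fit(X_train[indices], y_train[indices])
    models.append(model)

# Combine the 10 predictions by majority vote
all_preds = np.array([model.predict(X_test) for model in models])  # shape: (n_models, n_test)
final_predictions = (all_preds.mean(axis=0) >= 0.5).astype(int)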
Bagging is often used in conjunction with decision trees because decision trees are prone to overfitting. By training multiple decision trees on different subsets of the data, bagging can help to reduce the variance of the model and improve its generalization performance.
Here's some Python code that demonstrates how to implement bagging using the scikit-learn library:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Create the base classifier
base_classifier = DecisionTreeClassifier(max_depth=4)

# Create the bagging classifier
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, max_samples=0.8, max_features=0.8)

# Train the classifier on the training data
bagging_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = bagging_classifier.predict(X_test)
In this example, we've created a base classifier using a decision tree with a maximum depth of 4. We've then created a bagging classifier that trains 10 decision trees on different subsets of the data (80% of the samples and 80% of the features). Finally, we've trained the classifier on the training data and made predictions on the test data.
Bagging is a simple and effective way to improve the performance of machine learning models by reducing overfitting and increasing the robustness of the model. It can be applied to a wide range of models, including decision trees, neural networks, and other types of models.
Boosting
Boosting is an ensemble technique that involves training a sequence of models, where each model tries to correct the mistakes of the previous ones. The most common example of boosting is gradient boosting, which is typically used with decision trees.
One of the main advantages of boosting is that it can improve the performance of weak models by combining their predictions in a way that reduces bias and variance. Boosting algorithms are based on the idea that it is possible to train a series of simple models that can be combined to form a more powerful model.
To understand how boosting works, let's consider a simple example using decision trees. Suppose we have a dataset with 100 samples and 10 features. We can train a decision tree model on the data, and then calculate the error of the model (the difference between the predicted output and the true output).
Next, we can train a second decision tree model on the data, but this time, we weight the samples differently. For example, we might give higher weights to samples that were misclassified by the first model, and lower weights to samples that were correctly classified. By doing this, we are trying to correct the mistakes of the first model by training a new model that focuses more on the difficult samples.
We can repeat this process multiple times, training additional decision tree models on the data and re-weighting the samples at each iteration. The final prediction is made by combining the predictions of the individual models, typically by taking a weighted average. This re-weighting scheme is the idea behind AdaBoost; gradient boosting takes a related approach, but instead of re-weighting samples it fits each new tree to the errors (the gradients of the loss) of the ensemble built so far.
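The re-weighting procedure described above is essentially the AdaBoost algorithm, which scikit-learn implements directly. Here is a minimal sketch with illustrative hyperparameters, again assuming X_train, y_train, and X_test already exist:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Shallow trees are the typical weak learners for AdaBoost
base_tree = DecisionTreeClassifier(max_depth=1)

# Each new tree focuses on the samples the previous trees misclassified
ada = AdaBoostClassifier(base_tree, n_estimators=100, learning_rate=0.5)
ada.fit(X_train, y_train)
predictions = ada.predict(X_test)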
Here's some Python code that demonstrates how to implement gradient boosting using the scikit-learn library:
from sklearn.ensemble import GradientBoostingClassifier

# Create the gradient boosting classifier
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)

# Train the classifier on the training data
gb.fit(X_train, y_train)

# Make predictions on the test data
predictions = gb.predict(X_test)
In this example, we've created a gradient boosting classifier that trains 100 decision trees with a learning rate of 0.1 and a maximum depth of 3. We've then trained the classifier on the training data and made predictions on the test data.
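A useful follow-up is to watch how the test accuracy changes as trees are added, which makes the trade-off between n_estimators and learning_rate visible. GradientBoostingClassifier exposes this through staged_predict, which yields the ensemble's predictions after each boosting stage; the sketch below assumes y_test is available:

from sklearn.metrics import accuracy_score

# Accuracy on the test set after each boosting stage
staged_accuracy = [accuracy_score(y_test, stage_pred)
                   for stage_pred in gb.staged_predict(X_test)]
best_stage = staged_accuracy.index(max(staged_accuracy)) + 1
print(f"Best test accuracy {max(staged_accuracy):.3f} reached with {best_stage} trees")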
Stacking
Stacking is an ensemble technique that involves training several base models on the training data and then combining their outputs with a second-level model. The final prediction is made by a "meta-model" that takes the base models' predictions as input.
One of the main advantages of stacking is that it can improve the performance of machine learning models by combining the predictions of multiple models in a way that reduces bias and variance. Stacking algorithms are based on the idea that it is possible to train a set of relatively simple models and then use another model to combine their predictions in a way that is more accurate than any individual model.
To understand how stacking works, let's consider a simple example using decision trees and logistic regression. Suppose we have a dataset with 100 samples and 10 features. We can train a decision tree model and a logistic regression model on the data, and then use the predictions of these models as input to a meta-model (in this case, another logistic regression model).
To train the meta-model, we create a new dataset that consists of the predictions of the decision tree and logistic regression models as features, and the true output as the target variable. To avoid leaking information, these meta-features are usually generated out-of-fold with cross-validation rather than taken from predictions on the very data the base models were trained on. We can then train the meta-model on this new dataset and use it to make predictions on new samples.
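Here is a minimal hand-rolled sketch of that procedure, assuming X_train, y_train, and X_test exist and the task is binary classification. It uses cross_val_predict to obtain the out-of-fold probabilities, stacks them into a new feature matrix, and trains the logistic regression meta-model on it; the scikit-learn StackingClassifier shown next performs the same steps for you:

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

decision_tree = DecisionTreeClassifier(max_depth=4)
logistic_regression = LogisticRegression()

# Out-of-fold predicted probabilities of the positive class become the meta-features
meta_dt = cross_val_predict(decision_tree, X_train, y_train, cv=5, method="predict_proba")[:, 1]
meta_lr = cross_val_predict(logistic_regression, X_train, y_train, cv=5, method="predict_proba")[:, 1]
meta_features = np.column_stack([meta_dt, meta_lr])

# Train the meta-model on the base models' predictions
meta_model = LogisticRegression()
meta_model.fit(meta_features, y_train)

# At prediction time, refit the base models on all training data and stack their outputs
decision_tree.fit(X_train, y_train)
logistic_regression.fit(X_train, y_train)
test_meta = np.column_stack([decision_tree.predict_proba(X_test)[:, 1],
                             logistic_regression.predict_proba(X_test)[:, 1]])
predictions = meta_model.predict(test_meta)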
Here's some Python code that demonstrates how to implement stacking using the scikit-learn library:
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Create the base classifiers
decision_tree = DecisionTreeClassifier(max_depth=4)
logistic_regression = LogisticRegression()

# Create the meta-model
meta_model = LogisticRegression()

# Create the stacking classifier
stacking_classifier = StackingClassifier(estimators=[('dt', decision_tree), ('lr', logistic_regression)], final_estimator=meta_model)

# Train the classifier on the training data
stacking_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = stacking_classifier.predict(X_test)
In this example, we've created a decision tree and a logistic regression model as the base classifiers, and another logistic regression model as the meta-model. We've then created a stacking classifier that combines the predictions of the base classifiers and uses the meta-model to make the final prediction. Finally, we've trained the classifier on the training data and made predictions on the test data.
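Two options of StackingClassifier are worth knowing about: the cv parameter controls the internal cross-validation used to generate the out-of-fold predictions that train the meta-model, and passthrough=True lets the meta-model see the original features alongside the base models' predictions. For example:

stacking_classifier = StackingClassifier(
    estimators=[('dt', decision_tree), ('lr', logistic_regression)],
    final_estimator=meta_model,
    cv=5,               # out-of-fold predictions for training the meta-model
    passthrough=True,   # also feed the raw features to the meta-model
)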
Blending
Blending, in its simplest form, is an ensemble technique that involves training multiple models on the entire training set and then averaging their predictions on the test set.
To understand how blending works, let's consider a simple example using decision trees and logistic regression. Suppose we have a dataset with 100 samples and 10 features. We can train a decision tree model and a logistic regression model on the entire training set, and then make predictions on the test set using each model.
To blend the predictions of the two models, we can simply take the average of their predictions. For example, if the decision tree model predicts that a sample is positive with probability 0.6, and the logistic regression model predicts that it is positive with probability 0.7, the blended prediction would be 0.65 (0.6 + 0.7)/2.
Here's some Python code that demonstrates how to implement blending using the scikit-learn library:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Create the decision tree and logistic regression models
decision_tree = DecisionTreeClassifier(max_depth=4)
logistic_regression = LogisticRegression()

# Train the models on the training data
decision_tree.fit(X_train, y_train)
logistic_regression.fit(X_train, y_train)

# Make predictions on the test data
predictions_dt = decision_tree.predict_proba(X_test)[:, 1]
predictions_lr = logistic_regression.predict_proba(X_test)[:, 1]

# Blend the predictions
predictions = (predictions_dt + predictions_lr) / 2
In this example, we've created a decision tree and a logistic regression model, and trained them on the training data. We've then made predictions on the test data using each model, and blended the predictions by taking the average of the two.
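Since the blended output is a probability of the positive class, one final step converts it into class labels. The short sketch below does that with a 0.5 threshold (assuming the labels are 0/1) and, assuming y_test is available, compares the blend against each individual model:

from sklearn.metrics import accuracy_score

# Convert the blended probabilities into class labels with a 0.5 threshold
labels = (predictions >= 0.5).astype(int)

# Compare the blend against the individual models (assumes y_test is available)
print("Decision tree:", accuracy_score(y_test, (predictions_dt >= 0.5).astype(int)))
print("Logistic regression:", accuracy_score(y_test, (predictions_lr >= 0.5).astype(int)))
print("Blend:", accuracy_score(y_test, labels))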
Bootstrapped ensembles
Bootstrapped ensembles are built by creating multiple versions of the training set through sampling with replacement (bootstrapping), training a separate model on each version, and then combining the predictions of the individual models to make the final prediction. Bagging, described earlier, is the most common instance of this idea.
Because each model sees a slightly different bootstrap sample of the data, combining them reduces the variance of the overall model, which can lead to improved generalization and better performance on new data.
To understand how bootstrapped ensembles work, let's consider a simple example using decision trees. Suppose we have a dataset with 100 samples and 10 features. We can create 10 different versions of the training set by sampling with replacement from the original dataset. This means that some samples may appear in multiple versions, while others may not appear at all. We can then train a decision tree model on each of the 10 versions, resulting in 10 different models.
To make a prediction for a new sample, we pass it through each of the 10 models and combine their outputs exactly as in the bagging example above: a majority vote for classification, or an average for regression.
Here's some example Python code that demonstrates how to implement bootstrapped ensembles using the scikit-learn library:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Create the base classifier
base_classifier = DecisionTreeClassifier(max_depth=4)

# Create the bagging classifier
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, bootstrap=True, bootstrap_features=False)

# Train the classifier on the training data
bagging_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = bagging_classifier.predict(X_test)
In this example, we've created a base classifier using a decision tree with a maximum depth of 4. We've then created a bagging classifier that trains 10 decision trees on different versions of the training set (created by sampling with replacement). Finally, we've trained the classifier on the training data and made predictions on the test data.
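Because each tree is trained on a bootstrap sample, roughly a third of the training rows are left out of any given tree's sample, and these out-of-bag rows provide a free validation estimate. BaggingClassifier exposes this through oob_score=True; here is a small variation of the classifier above, with more trees so that every training row is likely to be out-of-bag for at least one of them:

bagging_classifier = BaggingClassifier(
    DecisionTreeClassifier(max_depth=4),
    n_estimators=50,
    bootstrap=True,
    oob_score=True,  # evaluate each training sample on the trees that never saw it
)
bagging_classifier.fit(X_train, y_train)
print("Out-of-bag accuracy estimate:", bagging_classifier.oob_score_)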
Bayesian model averaging
Bayesian model averaging (BMA) is an ensemble technique that involves training multiple models on the training data, and then using Bayesian techniques to combine the predictions of the individual models.
One of the main advantages of BMA is that it can improve the performance of machine learning models by incorporating uncertainty into the model selection process. BMA algorithms are based on the idea that it is possible to train a series of simple models, and then use Bayesian techniques to combine their predictions in a way that takes into account the uncertainty of each model.
To understand how BMA works, let's consider a simple example using decision trees and logistic regression. Suppose we have a dataset with 100 samples and 10 features. We can train a decision tree model and a logistic regression model on the data, and then use the predictions of these models to calculate the posterior probability of each class for each sample.
To combine the predictions of the two models, we can use Bayesian model averaging to calculate a weighted average of the posterior probabilities, where each model's weight reflects how probable that model is given the data. In practice, these posterior model probabilities are often approximated from each model's relative performance on held-out data; for example, if the decision tree model has a higher accuracy than the logistic regression model, it would receive a higher weight in the average.
Scikit-learn does not implement BMA directly, but here's some example Python code that approximates the idea by taking a weighted average of each model's predicted class probabilities:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Create the decision tree and logistic regression models
decision_tree = DecisionTreeClassifier(max_depth=4)
logistic_regression = LogisticRegression()

# Train the models on the training data
decision_tree.fit(X_train, y_train)
logistic_regression.fit(X_train, y_train)

# Predict class probabilities on the test data
predictions_dt = decision_tree.predict_proba(X_test)
predictions_lr = logistic_regression.predict_proba(X_test)

# Calculate the weighted average of the posterior probabilities
weights = [0.7, 0.3]  # weights reflecting the relative performance of the models (illustrative values)
predictions = weights[0] * predictions_dt + weights[1] * predictions_lr
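The fixed weights above are only illustrative. A common practical approximation to the BMA weights is to score each model on a held-out validation set and normalize the scores so they sum to one; the sketch below assumes hypothetical X_val and y_val arrays exist:

import numpy as np
from sklearn.metrics import accuracy_score

# Score each model on held-out data (X_val and y_val are assumed to exist)
acc_dt = accuracy_score(y_val, decision_tree.predict(X_val))
acc_lr = accuracy_score(y_val, logistic_regression.predict(X_val))

# Normalize the scores into weights that sum to one
weights = np.array([acc_dt, acc_lr]) / (acc_dt + acc_lr)

# Weighted average of the class probabilities, then pick the most probable class
predictions = weights[0] * predictions_dt + weights[1] * predictions_lr
final_labels = decision_tree.classes_[predictions.argmax(axis=1)]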
Author: Sadman Kabir Soumik