Ensemble Learning: Concepts, Methods, and Applications in Deep Learning
This article provides a comprehensive overview of ensemble learning: it explains the underlying principles, surveys common base classifiers and the major ensemble strategies (bagging, boosting, and stacking), and demonstrates practical deep-learning ensemble techniques such as Dropout, test-time augmentation, and Snapshot ensembles with code examples.
Ensemble learning, also known as classifier ensembles, combines multiple base learners to improve predictive performance by reducing variance, bias, or both.
Typical ensemble strategies include bagging (parallel training of homogeneous weak learners and averaging), boosting (sequential training where each learner focuses on previously mis‑classified samples), and stacking (heterogeneous learners combined via a meta‑model).
Common base classifiers such as decision trees, Naïve Bayes, AdaBoost, support vector machines, and k‑nearest neighbors are introduced, along with their construction steps.
Bagging creates bootstrap samples, trains independent models, and aggregates predictions by averaging or voting; it assumes a sufficiently large original dataset for representativeness.
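The bagging recipe above can be sketched with scikit-learn's `BaggingClassifier`, whose default base learner is a decision tree. This is an illustrative setup; the synthetic dataset and hyperparameters are assumptions, not taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for a "sufficiently large" dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each of the 50 trees is fit on its own bootstrap sample of the training
# set; the final prediction aggregates the trees by majority vote.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
acc = bag.score(X_te, y_te)
```

Because the trees are trained independently, bagging parallelizes trivially and mainly reduces variance rather than bias.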
Boosting (e.g., AdaBoost and gradient boosting) iteratively fits weak learners, updates sample weights, and combines them with learned coefficients to form a strong learner with lower bias.
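The AdaBoost loop — fit a weak learner, compute its coefficient from its weighted error, then up-weight the samples it got wrong — can be written out by hand with threshold stumps. This is a minimal sketch for intuition, not the article's implementation; the interval-shaped toy labels are an assumption chosen because no single stump can fit them.

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=30):
    """Minimal AdaBoost with axis-aligned threshold stumps; y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start from uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        best = None
        # exhaustive search over (feature, threshold, polarity)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = np.where(X[:, j] < t, -s, s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # learner coefficient
        pred = np.where(X[:, j] < t, -s, s)
        w *= np.exp(-alpha * y * pred)           # up-weight the mistakes
        w /= w.sum()
        stumps.append((j, t, s))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    score = sum(a * np.where(X[:, j] < t, -s, s)
                for (j, t, s), a in zip(stumps, alphas))
    return np.sign(score)

# Toy 1-D problem: positive labels on an interval, which one stump cannot fit.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where((X[:, 0] >= 3) & (X[:, 0] <= 6), 1, -1)
stumps, alphas = adaboost_stumps(X, y)
acc = (adaboost_predict(X, stumps, alphas) == y).mean()
```

The weighted sum of stumps forms a strong learner: each round's coefficient `alpha` grows as the weak learner's error shrinks, which is where the bias reduction comes from.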
Stacking trains several diverse weak learners, uses their predictions as features for a meta‑learner (e.g., a neural network), and often employs k‑fold cross‑validation to utilize all data.
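The stacking scheme described above can be sketched with scikit-learn: out-of-fold predictions from the base learners become the meta-learner's training features, so all of the data is used without leaking labels. The choice of base learners, meta-learner, and dataset here is illustrative, not from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# k-fold out-of-fold probabilities as meta-features for the training set.
meta_tr = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base
])
meta = LogisticRegression().fit(meta_tr, y_tr)

# At test time each base learner is refit on the full training set.
meta_te = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base
])
acc = meta.score(meta_te, y_te)
```

Diversity among the base learners matters: a tree and a k-NN model make different kinds of mistakes, which gives the meta-learner something to exploit.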
In deep learning, ensemble techniques such as Dropout, test‑time augmentation (TTA), and Snapshot ensembles are described, with PyTorch code examples illustrating model definition, forward pass, and TTA inference.
```python
import torch.nn as nn

# Define the model: a shared CNN feature extractor followed by six
# parallel linear heads (11 classes each), one per character position.
class SVHN_Model1(nn.Module):
    def __init__(self):
        super(SVHN_Model1, self).__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2)),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2)),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.MaxPool2d(2),
        )
        self.fc1 = nn.Linear(32*3*7, 11)
        self.fc2 = nn.Linear(32*3*7, 11)
        self.fc3 = nn.Linear(32*3*7, 11)
        self.fc4 = nn.Linear(32*3*7, 11)
        self.fc5 = nn.Linear(32*3*7, 11)
        self.fc6 = nn.Linear(32*3*7, 11)

    def forward(self, img):
        feat = self.cnn(img)
        feat = feat.view(feat.shape[0], -1)
        c1 = self.fc1(feat)
        c2 = self.fc2(feat)
        c3 = self.fc3(feat)
        c4 = self.fc4(feat)
        c5 = self.fc5(feat)
        c6 = self.fc6(feat)
        return c1, c2, c3, c4, c5, c6
```
```python
import numpy as np
import torch

def predict(test_loader, model, tta=10):
    model.eval()
    test_pred_tta = None
    # Run inference `tta` times; the passes only differ if the
    # test_loader applies random augmentations to each batch.
    for _ in range(tta):
        test_pred = []
        with torch.no_grad():
            for input, target in test_loader:
                c0, c1, c2, c3, c4, c5 = model(input)
                output = np.concatenate([
                    c0.data.numpy(), c1.data.numpy(), c2.data.numpy(),
                    c3.data.numpy(), c4.data.numpy(), c5.data.numpy(),
                ], axis=1)
                test_pred.append(output)
        test_pred = np.vstack(test_pred)
        if test_pred_tta is None:
            test_pred_tta = test_pred
        else:
            test_pred_tta += test_pred  # accumulate scores across passes
    return test_pred_tta
```
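A Snapshot ensemble, mentioned above alongside Dropout and TTA, collects several models from a single training run by cycling the learning rate and saving a checkpoint at the bottom of each cycle. The sketch below assumes a generic model, loader, and loss; the function names and the cosine schedule details are illustrative, not from the article.

```python
import copy
import math
import torch

def snapshot_train(model, loader, loss_fn, epochs=6, cycle=2, lr0=0.05):
    """Train with a cyclic cosine LR; save a snapshot at each cycle's end."""
    opt = torch.optim.SGD(model.parameters(), lr=lr0)
    snapshots = []
    for epoch in range(epochs):
        # Cosine annealing restarted every `cycle` epochs: the LR falls
        # from lr0 toward 0, then jumps back up at the next cycle.
        t = (epoch % cycle) / cycle
        for g in opt.param_groups:
            g["lr"] = 0.5 * lr0 * (1 + math.cos(math.pi * t))
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        if epoch % cycle == cycle - 1:           # bottom of the cycle
            snapshots.append(copy.deepcopy(model.state_dict()))
    return snapshots

def snapshot_predict(model, snapshots, x):
    """Average the softmax outputs of all snapshots (mutates model weights)."""
    probs = []
    with torch.no_grad():
        for state in snapshots:
            model.load_state_dict(state)
            model.eval()
            probs.append(torch.softmax(model(x), dim=1))
    return torch.stack(probs).mean(0)

# Tiny smoke demo with a linear "model" on random data (illustrative only).
torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
snaps = snapshot_train(model, loader, torch.nn.CrossEntropyLoss())
avg_probs = snapshot_predict(model, snaps, torch.randn(5, 4))
```

The appeal is cost: one training run yields several diverse local minima to average over, instead of training the ensemble members separately.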
The article concludes with practical tips for post‑processing predictions in competition settings, including frequency‑based correction and length‑prediction models to refine final results.