Intra‑Ensemble in Neural Networks
This paper proposes an intra‑ensemble strategy that trains multiple sub‑networks within a single neural network using random training operations, width‑depth variations, and parameter sharing. The resulting sub‑networks are diverse enough that their ensemble matches traditional ensembles in accuracy while adding only marginal parameter overhead.
Background: Improving model performance is a core challenge in machine learning; simply making networks deeper yields diminishing returns, and ensemble methods remain an effective route to further gains.
Proposed Intra‑Ensemble: An end‑to‑end strategy that trains several sub‑networks inside one neural network with minimal extra parameters because most weights are shared. Random training increases sub‑network diversity, boosting ensemble effect.
Related Knowledge: Review of ensemble techniques (bagging, boosting, stacking) and neural architecture search (NAS) methods such as DARTS, ProxylessNAS, FBNet, which inspire parameter‑efficient designs.
Parameter Sharing: Sub‑networks share the majority of weights; only batch‑norm statistics are kept separate for each width to maintain stability.
Training Sub‑Networks: Define a list of width ratios and depth choices; use a single network to host sub‑networks of varying width and depth. A switched batch‑norm (S‑BN) allows training across widths with negligible parameter increase.
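A minimal sketch of the switched batch‑norm idea: each width ratio keeps its own running statistics (the only per‑sub‑network parameters), while all other weights are shared. The class name, momentum value, and 1‑D formulation are illustrative assumptions, not the paper's implementation:

```python
class SwitchedBatchNorm:
    """Sketch of switched batch norm (S-BN): every width ratio gets its
    own running mean/variance while convolution weights stay shared."""

    def __init__(self, width_ratios, momentum=0.1):
        # one statistics record per width -- the negligible parameter increase
        self.stats = {w: {"mean": 0.0, "var": 1.0} for w in width_ratios}
        self.momentum = momentum  # illustrative value, not from the paper

    def forward(self, batch, width):
        s = self.stats[width]  # select the statistics of the active width
        m = sum(batch) / len(batch)                          # batch mean
        v = sum((x - m) ** 2 for x in batch) / len(batch)    # batch variance
        # update running statistics for this width only
        s["mean"] = (1 - self.momentum) * s["mean"] + self.momentum * m
        s["var"] = (1 - self.momentum) * s["var"] + self.momentum * v
        # normalize with the current batch statistics
        return [(x - m) / (v + 1e-5) ** 0.5 for x in batch]
```

Because only the `stats` dictionary is duplicated per width, hosting several widths in one network costs almost nothing in parameters.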
Random Training Operations: Four operations increase sub‑network diversity:
1. Random Cut (RC) – mask a contiguous block of channels.
2. Random Offset (RO) – shift channel indices.
3. Shuffle Channel (SC) – randomly reorder channels.
4. Depth operations – random skip (RS) of layers or shuffle layer (SL) order.
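The three channel‑level operations can be sketched on a 1‑D list of channel values; function names and the cyclic interpretation of the offset are assumptions for illustration:

```python
import random

def random_cut(channels, cut_len, rng):
    """Random Cut (RC): zero out a contiguous block of channels."""
    start = rng.randrange(0, len(channels) - cut_len + 1)
    out = list(channels)
    for i in range(start, start + cut_len):
        out[i] = 0.0
    return out

def random_offset(channels, rng):
    """Random Offset (RO): cyclically shift channel indices (assumed cyclic)."""
    k = rng.randrange(len(channels))
    return channels[k:] + channels[:k]

def shuffle_channel(channels, rng):
    """Shuffle Channel (SC): randomly reorder channels."""
    out = list(channels)
    rng.shuffle(out)
    return out
```

Applying a different randomly chosen operation to each sub‑network at each step decorrelates their errors, which is what makes the final ensemble effective.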
Similarity Metric: Defines similarity as the proportion of test images that produce identical outputs across sub‑networks, balancing accuracy and diversity.
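The metric as described reduces to a simple fraction; a sketch under that reading (lower similarity between sub‑networks means more diversity):

```python
def similarity(preds_a, preds_b):
    """Proportion of test samples on which two sub-networks produce
    identical outputs, per the paper's similarity definition."""
    assert len(preds_a) == len(preds_b)
    same = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return same / len(preds_a)
```

For example, two sub‑networks agreeing on 3 of 4 test images have similarity 0.75.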
Combination Strategies: Voting, averaging, and stacking are evaluated; stacking with random cut yields the best results.
Experiments: Extensive tests show that intra‑ensemble improves accuracy with only a slight parameter increase, outperforming many NAS‑derived models and matching traditional ensembles while being more resource‑efficient.
Conclusion: Intra‑ensemble combines multiple sub‑networks within a single model, leveraging random training and parameter sharing to achieve high‑accuracy, diverse ensembles with minimal overhead, and is effective across architectures and datasets.
DataFunTalk