On the Current State of Research in Explaining Ensemble Performance Using Margins

by Waldyn Martinez, et al.
Miami University

Empirical evidence shows that ensembles, such as bagging, boosting, random forests, and rotation forests, generally achieve lower generalization error than individual classifiers. To explain this performance, Schapire et al. (1998) developed an upper bound on the generalization error of an ensemble based on the margins of the training data, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal. Many other researchers have supported this conclusion and presented tighter bounds on the generalization error based on either the margins or functions of the margins. For instance, Shen and Li (2010) provide evidence suggesting that the generalization error of a voting classifier might be reduced by increasing the mean and decreasing the variance of the margins. In this article we propose several techniques and empirically test whether the current state of research in explaining ensemble performance holds. We evaluate the proposed methods through experiments with real and simulated data sets.
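For readers unfamiliar with the quantities discussed above, the following is a minimal sketch of how training margins and their mean and variance might be computed for an unweighted majority-vote ensemble. It uses the standard definition from the margin literature (fraction of votes for the correct label minus the largest fraction for any single incorrect label) and is not the authors' code. The function name `voting_margins` and the toy votes and labels are illustrative assumptions only.

```python
from collections import Counter
from statistics import mean, pvariance


def voting_margins(votes, labels):
    """Margins of an unweighted voting ensemble (illustrative helper).

    votes[i][t] is the label predicted for example i by base classifier t;
    labels[i] is the true label of example i. Each margin lies in [-1, 1]
    and is positive when the correct label wins the plurality vote.
    """
    margins = []
    for row, y in zip(votes, labels):
        counts = Counter(row)
        correct = counts.get(y, 0)
        # Largest vote count received by any single incorrect label.
        wrong = max((c for lab, c in counts.items() if lab != y), default=0)
        margins.append((correct - wrong) / len(row))
    return margins


# Toy data (assumed): 3 training examples, 4 base classifiers.
votes = [
    ["a", "a", "a", "b"],
    ["a", "b", "b", "b"],
    ["b", "a", "a", "c"],
]
labels = ["a", "b", "b"]

m = voting_margins(votes, labels)
margin_mean = mean(m)
margin_var = pvariance(m)  # population variance of the margins
print(m, margin_mean, margin_var)
```

Under the hypothesis attributed to Shen and Li (2010), an ensemble with a higher `margin_mean` and lower `margin_var` on the training data would be expected to generalize better; the article tests whether claims of this kind hold empirically.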




