On the Convergence Properties of Optimal AdaBoost

12/05/2012
by Joshua Belanich, et al.

AdaBoost is one of the most popular machine-learning algorithms. It is simple to implement, practitioners often find it highly effective, and it is mathematically elegant and theoretically sound. Yet AdaBoost's behavior in practice, and in particular its test-error behavior, has puzzled eminent researchers for over a decade: it seems to defy our general intuition in machine learning regarding the fundamental trade-off between model complexity and generalization performance. In this paper, we establish the convergence of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004. Under certain reasonable conditions, we prove convergence, as the number of rounds grows, of the classifier itself, its generalization error, and its resulting margins for fixed data sets. More generally, we prove that the per-round (time) average of almost any function of the example weights converges. Our approach is to frame AdaBoost as a dynamical system, to provide sufficient conditions for the existence of an invariant measure, and to employ tools from ergodic theory. Unlike previous work, we do not assume that AdaBoost cycles; in fact, we present empirical evidence against cycling on real-world datasets. Our main theoretical results hold under a weaker condition, and we present empirical evidence that Optimal AdaBoost satisfies this condition on every real-world dataset we tried. Our results formally ground future convergence-rate analyses, and may even suggest slight algorithmic modifications to optimize the generalization ability of AdaBoost classifiers, thus reducing a practitioner's burden of deciding how long to run the algorithm.
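As a concrete illustration (not code from the paper), the sketch below implements Optimal AdaBoost with exhaustive decision stumps as the weak learner and makes the dynamical-system view explicit: the example-weight vector is a point on the probability simplex, each round applies a deterministic map to it, and a per-round (time) average of one function of the weights is tracked. The function name `optimal_adaboost`, the choice of decision stumps, and the entropy statistic are illustrative assumptions, not specifics from the paper.

```python
import numpy as np

def optimal_adaboost(X, y, rounds=200):
    """Minimal sketch of Optimal AdaBoost with exhaustive decision stumps.

    y must contain labels in {-1, +1}. Each round selects the stump with the
    smallest weighted training error (the "optimal" weak learner), then applies
    the usual exponential weight update. The weight vector w lives on the
    probability simplex, so each round is a deterministic map w -> w'; we also
    track the per-round average of one function of w (its Shannon entropy),
    the kind of time average whose convergence the paper studies.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)        # uniform initial example weights
    scores = np.zeros(n)           # combined classifier scores on the sample
    alpha_sum = 0.0
    entropy_running = 0.0
    entropy_avgs = []              # per-round averages of f(w) = entropy(w)

    for t in range(1, rounds + 1):
        # Exhaustive search over axis-aligned stumps for the minimum weighted error.
        best_err, best_pred = np.inf, None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1.0, -1.0):
                    pred = sign * np.where(X[:, j] <= thr, 1.0, -1.0)
                    err = w @ (pred != y)
                    if err < best_err:
                        best_err, best_pred = err, pred

        err = np.clip(best_err, 1e-12, 1.0 - 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)

        # The round-to-round map on the weight simplex.
        w = w * np.exp(-alpha * y * best_pred)
        w = w / w.sum()

        scores += alpha * best_pred
        alpha_sum += alpha

        entropy_running += -np.sum(w * np.log(w))
        entropy_avgs.append(entropy_running / t)

    margins = y * scores / alpha_sum   # normalized margins on the training set
    return np.sign(scores), margins, entropy_avgs
```

For example, calling `optimal_adaboost(X, y, rounds=500)` on a small {-1, +1}-labeled dataset returns the training predictions, the normalized margins, and the sequence of per-round entropy averages; the margins and the time-averaged statistic are the quantities whose convergence the paper's ergodic-theoretic argument addresses.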

