Provable Benefit of Mixup for Finding Optimal Decision Boundaries

06/01/2023
by Junsoo Oh, et al.

We investigate how pairwise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. For a family of data distributions with a separability constant κ, we analyze how well the optimal classifier in terms of training loss aligns with the optimal one in terms of test accuracy (i.e., the Bayes optimal classifier). For vanilla training without augmentation, we uncover an interesting phenomenon that we name the curse of separability: as we increase κ to make the data distribution more separable, the sample complexity of vanilla training increases exponentially in κ. Perhaps surprisingly, the task of finding optimal decision boundaries becomes harder for more separable distributions. We then show that Mixup training mitigates this problem by significantly reducing the sample complexity. To this end, we develop new concentration results applicable to the n² pairwise-augmented data points constructed from n independent samples, carefully handling the dependencies between overlapping pairs. Lastly, we study other masking-based, Mixup-style techniques and show that they can distort the training loss, making its minimizer converge to a suboptimal classifier in terms of test accuracy.
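To make the pairwise augmentation concrete, below is a minimal Python sketch of Mixup training for a binary linear classifier. The specifics are assumptions for illustration, not taken from the paper: logistic loss with soft labels, mixing weights drawn from a Beta(α, α) distribution, λ sampled once per pair rather than resampled each step, and a toy Gaussian dataset whose class-mean separation plays the role of κ.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_pairs(X, y, alpha=1.0):
    """Construct the n^2 pairwise-augmented dataset used in Mixup training.

    Each ordered pair (i, j) yields one mixed point
    lam * x_i + (1 - lam) * x_j with the matching soft label.
    """
    n = len(X)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i, j = i.ravel(), j.ravel()
    lam = rng.beta(alpha, alpha, size=n * n)  # mixing weights (assumed Beta prior)
    X_mix = lam[:, None] * X[i] + (1 - lam[:, None]) * X[j]
    y_mix = lam * y[i] + (1 - lam) * y[j]     # soft labels in [0, 1]
    return X_mix, y_mix

def logistic_grad(w, b, X, y):
    """Gradient of the mean logistic loss with soft labels y in [0, 1]."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    err = p - y
    return X.T @ err / len(y), err.mean()

# Toy setup: two Gaussian classes whose mean separation mimics kappa.
n, d, kappa = 200, 2, 3.0
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d)) + kappa * (2 * y[:, None] - 1)

# Gradient descent on the Mixup loss over all n^2 mixed points.
w, b = np.zeros(d), 0.0
X_mix, y_mix = mixup_pairs(X, y)
for _ in range(500):
    gw, gb = logistic_grad(w, b, X_mix, y_mix)
    w -= 0.5 * gw
    b -= 0.5 * gb

print("learned direction:", w / np.linalg.norm(w))
```

The difficulty the abstract's concentration results address is visible in this sketch: the n² rows of X_mix are built from only n underlying samples, so mixed points that share an index are statistically dependent, and standard i.i.d. concentration bounds do not directly apply.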

