Implicit bias of any algorithm: bounding bias via margin

11/12/2020
by Elvis Dohmatob, et al.

Consider n points x_1, …, x_n in finite-dimensional Euclidean space, each having one of two colors. Suppose there exists a separating hyperplane (identified with its unit normal vector w) for the points, i.e., a hyperplane such that points of the same color lie on the same side of the hyperplane. We measure the quality of such a hyperplane by its margin γ(w), defined as the minimum distance between any of the points x_i and the hyperplane. In this paper, we prove that the margin function γ satisfies a nonsmooth Kurdyka-Łojasiewicz inequality with exponent 1/2. This result has far-reaching consequences. For example, let γ^opt be the maximum possible margin for the problem and let w^opt be the parameter of the hyperplane which attains this value. Given any other separating hyperplane with parameter w, let d(w) := ‖w − w^opt‖ be the Euclidean distance between w and w^opt, also called the bias of w. From the previous KL inequality, we deduce that

(γ^opt − γ(w)) / R ≤ d(w) ≤ 2√((γ^opt − γ(w)) / γ^opt),

where R := max_i ‖x_i‖ is the maximum distance of the points x_i from the origin. Consequently, for any optimization algorithm (gradient descent or not), the bias of the iterates converges at least as fast as the square root of the rate at which their margin converges. Thus, our work provides a generic tool for analyzing the implicit bias of any algorithm in terms of its margin, in situations where a specialized analysis might not be available: it suffices to establish a good rate of convergence for the margin, a task which is usually much easier.
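As a quick numerical illustration (not code from the paper), the sketch below checks the two-sided bound on toy 2D data. It assumes hyperplanes pass through the origin, takes γ(w) = min_i y_i⟨w, x_i⟩ as the margin of a unit normal w, and approximates (w^opt, γ^opt) by a fine grid search over angles; the data and all names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data in the plane: each class lies in a strip
# on one side of the vertical axis, so the hyperplane through the origin
# with normal (1, 0) separates them with margin >= 0.5.
n = 50
X = np.vstack([rng.uniform([0.5, -1.0], [2.0, 1.0], size=(n, 2)),
               rng.uniform([-2.0, -1.0], [-0.5, 1.0], size=(n, 2))])
y = np.hstack([np.ones(n), -np.ones(n)])

def margin(w):
    """gamma(w) = min_i y_i <w, x_i> for a unit normal w; this is the
    minimum distance of the points to the hyperplane through the origin,
    and it is negative if w fails to separate the two colors."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    return np.min(y * (X @ w))

# In 2D a unit normal is a point on the circle, so the max-margin
# direction can be approximated by a fine grid search over angles.
angles = np.linspace(0.0, 2.0 * np.pi, 100_000)
W = np.column_stack([np.cos(angles), np.sin(angles)])
gammas = np.array([margin(w) for w in W])
w_opt, gamma_opt = W[gammas.argmax()], gammas.max()

# Two-sided bound from the abstract:
#   (gamma_opt - gamma(w)) / R <= d(w) <= 2 * sqrt((gamma_opt - gamma(w)) / gamma_opt)
R = np.linalg.norm(X, axis=1).max()
for w in W[gammas > 0][::2000]:       # a sample of separating directions
    d = np.linalg.norm(w - w_opt)     # bias of w
    gap = gamma_opt - margin(w)
    assert gap / R <= d + 1e-6
    assert d <= 2.0 * np.sqrt(gap / gamma_opt) + 1e-6
print(f"bounds verified; gamma_opt ~ {gamma_opt:.4f}, R = {R:.4f}")
```

Note that the upper bound is informative precisely near the optimum: for a barely separating w the gap approaches γ^opt and the bound degenerates to d(w) ≤ 2, which is automatic for unit vectors.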

