Implicit bias of any algorithm: bounding bias via margin

11/12/2020
by Elvis Dohmatob, et al.

Consider n points x_1, …, x_n in finite-dimensional Euclidean space, each having one of two colors. Suppose there exists a separating hyperplane (identified with its unit normal vector w) for the points, i.e., a hyperplane such that points of the same color lie on the same side of the hyperplane. We measure the quality of such a hyperplane by its margin γ(w), defined as the minimum distance between any of the points x_i and the hyperplane. In this paper, we prove that the margin function γ satisfies a nonsmooth Kurdyka-Łojasiewicz inequality with exponent 1/2. This result has far-reaching consequences. For example, let γ^opt be the maximum possible margin for the problem and let w^opt be the parameter of the hyperplane which attains this value. Given any other separating hyperplane with parameter w, let d(w) := ‖w − w^opt‖ be the Euclidean distance between w and w^opt, also called the bias of w. From the previous KL inequality, we deduce that

(γ^opt − γ(w)) / R ≤ d(w) ≤ 2√((γ^opt − γ(w)) / γ^opt),

where R := max_i ‖x_i‖ is the maximum distance of the points x_i from the origin. Consequently, for any optimization algorithm (gradient descent or not), the bias of the iterates converges at least as fast as the square root of the rate at which their margin converges. Thus, our work provides a generic tool for analyzing the implicit bias of any algorithm in terms of its margin, in situations where a specialized analysis might not be available: it suffices to establish a good rate of convergence for the margin, a task which is usually much easier.
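As a quick numerical illustration (not code from the paper), the sketch below checks the two-sided bound on toy 2D data. It assumes hyperplanes pass through the origin, takes γ(w) = min_i y_i⟨w, x_i⟩ as the margin of a unit normal w, and approximates (w^opt, γ^opt) by a fine grid search over angles; the data and all names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data in the plane: each class lies in a strip
# on one side of the vertical axis, so the hyperplane through the origin
# with normal (1, 0) separates them with margin >= 0.5.
n = 50
X = np.vstack([rng.uniform([0.5, -1.0], [2.0, 1.0], size=(n, 2)),
               rng.uniform([-2.0, -1.0], [-0.5, 1.0], size=(n, 2))])
y = np.hstack([np.ones(n), -np.ones(n)])

def margin(w):
    """gamma(w) = min_i y_i <w, x_i> for a unit normal w; this is the
    minimum distance of the points to the hyperplane through the origin,
    and it is negative if w fails to separate the two colors."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    return np.min(y * (X @ w))

# In 2D a unit normal is a point on the circle, so the max-margin
# direction can be approximated by a fine grid search over angles.
angles = np.linspace(0.0, 2.0 * np.pi, 100_000)
W = np.column_stack([np.cos(angles), np.sin(angles)])
gammas = np.array([margin(w) for w in W])
w_opt, gamma_opt = W[gammas.argmax()], gammas.max()

# Two-sided bound from the abstract:
#   (gamma_opt - gamma(w)) / R <= d(w) <= 2 * sqrt((gamma_opt - gamma(w)) / gamma_opt)
R = np.linalg.norm(X, axis=1).max()
for w in W[gammas > 0][::2000]:       # a sample of separating directions
    d = np.linalg.norm(w - w_opt)     # bias of w
    gap = gamma_opt - margin(w)
    assert gap / R <= d + 1e-6
    assert d <= 2.0 * np.sqrt(gap / gamma_opt) + 1e-6
print(f"bounds verified; gamma_opt ~ {gamma_opt:.4f}, R = {R:.4f}")
```

Note that the upper bound is informative precisely near the optimum: for a barely separating w the gap approaches γ^opt and the bound degenerates to d(w) ≤ 2, which is automatic for unit vectors.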

