A refined primal-dual analysis of the implicit bias
Recent work shows that gradient descent on linearly separable data is implicitly biased towards the maximum margin solution. However, prior analyses do not establish a convergence rate that is tight in both n (the dataset size) and t (the training time). This work proves that the normalized gradient descent iterates converge to the maximum margin solution at a rate of O(ln(n)/ln(t)), which is tight in both n and t. The proof is via a dual convergence result: gradient descent induces a multiplicative weights update on the (normalized) SVM dual objective, and the convergence rate of this dual update yields the tight implicit bias rate.
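The implicit bias described above can be observed numerically. Below is a minimal sketch (not the paper's code) that runs plain gradient descent on the logistic loss over a tiny hand-built separable dataset and checks that the normalized iterate drifts toward the max-margin (hard-SVM) direction; the dataset, step size, and iteration count are illustrative choices, and the max-margin direction is computed by hand for this specific example.

```python
# Sketch: implicit bias of gradient descent toward the max-margin direction.
# All specifics (data, step size, horizon) are illustrative assumptions.
import numpy as np

# Two data points with labels folded in: z_i = y_i * x_i.
Z = np.array([[2.0, 1.0],
              [1.0, -1.0]])

# Max-margin direction for this dataset, solvable by hand: both points
# are support vectors, giving w* proportional to (2, -1).
w_star = np.array([2.0, -1.0]) / np.sqrt(5.0)

def loss_grad(w):
    # Gradient of sum_i log(1 + exp(-<w, z_i>)).
    s = 1.0 / (1.0 + np.exp(Z @ w))   # = sigmoid(-<w, z_i>)
    return -(Z.T @ s)

w = np.zeros(2)
eta = 0.1
for t in range(50_000):
    w -= eta * loss_grad(w)

# The unnormalized iterate diverges (||w|| grows like ln t), but its
# direction aligns with the max-margin solution, slowly (rate ~ 1/ln t).
w_bar = w / np.linalg.norm(w)
cos_sim = float(w_bar @ w_star)
print(f"||w|| = {np.linalg.norm(w):.2f}, cos(w_bar, w*) = {cos_sim:.4f}")
```

Note that the alignment improves only logarithmically in t, which is why even 50,000 steps leave a visible gap; this slowness is exactly what the O(ln(n)/ln(t)) rate in the abstract quantifies.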