Making Progress Based on False Discoveries

04/19/2022
by Roi Livni, et al.

We consider the question of adaptive data analysis within the framework of convex optimization. We ask how many samples are needed to compute ϵ-accurate estimates of the O(1/ϵ^2) gradients queried by gradient descent, and we provide two intermediate answers to this question. First, we show that for a general analyst (not necessarily gradient descent), Ω(1/ϵ^3) samples are required; this rules out the possibility of a foolproof mechanism. Our construction builds upon a new lower bound (which may be of interest in its own right) for an analyst that may ask several non-adaptive questions in each of a fixed and known number T of rounds of adaptivity, and that requires a fraction of its discoveries to be true. We show that such an analyst requires Ω(√T/ϵ^2) samples. Second, we show that, under certain assumptions on the oracle, Ω̃(1/ϵ^2.5) samples are necessary in an interaction with gradient descent. Our assumptions are that the oracle has only first-order access and is post-hoc generalizing: first-order access means that the oracle can compute gradients of the sampled functions only at points queried by the algorithm, and our assumption of post-hoc generalization follows from existing lower bounds for statistical queries. More generally, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent. These results stand in contrast with classical bounds showing that O(1/ϵ^2) samples suffice to optimize the population risk to accuracy O(ϵ), yet, as it turns out, with spurious gradients.
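
To make the interaction being analyzed concrete, here is a minimal sketch (not taken from the paper; the function names and the toy loss are hypothetical) of gradient descent run against a first-order empirical oracle, i.e., an oracle that answers each query with the average of the per-sample gradients at the queried point. Each answer is accurate for a fixed point by standard concentration, but since every query point depends on all previous answers, the i.i.d. analysis no longer applies; this adaptivity is what the sample-complexity lower bounds above quantify.

```python
import numpy as np

def gd_with_empirical_oracle(grad_f, samples, x0, eps, steps):
    """Gradient descent against a first-order empirical oracle.

    grad_f(x, z): gradient of the per-sample loss f(x; z) at the point x.
    The oracle is first order in the abstract's sense: it evaluates
    gradients of the sampled functions only at points the algorithm queries.
    """
    x = x0
    for _ in range(steps):
        # Empirical gradient: eps-accurate for a *fixed* x given enough
        # samples, but x depends on all previous answers, so under
        # adaptivity the estimate can be spurious.
        g = np.mean([grad_f(x, z) for z in samples], axis=0)
        x = x - eps * g  # roughly 1/eps^2 steps of size eps, as in the abstract
    return x

# Toy usage: f(x; z) = 0.5 * ||x - z||^2, so grad_f(x, z) = x - z.
rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 5))  # n = 100 samples in R^5
eps = 0.1
x_hat = gd_with_empirical_oracle(lambda x, z: x - z, samples,
                                 np.zeros(5), eps, steps=int(1 / eps ** 2))
```

For this convex quadratic the empirical gradients happen to generalize; the lower bounds above construct distributions and oracles for which they do not, even though O(1/ϵ^2) samples still suffice to optimize the population risk to accuracy O(ϵ).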


Related research

06/25/2019 · Complexity of Highly Parallel Non-Smooth Convex Optimization
A landmark result of non-smooth convex optimization is that gradient des...

06/15/2018 · The Limits of Post-Selection Generalization
While statistics and machine learning offers numerous methods for ensuri...

01/30/2019 · Natural Analysts in Adaptive Data Analysis
Adaptive data analysis is frequently criticized for its pessimistic gene...

07/10/2023 · Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles
In this paper, we provide a novel framework for the analysis of generali...

04/02/2021 · Information-constrained optimization: can adaptive processing of gradients help?
We revisit first-order optimization under local information constraints ...

02/25/2020 · Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
An open question in the Deep Learning community is why neural networks t...
