Learning in the Presence of Corruption

04/01/2015
by Brendan van Rooyen, et al.

In supervised learning one wishes to identify a pattern present in a joint distribution P over instance-label pairs by providing a function f from instances to labels that has low risk E_P ℓ(y, f(x)). To do so, the learner is given access to n i.i.d. samples drawn from P. In many real-world problems clean samples are not available. Rather, the learner is given access to samples from a corrupted distribution P̃ from which to learn, while the goal of predicting the clean pattern remains. There are many different types of corruption one can consider, and as yet there is no general means of comparing the relative ease of learning under these different corruption processes. In this paper we develop a general framework for tackling such problems and introduce upper and lower bounds on the risk of learning in the presence of corruption. Our ultimate goal is to enable informed economic decisions about the acquisition of data sets. For a certain subclass of corruption processes (those that are reconstructible) we achieve this goal in a particular sense. Our lower bounds are stated in terms of the coefficient of ergodicity, an easily calculated property of stochastic matrices. Our upper bounds proceed via a generalization of the method of unbiased estimators appearing in recent work of Natarajan et al. and implicit in the earlier work of Kearns.
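The two technical ingredients named above are easy to make concrete in the binary-label special case. What follows is a minimal Python sketch, not the paper's general operator-theoretic construction; the function names and the symmetric-noise example are illustrative choices of ours. The first function implements the corrected loss of Natarajan et al. for class-conditional label noise with flip probabilities rho_pos and rho_neg: its expectation over the noisy label equals the clean loss, which is the property the method of unbiased estimators exploits.

    import numpy as np

    def corrected_loss(loss, y_noisy, score, rho_pos, rho_neg):
        # Unbiased estimator of the clean loss from a single noisy label
        # in {-1, +1} (Natarajan et al.). rho_pos = P(flip | y = +1),
        # rho_neg = P(flip | y = -1); rho_pos + rho_neg < 1 is required,
        # i.e. the corruption must be reconstructible.
        rho_y = rho_pos if y_noisy == 1 else rho_neg     # flip prob. of observed class
        rho_flip = rho_neg if y_noisy == 1 else rho_pos  # flip prob. of the other class
        return ((1.0 - rho_flip) * loss(y_noisy, score)
                - rho_y * loss(-y_noisy, score)) / (1.0 - rho_pos - rho_neg)

For the lower bounds, view the corruption as a row-stochastic matrix T whose rows are the distributions of the corrupted observation given each clean label. Dobrushin's coefficient of ergodicity, tau(T) = (1/2) max_{i,j} ||T_i - T_j||_1, is indeed simple to calculate:

    def coefficient_of_ergodicity(T):
        # tau(T) = half the largest L1 distance between two rows of T.
        # tau = 1: some pair of clean labels stays perfectly distinguishable
        # after corruption; smaller tau: the corruption pulls the corrupted
        # distributions closer together, making learning harder.
        T = np.asarray(T, dtype=float)
        rows = range(T.shape[0])
        return 0.5 * max(np.abs(T[i] - T[j]).sum() for i in rows for j in rows)

    # Example: symmetric label noise with flip probability 0.2 gives
    # tau = |1 - 2 * 0.2| = 0.6.
    print(coefficient_of_ergodicity([[0.8, 0.2], [0.2, 0.8]]))  # 0.6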


Related research

02/10/2022 · Near-Optimal Statistical Query Lower Bounds for Agnostically Learning Intersections of Halfspaces with Gaussian Marginals
We consider the well-studied problem of learning intersections of halfsp...

10/02/2022 · Learning Algorithm Generalization Error Bounds via Auxiliary Distributions
Generalization error bounds are essential for comprehending how well...

01/27/2020 · Naive Exploration is Optimal for Online LQR
We consider the problem of online adaptive control of the linear quadrat...

06/29/2020 · Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
We study the fundamental problems of agnostically learning halfspaces an...

10/18/2022 · SQ Lower Bounds for Learning Single Neurons with Massart Noise
We study the problem of PAC learning a single neuron in the presence of ...

06/09/2022 · Optimal SQ Lower Bounds for Robustly Learning Discrete Product Distributions and Ising Models
We establish optimal Statistical Query (SQ) lower bounds for robustly le...

07/02/2013 · A Statistical Learning Theory Framework for Supervised Pattern Discovery
This paper formalizes a latent variable inference problem we call super...
