Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

07/28/2021
by Colin Wei et al.

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, the constructions from approximation theory often have unrealistic aspects, for example, reliance on infinite precision to memorize target function values, which make these results potentially less meaningful. To address these issues, this work proposes a formal definition of statistically meaningful approximation which requires the approximating network to exhibit good statistical learnability. We present case studies on statistically meaningful approximation for two classes of functions: Boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can statistically meaningfully approximate Boolean circuits with sample complexity depending only polynomially on the circuit size, not the size of the approximating network. In addition, we show that transformers can statistically meaningfully approximate Turing machines with computation time bounded by T, with sample complexity polynomial in the alphabet size, state-space size, and log(T). Our analysis introduces new tools for generalization bounds that provide much tighter sample complexity guarantees than the typical VC-dimension or norm-based bounds, which may be of independent interest.
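As a rough sketch of what such a definition asks for (a paraphrase for intuition only; the symbols F, G, P, ℓ, ε, and n below are illustrative rather than the paper's exact notation): a class of networks G statistically meaningfully approximates a target class F if, for every target f in F and every input distribution P, some learning algorithm that receives n i.i.d. labeled samples (x, f(x)) with x drawn from P outputs a network g in G satisfying

    \mathbb{E}_{x \sim P}\big[\ell(g(x), f(x))\big] \le \epsilon
    \quad\text{whenever}\quad
    n \ge \mathrm{poly}\big(\mathrm{complexity}(f),\, 1/\epsilon\big),

where complexity(f) is measured in terms of the target itself, e.g. circuit size for Boolean circuits, or alphabet size, state-space size, and log(T) for time-T Turing machines, rather than in terms of the size of the approximating network.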

