The Learnability of In-Context Learning

03/14/2023
by Noam Wies, et al.

In-context learning is a surprising and important phenomenon that emerged when modern language models were scaled to billions of learned parameters. Without modifying a large language model's weights, the model can be tuned to perform various downstream natural language tasks simply by including concatenated training examples of these tasks in its input. Though it has proven disruptive for many practical applications of large language models, this emergent learning paradigm is not well understood from a theoretical perspective. In this paper, we propose a first-of-its-kind PAC-based framework for in-context learnability, and use it to provide the first finite sample complexity results for the in-context learning setup. Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution, followed by a second in-context learning phase, which keeps this function constant and concatenates training examples of the downstream task in its input. We use our framework to prove that, under mild assumptions, when the pretraining distribution is a mixture of latent tasks (a model often considered for natural language pretraining), these tasks can be efficiently learned via in-context learning, even though the model's weights are unchanged and the input significantly diverges from the pretraining distribution. Our theoretical analysis reveals that in this setting, in-context learning is more about identifying the task than about learning it, a result that is in line with a series of recent empirical findings. We hope that the in-context learnability framework presented in this paper will facilitate future progress towards a deeper understanding of this important new learning paradigm.

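As a concrete illustration of the task-identification view described in the abstract, the toy sketch below simulates a Bayes-optimal predictor for a mixture of latent tasks and shows its posterior over tasks concentrating as more in-context examples are concatenated into the prompt, without any weight updates. This is not the paper's construction or proof: the discrete task family, the label-noise rate EPS, and the sizes NUM_TASKS and VOCAB are assumptions made purely for illustration.

```python
# Toy illustration (not the paper's construction): in-context learning as
# task identification under a mixture of latent tasks.
#
# Assumptions made for this sketch:
#   - each latent "task" k is a fixed labeling rule y = f_k(x) over a small
#     discrete input space, corrupted by label noise EPS;
#   - the "pretrained model" is idealized as the Bayes-optimal predictor for
#     the mixture, i.e. it already fits the pretraining distribution exactly.
import numpy as np

rng = np.random.default_rng(0)

NUM_TASKS, VOCAB, EPS = 8, 16, 0.1                    # hypothetical toy sizes
tasks = rng.integers(0, 2, size=(NUM_TASKS, VOCAB))   # f_k: x -> {0, 1}
prior = np.full(NUM_TASKS, 1.0 / NUM_TASKS)           # mixture weights

def posterior_over_tasks(xs, ys):
    """p(task k | in-context examples), assuming i.i.d. examples with label noise."""
    log_post = np.log(prior).copy()
    for x, y in zip(xs, ys):
        match = tasks[:, x] == y                       # which tasks agree with (x, y)
        log_post += np.where(match, np.log(1 - EPS), np.log(EPS))
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

def in_context_predict(xs, ys, x_query):
    """Predict p(y = 1 | x_query) by averaging the tasks under the posterior."""
    post = posterior_over_tasks(xs, ys)
    return post @ tasks[:, x_query], post

# Draw a downstream task from the mixture and watch the posterior concentrate
# as more in-context examples are concatenated into the "prompt".
true_task = 3
for n in (1, 2, 4, 8, 16):
    xs = rng.integers(0, VOCAB, size=n)
    ys = np.where(rng.random(n) < EPS, 1 - tasks[true_task, xs], tasks[true_task, xs])
    p_label, post = in_context_predict(xs, ys, x_query=0)
    print(f"{n:2d} examples: p(true task) = {post[true_task]:.3f}, "
          f"p(y=1 | x=0) = {p_label:.3f}")
```

Running the sketch prints how the posterior mass on the true task grows with the number of concatenated examples, mirroring the claim that in this regime in-context learning identifies, rather than learns, the latent task.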

Related research

An Explanation of In-context Learning as Implicit Bayesian Inference (11/03/2021)
A Theory of Emergent In-Context Learning as Implicit Structure Induction (03/14/2023)
Learning with Latent Language (11/01/2017)
The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design (10/09/2021)
In-Context Learning of Large Language Models Explained as Kernel Regression (05/22/2023)
Large Language Models Can be Lazy Learners: Analyze Shortcuts in In-Context Learning (05/26/2023)
Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs (05/30/2023)
