Studying Large Language Model Generalization with Influence Functions

08/07/2023
by   Roger Grosse, et al.
0

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.

READ FULL TEXT

page 6

page 29

page 31

page 32

page 35

page 36

page 38

page 39

research
03/25/2020

RelatIF: Identifying Explanatory Training Examples via Relative Influence

In this work, we focus on the use of influence functions to identify rel...
research
05/25/2023

On Influence Functions, Classification Influence, Relative Influence, Memorization and Generalization

Machine learning systems such as large scale recommendation systems or n...
research
03/14/2017

Understanding Black-box Predictions via Influence Functions

How can we explain the predictions of a black-box model? In this paper, ...
research
12/31/2020

FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging

Influence functions approximate the 'influences' of training data-points...
research
06/25/2020

Influence Functions in Deep Learning Are Fragile

Influence functions approximate the effect of training samples in test-t...
research
12/24/2021

Counterfactual Memorization in Neural Language Models

Modern neural language models widely used in tasks across NLP risk memor...
research
03/14/2023

Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs

Training data attribution (TDA) methods offer to trace a model's predict...

Please sign up or login with your details

Forgot password? Click here to reset