Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

by   Tom Lieberum, et al.

Circuit analysis is a promising technique for understanding the internal mechanisms of language models. However, existing analyses are done in small models far from the state of the art. To address this, we present a case study of circuit analysis in the 70B Chinchilla model, aiming to test the scalability of circuit analysis. In particular, we study multiple-choice question answering, and investigate Chinchilla's capability to identify the correct answer label given knowledge of the correct answer text. We find that the existing techniques of logit attribution, attention pattern visualization, and activation patching naturally scale to Chinchilla, allowing us to identify and categorize a small set of `output nodes' (attention heads and MLPs). We further study the `correct letter' category of attention heads aiming to understand the semantics of their features, with mixed results. For normal multiple-choice question answers, we significantly compress the query, key and value subspaces of the head without loss of performance when operating on the answer labels for multiple-choice questions, and we show that the query and key subspaces represent an `Nth item in an enumeration' feature to at least some extent. However, when we attempt to use this explanation to understand the heads' behaviour on a more general distribution including randomized answer labels, we find that it is only a partial explanation, suggesting there is more to learn about the operation of `correct letter' heads on multiple choice question answering.


page 1

page 14

page 41


Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering

Video question answering has recently received a lot of attention from m...

A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) On YAGO Knowledge Base

A question answering system (QA System) was developed that uses graph-pa...

Adversarial TableQA: Attention Supervision for Question Answering on Tables

The task of answering a question given a text passage has shown great de...

MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

This paper introduces MedMCQA, a new large-scale, Multiple-Choice Questi...

Leveraging Large Language Models for Multiple Choice Question Answering

While large language models (LLMs) like GPT-3 have achieved impressive r...

Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions

As Large Language Models (LLMs) are predictive models building their res...

Please sign up or login with your details

Forgot password? Click here to reset