Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

10/02/2022
by   Yuxuan Li, et al.

Transformer networks have seen great success in natural language processing and machine vision, where task objectives such as next-word prediction and image classification benefit from nuanced context sensitivity across high-dimensional inputs. However, there is an ongoing debate about how and when transformers can acquire highly structured behavior and achieve systematic generalization. Here, we explore how well a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions of these operations. We demonstrate strong generalization to sequences longer than those used in training by replacing the standard positional encoding typically used in transformers with labels arbitrarily paired with items in the sequence. By identifying the layer and head configuration sufficient to solve each task, then performing ablation experiments and representation analysis, we show that two-layer transformers learn generalizable solutions to multi-level problems, develop signs of systematic task decomposition, and exploit shared computation across related tasks. These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies in tasks requiring structured behavior.
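The length generalization described above hinges on one change: instead of encoding absolute positions 0..L-1, each item is paired with an arbitrary but order-preserving label drawn from a range larger than any training length. Below is a minimal PyTorch sketch of this idea, under stated assumptions; the module name RandomLabelEncoding and the parameter max_label are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn


class RandomLabelEncoding(nn.Module):
    """Positional encoding via arbitrary labels (illustrative sketch).

    Each sequence gets a random, sorted subset of integer labels from
    [0, max_label), so the model sees relative order but never a fixed
    absolute position, discouraging overfitting to training lengths.
    """

    def __init__(self, d_model: int, max_label: int = 1024):
        super().__init__()
        self.max_label = max_label
        self.label_embedding = nn.Embedding(max_label, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token embeddings
        batch, seq_len, _ = x.shape
        # Sample seq_len distinct labels per sequence, sorted so that
        # label order matches item order within the sequence.
        labels = torch.stack([
            torch.randperm(self.max_label, device=x.device)[:seq_len].sort().values
            for _ in range(batch)
        ])
        return x + self.label_embedding(labels)


# Usage: a batch of 4 sequences of length 10 with model width 64.
enc = RandomLabelEncoding(d_model=64, max_label=1024)
tokens = torch.randn(4, 10, 64)
out = enc(tokens)  # same shape as the input: (4, 10, 64)
```

Because the embedding table already covers labels beyond the training lengths, a longer test sequence still receives in-distribution labels, which is consistent with how the abstract explains generalization to longer sequences.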


Related research:

05/28/2023 · Key-Value Transformer
Transformers have emerged as the prevailing standard solution for variou...

10/08/2021 · Iterative Decoding for Compositional Generalization in Transformers
Deep learning models do well at generalizing to in-distribution data but...

11/22/2021 · DBIA: Data-free Backdoor Injection Attack against Transformer Networks
Recently, transformer architecture has demonstrated its significance in ...

06/01/2023 · Learning Transformer Programs
Recent research in mechanistic interpretability has attempted to reverse...

11/02/2022 · Characterizing Intrinsic Compositionality in Transformers with Tree Projections
When trained on language data, do transformers learn some arbitrary comp...

10/14/2021 · The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
Despite successes across a broad range of applications, Transformers hav...

10/11/2021 · Leveraging Transformers for StarCraft Macromanagement Prediction
Inspired by the recent success of transformers in natural language proce...
