Trees in transformers: a theoretical analysis of the Transformer's ability to represent trees

12/16/2021
by Qi He, et al.

Transformer networks are the de facto standard architecture in natural language processing, yet to date there has been no theoretical analysis of the Transformer's ability to capture tree structures. We focus on the ability of Transformer networks to learn tree structures, which are important for tree transduction problems. We first analyze the theoretical capability of the standard Transformer architecture to learn tree structures given an enumeration of all possible tree backbones, which we define as trees without labels. We then prove that two linear layers with a ReLU activation can recover any tree backbone from any two nonzero, linearly independent starting backbones. This implies that, in theory, a Transformer can learn tree structures well. In experiments on synthetic data, we find that the standard Transformer achieves accuracy similar to that of a Transformer in which tree position information is explicitly encoded, albeit with slower convergence. This confirms empirically that Transformers can learn tree structures.
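The core construction can be illustrated with a small numerical sketch. Below, tree backbones are encoded as nonnegative vectors over a fixed set of tree positions (a hypothetical encoding chosen for illustration; the paper's exact scheme may differ). Given two nonzero, linearly independent starting backbones u and v, we build two linear layers with a ReLU in between that map u to an arbitrary target backbone, exploiting the fact that a full-column-rank system can be solved exactly with a pseudoinverse:

```python
import numpy as np

# Toy encoding (assumption for illustration): a "tree backbone" is a 0/1
# vector over four tree positions; backbone[i] = 1 means position i exists.
u = np.array([1., 1., 0., 0.])   # e.g. root with a left child
v = np.array([1., 0., 1., 0.])   # e.g. root with a right child
t = np.array([1., 1., 1., 1.])   # target backbone: full depth-2 tree

# Two linear layers with ReLU in between: f(x) = W2 @ relu(W1 @ x).
# Pick W1 so the hidden codes of u and v stay nonnegative (ReLU acts as
# the identity on them), then solve for W2 mapping hidden codes to targets.
W1 = np.eye(4)                           # inputs are already nonnegative
H = np.stack([W1 @ u, W1 @ v], axis=1)   # hidden codes, shape (4, 2), rank 2
T = np.stack([t, np.zeros(4)], axis=1)   # send u -> t and v -> 0
W2 = T @ np.linalg.pinv(H)               # exact since H has full column rank

relu = lambda x: np.maximum(x, 0.)
f = lambda x: W2 @ relu(W1 @ x)

print(np.allclose(f(u), t))              # u is mapped to the target backbone
```

Because u and v are linearly independent, H has full column rank, so `pinv(H) @ H` is the 2x2 identity and both starting backbones are mapped exactly; any other target pair can be substituted into `T` the same way.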

Related research:
- 07/15/2023 · Transformers are Universal Predictors
- 11/02/2022 · Characterizing Intrinsic Compositionality in Transformers with Tree Projections
- 02/19/2020 · Tree-structured Attention with Hierarchical Accumulation
- 09/04/2020 · AutoTrans: Automating Transformer Design via Reinforced Architecture Search
- 05/04/2023 · G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer
- 10/19/2020 · Parameter Norm Growth During Training of Transformers
- 03/30/2020 · Code Prediction by Feeding Trees to Transformers
