What they do when in doubt: a study of inductive biases in seq2seq learners

by Eugene Kharitonov et al.

Sequence-to-sequence (seq2seq) learners are widely used, but we still have only limited knowledge about the inductive biases that shape the way they generalize. We address this question by investigating how popular seq2seq learners generalize in tasks whose training data is highly ambiguous. We use SCAN and three new tasks to study learners' preferences for memorization, arithmetic, hierarchical, and compositional reasoning. Further, we connect to Solomonoff's theory of induction and propose description length as a principled and sensitive measure of inductive biases. In our experimental study, we find that LSTM-based learners can learn to perform counting, addition, and multiplication by a constant from a single training example. Furthermore, Transformer- and LSTM-based learners show a bias toward hierarchical induction over linear induction, while CNN-based learners prefer the opposite. On the SCAN dataset, we find that CNN-based and, to a lesser degree, Transformer- and LSTM-based learners prefer compositional generalization over memorization. Finally, across all our experiments, description length proved to be a sensitive measure of inductive biases.
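The description-length idea can be illustrated with prequential (online) coding: each block of training data is "transmitted" at the cost, in bits, that a learner trained only on the earlier blocks assigns to it, and the total cost is the description length of the data under that learner. The sketch below is a minimal, hypothetical illustration using a toy count-based unigram model rather than the paper's seq2seq learners; the model class and add-one smoothing are our assumptions, not the paper's setup.

```python
import math
from collections import Counter

class UnigramModel:
    """Toy count-based unigram model with add-one (Laplace) smoothing.
    Stands in for a learner; NOT the seq2seq models used in the paper."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.counts = Counter()
        self.total = 0

    def update(self, seq):
        # Train incrementally on an observed block of tokens.
        self.counts.update(seq)
        self.total += len(seq)

    def neg_log2_prob(self, seq):
        # Cost in bits to encode seq under the current (smoothed) model.
        return sum(
            -math.log2((self.counts[t] + 1) / (self.total + len(self.vocab)))
            for t in seq
        )

def prequential_description_length(blocks, vocab):
    """Sum, over blocks, of the bits needed to encode each block with a
    model trained only on the preceding blocks (prequential coding)."""
    model = UnigramModel(vocab)
    bits = 0.0
    for block in blocks:
        bits += model.neg_log2_prob(block)  # transmission cost first...
        model.update(block)                 # ...then the learner sees the block
    return bits

# Example: three two-token blocks over a binary vocabulary.
blocks = [["a", "b"], ["a", "a"], ["b", "a"]]
print(prequential_description_length(blocks, vocab={"a", "b"}))
```

A learner whose inductive bias matches the data assigns higher probability to later blocks and so yields a shorter description length, which is what makes the measure sensitive to bias.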

Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks

Learners that are exposed to the same training data might generalize dif...

ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning

The ability to reason with multiple hierarchical structures is an attrac...

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

While designing inductive bias in neural architectures has been widely s...

LSTMs Compose (and Learn) Bottom-Up

Recent work in NLP shows that LSTM language models capture hierarchical ...

Can neural networks acquire a structural bias from raw linguistic data?

We evaluate whether BERT, a widely used neural network for sentence proc...

Examining the Inductive Bias of Neural Language Models with Artificial Languages

Since language models are used to model a wide variety of languages, it ...

The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

No free lunch theorems for supervised learning state that no learner can...