On the Linguistic Representational Power of Neural Machine Translation Models

by   Yonatan Belinkov, et al.

Despite the recent success of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. We analyze the representations learned by neural machine translation models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word-structure captured within the learned representations, an important aspect in translating morphologically-rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models learn a non-trivial amount of linguistic information. Notable findings include: i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers; (iii) Representations learned using characters are more informed about wordmorphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models.


page 19

page 21

page 22


What do Neural Machine Translation Models Learn about Morphology?

Neural machine translation (MT) models obtain state-of-the-art performan...

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks

While neural machine translation (NMT) models provide improved translati...

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

Despite the remarkable evolution of deep neural networks in natural lang...

An empirical analysis of phrase-based and neural machine translation

Two popular types of machine translation (MT) are phrase-based and neura...

Language Embeddings Sometimes Contain Typological Generalizations

To what extent can neural network models learn generalizations about lan...

Emerging Language Spaces Learned From Massively Multilingual Corpora

Translations capture important information about languages that can be u...

Discovering Latent Concepts Learned in BERT

A large number of studies that analyze deep neural network models and th...

Please sign up or login with your details

Forgot password? Click here to reset