Apply Chinese Radicals Into Neural Machine Translation: Deeper Than Character Level

by Shaohui Kuang, et al.

In neural machine translation (NMT), researchers face the challenge of translating unseen, or out-of-vocabulary (OOV), words. To address this, some researchers have proposed splitting words in Western languages such as English and German into sub-words or compound parts. In this paper, we try to address the OOV issue and improve NMT adequacy for a harder language, Chinese, whose characters are even more complex in composition. We integrate Chinese radicals into the NMT model under different settings to address the unseen-word challenge in Chinese-to-English translation. This can also be viewed as adding a semantic component to the MT system, since Chinese radicals usually carry the essential meaning of the words they appear in. Our models allow meaningful radicals and new characters to be integrated into NMT systems. We use an attention-based NMT system as a strong baseline. Experiments on the standard NIST 2006 and 2008 Chinese-to-English translation shared task data show that our models outperform the baseline on a wide range of state-of-the-art evaluation metrics, including LEPOR, BEER, and CharacTER, in addition to the traditional BLEU and NIST scores, especially at the adequacy level. Our various experimental settings also yield some interesting findings about how words and characters perform in Chinese NMT, which differ from those for other languages. For instance, although fully character-level NMT may perform very well, or even at the state of the art, for some other languages, as researchers have recently demonstrated, word boundary knowledge is important for model learning in Chinese NMT.
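The abstract does not spell out how radicals are fed to the model. As one illustration of the general idea only, the Python sketch below appends radical tokens to a source sequence, either after each segmented word or after each individual character. The `RADICALS` table is a tiny hypothetical sample (a real system would need a full character-decomposition resource), and the `<r>` prefix and the `radical_tokens` and `augment` functions are illustrative names, not the authors' configuration.

```python
# Hypothetical toy radical table for illustration only; a real system
# would need a complete character-to-radical decomposition resource.
RADICALS = {
    "河": "氵",  # river  -> water radical
    "湖": "氵",  # lake   -> water radical
    "树": "木",  # tree   -> wood radical
    "林": "木",  # forest -> wood radical
    "妈": "女",  # mother -> woman radical
}


def radical_tokens(word: str) -> list[str]:
    """Return one '<r>'-prefixed radical token per character found in the table.

    Characters missing from the toy table simply contribute no token.
    """
    return [f"<r>{RADICALS[ch]}" for ch in word if ch in RADICALS]


def augment(words: list[str], level: str = "word") -> list[str]:
    """Interleave source tokens with radical tokens (hypothetical scheme).

    level="word": keep the segmented words and append their radicals.
    level="char": split words into characters and append each one's radical.
    """
    out: list[str] = []
    if level == "word":
        for w in words:
            out.append(w)
            out.extend(radical_tokens(w))
    elif level == "char":
        for w in words:
            for ch in w:
                out.append(ch)
                out.extend(radical_tokens(ch))
    else:
        raise ValueError(f"unknown level: {level}")
    return out


if __name__ == "__main__":
    print(augment(["树林", "和", "河"]))
    # ['树林', '<r>木', '<r>木', '和', '河', '<r>氵']
```

With `level="word"` the segmented words stay intact, so word-boundary information survives; with `level="char"` that segmentation is lost, which is the kind of knowledge the abstract reports as important for Chinese NMT.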

