A reproduction of Apple's bi-directional LSTM models for language identification in short strings

02/11/2021
by Mads Toftrup, et al.

Language identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must work on very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms current open-source language identifiers. We further find that its language identification mistakes are due to confusion between related languages.
