BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

09/20/2021
by Nguyen Luong Tran et al.

We present BARTpho in two versions, BARTpho_word and BARTpho_syllable, the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and pre-training scheme of the sequence-to-sequence denoising model BART, making it especially suitable for generative NLP tasks. Experiments on the downstream task of Vietnamese text summarization show that, in both automatic and human evaluations, BARTpho outperforms the strong baseline mBART and improves the state of the art. We release BARTpho to facilitate future research on and applications of generative Vietnamese NLP. Our BARTpho models are available at: https://github.com/VinAIResearch/BARTpho
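As a usage sketch (not part of the original abstract): the released checkpoints can be loaded through the Hugging Face transformers library. The example below assumes the syllable-level model is published on the Hub under the ID vinai/bartpho-syllable (per the GitHub release) and simply extracts contextual features for a Vietnamese sentence.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed Hub model ID for the syllable-level variant; see the
    # repository README for the authoritative identifiers.
    tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
    model = AutoModel.from_pretrained("vinai/bartpho-syllable")

    # Encode a Vietnamese sentence ("We are researchers.") and run the
    # encoder-decoder model to obtain contextual hidden states.
    sentence = "Chúng tôi là những nghiên cứu viên."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    features = outputs.last_hidden_state  # decoder-side hidden states

Note that the word-level variant, BARTpho_word, expects word-segmented Vietnamese input, so raw text should be pre-segmented with an external segmenter before tokenization; the repository README documents the exact preprocessing.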
