Automatic punctuation restoration with BERT models
We present an approach for automatic punctuation restoration with BERT models for English and Hungarian. For English, we conduct our experiments on TED Talks, a commonly used benchmark for punctuation restoration, while for Hungarian we evaluate our models on the Szeged Treebank dataset. Our best models achieve a macro-averaged F_1-score of 79.8 in English and 82.2 in Hungarian. Our code is publicly available.
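To illustrate the general setup, punctuation restoration is typically framed as token classification: the model predicts, for each word, which punctuation mark (if any) should follow it. The sketch below, which is not the authors' released code, shows this framing with Hugging Face Transformers; the checkpoint name, label set, and `restore_punctuation` helper are illustrative assumptions, and the classification head would need fine-tuning on punctuation-annotated text before producing useful output.

```python
# Minimal sketch (not the authors' code): punctuation restoration as token
# classification with a BERT encoder. Checkpoint and labels are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label set: punctuation mark to insert after each word.
LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # this head is randomly initialized; fine-tune it before real use

def restore_punctuation(words):
    """Predict a punctuation label for each word and re-attach the marks."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]          # (num_subwords, num_labels)
    pred = logits.argmax(dim=-1).tolist()
    word_ids = enc.word_ids(0)

    # Use the prediction of each word's first subword token.
    marks = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
    out, seen = [], set()
    for tok_idx, w_id in enumerate(word_ids):
        if w_id is None or w_id in seen:
            continue
        seen.add(w_id)
        label = LABELS[pred[tok_idx]]
        out.append(words[w_id] + marks.get(label, ""))
    return " ".join(out)

print(restore_punctuation("hello how are you doing today".split()))
```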