BERTScore: Evaluating Text Generation with BERT

04/21/2019
by Tianyi Zhang, et al.

We propose BERTScore, an automatic evaluation metric for text generation. Analogous to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of looking for exact matches, we compute similarity using contextualized BERT embeddings. We evaluate on several machine translation and image captioning benchmarks and show that BERTScore correlates better with human judgments than existing metrics, often significantly outperforming even task-specific supervised metrics.
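The token-level comparison described above can be illustrated with a minimal Python sketch. It assumes every token has already been mapped to a contextual embedding; the small toy vectors used here are hypothetical stand-ins for real BERT outputs. Pairwise cosine similarities are aggregated by greedy matching: each token is paired with its most similar counterpart in the other sentence, averaging over reference tokens gives recall, averaging over candidate tokens gives precision, and the two are combined into an F1 score. The function name bertscore_sketch is illustrative only and is not the API of any released package.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def bertscore_sketch(candidate: List[Vector],
                     reference: List[Vector]) -> Tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over precomputed token embeddings.

    `candidate` and `reference` are lists of per-token embedding vectors
    (hypothetical stand-ins for contextual BERT embeddings).
    """
    cand = [normalize(v) for v in candidate]
    ref = [normalize(v) for v in reference]

    # Similarity of every reference token (rows) with every candidate token (columns).
    sim = [[dot(r, c) for c in cand] for r in ref]

    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(ref)

    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(sim[i][j] for i in range(len(ref))) for j in range(len(cand))
    ) / len(cand)

    # Harmonic mean of precision and recall (guard against a zero denominator).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom != 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A one-token candidate that matches only the first of two reference tokens.
    p, r, f = bertscore_sketch([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In the usage example the candidate's single token matches one reference token perfectly, so precision is high while recall is penalized for the unmatched reference token. Because matching uses embedding similarity rather than string identity, a paraphrased token with a nearby embedding would still earn partial credit.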
