Transferring BERT-like Transformers' Knowledge for Authorship Verification

by Andrei Manolache, et al.

The task of identifying the author of a text spans several decades and has been tackled using linguistics, statistics, and, more recently, machine learning. Inspired by the impressive performance gains of transformer models across a broad range of natural language processing tasks, and by the recent availability of the large-scale PAN authorship dataset, we first study the effectiveness of several BERT-like transformers for the task of authorship verification. These models consistently achieve very high scores. Next, we empirically show that they focus on topical clues rather than on the author's writing style, exploiting existing biases in the dataset. To address this problem, we provide new splits for PAN-2020 in which training and test data are sampled from disjoint topics or authors. Finally, we introduce DarkReddit, a dataset with a different input data distribution. We use it to analyze the domain generalization performance of models in a low-data regime, and to study how performance varies when the proposed PAN-2020 splits are used for fine-tuning. We show that those splits can enhance the models' capability to transfer knowledge to a new, significantly different dataset.
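The closed-set splits described above hinge on train and test pairs sharing no authors. Below is a minimal sketch of such an author-disjoint split; the pair format `(author_a, author_b, label)` and the function name are illustrative assumptions, not the paper's actual data schema.

```python
import random

def split_by_disjoint_authors(pairs, test_frac=0.2, seed=0):
    """Split verification pairs so train/test author sets are disjoint.

    Each pair is (author_a, author_b, label). A pair goes to the test
    split only if both of its authors fall in the held-out author pool;
    pairs that mix train and test authors are dropped so that no author
    appears on both sides of the split.
    """
    authors = sorted({a for p in pairs for a in (p[0], p[1])})
    rng = random.Random(seed)
    rng.shuffle(authors)
    n_test = max(1, int(len(authors) * test_frac))
    test_authors = set(authors[:n_test])
    train, test = [], []
    for a, b, label in pairs:
        if a in test_authors and b in test_authors:
            test.append((a, b, label))
        elif a not in test_authors and b not in test_authors:
            train.append((a, b, label))
        # pairs spanning both pools are discarded
    return train, test
```

A topic-disjoint split follows the same pattern, keying on each document's topic label instead of its author.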

