Document Provenance and Authentication through Authorship Classification

03/02/2023
by Muhammad Tayyab Zamir, et al.

Style analysis, a relatively unexplored topic, enables several interesting applications. For instance, it allows co-authors to adjust their writing styles so that a collaboratively written document reads more coherently. Style analysis can also serve as a first step in document provenance and authentication. In this paper, we propose an ensemble-based text-processing framework for classifying documents as single-authored or multi-authored, which is one of the key tasks in style analysis. The proposed framework incorporates several state-of-the-art text classification algorithms, including classical Machine Learning (ML) algorithms, transformers, and deep learning algorithms, both individually and in merit-based late fusion. For the merit-based late fusion, we employ several weight optimization and selection methods to assign merit-based weights to the individual text classification algorithms. We also analyze the impact on this task of characters that are usually removed during pre-processing in NLP applications, by conducting experiments on both clean and unclean data. The proposed framework is evaluated on a large-scale benchmark dataset, where it significantly improves performance over existing solutions.
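To make the idea of merit-based late fusion concrete, here is a minimal sketch, not the authors' implementation. It assumes each classifier outputs class probabilities and that a model's "merit" is simply its validation accuracy; the paper mentions several weight optimization and selection methods, which this sketch does not reproduce. The function names (`accuracy`, `merit_weights`, `late_fusion`) and the toy probabilities are hypothetical.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

def merit_weights(val_probs, val_labels):
    """Assumed merit score: validation accuracy, normalised to sum to 1."""
    scores = np.array([accuracy(p, val_labels) for p in val_probs])
    return scores / scores.sum()

def late_fusion(probs, weights):
    """Weighted average of per-model class-probability matrices."""
    stacked = np.stack(probs)  # shape: (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)

# Toy data: 0 = single-authored, 1 = multi-authored (made-up values).
labels = np.array([0, 1, 1, 0])
model_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])
model_b = np.array([[0.4, 0.6], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]])

# For brevity the same toy split is used to derive and apply the weights;
# in practice the weights would come from a held-out validation split.
weights = merit_weights([model_a, model_b], labels)
fused = late_fusion([model_a, model_b], weights)
predictions = np.argmax(fused, axis=1)
```

In this toy case the more accurate model receives the larger weight, so its votes dominate wherever the two models disagree. A weight optimization method would replace `merit_weights` with a search over the weight vector that maximises a validation metric.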


research
05/22/2023

Quantum Text Classifier – A Synchronistic Approach Towards Classical and Quantum Machine Learning

Although it will be a while before a practical quantum computer is avail...
research
09/24/2017

HDLTex: Hierarchical Deep Learning for Text Classification

The continually increasing number of documents produced each year necess...
research
03/03/2020

Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification

In order to provide benchmark performance for Urdu text document classif...
research
05/10/2018

Text classification based on ensemble extreme learning machine

In this paper, we propose a novel approach based on cost-sensitive ensem...
research
08/31/2018

Seeing Colors: Learning Semantic Text Encoding for Classification

The question we answer with this work is: can we convert a text document...
research
06/09/2016

Large scale biomedical texts classification: a kNN and an ESA-based approaches

With the large and increasing volume of textual data, automated methods ...
research
06/17/2019

Recursive Style Breach Detection with Multifaceted Ensemble Learning

We present a supervised approach for style change detection, which aims ...
