A Framework For Refining Text Classification and Object Recognition from Academic Articles

05/27/2023
by Jinghong Li, et al.

With the widespread use of the internet, it has become increasingly crucial to efficiently extract specific information from vast amounts of academic articles. Data mining techniques are generally employed for this task. However, data mining for academic articles is challenging because it requires automatically extracting specific patterns from documents with complex, unstructured layouts. Current data mining methods for academic articles employ rule-based (RB) or machine learning (ML) approaches. Rule-based methods incur a high coding cost for articles with complex typesetting. Machine learning methods alone, on the other hand, require costly annotation work for the complex content types within a paper. Furthermore, using only machine learning can lead to mistaken extraction of patterns that rule-based methods recognize easily. To overcome these issues, we analyze the standard layout and typesetting used in the specified publication and emphasize implementing specific methods for specific characteristics of academic articles. We have developed a novel Text Block Refinement Framework (TBRF), a hybrid of machine learning and rule-based schemes. We used articles from the well-known ACL proceedings as data for the validation experiment. The experiment shows that our approach achieved over 95 classification accuracy and 90
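
The hybrid idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' TBRF implementation: the rule patterns, the label names, and the `toy_model` stand-in are all hypothetical, and the "rules first, classifier as fallback" ordering is an assumption chosen only to show how easily recognized patterns can be kept away from a learned model.

```python
import re
from typing import Callable, Optional

# Hypothetical layout rules: patterns that are cheap to recognize
# with a regular expression. Labels are illustrative only.
RULES = [
    (re.compile(r"^(Figure|Fig\.)\s+\d+[:.]"), "figure_caption"),
    (re.compile(r"^Table\s+\d+[:.]"), "table_caption"),
    (re.compile(r"^\d+(\.\d+)*\s+[A-Z]"), "section_heading"),
]


def rule_label(text: str) -> Optional[str]:
    """Return a label if any rule matches the text block, else None."""
    stripped = text.strip()
    for pattern, label in RULES:
        if pattern.match(stripped):
            return label
    return None


def classify_block(text: str, ml_model: Callable[[str], str]) -> str:
    """Apply the rules first; fall back to the learned model otherwise."""
    label = rule_label(text)
    return label if label is not None else ml_model(text)


def toy_model(text: str) -> str:
    """Stand-in for a trained classifier (assumption, not a real model)."""
    return "body_text"


if __name__ == "__main__":
    for block in ["Table 2: Results on the test set",
                  "3.1 Data Preparation",
                  "We evaluate on ACL papers."]:
        print(block, "->", classify_block(block, toy_model))
```

In a real pipeline the `ml_model` argument would be a trained text-block classifier, and the rule set would be derived from the layout conventions of the target publication.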
