Multimodal deep networks for text and image-based document classification

07/15/2019
by Nicolas Audebert, et al.

Classification of document images is a critical step for the archival of old manuscripts, online subscriptions, and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based on their visual appearance. However, the fine-grained classification required in real-world settings cannot be achieved by visual analysis alone. Often, the relevant information is in the actual text content of the document. We design a multimodal neural network that is able to learn both from the image and from word embeddings computed on text extracted by OCR. We show that this approach boosts pure image accuracy by 3% on Tobacco3482 and RVL-CDIP augmented by our new QS-OCR text dataset (https://github.com/Quicksign/ocrized-text-dataset), even without clean text information.
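
To make the idea of learning "from word embeddings and from the image" concrete, here is a minimal sketch of one generic way to combine the two modalities: late fusion, where an image feature vector and a text feature vector are concatenated and passed to a single classifier. This is not the authors' architecture. The function names (`text_features`, `classify`), all dimensions, the averaging of word embeddings, and the random weights standing in for trained parameters are hypothetical choices made for illustration only. The class count of 16 matches RVL-CDIP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 1000   # number of words in a toy OCR vocabulary
EMBED_DIM = 50      # word-embedding dimension
IMG_FEAT_DIM = 128  # size of a pooled image-CNN feature vector
NUM_CLASSES = 16    # RVL-CDIP has 16 document classes

# Random stand-ins for parameters that would normally be pretrained or learned.
word_embeddings = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))
W_fuse = rng.normal(scale=0.01, size=(IMG_FEAT_DIM + EMBED_DIM, NUM_CLASSES))
b_fuse = np.zeros(NUM_CLASSES)


def text_features(token_ids):
    """Average the embeddings of the OCR tokens; return zeros if OCR found no text."""
    if len(token_ids) == 0:
        return np.zeros(EMBED_DIM)
    return word_embeddings[np.asarray(token_ids)].mean(axis=0)


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def classify(image_features, token_ids):
    """Hypothetical late fusion: concatenate image and text features, apply a linear layer."""
    fused = np.concatenate([image_features, text_features(token_ids)])
    return softmax(fused @ W_fuse + b_fuse)


# Usage: the image vector would come from an image CNN, the token ids from OCR.
image_vec = rng.normal(size=IMG_FEAT_DIM)
ocr_tokens = [12, 407, 33]
probs = classify(image_vec, ocr_tokens)
predicted_class = int(probs.argmax())
```

Falling back to a zero text vector when OCR returns nothing lets the classifier still rely on the image branch, which is one simple way a fused model can tolerate missing or noisy text; a trained model would learn the fusion weights rather than use random ones.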


