DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures

06/09/2023
by Jiaxin Zhang, et al.

Research interest in document image analysis and recognition in photographic scenarios has grown recently. However, the lack of labeled datasets for this emerging challenge poses a significant obstacle, because manual annotation is time-consuming and often impractical. To tackle this issue, we present DocAligner, a novel method that reduces the manual annotation process to the simple step of taking pictures. DocAligner achieves this by establishing dense correspondence between photographic document images and their clean counterparts. This correspondence enables the automatic transfer of existing annotations from clean document images to photographic ones, and helps to automatically acquire labels that cannot be obtained through manual labeling. Considering the distinctive characteristics of document images, DocAligner incorporates several innovative features. First, we propose a non-rigid pre-alignment technique based on the document's edges, which effectively eliminates interference caused by significant global shifts and by the repetitive patterns present in document images. Second, to handle large shifts while ensuring high accuracy, we introduce a hierarchical aligning approach that combines global and local correlation layers. Furthermore, considering the importance of fine-grained elements in document images, we present a details recurrent refinement module to enhance the output in a high-resolution space. To train DocAligner, we construct a synthetic dataset and introduce a self-supervised learning approach to enhance its robustness on real-world data. Through extensive experiments, we demonstrate the effectiveness of DocAligner and of the acquired dataset. The datasets and code will be made publicly available.
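To make the annotation-transfer idea concrete, here is a minimal sketch of how a dense correspondence map could carry a per-pixel label mask from a clean document image onto its photographed counterpart. It is not the authors' implementation: the function name `transfer_labels`, the `(x, y)` layout of the correspondence map, and the nearest-neighbor lookup are all assumptions made for illustration, and the correspondence map itself is taken as given rather than predicted by a network.

```python
import numpy as np


def transfer_labels(clean_labels, corr_map, fill=0):
    """Warp a label mask from the clean image onto the photo (hypothetical helper).

    clean_labels: (Hc, Wc) integer label mask defined on the clean image.
    corr_map:     (Hp, Wp, 2) float array; corr_map[y, x] = (x_clean, y_clean),
                  the clean-image location matched to photo pixel (y, x).
                  This layout is an assumption; values must be finite.
    fill:         label assigned to photo pixels whose match falls outside
                  the clean image (for example, background around the page).
    """
    h_c, w_c = clean_labels.shape[:2]

    # Nearest-neighbor lookup keeps labels categorical (no blending of class ids).
    xs = np.rint(corr_map[..., 0]).astype(np.int64)
    ys = np.rint(corr_map[..., 1]).astype(np.int64)

    # Only sample where the matched location lies inside the clean image.
    valid = (xs >= 0) & (xs < w_c) & (ys >= 0) & (ys < h_c)

    out = np.full(corr_map.shape[:2], fill, dtype=clean_labels.dtype)
    out[valid] = clean_labels[ys[valid], xs[valid]]
    return out


# Toy example: the "photo" is the clean image shifted left by one pixel.
clean = np.arange(16).reshape(4, 4)
grid_y, grid_x = np.mgrid[0:4, 0:4]
corr = np.stack([grid_x + 1, grid_y], axis=-1).astype(np.float64)
photo_labels = transfer_labels(clean, corr, fill=-1)
```

In this toy case the last column of `photo_labels` receives the fill value because its matches fall outside the clean image. Continuous annotations such as bounding-box corners would instead need the opposite mapping direction (clean to photo), which this sketch does not cover.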


