Intrinsic Decomposition of Document Images In-the-Wild

11/29/2020
by   Sagnik Das, et al.
Automatic document content processing is affected by artifacts caused by the shape of the paper and by non-uniform, diversely colored lighting conditions. Fully supervised methods on real data are infeasible because of the large amount of data needed. Hence, current state-of-the-art deep learning models are trained on fully or partially synthetic images. However, document shadow and shading removal results still suffer because (a) prior methods rely on the uniformity of local color statistics, which limits their application to real scenarios with complex document shapes and textures, and (b) the synthetic or hybrid datasets used to train the models have non-realistic, simulated lighting conditions. In this paper we tackle these problems with two main contributions. First, a physically constrained learning-based method that directly estimates document reflectance based on intrinsic image formation and generalizes to challenging illumination conditions. Second, a new dataset that clearly improves on previous synthetic ones by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in the wild. The proposed architecture works in a self-supervised manner in which only the synthetic texture is used as a weak training signal, obviating the need for very costly ground truth with disentangled versions of shading and reflectance. The proposed approach leads to a significant generalization of document reflectance estimation in real scenes with challenging illumination. We extensively evaluate on the real benchmark datasets available for intrinsic image decomposition and document shadow removal tasks. Our reflectance estimation scheme, when used as a pre-processing step of an OCR pipeline, shows a 26% improvement in character error rate (CER), demonstrating its practical applicability.
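
The abstract reports its OCR result in terms of character error rate (CER). As a minimal sketch of how such a metric is commonly computed, the Python below divides the Levenshtein edit distance between the OCR output and the reference text by the reference length. The function names (`edit_distance`, `character_error_rate`, `relative_improvement`) are hypothetical and chosen for illustration. The abstract does not say whether its improvement figure is relative or absolute, so the relative form shown here is an assumption, not the authors' evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning the first i reference chars into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(cer_before: float, cer_after: float) -> float:
    """Assumed convention: fraction by which CER drops relative to the baseline."""
    if cer_before <= 0:
        raise ValueError("baseline CER must be positive")
    return (cer_before - cer_after) / cer_before
```

In use, one would run the same OCR engine on the raw photograph and on the estimated reflectance image, compute `character_error_rate` for each against the ground-truth transcription, and compare the two values.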


