Form2Seq : A Framework for Higher-Order Form Structure Extraction

by Milan Aggarwal, et al.

Document structure extraction has been a widely researched area for decades, with recent works performing it as a semantic segmentation task over document images using fully convolutional networks. Such methods are limited by image resolution, due to which they fail to disambiguate structures in dense regions, which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages the relative spatial arrangement of structures. We discuss two tasks: 1) classification of low-level constituent elements (TextBlock and empty fillable Widget) into ten types such as field captions, list items, and others; 2) grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanisms in forms. To achieve this, we arrange the constituent elements linearly in natural reading order and feed their spatial and textual representations to a Seq2Seq framework, which sequentially outputs a prediction for each element depending on the final task. We modify Seq2Seq for the grouping task and discuss improvements obtained through cascaded end-to-end training of the two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach achieving an accuracy of 90 61.63 on groups discussed above respectively, outperforming segmentation baselines. Further, we show our framework achieves state-of-the-art results for table structure recognition on the ICDAR 2013 dataset.
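
To make the input preparation step concrete, the following is a minimal sketch of how constituent elements might be arranged in natural reading order and paired with simple spatial features before being fed to a sequence model. It is an illustration only, not the authors' implementation: the `Element` class, the `reading_order` heuristic (top-to-bottom, then left-to-right within a line, using an assumed `line_tol` tolerance), and the normalized bounding-box features in `spatial_features` are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical low-level form element (names are assumptions)."""
    kind: str   # "TextBlock" or "Widget"
    text: str   # empty string for an unfilled widget
    x: float    # left edge of the bounding box
    y: float    # top edge of the bounding box
    w: float    # box width
    h: float    # box height


def reading_order(elements: List[Element], line_tol: float = 5.0) -> List[Element]:
    """Assumed heuristic: group elements into lines by top coordinate,
    then read each line left to right."""
    rows: List[List[Element]] = []
    for el in sorted(elements, key=lambda e: e.y):
        # Join the current line if the top edge is close to the line's first element.
        if rows and abs(el.y - rows[-1][0].y) <= line_tol:
            rows[-1].append(el)
        else:
            rows.append([el])
    return [el for row in rows for el in sorted(row, key=lambda e: e.x)]


def spatial_features(el: Element, page_w: float, page_h: float) -> List[float]:
    """Bounding box normalized by page size (an assumed representation)."""
    return [el.x / page_w, el.y / page_h, el.w / page_w, el.h / page_h]


if __name__ == "__main__":
    page = [
        Element("Widget", "", 80, 102, 100, 12),
        Element("TextBlock", "Email", 10, 140, 40, 12),
        Element("TextBlock", "Application Form", 10, 20, 120, 16),
        Element("TextBlock", "Name", 10, 100, 40, 12),
    ]
    for el in reading_order(page):
        print(el.kind, repr(el.text), spatial_features(el, 200, 400))
```

In this sketch the resulting sequence would be passed, together with a text encoding of each element, to an encoder-decoder model that emits one prediction per element; the choice of text encoder and model is left open here because the abstract does not specify it.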


