PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

by   Kaiwen Wei, et al.

Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. PPN leverages the implicit clues between semantic entities to assist extracting, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.


page 1

page 2

page 3

page 4


Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

Visual information extraction (VIE), which aims to simultaneously perfor...

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Visual Information Extraction (VIE) task aims to extract key information...

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Extracting entity from images is a crucial part of many OCR applications...

Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts

The relevance of the Key Information Extraction (KIE) task is increasing...

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

Recently, Visual Information Extraction (VIE) has been becoming increasi...

One-shot Key Information Extraction from Document with Deep Partial Graph Matching

Automating the Key Information Extraction (KIE) from documents improves ...

PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Computer vision with state-of-the-art deep learning models has achieved ...

Please sign up or login with your details

Forgot password? Click here to reset