StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

by Renqiu Xia, et al.

Charts are common in the literature across scientific fields, conveying rich information in a form easily accessible to readers. Current chart-related tasks focus on either chart perception, i.e., extracting information from visual charts, or reasoning over the extracted data, e.g., in tabular form. In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning that is generally applicable to different downstream tasks, beyond the question-answering task specifically studied in peer works. Concretely, StructChart first reformulates chart information from the popular tabular form (specifically, linearized CSV) into the proposed Structured Triplet Representations (STR), which better bridges the gap between chart perception and reasoning through structured information extraction. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate performance on the chart perception task. To enrich the training data, we further explore leveraging a Large Language Model (LLM) to enhance chart diversity in terms of both visual style and statistical content. Extensive experiments on various chart-related tasks demonstrate the effectiveness and promising potential of a unified chart perception-reasoning paradigm for pushing the frontier of chart understanding.
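As a rough illustration of the tabular-to-triplet reformulation described above, the sketch below converts a linearized CSV table into (entity, attribute, value) triplets. The function name and exact triplet schema are assumptions for illustration only; the paper's actual STR format may encode additional structure.

```python
import csv
import io

def table_to_triplets(csv_text):
    """Convert a linearized CSV chart table into (entity, attribute, value)
    triplets -- an illustrative guess at a structured triplet representation;
    the paper's STR may differ in schema and detail."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    triplets = []
    for row in body:
        entity = row[0]  # first column names the entity (e.g., a year or category)
        for col_name, value in zip(header[1:], row[1:]):
            triplets.append((entity, col_name, value))
    return triplets

csv_text = "Year,Sales,Profit\n2021,100,20\n2022,120,30"
print(table_to_triplets(csv_text))
# [('2021', 'Sales', '100'), ('2021', 'Profit', '20'),
#  ('2022', 'Sales', '120'), ('2022', 'Profit', '30')]
```

One motivation for such a triplet form is that each fact becomes an independent, order-free unit, which simplifies both evaluation (set-style matching) and downstream reasoning compared to a position-sensitive linearized table.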




