Structure-aware Pre-training for Table Understanding with Tree-based Transformers

by Zhiruo Wang, et al.
Carnegie Mellon University
Peking University

Tables are widely used with various structures to organize and present data. Recent attempts at table understanding mainly focus on relational tables, yet overlook other common table structures. In this paper, we propose TUTA, a unified pre-training architecture for understanding generally structured tables. Since understanding a table requires leveraging spatial, hierarchical, and semantic information, we adapt the self-attention strategy with several key structure-aware mechanisms. First, we propose a novel tree-based structure called a bi-dimensional coordinate tree to describe both the spatial and hierarchical information in tables. Building on this, we extend the pre-training architecture with two core mechanisms, namely tree-based attention and tree-based position embedding. Moreover, to capture table information in a progressive manner, we devise three pre-training objectives that enable representations at the token, cell, and table levels. TUTA is pre-trained on a wide range of unlabeled tables and fine-tuned on a critical task in the field of table structure understanding, i.e., cell type classification. Experimental results show that TUTA is highly effective, achieving state-of-the-art results on four well-annotated cell type classification datasets.
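The abstract names the bi-dimensional coordinate tree and tree-based attention but does not spell out how they are computed. The sketch below is an illustrative assumption, not TUTA's implementation: each cell is given a coordinate in a top header tree and a left header tree (encoded here as root-to-node child-index paths), the distance between two cells is assumed to be the sum of their path lengths in the two trees, and attention is restricted to cells within a hypothetical threshold `max_dist`. The names `tree_distance`, `bi_tree_distance`, `tree_attention_mask`, and the toy table are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a node's coordinate is its root-to-node path,
# listing the child index chosen at each level of a header tree.
# The root itself is the empty tuple ().
Coord = Tuple[int, ...]
# A cell is located by a pair of coordinates: (top-tree coord, left-tree coord).
Cell = Tuple[Coord, Coord]


def tree_distance(a: Coord, b: Coord) -> int:
    """Edges on the path between two nodes of one tree, via their lowest common ancestor."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # Climb from each node up to the common ancestor at depth `shared`.
    return (len(a) - shared) + (len(b) - shared)


def bi_tree_distance(c1: Cell, c2: Cell) -> int:
    """Assumed combination: sum of the distances in the top tree and the left tree."""
    (top1, left1), (top2, left2) = c1, c2
    return tree_distance(top1, top2) + tree_distance(left1, left2)


def tree_attention_mask(cells: Sequence[Cell], max_dist: int) -> List[List[bool]]:
    """mask[i][j] is True when cell i may attend to cell j (distance <= max_dist)."""
    return [[bi_tree_distance(ci, cj) <= max_dist for cj in cells] for ci in cells]


# Toy table: top header "2019" with children "Q1", "Q2"; left header "Sales", "Cost".
cells = [
    ((0, 0), (0,)),  # Sales / 2019-Q1
    ((0, 1), (0,)),  # Sales / 2019-Q2
    ((0, 0), (1,)),  # Cost  / 2019-Q1
    ((0, 1), (1,)),  # Cost  / 2019-Q2
]

mask = tree_attention_mask(cells, max_dist=2)
for row in mask:
    print(row)
```

With `max_dist=2`, each data cell in the toy table attends to itself and to the cells that share its row or column header (distance 2), but not to the diagonally opposite cell (distance 4). A hard distance cutoff is only one simple way to inject table structure into self-attention; the thresholding used by TUTA, and how its tree-based position embedding encodes these coordinates, may differ.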




