TabSim: A Siamese Neural Network for Accurate Estimation of Table Similarity

by Maryam Habibi, et al.

Tables are a popular and efficient means of presenting structured information. They are used extensively in many kinds of documents, including web pages. Tables display information as a two-dimensional matrix whose semantics are conveyed by a mixture of structure (rows, columns), headers, caption, and content. Recent research has started to treat tables as first-class objects, not just as an addendum to text, yielding interesting results for problems like table matching, table completion, or value imputation. All of these problems inherently rely on an accurate measure of the semantic similarity of two tables. We present TabSim, a novel method to compute table similarity scores using deep neural networks. Conceptually, TabSim represents a table as a learned concatenation of embeddings of its caption, its content, and its structure. Given two tables in this representation, a Siamese neural network is trained to compute a score that correlates with the tables' semantic similarity. To train and evaluate our method, we created a gold standard corpus of 1500 table pairs extracted from biomedical articles and manually scored for their degree of similarity, and we adopted two other corpora originally developed for a different but related task. Our evaluation shows that TabSim outperforms other table similarity measures on average by approx. 7 similarity classification setting and by approx. 1.5
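The abstract describes the architecture only at a high level. The following is a minimal sketch of the forward pass under stated assumptions: hashed bag-of-words vectors stand in for the learned caption and content embeddings, log row and column counts stand in for the structure embedding, and a single shared tanh projection followed by cosine similarity plays the role of the Siamese head. The names `Table`, `hashed_bow`, `structure_features`, `represent`, and `SiameseEncoder` are hypothetical and do not come from the paper, and the weights are random rather than trained.

```python
import hashlib
import math
import random
from dataclasses import dataclass
from typing import List

DIM = 16  # hypothetical size of each component embedding


@dataclass
class Table:
    caption: str
    rows: List[List[str]]  # each row is a list of cell strings


def hashed_bow(text: str, dim: int = DIM) -> List[float]:
    """Hashed bag-of-words, L2-normalised: a stand-in for a learned text embedding."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def structure_features(t: Table, dim: int = DIM) -> List[float]:
    """Crude structural descriptor (log row and column counts), zero-padded to dim."""
    n_rows = len(t.rows)
    n_cols = max((len(r) for r in t.rows), default=0)
    feats = [math.log1p(n_rows), math.log1p(n_cols)]
    return feats + [0.0] * (dim - len(feats))


def represent(t: Table) -> List[float]:
    """Concatenate caption, content, and structure vectors into one table vector."""
    content = " ".join(cell for row in t.rows for cell in row)
    return hashed_bow(t.caption) + hashed_bow(content) + structure_features(t)


class SiameseEncoder:
    """One projection whose weights are shared by both inputs (the 'twin' network)."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Random, untrained weights; a real model would learn these.
        self.w = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def encode(self, x: List[float]) -> List[float]:
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in self.w]

    def similarity(self, a: Table, b: Table) -> float:
        """Cosine similarity of the two encodings, in [-1, 1]."""
        ea, eb = self.encode(represent(a)), self.encode(represent(b))
        dot = sum(p * q for p, q in zip(ea, eb))
        na = math.sqrt(sum(p * p for p in ea)) or 1.0
        nb = math.sqrt(sum(q * q for q in eb)) or 1.0
        return dot / (na * nb)


if __name__ == "__main__":
    t1 = Table("Gene expression in liver tissue", [["gene", "fold change"], ["TP53", "2.1"]])
    t2 = Table("Gene expression in kidney tissue", [["gene", "fold change"], ["BRCA1", "1.4"]])
    model = SiameseEncoder(in_dim=3 * DIM, out_dim=8)
    print(round(model.similarity(t1, t2), 3))
```

Because both tables pass through the same weights, the score is symmetric in its two inputs. In a trained model those shared weights would be fit against the gold similarity scores, for example with a regression or contrastive loss; that training loop is omitted here.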



GitTables: A Large-Scale Corpus of Relational Tables

The practical success of deep learning has sparked interest in improving...

Ad Hoc Table Retrieval using Semantic Similarity

We introduce and address the problem of ad hoc table retrieval: answerin...

Recommending Related Tables

Tables are an extremely powerful visual and interactive tool for structu...

StruBERT: Structure-aware BERT for Table Search and Matching

A large amount of information is stored in data tables. Users can search...

Generative Benchmark Creation for Table Union Search

Data management has traditionally relied on synthetic data generators to...

Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach

Finding joinable tables in data lakes is a key procedure in many applicati...

TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets

Tables have been an ever-existing structure to store data. There exist n...
