Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

by Yiyun Zhao, et al.
The University of Arizona

Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examine existing synthesized datasets and find that state-of-the-art text-to-SQL algorithms do not improve further on popular benchmarks when trained with augmented synthetic data. We observe two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from the schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models achieve significant accuracy gains on popular benchmarks, including new state-of-the-art performance on Spider.
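
To make the idea of schema-distance-weighted column sampling concrete, the following is a minimal sketch and not the paper's implementation. It assumes that schema distance is the number of foreign-key hops between tables and that a column's sampling weight decays geometrically with that distance. The toy schema, the `decay` parameter, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import deque

# Hypothetical toy schema: table -> columns, plus foreign-key links between tables.
SCHEMA = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "singer_id", "stadium_id", "year"],
    "stadium": ["stadium_id", "capacity"],
}
FOREIGN_KEYS = [("concert", "singer"), ("concert", "stadium")]


def table_distances(start, foreign_keys):
    """Return BFS hop counts from `start` to every table reachable via foreign keys."""
    neighbors = {}
    for a, b in foreign_keys:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for nxt in neighbors.get(table, ()):
            if nxt not in dist:
                dist[nxt] = dist[table] + 1
                queue.append(nxt)
    return dist


def column_weights(anchor_table, schema, foreign_keys, decay=0.5):
    """Weight each reachable column by decay ** (hops from the anchor table).

    The geometric decay is an assumption made for this sketch.
    Tables with no foreign-key path to the anchor are excluded entirely.
    """
    dist = table_distances(anchor_table, foreign_keys)
    return {
        (table, col): decay ** dist[table]
        for table, cols in schema.items()
        if table in dist
        for col in cols
    }


def sample_column(anchor_table, schema, foreign_keys, rng, decay=0.5):
    """Draw one (table, column) pair, favoring columns close to the anchor table."""
    weights = column_weights(anchor_table, schema, foreign_keys, decay)
    columns = list(weights)
    return rng.choices(columns, weights=[weights[c] for c in columns], k=1)[0]
```

Under these assumptions, a query anchored on `singer` draws most of its columns from `singer` itself, fewer from `concert`, and fewer still from `stadium`, in contrast to sampling every column independently and uniformly.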



