Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

by Shuaichen Chang, et al.
The Ohio State University

Neural text-to-SQL models have achieved remarkable performance in translating natural language questions into SQL queries. However, recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations. Previous curated robustness test sets usually focus on individual phenomena. In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose model robustness. We design 17 perturbations on databases, natural language questions, and SQL queries to measure robustness from different angles. To collect more diverse natural question perturbations, we use large pretrained language models (PLMs) to simulate human behavior in creating natural questions. We conduct a diagnostic study of state-of-the-art models on the robustness set. Experimental results reveal that even the most robust model suffers from a 14.0% performance drop overall and a 50.7% performance drop on the most challenging perturbation. We also present a breakdown analysis of text-to-SQL model designs and provide insights for improving model robustness.
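One way to quantify robustness of the kind the abstract describes is to compare a model's accuracy on original examples with its accuracy on their perturbed counterparts. The sketch below is illustrative only: the function names, the boolean per-example correctness lists, and the choice to report the drop in absolute percentage points are all assumptions, not the benchmark's actual evaluation code.

```python
from typing import Dict, Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Return accuracy as a percentage over per-example correctness flags."""
    if not correct:
        raise ValueError("need at least one example")
    return 100.0 * sum(correct) / len(correct)


def robustness_report(pre: Sequence[bool], post: Sequence[bool]) -> Dict[str, float]:
    """Compare accuracy before and after a perturbation (hypothetical helper).

    pre[i] and post[i] are assumed to refer to the same underlying example,
    evaluated on its original and perturbed form respectively.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired example by example")
    pre_acc = accuracy(pre)
    post_acc = accuracy(post)
    # Drop is reported in absolute percentage points (an assumption).
    return {"pre_acc": pre_acc, "post_acc": post_acc, "abs_drop": pre_acc - post_acc}


# Toy data: 3 of 4 correct before the perturbation, 2 of 4 correct after.
pre = [True, True, True, False]
post = [True, False, True, False]
report = robustness_report(pre, post)
print(report)
```

Pairing each perturbed example with its original keeps the comparison controlled, so any drop can be attributed to the perturbation rather than to a difference in the underlying examples.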


UNITE: A Unified Benchmark for Text-to-SQL Evaluation

A practical text-to-SQL system should generalize well on a wide variety ...

Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation

The robustness of Text-to-SQL parsers against adversarial perturbations ...

On the Structural Generalization in Text-to-SQL

Exploring the generalization of a text-to-SQL parser is essential for a ...

Photon: A Robust Cross-Domain Text-to-SQL System

Natural language interfaces to databases (NLIDB) democratize end user ac...

Towards Robustness of Text-to-SQL Models against Synonym Substitution

Recently, there has been significant progress in studying neural network...

Evaluating the Text-to-SQL Capabilities of Large Language Models

We perform an empirical evaluation of Text-to-SQL capabilities of the Co...

Improving Text-to-SQL Evaluation Methodology

To be informative, an evaluation must measure how well systems generaliz...
