KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers

06/22/2021
by   Chia-Hsuan Lee, et al.

The goal of database question answering is to enable natural language querying of real-life relational databases in diverse application domains. Recently, large-scale datasets such as Spider and WikiSQL have facilitated novel modeling techniques for text-to-SQL parsing, improving zero-shot generalization to unseen databases. In this work, we examine the challenges that still prevent these techniques from practical deployment. First, we present KaggleDBQA, a new cross-domain evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. Second, we re-examine the choice of evaluation tasks for text-to-SQL parsers as applied in real-life settings. Finally, we augment our in-domain evaluation task with database documentation, a naturally occurring source of implicit domain knowledge. We show that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers, but that a more realistic evaluation setting and creative use of associated database documentation boost their accuracy by over 13.2%, doubling their performance.


