Generating Data for Symbolic Language with Large Language Models

05/23/2023
by Jiacheng Ye, et al.

Large language models (LLMs) bring not only strong performance but also added complexity, so recent work has started to use LLMs as data generators rather than task inferencers: the generated data trains a separate, affordable task model for efficient deployment and inference. However, this approach has been applied primarily to natural language tasks and has not yet been explored for symbolic language tasks with complex structured outputs (e.g., semantic parsing and code generation). In this paper, we propose SymGen, which uses LLMs to generate various annotation-expensive symbolic language data. SymGen consists of an informative prompt to steer generation and an agreement-based verifier to improve data correctness. We conduct extensive experiments on six symbolic language tasks across various settings. We demonstrate that a task model 1% the size of the LLM can achieve comparable or better performance, largely cutting inference and deployment costs. We also show that generated data with only a few human demonstrations can be as effective as over 10 times the amount of human-annotated data when training the task model, saving a considerable amount of annotation effort. SymGen sheds new light on data generation for complex tasks, and we release the code at https://github.com/HKUNLP/SymGen.
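
The abstract names an agreement-based verifier but does not say how agreement is measured. The sketch below illustrates one plausible reading, offered as an assumption rather than as SymGen's actual implementation: each sampled candidate is executed, candidates that fail are dropped, and a candidate whose execution result matches the most common result is kept. The names `select_by_agreement` and `toy_execute` are hypothetical, and the toy executor simply evaluates arithmetic strings.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def select_by_agreement(
    candidates: Sequence[str],
    execute: Callable[[str], Hashable],
) -> Optional[str]:
    """Return the first candidate whose execution result is the most common one.

    Assumption: "agreement" means majority vote over execution results.
    Candidates that raise during execution are discarded. Returns None if
    every candidate fails.
    """
    results = []  # (candidate, result) pairs for candidates that ran
    for cand in candidates:
        try:
            results.append((cand, execute(cand)))
        except Exception:
            continue  # a failing candidate casts no vote
    if not results:
        return None
    votes = Counter(result for _, result in results)
    winner, _ = votes.most_common(1)[0]
    return next(cand for cand, result in results if result == winner)


def toy_execute(expr: str) -> int:
    # Hypothetical stand-in executor: evaluates arithmetic with builtins disabled.
    return eval(expr, {"__builtins__": {}}, {})


# Four sampled "programs": two agree on a result, one disagrees, one is malformed.
samples = ["1 + 2", "2 + 1", "1 - 2", "1 +"]
best = select_by_agreement(samples, toy_execute)
print(best)
```

In a real setting the executor would run a SQL query against a database or a generated program against test inputs; the voting logic would stay the same under this assumption.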

