NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

02/15/2023
by Quchen Fu, et al.

Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, with one based on the other. Both datasets were built by scraping known data sources (platforms such as Stack Overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or the Bash Commands. This paper makes two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Because the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be customized. We also evaluate the performance of ChatGPT on this task and discuss its potential as a data generator. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.
