InfeRE: Step-by-Step Regex Generation via Chain of Inference

by Shuai Zhang et al.
Shanghai Jiao Tong University

Automatically generating regular expressions (abbrev. regexes) from natural language descriptions (NL2RE) is an emerging research area. Prior studies treat a regex as a linear sequence of tokens and generate the final expression autoregressively in a single pass, without taking into account the step-by-step internal text-matching process behind the final result. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-by-step inference. To enhance robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and with TRANX, a popular tree-based generation approach. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3 accuracy on two datasets, respectively. In particular, InfeRE outperforms the popular tree-based generation approach by 18.1 respectively, in terms of DFA@5 accuracy.
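
The two ideas in the abstract, building a regex through a chain of inference steps and then applying self-consistency voting over several sampled outputs, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the step operators (`atom`, `concat`, `or`, `star`, `contains`) and the helpers `execute_chain` and `self_consistency` are hypothetical and do not reproduce the paper's chain syntax or code. Votes are tallied by exact regex string, which is a simplification. A fuller system might instead group semantically equivalent regexes (for example, by DFA equivalence) before counting.

```python
import re
from collections import Counter

# Hypothetical chain format (not the paper's syntax): each step is (op, args),
# where args are literal regex pieces (for "atom") or indices of earlier steps.
def execute_chain(chain):
    """Compose a final regex by executing inference steps in order."""
    results = []
    for op, args in chain:
        if op == "atom":          # a primitive character class or literal
            results.append(args[0])
        elif op == "concat":      # earlier step a followed by earlier step b
            a, b = args
            results.append(f"({results[a]})({results[b]})")
        elif op == "or":          # either earlier step a or earlier step b
            a, b = args
            results.append(f"({results[a]})|({results[b]})")
        elif op == "star":        # zero or more repetitions of step a
            (a,) = args
            results.append(f"({results[a]})*")
        elif op == "contains":    # step a occurs anywhere in the line
            (a,) = args
            results.append(f".*({results[a]}).*")
        else:
            raise ValueError(f"unknown op: {op}")
    return results[-1]

def self_consistency(chains):
    """Majority vote over the final regexes produced by sampled chains."""
    votes = Counter()
    for chain in chains:
        try:
            regex = execute_chain(chain)
            re.compile(regex)  # drop chains that yield an invalid regex
        except (ValueError, IndexError, TypeError, re.error):
            continue
        votes[regex] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Usage: "lines containing a vowel followed by a digit" (hypothetical samples).
vowel_then_digit = [
    ("atom", ["[AEIOUaeiou]"]),
    ("atom", ["[0-9]"]),
    ("concat", [0, 1]),
    ("contains", [2]),
]
digit_only = [("atom", ["[0-9]"]), ("contains", [0])]
broken = [("concat", [0, 1])]  # refers to steps that do not exist

samples = [vowel_then_digit, vowel_then_digit, digit_only, broken]
best = self_consistency(samples)
print(best)
```

Treating each sampled chain as one vote means the samples can come from a single model or from several models; the sketch simply receives them as a flat list. Chains that fail to execute or compile are discarded before voting rather than counted.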

