Pika parsing: parsing in reverse solves the left recursion and error recovery problems

05/13/2020
by   Luke A. D. Hutchison, et al.
0

A recursive descent parser is built from a set of mutually-recursive functions, where each function directly implements one of the nonterminals of a grammar, causing the structure of recursive calls to directly parallel the structure of the grammar. Recursive descent parsers can take time exponential in the length of the input and the depth of the parse tree, however a memoized recursive descent parser or packrat parser is able to parse in time linear in the length of the input and the depth of the parse tree. Recursive descent parsers are extremely simple to write, but suffer from two significant problems: (i) left-recursive grammars cause the parser to get stuck in infinite recursion, and (ii) it is difficult or impossible to optimally recover the parse state and continue parsing after a syntax error. Surprisingly, both problems can be solved by parsing the input in reverse. The pika parser is a new type of packrat parser that employs dynamic programming to parse the input from right to left, bottom-up – the reverse of the standard recursive descent order of top-down, left to right. This reversed parsing order enables pika parsers to directly handle left-recursive grammars, simplifying grammar writing, and enables pika parsers to directly and optimally recover from syntax errors, which is a crucial property for IDEs and compilers. Pika parsing maintains the linear-time performance characteristics of packrat parsing. Several new insights into precedence, associativity, and left recursion are presented.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/13/2020

Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems

A recursive descent parser is built from a set of mutually-recursive fun...
research
04/10/2023

Interval Parsing Grammars for File Format Parsing

File formats specify how data is encoded for persistent storage. They ca...
research
08/28/2019

Eliminating Left Recursion without the Epsilon

The standard algorithm to eliminate indirect left recursion takes a prev...
research
03/01/2022

Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation

Recently CKY-based models show great potential in unsupervised grammar i...
research
05/06/2019

A Semi-Automatic Approach for Syntax Error Reporting and Recovery in Parsing Expression Grammars

Error recovery is an essential feature for a parser that should be plugg...
research
11/19/2014

Type-Driven Incremental Semantic Parsing with Polymorphism

Semantic parsing has made significant progress, but most current semanti...
research
12/12/2019

Inferring Input Grammars from Dynamic Control Flow

A program is characterized by its input model, and a formal input model ...

Please sign up or login with your details

Forgot password? Click here to reset