Nightmare at test time: How punctuation prevents parsers from generalizing

08/31/2018
by   Anders Søgaard, et al.

Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however: human language processing does not rely on it to the same extent, and we therefore often leave it out in informal texts. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive both to the absence of punctuation and to alternative uses of it; (b) neural parsers tend to be more sensitive than vintage parsers; (c) neural parsers trained without punctuation outperform all out-of-the-box parsers across all scenarios where punctuation departs from standard usage. Our main experiments use synthetically corrupted data, which lets us study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data.
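
To make the idea of synthetic corruption concrete, here is a minimal Python sketch of two perturbations in the spirit of the abstract: dropping punctuation tokens, and replacing them with alternative marks. This is an illustration under stated assumptions, not the authors' procedure: the function names (`is_punct`, `strip_punctuation`, `perturb_punctuation`) and the set of alternative marks are hypothetical, and the sketch operates on plain token lists only. Applying it to a dependency treebank would additionally require re-indexing head pointers after tokens are removed.

```python
import random
import unicodedata


def is_punct(token: str) -> bool:
    """True if every character of the token is in a Unicode punctuation category."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def strip_punctuation(tokens):
    """Simulate missing punctuation: drop all punctuation-only tokens."""
    return [t for t in tokens if not is_punct(t)]


def perturb_punctuation(tokens, rng, alternatives=("!", "...", ",")):
    """Simulate non-standard punctuation: swap each punctuation token
    for a mark drawn at random from a (hypothetical) set of alternatives."""
    return [rng.choice(alternatives) if is_punct(t) else t for t in tokens]


sentence = ["Well", ",", "it", "works", "!"]
print(strip_punctuation(sentence))
print(perturb_punctuation(sentence, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the corruption reproducible, which matters when the same perturbed test set is used to compare several parsers.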

