Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

01/03/2020
by Goran Glavaš, et al.

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and segmentation, we introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model – a neural architecture consisting of two hierarchically connected Transformer networks – is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones. The proposed model, dubbed Coherence-Aware Text Segmentation (CATS), yields state-of-the-art segmentation performance on a collection of benchmark datasets. Furthermore, by coupling CATS with cross-lingual word embeddings, we demonstrate its effectiveness in zero-shot language transfer: it can successfully segment texts in languages unseen in training.
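
To make the multi-task idea concrete, here is a minimal sketch of how a sentence-level segmentation objective could be combined with a coherence objective that prefers a correct sentence sequence over a corrupted one. The abstract does not state the exact loss forms, so everything below is an assumption made for illustration: the binary cross-entropy for segmentation, the margin-based (hinge) form of the coherence term, the corruption procedure, and the names `corrupt_sequence`, `coherence_weight`, and `margin` are all hypothetical. The scores and probabilities stand in for outputs of the two-level Transformer, which is not modeled here.

```python
import math
import random


def corrupt_sequence(sentences, pool, num_swaps, rng):
    """Return a copy of `sentences` with `num_swaps` positions replaced
    by sentences drawn from `pool` (assumed corruption scheme)."""
    corrupted = list(sentences)
    for pos in rng.sample(range(len(corrupted)), num_swaps):
        corrupted[pos] = rng.choice(pool)
    return corrupted


def segmentation_loss(boundary_probs, gold_labels):
    """Mean binary cross-entropy over per-sentence predictions of
    whether a sentence starts a new segment (1) or not (0)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(boundary_probs, gold_labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(gold_labels)


def coherence_loss(score_correct, score_corrupt, margin=1.0):
    """Hinge loss (assumed form): the correct sequence should score
    at least `margin` higher than its corrupted counterpart."""
    return max(0.0, margin - (score_correct - score_corrupt))


def joint_loss(boundary_probs, gold_labels, score_correct, score_corrupt,
               coherence_weight=0.5, margin=1.0):
    """Weighted sum of the segmentation and coherence objectives."""
    seg = segmentation_loss(boundary_probs, gold_labels)
    coh = coherence_loss(score_correct, score_corrupt, margin)
    return seg + coherence_weight * coh


if __name__ == "__main__":
    rng = random.Random(0)
    original = ["s1", "s2", "s3", "s4"]
    print(corrupt_sequence(original, ["x", "y"], 2, rng))
    print(joint_loss([0.9, 0.1, 0.2, 0.8], [1, 0, 0, 1], 0.2, 0.5))
```

In this sketch the coherence term vanishes once the correct sequence outscores the corrupted one by the margin, so it only shapes the shared representation while the model still confuses the two.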
