Comparative Document Summarisation via Classification

by Umanga Bista, et al.

This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group and maximally distinguishable from the other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics spanning 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than those from discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct viewpoints.
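The abstract names maximum mean discrepancy (MMD) as the basis of its objectives but does not state them. As a rough illustration only, the sketch below computes the standard biased estimate of squared MMD between two small groups of vectors. The RBF kernel, the `gamma` value, the function names, and the toy 2-D "document embeddings" are all assumptions made for this example, not details taken from the paper.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma)
            + mean_kernel(ys, ys, gamma))

# Hypothetical toy embeddings for two document groups.
group_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
group_b = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]

same = mmd_squared(group_a, group_a)   # identical samples
diff = mmd_squared(group_a, group_b)   # well-separated samples
```

In this reading, a summary that represents its own group well would have a small MMD to that group, while a summary that is distinguishable from other groups would have a large MMD to them. How the paper combines these two terms is not specified in the abstract.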


