Topic Detection and Tracking with Time-Aware Document Embeddings

12/12/2021
by   Hang Jiang, et al.
The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into account, but does not adequately capture how time interacts with the semantic nature of the event. For example, stories about a tropical storm are likely to be written within a short time interval, while stories about a movie release may appear over weeks or months. In our work, we design a neural method that fuses temporal and textual information into a single representation of news documents for event detection. We fine-tune these time-aware document embeddings with a triplet loss architecture, integrate the model into downstream TDT systems, and evaluate the systems on two benchmark TDT data sets in English. In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set. In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can improve overall performance. We conduct ablation studies on the time representation and the fusion algorithm, showing that our proposed model outperforms alternative strategies. Finally, we probe the model to examine how it handles recurring events more effectively than previous TDT systems.
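
To make the idea of a time-aware embedding trained with a triplet loss more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the time representation or the fusion step, so the sinusoidal day encoding, the element-wise addition, and all names (`time_encoding`, `fuse`, `triplet_loss`) and toy vectors below are illustrative assumptions.

```python
import math

def time_encoding(day, dim):
    """Sinusoidal encoding of an integer day index (assumed representation)."""
    enc = []
    for i in range(dim):
        freq = 1.0 / (10000 ** (2 * (i // 2) / dim))
        angle = day * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def fuse(text_vec, day):
    """Fuse text and time by element-wise addition (assumed fusion strategy)."""
    t = time_encoding(day, len(text_vec))
    return [x + y for x, y in zip(text_vec, t)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull the positive closer than the negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy text vectors (hypothetical stand-ins for a document encoder's output).
text_a = [0.2, 0.1, 0.4, 0.3]    # anchor story
text_p = [0.25, 0.1, 0.35, 0.3]  # same event, published one day later
text_n = [0.2, 0.1, 0.4, 0.3]    # identical text, but published much later

anchor = fuse(text_a, day=10)
positive = fuse(text_p, day=11)
negative = fuse(text_n, day=200)

loss = triplet_loss(anchor, positive, negative)
print(f"triplet loss: {loss:.4f}")
```

In this toy setup the negative has the same text as the anchor, so only the time component separates them. That is the situation a recurring event creates, and it is why a text-only embedding could not distinguish the two stories.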

research
01/26/2021

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

We propose a method for online news stream clustering that is a variant ...
research
10/12/2021

Topic-time Heatmaps for Human-in-the-loop Topic Detection and Tracking

The essential task of Topic Detection and Tracking (TDT) is to organize ...
research
04/27/2022

TimeBERT: Enhancing Pre-Trained Language Representations with Temporal Information

Time is an important aspect of text documents, which has been widely exp...
research
03/07/2021

RevDet: Robust and Memory Efficient Event Detection and Tracking in Large News Feeds

With the ever-growing volume of online news feeds, event-based organizat...
research
11/01/2022

Semantic Pivoting Model for Effective Event Detection

Event Detection, which aims to identify and classify mentions of event i...
research
05/27/2021

Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

Automated event extraction in social science applications often requires...
research
03/30/2020

Detection of FLOSS version release events from Stack Overflow message data

Topic Detection and Tracking (TDT) is a very active research question wi...
