Tri-Transformer Hawkes Process: Three Heads are better than one

by Zhi-yan Song et al.

Abstract. Most real-world data we encounter are asynchronous event sequences, so the last decades have seen various point processes applied to social networks, electronic medical records, and financial transactions. Initially, the Hawkes process and its variants, which can simultaneously model the self-triggering and mutual-triggering patterns between different events in complex sequences in a clear and quantitative way, were the most popular. Later, with advances in neural networks, neural Hawkes processes were proposed one after another and gradually became a research hotspot. The transformer Hawkes process (THP) brought a large performance improvement and set off a new wave of transformer-based neural Hawkes processes. However, THP does not make full use of the information about event occurrence times and event types in asynchronous event sequences: it simply adds the event-type encoding and the temporal positional encoding to the source encoding. Moreover, a learner built from a single transformer suffers an inescapable learning bias. To mitigate these problems, we propose a tri-transformer Hawkes process (Tri-THP) model, in which event and time information are added to the dot-product attention as auxiliary information to form a new multi-head attention. The effectiveness of Tri-THP is demonstrated by a series of well-designed experiments on both real-world and synthetic data.
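To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of a scaled dot-product attention in which event-type and time embeddings enter the attention scores as auxiliary additive terms rather than being merged into the input encoding. All names (`aux_attention`, the projection matrices `We`, `Wt`) and the exact way the auxiliary terms are combined are assumptions made for illustration only; a causal mask ensures each event attends only to its history, as required for point-process likelihoods.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aux_attention(X, E, T, d_k, rng):
    """Single-head attention with event/time info as auxiliary score terms.

    X: (n, d) hidden states of the event sequence
    E: (n, d) event-type embeddings (assumed precomputed)
    T: (n, d) temporal encodings of occurrence times (assumed precomputed)
    """
    n, d = X.shape
    # Randomly initialized projections; in a real model these are learned.
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) for _ in range(3))
    We, Wt = (rng.standard_normal((d, d_k)) for _ in range(2))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Auxiliary terms: event and time projections contribute to the raw
    # scores instead of being summed into X beforehand (hypothetical form).
    scores = (Q @ K.T + (E @ We) @ K.T + (T @ Wt) @ K.T) / np.sqrt(d_k)
    # Causal mask: event i may attend only to events j <= i.
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[mask] = -np.inf
    A = softmax(scores, axis=-1)
    return A @ V, A

rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 16
out, A = aux_attention(rng.standard_normal((n, d)),
                       rng.standard_normal((n, d)),
                       rng.standard_normal((n, d)), d_k, rng)
```

In the Tri-THP setting one would run three such transformers (one per information stream) and combine their outputs; the sketch above only illustrates how auxiliary information can be injected directly into the dot-product scores.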




