PAT: Position-Aware Transformer for Dense Multi-Label Action Detection

08/09/2023
by Faegheh Sardari, et al.

We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism in transformers loses temporal positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to recent transformer-based approaches that use a hierarchical structure. We argue that combining the self-attention mechanism with the multiple sub-sampling processes of hierarchical approaches increases the loss of positional information. We evaluate our approach on two challenging dense multi-label benchmark datasets and show that PAT improves the current state-of-the-art result by 1.1 [...] MultiTHUMOS datasets, respectively, thereby achieving the new state-of-the-art mAP at 26.5 [...] studies to examine the impact of the different components of our proposed network.
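
As a rough illustration of point (i), the sketch below shows why plain self-attention is blind to temporal order and how a relative positional term changes that. It is not the PAT architecture: it assumes a single head, identity query/key/value projections, and a scalar bias per relative offset that is added to the attention logits. The names `self_attention`, `rel_bias`, and `close` are hypothetical and chosen only for this example.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of attention logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, rel_bias=None):
    """Single-head self-attention with identity projections (an assumption).

    x: list of T feature vectors, each a list of d floats.
    rel_bias: optional dict mapping a relative offset (j - i) to a scalar
              that is added to the logit between positions i and j.
    """
    t, d = len(x), len(x[0])
    out = []
    for i in range(t):
        logits = []
        for j in range(t):
            score = sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            if rel_bias is not None:
                score += rel_bias.get(j - i, 0.0)
            logits.append(score)
        weights = softmax(logits)
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out


def close(a, b, tol=1e-9):
    # Element-wise comparison of two lists of vectors.
    return all(abs(p - q) <= tol for u, v in zip(a, b) for p, q in zip(u, v))


# Three toy "frames" and a reordering of them.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]
x_perm = [x[p] for p in perm]

# Without any positional term, reordering the frames only reorders the
# outputs: each frame gets the same result wherever it sits in time.
plain = self_attention(x)
plain_perm = self_attention(x_perm)
order_blind = close([plain[p] for p in perm], plain_perm)

# With a relative bias (illustrative values), a frame's output depends on
# how far away the other frames are, so the reordering changes the result.
bias = {-2: -1.0, -1: 0.5, 0: 1.0, 1: 0.5, 2: -1.0}
biased = self_attention(x, bias)
biased_perm = self_attention(x_perm, bias)
order_aware = not close([biased[p] for p in perm], biased_perm)
```

Here `order_blind` checks that the unbiased outputs are just a reordering of each other, while `order_aware` checks that the biased outputs are not. A scalar bias per offset is only one of several ways to inject relative position; the abstract does not say which formulation PAT uses.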


