End-to-end Temporal Action Detection with Transformer

06/18/2021
by Xiaolong Liu, et al.

Temporal action detection (TAD) aims to determine the semantic label and the boundaries of every action instance in an untrimmed video. It is a fundamental task in video understanding and significant progress has been made in TAD. Previous methods involve multiple stages or networks and hand-designed rules or operations, which fall short in efficiency and flexibility. Here, we construct an end-to-end framework for TAD upon Transformer, termed TadTR, which simultaneously predicts all action instances as a set of labels and temporal locations in parallel. TadTR is able to adaptively extract temporal context information needed for making action predictions, by selectively attending to a number of snippets in a video. It greatly simplifies the pipeline of TAD and runs much faster than previous detectors. Our method achieves state-of-the-art performance on HACS Segments and THUMOS14 and competitive performance on ActivityNet-1.3. Our code will be made available at <https://github.com/xlliu7/TadTR>.
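
The abstract says TadTR predicts all action instances as a set of labels and temporal locations. It does not say how that set is compared with the ground truth during training. Set-prediction detectors in the DETR family usually pair each ground-truth instance with exactly one prediction through a one-to-one (bipartite) matching, and that pairing is what removes hand-designed steps such as non-maximum suppression. The sketch below shows such a matching under stated assumptions, so it should not be read as TadTR's actual implementation. The cost weights `w_cls` and `w_iou`, the dictionary-based class scores, and the helper names are hypothetical. Practical implementations typically solve the assignment with the Hungarian algorithm. This sketch uses exhaustive search instead, which is only feasible for very small sets but makes the objective easy to follow.

```python
from itertools import permutations

def temporal_iou(a, b):
    """Intersection-over-union of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_cost(pred, gt, w_cls=1.0, w_iou=1.0):
    # Hypothetical cost: lower is better. It rewards a high score for the
    # ground-truth class and high temporal overlap with the ground-truth segment.
    probs, seg = pred   # pred: (dict of class -> score, (start, end))
    label, gt_seg = gt  # gt:   (class label, (start, end))
    return -w_cls * probs.get(label, 0.0) - w_iou * temporal_iou(seg, gt_seg)

def set_match(preds, gts):
    """One-to-one matching that minimizes the total cost.

    Returns (pred_idx, gt_idx) pairs ordered by gt_idx. Exhaustive search,
    so it is only suitable for tiny sets. Predictions left unmatched would
    be trained as "no action" in a DETR-style setup (assumption).
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground truths")
    best, best_cost = None, float("inf")
    # perm[g] is the prediction index assigned to ground truth g.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, g) for g, p in enumerate(best)]

# Toy example with made-up scores and segments (in seconds).
preds = [
    ({"jump": 0.1, "run": 0.8}, (10.0, 20.0)),
    ({"jump": 0.9, "run": 0.05}, (30.0, 36.0)),
    ({"jump": 0.2, "run": 0.2}, (0.0, 5.0)),
]
gts = [("jump", (31.0, 35.0)), ("run", (11.0, 19.0))]

print(set_match(preds, gts))  # [(1, 0), (0, 1)]
```

In this toy case, the "jump" ground truth is matched to prediction 1 and the "run" ground truth to prediction 0. Prediction 2 stays unmatched. Because every ground truth receives exactly one prediction, duplicate detections are penalized during training rather than filtered out afterwards.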


research
04/06/2022

An Empirical Study of End-to-End Temporal Action Detection

Temporal action detection (TAD) is an important yet challenging task in ...
research
11/08/2022

SimOn: A Simple Framework for Online Temporal Action Localization

Online Temporal Action Localization (On-TAL) aims to immediately provide...
research
08/25/2022

Adaptive Perception Transformer for Temporal Action Localization

Temporal action localization aims to predict the boundary and category o...
research
11/28/2022

Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries

We address 2D floorplan reconstruction from 3D scans. Existing approache...
research
09/18/2021

Towards High-Quality Temporal Action Detection with Sparse Proposals

Temporal Action Detection (TAD) is an essential and challenging topic in...
research
10/20/2022

PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

Traditional temporal action detection (TAD) usually handles untrimmed vi...
research
05/05/2022

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

Temporal action detection (TAD) is extensively studied in the video unde...
