COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

09/03/2023
by Julien Denize, et al.

We present COMEDIAN, a novel pipeline to initialize spatio-temporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, the first two of which are initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Second, we initialize a temporal transformer, which enriches the spatial transformer's outputs with global context, through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune both transformers on the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
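To make the second initialization stage more concrete, the following is a minimal sketch of a distillation objective that pulls each temporal-transformer output toward the feature-bank vector aligned with the same short video segment. It assumes a plain cosine-distance loss, which is not necessarily the objective used in the paper, and the names `distillation_loss`, `student_tokens`, and `teacher_bank` are hypothetical. It also assumes all vectors are non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distillation_loss(student_tokens, teacher_bank):
    """Mean cosine distance between student outputs and teacher features.

    student_tokens[i] stands in for the temporal transformer's output for
    segment i; teacher_bank[i] is the pre-computed feature aligned with that
    same segment. Both names are assumptions made for illustration.
    """
    if len(student_tokens) != len(teacher_bank):
        raise ValueError("each segment needs exactly one aligned teacher feature")
    distances = [
        1.0 - cosine_similarity(student, teacher)
        for student, teacher in zip(student_tokens, teacher_bank)
    ]
    return sum(distances) / len(distances)


# Toy data: three segments with 2-dimensional features.
teacher_bank = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned_student = [[2.0, 0.0], [0.0, 3.0], [0.5, 0.5]]      # same directions
misaligned_student = [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]]  # orthogonal

loss_aligned = distillation_loss(aligned_student, teacher_bank)
loss_misaligned = distillation_loss(misaligned_student, teacher_bank)
```

In this toy setup the loss is close to zero when the student outputs point in the same direction as the teacher features, and larger when they do not. In a real pipeline the same quantity would be computed on batched tensors and minimized by gradient descent.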
