Efficient U-Transformer with Boundary-Aware Loss for Action Segmentation

05/26/2022
by Dazhao Du, et al.

Action classification has made great progress, but segmenting and recognizing actions in long untrimmed videos remains a challenging problem. Most state-of-the-art methods focus on designing temporal convolution-based models, but temporal convolutions are inflexible and struggle to model long-term temporal dependencies, which limits the potential of these models. Recently, Transformer-based models with flexible and strong sequence modeling ability have been applied to various tasks. However, their lack of inductive bias and their inefficiency in handling long video sequences limit the application of Transformers to action segmentation. In this paper, we design a pure Transformer-based model without temporal convolutions by incorporating the U-Net architecture. The U-Transformer architecture reduces complexity while introducing an inductive bias that adjacent frames are more likely to belong to the same class, but the introduction of coarse resolutions causes frames near action boundaries to be misclassified. We observe that the similarity distribution between a boundary frame and its neighboring frames depends on whether the boundary frame is the start or the end of an action segment. Therefore, we further propose a boundary-aware loss, based on the distribution of similarity scores between frames from the attention modules, to enhance the model's ability to recognize boundaries. Extensive experiments show the effectiveness of our model.
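
The observation about boundary frames can be illustrated with a toy sketch. This is not the paper's model or loss: the one-hot frame features, the window radius, and the helper names `local_similarity` and `past_future_mass` are all assumptions made for illustration. It only shows that, under a softmax over dot-product similarities in a local window, a start frame puts more weight on the frames after it, while an end frame puts more weight on the frames before it.

```python
import math

def local_similarity(features, t, radius):
    """Softmax-normalised dot-product similarity between frame t and
    every frame in the window [t - radius, t + radius] (hypothetical helper)."""
    lo, hi = max(0, t - radius), min(len(features), t + radius + 1)
    scores = [sum(a * b for a, b in zip(features[t], features[j]))
              for j in range(lo, hi)]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return list(range(lo, hi)), [e / z for e in exps]

def past_future_mass(features, t, radius=2):
    """Total similarity weight on frames before t and on frames after t."""
    idx, weights = local_similarity(features, t, radius)
    past = sum(w for j, w in zip(idx, weights) if j < t)
    future = sum(w for j, w in zip(idx, weights) if j > t)
    return past, future

# Toy video: three segments of five frames each, with idealised one-hot
# features per class (an assumption; real features come from the network).
labels = [0] * 5 + [1] * 5 + [2] * 5
features = [[1.0 if c == y else 0.0 for c in range(3)] for y in labels]

start_past, start_future = past_future_mass(features, 5)  # first frame of segment 1
end_past, end_future = past_future_mass(features, 9)      # last frame of segment 1

print("start frame: past=%.3f future=%.3f" % (start_past, start_future))
print("end frame:   past=%.3f future=%.3f" % (end_past, end_future))
```

In this idealised setting the two distributions are mirror images of each other, which is the kind of asymmetry a loss defined on attention similarity scores could exploit. How the paper actually defines its boundary-aware loss is not specified in the abstract.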

research
07/14/2020

Alleviating Over-segmentation Errors by Detecting Action Boundaries

We propose an effective framework for the temporal action segmentation t...
research
03/05/2019

MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation

Temporally locating and classifying action segments in long untrimmed vi...
research
10/16/2021

ASFormer: Transformer for Action Segmentation

Algorithms for the action segmentation task typically use temporal model...
research
06/03/2021

Anticipative Video Transformer

We propose Anticipative Video Transformer (AVT), an end-to-end attention...
research
09/11/2023

Temporal Action Localization with Enhanced Instant Discriminability

Temporal action detection (TAD) aims to detect all action boundaries and...
research
03/02/2022

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars

Online action detection has attracted increasing research interests in r...
research
09/07/2023

The Making and Breaking of Camouflage

Not all camouflages are equally effective, as even a partially visible c...
