Self-Supervised Video Transformers for Isolated Sign Language Recognition

This paper presents an in-depth analysis of various self-supervision methods for isolated sign language recognition (ISLR). We consider four recently introduced transformer-based approaches to self-supervised learning from videos and four pre-training data regimes, and study all their combinations on the WLASL2000 dataset. Our findings reveal that MaskFeat achieves performance superior to pose-based and supervised video models, with a top-1 accuracy of 79.02%. Furthermore, we analyze the ability of these models to produce representations of ASL signs using linear probing on diverse phonological features. This study underscores the value of architecture and pre-training task choices in ISLR. Specifically, our results on WLASL2000 highlight the power of masked reconstruction pre-training, and our linear probing results demonstrate the importance of hierarchical vision transformers for sign language representation.
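
To make the masked reconstruction objective concrete, below is a minimal sketch of MaskFeat-style pre-training, in which the model regresses hand-crafted features (HOG descriptors in the original MaskFeat paper) of masked space-time patches. The encoder, dimensions, and masking interface here are illustrative placeholders, not the paper's exact configuration.

import torch
import torch.nn as nn

class MaskFeatSketch(nn.Module):
    """Masked feature prediction: regress HOG descriptors of masked patches."""
    def __init__(self, dim=768, hog_dim=108):
        super().__init__()
        # Stand-in encoder; in practice this would be a hierarchical video
        # transformer (e.g. MViT) over space-time patch embeddings.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, hog_dim)  # predicts HOG of each patch

    def forward(self, tokens, mask):
        # tokens: (B, N, D) patch embeddings; mask: (B, N) bool, True = masked.
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        return self.head(self.encoder(x))

def maskfeat_loss(pred, hog_targets, mask):
    # L2 regression, computed on masked positions only.
    return ((pred - hog_targets) ** 2).mean(dim=-1)[mask].mean()

# Dummy usage with random embeddings and targets.
model = MaskFeatSketch()
tokens = torch.randn(2, 196, 768)      # placeholder patch embeddings
mask = torch.rand(2, 196) < 0.4        # ~40% of patches masked
loss = maskfeat_loss(model(tokens, mask), torch.randn(2, 196, 108), mask)
loss.backward()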

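The linear probing analysis can be sketched similarly: the pre-trained encoder is frozen, and a single linear layer per phonological feature is trained on its pooled representations, so only the probes receive gradients. The feature names and label-set sizes below are hypothetical stand-ins, not the study's actual phonological inventory.

import torch
import torch.nn as nn

feat_dim = 768
# Hypothetical phonological features and label-set sizes, for illustration.
num_classes = {"handshape": 50, "location": 20, "movement": 12}

# One linear probe per phonological feature; the frozen encoder is elided here
# and its pooled outputs are assumed to be precomputed.
probes = nn.ModuleDict({k: nn.Linear(feat_dim, v) for k, v in num_classes.items()})
optimizer = torch.optim.AdamW(probes.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def probe_step(pooled_features, labels):
    # pooled_features: (B, feat_dim) pooled encoder outputs, already detached;
    # labels: dict mapping each feature name to (B,) class indices.
    loss = sum(criterion(probes[k](pooled_features), labels[k]) for k in probes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy usage with random features and labels.
feats = torch.randn(8, feat_dim)
labels = {k: torch.randint(0, v, (8,)) for k, v in num_classes.items()}
print(probe_step(feats, labels))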

