Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition

by Tianyu Guo, et al.

Because of their instance-level discriminative ability, contrastive learning methods such as MoCo and SimCLR have been adapted from image representation learning to self-supervised skeleton-based action recognition. These methods usually use multiple data streams (i.e., joint, motion, and bone) for ensemble learning. However, how to construct a discriminative feature space within a single stream and how to effectively aggregate information from multiple streams remain open problems. To this end, we first apply a new contrastive learning method called BYOL to skeleton data and formulate SkeletonBYOL as a simple yet effective baseline for self-supervised skeleton-based action recognition. Inspired by SkeletonBYOL, we further present a joint Adversarial and Collaborative Learning (ACL) framework, which combines Cross-Model Adversarial Learning (CMAL) and Cross-Stream Collaborative Learning (CSCL). Specifically, CMAL learns single-stream representations with a cross-model adversarial loss to obtain more discriminative features. To aggregate multi-stream information and let the streams interact, CSCL generates similarity pseudo-labels from the ensemble of streams and uses them as supervision to guide feature generation in each individual stream. Extensive experiments on three datasets verify that CMAL and CSCL are complementary and that our method performs favorably against state-of-the-art methods under various evaluation protocols. Our code and models are publicly available.
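As a rough illustration of the BYOL-style objective that a baseline such as SkeletonBYOL builds on, the sketch below assumes the standard BYOL formulation: a normalized regression loss between the online network's prediction and the target network's projection, with the target weights updated as an exponential moving average of the online weights. The function names, the plain-list vector representation, and the default `tau` are illustrative assumptions and are not taken from the authors' code; the adversarial (CMAL) and collaborative (CSCL) losses are not covered here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def byol_loss(online_prediction, target_projection):
    """Standard BYOL regression loss: 2 - 2 * cosine similarity.

    Equals the squared Euclidean distance between the two
    L2-normalized vectors, so it ranges from 0 (aligned) to 4 (opposite).
    """
    q = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(q, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters toward the online ones (hypothetical flat lists).

    tau close to 1 means the target network changes slowly.
    """
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

In a full pipeline the two input vectors would come from two augmented views of the same skeleton sequence, and the loss is usually symmetrized by swapping the views; both steps are omitted here for brevity.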




