Pyramid Self-attention Polymerization Learning for Semi-supervised Skeleton-based Action Recognition

by Binqian Xu, et al.

Most semi-supervised skeleton-based action recognition approaches learn skeleton action representations only at the joint level. They neglect the motion characteristics at the coarser-grained body level (e.g., limb, trunk), which provide rich additional semantic information, even though labeled data are limited. In this work, we propose a novel Pyramid Self-attention Polymerization Learning (PSP Learning) framework that jointly learns body-level, part-level, and joint-level action representations of joint and motion data, which contain abundant and complementary semantic information, via contrastive learning covering coarse-to-fine granularity. Specifically, to complement semantic information from coarse to fine granularity in skeleton actions, we design a new Pyramid Polymerizing Attention (PPA) mechanism that first calculates the body-level, part-level, and joint-level attention maps and then polymerizes these attention maps level by level (i.e., from body level to part level, and further to joint level). Moreover, we present a new Coarse-to-fine Contrastive Loss (CCL), comprising body-level, part-level, and joint-level contrast losses, to jointly measure the similarity between the body/part/joint-level contrasting features of joint and motion data. Finally, extensive experiments on the NTU RGB+D and North-Western UCLA datasets demonstrate the competitive performance of the proposed PSP Learning in the semi-supervised skeleton-based action recognition task. The source code of PSP Learning is publicly available at
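The idea of a contrastive loss summed over three granularities can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract only states that CCL measures the similarity between body/part/joint-level features of joint and motion data, so the InfoNCE form, the cosine similarity, the temperature value, the equal weighting of the three levels, and all function names below are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(joint_feats, motion_feats, temperature=0.1):
    """Assumed InfoNCE loss: sample i's joint feature should match its own
    motion feature and differ from the motion features of other samples."""
    n = len(joint_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(joint_feats[i], motion_feats[k]) / temperature
                  for k in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / n


def coarse_to_fine_loss(joint_views, motion_views, temperature=0.1):
    """Sum the per-level losses with equal weights (an assumption).
    Each argument maps 'body', 'part', 'joint' to a list of feature vectors."""
    levels = ("body", "part", "joint")
    per_level = {lv: info_nce(joint_views[lv], motion_views[lv], temperature)
                 for lv in levels}
    return sum(per_level.values()), per_level


# Toy features for a batch of two samples at each level (hypothetical data).
joint_views = {lv: [[1.0, 0.0], [0.0, 1.0]] for lv in ("body", "part", "joint")}
motion_views = {lv: [[0.9, 0.1], [0.1, 0.9]] for lv in ("body", "part", "joint")}
total, per_level = coarse_to_fine_loss(joint_views, motion_views)
```

In this sketch, features that agree across the joint and motion streams at every level yield a small total loss, while mismatched pairs at any single level raise it.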




Spatiotemporal Decouple-and-Squeeze Contrastive Learning for Semi-Supervised Skeleton-based Action Recognition

Contrastive learning has been successfully leveraged to learn action rep...

Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition

In this paper, we focus on unsupervised representation learning for skel...

Contrastive Learning from Spatio-Temporal Mixed Skeleton Sequences for Self-Supervised Skeleton-Based Action Recognition

Self-supervised skeleton-based action recognition with contrastive learn...

Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition

Considering the instance-level discriminative ability, contrastive learn...

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

Human parsing, or human body part semantic segmentation, has been an act...

Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition

The self-supervised pretraining paradigm has achieved great success in s...

Human Action Recognition with Deep Temporal Pyramids

Deep convolutional neural networks (CNNs) are nowadays achieving signifi...
