SequentialPointNet: A strong parallelized point cloud sequence network for 3D action recognition

11/16/2021
by Xing Li, et al.

Point cloud sequences of 3D human actions exhibit unordered intra-frame spatial information and ordered inter-frame temporal information. To capture the spatio-temporal structures of point cloud sequences, cross-frame spatio-temporal local neighborhoods around the centroids are usually constructed. However, constructing these spatio-temporal local neighborhoods is computationally expensive and severely limits the parallelism of models. Moreover, it is unreasonable to treat spatial and temporal information equally in spatio-temporal local learning, because human actions are complicated along the spatial dimensions and simple along the temporal dimension. In this paper, to avoid spatio-temporal local encoding, we propose a strong parallelized point cloud sequence network, referred to as SequentialPointNet, for 3D action recognition. SequentialPointNet is composed of two serial modules: an intra-frame appearance encoding module and an inter-frame motion encoding module. To model the strong spatial structures of human actions, the intra-frame appearance encoding module processes each point cloud frame in parallel and outputs one feature vector per frame, forming a feature vector sequence that characterizes static appearance changes along the temporal dimension. To model the weak temporal changes of human actions, the inter-frame motion encoding module applies temporal position encoding and a hierarchical pyramid pooling strategy to the feature vector sequence. In addition, to better explore spatio-temporal content, multi-level features of human movements are aggregated before performing end-to-end 3D action recognition. Extensive experiments conducted on three public datasets show that SequentialPointNet outperforms state-of-the-art approaches.
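The two-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-frame encoder here is a single shared linear map followed by max pooling over points (a simple stand-in), the sinusoidal form of the temporal position encoding is an assumption, and the function names, pyramid levels, and tensor sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def encode_frames(frames, w):
    """Stand-in intra-frame appearance encoder (assumed, not the paper's).

    frames: (T, N, 3) array of T frames with N points each.
    w:      (3, C) shared per-point weights.
    All frames go through one batched operation, so no frame depends on another.
    """
    point_feats = np.maximum(frames @ w, 0.0)  # (T, N, C), shared linear map + ReLU
    return point_feats.max(axis=1)             # (T, C), order-invariant max over points

def add_temporal_position(seq):
    """Add a sinusoidal position code to each frame feature (assumed form)."""
    t, c = seq.shape
    pos = np.arange(t)[:, None]
    idx = np.arange(c)[None, :]
    angle = pos / np.power(10000.0, (2 * (idx // 2)) / c)
    pe = np.where(idx % 2 == 0, np.sin(angle), np.cos(angle))
    return seq + pe

def pyramid_pool(seq, levels=(1, 2, 4)):
    """Max-pool the sequence over 1, 2, and 4 temporal segments and concatenate."""
    pooled = []
    for k in levels:
        for segment in np.array_split(seq, k, axis=0):
            pooled.append(segment.max(axis=0))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 128, 3))   # 8 frames, 128 points per frame
w = rng.normal(size=(3, 16))            # hypothetical 16-dim frame features

frame_feats = encode_frames(frames, w)                          # (8, 16)
descriptor = pyramid_pool(add_temporal_position(frame_feats))   # (112,)
print(frame_feats.shape, descriptor.shape)
```

Because each frame is reduced to a single vector before any temporal step, no cross-frame neighborhood search is needed, and the temporal stage only operates on a short sequence of vectors. The resulting descriptor would then be fed to a classifier; that step is omitted here.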


