Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds

08/26/2023
by Yukun Su, et al.

Deep learning has proven very effective for video action recognition. Video violence recognition attempts to learn humans' multi-dynamic behaviours in more complex scenarios. In this work, we develop a method for video violence recognition from the new perspective of skeleton points. Unlike previous work, we first formulate 3D skeleton point clouds from human skeleton sequences extracted from videos and then perform interaction learning on these 3D skeleton point clouds. Specifically, we propose two Skeleton Points Interaction Learning (SPIL) strategies: (i) Local-SPIL: by constructing a specific weight distribution strategy between local regional points, Local-SPIL selectively focuses on the most relevant of these points based on their features and spatial-temporal position information. To capture diverse types of relation information, a multi-head mechanism aggregates features from independent heads so that different types of relationships between points are handled jointly. (ii) Global-SPIL: to better learn and refine the features of the unordered and unstructured skeleton points, Global-SPIL employs a self-attention layer that operates directly on the sampled points, which helps make the output more permutation-invariant and well suited to our task. Extensive experimental results validate the effectiveness of our approach and show that our model outperforms existing networks and achieves new state-of-the-art performance on video violence datasets.
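The permutation-invariance idea behind Global-SPIL can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each skeleton point is an (x, y, t) triple, uses identity query/key/value projections in place of learned weights, and adds a max-pooling step to obtain a single global feature. The function names `self_attention` and `global_feature` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(points):
    # Scaled dot-product self-attention over a set of points.
    # Assumption: identity Q/K/V projections stand in for learned weights.
    d = len(points[0])
    refined = []
    for q in points:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in points]
        weights = softmax(scores)
        refined.append([sum(w * v[j] for w, v in zip(weights, points)) for j in range(d)])
    return refined

def global_feature(points):
    # Channel-wise max-pooling over the refined points (an assumed pooling step).
    refined = self_attention(points)
    d = len(points[0])
    return [max(p[j] for p in refined) for j in range(d)]

# Hypothetical skeleton points as (x, y, t) triples.
points = [[0.1, 0.2, 0.0], [0.4, -0.3, 1.0], [-0.5, 0.9, 2.0], [0.7, 0.1, 3.0]]
shuffled = [points[2], points[0], points[3], points[1]]

f_original = global_feature(points)
f_shuffled = global_feature(shuffled)
same = all(abs(a - b) < 1e-9 for a, b in zip(f_original, f_shuffled))
print("pooled feature unchanged under reordering:", same)
```

Self-attention on its own is permutation-equivariant: reordering the input points reorders the output rows in the same way. Applying a symmetric pooling such as the channel-wise max afterwards makes the final feature independent of point order, up to floating-point rounding.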


