JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action, Social Group and Activity Detection

by Mahsa Ehsanpour, et al.

The availability of large-scale video action understanding datasets has facilitated advances in the interpretation of visual scenes containing people. However, learning to recognize human activities in an unconstrained real-world environment, with potentially highly imbalanced, long-tailed data distributions, remains a significant challenge, not least owing to the lack of a representative large-scale dataset. Most existing large-scale datasets are collected either from a specific or constrained environment, e.g. kitchens or rooms, or from video-sharing platforms such as YouTube. In this paper, we introduce JRDB-Act, a multi-modal dataset that extends the existing JRDB, which was captured by a social mobile manipulator and reflects a real distribution of human daily-life actions in a university campus environment. JRDB-Act has been densely annotated with atomic actions and comprises over 2.8M action labels, constituting a large-scale spatio-temporal action detection dataset. Each human bounding box is labelled with one pose-based action label and multiple (optional) interaction-based action labels. Moreover, JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene to infer their social activities (common activities in each social group).
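To make the per-box labelling scheme concrete, here is a minimal sketch of how one frame's annotations might be represented and checked: each person carries exactly one pose-based label, zero or more interaction-based labels, and an optional social group identifier. The field names (`track_id`, `box`, `pose_action`, `interaction_actions`, `group_id`), the `(x, y, w, h)` box convention, and the example label strings are hypothetical and are not the official JRDB-Act file format. The helper `group_activities` approximates a group's "common activities" as the labels shared by all of its members; that intersection rule is an illustrative assumption, not the authors' definition or method.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PersonBox:
    """One annotated person in one frame (hypothetical schema, not the official format)."""
    track_id: int
    box: tuple[float, float, float, float]  # assumed (x, y, w, h) in pixels
    pose_action: str  # exactly one pose-based label
    interaction_actions: list[str] = field(default_factory=list)  # zero or more labels
    group_id: int | None = None  # social group identifier; None if unassigned


def validate(person: PersonBox) -> None:
    # Each box must carry exactly one non-empty pose-based label.
    if not person.pose_action:
        raise ValueError(f"track {person.track_id}: missing pose-based label")
    # Interaction labels are optional, but the same label should not repeat.
    if len(set(person.interaction_actions)) != len(person.interaction_actions):
        raise ValueError(f"track {person.track_id}: duplicate interaction labels")


def group_activities(people: list[PersonBox]) -> dict[int, set[str]]:
    """Approximate each social group's activity as the labels shared by all members.

    The intersection rule is an illustrative assumption, not the paper's definition.
    """
    members: dict[int, list[PersonBox]] = defaultdict(list)
    for p in people:
        validate(p)
        if p.group_id is not None:
            members[p.group_id].append(p)
    activities: dict[int, set[str]] = {}
    for gid, group in members.items():
        # Collect every label (pose + interactions) for each member of the group.
        label_sets = [{p.pose_action, *p.interaction_actions} for p in group]
        activities[gid] = set.intersection(*label_sets)
    return activities
```

For example, two people walking and talking together in group 0 and one person sitting alone in group 1 would yield `{0: {"walking", "talking to someone"}, 1: {"sitting"}}`. A per-person intersection is only one simple choice; a learned model could instead weigh labels by frequency within the group.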


Joint learning of Social Groups, Individuals Action and Sub-group Activities in Videos

The state-of-the-art solutions for human activity understanding from a v...

VPN: Learning Video-Pose Embedding for Activities of Daily Living

In this paper, we focus on the spatio-temporal aspect of recognizing Act...

Are You Imitating Me? Unsupervised Sparse Modeling for Group Activity Analysis from a Single Video

A framework for unsupervised group activity analysis from a single video...

Computer-Aided Automated Detection of Gene-Controlled Social Actions of Drosophila

Gene expression of social actions in Drosophilae has been attracting wid...

ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly

Deep learning, based on which many modern algorithms operate, is well kn...

Learning from Synthetic Human Group Activities

The understanding of complex human interactions and group activities has...

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning

This paper addresses a new problem of understanding human gaze communica...
