An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition

by Rongfan Liao, et al.

Personality is crucial for understanding human internal and external states. Most existing personality computing approaches suffer from complex, dataset-specific pre-processing steps and model training tricks. In the absence of a standardized benchmark with consistent experimental settings, it is impossible to fairly compare the real performance of these personality computing models, and they are also difficult to reproduce. In this paper, we present the first reproducible audio-visual benchmarking framework to provide a fair and consistent evaluation of eight existing personality computing models (audio, visual, and audio-visual) and seven standard deep learning models on both self-reported and apparent personality recognition tasks. We conduct a comprehensive investigation of all benchmarked models to demonstrate their capabilities in modelling personality traits on two publicly available datasets: an audio-visual apparent personality dataset (ChaLearn First Impression) and a self-reported personality dataset (UDIVA). The experimental results show that: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, are predicted more reliably than self-reported ones; (ii) visual models frequently achieve superior performance to audio models on personality recognition; and (iii) non-verbal behaviours contribute differently to the prediction of different personality traits. We make the code publicly available at .
