Multimodal Deep Models for Predicting Affective Responses Evoked by Movies

09/16/2019
by Ha Thi Phuong Thao, et al.

The goal of this study is to develop and analyze multimodal models for predicting the affective responses experienced by viewers watching movie clips. We develop hybrid multimodal prediction models based on both the video and audio of the clips. For the video content, we hypothesize that both image content and motion are crucial features for evoked emotion prediction. To capture such information, we extract features from RGB frames and optical flow using pre-trained neural networks. For the audio model, we compute an enhanced set of low-level descriptors, including intensity, loudness, cepstrum, linear predictor coefficients, pitch, and voice quality. The visual and audio features are then concatenated into audio-visual features, which are used to predict the evoked emotion. To classify the movie clips into the corresponding affective response categories, we propose two approaches based on deep neural network models. The first uses fully connected layers with no memory of the time component; the second incorporates sequential dependencies with a long short-term memory (LSTM) recurrent neural network. We perform a thorough analysis of the importance of each feature set. Our experiments reveal that, in our set-up, predicting emotions at each time step independently yields slightly better accuracy than the LSTM. Interestingly, we also observe that optical flow features are more informative than RGB features, and that, overall, models using audio features are more accurate than those based on video features for the final prediction of evoked emotions.
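To make the two prediction approaches concrete, below is a minimal PyTorch sketch of both classifier heads operating on concatenated audio-visual features. The feature dimensions, layer sizes, and the FramewiseClassifier/SequentialClassifier names are illustrative assumptions, not the paper's exact configuration.

    # Sketch of the two prediction heads described in the abstract.
    # All dimensions below are assumed for illustration only.
    import torch
    import torch.nn as nn

    VISUAL_DIM = 2048 + 2048  # pooled RGB + optical-flow CNN features (assumed)
    AUDIO_DIM = 1582          # low-level audio descriptor set (assumed)
    FUSED_DIM = VISUAL_DIM + AUDIO_DIM
    NUM_CLASSES = 7           # number of affective response categories (assumed)

    class FramewiseClassifier(nn.Module):
        """Predicts an emotion label at each time step independently
        (fully connected layers, no memory of the time component)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(FUSED_DIM, 512), nn.ReLU(), nn.Dropout(0.5),
                nn.Linear(512, NUM_CLASSES),
            )

        def forward(self, x):      # x: (batch, time, FUSED_DIM)
            return self.net(x)     # logits: (batch, time, NUM_CLASSES)

    class SequentialClassifier(nn.Module):
        """Models temporal dependencies with an LSTM before classifying."""
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(FUSED_DIM, 256, batch_first=True)
            self.head = nn.Linear(256, NUM_CLASSES)

        def forward(self, x):      # x: (batch, time, FUSED_DIM)
            h, _ = self.lstm(x)    # h: (batch, time, 256)
            return self.head(h)    # logits: (batch, time, NUM_CLASSES)

    # Audio and visual features are concatenated per time step before prediction.
    visual = torch.randn(4, 30, VISUAL_DIM)  # dummy video features
    audio = torch.randn(4, 30, AUDIO_DIM)    # dummy audio features
    fused = torch.cat([visual, audio], dim=-1)
    print(FramewiseClassifier()(fused).shape)   # torch.Size([4, 30, 7])
    print(SequentialClassifier()(fused).shape)  # torch.Size([4, 30, 7])

Both heads produce one prediction per time step; the difference is only whether the prediction at step t can depend on earlier steps, which is the comparison the experiments evaluate.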


