ArtEmis: Affective Language for Visual Art

by Panos Achlioptas, et al.

We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 439K emotion attributions and explanations from humans, on 81K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at




Taking an Emotional Look at Video Paragraph Captioning

Translating visual data into natural language is essential for machines ...

Affection: Learning Affective Explanations for Real-World Visual Data

In this work, we explore the emotional reactions that real-world images ...

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

We introduce Affective Visual Dialog, an emotion explanation and reasoni...

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

Datasets that capture the connection between vision, language, and affec...

HL Dataset: Grounding High-Level Linguistic Concepts in Vision

Current captioning datasets focus on object-centric captions, describin...

EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes

Visual Emotion Analysis (VEA) aims at predicting people's emotional resp...

Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content?

Large datasets underlying much of current machine learning raise serious...
