Language Aligned Visual Representations Predict Human Behavior in Naturalistic Learning Tasks

by Can Demircan, et al.

Humans can identify and generalize the relevant features of natural objects, an ability that serves them across many situations. To investigate this ability and determine which representations best predict human behavior, we conducted two experiments: one on category learning and one on reward learning. Both experiments used realistic images as stimuli, and participants had to make decisions about novel stimuli on every trial, so that accurate responding required generalization. In both tasks, the underlying rules were simple linear functions of stimulus dimensions extracted from human similarity judgments. Participants identified the relevant stimulus features within a few trials, demonstrating effective generalization. In an extensive model comparison, we evaluated how well representations from a diverse set of deep learning models predicted human choices trial by trial. Notably, representations from models trained on both text and image data consistently outperformed those from models trained solely on images, even surpassing the features that generated the task itself. These findings suggest that language-aligned visual representations are rich enough to describe human generalization in naturalistic settings and highlight the role of language in shaping human cognition.
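To make the task construction concrete, the sketch below illustrates the kind of linear rule described in the abstract: rewards are a weighted sum of stimulus dimensions, with some dimensions irrelevant. The embedding values, dimensionality, and weights here are illustrative stand-ins, not the paper's actual stimuli or parameters; the point is only that a noiseless linear rule over a few dimensions is recoverable from a handful of trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stimulus embedding: 8 stimuli along 3 dimensions
# (stand-ins for dimensions derived from human similarity judgments).
stimuli = rng.normal(size=(8, 3))

# A simple linear rule: reward is a weighted sum of the dimensions.
# The third dimension is irrelevant (weight 0), as in tasks where
# learners must discover which features matter.
weights = np.array([1.0, -0.5, 0.0])
rewards = stimuli @ weights

# A learner generalizes well if it can recover the rule from only a
# few observed (stimulus, reward) pairs, e.g. via least squares.
w_hat, *_ = np.linalg.lstsq(stimuli[:5], rewards[:5], rcond=None)
print(np.allclose(w_hat, weights))
```

Because the rule is linear and noiseless, five trials over three dimensions already pin down the weights exactly, which mirrors the abstract's observation that participants isolated the relevant features within a few trials.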


