Visually Grounded Concept Composition

09/29/2021
by   Bowen Zhang, et al.

We investigate how to compose complex concepts in text from primitive ones while grounding them in images. We propose the Concept and Relation Graph (CRG), which builds on constituency analysis and consists of recursively combined concepts with predicate functions. We also propose a concept composition neural network, Composer, that leverages the CRG for visually grounded concept learning. Specifically, we learn the grounding of both primitive and all composed concepts by aligning them to images, and we show that learning to compose leads to more robust grounding, as measured by text-to-image matching accuracy. Notably, our model can represent grounded concepts at both the finer-grained sentence level and the coarser-grained intermediate level (or word level). Composer yields pronounced improvements in matching accuracy when the evaluation data has significant compound divergence from the training data.
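
The idea of recursively combining concepts and scoring each one against an image can be illustrated with a toy sketch. This is not the authors' Composer or their CRG format: the node classes, the averaging rule in `embed_all`, the 2-d vectors in `WORD_VECS`, and the function `score_concepts` are all hypothetical, chosen only to show primitive and composed concepts being embedded and matched to an image vector by cosine similarity.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Vector = List[float]


@dataclass
class Primitive:
    """A primitive concept, here simply a word."""
    word: str


@dataclass
class Composed:
    """A concept built by applying a predicate to child concepts."""
    predicate: str
    args: List["Node"]


Node = Union[Primitive, Composed]

# Hypothetical toy embeddings; a real model would learn these.
WORD_VECS: Dict[str, Vector] = {
    "dog": [1.0, 0.0],
    "ball": [0.0, 1.0],
    "chase": [0.5, 0.5],
}


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def label(node: Node) -> str:
    if isinstance(node, Primitive):
        return node.word
    return f"{node.predicate}({', '.join(label(a) for a in node.args)})"


def embed_all(node: Node, out: List[Tuple[str, Vector]]) -> Vector:
    """Embed a node bottom-up, recording every concept visited in `out`."""
    if isinstance(node, Primitive):
        vec = normalize(WORD_VECS[node.word])
    else:
        children = [embed_all(arg, out) for arg in node.args]
        # Assumed composition rule: average the predicate vector with the
        # child vectors. A learned predicate function would replace this.
        parts = [WORD_VECS[node.predicate]] + children
        vec = normalize([sum(col) / len(parts) for col in zip(*parts)])
    out.append((label(node), vec))
    return vec


def score_concepts(graph: Node, image_vec: Vector) -> List[Tuple[str, float]]:
    """Score every primitive and composed concept against one image vector."""
    concepts: List[Tuple[str, Vector]] = []
    embed_all(graph, concepts)
    return [(name, cosine(vec, image_vec)) for name, vec in concepts]


graph = Composed("chase", [Primitive("dog"), Primitive("ball")])
for name, score in score_concepts(graph, [1.0, 1.0]):
    print(f"{name}: {score:.3f}")
```

Scoring every node, not only the root, mirrors the abstract's point that both primitive and composed concepts are aligned to images; the averaging rule here is only a placeholder for a learned composition.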


