Voxel-informed Language Grounding

05/19/2022
by Rodolfo Corona, et al.

Natural language applied to natural 2D images describes a fundamentally 3D world. We present the Voxel-informed Language Grounder (VLG), a language grounding model that leverages 3D geometric information in the form of voxel maps derived from the visual input using a volumetric reconstruction model. We show that VLG significantly improves grounding accuracy on SNARE, an object reference game task. At the time of writing, VLG holds the top place on the SNARE leaderboard, achieving SOTA results with a 2.0% absolute improvement.
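The core idea above — scoring candidate objects against a referring expression using both 2D appearance features and 3D voxel features — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the embedding dimensions, the random projection in `embed_voxels`, the concatenation-based fusion, and the cosine-similarity scoring are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_voxels(voxel_map, dim=8):
    """Flatten a voxel occupancy grid and project it to `dim` features.
    (A stand-in for a learned 3D encoder; the random projection is an
    assumption for this sketch.)"""
    flat = voxel_map.reshape(-1).astype(float)
    W = rng.standard_normal((dim, flat.size)) / np.sqrt(flat.size)
    return W @ flat

def score_object(lang_emb, img_emb, vox_emb):
    """Cosine similarity between the language embedding and the
    concatenated 2D (image) + 3D (voxel) object representation."""
    obj = np.concatenate([img_emb, vox_emb])
    denom = np.linalg.norm(obj) * np.linalg.norm(lang_emb) + 1e-8
    return float(obj @ lang_emb / denom)

def ground(lang_emb, candidates):
    """Reference game: return the index of the candidate object
    (an (img_emb, vox_emb) pair) that best matches the description."""
    scores = [score_object(lang_emb, img, vox) for img, vox in candidates]
    return int(np.argmax(scores))

# Usage: two candidate objects, each with a 2D and a 3D embedding.
img0, vox0 = rng.standard_normal(8), rng.standard_normal(8)
img1, vox1 = rng.standard_normal(8), rng.standard_normal(8)
# A language embedding perfectly aligned with candidate 1's features.
lang = np.concatenate([img1, vox1])
chosen = ground(lang, [(img0, vox0), (img1, vox1)])
```

In SNARE the candidates are two similar 3D objects and the model must pick the one a natural-language description refers to; the 3D branch is what distinguishes geometry-dependent references that 2D features alone miss.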

