Grounding Language Attributes to Objects using Bayesian Eigenobjects

05/30/2019
by   Vanya Cohen, et al.

We develop a system that disambiguates objects based on simple physical descriptions. Given a natural language phrase and a depth image containing a segmented object, the system predicts how closely the observed object matches the described one. It is designed to learn from only a small amount of human-labeled language data and to generalize to viewpoints not represented in the language-annotated depth-image training set. By decoupling 3D shape representation from language representation, our method grounds language to novel objects using a small amount of language-annotated depth data together with a larger corpus of unlabeled 3D object meshes, even when those objects are only partially observed from unusual viewpoints. The method also enables viewpoint transfer: trained on human annotations for depth images captured only from frontal viewpoints, the system successfully predicted object attributes from rear views despite having no such images in its training set. Finally, we demonstrate the system on a Baxter robot, enabling it to pick specific objects based on human-provided natural language descriptions.
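The decoupling the abstract describes can be illustrated with a minimal sketch: the shape side learns a low-dimensional linear subspace from unlabeled shapes (eigenobjects, approximated here by plain PCA over toy voxel grids), while a small labeled set fits a linear map from language embeddings into that subspace. All names, dimensions, and data here are illustrative stand-ins, not the paper's actual pipeline or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Shape side (needs no language labels) --------------------------
# Toy stand-in for a mesh corpus: each "object" is a flattened voxel grid.
n_objects, n_voxels, k = 200, 64, 8
shapes = rng.random((n_objects, n_voxels))

# Bayesian Eigenobjects rest on a learned low-dimensional linear
# subspace over shapes; ordinary PCA stands in for that here.
mean_shape = shapes.mean(axis=0)
_, _, vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
basis = vt[:k]                                # (k, n_voxels) shape basis

def embed_shape(voxels):
    """Project a (possibly partially observed) shape into the subspace."""
    return basis @ (voxels - mean_shape)

# --- Language side (small labeled set) ------------------------------
# Toy phrase embeddings; a real system would use a trained text encoder.
d_lang = 16
phrases = rng.random((n_objects, d_lang))

# Fit a ridge-regression map from language space to shape-code space
# using only the small language-annotated subset.
shape_codes = (shapes - mean_shape) @ basis.T            # (n, k)
lam = 1e-3
W = np.linalg.solve(phrases.T @ phrases + lam * np.eye(d_lang),
                    phrases.T @ shape_codes)             # (d_lang, k)

def similarity(phrase_vec, voxels):
    """Cosine similarity between described and observed object."""
    a, b = phrase_vec @ W, embed_shape(voxels)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

score = similarity(phrases[0], shapes[0])
```

Because the subspace is trained on the large unlabeled mesh corpus, only the small linear map `W` depends on language annotations, which is what lets a method of this shape generalize to unseen viewpoints and novel objects.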
