DoRO: Disambiguation of referred object for embodied agents

07/28/2022
by Pradip Pramanick, et al.

Robotic task instructions often involve a referred object that the robot must locate (ground) within the environment. While task-intent understanding is an essential part of natural language understanding, less effort has been made to resolve ambiguity that may arise while grounding the task. Existing works use vision-based task grounding and ambiguity detection, which is suitable for a fixed view and a static robot. However, the problem magnifies for a mobile robot, where the ideal view is not known beforehand. Moreover, a single view may not be sufficient to locate all the object instances in the given area, which leads to inaccurate ambiguity detection. Human intervention is helpful only if the robot can convey the kind of ambiguity it is facing. In this article, we present DoRO (Disambiguation of Referred Object), a system that helps an embodied agent disambiguate the referred object by raising a suitable query whenever required. Given an area where the intended object is located, DoRO finds all instances of the object by aggregating observations from multiple views while scanning the area. It then raises a suitable query using the information from the grounded object instances. Experiments conducted with the AI2Thor simulator show that DoRO not only detects ambiguity more accurately but also raises verbose queries with more accurate information from the visual-language grounding.
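To make the idea of multi-view aggregation and ambiguity-driven querying concrete, the following is a minimal, illustrative Python sketch, not the authors' implementation. It assumes that per-view detections already come with estimated world-frame positions, merges same-class detections that fall within a hypothetical merge radius into object instances, and raises a clarification query only when grounding fails or is ambiguous. All class names, function names, and thresholds below are assumptions made for illustration.

```python
# Illustrative sketch only: aggregate per-view detections into world-frame
# instances, then decide whether a clarification query is needed.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    label: str                   # object class predicted in one view
    position: Tuple[float, float, float]  # estimated (x, y, z) in the world frame


@dataclass
class Instance:
    label: str
    positions: List[Tuple[float, float, float]] = field(default_factory=list)

    def centroid(self) -> Tuple[float, float, float]:
        n = len(self.positions)
        return tuple(sum(p[i] for p in self.positions) / n for i in range(3))


def aggregate(detections: List[Detection], merge_radius: float = 0.5) -> List[Instance]:
    """Merge same-class detections within merge_radius of an existing
    instance's centroid; otherwise start a new instance."""
    instances: List[Instance] = []
    for det in detections:
        for inst in instances:
            if inst.label == det.label:
                cx, cy, cz = inst.centroid()
                dx = det.position[0] - cx
                dy = det.position[1] - cy
                dz = det.position[2] - cz
                if (dx * dx + dy * dy + dz * dz) ** 0.5 <= merge_radius:
                    inst.positions.append(det.position)
                    break
        else:
            instances.append(Instance(det.label, [det.position]))
    return instances


def clarification_query(instances: List[Instance], referred: str) -> str:
    """Return a query string only when grounding is ambiguous or fails."""
    matches = [i for i in instances if i.label == referred]
    if not matches:
        return f"I could not find any {referred}. Could you describe where it is?"
    if len(matches) == 1:
        return ""  # unambiguous: no query needed
    return f"I found {len(matches)} instances of {referred}. Which one do you mean?"


if __name__ == "__main__":
    views = [Detection("cup", (1.0, 0.20, 0.9)),
             Detection("cup", (1.1, 0.25, 0.9)),   # same cup seen from another view
             Detection("cup", (3.0, 0.10, 0.9))]   # a second, distinct cup
    print(clarification_query(aggregate(views), "cup"))
```

In this sketch the first two detections merge into one instance and the third becomes a second instance, so the agent would ask which of the two cups is meant; with a single instance it would act without asking.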


research
04/30/2019

Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations

Human-robot interaction often occurs in the form of instructions given f...
research
05/28/2018

Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration

In this paper, we propose the Interactive Text2Pickup (IT2P) network for...
research
10/01/2021

TEACh: Task-driven Embodied Agents that Chat

Robots operating in human spaces must be able to engage in natural langu...
research
07/26/2021

Language Grounding with 3D Objects

Seemingly simple natural language requests to a robot are generally unde...
research
10/17/2017

Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions

Comprehension of spoken natural language is an essential component for r...
research
07/12/2016

Boundary conditions for Shape from Shading

Shape From Shading is a field of computer vision. It studies the...
research
07/12/2023

OG: Equip vision occupancy with instance segmentation and visual grounding

Occupancy prediction tasks focus on the inference of both geometry and s...
