Segmentation from Natural Language Expressions

03/20/2016
by   Ronghang Hu, et al.
0

In this paper we approach the novel problem of segmenting an image based on a natural language expression. This is different from traditional semantic segmentation over a predefined set of semantic classes, as e.g., the phrase "two men sitting on the right bench" requires segmenting only the two people on the right bench and no one standing or sitting on another bench. Previous approaches suitable for this task were limited to a fixed set of categories and/or rectangular regions. To produce pixelwise segmentation for the language expression, we propose an end-to-end trainable recurrent and convolutional network model that jointly learns to process visual and linguistic information. In our model, a recurrent LSTM network is used to encode the referential expression into a vector representation, and a fully convolutional network is used to a extract a spatial feature map from the image and output a spatial response map for the target object. We demonstrate on a benchmark dataset that our model can produce quality segmentation output from the natural language expression, and outperforms baseline methods by a large margin.

READ FULL TEXT

page 18

page 19

page 20

page 21

page 22

page 23

page 24

page 25

research
10/10/2019

Referring Expression Object Segmentation with Caption-Aware Consistency

Referring expressions are natural language descriptions that identify a ...
research
07/06/2018

Dynamic Multimodal Instance Segmentation guided by natural language queries

In this paper, we address the task of segmenting an object given a natur...
research
01/30/2020

Dual Convolutional LSTM Network for Referring Image Segmentation

We consider referring image segmentation. It is a problem at the interse...
research
12/17/2022

Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning

Referring Expression Segmentation (RES), which is aimed at localizing an...
research
03/23/2017

Recurrent Multimodal Interaction for Referring Image Segmentation

In this paper we are interested in the problem of image segmentation giv...
research
11/13/2015

Natural Language Object Retrieval

In this paper, we address the task of natural language object retrieval,...
research
09/17/2023

CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation

The classical human-robot interface in uncalibrated image-based visual s...

Please sign up or login with your details

Forgot password? Click here to reset