Object Captioning and Retrieval with Natural Language

03/16/2018
by Anh Nguyen, et al.

We address the problem of jointly learning vision and language to understand objects in a fine-grained manner. The key idea of our approach is to use object descriptions to provide a detailed understanding of an object. Based on this idea, we propose two new architectures for two related problems: object captioning and natural language-based object retrieval. The goal of object captioning is to simultaneously detect an object and generate its associated description, while the goal of object retrieval is to localize an object given an input query. We demonstrate that both problems can be solved effectively using hybrid end-to-end CNN-LSTM networks. Experimental results on our new, challenging dataset show that our methods outperform recent methods by a fair margin, while providing a detailed understanding of the object and fast inference. The source code will be made available.
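To make the retrieval task concrete, the following is a minimal sketch of its final step only: choosing one bounding box given a query. It assumes that the query and each candidate region have already been encoded as fixed-length vectors, and it ranks regions by cosine similarity. The names `retrieve_box` and `cosine_similarity`, the box format, and the similarity-based ranking are illustrative assumptions; this is not the CNN-LSTM architecture proposed in the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format (an assumption): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_box(
    query_vec: Sequence[float],
    candidates: List[Tuple[Box, Sequence[float]]],
) -> Box:
    """Return the candidate box whose region vector best matches the query vector.

    Both the query vector and the region vectors are assumed to come from
    upstream encoders that are outside the scope of this sketch.
    """
    if not candidates:
        raise ValueError("at least one candidate box is required")
    best_box, _ = max(
        candidates, key=lambda item: cosine_similarity(query_vec, item[1])
    )
    return best_box


# Toy usage with made-up 2-D vectors standing in for learned embeddings.
candidates = [
    ((0.0, 0.0, 10.0, 10.0), [1.0, 0.0]),
    ((5.0, 5.0, 20.0, 20.0), [0.0, 1.0]),
]
selected = retrieve_box([0.1, 0.9], candidates)
```

In this toy example the query vector points mostly along the second axis, so the second box is selected. An end-to-end model would instead learn the encoders and the matching score jointly.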
