Jiasen Lu

research

∙ 06/17/2022

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

We propose Unified-IO, a model that performs a large variety of AI tasks...

14 Jiasen Lu, et al. ∙

research

∙ 02/14/2022

ASC me to Do Anything: Multi-task Training for Embodied AI

Embodied AI has seen steady progress across a diverse set of independent...

3 Jiasen Lu, et al. ∙

research

∙ 11/29/2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

The visual world naturally exhibits a long-tailed distribution of open c...

9 Teli Ma, et al. ∙

research

∙ 06/02/2021

Container: Context Aggregation Network

Convolutional neural networks (CNNs) are ubiquitous in computer vision, ...

15 Peng Gao, et al. ∙

research

∙ 03/23/2021

Multi-Modal Answer Validation for Knowledge-Based VQA

The problem of knowledge-based visual question answering involves answer...

2 Jialin Wu, et al. ∙

research

∙ 09/23/2020

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

Mirroring the success of masked language models, vision-and-language cou...

4 Jaemin Cho, et al. ∙

research

∙ 07/24/2020

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Can we develop visually grounded dialog agents that can efficiently adap...

6 Michael Cogswell, et al. ∙

research

∙ 07/23/2020

Spatially Aware Multimodal Transformers for TextVQA

Textual cues are essential for everyday tasks like buying groceries and ...

11 Yash Kant, et al. ∙

research

∙ 12/05/2019

12-in-1: Multi-Task Vision and Language Representation Learning

Much of vision-and-language research focuses on a small but diverse set ...

22 Jiasen Lu, et al. ∙

research

∙ 08/06/2019

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

We present ViLBERT (short for Vision-and-Language BERT), a model for lea...

9 Jiasen Lu, et al. ∙

research

∙ 04/19/2019

Emergence of Compositional Language with Deep Generational Transmission

Consider a collaborative task that requires communication. Two agents ar...

12 Michael Cogswell, et al. ∙

research

∙ 01/10/2019

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

The Vision-and-Language Navigation (VLN) task entails an agent following...

2 Chih-Yao Ma, et al. ∙

research

∙ 10/01/2018

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

In an open-world setting, it is inevitable that an intelligent agent (e....

20 Jianwei Yang, et al. ∙

research

∙ 08/01/2018

Graph R-CNN for Scene Graph Generation

We propose a novel scene graph generation model called Graph R-CNN, that...

6 Jianwei Yang, et al. ∙

research

∙ 03/27/2018

Neural Baby Talk

We introduce a novel framework for image captioning that can produce nat...

0 Jiasen Lu, et al. ∙

research

∙ 06/05/2017

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

We present a novel training framework for neural sequence models, partic...

0 Jiasen Lu, et al. ∙

research

∙ 05/18/2017

ParlAI: A Dialog Research Software Platform

We introduce ParlAI (pronounced "par-lay"), an open-source software plat...

0 Alexander H. Miller, et al. ∙

research

∙ 12/06/2016

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely adopt...

0 Jiasen Lu, et al. ∙

research

∙ 05/31/2016

Hierarchical Question-Image Co-Attention for Visual Question Answering

A number of recent works have proposed attention models for Visual Quest...

0 Jiasen Lu, et al. ∙

research

∙ 05/03/2015

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answerin...

0 Aishwarya Agrawal, et al. ∙

