In this paper, we introduce CheXOFA, a new pre-trained vision-language m...
In this report, we present our champion solution for Ego4D Natural Langu...
Recent research on Large Language Models (LLMs) has led to remarkable ad...
Artificial Intelligence (AI) has made incredible progress recently. On t...
To build Video Question Answering (VideoQA) systems capable of assisting...
This technical report describes the CONE approach for Ego4D Natural Lang...
Video temporal grounding (VTG) aims to localize temporal moments in a...
The rapid development of 5G communication technology has given birth to ...
Fusion techniques are a key research topic in multimodal sentiment analysi...
This paper presents a unified multimodal pre-trained model called NÜWA t...
The task of video-based commonsense captioning aims to generate event-wi...
In this paper, we present GEM as a General Evaluation benchmark for Mult...
Generating videos from text is a challenging task due to its high comput...
Video-text retrieval plays an essential role in multi-modal research and...
In this paper, we focus on the imbalance issue, which is rarely studied ...
Question Aware Open Information Extraction (Question aware Open IE) take...
Procedural knowledge, which we define as concrete information about the ...
While many BERT-based cross-modal pre-trained models produce excellent r...
We propose UniViLM: a Unified Video and Language pre-training Model for ...
Recently, Visual Question Answering (VQA) has emerged as one of the most...