Large vision-language models (LVLMs) have recently witnessed rapid advan...
We introduce the Qwen-VL series, a set of large-scale vision-language mo...
Foundation language models obtain the instruction-following ability thro...
Mathematical reasoning is a challenging task for large language models (...
In this study, we focus on the problem of 3D human mesh recovery from a ...
In this paper, we focus on the task of generalizable neural human render...
The answering quality of an aligned large language model (LLM) can be dr...
In this work, we explore a scalable way for building a general represent...
Video Frame Interpolation (VFI) aims to synthesize non-existent intermed...
Video-based 3D human pose and shape estimations are evaluated by intra-f...
Large-scale embedding-based retrieval (EBR) is the cornerstone of search...
This paper proposes a new method, OFA-OCR, to transfer multimodal pretra...
Generalist models, which are capable of performing diverse multi-modal t...
Generative modeling of human motion has broad applications in computer a...
In this paper, we propose a novel multi-modal multi-task encoder-decoder...
The goal of expressive Text-to-speech (TTS) is to synthesize natural spe...
The tremendous success of CLIP (Radford et al., 2021) has promoted the r...
Knowledge distillation (KD) is essentially a process of transferring a t...
Prompt tuning has become a new paradigm for model tuning and it has demo...
Virtual try-on aims to generate a photo-realistic fitting result given a...
Prompt Learning has recently gained great popularity in bridging the gap...
The fashion industry has diverse applications in multi-modal image gener...
Industrial recommender systems have been growing increasingly complex, m...
In this paper, we focus on the unsupervised Video Object Segmentation (V...
Despite the remarkable success of deep multi-modal learning in practice,...
In this work, we pursue a unified paradigm for multimodal pretraining to...
Heterogeneous graph neural networks (HGNNs) have been blossoming in rece...
Recent expeditious developments in deep learning algorithms, distributed...
Attention modules do not always help deep models learn causal features ...
Real-time semantic segmentation has received considerable attention due ...
Existing reasoning tasks often have an important assumption that the inp...
In this paper, we identify and study an important problem of gradient it...
Mixture-of-Experts (MoE) models can achieve promising results with outra...
Vehicle search is one basic task for the efficient traffic management in...
Table-to-text generation refers to generating a descriptive text from a ...
Conditional image synthesis aims to create an image according to some mu...
Text-to-Image generation in the general domain has long been an open pro...
Graph representation learning aims to learn low-dimensional node embeddi...
In this work, we construct the largest dataset for multimodal pretrainin...
Deep candidate generation (DCG) that narrows down the collection of rele...
Deep candidate generation has become an increasingly popular choice depl...
Graph representation learning has been extensively studied in recent yea...
Recently, neural networks have been widely used in e-commerce recommende...
The problem of hyperparameter optimization exists widely in the real lif...
User behavior data in recommender systems are driven by the complex inte...
Graph Convolution Networks (GCNs) are becoming more and more popular for...
Inferring new facts from existing knowledge graphs (KG) with explainable...
We propose a new CogQA framework for multi-hop question answering in web...
Product bundling, offering a combination of items to customers, is one o...
An increasing number of machine learning tasks require dealing with larg...