Incremental decision making in real-world environments is one of the mos...
In an era where images and visual content dominate our digital landscape...
Attaining a high degree of user controllability in visual generation oft...
3D human modeling has been widely used for engaging interaction in gamin...
The field of text-to-image (T2I) generation has garnered significant att...
Diffusion models, such as Stable Diffusion, have shown incredible perfor...
Large-scale diffusion models have achieved state-of-the-art results on t...
Generating a video given the first several static frames is challenging ...
Prompt tuning is a new few-shot transfer learning technique that only tu...
Vision-and-Language Navigation (VLN) is a task to guide an embodied agen...
Masked visual modeling (MVM) has been recently proven effective for visu...
A great challenge in video-language (VidL) modeling lies in the disconne...
Despite having promising results, style transfer, which requires prepari...
Video editing tools are widely used nowadays for digital design. Althoug...
Recent advances in language and vision push forward the research of capt...
Creating presentation materials requires complex multimodal reasoning sk...
Although distant supervision automatically generates training data for r...
Iterative Language-Based Image Editing (IL-BIE) tasks follow iterative i...
In the vision-and-language navigation (VLN) task, an agent follows natur...
Vision-and-Language Navigation (VLN) is a task where agents must decide ...
State-of-the-art approaches to NER have used sequence-labeling BiLSTM as...
Recent research has prevalently used BiLSTM-CNN as a core module for NER i...
In this paper, we investigate the use of an unsupervised label clusterin...
We present an adversarial exploration strategy, a simple yet effective i...
In this paper, we present a detailed design of dynamic video segmentatio...