Tsu-Jui Fu

research

∙ 07/12/2023

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Incremental decision making in real-world environments is one of the mos...

0 Raphael Schumann, et al. ∙

research

∙ 05/29/2023

Photoswap: Personalized Subject Swapping in Images

In an era where images and visual content dominate our digital landscape...

0 Jing Gu, et al. ∙

research

∙ 05/24/2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Attaining a high degree of user controllability in visual generation oft...

6 Weixi Feng, et al. ∙

research

∙ 05/23/2023

Text-guided 3D Human Generation from 2D Collections

3D human modeling has been widely used for engaging interaction in gamin...

0 Tsu-Jui Fu, et al. ∙

research

∙ 05/18/2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

The field of text-to-image (T2I) generation has garnered significant att...

0 Wanrong Zhu, et al. ∙

research

∙ 05/18/2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible perfor...

0 Xuehai He, et al. ∙

research

∙ 12/09/2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Large-scale diffusion models have achieved state-of-the-art results on t...

0 Weixi Feng, et al. ∙

research

∙ 11/23/2022

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

Generating a video given the first several static frames is challenging ...

0 Tsu-Jui Fu, et al. ∙

research

∙ 10/19/2022

CPL: Counterfactual Prompt Learning for Vision and Language Models

Prompt tuning is a new few-shot transfer learning technique that only tu...

0 Xuehai He, et al. ∙

research

∙ 10/18/2022

ULN: Towards Underspecified Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a task to guide an embodied agen...

0 Weixi Feng, et al. ∙

research

∙ 09/04/2022

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling

Masked visual modeling (MVM) has been recently proven effective for visu...

8 Tsu-Jui Fu, et al. ∙

research

∙ 11/24/2021

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

A great challenge in video-language (VidL) modeling lies in the disconne...

19 Tsu-Jui Fu, et al. ∙

research

∙ 06/01/2021

Language-Driven Image Style Transfer

Despite having promising results, style transfer, which requires prepari...

0 Tsu-Jui Fu, et al. ∙

research

∙ 04/02/2021

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Video editing tools are widely used nowadays for digital design. Althoug...

0 Tsu-Jui Fu, et al. ∙

research

∙ 02/03/2021

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

Recent advances in language and vision push forward the research of capt...

0 An Yan, et al. ∙

research

∙ 01/28/2021

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Creating presentation materials requires complex multimodal reasoning sk...

0 Tsu-Jui Fu, et al. ∙

research

∙ 12/07/2020

H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction

Although distant supervision automatically generates training data for r...

0 Jhih-Wei Chen, et al. ∙

research

∙ 09/21/2020

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning

Iterative Language-Based Image Editing (IL-BIE) tasks follow iterative i...

0 Tsu-Jui Fu, et al. ∙

research

∙ 07/01/2020

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

In the vision-and-language navigation (VLN) task, an agent follows natur...

0 Wanrong Zhu, et al. ∙

research

∙ 11/17/2019

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling

Vision-and-Language Navigation (VLN) is a task where agents must decide ...

0 Tsu-Jui Fu, et al. ∙

research

∙ 10/07/2019

Why Attention? Analyzing and Remedying BiLSTM Deficiency in Modeling Cross-Context for NER

State-of-the-art approaches of NER have used sequence-labeling BiLSTM as...

0 Peng-Hsuan Li, et al. ∙

research

∙ 08/29/2019

Remedying BiLSTM-CNN Deficiency in Modeling Cross-Context for NER

Recent research has prevalently used BiLSTM-CNN as a core module for NER i...

0 Peng-Hsuan Li, et al. ∙

research

∙ 09/09/2018

Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information

In this paper, we investigate the use of an unsupervised label clusterin...

0 Hsuan-Kung Yang, et al. ∙

research

∙ 06/26/2018

Adversarial Exploration Strategy for Self-Supervised Imitation Learning

We present an adversarial exploration strategy, a simple yet effective i...

0 Zhang-Wei Hong, et al. ∙

research

∙ 04/03/2018

Dynamic Video Segmentation Network

In this paper, we present a detailed design of dynamic video segmentatio...

0 Yu-Syuan Xu, et al. ∙