Peng Xu
Ph.D. student at HKUST
We present ImageBind-LLM, a multi-modality instruction tuning method of ...
We present a deep-dive into a real-world robotic learning system that, i...
Large language models (LLMs) have revolutionized natural language proces...
In this paper, we investigate the in-context learning ability of retriev...
Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot...
Recent advancements in Large Vision-Language Models (LVLMs) have demonst...
We study how vision-language models trained on Internet-scale data can b...
A major challenge to deploying robots widely is navigation in human-popu...
Despite the great success of deep learning in stereo matching, recoverin...
Large Vision-Language Models (LVLMs) have recently played a dominant rol...
Large language models (LLMs) have demonstrated exciting progress in acqu...
Token compression aims to speed up large-scale vision transformers (e.g....
This paper considers a novel and challenging problem: unsupervised long-...
Concealed scene understanding (CSU) is a hot computer vision topic aimin...
Large decoder-only language models (LMs) can be largely improved in term...
Segmenting anything is a ground-breaking step toward artificial general ...
Searchable symmetric encryption enables private queries over an encrypte...
By transferring knowledge from large, diverse, task-agnostic datasets, m...
We describe the current content moderation strategy employed by Meta to ...
Parameter efficient learning methods (PERMs) have recently gained signif...
We propose a framework to enable multipurpose assistive mobile robots to...
Massively multiplayer online role-playing games create virtual communiti...
Closed-book question answering (QA) requires a model to directly answer ...
Steady-state visual evoked potential (SSVEP) is one of the most commonly...
Despite decades of research, existing navigation systems still face real...
Large language models (LLMs) trained on code completion have been shown ...
Navigation functions provide both path and motion planning, which can be...
UAVs (Unmanned Aerial Vehicles) dynamic encirclement is an emerging fiel...
Transformer is a promising neural network learner, and has achieved grea...
Pretrained language models (LMs) are susceptible to generating text with n...
Large language models have achieved high performance on various question...
Reorienting objects using extrinsic supporting items on the working plat...
Adaptively Informed Trees (AIT*) develops the problem-specific heuristic...
The detection of tiny objects in microscopic videos is a problematic poi...
Given a question-image input, the Visual Commonsense Reasoning (VCR) mod...
Large language models can encode a wealth of semantic knowledge about th...
Verification planning is a sequential decision-making problem that speci...
Multi-hop question generation (MQG) aims to generate complex questions w...
Pre-trained language models (LMs) are shown to easily generate toxic lan...
"Masked Autoencoders (MAE) Are Scalable Vision Learners" revolutionizes ...
We investigate unsupervised person re-identification (Re-ID) with clothe...
With the popularity of Android growing exponentially, the amount of malw...
Nowadays, Android is the most dominant operating system in the mobile ec...
Code-switching is a speech phenomenon when a speaker switches language d...
Semantic parsing datasets are expensive to collect. Moreover, even the q...
Reinforcement learning can train policies that effectively perform compl...
Recent progress in pretrained Transformer-based language models has show...
We propose a novel method for applying Transformer models to extractive ...
Transformer models, which leverage architectural improvements like self-...
Verification is a critical process in the development of engineered syst...