We develop a diffusion-based approach for various document layout sequen...
We study the problem of recognizing structured text, i.e. text that foll...
Text recognition is a long-standing research problem for document digita...
Multimodal pre-training with text, layout, and image has achieved SOTA p...
Pre-training of text and layout has proved effective in a variety of vis...
In this paper, we propose Text-Aware Pre-training (TAP) for Text-VQA and...
We address the problem of retrieving a specific moment from an untrimmed...
3D models of humans are commonly used within computer graphics and visio...
This paper investigates how to perform robust visual tracking in adverse...
RGB-Thermal (RGB-T) object tracking receives more and more attention due...