Contrastive language-image pre-training, CLIP for short, has gained incr...
We launch EVA-02, a next-generation Transformer-based visual representat...
We launch EVA, a vision-centric foundation model to explore the limits o...
Recently, vision transformers have achieved tremendous success on image-lev...
We present an approach to efficiently and effectively adapt a masked ima...
We introduce Corrupted Image Modeling (CIM) for self-supervised visual p...
Recent studies show that hierarchical Vision Transformer with interleave...
Recently, query-based deep networks have attracted considerable attention owing to the...
Can a Transformer perform 2D object-level recognition from a pure sequence...
Recently, query-based object detection frameworks have achieved comparable per...
Modeling temporal visual context across frames is critical for video ins...
Few-shot learning is a challenging task that aims at training a classifi...