Humans write code in a fundamentally interactive manner and rely on cons...
Transformer models have shown great potential in computer vision, follow...
We propose Referral-Augmented Retrieval (RAR), a simple technique that c...
Existing benchmarks for grounding language in interactive environments e...
The Transformer and its variants have shown state-of-the-art results in many...
While hand pose estimation is a critical component of most interactive e...
3D hand pose estimation based on RGB images has been studied for a long ...
Recent advances in image-to-image translation have led to some ways to g...
We tackle the black-box issue of deep neural networks in the settings of ...
While convolutional neural networks (CNNs) are widely used for handling ...