We present cross-view transformers, an efficient attention-based model f...
Deep networks devour millions of precisely annotated images to build the...
Vision-based urban driving is hard. The autonomous system needs to learn...
Computer vision produces representations of scene content. Much computer...