We propose ImGeoNet, a multi-view image-based 3D object detection framew...
Most prior semantic segmentation methods have been developed for day-tim...
Large language models (LLMs) have demonstrated impressive capabilities i...
GuessWhat?! is a two-player visual dialog guessing game where player A a...
Recently, end-to-end multi-speaker text-to-speech (TTS) systems have gained suc...
In this paper, we propose a Sequential Representation Quantization AutoEn...
End-to-end text-to-speech (TTS) has shown great success on large quantit...