We address the challenging task of Localization via Embodied Dialog (LED...
We introduce a novel interface for large-scale collection of human memor...
We present Where Are You? (WAY), a dataset of 6k dialogs in which two h...
Localizing moments in untrimmed videos via language queries is a new and...
We describe a novel cross-modal embedding space for actions, named Actio...
Automatic generation of textual video descriptions that are time-aligned...
In this paper, we study a discriminatively trained deep convolutional ne...