Food instance segmentation is essential to estimate the serving size of
...
In this report, we present our champion solution for Ego4D Natural Langu...
Food image-to-recipe aims to learn an embedded space linking the rich
se...
This technical report describes the CONE approach for Ego4D Natural Lang...
Video temporal grounding (VTG) aims to localize temporal moments in a...
Cross-modal representation learning has become a new normal for bridging...
Cross-modal recipe retrieval has attracted research attention in recent
...
This paper tackles a recently proposed Video Corpus Moment Retrieval tas...