In this paper, we introduce CheXOFA, a new pre-trained vision-language m...
In this report, we present our winning solution for Ego4D Natural Langu...
This technical report describes the CONE approach for Ego4D Natural Lang...
Video temporal grounding (VTG) aims to localize temporal moments in a...
Treatment effect estimation, which refers to the estimation of causal ef...
Filter pruning is one of the most effective ways to accelerate and compr...