This technical report describes the CONE approach for Ego4D Natural Lang...
Video temporal grounding (VTG) aims to localize temporal moments in a...
Conventional video models rely on a single stream to capture the complex...
Localizing persons and recognizing their actions from videos is a challe...
In this paper, we study an intermediate form of supervision, i.e., singl...
Recently, Weakly-supervised Temporal Action Localization (WTAL) has been...
Many real-world applications involve multivariate, geo-tagged time serie...
Motion has been shown to be useful for video understanding, where motion is t...
Deep neural networks suffer from over-fitting and catastrophic forgettin...
Temporal Action Localization (TAL) in untrimmed video is important for m...
The goal of Online Action Detection (OAD) is to detect action in a timel...
Temporal action detection is a very important yet challenging problem, s...
In this notebook paper, we describe our approach in the submission to th...
Temporal action localization is an important yet challenging problem. Gi...
EventNet is a large-scale video corpus and event ontology consisting of ...
We address temporal action localization in untrimmed long videos. This i...