No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

07/20/2023
by   Qi Zhang, et al.
0

Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR. Prior works have addressed this challenge using sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to only aggregate visual contexts from its neighbor, enabling the extraction of the most distinguishing information with multi-scale feature hierarchies from high-ratio noises. The zoom-in boundary detection then focuses on local-wise discrimination of the selected top candidates for fine-grained grounding adjustment. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks, while also having the advantage of faster inference speed and lighter model parameters, thanks to its lightweight architecture.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/27/2021

Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign Language Recognition

Continuous sign language recognition (cSLR) is a public significant task...
research
01/25/2020

Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

In this paper, we study the problem of weakly-supervised temporal ground...
research
08/07/2018

Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection

Recognizing instances at different scales simultaneously is a fundamenta...
research
04/06/2022

Multi-Scale Memory-Based Video Deblurring

Video deblurring has achieved remarkable progress thanks to the success ...
research
04/12/2022

Position-aware Location Regression Network for Temporal Video Grounding

The key to successful grounding for video surveillance is to understand ...
research
03/14/2023

Generation-Guided Multi-Level Unified Network for Video Grounding

Video grounding aims to locate the timestamps best matching the query de...
research
01/25/2022

Explore and Match: End-to-End Video Grounding with Transformer

We present a new paradigm named explore-and-match for video grounding, w...

Please sign up or login with your details

Forgot password? Click here to reset