Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

03/24/2022
by   Juncheng Li, et al.

Temporal grounding in videos aims to localize the target video segment that semantically corresponds to a given query sentence. Thanks to the semantic diversity of natural language descriptions, temporal grounding allows activity grounding beyond pre-defined classes and has received increasing attention in recent years. This semantic diversity is rooted in the principle of compositionality in linguistics, where novel semantics can be systematically described by combining known words in novel ways (compositional generalization). However, current temporal grounding datasets do not specifically test for compositional generalizability. To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG. Evaluating state-of-the-art methods on our new dataset splits, we empirically find that they fail to generalize to queries with novel combinations of seen words. To tackle this challenge, we propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies and learns fine-grained semantic correspondence among them. Experiments illustrate the superior compositional generalizability of our approach. The repository of this work is at https://github.com/YYJMJC/Compositional-Temporal-Grounding.
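As a toy illustration of what "novel combinations of seen words" can mean, the sketch below labels a test query by comparing it against a set of training queries. It is not the procedure used to build Charades-CG or ActivityNet-CG. The function names, the whitespace tokenizer, the label strings, and the word-pair co-occurrence criterion are all assumptions made for this example.

```python
from itertools import combinations


def tokenize(query):
    # Hypothetical tokenizer: lowercase, drop periods, split on whitespace.
    return query.lower().replace(".", "").split()


def word_pairs(tokens):
    # Unordered pairs of distinct words that co-occur in one query.
    return {tuple(sorted(pair)) for pair in combinations(set(tokens), 2)}


def classify_query(query, train_queries):
    # Collect every word and every co-occurring word pair seen in training.
    seen_words, seen_pairs = set(), set()
    for train_query in train_queries:
        tokens = tokenize(train_query)
        seen_words.update(tokens)
        seen_pairs.update(word_pairs(tokens))

    tokens = tokenize(query)
    if not set(tokens) <= seen_words:
        # At least one word never appears in training.
        return "unseen-word"
    if not word_pairs(tokens) <= seen_pairs:
        # All words are known, but some pair never co-occurred in training.
        return "novel-composition"
    return "seen-composition"


train = ["person opens the door", "person closes the window"]
print(classify_query("person opens the window", train))
```

In this toy setting, "person opens the window" uses only words seen in training, yet "opens" and "window" never appear together in a training query, so it is labelled as a novel composition.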


