Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding

09/12/2023
by Jiaxiu Li, et al.

Make-up temporal video grounding (MTVG) aims to localize, in a long video, the target segment that is semantically related to a sentence describing a make-up activity. Compared with the general video grounding task, MTVG focuses on meticulous actions and changes on the face. A make-up instruction step, which usually involves subtle differences in products and facial areas, is more fine-grained than general activities (e.g., cooking or furniture assembly), so existing general approaches cannot locate the target activity effectively. More specifically, existing proposal generation modules fall short of providing the semantic cues needed for fine-grained make-up comprehension. To tackle this issue, we propose an effective proposal-based framework named Dual-Path Temporal Map Optimization Network (DPTMO) to capture fine-grained multimodal semantic details of make-up activities. DPTMO extracts both query-agnostic and query-guided features to construct two proposal sets and applies a specific evaluation method to each set. Unlike the single-path structure commonly used in previous methods, our dual-path structure mines richer semantic information from make-up videos and distinguishes fine-grained actions well. The two candidate sets capture, respectively, cross-modal make-up video-text similarity and the multimodal fusion relationship, and they complement each other. Each set corresponds to its own optimization perspective, and their joint prediction improves the accuracy of video timestamp prediction. Comprehensive experiments on the YouMakeup dataset demonstrate that our proposed dual-path structure excels at fine-grained semantic comprehension.
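The dual-path idea described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the pooling, the cosine-similarity path, the toy fusion scorer `w_fuse`, and the multiplicative joint prediction are all illustrative assumptions standing in for DPTMO's learned modules. It only shows the shape of the approach: every candidate segment (i, j) on a 2D temporal map is scored by two complementary paths, and the joint score picks the predicted timestamps.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def dual_path_ground(clip_feats, query_feat, w_fuse):
    """Score every candidate segment (i, j) on a 2D temporal map with two
    complementary paths and return the jointly best (start, end) clip pair.

    clip_feats: (T, D) per-clip video features
    query_feat: (D,) sentence feature
    w_fuse:     (2*D,) weights of a toy fusion scorer (hypothetical stand-in
                for a learned multimodal fusion head)
    """
    T, _ = clip_feats.shape
    best, best_score = (0, 0), -np.inf
    for i in range(T):
        for j in range(i, T):
            p = clip_feats[i:j + 1].mean(axis=0)          # pooled proposal feature
            sim = cosine(p, query_feat)                    # path 1: cross-modal similarity
            fus = np.tanh(np.concatenate([p, query_feat]) @ w_fuse)  # path 2: fusion score
            joint = (sim + 1) / 2 * (fus + 1) / 2          # joint prediction of both paths
            if joint > best_score:
                best_score, best = joint, (i, j)
    return best

# Usage: ground a query in a toy 6-clip video.
rng = np.random.default_rng(0)
clips = rng.normal(size=(6, 8))
query = clips[2:5].mean(axis=0)            # query resembling clips 2..4
start, end = dual_path_ground(clips, query, rng.normal(size=16))
```

In the full model each path also has its own optimization objective, so the two score maps are trained from different perspectives before being fused at prediction time.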


Related research:

10/21/2022 · Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding
Temporal language grounding (TLG) aims to localize a video segment in an...

02/21/2023 · Tracking Objects and Activities with Attention for Temporal Sentence Grounding
Temporal sentence grounding (TSG) aims to localize the temporal segment ...

01/25/2020 · Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
In this paper, we study the problem of weakly-supervised temporal ground...

12/06/2018 · Guided Zoom: Questioning Network Evidence for Fine-grained Classification
We propose Guided Zoom, an approach that utilizes spatial grounding to m...

04/13/2018 · Text-to-Clip Video Retrieval with Early Fusion and Re-Captioning
We propose a novel method capable of retrieving clips from untrimmed vid...

01/22/2023 · Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Temporal grounding is the task of locating a specific segment from an un...

01/25/2022 · Explore and Match: End-to-End Video Grounding with Transformer
We present a new paradigm named explore-and-match for video grounding, w...
