Representation Learning via Global Temporal Alignment and Cycle-Consistency

05/11/2021
by Isma Hadji, et al.

We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., a human action). The main idea is to use the global temporal ordering of latent correspondences across sequence pairs as a supervisory signal. In particular, we propose a loss based on scoring the optimal sequence alignment to train an embedding network. Our loss is based on a novel probabilistic path-finding view of dynamic time warping (DTW) with three key features: (i) the local path routing decisions are contrastive and differentiable, (ii) pairwise distances are cast as probabilities that are also contrastive, and (iii) the formulation naturally admits a global cycle-consistency loss that verifies correspondences. For evaluation, we consider fine-grained action classification, few-shot learning, and video synchronization. We report significant performance increases over previous methods. In addition, we report two applications of our temporal alignment framework: 3D pose reconstruction and fine-grained audio/visual retrieval.
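The three ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It swaps in the standard soft-min relaxation of DTW (as in Soft-DTW) for the paper's probabilistic path-finding formulation, and uses a simple per-frame round-trip probability in place of the paper's global cycle-consistency loss. The names `contrastive_log_probs`, `soft_dtw_cost`, `alignment_loss` and `cycle_back_probs` are hypothetical, as are the temperature `tau` and smoothing `gamma`. Each frame's distances to the other sequence become a softmax (contrastive) distribution, and the negative log-probabilities serve as the DTW cost. A smooth minimum over the three predecessor cells replaces the hard routing decision. The code only computes forward values, but every operation is smooth, so an autodiff framework could backpropagate through it.

```python
import math


def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), shifted by min for stability."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def contrastive_log_probs(x, y, tau=1.0):
    """Hypothetical helper: for each frame x_i, a log-softmax over negative squared
    distances to every frame y_j, so each match competes against all others."""
    out = []
    for xi in x:
        logits = [-sum((a - b) ** 2 for a, b in zip(xi, yj)) / tau for yj in y]
        m = max(logits)
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        out.append([l - lse for l in logits])
    return out


def soft_dtw_cost(cost, gamma=0.1):
    """Soft-DTW recursion (assumed stand-in for the paper's path-finding view):
    each cell adds its cost to a smooth min over the three allowed predecessors."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]]
            R[i][j] = cost[i - 1][j - 1] + softmin(prev, gamma)
    return R[n][m]


def alignment_loss(x, y, tau=1.0, gamma=0.1):
    """Hypothetical loss: soft alignment score with -log p(y_j | x_i) as the cell cost."""
    log_p = contrastive_log_probs(x, y, tau)
    cost = [[-lp for lp in row] for row in log_p]
    return soft_dtw_cost(cost, gamma)


def cycle_back_probs(x, y, tau=1.0):
    """Simplified per-frame cycle check (not the paper's global loss): probability
    that x_i -> (soft match in y) -> (soft match in x) returns to x_i."""
    p_xy = [[math.exp(v) for v in row] for row in contrastive_log_probs(x, y, tau)]
    p_yx = [[math.exp(v) for v in row] for row in contrastive_log_probs(y, x, tau)]
    return [sum(p_xy[i][j] * p_yx[j][i] for j in range(len(y))) for i in range(len(x))]


if __name__ == "__main__":
    x = [[0.0], [1.0], [2.0]]
    print(alignment_loss(x, x))        # same temporal order: low alignment cost
    print(alignment_loss(x, x[::-1]))  # reversed order: much higher cost
    print(cycle_back_probs(x, x, tau=0.01))
```

In this toy example, the reversed sequence cannot be matched along a monotonic path without passing through low-probability cells. That contrast with the same-order sequence is the signal that the global temporal ordering provides.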


research
04/16/2019

Temporal Cycle-Consistency Learning

We introduce a self-supervised representation learning method based on t...
research
08/26/2021

Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers

In this work, we consider the problem of sequence-to-sequence alignment ...
research
03/07/2023

A Light-Weight Contrastive Approach for Aligning Human Pose Sequences

We present a simple unsupervised method for learning an encoder mapping ...
research
09/04/2015

Learning Temporal Alignment Uncertainty for Efficient Event Detection

In this paper we tackle the problem of efficient video event detection. ...
research
02/08/2023

Weakly-supervised Representation Learning for Video Alignment and Analysis

Many tasks in video analysis and understanding boil down to the need for...
research
10/05/2016

A tentative model for dimensionless phoneme distance from binary distinctive features

This work proposes a tentative model for the calculation of dimensionles...
research
04/11/2023

Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond

Many tasks in music information retrieval (MIR) involve weakly aligned d...
