From Third Person to First Person: Dataset and Baselines for Synthesis and Retrieval

by Mohamed Elfeki, et al.

First-person (egocentric) and third-person (exocentric) videos are drastically different in nature. The relationship between these two views has been studied in recent years, but it has yet to be fully explored. In this work, we introduce two datasets (synthetic and natural/real) containing simultaneously recorded egocentric and exocentric videos. We also explore relating the two domains (egocentric and exocentric) in two respects. First, we synthesize images in the egocentric domain from the exocentric domain using a conditional generative adversarial network (cGAN). We show that, with enough training data, our network is capable of hallucinating what the world would look like from an egocentric perspective, given an exocentric video. Second, we address the cross-view retrieval problem. Given an egocentric query frame (or its momentary optical flow), we retrieve its corresponding exocentric frame (or optical flow) from a gallery set. We show that synthetic data can be beneficial in retrieving real data, and that domain adaptation from the synthetic domain to the natural/real domain helps in tasks such as retrieval. We believe that the presented datasets and the proposed baselines offer new opportunities for further research in this direction. The code and dataset are publicly available.
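
To make the cross-view retrieval setting concrete, the following is a minimal sketch of ranking a gallery of exocentric features against an egocentric query. It assumes each frame (or optical flow field) has already been mapped to a fixed-length, non-zero feature vector by some embedding model. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not the method described in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, gallery, k=1):
    """Return the indices of the k gallery vectors most similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    ranked = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(queries, gallery, true_indices, k=1):
    """Fraction of queries whose true gallery match appears in the top k."""
    hits = sum(
        t in retrieve_top_k(q, gallery, k)
        for q, t in zip(queries, true_indices)
    )
    return hits / len(queries)


# Hypothetical toy features: three exocentric gallery items, two egocentric queries.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
print(retrieve_top_k(queries[0], gallery, k=2))
print(top_k_accuracy(queries, gallery, true_indices=[0, 1], k=1))
```

In this sketch, retrieval quality depends entirely on how well the (assumed) embedding aligns the two views; the ranking step itself is a plain nearest-neighbor search.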



Learning optical flow from still images

This paper deals with the scarcity of data for training optical flow net...

Hybrid Learning of Optical Flow and Next Frame Prediction to Boost Optical Flow in the Wild

CNN-based optical flow estimation has attracted attention recently, main...

Optical Flow in Dense Foggy Scenes using Semi-Supervised Learning

In dense foggy scenes, existing optical flow methods are erroneous. This...

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

Learning to represent and generate videos from unlabeled data is a very ...

The benefits of synthetic data for action categorization

In this paper, we study the value of using synthetically produced videos...

Revisiting Optical Flow Estimation in 360 Videos

Nowadays 360 video analysis has become a significant research topic in t...

Let's Dance: Learning From Online Dance Videos

In recent years, deep neural network approaches have naturally extended ...
