Cross-modal Variational Auto-encoder for Content-based Micro-video Background Music Recommendation

07/15/2021
by Jing Yi, et al.

In this paper, we propose a cross-modal variational auto-encoder (CMVAE) for content-based micro-video background music recommendation. CMVAE is a hierarchical Bayesian generative model that matches relevant background music to a micro-video by projecting the two multimodal inputs into a shared low-dimensional latent space, where the embeddings of a matched video-music pair are aligned by cross-generation. Moreover, the multimodal information is fused by the product-of-experts (PoE) principle: the semantic information in the visual and textual modalities of the micro-video is weighted according to the variance estimated for each modality, so that the modality with the lower noise level is given more weight. As a result, the micro-video latent variables contain less irrelevant information, which makes the model generalize more robustly. Furthermore, we establish a large-scale content-based micro-video background music recommendation dataset, TT-150k, composed of approximately 3,000 different background music clips associated with 150,000 micro-videos from different users. Extensive experiments on the TT-150k dataset demonstrate the effectiveness of the proposed method. A qualitative assessment of CMVAE, based on visualizing some recommendation results, is also included.
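The variance-based weighting in the PoE fusion can be illustrated with a minimal sketch. It assumes each modality's encoder outputs a diagonal Gaussian (a mean and a variance per latent dimension) and that no prior expert is included in the product; the abstract specifies neither detail. The function name `poe_fuse` and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np


def poe_fuse(means, variances):
    """Fuse diagonal Gaussian experts with a product of experts.

    Assumption (not from the paper): every expert is a diagonal Gaussian
    and no prior expert is added to the product.
    """
    means = np.asarray(means, dtype=float)          # shape: (experts, dims)
    variances = np.asarray(variances, dtype=float)  # shape: (experts, dims)
    precisions = 1.0 / variances                    # low variance -> high weight
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mean = fused_var * (precisions * means).sum(axis=0)
    return fused_mean, fused_var


# Toy example: the visual expert is confident in dimension 0,
# the textual expert is confident in dimension 1.
visual_mean, visual_var = np.array([1.0, 0.0]), np.array([0.1, 1.0])
text_mean, text_var = np.array([0.0, 2.0]), np.array([1.0, 0.1])

fused_mean, fused_var = poe_fuse(
    [visual_mean, text_mean], [visual_var, text_var]
)
print(fused_mean, fused_var)
```

In each dimension the fused mean lands closer to the expert with the smaller variance, which is the behavior the abstract describes as giving more weight to the modality with the lower noise level.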


