MusicLM: Generating Music From Text

01/26/2023
by Andrea Agostinelli, et al.

We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption. To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.

research
09/05/2022

Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation

We consider a novel task of automatically generating text descriptions o...
research
08/09/2023

JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Music generation has attracted growing interest with the advancement of ...
research
02/08/2023

Noise2Music: Text-conditioned Music Generation with Diffusion Models

We introduce Noise2Music, where a series of diffusion models is trained ...
research
02/09/2023

ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

In recent years, there has been an increased popularity in image and spe...
research
09/22/2020

PodSumm – Podcast Audio Summarization

The diverse nature, scale, and specificity of podcasts present a unique ...
research
01/12/2022

Differentiating Geographic Movement Described in Text Documents

Understanding movement described in text documents is important since te...
research
05/03/2023

Diverse and Vivid Sound Generation from Text Descriptions

Previous audio generation mainly focuses on specified sound classes such...
