Msanii: High Fidelity Music Synthesis on a Shoestring Budget

by Kinyugo Maina, et al.

In this paper, we present Msanii, a novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. Our model combines the expressiveness of mel spectrograms, the generative capabilities of diffusion models, and the vocoding capabilities of neural vocoders. We demonstrate the effectiveness of Msanii by synthesizing long samples (190 seconds) of stereo music at a high sample rate (44.1 kHz) without the use of concatenative synthesis, cascading architectures, or compression techniques. To the best of our knowledge, this is the first work to successfully employ a diffusion-based model for synthesizing such long music samples at high sample rates. Our demo and code are available online.
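To give a sense of the scale involved, the following is a minimal sketch of the arithmetic behind the figures quoted in the abstract (190 seconds, 44.1 kHz, stereo). The hop length used to estimate the number of spectrogram frames is a hypothetical value chosen for illustration only; it is not taken from the paper.

```python
# Figures quoted in the abstract.
DURATION_S = 190
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2  # stereo

# Hypothetical STFT hop length, assumed for illustration (not from the paper).
HOP_LENGTH = 1024

# Raw waveform length that a model would have to produce.
samples_per_channel = DURATION_S * SAMPLE_RATE_HZ
total_samples = samples_per_channel * CHANNELS

# Approximate number of spectrogram frames per channel (ceiling division,
# ignoring any padding an actual STFT implementation might add).
frames_per_channel = -(-samples_per_channel // HOP_LENGTH)

print(f"samples per channel: {samples_per_channel}")
print(f"total samples:       {total_samples}")
print(f"frames per channel:  {frames_per_channel}")
```

Under this assumed hop length, the time axis shrinks by roughly three orders of magnitude when moving from raw samples to spectrogram frames, which suggests why a spectrogram representation is attractive for long-context generation.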




Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion

The recent surge in popularity of diffusion models for image generation ...

Jukebox: A Generative Model for Music

We introduce Jukebox, a model that generates music with singing in the r...

Noise2Music: Text-conditioned Music Generation with Diffusion Models

We introduce Noise2Music, where a series of diffusion models is trained ...

Generating High Fidelity Data from Low-density Regions using Diffusion Models

Our work focuses on addressing sample deficiency from low-density region...

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Recently, denoising diffusion models have demonstrated remarkable perfor...

Hierarchical Timbre-Painting and Articulation Generation

We present a fast and high-fidelity method for music generation, based o...

Robust Dancer: Long-term 3D Dance Synthesis Using Unpaired Data

How to automatically synthesize natural-looking dance movements based on...

Code Repositories


A novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently.

