DISCO-10M: A Large-Scale Music Dataset

06/23/2023
by Luca A. Lanzendörfer, et al.

Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets are limited in size and accessibility, and often lack audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process that incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, so that the data can be applied directly to various downstream tasks and machine learning applications can be explored efficiently. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
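As a rough illustration of the kind of embedding-similarity filtering the abstract mentions, the sketch below keeps only items whose audio embedding is close enough, by cosine similarity, to a reference text embedding. It is not the authors' pipeline: the function names, the threshold value, and the 3-dimensional toy vectors are all assumptions made for illustration, and real CLAP embeddings would be much higher-dimensional.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(
    audio_embeddings: Dict[str, List[float]],
    text_embedding: Sequence[float],
    threshold: float,
) -> List[str]:
    """Return the ids whose audio embedding scores at least `threshold`
    against the reference text embedding (hypothetical helper)."""
    return [
        track_id
        for track_id, embedding in audio_embeddings.items()
        if cosine_similarity(embedding, text_embedding) >= threshold
    ]


# Toy vectors standing in for precomputed embeddings (assumed values).
toy_audio = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.0, 1.0, 0.0],
    "track_c": [0.9, 0.1, 0.0],
}
reference_text = [1.0, 0.0, 0.0]  # stand-in for a text-prompt embedding

# The threshold of 0.5 is an arbitrary choice for this example.
kept = filter_by_similarity(toy_audio, reference_text, threshold=0.5)
print(kept)
```

In this toy setup the orthogonal vector is dropped and the two vectors pointing roughly along the reference direction are kept. A real filter would tune the threshold against held-out examples.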


research
09/15/2023

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Large Language Models (LLMs) have shown immense potential in multimodal ...
research
07/19/2023

From West to East: Who can understand the music of the others better?

Recent developments in MIR have led to several benchmark deep learning m...
research
07/25/2011

An end-to-end machine learning system for harmonic analysis of music

We present a new system for simultaneous estimation of keys, chords, and...
research
06/08/2016

Symbolic Music Data Version 1.0

In this document, we introduce a new dataset designed for training machi...
research
07/29/2020

dMelodies: A Music Dataset for Disentanglement Learning

Representation learning focused on disentangling the underlying factors ...
research
06/26/2021

An Audio Envelope Generator Derived from Industrial Process Control

Audio envelopes serve a crucial role in ensuring the versatility of synt...
research
10/22/2020

Mood Classification Using Listening Data

The mood of a song is a highly relevant feature for exploration and reco...
