Addressing the confounds of accompaniments in singer identification

by Tsung-Han Hsieh, et al.

Identifying singers is an important task with many applications. However, the task remains challenging for several reasons. One major issue is the confounding effect of the background instrumental music that is mixed with the vocals in music production. If a singer only sings in certain musical contexts (e.g., genres), a singer identification model may learn to extract non-vocal features from the instrumental part of the songs. As a result, the model cannot generalize well when the singer sings in unseen contexts. In this paper, we attempt to address this issue. Specifically, we employ open-unmix, an open-source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two ways to train a singer identification model: learning from the separated vocals only, or learning from an augmented set of data where we "shuffle-and-remix" the separated vocal tracks and instrumental tracks of different songs to artificially make the singers sing in different contexts. We also incorporate melodic features learned from the vocal melody contour for better performance. Evaluation results on a benchmark dataset called artist20 show that this data augmentation method greatly improves the accuracy of singer identification.
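The shuffle-and-remix idea can be illustrated with a minimal sketch. It assumes the stems have already been separated (for example by open-unmix) into mono sample lists at a common sample rate. Mixing is assumed to be plain addition, cropped to the shorter stem. The names `SeparatedSong`, `remix`, and `shuffle_and_remix` are hypothetical and chosen for illustration only, and the pairing rule (vocals only paired with accompaniments from songs by other singers) is an assumption rather than the authors' exact procedure. The key property shown is that each new mixture keeps the label of the singer whose vocal it contains, while its instrumental context comes from someone else's song.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SeparatedSong:
    """Hypothetical container for one song after source separation (assumption)."""
    singer: str
    vocal: List[float]          # separated vocal stem, mono samples
    accompaniment: List[float]  # separated instrumental stem, mono samples


def remix(vocal: List[float], accompaniment: List[float],
          acc_gain: float = 1.0) -> List[float]:
    """Additively mix two stems, cropping to the shorter one.

    Simple summation with a gain on the accompaniment is an assumed
    mixing rule, used here only for illustration.
    """
    n = min(len(vocal), len(accompaniment))
    return [vocal[i] + acc_gain * accompaniment[i] for i in range(n)]


def shuffle_and_remix(songs: List[SeparatedSong], n_per_song: int = 1,
                      seed: Optional[int] = None) -> List[Tuple[List[float], str]]:
    """Pair each vocal with accompaniments taken from other singers' songs.

    Returns (mixture, singer_label) pairs; the label always follows the vocal,
    so the instrumental context no longer correlates with the singer.
    """
    rng = random.Random(seed)
    augmented = []
    for song in songs:
        # Candidate backgrounds: songs by a different singer (assumed pairing rule).
        candidates = [s for s in songs if s.singer != song.singer]
        if not candidates:
            continue  # nothing to swap in if only one singer is present
        for _ in range(n_per_song):
            other = rng.choice(candidates)
            augmented.append((remix(song.vocal, other.accompaniment), song.singer))
    return augmented
```

For example, with two toy songs by singers "A" and "B", each vocal ends up mixed over the other singer's accompaniment, and the label still names the vocalist:

```python
songs = [
    SeparatedSong("A", [0.5, 0.25, 0.75], [1.0, 1.0, 1.0, 1.0]),
    SeparatedSong("B", [0.5, 0.5], [2.0, 2.0, 2.0]),
]
print(shuffle_and_remix(songs, seed=0))
# [([2.5, 2.25, 2.75], 'A'), ([1.5, 1.5], 'B')]
```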


Mixing-Specific Data Augmentation Techniques for Improved Blind Violin/Piano Source Separation

Blind music source separation has been a popular and active subject of r...

Learning a Joint Embedding Space of Monophonic and Mixed Music Signals for Singing Voice

Previous approaches in singer identification have used one of monophonic...

Improved singing voice separation with chromagram-based pitch-aware remixing

Singing voice separation aims to separate music into vocals and accompan...

FretNet: Continuous-Valued Pitch Contour Streaming for Polyphonic Guitar Tablature Transcription

In recent years, the task of Automatic Music Transcription (AMT), whereb...

Reverb Conversion of Mixed Vocal Tracks Using an End-to-end Convolutional Deep Neural Network

Reverb plays a critical role in music production, where it provides list...

Segmentation of nearly isotropic overlapped tracks in photomicrographs using successive erosions as watershed markers

The major challenges of automatic track counting are distinguishing trac...

Assessing Algorithmic Biases for Musical Version Identification

Version identification (VI) systems now offer accurate and scalable solu...
