Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages

by TJ Tsai, et al.

This article investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of several lines of sheet music. This problem is challenging for two reasons: it has a significant runtime constraint since it is a user-facing application, and there is very little relevant training data containing cell phone images of sheet music. To solve this problem, we introduce a novel feature representation called a bootleg score, which encodes the position of noteheads relative to staff lines in sheet music. The MIDI representation can be converted into a bootleg score using deterministic rules of Western musical notation, and the sheet music image can be converted into a bootleg score using classical computer vision techniques for detecting simple geometrical shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we can estimate the alignment using dynamic programming. The most notable characteristic of our system is that it has no trainable weights at all – only a set of about 40 hyperparameters. With a training set of just 400 images, we show that our system generalizes well to a much larger set of 1600 test images from 160 unseen musical scores. Our system achieves a test F-measure of 0.89, has an average runtime of 0.90 seconds, and outperforms baseline systems based on music object detection and sheet-audio alignment. We provide extensive experimental validation and analysis of our system.
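To make the alignment step concrete, here is a minimal sketch of how two bootleg scores might be aligned with dynamic programming. The abstract only states that dynamic programming is used; the specific choices below are assumptions made for illustration, not the authors' implementation: each bootleg score is modeled as a list of columns, each column is a set of integer staff positions, the column cost is one minus a normalized overlap, and the alignment is a subsequence variant of dynamic time warping in which the query (from the photo) must be matched in full while the reference (from the MIDI file) may start and end anywhere. The names `column_cost` and `subsequence_align` are hypothetical.

```python
def column_cost(a, b):
    """Dissimilarity between two bootleg-score columns (sets of staff positions).

    Assumed cost: 1 - overlap / size of the larger column. Not from the paper.
    """
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / max(len(a), len(b))


def subsequence_align(query, reference):
    """Find the span of `reference` that best matches all of `query`.

    Returns (start, end, cost), where start and end are inclusive
    column indices into `reference`.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # D[i][j]: minimal cost of aligning query[:i+1] to a reference span ending at j.
    D = [[inf] * m for _ in range(n)]
    # S[i][j]: reference column where the best path reaching (i, j) started.
    S = [[0] * m for _ in range(n)]
    # Free start: the first query column may match any reference column.
    for j in range(m):
        D[0][j] = column_cost(query[0], reference[j])
        S[0][j] = j
    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: vertical, diagonal, horizontal steps.
            candidates = [(D[i - 1][j], S[i - 1][j])]
            if j > 0:
                candidates.append((D[i - 1][j - 1], S[i - 1][j - 1]))
                candidates.append((D[i][j - 1], S[i][j - 1]))
            best, start = min(candidates)
            D[i][j] = best + column_cost(query[i], reference[j])
            S[i][j] = start
    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: D[n - 1][j])
    return S[n - 1][end], end, D[n - 1][end]


# Toy example: the query matches reference columns 2 through 4 exactly.
reference = [{1}, {2}, {3}, {4}, {5}, {6}]
query = [{3}, {4}, {5}]
print(subsequence_align(query, reference))  # (2, 4, 0.0)
```

The returned column span would then be mapped back to time stamps in the MIDI file to identify the retrieved passage. The table fill takes time proportional to the product of the two sequence lengths, which is one reason a compact representation such as the bootleg score helps with the runtime constraint.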




