Music Information Retrieval in Python Steve Tjoa Tuesday, October 21, 2014 Hackbright Academy



Acknowledgements

• Angie Chang, Wendy Dherin, Hackbright Academy

• Jay LeBoeuf (realindustry.org, iZotope, Imagine Research)

• Owen Campbell (Humtap, UCSB)

A bit about me

What is MIR?

• fingerprinting

• cover song detection

• genre recognition

• transcription

• recommendation

• symbolic melodic similarity

• onset detection

• mood

• source separation

• instrument recognition

• pitch tracking

• tempo estimation

• score alignment

• song structure/form

• beat tracking

• key detection

• query by humming

• query by tapping

Why Python?

• Because it’s not Matlab.

• free

• general purpose

• nice syntax

• easy to learn

• fast to develop

• popular

• lots of libraries

• good for signal processing (NumPy, SciPy)

• high demand


MIR System Architecture

Audio → Segmentation; Preprocessing → Feature Extraction → Machine Learning → Musical Information (e.g., Kick Drum, Snare Drum)
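The stages listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used in the talk. The synthetic "kick" and "snare" signals, the sample rate, the two features (zero-crossing rate and RMS energy), and the nearest-centroid classifier are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

SR = 22050  # sample rate in Hz (assumed for this sketch)

def synth_kick(rng, n=2048):
    # Hypothetical stand-in for a kick drum: a decaying 60 Hz sine plus faint noise.
    t = np.arange(n) / SR
    return np.sin(2 * np.pi * 60 * t) * np.exp(-30 * t) + 0.01 * rng.standard_normal(n)

def synth_snare(rng, n=2048):
    # Hypothetical stand-in for a snare drum: decaying white noise.
    t = np.arange(n) / SR
    return rng.standard_normal(n) * np.exp(-30 * t)

def extract_features(x):
    # Feature extraction: zero-crossing rate and RMS energy of one segment.
    zcr = np.mean(np.diff(np.sign(x)) != 0)
    rms = np.sqrt(np.mean(x ** 2))
    return np.array([zcr, rms])

def train_centroids(features, labels):
    # Machine learning stage: one mean feature vector per class.
    labels = np.asarray(labels)
    return {str(lab): features[labels == lab].mean(axis=0) for lab in np.unique(labels)}

def classify(centroids, feature):
    # Assign the label of the nearest centroid (Euclidean distance).
    return min(centroids, key=lambda lab: np.linalg.norm(feature - centroids[lab]))

# Audio -> segments (here each synthetic hit is already one segment).
rng = np.random.default_rng(0)
segments = [synth_kick(rng) for _ in range(10)] + [synth_snare(rng) for _ in range(10)]
labels = ["kick"] * 10 + ["snare"] * 10

# Segments -> features -> trained model -> musical information.
features = np.array([extract_features(s) for s in segments])
centroids = train_centroids(features, labels)
print(classify(centroids, extract_features(synth_kick(rng))))
print(classify(centroids, extract_features(synth_snare(rng))))
```

With real recordings, the segmentation step would typically come from an onset detector, and the classifier could be swapped for any scikit-learn estimator.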

Music Search and Retrieval

12th International Society for Music Information Retrieval Conference (ISMIR 2011)

FACTORIZATION OF OVERLAPPING HARMONIC SOUNDS USING APPROXIMATE MATCHING PURSUIT

Steven K. Tjoa
Imagine Research
San Francisco, CA 94114
[email protected]

K. J. Ray Liu
University of Maryland
College Park, MD 20742
[email protected]

ABSTRACT

Factorization of polyphonic musical signals remains a difficult problem due to the presence of overlapping harmonics. Existing dictionary learning methods cannot guarantee that the learned dictionary atoms are semantically meaningful. In this paper, we explore the factorization of harmonic musical signals when a fixed dictionary of harmonic sounds is already present. We propose a method called approximate matching pursuit (AMP) that can efficiently decompose harmonic sounds by using a known predetermined dictionary. We illustrate the effectiveness of AMP by decomposing polyphonic musical spectra with respect to a large dictionary of instrumental sounds. AMP executes faster than orthogonal matching pursuit yet performs comparably based upon recall and precision.

1. INTRODUCTION

Dictionary learning, sparse coding, and constrained factorization algorithms have recently revolutionized the way we perform music transcription and source separation. Many researchers have reported success when decomposing simple musical signals using nonnegative matrix factorization (NMF) [23] or methods based upon sparse coding such as K-SVD [1, 2]. Unfortunately, problems remain for intricate, polyphonic musical signals. When musical notes overlap in time and frequency, the separation and transcription performance of these basic dictionary learning methods diminishes rapidly. In such a case, the algorithm will usually learn a dictionary where each individual atom contains information from multiple musical sources, thus hindering our attempts at decomposition.

Researchers have slowly improved upon the original dictionary learning methods by adding constraints to the learning process. By restricting the dictionary atoms to reside within a predetermined feasible set, we can ensure that the learned atoms will be useful at the conclusion of the learning process. For example, existing solutions include adding constraints to the dictionary learning process such as harmonicity [3, 25] or smoothness [3, 26].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2011 International Society for Music Information Retrieval.

Another solution is to add structure to the dictionary. For example, one can construct and use a large, predefined, overcomplete dictionary where each atom is already labeled and assumed to contain information from only one musical source. Instead of learning an optimal dictionary for a given musical signal, it may suffice to match the signal to this large set of precomputed, labeled dictionary atoms. Then, by decomposing a signal with respect to this fixed dictionary, classification is easily achieved by simply reading the label of the atom. As musical databases become more available, construction of predefined dictionaries will become easier, thus reducing the need for adaptive dictionary learning.
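Decomposing a signal against a fixed, labeled dictionary and then reading off the labels can be illustrated with plain matching pursuit, the standard greedy algorithm, not the AMP method proposed in this paper. The following is a minimal sketch under stated assumptions: the toy harmonic atoms (peaks at multiples of a fundamental bin with 1/h amplitudes), the label strings, and the helper names `harmonic_atom` and `matching_pursuit` are hypothetical and chosen only for illustration.

```python
import numpy as np

def harmonic_atom(f0_bin, n_bins=64, n_harmonics=4):
    # Toy magnitude spectrum: peaks at multiples of f0_bin with 1/h amplitudes,
    # scaled to unit Euclidean norm.
    atom = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        atom[h * f0_bin] = 1.0 / h
    return atom / np.linalg.norm(atom)

def matching_pursuit(x, D, n_atoms):
    # Greedy MP over the unit-norm columns of D: at each step, pick the atom
    # most correlated with the residual and subtract its projection.
    residual = np.array(x, dtype=float)
    picks = []
    for _ in range(n_atoms):
        corr = D.T @ residual              # one inner product per atom
        k = int(np.argmax(np.abs(corr)))
        picks.append((k, float(corr[k])))
        residual = residual - corr[k] * D[:, k]
    return picks, residual

# Fixed, labeled dictionary. The atoms for bins 5 and 10 share harmonics
# (bins 10 and 20), so they overlap in frequency.
labels = ["f0 bin 5", "f0 bin 7", "f0 bin 10"]
D = np.column_stack([harmonic_atom(5), harmonic_atom(7), harmonic_atom(10)])

# A two-source mixture, decomposed and then "classified" by reading labels.
x = 2.0 * D[:, 0] + 1.0 * D[:, 2]
picks, residual = matching_pursuit(x, D, n_atoms=2)
found = [labels[k] for k, _ in picks]
print(found)
```

In this toy case the two correct labels are recovered, but because the selected atoms overlap, the first coefficient comes out near 2.44 rather than the true 2.0 and the residual does not vanish after two steps. Orthogonal matching pursuit addresses this by re-fitting all selected coefficients jointly at each step.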

Of course, the performance of such an algorithm depends upon the breadth of the dictionary. When atoms from more musical sources are added to the dictionary, the dictionary’s ability to decompose polyphonic music will improve. However, dictionary growth introduces concerns related to scalability and computational complexity. While the aforementioned algorithms have significantly advanced the state of the art, they remain slow and difficult to scale as the dictionary size increases. Most of the original factorization methods such as matching pursuit (MP) [18] and NMF with multiplicative updates [17] have complexity that is linear in the size of the dictionary. As a result, when dictionary sizes grow, the transcription efficiency of these algorithms diminishes.
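To make the linear cost concrete, the sketch below applies the well-known multiplicative update for the Euclidean NMF objective with the dictionary W held fixed, so that only the activations H are estimated. It is an illustrative assumption, not code from the paper: the function name `nmf_activations`, the random test data, and the iteration count are hypothetical. Each iteration multiplies by W and by its transpose, so the work per iteration grows in proportion to the number of atoms (columns of W).

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    # Estimate H >= 0 such that V is approximately W @ H, with W fixed.
    # Uses the multiplicative update H <- H * (W^T V) / (W^T W H).
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    WtV = W.T @ V
    for _ in range(n_iter):
        # W @ H and W.T @ (...) each cost time linear in the number of atoms.
        H *= WtV / (W.T @ (W @ H) + eps)
    return H

# Hypothetical data: 6 nonnegative atoms, 3 spectra built from a few atoms each.
rng = np.random.default_rng(1)
W = rng.random((40, 6))
H_true = np.zeros((6, 3))
H_true[0, 0] = 1.0
H_true[3, 0] = 1.5
H_true[2, 1] = 2.0
H_true[5, 2] = 0.5
V = W @ H_true

H0 = nmf_activations(V, W, n_iter=0)    # random initialization only
H = nmf_activations(V, W, n_iter=200)   # same start, after 200 updates
err0 = np.linalg.norm(V - W @ H0)
err = np.linalg.norm(V - W @ H)
print(err0, err)
```

The update keeps H nonnegative by construction and does not increase the reconstruction error, but every iteration still touches every atom, which is the scaling concern raised above.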

To summarize the problem: how can we make use of a large, precomputed, overcomplete dictionary to factorize overlapping harmonic sounds accurately and efficiently?

We address this problem by proposing a variant of MP called approximate matching pursuit (AMP). Unlike MP and NMF, AMP can decompose signals into a sparse combination of atoms with complexity that is sublinear in the dictionary size while maintaining accuracy. To do this, AMP uses


For more…

• IEEE ICASSP

• IEEE Trans. Audio Speech Language Processing

• ACM Multimedia

• Computer Music Journal

• Journal of New Music Research

• NumPy

• SciPy

• IPython notebook

• scikit-learn

What is the range of a viola?

• As far as you can kick it.