Note-level Music Transcription by Maximum Likelihood Sampling
Zhiyao Duan¹ & David Temperley²
¹Department of Electrical and Computer Engineering, ²Eastman School of Music, University of Rochester
Presentation at ISMIR 2014, Taipei, Taiwan, October 28, 2014

Different Levels of Music Transcription
- Frame-level (multi-pitch estimation): estimate the pitches and polyphony in each frame. Many methods.
- Note-level (note tracking): estimate the pitch, onset, and offset of each note. Fewer methods.
- Song-level (multi-pitch streaming): stream pitches by source. Very few methods.

Existing Note Tracking Methods
- Connect proximate frame-level pitch estimates into notes.
  - Missed pitch estimates cause fragmented notes.
  - False alarms generate spurious notes that are unreasonably short.
- Fill gaps and prune short notes.
- Treat notes individually, without considering interactions between different notes.
[Figure: frame-level pitch estimates]
Examples: Ryynanen05, Bello06, Kameoka07, Poliner07, Lagrange07, Chang08, Raczynski09, Dessein10, Grindlay11, Benetos11, Grosche12, etc.
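Below is a minimal sketch of the generic connect-fill-prune idea. It is not the implementation of any cited method. It assumes a binary piano roll (rows are pitches, columns are frames). The thresholds `max_gap` and `min_len` are hypothetical values measured in frames.

```python
from typing import List, Sequence, Tuple


def connect_fill_prune(
    roll: Sequence[Sequence[int]], max_gap: int = 2, min_len: int = 3
) -> List[Tuple[int, int, int]]:
    """Turn a binary piano roll (rows = pitches, columns = frames) into notes.

    Returns (pitch_index, onset_frame, offset_frame) tuples, offset exclusive.
    """
    notes = []
    for p, row in enumerate(roll):
        active = [bool(x) for x in row]
        # Connect: collect runs of consecutive active frames.
        runs = []
        t = 0
        while t < len(active):
            if active[t]:
                start = t
                while t < len(active) and active[t]:
                    t += 1
                runs.append([start, t])
            else:
                t += 1
        # Fill: merge runs separated by at most `max_gap` silent frames.
        merged = []
        for start, end in runs:
            if merged and start - merged[-1][1] <= max_gap:
                merged[-1][1] = end
            else:
                merged.append([start, end])
        # Prune: drop notes shorter than `min_len` frames.
        notes.extend((p, on, off) for on, off in merged if off - on >= min_len)
    return notes
```

Each pitch row is processed on its own, so the output can still hold octave errors or too many simultaneous notes. This is the weakness the slides point out.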

Problems
- The output contains many spurious notes caused by consistent MPE (multi-pitch estimation) errors, usually octave/harmonic errors.
- The output often violates the instantaneous polyphony constraint.
[Figure: ground truth vs. results from the existing connect-fill-prune approach]
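Here is a minimal sketch of checking the instantaneous polyphony constraint. It assumes the maximum polyphony is known in advance, for example the number of instruments in the piece. The function name `polyphony_violations` and the `(pitch, onset_frame, offset_frame)` note format are hypothetical.

```python
from typing import Iterable, List, Tuple


def polyphony_violations(
    notes: Iterable[Tuple[int, int, int]], max_polyphony: int
) -> List[int]:
    """Return the frames where more than `max_polyphony` notes sound at once.

    Each note is (pitch, onset_frame, offset_frame) with an exclusive offset.
    """
    notes = list(notes)
    if not notes:
        return []
    n_frames = max(off for _, _, off in notes)
    active = [0] * n_frames
    for _, on, off in notes:
        for t in range(on, off):
            active[t] += 1
    return [t for t, count in enumerate(active) if count > max_polyphony]
```

Take a duet, so `max_polyphony=2`. Three notes that overlap in frames 5 to 9 would then be reported as violations at exactly those frames.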

Our Idea
- Consider interactions between notes.
- Use a generation-evaluation strategy:
  - Generate a number of transcription candidates.
  - Evaluate each candidate on how well its notes, taken together, explain the audio.

Proposed System
- Generate subsets of the input notes as transcription candidates.
- Evaluate the candidates and select the best one [Duan, Pardo, & Zhang, 2010].
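The generate-then-select loop can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. The sampler `sample` and the scorer `log_likelihood` are hypothetical callbacks passed in by the caller, and `n_runs` and `seed` are illustrative parameters.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Note = Tuple[int, int, int]  # hypothetical: (pitch, onset_frame, offset_frame)


def max_likelihood_sampling(
    notes: Sequence[Note],
    sample: Callable[[Sequence[Note], random.Random], List[Note]],
    log_likelihood: Callable[[List[Note]], float],
    n_runs: int = 100,
    seed: int = 0,
) -> List[Note]:
    """Generate `n_runs` candidate subsets of `notes`; return the most likely one."""
    rng = random.Random(seed)
    best: List[Note] = []
    best_score = -math.inf
    for _ in range(n_runs):
        candidate = sample(notes, rng)        # generation step
        score = log_likelihood(candidate)     # evaluation step
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Every candidate is a subset of the input notes. The selected transcription can therefore drop notes but never add new ones.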

Note Sampling Strategies
What we want:
- A sampling space that is not too big
- Sample only good notes
- Diversity among transcription candidates
- Candidates that obey polyphony constraints
How can we sample efficiently and effectively?

Note Sampling Algorithm
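The exact algorithm on this slide is not available here. The sketch below is one hypothetical way to meet the goals listed for note sampling: notes are favored by note length and note likelihood, runs differ from one another, and polyphony is respected. Several parts are assumptions: the weighting `length * exp((log_lik - max) / temperature)`, the random target size, and the names `Note`, `fits`, and `sample_candidate`.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Note:
    pitch: int       # e.g. a MIDI note number
    onset: int       # first frame (inclusive)
    offset: int      # last frame (exclusive)
    log_lik: float   # note log-likelihood

    @property
    def length(self) -> int:
        return self.offset - self.onset


def fits(note: Note, chosen: List[Note], max_poly: int) -> bool:
    """True if adding `note` keeps every frame at or below `max_poly` notes."""
    for t in range(note.onset, note.offset):
        active = sum(1 for n in chosen if n.onset <= t < n.offset)
        if active + 1 > max_poly:
            return False
    return True


def sample_candidate(notes: Sequence[Note], max_poly: int,
                     temperature: float = 10.0,
                     rng: Optional[random.Random] = None) -> List[Note]:
    """Draw one transcription candidate: a polyphony-respecting subset of `notes`."""
    if rng is None:
        rng = random.Random()
    if not notes:
        return []
    pool = list(notes)
    top = max(n.log_lik for n in pool)
    # Hypothetical weighting: longer and more likely notes are drawn more often.
    weights = [n.length * math.exp((n.log_lik - top) / temperature) for n in pool]
    # A random target size gives diversity across runs (an assumption).
    target = rng.randint(1, len(pool))
    chosen: List[Note] = []
    while pool and len(chosen) < target and sum(weights) > 0:
        i = rng.choices(range(len(pool)), weights=weights)[0]
        note = pool.pop(i)
        weights.pop(i)
        if fits(note, chosen, max_poly):  # enforce the polyphony constraint
            chosen.append(note)
    return chosen
```

Short or low-likelihood notes get small weights, so they are drawn later and tend to be left out of smaller candidates.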

Note Likelihood
- Indicates how good a note is by itself (also called salience, activation, or strength).
- Note likelihood = geometric mean of the single-pitch likelihoods of the frame-level pitch estimates in the note.
- Multi-pitch estimation algorithms almost always compute a likelihood (salience) for each pitch estimate.
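The geometric mean is easiest to compute in the log domain. There, (p_1 · … · p_T)^(1/T) becomes the arithmetic mean of the log-likelihoods, which also avoids underflow. Here is a minimal sketch, assuming per-frame log single-pitch likelihoods are given; the function name is hypothetical.

```python
import math
from typing import Sequence


def note_log_likelihood(frame_log_liks: Sequence[float]) -> float:
    """Log of the geometric mean of a note's per-frame single-pitch likelihoods.

    log((p_1 * ... * p_T) ** (1 / T)) = (log p_1 + ... + log p_T) / T
    """
    if not frame_log_liks:
        raise ValueError("a note needs at least one frame")
    return sum(frame_log_liks) / len(frame_log_liks)
```

Averaging, unlike summing, means a long note is not penalized just for having more frames.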

Candidate Evaluation
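Below is a minimal sketch of scoring one candidate by a transcription likelihood built from a multi-pitch likelihood. It assumes frames are scored independently and that their log-likelihoods add up. `frame_log_lik` is a hypothetical callback that stands in for a multi-pitch likelihood model such as [Duan, Pardo, & Zhang, 2010].

```python
from typing import Callable, FrozenSet, List, Tuple

Note = Tuple[int, int, int]  # hypothetical: (pitch, onset_frame, offset_frame)


def transcription_log_likelihood(
    notes: List[Note],
    n_frames: int,
    frame_log_lik: Callable[[int, FrozenSet[int]], float],
) -> float:
    """Sum, over frames, of the multi-pitch log-likelihood of the active pitch set."""
    total = 0.0
    for t in range(n_frames):
        # The set of pitches this candidate says are sounding in frame t.
        pitches = frozenset(p for p, on, off in notes if on <= t < off)
        total += frame_log_lik(t, pitches)
    return total
```

Each frame is scored on the whole pitch set rather than pitch by pitch. An octave error that duplicates harmonics already explained by another note therefore gains little.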

Single-pitch vs. Multi-pitch Likelihood
Single-pitch likelihood (salience), used to compute the note likelihood:
- Example: the total spectral energy at the pitch's harmonic positions.
- Describes how well a pitch fits the audio individually.
- A correct pitch usually has a high likelihood, but octave/harmonic errors may also have a high likelihood.
Multi-pitch likelihood, used to compute the transcription likelihood:
- Defined as the match between the spectral peaks and the harmonics of all pitches.
- Describes how well a set of pitches explains the audio as a whole.
- Adding a pitch in an octave/harmonic relation to pitches already present does not improve the likelihood much.
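Here is a minimal sketch of the example single-pitch likelihood, "total spectral energy at its harmonic positions". It assumes a magnitude spectrum indexed by FFT bin and a lookup of the nearest bin. The function name and `n_harmonics` are hypothetical.

```python
from typing import Sequence


def harmonic_salience(spectrum: Sequence[float], f0: float, sample_rate: float,
                      n_fft: int, n_harmonics: int = 10) -> float:
    """Sum of squared magnitudes at the bins nearest to f0, 2*f0, ..., n_harmonics*f0."""
    bin_hz = sample_rate / n_fft
    energy = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)  # nearest FFT bin to the h-th harmonic
        if k >= len(spectrum):
            break
        energy += spectrum[k] ** 2
    return energy
```

Consider a spectrum with peaks only at 125, 250, and 375 Hz. A candidate one octave below (62.5 Hz) reaches the same three peaks through its even harmonics, so it scores as high as the true 125 Hz pitch. Single-pitch salience alone cannot reject this kind of error.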

An Example
Ground truth: trombone C3, violin E4 (higher values are better)

Pitch candidate             | C3     | C4       | E4
Log single-pitch likelihood | -338.8 | -466.9   | -475

Pitch set candidate         | {C3}   | {C3, C4} | {C3, E4}
Log multi-pitch likelihood  | -338.8 | -346.2   | -318.9
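Below, the numbers in the table are reused to compare the two selection rules. The variable names are illustrative only.

```python
# Log-likelihoods copied from the example (higher is better).
single = {"C3": -338.8, "C4": -466.9, "E4": -475.0}
multi = {("C3",): -338.8, ("C3", "C4"): -346.2, ("C3", "E4"): -318.9}

# Choosing the second pitch by single-pitch likelihood picks the octave error C4.
second_by_salience = max(["C4", "E4"], key=single.get)
# Choosing the pitch set by multi-pitch likelihood picks the correct set {C3, E4}.
best_set = max(multi, key=multi.get)
print(second_by_salience, best_set)  # C4 ('C3', 'E4')
```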

Experiments
- Bach10 dataset: 110 polyphonic combinations derived from 10 four-part J.S. Bach chorales, played by violin, clarinet, saxophone, and bassoon
  - 60 duets, 40 trios, 10 quartets
- Comparison methods
  - Benetos13: shift-invariant PLCA (frame-level) + median filtering of the pitch activity matrix (note-level)
  - Klapuri06: iterative spectral subtraction (frame-level) + our preliminary note tracking (note-level)
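The duet, trio, and quartet counts appear to come from taking every combination of two or more of the four parts in each of the 10 pieces. That reading is an interpretation, not something stated on the slide. A quick check of the arithmetic:

```python
from math import comb

PIECES = 10
PARTS = 4  # violin, clarinet, saxophone, bassoon


def combination_counts(pieces: int = PIECES, parts: int = PARTS) -> dict:
    """Number of k-part combinations (k >= 2), summed over all pieces."""
    return {k: pieces * comb(parts, k) for k in range(2, parts + 1)}


counts = combination_counts()
print(counts)                # {2: 60, 3: 40, 4: 10}
print(sum(counts.values()))  # 110
```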

Performance Measures

Comparison with the State of the Art

Works with State-of-the-Art Methods

Example

Conclusions
- A new method for note-level transcription that considers note interactions.
  - Generate transcription candidates by sampling notes according to note length and note likelihood, which is derived from the single-pitch likelihood.
  - Evaluate candidates according to the transcription likelihood, which is derived from the multi-pitch likelihood.
- Good performance compared with the state of the art.
- Can work with any MPE or note tracking algorithm, as long as it calculates a single-pitch likelihood (salience).

Limitations and Future Work
- The method only removes spurious notes; it can't add back missed notes.
- Different runs of sampling are independent.
- A better sampling technique: e.g., using Markov chain Monte Carlo to add back missed notes and to model dependencies between different runs of sampling.
- A better evaluation technique: e.g., using musical knowledge to evaluate the musical plausibility of transcription candidates.
