Note-level Music Transcription by Maximum Likelihood Sampling
Zhiyao Duan ¹ & David Temperley ²
1. Department of Electrical and Computer Engineering
2. Eastman School of Music
University of Rochester
Presentation at ISMIR 2014, Taipei, Taiwan, October 28, 2014
Slide 3
Different Levels of Music Transcription
- Frame-level (multi-pitch estimation)
  - Estimate pitches and polyphony in each frame
  - Many methods
- Note-level (note tracking)
  - Estimate pitch, onset, and offset of notes
  - Fewer methods
- Song-level (multi-pitch streaming)
  - Stream pitches by sources
  - Very few methods
Slide 4
Existing Note Tracking Methods
- Connect proximate frame-level pitch estimates
  - Misses in pitch estimates will cause fragmented notes
  - False alarms will generate spurious notes that are unreasonably short
- Fill gaps and prune short notes
- Deals with notes individually, and does not consider interactions between different notes
[Figure: Frame-level pitch estimates]
Ryynanen05, Bello06, Kameoka07, Poliner07, Lagrange07, Chang08, Raczynski09, Dessein10, Grindlay11, Benetos11, Grosche12, etc.
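The connect-fill-prune idea on this slide can be sketched as follows. This is a minimal illustration and not any of the cited systems: the function name `connect_fill_prune`, the frame representation (one set of MIDI pitches per frame), and the thresholds `max_gap` and `min_len` are assumptions made for the example.

```python
from typing import List, Set, Tuple

# A note is (midi_pitch, onset_frame, offset_frame_exclusive).
Note = Tuple[int, int, int]

def connect_fill_prune(frames: List[Set[int]], max_gap: int = 2,
                       min_len: int = 3) -> List[Note]:
    """Form notes from frame-level pitch estimates, one pitch at a time.

    frames[t] is the set of MIDI pitches estimated in frame t.
    Gaps of up to max_gap frames are filled; notes shorter than
    min_len frames are pruned.
    """
    notes: List[Note] = []
    for pitch in sorted(set().union(*frames)):
        active = [t for t, f in enumerate(frames) if pitch in f]
        start = prev = active[0]
        segments = []
        for t in active[1:]:
            if t - prev - 1 > max_gap:      # gap too long: close the segment
                segments.append((start, prev + 1))
                start = t
            prev = t
        segments.append((start, prev + 1))
        # Prune segments that are unreasonably short.
        notes.extend((pitch, s, e) for s, e in segments if e - s >= min_len)
    return sorted(notes, key=lambda n: (n[1], n[0]))

# Hypothetical input: pitch 60 with a one-frame miss, plus a one-frame false alarm at 72.
example_frames = [{60}, {60}, set(), {60}, {60, 72}, {60}, set(), set(), set(), {60}]
print(connect_fill_prune(example_frames))
```

Each pitch is processed independently here, which is exactly the limitation the slide points out: nothing in this procedure looks at how simultaneous notes interact.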
Slide 5
Problems
- Contains many spurious notes caused by consistent MPE errors (usually octave/harmonic errors)
- Often violates instantaneous polyphony constraints
[Figure panels: Ground-truth; Results from the existing connect-fill-prune approach]
Slide 6
Our Idea
- Consider interactions between notes
- A generation-evaluation strategy
  - Generate a number of transcription candidates
  - Evaluate each candidate on how well its notes explain the audio as a whole
Slide 7
Proposed System
- Generate subsets as transcription candidates
- Evaluate candidates and select the best
[Duan, Pardo, & Zhang, 2010]
Slide 8
Note Sampling Strategies
What we want
- Sampling space not too big
- Only sample good notes
- Diversity in transcription candidates
- Candidates obey polyphony constraints
How to sample efficiently and effectively?
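One way to meet these requirements is sketched below. The slides do not spell out the sampling algorithm in text, so everything here is an assumption for illustration: the function name `sample_candidate`, the note tuple layout, and the weighting (note length times note likelihood, so that long and likely notes tend to be drawn first). A note is kept only if adding it does not exceed the polyphony limit in any frame it covers.

```python
import random
from typing import List, Tuple

# A note is (midi_pitch, onset_frame, offset_frame_exclusive, likelihood),
# with likelihood assumed to lie in (0, 1].
Note = Tuple[int, int, int, float]

def sample_candidate(notes: List[Note], max_polyphony: int, n_frames: int,
                     rng: random.Random) -> List[Note]:
    """Draw one transcription candidate as a subset of the input notes."""
    remaining = list(notes)
    sounding = [0] * n_frames       # number of accepted notes in each frame
    chosen: List[Note] = []
    while remaining:
        # Assumed weighting: longer and more likely notes are drawn earlier.
        weights = [(off - on) * lik for _, on, off, lik in remaining]
        note = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(note)
        on, off = note[1], note[2]
        # Accept the note only if the polyphony constraint still holds.
        if all(sounding[t] < max_polyphony for t in range(on, off)):
            for t in range(on, off):
                sounding[t] += 1
            chosen.append(note)
    return chosen

# Hypothetical notes: three fully overlapping notes and one isolated note.
example_notes = [(48, 0, 10, 0.9), (60, 0, 10, 0.5), (64, 0, 10, 0.8), (67, 12, 15, 0.4)]
print(sample_candidate(example_notes, 2, 15, random.Random(0)))
```

Repeating the draw with different random states gives different subsets, which is where the diversity among candidates comes from in this sketch. Because every candidate is a subset of the input notes, the sampling space stays bounded by the notes already found.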
Slide 9
Note Sampling Algorithm
Slide 10
Note Likelihood
- Indicates how good the note is by itself
- Also called salience, activation, strength
- Note likelihood = geometric mean of the single-pitch likelihoods of the pitches in the note
- Multi-pitch estimation algorithms almost always estimate a likelihood (salience) for each pitch estimate
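The geometric mean is most easily computed in the log domain, where it becomes the arithmetic mean of the log-likelihoods. The sketch below assumes the per-frame single-pitch likelihoods of a note are available as log values; the function name `note_log_likelihood` and the input numbers are made up for illustration.

```python
import math
from typing import Sequence

def note_log_likelihood(pitch_log_likelihoods: Sequence[float]) -> float:
    """Log of the geometric mean of the single-pitch likelihoods in a note.

    log((p1 * p2 * ... * pn) ** (1 / n)) = (log p1 + ... + log pn) / n
    """
    if not pitch_log_likelihoods:
        raise ValueError("a note must contain at least one pitch estimate")
    return math.fsum(pitch_log_likelihoods) / len(pitch_log_likelihoods)

# Hypothetical log single-pitch likelihoods of the frames in one note.
print(note_log_likelihood([-338.8, -340.1, -339.5]))
```

Working with logs avoids multiplying many very small likelihoods directly. Taking the mean rather than the sum keeps the value comparable between short and long notes.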
Slide 11
Candidate Evaluation
Slide 12
Single-pitch vs. Multi-pitch Likelihood
Single-pitch likelihood (salience): basis of the note likelihood
- E.g., total spectral energy at its harmonic positions
- Describes how well a pitch fits the audio individually
- A correct pitch usually has a high likelihood
- Octave/harmonic errors may also have a high likelihood
Multi-pitch likelihood: basis of the transcription likelihood
- Defined as the match between spectral peaks and harmonics of all pitches
- Describes how well a set of pitches explains the audio as a whole
- Adding pitches in octave/harmonic relations to pitches already in the set would not improve the likelihood much
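The difference between the two views can be illustrated with a toy score: the fraction of spectral peaks that lie close to a harmonic of at least one pitch in a set. This is only a stand-in to show the effect, not the likelihood model of [Duan, Pardo, & Zhang, 2010]. The function names, the 0.5% tolerance, the number of harmonics, and the synthetic peak list are all assumptions.

```python
from typing import Iterable, List

def harmonics(f0: float, n: int = 10) -> List[float]:
    """Frequencies of the first n harmonics of a fundamental f0 (Hz)."""
    return [f0 * k for k in range(1, n + 1)]

def explained_fraction(peaks: Iterable[float], f0s: Iterable[float],
                       tol: float = 0.005) -> float:
    """Fraction of peaks within relative tolerance tol of a harmonic of some f0."""
    peaks = list(peaks)
    all_harmonics = [h for f0 in f0s for h in harmonics(f0)]
    hits = sum(any(abs(p - h) <= tol * h for h in all_harmonics) for p in peaks)
    return hits / len(peaks)

# Equal-tempered fundamentals in Hz.
C3, C4, E4 = 130.81, 261.63, 329.63
# Synthetic mixture: 8 harmonics of C3 plus 4 harmonics of E4.
peaks = harmonics(C3, 8) + harmonics(E4, 4)

for name, f0s in [("{C4}", [C4]), ("{C3}", [C3]), ("{C3, C4}", [C3, C4]), ("{C3, E4}", [C3, E4])]:
    print(name, explained_fraction(peaks, f0s))
```

In this toy setting the octave error C4 explains a fair share of the peaks on its own, yet adding it to {C3} explains nothing new, because its harmonics coincide with harmonics of C3. Adding E4 does explain additional peaks.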
Slide 13
An Example
Trombone: C3, Violin: E4

Pitch candidate               C3        C4         E4
Log single-pitch likelihood   -338.8    -466.9     -475

Pitch set candidate           {C3}      {C3, C4}   {C3, E4}
Log multi-pitch likelihood    -338.8    -346.2     -318.9

Higher value is better
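The numbers in this example can be checked with a few lines of code. The values are the ones given on the slide; the variable names are chosen for illustration.

```python
# Log single-pitch likelihoods from the slide.
log_single = {"C3": -338.8, "C4": -466.9, "E4": -475.0}

# Log multi-pitch likelihoods of the pitch set candidates from the slide.
log_multi = {
    ("C3",): -338.8,
    ("C3", "C4"): -346.2,
    ("C3", "E4"): -318.9,
}

# Ranked individually, the octave error C4 looks better than the correct E4.
second_by_salience = max(["C4", "E4"], key=log_single.get)

# Evaluated as a set, {C3, E4} explains the audio best.
best_set = max(log_multi, key=log_multi.get)

print(second_by_salience, best_set)
```

Adding C4 to {C3} lowers the log multi-pitch likelihood from -338.8 to -346.2, while adding E4 raises it to -318.9, even though E4 has the lower single-pitch likelihood of the two.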
Slide 14
Experiments
- Bach10 dataset: 110 polyphonic combinations derived from 10 pieces of 4-part J.S. Bach chorales, played by violin, clarinet, saxophone, and bassoon
  - 60 duets, 40 trios, 10 quartets
- Comparison methods
  - Benetos13: shift-invariant PLCA (frame-level) + median filtering of pitch activity matrix (note-level)
  - Klapuri06: iterative spectral subtraction (frame-level) + our preliminary note tracking (note-level)
Slide 15
Performance Measures
Slide 16
Comparison with state of the art
Slide 17
Works with state of the art
Slide 18
Example
Slide 19
Conclusions
- A new method for note-level transcription, considering note interactions
  - Generate transcription candidates by sampling notes according to note length and note likelihood, derived from single-pitch likelihood
  - Evaluate candidates according to transcription likelihood, derived from multi-pitch likelihood
- Good performance against state of the art
- Can work with any MPE or note tracking algorithm, as long as single-pitch likelihood (salience) is calculated
Slide 20
Limitations and Future Work
- Only removes spurious notes, but cannot add back missed notes
- Different runs of sampling are independent
- A better sampling technique
  - E.g., using Markov chain Monte Carlo to add back missed notes and to consider dependencies between different runs of sampling
- A better evaluation technique
  - E.g., considering musical knowledge to evaluate the musical plausibility of transcription candidates