Automatic transcription of polyphonic piano music using a note masking technique


Mr Ronan Kelly and Dr Jacqueline Walker

Department of Electronic & Computer Engineering

University of Limerick

Ronan.kelly@ericsson.com, jacqueline.walker@ul.ie

Overview

• Music transcription

• Our approach

• Onset detection

• Algorithm

• Results

• Conclusions

Music Transcription

• Complex cognitive task

Example: Top of the Pops!

• A challenging task for a computer, but one that pushes the boundaries of signal processing, pattern recognition, machine learning, …

Monophonic Music Transcription

• A solved problem

– Sliding window-based analysis of melody line

– Steps: decimate (reduce data), onset detection, FFT or constant Q transform, note detection

Polyphonic Music Transcription

• Multiple simultaneous notes

• In Western Tonal Music (WTM), notes played together almost inevitably share harmonics

• Impact of rhythms, held notes

• Possibility of multiple instruments
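The shared-harmonics point can be illustrated with a minimal sketch. It assumes ideal harmonic partials (exact integer multiples of the fundamental), equal temperament with A4 = 440 Hz, and a 1% matching tolerance; none of these choices come from the slides, and the function names are hypothetical.

```python
def note_freq(midi_number):
    """Equal-tempered frequency in Hz, assuming A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2 ** ((midi_number - 69) / 12)


def shared_harmonics(f1, f2, n_harmonics=8, tol=0.01):
    """Return (i, j) pairs where harmonic i of f1 lies within a relative
    tolerance of harmonic j of f2 (idealised, assumed model)."""
    shared = []
    for i in range(1, n_harmonics + 1):
        for j in range(1, n_harmonics + 1):
            a, b = i * f1, j * f2
            if abs(a - b) <= tol * min(a, b):
                shared.append((i, j))
    return shared


# C4 (MIDI 60) and G4 (MIDI 67): the 3rd harmonic of C4 nearly coincides
# with the 2nd harmonic of G4, and the 6th with the 4th.
pairs = shared_harmonics(note_freq(60), note_freq(67))
```

Under these assumptions a single spectral peak near 784 Hz could belong to either note, which is the ambiguity a polyphonic transcriber has to resolve.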

Approaches to Polyphonic Transcription

• Human audition based

– Martin Cooke, “Modelling Auditory Processing and Organisation”, 1993

– Brown & Cooke, “Computational Auditory Scene Analysis”, 1994

• Signal processing based

– Tanguiane, “Artificial Perception and Music Recognition”, 1993

– Klapuri et al., since 1998

Our Approach

• Onset Detection

• Note Window & FFT

• Masking Scheme Iteration

Onset Detection

• NAE (Note Average Energy) onset detection [1]

[1] Liu, R., Griffith, J., Walker, J. & Murphy, P., "Time Domain Note Average Energy Based Music Onset Detection", Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), August 6-9, 2003, Stockholm, Sweden.

Figure 3: Energy e(t) (b), averaged energy a(t) (c) and note average energy NAE(t) (d) of power envelope p(t) (a).

In practice, we search for local minima…

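$$\mathrm{NAE}(t) = \frac{1}{t - t_n} \int_{t_n}^{t} p(\tau)\, d\tau$$

where $t_n$ and $t_{n+1}$ are consecutive onset times and $t$ lies between them.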
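As a rough illustration of the NAE definition, the sketch below computes a discrete running mean of a sampled power envelope since the most recent onset. It assumes the onset positions are already known as sample indices and that the sample at the onset itself is included in the mean (which avoids a division by zero); the function name and the sample values are hypothetical, and the search for local minima mentioned on the slide is not implemented.

```python
def note_average_energy(p, onsets):
    """Discrete NAE: mean of the power envelope p from the most recent
    onset index up to and including the current sample (assumed form)."""
    onset_set = set(onsets)
    nae = []
    total, count = 0.0, 0
    for k, value in enumerate(p):
        if k in onset_set:
            # A new note starts here: restart the running average.
            total, count = 0.0, 0
        total += value
        count += 1
        nae.append(total / count)
    return nae


# Hypothetical power envelope with onsets at samples 0 and 3
envelope = [4.0, 2.0, 0.0, 6.0, 2.0]
nae = note_average_energy(envelope, onsets=[0, 3])
```

Resetting the average at each onset is what makes the measure track the energy of the current note rather than of the whole signal.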

Note Window

• FFT performed on the whole note

• Avoids start-of-note and end-of-note effects

• Gives greater robustness against noise
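The whole-note window can be sketched as follows. To stay within the standard library, this example uses a direct DFT in place of an FFT routine; the spectrum is the same, only slower to compute. The function name, the sampling rate, and the synthetic test tone are assumptions made for illustration.

```python
import cmath
import math


def note_window_spectrum(samples, sample_rate, onset, next_onset):
    """Amplitude spectrum of the whole note between two onset indices.

    Returns (frequency_hz, amplitude) pairs. The 2/n scaling recovers the
    amplitude of a sinusoid for bins other than DC and Nyquist.
    """
    window = samples[onset:next_onset]
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        spectrum.append((k * sample_rate / n, 2 * abs(acc) / n))
    return spectrum


# Hypothetical signal: a 100 Hz sine sampled at 800 Hz for one second,
# with a note window running from sample 200 to sample 600.
rate = 800
signal = [math.sin(2 * math.pi * 100 * i / rate) for i in range(rate)]
spectrum = note_window_spectrum(signal, rate, 200, 600)
peak_freq, peak_amp = max(spectrum, key=lambda fa: fa[1])
```

Because the window spans the whole note, its length (and so the frequency resolution) varies from note to note; here 400 samples at 800 Hz give 2 Hz bins.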

Algorithm for Masking Scheme - 1

• FFT on note window

• Find max peak in window

• Remove peak from window; add to list

• Continue until no peaks above threshold
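The peak-picking loop can be sketched as below. It assumes the spectrum has already been reduced to candidate peaks held in a dictionary of frequency to amplitude; the function name and the threshold value are hypothetical. The amplitudes are those of the C4, E4, G4 example in these slides.

```python
def pick_peaks(spectrum, threshold):
    """Repeatedly take the largest remaining peak until none exceeds
    the threshold. spectrum maps frequency (Hz) to amplitude."""
    remaining = dict(spectrum)  # work on a copy
    peaks = []
    while remaining:
        freq = max(remaining, key=remaining.get)
        if remaining[freq] <= threshold:
            break  # no peaks above threshold are left
        peaks.append((freq, remaining.pop(freq)))
    return peaks


# Detected peaks for the C4, E4, G4 chord; threshold 5.0 is an assumption
chord = {262: 11.2, 330: 21.4, 392: 29.9, 523: 7.1}
peaks = pick_peaks(chord, threshold=5.0)
```

The list comes out ordered by amplitude, so 392 Hz is found first and 330 Hz second, matching the maximum and next peak quoted for this chord.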

Algorithm for Masking Scheme - 2

• Apply mask to first (lowest) frequency in list

• Adjust amplitudes of all affected frequencies by mask

• Add frequency to note list; move to next frequency

• Continue until list is empty
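A minimal sketch of the masking iteration follows. The slides do not state the exact amplitude adjustment, so this version assumes subtraction of (mask ratio × fundamental amplitude), floored at zero, and drops any peak that falls to or below a threshold; intermediate amplitudes may therefore differ from the worked example in the slides. The function name, the ±2 Hz matching tolerance, and the threshold are also assumptions.

```python
def apply_masks(peaks, masks, threshold, tol_hz=2.0):
    """Walk the peak list from the lowest frequency upwards.

    peaks: list of (frequency_hz, amplitude)
    masks: {fundamental_hz: {harmonic_hz: ratio_of_fundamental}}
    Returns the list of frequencies accepted as notes.
    """
    remaining = sorted(peaks)  # lowest frequency first
    notes = []
    while remaining:
        f0, a0 = remaining.pop(0)
        notes.append(f0)
        # Use the stored mask whose fundamental matches, if any.
        mask = next((m for f, m in masks.items() if abs(f - f0) <= tol_hz), {})
        adjusted = []
        for freq, amp in remaining:
            for harmonic, ratio in mask.items():
                if abs(harmonic - freq) <= tol_hz:
                    # Assumed rule: subtract the masked contribution.
                    amp = max(0.0, amp - ratio * a0)
            if amp > threshold:
                adjusted.append((freq, amp))
        remaining = adjusted
    return notes


# Peaks and C4 mask values taken from the slides; threshold is hypothetical
peaks = [(392, 29.9), (330, 21.4), (262, 11.2), (523, 7.1)]
c4_mask = {262: {262: 1.0, 523: 0.72, 784: 0.41}}
notes = apply_masks(peaks, c4_mask, threshold=5.0)
```

With the C4 mask the 523 Hz peak is attributed to the second harmonic of C4 and removed, leaving 262, 330 and 392 Hz; without any mask the same input would also report 523 Hz as a note.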

Masking Scheme - 1

C4, E4, G4: 262 Hz, 330 Hz, 392 Hz

Max. peak amplitude = 29.9 @ 392 Hz (G4)

Next peak amplitude = 21.4 @ 330 Hz (E4)

Masking Scheme - 2

Detected frequency peaks

Frequency (Hz)    Amplitude
262               11.2
330               21.4
392               29.9
523               7.1

Note mask

Frequency (Hz)    Amplitude
260, 261, 262     100%
523, 524          72%
784, 785          41%

Masking Scheme - 3

Masking action (chart of amplitude against frequency in Hz; series: C4 mask, values detected)

After masking: remaining detected values

Frequency (Hz)    Amplitude
330               21.4
392               29.9
523               3.1

Note played: C4

Building a Note Mask - 1

A note is played with other notes and the significant frequency peaks and amplitudes recorded:

harmonics of D4 in red

D4 harmonics in common in blue

Building a Note Mask - 2

Figure: amplitude against frequency (Hz) for two note pairs. Panels: D4 and C4 (series: D4 values, C4 values); D4 and A4 (series: D4 values, A4 values, D4 + A4 values).

Building a Note Mask - 3

Extract values unique to D4 and normalise to amplitude of highest peak:

Frequency (Hz)  D4, C4  D4, E4  D4, F4  D4, G4  D4, A4  D4, B4
294  1  1  1  1  1  1
587  0.70  0.67  0.76  0.75  0.84  0.65
881  0.38  0.37  0.44  0.44  0.40
1175  0.11  0.12
1468  0.17  0.16  0.15  0.17  0.14
1762  0.12  0.11  0.12
2056  0.27  0.25  0.28  0.28  0.30  0.18

Building a Note Mask - 3

Average across samples:

D4 Mask (amplitude as % of fundamental frequency amplitude)

Frequency (Hz)    Amplitude
294               100%
587               72.69%
881               40.63%
1175              11.49%
1468              15.93%
1762              11.61%
2056              26.03%
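The normalise-then-average step can be sketched as follows. Each sample is a dictionary of harmonic frequency to amplitude for one note pairing, with harmonics that were not recorded simply left out; the average is taken over the samples in which a harmonic appears, which is an assumption about how the blanks in the table were handled. Function names are hypothetical. The inputs are the complete 294, 587 and 2056 Hz rows of the D4 table.

```python
def normalise(peaks):
    """Scale amplitudes so the highest peak equals 1.0."""
    top = max(peaks.values())
    return {freq: amp / top for freq, amp in peaks.items()}


def average_mask(samples):
    """Average each harmonic's ratio over the samples that contain it."""
    totals, counts = {}, {}
    for sample in samples:
        for freq, ratio in sample.items():
            totals[freq] = totals.get(freq, 0.0) + ratio
            counts[freq] = counts.get(freq, 0) + 1
    return {freq: totals[freq] / counts[freq] for freq in sorted(totals)}


# Already-normalised D4 values for the six pairings (C4, E4, F4, G4, A4, B4)
rows_587 = [0.70, 0.67, 0.76, 0.75, 0.84, 0.65]
rows_2056 = [0.27, 0.25, 0.28, 0.28, 0.30, 0.18]
samples = [{294: 1.0, 587: a, 2056: b} for a, b in zip(rows_587, rows_2056)]
d4_mask = average_mask(samples)
```

From these rounded table entries the sketch gives about 72.8% at 587 Hz and 26.0% at 2056 Hz, close to the 72.69% and 26.03% listed in the D4 mask; the small difference is presumably due to rounding in the table.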

Experimental Set-up

• Keyboard used: Technics KN800 PCM Keyboard

• Note range: C2 to B6

• Recording – direct using line-in

• Isolated chords and polyphonic music samples

Results

How to define error?

Need to account for both missed notes (m) and spurious notes (x)

$$E_{\%} = \frac{m + x}{n} \times 100\%$$

n is number of notes detected – not number of notes played
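The error measure is simple to compute directly. The sketch below uses a hypothetical function name and checks it against the first two rows of the isolated-chord results in these slides.

```python
def total_error_percent(missed, spurious, detected):
    """Total error: missed plus spurious notes as a percentage of the
    number of notes detected (not the number played)."""
    return 100.0 * (missed + spurious) / detected


# Chords of 5-8 notes: 18 missed, 0 spurious, 225 detected
error_large_chords = total_error_percent(18, 0, 225)
# Chords of 3-4 notes: 15 missed, 5 spurious, 638 detected
error_small_chords = total_error_percent(15, 5, 638)
```

Dividing by notes detected means that a transcription with many spurious notes also has a larger denominator, so the percentage grows more slowly than it would if divided by notes played.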

Results – Isolated Chords

                     Notes played   Notes detected   Missed notes   Spurious notes   Total Error (%)
Chords, 5-8 notes    243            225              18             0                8.0
Chords, 3-4 notes    648            638              15             5                3.1
Chords               1898           1906             69             77               7.7

Results – Polyphonic Music

                        Notes played   Notes detected   Missed notes   Spurious notes   Total Error (%)
Danny Boy (slow)        87             94               7              14               22
Danny Boy (moderate)    91             98               8              15               23.5
Danny Boy (fast)        90             99               8              17               25

Effect of Onset Detection

• Effective onset detection is crucial

• Two types of errors:

– Extra onset: less likely to cause a problem, but … note divided up too finely

– Missing onset: note windows not placed ‘correctly’

Results with Onset Detection

                        Notes played   Notes detected   Missed notes   Spurious notes   Total Error (%)
Danny Boy (slow)        87             120              10             43               44
Danny Boy (moderate)    91             120              17             28               44
Danny Boy (fast)        90             120              23             37               58

Future Work

• Develop model for note combinations (polyphonic note masks)

• Use wider range of note combinations

• Develop an efficient approach to applying polyphonic note masks

• Improve note onset detection
