Onset Detection in Audio Music


J.-S. Roger Jang (張智星)

http://mirlab.org/jang

MIR Lab, CSIE Dept.

National Taiwan University


What Are Onsets?

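The energy profile of a note played on an instrument is usually modeled by the four ADSR stages: attack, decay, sustain, and release.

The onset is the time during the attack stage at which the slope of the energy profile is steepest.

Soft onsets, such as those produced by violins, are much harder to define and detect.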


Onset Detection

Music types
- Monophonic: easier
- Polyphonic: harder

Instrument types
- Percussive instruments: easier
- String instruments: harder (soft onsets)


Why Is Onset Detection Important?

Onset detection is fundamental to audio music analysis:
- Music transcription (from wave to MIDI)
- Music editing (song segmentation)
- Tempo estimation and beat tracking
- Musical fingerprinting (the onset trace can serve as a robust feature for fingerprinting)


Onset Detection Function

An ODF (onset detection function) creates a curve of onset strength, also known as:
- Onset strength curve (OSC)
- Novelty curve

Most ODFs are based on a time-frequency representation (spectrogram):
- Magnitude of the STFT (short-time Fourier transform)
- Phase of the STFT
- Mel-band representation of the STFT
- Constant-Q transform


ODF: Spectral Flux

Concept: sum the positive change in each frequency bin.

$$sf(n) = \sum_{k=1}^{N} h\big(|X(n,k)| - |X(n-1,k)|\big)$$

where $n$ is the time index, $k$ is the frequency index, $N$ is the frame size, and

$$h(x) = \frac{x + |x|}{2} = \max(0, x)$$

is the half-wave rectifier.
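As a minimal sketch of the spectral flux defined above, the following plain-Python function sums the half-wave rectified change in each frequency bin between consecutive frames. The function name `spectral_flux`, the frame-major layout (one list of bin magnitudes per frame), and the toy values are assumptions made for illustration; they are not taken from the toolbox.

```python
def spectral_flux(mag_spec):
    """Spectral flux of a magnitude spectrogram.

    mag_spec: list of frames, each a list of bin magnitudes |X(n, k)|.
    Returns one value per pair of consecutive frames.
    """
    flux = []
    for prev, curr in zip(mag_spec, mag_spec[1:]):
        # Half-wave rectifier h(x) = max(0, x), summed over frequency bins.
        flux.append(sum(max(0.0, c - p) for p, c in zip(prev, curr)))
    return flux


# Toy magnitude spectrogram: 4 frames x 3 bins (made-up values).
frames = [
    [0.0, 0.0, 0.0],
    [1.0, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.5, 2.0, 1.0],
]
sf = spectral_flux(frames)
print(sf)  # [1.5, 0.0, 2.5]
```

Only increases in magnitude contribute, so the drop in the first bin between the second and third frames adds nothing to the flux.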


Flowchart of OSC

Steps of OSC:
1. Spectrogram
2. Mel-band spectrogram
3. Spectral flux
4. Smoothed OSC via Gaussian smoothing
5. Trend of OSC via Gaussian smoothing
6. Trend-subtracted OSC

Check out wave2osc.m to see these steps.


Mel-freq Spectrogram

40 filters in Mel-freq filter bank

[Figure: the 40-filter mel-frequency filter bank M, plotted against linear frequency bins, and two spectrograms over frame index: spec1 (linear frequency) and spec2 (mel frequency).]

spec2 = M*spec1

See melBinPlot.m
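The mapping spec2 = M*spec1 can be sketched as follows. This is an illustration only, not the contents of melBinPlot.m. It assumes triangular filters with unit peak height, the common conversion mel = 2595 log10(1 + f/700), 129 linear bins, and a 16 kHz sample rate; the toolbox may use a different normalization, bin count, or mel formula. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    # One common mel-scale definition (an assumption here).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(num_filters, num_bins, sample_rate):
    """Return M as num_filters rows x num_bins columns of triangular weights."""
    nyquist = sample_rate / 2.0
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    top = hz_to_mel(nyquist)
    hz_edges = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    # Center frequency (Hz) of each linear bin, from 0 to Nyquist.
    bin_hz = [nyquist * k / (num_bins - 1) for k in range(num_bins)]
    bank = []
    for i in range(num_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filter_bank(bank, frame):
    """Matrix-vector product: one mel-band frame from one linear-bin frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in bank]


M = mel_filter_bank(num_filters=40, num_bins=129, sample_rate=16000)
mel_frame = apply_filter_bank(M, [1.0] * 129)
print(len(M), len(M[0]), len(mel_frame))  # 40 129 40
```

Applying `apply_filter_bank` to every frame of the linear spectrogram gives the mel-band spectrogram one column at a time.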


Mel-freq Representation

About the mel-frequency spectrogram:
- Advantage: it correlates better with human perception (just like MFCC in speech recognition).
- Its degree of effectiveness has yet to be verified.


Spectral Flux

[Figure: linear spectrogram (frequency bins vs. frame index) and the OSC obtained via spectral flux (vs. frame index).]

rawOsc=mean(max(0, diff(magSpec, 1, 2)));


Smoothing and Trend Removal

Smoothing Trend removal
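A rough sketch of the smoothing and trend-removal steps, written in plain Python. The kernel widths (`smooth_sigma`, `trend_sigma`), the edge handling, and the choice to estimate the trend from the already smoothed curve are all assumptions for illustration; wave2osc.m may do this differently.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights covering about +/- 3 sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Gaussian smoothing; indices are clamped at both ends of the signal."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def trend_subtracted_osc(raw_osc, smooth_sigma=2.0, trend_sigma=20.0):
    smoothed = smooth(raw_osc, smooth_sigma)   # narrow kernel: remove jitter
    trend = smooth(smoothed, trend_sigma)      # wide kernel: slow trend
    return [s - t for s, t in zip(smoothed, trend)]


# A single spike in an otherwise silent curve stays centered after processing.
raw = [0.0] * 101
raw[50] = 1.0
osc = trend_subtracted_osc(raw)
print(osc.index(max(osc)))  # 50
```

A narrow kernel keeps the peaks while suppressing frame-to-frame noise; a wide kernel follows only the slow loudness drift, so subtracting it leaves peaks measured relative to their local background.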


Example of OSC

Try “wave2osc.m”

[Figure: four panels over time (sec): spectrogram (freq. bin index); OSC, original and smoothed (amplitude); smoothed OSC and its trend (amplitude); trend-subtracted OSC (amplitude).]


What Can You Do With OSC...

- OSC → onsets: pick peaks to obtain onsets
- OSC → tempo (BPM, beats per minute): apply ACF (autocorrelation function) or another PDF (periodicity detection function) to find the BPM
- OSC → beat tracking: pick equally spaced peaks to obtain beat positions
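The first item, picking peaks of the OSC to obtain onsets, might look like the sketch below. The threshold, the minimum gap between peaks, and the names `pick_peaks` and `frames_to_seconds` are hypothetical; the slides do not specify a particular peak-picking rule.

```python
def pick_peaks(osc, threshold, min_gap):
    """Indices of local maxima above threshold, at least min_gap frames apart."""
    candidates = [
        i for i in range(1, len(osc) - 1)
        if osc[i] > threshold and osc[i] > osc[i - 1] and osc[i] >= osc[i + 1]
    ]
    peaks = []
    # Visit the strongest candidates first so weaker neighbors are the ones dropped.
    for i in sorted(candidates, key=lambda idx: osc[idx], reverse=True):
        if all(abs(i - p) >= min_gap for p in peaks):
            peaks.append(i)
    return sorted(peaks)


def frames_to_seconds(indices, hop_size, sample_rate):
    """Convert frame indices to onset times in seconds."""
    return [i * hop_size / sample_rate for i in indices]


# Made-up onset strength values, one per frame.
osc = [0.0, 0.1, 0.9, 0.2, 0.0, 0.3, 0.1, 0.0, 0.8, 0.7, 0.1]
onsets = pick_peaks(osc, threshold=0.25, min_gap=3)
print(onsets)  # [2, 5, 8]
```

Raising `min_gap` to 4 drops the weak peak at frame 5, since it then sits too close to both of its stronger neighbors.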


Beat Tracking

Demos http://mirlab.org/demo/beatTracking

Try “beatTrack” in SAP toolbox


Example of Beat Tracking

beatTrack.m

[Figure panels: tempo estimation via ACF; 8 candidate sets for beat positions; identified beat positions.]
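Tempo estimation via ACF can be illustrated with a small sketch: autocorrelate the OSC and take the lag with the strongest value inside a plausible tempo range. This is not the code of beatTrack.m. The BPM search range, the OSC frame rate, and the function names are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def estimate_bpm(osc, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Pick the ACF peak within the lag range allowed by [min_bpm, max_bpm]."""
    min_lag = int(round(frame_rate * 60.0 / max_bpm))  # fastest tempo
    max_lag = int(round(frame_rate * 60.0 / min_bpm))  # slowest tempo
    acf = autocorrelation(osc, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return 60.0 * frame_rate / best_lag


# Synthetic OSC: one pulse every 50 frames at 100 frames per second,
# that is, one beat every 0.5 s.
frame_rate = 100.0
osc = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
bpm = estimate_bpm(osc, frame_rate)
print(bpm)  # 120.0
```

Restricting the lag range matters because the ACF also peaks at multiples of the beat period, which correspond to half tempo, third tempo, and so on.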

Performance Indices of Beat Tracking

There are many performance indices for beat tracking (BT):
- Check out the audio beat tracking task of MIREX

Most commonly adopted ones:
- Precision, recall, F-measure, accuracy

Try simSequence.m in SAP toolbox

[Figure: computed beats compared against ground-truth (GT) beats, with 3 true positives (TP), 3 false positives (FP), and 2 false negatives (FN).]

Precision = tp/(tp+fp) = 3/(3+3) = 0.5
Recall = tp/(tp+fn) = 3/(3+2) = 0.6
F-measure = tp/(tp+(fn+fp)/2) = 3/(3+(2+3)/2) = 0.545
Accuracy = tp/(tp+fn+fp) = 3/(3+2+3) = 0.375
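The four indices above can be checked with a short sketch. The counting step assumes a greedy one-to-one match of computed beats to ground-truth beats within a fixed tolerance; this is one possible convention, not necessarily what simSequence.m does. The beat times and the tolerance are made-up values chosen to give tp=3, fp=3, fn=2.

```python
def match_beats(computed, ground_truth, tolerance):
    """Greedy one-to-one matching; returns (tp, fp, fn)."""
    unmatched = sorted(ground_truth)
    tp = 0
    for t in sorted(computed):
        hit = next((g for g in unmatched if abs(g - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)  # each ground-truth beat is used at most once
            tp += 1
    fp = len(computed) - tp
    fn = len(unmatched)
    return tp, fp, fn


def beat_scores(tp, fp, fn):
    """Precision, recall, F-measure, and accuracy as defined on the slide."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f_measure": tp / (tp + (fn + fp) / 2),
        "accuracy": tp / (tp + fn + fp),
    }


# Hypothetical beat times in seconds.
computed = [0.5, 1.0, 1.6, 2.5, 3.0, 4.2]
ground_truth = [0.52, 1.55, 3.02, 3.8, 5.0]
tp, fp, fn = match_beats(computed, ground_truth, tolerance=0.07)
scores = beat_scores(tp, fp, fn)
print(tp, fp, fn)                                    # 3 3 2
print({k: round(v, 3) for k, v in scores.items()})
```

The sketch does not guard against empty inputs, where the denominators become zero.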
