Onset Detection in Audio Music
J.-S Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
-2-
What Are Onsets?
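The energy profile of a note played by an instrument is usually modeled by four stages: attack, decay, sustain, and release (ADSR).
The onset is the time at which the slope of the energy profile is highest, which falls within the attack stage.
Soft onsets, such as those produced by violins, are much harder to define and detect.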
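To make the "highest slope" definition concrete, here is a minimal Python sketch (not from the slides) that builds a sampled ADSR envelope and locates the onset as the sample with the largest first difference. The stage lengths, the sustain level, and the raised-cosine shape of the attack are illustrative assumptions, and the function names are hypothetical.

```python
import math


def adsr_envelope(silence, attack, decay, sustain, release, sustain_level=0.6):
    """Hypothetical sampled ADSR envelope preceded by `silence` zero samples.

    The attack uses a raised-cosine ramp (an assumption, chosen so that the
    slope peaks inside the attack); decay and release are linear.
    """
    env = [0.0] * silence
    for i in range(1, attack + 1):                 # attack: 0 -> 1
        env.append(0.5 * (1.0 - math.cos(math.pi * i / attack)))
    for i in range(1, decay + 1):                  # decay: 1 -> sustain_level
        env.append(1.0 - (1.0 - sustain_level) * i / decay)
    env.extend([sustain_level] * sustain)          # sustain: constant level
    for i in range(1, release + 1):                # release: sustain_level -> 0
        env.append(sustain_level * (1.0 - i / release))
    return env


def onset_by_max_slope(env):
    """Return the sample index n that maximizes env[n + 1] - env[n]."""
    slopes = [env[n + 1] - env[n] for n in range(len(env) - 1)]
    return max(range(len(slopes)), key=slopes.__getitem__)


env = adsr_envelope(silence=20, attack=10, decay=15, sustain=40, release=30)
onset = onset_by_max_slope(env)   # an index inside the attack stage (samples 20-29)
```

Decay and release have negative slopes and sustain has zero slope, so the maximum can only occur during the attack, which is exactly what the definition says.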
-3-
Onset Detection
Music types: monophonic (easier) vs. polyphonic (harder)
Instrument types: percussive instruments (easier) vs. string instruments (harder, due to soft onsets)
-4-
Why Is Onset Detection Important?
It is fundamental to audio music analysis, including:
  Music transcription (from wave to MIDI)
  Music editing (song segmentation)
  Tempo estimation and beat tracking
  Musical fingerprinting (the onset trace can serve as a robust feature for fingerprinting)
-5-
Onset Detection Function
An ODF (onset detection function) produces a curve of onset strength, also known as the onset strength curve (OSC) or novelty curve.
Most ODFs are based on a time-frequency representation (spectrogram), such as:
  Magnitude of the STFT (short-time Fourier transform)
  Phase of the STFT
  Mel bands of the STFT
  Constant-Q transform
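Since most ODFs start from the STFT magnitude, the following minimal Python sketch computes |X(n, k)| frame by frame. The periodic Hann window, the frame size, and the hop size are assumptions, and a naive DFT is used only so the example needs nothing beyond the standard library; a real system would use an FFT.

```python
import cmath
import math


def hann(size):
    """Periodic Hann window, a common choice for STFT analysis."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / size) for i in range(size)]


def stft_magnitude(x, frame_size, hop):
    """Return |X(n, k)| as a list of frames n, each holding bins k = 0 .. frame_size // 2.

    Uses a naive O(N^2) DFT per frame so the example needs no third-party packages.
    """
    window = hann(frame_size)
    spectrogram = []
    for start in range(0, len(x) - frame_size + 1, hop):
        frame = [x[start + i] * window[i] for i in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            coef = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_size)
                       for i in range(frame_size))
            mags.append(abs(coef))
        spectrogram.append(mags)
    return spectrogram


# Usage: a 1 kHz sine sampled at 8 kHz falls exactly on bin 8 of a 64-point frame.
fs, frame_size, hop = 8000, 64, 32
x = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
spec = stft_magnitude(x, frame_size, hop)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Every frame's largest magnitude sits at bin 8, i.e. 8 × 8000 / 64 = 1000 Hz.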
-6-
ODF: Spectral Flux
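Concept: sum the positive change in each frequency bin.
$$ sf(n) = \sum_{k=1}^{N} h\big(X(n,k) - X(n-1,k)\big), \qquad h(x) = \frac{x + |x|}{2} = \max(0, x) $$
where $h(x)$ is also known as the half-wave rectifier, $n$ is the time (frame) index, $k$ is the frequency index, $N$ is the frame size, and $X(n-1,k)$ and $X(n,k)$ are the magnitudes of bin $k$ in two consecutive frames.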
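The spectral-flux formula above translates almost directly into code. This minimal Python sketch assumes the magnitude spectrogram is stored frame-major (`mag_spec[n][k]`); the function names are hypothetical.

```python
def half_wave_rectify(x):
    """h(x) = max(0, x): keep increases, discard decreases."""
    return max(0.0, x)


def spectral_flux(mag_spec):
    """sf(n) = sum_k h(X(n, k) - X(n - 1, k)) for n = 1 .. len(mag_spec) - 1.

    mag_spec[n][k] is the magnitude of frequency bin k in frame n.
    """
    return [
        sum(half_wave_rectify(cur - prev)
            for cur, prev in zip(mag_spec[n], mag_spec[n - 1]))
        for n in range(1, len(mag_spec))
    ]


# Usage: energy appears in frames 1 and 2, then everything decays in frame 3.
mag_spec = [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],  # bins 0 and 1 rise: 1 + 2 = 3
    [1.0, 1.0, 3.0],  # bin 1 falls (ignored), bin 2 rises: 3
    [0.0, 0.0, 0.0],  # everything falls: 0
]
print(spectral_flux(mag_spec))  # [3.0, 3.0, 0.0]
```

The half-wave rectifier is what makes this an onset detector: decays (offsets) contribute nothing. Averaging over bins instead of summing, as a `mean` would, only scales the curve by a constant.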
-7-
Flowchart of OSC
Steps to obtain the OSC:
  1. Spectrogram
  2. Mel-band spectrogram
  3. Spectral flux
  4. Smoothed OSC via Gaussian smoothing
  5. Trend of the OSC via Gaussian smoothing
  6. Trend-subtracted OSC
Check out wave2osc.m to see these steps.
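The six steps above can be chained as in the following Python sketch. It is not a port of wave2osc.m: it starts from an already computed magnitude spectrogram (step 1), and the function name `spec_to_osc`, the `mel_bank` argument, and both Gaussian widths are hypothetical choices for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights on offsets -ceil(3*sigma) .. ceil(3*sigma)."""
    radius = int(math.ceil(3.0 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(x, sigma):
    """Same-length Gaussian smoothing; near the edges the truncated kernel is renormalized."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = weight_sum = 0.0
        for j, w in enumerate(kernel):
            m = n + j - radius
            if 0 <= m < len(x):
                acc += w * x[m]
                weight_sum += w
        out.append(acc / weight_sum)
    return out


def spec_to_osc(mag_spec, mel_bank, smooth_sigma=2.0, trend_sigma=20.0):
    """Hypothetical pipeline: magnitude spectrogram -> trend-subtracted OSC.

    mag_spec[n][k]: magnitude of linear bin k in frame n (step 1 already done).
    mel_bank[j][k]: weight of linear bin k in mel band j.
    The two sigma values (in frames) are illustrative assumptions.
    """
    # Step 2: mel-band spectrogram, one band vector per frame.
    mel_spec = [[sum(w * v for w, v in zip(row, frame)) for row in mel_bank]
                for frame in mag_spec]
    # Step 3: spectral flux (half-wave rectified frame-to-frame change).
    raw = [sum(max(0.0, c - p) for c, p in zip(mel_spec[n], mel_spec[n - 1]))
           for n in range(1, len(mel_spec))]
    # Step 4: smoothed OSC.
    smoothed = gaussian_smooth(raw, smooth_sigma)
    # Step 5: trend of the smoothed OSC (a wider Gaussian, an assumption).
    trend = gaussian_smooth(smoothed, trend_sigma)
    # Step 6: trend-subtracted OSC.
    return [s - t for s, t in zip(smoothed, trend)]


# Usage: a toy 3-bin spectrogram with a single onset at frame 30.
mel_bank = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
mag_spec = [[0.0, 0.0, 0.0]] * 30 + [[1.0, 1.0, 1.0]] * 30
osc = spec_to_osc(mag_spec, mel_bank)
print(max(range(len(osc)), key=osc.__getitem__))  # 29: the flux between frames 29 and 30
```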
-8-
Mel-freq Spectrogram
40 filters in Mel-freq filter bank
[Figure: the 40 mel filters plotted against linear frequency bins; a linear-frequency spectrogram spec1 and the resulting mel-frequency spectrogram spec2, both plotted against frame index]
spec2=M*spec1, where M is the mel filter-bank matrix
See melBinPlot.m
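The mapping spec2=M*spec1 can be sketched in Python as follows. This is not melBinPlot.m: it assumes triangular filters and the HTK-style mel formula mel(f) = 2595·log10(1 + f/700), which is one common convention; the sample rate (16 kHz) and frame size (256) are hypothetical. spec1 is stored as bins × frames to match the matrix product on the slide.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, frame_size, fs):
    """Return M as n_filters rows of triangular weights over frame_size // 2 + 1 linear bins."""
    n_bins = frame_size // 2 + 1
    bin_hz = [k * fs / frame_size for k in range(n_bins)]
    top = hz_to_mel(fs / 2.0)
    # n_filters + 2 points equally spaced on the mel scale: edges and centers.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(n_filters):
        left, center, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))     # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filter_bank(bank, spec1):
    """spec2 = M * spec1, with spec1 stored as bins x frames as on the slide."""
    n_frames = len(spec1[0])
    return [[sum(bank[j][k] * spec1[k][t] for k in range(len(spec1)))
             for t in range(n_frames)]
            for j in range(len(bank))]


M = mel_filter_bank(n_filters=40, frame_size=256, fs=16000)
spec1 = [[1.0] * 5 for _ in range(129)]   # 129 linear bins x 5 frames (dummy values)
spec2 = apply_filter_bank(M, spec1)         # 40 mel bins x 5 frames
```

Because the filter centers are equally spaced in mel rather than in Hz, low-frequency filters are narrow and high-frequency filters are wide, which compresses roughly 129 linear bins into 40 perceptually spaced bands.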
-9-
Mel-freq Representation
Advantage of the mel-frequency spectrogram: it is more closely correlated with human perception (just like MFCC in speech recognition).
How much it actually improves onset detection is yet to be verified.
-10-
Spectral Flux
[Figure: linear spectrogram (frequency bins vs. frame index) and the resulting OSC via spectral flux]
rawOsc=mean(max(0, diff(magSpec, 1, 2)));   % magSpec: freq bins x frames; half-wave rectified frame-to-frame change, averaged over bins
-11-
Smoothing and Trend Removal
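Smoothing: the raw OSC is smoothed with a Gaussian kernel.
Trend removal: the trend of the smoothed OSC, also obtained via Gaussian smoothing, is subtracted from it.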
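Here is a minimal Python sketch of these two operations. The kernel widths (1 frame for smoothing, 15 frames for the trend) and the toy input are assumptions chosen only to show the effect: a slowly rising baseline is removed while the onset peaks survive.

```python
import math


def gaussian_smooth(x, sigma):
    """Same-length Gaussian smoothing; the kernel is truncated at 3 sigma and
    renormalized near the edges."""
    radius = int(math.ceil(3.0 * sigma))
    out = []
    for n in range(len(x)):
        acc = weight_sum = 0.0
        for m in range(max(0, n - radius), min(len(x), n + radius + 1)):
            w = math.exp(-0.5 * ((m - n) / sigma) ** 2)
            acc += w * x[m]
            weight_sum += w
        out.append(acc / weight_sum)
    return out


def smooth_and_detrend(raw_osc, smooth_sigma=1.0, trend_sigma=15.0):
    """Return (smoothed, trend, trend-subtracted) OSC; sigmas are in frames and
    are illustrative assumptions."""
    smoothed = gaussian_smooth(raw_osc, smooth_sigma)
    trend = gaussian_smooth(smoothed, trend_sigma)
    return smoothed, trend, [s - t for s, t in zip(smoothed, trend)]


# Usage: onset spikes every 20 frames riding on a slowly rising baseline.
raw = [0.01 * n + (1.0 if n % 20 == 10 else 0.0) for n in range(100)]
smoothed, trend, detrended = smooth_and_detrend(raw)
```

The narrow kernel suppresses frame-level jitter, while the wide kernel follows only slow changes such as a crescendo, so subtracting it leaves peaks that a single threshold can handle across the whole piece.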
-12-
Example of OSC
Try “wave2osc.m”
[Figure, four panels: spectrogram (frequency bin index vs. time in seconds); OSC, original and smoothed (amplitude vs. time); smoothed OSC and its trend; trend-subtracted OSC]
-13-
What Can You Do with the OSC?
OSC → onsets: pick peaks to obtain the onsets.
OSC → tempo (BPM, beats per minute): apply the ACF (autocorrelation function) or another PDF to find the BPM.
OSC → beat tracking: pick equally spaced peaks to obtain the beat positions.
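The first two uses can be sketched in a few lines of Python. Both are simplified assumptions rather than the SAP toolbox code: onsets are local maxima above a fixed threshold, and the tempo is the ACF lag with the largest value inside an assumed 60 to 200 BPM range, converted with BPM = 60 × frame_rate / lag.

```python
def pick_peaks(osc, threshold):
    """Onsets: frames that are local maxima of the OSC and exceed a threshold
    (on a flat top, the last frame of the plateau is kept)."""
    return [n for n in range(1, len(osc) - 1)
            if osc[n] > threshold and osc[n - 1] <= osc[n] > osc[n + 1]]


def autocorrelation(x, max_lag):
    """Unnormalized ACF r(lag) = sum_i x[i] * x[i + lag] for lag = 0 .. max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def estimate_bpm(osc, frame_rate, bpm_min=60.0, bpm_max=200.0):
    """Tempo: the ACF lag (in frames) with the largest value inside the BPM range."""
    min_lag = max(1, round(60.0 * frame_rate / bpm_max))
    max_lag = round(60.0 * frame_rate / bpm_min)
    r = autocorrelation(osc, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=r.__getitem__)
    return 60.0 * frame_rate / best_lag, best_lag


# Usage: an idealized OSC with one pulse every 0.5 s at 100 frames per second.
frame_rate = 100.0                                          # assumed 10 ms hop
osc = [1.0 if n % 50 == 25 else 0.0 for n in range(1000)]
onsets = pick_peaks(osc, threshold=0.5)                     # frames 25, 75, ..., 975
bpm, lag = estimate_bpm(osc, frame_rate)                    # 120.0 BPM at a lag of 50 frames
```

Restricting the lag range matters: without it, the ACF of a periodic OSC also peaks at multiples of the period, which would report half or a third of the true tempo.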
-14-
Beat Tracking
Demos: http://mirlab.org/demo/beatTracking
Try “beatTrack” in the SAP toolbox.
-15-
Example of Beat Tracking
beatTrack.m
[Figure, three panels: tempo estimation via ACF; 8 candidate sets for beat positions; identified beat positions]
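How beatTrack.m builds its 8 candidate sets is not described here, so the following Python sketch uses a simpler, different idea to illustrate "equally spaced peaks": given a beat period in frames (for example, from the ACF tempo estimate), it tries every phase and keeps the grid whose OSC values sum highest. The function name and the toy input are hypothetical.

```python
def track_beats(osc, period):
    """Simplified beat tracking (an assumption, not the beatTrack.m algorithm):
    among the `period` possible phases, keep the equally spaced grid of frames
    whose OSC values sum highest."""
    best_phase = max(range(period), key=lambda p: sum(osc[p::period]))
    return list(range(best_phase, len(osc), period))


# Usage: strong pulses at frames 13, 63, 113, ... plus weaker off-beat pulses.
osc = [1.0 if n % 50 == 13 else (0.3 if n % 50 == 30 else 0.0) for n in range(500)]
beats = track_beats(osc, period=50)   # frames 13, 63, ..., 463
```

A fixed grid cannot follow tempo drift; practical trackers let each beat move within a small window around its predicted position.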
-16-
Performance Indices of Beat Tracking
There are many performance indices for beat tracking (BT); check out the audio beat tracking task of MIREX.
The most commonly adopted ones are precision, recall, F-measure, and accuracy.
Try simSequence.m in the SAP toolbox.
[Figure: computed beat positions vs. ground-truth (GT) positions, with 3 true positives (TP), 3 false positives (FP), and 2 false negatives (FN)]
Precision = tp/(tp+fp) = 3/(3+3) = 0.5
Recall = tp/(tp+fn) = 3/(3+2) = 0.6
F-measure = tp/(tp+(fn+fp)/2) = 3/(3+(2+3)/2) = 0.545
Accuracy = tp/(tp+fn+fp) = 3/(3+2+3) = 0.375
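The four formulas above can be checked with a short Python sketch. Counting TP, FP, and FN requires deciding when a computed beat "hits" a ground-truth beat; how simSequence.m does this is not shown, so the sketch assumes greedy one-to-one matching within a ±tol window (greedy matching is not always optimal). The beat times below are hypothetical values chosen to reproduce the slide's counts.

```python
def count_matches(computed, ground_truth, tol):
    """Greedily pair each computed time with the first unused ground-truth time
    within +/- tol; returns (tp, fp, fn)."""
    gt = sorted(ground_truth)
    used = [False] * len(gt)
    tp = 0
    for c in sorted(computed):
        for j, g in enumerate(gt):
            if not used[j] and abs(c - g) <= tol:
                used[j] = True
                tp += 1
                break
    return tp, len(computed) - tp, len(gt) - tp


def safe_div(a, b):
    return a / b if b else 0.0


def beat_scores(tp, fp, fn):
    """Precision, recall, F-measure, and accuracy as defined on the slide."""
    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "f_measure": safe_div(tp, tp + (fn + fp) / 2),
        "accuracy": safe_div(tp, tp + fn + fp),
    }


# Usage: hypothetical beat times (in seconds) giving 3 TP, 3 FP, and 2 FN.
computed = [1.0, 2.05, 3.0, 3.5, 4.6, 5.5]
ground_truth = [1.02, 2.0, 3.01, 4.0, 5.0]
tp, fp, fn = count_matches(computed, ground_truth, tol=0.07)
scores = beat_scores(tp, fp, fn)
```

Note that tp/(tp+(fn+fp)/2) equals the usual harmonic mean 2PR/(P+R): with P = 0.5 and R = 0.6 both give 6/11 ≈ 0.545.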