Music Information Retrieval: Overview and Challenges

J.-S. Roger Jang (張智星)
Multimedia Information Retrieval (MIR) Lab
CSIE Dept, National Taiwan Univ.
http://mirlab.org/jang
2015/12/7


Outline

Music Information Retrieval (MIR)
  Intro to MIR
  Intro to ISMIR & MIREX

Two classical paradigms of MIR
  QBSH (query by singing/humming)
  AFP (audio fingerprinting)

Conclusions


Introduction to QBSH

QBSH: Query by Singing/Humming
  Input: Singing or humming from a microphone
  Output: A ranked list retrieved from the song database according to similarity to the query

Progression
  First paper: Around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX, since 2006


Two Steps in QBSH

Pitch tracking: To detect the period of a waveform
  Time domain
    ACF (Autocorrelation function)
    NSDF (Normalized squared difference function)
    AMDF (Average magnitude difference function)
  Frequency domain
    Harmonic product spectrum
    Cepstrum

Database comparison: To find similarity between query and database songs
  Linear scaling
  Dynamic time warping
  Recursive alignment
  Hybrid methods


Frame Blocking for Pitch Tracking

Sample rate = 16 kHz
Frame size = 512 samples
Frame duration = 512/16000 = 0.032 s = 32 ms
Overlap = 192 samples
Hop size = frame size - overlap = 512 - 192 = 320 samples
Frame rate = 16000/320 = 50 frames/sec = Pitch rate
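The frame arithmetic above can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `frame_blocking` and its parameter names are hypothetical and not taken from the slides.

```python
import numpy as np

def frame_blocking(signal, frame_size=512, overlap=192):
    """Split a 1-D signal into overlapping frames (one frame per row).

    Hypothetical helper: defaults follow the slide (512-sample frames,
    192-sample overlap, hence a hop size of 320 samples).
    """
    signal = np.asarray(signal, dtype=float)
    hop = frame_size - overlap
    if len(signal) < frame_size:
        # Not enough samples for even one full frame.
        return np.empty((0, frame_size))
    n_frames = 1 + (len(signal) - frame_size) // hop
    return np.stack([signal[i * hop: i * hop + frame_size]
                     for i in range(n_frames)])

fs = 16000
frames = frame_blocking(np.zeros(fs))   # one second of audio
print(frames.shape)                     # (number of frames, 512)
print(fs / (512 - 192))                 # nominal frame rate in frames/sec
```

Only complete frames are kept here, so one second of audio yields slightly fewer than the nominal 50 frames; padding the tail is a design choice left out of this sketch.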

[Figure: a waveform split into frames, with a zoomed-in view marking one frame and the overlap between adjacent frames]


ACF: Auto-correlation Function

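Original frame: s(t)
Shifted frame: s(t - τ)

For τ = 30, acf(30) = inner product of the overlapping part of the two frames

$$\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$$

Pitch period: the lag τ of the highest ACF peak away from τ = 0
To be safe, the frame size needs to cover at least two fundamental periods!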
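A minimal sketch of ACF-based pitch estimation for a single frame, assuming NumPy. The function names and the lag search limits (taken from the 82 Hz to 1047 Hz range quoted elsewhere in these slides) are assumptions for illustration, not the lab's implementation.

```python
import numpy as np

def acf(frame):
    """acf(tau) = sum over t = tau..n-1 of s(t) * s(t - tau)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # frame[tau:] is s(t), frame[:n - tau] is s(t - tau) on the overlap.
    return np.array([np.dot(frame[tau:], frame[:n - tau])
                     for tau in range(n)])

def acf_pitch(frame, fs=16000, fmin=82.0, fmax=1047.0):
    """Return the pitch (Hz) of one frame from the highest ACF peak.

    fmin/fmax bound the lag search (assumed values) so that the
    trivial peak at tau = 0 is excluded.
    """
    a = acf(frame)
    lag_min = int(fs / fmax)                       # shortest period
    lag_max = min(int(fs / fmin), len(frame) - 1)  # longest period
    lag = lag_min + int(np.argmax(a[lag_min:lag_max + 1]))
    return fs / lag

# Usage: a synthetic 200 Hz tone in a 512-sample frame at 16 kHz.
fs = 16000
t = np.arange(512) / fs
print(acf_pitch(np.sin(2 * np.pi * 200 * t), fs))
```

Because the overlap shrinks as τ grows, this plain ACF tapers toward larger lags; NSDF, listed among the time-domain methods, normalizes that effect away.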


Frequency to Semitone Conversion

Semitone: A music scale based on A440

$$\mathrm{semitone} = 69 + 12 \log_2\left(\frac{\mathrm{freq}}{440}\right)$$

Reasonable pitch range: E2 - C6 (82 Hz - 1047 Hz)
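The conversion can be checked numerically with a short sketch. The function names are hypothetical; the formula is the one on this slide, where A440 maps to semitone 69.

```python
import math

def freq_to_semitone(freq_hz):
    """Semitone number relative to A440 = 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

def semitone_to_freq(semitone):
    """Inverse mapping, back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)

for name, f in [("E2", 82.41), ("A4", 440.0), ("C6", 1046.5)]:
    print(name, round(freq_to_semitone(f), 2))
```

Under this formula the E2 to C6 range corresponds to roughly semitones 40 to 84.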


Demos

Pitch-related demos
  Pitch tracking
  Pitch shift


Basic Comparison Method: Linear Scaling

Scale the query pitch linearly to match the candidates

[Figure: the original input pitch is stretched by 1.25 and 1.5 and compressed by 0.75 and 0.5; each scaled version is compared with the target pitch in the database, and the closest one is the best match]
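A minimal sketch of linear scaling, assuming NumPy and pitch vectors in semitones. The function name, the five scaling factors (read off the figure), the mean subtraction used to cancel key differences, and the mean absolute difference are all assumptions for illustration, not necessarily what the lab's system does.

```python
import numpy as np

def linear_scaling(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Return (best_distance, best_factor) of the query against a target.

    Each factor stretches or compresses the query in time; the scaled
    query is compared with the beginning of the target pitch vector.
    """
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    best = (np.inf, None)
    for f in factors:
        m = int(round(len(query) * f))
        if m < 2 or m > len(target):
            continue  # scaled query would be degenerate or too long
        # Resample the query to m points by linear interpolation.
        pos = np.linspace(0, len(query) - 1, m)
        scaled = np.interp(pos, np.arange(len(query)), query)
        segment = target[:m]
        # Assumption: subtract the means so a key shift does not matter.
        diff = (scaled - scaled.mean()) - (segment - segment.mean())
        dist = float(np.mean(np.abs(diff)))
        if dist < best[0]:
            best = (dist, f)
    return best

# Usage: the query is the target sung faster and 5 semitones higher.
target = 60 + 5 * np.sin(np.linspace(0, 6, 150))
query = np.interp(np.linspace(0, 149, 100), np.arange(150), target) + 5
print(linear_scaling(query, target))
```

A full system would run this against every candidate in the database and rank candidates by the returned distance.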


Typical Result of Pitch Tracking

Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower) [audio]


Comparison of Pitch Vectors

Yellow line: Target pitch vector


QBSH Demos

QBSH demos by our lab
  Description
  QBSH on the web: MIRACLE
  QBSH on toys

Existing commercial QBSH systems
  www.midomi.com
  www.soundhound.com


Our QBSH System: Miracle

Single server with GPU
  NVIDIA 560 Ti, 384 cores (speedup factor = 10)

[Diagram: clients (PC, PDA/smartphone, cellular) send a request (pitch vector) to the master server and receive a response (search result)]

Database size: ~20,000


Improving QBSH

Many ways to improve QBSH
  Sorted error vector
  Various weights for rests
  Re-ranking for better accuracy
  Better memory arrangement in GPU
  …


Intro to Audio Fingerprinting (AFP)

Goal
  Identify a noisy version of a given audio clip

Also known as…
  “Query by exact example”: no “cover versions” are allowed


AFP Applications

Commercial applications of AFP
  Music identification & purchase
  Royalty assignment (over radio)
  TV shows or commercials ID (over TV)
  Copyright violation (over web)

Major commercial players
  Shazam, Soundhound, Intonow, Viggle…


Two Stages in AFP

Offline
  Feature extraction
  Hash table construction for songs in database
  Inverted indexing

Online
  Feature extraction
  Hash table search
  Ranked list of the retrieved songs/music


Robust Feature Extraction

Various kinds of features for AFP
  Invariance along time and frequency
  Landmark of a pair of local maxima
  Wavelets
  …

Extensive tests are required for choosing the best features


Representative Approaches to AFP

Philips
  J. Haitsma and T. Kalker, “A highly robust audio fingerprinting system”, ISMIR 2002.

Shazam
  A. Wang, “An industrial-strength audio search algorithm”, ISMIR 2003.

Google
  S. Baluja and M. Covell, “Content fingerprinting using wavelets”, Euro. Conf. on Visual Media Production, 2006.
  V. Chandrasekhar, M. Sharifi, and D. A. Ross, “Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications”, ISMIR 2011.


Improvement on AFP

Re-ranking of AFP by learning to rank
Demo: http://mirlab.org/demo/audioFingerprinting


Shazam’s Method

Ideas
  Take advantage of local structures in music
    Find salient peaks on the spectrogram
    Pair peaks to form landmarks for comparison
  Efficient search by hash tables
    Use positions of landmarks as hash keys
    Use song ID and offset time as hash values
    Use time constraints to find matched landmarks
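The hash-table idea can be sketched with the Python standard library. This is an illustration under stated assumptions: each landmark is a hypothetical tuple `(t1, f1, f2, dt)` (start time, the two peak frequencies, and the time gap), the key is `(f1, f2, dt)`, and the time constraint is applied by voting on the offset between song time and query time.

```python
from collections import Counter, defaultdict

def build_hash_table(songs):
    """songs: dict mapping song ID -> list of (t1, f1, f2, dt) landmarks."""
    table = defaultdict(list)
    for song_id, landmarks in songs.items():
        for t1, f1, f2, dt in landmarks:
            # Key: landmark position; value: song ID and offset time.
            table[(f1, f2, dt)].append((song_id, t1))
    return table

def match(table, query_landmarks):
    """Rank (song ID, time offset) pairs by number of matched landmarks."""
    votes = Counter()
    for t1, f1, f2, dt in query_landmarks:
        for song_id, t_song in table.get((f1, f2, dt), []):
            # True matches share one consistent offset (time constraint).
            votes[(song_id, t_song - t1)] += 1
    return [(song_id, offset, count)
            for (song_id, offset), count in votes.most_common()]

# Usage with made-up landmarks: the query is song "A" starting 7 frames in.
songs = {"A": [(10, 5, 9, 3), (20, 7, 2, 4), (35, 1, 6, 2)],
         "B": [(12, 5, 9, 3), (50, 8, 8, 1)]}
table = build_hash_table(songs)
query = [(3, 5, 9, 3), (13, 7, 2, 4), (28, 1, 6, 2)]
print(match(table, query))
```

Song "B" shares one hash key with the query, but only song "A" collects several votes at the same offset, which is what separates a true match from chance collisions.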


How to Find Salient Peaks

We need to find peaks that are salient along both frequency and time axes
  Frequency axis: Gaussian local smoothing
  Time axis: Decaying threshold over time


How to Find Initial Threshold?

Goal
  To suppress neighboring peaks

Ideas
  Find the local maxima of the magnitude spectra of the initial 10 frames
  Superimpose a Gaussian on each local maximum
  Find the max of all Gaussians

[Figure: original signal, positive local maxima, and final output]
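The three steps above can be sketched as follows, assuming NumPy and a magnitude spectrogram stored as frequency bins by frames. The function names and the Gaussian width `sigma` are assumptions (the slide gives no width), and taking the per-bin maximum over the first 10 frames before peak finding is one reading of the first step.

```python
import numpy as np

def gaussian_spread(spectrum, sigma=4.0):
    """Max over Gaussians centered on each positive local maximum."""
    spectrum = np.asarray(spectrum, dtype=float)
    n = len(spectrum)
    bins = np.arange(n)
    envelope = np.zeros(n)
    for k in range(1, n - 1):
        is_peak = (spectrum[k] > 0
                   and spectrum[k] > spectrum[k - 1]
                   and spectrum[k] >= spectrum[k + 1])
        if is_peak:
            # Gaussian scaled to pass through the peak value at bin k.
            g = spectrum[k] * np.exp(-0.5 * ((bins - k) / sigma) ** 2)
            envelope = np.maximum(envelope, g)
    return envelope

def initial_threshold(spectrogram, n_init=10, sigma=4.0):
    """Initial per-bin threshold from the first n_init frames."""
    peak_hold = np.max(spectrogram[:, :n_init], axis=1)
    return gaussian_spread(peak_hold, sigma)

# Usage: one strong early peak at bin 10; a later peak at bin 30 is ignored.
S = np.zeros((50, 20))
S[10, 3] = 2.0
S[30, 15] = 5.0
print(np.round(initial_threshold(S)[8:13], 3))
```

Bins next to a strong peak inherit a high threshold, which is what suppresses neighboring peaks.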


How to Update the Threshold along Time?

Decay the threshold
Find local maxima larger than the threshold; these are the salient peaks
Define the new threshold as the max of the old threshold and the Gaussians passing through the active local maxima
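A minimal sketch of the forward pass described above, assuming NumPy. The decay rate, the Gaussian width, and the choice to visit stronger maxima first within a frame are assumptions for illustration; the slides do not state these values.

```python
import numpy as np

def local_maxima(x):
    """Indices of interior local maxima of a 1-D array."""
    x = np.asarray(x)
    idx = np.arange(1, len(x) - 1)
    return idx[(x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])]

def forward_pass(spectrogram, threshold, decay=0.98, sigma=4.0):
    """Return salient peaks as (frame index, frequency index) pairs."""
    n_bins, n_frames = spectrogram.shape
    bins = np.arange(n_bins)
    thr = np.array(threshold, dtype=float)  # copy; caller's array untouched
    peaks = []
    for t in range(n_frames):
        frame = spectrogram[:, t]
        # Assumption: stronger maxima are considered first.
        for k in sorted(local_maxima(frame), key=lambda k: -frame[k]):
            if frame[k] > thr[k]:
                peaks.append((t, int(k)))
                # Raise the threshold with a Gaussian through this peak.
                g = frame[k] * np.exp(-0.5 * ((bins - k) / sigma) ** 2)
                thr = np.maximum(thr, g)
        thr = thr * decay  # decay before moving to the next frame
    return peaks

# Usage: a strong peak at bin 10 masks a weaker neighbor one frame later,
# while an equally weak peak far away in frequency is still accepted.
S = np.zeros((40, 5))
S[10, 0] = 5.0
S[12, 1] = 3.0
S[30, 1] = 3.0
print(forward_pass(S, np.zeros(40)))
```

A backward pass can reuse the same function on the time-reversed spectrogram.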


Time-decaying Thresholds

[Figure: time-decaying thresholds for the forward pass and the backward pass, each plotted as frequency index versus frame index]


How to Pair Salient Peaks?

[Figure: target zone used for pairing salient peaks]
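One common way to realize a target zone is to pair each peak with a few later peaks that fall inside a bounded time-frequency window. The sketch below assumes that reading; the zone limits `max_dt` and `max_df`, the `fan_out` cap, and the `(t1, f1, f2, dt)` landmark layout are hypothetical choices, not values from the slides.

```python
def pair_peaks(peaks, max_dt=63, max_df=31, fan_out=3):
    """Pair each peak with up to fan_out later peaks in its target zone.

    peaks: iterable of (frame index, frequency index) pairs.
    Returns landmarks as (t1, f1, f2, dt) tuples.
    """
    peaks = sorted(peaks)  # sort by time, then frequency
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # later peaks are even further away in time
            if dt > 0 and abs(f2 - f1) <= max_df:
                landmarks.append((t1, f1, f2, dt))
                paired += 1
                if paired == fan_out:
                    break
    return landmarks

# Usage with made-up peaks: only the first two fall in a common zone.
print(pair_peaks([(0, 10), (5, 20), (8, 100), (100, 12)]))
```

Capping the number of pairs per peak keeps the landmark count, and therefore the hash table size, under control.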


Salient Peaks and Landmarks

Peak picking after forward smoothing

Matched landmarks (green)

(Source: Dan Ellis)


Landmarks for Hash Table Access


Optimization Strategies for AFP

Several ways to optimize AFP
  Strategy for query landmark extraction
  Confidence measure
  Incremental retrieval
  Better use of the hash table
  Re-ranking for better performance


Demos of Audio Fingerprinting

Commercial apps
  Shazam
  Soundhound

Our demo
  http://mirlab.org/demo/audioFingerprinting


QBSH vs. AFP

QBSH
  Goal: MIR
  Feature: Pitch
    Perceptible
    Small data size
  Method: LS (linear scaling)
  Database
    Harder to collect
    Small storage
  Bottleneck
    CPU/GPU-bound

AFP
  Goal: MIR
  Features: Landmarks
    Not perceptible
    Big data size
  Method: Matched LM (landmarks)
  Database
    Easier to collect
    Large storage
  Bottleneck
    I/O-bound


Conclusions

Successful applications in MIR
  QBSH
  AFP

Due to
  Faster and bigger memory
  Advances in GPU/CPU (Moore’s law)
  New machine learning methods

Challenges in MIR
  Audio melody extraction from polyphonic music
  Database collection for QBSH
  Cover song ID (which cannot be handled by AFP)
  Polyphonic music transcription


Thank you for your attention!

Questions & comments?