Retrieval Methods for QBSH (Query By Singing/Humming)
J.-S. Roger Jang (張智星 )
http://mirlab.org/jang
Multimedia Information Retrieval Lab
CSIE Dept, National Taiwan University
Retrieval Methods for QBSH
Goal: Find the most similar melody in the database
Challenges:
  Robust pitch tracking for various acoustic inputs
    Input from mobile devices
    Input at a noisy karaoke box
  Comparison methods should be able to deal with:
    Key variations in users’ input (for instance, due to gender differences)
    Tempo variations in users’ input
  Reasonable response time, e.g., 5 seconds
Evaluation of QBSH Methods
Two categories for evaluating QBSH methods:
  Efficiency: How fast is the system?
    Can it deal with a music database of size 100K?
  Effectiveness: How accurate is the system?
    Top-10 recognition rate for n queries:
      • (1+0+0+1+1…)/n
    Top-10 mean reciprocal rank for n queries:
      • (1/3+1/inf+1/4+1/2+1/5…)/n
    True positive and true negative rates to deal with the out-of-vocabulary (OOV) problem
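The two effectiveness measures can be sketched in a few lines of Python. This is a minimal illustration, assuming each query is summarized by the 1-based rank of its ground-truth song; using None for a song that was not retrieved (the 1/inf term) is a convention chosen here, and the function names are hypothetical.

```python
def top_n_recognition_rate(ranks, n=10):
    """Fraction of queries whose correct song appears within the top n."""
    hits = [1 if r is not None and r <= n else 0 for r in ranks]
    return sum(hits) / len(ranks)


def top_n_mean_reciprocal_rank(ranks, n=10):
    """Average of 1/rank over all queries; a miss contributes 0 (1/inf)."""
    total = 0.0
    for r in ranks:
        if r is not None and r <= n:
            total += 1.0 / r
    return total / len(ranks)


# Ranks of the correct song for five queries; None means "not in the top 10".
ranks = [3, None, 4, 2, 5]
print(top_n_recognition_rate(ranks))
print(top_n_mean_reciprocal_rank(ranks))
```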
Types of QBSH Approaches
Categories of approaches to QBSH:
  Histogram/statistics-based
  Note vs. note
    Edit distance
  Frame vs. note
    HMM
  Frame vs. frame
    Linear scaling, DTW, recursive alignment
Linear Scaling (LS)
Concept: Scale the query linearly to match the candidates
Assumption: Uniform tempo variation
Rest handling:
  Cut leading and trailing zeros (silence)
  All other zeros (rests) are replaced with the previous non-zero pitch
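The rest-handling rule can be sketched as follows. This is a minimal illustration, assuming the pitch vector is a list of semitone values in which 0 marks silence or a rest; the function name is hypothetical.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill inner zeros with the previous pitch."""
    start, end = 0, len(pitch)
    while start < end and pitch[start] == 0:
        start += 1                      # skip leading silence
    while end > start and pitch[end - 1] == 0:
        end -= 1                        # skip trailing silence
    trimmed = list(pitch[start:end])
    # After trimming, the first element is non-zero, so a previous pitch always exists.
    for i in range(1, len(trimmed)):
        if trimmed[i] == 0:
            trimmed[i] = trimmed[i - 1]
    return trimmed


print(handle_rests([0, 0, 60, 0, 0, 62, 62, 0, 64, 0]))
```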
Linear Scaling
Scale the query pitch linearly to match the candidates
[Figure: the original input pitch is stretched by 1.25 and 1.5 and compressed by 0.75 and 0.5; each version is compared with the target pitch in the database to find the best match.]
Strength and Weakness of LS
Strengths:
  One-shot for dealing with key transposition
  Efficient and effective
  Indexing methods available
Weakness:
  Cannot deal with non-uniform tempo variations
[Figure: Typical mapping path]
Shorten or Lengthen a Pitch Vector
Given a pitch vector x of length m, how do we shorten or lengthen it to length n?
  x2=interp1(1:m, x, linspace(1, m, n));
Examples: m=7, n=13 and m=7, n=9
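For readers without MATLAB, the same linear interpolation can be sketched in plain Python. This is a minimal illustration that assumes m >= 2 and n >= 2; the function name is hypothetical.

```python
def resample_pitch(x, n):
    """Linearly interpolate x (length m) to n evenly spaced samples."""
    m = len(x)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples in and out")
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)     # 0-based position in x, from 0 to m-1
        lo = min(int(pos), m - 2)       # left neighbour, clamped for the last sample
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[lo + 1] * frac)
    return out


x = [0, 1, 2, 3, 4, 5, 6]               # m = 7
print(resample_pitch(x, 13))            # lengthened to n = 13
print(resample_pitch(x, 9))             # lengthened to n = 9
```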
Distance Function for LS
Commonly used distance function for LS: normalized Lp-norm
Characteristics:
  Usually p=1 or 2 for LS
  Normalization (division by n) to get rid of length variations
$$L_p(\mathbf{x}) = \left( \frac{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}{n} \right)^{1/p}$$
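A minimal Python sketch of the normalized Lp-norm, assuming x is the element-wise difference between two pitch vectors of equal length; the function name is hypothetical.

```python
def normalized_lp_norm(x, p=1):
    """Lp-norm of x with the sum divided by the length n before taking the root."""
    n = len(x)
    return (sum(abs(v) ** p for v in x) / n) ** (1.0 / p)


# A constant error of 2 semitones gives the same distance regardless of length.
print(normalized_lp_norm([2, -2], p=1))
print(normalized_lp_norm([2, -2, 2, -2], p=1))
```

Dividing by n is what makes distances comparable across the different scaled lengths tried in LS.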
Key Transposition in LS
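How do we find the transposed query that has the smallest distance from a database item?
  Best transposition, with query q, database item r, and transposed query q − ŝ:
    $$\hat{s} = \arg\min_s L_p(q - s - r)$$
  In practice:
    $$p = 1:\quad \hat{s} = \operatorname{median}(q - r) \approx \operatorname{median}(q) - \operatorname{median}(r)$$
    $$p = 2:\quad \hat{s} = \operatorname{mean}(q - r) = \operatorname{mean}(q) - \operatorname{mean}(r)$$
  The mean identity is exact; for the median, aligning the two medians is the practical shortcut, since the median of a difference does not in general equal the difference of the medians.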
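The practical shortcut can be sketched in Python. This is a minimal illustration under an assumed sign convention: the shift s is subtracted from the query, so that q − s lines up with the database item r. The function names are hypothetical.

```python
import statistics


def best_key_shift(q, r, p=1):
    """Key shift s such that q - s is aligned with r (median for L1, mean for L2)."""
    if p == 1:
        return statistics.median(q) - statistics.median(r)
    if p == 2:
        return statistics.fmean(q) - statistics.fmean(r)
    raise ValueError("only p=1 and p=2 have the closed forms sketched here")


def transpose(q, s):
    """Shift every pitch of the query down by s semitones."""
    return [v - s for v in q]


query = [62, 64, 66, 64]     # sung 12 semitones above the database item
item = [50, 52, 54, 52]
s = best_key_shift(query, item, p=1)
print(s, transpose(query, s))
```

Because the shift comes from a single median or mean computation, no search over candidate keys is needed.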
Example of Linear Scaling via L1 Norm
linScaling01.m
[Figure, three panels. Panel 1: "Database and input pitch vectors" (database pitch and input pitch, in semitones). Panel 2: "Database and scaled pitch vectors" (database pitch and scaled pitch, in semitones). Panel 3: "Normalized distance" (distance versus scaling factor, from 0.5 to 1.5).]
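The whole LS procedure (scale, transpose, measure, keep the best) can be sketched end to end in Python. This is not the code of linScaling01.m, only a minimal illustration under stated assumptions: the scaled query is compared with the beginning of the database item, the key shift uses the median (p=1) or mean (p=2), and all function names and the default list of scaling factors are hypothetical.

```python
import statistics


def resample(x, n):
    """Linearly interpolate x to n samples (needs len(x) >= 2 and n >= 2)."""
    m = len(x)
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[lo + 1] * frac)
    return out


def linear_scaling(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5), p=1):
    """Return (distance, factor) of the best-matching scaled query, or None."""
    best = None
    for f in factors:
        n = round(len(query) * f)
        if n < 2 or n > len(target):
            continue                    # scaled query would not fit the target
        scaled = resample(query, n)
        ref = target[:n]                # assumption: match anchored at the start
        if p == 1:
            shift = statistics.median(scaled) - statistics.median(ref)
        else:
            shift = statistics.fmean(scaled) - statistics.fmean(ref)
        # Normalized Lp distance after key transposition.
        dist = (sum(abs(a - shift - b) ** p
                    for a, b in zip(scaled, ref)) / n) ** (1.0 / p)
        if best is None or dist < best[0]:
            best = (dist, f)
    return best


query = [60, 62, 64, 62, 60, 59, 57, 59, 60, 64]
target = [v - 7 for v in resample(query, 19)] + [50, 50, 50]
print(linear_scaling(query, target, factors=(1.0, 1.5, 1.9)))
```

Plotting the distance for each factor gives a curve like the "Normalized distance" panel, with the minimum marking the chosen scaling factor.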
Linear Scaling via L1 and L2 Norm
linScaling02.m
[Figure, three panels. Panel 1: "Database and input pitch vectors" (database pitch and input pitch, in semitones). Panel 2: "Database and scaled pitch vectors" (database pitch, scaled pitch via L1 norm, and scaled pitch via L2 norm, in semitones). Panel 3: "Normalized distances via L1 & L2 norm" (distances versus scaling factor, from 0.5 to 1.5).]
DTW (Dynamic Time Warping)
About DTW:
  DTW introduction
  DTW for QBSH
  #1 method for task 2 in QBSH/MIREX 2006
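As a reminder of the idea, here is a minimal textbook DTW sketch in Python. It is not the MIREX 2006 entry; it assumes the simplest local path constraints (steps from (i-1, j), (i, j-1) and (i-1, j-1)), an absolute pitch difference as the local cost, both ends anchored, and no key transposition. The function name is hypothetical.

```python
def dtw_distance(query, target):
    """Accumulated cost of the best warping path between two pitch vectors."""
    m, n = len(query), len(target)
    inf = float("inf")
    # d[i][j] = best cost of aligning the first i query frames with the first j target frames
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - target[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[m][n]


# The same melody sung with uneven note lengths still aligns at zero cost.
print(dtw_distance([60, 62, 64], [60, 60, 62, 62, 62, 64]))
```

Unlike LS, the warping path may bend, which is what lets DTW absorb non-uniform tempo variations.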
RA (Recursive Alignment)
Characteristics:
  Combines characteristics of LS & DTW
  #1 method for task 1 in QBSH/MIREX 2006
[Figure: A typical mapping path]
Modified Edit Distance
Note segmentation
Modified edit distance
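$$
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + w(a_i, \varnothing) & \text{(deletion)} \\
d_{i,j-1} + w(\varnothing, b_j) & \text{(insertion)} \\
d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\
\{\, d_{i-k,j-1} + w(a_{i-k+1}, \ldots, a_i, b_j),\ 2 \le k \le i \,\} & \text{(consolidation)} \\
\{\, d_{i-1,j-k} + w(a_i, b_{j-k+1}, \ldots, b_j),\ 2 \le k \le j \,\} & \text{(fragmentation)}
\end{cases}
$$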
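The recurrence can be sketched as a dynamic program in Python. The slides do not give the weight function w, so everything about the costs is an assumption made for illustration: notes are (pitch, duration) pairs, deleting or inserting a note costs its duration, and matching a group of notes against one note costs the summed pitch differences plus the difference in total duration. The cap max_k on how many notes may be merged is also an assumption (the recurrence itself allows k up to i or j), and the function names are hypothetical.

```python
def note_cost(group, single):
    """Assumed weight w: match a group of notes against a single note."""
    pitch_cost = sum(abs(p - single[0]) for p, _ in group)
    dur_cost = abs(sum(d for _, d in group) - single[1])
    return pitch_cost + dur_cost


def modified_edit_distance(a, b, max_k=3):
    """Edit distance with consolidation and fragmentation between note lists."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + a[i - 1][1]          # delete all of a
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + b[j - 1][1]          # insert all of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cands = [
                d[i - 1][j] + a[i - 1][1],                        # deletion
                d[i][j - 1] + b[j - 1][1],                        # insertion
                d[i - 1][j - 1] + note_cost([a[i - 1]], b[j - 1]),  # replacement
            ]
            for k in range(2, min(i, max_k) + 1):    # consolidation: k notes of a -> b_j
                cands.append(d[i - k][j - 1] + note_cost(a[i - k:i], b[j - 1]))
            for k in range(2, min(j, max_k) + 1):    # fragmentation: a_i -> k notes of b
                cands.append(d[i - 1][j - k] + note_cost(b[j - k:j], a[i - 1]))
            d[i][j] = min(cands)
    return d[m][n]


# A long note in a that is split into two shorter notes in b costs nothing.
a = [(60, 2), (62, 1)]
b = [(60, 1), (60, 1), (62, 1)]
print(modified_edit_distance(a, b))
```

Consolidation and fragmentation are what make the distance tolerant of note segmentation errors, where one sung note is split into several or several are merged into one.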