1. Motivation: Oodles of Music
2. Eigenrhythms & Eigenmelodies
3. Melodic-Harmonic Fragments
4. Other Projects
Mining for the Meaning of Music
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/
LabROSA Overview
[Diagram: signal processing, machine learning, and information extraction applied to speech, music, and environmental audio; tasks include recognition, retrieval, and separation]
1. Motivation: Oodles of Music
• The impact of the iPod
  creates new research questions (music IR)
  but also: provides new tools for old questions
• What can you do with 100k+ tracks?
  around 9 months of listening...
  unsupervised data
"The Meaning of Music"
Two kinds of "meaning":
• What does music evoke in a listener's mind?
  i.e. "what does it all mean?" (metaphysics?)
  study with subjective experiments
  (then build detectors for specific responses...?)
• What phenomena are denoted by "music"?
  i.e. delineate the "set of all music"
  (the ultimate music/nonmusic classifier?)
  ... this talk's topic
Re-used Musical Elements
• "What are the most popular chord progressions?"
  a well-formed question...
  music occupies a small subset of some space
  look at a massive audio archive?
• How can we distill a large collection of music audio into a compact description of what "music" means?
  or at least a vocabulary...
[Figure: scatter of PCA dimensions 3:6 of 12x16 beat-chroma patches]
Potential Applications
• Given a description of the musically valid subspace...
  compression: represent a given piece by its indices/parameters in the subspace
  classification: subspace representation reveals 'essence'; neighbors are interesting
  manipulation: modifications within the space remain musically valid
2. Eigenrhythms: Drum Track Structure
• To first order, all pop music has the same drum track:
• Can we capture this from examples?
  ... including the variations
• Can we exploit it?
  ... for synthesis
  ... for classification
  ... for insight
Ellis & Arroyo, ISMIR '04
Basis Sets
• Dataset reduced to linear combinations of a few basic patterns
  bases H: subspace that spans the data
  weights W: dimension-reduced projection of data
• X = W × H (data = weights × bases)
[Figure: the data matrix X factored as weight matrix W times basis matrix H; values range roughly -1 to 1]
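As a rough illustration (not the original code), here is how a matrix of flattened drum patterns could be factored into weights and bases with off-the-shelf tools; the data here is a random stand-in:

```python
# Sketch: X ~= W @ H for a hypothetical matrix of 100 drum patterns,
# each flattened to 1200 envelope samples (random stand-in data).
import numpy as np
from sklearn.decomposition import PCA, NMF

rng = np.random.default_rng(0)
X = rng.random((100, 1200))

# PCA: H = principal components ("eigenrhythms"), W = per-pattern weights
pca = PCA(n_components=20)
W = pca.fit_transform(X)              # (100, 20) weights
H = pca.components_                   # (20, 1200) bases
X_hat = W @ H + pca.mean_             # low-rank reconstruction
print("PCA reconstruction MSE:", np.mean((X - X_hat) ** 2))

# NMF: nonnegative W and H ("posirhythms" -- bases can only add)
nmf = NMF(n_components=10, max_iter=500)
W_n = nmf.fit_transform(X)            # (100, 10)
H_n = nmf.components_                 # (10, 1200)
```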
Different basis projections
• Principal Component Analysis (PCA)
  optimizes MSE of low-D reconstruction
• Independent Component Analysis (ICA)
  projections are independent (cf. decorrelated)
• Linear Discriminant Analysis (LDA)
  given class labels for data, find dimensions to separate them
• Nonnegative Matrix Factorization (NMF)
  each basis function only adds energy in (no cancellation)
[Figure: example 2-D basis vectors found by ICA, PCA, LDA, and NMF]
Data
[Figure: pseudo-envelope drum tracks for BD, SN, HH over 100-300 samples]
• Drum tracks extracted from MIDI
  100 examples (10 × 10 genre classes)
  fixed mapping to 3 instruments: bass drum (BD), snare (SN), hi-hat (HH)
  temporary proxy for audio transcription...
• Pseudo-envelope representation
  40 ms half-Gauss window sampled at 200 Hz
• Extract just one pattern from each MIDI file
  looking for variety, quantity not a problem
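A minimal sketch of the pseudo-envelope idea, assuming each drum hit becomes a decaying half-Gaussian bump on a 200 Hz grid (function and parameter names are hypothetical):

```python
# Sketch (assumed details): each onset contributes a half-Gaussian
# (~40 ms wide) on a 200 Hz time grid, giving a smooth stand-in for
# a transcribed onset train.
import numpy as np

def pseudo_envelope(onset_times, dur_s, sr=200, width_s=0.040):
    """Sum of half-Gaussian windows placed at each onset time."""
    t = np.arange(int(dur_s * sr)) / sr
    env = np.zeros_like(t)
    for on in onset_times:
        tail = t[t >= on] - on                 # only the decaying half
        env[t >= on] += np.exp(-0.5 * (tail / width_s) ** 2)
    return np.minimum(env, 1.0)                # clip overlapping bumps

# one envelope per instrument: bass drum, snare, hi-hat
bd = pseudo_envelope([0.0, 0.5, 1.0, 1.5], dur_s=2.0)
sn = pseudo_envelope([0.25, 0.75, 1.25, 1.75], dur_s=2.0)
print(bd.shape)                                # (400,) samples at 200 Hz
```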
Aligning Data: Tempo
• Need to align patterns prior to PCA/...
• First, normalize tempo
  autocorrelation gives BPM candidates
  keep them all for now...
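One plausible reading of the autocorrelation step, keeping several BPM candidates rather than committing to one (the peak-picking details are assumptions):

```python
# Sketch: read BPM candidates off the autocorrelation of an onset
# envelope sampled at 200 Hz.
import numpy as np
from scipy.signal import find_peaks

def bpm_candidates(env, sr=200, bpm_lo=60, bpm_hi=240, n=3):
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # lags >= 0
    lo, hi = int(sr * 60 / bpm_hi), int(sr * 60 / bpm_lo)    # valid lag range
    peaks, _ = find_peaks(ac[lo:hi])
    top = peaks[np.argsort(ac[lo:hi][peaks])[::-1][:n]] + lo # n biggest peaks
    return 60.0 * sr / top                                   # lags -> BPM
```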
Aligning Data: Downbeat
• Downbeat from best match of tempo-normalized pattern to mean template
  try every tempo hypothesis, choose best match
[Figure: original pattern and tempo-scaled, downbeat-aligned version]
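A sketch of the alignment search, assuming circular shifts of a tempo-normalized two-bar pattern scored against a mean template; in the full method this would be repeated for every tempo hypothesis and the best overall match kept:

```python
# Sketch: pick the downbeat as the circular shift of a pattern that
# best correlates with a mean template.
import numpy as np

def align_downbeat(pattern, template):
    """pattern, template: (n_instruments, n_samples) over two bars.
    Returns the best-aligned pattern and the shift used."""
    n = pattern.shape[1]
    scores = [np.sum(np.roll(pattern, -s, axis=1) * template)
              for s in range(n)]
    best = int(np.argmax(scores))
    return np.roll(pattern, -best, axis=1), best
```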
Aligned Data
• Tempo normalization + downbeat alignment → 100 excerpts (2 bars each)
• Can now extract basis projection(s)
Eigenrhythms (PCA)
• Need 20+ eigenvectors for good coverage of 100 training patterns (1200 dims)
• Eigenrhythms both add and subtract
Posirhythms (NMF)
• Nonnegative: only adds beat-weight
• Capturing some structure
[Figure: posirhythms 1-6, each showing BD/SN/HH weight vs. time (samples @ 200 Hz; beats 1-4 @ 120 BPM)]
Eigenrhythms for Classification
• Projections in eigenspace / LDA space
• 10-way genre classification (nearest neighbor):
  PCA3: 20% correct
  LDA4: 36% correct
[Figure: PCA(1,2) projection (16% correct) and LDA(1,2) projection (33% correct), colored by genre: blues, country, disco, hiphop, house, newwave, rock, pop, punk, rnb]
Eigenrhythm BeatBox
• Resynthesize rhythms from eigen-space
Eigenmelodies?
• Can we do a similar thing with melodies?
• Cluster 'fragments' that recur in melodies
  ... across a large music database
  ... one way to get fragment alignment?
  ... trade data for model sophistication
• Data sources
  pitch tracker, or MIDI training data
• Melody fragment representation
  DCT(1:20) - removes average, smoothes detail
[Pipeline: training data → melody extraction → 5-second fragments → VQ clustering → top clusters]
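A sketch of the fragment representation and VQ step described above, using DCT coefficients 1-20 (dropping coefficient 0 removes the average, i.e. transposition) and k-means as the quantizer; the contours here are synthetic:

```python
# Sketch: DCT(1:20) features for pitch-contour fragments, then k-means
# vector quantization; cluster occupancy identifies the "top clusters".
import numpy as np
from scipy.fftpack import dct
from sklearn.cluster import KMeans

def fragment_features(contours):
    """contours: (n_fragments, n_frames) of pitch values (e.g. 5-s windows)."""
    coeffs = dct(contours, type=2, norm="ortho", axis=1)
    return coeffs[:, 1:21]          # drop DC (average); keep smooth detail

rng = np.random.default_rng(1)
contours = np.cumsum(rng.standard_normal((500, 100)), axis=1)  # fake melodies
feats = fragment_features(contours)
km = KMeans(n_clusters=32, n_init=10).fit(feats)
print(np.bincount(km.labels_)[:5])  # per-cluster counts
```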
Melody Clustering
• Clusters match underlying contour:
• Some interesting matches:
  e.g. Pink + Nsync
3. Melodic-Harmonic Fragments
• Can we use the subspace and clustering ideas with our oodles of music?
  use lots of real music audio
  capture both melodic and harmonic context...
• Goal: dictionary of common motifs (clichés)
  build up into longer sequences
  reveal quotes & inspirations, genre/style idioms
• Questions
  what representation and similarity measure?
  what clustering scheme?
  tractability: how large can we go?
Finding Common Fragments
• Chop up music into short descriptions of musical content
  24-beat beat-chroma matrices
• Choose a few at "starts" (landmarks)
• Put into LSH table
  similar items fall in same bin
• Find the bins with most entries
  = most commonly reused motifs
[Pipeline: music audio → beat tracking → chroma features → key normalization → landmark identification → locality-sensitive hash table]
Beat Tracking
• Goal: per-'beat' (tatum) feature vector
  for tempo normalization, efficiency
• "Onset Strength Envelope"
  O(t) = Σf max(0, Δt log |X(t, f)|)
• Autocorrelation + window → global tempo estimate
[Figure: mel spectrogram (0-15 s), onset strength envelope, and windowed autocorrelation over 0-900 lags (4 ms samples), peak at 168.5 BPM]
Ellis '06, '07
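A minimal sketch of the onset strength envelope; the published system uses a mel spectrogram, whereas plain STFT bins are used here for brevity:

```python
# Sketch of O(t) = sum_f max(0, delta_t log|X(t,f)|): half-wave rectified
# first difference of a log-magnitude spectrogram, summed over frequency.
import numpy as np
from scipy.signal import stft

def onset_strength(x, sr, n_fft=1024, hop=256):
    _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    logmag = np.log(np.abs(X) + 1e-8)
    diff = np.diff(logmag, axis=1)             # delta along time
    return np.maximum(0.0, diff).sum(axis=0)   # rectify, sum over frequency
```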
Beat Tracking (2)
• Dynamic programming finds beat times {ti}
  optimizes Σi O(ti) + α Σi F(ti+1 − ti, τp)
  where O(t) is the onset strength envelope (local score)
  F(Δt, τp) is a log-Gaussian window (transition cost)
  τp is the default beat period per measured tempo
  incrementally find best predecessor at every time
  backtrace from largest final score to get beats
  C*(t) = O(t) + max_τ {α F(t − τ, τp) + C*(τ)}
  P(t) = argmax_τ {α F(t − τ, τp) + C*(τ)}
[Figure: O(t) and C*(t) over time, with best predecessor τ for each t]
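A sketch of the recurrence above; the search window, α, and the exact log-Gaussian form are illustrative choices, not the published parameters:

```python
# Sketch: DP beat tracking via
# C*(t) = O(t) + max_tau{alpha*F(t - tau, tau_p) + C*(tau)}.
import numpy as np

def track_beats(O, period, alpha=100.0):
    """O: onset envelope (one value per frame); period: beat period in frames."""
    n = len(O)
    C = np.copy(O)                        # best cumulative score ending at t
    P = np.full(n, -1)                    # best predecessor frame
    for t in range(1, n):
        lo, hi = max(0, t - 2 * period), max(0, t - period // 2)
        taus = np.arange(lo, hi)
        if len(taus) == 0:
            continue
        F = -np.log((t - taus) / period) ** 2     # log-Gaussian window
        scores = alpha * F + C[taus]
        best = int(np.argmax(scores))
        C[t] = O[t] + scores[best]
        P[t] = taus[best]
    beats, t = [], int(np.argmax(C))      # backtrace from the largest score
    while t >= 0:
        beats.append(t)
        t = P[t]
    return beats[::-1]                    # beat frames in time order
```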
Beat Tracking (3)
• DP will bridge gaps (non-causal)
  there is always a best path...
• 2nd place in MIREX 2006 Beat Tracking
  compared to McKinney & Moelants human data
[Figure: "Alanis Morissette - All I Want - gap + beats" spectrogram (freq / Bark band, 182-192 s) and "test 2 (Bragg) - McKinney + Moelants subject data" beat times per subject (0-15 s)]
Chroma Features
• Chroma features convert spectral energy into musical weights in a canonical octave
  i.e. 12 semitone bins
• Can resynthesize as "Shepard tones"
  all octaves at once
[Figure: piano chromatic scale spectrogram (0-4 kHz), IF chroma (bins A-G vs. time/frames), 12 Shepard tone spectra (level / dB vs. 0-2500 Hz), and Shepard tone resynthesis spectrogram]
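A simplified sketch of chroma computation, mapping each STFT bin to its nearest semitone and folding octaves onto 12 bins (the real features use instantaneous-frequency tracking, so this nearest-bin mapping is an assumption):

```python
# Sketch: fold STFT spectral energy onto 12 semitone (chroma) bins.
import numpy as np
from scipy.signal import stft

def chromagram(x, sr, n_fft=4096, hop=1024, fmin=55.0):
    f, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(X)
    use = (f >= fmin)                             # skip DC / very low bins
    semis = 12.0 * np.log2(f[use] / 440.0)        # semitones relative to A4
    bins = np.round(semis).astype(int) % 12       # fold all octaves into 12
    C = np.zeros((12, mag.shape[1]))
    for b in range(12):
        C[b] = mag[use][bins == b].sum(axis=0)
    return C / (C.max() + 1e-8)                   # normalized chromagram
```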
Beat-Chroma Features
• Beat + chroma features / 30 ms frames → average chroma within each beat
  compact; sufficient?
! "# "! $# $! %# %!
$
&
'
(
"#
"$
"#
$#
%#
&#
$
&
'
(
"#
"$
# ! "# "!)*+,-.-/,0
)*+,-.-1,2)/
34,5-.-6,7
89/,)-/)4,9:);
0;48+2-1*9/
0;48+2-1*9/
#
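The beat-averaging step itself is simple; a sketch, assuming frame times and beat times from the earlier stages:

```python
# Sketch: collapse a frame-rate chromagram to one 12-dim column per beat
# by averaging the frames between consecutive beat times.
import numpy as np

def beat_chroma(C, frame_times, beat_times):
    """C: (12, n_frames); returns (12, n_beats-1) beat-synchronous chroma."""
    edges = np.searchsorted(frame_times, beat_times)
    return np.stack([C[:, a:b].mean(axis=1)
                     for a, b in zip(edges[:-1], edges[1:]) if b > a], axis=1)
```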
Key Estimation
• Covariance of chroma reflects key
• Normalize by transposing for best fit
  single Gaussian model of one piece
  find ML rotation of other pieces
  model all transposed pieces
  iterate until convergence
[Figure: aligned global chroma-covariance model, and per-song aligned chroma covariances (bins A-G) for Taxman, Eleanor Rigby, I'm Only Sleeping, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, Love You To, Yellow Submarine]
Ellis ICASSP ’07
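A sketch of one pass of this normalization: fit a Gaussian to a reference piece's beat-chroma frames, then transpose each other piece by its maximum-likelihood semitone rotation (the full method refits the model over all transposed pieces and iterates):

```python
# Sketch: key normalization by ML chroma rotation against a single
# Gaussian model of a reference piece.
import numpy as np
from scipy.stats import multivariate_normal

def best_rotation(frames, mean, cov):
    """frames: (n_beats, 12). Rotation in 0..11 maximizing likelihood."""
    scores = [multivariate_normal.logpdf(np.roll(frames, r, axis=1),
                                         mean, cov).sum() for r in range(12)]
    return int(np.argmax(scores))

def normalize_keys(pieces):
    ref = pieces[0]                               # arbitrary reference piece
    mean = ref.mean(axis=0)
    cov = np.cov(ref.T) + 1e-3 * np.eye(12)       # ridge for stability
    return [np.roll(p, best_rotation(p, mean, cov), axis=1) for p in pieces]
```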
Landmark Location
• Looking for "beginnings" of phrases
  e.g. abrupt change in harmony, instruments, etc.
  use a likelihood ratio test: how unlikely is the data following a candidate boundary under a model of the data up to the boundary?
• Choose top 10 locally-normalized peaks
  ... to control data size? include ±2 beats to catch errors
[Figure: "Come Together" - spectrogram, beat-sync chromagram, and top 10 segment points over 0-250 s]
Locality Sensitive Hash
• Goal: quantize high-dimensional data so 'similar' items fall into the same bin
  ... for fast and scalable nearest-neighbor search
• Idea: multiple random scalar projections
  each one will tend to keep neighbors nearby
  items close together in all projections are probably neighbors
[Figure: random-projection hashing illustration, from Slaney & Casey '08]
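A toy random-projection LSH in the spirit of the description above (projection count and bin width are arbitrary choices here, not the values used in the experiments):

```python
# Sketch: each of k random directions is quantized to a coarse integer;
# the tuple of integers is the bucket key. Near neighbors tend to share
# a bucket, and the fullest buckets ~ the most commonly reused patches.
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    def __init__(self, dim, n_proj=8, bin_width=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.R = rng.standard_normal((n_proj, dim))   # random directions
        self.w = bin_width
        self.table = defaultdict(list)

    def key(self, x):
        return tuple(np.floor(self.R @ x / self.w).astype(int))

    def insert(self, item_id, x):
        self.table[self.key(x)].append(item_id)

    def candidates(self, x):
        return self.table[self.key(x)]                # same-bucket items only
```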
Experiments
• Data
  "artist20" - 20 artists × 6 albums = 1413 tracks
  (up to) 10 landmarks/track = 14,078 patches
  each patch = 12 chroma bins × 24 beats (288 dims)
• Performance
  feature calculation: ~60 min
  LSH 14k NNs: ~30 sec
  51 patches have >10 NNs within r = 2.0
[Figure: histogram "# neighbors within 2.0 - 14078 a20 patches": count (0-100) vs. number of near neighbors (0-40)]
Results - artist20
• Mainly sustained notes
[Figure: four matched beat-chroma patches (chroma bin vs. beat) - "radiohead 01-You 192.7-198.5s", "green day 02-Blood Sex And Booze 122.3-129.8s", "radiohead 07-Ripcord 177.4-182.5s", "radiohead 11-Genchildren hidden 30.9-35.0s"]
Results - Beatles
• Only the 86 Beatles tracks
• All beat offsets = 41,705 patches
  LSH takes 300 sec - approx N log N in patches?
• High-pass along time to avoid sustained notes
• Song filter: remove hits in same track
[Figure: four matched beat-chroma patches (chroma bin vs. beat) - "02-I Should Have Known Better 92.4-97.7s", "05-Here There And Everywhere 12.1-20.5s", "09-Martha My Dear 90.9-98.6s", "12-Piggies 22.0-29.6s"]
Results - Chroma Peaks
• Beat-chroma too diverse
  reduce variation by keeping only 4 chroma/frame
• Landmarks off-by-1 → use tr −2 ... tr +2
  70,606 fragments (all beats would be 1.3M fragments)
[Figure: eight frequent fragment clusters (a20-top10x5-cp4-4p0), chroma bins A-G vs. time/beats: #32273 (13 instances), #51917 (13), #65512 (10), #9667 (9), #10929 (7), #61202 (6), #55881 (5), #68445 (5)]
Results - Detail
• Interesting fragment cluster...
• Not that interesting...
  further simplification of fragments?
  larger dataset?
[Figure: one fragment cluster (chroma bins A-G vs. time/beats) spanning "depeche mode 13-Ice Machine 199.5-204.6s", "roxette 03-Fireworks 107.9-114.8s", "roxette 04-Waiting For The Rain 80.1-93.5s", "tori amos 11-Playboy Mommy 157.8-171.3s"]
4. Other Projects: Music Similarity
• The most central problem...
  motivates extracting musical information
  supports real applications (playlists, discovery)
• But do we need content-based similarity?
  compete with collaborative filtering
  compete with fingerprinting + metadata
• Maybe... for the Future of Music
  connect listeners directly to musicians
Discriminative Classification
• Classification as a proxy for similarity
• Distribution models: per-artist GMMs trained on MFCCs, a test song matched by KL divergence, minimum-distance artist wins
• vs. SVM: song-level features classified by a DAG SVM
[Figure: two system diagrams - GMM training and KL-divergence matching of a test song against Artist 1 / Artist 2 models, and DAG-SVM classification of song features]
Mandel & Ellis '05
Segment-Level Features
• Statistics of spectra and envelope define a point in feature space
  for SVM classification, or Euclidean similarity...
Mandel & Ellis '07
[Poster: "LabROSA's audio music similarity and classification systems", Michael I. Mandel and Daniel P. W. Ellis, LabROSA, Dept. of Electrical Engineering, Columbia University, New York, {mim,dpwe}@ee.columbia.edu]

1. Feature Extraction
• Spectral features are the same as Mandel and Ellis (2005): mean and unwrapped covariance of MFCCs
• Temporal features are similar to those in Rauber et al. (2002):
  - break input into overlapping 10-second clips
  - analyze the mel spectrum of each clip, averaging clips' features
  - combine mel frequencies into magnitude bands for low, low-mid, high-mid, and high frequencies
  - FFT in time gives modulation frequency; keep the magnitude of the lowest 20% of frequencies
  - DCT in modulation frequency gives the envelope cepstrum
  - stack the four bands' envelope cepstra into one feature vector
• Each feature is then normalized across all of the songs to be zero-mean, unit-variance

2. Similarity and Classification
• We use a DAG-SVM for n-way classification of songs (Platt et al., 2000)
• The distance between songs is the Euclidean distance between their feature vectors
• During testing, some songs were a top similarity match for many songs; re-normalizing each song's feature vector avoided this problem

References
M. Mandel and D. Ellis. Song-level features and support vector machines for music classification. In J. Reiss and G. Wiggins, editors, Proc. ISMIR, pages 594-599, 2005.
J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In S. A. Solla, T. K. Leen, and K.-R. Mueller, editors, Advances in Neural Information Processing Systems 12, pages 547-553, 2000.
A. Rauber, E. Pampalk, and D. Merkl. Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by sound similarity. In M. Fingerhut, editor, Proc. ISMIR, pages 71-80, 2002.
3. Results
[Figure: MIREX 2007 bar charts. Audio Music Similarity (scores 0-0.8 on Greater0, Psum, Fine, WCsum, SDsum, Greater1) for systems PS, GT, LB, CB1, TL1, ME, TL2, CB2, CB3, BK1, PC, BK2. Audio Classification (accuracy 0-80%) on hierarchical genre ID, raw genre ID, mood ID, composer ID, and artist ID for systems IMsvm, MEspec, ME, TL, GT, IMknn, KL, CL, GH.]
Similarity systems: PS = Pohle, Schnitzer; GT = George Tzanetakis; LB = Barrington, Turnbull, Torres, Lanckriet; CB = Christoph Bastuck; TL = Lidy, Rauber, Pertusa, Inesta; ME = Mandel, Ellis; BK = Bosteels, Kerre; PC = Paradzinets, Chen
Classification systems: IM = IMIRSEL M2K; ME = Mandel, Ellis; TL = Lidy, Rauber, Pertusa, Inesta; GT = George Tzanetakis; KL = Kyogu Lee; CL = Laurier, Herrera; GH = Guaus, Herrera
MIREX 2007, 3rd Music Information Retrieval Evaluation eXchange, 26 September 2007, Vienna, Austria
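A sketch of the temporal-feature recipe from section 1 of the poster; the band split, cepstrum length, and the log taken before the DCT are assumptions, not the published settings:

```python
# Sketch: band envelopes -> FFT along time -> keep the lowest 20% of
# modulation frequencies -> DCT -> stack the four bands' cepstra.
import numpy as np
from scipy.fftpack import dct

def temporal_features(mel_spec, n_bands=4, n_cep=8):
    """mel_spec: (n_mels, n_frames) for one ~10-second clip."""
    bands = np.array_split(mel_spec, n_bands, axis=0)   # low..high groups
    feats = []
    for band in bands:
        env = band.sum(axis=0)                          # band magnitude envelope
        mod = np.abs(np.fft.rfft(env))                  # modulation spectrum
        mod = mod[: max(n_cep, int(0.2 * len(mod)))]    # lowest ~20%
        feats.append(dct(np.log(mod + 1e-8), norm="ortho")[:n_cep])
    return np.concatenate(feats)                        # one vector per clip
```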
MIREX'07 Results
• One system for both similarity and classification (the poster above)
Cover Song Detection
• "Cover songs" = reinterpretation of a piece
  different instrumentation, character
  no match with "timbral" features
• Need a different representation!
  beat-synchronous chroma features
[Figure: "Let It Be" verse 1 by The Beatles and by Nick Cave - spectrograms (0-4 kHz, 2-10 s) and beat-sync chroma features (bins A-G vs. beats 5-25)]
Ellis & Poliner ‘07
Matching: Global Correlation
• Cross-correlate entire beat-chroma matrices
  ... at all possible transpositions
  implicit combination of match quality and duration
• One good matching fragment is sufficient...?
[Figure: beat-chroma matrices for "Elliott Smith - Between the Bars" and "Glen Phillips - Between the Bars" (chroma bins A-G vs. 100-500 beats @ 281 BPM), and their cross-correlation vs. skew in beats (-500 to +400) and semitones (-5 to +5)]
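A sketch of the global-correlation matcher: cross-correlate the two beat-chroma matrices over all 12 transpositions and all beat skews, scoring the pair by the best normalized peak (the normalization is an illustrative choice):

```python
# Sketch: cover-song score from cross-correlation of beat-chroma
# matrices over every semitone transposition and beat skew.
import numpy as np
from scipy.signal import fftconvolve

def cover_score(A, B):
    """A, B: (12, n_beats) beat-synchronous chroma matrices."""
    best = 0.0
    for rot in range(12):                     # every transposition
        Br = np.roll(B, rot, axis=0)
        # correlate along the beat axis, summed over chroma rows
        xc = sum(fftconvolve(A[c], Br[c][::-1], mode="full")
                 for c in range(12))
        best = max(best, xc.max() / (np.linalg.norm(A) * np.linalg.norm(Br)))
    return best
```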
"Semantic Bases": MajorMiner
• Describe segments in human-relevant terms
  e.g. anchor space, but more so
• Need ground truth...
  what words do people use?
• MajorMiner game:
  400 users
  7,500 unique tags
  70,000 taggings
  2,200 10-sec clips used
• Train classifiers...
Mandel & Ellis ’07,‘08
MajorMiner Autotagging Results
• Tags with enough verified clips → train SVM
• Some good results
  test has 50% baseline; 7% better is significant
  50-300 training patterns
• Next step: propagate labels
  semi-supervised, "multi-instance" learning
[Figure: per-tag classification accuracy (0.4-1.0) for tags: drums, voice, bass, vocal, keyboard, instrumental, 80s, solo, male, british, slow, piano, soft, pop, drum machine, female, guitar, distortion, synth, strings, country, rock, fast, punk, dance, electronic, electronica, beat, ambient, jazz, r&b, techno, house, saxophone, hip hop, rap]
Transcription as Classification
• Exchange signal models for data
  transcription as a pure classification problem:
• Training data and features:
  MIDI, multi-track recordings, playback piano, & resampled audio (less than 28 mins of training audio)
  normalized magnitude STFT feature vectors
• Classification:
  N binary SVMs (one for each note)
  independent frame-level classification on a 10 ms grid
  distance to class boundary as posterior
• Temporal smoothing:
  two-state (on/off) independent HMM for each note; parameters learned from training data
  find the Viterbi sequence for each note
Poliner & Ellis '05, '06, '07
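A sketch of the two-state smoothing for a single note, with illustrative (not trained) transition probabilities:

```python
# Sketch: Viterbi decoding of a 2-state on/off HMM over frame-level
# classifier posteriors for one note.
import numpy as np

def smooth_note(p_on, p_stay=0.95):
    """p_on: (T,) posterior that the note is sounding at each frame."""
    emit = np.log(np.stack([1.0 - p_on, p_on]) + 1e-10)   # (2, T): off/on
    trans = np.log(np.array([[p_stay, 1 - p_stay],
                             [1 - p_stay, p_stay]]))
    T = emit.shape[1]
    score = emit[:, 0].copy()
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans             # cand[from, to]
        back[:, t] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0) + emit[:, t]
    states = np.zeros(T, dtype=int)
    states[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):                 # backtrace
        states[t - 1] = back[states[t], t]
    return states                                 # 1 where the note is "on"
```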
Singing Voice Modeling & Alignment
• How do singers sing?
  e.g. "vowel modification" in classical voice
  tuning variation...
• Collect the data
  ... by aligning libretto to recordings
  e.g. align Karaoke MIDI files to original recordings
  look at the detail of the alignments
• Lyric transcription?
Christine Smit, Johanna Devaney
[Figure: aligned pitch track, frequency (0-1000 Hz) vs. time (0-45 s)]
Conclusions
• Lots of data + noisy transcription + weak clustering ⇒ musical insights?
[Diagram: music audio → tempo and beat → low-level features, melody and notes, key and chords → classification and similarity, music structure discovery → browsing, discovery, production, modeling, generation, curiosity]