Data-Driven Music Understanding
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. of Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/

1. Motivation: What is Music?
2. Eigenrhythms
3. Melodic-Harmonic Fragments
4. Example Applications
LabROSA: Machine Listening
• Extracting useful information from sound ... like (we) animals do
[Figure: grid of machine-listening problems, tasks (Detect, Classify, Describe) by domains (Environmental Sound, Speech, Music): VAD, speech/music discrimination, ASR, emotion, music transcription, music recommendation, environment awareness, automatic narration; together, "Sound Intelligence".]
1. Motivation: What is music?
• What does music evoke in a listener’s mind?
• Which things do we call “music”?
Oodles of Music
• What can you do with a million tracks?
Bertin-Mahieux et al. ’09
Re-use in Music
• What are the most popular chord progressions in pop music?
From Serrà et al. (2012):

... points towards a great degree of conventionalism in the creation and production of this type of music. Yet, we find three important trends in the evolution of musical discourse: the restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

Results. To identify structural patterns of musical discourse we first need to build a ‘vocabulary’ of musical elements (Fig. 1). To do so, we encode the dataset descriptions by a discretization of their values, yielding what we call music codewords (see Supplementary Information). In the case of pitch, the descriptions of each song are additionally transposed to an equivalent main tonality, such that all of them are automatically considered within the same tonal context or key. Next, to quantify long-term variations of a vocabulary, we need to obtain samples of it at different periods of time. For that we perform a Monte Carlo sampling in a moving-window fashion. In particular, for each year, we sample one million beat-consecutive codewords, considering entire tracks and using a window length of 5 years (the window is centered at the corresponding year such that, for instance, for 1994 we sample one million consecutive beats by choosing full tracks whose year annotation is between 1992 and 1996, both included). This procedure, which is repeated 10 times, guarantees a representative sample with a smooth evolution over the years.

We first count the frequency of usage of pitch codewords (i.e. the number of times each codeword type appears in a sample). We observe that the most used pitch codewords generally correspond to well-known harmonic items, while unused codewords correspond to strange/dissonant pitch combinations (Fig. 2a). Sorting the frequency counts in decreasing order reveals a very clear pattern behind the data: a power law of the form z ∝ r^(-α), where z corresponds to the frequency count of a codeword, r denotes its rank (i.e. r = 1 for the most used codeword and so forth), and α is the power-law exponent. Specifically, we find that the distribution of codeword frequencies for a given year nicely fits P(z) ∝ (c + z)^(-β) for z > z_min, where we take z as the random variable, β = 1 + 1/α as the exponent, and c as a constant (Fig. 2b). A power law indicates that a few codewords are very frequent while the majority are highly infrequent.

Figure 1 | Method schematic summary with pitch data. The dataset contains the beat-based music descriptions of the audio rendition of a musical piece or score (G, Em, and D7 on the top of the staff denote chords). For pitch, these descriptions reflect the harmonic content of the piece, and encapsulate all sounding notes of a given time interval into a compact representation, independently of their articulation (they consist of the 12 pitch-class relative energies, where a pitch class is the set of all pitches that are a whole number of octaves apart, e.g. notes C1, C2, and C3 all collapse to pitch class C). All descriptions are encoded into music codewords, using a binary discretization in the case of pitch. Codewords are then used to perform frequency counts, and as nodes of a complex network whose links reflect transitions between subsequent codewords.
Serrà et al. 2012, Scientific Reports 2:521, DOI: 10.1038/srep00521
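As a rough illustration of the rank-frequency analysis quoted above, here is a minimal sketch (not the authors' code): it counts codeword usage, sorts the counts by rank, and estimates the power-law exponent with a crude log-log fit. The Zipf-distributed toy data stands in for real beat-level pitch codewords.

```python
import numpy as np
from collections import Counter

def rank_frequency(codewords):
    """Sort codeword usage counts in decreasing order and pair them with ranks."""
    counts = np.array(sorted(Counter(codewords).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1)
    return ranks, counts

def estimate_exponent(ranks, counts):
    """Crude estimate of alpha in count ~ rank**(-alpha) via a log-log least-squares fit."""
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

# Toy data: Zipf-distributed integers stand in for beat-synchronous pitch codewords.
rng = np.random.default_rng(0)
codewords = rng.zipf(2.0, size=100_000)

ranks, counts = rank_frequency(codewords)
alpha = estimate_exponent(ranks, counts)
beta = 1 + 1 / alpha   # exponent of the frequency distribution P(z) ~ (c + z)**(-beta)
print(f"alpha ~ {alpha:.2f}, beta ~ {beta:.2f}")
```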
Potential Applications
• Compression
• Classification
• Manipulation
2. Eigenrhythms: Drum Track Structure
• To first order, all pop music has the same beat:
• Can we learn this from examples?
Ellis & Arroyo ’04
Basis Sets
• Combine a few basic patterns to make a larger dataset:
  data = weights × patterns:  X = W × H
[Figure: schematic of the factorization, with pattern values on a -1 to 1 color scale.]
Drum Pattern Data
• Tempo normalization + downbeat alignment
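The bullet above compresses two preprocessing steps; the sketch below illustrates the idea (my own simplification, not the original code): given an onset-strength envelope and estimated downbeat times, each bar is resampled to a fixed number of samples so that patterns from songs at different tempos become directly comparable. The 200 Hz frame rate and 400 samples per bar follow the axes on the next slide.

```python
import numpy as np

def normalize_bars(onset_env, frame_rate, downbeat_times, samples_per_bar=400):
    """Resample the onset envelope of each bar to a fixed length (tempo normalization)."""
    bars = []
    for start, end in zip(downbeat_times[:-1], downbeat_times[1:]):
        # original frame indices spanning this bar
        src = np.arange(int(start * frame_rate), int(end * frame_rate))
        if len(src) < 2:
            continue
        # fixed-length grid for the normalized bar
        dst = np.linspace(src[0], src[-1], samples_per_bar)
        bars.append(np.interp(dst, src, onset_env[src]))
    return np.vstack(bars)   # one row per bar, all the same length

# Toy usage: a synthetic envelope with downbeats every 2 seconds at a 200 Hz frame rate.
frame_rate = 200
onset_env = np.abs(np.random.randn(int(10 * frame_rate)))
downbeats = np.arange(0.0, 10.0, 2.0)
patterns = normalize_bars(onset_env, frame_rate, downbeats, samples_per_bar=400)
print(patterns.shape)   # (4, 400)
```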
NMF Eigenrhythms
• Nonnegative: only add beat-weight
[Figure: six NMF basis patterns, "Posirhythm 1" through "Posirhythm 6"; each panel shows bass drum (BD), snare (SN), and hi-hat (HH) rows over 0-400 samples at 200 Hz, i.e. beats 1-4 at 120 BPM; weights roughly -0.1 to 0.1.]
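A minimal sketch of the idea using scikit-learn's NMF (the original work used its own nonnegative factorization): stack tempo-normalized drum patterns as rows of X and factor X ≈ W·H, so the rows of H are the basis patterns ("posirhythms") and W holds the per-song weights. Array shapes and parameter choices here are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

# X: one row per song, columns are (drum channel x time) of the
# tempo-normalized, downbeat-aligned pattern (nonnegative onset strengths).
rng = np.random.default_rng(0)
X = rng.random((100, 3 * 400))          # toy stand-in: 100 songs, 3 drums x 400 samples

model = NMF(n_components=6, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)              # per-song weights, shape (100, 6)
H = model.components_                   # 6 basis patterns ("posirhythms"), shape (6, 1200)

posirhythms = H.reshape(6, 3, 400)      # back to (pattern, drum channel, time)
reconstruction = W @ H                  # approximate original patterns
print(posirhythms.shape, np.linalg.norm(X - reconstruction))
```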
Eigenrhythm BeatBox
• Resynthesize rhythms from the eigen-space
3. Melodic-Harmonic Fragments
• How similar are two pieces?
• Can we find all the pop-music clichés?
MFCC Features
• Used in speech recognition
[Figure: "Let It Be" log-frequency spectrogram (LIB-1), its MFCCs, and a noise-excited MFCC resynthesis (LIB-2); frequency in Hz vs. time over ~25 s.]
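For concreteness, a minimal sketch of computing these representations with librosa (the file path is a placeholder; the slide's figures were made with other tools):

```python
import librosa

# Load ~25 s of audio (path is just a placeholder).
y, sr = librosa.load("let_it_be.wav", duration=25.0)

# Log-frequency (constant-Q) spectrogram and MFCCs, as in the figure's panels.
C = librosa.cqt(y, sr=sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(C.shape, mfcc.shape)   # (n_bins, n_frames), (13, n_frames)
```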
Chroma Features
• Idea: project onto 12 semitones regardless of octave
Fujishima 1999
[Figure: spectrogram (FFT bins vs. time) projected onto chroma (pitch classes A-G); as a matrix product, chroma = mapping × signal: C = M × X.]
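The figure's equation (chroma = mapping × signal) can be written directly; a minimal librosa sketch, where the mapping matrix M collapses FFT bins onto 12 pitch classes (librosa.feature.chroma_stft implements essentially this; the path is a placeholder):

```python
import numpy as np
import librosa

y, sr = librosa.load("let_it_be.wav", duration=25.0)   # placeholder path

n_fft = 2048
X = np.abs(librosa.stft(y, n_fft=n_fft)) ** 2           # power spectrogram, (1 + n_fft/2, frames)
M = librosa.filters.chroma(sr=sr, n_fft=n_fft)          # mapping matrix, (12, 1 + n_fft/2)
C = M.dot(X)                                            # chroma = mapping x signal
C = librosa.util.normalize(C, axis=0)                   # normalize each frame

print(M.shape, C.shape)
```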
Chroma Features
• To capture “musical” content
[Figure: "Let It Be" log-frequency spectrogram (LIB-1), chroma features (C-B), Shepard-tone resynthesis of the chroma (LIB-3), and MFCC-filtered Shepard tones (LIB-4); frequency ~300-6000 Hz, time 0-25 s.]
Beat-Synchronous Chroma
• Compact representation of harmonies
[Figure: "Let It Be" log-frequency spectrogram (LIB-1), onset envelope + beat times, beat-synchronous chroma, and beat-synchronous chroma + Shepard resynthesis (LIB-6); time 0-25 s.]
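A minimal librosa sketch of the pipeline in this figure: compute an onset envelope, track beats, then aggregate the chroma frames within each beat into one compact chroma vector per beat (path and parameters are placeholders):

```python
import numpy as np
import librosa

y, sr = librosa.load("let_it_be.wav", duration=25.0)     # placeholder path

# Onset envelope -> beat times, then chroma aggregated over each inter-beat interval.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
tempo, beat_frames = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
beat_chroma = librosa.util.sync(chroma, beat_frames, aggregate=np.median)

print(tempo, beat_chroma.shape)   # tempo in BPM, (12, n_beats + 1)
```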
Chord Recognition
• Beat-synchronous chroma look like chords: can we transcribe them?
• Two approaches:
  - manual templates (prior knowledge)
  - learned models (from training data)
[Figure: beat-synchronous chroma over ~20 s, with chord templates such as C-E-G, B-D-G, A-C-E, A-C-D-F, ...]
Chord Recognition System
• Analogous to speech recognition:
  - Gaussian models of features for each chord
  - Hidden Markov Models for chord transitions (a decoding sketch follows below)
[Figure: system block diagram - audio is beat-tracked and reduced to beat-synchronous chroma (band-pass filtered, 100-1600 Hz or 25-400 Hz) with root normalization; training labels yield 24 Gaussian chord models and a 24x24 transition matrix; at test time, HMM Viterbi decoding and un-normalization produce chord labels. Example model matrices are shown for C major / c minor over roots C-B.]
Sheh & Ellis 2003; Ellis & Weller 2010
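A compact sketch of the decoding step described above, under simplifying assumptions (24 chord classes, shared spherical Gaussian variance, uniform stand-in priors and transitions); the chord means and the transition matrix would come from labeled training data as in the diagram.

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Most likely chord sequence. log_obs: (T, N) per-beat log-likelihoods,
    log_trans: (N, N) log transition matrix, log_init: (N,) log priors."""
    T, N = log_obs.shape
    delta = log_init + log_obs[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans          # (from state, to state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy setup: 24 chord classes, Gaussian means over 12 chroma bins (stand-ins for learned templates).
rng = np.random.default_rng(0)
n_chords, n_chroma, n_beats = 24, 12, 50
means = rng.random((n_chords, n_chroma))               # stand-in for learned chord means
var = 0.1                                              # shared spherical variance (simplification)
trans = np.full((n_chords, n_chords), 1.0 / n_chords)  # stand-in for counted transitions
init = np.full(n_chords, 1.0 / n_chords)

beat_chroma = rng.random((n_beats, n_chroma))          # stand-in for beat-synchronous chroma
# Log-likelihood of each beat's chroma under each chord's Gaussian.
diff = beat_chroma[:, None, :] - means[None, :, :]
log_obs = -0.5 * np.sum(diff ** 2, axis=2) / var

chords = viterbi(log_obs, np.log(trans), np.log(init))
print(chords[:10])
```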
Chord Recognition
• Often works ...
• ... but only 60-80% of the time
[Figure: "Let It Be" - audio spectrogram, beat-synchronous chroma, recognized chords (C G a F C G F C G a), and ground-truth chords (C, G, A:min, A:min/b7, F:maj7, F:maj6, F, C) over 0-18 s.]
What did the models learn?
• Chord model centers (means) indicate chord ‘templates’:
[Figure: "PCP_ROT family model means (train18)" - mean chroma templates per chord family (DIM, DOM7, MAJ, MIN, MIN7) over pitch classes C D E F G A B C, for C-root chords.]
Finding ‘Cover Songs’
• Little similarity in surface audio ...
• ... but it appears in beat-chroma
[Figure: "Let It Be", verse 1, by The Beatles and by Nick Cave - log-frequency spectrograms (0-4 kHz, ~10 s) and the corresponding beat-synchronous chroma features (~25 beats): the chroma patterns match closely while the spectrograms do not.]
Ellis & Poliner '07; Ravuri & Ellis '10
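A rough sketch of the matching idea behind the cited cover-song work (simplified; the published system adds filtering of the cross-correlation and other details): cross-correlate two beat-chroma matrices over all relative beat lags and all 12 chroma rotations (transpositions), and score the best peak.

```python
import numpy as np

def cover_score(chroma_a, chroma_b):
    """Peak cross-correlation of two beat-synchronous chroma matrices (12 x n_beats),
    maximized over all 12 transpositions and all relative beat offsets."""
    best = 0.0
    for shift in range(12):                       # try every chroma rotation (transposition)
        b = np.roll(chroma_b, shift, axis=0)
        # correlate along the beat axis, one chroma row at a time, then sum the rows
        corr = sum(np.correlate(chroma_a[r], b[r], mode="full") for r in range(12))
        best = max(best, corr.max())
    # normalize so scores are comparable across tracks of different lengths/energies
    return best / (np.linalg.norm(chroma_a) * np.linalg.norm(chroma_b) + 1e-9)

# Toy check: a transposed, time-rotated copy scores higher than unrelated chroma.
rng = np.random.default_rng(0)
a = rng.random((12, 120))
cover = np.roll(np.roll(a, 3, axis=0), 7, axis=1)     # transposed and shifted "cover"
other = rng.random((12, 120))
print(cover_score(a, cover), cover_score(a, other))
```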
Large-Scale Cover Recognition
• 2D Fourier Transform Magnitude (2DFTM): a fixed-size feature to capture the “essence” of a chromagram (see the sketch below)
• First results on finding covers in 1M songs:

                 Average rank   meanAP
  random              500,000    0.000
  jumpcodes            308,369    0.002
  2DFTM (50 PC)        137,117    0.020
Bertin-Mahieux & Ellis ’12
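A minimal sketch of the 2DFTM feature (the patch length and pooling are illustrative parameters, not necessarily those of the published system): take the magnitude of the 2D FFT of fixed-size beat-chroma patches, which discards transposition (chroma rotation) and beat alignment, then average over patches to get one fixed-size vector per track. The table's "(50 PC)" presumably refers to reducing this vector to 50 principal components.

```python
import numpy as np

def twodftm(beat_chroma, patch_beats=75):
    """Fixed-size track feature: mean of |2D FFT| over consecutive beat-chroma patches.
    The magnitude makes each patch invariant to chroma rotation and circular beat shift."""
    n_beats = beat_chroma.shape[1]
    patches = []
    for start in range(0, n_beats - patch_beats + 1, patch_beats):
        patch = beat_chroma[:, start:start + patch_beats]
        patches.append(np.abs(np.fft.fft2(patch)))
    return np.mean(patches, axis=0).ravel()       # 12 * patch_beats values

# Toy checks: |FFT2| of a patch is unchanged by transposition and circular beat rotation,
# and a 300-beat track yields a fixed-size (900-dimensional) vector.
rng = np.random.default_rng(0)
patch = rng.random((12, 75))
f1 = np.abs(np.fft.fft2(patch))
f2 = np.abs(np.fft.fft2(np.roll(np.roll(patch, 5, axis=0), 10, axis=1)))
print(np.allclose(f1, f2))                        # True
print(twodftm(rng.random((12, 300))).shape)       # (900,)
```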
Finding Common Fragments
• Cluster beat-synchronous chroma patches
[Figure: eight clusters of beat-synchronous chroma patches (chroma bins A-G vs. time in beats), e.g. #32273 (13 instances), #51917 (13), #65512 (10), #9667 (9), #10929 (7), #61202 (6), #55881 (5), #68445 (5).]
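A minimal sketch of the clustering step (the actual system's patch size, key normalization, and clustering details may differ): cut beat-chroma into fixed-length patches, flatten them, and cluster with k-means; the most heavily populated clusters are candidate "common fragments".

```python
import numpy as np
from sklearn.cluster import KMeans

def chroma_patches(beat_chroma, patch_beats=8, hop_beats=4):
    """Slice a (12 x n_beats) beat-chroma matrix into overlapping fixed-size patches."""
    n_beats = beat_chroma.shape[1]
    return np.array([beat_chroma[:, s:s + patch_beats].ravel()
                     for s in range(0, n_beats - patch_beats + 1, hop_beats)])

# Toy corpus: patches pooled from many tracks (random stand-ins here).
rng = np.random.default_rng(0)
corpus = np.vstack([chroma_patches(rng.random((12, 200))) for _ in range(20)])

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(corpus)
labels = kmeans.labels_

# Clusters with the most members are the most "reused" fragments.
counts = np.bincount(labels, minlength=100)
top = np.argsort(counts)[::-1][:5]
print([(int(c), int(counts[c])) for c in top])
```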
Clustered Fragments
• ... for a dictionary of common themes?
[Figure: example clustered fragments (chroma bins vs. time in beats): Depeche Mode "Ice Machine" 199.5-204.6 s, Roxette "Fireworks" 107.9-114.8 s, Roxette "Waiting For The Rain" 80.1-93.5 s, Tori Amos "Playboy Mommy" 157.8-171.3 s.]
4. Example Applications: Music Discovery
• Connecting listeners to musicians
Berenzweig & Ellis ‘03
Playlist Generation
• Incremental learning of listeners’ preferences
Mandel, Poliner, Ellis ‘06
MajorMiner: Music Tagging
• Describe music using words
Mandel & Ellis ’07,‘08
Classification Results
• Classifiers trained from the top 50 tags
[Figure: spectrogram of "01 Soul Eyes" (~320 s) with per-tag classifier outputs over time for the top 50 tags: drum, guitar, male, synth, rock, electronic, pop, vocal, bass, female, dance, techno, piano, jazz, hip_hop, rap, slow, beat, voice, 80s, electronica, instrumental, fast, saxophone, keyboard, country, drum_machine, distortion, british, ambient, soft, funk, r_b, alternative, house, indie, strings, solo, noise, quiet, silence, samples, punk, horns, singing, drum_bass, end, trance, club, 90s.]
Music Transcription (Poliner & Ellis '05, '06, '07)

Classification:
• N binary SVMs (one for each note).
• Independent frame-level classification on a 10 ms grid.
• Distance to class boundary as posterior.

Temporal smoothing:
• Two-state (on/off) independent HMM for each note; parameters learned from training data.
• Find the Viterbi sequence for each note.

Training data and features:
• MIDI, multi-track recordings, playback piano, & resampled audio (less than 28 mins of training audio).
• Normalized magnitude STFT.

[Figure: feature vectors → classification posteriors → HMM-smoothed transcription.]
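A minimal sketch of the frame-level classification stage described above, using scikit-learn with toy data; the STFT feature extraction is omitted, and a median filter stands in for the per-note two-state HMM / Viterbi smoothing of the slide.

```python
import numpy as np
from sklearn.svm import LinearSVC
from scipy.ndimage import median_filter

N_NOTES = 88          # one binary classifier per piano note
N_FEATS = 256         # e.g. normalized magnitude-STFT bins per 10 ms frame

# Toy training data: frame features X and per-note on/off labels Y.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, N_FEATS))
Y = (rng.random((2000, N_NOTES)) < 0.05).astype(int)

# Train one independent binary SVM per note (skip notes never active in the toy data).
clfs = {}
for note in range(N_NOTES):
    if Y[:, note].any():
        clfs[note] = LinearSVC(C=1.0).fit(X, Y[:, note])

# Frame-level prediction on new frames, then temporal smoothing per note.
X_test = rng.standard_normal((500, N_FEATS))
piano_roll = np.zeros((500, N_NOTES), dtype=int)
for note, clf in clfs.items():
    piano_roll[:, note] = median_filter(clf.predict(X_test), size=5)

print(piano_roll.sum(), "note-frames active")
```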
MEAPsoft
• Music Engineering Art Projects: a collaboration between EE and the Computer Music Center
with Douglas Repetto, Ron Weiss, and the rest of the MEAP team
Conclusions
• Lots of data + noisy transcription + weak clustering ⇒ musical insights?
[Figure: processing flow - music audio → tempo and beat, low-level features, melody and notes, key and chords → classification and similarity, music structure discovery → browsing, discovery, production; modeling, generation, curiosity.]