Singer Similarity Doug Van Nort MUMT 611

Singer Similarity

Doug Van Nort

MUMT 611

Goal

Determine Singer / Vocalist based on extracted features of audio signal

Classify audio files based on singer
  Storage and retrieval

Introduction

Identification of singer fairly easy task for humans regardless of musical context

Not so easy to find parameters for automatic identification

More file sharing and databases leads to increased demand

Introduction

Much work done in speech recognition, performs poorly for singer ID
  Systems trained on speech data, with no background noise

The vocal problem has some fundamental differences
  Vocals exist in a variety of background noise
  Voiced/unvoiced content

Singer recognition similar problem to solo instrument identification

The Players

Kim and Whitman 2002

Liu and Huang 2002

Kim and Whitman

From MIT Media Lab

Singer identification which
  Assumes strong harmonicity from vocals
  Assumes pop music

Instrumentation/levels within critical frequency range

Two step process

Untrained algorithm for automatic segmentation

Classification with training based on vocal segments

Detection of Vocal Regions

Filter frequencies outside of vocal range of 200-2,000 Hz
  Chebyshev IIR digital filter

Detect harmonicity

Filter Frequency Response

Filtering alone not enough
  Bass and cymbals gone, but other instruments fall within range

Need to extract features within vocal range to find voice

Harmonic detection

Band-limited output sent through bank of inverse comb filters
  Delay varied

Most attenuated signal represents strongest harmonic content

Harmonicity measure calculated by ratio of signal energy to maximally attenuated signal
  Allows for establishment of threshold
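
The harmonicity measure described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the first-order inverse comb `y[n] = x[n] - x[n - delay]`, the delay sweep `DELAYS`, and the test pitch `F0` are all assumptions chosen for the example.

```python
import math
import random

def inverse_comb(x, delay):
    """y[n] = x[n] - x[n - delay]: nulls at multiples of fs / delay."""
    return [x[n] - x[n - delay] for n in range(delay, len(x))]

def mean_energy(x):
    return sum(v * v for v in x) / len(x)

def harmonicity(x, delays):
    """Return (energy ratio, delay) for the most attenuated comb output."""
    e_min, best_delay = min((mean_energy(inverse_comb(x, d)), d) for d in delays)
    # Floor the denominator so a perfectly cancelled signal does not divide by zero.
    return mean_energy(x) / max(e_min, 1e-12), best_delay

FS = 11025              # sample rate used in the experiments
F0 = 220.5              # hypothetical pitch: exactly 50 samples per period at FS
DELAYS = range(20, 56)  # hypothetical delay sweep, in samples

# Harmonic test tone (three partials) versus white noise.
tone = [sum(math.sin(2 * math.pi * k * F0 * n / FS) / k for k in (1, 2, 3))
        for n in range(2048)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]

tone_ratio, tone_delay = harmonicity(tone, DELAYS)
noise_ratio, _ = harmonicity(noise, DELAYS)
```

The harmonic tone is cancelled almost completely at the delay matching its period, so its ratio is very large, while noise is not attenuated at any delay and its ratio stays small. A threshold between the two would separate them.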

Singer Identification

Linear Predictive Coding (LPC) used to extract location and magnitude of formants

One of two classifiers used to identify singer based on formant information

Feature Extraction

A 12-pole linear predictor used to find formants using autocorrelation method

Standard LPC treats frequencies linearly, but human sensitivity is more logarithmic
  Warp function maps frequencies to approximation of Bark scale
  Further beneficial in finding fundamental
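
As a rough illustration of the autocorrelation method, the sketch below solves the LPC normal equations with the Levinson-Durbin recursion. It covers only the unwarped (linear frequency) case; the synthetic AR(2) test signal and the function names are assumptions for the example, and formant locations would still have to be read off the resulting all-pole model.

```python
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err): error-filter coefficients with a[0] = 1, and residual energy.

    Sign convention: x[n] is predicted as -sum(a[j] * x[n - j] for j >= 1).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Synthetic check: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise, so a is about [1, -1.3, 0.6].
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(20000):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + rng.gauss(0.0, 1.0))

coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
```

For the 12-pole predictor on the slide, the same call would use `autocorrelation(frame, 12)` and `order=12`.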

Classification Techniques

Two established pattern recognition algorithms used:

Gaussian Mixture Model (GMM)

Support Vector Machine (SVM)

GMM

Uses multiple weighted Gaussians to capture behavior of each class
  Each vector assumed to arise from mixture of Gaussian distributions

Parameters for Gaussians found via Expectation Maximization (EM)
  Mean and variance

Prior to EM, Principal Component Analysis (PCA) taken of data
  Normalizes variances, avoids highly irregular scalings which EM can produce
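
To make the scoring side of a GMM classifier concrete, here is a minimal sketch that evaluates already-fitted mixtures and picks the most likely class. It does not perform EM or PCA; the diagonal covariances, the toy model parameters, and the singer names are assumptions for illustration only.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mean_k, var_k), computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, models):
    """Pick the class whose mixture gives the highest total log-likelihood."""
    scores = {name: sum(gmm_log_likelihood(f, *params) for f in frames)
              for name, params in models.items()}
    return max(scores, key=scores.get)

# Hypothetical two-dimensional models: (weights, means, variances) per singer.
models = {
    "singer_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "singer_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
frames = [[0.2, 0.1], [0.9, 1.2]]
winner = classify(frames, models)
```

Summing per-frame log-likelihoods treats frames as independent, which is a common simplification rather than something the slides state.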

SVM

Computes optimal hyperplane to linearly separate two classes of data

Does not depend on probability estimation

Determined by a small number of data points (support vectors)

Experiments & Results

Testbed of 200 songs by 17 different artists/vocalists

Tracks downsampled to 11.025 kHz
  Vocal range still well below Nyquist
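
The Nyquist claim can be checked with a few lines of arithmetic. The 44.1 kHz source rate is an assumption (the slides do not state the original rate); the vocal band is the 200-2,000 Hz range used for the band-pass filter.

```python
SOURCE_RATE_HZ = 44_100        # assumed CD-quality source rate, not stated in the slides
TARGET_RATE_HZ = 11_025        # downsampled rate from the slides
VOCAL_BAND_HZ = (200, 2_000)   # vocal range used for the band-pass filter

decimation_factor = SOURCE_RATE_HZ // TARGET_RATE_HZ
nyquist_hz = TARGET_RATE_HZ / 2
headroom_hz = nyquist_hz - VOCAL_BAND_HZ[1]

print(f"decimate by {decimation_factor}, Nyquist = {nyquist_hz} Hz, "
      f"headroom above vocal band = {headroom_hz} Hz")
```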

Half of database used for training, half for testing

Two experiments:
  LPC features taken from entire song
  LPC features taken from vocal segments

1024 frame analysis with hop size of 2

LP analysis used both linear and warped freq scales

Results

Results

Results better than chance (~6%) but fall short of expected human performance

Linear freq alone outperforms warped freq

Oddly, using only vocal segments decreases performance for SVM

Liu and Huang

Based on MP3 database

Particularly high demand for such an approach, given widespread use of MPEG-1 Layer 3

Algorithm works directly on data from the MP3 decoding process

Process

Coefficients of polyphase filter taken from MP3 decoding process

File segmented into phonemes based on said coefficients

Feature vector constructed for each phoneme, and stored along with artist name in database

Classifier trained on database, used to identify unknown MP3 files

Flowchart for singer similarity system of Liu/Huang

Phoneme Segmentation

MP3 decoding provides polyphase coefficients

Energy intensity of each subband is sum of squares of subband coefficients

Frame energy calculated from polyphase coefficients
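
A minimal sketch of the energy computation, assuming each frame is given as a list of subbands and each subband as a list of polyphase coefficients. Summing the subband energies to get the frame energy is an assumption here; the slides only say that frame energy is calculated from the polyphase coefficients.

```python
def subband_energy(coeffs):
    """Energy intensity of one subband: sum of squared coefficients."""
    return sum(c * c for c in coeffs)

def frame_energy(frame):
    """Frame energy, assumed here to be the sum of its subband energies."""
    return sum(subband_energy(subband) for subband in frame)

# Toy frame with two subbands (a real MP3 frame has 32 polyphase subbands).
toy_frame = [[1.0, 2.0], [3.0]]
toy_energy = frame_energy(toy_frame)
```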

Energy gap exists between two phonemes

Segmentation looks to automatically identify this gap
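
One simple way to locate such gaps is to threshold the frame-energy curve and treat each run of frames at or above the threshold as a phoneme. This is a sketch under that assumption, not the segmentation rule from the paper; the threshold value is hypothetical.

```python
def segment_by_energy_gap(energies, threshold):
    """Return (start, end) frame index pairs for runs with energy >= threshold.

    `end` is exclusive; frames below the threshold are treated as gaps.
    """
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i                       # a new phoneme begins
        elif e < threshold and start is not None:
            segments.append((start, i))     # gap reached: close the phoneme
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

frame_energies = [0.1, 5.0, 6.0, 4.0, 0.2, 0.1, 7.0, 8.0, 0.3]
phonemes = segment_by_energy_gap(frame_energies, threshold=1.0)
```

Here the two high-energy runs come out as frames 1-3 and 6-7, separated by the low-energy gap at frames 4-5.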

Waveform of two phonemes

Frame energy of two phonemes

Phoneme Feature Extraction

Phoneme features computed directly from MDCT coefficients
  576-dimensional feature vector for each frame

Phoneme feature vector of n frames

Classification: setup

Create database of phoneme feature vectors
  Becomes training set

Discriminating Radius: measure of uniqueness by min Euclidean distance between dissimilar vectors

Good vs. Bad discriminators

Number of similar phonemes within discriminating radius also considered

Number of phonemes within radius = w_f = frequency of phoneme f

Discriminating ability of each phoneme depends on frequency and distance
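
The two quantities can be sketched as below, using toy 2-D vectors in place of real phoneme features. The function names are hypothetical, and the slides do not give the formula that combines frequency and distance into a single weight, so the sketch stops at computing the two inputs.

```python
import math

def discriminating_radius(index, vectors, labels):
    """Minimum Euclidean distance from vectors[index] to any other singer's vector."""
    return min(math.dist(vectors[index], v)
               for v, lab in zip(vectors, labels) if lab != labels[index])

def phoneme_frequency(index, vectors, labels, radius):
    """Count of other same-singer phonemes that fall inside the radius."""
    return sum(1 for j, (v, lab) in enumerate(zip(vectors, labels))
               if j != index and lab == labels[index]
               and math.dist(vectors[index], v) < radius)

vectors = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0)]
labels = ["a", "a", "a", "b", "b"]

radius_0 = discriminating_radius(0, vectors, labels)
freq_0 = phoneme_frequency(0, vectors, labels, radius_0)
```

A phoneme with a large radius and many same-singer neighbours inside it would be a good discriminator; one whose nearest neighbour belongs to another singer would be a poor one.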

Classification: in action

Unknown MP3 segmented into phonemes
  Only first N used for efficiency

kNN used as classifier
  K neighbors compared to N phonemes and weighted by discriminating function

K*N weighted “votes” clustered by singer, and the winner is one with largest score
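
The voting step might look roughly like this. The database layout `(feature_vector, singer, weight)` and the toy weights are assumptions for illustration; in the described system the weights would come from the discriminating function learned on the training set.

```python
import math
from collections import defaultdict

def identify_singer(query_phonemes, database, k):
    """Weighted kNN vote: each query phoneme casts k votes, one per nearest neighbour.

    database: list of (feature_vector, singer, weight) tuples.
    """
    scores = defaultdict(float)
    for q in query_phonemes:
        nearest = sorted(database, key=lambda entry: math.dist(q, entry[0]))[:k]
        for _, singer, weight in nearest:
            scores[singer] += weight
    return max(scores, key=scores.get), dict(scores)

database = [
    ((0.0, 0.0), "a", 1.0),
    ((0.0, 1.0), "a", 0.5),
    ((5.0, 5.0), "b", 2.0),
    ((5.0, 6.0), "b", 1.0),
]
query = [(0.1, 0.2), (0.2, 0.9)]    # first N = 2 phonemes of an unknown file
winner, scores = identify_singer(query, database, k=2)
```

With N = 2 query phonemes and k = 2 neighbours, four weighted votes are cast, all for singer "a" in this toy case.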

Experiments/Results

10 Male, 10 Female singers

30 songs apiece
  10 phoneme database
  10 training (discriminator weights)
  10 test set

Free parameters

User defined parameters:
  k value
  Discrimination threshold
  Number of singers in a class

Varying threshold

Varying k

Varying number of singers

Results for all Singers

Conclusion

Not much work yet strictly on singer

Tough because of time and background variances

Quite useful as many people identify artists with singer

Initial results promising, short of human performance

See also: Minnowmatch [Whitman, Flake, Lawrence]