Tim Pohle, Peter Knees, Markus Schedl, Elias Pampalk, and Gerhard Widmer
IEEE Transactions on Multimedia, Vol. 9, No. 3, April 2007
Presented by Yi-Tang Wang

Outline

Introduction

Audio-Based Similarity

Web-Based Similarity

Problem Modeling

Evaluation and Results

Conclusion & future work

Introduction

A novel music player interface using a wheel

Generating a circular playlist from personal repositories

Keeps on playing similar tracks

Not only audio-based similarity is used, but also text-based similarity

Audio-Based Similarity

MFCCs (Mel-frequency cepstral coefficients)

Discarding the higher-order MFCCs is beneficial for the ability to compare different frames, but possibly at the cost of discarding musically meaningful information.


The wave files were downsampled to 22 kHz

19 MFCCs per frame

Ignoring the temporal order

Model the distribution of MFCC coefficients with a Gaussian mixture model (GMM)


Similarity between pieces of music: compute the distance between two GMMs

Likelihood: compute the probability that the MFCCs of song A were generated by the model of song B

Drawback: all MFCC coefficients need to be stored


Sampling: only store the GMM parameters instead of the MFCCs

Sample from one GMM and compute the likelihood of the samples given the other GMM

Sampling corresponds roughly to re-creating a song
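The sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GMMs use diagonal covariances, the four log-likelihood terms are combined into a symmetric distance (a common choice, not confirmed by the slides), and the names `gmm_distance`, `sample`, and `log_likelihood` are hypothetical.

```python
import math
import random

# A GMM is a list of (weight, means, variances) tuples.
# Diagonal covariances are an assumption made to keep the sketch short.

def log_gauss_diag(x, means, variances):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def log_likelihood(x, gmm):
    """Log density of x under the GMM, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def sample(gmm, n, rng):
    """Draw n points from the GMM (the 're-created' song)."""
    weights = [w for w, _, _ in gmm]
    points = []
    for _ in range(n):
        _, mu, var = rng.choices(gmm, weights=weights)[0]
        points.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)])
    return points

def mean_log_likelihood(points, gmm):
    return sum(log_likelihood(p, gmm) for p in points) / len(points)

def gmm_distance(gmm_a, gmm_b, n=500, seed=0):
    """Assumed symmetric combination of the sampled log-likelihoods."""
    rng = random.Random(seed)
    s_a = sample(gmm_a, n, rng)
    s_b = sample(gmm_b, n, rng)
    return (mean_log_likelihood(s_a, gmm_a) + mean_log_likelihood(s_b, gmm_b)
            - mean_log_likelihood(s_a, gmm_b) - mean_log_likelihood(s_b, gmm_a))
```

Only the GMM parameters are needed at comparison time; the MFCC frames themselves never have to be stored.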

Web-Based Similarity

Cultural, social, historical, and contextual aspects should be taken into account

WWW information

Query Google with the artist's name + "music"

The 50 top-ranked pages are retrieved

Remove all terms that occur on fewer than c pages, with c chosen such that about 10,000 terms remain


Term frequency $tf_{ta}$ (a: artist, t: term): number of occurrences of t in documents related to a

Document frequency $df_t$: number of pages in which t occurred

Term weight per artist: term frequency × inverse document frequency
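A term weight of this kind could be computed as sketched below. The slide only states "term frequency × inverse document frequency"; the log-damped variant used here, and the names `tfidf_weights` and `n_pages`, are assumptions for illustration.

```python
import math

def tfidf_weights(term_counts, doc_freq, n_pages):
    """Term weights for one artist.

    term_counts: {term: tf_ta}, occurrences of each term in the artist's pages
    doc_freq:    {term: df_t}, number of pages in which the term occurred
    n_pages:     total number of retrieved pages
    The log damping is an assumed variant, not taken from the slides.
    """
    weights = {}
    for term, tf in term_counts.items():
        df = doc_freq.get(term, 0)
        if tf > 0 and df > 0:
            weights[term] = (1.0 + math.log2(tf)) * math.log2(n_pages / df)
        else:
            weights[term] = 0.0
    return weights
```

A term that occurs on every page gets weight 0, so it does not help to distinguish artists.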


Each artist is described by a vector of term weights

Apply cosine normalization on the vector

Euclidean distance is a simple similarity measure

In this paper, a SOM is used as the similarity measure
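Cosine normalization and the Euclidean distance between two artist vectors can be sketched as follows; the function names are hypothetical.

```python
import math

def cosine_normalize(vec):
    """Scale a term-weight vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

After normalization, the squared Euclidean distance equals 2 − 2·cos(θ), where θ is the angle between the two vectors, so artists with long and short web texts become comparable.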

Web-Based Similarity - SOM

SOM: Self-Organizing Map

A subtype of artificial neural networks

It is trained using unsupervised learning

Produces a low-dimensional representation of the training samples while preserving the topological properties of the input space

A rectangular 2-D grid is used in this paper for text-based similarity between songs

Web-Based Similarity - SOM

A SOM consists of units

A model vector in the high-dimensional input data space is assigned to each of the units

Model vectors that belong to units close to each other on the 2-D grid are also close to each other in the data space

Training chooses the model vectors


Web-Based Similarity - SOM

Batch-SOM algorithm

Initialization: randomly initialize the model vectors

1st step: for each data item $x_i$, the Euclidean distance between $x_i$ and each model vector is calculated

Each data item $x_i$ is assigned to the unit $c_i$ that represents it best
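The first step amounts to a nearest-model-vector search. A minimal sketch, with hypothetical names `squared_distance` and `assign_to_units`:

```python
def squared_distance(a, b):
    """Squared Euclidean distance; the square root is not needed for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_units(data, model_vectors):
    """For each data item, return the index of the closest model vector (its unit c_i)."""
    return [
        min(range(len(model_vectors)),
            key=lambda k: squared_distance(x, model_vectors[k]))
        for x in data
    ]
```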

Web-Based Similarity - SOM

2nd step

The neighborhood relationship between two units is usually defined by a Gaussian-like function

$h_{jk} = \exp\left(-d_{jk}^2 / r_t^2\right)$

$d_{jk}$ = distance on the map, $r_t$ = neighborhood radius

$r_t$ decreases with each iteration (the adaptation strength decreases gradually)
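The slides do not spell out how the model vectors are recomputed. The sketch below assumes the standard batch-SOM update, in which each model vector becomes the neighborhood-weighted mean of all data items; `batch_update` and its parameters are hypothetical names.

```python
import math

def neighborhood(pos_j, pos_k, radius):
    """h_jk = exp(-d_jk^2 / r_t^2), with d_jk measured on the map grid."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_j, pos_k))
    return math.exp(-d2 / radius ** 2)

def batch_update(data, assignments, model_vectors, grid_positions, radius):
    """One batch update: weighted mean of the data for every unit.

    assignments[i] is the unit index c_i of data item i (from the 1st step).
    Assumed standard batch-SOM rule; not stated on the slides.
    """
    dim = len(data[0])
    new_models = []
    for j, pos_j in enumerate(grid_positions):
        numer = [0.0] * dim
        denom = 0.0
        for x, c in zip(data, assignments):
            h = neighborhood(grid_positions[c], pos_j, radius)
            denom += h
            for d in range(dim):
                numer[d] += h * x[d]
        if denom > 0.0:
            new_models.append([v / denom for v in numer])
        else:
            # All weights underflowed to zero: keep the old model vector.
            new_models.append(list(model_vectors[j]))
    return new_models
```

With a large radius every unit is pulled toward the global mean; as $r_t$ shrinks, each unit is influenced almost only by the items assigned to it.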

Web-Based Similarity - SOM

Two artists are similar if they are mapped to the same or adjacent units

Newer experiments have shown that a 6 × 6 grid might be better for this collection

Combining the two approaches

Adding a constant value to the audio-based distance matrix for all songs of dissimilar artists

The constant is half of the maximum audio-based distance

This adds a penalty to transitions between songs by dissimilar artists
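The combination can be sketched as below. It assumes that "adjacent" means the four direct grid neighbors (diagonals excluded) and that each track carries the SOM unit of its artist; both the adjacency rule and the names `artists_similar` and `combine_distances` are assumptions.

```python
def artists_similar(unit_a, unit_b):
    """True if two artists sit on the same or a directly adjacent SOM unit.

    Units are (row, col) grid coordinates; 4-neighborhood is an assumption.
    """
    return abs(unit_a[0] - unit_b[0]) + abs(unit_a[1] - unit_b[1]) <= 1

def combine_distances(audio_dist, track_units):
    """Add half the maximum audio distance between tracks of dissimilar artists.

    audio_dist:  square matrix of audio-based distances
    track_units: track_units[i] is the SOM unit of the artist of track i
    """
    penalty = 0.5 * max(max(row) for row in audio_dist)
    n = len(audio_dist)
    combined = [row[:] for row in audio_dist]
    for i in range(n):
        for j in range(n):
            if i != j and not artists_similar(track_units[i], track_units[j]):
                combined[i][j] += penalty
    return combined
```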

Previous work

Audio-based similarity: Fluctuation Patterns

Using a SOM only on audio-based data

Labeling the SOM with information from the WWW

A 3-D browsing system

P. Knees, M. Schedl, T. Pohle, and G. Widmer, “An Innovative Three Dimensional User Interface for Exploring Music Collections Enriched with Meta-Information from the Web,” ACM MM’06

Problem Modeling

Map the playlist generation problem to the Traveling Salesman Problem (TSP)

The cities correspond to the tracks in the collection

The distances are determined by the similarities between the tracks

Finding an optimal route = producing a circular playlist
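The mapping can be made concrete with a tiny example. The sketch below scores a circular playlist by its total tour length and finds the optimum by brute force, which is only feasible for a handful of tracks; the function names are hypothetical.

```python
from itertools import permutations

def tour_length(order, dist):
    """Sum of distances between consecutive tracks, including last -> first."""
    n = len(order)
    return sum(dist[order[k]][order[(k + 1) % n]] for k in range(n))

def best_circular_playlist(dist):
    """Exhaustive search over all tours starting at track 0 (tiny inputs only)."""
    n = len(dist)
    candidates = ((0,) + p for p in permutations(range(1, n)))
    return list(min(candidates, key=lambda order: tour_length(order, dist)))
```

The number of candidate tours grows factorially with the number of tracks, which is why heuristic TSP algorithms are needed for collections with thousands of tracks.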

TSP Problem

Greedy algorithm: all edges are examined in order of increasing length and added to the route if they keep it valid

Minimum spanning tree (MinSpan): find a minimum spanning tree and traverse it with a depth-first search (DFS), connecting the nodes in the order they are first visited

LKH: Lin-Kernighan algorithm, proposed in 1971

Start with a randomly generated tour

Delete edges from the route and recombine the remaining tour fragments
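As an illustration of the greedy variant, the sketch below accepts an edge only if neither track already has two neighbors and the edge does not close a cycle before all tracks are included. This validity rule is the usual greedy-edge formulation and is assumed here; `greedy_tour` is a hypothetical name.

```python
def greedy_tour(dist):
    """Greedy-edge heuristic for a circular tour over all tracks."""
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    degree = [0] * n
    neighbors = [[] for _ in range(n)]
    parent = list(range(n))  # union-find, used to detect premature cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    added = 0
    for _, i, j in edges:  # shortest edges first
        if degree[i] >= 2 or degree[j] >= 2:
            continue
        root_i, root_j = find(i), find(j)
        if root_i == root_j and added < n - 1:
            continue  # would close a cycle that misses some tracks
        parent[root_i] = root_j
        degree[i] += 1
        degree[j] += 1
        neighbors[i].append(j)
        neighbors[j].append(i)
        added += 1
        if added == n:
            break

    # Walk along the accepted edges to read off the playlist order.
    tour, prev, cur = [0], None, 0
    while len(tour) < n:
        nxt = next(v for v in neighbors[cur] if v != prev)
        tour.append(nxt)
        prev, cur = cur, nxt
    return tour
```

The tour is built from many short edges first, so the long edges that finally close it tend to be poor; this matches the observation that greedy playlists are locally coherent but fragmented at large scale.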

TSP Problem

One-dimensional SOM: train a 1-D cyclic SOM to obtain a circular playlist

As many units as tracks?

Recursive approach

Combining subtours in a greedy manner

Evaluation & Results

Collection 1

2545 tracks, 13 genres: A Cappella (4.4%), Acid Jazz (2.7%), Blues (2.5%), Bossa Nova (2.8%), Celtic (5.2%), Electronica (21.1%), Folk Rock (9.4%), Italian (5.6%), Jazz (5.3%), Metal (16.1%), Punk Rock (10.2%), Rap (12.9%), and Reggae (1.8%)

103 artists; tracks per artist: minimum 8, maximum 61

Evaluation & Results

Collection 2

3456 tracks, 7 genres: Classical (14.7%), Dance (15.0%), Hip-Hop (14.5%), Jazz (13.6%), Metal (14.9%), Pop (11.6%), and Punk (15.6%)

339 artists; tracks per artist: minimum 1, maximum 317

Fluctuations Between Genres

A Cappella, Acid Jazz, Blues, Bossa Nova, Celtic, Electronica, Folk Rock, Italian, Jazz, Metal, Punk Rock, Rap, and Reggae (collection 1)

Shannon Entropy

Estimate how locally coherent a playlist is

Count how many of n consecutive tracks belong to each genre

n = 2…12 (a typical album contains about 12 tracks)

Average over the whole playlist

SOM yields better results on web-enhanced data than LKH on audio-only data
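The entropy measure can be sketched as follows. The sketch assumes base-2 logarithms and windows that wrap around the end of the playlist (since the playlist is circular); both choices and the function names are assumptions.

```python
import math
from collections import Counter

def window_entropy(genres):
    """Shannon entropy (in bits) of the genre distribution in one window."""
    total = len(genres)
    counts = Counter(genres)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mean_local_entropy(playlist_genres, n):
    """Average entropy over all windows of n consecutive tracks.

    Windows wrap around the end of the list (assumed, circular playlist).
    """
    size = len(playlist_genres)
    windows = (
        [playlist_genres[(start + k) % size] for k in range(n)]
        for start in range(size)
    )
    return sum(window_entropy(w) for w in windows) / size
```

Lower values mean that neighboring tracks mostly share a genre; a playlist that alternates between two genres reaches the maximum of 1 bit for n = 2.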

Shannon Entropy

Long-Term Consistency

SOM algorithm on combined data


MinSpan algorithm on audio similarity data


Greedy algorithm on audio similarity data


User Study

10 test persons using collection 2

Create a large playlist

Extract 10 seed tracks: randomly choose a start point, then select tracks at intervals of 3 degrees

Generate two playlists per seed track: one by adding the next nine tracks, one by randomly choosing tracks from the same genre

User Study

Users rate each playlist from 1 to 5

Summing up rating scores

Calculate the difference $tsp_{i,j} - gen_{i,j}$ (i: playlist no., j: user)

User Interface

User Interface

The user interface is very intuitive and its handling extremely easy

Apple’s iPod

Users’ opinion

A scanning function to skip 10 seconds when pressing

Genres containing only a few tracks are quite difficult to locate

Not usable when finding a specific track

Summary of Evaluation Result

All TSP algorithms provided better results with respect to our playlist evaluation criteria when using the web-based extension

The combined similarity measure reduces the number of unexpected placements of tracks in the playlist

Summary of Evaluation Result

LKH and greedy algorithm: best small-scale genre entropy values, but large-scale genre distributions are quite fragmented

SOM-based algorithm: highest entropy values, but the least fragmented long-term genre distributions

MinSpan algorithm: in the middle field regarding the entropy values

Conclusion & future work

A new approach to conveniently access the music stored in mobile sound players

The whole collection is ordered in a circular playlist and is thus accessible with only one input wheel

Two different similarity measures: one relying on timbre information, the other on a combination of timbre and community metadata gathered from artist-related web pages


Problems to solve

Not possible to precisely select a desired piece: only tracks that are representative for a region are selectable

Zooming or hierarchical structuring techniques could help

The user does not know in advance which region on the wheel contains which style of music


M. Schedl, T. Pohle, P. Knees, and G. Widmer, “Assigning and visualizing music genres by web-based co-occurrence analysis,” in Proc. 7th Int. Conf. Music Information Retrieval (ISMIR’06), Victoria, Canada, Oct. 2006.

Thank You
