Combining Audio Content and Social Context for Semantic Music Discovery José Carlos Delgado Ramos Universidad Católica San Pablo

Page 1: Combining Audio Content and Social Context for Semantic Music Discovery

Combining Audio Content and Social Context for Semantic Music Discovery

José Carlos Delgado Ramos

Universidad Católica San Pablo

Page 2: Combining Audio Content and Social Context for Semantic Music Discovery

I. Introduction

II. Sources of Music Information

III. Combining multiple sources of music information

IV. Experiments

Page 3: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• Most music IR systems focus on either content-based analysis of audio signals…

Page 4: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• Or content-based analysis of webpages…

Page 5: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• …user preference information…

Page 6: Combining Audio Content and Social Context for Semantic Music Discovery

Introduction

• … and social tagging data.

Page 7: Combining Audio Content and Social Context for Semantic Music Discovery

Tags

• Short text-based tokens

• Helpful when describing songs

Page 8: Combining Audio Content and Social Context for Semantic Music Discovery

Tags

• Not always accurate: the strength of the semantic association between each song and each tag may vary.

Page 9: Combining Audio Content and Social Context for Semantic Music Discovery

Sources of semantic information

• Surveys

• Social tagging websites

• Annotation games

Page 10: Combining Audio Content and Social Context for Semantic Music Discovery

Relevance of tags to songs

• May be determined by using content-based audio analysis or by text-mining associated web documents.

Page 11: Combining Audio Content and Social Context for Semantic Music Discovery

Main sources for information retrieval

• Audio content, Social tags and Web documents

• Audio signal analysis is also used, with two acoustic feature representations related to timbre and harmony.

Page 12: Combining Audio Content and Social Context for Semantic Music Discovery

Sources of Music Information

• A relevance score function r(s;t) is derived; it evaluates the relevance of a song s to a tag t.

• Song-tag representations are dense if based on audio content, sparse if based on social representations.

Page 13: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Audio Content: Supervised Multiclass Labeling (SML)

• Audio track s represented as a bag of feature vectors X = {x1,x2,…,xT}

• 1: Expectation-maximization (EM) algorithm.

• 2: Identify the set of example songs with a given tag.

• 3: Mixture-hierarchies expectation-maximization algorithm.

Page 14: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Audio Content: Supervised Multiclass Labeling (SML)

• Given a song s, X is extracted and its likelihood is evaluated under each of the tag GMMs.

• Result: a vector of probabilities, one per tag, which gives the relevance of song s to each tag t.
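The scoring step on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the diagonal-covariance GMMs, the tag names, the toy feature vectors, and the softmax normalization of average log-likelihoods are all hypothetical choices, not the exact relevance formula from the talk.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_log_likelihood(X, gmm):
    """Average per-vector log-likelihood of the bag X under a GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in X:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(X)

def tag_probabilities(X, tag_gmms):
    """Normalize per-tag scores into a probability vector (softmax is an assumption)."""
    scores = {t: gmm_avg_log_likelihood(X, g) for t, g in tag_gmms.items()}
    peak = max(scores.values())
    exps = {t: math.exp(s - peak) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

# Hypothetical tag models and a toy bag of 2-D feature vectors.
tag_gmms = {
    "mellow": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "aggressive": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 3.0], [1.0, 1.0])],
}
X = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
probs = tag_probabilities(X, tag_gmms)
```

Because every tag model scores every song, this representation is dense: each song receives a value for each tag in the vocabulary.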

Page 15: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Audio Content: Audio feature representations

• Mel Frequency Cepstral Coefficients (MFCC): associated with musical notion of timbre.

• Chroma: represents the harmonic content (keys, chords) by computing spectral energy at frequencies corresponding to the chromatic scale.
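As a rough illustration of the chroma idea, the sketch below folds spectral energy into the 12 pitch classes of the chromatic scale. The function names, the 440 Hz tuning reference, and the toy spectrum are assumptions for illustration; this is not the feature extractor used in the talk.

```python
import math

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index 0..11 (0 = C), assuming equal temperament."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12

def chroma_vector(spectrum):
    """Fold (frequency, energy) pairs into a normalized 12-bin chroma vector."""
    bins = [0.0] * 12
    for freq_hz, energy in spectrum:
        bins[pitch_class(freq_hz)] += energy
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins

# Toy spectrum: an A major triad (A4, C#5, E5) plus the octave A5.
toy = [(440.0, 1.0), (554.37, 0.5), (659.25, 0.5), (880.0, 1.0)]
chroma = chroma_vector(toy)
```

Both A4 and A5 land in the same bin, which is what makes chroma insensitive to octave and useful for describing keys and chords.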

Page 16: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context:

• Summarize each song with an annotation vector over a vocabulary of tags.

• Methods for retrieving tags: social and web-mined.

• Missing song-tag pair: the tag is either not relevant, or relevant but not annotated.

Page 17: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Social Tags

• Last.FM: music discovery website.

• 20 million users a month annotate 3.8 million items over 50 million times, using a universe of 1.2 million tags.

• Last.FM db: 150 million songs/16 million artists.

Page 18: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Social Tags

Page 19: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Social Tags

• Two lists of Last.FM social tags for each song: one relating the song to tags, and one relating the artist to tags.

• Relevance Tsocial(s,t) = tag scores on the artist list + tag scores on the song list + tag scores for synonyms or wildcard matches of t on either list.
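A minimal sketch of this sum, assuming each list is a mapping from tag name to score and that synonyms and wildcards are expressed as shell-style patterns. The function name, the pattern list, and all scores below are hypothetical.

```python
from fnmatch import fnmatchcase

def social_relevance(tag_patterns, song_tags, artist_tags):
    """Sum the scores of entries on the song and artist lists that match any pattern."""
    score = 0.0
    for tag_list in (song_tags, artist_tags):
        for name, value in tag_list.items():
            # Each list entry is counted once, even if several patterns match it.
            if any(fnmatchcase(name.lower(), p) for p in tag_patterns):
                score += value
    return score

# Hypothetical tag lists for one song and its artist.
song_tags = {"Hip-Hop": 60, "rap": 50}
artist_tags = {"hip hop": 100, "hiphop classics": 10}

# The query tag, one synonym spelling, and one wildcard.
patterns = ["hip hop", "hip-hop", "hiphop*"]
relevance = social_relevance(patterns, song_tags, artist_tags)
```

A song with no matching entry on either list gets a score of 0 here, which is why this representation is sparse and why a missing pair is ambiguous.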

Page 20: Combining Audio Content and Social Context for Semantic Music Discovery

Representing Social Context: Web-Mined Tags

• Relevance Scoring (RS) algorithm.

• Relevance is a function of tag frequency, document frequency, total number of words in the documents, etc.

• Site-specific queries on high-quality websites.

• Steps: collect a document corpus, then tag songs.

Page 21: Combining Audio Content and Social Context for Semantic Music Discovery

Combining multiple sources of music information

• Given a query tag t, goal: find a simple rank ordering of songs based on relevance to t.

• Tag-score, web-relevance score and convex optimization used.

• Three algorithms, all supervised: they use labeled training data for learning.

Page 22: Combining Audio Content and Social Context for Semantic Music Discovery

Calibrated Score Averaging (CSA)

• Using training data, we can learn a function g() that calibrates scores so that they can be read as probabilities of tag relevance.

• To learn g(), we start with a training set of N songs, rank-ordered by relevance score.

• If the data is perfectly ordered, then g is isotonic. Otherwise:

Page 23: Combining Audio Content and Social Context for Semantic Music Discovery

Calibrated Score Averaging (CSA)

• E.g. 7 songs with relevance scores (1,2,4,5,6,7,9) and ground truth labels (0,1,0,1,1,0,1)

• Then g(r) = 0 for r < 2, g(r) = 1/2 for 2 <= r < 5, g(r) = 2/3 for 5 <= r < 9 and g(r) = 1 for 9 <= r.

• A missing song-tag score suggests that the tag isn't relevant. Instead:
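The seven-song example on this slide can be reproduced with isotonic regression. The sketch below uses the pool-adjacent-violators procedure, a standard way to fit a non-decreasing function; the function name is hypothetical, and treating this as the calibration step is an assumption based on the slide's reference to an isotonic g.

```python
from fractions import Fraction

def pav_calibrate(labels):
    """Pool-adjacent-violators: non-decreasing fit to labels sorted by score."""
    blocks = []  # each block is [sum_of_labels, count]
    for y in labels:
        blocks.append([Fraction(y), 1])
        # Merge backwards while a block's mean exceeds the mean of the block after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

scores = [1, 2, 4, 5, 6, 7, 9]
labels = [0, 1, 0, 1, 1, 0, 1]
g = dict(zip(scores, pav_calibrate(labels)))
# g maps each training score to its calibrated value:
# 1 -> 0, 2 -> 1/2, 4 -> 1/2, 5 -> 2/3, 6 -> 2/3, 7 -> 2/3, 9 -> 1
```

Each pooled block takes the fraction of relevant songs it contains, so the calibrated value can be read as an estimate of how often a song with that score is relevant to the tag.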

Page 24: Combining Audio Content and Social Context for Semantic Music Discovery

Rankboost algorithm

• For a given song, each weak ranking function is an indicator function that outputs 1 if the score for the associated representation is greater than the threshold, or if the score is missing and the default value is set to 1. Otherwise it outputs 0.
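A minimal sketch of such a weak ranker, with a missing score represented as None. The weighted sum in the second function follows the usual RankBoost form of combining weak rankers; the source names, thresholds, defaults, and weights are made-up values, not learned ones.

```python
def weak_ranker(score, threshold, default):
    """Indicator: 1 if score > threshold, or if the score is missing and default == 1."""
    if score is None:
        return 1 if default == 1 else 0
    return 1 if score > threshold else 0

def combined_rank_score(song_scores, rankers):
    """Weighted sum of weak rankers; each ranker is (source, threshold, default, weight)."""
    return sum(
        weight * weak_ranker(song_scores.get(source), threshold, default)
        for source, threshold, default, weight in rankers
    )

# Hypothetical rankers over three representations.
rankers = [
    ("audio", 0.5, 0, 0.7),
    ("social", 10.0, 0, 1.2),
    ("web", 0.2, 1, 0.4),
]

# This song has an audio score only; its social and web scores are missing.
song = {"audio": 0.8}
total = combined_rank_score(song, rankers)
```

The default value lets the learner decide, per representation, whether a missing score should count for or against a song.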

Page 25: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• Linear combination of M different kernels that each encode different data features:

• Since each kernel matrix Km is positive semi-definite, their positive-weighted sum K is also a valid positive semi-definite kernel.
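Written out, the combination described on this slide takes the form below. The weight symbol \mu_m is assumed notation, since the slide text does not show its own symbol. The second line spells out why a positive-weighted sum of positive semi-definite matrices stays positive semi-definite.

```latex
K = \sum_{m=1}^{M} \mu_m K_m, \qquad \mu_m > 0

v^{\top} K v = \sum_{m=1}^{M} \mu_m \, v^{\top} K_m v \;\ge\; 0 \quad \text{for every vector } v
```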

Page 26: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• Km represents similarities between all songs in the data set. For the audio features, the vectors X = {x1,x2,…,xT} are obtained from MFCC and Chroma, and the entries of a probability product kernel (PPK) are computed from them.
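For reference, the general definition of a probability product kernel between two songs i and j is given below. Here p(x; \theta_i) stands for a distribution fitted to the feature vectors of song i, and \rho is a free parameter; both symbols are assumed notation rather than taken from the slide.

```latex
K(i, j) = \int p(x; \theta_i)^{\rho} \, p(x; \theta_j)^{\rho} \, dx
```

Setting \rho = 1/2 gives the Bhattacharyya kernel, and \rho = 1 gives the expected likelihood kernel.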

Page 27: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• For each of the social context features, a radial basis function (RBF) kernel is computed, with entries:

• Where K(i,j) represents the similarity between xi and xj, the annotation vectors for songs i and j.
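A small sketch of an RBF kernel over annotation vectors. It uses the common Gaussian form exp(-||xi - xj||^2 / (2 * sigma^2)); that parametrization, the bandwidth sigma, and the toy vectors are assumptions, since the slide's own formula is not shown in the text.

```python
import math

def rbf_kernel(vectors, sigma=1.0):
    """Gaussian RBF kernel matrix: K[i][j] = exp(-||xi - xj||^2 / (2 * sigma^2))."""
    n = len(vectors)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            K[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    return K

# Hypothetical annotation vectors for three songs over a three-tag vocabulary.
annotations = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
K = rbf_kernel(annotations, sigma=1.0)
```

Songs with identical annotation vectors get similarity 1, and similarity decays toward 0 as the vectors move apart.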

Page 28: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• For each tag t and corresponding class-label vector y, the primal problem for single-kernel SVM is to find the decision boundary with maximum margin separating the two classes.

• Optimum K can be learned by minimizing the function that optimizes the dual (thereby maximizing the margin) with respect to the kernel weights.

Page 29: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• Here e is an n-vector of ones, and a constraint forces the kernel weights to sum to one. C is a hyperparameter that limits violations of the margin.
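One standard way to write this kernel-learning problem is sketched below, as a min-max over the kernel weights and the SVM dual variables. It is offered as a plausible reading, not as the slide's exact equation: \alpha (dual variables) and \mu_m (kernel weights) are assumed notation.

```latex
\min_{\mu} \; \max_{\alpha} \quad 2\,\alpha^{\top} e \;-\; \alpha^{\top} \operatorname{diag}(y)\, K \,\operatorname{diag}(y)\, \alpha

\text{subject to} \quad 0 \le \alpha \le C, \quad \alpha^{\top} y = 0, \quad \sum_{m=1}^{M} \mu_m = 1, \quad \mu_m \ge 0,

\text{where} \quad K = \sum_{m=1}^{M} \mu_m K_m
```

The inner maximization is the usual soft-margin SVM dual for a fixed kernel; the outer minimization picks the kernel weights that give the largest margin.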

Page 30: Combining Audio Content and Social Context for Semantic Music Discovery

Kernel Combination SVM (KC-SVM)

• The solution returns a linear decision function that defines the distance of a new song sz from the hyperplane boundary between the positive and negative classes (i.e. the relevance of sz to tag t).

• b: offset of the decision boundary from the origin.
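A sketch of how such a decision function can rank songs for a tag, using the standard kernel SVM form f(z) = sum_i alpha_i * y_i * K(i, z) + b. The dual weights, labels, kernel values, and offset below are made-up numbers, and the function names are hypothetical.

```python
def decision_value(alphas, labels, kernel_to_new, b):
    """f(z) = sum_i alpha_i * y_i * K(i, z) + b for one new song z."""
    return sum(a * y * k for a, y, k in zip(alphas, labels, kernel_to_new)) + b

def rank_songs(alphas, labels, kernel_rows, b):
    """Order new songs by decreasing decision value, i.e. by relevance to the tag."""
    values = {
        song: decision_value(alphas, labels, row, b)
        for song, row in kernel_rows.items()
    }
    return sorted(values, key=values.get, reverse=True)

# Hypothetical trained model over three training songs.
alphas = [0.5, 0.5, 0.0]
labels = [1, -1, 1]
b = 0.1

# Combined-kernel similarities between each new song and the training songs.
kernel_rows = {
    "song_a": [0.9, 0.1, 0.4],
    "song_b": [0.2, 0.8, 0.3],
}
ranking = rank_songs(alphas, labels, kernel_rows, b)
```

Only training songs with a non-zero dual weight (the support vectors) contribute to the value.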

Page 31: Combining Audio Content and Social Context for Semantic Music Discovery

Semantic Music Retrieval Experiments

• 500 songs by 500 unique artists, each annotated by a minimum of 3 individuals using a 174-tag vocabulary.

• A song is annotated with a tag if 80% of its annotators agree that the tag is relevant.

• Experiment: 72 tags associated with at least 20 songs each.
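The two filtering rules on this slide can be sketched as follows, assuming the 80% figure is a per-song agreement threshold among annotators. The function names and toy data are hypothetical, and the toy call uses a lower song-count threshold than the 20 used in the experiment.

```python
from collections import Counter

def annotate(votes_by_tag, num_annotators, agreement=0.8):
    """Tags that at least `agreement` of a song's annotators marked as relevant."""
    return {t for t, v in votes_by_tag.items() if v / num_annotators >= agreement}

def frequent_tags(song_tags, min_songs=20):
    """Tags associated with at least `min_songs` songs."""
    counts = Counter(t for tags in song_tags.values() for t in tags)
    return {t for t, c in counts.items() if c >= min_songs}

# One song rated by 5 annotators: votes per tag.
tags_for_song = annotate({"rock": 5, "calm": 2, "guitar": 4}, num_annotators=5)

# Toy collection of three songs, filtered with a threshold of 2 songs per tag.
collection = {"s1": {"rock", "guitar"}, "s2": {"rock"}, "s3": {"calm"}}
kept = frequent_tags(collection, min_songs=2)
```

Dropping rare tags leaves enough positive examples per tag to train and evaluate the supervised combination methods.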

Page 32: Combining Audio Content and Social Context for Semantic Music Discovery

Thanks!