6
A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A MUSIC RECOMMENDATION SERVICE IN TV COMMERCIALS Mohamed Morchid, Richard Dufour, Georges Linar` es LIA - University of Avignon (France) {mohamed.morchid, richard.dufour, georges.linares}@univ-avignon.fr ABSTRACT Most of modern advertisements contain a song to illustrate the commercial message. The success of a product, and its economic impact, can be directly linked to this choice. Finding the most appropriate song is usually made man- ually. Nonetheless, a single person is not able to listen and choose the best music among millions. The need for an automatic system for this particular task becomes in- creasingly critical. This paper describes the LIA music recommendation system for advertisements using both tex- tual and acoustic features. This system aims at providing a song to a given commercial video and was evaluated in the context of the MediaEval 2013 Soundtrack task [14]. The goal of this task is to predict the most suitable sound- track from a list of candidate songs, given a TV commer- cial. The organizers provide a development dataset includ- ing multimedia features. The initial assumption of the pro- posed system is that commercials which sell the same type of product, should also share the same music rhythm. A two-fold system is proposed: find commercials with close subjects in order to determine the mean rhythm of this sub- set, and then extract, from the candidate songs, the music which better corresponds to this mean rhythm. 1. INTRODUCTION The success of a product or a service essentially depends of the way to present it. Thus, companies pay much at- tention to choose the most appropriate advertisement that will make a difference in the customer choice. The ad- vertisers have different media possibilities, such as journal paper, radio, TV or Internet. In this context, they can ex- ploit the audio media (TV, radio...) to attract listeners using a song related to the commercial. The choice of an appro- priate song is crucial and can have a significant economic impact [5,18]. Usually, this choice is made by a human ex- pert. Nonetheless, while millions of musics exist, a human agent could only choose a song among a limited subset. This choice could then be inappropriate, or simply not the best one, since the agent could not search into a large num- c Mohamed Morchid, Richard Dufour, Georges Linar` es. Licensed under a Creative Commons Attribution 4.0 International Li- cense (CC BY 4.0). Attribution: Mohamed Morchid, Richard Dufour, Georges Linar` es. “A Combined Thematic and Acoustic Approach for a Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference, 2014. ber of musics. For these reasons, the need for an automatic song recommandation system, to illustrate advertisements, becomes a critical subject for companies. In this paper, an automatic system for songs recomman- dation is proposed. The proposed approach combines both textual (web pages) and audio (acoustic) features to select, among a large number of songs, the most appropriate and relevant music knowing the commercial content. The first step of the proposed system is to represent commercials into a thematic space built from a Latent Dirichlet Alloca- tion (LDA) [4]. This pre-processing subtask uses the re- lated textual content of the commercial. Then, acoustic features of each song are extracted to find a set of the most relevant songs for a given commercial. An appropriate benchmark is needed to evaluate the ef- fectiveness of the proposed recommandation system. For these reasons, the proposed system is evaluated in the con- text of the challenging MediaEval 2013 Soundtrack task for commercials [10]. Indeed, the MusiClef task seeks to make this process automated by taking into account both context- and content-based information about the video, the brand, and the music. The main difficulty of this task is to find the set of relevant features that best describes the most appropriate song for a video. Next section describes related work in topic space mod- eling for information retrieval and music tasks. Section 3 presents the proposed music recommandation system us- ing both textual content and acoustic features related to musics from commercials. Section 4 explains in details the unsupervised Latent Dirichlet Allocation (LDA) tech- nique, while Section 4.2 describes how the acoustic fea- tures are used to evaluate the proximity of a music to a commercial. Finally, experiments are presented in Sec- tion 5, while Section 6 gives conclusions and perspectives. 2. RELATED WORKS Latent Dirichlet Allocation (LDA) [4] is widely used in several tasks of information retrieval such as classifica- tion or keywords extraction. However, this unsupervised method is not much considered in the music processing tasks. Next sections describe related works using LDA techniques with text corpora (Section 2.1) and in the con- text of music tasks (Section 2.3). 15th International Society for Music Information Retrieval Conference (ISMIR 2014) 465

A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A MUSICRECOMMENDATION SERVICE IN TV COMMERCIALS

Mohamed Morchid, Richard Dufour, Georges LinaresLIA - University of Avignon (France)

{mohamed.morchid, richard.dufour, georges.linares}@univ-avignon.fr

ABSTRACT

Most of modern advertisements contain a song to illustratethe commercial message. The success of a product, andits economic impact, can be directly linked to this choice.Finding the most appropriate song is usually made man-ually. Nonetheless, a single person is not able to listenand choose the best music among millions. The need foran automatic system for this particular task becomes in-creasingly critical. This paper describes the LIA musicrecommendation system for advertisements using both tex-tual and acoustic features. This system aims at providinga song to a given commercial video and was evaluated inthe context of the MediaEval 2013 Soundtrack task [14].The goal of this task is to predict the most suitable sound-track from a list of candidate songs, given a TV commer-cial. The organizers provide a development dataset includ-ing multimedia features. The initial assumption of the pro-posed system is that commercials which sell the same typeof product, should also share the same music rhythm. Atwo-fold system is proposed: find commercials with closesubjects in order to determine the mean rhythm of this sub-set, and then extract, from the candidate songs, the musicwhich better corresponds to this mean rhythm.

1. INTRODUCTION

The success of a product or a service essentially dependsof the way to present it. Thus, companies pay much at-tention to choose the most appropriate advertisement thatwill make a difference in the customer choice. The ad-vertisers have different media possibilities, such as journalpaper, radio, TV or Internet. In this context, they can ex-ploit the audio media (TV, radio...) to attract listeners usinga song related to the commercial. The choice of an appro-priate song is crucial and can have a significant economicimpact [5,18]. Usually, this choice is made by a human ex-pert. Nonetheless, while millions of musics exist, a humanagent could only choose a song among a limited subset.This choice could then be inappropriate, or simply not thebest one, since the agent could not search into a large num-

c© Mohamed Morchid, Richard Dufour, Georges Linares.Licensed under a Creative Commons Attribution 4.0 International Li-cense (CC BY 4.0). Attribution: Mohamed Morchid, Richard Dufour,Georges Linares. “A Combined Thematic and Acoustic Approach for aMusic Recommendation Service in TV Commercials”, 15th InternationalSociety for Music Information Retrieval Conference, 2014.

ber of musics. For these reasons, the need for an automaticsong recommandation system, to illustrate advertisements,becomes a critical subject for companies.

In this paper, an automatic system for songs recomman-dation is proposed. The proposed approach combines bothtextual (web pages) and audio (acoustic) features to select,among a large number of songs, the most appropriate andrelevant music knowing the commercial content. The firststep of the proposed system is to represent commercialsinto a thematic space built from a Latent Dirichlet Alloca-tion (LDA) [4]. This pre-processing subtask uses the re-lated textual content of the commercial. Then, acousticfeatures of each song are extracted to find a set of the mostrelevant songs for a given commercial.

An appropriate benchmark is needed to evaluate the ef-fectiveness of the proposed recommandation system. Forthese reasons, the proposed system is evaluated in the con-text of the challenging MediaEval 2013 Soundtrack taskfor commercials [10]. Indeed, the MusiClef task seeks tomake this process automated by taking into account bothcontext- and content-based information about the video,the brand, and the music. The main difficulty of this taskis to find the set of relevant features that best describes themost appropriate song for a video.

Next section describes related work in topic space mod-eling for information retrieval and music tasks. Section 3presents the proposed music recommandation system us-ing both textual content and acoustic features related tomusics from commercials. Section 4 explains in detailsthe unsupervised Latent Dirichlet Allocation (LDA) tech-nique, while Section 4.2 describes how the acoustic fea-tures are used to evaluate the proximity of a music to acommercial. Finally, experiments are presented in Sec-tion 5, while Section 6 gives conclusions and perspectives.

2. RELATED WORKS

Latent Dirichlet Allocation (LDA) [4] is widely used inseveral tasks of information retrieval such as classifica-tion or keywords extraction. However, this unsupervisedmethod is not much considered in the music processingtasks. Next sections describe related works using LDAtechniques with text corpora (Section 2.1) and in the con-text of music tasks (Section 2.3).

15th International Society for Music Information Retrieval Conference (ISMIR 2014)

465

Page 2: A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

2.1 Topic modeling

Several methods were proposed by Information Retrieval(IR) researchers to build topic spaces such as Latent Se-mantic Analysis or Indexing (LSA/LSI) [2, 6], that use asingular value decomposition (SVD) to reduce the spacedimension.

This method was improved by [11] which proposed aprobabilistic LSA/LSI (pLSA/pLSI). The pLSI approachmodels each word in a document as a sample from a mix-ture model, where the mixture components are multino-mial random variables that can be viewed as representa-tions of topics. This method demonstrated its performanceon various tasks, such as sentence [3] or keyword [24] ex-traction. In spite of the effectiveness of the pLSI approach,this method has two main drawbacks. The distribution oftopics in pLSI is indexed by training documents. Thus, thenumber of these parameters grows with the training doc-ument set size, and then, the model is prone to overfittingwhich is a main issue in an IR task such as document clus-tering. However, to address this shortcoming, a temperingheuristic is used to smooth the parameter of pLSI model foracceptable predictive performance. Nonetheless, authorsshowed in [20] that overfitting can occur even if temperingprocess is used.

As a result, IR researchers proposed the Latent Dirichletallocation (LDA) [4] method to overcome these two draw-backs. Thus, the number of parameters of LDA does notgrow with the size of the training corpus and LDA is notcandidate for overfitting. LDA is a generative model whichconsiders a document, seen as a bag-of-words [21], as amixture of latent topics. In opposition to a multinomialmixture model, LDA considers that a theme is associatedto each occurrence of a word composing the document,rather than associate a topic with the complete document.Thereby, a document can change of topics from a word toanother. However, the word occurrences are connected bya latent variable which controls the global respect of thedistribution of the topics in the document. These latenttopics are characterized by a distribution of word proba-bilities which are associated with them. pLSI and LDAmodels have been shown to generally outperform LSI onIR tasks [12]. Moreover, LDA provides a direct estimateof the relevance of a topic knowing a word set or a docu-ment such as a web pages in the proposed system.

α θ z w

β φ

wordtopic ND

topicdistribution

worddistribution

Figure 1. LDA Formalism.

Figure 1 presents the LDA formalism. For every docu-

ment d of a corpus D, a first parameter θ is drawn accordingto a Dirichlet law of parameter α. A second parameter φis drawn according to the same Dirichlet law of parameterβ. Then, to generate every word w of the document d, alatent topic z is drawn from a multinomial distribution onθ. Knowing this topic z, the distribution of the words isa multinomial of parameters φ. The parameter θ is drawnfor all the documents from the same prior parameter α.This allows to obtain a parameter binding the documentsall together [4].

2.2 Gibbs sampling

Several techniques have been proposed to estimate LDAparameters, such as Variational Methods [4], Expectation-Propagation [17] or Gibbs Sampling [8]. Gibbs Samplingis a special case of Markov-chain Monte Carlo (MCMC) [7]and gives a simple algorithm for approximate inferencein high-dimensional models such as LDA [9]. This over-comes the difficulty to directly and exactly estimate param-eters that maximize the likelihood of the whole data col-lection defined as: P (W |−→α ,−→β ) =

∏Mm=1 P (

−→wm|−→α ,−→β )

for the whole data collection W = {−→wm}Mm=1 knowingthe Dirichlet parameters −→α and

−→β .

The first use of Gibbs Sampling for estimating LDA isreported in [8] and a more comprehensive description ofthis method can be found in [9]. One can refer to these pa-pers for a better understanding of this sampling technique.

2.3 Topic modeling and Music

Topic modeling was already used in music processing, suchas [13], where the authors presented a system which learnsmusical key as a key-profile. Thus, the proposed approachconsidered a song as a random mixture of key-profiles.In [25], authors described a classification method to assigna label to an unseen music. The authors use LDA to builda topic space from music-tags to get the probability of ev-ery music-tag belonging to each music genre. Then, eachmusic is labeled to a genre knowing its tags. The purposeof the proposed approach is to find a set of relevant musicsfor a TV commercial.

3. PROPOSED APPROACH

The goal of the proposed automatic system is to recom-mend a set of musics given a TV commercial. The sys-tem uses external knowledge to find these songs. Theseexternal resources are composed with a set of TV commer-cials associated, for each one, with a song and a set of webpages (see [14] for more details about the MediaEval 2013Soundtrack task). The idea behind the proposed approachis to assume that two commercials sharing same subjects orinterests, also share the same kind of songs. The main issuein this approach is to find commercials, from the externaldataset, that have sets of subjects close to those in commer-cials from the test set. As described in Section 2.1, a doc-ument can be represented as a set of latent topics. Thus,two documents sharing the same topics could be seen asthematically close.

15th International Society for Music Information Retrieval Conference (ISMIR 2014)

466

Page 3: A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

{C ,S }d∈D

dd

{C ,S }t∈T

t1

TopicSpaceLDA

Development set

Topic vectors

{V }d∈Dd

Topic vector

V1

Mapping

V d

V 1

cos (α ) -1

d,1

l commercial fromdevelopment set D

{C ,S }with the highestsimilarity with Cfrom test set T

l l

1S

S t

cos (α ) -1

l,t

S

A TV commercial C and the candidate songs S

from test set T

1t

5 nearest soundtracks{S }

with the commercial C1t

t=1, ... , 5

Cosine similarity α

Cosine similarity αMean rhythm pattern

Figure 2. Global architecture of the proposed system.

Basically, the first process of the proposed three stepsystem is to map each TV commercial from the test anddevelopment sets, into a topic space learnt with a LDA al-gorithm. A TV commercial from the test set is then linkedto TV commercials from development set sharing a set ofclose topics. Moreover, each commercial of the develop-ment set is related to a music. Thus, as a result, a commer-cial from the test set is related to a subset of songs fromthe development set, considered as thematically close tothe commercial textual content.

The second step has the responsibility to estimate a listof candidate songs (see Figure 2) using song audio featuresfrom the subset of songs thematically close associated dur-ing the first step. This subset of songs is used to evaluate arhythm pattern of the ideal song for this commercial.

The last step retrieves, from all candidate songs fromthe test set, the closest song to the rhythm pattern estimatedduring the previous step.

In details, the development set D is composed of TVcommercialsCd, with for each, a soundtrack Sd and a vec-tor representation V d related to the dth TV commercial. Inthe same manner, the test set T is composed of TV com-mercials Ct, with, for the tth one, a vector representationV t and a soundtrack St to predict. Then a similarity score{αd,t}t=1,...,T

d=1,...,D is computed for each commercial Cdi of the

development set given one from the test set Ct:

D = {Cd, V D, Sd}d=1,...,D (1)

T = {Ct, V T , Stk}k=1,...,5000

t=1,...,T .

In the next sections, the topic space representation and

the mapping of a commercial in this topic representationto evaluate both V d and V t are described. Then, the com-puted similarity score is detailed. Finally, the soundtrackprediction process from a TV commercial is explained.

4. TOPIC REPRESENTATION OF A TVCOMMERCIAL

Let’s consider a corpus D from the development set of TVcommercials with a word vocabulary V = {w1, . . . , wN}of size N . A topic representation from corpus D is thenperformed using a Latent Dirichlet Allocation (LDA) [4]approach. At the final LDA analysis, a topic space m of ntopics is obtained with, for each theme z, the probabilityof each word w of V knowing z, and for the entire modelm, the probability of each theme z knowing the model m.Each TV commercial from both development and test setsis mapped into the topic space (see Figure 3) to obtain avector representation (V d and V t) of web pages related toa commercial into the thematic space computed as follow:

V d[i](Cdj ) = P (zi|Cd

j ) (2)

where P (zi|Cdj ) is the probability of a topic zi to be

generated by the web pages from the commercial Cdj , esti-

mated using Gibbs sampling as described in Section 2.2. Inthe same way, V t is estimated with the same topic space,and with the use of web pages of commercials of test setCt

j (see Figure 3).

TV Commercial

z3

. . .

P (w1|z3)WORD WEIGHT

w1

P (w2|z3)w2

P (w|V ||z3)w|V |

. . .

z2

P (w1|z2)WORD WEIGHT

w1

P (w2|z2)w2

P (w|V ||z2)w|V |

. . .z1

P (w1|z1)WORD WEIGHT

w1

P (w2|z1)w2

P (w|V ||z1)w|V |

. . .

z4

P (w1|z4)WORD WEIGHT

w1

P (w2|z4)w2

P (w|V ||z4)w|V |

. . .zn

P (w1|zn)WORD WEIGHT

w1

P (w2|zn)w2

P (w|V ||zn)w|V |

. . .

. . .

Vd[1]

Vd[2]Vd[3]

Vd[4]Vd[n]

Figure 3. Mapping of a TV commercial in the topic space.

4.1 Similarity measure

Each commercial from both development and test set, ismapped into the topic space to produce a vector represen-tation for each one, respectively V d and V t as outcomes.Then, given a TV commercial C1 from the test set T, asubset of other TV commercials from the development setD is selected knowing their thematic proximity with C1.

15th International Society for Music Information Retrieval Conference (ISMIR 2014)

467

Page 4: A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

To estimate the similarity between C1 and commercialsfrom development set, the cosine metric α is used. Thissimilarity metric is expressed thereafter:

cosine(V d, V t) = αd,t

=

n∑i=1

V d[i]× V t[i]√n∑

i=1

V d[i]2

√n∑

i=1

V t[i]2

(3)

This metric allows to extract a subset of commercialsfrom D thematically close to C1.

4.2 Rhythm pattern

The cosine measure, presented in previous section, is alsoused to evaluate the similarity between a mean rhythm pat-tern vector Sd of a song, and all the candidate songs St

k ofthe test set.

<?xml version="1.0" ?><rhythmdescription> <media>363445_sum.wav</media> <description> <bpm_mean>99.982723</bpm_mean> <bpm_std>0.047869</bpm_std> <meter>22.000000</meter> <perc>47.023527</perc> <perc_norm>1.910985</perc_norm> <complex>29.630575</complex> <complex_norm>0.652134</complex_norm> <speed>2.660229</speed> <speed_norm>1.201633</speed_norm> <periodicity>0.900763</periodicity> <rhythmpattern>0.124231 ... 0.098873</rhythmpattern> </description></rhythmdescription>

bmp_meanbmp_std

meterperc

perc_normcomplex

complex_normspeed

speed_normpeiodicity

rhythmpattern_1rhythmpattern_2

...rhythmpattern_48{

Rhythm pattern vectorRhythm pattern of a song

(a) (b)

Figure 4. Rhythm pattern of a song from the developmentset in xml (a) and vector (b) representations.

In details, each commercial from D is related with asoundtrack that is represented with a rhythm pattern vector.The organizers provide for each song contained into theMusicClef 2013 dataset:

• video features (MPEG-7 Motion Activity and Scal-able Color Descriptor [15]),

• web pages about the respective brands and musicartists,

• music features:

− MFCC or BLF [22],

− PS209 [19],

− beat, key, harmonic pattern extracted with theIrcam software [1].

In our experiments, 10 rhythm features of songs areused (speed, percussion, . . . , periodicity) as shown in Fig-ure 4. These features of beat, key or harmonic pattern are

extracted using the Ircam software available at [1]. Moreinformation about features extraction from songs are de-tailed in [14].

As an outcome, each commercial is represented by arhythm pattern vector of size 58 (10 from song features and48 from rhythm pattern). From the subset of soundtracksof the l nearest commercials from D, a mean rhythm vectorS is performed as:

S =1

l

∑d∈l

Sd .

Finally, the cosine measure between this mean rhythmS of the l nearest commercials from D, and each commer-cial (cosine(S, St)t∈T ), is used to find, from the sound-track St of the test set T, the 5 songs from all the candi-dates having the closest rhythm pattern.

5. EXPERIMENTS AND RESULTS

Previous sections described the proposed automatic musicrecommandation system for TV commercials. This sys-tem is decomposed into three sub-processes. The first onemaps the commercials into a topic space to evaluate theproximity of a commercial from the test set and all com-mercials from the development set. Then, the mean rhythmpattern of the thematically close commercials is computed.Finally, this rhythm pattern is computed with all ones fromthe test set of candidate songs to find a set of relevant mu-sics.

5.1 Experimental protocol

The first step of the proposed approach, detailed in pre-vious section, maps TV commercial textual content into atopic space of size n (n = 500). This one is learnt from aLDA in a large corpus of documents. Section 4 describesthe corpus D of web pages. This corpus contains 10, 724Web pages related to brands of the commercials containedin D. This corpus is composed of 44, 229, 747 words fora vocabulary of 4, 476, 153 unique words. More detailsabout this text corpus, and the way to collect it, is explainedinto [14].

The first step of the proposed approach is to map eachcommercial textual content into a topic space learnt from alatent Dirichlet allocation (LDA). During the experiments,the MALLET tool is used [16] to perform a topic model.The proposed system is evaluated in the MediaEval 2013MusiClef benchmark [14]. The aim of this task is to pre-dict, for each video of the test set, the most suitable sound-track from 5,000 candidate songs. The dataset is split into3 sets. The development set contains multimodal infor-mation on 392 commercials (various metadata includingYoutube uploader comments, audio features, video fea-tures, web pages and text features). The test set is a setof 55 videos to which a song should be associated usingthe recommandation set of 5,000 soundtracks (30 secondslong excerpts).

15th International Society for Music Information Retrieval Conference (ISMIR 2014)

468

Page 5: A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

5.2 Experimental metrics

For each video in the test set, a ranked list of 5 candidatesongs should be proposed. The song prediction evaluationis manually performed using the Amazon Mechanical Turkplatform. This novel task is non-trivial in terms of “groundtruth”, that is why human ratings for evaluation are used.Three scores have been computed from our system output.Let V be the full collection of test set videos, and let sr(v)be the average suitability score for the audio file suggestedat rank r for the video v. Then, the evaluation measuresare computed as follows:

• Average suitability score of the first-ranked song:

1V

|V |∑i=1

s1(vi)

• Average suitability score for the full top-5:

1V

|V |∑i=1

15sr(vi)

• Weighted average suitability score of the full top-5. Here, we apply a weighted harmonic mean scoreinstead of an arithmetic mean:1V

|V |∑i=1

∑5r=1 sr(vi)∑5r=1

sr(vi)

r

The previously presented measures are used to studyboth rating and ranking aspects of the results.

5.3 Results

The measures defined in the previous section are used toevaluate the effectiveness of songs selected to be associ-ated to TV commercials from the test set. The proposedtopic space-based approach is evaluated in the same way,and obtained the results detailed thereafter:

• First rank average score: 2.16

• Top 5 average score (arithmetic mean): 2.24

• Top 5 average score (harmonic mean, taking rankinto account): 2.22

Considering that human judges rate the predicted songsfrom 1 (very poor) to 4 (very well), we can consider thatour system is slightly better than the mean evaluation score(2) no matter the metric considered. While the systemproposed in [23] is clearly different from ours, results arevery similar. This shows the difficulty to build an auto-matic song recommendation system for TV commercials,the evaluation being also a critical point to discuss.

6. CONCLUSIONS AND PERSPECTIVES

In this paper, an automatic system to assign a soundtrackto a TV commercial has been proposed. This system com-bines two media: textual commercial content and audiorhythm pattern. The proposed approach obtains good re-sults in spite of the fact that the system is automatic and un-supervised. Indeed, both subtasks are unsupervised (LDAlearning and commercials mapping into the topic space)

and songs extraction (rhythm pattern estimation of the idealsongs for a commercial from the test set). Moreover, thispromising approach, combining thematic representation ofthe textual content of a set of web pages describing a TVcommercial and acoustic features, shows the relevance oftopic-based representation in automatic recommandationusing external resources (development set).

The choice of a relevant song to describe the idea behinda commercial, is a challenging task when the frameworkdoes not take into account relevant features related to:

• mood, such as harmonic content, harmonic progres-sions and timbre,

• music rhythm, such as musical style, texture, spec-tral centroid, or tempo.

The proposed automatic music recommendation systemis limited by this small number (58) of features which notdescribe all music aspects. For these reasons, in futureworks, we plan to use others features, such as the songlyrics or the audio transcription of the TV commercials,and evaluate the effectiveness of the proposed hybrid frame-work into other information retrieval tasks such as classifi-cation of music genre or music clustering.

7. REFERENCES

[1] Ircam. analyse-synthse: Software. Inhttp://anasynth.ircam.fr/home/software., Accessed:Sept. 2013.

[2] J.R. Bellegarda. A latent semantic analysis frameworkfor large-span language modeling. In Fifth EuropeanConference on Speech Communication and Technol-ogy, 1997.

[3] J.R. Bellegarda. Exploiting latent semantic informa-tion in statistical language modeling. Proceedings ofthe IEEE, 88(8):1279–1296, 2000.

[4] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent dirichletallocation. The Journal of Machine Learning Research,3:993–1022, 2003.

[5] Claudia Bullerjahn. The effectiveness of music in tele-vision commercials. Food Preferences and Taste: Con-tinuity and Change, 2:207, 1997.

[6] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Lan-dauer, and R. Harshman. Indexing by latent semanticanalysis. Journal of the American society for informa-tion science, 41(6):391–407, 1990.

[7] Stuart Geman and Donald Geman. Stochastic relax-ation, gibbs distributions, and the bayesian restorationof images. IEEE Transactions on Pattern Analysis andMachine Intelligence, (6):721–741, 1984.

[8] Thomas L Griffiths and Mark Steyvers. Finding sci-entific topics. Proceedings of the National academy ofSciences of the United States of America, 101(Suppl1):5228–5235, 2004.

15th International Society for Music Information Retrieval Conference (ISMIR 2014)

469

Page 6: A COMBINED THEMATIC AND ACOUSTIC APPROACH FOR A …€¦ · Music Recommendation Service in TV Commercials”, 15th International Society for Music Information Retrieval Conference,

[9] Gregor Heinrich. Parameter estimation for text anal-ysis. Web: http://www. arbylon. net/publications/text-est. pdf, 2005.

[10] Nina Hoeberichts. Music and advertising: The effect ofmusic in television commercials on consumer attitudes.Bachelor Thesis, 2012.

[11] T. Hofmann. Probabilistic latent semantic analysis. InProc. of Uncertainty in Artificial Intelligence, UAI ’ 99,page 21. Citeseer, 1999.

[12] Thomas Hofmann. Unsupervised learning by proba-bilistic latent semantic analysis. Machine Learning,42(1):177–196, 2001.

[13] Diane Hu and Lawrence K Saul. A probabilistictopic model for unsupervised learning of musical key-profiles. In ISMIR, pages 441–446, 2009.

[14] Cynthia C. S. Liem, Nicola Orio, Geoffroy Peeters, andMarkus Scheld. MusiClef 2013: Soundtrack Selectionfor Commercials. In MediaEval, 2013.

[15] Bangalore S Manjunath, Philippe Salembier, andThomas Sikora. Introduction to MPEG-7: multimediacontent description interface, volume 1. John Wiley &Sons, 2002.

[16] Andrew Kachites McCallum. Mallet: A machine learn-ing for language toolkit. http://mallet.cs.umass.edu,2002.

[17] Thomas Minka and John Lafferty. Expectation-propagation for the generative aspect model. In Pro-ceedings of the Eighteenth conference on Uncertaintyin artificial intelligence, pages 352–359. Morgan Kauf-mann Publishers Inc., 2002.

[18] C Whan Park and S Mark Young. Consumer responseto television commercials: The impact of involvementand background music on brand attitude formation.Journal of Marketing Research, pages 11–24, 1986.

[19] Tim Pohle, Dominik Schnitzer, Markus Schedl, PeterKnees, and Gerhard Widmer. On rhythm and generalmusic similarity. In ISMIR, pages 525–530, 2009.

[20] Alexandrin Popescul, David M Pennock, and SteveLawrence. Probabilistic models for unified collabora-tive and content-based recommendation in sparse-dataenvironments. In Proceedings of the Seventeenth con-ference on Uncertainty in artificial intelligence, pages437–444. Morgan Kaufmann Publishers Inc., 2001.

[21] G. Salton. Automatic text processing: the transforma-tion. Analysis and Retrieval of Information by Com-puter, 1989.

[22] Klaus Seyerlehner, Gerhard Widmer, and Tim Pohle.Fusing block-level features for music similarity esti-mation. In Proc. of the 13th Int. Conference on DigitalAudio Effects (DAFx-10), pages 225–232, 2010.

[23] Han Su, Fang-Fei Kuo, Chu-Hsiang Chiu, Yen-JuChou, and Man-Kwan Shan. Mediaeval 2013: Sound-track selection for commercials based on content cor-relation modeling. In MediaEval 2013, volume 1043 ofCEUR Workshop Proceedings. CEUR-WS.org, 2013.

[24] Y. Suzuki, F. Fukumoto, and Y. Sekiguchi. Keywordextraction using term-domain interdependence for dic-tation of radio news. In 17th international conferenceon Computational linguistics, volume 2, pages 1272–1276. ACL, 1998.

[25] Chao Zhen and Jieping Xu. Multi-modal music genreclassification approach. In Computer Science and In-formation Technology (ICCSIT), 2010 3rd IEEE In-ternational Conference on, volume 8, pages 398–402.IEEE, 2010.

15th International Society for Music Information Retrieval Conference (ISMIR 2014)

470