
Song Recommendation for Social Singing Community

Kuang Mao¹  Ju Fan²  Lidan Shou¹  Gang Chen¹  Mohan Kankanhalli²
¹College of Computer Science, Zhejiang University, Hangzhou, China
²School of Computing, National University of Singapore, Singapore

[email protected]; [email protected]; {should,cg}@zju.edu.cn; [email protected]

ABSTRACT
Nowadays, an increasing number of singing enthusiasts upload their cover songs and share their performances in online social singing communities. They can also listen to and rate other users' song renderings. An important feature of social singing communities is to recommend appropriate singing-songs that users are able to perform excellently. In this paper, we propose a framework for making singing-song recommendations in a social singing community. Instead of recommending songs that people like to listen to, we recommend suitable songs that people can sing well. We propose to discover song difficulty orderings from the song performance ratings of each user. We transform the difficulty orderings into a difficulty graph and propose an iterative inference algorithm to make singing-song recommendations on the difficulty graph. The experimental results show the effectiveness of our proposed framework. To the best of our knowledge, our work is the first study of singing-song recommendation in social singing communities.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering; H.5.5 [Sound and Music Computing]: Systems

General Terms
Algorithms; Experimentation

Keywords
Singing-song recommendation; social singing community; recommender systems

1. INTRODUCTION
Singing and listening to Karaoke is an enjoyable form of entertainment for many people. With the popularity of high-speed wireless connectivity, the Web today is increasingly seeing a new form of online Karaoke known as the Social Singing Community (SSC). A singer in an SSC can record a performance (or album) on her/his device and then publish it to the SSC, making it accessible at any place and any time. A listener, in turn, after enjoying the performances disseminated by the system, gives ratings, likes, or comments to the performances or singers. To date, a great number of SSCs have emerged on the Web. Popular ones include Smule¹, 5sing², and Sing!³. For example, by 2013 5sing had 1.3 million active users and a repository of 7.5 million cover songs sung by its users. Figure 1 shows the interface of Sing!, a well-known mobile SSC application.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. MM '14, November 3-7, 2014, Orlando, USA. Copyright 2014 978-1-4503-3063-3/14/11...$15.00. http://dx.doi.org/10.1145/2647868.2654921.

Figure 1: Screenshot of Sing! interface

While an SSC can serve as a conventional platform for listening to songs, it also provides an essential "virtual stage" for amateur singers. Unlike conventional music communities, in which listeners rate the songs and interact with other listeners, an SSC focuses more on the singers, encouraging interaction between singers and listeners via ratings on the performances.

One important service in an SSC is to provide singing-song recommendation for singers. Despite the abundance of studies on song recommendation [7], the problem of singing-song recommendation in SSCs has a unique characteristic that has not been addressed adequately in the literature. Specifically, conventional song recommendation focuses on discovering songs that a user likes to listen to. In contrast, singing-song recommendation aims to find songs that are more likely to lead to high singing performances.

¹http://www.smule.com/
²http://www.5sing.com/
³https://itunes.apple.com/app/sing!-karaoke-by-smule/id509993510?mt=8


In particular, a higher singing performance in an SSC means that the corresponding songs can help a singer obtain more appreciation and higher ratings from the listeners. Acquiring higher ratings is very important to the singer, because it not only brings acknowledgement and encouragement, but also helps her increase her influence in the community and attract more fans. In the rest of the paper, for ease of presentation, we use "song s_1 is easier than s_2" to mean that s_1 is more likely than s_2 to yield a high singing performance. Note that this may not necessarily reflect difficulty in terms of acoustic features (melody, pitch range, etc.): some easy-to-sing songs may not bring impressive performances and high ratings due to their unattractive melodies.

Although a previous study [15] has considered singing-song recommendation, it cannot address our problem in SSCs. The work in [15] proposed a solution called competence-based song recommendation, which relies on complex modeling of a singer's vocal competence and song profiles, and matches singer competence to songs using a machine learning algorithm. Unfortunately, this approach is impractical for singing-song recommendation in an SSC, for the following reasons. First, the vocal competence modeling in [15] employs an expensive recording process that requires the user to sing every pitch in his/her pitch range from the softest to the loudest, under the supervision of music professionals. This requirement cannot practically be met in an SSC. Second, the song profiles are acquired from music scores and manually annotated singing intensity data. This process requires prohibitive manual work for cleaning, calibration, and annotation, which is impossible at the scale of a large SSC.

In this paper, we report our research on SInging-song Recommendation (SIR) for a social singing community. The main objective is to study how to recommend songs that users have a large potential to perform well and thus obtain high ratings from the listeners. Specifically, we develop a SIR framework which takes a user's singing history as input and, from that history, predicts which songs the user can perform well. Our song recommendation differs from traditional song retrieval, which focuses only on matching users' listening interests; we pay particular attention to matching users' singing performance needs. It also differs from Shou et al.'s work [15], which focuses on matching songs with a singer's vocal model. We discover evidence of songs' difficulty in performance from listener ratings and recommend songs that are most likely to yield better performances than the songs the user has sung in the past. To the best of our knowledge, this is the first work to deal with SIR in an SSC.

Our singing-song recommendation faces two main technical challenges. The first is how to determine whether one song is easier than another, i.e., the difficulty relation in performance between two songs. As such knowledge of difficulty relations is not off-the-shelf, one approach is to ask a group of singers to judge the relative difficulty in performance of two songs. However, it is costly to employ people to compare difficulty. Another approach is to discover the difficulty comparison results indirectly from the SSC. The problem is that we then need to determine how reliable the obtained difficulty relations are.

The second challenge is how to make recommendations based on the aforementioned song difficulty relations. To make full use of these relations, we need to model them so that we can infer new difficulty relations from existing ones. For song recommendation, it is challenging to estimate whether a user can perform a specific song well based on inference from their singing history.

To solve the first challenge, we propose to discover song difficulty orderings from the performance ratings of the songs in users' singing histories. We derive a song difficulty ordering from the performance ratings given to songs sung by the same user. This is not easy, however, as the diversity in singers' vocal competence and listeners' preferences may lead to inconsistent ratings. Such uncertainty poses the following questions: Is a discovered difficulty ordering valid for every singer? Are there "common" orderings that are valid for most singers, and ad-hoc ones that depend largely on individual singers? To answer these questions, we need to measure the reliability of song difficulty orderings derived from human judgments.

To address the second challenge, we capture all the song difficulty orderings in a difficulty graph, where each vertex represents a song. We evaluate whether a user can sing a song well using a quantity called the performance degree. Given a song (vertex), this quantity can be estimated from its neighboring vertices in the graph using a probabilistic inference technique. Since a user's singing history covers only a few vertices in the difficulty graph, we propose an Iterative Probabilistic Inference (IPI) algorithm which iteratively "spreads" the performance degrees from the user's singing history to other songs. Note that the proposed recommendation framework can also be generalized to other competence-related recommendation problems, such as books or games, where we only need to determine the difficulty relations between vertices in the difficulty graph.

A further advantage of this approach is that the difficulty orderings discovered in one SSC can be reused and updated in another SSC system, which is a way to ease the cold-start problem in recommender systems. Given sufficient resources, it is also possible to employ music experts to build an ever-expanding song difficulty ordering database and then provide singing-song recommendation services for any social singing community. With this paper, we publish the very first difficulty ordering database of the songs used in our experiments⁴.

Our contributions can be summarized as follows:

(1) We propose a singing-song recommendation framework to solve the song recommendation problem for a social singing community.

(2) We propose a method for discovering song difficulty orderings from a social singing community and measure the reliability of the discovered orderings.

(3) We present a difficulty graph to model song difficulty orderings and build a probabilistic inference technique to estimate a song's performance degree.

(4) We introduce a ranking algorithm for singing-song recommendation on a difficulty graph.

(5) Our experiments show promising results for the proposed recommendation framework.

The rest of the paper is organized as follows: Section 2 introduces related work. In Section 3, we define our research problem and provide an overview of the framework. Section 4 describes how to discover song difficulty orderings from a social singing community.

⁴http://db.zju.edu.cn/dataDownload/index.html


Section 5 introduces the inference technique on the difficulty graph and describes the song recommendation algorithm. Section 6 describes the details of the experiments and the results. Finally, Section 7 concludes the paper.

2. RELATED WORK
In this paper, we study the problem of singing-song recommendation for a social singing community. This research is related to traditional song recommendation, singing-song recommendation, and graph ranking algorithms. We give a brief review of these works and explain how our work differs.

2.1 Traditional Song Recommendation
Traditional song recommendation systems are designed to discover songs that satisfy a user's listening interests. Early methods such as [10] explore content-based song recommendation. These techniques use low-level features, such as moods and rhythms, to represent users' preferences and recommend songs with high low-level feature similarity. However, they suffer from the semantic gap between low-level features and users' preferences.

The classic way to recommend songs is via Collaborative Filtering (CF) [8], which can be divided into user-based CF [6] and item-based CF [14]. User-based CF first predicts the ratings for unrated items (songs) from a group of similar users and makes recommendations by ranking the predicted rating of each item. The intuition of user-based CF comes from the assumption that if users A and B have a strong similarity in their choice of listening songs, their songs can be recommended to each other. However, this is not always valid for recommending songs for singing. People tend to sing some popular songs that many people like, but they also sing some unique songs, and it is inappropriate to recommend such unique songs to others without considering the songs' difficulty in performance.

Another straightforward approach is to use an ordinary graph for song recommendation: it first builds a user-item bigraph and performs a random walk to rank the songs, so that songs that share many connected users with the query song get higher ranks. It intuitively recommends popular songs to satisfy a user's interests, and it also ignores the songs' difficulty in performance. Wu et al. [16] proposed a next-song recommendation strategy for online karaoke users using personalized Markov embedding. They also focus on matching users' singing interests, while ignoring singers' demands for high performance in karaoke.

In our proposed approach, we discover songs' difficulty in performance. The recommendation is based on song difficulty orderings, which guarantees that recommended songs are easier than the songs the user has sung in the past.

2.2 Singing-Song Recommendation
Singing-song recommendation is a relatively new research area, first proposed by Mao et al. [13] in 2012. Shou et al. [15] then presented a competence-based song recommendation system which uses a singer's digitized voice recordings to recommend a list of songs based on an analysis of the singer's vocal competence. They require a professional recording process to acquire the user's vocal competence, called the singer profile, and model the song profile, i.e., the distribution of singing notes within each song. They then propose a learning-to-rank scheme for recommending songs using the singer profile.

Their system has two major drawbacks for recommending songs in a social singing community. First, it requires a complex acquisition process for the user's vocal competence, which is not suitable for a social singing community. Second, it needs each song's musical score to build the song model, which is hard to acquire for the large number of songs in a social singing community.

In our work, we avoid modeling the user's vocal competence and do not need to know a song's score for singing-song recommendation. We only need the users' singing histories to make recommendations.

2.3 Ranking on Graphs
Graph ranking studies ranking methods for graph vertices. Most methods [4, 5, 18] are based on regularization theory for graphs. Zhou et al. [19] propose a universal ranking algorithm that exploits the intrinsic manifold structure of the data: they build a weighted graph, define and normalize the edge weights, and in each iteration each vertex spreads its rank to its neighbors via the weighted network until the ranks converge. Agarwal [1] develops an algorithmic framework for learning ranking functions on both directed and undirected graph data, in which the graph acts as a regularizer that ensures closely connected documents get similar values. Baluja et al. [3] proposed a video co-view graph and developed a random walk algorithm for video recommendation.

For our graph ranking method, the basic idea is similar to that of [3, 19]. We use a probabilistic modeling technique to model our difficulty graph, and rank the songs by iteratively performing probabilistic inference on the graph.

3. SINGING-SONG RECOMMENDATION
In this section, we formalize the problem of song recommendation for a social singing community in Section 3.1 and introduce our recommendation framework in Section 3.2.

3.1 Problem Statement
Social Singing Community. Our work studies a social singing community with a set of users U = {u_1, u_2, ..., u_m} and a collection of songs S = {s_1, s_2, ..., s_n}. In the community, each user u_k ∈ U has a singing history that consists of a subset of songs in S, denoted by S_k ⊆ S. The user u_k can post this history in the community to let other users listen to the songs in S_k. The listeners may then evaluate u_k's performance of these songs by providing ratings on a scale from 1 to 5. We compute the average listener rating of each song s_i^k ∈ S_k performed by u_k and denote it as r_i^k. The input phase of Figure 2 provides an example of users' singing histories. In this example, user u_1 has a history consisting of songs s_1, s_2, etc., where each song is associated with the average rating by listeners; e.g., the rating for s_1 is 4.3 for user u_1.

Singing-song Recommendation. In this work, we aim to recommend the songs that users have the potential to perform well. To illustrate this recommendation objective, consider a user u_k and two songs s_i and s_j in the song set S − S_k. If u_k's performance of s_i would attract more attention from the listeners, and the listeners would give it higher ratings, then song s_i would be preferable to s_j in our recommendation.


Figure 2: Overview of the singing-song recommendation framework

3.2 Recommendation Framework
Our framework for singing-song recommendation is illustrated in Figure 2. The framework takes as input the users' singing histories in the community and recommends songs to each individual user. As mentioned in the introduction, the basic idea is to infer whether a user can perform a song well by considering song rendition difficulty. Intuitively, if a song is much more difficult than those in a user's singing history, the song is more likely to be inappropriate for the user and will not be recommended. Specifically, the framework recommends songs in two phases.

Phase I - Discovering song difficulty orderings. In this phase, we discover the ordering of any two songs in terms of difficulty, which captures whether song s_i is more difficult to perform well than s_j, denoted by s_i > s_j. We discover the difficulty orderings from users' singing histories: if s_i has lower ratings than s_j in many singing histories, we conjecture that s_i is more difficult than s_j. We then evaluate the reliability of this conjecture by considering the following two factors.

1) The first factor is the support of the ordering, that is, how many singers support that s_i is more difficult than s_j.

2) The second factor is the confidence of the ordering, that is, whether many other singing histories support the opposite conclusion, i.e., that s_j is more difficult than s_i.

We formally define song difficulty orderings and present our method of estimating the support and confidence of a difficulty ordering in Section 4.

Phase II - Recommending singing-songs. In this phase, we recommend songs that users have the potential to perform well. To this end, given a user u_k, we need to infer which songs are easier to perform well compared to the songs in u_k's singing history. We make this inference with the help of the difficulty orderings obtained in the first phase. More specifically, we first construct a difficulty graph from the difficulty orderings, where each vertex represents a song and a directed edge captures the difficulty ordering of two songs. Then, we develop an iterative probabilistic inference algorithm to find the songs that are most likely to be "easier" for a user to perform well than those in her/his singing history. We discuss the details of the algorithm in Section 5.

We illustrate our framework using the example in Figure 2. In the first phase, the framework discovers difficulty orderings for song pairs and evaluates the reliability of each ordering. For instance, two users' singing histories support s_1 > s_2, while only one singing history supports s_2 > s_1. Based on this, we estimate the support and confidence of s_1 > s_2 to evaluate the reliability of this ordering. In the second phase, we organize the songs into a graph based on the difficulty orderings, as shown in the right part of Figure 2. The gray vertices represent the songs in a user's singing history, while the white vertices are the candidates to be recommended. Our inference algorithm then applies an iterative strategy to infer the likelihood that each white vertex is easier than the gray vertices, and recommends the top songs with the highest likelihood scores.

4. SONG DIFFICULTY ORDERING
This section presents our technique for discovering the difficulty ordering of singing-songs. We formalize the notion of song difficulty ordering in Section 4.1 and estimate the reliability of song difficulty orderings in Section 4.2. Finally, we discuss how to discover difficulty orderings from users' singing histories in Section 4.3.

4.1 Song Difficulty Ordering
Knowledge of users' performance on some songs can help us infer their performance on other songs. Intuitively, if a user performs well on a song s_i, we can conjecture that she is also capable of singing the songs that are "easier" to perform well than s_i. To formalize this, we introduce the idea of song difficulty ordering to capture whether one song is more difficult to perform well than another. Specifically, given two songs s_i and s_j, we use the notation s_i > s_j to represent a difficulty ordering, which means s_i is more difficult to perform well than s_j.

We introduce a voting-based strategy to determine the difficulty ordering of two songs. The basic idea is as follows. Suppose that a group of singers sing two songs s_i and s_j and vote on which one is more difficult to perform well. We then obtain a set of votes for the difficulty comparison of s_i and s_j. If more singers vote s_i as the difficult one, we obtain the difficulty ordering s_i > s_j from these votes. We discuss how we obtain singers' votes in Section 4.3. More formally, we first introduce difficulty voting as follows.

Definition 1 (Difficulty Voting). Difficulty voting is the process of voting on the relative difficulty of two songs s_i and s_j in achieving a good performance, where n_ij and n_ji denote the numbers of votes that support s_i > s_j and s_j > s_i, respectively.

For instance, if 7 singers think s_i is more difficult than s_j while 3 singers vote for s_j, then we have n_ij = 7 and n_ji = 3 in the difficulty voting of the two songs.


Now we define the song difficulty ordering of two songs.

Definition 2 (Song Difficulty Ordering). The song difficulty ordering of songs s_i and s_j is the majority choice in the difficulty voting between s_i > s_j and s_j > s_i.

Continuing the above example, because the difficulty voting result for s_i and s_j satisfies n_ij > n_ji, we obtain the song difficulty ordering s_i > s_j.

4.2 Reliability of Difficulty Ordering
As singers may not always agree on a difficulty ordering, we need to estimate the reliability of a difficulty ordering from the corresponding difficulty voting. We consider two constituents of reliability, namely the support and the confidence of the difficulty ordering.

4.2.1 Support
We first consider how many singers support the ordering s_i > s_j. We denote the support of a difficulty ordering by supp(s_i > s_j). Intuitively, a difficulty ordering supported by more singers is more reliable than one with less support. We define the support of a difficulty ordering as follows.

Definition 3 (Support of Difficulty Ordering). The support of the difficulty ordering s_i > s_j, denoted supp(s_i > s_j), is the number of votes that s_i > s_j receives.

Then we need to measure whether a difficulty ordering receives sufficient support. The difficulty orderings are used for recommendation, and s_i forms difficulty orderings with many other songs, e.g., s_i > s_t; it is more reliable to use the orderings with more support for recommendation. We therefore estimate the probability that s_i > s_j receives sufficient support as follows:

$$P_{\text{suff}}(s_i > s_j) = \frac{\text{supp}(s_i > s_j)}{\sum_{s_t \in S_\sigma} \text{supp}(s_i > s_t)} \qquad (1)$$

where S_σ is the set of songs such that each s_t ∈ S_σ forms a song difficulty ordering s_i > s_t. For example, if there are only two difficulty orderings, s_i > s_j and s_i > s_t, with supports of 3 and 2 respectively, then P_suff(s_i > s_j) = 3/(3 + 2).
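To make the computation concrete, here is a minimal Python sketch of Equation 1; the vote-table layout and the function name `support_probability` are our own illustrative assumptions, not part of the paper's system.

```python
def support_probability(votes, si):
    """P_suff(si > sj) of Eq. 1 for every sj with an ordering si > sj.

    votes: dict mapping (a, b) -> number of votes for "a > b".
    An ordering si > sj exists when si wins the majority vote."""
    supp = {sj: n for (a, sj), n in votes.items()
            if a == si and n > votes.get((sj, si), 0)}
    total = sum(supp.values())
    return {sj: n / total for sj, n in supp.items()}

# Toy example matching the paper: supp(si>sj)=3, supp(si>st)=2,
# so P_suff(si > sj) = 3 / (3 + 2) = 0.6.
votes = {("si", "sj"): 3, ("sj", "si"): 1, ("si", "st"): 2}
print(support_probability(votes, "si"))  # {'sj': 0.6, 'st': 0.4}
```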

4.2.2 Confidence
While the support measures how many singers back an ordering, it does not take into account how many singers oppose it. For example, if the numbers of votes for s_i > s_j and s_j > s_i are 10 and 9 respectively, then although s_i > s_j has many votes, its reliability is doubtful due to the large number of opposing votes. To address this problem, we further consider how confidently we can "trust" a difficulty ordering, called the confidence of the difficulty ordering and denoted by conf(s_i > s_j). We define it as follows.

Definition 4 (Confidence of Difficulty Ordering). The confidence of the difficulty ordering s_i > s_j, denoted conf(s_i > s_j), is the probability of the judgment that s_i is more difficult than s_j, given the vote numbers n_ij and n_ji for s_i > s_j and s_j > s_i.

Following the definition, we represent conf(s_i > s_j) in the probability form P(s_i > s_j | n_ij, n_ji). To compute it, we first use Bayes' theorem:

$$P(s_i > s_j \mid n_{ij}, n_{ji}) = \frac{P(n_{ij}, n_{ji} \mid s_i > s_j)\, P(s_i > s_j)}{P(n_{ij}, n_{ji})} \qquad (2)$$

We assume that s_i and s_j are a priori equally likely to be the more difficult one, which makes P(s_i > s_j) and P(s_j > s_i) equal. Then

$$\frac{P(s_i > s_j \mid n_{ij}, n_{ji})}{P(s_j > s_i \mid n_{ji}, n_{ij})} = \frac{P(n_{ij}, n_{ji} \mid s_i > s_j)}{P(n_{ji}, n_{ij} \mid s_j > s_i)} \qquad (3)$$

If we further assume P(s_i > s_j | n_ij, n_ji) + P(s_j > s_i | n_ji, n_ij) = 1, then together with Equation 3 we get

$$P(s_i > s_j \mid n_{ij}, n_{ji}) = \frac{P(n_{ij}, n_{ji} \mid s_i > s_j)}{P(n_{ij}, n_{ji} \mid s_i > s_j) + P(n_{ji}, n_{ij} \mid s_j > s_i)} \qquad (4)$$

We then need to calculate the probability of the voting result given the difficulty ordering. Each subject can vote for only one of s_i and s_j, so we model a subject's vote as a Bernoulli trial with voting accuracy p. The probability of observing n_ij and n_ji is then given by the binomial probability mass function:

$$P(n_{ij}, n_{ji} \mid s_i > s_j) = \binom{n_{ij}+n_{ji}}{n_{ij}}\, p^{n_{ij}} (1-p)^{n_{ji}} \qquad (5)$$

$$P(n_{ji}, n_{ij} \mid s_j > s_i) = \binom{n_{ij}+n_{ji}}{n_{ji}}\, p^{n_{ji}} (1-p)^{n_{ij}} \qquad (6)$$

We now estimate p from the users' difficulty votes. Let τ denote our set of song difficulty orderings, and let γ denote the set of opposite orderings, with |γ| = |τ|; for example, if s_i > s_j ∈ τ, then s_j > s_i ∈ γ. We estimate p as

$$p = \frac{\sum_{(s_i > s_j) \in \tau} V(s_i > s_j)}{\sum_{(s_i > s_j) \in \tau} V(s_i > s_j) + \sum_{(s_j > s_i) \in \gamma} V(s_j > s_i)} \qquad (7)$$

where V(s_i > s_j) is the number of difficulty votes that s_i > s_j receives.

Knowing P(n_ij, n_ji | s_i > s_j) and P(n_ji, n_ij | s_j > s_i), we can obtain P(s_i > s_j | n_ij, n_ji) using Equation 4. Analogously to the sufficiency of the support, we need to measure the probability that a difficulty ordering is used in recommendation. We estimate the probability that s_i > s_j has high confidence as follows:

$$P_{\text{conf}}(s_i > s_j) = \frac{\text{conf}(s_i > s_j)}{\sum_{s_t \in S_\sigma} \text{conf}(s_i > s_t)} \qquad (8)$$

where S_σ is the set of songs such that each s_t ∈ S_σ forms a song difficulty ordering s_i > s_t.
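As a hedged illustration of Equations 4-6 (not the authors' code), the confidence can be computed as a two-hypothesis posterior; the default voting accuracy p = 0.71 below is the value estimated later in Section 6.4.

```python
from math import comb

def confidence(n_ij, n_ji, p=0.71):
    """conf(si > sj) = P(si > sj | n_ij, n_ji), Eqs. 4-6."""
    n = n_ij + n_ji
    # Binomial likelihoods of the observed votes under each hypothesis
    lik_ij = comb(n, n_ij) * p**n_ij * (1 - p)**n_ji   # Eq. 5
    lik_ji = comb(n, n_ji) * p**n_ji * (1 - p)**n_ij   # Eq. 6
    return lik_ij / (lik_ij + lik_ji)                  # Eq. 4

print(confidence(7, 3))   # ~0.97: clear majority, high confidence
print(confidence(10, 9))  # ~0.71: many votes, but low confidence
```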

4.2.3 Reliability
It is more reliable to make recommendations based on song difficulty orderings that have both sufficient support and high confidence. We define the reliability of a song difficulty ordering as follows.

Definition 5 (Reliability of Difficulty Ordering). The reliability of the difficulty ordering s_i > s_j, denoted Reli(s_i > s_j), is the probability of the event that s_i > s_j has both sufficient support and high confidence.

Let S(s_i > s_j) denote the event that s_i > s_j has sufficient support, and C(s_i > s_j) the event that it has high confidence. By the definition, Reli(s_i > s_j) equals P(S(s_i > s_j), C(s_i > s_j)). Assuming the two events are independent, we have

$$P(S(s_i > s_j), C(s_i > s_j)) = P_{\text{suff}}(s_i > s_j) \cdot P_{\text{conf}}(s_i > s_j) \qquad (9)$$

Therefore,

$$\text{Reli}(s_i > s_j) = \frac{\text{supp}(s_i > s_j)}{\sum_{s_t \in S_\sigma} \text{supp}(s_i > s_t)} \cdot \frac{\text{conf}(s_i > s_j)}{\sum_{s_t \in S_\sigma} \text{conf}(s_i > s_t)} \qquad (10)$$

where S_σ is the set of songs such that each s_t ∈ S_σ forms a song difficulty ordering s_i > s_t. The reliability of difficulty orderings is used in the song recommendation process; see Section 5 for details.
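A small sketch combining Equations 1, 8, and 10 for one fixed s_i; the dictionary-based interface is an assumption of this illustration.

```python
def reliability(supp, conf, sj):
    """Reli(si > sj) of Eq. 10 for one fixed si.

    supp, conf: dicts over the songs st in S_sigma, mapping
    st -> supp(si > st) and st -> conf(si > st)."""
    p_suff = supp[sj] / sum(supp.values())   # Eq. 1
    p_conf = conf[sj] / sum(conf.values())   # Eq. 8
    return p_suff * p_conf                   # Eq. 9 / Eq. 10

supp = {"sj": 3, "st": 2}
conf = {"sj": 0.97, "st": 0.75}
print(reliability(supp, conf, "sj"))  # 0.6 * 0.564 ≈ 0.34
```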

4.3 Discovering Song Difficulty Orderings
One essential problem in discovering difficulty orderings is obtaining the difficulty voting introduced in Definition 1. Specifically, we need to obtain the number of votes n_ij that support s_i > s_j and n_ji that oppose it. To this end, we exploit the users' ratings, which implicitly reflect singers' opinions on song difficulty.

Each user of the social singing community is considered a singer. We deduce a singer's vote for a song pair s_i and s_j as follows. Suppose user u_k has sung both s_i and s_j, and the listeners' mean ratings for the two songs satisfy r_i^k < r_j^k. Since listeners give high ratings to songs the user performs well, the ratings indicate that u_k performed s_j better than s_i, which reflects that s_i is more difficult than s_j for u_k. We therefore deduce that u_k's vote for the song pair is s_i > s_j.

We discover song difficulty orderings from users' singing histories as follows.

1. Find the song pairs that have been sung by at least θ common users.

2. For each song pair (s_i, s_j), deduce each singer's vote from the ratings and count the numbers of votes for s_i > s_j and s_j > s_i as n_ij and n_ji, respectively.

3. Set whichever of s_i > s_j and s_j > s_i receives more votes as the song difficulty ordering of (s_i, s_j).

Knowing the difficulty voting results of the discovered orderings, we can also calculate the reliability of the discovered song difficulty orderings.
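The three steps above can be sketched in a few lines of Python; the input layout (per-user average listener ratings) and the default θ = 10 used later in Section 6.1 are assumptions of this illustration.

```python
from collections import Counter
from itertools import combinations

def discover_orderings(histories, theta=10):
    """Derive difficulty orderings (steps 1-3) from singing histories.

    histories: dict user -> dict song -> average listener rating.
    Returns dict (si, sj) -> (n_ij, n_ji) with the ordering si > sj."""
    votes, sung_by = Counter(), Counter()
    for ratings in histories.values():
        for si, sj in combinations(sorted(ratings), 2):
            sung_by[(si, sj)] += 1
            if ratings[si] < ratings[sj]:
                votes[(si, sj)] += 1    # lower-rated song is harder
            elif ratings[sj] < ratings[si]:
                votes[(sj, si)] += 1    # equal ratings yield no vote
    orderings = {}
    for (si, sj), n_users in sung_by.items():
        if n_users < theta:
            continue                    # step 1: common-user filter
        n_ij, n_ji = votes[(si, sj)], votes[(sj, si)]
        if n_ij > n_ji:                 # step 3: majority choice
            orderings[(si, sj)] = (n_ij, n_ji)
        elif n_ji > n_ij:
            orderings[(sj, si)] = (n_ji, n_ij)
    return orderings
```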

5. SONG RECOMMENDATION BASED ON DIFFICULTY ORDERINGS
In this section, we propose a song recommendation approach based on the difficulty orderings introduced in the previous section. The basic idea is to estimate the likelihood that a user can perform a song well, called the performance degree, and then use the estimated performance degrees for song recommendation. For instance, if we know that a user has a larger performance degree on song s_i than on s_j, we prefer s_i in the recommendation. To recommend songs to user u_k, we first estimate the user's performance degree on each song outside her singing history (i.e., in S − S_k), and then rank the songs by performance degree to obtain the final recommendation list.

As mentioned in Section 4, we estimate the performance degree based on the song difficulty orderings. Specifically, we first pick as seeds the songs in user u_k's singing history S_k that listeners have rated as well performed. Then, we exploit the difficulty orderings to find the songs that have large probabilities of being easier than the seeds, and use these probabilities to estimate the performance degrees.

We formalize this idea in the rest of the section: we define the performance degree and how to infer it in Section 5.1, and develop an iterative algorithm in Section 5.2.

Figure 3: Inference on Difficulty Graph

5.1 Estimating Performance Degree
We formally define the performance degree of user u_k on song s_j, denoted p_j^k, as the probability that u_k can perform well on s_j, i.e.,

$$p_j^k = \Pr\{u_k \text{ performs well on } s_j\}$$

To estimate songs' performance degrees based on difficulty orderings, we first organize all difficulty orderings into a directed graph called the difficulty graph, in which we denote the weight of the edge from s_i to s_j as w_ij. We define the difficulty graph as follows:

Definition 6 (Difficulty Graph). A difficulty graph G is a directed weighted graph whose vertex⁵ set is the song set S, where each edge from song s_i to s_j represents a song difficulty ordering s_i > s_j with weight w_ij.

Next, we present how to estimate the performance degree p_j^k of song s_j from the performance degrees of its nearby songs through probabilistic inference. Specifically, suppose the performance degrees of the songs in s_j's in-neighbor set S_in^j are all known, shown as the gray vertices in Figure 3. We denote the event that u_k performs well on s_j as W(s_j^k), and now discuss how to infer p_j^k from the performance degrees of the songs in S_in^j.

First, using the sum and product rules of probability, we have

$$\Pr(W(s_j^k)) = \sum_{s_i^k \in S_{\text{in}}^j} \Pr(W(s_j^k) \mid W(s_i^k)) \cdot \Pr(W(s_i^k)) \qquad (11)$$

We let

$$w_{ij} = \Pr(W(s_j^k) \mid W(s_i^k)) \qquad (12)$$

which is the edge weight from s_i to s_j. It represents the probability that the user can perform s_j well, given that the user performs s_i well. This equals the probability that the judgment s_i > s_j is correct, i.e., the reliability of the song difficulty ordering Reli(s_i > s_j). Then we have

$$w_{ij} = \text{Reli}(s_i > s_j) = \frac{\text{supp}(s_i > s_j)}{\sum_{s_t \in S_{\text{out}}^i} \text{supp}(s_i > s_t)} \cdot \frac{\text{conf}(s_i > s_j)}{\sum_{s_t \in S_{\text{out}}^i} \text{conf}(s_i > s_t)} \qquad (13)$$

where S_out^i denotes the out-neighbor set of s_i, shown as the white vertices in Figure 3.

Knowing the performance degrees of s_j's in-neighbors and the edge weights, we can infer s_j's performance degree p_j^k. However, a user sings only very few of the songs in the difficulty graph; how can we infer s_j's performance degree when the performance degrees of its in-neighbors are unknown? In the next subsection, we introduce an iterative algorithm to deal with this problem.

⁵We use vertex and song interchangeably.


5.2 Iterative Probabilistic Inference
Based on the estimation method introduced in Section 5.1, we introduce a graph-based inference algorithm called Iterative Probabilistic Inference (IPI) to compute the unknown performance degrees of songs from the known performance degrees of the songs in the singing history.

We denote the performance degrees of the vertices by the vector d = [d_1, d_2, ..., d_|S|], where |S| is the number of vertices in the difficulty graph. We construct the |S| × |S| weight matrix W, where W_ij equals the weight of the edge from song s_i to s_j, i.e., W_ij = w_ij. Based on the performance degree estimation described in Section 5.1, we can perform probabilistic inference on the difficulty graph as follows:

$$\mathbf{d}(t+1) = W^T \mathbf{d}(t) \qquad (14)$$

Because the initial performance degree of each vertex is the same, every user would receive the same recommendation result. We need to give some "advantage" to the user's previously sung songs. We construct a bonus vector b = [b_1, b_2, ..., b_|S|] and assign bonus b_i = 1 to the songs the user has sung in the past, and b_i = 0 otherwise. The bonus vector gives the user's previously sung songs a probability of receiving an additional performance degree in each iteration; through the iterations, these vertices also affect the other vertices' performance degrees. The probabilistic inference in each iteration can then be represented as

$$\mathbf{d}(t+1) = \alpha W^T \mathbf{d}(t) + (1-\alpha)\mathbf{b} \qquad (15)$$

where α represents the probability of inferring from the in-neighbor vertices, and 1 − α the probability of receiving the bonus. We then repeatedly perform the probabilistic inference on the difficulty graph until the vertices' performance degrees converge. Theoretically, the iteration converges to $(1-\alpha)(I - \alpha W^T)^{-1}\mathbf{b}$ according to [18]. Empirically, we observed that the iterative algorithm took about three minutes to converge in our experiments. Because the calculation of the edge weights in the difficulty graph guarantees that the sum of s_i's out-going edge weights satisfies $\sum_{s_t \in S_{\text{out}}^i} w_{it} \le 1$, we set $W_{ii} = 1 - \sum_{s_t \in S_{\text{out}}^i} w_{it}$ to make sure the iteration on the difficulty graph converges. The complete algorithm is given in Algorithm 1.

Algorithm 1: Iterative Probabilistic Inference
Input: weight matrix W; bonus vector b
Output: final performance degree vector d*
δ = 0.0001; error = ∞; t = 0; α ∈ [0, 1)
initialize d(0) with d_i = 1/|S|
while error > δ do
    d(t+1) = αW^T d(t) + (1−α)b
    error = ||d(t+1) − d(t)||
    t = t + 1
d* = d(t)
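For concreteness, here is a compact NumPy rendering of Algorithm 1. It is a sketch under our own assumptions: the convergence check uses the L1 norm, and the self-loop weights W_ii of Section 5.2 are assumed to be already folded into W.

```python
import numpy as np

def ipi(W, b, alpha=0.85, delta=1e-4, max_iter=1000):
    """Iterative Probabilistic Inference (Algorithm 1).

    W: |S| x |S| edge-weight matrix of the difficulty graph,
       with self-loops W_ii = 1 - sum of s_i's out-going weights.
    b: bonus vector (1 for songs in the user's history, else 0)."""
    n = W.shape[0]
    d = np.full(n, 1.0 / n)                          # uniform start
    for _ in range(max_iter):
        d_next = alpha * W.T @ d + (1 - alpha) * b   # Eq. 15
        if np.abs(d_next - d).sum() < delta:         # error < delta
            return d_next
        d = d_next
    return d

# Tiny example: edges s0 -> s1 (w=0.6) and s0 -> s2 (w=0.4);
# the user sang s0, so b gives s0 the bonus.
W = np.array([[0.0, 0.6, 0.4],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 0.0, 0.0])
print(ipi(W, b))  # s1 ends up ranked above s2
```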

6. EXPERIMENTS
In this section, we report the experimental setup and results. We first introduce the datasets used in the experiments, then the baseline methods compared against our algorithm, and the evaluation metrics used to measure the performance of the algorithms. Finally, we present and analyze our experimental results.

6.1 Datasets
6.1.1 Song Difficulty Ordering Dataset
Since there is no publicly available dataset for singing-song recommendation, we collected data from 5sing, the largest social singing community in China. In 5sing, users can upload their song recordings, and listen to and rate other users' recordings. We chose this website because it has sufficient performance ratings for most users' singing-songs.

For song difficulty ordering discovery, we crawl song pairs that have been sung by at least 10 common users. The ratings for these song pairs must satisfy two conditions: both songs' rating scores in each pair are non-zero, and the two scores are different; song pairs whose two ratings are equal are ignored. We convert these song pairs into song difficulty orderings. The resulting song difficulty ordering dataset contains 5877 songs and 257666 song difficulty orderings in total.

6.1.2 Singing-song Recommendation Dataset
We now describe the acquisition of our singing-song recommendation dataset. We download users' singing histories from 5sing, keeping only songs that appear in the song difficulty orderings. Because most users sing only a limited number of songs, we crawl users who have sung at least 60 songs to partly overcome the data sparsity problem, obtaining a total of 2705 users, each of whom has sung 96 songs on average. Despite this selection, the recommendation dataset is still very sparse, and evaluation results for recommendation on sparse datasets are usually not high; for example, the best MAP reported by [12] is 0.048, with 500 songs listened to by each user in a 30520-song set. We therefore focus on comparing the relative increase in performance. We expect that in the future, social singing communities will deploy singing-song recommendation systems and provide abundant data for singing-song recommendation research.

For evaluation purposes, we randomly select 30% of the songs as the testing set, while the remaining ones are used as the training set. We make sure there is no overlap between each user's training set and testing set.

6.2 Evaluation Metrics
For evaluation, we use precision@n, recall@n, and the precision-recall curve to compare the recommendation accuracy of our method with the baselines, and mean average precision (MAP) and Normalized Discounted Cumulative Gain (NDCG@n) to measure ranking performance.

Precision@n and recall@n give the precision and recall at a specific position in the ranking list. Precision@n is defined as the number of correctly recommended songs divided by the total number of songs recommended at position n, while recall@n is the number of correctly recommended songs divided by the total number of songs that should be recommended, i.e., the user's test-set songs.

We also use 11-point interpolated average precision to show how precision changes across all recall levels. We plot the precision at 11 recall levels (0, 0.1, 0.2, ..., 1.0), averaged over all users. For each user, the precision at recall level i is the maximum precision at any actual recall level at or above i; this gives the precision-recall curve for one user. After computing the curve for each user, we average the precision at the corresponding recall levels over all users to obtain the final precision-recall curve.
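A minimal sketch of precision@n and recall@n as defined above; the list-based interface is an assumption of this illustration.

```python
def precision_recall_at_n(ranked, relevant, n):
    """Precision@n and recall@n for one user's recommendation list.

    ranked: recommended songs, best first; relevant: the user's
    held-out test-set songs."""
    hits = sum(1 for s in ranked[:n] if s in relevant)
    return hits / n, hits / len(relevant)

ranked = ["s3", "s7", "s1", "s9", "s2"]
relevant = {"s7", "s2", "s8"}
print(precision_recall_at_n(ranked, relevant, 5))  # (0.4, ~0.67)
```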


We use mean average precision (MAP) [2] to compare the ranking accuracy over the whole ranking list. For each user, we first compute the precision at the position of each relevant song in the recommendation list; because these precision values are taken at the relevant songs' rank positions, top-ranked relevant songs contribute more to the final score. The sum of these precision values is divided by the total number of relevant items, giving the average precision (AP). MAP is calculated by averaging the AP values over all queries:

$$\text{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \frac{\sum_{n=1}^{N} \text{Precision@}n \cdot rel(n)}{R_q} \qquad (16)$$

where Q is the test query set, R_q is the number of relevant songs for query q, rel(n) is a binary function indicating the relevance of the item at rank n, and N is the number of retrieved items.
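A short sketch of Equation 16; the per-user interface is our own assumption.

```python
def average_precision(ranked, relevant):
    """AP for one user: precision at each relevant rank, over R_q."""
    hits, total = 0, 0.0
    for n, song in enumerate(ranked, start=1):
        if song in relevant:            # rel(n) = 1
            hits += 1
            total += hits / n           # Precision@n
    return total / len(relevant)        # divide by R_q

def mean_average_precision(results):
    """MAP (Eq. 16): mean AP over all test users."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

print(average_precision(["s3", "s7", "s1", "s2"], {"s7", "s2"}))
# (1/2 + 2/4) / 2 = 0.5
```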

order at specific position of the ranking result. It calculatesas follow

NDCG(n) =1

Zn

n∑j=1

2r(j) − 1

log2(j + 1)(17)

where j is the predicted rank position , r(j) is the ratingvalue of j-th document in the ground-truth rank list, Zn isthe normalization factor which is the discounted cumulativegain in the n− th position of the ground-truth rank list. Weconsider r(j) as 1 if the song is sang by the user, otherwise0.
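A sketch of Equation 17 with the binary gains used in the paper; the ideal-list construction below is the standard normalization and is our interpretation of Z_n.

```python
from math import log2

def ndcg_at_n(ranked, relevant, n):
    """NDCG@n (Eq. 17) with binary gains r(j) in {0, 1}."""
    dcg = sum((2 ** (1 if s in relevant else 0) - 1) / log2(j + 1)
              for j, s in enumerate(ranked[:n], start=1))
    # Z_n: DCG of the ideal ranking (all relevant songs first)
    ideal_hits = min(n, len(relevant))
    z_n = sum(1.0 / log2(j + 1) for j in range(1, ideal_hits + 1))
    return dcg / z_n if z_n > 0 else 0.0

print(ndcg_at_n(["s3", "s7", "s1"], {"s7", "s1"}, 3))  # ≈ 0.69
```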

6.3 Baseline Methods
We denote our iterative probabilistic inference algorithm as IPI and compare it with four baselines.

The first classic baseline is the user-based Collaborative Filtering (CF) method. We build the user-song matrix from users' singing histories, where the songs sung by each user have a 5-graded rating, and the objective is to predict the ratings of the unrated songs. CF uses users' current song ratings and the similarity between users to predict the other songs' ratings, and we recommend the songs with the highest predicted ratings. To predict the rating r_ti of user u_t on song s_i, we compute

$$r_{ti} = \sum_{u_j \in S(u_t,k) \cap N(s_i)} \frac{\text{Sim}(u_t, u_j)}{\sum_{u_j \in S(u_t,k) \cap N(s_i)} \text{Sim}(u_t, u_j)} \cdot r_{ji} \qquad (18)$$

where S(u_t, k) is the set of the top-k users most similar to the target user u_t, N(s_i) is the set of users who have rated song s_i, r_ji is user u_j's rating of song s_i, and Sim(u_t, u_j) is the similarity between users u_t and u_j. We set the number of nearest neighbors k to 20 to achieve the best performance, and use the cosine similarity between two users' song rating vectors as the user similarity.
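A minimal NumPy sketch of the user-based CF baseline of Equation 18, assuming a dense rating matrix with 0 for unrated songs; the function name and interface are illustrative.

```python
import numpy as np

def predict_rating(R, t, i, k=20):
    """User-based CF prediction (Eq. 18) for user t and song i.

    R: user x song rating matrix, 0 = unrated."""
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = R @ R[t] / (norms * norms[t])             # cosine similarity
    sims[t] = -np.inf                                # exclude the user
    top_k = np.argsort(sims)[-k:]                    # S(u_t, k)
    raters = [u for u in top_k if u != t and R[u, i] > 0]  # ∩ N(s_i)
    if not raters:
        return 0.0
    w = sims[raters]
    return float(w @ R[raters, i] / w.sum())

R = np.array([[5, 0, 3],
              [4, 2, 3],
              [1, 5, 0]], dtype=float)
print(predict_rating(R, t=0, i=1, k=2))  # predicted rating of song 1
```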

Another popular baseline is graph-based song recommendation. We use a baseline similar to [7]: we build an undirected graph with songs and users as vertices and users' singing relations as edges, and apply the graph ranking algorithm proposed in [9]. We call this baseline Ordinary Graph-based Recommendation (OGR).

Because measuring the reliability of song difficulty orderings is the key to singing-song recommendation, we also compare against singing-song recommendation that uses only the support or only the confidence to estimate the reliability of difficulty orderings. We denote these two baselines as IPI-supp and IPI-conf, respectively.

For IPI-supp, the edge weights of the difficulty graph are calculated as

$$w_{ij} = \frac{\text{supp}(s_i > s_j)}{\sum_{s_t \in S_{\text{out}}^i} \text{supp}(s_i > s_t)} \qquad (19)$$

For IPI-conf, the edge weights of the difficulty graph are calculated as

$$w_{ij} = \frac{\text{conf}(s_i > s_j)}{\sum_{s_t \in S_{\text{out}}^i} \text{conf}(s_i > s_t)} \qquad (20)$$

6.4 Experimental Results

6.4.1 Baseline Comparison
We use the metrics described in Section 6.2 to compare the performance of the different algorithms. We fix the common random walk parameter α shared by OGR, IPI-supp, IPI-conf, and IPI at 0.85. We estimate the human voting accuracy using Equation 7, obtaining p = 0.71. Figure 4 shows the song ranking performance of the five algorithms.

Figures 4(a) and 4(b) show the precision and recall results of the five algorithms. Our method IPI outperforms the baselines CF, OGR, IPI-supp, and IPI-conf by an average of 180%, 36%, 28%, and 18% in precision, and 190%, 32%, 29%, and 19% in recall, respectively. IPI, IPI-supp, and IPI-conf, the methods that recommend on the difficulty graph, perform better than the traditional song recommendation algorithms OGR and CF. This is because CF and OGR ignore songs' difficulty in performance during recommendation, and it demonstrates the effectiveness of song difficulty orderings for singing-song recommendation. Figures 4(c) and 4(d) show the performance measured by NDCG@n and the precision-recall curve, with similar results: IPI is always the best, and the baselines that recommend on the difficulty graph are much better than the traditional song recommendation algorithms.

The above results show that IPI is better than IPI-supp and IPI-conf on all metrics. The difference between IPI and these two variants is that the latter use only a single factor, support or confidence, to estimate the reliability of the song difficulty orderings. This shows that using both factors to measure reliability is better than using either alone, which is reasonable: a difficulty ordering with high support only reflects the absolute number of people who support it, and many people may still disagree with it; confidence considers the agreement of both sides, but ignores the possible bias caused by a shortage of judgments. Using both factors in the reliability estimation overcomes the shortcomings of each.

Table 1 shows the performance measured by MAP. IPI outperforms CF and OGR by an average of 160% and 40%, respectively. The absolute values are quite low, which is caused by the sparseness of the dataset. We compare the sparseness of our dataset with the one reported in [12]: in their dataset, the number of songs listened to by each user is 1.6% of the total song set, their precision-recall curve also shows a fast decrease in precision, and their best MAP is 0.048 ([17] is another example). In our dataset, the number of songs sung by each user is likewise about 1.6% of the total song set, so the low absolute performance values are reasonable.

Table 1: Ranking accuracy in terms of MAP

Algorithm   IPI     IPI-conf   IPI-supp   OGR     CF
MAP value   0.061   0.051      0.048      0.043   0.024


Figure 4: Performance Comparison of the Five Algorithms (CF, OGR, IPI-supp, IPI-conf, IPI): (a) Precision@n, (b) Recall@n, (c) NDCG@n, (d) Precision-recall Curve

Figure 5: Performance Comparison for Cold Start Users (CF, OGR, IPI-supp, IPI-conf, IPI): (a) Precision@n, (b) Recall@n, (c) NDCG@n

6.4.2 Test for Cold-Start Users

We also test the performance of the algorithms for cold-start users. We consider users who sang between 7 and 10 songs in the training set as cold-start users; limiting the number of each user's singing-songs ensures there are at least 3 songs in the testing set for evaluation purposes. The number of users in this sparse dataset is 3000.

Figures 5(a) and 5(b) show the precision and recall results of the five methods on the sparse dataset. We find that the performance of IPI-conf is lower than that of IPI-supp, which is the opposite of the results above. This is caused by errors in reliability estimation that ignore the support factor: for conjectures with few supporting votes, even a high confidence still yields an error-prone reliability estimate, and cold-start users do not have enough previously sung songs as queries to overcome the errors caused by the confidence factor. Support, in contrast, also reflects the popularity of a song difficulty ordering: a large support means it is likely that many people have sung the song pair. For cold-start users, recommending popular songs through popular difficulty orderings may therefore perform better.

Table 2: Statistical significance comparison (t-ratios) between IPI and the two IPI variants

Precision@n   P@10     P@30     P@50      P@70      P@90
IPI-supp      1.384    5.2259   8.4185    11.0886   13.2275
IPI-conf      3.9934   6.5927   10.3389   12.7425   15.097

Our IPI again performs better than these two baselines, which shows once more that measuring reliability through both support and confidence is effective. Figure 5(c) shows the performance measured by NDCG@n, with the same conclusion.

Because the precision differences between IPI and the two IPI variants are minor on the cold-start dataset, we conduct a significance test (t-test) on the results. For each method, we compute a list of precision@n values, each corresponding to the recommendation for one user. Given IPI and a baseline (IPI-supp or IPI-conf), we compute the t-ratio over their precision lists. Table 2 shows the t-ratios between IPI and the two baselines. Most t-ratios are larger than the cutoff of 3.3 (the nearest df in the t-test table is 1000, under p < 0.001), which shows that IPI's improvement is significant.


The only exception is the result of precision@10 for IPI-supp, which is caused by the data sparseness problem.

For the baseline OGR, the performance drops considerably on the sparse dataset. This is caused by the large number of isolated vertices in the song-user graph: due to the sparseness of the dataset, many songs in the test set had not been sung by any user in the training set, and such songs cannot be recommended based on the song-user graph. Looking into the graph, we find that 1391 out of 5877 songs had not been sung by any user in the training set. Our difficulty graph is not affected by this data sparseness problem: because the difficulty graph is pre-built, the iterative algorithm is not greatly affected by the shortage of users' singing relations.

7. CONCLUSIONS
In this paper, we address the singing-song recommendation problem in social singing communities. We propose a method to discover song difficulty orderings from the song performance ratings of each user, and estimate the reliability of the discovered orderings based on their support and confidence. For singing-song recommendation, we model the discovered difficulty orderings as a difficulty graph that captures the difficulty relation in performance between different songs, and propose an iterative probabilistic inference algorithm to estimate the performance degrees of all the other vertices in the graph from each user's past singing history. Experiments have been performed on a dataset collected from an actual social singing community (5sing). The results show that our proposed framework is more effective than traditional song recommendation methods.

For future work, we will conduct a user study to see whether amateur singers are satisfied with the recommendations generated by this system. Second, because the discovered song difficulty orderings can also be used in other social singing communities, we will focus on discovering more difficulty orderings from other sources to maintain a complete difficulty ordering database. Another direction is to exploit other factors, such as users' singing preferences and friendship relations, to see if they can further improve the recommendation.

8. ACKNOWLEDGEMENTS
This research was carried out at the NUS-ZJU SeSaMe Centre. It is supported by the Singapore NRF under its IRC@SG Funding Initiative and administered by the IDMPO. This research was also supported by the National High Technology Research and Development Program of China (Grant No. 2014AA015205).

9. REFERENCES
[1] S. Agarwal. Ranking on graph data. In W. W. Cohen and A. Moore, editors, ICML, volume 148, pages 25-32. ACM, 2006.
[2] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
[3] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for YouTube: Taking random walks through the view graph. In WWW '08, pages 895-904, New York, NY, USA, 2008. ACM.
[4] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. Volume 3120, pages 624-638. Springer, 2004.
[5] M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56(1-3):209-239, 2004.
[6] J. S. Breese, D. Heckerman, and C. M. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. CoRR, abs/1301.7363, 2013.
[7] J. Bu, S. Tan, C. Chen, C. Wang, H. Wu, L. Zhang, and X. He. Music recommendation by unified hypergraph: Combining social media information and music content. In MM '10, pages 391-400, New York, NY, USA, 2010. ACM.
[8] D. Goldberg, D. A. Nichols, B. M. Oki, and D. B. Terry. Using collaborative filtering to weave an information tapestry. Commun. ACM, 35(12):61-70, 1992.
[9] T. H. Haveliwala. Topic-sensitive PageRank. In Proceedings of the Eleventh International World Wide Web Conference, 2002.
[10] K. Hoashi, K. Matsumoto, and N. Inoue. Personalization of user profiles for content-based music retrieval based on relevance feedback. In ACM Multimedia, pages 110-119, 2003.
[11] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR, pages 41-48, 2000.
[12] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR '09, pages 195-202, New York, NY, USA, 2009. ACM.
[13] K. Mao, X. Luo, K. Chen, G. Chen, and L. Shou. myDJ: Recommending karaoke songs from one's own voice. In SIGIR, page 1009. ACM, 2012.
[14] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In WWW, pages 285-295. ACM, 2001.
[15] L. Shou, K. Mao, X. Luo, K. Chen, G. Chen, and T. Hu. Competence-based song recommendation. In SIGIR, pages 423-432. ACM, 2013.
[16] X. Wu, Q. Liu, E. Chen, L. He, J. Lv, C. Cao, and G. Hu. Personalized next-song recommendation in online karaokes. In RecSys '13, pages 137-140, New York, NY, USA, 2013. ACM.
[17] M. Ye, P. Yin, W.-C. Lee, and D. L. Lee. Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR, pages 325-334. ACM, 2011.
[18] D. Zhou, J. Huang, and B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. In ICML, volume 119, pages 1036-1043. ACM, 2005.
[19] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf. Ranking on data manifolds. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
