Click here to load reader

Analysis of Sentiment Communities in Online Networksceur-ws.org/Vol-1421/paper4.pdf · Analysis of Sentiment Communities in Online Networks Davide Feltoni Gurini, Fabio Gasparetti,

Embed Size (px)

Citation preview

Analysis of Sentiment Communities in Online Networks

Davide Feltoni Gurini, Fabio Gasparetti,Alessandro Micarelli and Giuseppe Sansonetti

Roma Tre UniversityDepartment of EngineeringVia della Vasca Navale 79

Rome, 00146 Italy{feltoni,gaspare,micarel,gsansone}@dia.uniroma3.it

ABSTRACTThis article reports our experience in developing a recom-mender system (RS) able to suggest relevant people to thetarget user. Such a RS relies on a user profile represented asa set of weighted concepts related to the users interests. Theweighting function, we named sentiment-volume-objectivity(SVO) function, takes into account not only the users senti-ment toward his/her interests, but also the volume and ob-jectivity of related contents. A clustering technique basedon modularity optimization enables us to identify the latentsentiment communities. A preliminary experimental evalua-tion on real-world datasets from Twitter shows the benefitsof the proposed approach and allows us to make some con-siderations about the detected communities.

Categories and Subject DescriptorsH.3.3 [Information Search and Retrieval]: [InformationFiltering]

General TermsAlgorithms, Experimentation

1. BACKGROUNDThe rationale behind this work is that users may share

similar interests, but have different opinions about them.Therefore, we extend the traditional approaches to user rec-ommendation through sentiments and opinions extractedfrom the user-generated content in order to improve the ac-curacy of suggestions. This way, we can identify latent senti-ment communities of users. As far as we know, there are fewworks on considering users attitudes for community detec-tion or user recommendation. In [6] the authors formulatethe problem of sentiment community discovery as a semidef-inite programming (SDP) problem and solve it through anSDP-based rounding method. Nguyen et al. [5] address theproblem of clustering blog communities into groups based

on users sentiments, and propose a non-parametric clus-tering algorithm for its solution. Yang and Manandhar [7]propose two community discovery models by combining so-cial links, author based topics and sentiment informationto detect communities with different sentiment-topic distri-butions. Unlike previous works, we consider not only thetarget users attitudes toward his/her interests, but also thevolume and objectivity of related generated contents.

2. PROPOSED APPROACHTraditional approaches to user recommendation rely on

the definition of a similarity measure between two usersu and v. Given the target user u, the ranked list of sug-gested users corresponds to the set of users v that maximizethe aforementioned measure. Content-based approaches onTwitter 1 define this measure by analyzing user tweets. Con-cepts dealt with by a user are identified through hashtagscontained in his/her tweets, namely, the metadata tags thatare used in Twitter to indicate the context or the flow atweet is associated with. Thus, we define the profile p of theuser u as the set of weighted concepts:

p(u) = {(c, (u, c))|c Cu} (1)

where (u, c) is the relevance of the concept c for the useru, and Cu is the set of concepts cited by the user u. Theuser profile representation is generated by monitoring theuser activity, that is, all the tweets included in the observa-tion period. Afterwards, given two users u and u, and theirprofiles p(u) and p(v), the similarity function is defined interms of cosine similarity:

sim(u,v)=sim(p(u),p(v))=

cCuCv (u,c)(v,c)

cCu (u,c)2

cCv (v,c)2

(2)

where Cu and Cv are the concepts in the profiles of users uand v, respectively. The idea behind this work is that takinginto account users attitudes towards their interests can yieldbenefits in recommending friends to follow. Specifically, weconsider three contributions: 1) S(u, c), that is, the senti-ment expressed by the user u for the concept c; 2) V (u, c),that is, how much he/she is interested in that concept; 3)O(u, c), that is, how much he/she expresses objective com-ments on it. The details regarding the computation of suchcontributions can be found in [2]. Based on those contribu-tions, we propose a weighting function, we called sentiment-volume-objectivity (SVO) function, that takes into account

1twitter.com

Figure 1: Communities for the Apple concept, de-tected through the SVO profiling and clustering.

all of them. It is defined as follows:

SV O(u, c) = S(u, c) + V (u, c) + O(u, c) (3)

where , , and are three constants [0, 1], such that + + = 1. The function SV O(u, c) [0, 1] is theweighting function (u, c) that appears in Equations 1 and 2.Once the similarities between users are computed, we builda graph for each concept as follows: if the similarity valuebetween users exceeds a threshold value , we consider anedge between them. The optimal value for was determinedthrough a gradient descent algorithm that maximizes therecommender precision. Such value was 0.8. Afterwards,a clustering algorithm based on modularity optimization [1]allows us to detect the latent communities for the consideredconcept c.

3. EXPERIMENTAL EVALUATIONExperimental tests were performed on three datasets gath-

ered from Twitter through its APIs 2 by searching for spe-cific hashtags. Dataset1 was obtained during the 2013 Ital-ian political elections. We retrieved the Twitter streamsabout politician leaders and Italian parties from January25th to February 27th. The final dataset counted 1,085,121tweets in Italian language and 70,977 unique users. Dataset2was obtained searching for hashtags and keywords represent-ing the most important mobile tech companies such as Sam-sung, Apple, and Nokia. The dataset was gathered fromSeptember 2014 to February 2015 considering only Italiantweets, and counted 3,511,455 tweets from 181,000 users.Dataset3 was obtained analyzing English tweets on the au-tomotive landscape. To this aim, we searched terms such asAudi, BMW, and Ferrari. The collection set, gathered fromDecember 2014 to February 2015, counted 2,915,131 tweetsfrom 110,350 users. Figure 1 shows the different communi-ties detected for the Apple concept (dataset2). The bottom,right, isolated, community marked with number one includesusers not interested in Apple ((i.e., their SVO value is zero).The bottom, left, community with number two consists ofusers with low interest (i.e., low value of volume) and nega-

2dev.twitter.com

Table 1: A comparison among different techniquesUser Recommender Dataset1 Dataset2 Dataset3

Our Approach 0.177 0.195 0.185S1-Twittomender 0.130 0.118 0.115VSM (Hashtag) 0.127 0.099 0.105

Table 2: SVO parameters for the three datasetsParameter Dataset1 Dataset2 Dataset3

0.45 0.25 0.28 0.45 0.50 0.52 0.10 0.25 0.20

tive sentiment. The community labeled with number three ischaracterized by high interest and negative sentiment. Thecentral communities identified with numbers four, five, six,and seven encompass users with high interest and positivesentiment (with different SVO combinations for each com-munity). Users belonging to communities eight and ninehave high values of objectivity, namely, they generate ob-jective contents with few opinions. Among such users, forexample, we find online newspapers such as BBC and CNN.

In order to evaluate our approach, we relied on the ho-mophily [4] phenomenon, that is, the tendency of individu-als with similar characteristics to associate with each other.For each dataset, we selected 1000 users that (i) posted atleast 50 tweets in the observed period, and (ii) had morethan 30 friends and followers. Table 1 reports the results interms of Success at Rank 10 (S@10) of a comparative anal-ysis of our system with two state-of-the-art functions: 1) acontent-based function, called S1-Twittomender [3], whereusers are profiled through the content of their tweets; and2) a VSM (Hashtag) function representing cosine similarityin a vector space model, where vectors are weighted hash-tags. Table 2 shows the values of SVO parameters , ,and that maximize the performance of the recommender.Such values were determined through a mini-batch gradientdescent algorithm. Based on the proposed model and theused datasets, these weights appear to highlight the contri-bution of volume and sentiment in dataset1, and objectivityin dataset2 and dataset3. This can be explained becausedataset2 (technology) and dataset3 (automotive) are likelyto contain more news and articles with few opinions andsentiments than dataset1 (politics).

4. CONCLUSIONS AND FUTURE WORKIn this article, we have presented some results of our

research work whose aim is to exploit the implicit senti-ment analysis for improving the performance of user recom-menders. In particular, we have reported some preliminaryconsiderations on the sentiment communities that our ap-proach allows us to identify.

Our work is still at an exploratory stage, so several itsaspects have to be further developed. Among others, weplan to perform an in-depth sensitivity analysis to studyhow user preferences, social interactions, and dataset char-acteristics can affect SVO parameters , , and . Theirrole is indeed crucial, since they define the contribution ex-tent of sentiment, volume, and objectivity that determinethe distribution of users in sentiment communities.

5. REFERENCES[1] V. Blondel, J. Guillaume, R. Lambiotte, and E. Mech.

Fast unfolding of communities in large networks. J.Stat. Mech, 2008.

[2] D. F. Gurini, F. Gasparetti, A. Micarelli, andG. Sansonetti. A sentiment-based approach to twitteruser recommendation. In Proc. of the Fifth RSWebWorkshop co-located with the Seventh ACM RecSysConference, Hong Kong, China, 2013.

[3] J. Hannon, M. Bennett, and B. Smyth. Recommendingtwitter users to follow using content and collaborativefiltering approaches. Proceedings of the 4th ACMRecSys Conference, 26-30(10):8, 2010.

[4] M. McPherson, L. S. Lovin, and J. M. Cook. Birds of afeather: Homophily in social networks. Annual Reviewof Sociology, 27(1):415444, 2001.

[5] T. Nguyen, D. Q. Phung, B. Adams, and S. Venkatesh.A sentiment-aware approach to community formationin social media. In ICWSM. The AAAI Press, 2012.

[6] K. Xu, J. Li, and S. S. Liao. Sentiment communitydetection in social networks. In Proc. of the 2011iConference, pages 804805, New York, NY, USA, 2011.

[7] B. Yang and S. Manandhar. Community discoveryusing social links and author-based sentiment topics. InAdvances in Social Networks Analysis and Mining(ASONAM), 2014 IEEE/ACM InternationalConference on, pages 580587, 2014.