Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

Embed Size (px)

Citation preview

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    1/10

    Who Says What to Whom on Twitter

    Shaomei Wu ∗Cornell University, [email protected]

    Jake M. HofmanYahoo! Research, NY, USA

    [email protected]

    Winter A. MasonYahoo! Research, NY, USA

    [email protected]

    Duncan J. WattsYahoo! Research, NY, USA

    [email protected]

    ABSTRACTWe study several longstanding questions in media communi-cations research, in the context of the microblogging serviceTwitter, regarding the production, ow, and consumption of information. To do so, we exploit a recently introduced fea-ture of Twitter known as “lists” to distinguish between eliteusers—by which we mean celebrities, bloggers, and represen-

    tatives of media outlets and other formal organizations—andordinary users. Based on this classication, we nd a strik-ing concentration of attention on Twitter, in that roughly50% of URLs consumed are generated by just 20K eliteusers, where the media produces the most information, butcelebrities are the most followed. We also nd signicanthomophily within categories: celebrities listen to celebrities,while bloggers listen to bloggers etc; however, bloggers ingeneral rebroadcast more information than the other cate-gories. Next we re-examine the classical “two-step ow” the-ory of communications, nding considerable support for iton Twitter. Third, we nd that URLs broadcast by differentcategories of users or containing different types of contentexhibit systematically different lifespans. And nally, we ex-amine the attention paid by the different user categories to

    different news topics.

    Categories and Subject DescriptorsH.1.2 [Models and Principles ]: User/Machine Systems;J.4 [Social and Behavioral Sciences ]: Sociology

    General Termstwo-step ow, communications, classication

    KeywordsCommunication networks, Twitter, information ow

    1. INTRODUCTIONA longstanding objective of media communications re-

    search is encapsulated by what is known as Lasswell’s maxim:∗ Part of this research was performed while the author wasvisiting Yahoo! Research, New York. The author was alsosupported by NSF grant IIS-0910664.

    Copyright is held by the International World Wide Web Conference Com-mittee (IW3C2). Distribution of these papers is limited to classroom use,and personal use by others.WWW 2011 , March 28–April 1, 2011, Hyderabad, India.ACM 978-1-4503-0637-9/11/03.

    “who says what to whom in what channel with what ef-fect” [12], so-named for one of the pioneers of the eld,Harold Lasswell. Although simple to state, Laswell’s maximhas proven difficult to answer in the more-than 60 yearssince he stated it, in part because it is generally difficult toobserve information ows in large populations, and in partbecause different channels have very different attributes andeffects. As a result, theories of communications have tendedto focus either on “mass” communication, dened as “one-way message transmissions from one source to a large, rela-tively undifferentiated and anonymous audience,” or on “in-terpersonal” communication, meaning a “two-way messageexchange between two or more individuals.” [16].

    Correspondingly, debates among communication theoristshave tended to revolve around the relative importance of these two putative modes of communication. For exam-ple, whereas early theories such as the “hypodermic needle”model posited that mass media exerted direct and relativelystrong effects on public opinion, mid-century researchers [13,9, 14, 4] argued that the mass media inuenced the pub-lic only indirectly, via what they called a two-step ow of communications, where the critical intermediate layer was

    occupied by a category of media-savvy individuals calledopinion leaders . The resulting “limited effects” paradigmwas then subsequently challenged by a new generation of researchers [6], who claimed that the real importance of themass media lay in its ability to set the agenda of publicdiscourse. But in recent years rising public skepticism of mass media, along with changes in media and communica-tion technology, have tilted conventional academic wisdomonce more in favor of interpersonal communication, whichsome identify as a “new era” of minimal effects [2].

    Recent changes in technology, however, have increasinglyundermined the validity of the mass vs. interpersonal di-chotomy itself. On the one hand, over the past few decadesmass communication has experienced a proliferation of newchannels, including cable television, satellite radio, special-

    ist book and magazine publishers, and of course an arrayof web-based media such as sponsored blogs, online com-munities, and social news sites. Correspondingly, the tra-ditional mass audience once associated with, say, networktelevision has fragmented into many smaller audiences, eachof which increasingly selects the information to which it isexposed, and in some cases generates the information it-self [15]. Meanwhile, in the opposite direction interpersonalcommunication has become increasingly amplied throughpersonal blogs, email lists, and social networking sites to

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    705

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    2/10

    afford individuals ever-larger audiences. Together, thesetwo trends have greatly obscured the historical distinctionbetween mass and interpersonal communications, leadingsome scholars to refer instead to “masspersonal” communi-cations [16].

    A striking illustration of this erosion of traditional me-dia categories is provided by the micro-blogging platformTwitter. For example, the top ten most-followed users onTwitter are not corporations or media organizations, butindividual people, mostly celebrities. Moreover, these indi-viduals communicate directly with their millions of followersvia their tweets, often managed by themselves or publicists,thus bypassing the traditional intermediation of the massmedia between celebrities and fans. Next, in addition toconventional celebrities, a new class of“semi-public” individ-uals like bloggers, authors, journalists, and subject matterexperts has come to occupy an important niche on Twit-ter, in some cases becoming more prominent (at least interms of number of followers) than traditional public guressuch as entertainers and elected officials. Third, in spiteof these shifts away from centralized media power, mediaorganizations—along with corporations, governments, andNGOs—all remain well represented among highly followed

    users, and are often extremely active. And nally, Twitteris primarily made up of many millions of users who seemto be ordinary individuals communicating with their friendsand acquaintances in a manner largely consistent with tra-ditional notions of interpersonal communication.

    Twitter, therefore, represents the full spectrum of commu-nications from personal and private to“masspersonal”to tra-ditional mass media. Consequently it provides an interestingcontext in which to address Lasswell’s maxim, especially asTwitter—unlike television, radio, and print media—enablesone to easily observe information ows among the membersof its ecosystem. Unfortunately, however, the kinds of ef-fects that are of most interest to communications theorists,such as changes in behavior, attitudes, etc., remain difficultto measure on Twitter. Therefore in this paper we limitour focus to the “who says what to whom” part of Laswell’smaxim.

    To this end, our paper makes three main contributions:

    • We introduce a method for classifying users using Twit-ter Lists into “elite” and “ordinary”users, further clas-sifying elite users into one of four categories of interest—media, celebrities, organizations, and bloggers.

    • We investigate the ow of information among thesecategories, nding that although audience attention ishighly concentrated on a minority of elite users, muchof the information they produce reaches the massesindirectly via a large population of intermediaries.

    • We nd that different categories of users emphasize dif-ferent types of content, and that different content typesexhibit dramatically different characteristic lifespans,ranging from less than a day to months.

    The remainder of the paper proceeds as follows. In thenext section, we review related work. In Section 3 we dis-cuss our data and methods, including Section 3.3 in whichwe describe how we use Twitter Lists to classify users, out-line two different sampling methods, and show that theydeliver qualitatively similar results. In Section 4 we ana-lyze the production of information on Twitter, particularly

    who pays attention to whom. In section 4.1, we revisit thetheory of the two-step ow—arguably the dominant theoryof communications for much of the past 50 years—ndingconsiderable support for the theory. In Section 5, we con-sider “who listens to what”, examining rst who shares whatkinds of media content, and second the lifespan of URLs as afunction of their origin and their content. Finally, in Section6 we conclude with a brief discussion of future work.

    2. RELATED WORKAside from the communications literature surveyed above,

    a number of recent papers have examined information dif-fusion on Twitter. Kwak et al. [11] studied the topologicalfeatures of the Twitter follower graph, concluding from thehighly skewed nature of the distribution of followers and thelow rate of reciprocated ties that Twitter more closely resem-bled an information sharing network than a social network—a conclusion that is consistent with our own view. In ad-dition, Kwak et al. compared three different measures of inuence—number of followers, page-rank, and number of retweets—nding that the ranking of the most inuentialusers differed depending on the measure. In a similar vein,Cha et al. [3] compared three measures of inuence—numberof followers, number of retweets, and number of mentions—and also found that the most followed users did not neces-sarily score highest on the other measures. Weng et al. [17]compared number of followers and page rank with a modiedpage-rank measure which accounted for topic, again ndingthat ranking depended on the inuence measure. Finally,Bakshy et al. [1] studied the distribution of retweet cascadeson Twitter, nding that although users with large followercounts and past success in triggering cascades were on aver-age more likely to trigger large cascades in the future, thesefeatures are in general poor predictors of future cascade size.

    Our paper differs from this earlier work by shifting atten-tion from the ranking of individual users in terms of variousinuence measures to the ow of information among differ-ent categories of users. In this sense, it is related to recentwork by Crane and Sornette [5], who posited a mathemati-cal model of social inuence to account for observed tempo-ral patterns in the popularity of YouTube videos, and alsoto Gomez et al [7], who studied the diffusion of informa-tion among blogs and online news sources. Here, however,our focus is on identifying specic categories of “elite” users,who we differentiate from “ordinary” users in terms of theirvisibility, and understanding their role in introducing infor-mation into Twitter, as well as how information originatingfrom traditional media sources reaches the masses.

    3. DATA AND METHODS

    3.1 Twitter Follower GraphIn order to understand how information is transmitted on

    Twitter, we need to know the channels by which it ows;that is, who is following whom on Twitter. To this end, weused the follower graph studied by Kwak et al. [11], whichincluded 42M users and 1.5B edges. This data representsa crawl of the graph seeded with all users on Twitter asobserved by July 31st, 2009, and is publicly available 1 . Asreported by Kwak et al. [11], the follower graph is a directed1 The data is free to download fromhttp://an.kaist.ac.kr/traces/WWW2010.html

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    706

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    3/10

    network characterized by highly skewed distributions both of in-degree (# followers) and out-degree (# “friends”, Twitternomenclature for how many others a user follows); however,the out-degree distribution is even more skewed than thein-degree distribution. In both friend and follower distribu-tions, for example, the median is less than 100, but the max-imum # friends is several hundred thousand, while a smallnumber of users have millions of followers. In addition, thefollower graph is also characterized by extremely low reci-procity (roughly 20%)—in particular, the most-followed in-dividuals typically do not follow many others. The Twitterfollower graph, in other words, does not conform to the usualcharacteristics of social networks, which exhibit much higherreciprocity and far less skewed degree distributions [10], butinstead resembles more the mixture of one-way mass com-munications and reciprocated interpersonal communicationsdescribed above.

    3.2 Twitter FirehoseIn addition to the follower graph, we are interested in the

    content being shared on Twitter, and so we examined thecorpus of all 5B tweets generated over a 223 day period fromJuly 28, 2009 to March 8, 2010 using data from the Twitter

    “rehose,” the complete stream of all tweets2

    . Because ourobjective is to understand the ow of information, it is use-ful for us to restrict attention to tweets containing URLs,for two reasons. First, URLs add easily identiable tags toindividual tweets, allowing us to observe when a particularpiece of content is either retweeted or subsequently reintro-duced by another user. And second, because URLs pointto online content outside of Twitter, they provide a muchricher source of variation than is possible in the typical 140character tweet 3 . Finally, we note that almost all URLsbroadcast on Twitter have been shortened using one of anumber of URL shorteners, of which the most popular ishttp://bit.ly/. From the total of 5B tweets recorded duringour observation period, therefore, we focus our attention onthe subset of 260M containing bit.ly URLs; thus all subse-quent counts are implicitly understood to be restricted tothis content.

    3.3 Twitter ListsOur method for classifying users exploits a relatively re-

    cent feature of Twitter: Twitter Lists. Since its launch onNovember 2, 2009, Twitter Lists have been used extensivelyto group sets of users into topical or other categories, andthereby to better organize and/or lter incoming tweets. Tocreate a Twitter List, a user provides a name (required) anddescription (optional) for the list, and decides whether thenew list is public (anyone can view and subscribe to this list)or private (only the list creator can view or subscribe to thislist). Once a list is created, the user can add/edit/delete

    list members. As the purpose of Twitter Lists is to helpusers organize users they follow, the name of the list canbe considered a meaningful label for the listed users. The

    2 http://dev.twitter.com/doc/get/statuses/rehose3 Naturally, this restriction also has downsides, in particularthat some users may be more likely to include URLs in theirtweets than others, and thus will appear to be relativelymore active and/or have more impact than if we were insteadto consider all tweets. For our purposes, however, we believethat the practical advantages of the restriction outweigh thepotential for bias.

    classication of users can therefore effectively exploit the“wisdom of crowds” with these created lists, both in termsof their importance to the community (number of lists onwhich they appear), and also how they are perceived (e.g.news organization vs. celebrity, etc.).

    Before describing our methods for classifying users in termsof the lists on which they appear, we emphasize that weare motivated by a particular set of substantive questionsarising out of communications theory. In particular, weare interested in the relative importance of mass commu-nications, as practiced by media and other formal organiza-tions, masspersonal communications as practiced by celebri-ties and prominent bloggers, and interpersonal communica-tions, as practiced by ordinary individuals communicatingwith their friends. In addition, we are interested in the re-lationships between these categories of users, motivated bytheoretical arguments such as the theory of the two-stepow [9]. Rather than pursuing a strategy of automatic clas-sication, therefore, our approach depends on dening andidentifying certain predetermined classes of theoretical in-terest, where both approaches have advantages and disad-vantages. In particular, we restrict our attention to fourclasses of what we call “elite” users: media, celebrities, orga-

    nizations, and bloggers, as well as the relationships betweenthese elite users and the much larger population of “ordi-nary” users.

    Analytically, our approach has some disadvantages. Inparticular, by determining the categories of interest in ad-vance, we reduce the possibility of discovering unanticipatedcategories that may be of equal or greater relevance thanthose we selected. Thus although we believe that for our par-ticular purposes, the advantages of our approach—namelyconceptual clarity and ease of interpretation—outweigh thedisadvantages, automated classication methods remain aninteresting topic for future work. Finally, in addition tothese theoretically-imposed constraints, our proposed clas-sication method must also satisfy a practical constraint—namely that the rate limits established by Twitter’s APIeffectively preclude crawling all lists for all Twitter users

    4

    .Thus we instead devised two different sampling schemes—asnowball sample and an activity sample—each with someadvantages and disadvantages, discussed below.

    3.3.1 Snowball sample of Twitter ListsThe rst method for identifying elite users employed snow-

    ball sampling. For each category, we chose a number u 0 of seed users that were highly representative of the desired cat-egory and appeared on many category-related lists. For eachof the four categories above, the following seeds were chosen:

    • Celebrities : Barack Obama, Lady Gaga, Paris Hilton

    • Media : CNN, New York Times

    • Organizations : Amnesty International, World WildlifeFoundation, Yahoo! Inc., Whole Foods

    4 The Twitter API allows only 20K calls per hour, where atmost 20 lists can be retrieved for each API call. Under themodest assumption of 40M users, where each user is includedon at most 20 lists, this would require roughly 11 weeks.Clearly this time could be reduced by deploying multipleaccounts, but it also likely underestimates the real time quitesignicantly, as many users appear on many more than 20lists (e.g. Lady Gaga appears on nearly 140,000).

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    707

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    4/10

    • Blogs 5 : BoingBoing, FamousBloggers, problogger, mash-able. Chrisbrogan, virtuosoblogger, Gizmodo, Ileane,dragonblogger, bbrian017, hishaman, copyblogger, en-gadget, danielscocco, BlazingMinds, bloggersblog, Ty-coonBlogger, shoemoney, wchingya, extremejohn,GrowMap, kikolani, smartbloggerz, Element321, bran-donacox, remarkablogger, jsinkeywest, seosmarty, No-tAProBlog, kbloemendaal, JimiJones, ditesco

    After reviewing the lists associated with these seeds, thefollowing keywords were hand-selected based on (a) theirrepresentativeness of the desired categories; and (b) theirlack of overlap between categories:

    • Celebrities : star, stars, hollywood, celebs, celebrity,celebrities, celebsveried, celebrity-list,celebrities-on-twitter, celebrity-tweets

    • Media : news, media, news-media

    • Organizations : company, companies, organization,organisation, organizations, organisations, corporation,brands, products, charity, charities, causes, cause, ngo

    • Blogs : blog, blogs, blogger, bloggersHaving selected the seeds and the keywords for each cate-

    gory, we then performed a snowball sample of the bipartitegraph of users and lists (see Figure 1). For each seed, wecrawled all lists on which that seed appeared. The resulting“list of lists” was then pruned to contain only the l0 listswhose names matched at least one of the chosen keywordsfor that category. For instance, Lady Gaga is on lists called“faves”, “celebs”, and “celebrity”, but only the latter two listswould be kept after pruning. We then crawled all u1 usersappearing in the pruned “list of lists” (for instance, nd-ing all users that appeared in the “celebrity” list with LadyGaga), and then repeated these last two steps to completethe crawl. In total, 524 , 116 users were obtained, who ap-peared on 7 , 000, 000 lists; however, many of the more promi-nent users appeared on lists in more than one category—forexample Oprah Winfrey was frequently included in lists of “celebrity” as well as “media.” To resolve this ambiguity, wecomputed a user i’s membership score in category c:

    wic = n icN c

    ,

    where n ic is the number of lists in category c that containuser i and N c is the total number of lists in category c.We then assigned each user to the category in which heor she had the highest membership score (i.e., belonged tothe highest fraction of the category’s lists). The number of users assigned in this manner to each category is reportedin Table 1.

    3.3.2 Activity Sample of Twitter ListsAlthough the snowball sampling method is convenient and

    is easily interpretable with respect to our theoretical moti-vation, it is also potentially biased by our particular choiceof seeds. To address this concern, we also generated a sam-ple of users based on their activity. Specically, we crawled5 The blogger category required many more seeds becausebloggers are in general lower prole than the seeds for theother categories

    u0

    l 0

    u1

    l 1

    u 2

    l 2

    Figure 1: Schematic of the Snowball SamplingMethod

    Table 1: Distribution of users over categoriesSnowball Sample Activity Sample

    category # of users % of users # of users % of usersceleb 82,770 15 .8% 14,778 13 .0%

    media 216,010 41 .2% 40,186 35 .3%org 97,853 18 .7% 14,891 13 .1%

    blog 127,483 24 .3% 43,830 38 .6%total 524,116 100% 113,685 100%

    all lists associated with all users who tweeted at least onceevery week for our entire observation period.

    This “activity-based” sampling method is also clearly bi-ased towards users who are consistently active. Importantly,however, the bias is likely to be quite different from any in-troduced by the snowball sample; despite these differences,the qualitative results that follow are similar for both sam-ples, providing evidence that our ndings are not artifactsof the sampling procedures. This method initially yielded750k users and 5M lists; however, after pruning the lists tothose that contained at least one of the keywords above, andassigning users to unique categories (as described above), weobtained a rened sample of 113,685 users, where Table 1reports the number of users assigned to each category. Wenote that the number of lists obtained by the activity sam-pling methods is considerably smaller than that obtainedby the snowball sample, and that bloggers are more heav-ily represented among the activity sample at the expense of the other three categories—consistent with our claim thatthe two methods introduce different biases. Interestingly,however, 97,614 of the activity sample, or 85%, also appearin the snowball sample, suggesting that the two samplingmethods identify similar populations of elite users—as in-deed we conrm in the next section.

    3.3.3 Classifying Elite UsersHaving classied users into the desired categories, we next

    rened the categories to identify “elite”users within each set.

    In doing so, we sought to reduce the size of each categorywhile still accounting for a large fraction of content con-sumed from these categories. In addition, we xed the fourcategories to be of the same size, as categories of very differ-ent sizes would require us to draw two sets of comparisons—one on the basis of total activity/impact, the other on aper-capita basis—rather than just one. To this end, we rstranked all users in each of category by how frequently theyare listed in that category. Next, we measured the ow of in-formation from the top k users in each of the four categoriesto a random sample of 100K ordinary (i.e. unclassied) users

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    708

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    5/10

    0

    1 0

    2 0

    3 0

    4 0

    5 0

    friends

    top k

    a v e r a g e

    %

    1000 4000 7000 10000

    celebmediaorgblog

    0

    1 0

    2 0

    3 0

    4 0

    5 0

    tweets received

    top k

    a v e r a g e

    %

    1000 4000 7000 10000

    celebmediaorgblog

    (a) Snowball sample

    0

    1 0

    2 0

    3 0

    4 0

    5 0friends

    top k

    a v e r a g e

    %

    1000 4000 7000 10000

    celebmediaorgblog

    0

    1 0

    2 0

    3 0

    4 0

    5 0tweets received

    top k

    a v e r a g e

    %

    1000 4000 7000 10000

    celebmediaorgblog

    (b) Activity sample

    Figure 2: Average fraction of # following (blue line)and # tweets (red line) for a random user that areaccounted for by the top K elites users crawled

    in two ways: the proportion of accounts the user follows ineach category, and the proportion of tweets the user receivedfrom everyone the user follows in each category.

    Figures 2(a) and 2(b) show the fraction of following links(square symbols) and tweets received (diamonds) by an av-erage user from each category, respectively. Although thenumerical values differ slightly, the two sets of results arequalitatively similar. In particular, for both sampling meth-ods, celebrities outrank all other categories, followed by themedia, organizations, and bloggers. Also in both cases, thebulk of the attention is accounted for by a relatively smallnumber of users within each category, as evidenced by therelatively at slope of the attention curves, where we notethat the curve for celebrities asymptotes more slowly thanfor the other three categories. Balancing the requirementsdescribed above, therefore, we chose k = 5000 as a cut-off for the elite categories, where all remaining users are hence-forth classied as ordinary. Naturally, imposing categoricaldistinctions of any kind articially transforms differences of degree (e.g. more or less prominent users) into differencesof kind (“elite” vs. “ordinary”), but again we feel the in-terpretability gained by this distinction outweighs the costs.

    Moreover, because the choice of k = 5000 is arbitrary, wereplicated our analysis with a range of values of k, ndingqualitatively indistinguishable results. Thus, from this pointon, we restrict our analysis to the top 5,000 users in eachcategory identied by the snowball sampling method, notingthat both methods generate similar results.

    Based on this denition of elite users, Table 2 shows thatalthough ordinary users collectively introduce by far thehighest number of URLs, members of the elite categories arefar more active on a per-capita basis. In particular, usersclassied as “media” easily outproduce all other categories,

    Table 2: # of URLs initiated by category# of URLs

    category # of URLs per-capitaceleb 139,058 27.81media 5,119,739 1023.94org 523,698 104.74blog 1,360,131 272.03ordinary 244,228,364 6.10

    Table 3: Top 5 users in each categoryCelebrity Media Org Blog

    aplusk cnnbrk google mashableladygaga nytimes Starbucks problogger

    TheEllenShow asahi twitter kibelocotaylorswift13 BreakingNews joinred naosalvo

    Oprah TIME ollehkt dooce

    followed by bloggers, organizations, and celebrities. Ordi-nary users originate on average only about 6 URLs each,compared with over 1,000 for media users. In the rest of

    this paper, therefore, when we talk about “celebrity”, “me-dia”, “organization”, “blog”, we refer the top 5K users drawnfrom the snowball sample listed as “celebrity”, “media”, “or-ganization”, “blog”, respectively.

    Table 3, which shows the top 5 users in each of the fourcategories, suggests that the sampling method yields re-sults that are consistent with our objective of identifyingusers who are prominent exemplars of our target categories.Among the celebrity list, for example, “aplusk,” is the han-dle for actor Ashton Kusher, one of the rst celebrities toembrace Twitter and still one of the most followed users,while the remaining celebrity users—Lady Gaga, Ellen De-generes, Oprah Winfrey, and Taylor Swift, are all householdnames. In the media category, CNN Breaking News and theNew York Times are most prominent, followed by BreakingNews, Time, and Asahi, a leading Japanese daily newspa-per. Among organizations, Google, Starbucks, and Twit-ter are obviously large and socially prominent corporations,while JoinRed is the charity organization started by Bono of U2, and ollehkt is the Twitter account for KT, formerly Ko-rean Telecom. Finally, among the blogging category, Mash-able and ProBlogger are both prominent US blogging sites,while Kibe Loco and Nao Salvo are popular blogs in Brazil,and dooce is the blog of Heather Armstrong, a widely read“mommy blogger” with over 1.5M followers.

    4. “WHO LISTENS TO WHOM”The results of the previous section provide qualied sup-

    port for the conventional wisdom that audiences have be-

    come increasingly fragmented. Clearly, ordinary users onTwitter are receiving their information from many thou-sands of distinct sources, most of which are not traditionalmedia organizations—even though media outlets are by farthe most active users on Twitter, only about 15% of tweetsreceived by ordinary users are received directly from themedia. Equally interesting, however, is that in spite of thisfragmentation, it remains the case that 20 K elite users, com-prising less than 0.05% of the user population, attract almost50% of all attention within Twitter. Thus, while attentionthat was formerly restricted to mass media channels is now

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    709

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    6/10

    Celeb Media

    Org Blog

    A B

    Category of Twitter Users

    B receive tweets from A

    % of tweets received fromCeleb Media Org Blog

    Celeb 38.27 6.23 1.55 3.98Media 3.91 26.22 1.66 5.69Org 4.64 6.41 8.05 8.70Blog 4.94 3.89 1.58 22.55

    Figure 3: Share of tweets received among elite cat-egories

    shared amongst other “elites”, information ows have notbecome egalitarian by any means.

    The prominence of elite users also raises the question of how these different categories listen to each other. To ad-dress this issue, we compute the volume of tweets exchangedbetween elite categories. Specically, Figure 3 shows theaverage percentage of tweets that category i receives from

    category j (indicated by edge thickness), exhibiting notice-able homophily with respect to attention: celebrities over-whelmingly pay attention to other celebrities, media actorspay attention to other media actors, and so on. The oneslight exception to this rule is that organizations pay moreattention to bloggers than to themselves. In general, in fact,attention paid by organizations is more evenly distributedacross categories than for any other category.

    Figure 3, it should be noted, shows only how many URLsare received by category i from category j , a particularlyweak measure of attention for the simple reason that manytweets go unread. A stronger measure of attention, there-fore, is to consider instead only those URLs introduced bycategory i that are subsequently retweeted by category j .Figure 4 shows how much information originating from eachcategory is retweeted by other categories. As with our previ-ous measure of attention, retweeting is strongly homophilousamong elite categories; however, bloggers are disproportion-ately responsible for retweeting URLs originated by all cate-gories, issuing 93 retweets per person, compared to only 1.1retweets per person for ordinary users. This result thereforereects the conventional characterization of bloggers as re-cyclers and lters of information. Interestingly, however, wealso note that the total number of URLs retweeted by blog-gers (465k) is vastly outweighed by the number retweeted byordinary users (46M); thus in spite of the much greater per-capita activity, their overall impact is still relatively small.

    4.1 Two-Step Flow of Information

    Examining information ow on Twitter also sheds newlight on the theory of the two-step ow [8], arguably the the-ory that has most successfully captured the dueling impor-tance of mass media and interpersonal inuence. As we havealready noted, on Twitter the ow of information from themedia to the masses accounts for only a fraction of the totalvolume of information. Nevertheless, it is still a substantialfraction, so it is still interesting to ask: for the special caseof information originating from media sources, what propor-tion is broadcast directly to the masses, and what proportionis transmitted indirectly via some population of intermedi-

    Celeb Media

    Org Blog

    A B

    Category of Twitter Users

    A retweet B

    # of retweets by Celeb Media Org Blog

    Celeb 4,334 1,489 1,543 5,039Media 4,624 40,263 7,628 32,027Org 1,570 2,539 18,937 11,175Blog 3,710 6,382 5,762 99,818

    Figure 4: RT behavior among elite categories

    aries? In addition, we may inquire whether these interme-diaries, to the extent they exist, are drawn from other elitecategories or from ordinary users, as claimed by the two-step ow theory; and if the latter, in what respects theydiffer from other ordinary users.

    Before proceeding with this analysis, we note that thereare two ways information can pass through an intermediary

    in Twitter. The rst is via retweeting, which occurs whena users explicitly rebroadcasts a URL that he or she has re-ceived from a friend, along with an explicit acknowledgementof the source—either using the official retweet functionalityprovided by Twitter or by making use of an informal con-vention such as “RT @user” or “via @user.” Alternatively,a user may tweet a URL that has previously been posted,but without acknowledgement of a source; in this case weassume the information was independently rediscovered andlabel this a “reintroduction” of content. For the purposesof studying when a user receives information directly fromthe media or indirectly through an intermediary, we treatretweets and reintroductions equivalently. If the rst occur-rence of a URL in Twitter came from a media user, but auser received the URL from another source, then that sourcecan be considered an intermediary, whether they are citingthe source within Twitter by retweeting the URL, or rein-troducing it, having discovered the URL outside of Twitter.

    To quantify the extent to which ordinary users get theirinformation indirectly versus directly from the media, wesampled 1M random ordinary users 6 , and for each user,counted the number n of bit.ly URLs they had received thathad originated from one of our 5K media users, where of the 1M total, 600K had received at least one such URL.For each member of this 600K subset we then counted thenumber n 2 of these URLs that they received via non-mediafriends; that is, via a two-step ow. The average fractionn 2 /n = 0 .46 therefore represents the proportion of media-originated content that reaches the masses via an interme-

    diary rather than directly. As Figure 5 shows, however,this average is somewhat misleading. In reality, the pop-ulation comprises two types—those who receive essentiallyall of their media-originating information via two-step owsand those who receive virtually all of it directly from the me-dia. Unsurprisingly, the former type is exposed to less totalmedia than the latter. What is surprising, however, is that

    6 As before, performing this analysis for the entire populationof over 40M ordinary users proved to be computationallyunfeasible.

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    710

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    7/10

    10 10 2 103 104 105 106 0 . 0

    0 . 4

    0 . 8

    random sample

    # media−originated URLs

    i n d i r e c t

    f l o w r a

    t i o a

    10 10 2 103 104 105 106

    random sample

    # media−originated URLs

    # u s e r s

    0

    5 0 0 0 0

    1 5 0 0 0 0 b

    10 10 2 103 104 105 106 0

    . 0

    0 . 4

    0 . 8

    intermediaries

    # media−originated URLs

    i n d i r e c t

    f l o w r a

    t i o c

    10 10 2 103 104 105 106

    intermediaries

    # media−originated URLs

    # u s e r s

    0

    1 0 0 0 0 0

    d

    Figure 5: Percentage of information that receivedvia an intermediary as a function of total volume of media content to which a user is exposed

    even users who received up to 100 media URLs during ourobservation period received all of them via intermediaries.

    Who are these intermediaries, and how many of them arethere? In total, the population of intermediaries is smallerthan that of the users who rely on them, but still surprisinglylarge, roughly 490K, the vast majority of which (484K, or99%) are classied as ordinary users, not elites. To illustratethe difference, we note that whereas the top 20K elite userscollectively account for nearly 50% of attention, the top 10Kmost-followed ordinary users account for only 5%. Moreover,Figure 5c also shows that at least some intermediaries alsoreceive the bulk of their media content indirectly, just likeother ordinary users.

    Comparing Figure 5a and 5c, however, we note that in-termediaries are not like other ordinary users in that theyare exposed to considerably more media than randomly se-lected users (9165 media-originated URLs on average vs.1377), hence the number of intermediaries who rely on two-step ows is smaller than for random users. In addition,we nd that on average intermediaries have more followersthan randomly sampled users (543 followers versus 34) andare also more active (180 tweets on average, versus 7). Fi-nally, Figure 6 shows that although all intermediaries, bydenition, pass along media content to at least one otheruser, a minority satises this function for multiple users,where we note that the most prominent intermediaries aredisproportionately drawn from the 4% of elite users—AshtonKucher (aplusk), for example, acts as an intermediary forover 100,000 users.

    Interestingly, these results are all broadly consistent with

    the original conception of the two-step ow, advanced over50 years ago, which emphasized that opinion leaders were“distributed in all occupational groups, and on every so-cial and economic level,” corresponding to our classicationof most intermediaries as ordinary [9]. The original theoryalso emphasized that opinion leaders, like their followers,also received at least some of their information via two-stepows, but that in general they were more exposed to themedia than their followers—just as we nd here. Finally,the theory predicted that opinion leadership was not a bi-nary attribute, but rather a continuously varying one, cor-

    # of two−step recipients

    # o f o p

    i n i o n

    l e a d e r s

    0 10 10 2 103 104 1050

    10

    102

    103

    104

    105

    Figure 6: Frequency of intermediaries binned by #randomly sampled users to whom they transmit me-dia content

    responding to our nding that intermediaries vary widely inthe number of users for whom they act as lters and trans-mitters of media content. Given the length of time that haselapsed since the theory of the two-step ow was articulated,and the transformational changes that have taken place incommunications technology in the interim—given, in fact,that a service like Twitter was likely unimaginable at thetime—it is remarkable how well the theory agrees with ourobservations.

    5. WHO LISTENS TO WHAT?The results in Section 4 demonstrate the “elite” users ac-

    count for a substantial portion of all of the attention onTwitter, but also show clear differences in how the attentionis allocated to the different elite categories. It is thereforeinteresting to consider what kinds of content is being sharedby these categories. Given the large number of URLs in ourobservation period (260 M ), and the many different ways one

    can classify content (video vs. text, news vs. entertainment,political news vs. sports news, etc.), classifying even a smallfraction of URLs according to content is an onerous task.Bakshy et al. [1], for example, used Amazon’s MechanicalTurk to classify a stratied sample of 1,000 URLs along avariety of dimensions; however, this method does not scalewell to larger sample sizes.

    Instead, we restricted attention to URLs originated by theNew York Times which, with over 2.5M followers, is the mostactive and the second-most-followed news organization onTwitter (after CNN Breaking News). To classify NY Timescontent, we exploited a convenient feature of their format—namely that all NY Times URLs are classied in a consistentway by the section in which they appear (e.g. U.S., World,Sports, Science, Arts, etc.) 7 . Of the 6398 New York Timesbit.ly URLs we observed, 6370 could be successfully unshort-ened and assigned to one of 21 categories. Of these, how-ever, only 9 categories had more than 100 URLs during theobservation period, one of which—“NY region”—was highlyspecic to the New York metropolitan area; thus we focusedour attention on the remaining 8 topical categories. Figure7 shows the proportion of URLs from each New York Timessection retweeted or reintroduced by each category. World7 http://www.nytimes.com/year/month/day/category/title.html?ref=category

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    711

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    8/10

    User Category

    % R

    T s a n

    d R e - i n

    t r o

    d u c t

    i o n s

    0.00

    0.05

    0.10

    0.15

    0.20

    0.25

    0.30

    0.35

    0.00

    0.05

    0.10

    0.15

    0.200.25

    0.30

    0.35

    0.00

    0.05

    0.10

    0.15

    0.20

    0.25

    0.30

    0.35

    0.00

    0.050.10

    0.15

    0.20

    0.25

    0.30

    0.35

    1. World News

    3. Business

    5. Health

    7. Science

    blog celeb media org other

    2. U.S. News

    4. Sports

    6. Technology

    8. Arts

    blog celeb media org other

    Figure 7: Number of RTs and Reintroductions of New York Times stories by content category

    news is the most popular category, followed by U.S. News,Business, and Sports, where increasingly niche categorieslike Health, Arts, Science, and Technology are less popu-lar still. In general, the overall pattern is replicated for allcategories of users, but there are some minor deviations: inparticular, organizations show disproportionately little in-terest in business and arts-related stories, and dispropor-tionately high interest in science, technology, and possiblyworld news. Celebrities, by contrast, show greater interestin sports and less interest in health, while the media showssomewhat greater interest in U.S. news stories.

    5.1 Lifespan of ContentIn addition to different types of content, URLs introduced

    by different types of elite users or ordinary users may exhibitdifferent lifespans, by which we mean the time lag betweenthe rst and last appearance of a given URL on Twitter.

    Naively, measuring lifespan seems a trivial matter; how-ever, a nite observation period—which results in censoringof our data—complicates this task. In other words, a URL

    that is last observed towards the end of the observation pe-riod may be retweeted or reintroduced after the period ends,while correspondingly, a URL that is rst observed towardthe beginning of the observation window may in fact havebeen introduced before the window began. What we observeas the lifespan of a URL, therefore, is in reality a lower boundon the lifespan. Although this limitation does not createmuch of a problem for short-lived URLs—which account forthe vast majority of our observations—it does potentiallycreate large biases for long lived URLs. In particular, URLsthat appear towards the end of our observation period will

    rst observationof URL

    last observationof URL

    est imat ion per iod = 133 days evaluat ion per iod = 90 days

    Total observation window = 223 days

    Figure 8: (a) Denition of URL lifespan τ (b)Schematic of lifespan estimation procedure

    be systematically classied as shorter-lived than URLs thatappear towards the beginning.

    To address the censoring problem, we seek to determinea buffer δ at both the beginning and the end of our 223-day period, and only count URLs as having a lifespan of τ if (a) they do not appear in the rst δ days, (b) they rstappear in the interval between the buffers, and (c) they donot appear in the last δ days, as illustrated in Figure 8(a).To determine δ we rst split the 223 day period into twosegments—the rst 133 day estimation period and the last90 day evaluation period (see Figure 8(b))—and then ask: if we (a) observe a URL rst appear in the rst (133 − δ ) daysand (b) do not see it in the δ days prior to the onset of theevaluation period, how likely are we see it in the last 90 days?Clearly this depends on the actual lifespan of the URL, asthe longer a URL lives, the more likely it will re-appearin the future. Using this estimation/evaluation split, wend an upper-bound on lifespan for which we can determinethe actual lifespan with 95% accuracy as a function of δ .Finally, because we require a beginning and ending buffer,and because we can only classify a URL as having lifespan τ if it appears at least τ days before the end of our window, weneed to pick τ and δ such that τ + 2 δ ≤ 223. We determined

    that τ = 70 and δ = 70 sufficiently satised our constraints;thus for the following analysis, we consider only URLs thathave a lifespan τ ≤ 70 8 .

    5.2 Lifespan By CategoryHaving established a method for estimating URL lifespan,

    we now explore the lifespan of URLs introduced by differentcategories of users, as shown in Figure 9(a). URLs initi-ated by the elite categories exhibit a similar distribution overlifespan to those initiated by ordinary users. As Figure 9(b)shows, however, when looking at the percentage of URLs of different lifespans initiated by each category, we see two ad-ditional results: rst, URLs originated by media actors gen-erate a large portion of short-lived URLs (especially URLswith τ = 0, those that only appeared once); and second,URLs originated by bloggers are overrepresented among thelonger-lived content. Both of these results can be explainedby the type of content that originates from different sources:whereas news stories tend to be replaced by updates on adaily or more frequent basis, the sorts of URLs that arepicked up by bloggers are of more persistent interest, andso are more likely to be retweeted or reintroduced months

    8 We also performed our analysis with different values of τ ,nding very similar results; thus our conclusions are robustwith respect to the details of our estimation procedure.

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    712

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    9/10

    0 10 20 30 40 50 60 70

    lifespan (day)

    # o

    f U R L s

    0

    102

    104

    106

    108 other

    celebmediaorgblog

    (a) Count

    0 10 20 30 40 50 60 700

    1

    2

    3

    4

    5

    6

    7

    lifespan (day)

    %

    o f U R L s

    f r o m

    e l i t e s c a

    t e g o r y

    celebmediaorgblog

    (b) Percent

    Figure 9: 9(a) Count and 9(b) percentage of URLsinitiated by 4 categories, with different lifespans

    or even years after their initial introduction. Twitter, inother words, should be viewed as a subset of a much largermedia ecosystem in which content exists and is repeatedlyrediscovered by Twitter users. Some of this content—suchas daily news stories—has a relatively short period of rel-

    evance, after which a given story is unlikely to be reintro-duced or rebroadcast. At the other extreme, classic musicvideos, movie clips, and long-format magazine articles havelifespans that are effectively unbounded, and can seeminglybe rediscovered by Twitter users indenitely without losingrelevance.

    To shed more light on the nature of long-lived content onTwitter, we used the bit.ly API service to unshorten 35Kof the most long-lived URLs (URLs that lived at least 200days), and mapped them into 21034 web domains. As Figure10 shows, the population of long-lived URLs is dominatedby videos, music, and consumer goods. Two related pointsare illustrated by Figure 11, which shows the average RTrate (the proportion of tweets containing the URL that areretweets of another tweet) of URLs with different lifespans,grouped by the categories that introduced the URL 9 . First,for ordinary users, the majority of appearances of URLs af-ter the initial introduction derives not from retweeting, butrather from reintroduction, where this result is especiallypronounced for long-lived URLs. For the vast majority of URLs on Twitter, in other words, longevity is determinednot by diffusion, but by many different users independently9 Note here that URLs with lifespan = 0 are those URLsthat only appeared once in our dataset, thus the RT rate iszero.

    123greetings.comsmashingmagazine.com

    vimeo.comcollegehumor.comflickr.com

    google.comtwitter.com

    facebook.commyspace.com

    imdb.comecademy.com

    ted.comfeedproxy.google.com

    mashable.combitrebels.com

    en.wikipedia.orgpollpigeon.com

    amazon.comlast.fm

    youtube.com

    count10 10 2 103 104

    Figure 10: Top 20 domains for URLs that lived morethan 200 days

    0 10 20 30 40 50 60 700.0

    0.2

    0.4

    0.6

    0.8

    1.0

    lifespan (day)

    R T r a

    t e =

    # o f R T s

    t o t a l # o f o c c u r r e n c e s other

    celebmediaorgblog

    Figure 11: Average RT rate by lifespan for each of the originating categories

    rediscovering the same content, consistent with our inter-pretation above. Second, however, for URLs introduced byelite users, the result is somewhat the opposite—that is, theyare more likely to be retweeted than reintroduced, even forURLs that persist for weeks. Although it is unsurprisingthat elite users generate more retweets than ordinary users,the size of the difference is nevertheless striking, and sug-gests that in spite of the dominant result above that contentlifespan is determined to a large extent by the type of con-tent, the source of its origin also impacts its persistence, atleast on average—a result that is consistent with previousndings [1].

    6. CONCLUSIONSIn this paper, we investigated a classic problem in me-dia communications research, captured by the rst part of Laswell’s maxim—“who says what to whom”—in the contextof Twitter. In particular, we nd that although audience at-tention has indeed fragmented among a wider pool of contentproducers than classical models of mass media, attention re-mains highly concentrated, where roughly 0.05% of the pop-ulation accounts for almost half of all posted URLs. Withinthis population of elite users, moreover, we nd that atten-tion is highly homophilous, with celebrities following celebri-

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    713

  • 8/19/2019 Wu, Shaomei Hofman, Jake M; Mason, Winter y Watss, Duncan. (2011). “Who Says What to Whom on Twitter”

    10/10

    ties, media following media, and bloggers following bloggers.Second, we nd considerable support for the two-step owof information—almost half the information that originatesfrom the media passes to the masses indirectly via a diffuseintermediate layer of opinion leaders, who although classiedas ordinary users, are more connected and more exposed tothe media than their followers. Third, we nd that althoughall categories devote a roughly similar fraction of their atten-tion to different categories of news (World, U.S., Business,etc), there are some differences—organizations, for exam-ple, devote a surprisingly small fraction of their attentionto business-related news. We also nd that different typesof content exhibit very different lifespans: media-originatedURLs are disproportionately represented among short-livedURLs while those originated by bloggers tend to be over-represented among long-lived URLs. Finally, we nd thatthe longest-lived URLs are dominated by content such asvideos and music, which are continually being rediscoveredby Twitter users and appear to persist indenitely.

    By restricting our attention to URLs shared on Twitter,our conclusions are necessarily limited to one narrow cross-section of the media landscape. An interesting directionfor future work would therefore be to apply similar meth-

    ods to quantifying information ow via more traditionalchannels, such as TV and radio on the one hand, and in-terpersonal interactions on the other hand. Moreover, al-though our approach of dening a limited set of predeter-mined user-categories allowed for relatively convenient anal-ysis and straightforward interpretation, it would be interest-ing to explore automatic classication schemes from whichadditional user categories could emerge. Finally, anothertwo areas for future work are rst, to extract content infor-mation in a more systematic manner—the “what” of Lass-well’s maxim; and second, to focus more on the effects of communication by merging the data regarding informationow on Twitter with other sources of outcome data, such asthe opinions or actions of the recipients of the information.

    7. REFERENCES[1] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J.

    Watts. Identifying ‘inuencers’ on twitter. In Fourth ACM International Conference on Web Seach and Data Mining (WSDM) , Hong Kong, 2011. ACM.

    [2] W. L. Bennett and S. Iyengar. A new era of minimaleffects? the changing foundations of politicalcommunication. Journal of Communication ,58(4):707–731, 2008.

    [3] M. Cha, H. Haddadi, F. Benevenuto, and K. P.Gummad. Measuring user inuence on twitter: Themillion follower fallacy. In 4th Int’l AAAI Conference on Weblogs and Social Media , Washington, DC, 2010.

    [4] J. S. Coleman, E. Katz, and H. Menzel. The diffusionof an innovation among physicians. Sociometry ,20(4):253–270, 1957.

    [5] R. Crane and D. Sornette. Robust dynamic classesrevealed by measuring the response function of asocial system. Proceedings of the National Academy of Sciences , 105(41):15649, 2008.

    [6] T. Gitlin. Media sociology: The dominant paradigm.Theory and Society , 6(2):205–253, 1978.

    [7] M. Gomez Rodriguez, J. Leskovec, and A. Krause.Inferring networks of diffusion and inuence. In

    Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining ,pages 1019–1028. ACM, 2010.

    [8] E. Katz. The two-step ow of communication: Anup-to-date report on an hypothesis. Public Opinion Quarterly , 21(1):61–78, 1957.

    [9] E. Katz and P. F. Lazarsfeld. Personal inuence; the part played by people in the ow of mass communications . Free Press, Glencoe, Ill.

    ” 1955.

    [10] G. Kossinets and D. J. Watts. Empirical analysis of anevolving social network. Science , 311(5757):88–90,2006.

    [11] H. Kwak, C. Lee, H. Park, and S. Moon. What istwitter, a social network or a news media? InProceedings of the 19th international conference on World Wide Web , pages 591–600. ACM, 2010.

    [12] H. D. Lasswell. The structure and function of communication in society. In L. Bryson, editor, The Communication of Ideas , pages 117–130. University of Illinois Press, Urbana, IL, 1948.

    [13] P. F. Lazarsfeld, B. Berelson, and H. Gaudet. The people’s choice; how the voter makes up his mind in a presidential campaign . Columbia University Press,New York, 3rd edition, 1968.

    [14] R. K. Merton. Patterns of inuence: Local andcosmopolitan inuentials. In R. K. Merton, editor,Social theory and social structure , pages 441–474. FreePress, New York, 1968.

    [15] C. Sunstein. Going to extremes: how like minds unite and divide . Oxford University Press, USA, 2009.

    [16] J. B. Walther, C. T. Carr, S. S. W. Choi, D. C.DeAndrea, J. Kim, S. T. Tong, and B. Van Der Heide.Interaction of interpersonal, peer, and media inuencesources online. In Z. Papacharissi, editor, A Networked Self: Identity, Community, and Culture on Social Network Sites , pages 17–38. Routledge, 2010.

    [17] J. Weng, E. P. Lim, J. Jiang, and Q. He. Twitterrank:

    nding topic-sensitive inuential twitterers. InProceedings of the third ACM international conference on Web search and data mining , pages 261–270. ACM,2010.

    WWW 2011 – Session: Diffusion March 28–April 1, 2011, Hyderabad, India

    714