Examining User Generated Content on “Data Science” through Twitter

  • 8/10/2019 Examining User Generated Content on Data Science through Twitter

    1/39

    1Ana Crisostomostudent n. 10397124

    Examining User Generated Content on Data Science through

    Twitter

    Ana Crisostomo

    Student n. 10397124

    Digital Methods

    Assignment # 5

    Supervisors: Bernhard Rieder / Erik Borra

    07.12.2012

    [email protected]

    Introduction

    Co-founded in 2006 by Jack Dorsey, Twitter is currently one of the main players in the social media sphere at a global level. Even though the company does not publish official statistics on the platform on a regular basis[1], earlier this year it announced having more than 140 million active users and handling 340 million tweets per day[2].

    Classified as a microblogging service due to the 140-character limit on its messages (tweets), it is considered by some as "electronic word of mouth"[3]. This gives the messages a very specific character in terms of structure and language (for example, the use of hashtags, marked with the symbol #, as metadata tags for associating content) and of temporality (a focus on the immediate present), which attracts interest from the business and corporate world, the academic arena and even the artistic field (albeit for distinct purposes).

    The business perspective might focus on user demographics, patterns and frequency of usage, categorization of content and other relevant preferences in order to gather data which, in most cases, is converted into marketing information[4]. Academia might direct its efforts towards contextualizing and explaining the usage of the service, for instance, in event-based situations of social and political relevance such as extensive natural catastrophes[5],

    [1] The most complete set of official numbers on the platform usage in the company's blog dates from 2011: http://blog.twitter.com/2011/03/numbers.html.

    [2] According to numbers released in March 2012 on the occasion of Twitter's 6th birthday: http://blog.twitter.com/2012/03/twitter-turns-six.html.

    [3] See "Twitter power: Tweets as electronic word of mouth" by Jansen et al: http://onlinelibrary.wiley.com/doi/10.1002/asi.21149/full.

    [4] On this topic, consult a recent study of Twitter users by Beevolve, a social media marketing company: http://www.beevolve.com/twitter-statistics/.

    [5] See "Earthquake shakes Twitter users: real-time event detection by social sensors" by Sakaki et al: http://dl.acm.org/citation.cfm?id=1772777 and "Tools and Methods for Capturing Twitter Data during Natural Disasters" by Bruns and Liang.


    elections[6], official political events[7], and social uprisings[8]. Finally, the art world can use Twitter data to explore moods and feelings[9], to portray unconventional visualizations[10] or simply to stage playfulness[11].

    The present academic research will study one specific topic (considered to hold current relevance) exclusively through communication carried out via Twitter over a limited period of time, in order to try to reveal significant connections between the content, the users (often labeled as "twitterati") and the platform usage.

    Questions

    As briefly introduced, Twitter data can be used for diverse purposes in several areas. This specific study aims at: 1) exploring to what extent a social media microblogging service such as Twitter plays a significant role in the communication of certain ideas around one specific topic, and 2) unveiling the user dynamics which develop around the same topic.

    Since Twitter is a platform heavily focused on the present and privileging quick and short messages, this type of study lends itself best to events and topics of current relevance which demand a certain urgency in the act of communication with the purpose, for example, of announcing, promoting or denouncing. (Communication which could be labeled as conversations of a private nature between a limited number of users is excluded from this perspective, since it falls outside the scope of the current study.)

    The topic selected for the investigation is "data science", which has increasingly gained media significance (and also significance beyond the media realm) through its association with themes such as

    [6] See "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment" by Tumasjan et al: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1441/1852.

    [7] See "Twitter, YouTube, and Flickr as platforms of alternative journalism: The social media account of the 2010 Toronto G20 protests" by Poell and Borra.

    [8] See "For the ppl of Iran - #iranelection RT" by the Digital Methods Initiative: https://movies.issuecrawler.net/for_the_ppl_of_iran.html.

    [9] As examples see, respectively, the Emoto interactive installation: http://blog.emoto2012.org/ and the Twistori application: http://twistori.com.

    [10] See the Tweetures visualization case: http://whatspop.com/entry/tweetures.

    [11] On this matter, read about the Tasty Tweets project: http://www.kfrantzis.com/Tasty-Tweets.


    information visualization and big data (the latter certainly being one of 2012's business buzzwords).

    The research questions for this study can then be stated as follows: 1) What are the most common themes associated with data science on Twitter? 2) Are there specific sources of information or actors which play an important role within this topical area? 3) Are there visible patterns regarding the nature of the content associated with this topic?

    The answer to these questions will hopefully shed some light on the treatment of the subject through this particular microblogging platform.

    Method

    The dataset which enabled this research was available through the Twitter Analytics tool (https://tools.digitalmethods.net/coword/twitter/analysis/index.php), which started capturing tweets on 23/11/2012 (15:54 GMT) that included at least one of the following hashtags: #datascience, #bigdata, #dataviz and #datavis.

    In order to work with a stable but still significant dataset, a period of one full week (7 days) was selected (from 27/11/2012 to 03/12/2012), which amounted to 21,356 tweets.
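    The selection criteria described above (tracked hashtags plus a one-week window) can be sketched in a few lines of Python. This is only an illustration and not the Twitter Analytics tool's own code: the tweet structure (a dict with the keys text and created_at) and the three sample tweets are hypothetical, and time zones are ignored.

```python
from datetime import date, datetime

# Hashtags and date window as stated in the Method section.
TRACKED_HASHTAGS = {"#datascience", "#bigdata", "#dataviz", "#datavis"}
START, END = date(2012, 11, 27), date(2012, 12, 3)

def in_sample(tweet):
    """Return True if a tweet falls inside the week and carries a tracked hashtag.

    `tweet` is a hypothetical dict with a 'text' string and a 'created_at' datetime.
    """
    day = tweet["created_at"].date()
    # Lower-case each token and trim surrounding punctuation before matching.
    tokens = {w.lower().strip(".,;:!?()") for w in tweet["text"].split()}
    return START <= day <= END and bool(tokens & TRACKED_HASHTAGS)

# Made-up tweets, for illustration only.
tweets = [
    {"text": "Why #BigData matters", "created_at": datetime(2012, 11, 28, 9, 30)},
    {"text": "Nice chart! #dataviz", "created_at": datetime(2012, 11, 24, 18, 0)},
    {"text": "Lunch time", "created_at": datetime(2012, 12, 1, 12, 0)},
]
sample = [t for t in tweets if in_sample(t)]
print(len(sample))
```

    Only the first sample tweet passes both conditions: the second one predates the selected week and the third carries none of the tracked hashtags.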

    Regarding the hashtags, using one of the several web services which allow the visualization of associated hashtags (in this case http://hashtagify.me/), it is possible to note that #datascience does indeed have a strong connection with #bigdata (see Network Visualization 1 in Appendix 1). However, the two other hashtags (which basically refer to one topic written in two different manners) do not seem to be regularly associated with that hashtag according to this tool (and between both, dataviz emerges as more common than datavis; see Network Visualizations 2 and 3 in Appendix 1). The theme "big data" appears to be the connector between the topics "data science" and "data visualization" (see Network Visualization 2 in Appendix 1). According to statistics provided by the same tool, the most popular hashtag from the set is #bigdata (index of 56.4; see Network Visualization 4 in Appendix 1) and the least popular #datascience (index of 25.3; see Network Visualization 1 in Appendix 1). It will be of interest to ascertain whether the findings of this study corroborate this particular topical correlation and potentially suggest other topics for future research.


    not tweeting about the topic during what is considered to be the standard leisure time (see Graph 1; data beyond this week also confirms this particular pattern).

    Graph 1 - Temporal distribution of Tweets, Users and Locations (including at least one of the hashtags #datascience, #bigdata, #dataviz, #datavis) from 27/11/2012 to 03/12/2012

    Source: https://tools.digitalmethods.net/coword/twitter/analysis/index.php

    In terms of the most common hashtags (provided by the "Hashtag Frequency" report), there is one which clearly outnumbers all the others: #bigdata (see Chart 1; all extracted data is available in Appendix 5).

    Chart 1 - Hashtag Frequency (Top 15 hashtags) [bar chart; vertical axis from 0 to 18,000]
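    A hashtag frequency count of the kind shown in Chart 1 can be approximated as follows. This is a minimal sketch and not the actual "Hashtag Frequency" report: the regular expression, the decision to count every occurrence case-insensitively, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")

def hashtag_frequency(texts):
    """Count how often each hashtag occurs across all tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts

# Made-up tweet texts, for illustration only.
texts = [
    "#BigData meets the #cloud",
    "New #dataviz on #bigdata",
    "#bigdata and #analytics",
]
freq = hashtag_frequency(texts)
top_15 = freq.most_common(15)  # the ranking a "Top 15" chart would plot
print(top_15[0])
```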


    Considering all the hashtags which are mentioned at least 5 times in the dataset and comparing their relative weight, two hashtags alone (#bigdata and #dataviz) account for more than 50% of all hashtag occurrences in the tweets (in other words, roughly one out of every two hashtags used is either #bigdata or #dataviz; see Chart 2).

    If one excludes the top 2 hashtags (which were also included in the initial set), then it is possible to see that 3 other topics gain relevance in this thematic arena: "cloud", "analytics" and "cloud computing" (see Chart 3).

    Chart 2 - Weight of #bigdata and #dataviz hashtags compared to all other hashtags (mentioned at least 5 times in the dataset): bigdata 17,753; dataviz 1,867; others 18,062
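    The share reported for the two leading hashtags can be checked with simple arithmetic on the Chart 2 values. The sketch below assumes that the three numbers map to the legend entries in the order in which they appear (bigdata, dataviz, others); under that assumption the two hashtags account for roughly 52% of the hashtag occurrences considered.

```python
# Chart 2 values, assuming they follow the legend order (bigdata, dataviz, others).
counts = {"bigdata": 17753, "dataviz": 1867, "others": 18062}

total = sum(counts.values())
top_two = counts["bigdata"] + counts["dataviz"]
top_two_share = top_two / total

print(f"{top_two} of {total} hashtag occurrences ({top_two_share:.1%})")
```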

    Chart 3 - Hashtag Frequency (Top 15 hashtags excluding #bigdata and #dataviz) [bar chart; vertical axis from 0 to 1,600]


    Moving on to the potentially most influential actors in this topical arena, the top 15 most mentioned users were collected (data provided by the "User Mention Frequency" report; see Chart 4).

    Within this top 15, there were mainly corporate accounts (including media, business and non-profit organizations). Besides those, it was possible to find 2 accounts related to events (cloudexpo and bigdataexpo; even if under different titles, they refer to the same event), 2 accounts belonging to individual users (albertocairo and benkerschberg) and one for a book (thefaceofbigdata).

    To examine further the level of influence of these top users, the Klout score (see Chart 5 in Appendix 2) was added, as well as the total number of followers (see Chart 7 in Appendix 2) and the total number of tweets (see Chart 6 in Appendix 2) as an indicator of activity.

    Taking into account that Klout considers 40 to be an average score[15], it is possible to state that almost all of the most mentioned users can be considered influential in the social media universe within that particular scale. Two accounts score above 90 (which is considered to be very high): Forbes and The World Bank. Complementing this metric with the number of followers, The Wall Street Journal is, by far, the account with the most followers, followed by Forbes and Harvard Business. In terms of activity, The Wall Street Journal leads the ranking once more, followed by CloudExpo and Forbes. From these

    [15] According to Klout: "The average Klout Score is 40. Your Score is determined over a large period of time, and is not necessarily representative of your number of followers and friends. Also, the Score is a reflection of influence, not activity" (http://klout.com/#/corp/faq).

    Chart 4 - Mention Frequency (Top 15 on Number of Mentions) [bar chart; vertical axis from 0 to 250]


    numbers, we can conclude that content from The Wall Street Journal and Forbes has the

    potential to reach more users more effectively.

    One interesting aspect to add to this analysis is the fact that the number of user mentions does not appear to correlate with the number of tweets sent by the respective user within the specified period of time (see Chart 9). For instance, the accounts for Forbes, Harvard Business and The Wall Street Journal sent hardly any tweets on the topic (or none at all) during that week (according to the results provided by the TA tool) and yet they are among the most mentioned users, which indicates that the content being distributed most likely originates from a source other than Twitter, such as the official website of that institution.
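    The relationship between mentions and tweets sent can also be quantified with a correlation coefficient. The sketch below computes a Pearson coefficient by hand using the two data series recoverable from Chart 9. It rests on an important assumption: that both series list the users in the same order, which cannot be verified from the chart alone, so the result should be read as indicative only.

```python
from math import sqrt

# Series as they appear in Chart 9; pairing by position is an assumption.
mentions = [218, 195, 174, 147, 147, 111, 109, 108, 90, 90, 84, 80, 76, 75, 73]
tweets_sent = [38, 129, 1, 0, 46, 27, 8, 28, 2, 0, 37, 2, 42, 14, 31]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

r = pearson(mentions, tweets_sent)
print(f"Pearson r = {r:.2f}")
```

    A coefficient close to 0 would support the observation of no correlation, while values closer to 1 or -1 would point to a linear relationship; with only 15 accounts, any value should be interpreted with caution.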

    Within this list of most mentioned users, it is also of interest to understand the relative attention paid to this topic in comparison with all other topics those users tweet about. Using the Tweet Topic Explorer (see the 15 diagrams in Appendix 3), it is possible to verify that the most influential accounts (The Wall Street Journal, Forbes, Harvard Business, The World Bank) do not give prominence to these topics in their tweets (in some cases, the hashtags researched do not even appear in these diagrams) as they are rather generalist. The hashtags only seem to gain relevance in accounts which are specialized in this subject, such as bigdatablogs, bigdataexpo/cloudexpo, faceofbigdata, ibmbigdata, informaticacorp and the two individual user accounts.

    Chart 9 - Mention frequency vs. number of tweets sent for the top 15 most mentioned users

    Mention frequency: 218, 195, 174, 147, 147, 111, 109, 108, 90, 90, 84, 80, 76, 75, 73

    # Tweets 27/11 - 03/12: 38, 129, 1, 0, 46, 27, 8, 28, 2, 0, 37, 2, 42, 14, 31


    Regarding specific topical patterns, the Gephi rendition of the Co-hashtag Analysis (see Network Visualization 5 in Appendix 4) confirms that #bigdata holds a very central and unifying position in this thematic area, while #datascience, #dataviz and #datavis occupy a more peripheral space. Other predominant topics concern cloud computing (#cloud, #cloudcomputing) and analytics. It is also possible to find some companies with a prominent position in this network, such as IBM, LinkedIn and Microsoft.
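    The idea behind a co-hashtag analysis can be illustrated with a short sketch: two hashtags are linked whenever they appear in the same tweet, and the link weight is the number of tweets in which they co-occur. This is a simplified assumption about the technique, not the tool's actual implementation, and the sample texts are invented; the resulting weighted edge list is the kind of data a network tool such as Gephi would lay out.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")

def cohashtag_edges(texts):
    """Count, for every pair of hashtags, the tweets in which both appear."""
    edges = Counter()
    for text in texts:
        # De-duplicate and sort so each pair has one canonical orientation.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        edges.update(combinations(tags, 2))
    return edges

# Made-up tweet texts, for illustration only.
texts = [
    "#BigData in the #cloud",
    "#cloud and #bigdata again",
    "#dataviz with #bigdata",
]
edges = cohashtag_edges(texts)
for (tag_a, tag_b), weight in edges.most_common():
    print(tag_a, tag_b, weight)
```

    In this toy example #bigdata is the only hashtag connected to every other one, which is the kind of central, unifying position described above.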

    The type of content shared on this theme appears to be mostly informational, since more than 80% of the tweets include a link (see Chart 10). This value is significantly above the average communicated by Twitter in September 2010, according to which only 25% of tweets contained a link[16].

    Chart 10 - % tweets containing links
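    The proportion of tweets containing a link can be estimated with a simple pattern match. The sketch below is an assumption about how such a figure could be derived, not a description of the tool used in this study; the URL pattern and the sample texts are illustrative.

```python
import re

# Assumed URL pattern: anything starting with http:// or https://.
URL = re.compile(r"https?://\S+")

def link_share(texts):
    """Return the fraction of tweet texts that contain at least one URL."""
    if not texts:
        return 0.0
    with_link = sum(1 for text in texts if URL.search(text))
    return with_link / len(texts)

# Made-up tweet texts, for illustration only.
texts = [
    "Great read on #bigdata http://example.com/a",
    "Register for the #dataviz workshop https://example.org/b",
    "#datascience is everywhere",
    "Slides from the talk: http://example.net/c #bigdata",
]
print(f"{link_share(texts):.0%} of tweets contain a link")
```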

    Finally, one closing note regarding content analysis. It could be of interest to classify a sample of tweets according to their content but, considering that the vast majority of messages are essentially informational in nature (sharing an article on the topic, promoting a training event or other educational material), it is very challenging to establish significant categories within this area. The study could benefit from a report on the URLs shared in order to further investigate the influential actors and relevant content regarding this theme, but the functionality providing these reports was not completed in time for the current research.

    Discussion

    Twitter as a microblogging platform enables what some authors label as electronic word of

    [16] According to data from this report: http://techcrunch.com/2010/09/14/twitter-event/.


    mouth: the content of the tweets is, by default, in the public domain (unless the user explicitly sets his or her tweets to "protected"[17]) and can theoretically reach a significant number of people or, at least, all the web-savvy individuals who are proactively interested in the subject matter.

    Some initial studies stated that most of the content shared via this service held minimal "pass-along value"[18], but this view has changed in the last two years with the rise of the platform as a crucial agent in critical situations of a political and social nature[19] and in unpredictable natural catastrophes[20]. This association has propelled several ambitious projects catering specifically for those situations[21].

    In such cases, the source of the content is originally an individual user and the content of the message is directly related to that user's own experienced context. However, how do the content and the service usage change outside these extreme situations?

    The topic of this research held current relevance (especially in the corporate and academic fields) but it was not of an urgent nature. In this scenario, the most influential agents are institutions and not individual users. This fact does not imply that individual users do not play an important part in the dissemination of information through the platform (they actually do), but the content being shared is not originally produced by them but by renowned institutions which had achieved a high level of reputation and credibility prior to the existence of the microblogging platform itself. Individual users can still earn a position of

    [17] As stated by Twitter: http://support.twitter.com/articles/14016-about-public-and-protected-tweets. On this matter it may be of interest to read an article on a legal decision related to issues of privacy and ownership of tweet content: http://www.salon.com/2012/04/26/who_owns_your_tweets/.

    [18] See the results from a study conducted by an internet marketing company (Pear Analytics) in 2009: http://www.pearanalytics.com/blog/2009/twitter-study-reveals-interesting-results-40-percent-pointless-babble/ and http://www.pearanalytics.com/blog/2009/twitter-study-continuing-the-conversation/.

    [19] The Arab Spring and related events being the most cited examples in this case: http://www.foreignpolicy.com/articles/2011/06/20/the_revolution_will_be_tweeted.

    [20] One example is the 2011 magnitude 9.0 earthquake in Japan, where the service played a role in disseminating news on specific locations. However, some authors also state that a more efficient usage of the tool can be promoted in future situations: http://www.sciencedaily.com/releases/2011/04/110415154734.htm.

    [21] On this matter see "The Global Twitter Heartbeat" project by SGI in partnership with the University of Illinois: https://www.facebook.com/sgiglobal/app_164226463720371.


    relevance by producing content which is considered to be worth sharing, but the effort to achieve this has to be proportionally higher (in comparison to that of the institutions).

    Within this thematic area, the individual user performs an important activity in promoting (while not necessarily producing) content, but these actions tend to reflect and perpetuate power relations which exist outside (and, to a certain extent one could claim, independently of) the platform itself.

    Additionally, some authors are rather skeptical of Twitter-based research since a significant number of these studies do not acknowledge the limitations of the data: Twitter does not represent the global population, the number of accounts does not equal the number of users (some users have multiple accounts and some accounts are used by multiple users) and many users are just bots[22].

    Considering the argument regarding the Web 2.0 industry that the more one uses and contributes to these platforms, the more valuable they become[23], it could be of academic interest to examine more critically and closely the type of contributions being made on certain topics of special interest to the business and academic areas, and to try to ascertain whether the content being shared holds primarily a promotional or an educational character, who the main actors are on a scale of influence and how representative their contributions are.

    [22] On this matter read "Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon" by Boyd and Crawford.

    [23] According to Shirky as cited by Rogers in "Post-Demographic Machines - Studying Social Networking Sites" in Walled Garden (page 35).


    Literature

    Offline references

    Boyd, Danah. Crawford, Kate. "Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon." Information, Communication, & Society 15. 5 (2012): 662-679.

    Poell, Thomas. Borra, Erik. "Twitter, YouTube, and Flickr as Platforms of Alternative Journalism: The Social Media Account of the 2010 Toronto G20 Protests." Journalism 13. 6 (2012): 695-713.

    Rogers, Richard. "Post-demographic Machines." Walled Garden. Eds. Annet Dekker and Annette Wolfsberger. Amsterdam: Virtueel Platform, 2009. 29-39.

    Online references

    Anand, Kunal. "Tweetures." 2011. 2 December 2012. http://whatspop.com/entry/tweetures

    Beevolve. "An Exhaustive Study of Twitter Users Across the World." 10 October 2012. Beevolve Technology Services Pvt. Ltd. 4 December 2012. http://www.beevolve.com/twitter-statistics/#e1

    Bruns, Axel. Liang, Yuxian Eugene. "Tools and Methods for Capturing Twitter Data during Natural Disasters." First Monday 17. 4 (2 April 2012): 670-685. 4 December 2012. http://www.firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3937/3193

    Digital Methods Initiative. "For the ppl of Iran - #iranelection RT." 2009. Digital Methods Initiative, Summer School, Universiteit Van Amsterdam (UvA). 3 December 2012. https://movies.issuecrawler.net/for_the_ppl_of_iran.html

    Hemment, Drew. Stefaner, Moritz. "Emoto - Visualising the Emotional Response to LONDON 2012." 2012. Studio NAND - A FutureEverything project with MIT SENSEable City Lab. 3 December 2012. http://blog.emoto2012.org/

    Hoy, Amy. Fuchs, Thomas. "Twistori." 2008. 2 December 2012. http://twistori.com/

    Inderscience Publishers. "Twitter and Natural Disasters: Crisis Communication Lessons from the Japan Tsunami." ScienceDaily. 15 April 2011. 4 December 2012. http://www.sciencedaily.com/releases/2011/04/110415154734.htm

    Jansen et al. "Twitter power: Tweets as electronic word of mouth." Journal of the American Society for Information Science and Technology 60. 11 (November 2009): 2169-2188. 4 December 2012. http://onlinelibrary.wiley.com/doi/10.1002/asi.21149/full

    Kelly, Ryan. "Twitter Study Reveals Interesting Results About Usage - 40% is Pointless Babble." Pear Analytics Blog. 12 August 2009. Pear Analytics. 3 December 2012. http://www.pearanalytics.com/blog/2009/twitter-study-reveals-interesting-results-40-percent-pointless-babble/

    Kelly, Ryan. "Twitter Study Part 2 - Continuing the Conversation." Pear Analytics Blog. 24 August 2009. Pear Analytics. 3 December 2012. http://www.pearanalytics.com/blog/2009/twitter-study-continuing-the-conversation/

    Klout. "Frequently Asked Questions - What's the average Klout Score?" Klout, Inc. 3 December 2012. http://klout.com/#/corp/faq

    Klout. "Klout Score." Klout, Inc. 3 December 2012. http://klout.com/#/corp/klout_score

    Lennard, Natasha. Who owns your tweets? Salon. 26 April 2012. Salon Media Group Inc. 4 December 2012. <http://www.salon.com/2012/04/26/who_owns_your_tweets/>.

    Sakaki et al. Earthquake shakes Twitter users: real-time event detection by social sensors. WWW '10 Proceedings of the 19th international conference on World Wide Web. 2010: 851-860. 4 December 2012. <http://dl.acm.org/citation.cfm?id=1772777>.

    SGI. The Global Twitter Heartbeat. Silicon Graphics International Corp. 2 December 2012. <https://www.facebook.com/sgiglobal/app_164226463720371>.

    Siegler, MG. Twitter Hatches The New Twitter.com A New Two-Pane Experience (Live). TechCrunch. 14 September 2010. AOL Inc. 4 December 2012. <http://techcrunch.com/2010/09/14/twitter-event/>.

    Tumasjan et al. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media. 2010: 178-185. 4 December 2012. <http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1441/1852>.

    Twitter Blog. #numbers. 14 March 2011. Twitter Inc. 3 December 2012. <http://blog.twitter.com/2011/03/numbers.html>.

    Twitter Blog. Twitter turns 6. 21 March 2012. Twitter Inc. 3 December 2012. <http://blog.twitter.com/2012/03/twitter-turns-six.html>.

    Twitter Help Center. About Public and Protected Tweets. Twitter Inc. 3 December 2012. <http://support.twitter.com/articles/14016-about-public-and-protected-tweets>.

    Zorina, Kat. van der Vleuten, Ruben. Frantzis, Kostantinos. Tasty Tweets. 2012. 2 December 2012. <http://www.kfrantzis.com/Tasty-Tweets>.

    Appendix 1

    Network Visualization 1 - Hashtag network for #datascience

    Source: http://hashtagify.me

    Network Visualization 2 - Hashtag network for #dataviz

    Source: http://hashtagify.me

    Network Visualization 3 - Hashtag network for #datavis

    Source: http://hashtagify.me

    Network Visualization 4 - Hashtag network for #bigdata

    Source: http://hashtagify.me

    Appendix 2

    Chart 5 - Klout score of top 15 most mentioned users (vertical axis: 0 to 100)

    Chart 6 - Total number of tweets of top 15 most mentioned users (vertical axis: 0 to 45,000)


    Chart 7 - Total number of followers of top 15 most mentioned users (vertical axis: 0 to 2,500,000)

    Chart 8 - Total number of following of top 15 most mentioned users (vertical axis: 0 to 9,000)


    Appendix 3

    Diagram 1 - Word Cluster for @cloudexpo

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 2 - Word Cluster for @bigdatablogs

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 3 - Word Cluster for @forbes

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 4 - Word Cluster for @harvardbiz

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 5 - Word Cluster for @ibmbigdata

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 6 - Word Cluster for @albertocairo

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 7 - Word Cluster for @benkerschberg

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 8 - Word Cluster for @bigdataexpo

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 9 - Word Cluster for @informaticacorp

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 10 - Word Cluster for @wsj

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 11 - Word Cluster for @ventanaresearch

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 12 - Word Cluster for @worldbank

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 13 - Word Cluster for @faceofbigdata

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 14 - Word Cluster for @sqlserver

    Source: http://tweettopicexplorer.neoformix.com/

    Diagram 15 - Word Cluster for @iabuk

    Source: http://tweettopicexplorer.neoformix.com/

    Appendix 4

    Network Visualization 5 - Co-hashtag Analysis as rendered by Gephi
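
A co-hashtag network links two hashtags whenever they occur in the same tweet and weights each link by the number of tweets in which the pair co-occurs. The following is a minimal sketch of that counting step, not the implementation used by the Twitter Analytics tool: the function name `cohashtag_edges`, the `min_count` threshold and the three sample tweets are hypothetical and serve only as illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, simplified hashtag pattern: "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def cohashtag_edges(tweets, min_count=1):
    """Count how many tweets contain each pair of hashtags."""
    edges = Counter()
    for text in tweets:
        # Lower-case and de-duplicate so "#BigData" and "#bigdata" are one node.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        # Every unordered pair of hashtags in one tweet is one co-occurrence.
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    # Keep only pairs that co-occur at least min_count times.
    return {pair: n for pair, n in edges.items() if n >= min_count}


# Made-up sample tweets, for illustration only.
sample = [
    "Hiring for #DataScience and #bigdata roles",
    "New #dataviz gallery for #bigdata fans",
    "#bigdata meets #datascience at the meetup",
]

edges = cohashtag_edges(sample)
for (a, b), n in sorted(edges.items()):
    print(f"{a},{b},{n}")
```

Each printed line is a weighted edge (first hashtag, second hashtag, number of shared tweets), which is the general shape of edge list that a graph tool such as Gephi can load for rendering.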


    Appendix 5

    Files with extracted data from the Twitter Analytics tool:

    datascience__2012-11-27_2012-12-03__hashtag_min5 (this file contains 4 worksheets): https://docs.google.com/spreadsheet/ccc?key=0AijGdqUTikIqdGM3Ny1WaUlXX2hYQXZPQWkzVG94Y2c

    datascience__2012-11-27_2012-12-03__mention_min5 (this file contains 4 worksheets): https://docs.google.com/spreadsheet/ccc?key=0AijGdqUTikIqdEl2QnhubVFZVUc3MnRlWjFabGZnYmc