#FAIL!THINGS THAT DIDN‘T WORK OUT IN SOCIAL
MEDIA RESEARCH- AND WHAT WE CAN LEARN FROM THEM
Workshop at Internet Research 16, Phoenix, October 21st, 2015.
• Workshop hashtag: #fail2015b• Conference hashtag: #ir16• Workshop website:
http://failworkshops.wordpress.com• Etherpad: https://pad.okfn.org/p/fail
WELCOME
Luca Rossi@LR
Karine Nahon@karineb
Katrin Weller@kwelle
ABOUT #FAIL! WORKSHOPS
• Traveling on to different conferences. First workshop was at WebSci15 (June 2015)
• Aim: collect various examples for things that can go wrong and share them with different communities
learn from experiencesConnect different research communities
WHAT WE‘VE LEARNED SO FAR
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 20130
100
200
300
400
500
600
TwitterFacebookYouTubeBlogsWikisFoursquareLinkedInMySpace
Number of publications per year, which mention the respective social media platform‘s name in their title. Scopus Title Search. For details: http://kwelle.wordpress.com/2014/04/07/bibliometric-analysis-of-social-media-research/
SOCIAL MEDIA RESEARCH
2008-2013 papers on Twitter and elections: data sources
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
6
Data source numberNo information 11Collected manually from Twitter website (Copy-Paste / Screenshot)
6
Twitter API (no further information) 8Twitter Search API 3Twitter Streaming API 1Twitter Rest API 1Twitter API user timeline 1Own program for accessing Twitter APIs 4Twitter Gardenhose 1Official Reseller (Gnip, DataSift) 3YourTwapperKeeper 3Other tools (e.g. Topsy) 6Received from colleagues 1
SOCIAL MEDIA RESEARCH
What we discussed at the first workshop…
CURRENT PROBLEMS
Challenge 1: users
• How to involve social media users in the research process?
• Presentation by Elodie Crespel: “Extending data collection with web browser extension” – Participants may be creative in their use of
technology – flexibility is needed.
Challenge 2: methods
• Data analytics: which approaches should be chosen?
• Taha Yasseri: “The double-edged sword of statistical significance” – Questions the p-value as a standard for data
analytics.– “too much of attention and reliance on specific
measures or methods without being aware of the logic behind them, can be misleading”
Challenge 3: tools
• Many researchers use third party tools for data collection or analysis – which may not always work as expected.
• Presentation by Michael Bossetta and Anamaria Dutceac Segesten: “Tracing Eurosceptic Party Networks via Hyperlink Network Analysis and #FAIL!ng: Can Web Crawlers Keep up with Web Design?” – Exemplary case: issuecrawler.
Challenge 4: content
• Content analysis is heavily effected by the dynamic nature of social media.
• Presentation by Marie Van Cranenbroeck: “Managing and Using Unstable Data in a Social Science Research about Museums and Audiences on Social Media” – Data collection and storage challenges
Specific details and additions
• Researchers and users may have different ideas about the definition of social media / social networks
• Lack of evaluation standards• Availability of data (also: not enough data)• Data may be corrupt (e.g. missing data)• Social media as a moving target (Karpf, D. (2012).
Social science research methods in Internet time. Information, Communication & Society 15(5):639-661. )
Meta discussion
• Social media research can have various forms. Different disciplines involved.
• Best practices and pitfalls in social media research are mainly discussed informally. Few possibilities to share unsuccessful approaches.
WHAT WE‘D LIKE TO LEARN TODAY
• Towards a categorization of challenges for social media research: what can go wrong?
• Collection of more experiences• Structuring them into different categories
WHAT WE‘D LIKE TO LEARN TODAY
Today: - 4 presentations- Think about your own experiences!
- … in connection to each presentation - … in general
9:00 Introduction: “What we’ve learnt from the first workshop and what we’d like to learn today”
9:15 Shawn Walker: “Complexity of collecting social media data in ephemeral contexts”
9:40 Cornelius Puschmann: “Why LIWC sucks (or: saner options for social media content analysis)”
10:05 Break10:20 Luca Rossi: “The fourth deadly sin of social media researchers
(or: scientific research and unstable socio-technical platforms)”10:45 Marco Toledo Bastos, “Individual Behavior from Aggregate
Social Media Data“11:10 Discussion & Conclusions12:00 End
PROGRAM
• Other experiences? Share your thoughts!• Main categories of #fail cases?• Top 3 take away messages for next workshop?
DISCUSSION
WHERE TO GO FROM HERE?
• Next steps – lessons learnt for future workshop organisation
• Which additional conferences?• Publication? Guidebook?
• Archiving: – URLs may vanish (Question: linear rate of decay?) – Images missing– Platforms changing (moving target!) – not just about the interface!
• Visualization of results– Word cloud (compare histograms)
• Tools – sentiment140, Internet Archive, GNIP
• Methods– Content Analysis:
• replicability? Validation?• Context for social media contents (e.g. surrounding tweets).• LIWC, General Inquirer
– Predictions – „Data Science“
• Lack of theory
• Data Quality: – Can we still cite/use data and research published in 2007/2008`?– Baseline? (how to define for a moving target)
• Theory:– Can we only do descriptive work for single platforms?– Look for the theory instead for the data?
• Meta– Systematic review of existing literature is needed
• Documentation– Timeframe generalizaion– Document time, cultures?– How long will my results be valid?– Have a general base for comparison