Guest lecture on #bigdata

A guest lecture in the Master elective "The Blind Spot: Tracking Young Media Users" by Susanne Baumgartner

What’s big data? Some examples Problems A glimpse in the kitchen Questions?

#bigdata in Communication Science

Some examples from research by me and my students

Damian Trilling

[email protected]
@damian0604

www.damiantrilling.net

Department of Communication Science, Universiteit van Amsterdam

October 2013

#bigdata Damian Trilling

1 What’s big data?

2 Some examples
   Rare events
   Tone in tweets
   Counting words and n-grams
   Network analysis

3 Problems

4 A glimpse in the kitchen

5 Questions?

What’s big data?

No definition, but . . .

• Existing data
• Too big to code manually
• Sometimes also too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research

Some sources

• Social network sites
• RSS feeds
• Databases
• Scraping text from the web
• . . .

It’s out there! You only have to collect it.

Some examples

Rare events

A recent master thesis

Rare events

Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.

So you’d better collect everything first

Getting all news coverage from Dutch news sites

We collected all articles from nine news sites over a period of two months, resulting in a database of 74,000 articles. In a second step, we selected the articles containing specific keywords. Those 292 articles were then manually coded.

Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
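The second step, keeping only the articles that mention given keywords, could be sketched in Python roughly as follows. This is a minimal sketch, not the script used in the thesis: it assumes the articles are already loaded as dictionaries with a `text` field, and the mini-corpus and keyword list below are hypothetical.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_articles(articles, keywords):
    """Keep only the articles whose 'text' field mentions at least one keyword."""
    pattern = build_pattern(keywords)
    return [a for a in articles if pattern.search(a["text"])]


if __name__ == "__main__":
    # Hypothetical mini-corpus and keyword list, for illustration only.
    articles = [
        {"id": 1, "text": "De minister reageerde op Twitter."},
        {"id": 2, "text": "Geen sociale media in dit artikel."},
        {"id": 3, "text": "Volgens een bericht op Facebook..."},
    ]
    selected = filter_articles(articles, ["twitter", "facebook"])
    print([a["id"] for a in selected])  # → [1, 3]
```

Whole-word matching avoids hits inside unrelated words, but it will also miss Dutch compounds such as "Twitterbericht"; dropping the trailing `\b` is one way to catch those, at the cost of more false positives to weed out during manual coding.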

It’s just one line of code!

urls.txt:

    http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
    http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermann-bittet-um-verzeihung
    http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierung-will-zuruecktreten
    http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klage-gegen-republik
    http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafe-wegen-oelpest
    http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-kein-babybauch-nur-fast-food
    . . .

wget command:

    wget -i urls.txt
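If wget is not available, roughly the same job can be done in a few lines of Python using only the standard library. This is a minimal sketch under stated assumptions: `urls.txt` holds one URL per line, the output folder name `articles` is hypothetical, and, unlike wget, files with the same name simply overwrite each other.

```python
import urllib.request
from pathlib import Path


def parse_urls(lines):
    """Return the non-empty, whitespace-stripped lines of a URL list."""
    return [line.strip() for line in lines if line.strip()]


def filename_for(url):
    """Use the last path segment of the URL (query string dropped) as the file name."""
    name = url.split("?", 1)[0].rsplit("/", 1)[-1]
    return name or "index.html"


def download_all(url_file, out_dir="articles"):
    """Download every URL listed in url_file into out_dir, one file per URL."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(url_file, encoding="utf-8") as f:
        urls = parse_urls(f)
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                (out / filename_for(url)).write_bytes(response.read())
        except (OSError, ValueError) as err:
            # URLError/HTTPError subclass OSError; malformed URLs raise ValueError.
            print(f"failed: {url} ({err})")


if __name__ == "__main__":
    download_all("urls.txt")
```

A failed download is reported and skipped rather than stopping the whole run, which matters when a list contains tens of thousands of URLs.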

Tone in tweets

A recent bachelor thesis

Tone in tweets

Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?

So you’d better think about automating your coding

Finding out how negative or positive politicians are towards their opponents

We took lists of positive and negative words and a list of each politician’s opponents. We used a Python script to check which type of words were used to refer to opponents. For further analysis, the results were imported into SPSS.

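Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter [United States vs. United Kingdom: An automated content analysis of explanatory factors for the use of positive and negative campaigning by prominent politicians and political parties on Twitter]. Bachelor Thesis, Universiteit van Amsterdam.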
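A dictionary-based coding step of this kind might look roughly like the Python sketch below. It is not the thesis script: the word lists and opponent names are hypothetical placeholders, and it uses a deliberately crude rule (count positive and negative words in every tweet that mentions an opponent) in place of whatever linking rule the thesis actually used. The result is written to a CSV file, a format SPSS can import.

```python
import csv
import re

# Hypothetical dictionaries; a real study would use validated word lists.
POSITIVE = {"good", "great", "honest", "strong"}
NEGATIVE = {"bad", "weak", "failed", "dishonest"}
OPPONENTS = {"romney", "ryan"}  # hypothetical opponents of the politician studied


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score_tweet(text, opponents=OPPONENTS, positive=POSITIVE, negative=NEGATIVE):
    """Return (mentions_opponent, n_positive, n_negative) for one tweet.

    Sentiment words are only counted when the tweet mentions an opponent,
    a crude stand-in for 'words used to refer to opponents'.
    """
    tokens = tokenize(text)
    if not any(t in opponents for t in tokens):
        return False, 0, 0
    n_pos = sum(t in positive for t in tokens)
    n_neg = sum(t in negative for t in tokens)
    return True, n_pos, n_neg


def write_scores(tweets, path):
    """Write one row per (tweet_id, text) pair to a CSV file for SPSS."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet_id", "mentions_opponent", "n_positive", "n_negative"])
        for tweet_id, text in tweets:
            mentions, n_pos, n_neg = score_tweet(text)
            writer.writerow([tweet_id, int(mentions), n_pos, n_neg])


if __name__ == "__main__":
    tweets = [
        (1, "Romney's plan is weak and has failed before."),
        (2, "Great rally tonight, thank you Ohio!"),
    ]
    write_scores(tweets, "tone.csv")
```

Tweet-level co-occurrence will attribute a negative word to an opponent even when it refers to something else in the same tweet; restricting the count to a window of a few words around the opponent’s name is a common refinement.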

Counting words and n-grams

How often are specific expressions used?

Counting words and n-grams

Imagine you want to know which words or expressions dominate a discourse. There are plenty of ways to get an answer within minutes; here’s one:

Again, just one or two lines of code!

For example, with Stata

• Install the package wordscores (net install http://www.tcd.ie/Political_Science/wordscores/wordscores)

• For word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt

• For n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
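The same word and trigram counts can be produced in Python with the standard library. This is a minimal sketch, not the Stata package: it assumes plain-text UTF-8 files (the file names lab92.txt and lab97.txt are taken from the Stata example) and uses a simple lower-casing tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(tokens, n):
    """Count all sequences of n consecutive tokens (n=1 gives plain word counts)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_files(paths, n):
    """Sum n-gram counts over several text files."""
    total = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            total.update(ngram_counts(tokenize(f.read()), n))
    return total


if __name__ == "__main__":
    trigrams = count_files(["lab92.txt", "lab97.txt"], 3)
    for gram, count in trigrams.most_common(10):
        print(count, " ".join(gram))
```

Because punctuation is discarded, n-grams here run across sentence boundaries; splitting the text into sentences first and counting within each one avoids that.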

[Figure: trigrams in Obama tweets]

Network analysis

Another approach

Network analysis

Imagine you want to know who talks to whom and how networks are interconnected. Use a tool like NodeXL or Gephi!
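Before a tool like Gephi can draw a network, you need an edge list of who talks to whom. The Python sketch below builds one from @-mentions, assuming each tweet is available as an (author, text) pair; the example tweets and the file name edges.csv are hypothetical. It writes a CSV with Source, Target and Weight columns, a layout that Gephi’s spreadsheet import should be able to read as an edges table.

```python
import csv
import re
from collections import Counter

# @username: a handle made of word characters (letters, digits, underscore).
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed author -> mentioned-user edges from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            edges[(author.lower(), target.lower())] += 1
    return edges


def write_edge_csv(edges, path):
    """Write the edges as a Source,Target,Weight table."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (source, target), weight in sorted(edges.items()):
            writer.writerow([source, target, weight])


if __name__ == "__main__":
    tweets = [
        ("alice", "@Bob did you see this? cc @carol"),
        ("bob", "@alice yes, thanks!"),
        ("alice", "Again @bob, great point"),
    ]
    write_edge_csv(mention_edges(tweets), "edges.csv")
```

Note that this simple pattern also counts retweets ("RT @user") as mentions and will pick up the domain part of e-mail addresses; whether that is acceptable depends on the research question.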

Problems

You sometimes depend entirely on commercial parties

• Services can shut down (Google Reader) or change their API (Twitter)

• It’s rather easy to get up to 3,200 tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live

• Twitter doesn’t give you all tweets, but only about 1% (plus a bunch of other limits)

Not sure if this is a problem or a great opportunity. . . You cannot rely (only) on ready-made software but should get ready to use tools like bash scripts, grep, Python, . . . (Which can be fun!)

A glimpse in the kitchen

What I’m doing right now

Analyzing #tvduell

• 570,000 tweets
• Identifying clusters of nouns, verbs and adjectives
• Assigning positivity and negativity scores to tweets
• Seeing whether they can be interpreted as frames

⇒ How are Merkel and Steinbrück framed on the second screen during the debate?

Something you can use?

1 What’s big data?

2 Some examples
   Rare events
   Tone in tweets
   Counting words and n-grams
   Network analysis

3 Problems

4 A glimpse in the kitchen

5 Questions?

Questions or comments?

Damian Trilling

[email protected]
@damian0604

www.damiantrilling.net
