CULTURAL ANALYSIS OF CRAWLED CAMPAIGNS
USE CASES: US elections - Rock am Ring
Why cultural analysis?
– There are significant differences in the general characteristics of the Twitter network across countries,
– and, more generally, across different social media.
– These differences should be taken into account when designing crawling and preservation strategies.
US Elections 2012 Crawl
• We crawled tweets on the 2012 US elections using target keywords (e.g., obama, romney, republican, …) from 1 to 11 November 2012
• Let’s analyze them from a language perspective
Language distribution exhibits a power law
Language usage follows a power-law distribution
• What does “a power law” mean here?
– In the figure, the large majority of engaged users tweet in a single language, English (far more tweets than in any other language)
– OK, it is not surprising!
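A quick way to sanity-check a power-law claim is to rank the languages by tweet count and see whether log(count) falls roughly on a straight line against log(rank). The sketch below is only a rough diagnostic, not a rigorous statistical test, and the example counts are synthetic (the per-language counts of the crawl are not given on the slides).

```python
import math

def fit_power_law_exponent(counts):
    """Least-squares slope of log(count) against log(rank), largest count first."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic counts following count = 100000 / rank**2 (NOT data from the crawl).
example_counts = [100000 / rank ** 2 for rank in range(1, 11)]
slope = fit_power_law_exponent(example_counts)
```

A slope close to a constant negative value over the whole range is consistent with a power law; strong curvature on the log-log plot suggests another distribution.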
What about user engagement?
• Do people engage in the same way whatever the language?
• How can we measure user engagement?
• There are many ways; a simple one is:
– the average number of tweets per user in a specific language.
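The measure above can be sketched in a few lines. This is a minimal illustration that assumes each crawled tweet has been reduced to a (user id, language code) pair; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def engagement_by_language(tweets):
    """Return {language: average tweets per user} from (user_id, language) pairs."""
    tweet_counts = defaultdict(int)
    users = defaultdict(set)
    for user_id, lang in tweets:
        tweet_counts[lang] += 1      # one more tweet in this language
        users[lang].add(user_id)     # distinct users seen for this language
    return {lang: tweet_counts[lang] / len(users[lang]) for lang in tweet_counts}

# Hypothetical sample: two English users with one tweet each, one Farsi user with three.
sample = [("u1", "en"), ("u2", "en"), ("u3", "fa"), ("u3", "fa"), ("u3", "fa")]
result = engagement_by_language(sample)
```

A user who tweets in two languages is counted once in each language, which is one possible reading of "by user in a specific language".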
User engagement
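[Figure: bar chart of the average number of tweets per user (#tweets/#users, y-axis from 0 to 20) by tweet language code: en, es, de, it, in, ar, pt, fr, pl, ko, nl, ja, tr, sv, tl, und, da, vi, lt, fa, zh, ru, no, is, th, el, hu, id, ur, iw, bg, ne, ta, hy, hi, bn, si, my, ml, km, bo, pa, te, ka, kn, gu, lo]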
Chinese and Iranian users are highly interested in the US elections! 6127 tweets in Farsi by 430 users; 5282 tweets in Chinese by 282 users
Language-based user engagement
Farsi and Mandarin, the US election’s languages
• User engagement is higher in Farsi or in Mandarin than in English
– Fewer people tweet in Farsi or Mandarin, but among people tweeting about the US elections, those tweeting in Farsi or Mandarin post on average more tweets about them than those tweeting in English.
– This reveals important cultural information extracted from social media: the US elections draw a large audience among Farsi (Iran) and Mandarin (China) speakers.
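As a worked check, the figures quoted on the engagement chart can be plugged into the tweets-per-user measure. This reads the second figure as 5282 tweets by 282 users, which is an interpretation of the slide text; no English figure is given on the slides, so none is computed here.

```python
# Counts quoted on the slide for the US elections crawl.
farsi_tweets, farsi_users = 6127, 430
chinese_tweets, chinese_users = 5282, 282  # assumes "282" means 282 users

farsi_engagement = farsi_tweets / farsi_users
chinese_engagement = chinese_tweets / chinese_users

print(round(farsi_engagement, 2))    # 14.25 tweets per user
print(round(chinese_engagement, 2))  # 18.73 tweets per user
```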
Rock am Ring campaign
• Let’s do the same analysis on the Rock am Ring event, again crawling tweets with a set of keywords related to the event
• But this time, let’s analyze countries
• In the majority of cases, the country information can be extracted from the user profiles.
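The slides do not say how the country is extracted. One possible approach, sketched here under the assumption that the profile carries a free-text location string, is a simple alias lookup. The table and function name are hypothetical; a real crawl would need a proper gazetteer or geocoding step.

```python
# Hypothetical alias table, deliberately tiny; not the method used for the crawl.
COUNTRY_ALIASES = {
    "germany": "Germany",
    "deutschland": "Germany",
    "berlin": "Germany",
    "usa": "United States",
    "united states": "United States",
    "brazil": "Brazil",
    "brasil": "Brazil",
}

def country_from_location(location):
    """Map a free-text profile location to a country name, or None if unknown."""
    if not location:
        return None
    # Check comma-separated parts, last part first ("Berlin, Germany" -> "germany").
    parts = [part.strip().lower() for part in location.split(",")]
    for part in reversed(parts):
        if part in COUNTRY_ALIASES:
            return COUNTRY_ALIASES[part]
    return None

print(country_from_location("Berlin, Germany"))  # Germany
```

Locations that cannot be resolved return None, which matches the remark that the country is available only in the majority of cases, not all of them.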
Country distribution
[Figure: bar chart of #tweets and #users per country, y-axis from 0 to 9000. Countries on the x-axis: Germany, United (label truncated), Brazil, Japan, Spain, Indonesia, Chile, Argentina, Mexico, India, France, Venezuela, Canada, Russia, Netherlands, Norway, Colombia, Italy, Turkey, Australia, Belgium, Nigeria, Switzerland, South (label truncated), Poland, Austria, Thailand]
Geo-location activity
What do we observe?
• Germany, the United States and Brazil are the most active countries on Twitter for the event
User engagement?
[Figure: bar chart of #tweets/#users per country, y-axis from 0 to 6. Countries on the x-axis: Germany, United (label truncated), Brazil, Japan, Spain, Indonesia, Chile, Argentina, Mexico, India, France, Venezuela, Canada, Russia, Netherlands, Norway, Colombia, Italy, Turkey, Australia, Belgium, Nigeria, Switzerland, South (label truncated), Poland, Austria, Thailand, Portugal]
Geo-location user engagement
User engagement
• This time, the user engagement analysis seems to tell us that users located in “Thailand” are very engaged with “Rock am Ring”
• Be careful: the sample is too small (40 tweets) to establish any statistical significance for this observation!
• Try to collect more tweets.
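One simple guard against this pitfall is to report engagement only for countries with enough tweets and to list the others separately. The sketch below assumes per-country (tweet count, user count) pairs; the threshold of 100 tweets and all counts except Thailand's 40 tweets are hypothetical.

```python
MIN_TWEETS = 100  # hypothetical threshold, not taken from the slides

def reliable_engagement(stats, min_tweets=MIN_TWEETS):
    """stats maps country -> (tweet_count, user_count).

    Returns (engagement, skipped): tweets per user for countries with enough
    tweets, and the list of countries left out because the sample is too small.
    """
    engagement = {}
    skipped = []
    for country, (tweets, users) in stats.items():
        if tweets < min_tweets or users == 0:
            skipped.append(country)
        else:
            engagement[country] = tweets / users
    return engagement, skipped

# Hypothetical counts, except the 40 tweets for Thailand mentioned above.
stats = {"Germany": (8000, 4000), "Thailand": (40, 8)}
engagement, skipped = reliable_engagement(stats)
```

A fixed cut-off is the crudest option; a confidence interval on the per-country mean would be more informative but needs the per-user tweet counts.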