Guess the Country - Playing with Twitter Streaming API

Using the Twitter statuses sample API to build a name⇔country database

Guess the Country
Playing with Twitter Streaming API

Chris Birchall
#m3dev Tech Talk 2014/7/11

It started with an idle tweet...

https://twitter.com/cbirchall/status/466197512143912961

Let’s use Twitter for something (slightly) useful!

The plan:

● Collect geo-tagged tweets from the Twitter Streaming API
● Use them to build a name⇔country DB
● Build a simple search UI as a proof of concept
● (crowbar Spark in there somewhere because it’s cool)

Implementation

https://github.com/cb372/guess-the-country

[Architecture diagram: Twitter Streaming API → Twitter4j (EC2) → .log → Fluentd → S3 → Spark (EC2) → Postgres (RDS) → Rails (Heroku)]

Collecting tweets

● Ran the collector for 13 days
● Collected 285,340 geo-tagged tweets
● 205,798 distinct users
● Only collected names and countries, threw everything else away
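
The "names and countries only" step could look roughly like the sketch below. It is not the collector from the talk (that one uses Twitter4j); it assumes one tweet per line in the Streaming API v1.1 JSON layout, where the display name is in `user.name` and the country in `place.country`. The `NameCountry` type and the `extract_name_country` function are hypothetical names made up for this illustration.

```python
import json
from typing import NamedTuple, Optional


class NameCountry(NamedTuple):
    """The only fields kept from a tweet (hypothetical record layout)."""
    user_id: int
    name: str
    country: str


def extract_name_country(raw_line: str) -> Optional[NameCountry]:
    """Return (user id, display name, country) for a geo-tagged tweet, else None."""
    try:
        message = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # blank keep-alive lines and truncated lines are skipped
    if not isinstance(message, dict):
        return None
    place = message.get("place")
    user = message.get("user")
    if not place or not user:
        return None  # delete notices and tweets without a place carry no country
    user_id = user.get("id")
    name = (user.get("name") or "").strip()
    country = place.get("country")
    if user_id is None or not name or not country:
        return None
    # Everything else in the tweet (text, coordinates, ...) is thrown away here.
    return NameCountry(user_id, name, country)
```

Usage, with a made-up tweet:

```python
sample = '{"user": {"id": 7, "name": "Ada Example"}, "place": {"country": "Canada"}}'
print(extract_name_country(sample))
```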

Processing

● Used Spark to filter out duplicate users
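
The talk does this step in Spark. As a plain-Python stand-in (not Spark code), the same idea is "keep one record per user id". The `(user_id, name, country)` tuple layout and the choice to keep the first record seen are assumptions; the slides do not say which record wins when a user tweets from two countries.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (user_id, name, country)
Record = Tuple[int, str, str]


def distinct_users(records: Iterable[Record]) -> Iterator[Record]:
    """Yield the first record seen for each user id and drop later ones."""
    seen = set()
    for record in records:
        user_id = record[0]
        if user_id in seen:
            continue  # this user has already contributed a record
        seen.add(user_id)
        yield record
```

Keeping a set of ids in memory is fine at this scale (about 200,000 users); a distributed job only becomes necessary for much larger inputs.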

Stats

Distinct countries = 204
Distinct first names = 40,689
Distinct last names = 81,674

Top 10 countries by user count

           country           | percentage
-----------------------------+------------
 United States               |       39.4
 United Kingdom              |       10.1
 Indonesia                   |        8.9
 Brasil                      |        8.1
 Türkiye                     |        3.9
 España                      |        2.4
 México                      |        2.2
 Republic of the Philippines |        2.0
 Canada                      |        1.8
 Malaysia                    |        1.8

Most popular first names

 first_name
------------
 chris
 alex
 david
 michael
 sarah

Most popular surnames

 second_name
-------------
 smith
 jones
 garcia
 williams
 johnson
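
Stats like these can be computed with a couple of counters. The sketch below is an illustration only: the slides do not say how display names were split, so `split_name` assumes "first word is the first name, last word is the surname, everything lowercased", and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def split_name(display_name: str) -> Optional[Tuple[str, str]]:
    """Split a display name into lowercase (first, last); None if under two words."""
    parts = display_name.lower().split()
    if len(parts) < 2:
        return None
    return parts[0], parts[-1]


def country_percentages(countries: Iterable[str], top: int = 10) -> List[Tuple[str, float]]:
    """Return the `top` countries with their share of users, in percent (one decimal)."""
    counts = Counter(countries)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(country, round(100.0 * n / total, 1))
            for country, n in counts.most_common(top)]
```

The most popular first names and surnames follow the same pattern: feed the first (or last) element of every `split_name` result into a `Counter` and call `most_common(5)`.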

Results

It works surprisingly well!

(well, it worked for my name, anyway)

Note for the pedantic: Since the original data is geo-tagged tweets, strictly speaking we only know where a user is, not where they come from.
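
For a feel of what the guess involves, here is a minimal sketch of a name → country lookup. It is an assumption about how such a lookup could work, not the logic of the demo app: it ranks countries by how often a name was seen there. `build_index` and `guess_countries` are hypothetical names, and the data in the usage example is made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Map each lowercase name to a Counter of the countries it was seen in."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for name, country in rows:
        index[name.lower()][country] += 1
    return index


def guess_countries(index: Dict[str, Counter], name: str, top: int = 3) -> List[Tuple[str, float]]:
    """Return up to `top` (country, share of sightings) pairs, most likely first."""
    counts = index.get(name.lower())
    if not counts:
        return []  # name never seen in the data
    total = sum(counts.values())
    return [(country, n / total) for country, n in counts.most_common(top)]
```

Usage, with toy rows:

```python
index = build_index([("Taro", "Japan"), ("taro", "Japan"), ("Taro", "Brasil")])
print(guess_countries(index, "TARO"))
```

As the note above says, the counts reflect where users were tweeting from, so the "guess" is really the country where a name was most often seen.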

Try for yourself

Demo: http://guess-the-country.herokuapp.com/

Code: https://github.com/cb372/guess-the-country