Twitter Big Data Logging

Jonathan Durda and Shashank Kumar Kalakuntla

Under the guidance of Dr. Sunnie Chung

Cleveland State University, Fall 2014 CIS 612

How to get data from Twitter?

• Neat, structured data can be bought from providers such as Gnip

• Problem? Big $$$!

• Twitter allows access to real-time tweets through OAuth

• Create an app on Twitter, which provides a unique consumer key and access token
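In this project Flume takes care of authentication, but a short sketch shows what the app's credentials are actually used for. Assuming Twitter's OAuth 1.0a scheme with HMAC-SHA1 (RFC 5849), every request carries a signature computed from the request parameters, the consumer secret, and the access token secret. The helper names, the placeholder credentials, and the endpoint URL below are illustrative assumptions, not part of the original setup.

```python
import base64
import hashlib
import hmac
from typing import Mapping
from urllib.parse import quote


def pct(value: str) -> str:
    """Percent-encode per RFC 3986: only A-Z, a-z, 0-9, '-', '.', '_', '~' stay bare."""
    return quote(value, safe="~")


def signature_base_string(method: str, url: str, params: Mapping[str, str]) -> str:
    """Build the OAuth 1.0a signature base string (RFC 5849, section 3.4.1).

    `url` is the base URL without a query string; `params` holds the request
    parameters plus the oauth_* parameters, but never oauth_signature itself.
    """
    # Encode every key and value first, then sort by encoded key (then value).
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def hmac_sha1_signature(base_string: str, consumer_secret: str, token_secret: str) -> str:
    """Sign the base string with HMAC-SHA1 and return the base64-encoded digest."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


if __name__ == "__main__":
    # Hypothetical placeholder values; real ones come from the Twitter app page.
    params = {
        "oauth_consumer_key": "CONSUMER_KEY",
        "oauth_token": "ACCESS_TOKEN",
        "oauth_nonce": "abc123",
        "oauth_timestamp": "1414800000",
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_version": "1.0",
        "track": "big data",
    }
    base = signature_base_string(
        "POST", "https://stream.twitter.com/1.1/statuses/filter.json", params
    )
    print(hmac_sha1_signature(base, "CONSUMER_SECRET", "ACCESS_TOKEN_SECRET"))
```

The consumer secret and token secret never travel over the wire; only the resulting signature does, which is why a missing or wrong access token makes every request fail.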

Setting up data stream

• Use Apache Flume to get a stream of tweets

• Authenticate with the app's consumer key and access token

• Store the tweets in JSON format in HDFS

• Issues: the config file was not pointing to the correct HDFS location, and the access token was not entered
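Both issues are easy to catch before starting the agent. The sketch below reads a Flume agent properties file and reports empty credentials or an HDFS path that is not a full `hdfs://` URI. It assumes Flume's Twitter source (credential properties `consumerKey`, `consumerSecret`, `accessToken`, `accessTokenSecret`) and an HDFS sink (`hdfs.path`); the agent, source, and sink names (`TwitterAgent`, `Twitter`, `HDFS`), the helper names, and the rule that the path must name the NameNode explicitly are our own illustrative assumptions.

```python
from typing import Dict, List

# Credential properties expected on the Twitter source (assumed property names).
REQUIRED_CREDENTIALS = ("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_config_problems(text: str, agent: str, source: str, sink: str) -> List[str]:
    """Report missing credentials and a non-URI HDFS path (hypothetical check)."""
    props = parse_properties(text)
    problems = []
    for name in REQUIRED_CREDENTIALS:
        if not props.get(f"{agent}.sources.{source}.{name}"):
            problems.append(f"{name} is missing or empty")
    path = props.get(f"{agent}.sinks.{sink}.hdfs.path", "")
    if not path.startswith("hdfs://"):
        problems.append(f"hdfs.path should be a full hdfs:// URI, got {path!r}")
    return problems


if __name__ == "__main__":
    config = """
    TwitterAgent.sources.Twitter.consumerKey = abc
    TwitterAgent.sources.Twitter.consumerSecret = def
    TwitterAgent.sources.Twitter.accessToken =
    TwitterAgent.sources.Twitter.accessTokenSecret = ghi
    TwitterAgent.sinks.HDFS.hdfs.path = /user/flume/tweets
    """
    for problem in find_config_problems(config, "TwitterAgent", "Twitter", "HDFS"):
        print(problem)
```

Running a check like this before `flume-ng agent` starts turns a silent empty HDFS directory into an explicit error message.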

Data in HDFS

What to use to analyze the data

• Use Hive to analyze our raw data

• Why Hive?

  • Readability: its commands are familiar to anyone who knows SQL

  • Persistence: Hive tables point to data in HDFS, so the tables still exist after quitting and restarting Hive

  • Maintenance: Hive is very easy to maintain

Run Analysis on Data

• Now that our data is imported into a Hive table, we can run queries to analyze it

• How many tweets have we downloaded to work with? Let's find out!
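In Hive this question is a single `SELECT COUNT(*)` over the tweets table. As a rough cross-check outside Hive, the sketch below counts records in a local copy of the Flume output (for example, fetched with `hdfs dfs -get`). It assumes one JSON object per line, which is how an HDFS sink writing plain text typically separates events, and it skips blank or truncated lines; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def count_tweets(lines: Iterable[str]) -> int:
    """Count lines that parse as a JSON object (one tweet per line is assumed)."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank separator line
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated record, e.g. from a file Flume is still writing
        if isinstance(record, dict):
            count += 1
    return count


def count_tweets_in_dir(directory: str) -> int:
    """Sum count_tweets over every regular file in a local directory."""
    total = 0
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                total += count_tweets(handle)
    return total


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_tweets.py ./tweets
    print(count_tweets_in_dir(sys.argv[1]))
```

The two numbers can differ if Hive's JSON SerDe handles malformed lines differently from this script, so a mismatch is a hint to inspect the raw files rather than proof of lost data.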

Conclusion

• Questions?

• Thank you for listening!
