DESCRIPTION
http://www.zd8a.com Slide Deck Focus: sentiment analysis through Facebook and Twitter, leveraging Hadoop, MongoDB, Mahout, Greenplum, and Solr. This slide deck was a product of developing a sentiment and text analytics engine. We leveraged Facebook Connect, the Twitter Firehose, and web scraping to gather text and store it in both MongoDB and Hadoop. Once it was stored, we used Mahout and Solr for text search and analytics to determine trends within the data. Although our dataset was not large enough to need it, we used Greenplum as a test MPP database to tie all three of those technologies into one dashboard using Pentaho.
Advanced Political Analysis through “Big Data”
Elections 2012
Z DATA’S AGILE ANALYSIS – THE “BIG DATA STACK”
• How do we leverage the “Big Data” stack?
– Technology
  • Don’t back your problem into available technologies; leave your toolset open.
  • Organically grow new skillsets, hire the right individuals
– Development
  • Be agile in your approach
  • Comparative analysis using both new mathematical methods and open source technologies
– Embrace the shift into a data driven world
  • Empower your Engineering and Science team to be creative
  • Let the data lead your direction
  • Use new data types previously unavailable to drive insights
“Associating structured and unstructured data at relevant points is where the most value is gained and where the highest level of challenge is presented.” – Ryan Abo, PhD – Z Data Inc.
ANALYZING THE POLITICAL LANDSCAPE
Phase 1
• Location based Google Search and Twitter mentions
• Word pair mentions
Phase 2
• Facebook and Twitter Sentiment and Geospatial Analysis
Structured Data
• Standard Data Warehouse – finance, sales
• GeoSpatial – locations, places
• Technologies – Greenplum, Netezza, Teradata
Unstructured Data
• Textual Objects – Social Media, Blogs, forums
• Bitmap Objects – images, video, audio
• Technologies – Hadoop, Cassandra, Solr, NoSQL
UNSTRUCTURED AND STRUCTURED DATA
COMPLEMENTING YOUR TECHNOLOGIES
Identifying Unstructured Data Sources
- User Likes and Favorites
- Article/Video/Link Shares
- Views
- Comments
- Location / Geospatial
Tweet Characteristics
- Length
- Language Model
- Semantics
- Emoticons
- Location / Geospatial
Google / YouTube
- Blogs
- Comments
- Search Statistics
- Likes vs Dislikes
- Shares / Views / Comments
Objective: Identify and leverage social media outlets to better predict the overall sentiment across political candidates.
Search Engine Data
• Number of Searches for a candidate or political party
• Word pair / combination analysis
Why should we care?
• Determine the most successful candidate online
• Effectiveness of campaigns and conversion to online competitive content
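A minimal sketch of word pair analysis, assuming each search query or mention arrives as a plain string. The sample queries, the stopword list, and the function name `word_pairs` are illustrative assumptions, not data or code from the project.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = frozenset({"the", "a", "is", "on", "of", "and", "to"})

def word_pairs(texts, stopwords=STOPWORDS):
    """Count unordered word pairs that co-occur within the same text."""
    counts = Counter()
    for text in texts:
        # Deduplicate and sort tokens so each pair has one canonical order.
        tokens = sorted(set(re.findall(r"[a-z']+", text.lower())) - stopwords)
        counts.update(combinations(tokens, 2))
    return counts

# Made-up search queries, for illustration only.
queries = [
    "romney economy plan",
    "romney health care",
    "obama economy speech",
    "romney economy jobs",
]
pairs = word_pairs(queries)
print(pairs.most_common(1))  # [(('economy', 'romney'), 2)]
```

Counting unordered pairs per text, rather than only adjacent words, captures which topics are searched together with a candidate regardless of word order.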
SEARCH, MENTION AND WORD PAIR ANALYSIS
What is this sentiment they speak of?
• Unstructured Text Data
• Using computational linguistics to accurately determine the attitude of a writer with respect to a topic.
Why should we care?
• Use “Opinion Mining” to predict political bias
ADVANCED SENTIMENT ANALYSIS
[Architecture diagram. Labels: Zdata Unstructured Cluster; Customer Data; Relational and Unstructured; Analytics / BI; ELT]
Z DATA ADVANCED ANALYTICS SOLUTION
Agile Analysis - Mathematical Methods
Prediction and Machine Learning
- Unigram and Bigram Features
- Bayesian Probability
- Maximum Entropy
- Distant Supervision
- Support Vector Machines
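The unigram and bigram features listed above can be sketched as follows. This is an illustrative tokenizer, not the project's feature extractor; the regular expression and the name `ngram_features` are assumptions.

```python
import re

def ngram_features(text):
    """Return unigram and bigram features for a short text such as a tweet."""
    # Keep a leading '#' or '@' so hashtags and mentions stay distinct tokens.
    tokens = re.findall(r"[#@]?\w+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

features = ngram_features("#Romney speech was horrible")
print(features)
# ['#romney', 'speech', 'was', 'horrible',
#  '#romney speech', 'speech was', 'was horrible']
```

Bigrams keep some local context ("was horrible") that single words lose; any of the listed classifiers could consume these features.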
POLITICAL OPINION MINING
#obama #Kardashian #iran #bieber #biglove #romney #palin #healthcare #stimulus #nexttopmodel #bigdata #teaparty
#obama #iran #romney #palin #stimulus #teaparty
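The two hashtag lists above suggest a first-pass filter that keeps only politically relevant tags. A minimal sketch, assuming the second list is the political set; the function names and sample tweets are hypothetical.

```python
import re

# Political tags taken from the filtered list on the slide.
POLITICAL_TAGS = frozenset(
    {"#obama", "#iran", "#romney", "#palin", "#stimulus", "#teaparty"}
)

def hashtags(text):
    """Extract lowercase hashtags from a text."""
    return {tag.lower() for tag in re.findall(r"#\w+", text)}

def is_political(text, political_tags=POLITICAL_TAGS):
    """True if the text carries at least one known political hashtag."""
    return bool(hashtags(text) & political_tags)

print(is_political("Watching the #Romney rally"))     # True
print(is_political("New #bieber single is out"))      # False
```

A fixed tag list is cheap but brittle, which is one reason to follow it with a trained classifier.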
Unstructured Analysis - Naïve Bayes classifier
Political Classification
Unstructured + Structured
Political Relevance
Erica – Wow I love cookies in the morning, check out my new batch
Daria – #Romney speech was horrible that guy knows nothing
Daria – #Romney speech was horrible that guy knows nothing
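A minimal sketch of a Naïve Bayes relevance classifier that separates the two example tweets above. It is a from-scratch multinomial model with add-one smoothing, not the Mahout implementation used in the project; the training sentences, labels, and class name are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase tokens; hashtags and mentions keep their prefix."""
    return re.findall(r"[#@]?\w+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data, for illustration only.
train_texts = [
    "#obama speech on the economy tonight",
    "#romney wins the debate on health care",
    "the #teaparty rally opposes the stimulus",
    "I love cookies in the morning",
    "check out my new batch of brownies",
    "great morning for a run",
]
train_labels = ["political"] * 3 + ["other"] * 3
model = NaiveBayes().fit(train_texts, train_labels)

print(model.predict("#Romney speech was horrible that guy knows nothing"))
# political
print(model.predict("Wow I love cookies in the morning, check out my new batch"))
# other
```

Only tweets classified as political would move on to sentiment scoring.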
NAÏVE BAYES
[Bar chart: accuracy, precision, and recall on a 0% to 100% axis]
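The three measures on the chart can be computed from true and predicted labels as sketched below. The label values and the sample predictions are made up for illustration; they are not the results behind the chart.

```python
def binary_metrics(y_true, y_pred, positive="political"):
    """Accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 3 political and 5 other tweets.
y_true = ["political"] * 3 + ["other"] * 5
y_pred = ["political", "political", "other",
          "political", "other", "other", "other", "other"]
metrics = binary_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.75
```

Here precision and recall both equal 2/3, since there are two true positives, one false positive, and one false negative.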
ELECTIONS 2012 DASHBOARD
[Dashboard screenshot: Orange County (January 2011 – May 2011)]
- Sentiment panels: Positive, Neutral, Negative, each broken down by Education, Economy, Foreign Policy, Health Care
- Candidates: Romney, Paul, Huntsman, Gingrich, Santorum (axis 0 to 10)
- Series: Sentiment, Actuals
- Filter by: Mitt Romney, Republican Primary
- Other labels: Democratic Vote, Republican Vote, Democratic Sentiment, Republican Sentiment
SOCIAL SOLUTIONS WITH BIG DATA
ENOUGH TALK…