TWITTER SENTIMENT ANALYSIS
HARSHIT SANGHVI
DATA COLLECTION AND PREPROCESSING
• 27 million tweets (180GB)
• Collected over about one week (05/05/2015 to 05/09/2015)
• Using a Java program running on Amazon EC2
• Stored in MongoDB on Amazon EC2
• Clean the tweet text: remove punctuation, numbers, short words, and stop words
• Filter out tweets that are not in English or that have no location data
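The cleaning and filtering steps above can be sketched in Python. This is a minimal illustration and not the program used for the project: the stop-word list, the minimum word length of 3, and the `lang` and `location` field names are all assumptions.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "and", "for", "with", "this", "that", "are", "was"}
# Assumption: "short words" means words with fewer than 3 letters.
MIN_WORD_LENGTH = 3


def clean_tweet(text):
    """Lowercase the text, then drop punctuation, numbers, short words, and stop words."""
    text = text.lower()
    # Replace everything that is not a letter or whitespace (punctuation, digits) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    words = text.split()
    return [w for w in words if len(w) >= MIN_WORD_LENGTH and w not in STOP_WORDS]


def keep_tweet(tweet):
    """Keep only English tweets that carry location data (hypothetical field names)."""
    return tweet.get("lang") == "en" and tweet.get("location") is not None
```

For example, `clean_tweet("Happy Mother's Day 2015 to the best mom!!!")` keeps only the words happy, mother, day, best, and mom.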
SENTIMENT ANALYSIS
• Create a sentiment prediction model using the Opinion Lexicon (http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar) and the Movie Review Dataset (http://ai.stanford.edu/~amaas/data/sentiment)
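To show how a lexicon can drive a sentiment label, here is a minimal Python sketch of lexicon-based scoring. It is a stand-in for illustration only, not the KNIME model built for this project. The two word sets are tiny hypothetical samples, and the "positive minus negative count" rule is an assumed scoring scheme.

```python
# Assumption: tiny stand-in word sets; the real Opinion Lexicon has thousands of entries.
POSITIVE_WORDS = {"happy", "love", "great", "good"}
NEGATIVE_WORDS = {"sad", "hate", "bad", "terrible"}


def sentiment_score(words):
    """Return (number of positive words) minus (number of negative words)."""
    positive = sum(1 for w in words if w in POSITIVE_WORDS)
    negative = sum(1 for w in words if w in NEGATIVE_WORDS)
    return positive - negative


def sentiment_label(words):
    """Map the score to a label; a score of zero is treated as neutral."""
    score = sentiment_score(words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The input is a list of already-cleaned words, so a tweet reduced to happy, mother, and day would be labeled positive under these sample word sets.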
USING KNIME
VISUALIZATIONS
TWEETS PER DAY PER HOUR
TOP 10 MOST USED HASHTAGS
• Shows the most commonly discussed topics on Twitter
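A ranking like this can be produced by counting hashtags across all tweets. The sketch below is one possible approach in Python, not the code behind the original chart. The function name `top_hashtags` is hypothetical, and treating hashtags as case-insensitive is an assumption.

```python
import re
from collections import Counter


def top_hashtags(tweets, n=10):
    """Count hashtags across raw tweet texts and return the n most common."""
    counts = Counter()
    for text in tweets:
        # Assumption: hashtags are compared case-insensitively.
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts.most_common(n)
```

The result is a list of (hashtag, count) pairs, most frequent first.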
TOP 5 MOST POPULAR USERS
WORD FREQUENCY
• Shows words with frequency > 500, sorted alphabetically
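The frequency cut-off and alphabetical ordering described above could be computed as follows. This is a minimal Python sketch under the assumption that tweets have already been split into cleaned word lists; `frequent_words` is a hypothetical helper name.

```python
from collections import Counter


def frequent_words(tokenized_tweets, min_count=500):
    """Return (word, count) pairs with count > min_count, sorted alphabetically by word."""
    counts = Counter(word for words in tokenized_tweets for word in words)
    # Each word appears once in the counter, so sorting the pairs sorts by word.
    return sorted((word, count) for word, count in counts.items() if count > min_count)
```

The threshold defaults to 500 to match the slide and can be lowered for small samples.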
WORD ASSOCIATIONS
• E.g., “Day” appears more often with “Mother”, “Happy”, and “Birthday”.
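One simple way to find such associations is to count which words share a tweet with a target word. The Python sketch below uses plain co-occurrence counts, which is an assumption: the original analysis may have used a different measure, such as term correlation. The name `associated_words` is hypothetical.

```python
from collections import Counter


def associated_words(tokenized_tweets, target, n=3):
    """Return the n words that most often appear in the same tweet as target."""
    counts = Counter()
    for words in tokenized_tweets:
        unique = set(words)  # count each word at most once per tweet
        if target in unique:
            counts.update(unique - {target})
    return counts.most_common(n)
```

Called with the target word day, it returns the words that most often accompany it, each paired with the number of tweets they share.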
LETTER FREQUENCY
# OF WORDS BY LETTER FREQUENCY
LETTER POSITION HEATMAP
SENTIMENT TIMELINE
PRESENTATION USING SHINY
WORD CLOUD
NEGATIVE TWEETS
POSITIVE TWEETS
REFERENCES
• Opinion Lexicon (http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar)
• Movie Review Dataset (http://ai.stanford.edu/~amaas/data/sentiment)
• Twitter Data Mining & Visualizations (http://bit.ly/twtvis)
• RStudio (https://www.rstudio.com)
• Sentiment Analysis using KNIME (http://www.knime.org/blog/sentiment-analysis)