Sentweet
Twitter Sentiment Analysis Tool
Business intelligence course, A.A. 2015/16
EGIDI SARA
Motivation
Sentiment analysis
• Classification of the polarity of a given text in a document, sentence or phrase
• Goal: determine whether the expressed opinion is positive or negative
Twitter
• Microblogging tool; short sentences are less ambiguous
• Variable audience
• Stock market
• Product opinions
• Political elections
Roadmap
Twitter corpus (2) → Preprocessing → Tokenizer → Feature extraction → Classify
User input → Retrieve tweets → Preprocess → Classify
The corpus
Two datasets:
• STS Stanford Twitter corpus
• Hand-labelled, different subjects
• 40000 labelled balanced tweets
• Tweets from 2010
• Auto-generated using smileys as labels
• Twitter request rate limits
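Labelling tweets automatically from their smileys can be sketched as follows. This is a minimal illustration, not the tool's actual code: the class name `EmoticonLabeler` and both emoticon lists are assumptions, since the slides do not say which smileys were used. Tweets with no smiley, or with smileys of both polarities, are skipped.

```java
import java.util.List;
import java.util.Optional;

final class EmoticonLabeler {
    // Hypothetical emoticon lists; the slides do not specify them.
    private static final List<String> POSITIVE = List.of(":)", ":-)", ":D", "=)");
    private static final List<String> NEGATIVE = List.of(":(", ":-(", ":'(");

    /** Returns "positive" or "negative", or empty when the tweet gives no usable signal. */
    static Optional<String> label(String tweet) {
        boolean pos = POSITIVE.stream().anyMatch(tweet::contains);
        boolean neg = NEGATIVE.stream().anyMatch(tweet::contains);
        if (pos == neg) {
            // Either no smiley at all or contradictory smileys: do not label.
            return Optional.empty();
        }
        return Optional.of(pos ? "positive" : "negative");
    }

    public static void main(String[] args) {
        System.out.println(label("Great talk today :)"));
        System.out.println(label("Missed my train :("));
        System.out.println(label("Just a plain tweet"));
    }
}
```

Skipping ambiguous tweets keeps the auto-generated labels cleaner at the cost of a smaller corpus.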
Preprocessing
• Remove RTs
• English tweets only
• Remove URLs, mentions, numbers
• Replace repeated characters
• Replace emoticons by their polarity (auto-generated database)
Have you heard about TEDx speech? So great! by @yulia Sooo in #Milan
https://www.ted.com/talks/insightful_human_portraits_made_from_data
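The preprocessing steps can be approximated with a few regular expressions. The sketch below is an assumption-laden illustration rather than the tool's implementation: the class name `TweetCleaner`, the placeholder tokens `POSITIVE` and `NEGATIVE`, the two-entry emoticon mapping, and the choice to shrink runs of three or more identical characters to two are all hypothetical. Language filtering is left out.

```java
final class TweetCleaner {
    /** Returns null for retweets (which are dropped), otherwise the cleaned text. */
    static String clean(String tweet) {
        if (tweet.startsWith("RT ")) {
            return null; // retweets are removed from the corpus
        }
        String t = tweet;
        // URLs go first so that "://" is never mistaken for an emoticon.
        t = t.replaceAll("https?://\\S+", " ");
        t = t.replaceAll("@\\w+", " "); // mentions
        // Hypothetical stand-in for the emoticon polarity database.
        t = t.replace(":)", " POSITIVE ").replace(":(", " NEGATIVE ");
        t = t.replaceAll("\\d+", " "); // numbers
        // Assumed rule: three or more repeats of a character become two.
        t = t.replaceAll("(.)\\1{2,}", "$1$1");
        return t.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("Sooo good @bob 123 http://t.co/x :)"));
    }
}
```

Collapsing repeats to two characters rather than one keeps words such as "good" intact while still mapping "soooo" and "sooooooo" to the same token.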
Filters
Feature extractor: Weka's StringToWordVector
• Stemmer
• Stoplist
• TF-IDF
• Tokenizer
Attribute selection: InfoGain and Ranker
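To make the term weighting concrete, here is a small standalone TF-IDF sketch. It uses the textbook form, raw term frequency times ln(N / document frequency), which is not necessarily the exact variant Weka applies. The class name `TfIdf` and the method `weigh` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class TfIdf {
    /** For each document, maps every term to tf(term, doc) * ln(N / df(term)). */
    static List<Map<String, Double>> weigh(List<List<String>> docs) {
        // Document frequency: in how many documents does each term occur?
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<Map<String, Double>> out = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> weights = new HashMap<>();
            for (String term : doc) {
                weights.merge(term, 1.0, Double::sum); // raw term frequency
            }
            weights.replaceAll((term, tf) -> tf * Math.log((double) n / df.get(term)));
            out.add(weights);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("good", "movie"), List.of("bad", "movie"));
        System.out.println(weigh(docs));
    }
}
```

A term that appears in every document, such as "movie" in the usage example, gets weight zero, which is why the weighting pushes down words that do not discriminate between tweets.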
Classifiers
FilteredClassifier (filters are built on the training set only, then applied unchanged to test data)
• Support Vector Machine
• Naïve Bayes
• Naïve Bayes Multinomial
• J48 decision tree
• Naïve Bayes Multinomial Text (Weka 3.8 only): no attribute selection needed
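As an illustration of how a multinomial Naïve Bayes classifier scores a tweet, the sketch below implements the standard algorithm with add-one (Laplace) smoothing in plain Java. It is not Weka's implementation; the class name `MultinomialNB` and its methods are hypothetical, and the smoothing choice is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MultinomialNB {
    private final Map<String, Integer> docCount = new HashMap<>();               // documents per class
    private final Map<String, Map<String, Integer>> wordCount = new HashMap<>(); // class -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();             // tokens per class
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    void train(List<String> tokens, String label) {
        totalDocs++;
        docCount.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCount.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocab.add(t);
        }
    }

    /** Returns the class with the highest log-posterior, or null if nothing was trained. */
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCount.keySet()) {
            // Log prior plus the sum of smoothed log likelihoods.
            double score = Math.log((double) docCount.get(label) / totalDocs);
            Map<String, Integer> counts = wordCount.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String t : tokens) {
                int c = counts.getOrDefault(t, 0);
                score += Math.log((c + 1.0) / (total + vocab.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MultinomialNB nb = new MultinomialNB();
        nb.train(List.of("great", "talk"), "positive");
        nb.train(List.of("awful", "day"), "negative");
        System.out.println(nb.classify(List.of("great")));
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps an unseen word from forcing a class probability to zero.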
Results
Implementation
• Twitter4J
• Twitter API
• JavaFX
Thanks for your attention