IntroductionFramework
Sentiment analysisCase studiesConclusions
A Descriptive Analysis of Twitter Activity Around Boston Terror Attacks
Álvaro Cuesta, David F. Barrero, María D. R-Moreno
Computer Engineering Department, Universidad de Alcalá, Spain
ICCCI 2013, Craiova, Romania, September 11, 2013
ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 1 / 25

Summary
1 Introduction: Motivation, Objectives, Case studies
2 Framework: Framework overview, Framework messaging, Framework components
3 Sentiment analysis: Overview, Classifier
4 Case studies: Boston Terror Attack, Political analysis
5 Conclusions and future work

Introduction
Motivation
- Social networks have expanded greatly in recent years
- One of the most successful ones is Twitter
  - Microblogging platform
  - Short messages known as tweets
  - Open nature
- Twitter offers great research opportunities
  - Open nature
  - Distributed human sensor network
  - Easy data extraction, difficult data processing
- Twitter + sentiment analysis
  - Lack of tools for sentiment analysis in Spanish

Introduction
Objectives
- Twitter offers an excellent API; however, some infrastructure is still needed (mainly storage and reporting)
- Objectives
  1. Develop a framework for Twitter data extraction and analysis
  2. Provide reporting tools
  3. Lay the foundation for sentiment analysis in Spanish

Introduction
Case studies
- To assess the framework, we include two case studies
  - Event driven: Boston terror attack
  - Regular usage: political activity on Twitter in Spanish

Framework architecture
Overview
- Requirements
  - Easy to use, extensible, massive data processing
- Design decisions
  - Modular design: collection of independent scripts
  - Focus on open data formats
  - Built around the database: MongoDB
- Set of independent scripts interchanging data

Framework architecture
Framework messaging

Framework architecture
Framework components: Miner
- Miner
  - Extracts and stores tweets
  - Stream API
  - Several filters
  - Written in Python
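
As a rough illustration of the miner's role (filter incoming tweets, then store them), the sketch below applies a string filter to an iterable of tweet dictionaries and hands each match to a storage callback. The names `matches_filter` and `mine`, the `text` field, the substring matching rule and the in-memory list are all assumptions made for illustration; the actual miner reads from the Twitter Stream API and writes to MongoDB, and neither is reproduced here.

```python
from typing import Callable, Iterable


def matches_filter(tweet: dict, phrase: str) -> bool:
    """Return True if the tweet text contains the phrase, ignoring case."""
    return phrase.casefold() in tweet.get("text", "").casefold()


def mine(stream: Iterable[dict], phrase: str, store: Callable[[dict], None]) -> int:
    """Pass every matching tweet to `store` and return how many were stored."""
    stored = 0
    for tweet in stream:
        if matches_filter(tweet, phrase):
            store(tweet)
            stored += 1
    return stored


# Hypothetical sample data standing in for the live stream.
sample_stream = [
    {"id": 1, "text": "Explosiones en la Maratón de Boston"},
    {"id": 2, "text": "Buenos días a todos"},
    {"id": 3, "text": "Última hora: maratón de boston"},
]
collected = []  # stands in for a database collection
stored_count = mine(sample_stream, "Maratón de Boston", collected.append)
```

Passing the storage step as a callback keeps the filtering logic independent of the database, in line with the modular design described earlier.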

Framework architecture
Framework components: Database
- Database
  - Storage for further processing
  - MongoDB
  - NoSQL database
  - High performance

Framework architecture
Framework components: Reporting
- Reporting
  - CSV export for further processing
  - R processing
  - Extensibility
  - Powerful libraries
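
The CSV export step could look roughly like the following sketch, which flattens tweet dictionaries into rows that R (or any other tool) can read. The function name `export_csv` and the field names are hypothetical; the slides do not list the columns the framework actually exports.

```python
import csv
import io


def export_csv(tweets, fields, out):
    """Write one CSV row per tweet, keeping only the requested fields."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for tweet in tweets:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: tweet.get(field, "") for field in fields})


# Hypothetical documents, shaped like records read back from the database.
sample = [
    {"id": 1, "created_at": "2013-04-16T00:43:00Z", "is_retweet": False, "text": "hola"},
    {"id": 2, "created_at": "2013-04-16T00:44:00Z", "is_retweet": True},
]
buffer = io.StringIO()
export_csv(sample, ["id", "created_at", "is_retweet"], buffer)
csv_text = buffer.getvalue()
```

Writing to any file-like object means the same function can target a file on disk or, as here, an in-memory buffer.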

Framework architecture
Framework components: Sentiment analysis
- Sentiment analysis
  - Supervised learning
  - Need for labeling
  - Tools for labeling
  - Classifier building
  - Classifier testing

Sentiment analysis
Overview
- Supervised learning with the Natural Language Toolkit (NLTK)
  - Three classes: “positive”, “negative” and “neutral”
- Need for a labeled corpus
  - Several exist in English ...
  - ... none in Spanish
- Need for thousands of manually classified tweets
  - Collaborative labeling
  - Web application to label tweets

Sentiment analysis
Classifier
- Naïve Bayes classifier
  - Stop words removed
  - Some parameters to set
  - Optimal parameter setting depends on the dataset
- Need for classifier evaluation
  - Tester
  - Cross validation
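
To make the preprocessing concrete, here is a minimal sketch of stop-word removal followed by n-gram feature extraction, written in plain Python. The helper names, the whitespace tokenizer and the tiny stop-word list are assumptions for illustration, not the framework's code. The output is a feature dictionary, which is the input format NLTK classifiers such as its Naïve Bayes implementation accept.

```python
def tokenize(text, stop_words):
    """Lowercase, split on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, sizes, stop_words):
    """Map every n-gram of the requested sizes to True (presence features)."""
    tokens = tokenize(text, stop_words)
    return {gram: True for n in sizes for gram in ngrams(tokens, n)}


# Tiny illustrative stop-word list; a real one would be much longer.
SPANISH_STOP_WORDS = {"de", "la", "el", "en", "y", "que"}

features = extract_features(
    "El debate de hoy en el congreso", (1, 2), SPANISH_STOP_WORDS
)
```

With sizes `(1, 2)` the example sentence yields three unigrams and two bigrams once the stop words are removed.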

Case study
Boston Terror Attack
- Main objective
  - Evaluate the platform
- Secondary objective
  - Describe activity around an event
  - Stream by string filter
- The event
  - Terror attack on 15 Apr 2013 14:49 (GMT-4) in Boston
  - Internet witch-hunt motivated by the release of some photos
  - Shooting and manhunt
- Data acquisition
  - Begin: Tue, 16 Apr 2013 00:43 (GMT)
  - End: Tue, 23 Apr 2013 00:43 (GMT)
  - Filter: “Maratón de Boston” (Boston Marathon in Spanish)

Case study
Boston Terror Attack: Dataset description

              Value      Relative  Average
Tweets        28,892               1.16/user
Non-retweets  16,029     55.48%
Retweets      12,863     44.52%
Geolocalized  255        0.88%
Users         24,989
Mentions      18,937     65.54%
Replies       849        2.94%
Non-replies   18,088     62.61%
Size          96.39 MB             3.38 KB/tweet
Index size    0.91 MB
Disk          132.99 MB
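
The relative and average columns follow directly from the raw counts. The short sketch below recomputes a few of them from the values in the table; the helper name `percent` is an assumption, and every percentage is taken relative to the total number of tweets.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts copied from the Boston dataset description.
tweets = 28892
retweets = 12863
geolocalized = 255
mentions = 18937
users = 24989

retweet_share = percent(retweets, tweets)          # retweets among all tweets
geolocalized_share = percent(geolocalized, tweets)
mention_share = percent(mentions, tweets)
tweets_per_user = round(tweets / users, 2)         # the "Average" column
```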

Case study
Boston Terror attack: activity
[Figure: three time-series panels, "Tweets", "Tweets (excluding RTs)" and "Retweets", each plotting counts against time from Apr 17 to Apr 23]
Dashed line: bombing. Dotted line: photo release. Solid line: shooting. Gray background: manhunt.

Case study
Boston Terror attack: activity
[Figure: the same three panels, "Tweets", "Tweets (excluding RTs)" and "Retweets", zoomed to the interval from Thu 23:00 to Sat 00:00]
Dotted line: photo release. Solid line: shooting. Gray background: manhunt.

Case study
Political analysis: Overview
- Main objective
  - Evaluate sentiment analysis
- Secondary objective
  - Describe regular Twitter activity
  - Stream by user filter
- Selection of Spanish political actors
  - Selected by activity and controversy

Account owner          Accounts
Political party        @PPopular, @PSOE, @iunida, @UPyD
Politician             @agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_
Journalist             @jordievole, @iescolar
Activist organization  @LA_PAH

- Data acquisition
  - Begin: Tue, 16 Apr 2013 00:00 (GMT)
  - End: 18 Apr 2013 04:00 (GMT)
  - Filter: account name (“@account”)

Case study
Political analysis: Dataset description

              Value      Relative  Average
Tweets        65,043               1.9/user
Non-retweets  28,175     43.32%
Retweets      36,868     56.68%
Geolocalized  528        0.81%
Users         34,195
Mentions      56,713     87.19%
Non-replies   46,981     72.23%
Replies       9,732      14.96%
Size          227.51 MB            3.58 KB/tweet
Index size    2.05 MB
Disk          237.95 MB

Case study
Political analysis: Activity
[Figure: three time-series panels, "Tweets", "Tweets (excluding RTs)" and "Retweets", each plotting counts against time from Tue to Thu]
Case studyPolitical analysis: Sentiment analysis
9, 884 tweets were manually classified in a collaborative way4, 739 non-neutral tweets1, 062 positives, 3, 677 negatives
Unbalanced datasetWe tried several parameters for the Naïve Bayes classifier
N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10
10-fold cross-validation
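
The evaluation setup can be sketched as a parameter grid combined with a k-fold split. The code below only enumerates the configurations and builds the folds; it does not train a classifier. The function name `k_fold_indices` and the strided, unshuffled split are assumptions, and so is the choice of the 4,739 non-neutral tweets as the evaluation set, which the slides do not state explicitly. Under that assumption, the last line gives the accuracy of always predicting the majority class, a useful reference point for an unbalanced dataset.

```python
from itertools import product

# Parameter values listed on the slide.
NGRAM_SETS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]
MIN_SCORES = [0, 1, 2, 3, 4, 5, 6, 10]


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


grid = list(product(NGRAM_SETS, MIN_SCORES))  # every (n-grams, min score) pair
folds = list(k_fold_indices(4739, k=10))      # assumed evaluation set size

# Accuracy of always answering "negative" on the non-neutral tweets.
majority_baseline = 3677 / 4739
```

The grid holds 6 x 8 = 48 configurations, and the majority-class baseline is roughly 0.776.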

Case study
Political analysis: Sentiment analysis

Classifier           Accuracy
NaiveBayes-1_2-min3  0.8543
NaiveBayes-1-min3    0.8510
NaiveBayes-1_3-min3  0.8507
NaiveBayes-1-min4    0.8476
NaiveBayes-1_3-min5  0.8474
NaiveBayes-1_2-min4  0.8469
NaiveBayes-1_3-min4  0.8467
NaiveBayes-1_3-min1  0.8459
NaiveBayes-1-min6    0.8452
NaiveBayes-1-min1    0.8448
NaiveBayes-1_2-min5  0.8446
NaiveBayes-1_3-min6  0.8438
NaiveBayes-1_2-min6  0.8436
NaiveBayes-1-min5    0.8406
NaiveBayes-1_2-min1  0.8389
NaiveBayes-2_3-min6  0.8385

Case study
Political analysis: Normalized sentiment
[Figure: normalized sentiment over time, Tue to Thu; vertical axis "Positive" from 0.0 to 1.0]
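
One plausible reading of "normalized sentiment" is the share of positive tweets among the non-neutral tweets in each time bin. The slides do not define the normalization, so the sketch below is an assumption: it groups labeled tweets by hour and divides positives by positives plus negatives. The function name and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def positive_share_by_hour(labeled):
    """Return {hour: positive / (positive + negative)} for labeled tweets."""
    totals, positives = Counter(), Counter()
    for timestamp, label in labeled:
        if label == "neutral":
            continue  # neutral tweets do not enter the ratio
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour] += 1
        if label == "positive":
            positives[hour] += 1
    return {hour: positives[hour] / totals[hour] for hour in sorted(totals)}


# Hypothetical labeled tweets: (timestamp, label).
sample = [
    (datetime(2013, 4, 16, 10, 5), "positive"),
    (datetime(2013, 4, 16, 10, 40), "negative"),
    (datetime(2013, 4, 16, 10, 50), "negative"),
    (datetime(2013, 4, 16, 11, 15), "positive"),
    (datetime(2013, 4, 16, 11, 20), "neutral"),
]
shares = positive_share_by_hour(sample)
```

Each value lies between 0 and 1, which matches the range of the vertical axis in the figure.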

Conclusions and future work
- We developed a framework that eases data extraction and analysis on Twitter
  - Ready for production
  - It will be released soon under a free licence
- We briefly described two case studies
  - Event-driven activity: Boston terror attacks
  - Regular activity: political activity
- Sentiment analysis is intrinsically difficult
- Future work
  - Lemmatization
  - Natural language processing
  - Time series analysis
Thanks for your attention!
David F. Barrero
david@aut.uah.es
@dfbarrero