A Study of Positive and Negative Affect in Tracking Influenza-like Illness Rate in Twitter Data
Son Doan¹, Mike Conway¹, Nigel Collier²
¹Division of Biomedical Informatics, University of California San Diego; ²National Institute of Informatics
Medicine 2.0, Boston Sep 15, 2012
Seasonal influenza and influenza-like illness
Influenza-like illness (ILI) = fever (> 100°F)* AND cough and/or sore throat (in the absence of a known cause other than influenza)
* Temperature can be measured in the office or at home
Epidemics of seasonal influenza result in about three to five million cases of severe illness and 250,000 to 500,000 deaths worldwide each year (WHO, 2009)
Case definition from CDC
Calculating ILI rate is key for seasonal influenza surveillance
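The slides do not spell out how the ILI rate is computed. A minimal sketch, assuming CDC's unweighted ILINet definition (ILI visits as a percentage of all outpatient visits in a week); CDC's national figure additionally weights states by population, which is omitted here. The function name `ili_rate` and the weekly counts are hypothetical.

```python
def ili_rate(ili_visits: int, total_visits: int) -> float:
    """Unweighted %ILI: share of outpatient visits that met the ILI definition."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return 100.0 * ili_visits / total_visits


# Hypothetical (ILI visits, total visits) for three reporting weeks
weekly = [(120, 4000), (310, 4100), (95, 3900)]
rates = [ili_rate(ili, total) for ili, total in weekly]
print([round(r, 2) for r in rates])  # [3.0, 7.56, 2.44]
```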
Related work
• Events tracking/predicting:
  – Predict election, gasoline price: O’Connor et al. (2010)
  – Predict stock market: Bollen et al. (2011)
  – Earthquake warning: Sasaki et al. (2010), Guy et al. (2010)
  – Public mood tracking: Golder and Macy (2011), Doan and Collier (2011)
• Predicting the Influenza-Like Illness rate:
  – Google Flu Trends: Ginsberg et al. (2009), Valdivia et al. (2010), now extended to dengue tracking (Chan et al., 2012); uses query logs
  – Culotta (2009), Lampos and Cristianini (2010), Signorini et al. (2011), Chew and Eysenbach (2011): use Twitter
Positive and Negative Affect
• We manually created a positive affect (PA) and a negative affect (NA) word list from:
  – Wikipedia
  – Internet resources: textbooks, literature
• Final lists: 966 terms (509 PA and 457 NA)
Positive Affect (PA): Happy, Keen, Joyful, …
Negative Affect (NA): Sad, Nervous, Angry, …
Link between positive and negative affect and ILI rate?
I am terribly sick, I'm thinking swine flu:O
i really hope i dont get sick otherwise im gonna be angry
what are the symptoms for swine flu I am nervous!
I do not have H1N1. I'm so happy about this fact.
tell everyone how great my swine flu cookies were.
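A minimal sketch of tagging the example tweets above with affect labels, assuming simple whole-word lexicon matching (the slides do not state the matching rule). `PA_TERMS` and `NA_TERMS` are hypothetical stand-ins holding only the six example terms from the lexicon slide, not the full 509-term and 457-term lists.

```python
import re
from typing import Set

# Hypothetical mini-lexicons: only the example terms from the lexicon slide
PA_TERMS = {"happy", "keen", "joyful"}
NA_TERMS = {"sad", "nervous", "angry"}


def affect_labels(tweet: str) -> Set[str]:
    """Return the subset of {'PA', 'NA'} whose lexicon has a whole-word match."""
    # Lowercase, normalize curly apostrophes, keep runs of letters/apostrophes
    tokens = set(re.findall(r"[a-z']+", tweet.lower().replace("\u2019", "'")))
    labels = set()
    if tokens & PA_TERMS:
        labels.add("PA")
    if tokens & NA_TERMS:
        labels.add("NA")
    return labels


examples = [
    "I am terribly sick, I'm thinking swine flu:O",
    "i really hope i dont get sick otherwise im gonna be angry",
    "what are the symptoms for swine flu I am nervous!",
    "I do not have H1N1. I'm so happy about this fact.",
    "tell everyone how great my swine flu cookies were.",
]
for tweet in examples:
    print(sorted(affect_labels(tweet)), tweet)
```

With only the six example terms, the "angry" and "nervous" tweets are tagged NA, the "so happy" tweet is tagged PA, and the other two get no label.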
Twitter Corpus
Timeline: 36 weeks for the US 2009 influenza season (Aug 30, 2009 to May 8, 2010)
Name       Total
Tweets     587,290,394
Users      23,571,765
URLs       136,034,309
Hashtags   96,399,587
Thanks to Brendan O’Connor (CMU) and Twitter Inc.
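ILINet reports weekly, so tweets have to be binned into the 36 weeks of the study window before they can be compared with it. A sketch, assuming tweet timestamps have already been converted to dates and that weeks start on the window's first day (Aug 30, 2009 was a Sunday, matching CDC's Sunday-to-Saturday MMWR weeks). `SEASON_START`, `week_index` and `weekly_counts` are hypothetical names.

```python
from datetime import date
from typing import Iterable, List, Optional

SEASON_START = date(2009, 8, 30)  # first day of the study window (a Sunday)
SEASON_END = date(2010, 5, 8)     # last day of the study window (a Saturday)
N_WEEKS = 36


def week_index(d: date) -> Optional[int]:
    """0-based week of the study window for a tweet date, or None if outside it."""
    if not SEASON_START <= d <= SEASON_END:
        return None
    return (d - SEASON_START).days // 7


def weekly_counts(tweet_dates: Iterable[date]) -> List[int]:
    """Number of tweets falling in each of the 36 weeks."""
    counts = [0] * N_WEEKS
    for d in tweet_dates:
        w = week_index(d)
        if w is not None:
            counts[w] += 1
    return counts


# Hypothetical dates of ILI-related tweets
dates = [date(2009, 8, 30), date(2009, 9, 5), date(2009, 9, 6), date(2010, 5, 9)]
print(weekly_counts(dates)[:2])  # [2, 1]; May 9, 2010 falls outside the window
```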
Methods
Twitter corpus → ILI-related keyword filtering → ILI-related tweets → PA/NA filtering → Non-PA / Non-NA ILI-related tweets
ILI-related keywords from previous work:
  – Culotta: flu, cough, headache, sore throat
  – Signorini et al.: swine, flu, influenza
  – Chew and Eysenbach: h1n1, swine flu, swineflu
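The filtering pipeline can be sketched as follows, assuming case-insensitive whole-word (or whole-phrase) keyword matching and whole-word affect matching; the slides specify neither rule. `PA_TERMS` and `NA_TERMS` contain only the slide's example terms, and `filter_tweets` is a hypothetical helper. "Non-NA" keeps the ILI-related tweets with no NA term; "Non-PA" keeps those with no PA term.

```python
import re

# ILI keyword sets from prior work, as listed on the slide
KEYWORDS = {
    "Culotta": ["flu", "cough", "headache", "sore throat"],
    "Signorini et al.": ["swine", "flu", "influenza"],
    "Chew and Eysenbach": ["h1n1", "swine flu", "swineflu"],
}

# Hypothetical mini-lexicons: slide example terms only, not the full lists
PA_TERMS = {"happy", "keen", "joyful"}
NA_TERMS = {"sad", "nervous", "angry"}


def keyword_pattern(terms):
    """Case-insensitive regex matching any keyword or phrase as whole words."""
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def contains_affect(tweet, lexicon):
    """True if any whole word of the tweet is in the affect lexicon."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower().replace("\u2019", "'")))
    return bool(tokens & lexicon)


def filter_tweets(tweets, keyword_set):
    """Keyword filtering, then PA/NA filtering of the ILI-related tweets."""
    pattern = keyword_pattern(KEYWORDS[keyword_set])
    ili = [t for t in tweets if pattern.search(t)]
    return {
        "ili": ili,
        "non_na": [t for t in ili if not contains_affect(t, NA_TERMS)],
        "non_pa": [t for t in ili if not contains_affect(t, PA_TERMS)],
    }


SLIDE_TWEETS = [
    "I am terribly sick, I'm thinking swine flu:O",
    "i really hope i dont get sick otherwise im gonna be angry",
    "what are the symptoms for swine flu I am nervous!",
    "I do not have H1N1. I'm so happy about this fact.",
    "tell everyone how great my swine flu cookies were.",
]

for name in KEYWORDS:
    result = filter_tweets(SLIDE_TWEETS, name)
    print(name, {k: len(v) for k, v in result.items()})
```

On the five example tweets, the Chew and Eysenbach keywords match four; Non-PA drops the "so happy" tweet and Non-NA drops the "nervous" tweet.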
Main results
Keywords              All ILI-related tweets   Non-NA   Non-PA
Culotta               0.9485                   0.9483   0.9548
Signorini et al.      0.9470                   0.9532   0.9586
Chew and Eysenbach    0.9448                   0.9444   0.9467
• Gold standard: Laboratory data from the US Outpatient Influenza-Like Illness Surveillance Network (ILINet)
• Pearson correlation coefficients are used
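Each coefficient in the table compares two weekly series: an ILI-related tweet signal and the ILINet rate. A plain implementation of Pearson's r is sketched below; the slides do not say whether raw weekly counts or counts normalized by total tweet volume were used, so the input series are an assumption. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means (the 1/n factors cancel)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined for a constant series")
    return s_xy / sqrt(s_xx * s_yy)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0 (perfectly linear)
```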
Note: Google Flu Trends achieved 0.9912 (using query logs)
Discussion
• Retaining negative affect tweets while filtering out positive affect tweets increases the correlation with the ILINet ILI rate for all three keyword sets.
• Many true-positive tweets contain negated positive affect, e.g., “My little guy and I just got our H1N1. He wasn't too happy about getting the shot.” This emphasizes the need for NLP techniques such as negation detection.
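A minimal negation check, sketched as a cue-word window in the spirit of NegEx; the cue list, the three-token window, and the function name `positive_affect_hits` are assumptions, not the authors' method. A PA term closely preceded by a negation cue is counted as negated rather than asserted.

```python
import re
from typing import List, Tuple

# Assumed cue list and window size; a real system would use a fuller cue set
NEGATION_CUES = {"not", "no", "never", "isn't", "wasn't", "don't", "didn't", "aren't", "weren't"}
PA_TERMS = {"happy", "keen", "joyful"}  # hypothetical subset of the PA list
WINDOW = 3  # tokens before an affect term that are scanned for a cue


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; curly apostrophes are normalized first."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def positive_affect_hits(text: str) -> Tuple[List[str], List[str]]:
    """Split PA term occurrences into (asserted, negated)."""
    tokens = tokenize(text)
    asserted, negated = [], []
    for i, tok in enumerate(tokens):
        if tok not in PA_TERMS:
            continue
        preceding = tokens[max(0, i - WINDOW):i]
        if any(w in NEGATION_CUES for w in preceding):
            negated.append(tok)
        else:
            asserted.append(tok)
    return asserted, negated


print(positive_affect_hits(
    "My little guy and I just got our H1N1. He wasn't too happy about getting the shot."
))  # ([], ['happy'])
```

Here "happy" in "He wasn't too happy" is flagged as negated, while in "I do not have H1N1. I'm so happy about this fact." it stays asserted, because "not" lies outside the three-token window.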
• Further semantic filtering may significantly improve results (see our work at HISB 2012)
Syndromic surveillance for gastrointestinal, respiratory, neurological, dermatological, haemorrhagic, and musculoskeletal syndromes from tweets in 40 world cities.
DIZIE: system for syndromic surveillance on Twitter
Collier and Doan. eHealth 2012;186-95
http://born.nii.ac.jp/dizie/
THANK YOU !!!