Bringing Together the Social and Technical in Big Data Analytics: Why You Can't Predict the Flu from Twitter, and Here's How. David A. Broniatowski, Asst. Prof. EMSE. http://www.seas.gwu.edu/~broniatowski


Bringing Together the Social and Technical in Big Data Analytics: Why You Can't Predict the Flu from Twitter, and Here's How

David A. Broniatowski, Asst. Prof. EMSE

http://www.seas.gwu.edu/~broniatowski


PUBLIC HEALTH CYCLE

Population Doctors

Surveillance

Intervention


REQUIRES: DATA ON THE POPULATION

• Traditional mechanisms

• Surveys

• Clinical visits

This has limited research


TWITTER

• Short messages (140 chars) posted to public internet

• Content: news, conversation, pointless babble

• Huge volume

• 500 million a day


WHY TWITTER?

• Huge volumes of data

• A constant stream of small updates

• Nothing like waiting in line to buy cigarettes behind a guy in a business suit buying gasoline with ten dollars in dimes

• I eat pizza too much

• I'm at Cvs Pharmacy (117th and kendall, Miami)


INFLUENZA SURVEILLANCE

• CDC has nationwide surveillance network with 2700 outpatient centers reporting

• ILI: influenza-like illness

• Cons:

• Slow (2 weeks)

• Varying levels of geographic granularity


TWITTER SURVEILLANCE

• Twitter influenza surveillance must:

• 1) Accurately track ground truth

• Identify infection tweets

• 2) Be effective at both the municipal and national levels

• Expand tweet geolocation and evaluate municipal accuracy

• 3) Be predictive in real time

• Deploy a previously trained system on this flu season


PIPELINE CLASSIFIERS

• Three steps using supervised machine learning + NLP

• Step 1: Identify health tweets

• Step 2: Identify flu-related tweets

• Step 3: Awareness vs. infection
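The three steps above can be read as a cascade in which each classifier only sees tweets that passed the previous one. The sketch below is a minimal illustration under stated assumptions, not the system described in the talk: `TinyNaiveBayes`, `tokenize`, and `classify_tweet` are hypothetical names, the Naive Bayes model is a stand-in for whatever supervised learner and NLP features were actually used, and the training tweets are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would use richer NLP features."""
    return text.lower().split()


class TinyNaiveBayes:
    """Binary multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            if token in self.vocab:  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def classify_tweet(text, is_health, is_flu, is_infection):
    """Run the three-step cascade; later steps only see tweets that passed earlier ones."""
    if not is_health(text):
        return "not health"
    if not is_flu(text):
        return "health, not flu"
    return "infection" if is_infection(text) else "awareness"


# Made-up toy training data, for illustration only.
health = TinyNaiveBayes().fit(
    ["i have the flu and a fever", "got my flu shot today", "my head hurts",
     "i eat pizza too much", "waiting in line at the store"],
    [True, True, True, False, False])
flu = TinyNaiveBayes().fit(
    ["i have the flu and a fever", "got my flu shot today", "my head hurts"],
    [True, True, False])
infection = TinyNaiveBayes().fit(
    ["i have the flu and a fever", "stuck in bed with the flu",
     "got my flu shot today", "worried about flu season"],
    [True, True, False, False])

print(classify_tweet("i have the flu", health.predict, flu.predict, infection.predict))
```

Because `classify_tweet` takes plain callables, any of the three stages can be swapped for a different model without touching the cascade logic.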


TWITTER SURVEILLANCE

• Twitter influenza surveillance must:

• 1) Accurately track ground truth

• Identify infection tweets

• 2) Be effective at both the municipal and national levels

• Expand tweet geolocation and evaluate municipal accuracy

• 3) Be predictive in real time

• Deploy a previously trained system on this flu season


LOCAL EFFECTIVENESS

• Current work focuses on US national flu rates

• Useful surveillance needed by region/state/city

• How can Twitter track local trends?

• Is it accurate?

• Is there enough data?

• Only about 1% of Twitter is geocoded


CARMEN (Dredze et al., 2013)

• Over 4000 known locations (countries, states, counties, cities)

• Geocoordinates only: ~1%

• Expanded locations: ~22%

• Available in Python and Java
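One way to picture how coverage can grow from geocoordinates alone to a larger share of tweets is a fallback chain: use exact coordinates when present, otherwise the tagged place, otherwise the free-text profile location. The sketch below illustrates that idea only and is not Carmen's implementation or API: `resolve_location`, `normalize`, and the two-entry `KNOWN_LOCATIONS` table are hypothetical. The `coordinates`, `place`, and `user.location` keys are assumed to follow the Twitter v1.1 tweet JSON layout.

```python
# Hypothetical lookup table; Carmen ships its own database of known locations.
KNOWN_LOCATIONS = {
    "baltimore, md": ("United States", "Maryland", "Baltimore"),
    "miami, fl": ("United States", "Florida", "Miami"),
}


def normalize(name):
    """Lowercase and collapse whitespace so free-text names match table keys."""
    return " ".join(name.lower().split())


def resolve_location(tweet, known=KNOWN_LOCATIONS):
    """Return the most precise location evidence available, or None."""
    # 1) Exact geocoordinates, present on only a small fraction of tweets.
    coords = tweet.get("coordinates")
    if coords:
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
        return {"source": "coordinates", "lat": lat, "lon": lon}

    # 2) A place the user tagged on the tweet.
    place = tweet.get("place") or {}
    key = normalize(place.get("full_name") or "")
    if key in known:
        return {"source": "place", "location": known[key]}

    # 3) The free-text location in the user's profile.
    profile = (tweet.get("user") or {}).get("location") or ""
    key = normalize(profile)
    if key in known:
        return {"source": "profile", "location": known[key]}

    return None


print(resolve_location({"user": {"location": "Baltimore, MD"}}))
```

Returning the evidence source alongside the location lets downstream analysis weight or filter matches by how reliable each source is.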


TWITTER SURVEILLANCE

• Twitter influenza surveillance must:

• 1) Accurately track ground truth

• Identify infection tweets

• 2) Be effective at both the municipal and national levels

• Expand tweet geolocation and evaluate municipal accuracy

• 3) Be predictive in real time

• Deploy a previously trained system on this flu season


SURVEILLANCE RESULTS

Pearson Correlation    2009     2011
Keywords               0.97     0.646
Flu Classifier         0.97     0.519
Google Flu Trends      0.97     0.897
Infection              0.972    0.7832
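For readers unfamiliar with the metric, the sketch below shows how a Pearson correlation between a weekly Twitter signal and a weekly ILI series could be computed. It is an illustration only: the function name `pearson` and both data series are made up, and the talk does not specify how the tweet counts were normalized before comparison.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either series is constant.
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly values, for illustration only.
weekly_tweet_rate = [0.8, 1.1, 1.9, 2.7, 2.2, 1.4]
weekly_ili_rate = [1.0, 1.3, 2.1, 3.0, 2.6, 1.5]

print(round(pearson(weekly_tweet_rate, weekly_ili_rate), 3))
```

A value near 1 means the two series rise and fall together week by week; it says nothing about whether their absolute levels agree.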


GOOGLE FLU TRENDS GETS IT WRONG?

Lohr, S. (2014). Google flu trends: the limits of big data. New York Times.


Pearson Correlation:

• Keywords: 0.75

• Infection: 0.93


BLIND EVALUATION

• ILI counts:

• Infection: 0.88

• Keywords: 0.72


2013-2014: 0.95 correlation


MOST RECENT DATA

Broniatowski, D. A., Dredze, M., Paul, M. J., & Dugas, A. (2015). Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health and Surveillance, 1(1), e5.


PREDICTING ACTUAL FLU IN BALTIMORE

Broniatowski, D. A., Dredze, M., Paul, M. J., & Dugas, A. (2015). Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health and Surveillance, 1(1), e5.


HEALTHTWEETS.ORG


HEALTHTWEETS WORLDWIDE


Some Other Projects

David A. Broniatowski, Asst. Prof. EMSE

http://www.seas.gwu.edu/~broniatowski

BIG DATA FOR GROUP DECISION MAKING: EXTRACTING SOCIAL NETWORKS FROM FDA ADVISORY PANEL MEETING TRANSCRIPTS

(Broniatowski & Magee, 2013, American Journal of Therapeutics; Broniatowski & Magee, 2012, IEEE Signal Processing Magazine; Broniatowski & Magee, in preparation)


“GERMS ARE GERMS” AND “WHY NOT TAKE A RISK?” MODELS AND DATA FOR RISKY DECISION MAKING IN THE ED

(Broniatowski, Klein, & Reyna, in press, Medical Decision Making; Broniatowski & Reyna, in preparation)


HOW DO WE DESIGN SYSTEMS TO USE INFORMATION FLOW TO OUR ADVANTAGE?

We would like to deepen our intuition regarding system architectures

(Broniatowski & Moses, in preparation)

QUESTIONS?

• Big data

• Influenza tracking and coupled contagion

• Group decision-making

• Individual decision-making

• Formal models

• Medical and engineering applications

• Formal and mathematical models

• Systems architecture

• Design for flexibility
