Crisis Computing
Finding relevant and credible information on social media during disasters
Big Data Analytics Conference, Delhi, India, December 2014
How/when did it start for me?
January 2010
Humanitarian Computing
At least 775 publications:
● Crisis Analysis (55)
● Crisis Management (309)
● Situational Awareness (67)
● Social Media (231)
● Mobile Phones (74)
● Crowdsourcing (116)
● Software and Tools (97)
● Human-Computer Interaction (28)
● Natural Language Processing (33)
● Trust and Security (33)
● Geographical Analysis (53)
Source: http://humanitariancomp.referata.com/
Humanitarian Computing Topics
http://www.youtube.com/watch?v=0UFsJhYBxzY
Carlos Castillo – [email protected] – http://www.chato.cl/research/
An earthquake hits a Twitter user
• When an earthquake strikes, the first tweets are posted 20-30 seconds later
• Damaging seismic waves travel at 3-5 km/s, while network communications travel at near light speed over fiber/copper, plus latency
• After ~100 km, seismic waves may be overtaken by tweets about them
http://xkcd.com/723/
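A back-of-the-envelope sketch of where the ~100 km figure comes from, assuming a tweet posted near the epicenter reaches distant readers essentially instantly. The wave needs d / v seconds to cover distance d, so tweets arrive first beyond d = v × t_delay. The `network_latency_s` parameter is an illustrative addition, not a measured value.

```python
def crossover_distance_km(wave_speed_km_s: float,
                          tweet_delay_s: float,
                          network_latency_s: float = 0.0) -> float:
    """Distance beyond which a tweet about a quake arrives before the seismic wave.

    The wave reaches distance d after d / wave_speed seconds. The tweet is
    readable after tweet_delay + network_latency seconds (propagation over
    the network itself is treated as instantaneous). Setting the two times
    equal gives d = wave_speed * (tweet_delay + network_latency).
    """
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


if __name__ == "__main__":
    # Bounds from the slide: waves at 3-5 km/s, first tweets after 20-30 s.
    low = crossover_distance_km(3, 20)
    high = crossover_distance_km(5, 30)
    mid = crossover_distance_km(4, 25)
    print(f"crossover between {low:.0f} and {high:.0f} km; midpoint ~{mid:.0f} km")
```

With the slide's ranges the crossover falls between 60 and 150 km, which brackets the ~100 km estimate.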
Examples of crisis tweets
Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.
Examples of crisis tweets (cont.)
Fertile ground for applied research
✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?
Current collaborators
Patrick Meier – QCRI
Sarah Vieweg – QCRI
Muhammad Imran – QCRI
Irina Temnikova – QCRI
Alexandra Olteanu – EPFL
Aditi Gupta – IIIT Delhi
“P.K.” Kumaraguru – IIIT Delhi
Fernando Diaz – Microsoft
Outline
● Crisis maps
● Extraction and matching
● Verification and credibility
Crisis maps from social media
Carlos Castillo, Fernando Diaz, and Hemant Purohit: Leveraging Social Media and Web of Data to Assist Crisis Response Coordination. Tutorial at SDM, Philadelphia, PA, USA. April 2014.
Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.
Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/
“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”
Crisis mapping goes mainstream (2011)
http://newsbeatsocial.com/watch/0_s6xxcr3p
Understanding Crisis Tweets
Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.
Our approach
1. Filtering
2. Classification
3. Extraction
Filtering
1. Is it disaster-related? (yes/no)
2. Does it contribute to situational awareness? (yes/no)
Tweets answering yes to both questions are kept
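A minimal sketch of the two-question cascade, assuming each question is answered by its own yes/no predicate and a tweet must pass both to be kept. The keyword sets and the `toy_*` functions are hypothetical placeholders for trained classifiers, not the method used in this work.

```python
from typing import Callable, Iterable, List

Predicate = Callable[[str], bool]


def filter_tweets(tweets: Iterable[str],
                  is_disaster_related: Predicate,
                  contributes_to_awareness: Predicate) -> List[str]:
    """Keep tweets that answer "yes" to both questions, preserving order.

    The second check runs only on tweets that passed the first, so
    off-topic chatter is discarded as early as possible.
    """
    kept = []
    for tweet in tweets:
        if not is_disaster_related(tweet):
            continue  # "No" at step 1: drop
        if not contributes_to_awareness(tweet):
            continue  # "No" at step 2: drop
        kept.append(tweet)
    return kept


# Hypothetical keyword rules standing in for real classifiers.
DISASTER_WORDS = {"earthquake", "flood", "tornado", "hurricane", "#sandy"}
AWARENESS_WORDS = {"closed", "damage", "injured", "shelter", "watch", "warning"}


def toy_disaster_related(tweet: str) -> bool:
    return not DISASTER_WORDS.isdisjoint(tweet.lower().split())


def toy_awareness(tweet: str) -> bool:
    return not AWARENESS_WORDS.isdisjoint(tweet.lower().split())
```

Ordering the cheaper or more selective question first keeps the cost of the second stage low when most of the stream is off-topic.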
Classification
Input: filtered tweets
• Information provided: Caution & Advice, Damage & Casualties, Donations, ...
• Information source: Gov, Eyewitness, Media, NGO, Outsider, ...
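One way to sketch the classification step is a multinomial Naive Bayes classifier with add-one smoothing, written from scratch. This is a swapped-in, generic supervised technique, not necessarily the classifier used in this work; the label names and the six `TRAIN_EXAMPLES` are made-up toy data. One instance covers one label dimension, so a second instance would be trained for the information source.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    @staticmethod
    def tokenize(text: str) -> List[str]:
        return text.lower().split()

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesTweetClassifier":
        for text, label in examples:
            self.class_counts[label] += 1
            for tok in self.tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in self.tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + v))  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data for the "information provided" dimension.
TRAIN_EXAMPLES = [
    ("stay indoors and avoid the coast", "caution_advice"),
    ("evacuate now avoid flooded roads", "caution_advice"),
    ("bridge collapsed two people injured", "damage_casualties"),
    ("houses destroyed and many injured", "damage_casualties"),
    ("donate blood or send money to red cross", "donations"),
    ("volunteers needed at the shelter donate food", "donations"),
]

if __name__ == "__main__":
    clf = NaiveBayesTweetClassifier().fit(TRAIN_EXAMPLES)
    print(clf.predict("please donate money"))  # donations
```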
A large-scale study of crisis tweets
• Collect tweets from 26 disasters
• Classify according to:
  ● Informative / Not informative
  ● Information provided
  ● Information source
Advice on labeling
• Your instructions will never be correct the first time you try
  – e.g., personal / eyewitness
  – Instructions must be rewritten reactively
  – Perform small-scale labeling first
• Instructions must be concrete and brief
  – If that is not possible, the task has to be divided
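One possible way to act on "perform small-scale labeling first" is to have two people label the same small pilot batch and measure their chance-corrected agreement (Cohen's kappa) before scaling up. Using kappa here is an assumption, not something the slides prescribe; low agreement suggests the instructions need another rewrite. The label values in the example are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items where both gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["eyewitness", "eyewitness", "other", "other"]
    b = ["eyewitness", "other", "other", "other"]
    print(cohens_kappa(a, b))  # 0.5
```

In the example the annotators agree on 3 of 4 tweets, but half of that agreement is expected by chance, so kappa is 0.5.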
Information Provided in Crisis Tweets
N=26; Data available at http://crisislex.org/
What do people tweet about?
• Affected individuals
  – 20% on average (min. 5%, max. 57%)
  – most prevalent in human-induced, focalized & instantaneous events
• Sympathy and emotional support
  – 20% on average (min. 3%, max. 52%)
  – most prevalent in instantaneous events
• Other useful information
  – 32% on average (min. 7%, max. 59%)
  – least prevalent in diffused events
What do people tweet about? (cont.)
• Infrastructure and utilities
  – 7% on average (min. 0%, max. 22%)
  – most prevalent in diffused events, in particular floods
• Caution and advice
  – 10% on average (min. 0%, max. 34%)
  – least prevalent in instantaneous & human-induced events
• Donations and volunteering
  – 10% on average (min. 0%, max. 44%)
  – most prevalent in natural hazards
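The figures above take the form "average (min., max.)" over the 26 crises. A small sketch of how such a summary can be computed, assuming the average is taken over per-crisis shares (a macro-average, so every crisis weighs the same regardless of tweet volume). The counts in the example are made up and are not CrisisLex data.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize_category(per_crisis_counts: List[Dict[str, int]],
                       category: str) -> Tuple[float, float, float]:
    """Return (mean, min, max) share of `category` across crises.

    Each crisis contributes one proportion: its count for the category
    divided by its total labeled tweets. Crises with no tweets are skipped.
    """
    shares = []
    for counts in per_crisis_counts:
        total = sum(counts.values())
        if total == 0:
            continue
        shares.append(counts.get(category, 0) / total)
    return mean(shares), min(shares), max(shares)


if __name__ == "__main__":
    # Hypothetical label counts for three crises.
    crises = [
        {"affected": 10, "other": 40},
        {"affected": 30, "other": 70},
        {"affected": 5, "other": 5},
    ]
    avg, lo, hi = summarize_category(crises, "affected")
    print(f"{avg:.0%} on average (min. {lo:.0%}, max. {hi:.0%})")
```

A micro-average (pooling all tweets first) would instead let the largest crises dominate the average.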
Distribution over information sources
Distribution over time
Extracting information and matching emergency-related resources
Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM, Baden-Baden, Germany, 2013. Best paper award.
Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers. First Monday 19 (1), January 2014.
Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM, Rio de Janeiro, Brazil, 2013.
Information Extraction
Input: classified tweets
Example: “@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.”
Extraction
• #hashtags, @user mentions, URLs, etc.
  – Regular expressions
  – Text library from Twitter
• Temporal expressions
  – Part-of-speech tagger + heuristics
  – Natty library
• Supervised learning
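For the first bullet, a rough regular-expression sketch of hashtag, mention, and URL extraction. These patterns are simplified assumptions, not the rules of Twitter's text library, which handles many more edge cases (Unicode hashtags, URLs without a scheme, trailing punctuation).

```python
import re
from typing import Dict, List

# Simplified patterns. The lookbehind (?<!\w) keeps "@" inside an email
# address, or "#" inside a word, from being treated as an entity.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(tweet: str) -> Dict[str, List[str]]:
    """Return hashtags and mentions (without their sigil) and URLs, in order of appearance."""
    return {
        "hashtags": HASHTAG_RE.findall(tweet),
        "mentions": MENTION_RE.findall(tweet),
        "urls": URL_RE.findall(tweet),
    }


if __name__ == "__main__":
    tweet = "RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. #Sandy #NYC"
    print(extract_entities(tweet))
```

Temporal expressions and free-text nuggets are harder to capture with patterns, which is why the remaining bullets rely on taggers and supervised learning.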
Labels for extraction
• Type-dependent instructions
• Ask evaluators to copy-paste a word/phrase from each tweet
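To train a sequence tagger, a copy-pasted phrase has to become per-token labels. A minimal sketch using the common BIO scheme (an assumption; the slides do not name a scheme), with whitespace tokenization and the label `LOC` as hypothetical simplifications. In practice the tagger's own tokenizer should be used so punctuation such as "Park." splits correctly.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], phrase: str, label: str) -> List[Tuple[str, str]]:
    """Tag the first occurrence of an evaluator's copy-pasted phrase with BIO labels.

    Tokens inside the phrase get B-<label> (first) or I-<label> (rest);
    all other tokens get O. Matching is case-insensitive.
    """
    target = phrase.lower().split()
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    for start in range(len(tokens) - len(target) + 1):
        if target and lowered[start:start + len(target)] == target:
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + len(target)):
                tags[i] = f"I-{label}"
            break  # only the first occurrence is labeled
    return list(zip(tokens, tags))


if __name__ == "__main__":
    tokens = "Wind gusts reported at Central Park".split()
    print(spans_to_bio(tokens, "central park", "LOC"))
```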
Learning: Conditional Random Fields
• Used extensively in NLP for part-of-speech tagging and information extraction
• Representation of observations is important (capitalization, position, etc.)
[Figure: HMM vs. linear-chain CRF, with hidden and observed variables]
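To illustrate "representation of observations is important", here is a sketch of per-token feature dictionaries of the kind a linear-chain CRF consumes (toolkits such as sklearn-crfsuite accept lists of such dicts, though this sketch depends on no library). The specific feature names are illustrative choices, not the feature set used in this work.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Observation features for token i: shape, position, and neighboring words."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "is_digit": tok.isdigit(),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        "suffix3": tok[-3:].lower(),
    }
    # Context features: neighboring words often decide the label (e.g. "at JFK").
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<START>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, aligned with the tag sequence to predict."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Unlike an HMM, which models how each word is generated from its hidden tag, a CRF conditions on the whole observation sequence, so overlapping features like these can be combined freely.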
Tool
• CMU ARK Twitter NLP
  – Tokenization
  – Feature extraction
  – CRF learning
• Very easy to use: simply replace the labels in the training set (originally part-of-speech tags) with any other labels, and re-train
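Swapping part-of-speech tags for extraction labels is mostly a matter of producing a training file. A sketch that writes and reads a one-token-per-line, tab-separated format with blank lines between tweets; this format is an assumption, so check the tool's documentation for the exact layout it expects. The function names are hypothetical.

```python
import io
from typing import Iterable, List, TextIO, Tuple

Tweet = List[Tuple[str, str]]  # (token, tag) pairs


def write_token_tag_file(tweets: Iterable[Tweet], out: TextIO) -> None:
    """Write each token and its tag on one line; a blank line ends each tweet."""
    for tweet in tweets:
        for token, tag in tweet:
            if "\t" in token or "\n" in token:
                raise ValueError(f"token contains a tab or newline: {token!r}")
            out.write(f"{token}\t{tag}\n")
        out.write("\n")


def read_token_tag_file(inp: TextIO) -> List[Tweet]:
    """Inverse of write_token_tag_file."""
    tweets: List[Tweet] = []
    current: Tweet = []
    for line in inp:
        line = line.rstrip("\n")
        if not line:
            if current:
                tweets.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:
        tweets.append(current)
    return tweets


if __name__ == "__main__":
    buf = io.StringIO()
    write_token_tag_file([[("Central", "B-LOC"), ("Park", "I-LOC")]], buf)
    print(buf.getvalue(), end="")
```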
Output examples
RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC
Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected
RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy
RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy
Extractor evaluation
Setting                                     Recall   Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%      90%
Train 2/3 Sandy, Test 1/3 Sandy             41%      79%
Train Joplin, Test Sandy                    11%      78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%      81%
• For precision, an extraction counts as correct if it has one or more words in common with what humans extracted
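A sketch of the lenient, word-overlap criterion used for the table above. Applying the same criterion to recall, and treating a missing extraction as `None`, are assumptions made for illustration; the example pairs are hypothetical.

```python
from typing import List, Optional, Tuple


def overlaps(a: str, b: str) -> bool:
    """True if the two phrases share at least one lower-cased word."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


def lenient_precision_recall(
        pairs: List[Tuple[Optional[str], Optional[str]]]) -> Tuple[float, float]:
    """pairs holds (system_extraction, human_extraction) per tweet; None means nothing extracted."""
    system_total = sum(1 for sys_out, _ in pairs if sys_out)
    human_total = sum(1 for _, gold in pairs if gold)
    hits = sum(1 for sys_out, gold in pairs
               if sys_out and gold and overlaps(sys_out, gold))
    precision = hits / system_total if system_total else 0.0
    recall = hits / human_total if human_total else 0.0
    return precision, recall


if __name__ == "__main__":
    pairs = [
        ("Central Park", "Central Park and JFK airport"),  # overlap: correct
        ("NYC bridges", "Staten Island bridges"),          # overlap on "bridges": correct
        ("7pm", None),                                     # nothing to match: wrong
    ]
    print(lenient_precision_recall(pairs))  # precision 2/3, recall 1.0
```

This criterion is forgiving about span boundaries, which partly explains why precision stays high even when recall drops sharply across crises.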
Donations matching
• Identify and match requests/offers for donations
  – Money, clothing, food, shelter, volunteers, blood
• Average precision = 0.21 (0.16 if only text similarity is used)
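A hypothetical sketch of request/offer matching: bag-of-words cosine similarity, optionally restricted to pairs of the same resource type. The resource type (e.g. "blood", "shelter") is assumed to come from an earlier classification step. The gap between 0.21 and 0.16 suggests that signals beyond text similarity help; the type constraint here is one illustrative such signal, not necessarily what the cited work used.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match_requests_to_offers(
        requests: List[Tuple[str, str]],  # (text, resource_type)
        offers: List[Tuple[str, str]],    # (text, resource_type)
        require_same_type: bool = True,
) -> List[Tuple[int, int, float]]:
    """Return (request_index, offer_index, score) for all scoring pairs, best first."""
    matches = []
    for i, (req_text, req_type) in enumerate(requests):
        for j, (off_text, off_type) in enumerate(offers):
            if require_same_type and req_type != off_type:
                continue  # e.g. never pair a blood request with a shelter offer
            score = cosine(bag_of_words(req_text), bag_of_words(off_text))
            if score > 0:
                matches.append((i, j, score))
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches


if __name__ == "__main__":
    requests = [("need blood donors at city hospital", "blood"),
                ("shelter needed for families", "shelter")]
    offers = [("happy to donate blood tomorrow", "blood"),
              ("our church offers shelter for families", "shelter")]
    for i, j, s in match_requests_to_offers(requests, offers):
        print(f"request {i} <-> offer {j}: {s:.2f}")
```

The ranked list is what an average-precision figure would be computed over.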
Crowdsourced stream processing systems
Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems. http://arxiv.org/abs/1310.5463
Design objectives and principles
• Low latency (example metric: end-to-end time)
  – Automatic components: keep items moving
  – Crowdsourced components: trivial tasks
• High throughput (example metric: output items per unit of time)
  – Automatic components: high-performance processing
  – Crowdsourced components: task automation
• Load adaptability (example metric: rate response function)
  – Automatic components: load shedding, load queueing
  – Crowdsourced components: task prioritization
• Cost effectiveness (example metric: cost vs. quality, throughput, etc.)
  – Automatic components: N/A
  – Crowdsourced components: task frugality
• High quality (example metric: application-dependent)
  – Redundancy, aggregation and quality control
Design patterns
● QA loop
● Task assignment
● Process/verify
● Supervised learning
● Crowdwork sub-task chaining
● Humans are not a bottleneck
● Humans review every output element
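A sketch of the "load adaptability" principles for the crowdsourced part of a pipeline: a bounded queue of items awaiting crowd work that serves the highest-priority item first (task prioritization) and drops the lowest-priority item when full (load shedding). The class and its priority scores are hypothetical. `pop` is O(n), which is acceptable for a small, bounded queue.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class BoundedPriorityTaskQueue:
    """Pending crowd tasks with prioritization and load shedding."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Min-heap on priority, so the cheapest item to shed is always at index 0.
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: older items win ties
        self.shed: List[str] = []      # items dropped under load

    def push(self, item: str, priority: float) -> None:
        entry = (priority, next(self._seq), item)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif priority > self._heap[0][0]:
            # Full: evict the lowest-priority pending item to make room.
            dropped = heapq.heapreplace(self._heap, entry)
            self.shed.append(dropped[2])
        else:
            self.shed.append(item)  # newcomer is no better than the worst pending item

    def pop(self) -> Optional[str]:
        """Return the highest-priority pending item (oldest first among ties), or None."""
        if not self._heap:
            return None
        best = max(self._heap, key=lambda e: (e[0], -e[1]))
        self._heap.remove(best)
        heapq.heapify(self._heap)
        return best[2]
```

Shedding keeps latency bounded during a spike: crowd workers never fall behind on a backlog of low-value items.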
Self-service for crisis-related classification
Unstructured text reports → Automatic classifier → Categorized information
Crowdsourced ground-truth → Model builder → Automatic classifier
Library of training data
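A sketch of how automatic and crowdsourced components can share the work, assuming a classifier that returns a label with a confidence score: confident predictions go straight to the categorized output, while uncertain reports are queued for crowd workers, whose labels can be added to the training library for the next model rebuild. The `Classifier` interface and the 0.8 threshold are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: returns (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def route_reports(reports: List[str],
                  classify: Classifier,
                  threshold: float = 0.8) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split reports into auto-labeled (report, label) pairs and reports for the crowd."""
    auto_labeled: List[Tuple[str, str]] = []
    for_crowd: List[str] = []
    for report in reports:
        label, confidence = classify(report)
        if confidence >= threshold:
            auto_labeled.append((report, label))
        else:
            for_crowd.append(report)  # crowd label later feeds the training library
    return auto_labeled, for_crowd
```

Raising the threshold sends more items to people (higher quality, higher cost and latency); lowering it leans on automation, which keeps humans from becoming a bottleneck.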
Credibility and verification
Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier: TweetCred: A Real-time Web-based System for Credibility of Content on Twitter. In SocInfo 2014. Runner-up for best paper award.
Carlos Castillo, Marcelo Mendoza, Barbara Poblete: Predicting Information Credibility in Time-Sensitive Social Media. In Internet Research, Vol. 23, Issue 5. October 2013.
A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, P. Meier and I. Rahwan: Information Verification during Natural Disasters. Social Web and Disaster Management (SWDM) workshop, 2013.
http://www.youtube.com/watch?v=pAHoEO-K0Ek
Crowdsourced verification: Veri.ly
• Frame crowdwork correctly
  – Not upvoting/downvoting a claim
  – Instead, providing evidence for/against it
@VeriDotLy — http://veri.ly/
Automatic credibility evaluation: TweetCred
• Real-time web-based service
• Used as a Chrome extension
• Annotates Twitter's timeline with credibility scores
http://twitdigest.iiitd.edu.in/TweetCred/
Next steps
• Credibility facets
  – Factually written
  – Detailed
  – Author on the ground
  – ...
• Respond to searches about an event
Closing remarks
Computationally feasible
Supported by data
Useful
Good projects in this space
Temptation! Danger!
Poorly planned projects :-(
AI-complete problems
Some venues
• SWDM – Workshop on Social Web for Disaster Management
  – Deadline: January 24th
• ISCRAM – International Conference on Information Systems for Crisis Response and Management
+ the usual suspects, depending on your area ;-)
Possibility of large impact by using computer science to support humanitarian work
= Applied computing at its best
Thank you!
Carlos Castillo · [email protected]
http://www.chato.cl/research/
With thanks to Patrick Meier for several slides