Crisis Computing

Crisis Computing: Finding relevant and credible information on social media during disasters

Big Data Analytics Conference, Delhi, India, December 2014

How/when did it start for me?

January 2010

Humanitarian Computing

At least 775 publications:

● Crisis Analysis (55)

● Crisis Management (309)

● Situational Awareness (67)

● Social Media (231)

● Mobile Phones (74)

● Crowdsourcing (116)

● Software and Tools (97)

● Human-Computer Interaction (28)  

● Natural Language Processing (33)  

● Trust and Security (33)

● Geographical Analysis (53)

Source: http://humanitariancomp.referata.com/

Humanitarian Computing Topics

http://www.youtube.com/watch?v=0UFsJhYBxzY

Carlos Castillo – chato@acm.org – http://www.chato.cl/research/

An earthquake hits a Twitter user

• When an earthquake strikes, the first tweets are posted 20-30 seconds later

• Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to light speed over fiber/copper, plus latency

• After ~100km seismic waves may be overtaken by tweets about them
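The ~100 km figure follows from the numbers on this slide. A minimal sketch of the arithmetic, assuming network propagation time is negligible next to the 20-30 second posting delay (the function name is illustrative, not from the talk):

```python
def overtake_distance_km(wave_speed_km_s: float, tweet_delay_s: float) -> float:
    """Distance the seismic wave has covered when the first tweets appear.

    Assumption: once posted, a tweet reaches readers almost instantly,
    so beyond this distance the tweet arrives before the shaking does.
    """
    return wave_speed_km_s * tweet_delay_s


low = overtake_distance_km(3.0, 20.0)   # slowest wave, fastest tweet
high = overtake_distance_km(5.0, 30.0)  # fastest wave, slowest tweet
print(f"{low:.0f}-{high:.0f} km")       # prints "60-150 km"
```

The range of 60-150 km brackets the ~100 km quoted above.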

http://xkcd.com/723/

Examples of crisis tweets

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

Examples of crisis tweets (cont.)

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?

Current collaborators

Patrick Meier – QCRI
Sarah Vieweg – QCRI
Muhammad Imran – QCRI
Irina Temnikova – QCRI
Alexandra Olteanu – EPFL
Aditi Gupta – IIIT Delhi
“P.K.” Kumaraguru – IIIT Delhi
Fernando Diaz – Microsoft

Outline

• Crisis Maps
• Extraction
• Matching
• Verification
• Credibility

Crisis maps from social media

Carlos Castillo, Fernando Diaz, and Hemant Purohit: Leveraging Social Media and Web of Data to Assist Crisis Response Coordination. Tutorial at SDM, Philadelphia, PA, USA. April 2014.

Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”

Crisis mapping goes mainstream (2011)

http://newsbeatsocial.com/watch/0_s6xxcr3p

Understanding Crisis Tweets

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

Types of Disaster

Our approach

1. Filtering
2. Classification
3. Extraction

Filtering

[Diagram: two successive yes/no checks on each tweet: "Is disaster-related?" and then "Contributes to situational awareness?"]
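The two checks on this slide can be read as a cascade that a tweet must pass in full. The sketch below shows only that control flow. The keyword rules are toy stand-ins invented for illustration (a real system would presumably plug in trained classifiers), and all names are hypothetical.

```python
from typing import Callable, Iterable, List

Predicate = Callable[[str], bool]


def filter_tweets(tweets: Iterable[str],
                  is_disaster_related: Predicate,
                  contributes_to_awareness: Predicate) -> List[str]:
    """Keep only tweets that answer "yes" to both checks, in order."""
    kept = []
    for text in tweets:
        if not is_disaster_related(text):
            continue  # first "No": drop
        if not contributes_to_awareness(text):
            continue  # second "No": drop
        kept.append(text)
    return kept


# Toy predicates (assumptions, not the method from the talk).
DISASTER_TERMS = {"tornado", "flood", "earthquake"}
SYMPATHY_TERMS = {"thoughts", "prayers"}


def toy_is_disaster_related(text: str) -> bool:
    return bool(set(text.lower().split()) & DISASTER_TERMS)


def toy_contributes_to_awareness(text: str) -> bool:
    return not (set(text.lower().split()) & SYMPATHY_TERMS)


sample = [
    "There is a tornado watch in effect tonight.",
    "Thoughts and prayers to all affected by the flood",
    "Great coffee this morning",
]
kept = filter_tweets(sample, toy_is_disaster_related, toy_contributes_to_awareness)
```

With these toy rules only the first sample tweet survives: the second is disaster-related but is treated as sympathy only, and the third fails the first check.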

Classification

[Diagram: filtered tweets are classified by information provided (Caution & Advice, Information Sources, Damage & Casualties, Donations, ...) and by source (Gov, Eyewitness, Media, NGO, Outsider, ...)]

A large-scale study of crisis tweets

• Collect tweets from 26 disasters
• Classify according to:
● Informative / Not informative
● Information provided
● Information source

Advice on labeling

• Your instructions will never be correct the first time you try
– e.g. personal / eyewitness
– Instructions must be re-written reactively
– Perform small-scale labeling first
• Instructions must be concrete and brief
– If you can't do it, the task has to be divided

Information Provided in Crisis Tweets

N=26; Data available at http://crisislex.org/

What do people tweet about?

• Affected individuals
– 20% on average (min. 5%, max. 57%)
– most prevalent in human-induced, focalized & instantaneous events
• Sympathy and emotional support
– 20% on average (min. 3%, max. 52%)
– most prevalent in instantaneous events
• Other useful information
– 32% on average (min. 7%, max. 59%)
– least prevalent in diffused events

What do people tweet about? (cont.)

• Infrastructure and utilities
– 7% on average (min. 0%, max. 22%)
– most prevalent in diffused events, in particular floods
• Caution and advice
– 10% on average (min. 0%, max. 34%)
– least prevalent in instantaneous & human-induced events
• Donations and volunteering
– 10% on average (min. 0%, max. 44%)
– most prevalent in natural hazards

Distribution over information sources

Distribution over time

Extracting information and matching emergency-related resources

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: Emergency-Relief Coord. on Social Media: Auto. Matching Resource Requests and Offers. First Monday 19 (1), January 2014.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.

Information Extraction

Classified tweets:

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Extraction

• #hashtags, @user mentions, URLs, etc.
– Regular expressions
– Text library from Twitter
• Temporal expressions
– Part-of-speech tagger + heuristics
– Natty library
• Supervised learning
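The regular-expression route for the first bullet can be sketched with the standard library alone. The patterns below are deliberately simple assumptions, not the rules used in the talk or in Twitter's text library: they keep trailing punctuation on URLs and would also match the "@domain" part of an e-mail address.

```python
import re
from typing import Dict, List

# Simplified patterns, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Pull URLs, hashtags and user mentions out of a tweet."""
    urls = URL_RE.findall(text)
    # Blank out URLs first so that "#fragment" parts are not taken as hashtags.
    without_urls = URL_RE.sub(" ", text)
    return {
        "urls": urls,
        "hashtags": HASHTAG_RE.findall(without_urls),
        "mentions": MENTION_RE.findall(without_urls),
    }


entities = extract_entities(
    "Bridges closed #Sandy #NYC via @weatherchannel http://example.org/a#b"
)
```

Here `entities["hashtags"]` holds `#Sandy` and `#NYC`, while the `#b` fragment stays inside the URL.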

Labels for extraction

• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase from each tweet

Learning: Conditional Random Fields

• Used extensively in NLP for part-of-speech tagging and information extraction

• Representation of observations is important (capitalization, position, etc.)

[Figure: graphical models of an HMM and a linear-chain CRF, showing hidden and observed variables]
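To make "representation of observations" concrete, here is a minimal sketch of per-token features of the kind a linear-chain CRF toolkit typically consumes. The feature set is an illustrative assumption. It is not the one used in the talk or by any particular tool, and the function names are hypothetical.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Describe token i by its surface form, shape, position and neighbors."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "starts_with_hash": tok.startswith("#"),
        "starts_with_at": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        # Neighboring words give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dictionary per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


feats = sentence_features("There is a tornado watch in effect tonight".split())
```

Each tweet then becomes a sequence of feature dictionaries paired with a sequence of labels, for instance marking which tokens belong to the phrase an evaluator copied.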

Tool

• CMU ARK Twitter NLP
– Tokenization
– Feature extraction
– CRF learning
• Very easy to use: simply change the training set (part-of-speech tags) into anything, and re-train

Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Extractor evaluation

Setting | Recall | Precision
Train 2/3 Joplin, Test 1/3 Joplin | 78% | 90%
Train 2/3 Sandy, Test 1/3 Sandy | 41% | 79%
Train Joplin, Test Sandy | 11% | 78%
Train Joplin + 10% Sandy, Test 90% Sandy | 21% | 81%

• Precision is: one word or more in common with what humans extracted
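The lenient matching rule can be written down directly. In the sketch below, precision follows the slide's definition (a prediction is correct if it shares at least one word with the human extraction). The recall definition is an assumption, since the slide does not spell it out: correct extractions divided by the number of tweets where humans extracted something. Function names are hypothetical.

```python
from typing import List, Optional, Tuple


def overlaps(predicted: str, gold: str) -> bool:
    """True if the two phrases share at least one (lowercased) word."""
    return bool(set(predicted.lower().split()) & set(gold.lower().split()))


def precision_recall(pairs: List[Tuple[Optional[str], Optional[str]]]) -> Tuple[float, float]:
    """pairs holds (predicted, gold) per tweet; None means nothing was extracted."""
    n_predicted = sum(1 for p, _ in pairs if p)
    n_gold = sum(1 for _, g in pairs if g)
    n_correct = sum(1 for p, g in pairs if p and g and overlaps(p, g))
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0  # assumed definition
    return precision, recall


pairs = [
    ("tornado watch", "a tornado watch in effect"),   # overlap: correct
    (None, "bridges must close by 7pm"),              # missed by the extractor
    ("Central Park", "wind gusts over 60 mph"),       # no overlap: wrong
    ("donate blood", "send money or donate blood"),   # overlap: correct
]
precision, recall = precision_recall(pairs)
```

On this made-up data, two of three predictions are correct and two of four human extractions are recovered.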

Donations matching

• Identify and match requests/offers for donations
– Money, clothing, food, shelter, volunteers, blood

Average precision = 0.21 (0.16 if only text similarity is used)
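For orientation, a text-similarity-only matcher of the kind the baseline figure refers to might look like the sketch below. It ranks offers against a request by cosine similarity over raw word counts. This is an illustrative assumption, not the matching method from the cited paper, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_offers(request: str, offers: List[str]) -> List[Tuple[float, str]]:
    """Return (score, offer) pairs, best match first."""
    req = bag_of_words(request)
    scored = [(cosine(req, bag_of_words(offer)), offer) for offer in offers]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranked = rank_offers(
    "need blankets and food for shelter",
    ["selling concert tickets tonight", "we have blankets and food to donate"],
)
```

The low baseline number on the slide suggests why word overlap alone is not enough: requests and offers for the same resource often share few words.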

Crowdsourced stream processing systems

Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems. http://arxiv.org/abs/1310.5463

Design objectives and principles

Design objective | Example metric | Design principles: automatic components | Design principles: crowdsourced components
Low latency | End-to-end time | Keep items moving | Trivial tasks
High throughput | Output items per unit of time | High-performance processing | Task automation
Load adaptability | Rate response function | Load shedding, load queueing | Task prioritization
Cost effectiveness | Cost vs. quality, throughput, etc. | N/A | Task frugality
High quality | Application-dependent | Redundancy, aggregation and quality control
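Load shedding, listed above for automatic components, can be illustrated with a bounded buffer that discards items when the input rate exceeds what downstream stages can absorb. The sketch below drops the oldest item when full. That policy, the class name and the counter are assumptions made for illustration, not details from the cited paper.

```python
from collections import deque
from typing import Any, Optional


class SheddingQueue:
    """Bounded FIFO buffer that sheds the oldest item when it is full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._items: deque = deque()
        self._capacity = capacity
        self.shed_count = 0  # how many items were dropped so far

    def put(self, item: Any) -> None:
        if len(self._items) >= self._capacity:
            self._items.popleft()  # shed: newer items are assumed more useful
            self.shed_count += 1
        self._items.append(item)

    def get(self) -> Optional[Any]:
        """Return the oldest buffered item, or None if the buffer is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


queue = SheddingQueue(capacity=2)
for tweet in ["a", "b", "c"]:
    queue.put(tweet)
```

After the loop the buffer holds "b" and "c", and `queue.shed_count` is 1. Load queueing would correspond to a larger capacity, trading memory and latency for fewer dropped items.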

Design patterns

● QA loop

● Task assignment

● Process/verify

● Supervised learning

● Crowdwork sub-task chaining

● Humans are not a bottleneck

● Humans review every output element
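Of the design principles for crowdsourced components, redundancy with aggregation is perhaps the easiest to make concrete: the same item is sent to several workers and their answers are combined. A minimal sketch using strict majority voting follows. The threshold and the function name are illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional


def aggregate_labels(labels: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label if its share of votes exceeds min_agreement.

    None signals that the workers did not agree enough, so the item could be
    sent out for more labels or to a reviewer.
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None


agreed = aggregate_labels(["donation", "donation", "caution"])
undecided = aggregate_labels(["donation", "caution"])
```

Here `agreed` is "donation" (two of three votes) and `undecided` is None, because a one-to-one split does not exceed the threshold.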

http://aidr.qcri.org/

Self-service for crisis-related classification

[Diagram components: unstructured text reports, automatic classifier, categorized information, model builder, crowdsourced ground truth, library of training data]
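The idea behind these components is that labels supplied by the crowd feed a model that then classifies new reports automatically. The sketch below shows that loop with a tiny multinomial Naive Bayes classifier written from scratch. The choice of learner, the class name and the smoothing details are assumptions for illustration only and say nothing about how AIDR is implemented.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


class TinyNaiveBayes:
    """Incrementally trained text classifier (illustrative stand-in)."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def add_label(self, text: str, label: str) -> None:
        """Record one crowdsourced (text, label) pair."""
        words = text.lower().split()
        self.label_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text: str) -> Optional[str]:
        """Return the most probable label, or None before any training."""
        if not self.label_counts:
            return None
        words = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = TinyNaiveBayes()
clf.add_label("please donate blood to red cross", "donations")
clf.add_label("send money to help victims", "donations")
clf.add_label("tornado watch in effect tonight", "caution")
clf.add_label("bridges must close by 7pm", "caution")
```

With these four labeled examples, `clf.classify("donate money")` returns "donations" and `clf.classify("tornado watch")` returns "caution". New crowd labels can be added at any time through `add_label`.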

Credibility and verification

Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier: TweetCred: A Real-time Web-based System for Credibility of Content on Twitter. In SocInfo 2014. Runner-up for best paper award.

Carlos Castillo, Marcelo Mendoza, Barbara Poblete: Predicting Information Credibility in Time-Sensitive Social Media. In Internet Research, Vol. 23, Issue 5. October 2013.

A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, P. Meier and I. Rahwan: Information Verification during Natural Disasters. Social Web and Disaster Management (SWDM) workshop, 2013.

http://www.youtube.com/watch?v=pAHoEO-K0Ek

Crowdsourced verification: Veri.ly

• Frame crowdwork correctly
• Not upvoting/downvoting a claim
• Instead, providing evidence for/against

@VeriDotLy — http://veri.ly/

Examples of evidence provided

Automatic credibility evaluation: TweetCred

• Real-time web-based service
• Used as a Chrome extension
• Annotates Twitter's timeline with credibility scores

http://twitdigest.iiitd.edu.in/TweetCred/

Next steps

• Credibility facets
– Factually written
– Detailed
– Author on the ground
– ...

• Respond to searches about an event

Closing remarks

[Diagram labels: "Computationally feasible", "Supported by data", "Useful"; "Good projects in this space"; "Temptation!", "Danger!", "Poorly planned projects :-(", "AI-complete problems"]

Some venues

• SWDM – Workshop on Social Web for Disaster Management
– Deadline: January 24th

• ISCRAM – International Conference on Information Systems for Crisis Response and Management

+ the usual suspects, depending on your area ;-)

Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best

Thank you!

Carlos Castillo · chato@acm.org

http://www.chato.cl/research/

With thanks to Patrick Meier for several slides
