Crisis Computing: Finding relevant and credible information on social media during disasters. Big Data Analytics Conference, Delhi, India, December 2014

Page 1: Crisis Computing

Crisis Computing
Finding relevant and credible information on social media during disasters

Big Data Analytics Conference
Delhi, India, December 2014

Page 2: Crisis Computing

How/when did it start for me?

January 2010

Page 3: Crisis Computing

Humanitarian Computing

At least 775 publications:

● Crisis Analysis (55)

● Crisis Management (309)

● Situational Awareness (67)

● Social Media (231)

● Mobile Phones (74)

● Crowdsourcing (116)

● Software and Tools (97)

● Human-Computer Interaction (28)  

● Natural Language Processing (33)  

● Trust and Security (33)

● Geographical Analysis (53)

Source: http://humanitariancomp.referata.com/

Page 4: Crisis Computing

Humanitarian Computing Topics

Page 5: Crisis Computing
Page 6: Crisis Computing
Page 7: Crisis Computing

http://www.youtube.com/watch?v=0UFsJhYBxzY

Page 8: Crisis Computing

Carlos Castillo – [email protected] – http://www.chato.cl/research/

An earthquake hits a Twitter user

• When an earthquake strikes, the first tweets are posted 20-30 seconds later

• Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to light speed over fiber/copper, plus latency

• After ~100 km, seismic waves may be overtaken by tweets about them

http://xkcd.com/723/
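
The ~100 km figure follows from the numbers on this slide. A minimal sketch of the arithmetic, assuming the tweet reaches a distant reader with negligible propagation delay (the function name and the `network_latency_s` parameter are illustrative, not from the talk):

```python
def overtake_distance_km(tweet_delay_s, wave_speed_km_s, network_latency_s=0.0):
    """Distance beyond which a tweet arrives before the seismic wave.

    The wave needs d / wave_speed seconds to reach distance d; the tweet
    needs tweet_delay + network_latency seconds regardless of d (assumption:
    propagation time over fiber is negligible at these distances).
    """
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


# Bounds from the slide: first tweets after 20-30 s, waves at 3-5 km/s.
nearest = overtake_distance_km(tweet_delay_s=20, wave_speed_km_s=3)
farthest = overtake_distance_km(tweet_delay_s=30, wave_speed_km_s=5)
print(f"Tweets overtake the waves somewhere between {nearest:.0f} and {farthest:.0f} km")
```

The two bounds bracket the ~100 km quoted above; any extra network latency pushes the crossover point further out.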

Page 9: Crisis Computing

Examples of crisis tweets

Page 10: Crisis Computing

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

Examples of crisis tweets (cont.)

Page 11: Crisis Computing

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities

Page 12: Crisis Computing

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?

Page 13: Crisis Computing

Current collaborators

Patrick Meier – QCRI
Sarah Vieweg – QCRI
Muhammad Imran – QCRI
Irina Temnikova – QCRI
Alexandra Olteanu – EPFL
Aditi Gupta – IIIT Delhi
“P.K.” Kumaraguru – IIIT Delhi
Fernando Diaz – Microsoft

Page 14: Crisis Computing

Outline

Crisis Maps
Extraction
Matching
Verification
Credibility

Page 15: Crisis Computing

Crisis maps from social media

Carlos Castillo, Fernando Diaz, and Hemant Purohit: Leveraging Social Media and Web of Data to Assist Crisis Response Coordination. Tutorial at SDM, Philadelphia, PA, USA. April 2014.

Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.

Page 16: Crisis Computing
Page 17: Crisis Computing
Page 18: Crisis Computing
Page 19: Crisis Computing
Page 20: Crisis Computing

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”

Page 21: Crisis Computing

Crisis mapping goes mainstream (2011)

Page 22: Crisis Computing
Page 23: Crisis Computing
Page 24: Crisis Computing
Page 25: Crisis Computing
Page 26: Crisis Computing

http://newsbeatsocial.com/watch/0_s6xxcr3p

Page 27: Crisis Computing
Page 28: Crisis Computing

Understanding Crisis Tweets

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

Page 29: Crisis Computing

Types of Disaster

Page 30: Crisis Computing

Our approach

1. Filtering
2. Classification
3. Extraction

Page 31: Crisis Computing

Filtering

Is disaster-related? → Yes / No
Contributes to situational awareness? → Yes / No
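
The two questions can be read as a cascade of binary classifiers. A minimal sketch, assuming the questions are applied in sequence and using keyword lookups as hypothetical stand-ins for trained classifiers (the term lists and function names are illustrative, not from the talk):

```python
# Hypothetical term lists standing in for trained binary classifiers.
DISASTER_TERMS = {"tornado", "earthquake", "flood", "hurricane", "#sandy"}
AWARENESS_TERMS = {"closed", "closing", "damage", "injured", "shelter", "warning", "watch"}


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {tok.strip(".,!?:;\"'").lower() for tok in text.split()}


def is_disaster_related(text):
    return bool(tokens(text) & DISASTER_TERMS)


def contributes_to_awareness(text):
    return bool(tokens(text) & AWARENESS_TERMS)


def passes_filter(text):
    """Keep a tweet only if both questions are answered 'yes'."""
    return is_disaster_related(text) and contributes_to_awareness(text)


stream = [
    "There is a tornado watch in effect tonight.",
    "Thoughts and prayers to all affected by the tornado",
    "Great coffee this morning",
]
kept = [t for t in stream if passes_filter(t)]
```

In this toy stream only the first message survives: the second is disaster-related but carries no situational information, and the third is unrelated.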

Page 32: Crisis Computing

Classification

Filtered tweets → Caution & Advice · Information Sources · Damage & Casualties · Donations · ...

Gov · Eyewitness · Media · NGO · Outsider · ...

Page 33: Crisis Computing

A large-scale study of crisis tweets

• Collect tweets from 26 disasters
• Classify according to:
  ● Informative / Not informative
  ● Information provided
  ● Information source

Page 34: Crisis Computing

Advice on labeling

• Your instructions will never be correct the first time you try
  – e.g. personal / eyewitness
  – Instructions must be re-written reactively
  – Perform small-scale labeling first
• Instructions must be concrete and brief
  – If that is not possible, the task has to be divided

Page 35: Crisis Computing

Information Provided in Crisis Tweets

N=26; Data available at http://crisislex.org/

Page 36: Crisis Computing

What do people tweet about?

• Affected individuals
  – 20% on average (min. 5%, max. 57%)
  – most prevalent in human-induced, focalized & instantaneous events
• Sympathy and emotional support
  – 20% on average (min. 3%, max. 52%)
  – most prevalent in instantaneous events
• Other useful information
  – 32% on average (min. 7%, max. 59%)
  – least prevalent in diffused events

Page 37: Crisis Computing

What do people tweet about? (cont.)

• Infrastructure and utilities
  – 7% on average (min. 0%, max. 22%)
  – most prevalent in diffused events, in particular floods
• Caution and advice
  – 10% on average (min. 0%, max. 34%)
  – least prevalent in instantaneous & human-induced events
• Donations and volunteering
  – 10% on average (min. 0%, max. 44%)
  – most prevalent in natural hazards

Page 38: Crisis Computing

Distribution over information sources

Page 39: Crisis Computing

Distribution over time

Page 40: Crisis Computing

Extracting information and matching emergency-related resources

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: Emergency-Relief Coord. on Social Media: Auto. Matching Resource Requests and Offers. First Monday 19 (1), January 2014.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.

Page 41: Crisis Computing

Information Extraction

Classified tweets → ...

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Page 42: Crisis Computing

Extraction

• #hashtags, @user mentions, URLs, etc.
  – Regular expressions
  – Text library from Twitter
• Temporal expressions
  – Part-of-speech tagger + heuristics
  – Natty library
• Supervised learning
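
For the first bullet, a minimal regular-expression sketch is shown below. The patterns are deliberate simplifications written for illustration; they are not the rules used by Twitter's text library, which handles many more edge cases.

```python
import re

# Simplified, illustrative patterns (assumptions, not Twitter's official rules).
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(tweet):
    """Return hashtags, user mentions and URLs found in a tweet."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that '#fragment' parts are not read as hashtags.
    text = URL_RE.sub(" ", tweet)
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "urls": urls,
    }


tweet = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported at "
         "Central Park and JFK airport in #NYC this hour. #Sandy")
entities = extract_entities(tweet)
```

The negative lookbehind `(?<!\w)` keeps the patterns from firing inside e-mail addresses or mid-word `#` characters.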

Page 43: Crisis Computing

Labels for extraction

• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase from each tweet

Page 44: Crisis Computing

Learning: Conditional Random Fields

• Used extensively in NLP for part-of-speech tagging and information extraction

• Representation of observations is important (capitalization, position, etc.)

[Figure: graphical models of an HMM and a linear-chain CRF, showing hidden and observed variables]
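
To make the point about observation representation concrete, here is a minimal sketch of per-token features of the kind typically fed to a linear-chain CRF. The feature names are hypothetical and chosen for illustration; they are not the feature set of any particular tool.

```python
def token_features(tokens, i):
    """Describe token i by surface form, capitalization, position and context."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


def sentence_features(tokens):
    """One feature dict per token: the observation sequence for the CRF."""
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features("Bridges must close by 7pm".split())
```

Each dictionary describes one observed token; a CRF trainer would pair this sequence with one label per token.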

Page 45: Crisis Computing

Tool

• CMU ARK Twitter NLP
  – Tokenization
  – Feature extraction
  – CRF learning
• Very easy to use: simply replace the labels in the training set (originally part-of-speech tags) with any other tag set, and re-train

Page 46: Crisis Computing

Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Page 47: Crisis Computing

Extractor evaluation

Setting                                     Rec   Prec
Train 2/3 Joplin, Test 1/3 Joplin           78%   90%
Train 2/3 Sandy, Test 1/3 Sandy             41%   79%
Train Joplin, Test Sandy                    11%   78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%   81%

• An extraction counts as correct if it has one word or more in common with what humans extracted
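
The matching criterion can be sketched as follows. The slide only defines when an extraction counts as correct; how the counts are turned into precision and recall below is an assumption for illustration, as are the function names and the toy data.

```python
def words(text):
    return set(text.lower().split())


def is_match(predicted, human):
    """Lenient criterion from the slide: one word or more in common."""
    return bool(words(predicted) & words(human))


def lenient_scores(predictions, gold):
    """predictions, gold: dicts mapping tweet id -> extracted phrase.

    Assumed definitions: precision = matches / extractions produced,
    recall = matches / human extractions available.
    """
    matches = sum(
        1 for tid, phrase in predictions.items()
        if tid in gold and is_match(phrase, gold[tid])
    )
    precision = matches / len(predictions) if predictions else 0.0
    recall = matches / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for this example.
gold = {
    1: "tornado watch in effect",
    2: "bridges must close by 7pm",
    3: "wind gusts over 60 mph",
    4: "send money or donate blood",
}
predictions = {1: "tornado watch", 2: "Staten Island"}
precision, recall = lenient_scores(predictions, gold)
```

Under these assumed definitions an extractor that answers rarely but accurately scores high precision and low recall, which is the pattern in the cross-event rows of the table.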

Page 48: Crisis Computing

Donations matching

• Identify and match requests/offers for donations
  – Money, clothing, food, shelter, volunteers, blood

Average precision = 0.21 (0.16 if only text similarity is used)
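
For reference, average precision over a ranked list of candidate matches can be computed as below. This is the standard information-retrieval definition; whether the reported 0.21 was computed exactly this way is not stated on the slide, and the Jaccard word overlap used for ranking is only a stand-in for a text-similarity baseline. All names and the toy data are illustrative.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def rank_offers(request, offers):
    """Order offers by decreasing similarity to the request."""
    return sorted(offers, key=lambda offer: jaccard(request, offer), reverse=True)


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Toy request and offers, invented for this example.
request = "need warm clothing for families in shelter"
offers = [
    "we can donate blood this weekend",
    "offering warm clothing and blankets",
    "have spare clothing for families",
]
ranked = rank_offers(request, offers)
ap = average_precision(ranked, relevant={offers[1], offers[2]})
```

Average precision rewards rankings that place true matches near the top, so it suits a setting where a coordinator only reviews the first few suggested matches.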

Page 49: Crisis Computing

Crowdsourced stream processing systems

Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems. http://arxiv.org/abs/1310.5463

Page 50: Crisis Computing

Page 51: Crisis Computing

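Design objectives and principles

Design objective: Low latency
  Example metric: End-to-end time
  Principle (automatic components): Keep items moving
  Principle (crowdsourced components): Trivial tasks

Design objective: High throughput
  Example metric: Output items per unit of time
  Principle (automatic components): High-performance processing
  Principle (crowdsourced components): Task automation

Design objective: Load adaptability
  Example metric: Rate response function
  Principle (automatic components): Load shedding, load queueing
  Principle (crowdsourced components): Task prioritization

Design objective: Cost effectiveness
  Example metric: Cost vs. quality, throughput, etc.
  Principle (automatic components): N/A
  Principle (crowdsourced components): Task frugality

Design objective: High quality
  Example metric: Application-dependent
  Principle: Redundancy, aggregation and quality control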
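
The redundancy, aggregation and quality control principle can be illustrated with a simple majority vote over redundant crowd labels. This is a minimal sketch; the agreement threshold, minimum vote count and function name are assumptions made for illustration, not parameters described in the talk.

```python
from collections import Counter


def aggregate_labels(labels, min_votes=3, min_agreement=2 / 3):
    """Return the majority label, or None if the item needs more labels.

    labels: labels given to the same item by different workers (redundancy).
    An item is accepted only when enough workers have voted and a large
    enough share of them agree (quality control).
    """
    if len(labels) < min_votes:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


accepted = aggregate_labels(["donations", "donations", "caution"])
undecided = aggregate_labels(["donations", "caution", "damage"])
too_few = aggregate_labels(["donations", "donations"])
```

Items that come back as `None` would be sent to additional workers, which trades cost for quality.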

Page 52: Crisis Computing

Design patterns

● QA loop

● Task assignment

● Process/verify

● Supervised learning

● Crowdwork sub-task chaining

● Humans are not a bottleneck

● Humans review every output element

Page 53: Crisis Computing

http://aidr.qcri.org/

Page 54: Crisis Computing

Self-service for crisis-related classification

Unstructured text reports
Categorized information
Automatic classifier
Model builder
Crowdsourced ground-truth
Library of training data

Page 55: Crisis Computing
Page 56: Crisis Computing
Page 57: Crisis Computing

Credibility and verification

Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier: TweetCred: A Real-time Web-based System for Credibility of Content on Twitter. In SocInfo 2014. Runner-up for best paper award.

Carlos Castillo, Marcelo Mendoza, Barbara Poblete: Predicting Information Credibility in Time-Sensitive Social Media. In Internet Research, Vol. 23, Issue 5. October 2013.

A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, P. Meier and I. Rahwan: Information Verification during Natural Disasters. Social Web and Disaster Management (SWDM) workshop, 2013.

Page 58: Crisis Computing
Page 59: Crisis Computing

3

Page 60: Crisis Computing
Page 61: Crisis Computing

http://www.youtube.com/watch?v=pAHoEO-K0Ek

Page 62: Crisis Computing

Crowdsourced verification: Veri.ly

• Frame crowdwork correctly
• Not upvoting/downvoting a claim
• Instead, providing evidence for/against

@VeriDotLy — http://veri.ly/

Page 63: Crisis Computing
Page 64: Crisis Computing
Page 65: Crisis Computing

Examples of evidence provided

Page 66: Crisis Computing

Automatic credibility evaluation: TweetCred

• Real-time web-based service
• Used as a Chrome extension
• Annotates Twitter's timeline with credibility scores

Page 67: Crisis Computing

http://twitdigest.iiitd.edu.in/TweetCred/

Page 68: Crisis Computing

Next steps

• Credibility facets
  – Factually written
  – Detailed
  – Author on the ground
  – ...
• Respond to searches about an event

Page 69: Crisis Computing
Page 70: Crisis Computing

Closing remarks

Page 71: Crisis Computing

Computationally feasible

Supported by data

Useful

Good projects in this space

Page 72: Crisis Computing

Computationally feasible

Supported by data

Useful

Good projects in this space

Temptation! Danger!

Poorly planned projects :-(

AI-complete problems

Page 73: Crisis Computing

Some venues

• SWDM – Workshop on Social Web for Disaster Management
  – Deadline: January 24th

• ISCRAM – International Conference on Information Systems for Crisis Response and Management

+ the usual suspects, depending on your area ;-)

Page 74: Crisis Computing

Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best

Page 75: Crisis Computing

Thank you!

Carlos Castillo · [email protected]
http://www.chato.cl/research/

With thanks to Patrick Meier for several slides