
Crisis Computing: Finding relevant and credible information on social media during disasters

Big Data Analytics Conference, Delhi, India, December 2014

How/when did it start for me?

January 2010

Humanitarian Computing

At least 775 publications:

● Crisis Analysis (55)

● Crisis Management (309)

● Situational Awareness (67)

● Social Media (231)

● Mobile Phones (74)

● Crowdsourcing (116)

● Software and Tools (97)

● Human-Computer Interaction (28)

● Natural Language Processing (33)

● Trust and Security (33)

● Geographical Analysis (53)

Source: http://humanitariancomp.referata.com/


Humanitarian Computing Topics

http://www.youtube.com/watch?v=0UFsJhYBxzY


Carlos Castillo – http://www.chato.cl/research/

An earthquake hits a Twitter user

• When an earthquake strikes, the first tweets are posted 20-30 seconds later

• Damaging seismic waves travel at 3-5 km/s, while network signals travel at nearly the speed of light over fiber/copper, plus latency

• Beyond ~100 km from the epicenter, tweets about the earthquake may arrive before the seismic waves do

http://xkcd.com/723/
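The ~100 km figure follows from the numbers above. A tweet posted 20-30 s after the quake outruns waves moving at 3-5 km/s wherever the waves still need more than 20-30 s to arrive, which works out to roughly 60-150 km. Below is a minimal sketch of that arithmetic. It assumes network propagation can be folded into a fixed latency term. The function names, the 4 km/s and 25 s defaults, and the zero default latency are illustrative assumptions.

```python
def crossover_distance_km(wave_speed_km_s: float, tweet_delay_s: float,
                          network_latency_s: float = 0.0) -> float:
    """Distance from the epicenter beyond which tweets outrun the seismic waves."""
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


def tweet_arrives_first(distance_km: float, wave_speed_km_s: float = 4.0,
                        tweet_delay_s: float = 25.0,
                        network_latency_s: float = 0.0) -> bool:
    """True if a tweet from the epicenter reaches someone at `distance_km`
    before the damaging waves do (defaults are mid-range assumptions)."""
    wave_arrival_s = distance_km / wave_speed_km_s
    tweet_arrival_s = tweet_delay_s + network_latency_s
    return tweet_arrival_s < wave_arrival_s


# Bracket the crossover using the ranges quoted on the slide.
for speed in (3.0, 5.0):
    for delay in (20.0, 30.0):
        print(f"{speed} km/s, first tweet after {delay:.0f} s: "
              f"crossover at {crossover_distance_km(speed, delay):.0f} km")
```

With the slide's ranges, the crossover falls between 60 and 150 km, which is consistent with the ~100 km rule of thumb.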


Examples of crisis tweets

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

Examples of crisis tweets (cont.)

http://chato.cl/papers/olteanu_vieweg_castillo_2015_social_media_crises_transversal_study.pdf



Fertile ground for applied research
✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ A better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?





Current collaborators

Patrick Meier – QCRI

Sarah Vieweg – QCRI

Muhammad Imran – QCRI

Irina Temnikova – QCRI

Alexandra Olteanu – EPFL

Aditi Gupta – IIIT Delhi

“P.K.” Kumaraguru – IIIT Delhi

Fernando Diaz – Microsoft





Outline

• Crisis maps
• Extraction
• Matching
• Verification
• Credibility



Crisis maps from social media

Carlos Castillo, Fernando Diaz, and Hemant Purohit: Leveraging Social Media and Web of Data to Assist Crisis Response Coordination. Tutorial at SDM, Philadelphia, PA, USA. April 2014.

Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.

http://www.slideshare.net/knoesis/siam-sdm2014tutorialsocialmediawebdataforcrisisresponsecoordination

http://www.knoesis.org/hemant/present/icwsm2013

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”



Crisis mapping goes mainstream (2011)



http://newsbeatsocial.com/watch/0_s6xxcr3p


Understanding Crisis Tweets

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

http://chato.cl/papers/olteanu_vieweg_castillo_2015_social_media_crises_transversal_study.pdf



Types of Disaster





Our approach

1. Filtering
2. Classification
3. Extraction





Filtering

1. Is it disaster-related? (yes/no)
2. Does it contribute to situational awareness? (yes/no)

Tweets answering yes to both questions pass the filter.
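The two yes/no questions form a cascade: a tweet moves on to the second question only if it passes the first. Below is a minimal sketch of that control flow. The keyword sets are hypothetical stand-ins for the trained classifiers a real system would use, and the names `DISASTER_TERMS`, `SITUATIONAL_TERMS` and `filter_tweets` are illustrative only.

```python
import re
from typing import Callable, Iterable, List, Set

# Hypothetical keyword lists standing in for the two trained classifiers.
DISASTER_TERMS = {"earthquake", "flood", "tornado", "hurricane", "#sandy"}
SITUATIONAL_TERMS = {"closed", "damage", "injured", "evacuate", "warning", "watch"}


def tokens(text: str) -> Set[str]:
    """Lower-cased words, keeping a leading '#' so hashtags stay distinct."""
    return set(re.findall(r"#?\w+", text.lower()))


def is_disaster_related(text: str) -> bool:
    return bool(tokens(text) & DISASTER_TERMS)


def contributes_to_awareness(text: str) -> bool:
    return bool(tokens(text) & SITUATIONAL_TERMS)


def filter_tweets(tweets: Iterable[str],
                  stages: List[Callable[[str], bool]]) -> List[str]:
    """Keep a tweet only if every stage answers "yes".

    all() stops at the first "no", so later (possibly costlier) stages
    only see tweets that passed the earlier ones."""
    return [t for t in tweets if all(stage(t) for stage in stages)]


stream = [
    "Tornado watch in effect tonight",
    "Thoughts and prayers to everyone affected by the earthquake",
    "Great game tonight!",
]
print(filter_tweets(stream, [is_disaster_related, contributes_to_awareness]))
```

Ordering the stages from broad to specific means the second question only has to handle disaster-related traffic.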





Classification

Filtered tweets are labeled along two dimensions:
• Information type: Caution & advice · Damage & casualties · Donations · ...
• Information source: Gov · Eyewitness · Media · NGO · Outsider · ...





A large-scale study of crisis tweets

• Collect tweets from 26 disasters
• Classify according to:
  ● Informative / not informative
  ● Information provided
  ● Information source





Advice on labeling

• Your instructions will never be correct the first time you try
  – e.g., personal / eyewitness
  – Re-write instructions reactively, as problems surface
  – Perform small-scale labeling first
• Instructions must be concrete and brief
  – If they cannot be, divide the task





Information Provided in Crisis Tweets

N = 26 crises; data available at http://crisislex.org/





What do people tweet about?

• Affected individuals
  – 20% on average (min. 5%, max. 57%)
  – most prevalent in human-induced, focalized & instantaneous events
• Sympathy and emotional support
  – 20% on average (min. 3%, max. 52%)
  – most prevalent in instantaneous events
• Other useful information
  – 32% on average (min. 7%, max. 59%)
  – least prevalent in diffused events





What do people tweet about? (cont.)

• Infrastructure and utilities
  – 7% on average (min. 0%, max. 22%)
  – most prevalent in diffused events, in particular floods
• Caution and advice
  – 10% on average (min. 0%, max. 34%)
  – least prevalent in instantaneous & human-induced events
• Donations and volunteering
  – 10% on average (min. 0%, max. 44%)
  – most prevalent in natural hazards
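Each "X% on average (min. Y%, max. Z%)" line summarizes one category's share of tweets across the 26 events. Below is a minimal sketch of that summary. It assumes the average is taken per event, so every event counts once regardless of size; the slides do not say whether the average is per event or pooled over all tweets. The category names and counts are toy values, not data from the study.

```python
from statistics import mean
from typing import Dict, Tuple


def category_share_summary(counts: Dict[str, Dict[str, int]],
                           category: str) -> Tuple[float, float, float]:
    """Return (mean, min, max) of a category's per-event share of tweets.

    `counts` maps event -> {category: tweet count}. Every event contributes
    one share regardless of its size, i.e. a per-event (macro) average."""
    shares = []
    for per_category in counts.values():
        total = sum(per_category.values())
        shares.append(per_category.get(category, 0) / total if total else 0.0)
    return mean(shares), min(shares), max(shares)


# Toy counts for two hypothetical events (not data from the study).
toy = {
    "event_a": {"affected_individuals": 10, "sympathy": 30, "other_useful": 60},
    "event_b": {"affected_individuals": 50, "sympathy": 25, "other_useful": 25},
}
print(category_share_summary(toy, "affected_individuals"))
```

A pooled average would instead divide total category tweets by total tweets, which lets large events dominate the figure.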



Distribution over information sources

Distribution over time

Extracting information and matching emergency-related resources

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers. First Monday 19 (1), January 2014.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.

http://chato.cl/papers/imran_elbassuoni_castillo_diaz_meier_2013_extracting_information_nuggets_disasters.pdf

http://dx.doi.org/10.5210/fm.v19i1.4848

http://chato.cl/papers/imran_elbassuoni_castillo_diaz_meier_2013_practical_extraction_disaster_crisis.pdf



Information Extraction

Input: classified tweets, e.g.

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.





Extraction

• #hashtags, @user mentions, URLs, etc.
  – Regular expressions
  – Text library from Twitter
• Temporal expressions
  – Part-of-speech tagger + heuristics
  – Natty library
• Supervised learning
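The first extraction step, pulling #hashtags, @mentions and URLs, can be handled by regular expressions. Below is a minimal sketch. The patterns are simplified assumptions written for this example, not the rules of Twitter's text library, which handles many more edge cases.

```python
import re
from typing import Dict, List

# Simplified patterns (assumption); Twitter's own text library covers many
# more edge cases than these.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")   # lookbehind skips e-mail addresses
URL_RE = re.compile(r"https?://\S+")        # greedy: trailing punctuation is kept


def extract_entities(tweet: str) -> Dict[str, List[str]]:
    """Pull hashtags, @mentions and URLs out of a tweet's text."""
    return {
        "hashtags": HASHTAG_RE.findall(tweet),
        "mentions": MENTION_RE.findall(tweet),
        "urls": URL_RE.findall(tweet),
    }


example = ("RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. "
           "Bridges must close by 7pm. #Sandy #NYC")
print(extract_entities(example))
```

The lookbehinds require that `#` or `@` is not preceded by a word character. This prevents, for example, the `@` in an e-mail address from being read as a mention.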





Labels for extraction
• Type-dependent instructions
• Ask evaluators to copy-paste a word/phrase from each tweet





Learning: Conditional Random Fields

• Used extensively in NLP for part-of-speech tagging and information extraction

• Representation of observations is important (capitalization, position, etc.)

[Figure: HMM vs. linear-chain CRF, showing hidden and observed variables]
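In a linear-chain CRF, P(y | x) ∝ exp(score(y)). The score sums per-token terms computed from observation features (such as capitalization) and transition terms between adjacent labels. Because the normalizer does not depend on y, the most probable label sequence is simply the highest-scoring one, and the Viterbi algorithm finds it by dynamic programming. Below is a minimal decoding sketch. It assumes two hypothetical tags and hand-set weights in place of learned ones. Training is not shown, and this is not the CMU ARK tool's code.

```python
import itertools
from typing import Dict, List, Sequence

LABELS = ["O", "NUGGET"]  # hypothetical tags: outside vs. inside an information nugget

# Hand-set weights standing in for parameters a CRF would learn (assumption).
EMISSION_WEIGHTS = {
    ("O", "bias"): 0.5,
    ("NUGGET", "capitalized"): 1.2,
    ("NUGGET", "has_digit"): 2.0,
}
TRANSITIONS = {
    ("O", "O"): 0.3, ("O", "NUGGET"): -0.2,
    ("NUGGET", "O"): -0.2, ("NUGGET", "NUGGET"): 0.8,
}


def emission_scores(token: str) -> Dict[str, float]:
    """Score each label for one token from simple observation features."""
    feats = ["bias"]
    if token[:1].isupper():
        feats.append("capitalized")
    if any(ch.isdigit() for ch in token):
        feats.append("has_digit")
    return {y: sum(EMISSION_WEIGHTS.get((y, f), 0.0) for f in feats) for y in LABELS}


def sequence_score(em: List[Dict[str, float]], seq: Sequence[str]) -> float:
    """Unnormalized linear-chain score: emissions plus label-to-label transitions."""
    score = sum(em[t][y] for t, y in enumerate(seq))
    return score + sum(TRANSITIONS[(a, b)] for a, b in zip(seq, seq[1:]))


def viterbi(em: List[Dict[str, float]]) -> List[str]:
    """Highest-scoring label sequence, found by dynamic programming."""
    best = [dict(em[0])]               # best[t][y]: top score of a prefix ending in y
    back: List[Dict[str, str]] = [{}]  # back[t][y]: label at t-1 on that best prefix
    for t in range(1, len(em)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + TRANSITIONS[(p, y)])
            best[t][y] = best[t - 1][prev] + TRANSITIONS[(prev, y)] + em[t][y]
            back[t][y] = prev
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for t in range(len(em) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def brute_force(em: List[Dict[str, float]]) -> List[str]:
    """Exhaustive search over all label sequences, for checking viterbi()."""
    return list(max(itertools.product(LABELS, repeat=len(em)),
                    key=lambda seq: sequence_score(em, seq)))


tokens = "Bridges must close by 7pm".split()
em = [emission_scores(tok) for tok in tokens]
print(list(zip(tokens, viterbi(em))))
```

Viterbi runs in time linear in the tweet length (times the number of labels squared), while brute force is exponential. The brute-force search is only there to check the decoder on short inputs.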





Tool

• CMU ARK Twitter NLP
  – Tokenization
  – Feature extraction
  – CRF learning
• Very easy to use: replace the part-of-speech tags in the training set with any other labels, and re-train





Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy





Extractor evaluation

Setting                                   Recall  Precision
Train 2/3 Joplin, Test 1/3 Joplin            78%        90%
Train 2/3 Sandy, Test 1/3 Sandy              41%        79%
Train Joplin, Test Sandy                     11%        78%
Train Joplin + 10% Sandy, Test 90% Sandy     21%        81%

• Precision is measured leniently: an extraction counts as correct if it has at least one word in common with what humans extracted
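Below is a minimal sketch of that lenient scoring rule. It assumes one system extraction and one human extraction per tweet, with `None` meaning "nothing extracted". It also assumes recall uses the same at-least-one-shared-word rule, which the slide does not state. The example annotations are hypothetical and are loosely based on the #Sandy output examples.

```python
from typing import List, Optional, Set, Tuple


def words(phrase: str) -> Set[str]:
    return set(phrase.lower().split())


def lenient_match(predicted: str, gold: str) -> bool:
    """An extraction counts as correct if it shares at least one word with the human one."""
    return bool(words(predicted) & words(gold))


def precision_recall(pairs: List[Tuple[Optional[str], Optional[str]]]) -> Tuple[float, float]:
    """`pairs` holds (system extraction, human extraction) per tweet; None means "nothing"."""
    n_extracted = sum(1 for pred, _ in pairs if pred)
    n_annotated = sum(1 for _, gold in pairs if gold)
    n_correct = sum(1 for pred, gold in pairs if pred and gold and lenient_match(pred, gold))
    precision = n_correct / n_extracted if n_extracted else 0.0
    recall = n_correct / n_annotated if n_annotated else 0.0
    return precision, recall


# Hypothetical annotations (not data from the evaluation).
pairs = [
    ("NYC bridges", "closing of NYC bridges"),   # shares "nyc", "bridges": correct
    ("60 mph", "Wind gusts over 60 mph"),        # shares "60", "mph": correct
    ("Red Cross", "donate blood"),               # no word in common: wrong
    (None, "Staten Island bridges"),             # missed by the system
]
print(precision_recall(pairs))
```

Under this rule, a shared function word such as "of" also counts as a match. A stricter variant could drop stopwords before comparing.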





Donations matching
• Identify and match requests/offers for donations
  – Money, clothing, food, shelter, volunteers, blood

Average precision = 0.21 (0.16 if only text similarity is used)
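Below is a minimal sketch of the text-similarity baseline: offers are ranked for a request by bag-of-words cosine similarity, and the ranking is scored with average precision. The signals that raise the slide's figure from 0.16 to 0.21 are not described there, so they are not modeled. The tokenization, example texts and relevance judgments are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Set


def bag(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)   # Counter yields 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_offers(request: str, offers: List[str]) -> List[int]:
    """Indices of offers, most similar to the request first (ties keep input order)."""
    r = bag(request)
    return sorted(range(len(offers)), key=lambda i: cosine(r, bag(offers[i])), reverse=True)


def average_precision(ranking: List[int], relevant: Set[int]) -> float:
    """Mean of precision@k over the ranks k where a relevant offer appears."""
    hits, total = 0, 0.0
    for k, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


request = "Need blood donors at the city hospital"
offers = ["Happy to donate blood today", "Offering clothing and blankets",
          "Blood bank open, donors welcome", "Volunteers available downtown"]
ranking = rank_offers(request, offers)
print([offers[i] for i in ranking])
print(average_precision(ranking, relevant={0, 2}))  # hypothetical relevance judgments
```

Pure word overlap misses near-synonyms such as "donate" and "donors". This is one reason a text-only baseline tends to be weak.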



Crowdsourced stream processing systems

Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems. http://arxiv.org/abs/1310.5463

http://chato.cl/papers/imran_lykourentzou_castillo_2014_engineering_crowdsourced_stream_processing_system.pdf



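Design objectives and principles

Design principles are split into those for automatic components and those for crowdsourced components.

• Low latency (example metric: end-to-end time)
  – Automatic components: keep items moving
  – Crowdsourced components: trivial tasks
• High throughput (example metric: output items per unit of time)
  – Automatic components: high-performance processing
  – Crowdsourced components: task automation
• Load adaptability (example metric: rate response function)
  – Automatic components: load shedding, load queueing
  – Crowdsourced components: task prioritization
• Cost effectiveness (example metric: cost vs. quality, throughput, etc.)
  – Automatic components: N/A
  – Crowdsourced components: task frugality
• High quality (example metric: application-dependent)
  – Redundancy, aggregation and quality control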
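Load shedding and load queueing can be combined in one bounded buffer that serves high-priority items first and drops the lowest-priority item when it overflows. Below is a minimal sketch. The class name `SheddingQueue` is hypothetical, and so is the idea that priorities come from something like classifier confidence; the design table only names the principles.

```python
import bisect
import itertools
from typing import Any, List, Optional, Tuple


class SheddingQueue:
    """Bounded priority queue that sheds load when it overflows.

    Higher priority is served first; among equal priorities, older items
    are served first. When full, the lowest-priority item is dropped."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.shed = 0                      # number of items dropped so far
        self._arrivals = itertools.count()
        # Kept sorted ascending by (priority, -arrival): the last entry is the
        # next to serve, the first entry is the next to shed.
        self._items: List[Tuple[float, int, Any]] = []

    def push(self, item: Any, priority: float) -> None:
        bisect.insort(self._items, (priority, -next(self._arrivals), item))
        if len(self._items) > self.capacity:
            self._items.pop(0)             # shed the lowest-priority entry
            self.shed += 1

    def pop(self) -> Optional[Any]:
        return self._items.pop()[2] if self._items else None

    def __len__(self) -> int:
        return len(self._items)


q = SheddingQueue(capacity=2)
q.push("tweet about a bridge closure", priority=0.9)
q.push("sympathy message", priority=0.1)
q.push("report of injured people", priority=0.8)   # overflow: the sympathy message is shed
print(q.pop(), "|", q.pop(), "|", q.shed)
```

A sorted list keeps the sketch short, but each insert costs O(n). A high-volume stream would need a structure with cheaper inserts at both ends.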



Design patterns

● QA loop

● Task assignment

● Process/verify

● Supervised learning

● Crowdwork sub-task chaining

● Humans are not a bottleneck

● Humans review every output element



http://aidr.qcri.org/





Self-service for crisis-related classification

Unstructured text reports → Automatic classifier → Categorized information

Crowdsourced ground truth + Library of training data → Model builder → Automatic classifier
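The slide does not say which learning method the model builder uses. As a stand-in, below is a minimal multinomial naive Bayes sketch that can fold in crowd-labeled tweets one at a time as they arrive. The class name, category labels and training texts are hypothetical, and this is not AIDR's implementation.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def add_example(self, text: str, label: str) -> None:
        """Fold one crowd-labeled tweet into the model; no full retraining needed."""
        self.class_counts[label] += 1
        for w in tokenize(text):
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        known = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words

        def log_posterior(label: str) -> float:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / n_docs)
            return score + sum(math.log((counts[w] + 1) / denom) for w in known)

        return max(self.class_counts, key=log_posterior)


model = NaiveBayes()
for text, label in [
    ("Red Cross asks people to donate blood", "donations"),
    ("Send money to help flood victims", "donations"),
    ("Bridges closed and power lines down", "infrastructure"),
    ("Subway flooded, trains suspended", "infrastructure"),
    ("Thoughts and prayers to everyone", "not_informative"),
]:
    model.add_example(text, label)
print(model.predict("Where can I donate blood?"))
```

Because the model is just counts, each newly crowdsourced label updates it in constant time. This fits the loop in the diagram, where the library of training data keeps growing during an event.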



Credibility and verification

Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier: TweetCred: A Real-time Web-based System for Credibility of Content on Twitter. In SocInfo 2014. Runner-up for best paper award.

Carlos Castillo, Marcelo Mendoza, Barbara Poblete: Predicting Information Credibility in Time-Sensitive Social Media. In Internet Research, Vol. 23, Issue 5. October 2013.

A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, P. Meier and I. Rahwan: Information Verification during Natural Disasters. Social Web and Disaster Management (SWDM) workshop, 2013.

http://chato.cl/papers/gupta_kumaraguru_castillo_meier_2014_tweetcred.pdf

http://chato.cl/papers/castillo_mendoza_poblete_2012_predicting_credibility_twitter.pdf

http://www.chato.cl/papers/pktncmr_2013_information_verification_natural_disasters.pdf


http://www.youtube.com/watch?v=pAHoEO-K0Ek



Crowdsourced verification: Veri.ly

• Frame crowdwork correctly
  – Not upvoting/downvoting a claim
  – Instead, providing evidence for/against

@VeriDotLy — http://veri.ly/
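Below is a minimal data-model sketch of the "evidence, not votes" framing. Contributors can only attach a described piece of evidence for or against a claim, and the summary groups that evidence for a reviewer instead of producing a net score. The class and field names are hypothetical and do not reflect Veri.ly's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STANCES = ("supports", "refutes")


@dataclass
class Evidence:
    stance: str         # "supports" or "refutes" the claim
    description: str    # what the contributor found
    source_url: str = ""


@dataclass
class Claim:
    text: str
    evidence: List[Evidence] = field(default_factory=list)

    def add_evidence(self, stance: str, description: str, source_url: str = "") -> None:
        """Contributors attach evidence instead of casting an up/down vote."""
        if stance not in STANCES:
            raise ValueError(f"stance must be one of {STANCES}")
        if not description.strip():
            raise ValueError("evidence needs a description")
        self.evidence.append(Evidence(stance, description, source_url))

    def summary(self) -> Dict[str, List[str]]:
        """Group evidence by stance so a reviewer can weigh it, not just count it."""
        return {s: [e.description for e in self.evidence if e.stance == s] for s in STANCES}


claim = Claim("Shelter at the Main St school is full")  # hypothetical claim
claim.add_evidence("supports", "Photo of a 'no vacancy' sign at the entrance")
claim.add_evidence("refutes", "Phone call to the school: beds still available")
print(claim.summary())
```

Requiring a description for every contribution is what separates this design from a vote counter: an empty "+1" is rejected.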





Examples of evidence provided





Automatic credibility evaluation: TweetCred

• Real-time web-based service
• Used as a Chrome extension
• Annotates Twitter's timeline with credibility scores
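The slide does not describe TweetCred's features or model. Below is a minimal sketch of feature-based credibility scoring in general. Every feature, weight and the bias are hypothetical, and so is the 1-7 output scale; a deployed system would learn its parameters from labeled tweets.

```python
import math
from typing import Dict

# Hypothetical features and weights; a real system learns these from labeled data.
WEIGHTS = {"has_url": 0.8, "author_verified": 1.2, "log_followers": 0.3,
           "exclamations": -0.4, "all_caps_words": -0.5}
BIAS = -1.0


def features(tweet: Dict) -> Dict[str, float]:
    text = tweet["text"]
    return {
        "has_url": 1.0 if "http" in text else 0.0,
        "author_verified": 1.0 if tweet.get("verified") else 0.0,
        "log_followers": math.log1p(tweet.get("followers", 0)),
        "exclamations": float(text.count("!")),
        "all_caps_words": float(sum(1 for w in text.split() if len(w) > 2 and w.isupper())),
    }


def credibility_score(tweet: Dict, scale_max: int = 7) -> int:
    """Linear score -> logistic squash -> integer on a 1..scale_max scale."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features(tweet).items())
    z = max(-30.0, min(30.0, z))           # avoid overflow in math.exp
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 + round(p * (scale_max - 1))


tweets = [
    {"text": "Bridges close at 7pm, details http://example.org", "verified": True, "followers": 100000},
    {"text": "OMG SHARK IN THE STREETS!!!", "verified": False, "followers": 10},
]
for t in tweets:
    print(credibility_score(t), t["text"])
```

Squashing to a probability before rounding keeps the score bounded no matter how extreme a tweet's features are, which matters when annotating a live timeline.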





http://twitdigest.iiitd.edu.in/TweetCred/





Next steps

• Credibility facets
  – Factually written
  – Detailed
  – Author on the ground
  – ...

• Respond to searches about an event



Closing remarks



Computationally feasible

Supported by data

Useful

Good projects in this space

Temptation! Danger!

Poorly planned projects :-(

AI-complete problems





Some venues

• SWDM – Workshop on Social Web for Disaster Management
  – Deadline: January 24th

• ISCRAM – International Conference on Information Systems for Crisis Response and Management

+ the usual suspects, depending on your area ;-)





Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best



Thank you!

Carlos Castillo · http://www.chato.cl/research/

With thanks to Patrick Meier for several slides