Refactoring Earthquake-Tsunami Causality and Messaging via Big Data Analytics: The Transformative Potential of Credible Tweets

L. I. Lumb 1,2 & J. R. Freemantle 3

1 York University, 2 Univa Corporation & 3 Independent

MCBDA 2016 (First Workshop), PVAMU, May 17, 2016

Refactoring Earthquake-Tsunami Causality and Messaging via Big …credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1... · 2016-05-17 · Refactoring Earthquake-Tsunami Causality




Agenda

● Motivation
● Traditional Data
● Social-Networking Data
  ○ Graphs, Semantics & Machine Learning
● Conclusions


Motivation

● Non-deterministic cause
  ○ Uncertainty inherent in any attempt to predict earthquakes
    ■ In situ measurements may reduce uncertainty
● Lead times
  ○ Availability of actionable observations
  ○ Communication of situation - advisories, warnings, etc.
● Cause-effect relationship
  ○ Energy transfer - inputs ... coupling ... outputs
    ■ ‘Geometry’ - bathymetry and topography
  ○ Other factors - e.g., tides
● Established effect
  ○ Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ... requires
    ■ Distributed array of deep-ocean tsunami detection buoys + forecasting model


[Image: http://www.eas.slu.edu/GGP/images/igrav2.jpg]


Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26


6Vs: Scientific vs. Social Networking Data

             GGP Scientific Data                Twitter SN Data
Volume       small, finite                      BIG, ‘infinite’
Variety      semi-structured, restricted        unstructured, unrestricted - except for IDs, hashtags & URLs (pages, images)
Velocity     slow, sampled                      fast, streamed
Veracity     biases, noise & abnormalities
Validity     accuracy & correctness
Volatility   low (stationary, irreplaceable)    high? (mobile?, disposable?)

Source: http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/


Karau et al., Learning Spark, O’Reilly, 2015

Machine Learning Pipeline


Deep Learning from Twitter?

Represent data

● Twitter data manually curated into ‘ham’ and ‘spam’
● In-memory representation via Spark RDDs

Extract features

● Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors

Develop model object

● Spark MLlib LogisticRegressionWithSGD used for classification

Evaluate model
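
The four steps above can be tried without a Spark cluster. What follows is a minimal pure-Python sketch and not the authors' code: `hashing_tf` imitates the idea behind MLlib's HashingTF (each term is hashed into a slot of a fixed-size count vector), and `train_logistic_sgd` is a plain stochastic-gradient logistic regression standing in for LogisticRegressionWithSGD. The sample tweets, the label convention (ham = 1, spam = 0), `NUM_FEATURES`, the learning rate and the epoch count are all assumptions made for illustration.

```python
import math
import random
import zlib

NUM_FEATURES = 256  # assumed vector size, chosen only for this sketch

# Invented examples; the real corpus was curated by hand.
HAM = [
    "magnitude seven earthquake reported near kumamoto by jma",
    "tsunami advisory issued for coastal areas after offshore quake",
    "aftershocks continue tonight residents urged to stay alert",
]
SPAM = [
    "win a free phone click this link now",
    "huge discount sale buy followers cheap today",
    "make money fast from home visit our site",
]


def hashing_tf(text, num_features=NUM_FEATURES):
    """Represent text as term counts indexed by a hash of each token."""
    vec = [0.0] * num_features
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % num_features] += 1.0
    return vec


def predict_proba(weights, bias, features):
    """Probability that the features belong to the 'ham' class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(examples, epochs=200, lr=0.1, seed=0):
    """Fit logistic regression with per-example gradient steps."""
    rng = random.Random(seed)
    data = list(examples)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            bias -= lr * error
            for i, x in enumerate(features):
                if x:
                    weights[i] -= lr * error * x
    return weights, bias


def build_examples():
    """Pair each feature vector with its label (ham = 1, spam = 0)."""
    return ([(hashing_tf(t), 1) for t in HAM] +
            [(hashing_tf(t), 0) for t in SPAM])


def accuracy(weights, bias, examples):
    """Fraction of examples classified on the correct side of 0.5."""
    hits = sum((predict_proba(weights, bias, f) >= 0.5) == (y == 1)
               for f, y in examples)
    return hits / len(examples)


if __name__ == "__main__":
    train = build_examples()
    w, b = train_logistic_sgd(train)
    print("training accuracy:", accuracy(w, b, train))
    print("P(ham):", predict_proba(w, b, hashing_tf("earthquake advisory issued")))
```

Accuracy here is measured on the training tweets only, so it says nothing about generalization; a real evaluation would hold out labelled tweets. The probability returned by `predict_proba` is one possible way to express a degree of credibility rather than a hard ham/spam label.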


Future Work

● Machine Learning
  ○ Classification algorithms ... with categories?
  ○ Training Experiments
    ■ Larger data sets
    ■ Degrees of ‘hammyness’
    ■ Stop-word removal, stemming, ...
  ○ Real-time streaming - data from Twitter
● Multiparameter credibility - TweetCred + ML + RDF/OWL GA
● Cloud-native platform
  ○ Containerization, dynamic scheduling and micro services
● Other examples
  ○ Alberta wildfires
  ○ Industrial incidents
  ○ Hurricanes
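
As an illustration of the stop-word removal and stemming experiments listed above, here is a small pure-Python sketch of a preprocessing step that could sit in front of feature extraction. It is an assumption-laden toy: the stop-word list is a tiny invented sample, and `naive_stem` is a crude suffix stripper, not the Porter stemmer or any library routine.

```python
import re

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "for", "of", "to", "in", "and", "on", "at"}

# Suffixes are tried in this order; longer ones come first.
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Lower-case, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The warnings were issued for the coastal areas"))
```

Whether this kind of normalization helps or hurts classification of tweets is exactly what the proposed training experiments would have to establish.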


Conclusions

● Credible tweets could be transformative
  ○ Mission-critical Big Data complement to existing data sources and approaches
● Current challenges/opportunities
  ○ Twitter Data
    ■ Extraction - only 100 tweets at a time (!!!)
    ■ Curation - manual (read: time consuming!!!)
  ○ Emphasizing Machine Learning ... appears encouraging, BUT ...
    ■ Graph Analytics ... as well ???
    ■ Semantics ... as well ???
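
On the 100-tweets-per-request limit noted above: one way to gather more was to page backwards through search results, asking each time only for tweets older than the lowest ID already seen (the `max_id` parameter of the v1.1 search endpoint). The sketch below shows just that paging logic. `fetch_page` is a hypothetical callable standing in for a real API client, and `make_fake_search` fabricates an in-memory result set so the loop can be run offline; rate limits and API errors are ignored.

```python
PAGE_SIZE = 100  # per-request ceiling noted on the slide


def collect_tweets(fetch_page, query, limit):
    """Page backwards through results until `limit` tweets are collected."""
    collected = []
    max_id = None  # no upper bound on the first request
    while len(collected) < limit:
        count = min(PAGE_SIZE, limit - len(collected))
        page = fetch_page(query, count, max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected


def make_fake_search(total):
    """Hypothetical stand-in for a search endpoint, newest tweet first."""
    tweets = [{"id": i, "text": f"earthquake report {i}"}
              for i in range(total, 0, -1)]

    def fetch_page(query, count, max_id=None):
        matching = [t for t in tweets
                    if query in t["text"]
                    and (max_id is None or t["id"] <= max_id)]
        return matching[:count]

    return fetch_page


if __name__ == "__main__":
    fetch = make_fake_search(250)
    tweets = collect_tweets(fetch, "earthquake", 1000)
    print(len(tweets), "tweets collected")
```

Paging addresses only the extraction bottleneck; the manual curation step remains the slower of the two.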


Q&A

L. I. Lumb1,2 & J. R. Freemantle3



Graph Analytics

Problem

[Image: http://www.jma.go.jp/jma/en/2016_Kumamoto_Earthquake/2016_Kumamoto_Earthquake.html]


Perl script prototype

● Acquires tweets with the keyword “earthquake”

use strict;
use warnings;
use Net::Twitter::Lite::WithAPIv1_1;

# Authenticate against the Twitter REST API v1.1 (credentials elided)
my $nt = Net::Twitter::Lite::WithAPIv1_1->new(
    consumer_key        => 'xxxx...xxxxxxx',
    consumer_secret     => 'xxxxxx.....xxxxxxxxxx',
    access_token        => 'xxxxx....xxxxxxxxxxx',
    access_token_secret => 'xxxxx.....xxxxxxxxxxx',
    ssl                 => 1,
);

# Search for tweets containing the keyword "earthquake"
my $result = $nt->search("earthquake");

# Print the text of each returned status
for my $status ( @{ $result->{statuses} } ) {
    print "$status->{text}\n";
}


Resilient Distributed Datasets (RDDs)

● Abstraction for in-memory computing
● Fault-tolerant, parallel data structures
  ○ Cluster-ready
● Optionally persistent
● Can be partitioned for optimal placement
● Manipulated via operators

Zaharia et al., NSDI 2012
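
The RDD properties listed above can be made concrete with a toy model. The class below is a single-process illustration only and is not how Spark is implemented: `ToyRDD` and all of its method names are hypothetical. It shows data split into partitions, operators (`map`, `filter`) recorded lazily as lineage, optional persistence, and a lost partition being rebuilt from lineage instead of from a replica.

```python
class ToyRDD:
    """Single-process illustration of partitions, lineage and persistence."""

    def __init__(self, partitions, lineage=()):
        self._source = [list(p) for p in partitions]  # base data per partition
        self._lineage = tuple(lineage)                # operators, applied lazily
        self._cache = None                            # filled only by persist()

    @classmethod
    def parallelize(cls, items, num_partitions):
        """Deal the items round-robin into num_partitions partitions."""
        items = list(items)
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        """Record a map operator; nothing is computed yet."""
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        """Record a filter operator; nothing is computed yet."""
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def compute_partition(self, index):
        """Replay the recorded operators over one partition of base data."""
        rows = self._source[index]
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def persist(self):
        """Keep computed partitions in memory for reuse."""
        self._cache = [self.compute_partition(i)
                       for i in range(len(self._source))]
        return self

    def lose_partition(self, index):
        """Simulate losing one cached partition, e.g. a failed worker."""
        if self._cache is not None:
            self._cache[index] = None

    def collect(self):
        """Gather all partitions, recomputing any that are missing."""
        out = []
        for i in range(len(self._source)):
            part = self._cache[i] if self._cache is not None else None
            if part is None:
                part = self.compute_partition(i)  # rebuilt from lineage
            out.extend(part)
        return out


if __name__ == "__main__":
    rdd = ToyRDD.parallelize(range(10), 3)
    squares = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).persist()
    squares.lose_partition(1)
    print(sorted(squares.collect()))
```

The point of the sketch is the recovery path in `collect`: because each dataset remembers how it was derived, a missing partition can be recomputed on demand.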
