Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data

Preview:

DESCRIPTION

Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth Video of the talk at: http://youtu.be/2991W7OBLqU Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there is rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has different set of relevant data! In this talk, I will forward the concept of Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations. Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.

Citation preview

Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data

Put Knoesis Banner

Keynote at WorldComp 2014, July 21, 2014

Amit ShethLexisNexis Ohio Eminent Scholar & Exec. Director,

The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)Wright State, USA

2

BIG Data 2014

http://hrboss.com/hiringboss/articles/big-data-infographic

3

Only 0.5% to 1% of the data is used for analysis.

http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explodehttp://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume

4

Variety – not just structure but modality: multimodal, multisensory

Structured

Unstructured

Semi structured

Audio

Video

Images

5

Velocity

Fast Data

Rapid Changes

Real-Time/Stream Analysis

Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail

6

What has changed now?

About 2 billion of the 5+ billion have data connections – so they perform “citizen sensing”.And there are more devices connected to the Internet than the entire human population.

These ~2 billion citizen sensors and 10 billion devices & objects connected to the Internet makes this an era of IoT (Internet of Things) and Internet of Everything (IoE).

http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf

7

“The next wave of dramatic Internet growth will come through the confluence of people, process, data, and things — the Internet of Everything (IoE).”

- CISCO IBSG, 2013

http://www.cisco.com/web/about/ac79/docs/innov/IoE_Economy.pdf

Beyond the IoE based infrastructure, it is the possibility of developing applications that spansPhysical, Cyber and the Social Worlds that is very exciting.

What has changed now?

8

What has not changed?

We need computational paradigms to tap into the rich pulse of the human populace,

and utilize diverse data

We are still working on the simpler representations of the real-world!

Represent, capture, and compute with richer and fine-grained representations of real-world

problems

What should change?

9

Current focus on Big Data is on meeting Enterprise/Company needs.

Significant opportunity in applications for individual and community needs. Many of these, esp. in complex domains such as health, fitness and well-being; better disaster coordination, personalized smart energy These need to exploit diverse data types and sources: Physical(sensor/IoT), Cyber(Web) and Social data.

Smart data –personalized, contextually relevant, actionable information – provide a better computational paradigm.

My take on thinking beyond the Big Data buzz

10

• Not just data to information, not just analysis, but actionable information, delivering insight and support better decision making right in the context of human activities

What is needed?

Data InformationActionable: An apple a day

keeps the doctor away

A blood test has ~30 bio markers…how will a doctor cope with a test with 300K data points?

11

What is needed? Taking inspiration from cognitive models

• Bottom up and top down cognitive processes: – Bottom up: find patterns, mine (ML, …)– Top down: Infusion of models and background

knowledge (data + knowledge + reasoning)

Left(plans)/Right(perceives) BrainTop(plans)/Bottom(perceives) Brainhttp://online.wsj.com/news/articles/SB10001424052702304410204579139423079198270

12

• Ambient processing as much as possible while enabling natural human involvement to guide the system

What is needed?

Smart Refrigerator: Low on Apples

Adapting the Plan: shopping for apples

13

Contextual

Information Smart Data

Makes Sense to a human

Is actionable – timely and better decisions/outcomes

15

My 2004-2005 formulation of SMART DATA - Semagix

Formulation of Smart Data strategy providing services for Search, Explore, Notify.

“Use of Ontologies and Data repositories to gain

relevant insights”

16

Smart Data (2014 retake)

Smart data makes sense out of Big data

It provides value from harnessing the challenges posed by volume, velocity, variety

and veracity of big data, in-turn providing actionable information and improve decision

making.

17

OF human, BY human FOR human

Smart data is about extracting value by improving human involvement in data creation,

processing and consumption. It is about (improving)

computing for human experience.

Another perspective on Smart Data

18Petabytes of Physical(sensory)-Cyber-Social Data everyday! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS

‘OF human’ : Relevant Real-time Data Streams for Human Experience

Use of Prior Human-created Knowledge Models

19

‘BY human’: Involving Crowd Intelligence in data processing

Crowdsourcing and Domain-expert guided Machine Learning Modeling

20

Detection of events, such as wheezing sound, indoor temperature, humidity,

dust, and CO level

Weather Application

Asthma Healthcare Application

Close the window at home during day to avoid CO in

gush, to avoid asthma attacks at night

‘FOR human’ : Improving Human Experience (Smart Health)

Population Level

Personal

Public Health

Action in the Physical World

Luminosity

CO levelCO in gush during day time

21

Electricity usage over a day, device at work, power consumption, cost/kWh,

heat index, relative humidity, and public events from social stream

Weather Application

Power Monitoring Application

‘FOR human’ : Improving Human Experience (Smart Energy)

Population Level Observations

Personal Level Observations

Action in the Physical World

Washing and drying has resulted in significant cost

since it was done during peak load period. Consider

changing this time to night.

22

Every one and everything has Big Data –It is Smart Data that matter!

23http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/

MIT Technology Review, 2012

The Patient of the Future

Physical-Cyber-Social Computing An early 21st century approach to Computing for Human

Experience

PCS Computing

People live in the physical world while interacting with the cyber and social worlds

Physical WorldCyber World

Social World

26

Computations leverage observations form sensors, knowledge and experiences from

people to understand, correlate, and personalize solutions.

Physical-Cyber

Social-Cyber

Physical-Cyber-Social

What if?

27

Sensors around, on, and in humans will bridge the physical and cyber world.

Cyber

Physical

We believe that current CPS should view the physical worldby incorporate solutions form (knowledge) cyber worldwith a lens of social context.

There are silos of knowledge on the cyber world which are under utilized.

Social

Social networks bridge the social interactions in the physical and cyber world.

Mark’s discomfort sensed by: galvanic skin response, heart rate, fitbit, and Microsoft Kinect

Physical Cyber Social Computing involves: (1) Comparing physiological observations from people similar to him (age, weight, lifestyle,ethnicity, etc.) (2) Analyzing health experiences of similar people reporting heartburn (3) Incorporating history of ailments of Mark (4) Leveraging medical domain knowledge of diseases and symptoms.•He is advised to visit a doctor since he had a heart condition (from EMR) in the past and heartburns in similar people (social) was a symptom of arterial blockage

Mark is experiencing heartburn.

Alert to contact his doctor.

Physical

Sensing

Actuating

Computing

Rich knowledge of the medical

domain

EMR and PHR

Physiological sensor data from

human population

Health related experiences shared by humans

PCS Computing: Health Scenario

28

Vertical operators facilitate transcending from data-information-knowledge-

wisdom using background knowledge

Horizontal operators facilitate semantic integration of multimodal and

multisensory observations

PCS Computing

PCS computing is a holistic treatment of data, information, and knowledge from physical, cyber, and social worlds to integrate, understand, correlate, and provide contextually relevant abstractions to humans. Think of PCS Computing as the application/semantic layer for the IoE-based infrastructure.

http://wiki.knoesis.org/index.php/PCS

DATAsensor observations

KNOWLEDGEsituation awareness useful

for decision making

29

Primary challenge is to bridge the gap between data and knowledge

30

What if we could automate this sense making ability?

… and do it efficiently and at scale

31

Making sense of sensor data with

Henson et al An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ont, 2011

32

People are good at making sense of sensory input

What can we learn from cognitive models of perception?

The key ingredient is prior knowledge

33* based on Neisser’s cognitive model of perception

ObserveProperty

PerceiveFeature

Explanation

Discrimination

1

2

Translating low-level signals into high-level knowledge

Focusing attention on those aspects of the environment that provide useful information

Prior Knowledge

Perception Cycle*

Convert large number of observations to semantic abstractions that provide insights and translate into

decisions

34

To enable machine perception,

Semantic Web technology is used to integrate

sensor data with prior knowledge on the Web

W3C SSN XG 2010-2011, SSN Ontology

35

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

Prior knowledge on the Web

36

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

Prior knowledge on the Web

38

Inference to the best explanation• In general, explanation is an abductive

problem; and hard to compute

Finding the sweet spot between abduction and OWL• Single-feature assumption* enables use

of OWL-DL deductive reasoner

* An explanation must be a single feature which accounts for

all observed properties

Explanation

Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building

Representation of Parsimonious Covering Theory in OWL-DL

39

ExplanatoryFeature ≡ ssn:isPropertyOf∃ —.{p1} … ssn:isPropertyOf⊓ ⊓ ∃ —.{pn}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Observed Property Explanatory Feature

Explanation

Explanatory Feature: a feature that explains the set of observed properties

40

ObserveProperty

PerceiveFeature

Explanation

Discrimination2

Focusing attention on those aspects of the environment that provide useful information

Discrimination

Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory

features

41

ExpectedProperty ≡ ssn:isPropertyOf.{f∃ 1} … ssn:isPropertyOf.{f⊓ ⊓ ∃ n}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Expected Property Explanatory Feature

Discrimination

Expected Property: would be explained by every explanatory feature

42

NotApplicableProperty ≡ ¬ ssn:isPropertyOf.{f∃ 1} … ¬ ssn:isPropertyOf.{f⊓ ⊓ ∃ n}

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Not Applicable Property Explanatory Feature

Discrimination

Not Applicable Property: would not be explained by any explanatory feature

43

DiscriminatingProperty ≡ ¬ExpectedProperty ¬NotApplicableProperty⊓

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Discriminating Property Explanatory Feature

Discrimination

Discriminating Property: is neither expected nor not-applicable

Qualities-High BP-Increased Weight

Entities-Hypertension-Hypothyroidism

kHealth

Machine Sensors

Personal Input

EMR/PHR

Comorbidity risk score e.g., Charlson Index

Longitudinal studies of cardiovascular risks

- Find risk factors- Validation - domain knowledge - domain expert

Find contribution of each risk

factor

Risk Assessment Model

Current Observations-Physical-Physiological-History

Risk Score (e.g., 1 => continue3 => contact clinic)

Model CreationValidate correlations

Historical observations e.g., EMR, sensor observations

44

Risk Score: from Data to Abstraction and Actionable Information

45

Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time

• Runs out of resources with prior knowledge >> 15 nodes

• Asymptotic complexity: O(n3)

How do we implement machine perception efficiently on a

resource-constrained device?

46

intelligence at the edge

Approach 1: Send all sensor observations to the cloud for processing

Approach 2: downscale semantic processing so that

each device is capable of machine perception

47

0101100011010011110010101100011011011010110001101001111001010110001101011000110100111

Efficient execution of machine perception

Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning

Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.

48

O(n3) < x < O(n4)

O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes

• Time reduced from minutes to milliseconds• Complexity growth reduced from polynomial

to linear

Evaluation on a mobile device

49

2 Prior knowledge is the key to perceptionUsing SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web

3 Intelligence at the edgeBy downscaling semantic

inference, machine perception can execute efficiently on resource-constrained devices

1 Translate low-level data to high-level knowledge

Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making

Semantic Perception for smarter analytics: 3 ideas to takeaway

50

• Healthcare: ADFH, Asthma, GI– Using kHealth system

• Smart Cities: Traffic management

I will use applications in 2 domains to demonstrate

• Social Media Analysis*:Crisis coordinationUsing Twitris platform

kHealthKnowledge-enabled Healthcare

To reduce preventable readmissions of patients with chronic heart failure (CHF, specifically ADHF) and GI;

Asthma in children

51

Brief Introduction Video

53

Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information

canary in a coal mine

Empowering Individuals (who are not Larry Smarr!) for their own health

kHealth: knowledge-enabled healthcare

Weight Scale

Heart Rate Monitor

Blood PressureMonitor

54

Sensors

Android Device (w/ kHealth App)

Readmissions cost $17B/year: $50K/readmission; Total kHealth kit cost: <

$500

kHealth Kit for the application for reducing ADHF readmission

ADHF – Acute Decompensated Heart Failure

55

1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145.

25 million

300 million

$50 billion

155,000

593,000

People in the U.S. are diagnosed with asthma (7 million are children)1.

People suffering from asthma worldwide2.

Spent on asthma alone in a year2

Hospital admissions in 20063

Emergency department visits in 20063

Asthma: Severity of the problem

Sensordrone (Carbon monoxide,

temperature, humidity) Node Sensor

(exhaled Nitric Oxide)

56

Sensors

Android Device (w/ kHealth App)

Total cost: ~ $500

kHealth Kit for the application for Asthma management

*Along with two sensors in the kit, the application uses a variety of population level signals from the web:

Pollen level Air Quality Temperature & Humidity

59

what can we do to avoid asthma episode?

Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.

Variety Volume

VeracityVelocity

ValueWhat risk factors influence asthma control?What is the contribution of each risk factor?

sem

antic

s Understanding relationships betweenhealth signals and asthma attacksfor providing actionable information

WHY Big Data to Smart Data: Asthma example

kHealth: Health Signal Processing Architecture

Personal level Signals

Public level Signals

Population level Signals

Domain Knowledge

Risk Model

Events from Social Streams

Take Medication before going to work

Avoid going out in the evening due to high pollen levels

Contact doctor

AnalysisPersonalized Actionable

Information

Data Acquisition & aggregation

60

61

Asthma Domain Knowledge

Domain Knowledge

ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist

Asthma Control and Actionable Information

62

Patient Health Score (diagnostic)

Risk assessment model

Semantic Perception

Personal level Signals

Public level Signals

Domain Knowledge

Population level Signals

GREEN -- Well Controlled YELLOW – Not well controlledRed -- poor controlled

How controlled is my asthma?

63

Patient Vulnerability Score (prognostic)

Risk assessment model

Semantic Perception

Personal level Signals

Public level Signals

Domain Knowledge

Population level Signals

Patient health Score

How vulnerable* is my control level today?

*considering changing environmental conditions and current control level

67

Sensordrone – for monitoring environmental air quality

Wheezometer – for monitoringwheezing sounds

Can I reduce my asthma attacks at night?

What are the triggers? What is the wheezing level?

What is the propensity toward asthma?

What is the exposure level over a day?

Commute to Work

Asthma: Actionable Information for Asthma Patients

Luminosity

CO level

CO in gush during day time

Actionable Information

Personal level Signals

Public level Signals

Population level Signals

What is the air quality indoors?

68

Population Level

Personal

Wheeze – YesDo you have tightness of chest? –Yes

Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding

<Wheezing=Yes, time, location>

<ChectTightness=Yes, time, location>

<PollenLevel=Medium, time, location>

<Pollution=Yes, time, location>

<Activity=High, time, location>

Wheezing

ChectTightness

PollenLevel

Pollution

Activity

Wheezing

ChectTightness

PollenLevel

Pollution

Activity

RiskCategory

<PollenLevel, ChectTightness, Pollution,Activity, Wheezing, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory>

.

.

.

Expert Knowledge

Background Knowledge

tweet reporting pollution level and asthma attacks

Acceleration readings fromon-phone sensors

Sensor and personal observations

Signals from personal, personal spaces, and community spaces

Risk Category assigned by doctors

Qualify

Quantify

Enrich

Outdoor pollen and pollution

Public Health

Health Signal Extraction to Understanding

Well Controlled - continueNot Well Controlled – contact nursePoor Controlled – contact doctor

73

RDF OWL

How are machines supposed to integrate and interpret sensor data?

Semantic Sensor Networks (SSN)

74

W3C Semantic Sensor Network Ontology

Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).

76

W3C Semantic Sensor Network Ontology

Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).

SSNOntology

2 Interpreted data(deductive)[in OWL] e.g., threshold

1 Annotated Data[in RDF]e.g., label

0 Raw Data[in TEXT]e.g., number

Levels of Abstraction

3 Interpreted data (abductive)[in OWL]e.g., diagnosis

Intellego

“150”

Systolic blood pressure of 150 mmHg

ElevatedBlood

Pressure

Hyperthyroidism

less

use

ful …

mor

e us

eful

……

78

79

Making sense of sensor data with

80

People are good at making sense of sensory input

What can we learn from cognitive models of perception?• The key ingredient is prior knowledge

81* based on Neisser’s cognitive model of perception

ObserveProperty

PerceiveFeature

Explanation

Discrimination

1

2

Perception Cycle*

Translating low-level signals into high-level knowledge

Focusing attention on those aspects of the environment that provide useful information

Prior Knowledge

82

To enable machine perception,

Semantic Web technology is used to integrate sensor data with prior knowledge on the Web

83

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

84

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

86

Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features

ObserveProperty

PerceiveFeature

Explanation

Discrimination2

Focusing attention on those aspects of the environment that provide useful information

Discrimination

87

Discrimination

Discriminating Property: is neither expected nor not-applicable

DiscriminatingProperty ≡ ¬ExpectedProperty ¬NotApplicableProperty⊓

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Discriminating Property Explanatory Feature

88

Semantic scalability: Resource savings of abstracting sensor data

Orders of magnitude resource savings for generating and storing relevant abstractions vs. raw observations.

Relevant abstractions

Raw observations

89

How do we implement machine perception efficiently on aresource-constrained device?

Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time

• Runs out of resources with prior knowledge >> 15 nodes• Asymptotic complexity: O(n3)

90

intelligence at the edge

Approach 1: Send all sensor observations to the cloud for processing

Approach 2: downscale semantic processing so that each device is capable of machine perception

Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.

91

Efficient execution of machine perception

Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning

0101100011010011110010101100011011011010110001101001111001010110001101011000110100111

92

O(n3) < x < O(n4) O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes• Time reduced from minutes to milliseconds• Complexity growth reduced from polynomial to

linear

Evaluation on a mobile device

93

2 Prior knowledge is the key to perceptionUsing SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web

3 Intelligence at the edgeBy downscaling semantic inference, machine perception can

execute efficiently on resource-constrained devices

Semantic Perception for smarter analytics: 3 ideas to takeaway

1 Translate low-level data to high-level knowledgeMachine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making

94

PCS Computing for Traffic Analytics:for personal and community needs

96

Duration: 36 months

Requested funding: 2.531.202 €

CityPulse Consortium

City of Aarhus

City of Brasov

97

Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical data/Physical), incident reports (textual/Cyber) and Tweets (Social)

http://511.org/

Every minute update of speed, volume, travel time, and occupancy resulting in 178 million link status observations, 8 million tweets, 738 active events, and 146 scheduled events with many unevenly sampled observations collected over 3 months.

Variety Volume

VeracityVelocity

ValueCan we detect the onset of traffic congestion?Can we characterize traffic congestion based on events?Can we estimate traffic delays in a road network?

sem

antic

s Representing prior knowledge of traffic lead to a focused exploration of this massive dataset

Big Data to Smart Data: Traffic Management example

98

Heterogeneity leading to complementary observations

Textual Streams for City Related Events

99

City Event Annotation – CRF Annotation Examples

Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O

B-LOCATIONI-LOCATIONB-EVENTI-EVENTO

Tags used in our approach:

These are the annotations providedby a Conditional Random Field modeltrained on tweet corpus to spotcity related events and location

BIO – Beginning, Intermediate, and Other is a notation used in multi-phrase entity spotting 100

Accident

Music event

Sporting event Road Work

Theatre event

External events<ActiveEvents, ScheduledEvents>

Internal observations<speed, volume, traveTime>

Weather

Time of Day

101

Modeling Traffic Events: Pictorial representation

102

Slow moving traffic

Link Description

Scheduled Event

Scheduled Event

511.org

511.org

Schedule Information

511.org

Domain Experts

ColdWeather

PoorVisibility

SlowTraffic

IcyRoad

Declarative domain knowledge

Causal knowledge

Linked Open Data

ColdWeather(YES/NO)IcyRoad (ON/OFF) PoorVisibility (YES/NO)SlowTraffic (YES/NO)

1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0

Domain Observations

Domain Knowledge

Structure and parameters

103

WinterSeason

Otherknowledge

Correlations to causations using Declarative knowledge on the Semantic Web

Combining Data and Knowledge Graph

Traffic jam

Link Descriptio

n

Scheduled Event

traffic jam

baseball game

Add missing random variables

Time of day

bad weather CapableOf slow traffic

bad weather

Traffic data from sensors deployed on road network in San Francisco

Bay Area

time of day

traffic jam

baseball gametime of day

slow traffic

Three Operations: Complementing graphical model structure extraction

Add missing links bad weather

traffic jam

baseball gametime of day

slow traffic

Add link directionbad weather

traffic jam

baseball gametime of day

slow traffic

go to baseball game Causes traffic jam

Knowledge from ConceptNet5

traffic jam CapableOfoccur twice each daytraffic jam CapableOf slow traffic

104

City Infrastructure

Tweets from a cityPOS

Tagging

Hybrid NER+ Event term extraction

Geohashing

Temporal Estimation

Impact Assessment

Event Aggregation

OSM Locations

SCRIBE ontology

511.org hierarchy

City Event Extraction

City Event Extraction Solution Architecture

City Event Annotation

OSM – Google Open Street MapsNER – Named Entity Recognition 105

City Events from Sensor and Social Streams can be…

• Complementary• Additional information• e.g., slow traffic from sensor data and accident from textual data

• Corroborative• Additional confidence• e.g., accident event supporting a accident report from ground truth

• Timely • Additional insight• e.g., knowing poor visibility before formal report from ground truth

106

Evaluation – Extracted Events AND Ground Truth Verification

Complementary Events

Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade

City event extracted from twitter reporting about traffic complementing the road construction event reported on 511.org

Evaluation – Extracted Events AND Ground Truth Verification

Corroborative Events

Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade

City event from twitter providing corroborative evidence for fog reported by 511.org

Evaluation – Extracted Events AND Ground Truth Verification

Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade

City event from twitter providing report of a tornado before an event related to strong winds is reported by 511.org

Timeliness

Events from Social Streams and City Department*

Corroborative EventsComplementary Events

Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade

City event from twitter providing complementary and corroborative evidence for fog reported by 511.org

*511.org 110

111

Actionable Information in City Management

Tweets from a CityTraffic Sensor Data OSM Locations

SCRIBE ontology

511.org hierarchy

Web of Data

How issues in a city can be resolved?e.g., what should I do when I have fog condition?

112

Two excellent videos• Vinod Khosla:

the Power of Storytelling and the Future of Healthcare

• Larry Smarr: The Human Microbiome and the Revolution in Digital Health

Wrapping up: For more on importance of what we talked about

113

• Big Data is every where– at individual and community levels - not just

limited to corporation – with growing complexity: Physical-Cyber-Social

• Analysis is not sufficient• Bottom up techniques are not sufficient, need

top down processing, need background knowledge

Wrapping up: Take Away

114

Wrapping up: Take Away

• Focus on Humans and Improve human life and experience with SMART Data.– Data to Information to Contextually Relevant

Abstractions (Semantic Perception)– Actionable Information (Value from data) to assist

and support human in decision making.

• Focus on Value -- SMART Data– Big Data Challenges without the intention of deriving

Value is a “Journey without GOAL”.

• Collaborators: Clinicians: Dr. William Abrahams (OSU-Wexner), Dr. Shalini Forbis (Dayton Childrens), Dr. Sangeeta Agrawal (VA), Valerie Shalin (WSU Cognitive Scientists ), Payam Barnaghi (U-Surrey), Ramesh Jain(UCI), …

• Funding: NSF (esp. IIS-1111183 “SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response,”), AFRL, NIH, Industry….

Acknowledgment

116

Amit Sheth’s PHD students

Ashutosh Jadhav*

Hemant Purohit

Vinh Nguyen

Lu ChenPavan Kapanipathi

*

Pramod Anantharam

*

Sujan Perera

Maryam Panahiazar

Sarasi Lalithsena

Shreyansh Batt

Kalpa Gunaratna

Delroy Cameron

Sanjaya Wijeratne

Wenbo Wang

Special thanks: Ashu. This presentation covers some of the work of my PhD students. Key contributors: Pramod Anantharam, Cory Henson and TK Prasad.

Special thanks

117

• Among top universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search: among top 10 in June2014)

• Among the largest academic groups in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications

• Exceptional student success: internships and jobs at top salary (IBM Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups )

• 100 researchers including 15 World Class faculty (>3K citations/faculty avg) and ~45 PhD students- practically all funded

• Extensive research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)

118

Top organization in WWW: 10-yr Field Rating (MAS)

119

Smart Data to Big Data; Physical-Cyber-Social Computing

http://knoesis.org

120

thank you, and please visit us at

http://knoesis.org

Smart Data to Big Data; Physical-Cyber-Social Computing

Recommended