35
Semantics-empowered Big Data Processing for PCS Applications Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, OH-45435

Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Embed Size (px)

DESCRIPTION

Presentation at the AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013 Additional related material at: http://wiki.knoesis.org/index.php/Smart_Data Related paper at: http://www.knoesis.org/library/resource.php?id=1903 Abstract: We discuss the nature of Big Data and address the role of semantics in analyzing and processing Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the five V's of Big Data, where four of the Vs are harnessed to produce the fifth V - value. To handle the challenge of Volume, we advocate semantic perception that can convert low-level observational data to higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we resort to the use of semantic models and annotations of data so that much of the intelligent processing can be done at a level independent of heterogeneity of data formats and media. To handle the challenge of Velocity, we seek to use continuous semantics capability to dynamically create event or situation specific models and recognize new concepts, entities and facts. To handle Veracity, we explore the formalization of trust models and approaches to glean trustworthiness. The above four Vs of Big Data are harnessed by the semantics-empowered analytics to derive Value for supporting practical applications transcending physical-cyber-social continuum.

Citation preview

Page 1: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Semantics-empowered Big Data Processing for PCS ApplicationsKrishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, OH-45435

Page 2: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 2

Outline

• 5 V’s of Big Data Research

• Semantic Perception for Scalability

• Lightweight semantics to manage heterogeneity – Cost-benefit trade-off and continuum

• Hybrid Knowledge Representation and Reasoning– Anomaly, Correlation, Causation

11/15/2013

Page 3: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 3

5V’s of Big Data Research

Volume

Velocity

Variety

Veracity

Value

11/15/2013

Big Data => Smart Data

Page 4: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 4

Volume : Assorted Examples

• 25+ billion sensors have been deployed.

• About 250TB of sensor data are generated for a NY-LA flight on Boeing 737.

• Parkinson disease dataset that tracked 16 patients with mobile phone using 7 sensors over 8 weeks is 12GB.

Check engine light analogy11/15/2013

Page 5: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 5

Volume : Semantic Perception

• Abstracting machine-sensed data – E.g., fine-grained to coarse-grained– E.g., average, peak, rate of change

• Extracting human-comprehensible features/entities• Machine perception

– Derive conclusions using domain models

and hybrid abductive/deductive reasoning

Goal: Human accessible situational awareness and actionable intelligence for decision making

11/15/2013

Page 6: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 6

Weather Use Case

• Machine-sensed phenomenon– temperature, precipitation, humidity, wind speed, etc.

• Human perceived features– blizzard, flurry, rain storm, clear, etc.– categories of hurricanes (SSHWS)

• Machine perception– Using domain models from NOAA

• Ultimately, generate weather alerts …

11/15/2013

Page 7: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 7

Parkinson’s Disease Use Case

• Data from machine-sensors– accelerometer, GPS, compass, microphone, etc.

• Human perceived features– tremors, walking style, balance, slurred speech, etc.

• Machine perception– Using domain models to be created to diagnose and

monitor disease progression

• Ultimately, recommend options to control chronic conditions …

11/15/2013

Page 8: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 8

Heart Failure Use Case

• Machine-sensed data – weight, heart rate, blood pressure, oxygen level, etc.

• Human perceived features– Risk-level for hospital readmission of CHF/ADHR patient

• Machine perception– Using domain models to be created to monitor heart

condition of a cardiac patient post hospital discharge

• Ultimately, recommend treatments to reduce preventable hospital readmissions …

11/15/2013

Page 9: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 9

Asthma Use Case

• Data from machine-sensors– Environmental sensors, physiological sensors, etc.

• Human perceived features– Asthma severity gleaned from frequency of asthma

attacks, wheezing, coughing, sleeplessness, etc.

• Machine perception– Using domain models to be created to monitor asthma

patients and their surroundings

• Ultimately, recommend prevention and control options …

11/15/2013

Page 10: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 10

Traffic Use Case

• Data from machine-sensors, social media stream, and planned event schedules– Traffic flow sensors : link speed, link volume, Event-

specific tweets, etc.

• Human perceived features– traffic delays and congestion, etc.

• Machine perception– Using domain models to be created to understand traffic

patterns in response to events

• Ultimately, recommend traffic management options …

11/15/2013

Page 11: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Slow moving traffic

Link Description

Scheduled Event

Scheduled Event

511.org

511.org

Schedule Information

511.org

Traffic Monitoring

11

Heterogeneity in a Physical-Cyber-Social System

Page 12: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 12

Volume with a Twist

Resource-constrained reasoning on mobile-devices

Goal: Boolean encodings to ensure feasibility, efficiency, and economy

11/15/2013

Page 13: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

13* based on Neisser’s cognitive model of perception

ObserveProperty

PerceiveFeature

Explanation

Discrimination

1

2

Perception Cycle* that exploits background knowledge / domain models

Abstracting raw data for human

comprehension

Focus generation for disambiguation and action(incl. human in the loop)

Prior Knowledge

Page 14: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Virtues of Our Approach to Semantic Perception

Blends simplicity, effectiveness, and scalability.

• Declarative specification of explanation and discrimination;

• With applications (e.g., to healthcare) that are of contemporary relevance and interdisciplinary;

• Using encodings/algorithms that are significant (asymptotic order of magnitude gain) and necessary (“tractable” due to time/memory reduction for typical problem sizes); and

• Prototyped using extant PCs and mobile devices.

Page 15: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

O(n3) < x < O(n4) O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes• Time reduced from minutes to milliseconds• Complexity growth reduced from polynomial to

linear

Evaluation on a mobile device

15

Page 16: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 16

Volume and Velocity

• Lightweight semantics-based Adaptive/Continuous Filtering

E.g.,: Track evolution of crowd-sourced and verified Wikipedia event pages for relevance ranking of Twitter hashtags in Disaster response use-case

• Building domain models dynamically

11/15/2013

Page 17: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Heliopolis is a suburb of

Cairo.

Dynamic Model Creation

Continuous Semantics 17

Page 18: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety

Syntactic and semantic heterogeneity • in textual and sensor data, • in (legacy) materials data• in (long tail) geosciences data

Idea: Semantics-empowered integration

11/15/2013 Prasad 18

Page 19: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 19

Variety (What?): Materials/Geosciences Use Case

• Structured Data (e.g., relational)

• Semi-structured, Heterogeneous Documents (e.g., Publications and technical specs, which usually include text, numerics, maps and images)

• Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)

11/15/2013

Page 20: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

20

Variety (How?/Why?): Granularity of Semantics & Applications

• Lightweight semantics: File and document-level annotation to enable discovery and sharing

• Richer semantics: Data-level annotation and extraction for semantic search and summarization

• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data

Cost-benefit trade-off and continuum

Page 21: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 21

Challenges Associated with Typical Spreadsheet/Table

• Meant for human consumption • Irregular :

– Not simple rectangular grid• Heterogeneous

– All rows not interpreted similarly• Complex

– Meaning of each row and each column context dependent • Footnotes modify meaning of entries (esp. in materials

and process specifications)

11/15/2013

Page 22: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

22

Page 23: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 23

Practical Semi-Automatic Content Extraction

• DESIGN: Develop regular data structures that can be used to formalize tabular information.– Provide a natural expression of data – Provide semantics to data, thereby removing potential

ambiguities– Enable automatic translation

• USE: Manual population of regular tables and automatic translation into LOD

11/15/2013

Page 24: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety (What?) : Sensor Data Use Case

Develop/learn domain models to exploit complementary and corroborative information

• To relate patterns in multimodal data to “situation”

• To integrate machine sensed and human sensed data

11/15/2013 Prasad 24

Page 25: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety: Hybrid KRR

Blending data-driven models with declarative knowledge – Data-driven: Bottom-up, correlation-based,

statistical– Declarative: Top-down, causal/taxonomical,

logical– Refine structure to better estimate parameters

E.g., Traffic Analytics using PGMs + KBs

11/15/2013 Prasad 25

Page 26: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety (Why?): Hybrid KRR

Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.

-- David Brooks of New York Times

However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.

11/15/2013 Prasad 26

Page 27: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

• Correlations due to common cause or origin

• Coincidental due to data skew or misrepresentation

• Coincidental new discovery

• Strong correlation vs causation

• Anomalous and accidental

• Correlation turning into causations

Correlations vs Causation vs Anomalies

11/15/2013 Prasad 27

Page 28: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

• Correlations Due to common cause or origin– E.g., Planets: Copernicus > Kepler > Newton > Einstein

• Coincidental due to data skew or misrepresentation – E.g., Tall policy claims made by politicians!

• Coincidental new discovery– E.g., Hurricanes and Strawberry Pop-Tarts Sales

• Strong correlation vs causation– E.g., Spicy foods vs Helicobacter Pyroli : Stomach Ulcers

• Anomalous and accidental– E.g., CO2 levels and Obesity

• Correlation turning into causations– E.g., Pavlovian learning: conditional reflex

Correlations vs Causation vs Anomalies

11/15/2013 Prasad 28

Page 29: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

• Correlations Due to common cause or origin– E.g., Planets: Copernicus > Kepler > Newton > Einstein

• Coincidental due to data skew or misrepresentation – E.g., Tall policy claims made by politicians!

• Coincidental new discovery– E.g., Hurricanes and Strawberry Pop-Tarts Sales

• Strong correlation vs causation– E.g., Spicy foods vs Helicobacter Pyroli : Stomach Ulcers

• Anomalous and accidental– E.g., CO2 levels and Obesity

• Correlation turning into causations– E.g., Pavlovian learning: conditional reflex

Paradoxes: The Seeds of Progress

Correlations vs Causation vs Anomalies

11/15/2013 Prasad 29

Page 30: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Veracity

Lot of existing work on Trust ontologies, metrics and models, and on Provenance tracking

• Homogeneous data: Statistical techniques• Heterogeneous data: Semantic models

Open Problem: Develop semantics of trust using expressive frameworks that are both declarative and computational • To make explicit all aspects that go into trust

formation, to inspire confidence in inferences

11/15/2013 Prasad 30

Page 31: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Veracity

Machine sensing: objective, quantitative,

but prone to environmental effects, battery life, …

Human sensing: subjective, qualitative,

but prone to bias, perceptual errors, rumors, …

Open problem: Improving trustworthiness by combining machine sensing and human sensing– E.g., 2002 Überlingen mid-air collision :Pilot incorrectly

using Traffic controller advice over electronic TCAS system recommendation

11/15/2013 Prasad 31

Page 32: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

(More on) Value

Learning domain models from “big data” for prediction

E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification

Idea: Exploit “emotion-hashtagged” tweets as training dataset

11/15/2013 Prasad 32

Page 33: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

(More on) Value

Discovering gaps and enriching domain models using data

E.g., Data driven knowledge acquisition method for domain knowledge enrichment in the healthcare

Idea: Use associations between diseases, symptoms and medications in EMR documents

11/15/2013 Prasad 33

Page 34: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad 34

Conclusions

• Glimpse of our research organized around

the 5 V’s of Big Data• Discussed role in harnessing Value

– Semantic Perception (Volume)– Continuum of Semantic models to manage

Heterogeneity (Variety)– Hybrid KRR: Probabilistic + Logical (Variety)– Continuous Semantics (Velocity)– Trust Models (Veracity)

11/15/2013

Page 35: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Prasad35

thank you, and please visit us at

http://knoesis.org/

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled ComputingWright State University, Dayton, Ohio, USA

Kno.e.sis

11/15/2013

Special Thanks to: Pramod Anantharam and Cory Henson