Upload
emanuele-della-valle
View
123
Download
3
Embed Size (px)
Citation preview
STREAM REASONINGAN APPROACH TO TAME THE VELOCITY AND VARIETY DIMENSIONS OF BIG DATA
Emanuele Della VallePolitecnico di Milano
http://emanueledellavalle.org@manudellavalle
Oslo, Norway - 15.6.2017
BIG DATA TECHS
CAN TAME VOLUME
▸ Hadoop, MapReduce, HIVE
▸ “schema on read” methodology
▸ spark (x100 faster)
▸ “data lake” concept
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
BIG DATA TECHS
CAN TAME VELOCITY
▸ Storm
▸ Kafka
▸ Spark Streaming
▸ Flink
▸ paradigmatic change
▸ from persistent data and transient queries
▸ to persistent queries and transient data
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
BIG DATA TECHS
CANNOT TAME VOLUME AND VELOCITY SIMULTANEOUSLY
ZB
EB
PB
TB
GB
MB
KB
months days hours min. sec. ms. Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
BIG DATA TECHS
CAN TAME VARIETY USING SEMANTIC TECHNOLOGIES
▸ RDF data model
▸ SPARQL query language
▸ OWL ontological language
▸ R2RML mapping language
▸ Ontology Based Data Access methodology
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
BIG DATA TECHS
VARIETY MAKES PROBLEMS HARDER
ZB
EB
PB
TB
GB
MB
KB
months days hours min. sec. ms.
VARIETY
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
OFF-SHORE OIL OPERATIONS
‣ When sensors on a drilling pipe in an oil-rig indicate that it is about to get stuck, how long — according to historical records — can I keep drilling?
‣ 400,000 sensors from 10s of differente producers
‣ 10,000 observations per second, many out-of-operational-ranges
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SMART CITIES
▸ Can you suggest where to spend my next hours given my interests, the presence of people and what their doing?
▸ 100,000s people generating 10,000s information items per second
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SOCIAL MEDIA ANALYSIS
▸ Who are the current top influencer users that are driving the discussion about the top emerging topics across all the social networks
▸ billions of active users (facebook, 1.86 bln in February 2017)
▸ millions of actions (facebook, 2.92 mln post per minute)
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
REQUIREMENT ANALYSIS
A system able to answer those queries must be able to
▸ handle massive datasets x
▸ process data streams on the fly x
▸ cope with heterogeneous datasets x
▸ cope with incomplete data x x
▸ cope with noisy data x
▸ provide reactive answers x
▸ support fine-grained information access x x
▸ integrate complex domain models x
Volu
me
Velo
city
Varie
ty
VERA
CITY
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
(PARTIAL) SOLUTIONS: STREAM PROCESSING
▸ A paradigmatic change!window
input streams streams of answerRegistered Continuous Query
Dynamic System
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STREAM PROCESSING VS. REQUIREMENTS
Requirement SPmassive datasets
data streamsheterogeneous dataset incomplete datanoisy data
reactive answers fine-grained information access complex domain models
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
(PARTIAL) SOLUTIONS: SEMANTIC TECHS
▸ Given an ontology O (an information model), a query Q and a set of ground facts A contained in multiple heterogenous databases …,
▸ use O to rewrite Q as Q’ so that
▸ answer(Q,O,A) = answer(Q’,!,A) The answer of the query Q using the ontology O for any set of ground facts A is equal to answer of a query Q’ without considering the ontology O
▸ Use mapping M to map Q’ to multiple SQL queries to the various databases
Rewrite
O
Q Q’Map
SQL
M
answerA
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
SEMANTIC TECHS VS. REQUIREMENTS
Requirement SP STmassive datasetsdata streamsheterogeneous dataset incomplete datanoisy data reactive answers fine-grained information access complex domain models
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
Is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users?
E. Della Valle, S. Ceri, F. van Harmelen & H. Stuckenschmidt, 2010
STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
STREAM REASONING RESEARCH QUESTION
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
(,13),(,12),(,8),(,8)
STREAM REASONING
THEORY: STREAM PROCESSING
time
1minutewidewindow
Which are the top-4 most frequent colours in the last minute?
Is there a followed by a in the last minute yes, many
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
THEORY: STREAM PROCESSING + SEMANTIC TECHS
time
1minutewidewindow
An ontology of colours
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
(,13),(,8),(,8)
STREAM REASONING
THEORY: STREAM REASONING
time
1minutewidewindow
Which are the top-2 most frequent cool colours in the last minute?
Is there a primary cool colour followed by a secondary warm one
yes, followed by .
An ontology of colours
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
THEORY: STREAM REASONING
time
1minutewidewindow
A better ontology of colours
Which are the most frequent sentiments in the last minute?
Is there a impulsive, irritating colour followed by an happy one
The better is the ontology of the colours we are using the more expressive are the queries we can register
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
THEORY: 1000 SCIENTIFIC PAPERS IN 10 YEAR
▸ It is possible extend the Semantic Web stack in order to represent heterogeneous data streams (RDF streams), continuous queries (C-SPARQL, CQELS-QL, … RSP-QL), and continuous reasoning (LARS, STARQL, …) tasks
▸ The ordered nature of data streams and the possibility to forget old enough information allow to optimise continuous querying (C-SPARQL Engine, CQELS, MorphStream, … RSP Engine) and continuous reasoning (IMaRS, RDFox, StreamRule, ETALIS…) tasks so to provide reactive answers
▸ Semantic Web and Machine Learning technologies can be jointly employed to cope with the noisy and incomplete nature of data streams
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
Traditional
STREAM REASONING
THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED
TRADITIONAL APPROACH
Data “in-motion” Data
“in-motion”Registered
analysisInsights
“in-motion”
Data put“at-rest”in DWH
Analysis
Analysis
Insight
PANOPTIQUE APPROACH
Ontology+
Mappings
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
Traditional Stream Reasoning
STREAM REASONING
THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED
TRADITIONAL APPROACH
Data “in-motion” Data
“in-motion”Registered
analysisInsights
“in-motion”
Data put“at-rest”in DWH
Analysis
Analysis
Insight
PANOPTIQUE APPROACH
Ontology+
Mappings
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
(MY) APPLICATIONS
BOTTARI Winner of Semantic Web Challenge 2011
URBAN BIG DATA SCIENCE
Winner of IBM faculty award 2013Funded by 8 EIT Digital yearly grants
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
URBAN BIG DATA SCIENCE: CROWDINSIGHTS PROJECT
October July
1000
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
PRODUCTS: I STARTED UP
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
PRODUCTS: I STARTED UP
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
PRODUCTS: I STARTED UP
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
STREAM REASONING VS. REQUIREMENTS
Requirement Stream Reasoningmassive datasetsdata streamsheterogeneous dataset incomplete datanoisy data reactive answers fine-grained information access complex domain models
not specifically treated so far treated but not resolved universally addressed by all studies
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
NOW WHAT?
▸ Focus on languages and abstractions able to easily capture user needs
▸ Analytic queries
▸ Which electricity-producing turbine has sensor readings similar (i.e., Pearson correlated by at least 0.75) to any turbine that subsequently had a critical failure in the past year?
▸ Advance analytics (Machine Learning) tasks
▸ Where am I likely going to run into a traffic jam during my commute tonight and how long will it take, given current weather and traffic conditions?
▸ … many more …
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
▸ Find the sweet-spot between scalability and expressive semantics
▸ the data access layers are clear (enough)
▸ … but, what kind of reasoning should we put at the top?
▸ Rule language? Answer set programming? Temporal logic?
STREAM REASONING
NOW WHAT?
ComplexityRaw Stream Processing
Semantic Streams
DL-Lite
???Abstraction
Selection
Interpretation
Reasoning
Re-writing
Mapping
Change Frequency
PTIME
NEXPTIME
104 Hz
1 Hz
Complexity vs. Dynamics AC0
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
NOW WHAT?
▸ Used semantics to model more than the data access
▸ Data are imperfect, get over it!
STREAM REASONING
ARE YOU INTERESTED TO LEARN MORE?
▸ the official stream reasoning community web site
▸ http://streamreasoning.org/
▸ the RDF Stream Processing W3C community
▸ https://www.w3.org/community/rsp/
▸ my personal pages
▸ http://emanueledellavalle.org/ + twitter: @manudellavalle
▸ my company page
▸ http://fluxedo.com/en/
Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
STREAM REASONING
THANK YOU! ANY QUESTION?
Emanuele Della VallePolitecnico di Milano
http://emanueledellavalle.org@manudellavalle
Oslo, Norway - 15.6.2017