Speeding-up Big Data with Event Processing
Alexandre de Castro Alves
1Thursday, July 18, 13
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
Disclaimers
• The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
• CEP
  • Drivers
  • Formal description
• Big Data
  • Scenarios
  • Architecture
  • Integration with CEP
• Fast Data
  • Architecture
  • Integration with CEP
• Predictive Analytics
  • Data Mining
  • Online data mining
  • Scenarios
Event-Driven Applications
Industries:
• Financial Services
• Transportation & Logistics
• Public Sector & Military
• Manufacturing
• Utilities & Insurance
• Telecommunications & Services
Example applications:
• Algorithmic trading
• Asset management
• Distributed order orchestration
• ‘Negative Working Capital’ inventory management
• Grid Infrastructure Management
• Responses to calamities – earthquake, flooding
• Proximity/Location Tracking
• Intrusion detection systems
• Military asset allocation
Business Drivers & Enablers
• Exploding volume of digital event data:
  • The cost of sensors and computing power has dropped, network capacity has increased
• Accelerating business process:
  • “the pace of business has increased, the world is changing faster, and competition is getting tougher”
    • Roy Schulte - VP Gartner Analyst
• "Event-driven systems are intrinsically smart because they are context-aware and run when they detect changes in the business world rather than occurring on a simple schedule or requiring someone to tell them when to run."
  • K. Mani Chandy, Simon Ramo Professor at the California Institute of Technology in Pasadena
Event Processing Taxonomy
• Event passing
  • Events are exchanged, but not processed
  • Simple pub-sub applications
  • Example: JMS
• Event mediation (brokering)
  • Events are filtered, routed, and enriched
  • However, not stateful
  • Example: ESB
• Complex Event Processing
  • Events are aggregated and new complex events are created
  • Extremely stateful
Inverted Database
RDBMS
• Data is ‘static’
• Queries are ‘dynamic’
CEP
• Data (event) is ‘dynamic’
• Queries are ‘static’
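The inversion can be sketched in Python: in the RDBMS style a new query is evaluated over rows already at rest, while in the CEP style a query registered once is evaluated as each event arrives. The names `stored_rows`, `run_query`, and `StandingQuery` are illustrative assumptions, not part of any product API.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

# RDBMS style: data is static, queries are dynamic (ad hoc).
stored_rows: List[Row] = [{"price": 4.0}, {"price": 2.0}, {"price": 5.0}]


def run_query(predicate: Callable[[Row], bool]) -> List[Row]:
    """Evaluate an ad hoc query over the data already at rest."""
    return [row for row in stored_rows if predicate(row)]


# CEP style: the query is static (registered once), data is dynamic.
class StandingQuery:
    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.matches: List[Row] = []

    def on_event(self, event: Row) -> None:
        """Called for every arriving event; keeps the event if it matches."""
        if self.predicate(event):
            self.matches.append(event)


ad_hoc = run_query(lambda r: r["price"] > 3.0)

standing = StandingQuery(lambda e: e["price"] > 3.0)
for event in [{"price": 4.0}, {"price": 2.0}, {"price": 5.0}]:
    standing.on_event(event)
```

Both paths select the same events here; the difference is which side is fixed and which side moves.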
EPTS and Standards
• Event Processing Technical Society (EPTS)
  • Defines glossary
    • http://www.ep-ts.com/component/option,com_docman/task,cat_view/gid,16/Itemid,84/
  • Steering committee:
    • Opher Etzion (IBM), Louis Lovas (Apama), David Luckham (Stanford), Alan Lundberg (TIBCO), John Morrell (SAP Coral8), Roy Schulte (Gartner), Richard Tibbetts (Streambase), Alexandre Alves (Oracle)
  • Participation at DEBS
• ANSI SQL Standards Proposal for CQL Pattern Matching
  • Oracle, IBM, Stanford University
• Open-source adoption of CQL (Swiss University)
CEP Models
CEP Languages
• Language styles: inference rules, ECA, state-oriented, script-oriented, agent-oriented, SQL-idioms
• Products placed in the diagram: TIBCO, Apama, RuleCore, AgentLogic, Streambase, IBM (AptSoft), Oracle CEP (shown twice)
Source: EPTS/DEBS Tutorial 2009
Application Model
[Figure: application model diagram; labels: EVENT SOURCES, EVENT SINKS, STREAM, RELATION, Contextual Data, “NOT JEE!”]
Application Model
• Event Processing Network (EPN)
  • Non-rooted directed graph describing event flow from event sources to event sinks
  • References to contextual static data (e.g. table, cache, HDFS)
• Intermediate nodes:
  • Process events (CQL processor, Java Event-Beans)
  • Stage or route processing (channels)
• Edge nodes:
  • Adapters (e.g. JMS, HTTP pub/sub JSON)
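As a rough illustration of an EPN, the sketch below wires an adapter, a channel, a processor, and a sink into a directed graph and pushes events through it. The `EPN` class and the node names are hypothetical and only mimic the shape of the model; they are not the Oracle Event Processing API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], List[dict]]


class EPN:
    """A tiny event processing network: named nodes joined by directed edges."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def add_node(self, name: str, handler: Handler) -> None:
        self.handlers[name] = handler

    def connect(self, source: str, target: str) -> None:
        self.edges[source].append(target)

    def push(self, node: str, event: dict) -> None:
        """Run the node's handler and forward each output downstream."""
        for out in self.handlers[node](event):
            for target in self.edges[node]:
                self.push(target, out)


received: List[dict] = []


def sink(event: dict) -> List[dict]:
    received.append(event)  # event sink: end of the flow
    return []


epn = EPN()
# Edge node: adapter converts a raw message into a typed event.
epn.add_node("adapter", lambda raw: [{"symbol": raw["s"], "price": float(raw["p"])}])
# Intermediate nodes: channel stages events, processor filters them.
epn.add_node("channel", lambda e: [e])
epn.add_node("processor", lambda e: [e] if e["price"] > 10.0 else [])
epn.add_node("sink", sink)
epn.connect("adapter", "channel")
epn.connect("channel", "processor")
epn.connect("processor", "sink")

for raw in [{"s": "ORCL", "p": "9.5"}, {"s": "ORCL", "p": "12.0"}]:
    epn.push("adapter", raw)
```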
Application Model
• Event models:
  • STREAM (append-only, unbounded)
  • RELATION (insert/delete, bounded)
• Event formats:
  • Java Class
  • Map (key-value pairs)
  • XML
• Timing models:
  • system timestamped
  • application timestamped
[Figure: example EPN; labels: Adapter, Event Source, Data Source, Processor (Query, Rule), Cache, Listener (POJO), Listener (ALSB)]
Application Model
• EVENT
  • Defined by a schema: event-type
  • Tuple of event properties
StockEventType (event properties):
  symbol string
  lastBid float
  lastAsk float
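One way to picture an event type is as an immutable record. The sketch below mirrors StockEventType as a Python dataclass; the class name `StockEvent` and the snake_case property names are assumptions made for Python style, and the sample values are invented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StockEvent:
    """An event is a tuple of properties defined by its event type."""
    symbol: str
    last_bid: float
    last_ask: float


# Illustrative values only.
event = StockEvent(symbol="ORCL", last_bid=30.10, last_ask=30.15)

# The schema (property names) can be read back from the type.
property_names = [f.name for f in fields(StockEvent)]
```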
Application Model
• STREAM
  • Time-ordered sequence of events
  • APPEND-only
    • One cannot remove events, just add them to the sequence
  • Unbounded
    • There is no end to the sequence
  {event1, event2, event3, event4, …, eventN}
Application Model
• STREAM
  • Examples:
    • {{1s, event1}, {2s, event2}, {4s, event3}}: time-ordered, a STREAM
    • {{1s, event1}, {4s, event2}, {2s, event3}}: not time-ordered, an EVENT CLOUD
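The difference between the two examples comes down to timestamp order. A minimal sketch, assuming events are `(timestamp, name)` pairs; `is_stream` and the `Stream` class are hypothetical helpers, not a product API.

```python
from typing import List, Tuple

TimedEvent = Tuple[int, str]  # (timestamp in seconds, event name)


def is_stream(events: List[TimedEvent]) -> bool:
    """True if timestamps never decrease, i.e. the sequence is time-ordered."""
    return all(a[0] <= b[0] for a, b in zip(events, events[1:]))


class Stream:
    """Append-only sequence that rejects out-of-order events."""

    def __init__(self) -> None:
        self._events: List[TimedEvent] = []

    def append(self, timestamp: int, name: str) -> None:
        if self._events and timestamp < self._events[-1][0]:
            raise ValueError("out-of-order event: not a valid stream append")
        self._events.append((timestamp, name))

    def events(self) -> Tuple[TimedEvent, ...]:
        # Expose a read-only copy; there is deliberately no remove method.
        return tuple(self._events)


ordered = [(1, "event1"), (2, "event2"), (4, "event3")]
unordered = [(1, "event1"), (4, "event2"), (2, "event3")]
```

Here `is_stream(ordered)` holds and `is_stream(unordered)` does not, which is why the second example is an event cloud rather than a stream.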
Application Model
• RELATION
  • Bag of events at some instantaneous time T
  • Allows for INSERT, DELETE, and UPDATE
  • Example:
    • At T=1: {{event1}, {event2}, {event3}}
    • At T=2: {{event1}, {event3}, {event4}}
      • No changes to event1 and event3
      • Event2 was deleted
      • Event4 was inserted
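The T=1 to T=2 example can be reproduced by comparing the two bags. This is a sketch using Python's `Counter` as the bag; `relation_delta` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple


def relation_delta(before: List[str], after: List[str]) -> Tuple[List[str], List[str]]:
    """Return (inserted, deleted) events between two instants of a relation."""
    b, a = Counter(before), Counter(after)
    inserted = sorted((a - b).elements())  # in `after` but not in `before`
    deleted = sorted((b - a).elements())   # in `before` but not in `after`
    return inserted, deleted


at_t1 = ["event1", "event2", "event3"]
at_t2 = ["event1", "event3", "event4"]
inserted, deleted = relation_delta(at_t1, at_t2)
```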
Event Processing Language: CQL
• High-level descriptive language for EP, dynamically changeable
• Continuous and incremental
  • Driven by time and events, incremental calculations
• Leverages SQL principles/implementation, and extends it with formal STREAM calculus.
• Based on the STREAM project at Stanford
[Figure: annotated continuous query; labels: Stream-Relational Algebra, Define Window of Events, Control Rate of Event Output]
Stream-relation Window Operator
SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE]
Time (in secs) | Input event | Output event
00 | ∅ | {AVG(price) = 0.0}
01 | {symbol = “aaa”, price = 4.0} | {AVG(price) = 4.0}
10 | {symbol = “bbb”, price = 2.0} | {AVG(price) = 3.0}
59 | {symbol = “aaa”, price = 5.0} | {AVG(price) = 3.67}
61 | ∅ | {AVG(price) = 3.5}
70 | ∅ | {AVG(price) = 5.0}
80 | {symbol = “aaa”, price = 6.0} | {AVG(price) = 5.5}
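The timeline above can be replayed with a small sliding-window sketch. It assumes an event leaves the window once it is 60 seconds old, and it reports 0.0 for an empty window as the slide does; `RangeWindowAvg` is a hypothetical class, not how a CQL engine is implemented. At t=59 the average is 11/3, about 3.67.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RangeWindowAvg:
    """AVG(price) over a sliding time window, e.g. [RANGE 1 MINUTE]."""

    def __init__(self, range_secs: int = 60) -> None:
        self.range_secs = range_secs
        self.window: Deque[Tuple[int, float]] = deque()

    def advance(self, now: int, price: Optional[float] = None) -> float:
        """Move time to `now`, optionally insert an event, return the average."""
        # Expire events that are range_secs old or older.
        while self.window and now - self.window[0][0] >= self.range_secs:
            self.window.popleft()
        if price is not None:
            self.window.append((now, price))
        if not self.window:
            return 0.0  # assumption: empty window reported as 0.0, as on the slide
        return sum(p for _, p in self.window) / len(self.window)


w = RangeWindowAvg(60)
timeline = [(0, None), (1, 4.0), (10, 2.0), (59, 5.0), (61, None), (70, None), (80, 6.0)]
averages = [round(w.advance(t, p), 2) for t, p in timeline]
```

Note that the output changes at t=61 and t=70 without any input event: the passage of time alone expires old events.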
Stream-relation Window Operator
• Window variations:
  • Sliding
  • Jumping (batching)
  • Partitioned
  • User-defined windows
  • Time-based
  • Tuple-based
  • Value windows
  • CurrentHour (left edge is fixed, and right edge moves)
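To make two of these variations concrete, the sketch below contrasts a sliding and a jumping (batching) window, both tuple-based. The function names are illustrative assumptions; real engines expose these as window clauses rather than functions.

```python
from collections import deque
from typing import Deque, List


def sliding_tuple_windows(events: List[int], size: int) -> List[List[int]]:
    """Emit the last `size` events after every arrival (window slides by one)."""
    window: Deque[int] = deque(maxlen=size)
    out: List[List[int]] = []
    for e in events:
        window.append(e)
        out.append(list(window))
    return out


def jumping_tuple_windows(events: List[int], size: int) -> List[List[int]]:
    """Emit a window only when `size` new events have arrived (batching)."""
    batch: List[int] = []
    out: List[List[int]] = []
    for e in events:
        batch.append(e)
        if len(batch) == size:
            out.append(batch)
            batch = []
    return out


sliding = sliding_tuple_windows([1, 2, 3, 4, 5], size=2)
jumping = jumping_tuple_windows([1, 2, 3, 4, 5], size=2)
```

The sliding form produces one output per event, while the jumping form produces one output per full batch and leaves the trailing incomplete batch pending.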
Relation-stream operators
ISTREAM (SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE])
Time | Input event | WINDOW output | ISTREAM output
00 | ∅ | +{AVG(price) = 0.0} | +{AVG(price) = 0.0}
01 | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0} | +{AVG(price) = 4.0}
10 | +{price = 2.0} | -{AVG(price) = 4.0}, +{AVG(price) = 3.0} | +{AVG(price) = 3.0}
59 | +{price = 5.0} | -{AVG(price) = 3.0}, +{AVG(price) = 3.67} | +{AVG(price) = 3.67}
61 | ∅ | -{AVG(price) = 3.67}, +{AVG(price) = 3.5} | +{AVG(price) = 3.5}
70 | ∅ | -{AVG(price) = 3.5}, +{AVG(price) = 5.0} | +{AVG(price) = 5.0}
80 | +{price = 6.0} | -{AVG(price) = 5.0}, +{AVG(price) = 5.5} | +{AVG(price) = 5.5}
DSTREAM (SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE])
Time | Input event | WINDOW output | DSTREAM output
00 | ∅ | +{AVG(price) = 0.0} | ∅
01 | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0} | +{AVG(price) = 0.0}
10 | +{price = 2.0} | -{AVG(price) = 4.0}, +{AVG(price) = 3.0} | +{AVG(price) = 4.0}
59 | +{price = 5.0} | -{AVG(price) = 3.0}, +{AVG(price) = 3.67} | +{AVG(price) = 3.0}
61 | ∅ | -{AVG(price) = 3.67}, +{AVG(price) = 3.5} | +{AVG(price) = 3.67}
70 | ∅ | -{AVG(price) = 3.5}, +{AVG(price) = 5.0} | +{AVG(price) = 3.5}
80 | +{price = 6.0} | -{AVG(price) = 5.0}, +{AVG(price) = 5.5} | +{AVG(price) = 5.0}
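Both operators can be read as a difference between the relation at the previous instant and the relation now: ISTREAM emits what was inserted, DSTREAM emits what was deleted. The sketch below replays the successive window outputs under that reading (with 11/3 rounded to 3.67); the function names are illustrative and the relation is assumed to start empty.

```python
from collections import Counter
from typing import List


def istream(prev: List[float], curr: List[float]) -> List[float]:
    """Tuples present now that were not present at the previous instant."""
    return sorted((Counter(curr) - Counter(prev)).elements())


def dstream(prev: List[float], curr: List[float]) -> List[float]:
    """Tuples present at the previous instant that are gone now."""
    return sorted((Counter(prev) - Counter(curr)).elements())


# Successive states of the relation produced by the windowed AVG query.
states = [[0.0], [4.0], [3.0], [3.67], [3.5], [5.0], [5.5]]

prev: List[float] = []  # assumption: the relation is empty before time 0
i_out, d_out = [], []
for curr in states:
    i_out.append(istream(prev, curr))
    d_out.append(dstream(prev, curr))
    prev = curr
```

The DSTREAM output lags the ISTREAM output by one step, since each average is deleted exactly when its successor is inserted.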
Pattern Matching
• Detect complex relationships amongst events
• State-machine model
• ANSI standards proposal
  • IBM, Oracle, Streambase
• Starting to see adoption by other vendors/users (e.g. MySQL) [1]
Pattern Matching
SELECT M.up
FROM ticker
MATCH_RECOGNIZE (
  MEASURES B.price as up, A.price as down
  PATTERN (A B)
  DEFINE A as price < 10.0, B as price >= 10.0
) as M
Input event | Output event
+{symbol = ‘ORCL’, price = 9.0} | ∅
+{symbol = ‘ORCL’, price = 9.5} | ∅
+{symbol = ‘ORCL’, price = 12.0} | +{M.up = 12.0}
[Figure: state machine for PATTERN (A B); price=9.0 and price=9.5 each match A, then price=12.0 matches B after A (price=9.5) and emits up=12.0]
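The state-machine reading of PATTERN (A B) can be sketched as follows: remember the most recent event that satisfied A, and emit the measures when the very next event satisfies B. This is a simplified illustration that ignores partitioning, skipping options, and everything else a real MATCH_RECOGNIZE implementation handles; `match_a_then_b` is a hypothetical name.

```python
from typing import Dict, List, Optional


def match_a_then_b(prices: List[float]) -> List[Optional[Dict[str, float]]]:
    """For each input price, emit the measures when (A B) completes, else None."""
    outputs: List[Optional[Dict[str, float]]] = []
    last_a: Optional[float] = None  # price of the immediately preceding A event
    for price in prices:
        if price >= 10.0 and last_a is not None:
            # B follows A: the pattern is complete.
            outputs.append({"up": price, "down": last_a})
            last_a = None
        else:
            outputs.append(None)
            # Only an event with price < 10.0 qualifies as A.
            last_a = price if price < 10.0 else None
    return outputs


results = match_a_then_b([9.0, 9.5, 12.0])
```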
Event Processing Ecosystem
• Inbound events: JMS, HTTP PUB/SUB
• Outbound events: JMS, HTTP PUB/SUB
• Tooling: IDE (deploy), OEP Server, Visualizer Web Console / BAM (manage)
• Contextual Data: RDBMS, Cache, Hadoop, NoSqlDb
Summary
• Event Processing Network defines the assembly
• CQL defines the processing
• STREAM vs RELATION
• RELATION can be any relational source:
  • tables, caches, Hadoop HDFS files, etc.
Big Data Scenarios
• MEDIA/ENTERTAINMENT: Viewers / advertising effectiveness; Cross Sell
• COMMUNICATIONS: Location-based advertising
• EDUCATION & RESEARCH: Experiment sensor analysis
• Retail / CPG: Sentiment analysis; Hot products; Optimized Marketing
• HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
• LIFE SCIENCES: Clinical trials; Genomics
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
• OIL & GAS: Drilling exploration sensor analysis
• FINANCIAL SERVICES: Risk & portfolio analysis; New products
• AUTOMOTIVE: Auto sensors reporting location, problems
• Games: Adjust to player behavior; In-Game Ads
• LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
• UTILITIES: Smart Meter analysis for network capacity,
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Web-site optimization
What’s Big Data?
• VOLUME
• VELOCITY
• VARIETY
• VALUE
[Figure labels: Web, SMS]
Big Data Architecture (Map-Reduce)
• Data: big, immutable (append-only, avoids corruption)
• Batch-Layer (e.g. Hadoop)
  • Batch views: query = function(data)
• Map-Reduce example:
  • batch input → map:
    key1, value1
    key2, value2
    key1, value3
    key2, value4
    key1, value5
  • reduce:
    key1, {value1, value3, value5}
    key2, {value2, value4}
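The map and reduce steps shown above can be mimicked in a few lines of single-process Python. This is only a sketch of the data flow (map, group by key, reduce); it is not Hadoop's API, and the function names are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def map_phase(records: Iterable[Pair], mapper: Callable[[Pair], List[Pair]]) -> List[Pair]:
    """Apply the mapper to every input record and collect key-value pairs."""
    pairs: List[Pair] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[str]]:
    """Group values by key, preserving arrival order within each key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[str]], reducer: Callable[[str, List[str]], int]) -> Dict[str, int]:
    """Apply the reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


batch_input = [("key1", "value1"), ("key2", "value2"), ("key1", "value3"),
               ("key2", "value4"), ("key1", "value5")]

mapped = map_phase(batch_input, lambda record: [record])  # identity mapper
grouped = shuffle(mapped)
counts = reduce_phase(grouped, lambda key, values: len(values))
```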
When is CEP needed?
• If Big Data is about VVV (volume, variety, velocity), then Stream Processing is needed when at least 2 of the 3 V’s are present.
  • If there is high volume and low latency is needed (velocity), then stream processing must be done.
  • If there is NOT a lot of volume, but the data is semi-structured (variety), such as the case of social feeds, and low latency is needed, then stream processing must still be applied.
  • If volume is low, and there is no need to do it fast, then use regular message-processing technology, such as JMS.
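The "at least 2 of the 3 V's" rule of thumb can be written as a one-line check. The function name and the returned strings are illustrative assumptions; deciding whether a workload counts as high volume, high variety, or low latency is left to the caller.

```python
def recommend(high_volume: bool, high_variety: bool, low_latency: bool) -> str:
    """Apply the rule of thumb: stream processing when at least 2 of 3 V's hold."""
    present = sum([high_volume, high_variety, low_latency])
    if present >= 2:
        return "stream processing"
    return "regular messaging (e.g. JMS)"


volume_and_velocity = recommend(high_volume=True, high_variety=False, low_latency=True)
variety_and_velocity = recommend(high_volume=False, high_variety=True, low_latency=True)
neither = recommend(high_volume=False, high_variety=False, low_latency=False)
```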
CEP with Big Data
Big Data Architecture Limitations
• Same batch architecture: big, immutable data (append-only, avoids corruption) feeds the Batch-Layer (e.g. Hadoop)
  • Batch views: query = function(data)
  • batch input → map → reduce → batch output
• Batch output: deep, but not real-time
Fast Data Architecture
• Data: big, immutable (append-only, avoids corruption)
• Batch-Layer (e.g. Hadoop)
  • Batch views: query = function(data)
• Indexing-Layer (e.g. ElephantDB, Cassandra, NoSqlDB)
  • Indexed batch views: query = function(data)
• Fast-Layer (e.g. CEP, Storm)
  • Real-time views: query = function(data) + inc-update
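A minimal sketch of how the layers might combine, assuming the query is a per-symbol event count: the batch view is recomputed from all immutable data, the real-time view is updated incrementally per event, and a query merges the two. All names here are hypothetical, and the sketch assumes the real-time view only holds events not yet absorbed by a batch run.

```python
from collections import Counter
from typing import Dict, List

Event = Dict[str, str]


def batch_view(master_data: List[Event]) -> Counter:
    """query = function(data): recompute counts from all immutable data."""
    return Counter(e["symbol"] for e in master_data)


class RealtimeView:
    """query = function(data) + inc-update: counts maintained per event."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> None:
        self.counts[event["symbol"]] += 1


def query(batch: Counter, realtime: RealtimeView, symbol: str) -> int:
    """Merge the deep-but-stale batch view with the fresh real-time view."""
    return batch[symbol] + realtime.counts[symbol]


master = [{"symbol": "ORCL"}, {"symbol": "ORCL"}, {"symbol": "IBM"}]
batch = batch_view(master)

fast = RealtimeView()
fast.on_event({"symbol": "ORCL"})  # arrived after the last batch run

orcl_total = query(batch, fast, "ORCL")
ibm_total = query(batch, fast, "IBM")
```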
Fast Data with CEP
• Integration with other Big Data technologies:
  • HBase
  • Hive
  • Avro (Flume)
• Incremental merge of Hadoop jobs with OEP queries
  • Spares the developer from having to create their own Hadoop job
Data Mining
• Identify patterns and relationships in the real world
• Develop descriptive models of datasets
• Use models to evaluate future options, risks and decisions
Data Mining
• World (population) → Data-Set (sample) → Model
  • Model built with statistical summaries, regressions, machine-learning
• Data + Model → Prediction
• Steps: (1) TRAIN, (2) SCORE, (3) RE-TRAIN
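The train, score, re-train cycle can be illustrated with the simplest possible model, a one-variable least-squares line. The data points are invented for the example and the function names are assumptions; any modelling technique could stand in for `train`.

```python
from typing import List, Tuple

Model = Tuple[float, float]  # (slope, intercept)


def train(samples: List[Tuple[float, float]]) -> Model:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model: Model, x: float) -> float:
    """Use the model to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# (1) TRAIN on an initial sample (illustrative data).
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
model = train(samples)

# (2) SCORE a new observation.
prediction = score(model, 4.0)

# (3) RE-TRAIN once the actual outcome for that observation is known.
samples.append((4.0, 9.0))
retrained = train(samples)
```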
Online Data Mining
• Score events (continuous): predict if price of next event will be above 0.8 using model
• Rebuild model (continuous)
• Export model to the Model Repository
Challenges (Right Model, Right Cost)
• Induction: Data → Model
• Deduction: Model → Data
[Figure: k-Nearest-Neighbors, Decision trees, Neural nets/SVM placed along axes labelled Increased Compression and Computational Cost]
Challenges
• All models are wrong, some are useful (George Box)
• Central Limit Theorem
  • Means of sufficiently large random samples of the same population will be approximately normally distributed (even if the data is not normally distributed)
  • However, all bets are off if not from the same population
    • Consider a regression function of weight -> height
    • Will not work if the model is built using samples from a city bus and scored on a bus containing only basketball players
• What confidence level to use?
  • Scientific papers demand a 95% confidence level. What about streaming use-cases? 95% seems too high...
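The Central Limit Theorem claim can be checked with a small simulation: draw many samples from a clearly non-normal population (exponential, mean 1, standard deviation 1) and look at the distribution of their means. The sample counts, the seed, and the choice of distribution are arbitrary assumptions for the illustration.

```python
import random
import statistics
from typing import List


def sample_means(n_samples: int, sample_size: int, seed: int = 42) -> List[float]:
    """Means of repeated random samples from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


sample_size = 50
means = sample_means(n_samples=2000, sample_size=sample_size)

mean_of_means = statistics.fmean(means)        # expected near 1.0
spread = statistics.stdev(means)               # expected near 1 / sqrt(50)
standard_error = 1.0 / sample_size ** 0.5

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution this would be about 95%.
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in means) / len(means)
```

If the scored events came from a different population than the training samples, none of these figures would carry over, which is the point of the bus example.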
Material
• http://www.oracle.com/technetwork/middleware/complex-event-processing/overview/index.html
• http://adcalves.wordpress.com
• http://www.packtpub.com/getting-started-with-oracle-event-processing-11g/book