DEBS 2011 tutorial on non-functional properties of event processing

IBM Haifa Research Lab – Event Processing

Non-Functional Properties of Event Processing

Presenters: Opher Etzion and Tali Yatzkar-Haham

Participated in the preparation: Ella Rabinovich and Inna Skarbovsky

Introduction to non-functional properties of event processing

The variety

There is a variety of cheesecakes

There are many systems that conceptually look like an event processing network (EPN), but they differ in their non-functional properties

Two examples

Very large network management: millions of events every minute; very few are significant, and the same event is repeated. Time windows are very short.

Patient monitoring according to a medical treatment protocol: sporadic events, but each one is meaningful; time windows can span weeks.

Both can be implemented by event processing, but very differently.

Agenda

Introduction to non-functional properties of event processing

Performance and scalability considerations

Availability considerations

Usability considerations

Security and privacy considerations

Summary


Performance and Scalability Considerations

Performance benchmarks

Applications vary widely, so a collection of benchmarks should be devised, and each application should be classified into one of them

Some classification criteria:

Application complexity

Filtering rate

Required performance metrics

Performance benchmarks – cont.

Adi A., Etzion O. Amit - the situation manager. The VLDB Journal – The International Journal on Very Large Databases. Volume 13 Issue 2, 2004.

Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260.

[Figure: event processing system benchmark by category (standby world, noisy world, filtered world, complex world) for system 1, system 2 and system 3]

Previous studies indicate that there is a major performance degradation as application complexity increases.

Some benchmark scenarios

Previous studies indicate that there is a major performance degradation as application complexity increases; a single performance measure (e.g., events/s) is not good enough.

Example of an event processing system benchmark:
Scenario 1: an empty scenario (upper bound on the performance)
Scenario 2: a low percentage of event instances is filtered in; agents are simple
Scenario 3: a low percentage of event instances is filtered in; agents are complex
Scenario 4: a high percentage of event instances is filtered in; agents are complex

Results (scenario 1 / scenario 2 / scenario 3 / scenario 4):

Total external events: 100000 / 100000 / 100000 / 100000

Throughput (events/s): 72887 / 57470 / 7903 / 1923

Accumulated latency (ms): 1372 / 1742 / 16503 / 124319

Adi A., Etzion O. Amit - the situation manager. The VLDB Journal – The International Journal on Very Large Databases. Volume 13 Issue 2, 2004.

Performance indicators

One of the sources of variety

Observations:

The same system behaves very differently depending on the type of functions employed

Different applications may require different metrics

Throughput

Input throughput – measures the number of input events that the system can digest within a given time interval

Processing throughput – measures the total processing time / number of events processed within a given time interval

Output throughput – measures the number of events that were emitted to consumers within a given time interval
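The three throughput measures can be computed from timestamps recorded by a test harness. A minimal sketch, assuming hypothetical lists of arrival times, per-event processing times and emission times (none of these names come from a specific product); the processing measure follows the slide's definition literally (total processing time / events processed), so it yields a time per event rather than a rate.

```python
from typing import Sequence


def input_throughput(arrival_times: Sequence[float], start: float, end: float) -> float:
    """Input events digested per second within the interval [start, end)."""
    digested = sum(1 for t in arrival_times if start <= t < end)
    return digested / (end - start)


def processing_measure(processing_times_ms: Sequence[float]) -> float:
    """Total processing time divided by the number of events processed (ms per event)."""
    return sum(processing_times_ms) / len(processing_times_ms)


def output_throughput(emit_times: Sequence[float], start: float, end: float) -> float:
    """Events emitted to consumers per second within the interval [start, end)."""
    emitted = sum(1 for t in emit_times if start <= t < end)
    return emitted / (end - start)


# Hypothetical measurements (seconds since the start of the run)
arrivals = [0.1, 0.5, 0.9, 1.2, 1.8, 2.5]
processing_ms = [2, 4, 6]
emits = [0.3, 1.1, 1.9]

print(input_throughput(arrivals, 0, 2))   # 2.5
print(processing_measure(processing_ms))  # 4.0
print(output_throughput(emits, 0, 2))     # 1.5
```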

Latency


At the end-to-end (E2E) level, latency is defined as the elapsed time FROM the time-point when the producer emits an input event TO the time-point when the consumer receives an output event

The latency definition

But an input event may not result in an output event: it may be filtered out, participate in a pattern that does not result in pattern detection, or participate in a deferred operation (e.g., aggregation)

Similar definitions apply at the EPA level or the path level

Latency definition – two variations:

[Figure: producers 1 to 3 emit E1, E2 and E3 (at 11:10, 11:15 and 11:30) to an EPA detecting Sequence (E1, E2, E3) within a sliding window of 1 hour (11:00 to 12:00), which emits to a consumer]

Variation I: we measure the latency of E3 only

Variation II: we measure the latency of each event; for events that don’t create derived events directly, we measure the time until the system finishes processing them
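The two variations can give very different numbers for the same run. A minimal sketch with made-up millisecond timestamps (counted from 11:00) for the 1-hour sequence example: only E3 completes the pattern, so Variation I reports a single latency, while Variation II also counts E1 and E2 up to the point at which their processing finished. The `Observation` record is a hypothetical structure, not a product API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Observation:
    name: str
    emitted_at: int                            # producer emit time (ms)
    processed_at: int                          # system finished processing the event (ms)
    derived_received_at: Optional[int] = None  # consumer received a derived event it caused directly


def latency_variation_1(obs: List[Observation]) -> Dict[str, int]:
    """Variation I: only events that directly produce a derived event are measured."""
    return {o.name: o.derived_received_at - o.emitted_at
            for o in obs if o.derived_received_at is not None}


def latency_variation_2(obs: List[Observation]) -> Dict[str, int]:
    """Variation II: every event is measured; events without a direct derived
    event are measured until the system finishes processing them."""
    result = {}
    for o in obs:
        end = o.derived_received_at if o.derived_received_at is not None else o.processed_at
        result[o.name] = end - o.emitted_at
    return result


# Hypothetical run: E1 at 11:10, E2 at 11:15, E3 at 11:30 (ms since 11:00)
run = [
    Observation("E1", 600_000, 600_004),
    Observation("E2", 900_000, 900_003),
    Observation("E3", 1_800_000, 1_800_005, derived_received_at=1_800_012),
]
print(latency_variation_1(run))  # {'E3': 12}
print(latency_variation_2(run))  # {'E1': 4, 'E2': 3, 'E3': 12}
```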

Performance goals and metrics

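Multi-objective optimization function: $\min\left(\alpha \cdot \text{avg latency} + (1-\alpha) \cdot \frac{1}{\text{throughput}}\right)$, where $\alpha$ weights latency against throughput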

[Chart: comparison of the goals min max latency, min avg latency and latency leveling]

Max throughput

All (or 80%) of the events have max/avg latency < δ

All (or 90%) of the time units have throughput > Ω
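The weighted objective and the percentile-style goals can be checked mechanically. A minimal sketch, assuming latency and inverse throughput have already been normalized to comparable units (the formula mixes units otherwise); the configuration names and numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def objective(alpha: float, avg_latency: float, throughput: float) -> float:
    """alpha * avg latency + (1 - alpha) * (1 / throughput); lower is better."""
    return alpha * avg_latency + (1 - alpha) * (1 / throughput)


def best_configuration(configs: Dict[str, Tuple[float, float]], alpha: float) -> str:
    """Pick the configuration (avg_latency, throughput) that minimizes the objective."""
    return min(configs, key=lambda name: objective(alpha, *configs[name]))


def latency_goal_met(latencies: Sequence[float], delta: float, fraction: float = 1.0) -> bool:
    """True if at least `fraction` of the measured latencies are below delta."""
    return sum(1 for x in latencies if x < delta) >= fraction * len(latencies)


def throughput_goal_met(per_unit: Sequence[float], omega: float, fraction: float = 1.0) -> bool:
    """True if at least `fraction` of the time units have throughput above omega."""
    return sum(1 for t in per_unit if t > omega) >= fraction * len(per_unit)


# Hypothetical, pre-normalized measurements: (avg latency, throughput)
configs = {"A": (2.0, 0.5), "B": (4.0, 1.0)}
print(best_configuration(configs, alpha=0.5))                     # A
print(best_configuration(configs, alpha=0.0))                     # B
print(latency_goal_met([1, 2, 3, 4, 10], delta=5, fraction=0.8))  # True
print(latency_goal_met([1, 2, 3, 4, 10], delta=5))                # False
```

Setting alpha to 0 optimizes for throughput alone, which is why configuration B wins in that case.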

Optimization tools

Black-box optimizations: distribution, parallelism, scheduling, load balancing, load shedding

White-box optimizations: implementation selection, implementation optimization, pattern rewriting

Scalability

Scalability is the ability of a system to handle growing amounts of work in a graceful manner, or its ability to be enlarged effortlessly and transparently to accommodate this growth


Scale up

Vertical scalability: adding resources within the same logical unit to increase capacity


Scale out

Horizontal scalability: adding additional logical units to increase processing power


Vertical scalability – scaling up

Adding resources to a single logical unit to increase its processing abilities

Adding CPUs and memory; expanding storage by adding hard drives

Parallel concurrent execution support, such as multi-threading


Qualifications of an application designed for scale-up

Common design patterns: the Actor model

Utilizes the in-process memory for message passing

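A minimal sketch of the actor pattern, assuming a hypothetical `CounterActor`: each actor owns its state and a `queue.Queue` mailbox in the same process, and message passing is simply putting objects on that queue, so no network or serialization is involved. In CPython, threads illustrate the pattern but do not give CPU parallelism for pure-Python work.

```python
import queue
import threading


class CounterActor:
    """Minimal actor: private state, a mailbox, and one thread that handles
    messages strictly one at a time, so the state needs no locks."""

    _STOP = object()  # sentinel telling the actor to finish

    def __init__(self) -> None:
        self.count = 0                # touched only by the actor's own thread
        self._mailbox = queue.Queue()  # in-process message passing
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message: int) -> None:
        self._mailbox.put(message)

    def stop(self) -> None:
        self._mailbox.put(self._STOP)  # queued after all earlier messages (FIFO)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.count += message


actors = [CounterActor() for _ in range(4)]
for i in range(100):
    actors[i % 4].send(1)  # route each event to one actor
for actor in actors:
    actor.stop()
print([a.count for a in actors])  # [25, 25, 25, 25]
```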

Horizontal scalability – scaling out

Adding multiple logical units and making them work as a single unit

Computer clusters; load balancing

Qualifications of an application designed for scale-out

Distributed caching

Partitioning of state (sharding), for stateful applications

Different associated patterns: Master/Worker, Shared Nothing approach, Space-Based Architecture, MapReduce

Distributed services – do not assume locality

Load balancing

Scale-out and scale-up tradeoffs

Scale up
Pros: simpler programming model; simpler management layer; no network overhead, due to in-memory communication
Cons: finite growth limit; single point of failure

Scale out
Pros: redundancy; flexibility; fault tolerance
Cons: increased management complexity; more complex programming model; issues such as throughput and latency between nodes

General approach to scalability

Scaling out by: spreading application modules; load partitioning and load balancing; distributed cache

Scaling up by: running multiple threads in each module

Usually applications combine the two approaches…

Scalability in event processing: various dimensions

# of producers
# of input events
# of EPA types
# of concurrent runtime instances
# of concurrent runtime contexts
Internal state size
# of consumers
# of derived events
Processing complexity
# of geographical locations

Event-processing techniques for scalability

Load shedding

Load partitioning according to EPAs topology and Runtime Contexts

Scalability in event volume

Scalability in event volume is the ability to handle variable event loads effectively, as the quantity of events may go up and down over time


Some applications requiring high event throughput: financial, weather, phone-call tracking

Applicable scale-up and scale-out techniques:

Scale-out techniques: load partitioning, parallel processing

Scale-up techniques: load shedding

Load balancing

Scalability in the quantity of event processing agents

Scalability in the quantity of EPAs is the ability of the system to adapt to substantial growth of the event processing network and to a high quantity of event processing agents


Some applications allow users to create their own custom EPAs

Applicable scale-up and scale-out techniques

Partitioning

Optimization in agent assignment (mapping between logical and physical artifacts)

Parallelism and distribution

Scalability in the quantity of event processing agents – partitioning and parallelism

Parallelism: running all artifacts in a single powerful unit

Saves network communication overhead

Distribution: running the artifacts across multiple units

When event load is also an issue

[Figure: partitioning and parallelism/distribution decisions, driven by dependency analysis, EPA complexity analysis, number of core processors, level of distribution, communication overhead and the performance objective function]

Scalability in the number of producers/consumers

Growth in the number of producers usually results in growth in event load, even if the number of events produced by each one is small


Growth in the number of consumers requires optimization at the routing level, such as multicasting

Scalability in the number of context partitions and context-state size

Each context partition is represented by an internal state of a certain size

Growth in the number of context partitions: use partitioning on context

Significant growth of the internal state of a single context partition: affects EPA performance, since the EPA iterates over large states; use EPA optimization techniques

[Figure: events partitioned by Hash (customer id)]
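Partitioning on context by a hash of the context key (here, customer id) keeps each customer's state in exactly one partition. A minimal sketch, assuming events are dicts with a hypothetical `customer_id` field and a fixed number of partitions; CRC32 is used as the hash only because it is stable across processes, and a production system might use consistent hashing instead.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a context key to a partition; CRC32 is stable across processes,
    unlike Python's built-in hash() of str, which is salted per process."""
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


def route(events: List[dict], num_partitions: int) -> Dict[int, List[dict]]:
    """Send every event to the partition that owns its customer's context state."""
    partitions: Dict[int, List[dict]] = defaultdict(list)
    for event in events:
        partitions[partition_for(event["customer_id"], num_partitions)].append(event)
    return partitions


events = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 25},
    {"customer_id": "c1", "amount": 40},
]
parts = route(events, num_partitions=4)
# Both c1 events land in the same partition, so one worker holds all of c1's state.
```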

Availability Considerations

Availability

Availability is the ratio of the time the system is perceived as functioning by its users to the time it is required or expected to function


Can be expressed as:
Direct proportion: 9/10 or 0.9
Percentage: 99%
Average or total downtime
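The ratio and the downtime view are two sides of the same number. A minimal sketch, assuming a 365-day year of required operation (leap years ignored); the function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes the system is required 24x7, ignores leap years


def availability(functioning_hours: float, required_hours: float) -> float:
    """Ratio of time perceived as functioning to time required/expected to function."""
    return functioning_hours / required_hours


def allowed_downtime_hours(target: float, required_hours: float = HOURS_PER_YEAR) -> float:
    """Total downtime permitted by an availability target over the required period."""
    return (1 - target) * required_hours


print(availability(9, 10))                      # 0.9
print(round(allowed_downtime_hours(0.99), 2))   # 87.6
print(round(allowed_downtime_hours(0.999), 2))  # 8.76
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is one reason high availability targets get expensive quickly.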

Availability expectations and solutions

Major outages…

Disaster recovery techniques

Replicas on site

Additional sites

Continuous operation is the ability to avoid planned outages

Minor outages…

High availability: a system design and implementation approach

Ensures a pre-arranged level of availability during the measuring period (SLA)

Represents the ability to avoid minor unplanned outages by eliminating single points of failure

Continuous availability provides the ability to keep the business application running without any noticeable downtime

Components of high availability

Fault avoidance – redundancy and duplication

Distributed application; clustering; duplication of storage systems; failover for systems and data

Fault tolerance – recoverability

Failure recovery

Redundancy and duplication

Duplication

A single live component is paired with a single backup which takes over in event of failure

Example: storage – RAID 1 (mirroring)

Redundancy

Using multiple components with a method to detect failure and perform failover of the failed component

Scale-out techniques

Continuous monitoring of components (“heartbeat”)

Failover – automatic reconfiguration

Load balancing is one of the players


When one component fails, the load balancer no longer sends traffic to it

When the initial component recovers, the load balancer routes traffic back to it
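The heartbeat-plus-load-balancer failover described above can be sketched in a few lines. A minimal sketch, assuming a hypothetical `HeartbeatLoadBalancer` that uses round-robin and an explicit `now` clock (so it is deterministic); a node is treated as failed once its last heartbeat is older than `timeout`, and it rejoins as soon as it sends a fresh heartbeat.

```python
from typing import Dict, List, Optional


class HeartbeatLoadBalancer:
    """Round-robin over nodes whose last heartbeat is no older than `timeout`."""

    def __init__(self, nodes: List[str], timeout: float) -> None:
        self.nodes = list(nodes)
        self.timeout = timeout
        self.last_heartbeat: Dict[str, Optional[float]] = {n: None for n in self.nodes}
        self._next = 0

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        last = self.last_heartbeat[node]
        return last is not None and now - last <= self.timeout

    def pick(self, now: float) -> str:
        """Return the next live node; failed nodes are skipped (failover)."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if self.is_alive(node, now):
                return node
        raise RuntimeError("no live node available")


lb = HeartbeatLoadBalancer(["a", "b"], timeout=3)
lb.heartbeat("a", 0)
lb.heartbeat("b", 0)
print(lb.pick(1), lb.pick(1))  # a b  (both alive)
lb.heartbeat("a", 4)           # b stops sending heartbeats
print(lb.pick(5), lb.pick(5))  # a a  (b considered failed)
lb.heartbeat("b", 6)           # b recovers
print(lb.pick(6), lb.pick(6))  # b a  (traffic routed back to b)
```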

Recoverability in stateful applications – state management tradeoffs

Data grid – replication of state between multiple machines: recoverability achieved by duplication of state; better performance than a pure DB; network overhead on replication of state; complexity in synchronization of replicas

Memory-based state: better performance than a pure DB; complexity in recoverability implementation

In-memory DB with caching capabilities: better performance than a pure DB; guaranteed recoverability; complexity in persistency layer implementation; performance costs on cache misses and cache-outs

High availability costs

Implementing some HA practices can be very expensive…

Performance costs: state changes need to be logged; the entire state has to be persisted at least periodically; a toll on processing latency and overall event throughput

Actual costs: duplication of hardware for redundancy and duplication

Application complexity: implementing failover and recovery

Availability in event processing

Fault avoidance: duplication and redundancy of processing components; failover mechanisms for processing components

Fault tolerance: recoverability of state for all processing components (EPA state, context state, channel state)

Using the general availability techniques…

Cost-effectiveness of recoverability techniques in EP

Mission critical applications

Lost state might result in incorrect decisions

Recoverability is a must

We have to consider whether implementing recoverability is cost-effective

Applications not requiring recoverability solution

Applications where events are symptoms of some underlying problem and will occur again

Systems looking for statistical trends, which might be based on sampling

Usability Considerations

Usability 101: definition by Jakob Nielsen*

*http://www.useit.com/alertbox/20030825.html

Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the system?

Efficiency: Once users have learned the system, how quickly can they perform tasks?

Memorability: When users return after a period of not using the system, how easily can they reestablish proficiency?

Errors: How many errors do users make, how severe are these errors, and how easily can they recover from the errors?

Satisfaction: How pleasant is it to use the system?

Utility: Does the system do what the user intended?

In this part of the tutorial we’ll talk about

Build-time IDE

Runtime control and audit tools

Correctness – internal consistency

Debug and validation

Consistency with the environment – transactional behavior

Build time interfaces

Text based programming languages

Visual languages

Form based languages

Natural language interfaces

Text-based IDE (Sybase/CCL)

Another Text-based IDE (Apama)

Visual language – StreamSQL EventFlow (Streambase)

Visual language – StreamSQL EventFlow (Streambase) – cont.

Form-based language – WebSphere Business Events (IBM)

Whenever a transfer occurs more than once in a month, the Account Manager should be notified and Sales should contact the customer.

Natural language for event processing

A business-oriented tool intended to define business concepts that involve events and rules, without consideration of the implementation details

The tool uses an adaptation of the OMG's SBVR standard

free text

Frequent big cash deposit pattern is defined as “at least 4 big cash deposits to the same account”, where the big deposit decision depends on the customer’s profile.

structured English

A derived event that is derived from a big cash deposit using the frequent deposits in same account applying threshold the count of the participant event set of frequent big cash deposits is greater than or equal to 4.

Based on work done by Mark H. Linehan (IBM T.J. Watson Research Center)

Run-time tools

Performance monitoring

Dashboards

Audit and provenance

Two types of run-time tools:

Monitoring the application

Monitoring the event processing systems

Performance Monitoring (Aleri/Sybase)

Dashboard (Apama)

Dashboard Construction (Apama)

Dashboard (IBM WBE)

Provenance and audit

Tracking all consequences of an event

Tracking the reasons that something happens

Within the event processing system: derivation of events, routing of events, actions triggered by the events

Example: Pharmaceutical pedigree

Validation and debugging

Debugger

Testing and simulation

Validation

Breakpoints and Debugging

Breakpoints and Debugging (StreamBase)

Testing & simulation – IBM WBE

Application validation:
If a certain event changes, which application artifacts are affected?
What are all the possible ways to produce a certain action (derived event)?
There was an event that should have resulted in a certain action, but that never happened!
A “wrong” action was taken; how did that happen?

Validation techniques

Static analysis (build time; development phase): navigate through masses of information wisely; discover dependencies among event processing application artifacts and change rules with confidence

Dynamic analysis (run time; development and production phases): compare the actual output against the expected results; explore rule coverage by invoking multiple scenarios; system consistency tests

Analysis with formal methods (build time; development phase): advanced correctness and logical integrity observations

Static analysis

Disconnected agents
Possible consequences of an event
Possible provenance of an event
Potential infinite cycles
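These four static observations reduce to graph reachability and cycle detection over the EPN. A minimal sketch, assuming the EPN is given as a hypothetical adjacency dict from each node (producer, agent or consumer) to its downstream nodes: disconnected agents are those not reachable from any producer, consequences are forward reachability, provenance is reachability on the reversed graph, and infinite cycles are back edges found by depth-first search.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]


def reachable(graph: Graph, starts: Iterable[str]) -> Set[str]:
    """All nodes reachable from `starts` (including the starts themselves)."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def reverse(graph: Graph) -> Graph:
    rev: Graph = defaultdict(list)
    for src, dsts in graph.items():
        for dst in dsts:
            rev[dst].append(src)
    return rev


def has_cycle(graph: Graph) -> bool:
    """Depth-first search; an edge back to a node on the current path means a cycle."""
    nodes = set(graph) | {d for dsts in graph.values() for d in dsts}
    state = {n: "new" for n in nodes}  # new -> on_path -> done

    def visit(n: str) -> bool:
        state[n] = "on_path"
        for m in graph.get(n, []):
            if state[m] == "on_path" or (state[m] == "new" and visit(m)):
                return True
        state[n] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in nodes)


epn: Graph = {
    "producer": ["filter"],
    "filter": ["pattern"],
    "pattern": ["consumer"],
    "orphan_agent": ["consumer"],
    "loop_a": ["loop_b"],
    "loop_b": ["loop_a"],
}
agents = {"filter", "pattern", "orphan_agent", "loop_a", "loop_b"}

disconnected = agents - reachable(epn, ["producer"])
consequences = reachable(epn, ["filter"]) - {"filter"}
provenance = reachable(reverse(epn), ["consumer"]) - {"consumer"}
print(sorted(disconnected))  # ['loop_a', 'loop_b', 'orphan_agent']
print(has_cycle(epn))        # True
```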

Dynamic Analysis

[Figure: the dynamic analysis component takes a runtime scenario and the EP application definition, invokes the EP system on the runtime scenario, keeps the run in a history data store, and analyzes the results for correctness and coverage to produce analysis results]

Observations for dynamic analysis: event instance forward trace; event instance backward trace; application coverage by scenario execution; agent evaluation in context

Advanced verification with formal methods

Static analysis methods derive only a set of “shallow” observations on top of the application graph: an agent can be physically connected to the graph, but not reachable during the application runtime (e.g., due to a self-contradicting condition)

Agent/derived event unreachability
Automatic generation of scenarios for application coverage
Logical equivalence of several agents
Mutual exclusion of several agents

Correctness

The ability of a developer to create a correct implementation for all cases (including the boundaries)

Observation: a substantial amount of effort is invested today in many of the tools to work around the inability of the language to easily create correct solutions

Some correctness topics

The right interpretation of language constructs

The right order of events

The right classification of events to windows

The right interpretation of language constructs – example

All (E1, E2) – what do we mean?

10:00 Buy, amount: $2M

11:02 Sell, amount: $7.8M

13:35 Buy, amount: $10.6M

A customer both sells and buys the same security in value of more than $1M within a single day

Deal fulfillment: Package arrival and payment arrival

Timeline: 6/3 10:00, 7/3 11:00, 8/3 11:00, 8/3 14:00

Fine tuning of the semantics (I)

When should the derived event be emitted?

When the Pattern is matched?

At the window end?

Fine tuning of the semantics (II)

How many instances of derived events should be emitted?

Only once?

Every time there is a match?

Fine tuning of the semantics (III)

What happens if the same event happens several times?

Only one – first, last, higher/lower value on some predicate?

All of them participate in a match?

Fine tuning of the semantics (IV)

Can we consume or reuse events that participate in a match?

Fine tuning of semantics – conclusion

Some languages have explicit policies, for example CCL KEEP policies:

- KEEP LAST PER Id
- KEEP 3 MINUTES
- KEEP EVERY 3 MINUTES
- KEEP UNTIL (”MON 17:00:00”)
- KEEP 10 ROWS
- KEEP LAST ROW
- KEEP 10 ROWS PER Symbol

In other cases, explicit programming and workarounds are used when the intended semantics differ from the default semantics
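To show what such retention policies mean operationally, here is a minimal Python emulation of three of them. This is not CCL and not Sybase's implementation; the class names are hypothetical, and the expiry boundary for the time-based policy (a row leaves once its age reaches the limit) is an assumption.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple

Row = Dict[str, Any]


class KeepRows:
    """Like 'KEEP n ROWS': retain only the n most recent rows."""

    def __init__(self, n: int) -> None:
        self.rows: Deque[Row] = deque(maxlen=n)  # deque drops the oldest row automatically

    def insert(self, row: Row) -> None:
        self.rows.append(row)


class KeepLastPer:
    """Like 'KEEP LAST PER <key>': retain only the latest row for each key value."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: Dict[Any, Row] = {}

    def insert(self, row: Row) -> None:
        self.rows[row[self.key]] = row


class KeepFor:
    """Like 'KEEP n MINUTES' (sliding): retain rows younger than `seconds`.
    Assumption: a row expires once its age reaches `seconds`."""

    def __init__(self, seconds: float) -> None:
        self.seconds = seconds
        self.rows: Deque[Tuple[float, Row]] = deque()  # (insert time, row), oldest first

    def insert(self, row: Row, now: float) -> None:
        self.rows.append((now, row))
        while self.rows and now - self.rows[0][0] >= self.seconds:
            self.rows.popleft()


last_two = KeepRows(2)
for i in range(3):
    last_two.insert({"Id": i})

latest = KeepLastPer("Id")
for row in ({"Id": 1, "px": 10}, {"Id": 2, "px": 20}, {"Id": 1, "px": 11}):
    latest.insert(row)

recent = KeepFor(180)  # 3 minutes
for t in (0, 100, 200):
    recent.insert({"t": t}, now=t)

print([r["Id"] for r in last_two.rows])  # [1, 2]
print(latest.rows[1]["px"])              # 11
print([t for t, _ in recent.rows])       # [100, 200]
```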

The right order of events - scenario

Bid scenario – ground rules:
1. All bidders that issued a bid within the validity interval participate in the bid.
2. The highest bid wins. In the case of a tie between bids, the first accepted bid wins the auction.

===Input Bids===

Bid Start 12:55:00
credit bid id=2, occurrence time=12:55:32, price=4
cash bid id=29, occurrence time=12:55:33, price=4
cash bid id=33, occurrence time=12:55:34, price=3
credit bid id=66, occurrence time=12:55:36, price=4
credit bid id=56, occurrence time=12:55:59, price=5
Bid End 12:56:00

===Winning Bid===
cash bid id=29, occurrence time=12:55:33, price=4

Trace:

Race conditions:

Between events; between events and window start/end
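Applying the two ground rules to the input bids, ordered by occurrence time, gives a different winner than the trace. A minimal sketch, assuming "first accepted" means the earliest occurrence time and that the validity interval is half-open [Bid Start, Bid End). Under these assumptions bid 56 (price 5) should win. Even if bid 56 is lost, the price-4 tie should go to bid 2, so the trace's choice of bid 29 is consistent with the race conditions listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bid:
    kind: str
    bid_id: int
    occurrence: str  # "HH:MM:SS"; zero-padded, so string order is time order
    price: int


def winner(bids: List[Bid], start: str, end: str) -> Bid:
    """Rule 1: only bids inside the validity interval [start, end) participate.
    Rule 2: the highest price wins; ties go to the earliest occurrence time."""
    valid = [b for b in bids if start <= b.occurrence < end]
    return min(valid, key=lambda b: (-b.price, b.occurrence))


bids = [
    Bid("credit", 2, "12:55:32", 4),
    Bid("cash", 29, "12:55:33", 4),
    Bid("cash", 33, "12:55:34", 3),
    Bid("credit", 66, "12:55:36", 4),
    Bid("credit", 56, "12:55:59", 5),
]
print(winner(bids, "12:55:00", "12:56:00").bid_id)  # 56
# If bid 56 loses the race with the window end, the tie rule should still pick bid 2:
print(winner([b for b in bids if b.bid_id != 56], "12:55:00", "12:56:00").bid_id)  # 2
```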

Ordering in a distributed environment - possible issues

Even if the occurrence time of an event is accurate, it might arrive after some processing has already been done

If we use the occurrence time of an event as reported by the sources, it might not be accurate, due to clock accuracy at the source

Most systems order events by detection time, but events may switch their order on the way

Clock accuracy in the source

Clock synchronization

Time server, example: http://tf.nist.gov/service/its.htm

Buffering technique

Assumptions:
Events are reported by the producers as soon as they occur.
The delay in reporting events to the system is relatively small and can be bounded by a time-out offset.
Events arriving after this time-out can be ignored.

Principles: let $\Delta$ be the time-out offset. According to the assumptions, it is safe to assume that at any time-point $t$, all events whose occurrence time is earlier than $t - \Delta$ have already arrived.

Each event whose occurrence time is $T_o$ is then kept in the buffer until $T_o + \Delta$, at which time the buffer can be sorted by occurrence time, and the events can be processed in this sorted order.

[Figure: producers → sorted buffer (by occurrence time) → event processing; an event with occurrence time $T_o$ is released when $t > T_o + \Delta$]
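The buffering technique maps naturally onto a priority queue keyed by occurrence time. A minimal sketch, assuming a hypothetical `ReorderBuffer` class, a monotonically non-decreasing `now` supplied by the caller, and `delta` as the time-out offset: events arriving after the time-out are ignored, and buffered events are released in occurrence-time order once $t > T_o + \Delta$.

```python
import heapq
from typing import Any, List, Tuple


class ReorderBuffer:
    """Hold each event until now > occurrence_time + delta, then release
    buffered events in occurrence-time order. Assumes `now` never decreases."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self._heap: List[Tuple[float, int, Any]] = []  # (occurrence time, arrival seq, event)
        self._seq = 0  # tie-breaker so events themselves are never compared

    def add(self, occurrence_time: float, event: Any, now: float) -> bool:
        """Buffer an event; return False if it arrived after the time-out (ignored)."""
        if now > occurrence_time + self.delta:
            return False
        heapq.heappush(self._heap, (occurrence_time, self._seq, event))
        self._seq += 1
        return True

    def release(self, now: float) -> List[Any]:
        """Emit every buffered event whose occurrence time + delta has passed, sorted."""
        out = []
        while self._heap and now > self._heap[0][0] + self.delta:
            out.append(heapq.heappop(self._heap)[2])
        return out


buf = ReorderBuffer(delta=5)
buf.add(10, "E1", now=11)
buf.add(8, "E0", now=12)    # arrives after E1 but occurred earlier: reordered
buf.add(1, "late", now=12)  # 12 > 1 + 5: beyond the time-out, ignored
print(buf.release(now=14))  # ['E0']
print(buf.release(now=16))  # ['E1']
```

The price of this ordering guarantee is an added latency of roughly $\Delta$ for every event, which is why $\Delta$ should be as small as the reporting-delay assumption allows.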

Retrospective compensation

Find out all EPAs that have already sent derived events which would have been affected by the "out-of-order" event if it had arrived at the right time.

Retract all the derived events that should not have been emitted in their current form.

Replay the original events with the late one inserted in its correct place in the sequence so that the correct derived events are generated.
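The retract-and-replay steps can be sketched for a single EPA. A minimal sketch, assuming a hypothetical pattern agent (derive "A->B" whenever a B immediately follows an A in occurrence order) and a full replay of the event log on every arrival. Full replay is simple but costly; finding all affected EPAs across a network, as in the first step, is outside this sketch.

```python
from typing import List, Tuple

Event = Tuple[int, str]    # (occurrence time, event type)
Derived = Tuple[str, int]  # (derived event type, occurrence time of the completing event)


def detect_a_then_b(events: List[Event]) -> List[Derived]:
    """Hypothetical EPA: derive 'A->B' whenever a B immediately follows an A in occurrence order."""
    derived: List[Derived] = []
    previous = None
    for time, kind in sorted(events):
        if kind == "B" and previous == "A":
            derived.append(("A->B", time))
        previous = kind
    return derived


class CompensatingAgent:
    def __init__(self) -> None:
        self.log: List[Event] = []        # every input event seen so far
        self.emitted: List[Derived] = []  # derived events already sent to consumers

    def on_event(self, time: int, kind: str) -> Tuple[List[Derived], List[Derived]]:
        """Replay the history with the new (possibly late) event in its correct place;
        return (retractions, new emissions) relative to what was already sent."""
        self.log.append((time, kind))
        recomputed = detect_a_then_b(self.log)
        retractions = [d for d in self.emitted if d not in recomputed]
        additions = [d for d in recomputed if d not in self.emitted]
        self.emitted = recomputed
        return retractions, additions


agent = CompensatingAgent()
print(agent.on_event(1, "A"))  # ([], [])
print(agent.on_event(3, "B"))  # ([], [('A->B', 3)])
print(agent.on_event(2, "C"))  # ([('A->B', 3)], [])  late C breaks the A->B match
```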

Classification to windows - scenario

Calculate statistics for each player

(aggregate per quarter)

Calculate statistics for each team

(aggregate per quarter)

Window classification:

Player statistics are calculated at the end of each quarter. Team statistics are calculated at the end of each quarter based on the player events that arrived within the same quarter.

All instances of player statistics that occur within a quarter window must be classified to the same window, even if they are derived after the window termination.
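One way to meet this requirement is to stamp each derived player statistic with the window (quarter) it summarizes, and to have the team-level EPA classify by that stamp rather than by arrival time. A minimal sketch, assuming 12-minute quarters on a game clock in seconds and hypothetical event fields (`t`, `player`, `team`, `points`).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

QUARTER_SECONDS = 12 * 60  # assumption: 12-minute quarters, game clock in seconds


def quarter_of(occurrence_seconds: int) -> int:
    return occurrence_seconds // QUARTER_SECONDS + 1


def player_stats(raw_events: List[dict]) -> List[dict]:
    """Per-quarter points per player; each derived event carries the quarter it summarizes."""
    totals: Dict[Tuple[int, str, str], int] = defaultdict(int)
    for e in raw_events:
        totals[(quarter_of(e["t"]), e["player"], e["team"])] += e["points"]
    return [{"quarter": q, "player": p, "team": team, "points": pts}
            for (q, p, team), pts in totals.items()]


def team_stats(player_events: List[dict]) -> Dict[Tuple[int, str], int]:
    """Classify each player statistic by the quarter it carries, not by when it arrives."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for s in player_events:
        totals[(s["quarter"], s["team"])] += s["points"]
    return dict(totals)


raw = [
    {"t": 100, "player": "p1", "team": "X", "points": 2},
    {"t": 700, "player": "p2", "team": "X", "points": 3},
    {"t": 719, "player": "p1", "team": "X", "points": 1},
    {"t": 800, "player": "p1", "team": "X", "points": 2},
]
late_arrival_order = list(reversed(player_stats(raw)))  # quarter-1 stats arrive after quarter-2 stats
print(team_stats(late_arrival_order))  # {(2, 'X'): 2, (1, 'X'): 6}
```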

Transactional Behavior

In a complete transactional system, nothing gets out of the system until the transaction is committed.

In an event processing system this implies:
- The ability to track the effects of an event (forward and backwards)
- The system knows how to withdraw events from the EPAs’ internal state

Transactional behavior in event processing?

Typically, event processing systems have a decoupled architecture and do not exhibit transactional behavior

However, in several cases event processing is embedded within a transactional environment

CASE I: Transactional ECA at the consumer side

When a derived event is emitted to a consumer, there is an ECA rule, with several actions, that is required to run as atomic unit.

If it fails, the derived event should be withdrawn

CASE II: An event processing system monitors transactional system

In this case, the producer may emit events that are not confirmed and may be rolled back.

Case III: Event processing is part of a chain

There is some transactional relationship between the producer and consumer

The event processing system should transfer rollback notice from the consumer to the producer

- Need to be able to track the effects/causes of an event (forward and backwards)
- This implies rollback of other events

Case IV: A path in the event processing network should act as “unit of work”

Example: if “determine winner” fails and the bid is cancelled, none of the bid events are kept in the event stores, and they are withdrawn from other processing purposes

Transactions in event processing systems

Transactional systems usually assume that a transaction is short

This is not necessarily the case in event processing systems

All (E1, E2)

- E2 arrived 5 days after E1
- The processing of the pattern failed
What do we mean? Withdraw only E2? Also withdraw E1, after 5 days?

Security and Privacy Considerations

Security, privacy and trust

Dependability: executing predictably and operating correctly under all conditions, including hostile conditions.

Trustworthiness: containing no malicious logic that causes it to behave in a malicious manner.

Survivability: recovering as quickly as possible with as little damage as possible from attacks.

Security requirements ensure that operations are only performed by authorized parties, and that privacy considerations are met.

Based on Enhancing the Development Life Cycle to Produce Secure Software [DHS/DACS 08]

Characteristics of a secure application:

Towards security assurance

Identify and categorize the information the software is going to contain


Low sensitivity – The impact of security violation is minimal

High sensitivity – Violation may pose a threat to human life

Develop security requirements

- Access control (authentication)
- Data management and data access (authorization)
- Human resource security (privacy)
- Audit trails

Security in event processing systems

Only authorized parties are allowed to be event producers or consumers

Incoming events are filtered to avoid events that producers are not entitled to publish

Consumers only receive derived events to which they are entitled (in some cases only some attributes of an event)

Extensive work on secure subscription was done in pub/sub systems

Security in event processing systems – cont.

Unauthorized parties cannot make modifications in the application, whether through off-line definition modifications or hot updates

All database and data communication links used by the system are secure, including data transfer in distributed environments

Keeping auditable logs of events received and processed

Preventing spam events: can all Twitter events be trusted?

Security patterns in event processing

Application definitions access patterns:

Access type control – view/edit/manage

Access destination control – application parts access restrictions per user/group

Both above should be enforced in development and runtime phases (hot updates)

Event data access patterns:

Access to events satisfying a certain condition (selection)

Access to a subset of event attributes (projection)
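The two event data access patterns compose into one filter per consumer: select the events the consumer may see, then project each onto the attributes it may see. A minimal sketch, assuming hypothetical event fields and a hypothetical per-consumer entitlement (a predicate plus a set of visible attribute names); a real system would load these from its authorization policy.

```python
from typing import Callable, Dict, Iterable, List, Set

Event = Dict[str, object]


def authorized_view(events: Iterable[Event],
                    selection: Callable[[Event], bool],
                    visible: Set[str]) -> List[Event]:
    """Selection keeps only entitled events; projection drops non-entitled attributes."""
    return [{k: v for k, v in e.items() if k in visible} for e in events if selection(e)]


events = [
    {"type": "deposit", "account": "A1", "amount": 5000, "customer_name": "Alice"},
    {"type": "deposit", "account": "B7", "amount": 120, "customer_name": "Bob"},
]

# Hypothetical entitlement: branch A sees only its own accounts, and never customer names
branch_a_view = authorized_view(
    events,
    selection=lambda e: str(e["account"]).startswith("A"),
    visible={"type", "account", "amount"},
)
print(branch_a_view)  # [{'type': 'deposit', 'account': 'A1', 'amount': 5000}]
```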

Summary

Non-functional properties determine the nature of event processing applications – distribution, availability, optimization, correctness and security are some of the dimensions

They are often the main decision factor in deciding whether to use an event processing system, and in selecting among various alternatives.
