Slides prepared based on the paper Efficient Filtering in Publish-Subscribe Systems using BDD by Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith
Efficient Filtering in Publish-Subscribe Systems using BDD
Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith
Prepared by Nabeel Mohamed
4/16/08
Outline
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
Research Problem at Hand
Loosely coupled interactions in publish-subscribe systems allow very large-scale systems to be built
However, the filtering techniques used are a major bottleneck
The efficiency of the filtering technique plays a major role in scalability
Whatever technique we use should be provably correct
Major Contributions
A precise semantics for matching messages (events) to subscriptions (subscription queries)
Modeling filtering as a satisfiability check on a BDD
Publish-Subscribe Systems
[Figure: publishers publish messages to a network of distributed content routers, which handle subscription management and routing; subscribers register with Subscribe() and Unsubscribe() and are called back through Notify().]
Publish-Subscribe Systems
Publishers and Subscribers are loosely coupled
◦ Space decoupled
◦ Time decoupled
◦ Synchronization decoupled
Content routers (brokers) form a structured p2p system
Scalable Systems
Message (Event) Filtering
Filtering
◦ Matching incoming messages (events) generated by publishers against subscription criteria
◦ A main task of content routers (brokers): the filtering engine
Content-based pub-sub systems route messages (events) based on the content itself
Example: Filter Quotes with symbol = Google and offer price < 400 in a Financial ticker.
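The filter in this example can be sketched as a predicate over a quote. This is only an illustration: the dictionary layout and the field names `symbol` and `offer_price` are assumptions, not part of the slides.

```python
def matches(quote):
    """Subscription from the example: symbol = Google and offer price < 400."""
    return quote["symbol"] == "Google" and quote["offer_price"] < 400


# Hypothetical incoming quotes from a financial ticker.
quotes = [
    {"symbol": "Google", "offer_price": 395.0},
    {"symbol": "Google", "offer_price": 410.0},
    {"symbol": "IBM", "offer_price": 120.0},
]

# Only quotes that satisfy the subscription are delivered to the subscriber.
delivered = [q for q in quotes if matches(q)]
```

A filtering engine has to run checks like this for every subscription on every incoming message, which is why matching cost matters.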
Example Pub-Sub Systems
Stock market feeds
◦ For delivery of financial data such as stock quotes, trade reports, news, etc. to customers
◦ The OPRA feed disseminates more than 100,000 quotes/sec
Sensor networks
Network traffic analysis
Transaction log analysis
Desirable Functions of a Filtering Engine
Correctness:
◦ Correctly matching incoming messages with subscription criteria
Expressiveness:
◦ Rich subscription language
Efficiency:
◦ Real-time matching
Scalability:
◦ Handling a large number of subscriptions
Dynamic:
◦ Capability to add and remove subscriptions online
Related Work
Most existing systems support only conjunctive subscriptions
◦ GRYPHON
◦ SIENA
◦ Le Subscribe
Example: The subscription shown on this slide (formula not preserved) requires 27 GRYPHON-like subscriptions, while a BDD handles it naturally.
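The subscription itself is not preserved here, so the following is only a guess at how a count like 27 can arise. It assumes a hypothetical query that is a conjunction of three groups, each a disjunction of three atoms (placeholder names a1 … c3), and expands it into purely conjunctive subscriptions.

```python
from itertools import product

# Hypothetical query: (a1 OR a2 OR a3) AND (b1 OR b2 OR b3) AND (c1 OR c2 OR c3).
groups = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]

# A conjunction-only system needs one subscription per way of picking
# one atom from each group.
conjunctive_subscriptions = [" AND ".join(choice) for choice in product(*groups)]

print(len(conjunctive_subscriptions))  # 3 * 3 * 3 = 27
```

A representation that supports arbitrary Boolean combinations can store such a query once instead of enumerating every combination.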
Related Work
Some systems have higher expressive power at the expense of less efficient filtering.
◦ ELVIN
Can we come up with an efficient filtering technique while providing an expressive subscription language?
BDD based filtering may be employed in existing systems to improve matching efficiency
Subscription Query Language
The language used to describe
subscription criteria or subscriptions
Three Subscription Languages of
increasing complexity
◦ SiSL – Simple Subscription Language
◦ StSL – Strict Subscription Language
◦ DeSL – Default Subscription Language
Messages and Attributes
V = <v1, …, vn> = a finite sequence of attributes
Each attribute vi has a type
Each attribute vi has a corresponding domain
Event schema = the sequence of the types of v1, …, vn
Messages and Attributes
A message = an assignment of values to some (not necessarily all) of the attributes
Formally, a message is a mapping m such that for each attribute v, either m(v) is a value in the domain of v, or m(v) = * (m does not define v)
A message is total if it defines all attributes in V.
Messages and Attributes – Example 1
Let V = <company, product, price> over the event schema <STR, STR, DBL>
Consider the following message:
<company> IBM </company>
<product>PC AT, 20 Mhz, 256 KB RAM</product>
<price>5000</price>
This describes a total message m1 where m1(company) = “IBM”, m1(product) = “PC AT, 20 Mhz, 256 KB RAM” and m1(price) = 5000.
Messages and Attributes – Example 2
Consider the following message:
<company> IBM </company>
<product>PC AT, 20 Mhz, 256 KB RAM</product>
This describes a different message m2 which is not total (i.e. partial), since m2(price) = *.
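The two example messages can be modeled as mappings from attributes to values. The sketch below is one possible encoding, not the authors' data structure: the sentinel `UNDEFINED` stands in for `*`, and `make_message` and `is_total` are hypothetical helper names.

```python
UNDEFINED = object()  # plays the role of '*' (attribute not defined)

V = ("company", "product", "price")  # attribute sequence from Example 1


def make_message(**values):
    """Build a message over V; attributes that are not supplied map to UNDEFINED."""
    unknown = set(values) - set(V)
    if unknown:
        raise ValueError(f"attributes not in V: {sorted(unknown)}")
    return {v: values.get(v, UNDEFINED) for v in V}


def is_total(m):
    """A message is total if it defines every attribute in V."""
    return all(m[v] is not UNDEFINED for v in V)


m1 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM", price=5000.0)
m2 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM")
```

Here `is_total(m1)` holds, while `m2` is partial because `m2["price"]` is `UNDEFINED`.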
Three Subscription Languages
SiSL – Simple Subscription Language
◦ All messages are total
StSL – Strict Subscription Language
◦ Messages define all attributes that occur in the query (subscription criteria)
◦ SiSL is a subset of StSL
DeSL – Default Subscription Language
◦ All attributes are initialized to default values (e.g. using NULL)
◦ Extends the functionality of SiSL to heterogeneous message formats
Formalizing SiSL Queries (Subscriptions)
Atomic formulas
Let v be an attribute in V
If […] and […], then the formulas v = c, v < c, c < v are atomic formulas.
If […], atomic formulas are defined similarly.
If […], then the formulas […] are atomic formulas. ([…] ≡ substring)
Formalizing SiSL Queries (Subscriptions)
Atoms = the set of atomic formulas
A query is a Boolean combination of atomic formulas
[…] = the set of attributes occurring in the query
[…] = the set of atomic formulas occurring in the query
Formalizing SiSL Queries (Subscriptions)
Abbreviations
[Abbreviation definitions not preserved]
Example: SiSL Query
The following SiSL query matches all messages for 1000 Mhz PCs manufactured by IBM, Dell or Siemens which cost at most $1000.
[Query formula not preserved]
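One plausible reading of this query, written as a Boolean combination of atoms over a total message, is sketched below. The exact formula is not preserved, so the attribute names (taken from the earlier message examples), the substring test on `product`, and the spelling of "at most" as `<` or `=` are all assumptions.

```python
def query(m):
    """Hypothetical rendering of the example query over a total message m."""
    # Substring atom on the product description.
    is_1000_mhz = "1000 Mhz" in m["product"]
    # Disjunction of three equality atoms on the manufacturer.
    by_vendor = (
        m["company"] == "IBM" or m["company"] == "Dell" or m["company"] == "Siemens"
    )
    # "At most $1000" spelled with the basic atoms v < c and v = c.
    at_most_1000 = m["price"] < 1000 or m["price"] == 1000
    return is_1000_mhz and by_vendor and at_most_1000


offer = {"company": "Dell", "product": "PC, 1000 Mhz, 512 MB RAM", "price": 1000.0}
old_pc = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": 5000.0}
```

Under these assumptions `query(offer)` is true and `query(old_pc)` is false.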
Formalizing SiSL Queries (Subscriptions)
The instantiation of a query Q by a message m
Definition:
The instantiation of Q by m is the query obtained from Q by replacing every variable v for which m(v) ≠ * by m(v).
Definition:
The SiSL query Q matches the total message m if the instantiation of Q by m evaluates to true.
Formalizing StSL Queries (Subscriptions)
StSL (Strict Subscription Language) is a generalization of SiSL.
Definition: adequacy
A message m is adequate for a query Q if, for every attribute v occurring in Q, it holds that m(v) ≠ *.
Definition:
The query Q matches m iff m is adequate for Q and the instantiation of Q by m evaluates to true.
Formalizing DeSL Queries (Subscriptions)
DeSL (Default Subscription Language) is the most general of the three.
For each attribute vi, there is a default value
Definition:
The default extension of m is the total message that keeps m(vi) wherever m(vi) ≠ * and assigns vi its default value otherwise.
Formalizing DeSL Queries (Subscriptions)
Definition:
The query Q matches the message m under default semantics if the instantiation of Q by the default extension of m evaluates to true.
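The three matching semantics can be compared side by side in a small sketch. It is not the paper's formalization: a query is modeled by a hypothetical `Query` class holding the attributes it mentions plus a Python predicate, `UNDEFINED` stands in for `*`, and the default values are made up for the example.

```python
UNDEFINED = object()  # plays the role of '*'
SCHEMA = ("company", "product", "price")


class Query:
    def __init__(self, attributes, predicate):
        self.attributes = attributes  # attributes occurring in the query
        self.predicate = predicate    # evaluates the query on defined values


def defines(m, v):
    return m.get(v, UNDEFINED) is not UNDEFINED


def matches_sisl(q, m, schema):
    # SiSL only deals with total messages.
    if not all(defines(m, v) for v in schema):
        raise ValueError("SiSL requires a total message")
    return q.predicate(m)


def matches_stsl(q, m):
    # StSL: m must be adequate, i.e. define every attribute used by q.
    adequate = all(defines(m, v) for v in q.attributes)
    return adequate and q.predicate(m)


def default_extension(m, defaults):
    # Keep defined values, fill every undefined attribute with its default.
    return {v: (m[v] if defines(m, v) else d) for v, d in defaults.items()}


def matches_desl(q, m, defaults):
    return q.predicate(default_extension(m, defaults))


q = Query({"company", "price"}, lambda m: m["company"] == "IBM" and m["price"] < 6000)
defaults = {"company": "", "product": "", "price": 0.0}  # hypothetical defaults
m2 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM"}  # no price
m3 = {"company": "IBM", "price": 5000.0}                         # no product
```

With these definitions `m3` matches `q` under StSL because `product` does not occur in `q`, while `m2` is not adequate for `q`; under DeSL `m2` matches only because the assumed default price 0.0 happens to satisfy the query.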
BDDs (Binary Decision Diagrams)
Notations
A = a set of propositional variables
[…] = a linear ordering (variable ordering) on A
[…] = an ordered BDD over A, whose non-terminal nodes are labeled by variables in A and whose terminals are labeled 0 or 1
[…] = the Boolean function represented by node v in the BDD
Properties of BDDs
Each non-terminal node v has two out-edges: a low edge and a high edge
Let a non-terminal node v with label ai have successors u and w at the low and high edges respectively. Then, writing f_x for the Boolean function represented by node x,
f_v ≡ (¬ai ∧ f_u) ∨ (ai ∧ f_w)
Size = # nodes in the BDD
Example: BDD
The BDD on this slide represents the Boolean function x AND (y OR z).
[Figure: BDD for x AND (y OR z); the drawing and the stated variable ordering are not preserved]
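Since the drawing is lost, here is a small reconstruction of a BDD for x AND (y OR z). It assumes the variable ordering x < y < z, which may differ from the one on the slide, and uses a hypothetical dictionary encoding with 0 and 1 as terminals.

```python
from itertools import product

# node id -> (label, low successor, high successor); 0 and 1 are the terminals.
NODES = {
    2: ("z", 0, 1),  # represents z
    3: ("y", 2, 1),  # represents y OR z
    4: ("x", 0, 3),  # represents x AND (y OR z)
}
ROOT = 4


def evaluate(node, assignment):
    """Follow low/high edges from node until a terminal is reached."""
    while node not in (0, 1):
        label, low, high = NODES[node]
        node = high if assignment[label] else low
    return node


# Truth table of the root over all assignments to (x, y, z).
table = {
    bits: evaluate(ROOT, dict(zip("xyz", bits)))
    for bits in product([False, True], repeat=3)
}
```

Each node follows the low edge when its variable is false and the high edge when it is true, so the root evaluates to 1 exactly when x AND (y OR z) holds.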
Shared BDDs (SBDDs)
While OBDDs represent one Boolean function, SBDDs represent multiple Boolean functions.
An SBDD is a collection of component OBDDs respecting the same variable ordering.
SBDD has a set of output nodes Vo = {o1, …, on} each corresponding to Boolean functions <f1,…, fn> respectively.
SBDDs
Every root node of a component OBDD is in Vo
Notation:
[…] denotes the BDD together with its output nodes {o1, …, on}
[…] is polynomial-time computable from any other shared BDD over A for <f1, …, fn>
Example: Shared BDD
[Figure: a shared BDD with output nodes 1, 2 and 3; the drawing and the functions each node represents are not preserved]
BDD Data Structure
A BDD with n nodes is represented as a graph whose vertices are the natural numbers 1,…, n.
The adjacency relationship is described by an array of size n.
ith element = (low[i], high[i], label[i], value[i])
◦ low[i] = low successor of i
◦ high[i] = high successor of i
◦ label[i] = label of i
◦ value[i] = used later to store the result of the BDD evaluation corresponding to i.
BDD Evaluation
[Algorithm listing: EvalBDD, not preserved]
The algorithm computes the value of each node in the BDD under a truth assignment whose ith component is the value assigned to the ith variable
BDD Evaluation
Notice that we can compute the value
of Boolean functions associated with
each output node in one pass.
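The one-pass idea can be sketched on the array representation described earlier. This is not the paper's EvalBDD listing, which is not preserved; it assumes that node 1 is terminal 0, node 2 is terminal 1, and that every successor has a smaller index than its parent, so a single left-to-right sweep suffices. The example shared BDD (outputs f1 = x AND (y OR z) and f2 = y OR z) is also made up.

```python
def eval_bdd(low, high, label, assignment):
    """Compute value[i] for every node i in one pass (index 0 is unused)."""
    n = len(label) - 1
    value = [None] * (n + 1)
    value[1], value[2] = 0, 1  # assumed terminal nodes
    for i in range(3, n + 1):
        # Successors have smaller indices, so their values are already known.
        child = high[i] if assignment[label[i]] else low[i]
        value[i] = value[child]
    return value


# Shared BDD: node 3 tests z, node 4 tests y, node 5 tests x.
low = [None, None, None, 1, 3, 1]
high = [None, None, None, 2, 2, 4]
label = [None, None, None, "z", "y", "x"]
OUTPUTS = {"f1": 5, "f2": 4}  # f1 = x AND (y OR z), f2 = y OR z

value = eval_bdd(low, high, label, {"x": True, "y": False, "z": True})
results = {name: value[node] for name, node in OUTPUTS.items()}
```

Node 4 is shared by both outputs, so its value is computed once and read off for f2 as well as reused by f1.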
BDD Restrictions
The idea is to restrict the possible truth assignments to those under which an external constraint f (a Boolean function over A) evaluates to true
Definition: f-restriction
[Definition not preserved]
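To make the role of the external constraint concrete, the sketch below uses the two related atoms mentioned later in the deck, price = 100 and price > 80. It does not implement the f-restriction operation itself (its definition is not preserved); it only shows, under that assumption, which truth assignments a constraint f rules out.

```python
from itertools import product


def f(a1, a2):
    """External constraint: (price = 100) implies (price > 80)."""
    return (not a1) or a2


# Truth assignments to (a1, a2) = (price = 100, price > 80) that satisfy f.
feasible = [bits for bits in product([False, True], repeat=2) if f(*bits)]


def atoms_for(price):
    """Truth values the two atoms actually take for a concrete price."""
    return (price == 100, price > 80)
```

The assignment (True, False) can never be produced by a real message, so a BDD only needs to be correct on the three feasible assignments, which is what leaves room for shrinking it.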
Query BDDs
Key Idea
◦ Represent many subscription queries by a
single shared BDD whose nodes
correspond to atomic sub-formulas of the
queries.
◦ Messages are matched against queries
by simply running EvalBDD on the shared
BDD.
Query BDDs
Q1, …, Qn: a sequence of queries over the set of attributes V
A = the set of atomic sub-formulas of the queries
Each atomic sub-formula a in A is assigned a propositional variable
Each query is turned into a Boolean query by substituting each a with its propositional variable
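The substitution step can be sketched as follows. The atoms, the variable names p1 to p3 and the two queries are hypothetical, and plain Python functions stand in for the shared BDD that would represent the Boolean queries.

```python
import operator

# Each atomic sub-formula (attribute, comparison, constant) gets one
# propositional variable.
ATOMS = {
    "p1": ("company", operator.eq, "IBM"),
    "p2": ("price", operator.lt, 1000.0),
    "p3": ("price", operator.eq, 1000.0),
}

# Boolean queries over the propositional variables only.
QUERIES = {
    "Q1": lambda p: p["p1"] and (p["p2"] or p["p3"]),
    "Q2": lambda p: p["p2"] or p["p3"],
}


def assignment_for(message):
    """Evaluate every atom once against the message."""
    return {
        name: op(message[attr], const) for name, (attr, op, const) in ATOMS.items()
    }


def matched(message):
    """Names of the queries whose Boolean form is true for this message."""
    p = assignment_for(message)
    return [name for name, q in QUERIES.items() if q(p)]
```

Each atom is tested against the message once, no matter how many queries mention it; the queries then only read the resulting truth assignment.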
Example: Query BDDs
Two subscriptions are received [formulas not preserved]
Together they contain three atomic sub-formulas => three propositional variables
Example: Query BDDs
[Figure: the SBDD corresponding to the queries; the drawing and the stated variable order are not preserved]
Query Matching: SiSL
Use EvalBDD algorithm for query
matching
A query Qi is considered matched if
the BDD node corresponding to Qi
evaluates to 1.
Bottom-up evaluation makes sure sub-
queries are evaluated only once.
Query Matching: DeSL
Same as handling total (complete) messages
When a message is received, it is extended to a total message before performing the matching.
Query Matching: StSL
Recall that a message m matches a subscription Q iff m is adequate for Q and m satisfies Q.
A modified EvalBDD can be used to perform faster matching
Key Ideas
◦ An undefined atom renders all sub-formulas in which it occurs undefined.
◦ Treat * as a new value, undefined
Query Matching: StSL
MVEvalBDD for StSL is significantly
faster than EvalBDD for SiSL
# Nodes in SBDD vs. # Subscriptions
The number of nodes scales almost linearly with the number of subscriptions
◦ High scalability
Restriction further reduces the node count, lowering memory requirements
Matching time for SiSL and StSL
Inputs: number of subscription queries and message density (how close a message is to total)
Partial messages can be matched quickly.
[Plot: matching time for StSL queries]
Variable Ordering vs. BDD size
Variable ordering has a tremendous
influence on BDD size.
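The effect can be reproduced with a standard textbook function, (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3), which is not taken from the slides. The sketch counts the non-terminal nodes of the reduced OBDD under two orderings; `robdd_size` is a hypothetical helper that enumerates all assignments, so it is only suitable for a handful of variables.

```python
from itertools import count


def robdd_size(f, order):
    """Count non-terminal nodes of the reduced OBDD of f under the given order."""
    unique = {}      # (variable, low id, high id) -> node id
    ids = count(2)   # ids 0 and 1 are reserved for the terminals

    def build(i, env):
        if i == len(order):
            return int(f(env))
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        if low == high:          # the test on var is redundant, skip the node
            return low
        key = (var, low, high)
        if key not in unique:    # merge isomorphic sub-graphs
            unique[key] = next(ids)
        return unique[key]

    build(0, {})
    return len(unique)


def f(e):
    return (e["x1"] and e["y1"]) or (e["x2"] and e["y2"]) or (e["x3"] and e["y3"])


interleaved = robdd_size(f, ["x1", "y1", "x2", "y2", "x3", "y3"])
separated = robdd_size(f, ["x1", "x2", "x3", "y1", "y2", "y3"])
print(interleaved, separated)  # 6 14
```

Keeping each xi next to its yi gives 6 non-terminal nodes, while listing all xi before all yi gives 14; the gap grows exponentially as more pairs are added.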
Pros
Introduces a well-defined semantics to describe the matching process in publish-subscribe systems
Treating matching as a satisfiability check on an SBDD allows multiple subscriptions to be checked incrementally
Scalable
StSL is more efficient than SiSL
Cons/Improvements
Does not describe any heuristics for selecting the variable ordering (finding an optimal ordering is NP-hard)
◦ Can we order based on the significance of the attributes involved?
Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas (e.g. price = 100 and price > 80) (again NP-hard)
◦ Can we further reduce the node count by exploiting the semantics without causing side effects?
The efficiency of matching is not compared with existing systems
Conclusion
Two major contributions
◦ A precise semantics for matching messages to subscriptions
◦ Modeling filtering as a satisfiability check on a BDD
Questions
Thank You