Scalable Publish/Subscribe Architectures & Algorithms
Part I: Introduction

Pascal Felber, University of Neuchatel
[email protected]

Based on work with many others: C.-Y. Chan, W. Fan, M. Garofalakis, R. Rastogi, R. Chand, S. Bianchi, M. Gradinariu







Agenda
Part I: Introduction
Part II: Routing and Filtering Algorithms
- Distributed routing in broker overlays
- Efficient content filtering
- Scalable filter aggregation
Part III: Publish/subscribe overlays
- From broker overlays to P2P architectures
- Semantic communities for publish/subscribe


The publish/subscribe problem
- Publishers: producers of information (e.g., stock quotes, news feeds, …)
- Subscribers: consumers of information
- Filters: identify events that match consumer interests
- Publish/subscribe middleware: centralized vs. distributed, persistent (DB) vs. transient, topic- vs. content-based, reliable vs. best-effort, etc.


Publish/subscribe benefits
Provides loose coupling between participants:
- Space: consumers and producers do not need to know each other (they do not care where data comes from or goes to, nor whether there is any consumer at all!)
- Time: consumers and producers do not need to participate at the same time
- Synchronization: producers are not blocked; consumers receive messages asynchronously
Indirect addressing based on data and subscriptions
Simple, generic API: publish(), subscribe(), unsubscribe(), notify()
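To make the four-call API concrete, here is a minimal in-memory, topic-based broker sketch in Python. The `Broker` class and its callback-based delivery are illustrative assumptions, not part of any system discussed in the slides.

```python
from collections import defaultdict

class Broker:
    """Toy publish/subscribe broker exposing the generic API."""

    def __init__(self):
        self._subs = defaultdict(list)   # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def unsubscribe(self, topic, callback):
        self._subs[topic].remove(callback)

    def publish(self, topic, event):
        # The producer never learns who (if anyone) consumes the event;
        # invoking the callback plays the role of notify().
        for cb in list(self._subs.get(topic, [])):
            cb(event)

received = []
b = Broker()
b.subscribe("quotes/NYSE/IBM", received.append)
b.publish("quotes/NYSE/IBM", {"price": 100, "volume": 150000})
b.publish("quotes/NYSE/MSFT", {"price": 50})   # no subscriber: silently dropped
```

Note how the producer's `publish()` call returns immediately regardless of whether anyone is subscribed, which is exactly the space and synchronization decoupling described above.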


Publish/subscribe addressing
Topic-based P/S
- Named subscriber groups (like newsgroups, GC)
- Often topic hierarchies and wildcard-based subscriptions (subtree, tree level, etc.)
- Example:
  publish("quotes/NYSE/IBM", "price=100,volume=150000,…")
  subscribe("quotes/NYSE/*")
Content-based P/S
- Subscription = filter on the content of messages
- Very powerful (but complex matching algorithms)
- Example:
  publish("place=NYSE,name=IBM,price=100,volume=150000,…")
  subscribe("exchange=NYSE,price<60,volume>130000")
Type-based P/S
- Publish/subscribe at the programming-language level; subscriptions derived from the class hierarchy, plus content-based filters
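A content-based subscription like the one above can be evaluated with a small predicate matcher. The comma-separated filter syntax and the attribute names below mirror the slide's example but are otherwise illustrative assumptions.

```python
import operator
import re

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def parse_filter(spec):
    """Parse 'attr=val,attr<val,...' into (attribute, operator, value) triples."""
    preds = []
    for part in spec.split(","):
        attr, op, val = re.match(r"(\w+)([=<>])(.+)", part).groups()
        try:
            val = float(val)
        except ValueError:
            pass                        # keep non-numeric values as strings
        preds.append((attr, OPS[op], val))
    return preds

def matches(event, preds):
    """An event matches iff it satisfies every predicate of the filter."""
    return all(a in event and op(event[a], v) for a, op, v in preds)

sub = parse_filter("exchange=NYSE,price<60,volume>130000")
hit = {"exchange": "NYSE", "name": "IBM", "price": 42.0, "volume": 150000}
miss = {"exchange": "NYSE", "name": "IBM", "price": 100.0, "volume": 150000}
```

The conjunction-of-predicates form used here is the simplest content filter model; real engines support richer predicate languages.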


Publish/subscribe architectures
- Centralized: a single matching engine; limited scalability
- Broker overlay: multiple P/S brokers; participants are connected to some broker; events are routed through the overlay
- Peer-to-peer: publishers and subscribers are connected in a P2P network; participants collectively filter/route events and can be both producer and consumer


Content routing approaches
I. Flooding: broadcast everywhere, filter at the consumer (keep or discard)
   Pros: simple protocol/routers. Cons: network-inefficient.
II. Match-first: the producer precomputes the destination list
   Pros: bandwidth-efficient, simple. Cons: time- and space-inefficient.
III. Distributed routing: brokers have a partial view of the subscriptions and determine, per event, whom to forward it to


Our focus
- Content-based filtering and routing
- Decentralized architecture (broker overlay & P2P)
- Distributed routing protocol
- Scalable to millions of subscriptions
- Efficient (near real-time) processing
- (Semi-)structured data based on standards: XML data (mainly), XPath subscriptions (mainly)

Scalable Publish/Subscribe Architectures & Algorithms
Part II: Routing and Filtering Algorithms


Publish/subscribe model
- Consumers register subscriptions
- Producers publish events
- Messages are routed to the interested consumers
  - interested = the message matches the subscription
  - matching is based on the content of messages
- P/S broker overlay
- Large number of consumers (100s of 1,000s)
- Large amounts of data
Example (figure): events such as (Symbol: LU, Price: $10, Volume: 101,000) and (City: Nice, Weather: Sunny, Temp: 24ºC) are routed to subscriptions such as Stock Quotes [Symbol = LU and Price ≥ 10], Weather Forecast [City = Nice], and Stock Quotes [Volume > 100,000].


Distributed content routing
- We have a network of brokers that collectively route events based on their content
- Given an event, a broker must determine which other brokers and consumers to forward it to (like IP routing)
Goal: design a distributed routing protocol such that
- routing is "perfect": messages are received by all, and only those, consumers that have a matching subscription
- it is space-, time- and bandwidth-efficient


Content routing: techniques
Space efficiency
- Filter containment and aggregation reduce the size of routing tables
- Small routing tables also mean faster filtering
Time efficiency
- Smart subscription indexing for fast filtering: given a message, quickly identify the matching subscriptions
Bandwidth efficiency
- Maintain accurate routing tables so that an event is forwarded only if an interested consumer is downstream
- Subscription cancellations might trigger updates to routing tables to avoid false positives


Router model: forwarding
A forwarding engine looks up each incoming message in a forwarding table and computes its next hops (the matching algorithm):
- IP routing: the lookup key is the message header (destination address); table entries map a destination prefix to a next hop.
- Content-based routing: the lookup key is the message content; table entries map a content filter to a set of next hops.


Content routing: principle
- Routing tables store content filters and next hops
- Do not care about duplicate subscriptions downstream of a router
- Do not care about "more specialized" subscriptions downstream of a router: containment & aggregation
- Subscription cancellation can trigger complex updates to routing tables

Example (figure): a router whose table holds the single entry [Symbol=LU → r] forwards the event (Symbol: LU, Price: $10, Volume: 101,000) to r; this one entry covers both the subscription Stock Quotes [Symbol = LU] and the more specialized Stock Quotes [Symbol = LU and Price ≥ 10] registered downstream. Once the broader subscription is cancelled, the entry becomes [Symbol=LU & Price≥10 → r], and an event (Symbol: LU, Price: $9) is no longer forwarded.
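The table-driven forwarding step can be sketched in a few lines. The entry below mirrors the slide's example; the representation of filters as attribute-to-predicate mappings is an assumption made for brevity, not the actual data structure of any of the systems discussed.

```python
# Content-based forwarding sketch: routing-table entries pair a filter
# (attribute -> predicate callable) with a next hop; an event is forwarded
# to every next hop whose filter it matches.

def matches(event, flt):
    return all(a in event and pred(event[a]) for a, pred in flt.items())

def next_hops(table, event):
    return {hop for flt, hop in table if matches(event, flt)}

# After aggregation, the router stores only the broader filter Symbol = LU:
table = [({"Symbol": lambda v: v == "LU"}, "r")]

next_hops(table, {"Symbol": "LU", "Price": 9})     # forwarded to r
next_hops(table, {"Symbol": "CSCO", "Price": 20})  # not forwarded
```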


Containment
- Filter p contains filter q (p ⊒ q) iff any message m that satisfies q also satisfies p
- The containment relation defines a partial order
Examples (figure):
- Stock Quotes [Symbol = LU] ⊒ Stock Quotes [Symbol = LU and Price ≥ 10]
- IP analogy: 128.178/16 ⊒ 128.178.192/24 ⊒ 128.178.192.112
- Partial order: 128.178/16 contains both 128.178.192/24 and 128.178.62/24, which are incomparable; likewise, Stock Quotes [Symbol = LU] and Stock Quotes [Price ≥ 10] are incomparable, yet each contains Stock Quotes [Symbol = LU and Price ≥ 10]
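For conjunctions of range predicates, containment can be decided attribute by attribute. The interval encoding below (an equality a = c becomes the interval (c, c)) is a simplifying assumption that covers the slide's examples but not arbitrary filters.

```python
import math

# A filter is encoded as {attribute: (lo, hi)}, meaning lo <= value <= hi.

def contains(p, q):
    """True iff any message satisfying q also satisfies p (p ⊒ q)."""
    for attr, (plo, phi) in p.items():
        if attr not in q:               # q leaves the attribute unconstrained
            return False
        qlo, qhi = q[attr]
        if qlo < plo or qhi > phi:      # q admits values outside p's range
            return False
    return True

lu = {"Symbol": ("LU", "LU")}                            # Symbol = LU
lu_ge_10 = {"Symbol": ("LU", "LU"), "Price": (10, math.inf)}
price_ge_10 = {"Price": (10, math.inf)}                  # Price >= 10
```

Checking both directions on `lu` and `price_ge_10` shows the partial order: neither contains the other, yet both contain `lu_ge_10`.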


Aggregation
- Observation: if one is interested in messages that match filters p and q, and p ⊒ q, then it is sufficient to test messages against p (forward iff the message matches p)
- Aggregation: combine a set of filters S into an aggregate filter pa s.t. ∀q ∈ S, pa ⊒ q
  - E.g., IP prefix aggregation in BGP tables
  - Smaller routing tables, more efficient filtering


Aggregation (cont'd)
- Perfect aggregation: any message that matches pa matches some q ∈ S (pa = ⋁_{q∈S} q)
- Imperfect (lossy) aggregation: otherwise; may generate unnecessary traffic (false positives)

IP analogy (figure): the prefixes 128.178.0/18, 128.178.64/18, 128.178.128/18 and 128.178.192/18 aggregate perfectly into 128.178/16, whereas aggregating 128.178.0/17, 128.178.128/19, 128.178.160/20 and 128.178.224/20 into 128.178/16 is lossy: an address such as 128.178.190.43 matches the /16 aggregate but none of the original prefixes.

Filter example (figure): Stock Quotes [Symbol = LU] contains both [Symbol = LU and Price ≥ 10] and [Symbol = LU and Price < 11], and here the aggregation is perfect (the two price ranges together cover every price). It also contains [Symbol = LU and Price ≥ 10] and [Symbol = LU and Volume > 100,000], but that aggregation is lossy: the event (Symbol = LU, Price = 0.75, Volume = 12,053) is a false positive.
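The perfect vs. lossy distinction can be checked directly by writing the slide's filters as predicates; encoding them as Python functions is a modeling convenience, not the systems' actual representation.

```python
# Perfect vs. lossy aggregation with the slide's example filters.

def q1(e): return e["Symbol"] == "LU" and e["Price"] >= 10
def q2(e): return e["Symbol"] == "LU" and e["Price"] < 11
def q3(e): return e["Symbol"] == "LU" and e["Volume"] > 100_000
def pa(e): return e["Symbol"] == "LU"          # the aggregate filter

e = {"Symbol": "LU", "Price": 0.75, "Volume": 12_053}

# pa is a perfect aggregate of {q1, q2}: any LU event has Price >= 10 or < 11.
# pa is a lossy aggregate of {q1, q3}: e matches pa but neither q1 nor q3,
# so e would be forwarded unnecessarily (a false positive).
```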


Content routing protocol (sketch)
- Multicast tree between the producer and the consumers
- Subscriptions are sent upward and aggregated on the way:
  - if S1 and S2 are received from distinct interfaces and S1 ⊒ S2, S2 is substituted by S1
  - for the router upstream, S2 is represented by S1
- Messages are forwarded downward to consumers using the routing tables
- Cancellations update the routing tables to maintain perfect routing

Example (figure): consumer C1 registers S1 = Stock Quotes [Symbol = LU] and C2 registers S2 = Stock Quotes [Symbol = LU and Price ≥ 10], with S1 ⊒ S2. Entries such as S1→C1, S1→G, S1→D, S2→C2, S2→H, S2→E propagate up the broker tree, and where both meet, S2 is aggregated under S1. An event (Symbol: LU, Price: $8, Volume: 101,000) then reaches C1 but not C2.

Similar to the SIENA routing protocol; details in [SRDS 05] [TPDS 08].


How effective is aggregation?
Sample topology: GT-ITM (transit-stub)
- 64 routers
- 24 consumer nodes (red), 1 producer node (green)
- Up to 50,000 subscriptions (random, with Zipfian skew)
Compared with:
- "Simple" protocol: factorize identical subscriptions
- "Match-first": producers compute the destination list


Space efficiency
- The average table size is 15% of the size of the "simple" protocol: 252 entries with 50,000 subscriptions
- The maximum table size is 43 times bigger with the "match-first" approach (at producer nodes) and 13 times bigger with "simple"


XML-based publish/subscribe
- Exponentially growing amount of data on the Internet
- XML is the de-facto standard for data representation
Goal: a P/S overlay for XML content distribution
- Build "smart" application-layer XML content routers that
  - quickly match XML content against standing subscriptions
  - route documents to the interested data consumers
Two important problems to solve:
I. Fast forwarding: efficient XML filtering
II. Keeping routing tables small: XML filter aggregation


Context: XML (messages)
- Extensible Markup Language: universal interchange (meta-)language, standard, semi-structured
- Type/structure (tags, defined by a DTD or schema) vs. content (values, the data associated with tags)
- Well-formed: syntactically correct (properly nested start/end tags and character data)
- Valid: matches the DTD or schema
- XML documents are single-rooted trees:

<quotes>
  <stock>
    <name>Lucent Tech.</name>
    <symbol>LU</symbol>
    <price>10</price>
  </stock>
  <stock>
    <name>Cisco Systems, Inc.</name>
    <symbol>CSCO</symbol>
    <price>17</price>
  </stock>
</quotes>


Context: XPath (subscriptions)
- Simple language to navigate/select parts of an XML tree
- XPath expression: a sequence of node tests with child (/), descendant (//), wildcard (*), and qualifier ([...]) steps
- Constraints on both the structure and the content of messages
- Using qualifiers, an expression defines a tree pattern: an existential condition on paths, with conjunctions at branching nodes
- XPath fragment with binary output: selection match
Example subscriptions (figure):
  /quotes/stock/symbol
  //price
  /*/stock[symbol="LU"]
  //stock[price>15][symbol="LU"]
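The subscriptions above can be evaluated against the sample document with Python's standard library. Note that `xml.etree.ElementTree` implements only a subset of XPath (no value comparisons such as `[price>15]`), so that predicate is checked in plain Python below; a real filtering engine evaluates the full tree-pattern language.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<quotes>"
    "<stock><name>Lucent Tech.</name><symbol>LU</symbol><price>10</price></stock>"
    "<stock><name>Cisco Systems, Inc.</name><symbol>CSCO</symbol><price>17</price></stock>"
    "</quotes>")

matched = []
if doc.findall("./stock/symbol"):                 # /quotes/stock/symbol
    matched.append("/quotes/stock/symbol")
if doc.findall(".//price"):                       # //price
    matched.append("//price")
if doc.findall(".//stock[symbol='LU']"):          # /*/stock[symbol="LU"] (approx.)
    matched.append('/*/stock[symbol="LU"]')
if any(float(s.findtext("price")) > 15 and s.findtext("symbol") == "LU"
       for s in doc.iter("stock")):               # //stock[price>15][symbol="LU"]
    matched.append('//stock[price>15][symbol="LU"]')
```

The last subscription does not match (LU costs 10 here), illustrating the binary selection-match output.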


XML filtering: problem abstraction
- Goal: match XML data against large numbers of XPath expressions in "real-time"
- Challenge: subscriptions are tree-structured and include the "*" and "//" operators
- XPath filtering engine: given S, a set of tree patterns (XPEs), and D, an XML document, compute S', the subset of S that matches D


Filtering algorithm: XTrie
Speed up XML filtering with a (reverse-)index, the XTrie:
1. Decompose the XPEs: complex tree patterns become sets of simple, linear patterns (substrings)
2. Build the XTrie index over these substrings: a trie plus a substring table
3. Run the XTrie matching algorithm: a SAX-based XML parser turns document D into start/end-element events, and the algorithm outputs the set of XPEs that match D


1. Decomposition of XPEs
- Decompose each XPE p into a set of (possibly overlapping) linear substrings that "cover" p
- Substring: a sequence of elements along some path in the XPE tree, with consecutive node pairs separated by "/" (no "*" or "//")
- Several valid decompositions exist (e.g., single-element, minimal)
Example (minimal decomposition):
  XPE: /a/b[c/d//e][g//e/f]/*/*/e/f
  Substrings: { /a/b, /a/b/c/d, //e, /a/b/g, //e/f, /*/*/e/f }
(The figure shows the substring-tree of this set; a single-element decomposition is also possible.)
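The core idea of the decomposition, splitting a pattern into element runs at its "*" and "//" steps, can be sketched as follows. This is a deliberate simplification: the paper's decomposition also handles branching qualifiers [...] and records the positional information kept in the substring table, both omitted here.

```python
import re

def decompose(xpe):
    """Split a *linear* XPath expression at its '//' and '/*' steps,
    keeping the maximal runs of element names separated by '/'."""
    runs = re.split(r"//|/\*", xpe)
    return [run.strip("/").split("/") for run in runs if run.strip("/")]

decompose("/a/b/c/d//e/f")   # [['a', 'b', 'c', 'd'], ['e', 'f']]
decompose("/*/a/b")          # [['a', 'b']]
```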


2. XTrie indexing
The XTrie index consists of two components, a trie and a substring table:
- the trie is traversed downward upon each start tag, in a single pass over the XML document
- the substring table is probed only for complete substrings
- runtime data structures keep track of partial matches


2. XTrie indexing (cont'd)

XPEs and their decomposed substrings:
  p1 = //a/a/b/c/*/a/b       S1 = { //a/a/b/c, /*/a/b }
  p2 = /a/b[c/e]/*/b/c/d     S2 = { /a/b, /a/b/c/e, /*/b/c/d }
  p3 = /a/b[c/*/d]//b/c      S3 = { /a/b, /a/b/c, /*/d, //b/c }
  p4 = //c/b//c/d/*/*/d      S4 = { //c/b, //c/d, /*/*/d }

Substring table:
  #   Parent  RelPos  Rank  #Child  Next  Substring
  1   0       [4,]    1     1       0     //a/a/b/c
  2   1       [3,3]   1     0       3     /*/a/b
  3   0       [2,2]   1     2       6     /a/b
  4   3       [2,2]   1     0       0     /a/b/c/e
  5   3       [4,4]   2     0       0     /*/b/c/d
  6   0       [2,2]   1     2       0     /a/b
  7   6       [1,1]   1     1       0     /a/b/c
  8   7       [2,2]   1     0       12    /*/d
  9   6       [2,]    2     0       0     //b/c
  10  0       [2,]    1     1       0     //c/b
  11  10      [2,]    1     1       0     //c/d
  12  11      [3,3]   1     0       0     /*/*/d

(The accompanying figure shows the trie built over these substrings, with child-node, substring-table, and max.-suffix pointers.)


3. XTrie matching

As the SAX-based parser emits start/end-element events for the incoming XML document (<a> <a> <b> <c> ...), the matching algorithm traverses the trie and probes the substring table, following the child-node, substring-table, and max.-suffix pointers.

Eager, ordered, tree-structured matching; details in [ICDE 02] [VLDBJ 02].


XTrie performance
- Scalability vs. # XPEs: varying the number of unique XPEs P, with T ≈ 100, L=20, pw=pd=0.1, pb=0, =0
- Scalability vs. # tags: varying the document length T, with P=100k, L=20, pw=pd=0.1, pb=0, =0
- Setup: 10 DTDs (up to 2727 elements, 8512 attributes); Intel P4 (1.5 GHz) with 512 MB memory, Linux, GNU C++


Tree pattern aggregation
Problem: content routers need to store, and match content against, huge numbers of subscriptions
- Need techniques to aggregate user subscriptions into a smaller set of aggregated content specifications
- Networking analog: the heavy aggregation of IP addresses in the routing tables of routers on the Internet backbone
- However, subscription aggregation also implies a "precision loss": false positives that match the aggregated content specifications without matching the original subscriptions
Goal: aggregate subscriptions to a small collection while minimizing the "precision loss"


Aggregation: problem statement
Given a set of tree patterns S and a space bound k, compute a new set S' of aggregate patterns such that:
1) S' ⊒ S (i.e., S' "generalizes" S: for each p ∈ S there exists q ∈ S' s.t. q ⊒ p)
2) Σ_{p ∈ S'} |p| ≤ k (i.e., S' is concise; |p| = number of tree nodes in p)
3) S' is as precise as possible (i.e., any other set of patterns satisfying (1) and (2) is at least as general as S'): minimize the extra coverage (false positives) of the aggregated set S'

Basic algorithmic tools: containment, minimization, least-upper-bound (LUB) computation


Basic algorithms
Containment: given p and q, does p contain q?
- Principle: find a homomorphism from p to q
- Algorithm based on dynamic programming
Basic DP recurrence, where p(v), q(w) are the sub-patterns rooted at nodes v, w of patterns p, q, and tag(v) ≥ tag(w) means tag(v) is at least as general ("//" ≥ "*" ≥ a):

CONTAINS[ p(v), q(w) ] = [ tag(v) ≥ tag(w) ]
    AND ⋀_{v'=child(v)} ( ⋁_{w'=child(w)} CONTAINS[ p(v'), q(w') ] )

If tag(v) = "//" then additionally:
CONTAINS[ p(v), q(w) ] = CONTAINS[ p(v), q(w) ]
    OR ⋀_{v'=child(v)} CONTAINS[ p(v'), q(w) ]     ("//" maps to the empty path)
    OR ⋁_{w'=child(w)} CONTAINS[ p(v), q(w') ]     ("//" maps to a path of length ≥ 2)
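The core of the recurrence translates directly into a recursion. The sketch below deliberately omits the extra "//" cases to stay short, and the (tag, children) node representation is an assumption; it handles child edges and "*" wildcards only.

```python
# Containment for tree patterns with child ('/') edges and '*' wildcards,
# following the slide's basic DP recurrence. A pattern node is (tag, children).

def tag_geq(tp, tq):
    """tag(v) >= tag(w): '*' is at least as general as any concrete tag."""
    return tp == "*" or tp == tq

def pattern_contains(p, q):
    """True iff pattern p contains pattern q (p ⊒ q) in this fragment."""
    (tp, p_children), (tq, q_children) = p, q
    if not tag_geq(tp, tq):
        return False
    # every child of p's node must map onto some child of q's node
    return all(any(pattern_contains(pv, qw) for qw in q_children)
               for pv in p_children)

# /*/stock[symbol] contains /quotes/stock[symbol][price], but not vice versa:
p = ("*", [("stock", [("symbol", [])])])
q = ("quotes", [("stock", [("symbol", []), ("price", [])])])
```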


Basic algorithms (cont'd)
- Theorem: the CONTAINS[p, q] algorithm determines whether p ⊒ q in O(|p|·|q|) time
Tree-pattern minimization: we are interested in TPs with a minimal number of nodes, so "redundant" sub-trees are eliminated
- Algorithm MINIMIZE[p]: minimize pattern p by recursive, top-down applications of CONTAINS[ ]
- Theorem: MINIMIZE[p] minimizes p in O(|p|²) time
Example (figure): when one branch of a pattern node contains a sibling branch, the containing (more general) branch is satisfied whenever the other is, so it can be eliminated without changing the pattern semantics.


Basic algorithms (cont'd)
Least upper bound: given tree patterns p and q, find the most precise/specific tree pattern containing both
- LUB(p, q) = the tightest generalization of p and q
- Shown to exist and to be unique (up to pattern equivalence)
- Straightforward generalization to any set of tree patterns
Algorithm LUB[p, q]: computes the LUB of p and q
- Uses the pattern containment and minimization algorithms
- Similar dynamic-programming flavor to the CONTAINS[ ] algorithm, but somewhat more complicated
Details in [VLDB 02]


Quantifying precision loss
Consider an aggregated pattern pa that generalizes a set of patterns S (i.e., pa ⊒ q for each q ∈ S)
- We want to quantify the "loss in precision" when using pa instead of S, i.e., the fraction of false positives
  - Selectivity(pa) = fraction of documents matching pa
  - Selectivity(S) = fraction of documents matching some q ∈ S
  - Clearly, Selectivity(pa) ≥ Selectivity(S)
  - Precision loss = Selectivity(pa) - Selectivity(S)
Idea: use document distribution statistics to estimate selectivity and quantify the precision loss
- We cannot keep the entire document distribution!
- Use coarse statistics (a "document tree" synopsis) computed on the fly over the streaming documents


The document-tree The document-tree synopsissynopsis

Document-tree synopsis:Document-tree synopsis: tree with paths labeled by frequency counts (# documents containing path) Summary of path-distribution characteristics of

documents Construction

Identify distinct document paths

Install all skeleton-tree paths in the synopsis Trace each path from the root, increasing frequency

counts and adding new nodes where necessary

Coalesce same-tag siblings

XML Document Skeleton Tree

xx

aaaa bb

bb cccc dd

xx

aa bb

bb cc dd
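The construction above can be sketched as a trie of tags with per-node document counts. Here a document is simplified to the set of root-to-leaf tag paths of its skeleton tree, and sibling coalescing is omitted:

```python
def build_synopsis(docs):
    """docs: list of documents, each an iterable of root-to-leaf tag
    paths (tuples) from its skeleton tree. Returns a nested-dict trie
    where each node's 'count' is the number of documents containing
    that path prefix."""
    root = {"count": 0, "children": {}}
    for paths in docs:
        root["count"] += 1
        # All path prefixes of this document, counted once per document.
        prefixes = {p[:i] for p in paths for i in range(1, len(p) + 1)}
        for prefix in sorted(prefixes, key=len):   # parents before children
            node = root
            for tag in prefix[:-1]:
                node = node["children"][tag]
            child = node["children"].setdefault(
                prefix[-1], {"count": 0, "children": {}})
            child["count"] += 1
    return root

docs = [
    {("x", "a", "b"), ("x", "b", "c")},   # document 1
    {("x", "a", "b")},                    # document 2
]
syn = build_synopsis(docs)
print(syn["children"]["x"]["count"])      # 2
```

Sorting prefixes by length guarantees each node's parent already exists when the node is installed.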

Page 41: Sample document-tree synopsis

[Figure: three XML documents and the resulting synopsis, with nodes labeled by frequency counts]

Merge low-frequency nodes for further compression

[Figure: compressed synopsis in which merged nodes (labeled *) carry averaged frequency counts]

Page 42: Estimating pattern selectivity

- For simple patterns (no branching or wildcards), get the selectivity directly from the synopsis
- For branching paths, assume branch independence: Selectivity(p) = ∏ (individual branch selectivities)
- Selectivity(S) = Selectivity(⋁_{q∈S} q) ≈ max_{q∈S} { Selectivity(q) } (fuzzy-OR)
- Same idea for wildcards: max over all possible instantiations
- SEL[ ] estimates selectivity in O(|p|·|T|) time

Example (against a 3-document synopsis): the simple pattern x/a/d has Selectivity = 2/3, while the branching pattern x/a[b][d] has Selectivity = (2/3)·(2/3) = 4/9.

[Figure: the two tree patterns evaluated against a count-labeled synopsis]
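The two estimation rules can be sketched over a toy count-labeled synopsis. The counts below are chosen to reproduce the 2/3 and 4/9 example; the real SEL[ ] algorithm walks the synopsis tree rather than a flat path table:

```python
TOTAL_DOCS = 3
# Number of documents containing each root-to-node path.
COUNTS = {
    ("x",): 3,
    ("x", "a"): 3,
    ("x", "a", "b"): 2,
    ("x", "a", "d"): 2,
}

def sel_linear(path):
    """Simple pattern (no branching/wildcards): read the count directly."""
    return COUNTS.get(tuple(path), 0) / TOTAL_DOCS

def sel_branching(trunk, branches):
    """Branch-independence assumption: multiply the selectivities of the
    linear patterns obtained by extending the trunk with each branch."""
    s = 1.0
    for b in branches:
        s *= sel_linear(tuple(trunk) + (b,))
    return s

print(round(sel_linear(("x", "a", "d")), 4))            # 0.6667 (= 2/3)
print(round(sel_branching(("x", "a"), ("b", "d")), 4))  # 0.4444 (= 4/9)
```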

Page 43: Selectivity-based aggregation

Algorithm AGGREGATE(S, k)
  // S: set of tree patterns, k: space bound
  while Σ_{p∈S} |p| > k do
    C = candidate aggregate patterns generated using LUB computations & node pruning on patterns in S
    select the pattern x ∈ C that maximizes BENEFIT(x)
    S = S + { x } − { p ∈ S that are contained in x }

BENEFIT(x) is based on marginal gain: maximize the gain in space per unit of "precision loss". Letting c(x) = { p ∈ S that are contained in x }:

  BENEFIT(x) = ( Σ_{p∈c(x)} |p| − |x| ) / ( SEL(x) − SEL(c(x)) )
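The benefit computation can be sketched with pattern sizes and selectivities supplied as plain numbers; in the full algorithm they come from the pattern trees and the SEL estimator, and the candidate values below are purely illustrative:

```python
def benefit(x_size, x_sel, covered_sizes, covered_sel):
    """Marginal gain of candidate aggregate x.

    covered_sizes: sizes |p| of the patterns c(x) contained in x
    covered_sel:   SEL(c(x)), joint selectivity of the patterns in c(x)
    Returns space saved per unit of precision loss."""
    space_gain = sum(covered_sizes) - x_size
    precision_loss = x_sel - covered_sel
    return space_gain / precision_loss

# Candidate A frees 10 tree nodes at the cost of 0.05 extra selectivity;
# candidate B frees 6 nodes at a cost of only 0.01.
a = benefit(5, 0.25, [7, 8], 0.20)   # about 10 / 0.05 = 200
b = benefit(4, 0.11, [6, 4], 0.10)   # about  6 / 0.01 = 600
print("B" if b > a else "A")          # B: better gain per unit of loss
```

The greedy loop of AGGREGATE simply re-evaluates this benefit over the candidate set after each replacement.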

Page 44: Aggregation effectiveness

- Compare AGGR against a "naive" PRUNE algorithm: delete "prunable" nodes with the highest frequencies
- Key metric: selectivity loss = (# FPs) / (# documents not matching any original TP)
- XHTML and NITF DTDs; optional Zipfian skew on documents and TPs
- 1k documents to "learn" the synopsis, 1k to measure algorithm performance
- 10k TPs (≥ 100k tree nodes)

Page 45: Summary

Application-layer networking using overlays; important problem: XML content routing
1. Scalable routing protocol
   - Based on containment and aggregation
2. Efficient data filtering
   - XTrie: a novel index structure that supports the efficient filtering of streaming XML data based on XPath expressions
3. Tree-pattern aggregation
   - LUB computations and coarse document statistics to compute "precise" aggregates
   - Selection of aggregates based on marginal gains

Page 46:

Scalable Publish/Subscribe Architectures & Algorithms
Part III: Architectures

Pascal Felber
University of Neuchatel
[email protected]

Page 47: Agenda

- Part I: Introduction
- Part II: Routing and Filtering Algorithms
  - Distributed routing in broker overlays
  - Efficient content filtering
  - Scalable filter aggregation
- Part III: Publish/subscribe overlays
  - From broker overlays to P2P architectures
  - Semantic communities for publish/subscribe

Page 48: Broker-based approach

- Fixed infrastructure of reliable brokers
- (Subsets of) subscriptions stored at brokers in routing tables
  - Typically takes advantage of the "containment" relationship
- Filtering engine matches each message against subscriptions to determine the next hop(s)
- Cons: dedicated infrastructure, large routing tables, complex filtering algorithms

Page 49: P2P approach

- Producers and consumers also act as routers
  - Directly communicate with each other
  - Filter and forward events to interested consumers
- Key idea: place consumers with similar interests close to each other
  - Trivial routing: forward to neighbors if the event matches our interests (disseminate messages in the "semantic community" & stop when reaching its boundaries)
- Pros: broker-less, space-efficient, low filtering cost
- Cons: hard to maintain, less reliable, some FPs (& FNs)
- Key problem: build the overlay according to interests

Page 50: P2P approach (cont'd)

Problem: build the overlay according to interests
I. Use a "rigid" structure
   - Based on containment trees, spatial filters, DHTs, etc.
   - New consumers inserted at a specific position in the overlay
   - Overlay designed to avoid false negatives and limit false positives
II. Use a "loose" structure
   - Gather consumers in semantic communities built using a proximity metric
   - New consumers connect to peers with "close" interests
   - More flexible architecture, but can have false negatives

Page 51: Building interest-based overlays

- Exploit the containment relationship and organize consumers in a containment tree
- Assumption: 1 subscription = 1 node
- Sa is S's parent if Sa is the most specialized subscription (deepest in the tree) such that Sa ⊒ S
- Virtual root node(s)
- Equivalence trees for identical subscriptions

[Figure: containment tree of stock-quote subscriptions (predicates over Name, Price, Volume), nodes pr and p1-p10; e.g., Name=A ∧ Price=20 sits below Name=A ∧ 10<Price<30, which sits below Price>10]
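Insertion into such a containment tree can be sketched with subscriptions modeled as conjunctions of attribute ranges, where containment means every constraint of the parent is at least as loose. Re-parenting of existing subscriptions on arrival (part of the non-trivial reorganization) is omitted:

```python
def contains(parent, child):
    """parent ⊒ child: every parent constraint appears in the child
    with a range that is at least as narrow."""
    return all(attr in child
               and lo <= child[attr][0] and child[attr][1] <= hi
               for attr, (lo, hi) in parent.items())

class Node:
    def __init__(self, sub):
        self.sub, self.children = sub, []

def insert(node, sub):
    """Descend to the most specialized stored subscription that
    contains sub, and attach sub below it."""
    for child in node.children:
        if contains(child.sub, sub):
            return insert(child, sub)
    leaf = Node(sub)
    node.children.append(leaf)
    return leaf

INF = float("inf")
root = Node({})                                    # virtual root: contains all
insert(root, {"Price": (10, INF)})                 # Price > 10
insert(root, {"Name": ("A", "A"), "Price": (10, 30)})
leaf = insert(root, {"Name": ("A", "A"), "Price": (20, 20)})
```

With these three insertions, the last subscription lands three levels below the virtual root, mirroring the pr → Price>10 → Name=A ∧ 10<Price<30 → Name=A ∧ Price=20 chain in the figure.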

Page 52: Routing events

- Events are forwarded downward and upward
- If received from downward, also propagate upward (even if the event does not match the local subscription)
- No false negatives, some false positives

[Figure: the containment tree of the previous slide routing an event e: Name=A, Price=30, Volume=100]

Problems:
- Tree is often unbalanced
- Root node(s) heavily loaded
- Non-trivial reorganization upon arrival, departure
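Event dissemination over such a tree can be sketched as follows: an event propagates upward unconditionally and downward only into subtrees whose root subscription matches (containment guarantees a non-matching subtree holds no matching descendants). Names and the range encoding are illustrative:

```python
class Node:
    def __init__(self, sub, parent=None):
        self.sub, self.parent, self.children = sub, parent, []
        if parent is not None:
            parent.children.append(self)

def matches(sub, event):
    return all(attr in event and lo <= event[attr] <= hi
               for attr, (lo, hi) in sub.items())

def route(node, event, sender, delivered):
    if node.sub and matches(node.sub, event):       # virtual root ({}): no delivery
        delivered.append(node)
    if node.parent is not None and node.parent is not sender:
        route(node.parent, event, node, delivered)   # upward, unconditional
    for c in node.children:
        if c is not sender and matches(c.sub, event):
            route(c, event, node, delivered)         # downward, only if it matches

root = Node({})
p1 = Node({"Price": (10, 1e9)}, root)                # Price > 10
p2 = Node({"Price": (10, 30)}, p1)
p3 = Node({"Volume": (100, 100)}, root)
delivered = []
route(p2, {"Price": 20, "Volume": 100}, None, delivered)  # publish at p2
print(len(delivered))  # 3: p2, p1, and p3 all match
```

All matching subscribers are reached (no false negatives); nodes traversed without matching, like the virtual root here, account for the false positives.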

Page 53: Evaluation: false positives

Setup:
- 1,000 XML documents
- Varying population of peers (XPath TPs)
- On average, 25% interested peers

Observations:
- Low FP ratio, decreases exponentially with # peers
- Reorganizations help: a new peer may be a better parent for an existing one
- Broadcast would give 75% false positives
- Details in [EP 05]

Page 54: Spatial filters

- Often, events are simple attribute-value pairs and subscriptions are predicates over these values
- Each attribute represents one dimension
- Events are points in an N-dimensional space
- Predicates are ranges, i.e., rectangles in the N-dimensional space

[Figure: spatial representation (N=2) and the associated containment graph]

Page 55: R-tree spatial filters

- Height-balanced tree data structure for indexing multi-dimensional data
- Leaves: subscriptions
- Inner nodes: bounding rectangles

[Figure: an R-tree and its spatial representation]
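The key geometric operations behind R-tree filtering are computing minimum bounding rectangles (MBRs) and testing whether an event point falls inside one; an event only descends into a subtree whose MBR covers it. A 2-D sketch with made-up coordinates:

```python
def mbr(rects):
    """Minimum bounding rectangle of (xlo, ylo, xhi, yhi) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def covers(rect, point):
    """Does the rectangle contain the event point?"""
    x, y = point
    return rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]

# Two leaf subscriptions (ranges over, say, price x volume)
subs = [(10, 0, 30, 50), (20, 40, 60, 90)]
parent = mbr(subs)                        # inner node's bounding rectangle
event = (25, 45)
print(covers(parent, event))              # True: descend into this subtree
print([covers(s, event) for s in subs])   # [True, True]
```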

Page 56: Distributed R-trees

Idea: organize consumers in an R-tree structure
- Peers at leaves and inner nodes; an inner node is its own child
- Promote the more general (i.e., larger) subscription as parent
- Event routing as for the containment tree
- Use classical rules for constructing R-trees (or R+, R*)
- No false negatives

[Figure: distributed R-tree and the associated communication graph]

Details in [ICDCS 07] [TPDS 09]

Page 57: Evaluation: false positives

- Different R-tree variants (splitting method): linear, quadratic, R*
- Dimensions N=4, degree (m,M)=(5,10), uniform events, 10,000 subscriptions
- Few false positives (1-3%)

Page 58: Agenda

- Part I: Introduction
- Part II: Routing and Filtering Algorithms
  - Distributed routing in broker overlays
  - Efficient content filtering
  - Scalable filter aggregation
- Part III: Publish/subscribe overlays
  - From broker overlays to P2P architectures
  - Semantic communities for publish/subscribe

Page 59: Building semantic communities

- Gather consumers in semantic communities according to their interests (subscriptions)
- Disseminate messages in the community & stop when reaching its boundaries
- Challenge: identify subscription proximity, i.e., "are two distinct subscriptions likely to match the same set of documents?"

Page 60: Problem statement

Given
- S: valid tree patterns (subscriptions)
- D: valid documents
- p, q ∈ S

compute p ~ q, where ~: S² → [0,1]
(the probability that p matches the same subset of D as q)

Algorithms use
- H ⊆ D: historical data about the document stream
- k: space bound

Page 61: Basic approach

1. Summarize the document stream
   - Synopsis maintained incrementally
   - Accurate yet compact (compression, pruning)
2. Evaluate the selectivity of a tree pattern using the synopsis
   - Recursive algorithm matches the TP against the synopsis
3. Estimate similarity using various metrics
   - Similarity computed from selectivity

Page 62: 1. Document-tree synopsis

- Maintain a concise, accurate synopsis H_S
- Built on-line as documents stream by
- Captures the path distribution of documents in H
- Captures cross-pattern correlations in the stream
  - p, q match the same documents (not just the same number)
- Allows us to estimate the fraction of documents matching different patterns

Page 63: 1. Document-tree synopsis

Document-tree synopsis: tree with paths labeled with matching sets (the documents containing the path)
- Summary of the path-distribution characteristics of documents
- Adding a document to the synopsis: trace each path from the root of the synopsis, updating the matching sets and adding new nodes where necessary

[Figure: three XML documents (IDs 1-3) and the synopsis tree whose nodes carry matching sets such as {1,2,3}, {1,3}, {2,3}]

Page 64: 1. Matching-set compression

Problem: cannot maintain the full matching sets (with N documents: O(N) space per node)

Approach 1: only maintain a document count
- Independence assumption unrealistic (no cross-pattern correlation)
- Example: P(S1) = 2/3 × 1/3 = 2/9 vs. a true value of 0; P(S2) = 2/3 × 2/3 = 4/9 vs. a true value of 2/3

[Figure: count-labeled synopsis and two tree patterns S1, S2 whose correlations are lost when only counts are kept]

Page 65: 1. Matching-set compression

Approach 2: use fixed-size sample sets
- Keep a uniform sample of s documents [Vitter's reservoir sampling]
- P(kth document in synopsis) = min(1, s/k)
- Once replaced, a document ID is deleted from the whole tree
- Sampling rate decided uniformly over all nodes
  - Inefficient utilization of the space budget
  - Poor estimates

[Figure: reservoir sampling of a document stream with s = 4; inclusion probabilities 1, 1, 1, 1, 4/5, 2/3, 4/7, 1/2, 4/9, ...]

Page 66: 1. Matching-set compression

Approach 3: use per-node hash samples
- Gibbons' distinct sampling: a hash function maps document IDs onto a logarithmic range of levels, Pr[h(x) ≥ l] = 1/2^l
- Hash samples start at level l=0 and keep d iff h(d) ≥ l
- Once the sample is full, increment the level and "sub-sample"
- Fine sampling granularity, keeps low-frequency paths
- Much better estimates

[Figure: hash range partitioned into levels l=0, 1, 2, ... and a hash sample being sub-sampled as its level increases]
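A per-node hash sample can be sketched as follows (a hedged illustration in the style of Gibbons' distinct sampling; the SHA-256-based level function and the capacity are illustrative choices). An ID's level is the number of trailing zero bits of its hash, so Pr[level(x) ≥ l] = 1/2^l:

```python
import hashlib

def level(doc_id):
    """Trailing zero bits of a 64-bit hash of doc_id:
    Pr[level(x) >= l] = 2**-l for a uniform hash."""
    h = int.from_bytes(
        hashlib.sha256(str(doc_id).encode()).digest()[:8], "big")
    return (h & -h).bit_length() - 1 if h else 64

class HashSample:
    def __init__(self, capacity):
        self.capacity, self.cur_level, self.sample = capacity, 0, set()

    def add(self, doc_id):
        if level(doc_id) >= self.cur_level:
            self.sample.add(doc_id)
            while len(self.sample) > self.capacity:   # full: sub-sample
                self.cur_level += 1
                self.sample = {d for d in self.sample
                               if level(d) >= self.cur_level}

    def estimate(self):
        """Estimated number of distinct IDs seen: n * 2**level."""
        return len(self.sample) * 2 ** self.cur_level

hs = HashSample(capacity=256)
for d in range(10_000):
    hs.add(d)
print(len(hs.sample) <= 256)  # True
```

With 10,000 distinct IDs and capacity 256, the sample settles at a level around 6, and n·2^l lands close to the true count while storing only a few hundred IDs.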

Page 67: 1. Matching-set compression

Approach 3 (cont'd):
- Computing union/intersection: sub-sample the lower level to the higher prior to the union/intersection, then possibly once more
- Estimate the cardinality of a sample with n elements as n·2^l
- Only need to store document IDs in hash samples at the final nodes of incoming paths
  - The matching set of a parent can be reconstructed by recursively unioning those of its descendants
- Good utilization of the space budget

[Figure: synopsis storing hash samples only at the final nodes of incoming paths, with empty matching sets at internal nodes]

Page 68: 1. Synopsis pruning

- The synopsis may grow very large (due to path diversity)
- Prune nodes with little influence on selectivity estimation:
  1. Merge same-label leaf nodes with high similarity
  2. Fold leaf nodes into their parent with high similarity
  3. Delete low-cardinality nodes
- Similarity: |S(t) ∩ S(t′)| / |S(t) ∪ S(t′)|

Page 69: 2. Evaluate selectivity

- Recursive algorithm matches the TP against the synopsis
- ≥: at least as general (// ≥ * ≥ a)
- Selectivity is (# matching documents) / (# total documents)
- Details in [ICDE 07]

[Figure: annotated recursion, with u a node of the tree pattern and v a node of the synopsis; cases include "found matching path for root u in synopsis", "no matching path: look for any path in the synopsis for each branch of the TP", "// maps to the empty path or to a path of length ≥ 2", "leaf of synopsis but not leaf of TP", and "return matching set"]

Page 70: 3. Estimate similarity

Metrics to estimate similarity using selectivity:
- Conditional probability of p given q, P(p∧q)/P(q): if p∧q matches the same set of documents as q alone, then p ~ q
- Symmetrical conditional probability
- Ratio of joint to union probability, P(p∧q)/P(p∨q) (also symmetric)

P(p∧q) is computed by merging the root nodes of p and q
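Given (estimated) matching sets over the historical documents, the first and third metrics reduce to set arithmetic; the function names are ours, and the symmetric conditional variant is left out:

```python
def p(matched, n):
    """Selectivity estimate: fraction of historical documents matched."""
    return len(matched) / n

def conditional(mp, mq, n):
    """P(p AND q) / P(q): how much of q's traffic p also matches."""
    return p(mp & mq, n) / p(mq, n)

def joint_over_union(mp, mq, n):
    """P(p AND q) / P(p OR q), symmetric in p and q."""
    return p(mp & mq, n) / p(mp | mq, n)

mp, mq, n = {1, 2, 3}, {2, 3, 4}, 10
print(round(conditional(mp, mq, n), 4))   # 0.6667 (= 2/3)
print(joint_over_union(mp, mq, n))        # 0.5
```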

Page 71: Evaluation: setup

- NITF and xCBL DTDs
- D: 10,000 documents with approx. 100 tag pairs, 10 levels
- Sp: 1,000 "positive" TPs (some match in D)
  - * (10%), // (10%), branches (10%), ≤10 levels, Zipf skew (1)
- Sn: 1,000 "negative" TPs (no match in D)
- Synopses with the 3 variants for matching sets
- Different space budgets (sizes of matching sets, compression degrees for pruning)
- Compare the results of the proximity metrics with exact values computed from the sets of matching documents

Page 72: Evaluation: error metrics

Let
- P(p): exact selectivity of p
- P′(p): our estimate of the selectivity of p
- Mi(p,q): exact proximity of p and q using metric Mi
- M′i(p,q): our estimate of the proximity of p and q using Mi

- Positive error, negative error, and metrics error [equations rendered as images in the original slides]

Page 73: Positive error vs. hash size

Hashes outperform the other approaches in terms of accuracy: less than 5% error with 1,000 entries.

Page 74: Negative error vs. hash size

Hashes also outperform the other approaches (no error with xCBL for Hashes & Sets).

Page 75: Positive error vs. synopsis size

For a given space budget, Hashes is the most accurate: beyond some threshold, Hashes becomes more accurate than Counters.

Page 76: Error of proximity metrics

Hashes produces the best estimates.

Page 77: Error vs. compression ratio

The error remains small even for relatively high compression degrees: less than 15% error with 1:5 compression.

Page 78: Conclusion

Decentralized (P2P) architectures for P/S
- Key idea: create a P2P overlay with consumers sharing similar interests close to each other

I. "Rigid" structure (trees, R-trees, etc.)
- Organize peers according to the containment relationship
- Trivial routing protocol, more complex maintenance
- Problem: build a robust structure, balance load

II. "Loose" structure
- Create semantic communities for publish/subscribe
- Easier maintenance, may have false negatives
- Problem: estimate the similarity of (seemingly unrelated) subscriptions

Page 79: Extra slides

Page 80: References

[ICDE 02] C.Y. Chan, P. Felber, M.N. Garofalakis, R. Rastogi. Efficient Filtering of XML Documents with XPath Expressions. In Proceedings of the 18th International Conference on Data Engineering (ICDE'02), San Jose, CA, February-March 2002.

[VLDBJ 02] Extended version of [ICDE 02] in VLDB Journal, Special Issue on XML, Volume 11, Issue 4, pp. 354-379, 2002.

[VLDB 02] C.Y. Chan, W. Fan, P. Felber, M.N. Garofalakis, and R. Rastogi. Tree Pattern Aggregation for Scalable XML Data Dissemination. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02), Hong Kong, China, August 2002.

[IC 03] P. Felber, C.Y. Chan, M.N. Garofalakis, R. Rastogi. Scalable Filtering of XML Data for Web Services. In IEEE Internet Computing, Volume 7, Issue 1, pp. 49-57, 2003.

[NCA 03] R. Chand and P. Felber. A Scalable Protocol for Content-Based Routing in Overlay Networks. In Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA'03), Cambridge, MA, April 2003.

[CS 03] P. Eugster, P. Felber, R. Guerraoui, and A.-M. Kermarrec. The Many Faces of Publish/Subscribe. In ACM Computing Surveys, Volume 35, Issue 2, pp. 114-131, June 2003.

[DEBS 04] R. Chand and P. Felber. Efficient Subscription Management in Content-based Networks. In Proceedings of the International Workshop on Distributed Event-Based Systems (DEBS'04), Edinburgh, Scotland, May 2004.

Page 81: References (cont'd)

[SRDS 04] R. Chand and P. Felber. XNet: A Reliable Content Routing Network. In Proceedings of the 23rd IEEE Symposium on Reliable Distributed Systems (SRDS'04), pp. 264-273, Florianopolis, Brazil, October 2004.

[EP 05] R. Chand and P. Felber. Semantic Peer-to-Peer Overlays for Publish/Subscribe Networks. In Proceedings of the International Conference on Parallel and Distributed Computing (Euro-Par'05), Lisboa, Portugal, August 2005.

[ICDE 07] R. Chand, P. Felber, and M. Garofalakis. Tree-Pattern Similarity Estimation for Scalable Content-based Routing. In Proceedings of the 23rd International Conference on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007.

[ICDCS 07] S. Bianchi, A.K. Datta, P. Felber, and M. Gradinariu. Stabilizing Peer-to-Peer Spatial Filters. In Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 2007.

[EP 07] S. Bianchi, P. Felber, and M. Gradinariu. Content-based Publish/Subscribe using Distributed R-trees. In Proceedings of the International Conference on Parallel and Distributed Computing (Euro-Par'07), Rennes, France, August 2007.

[TPDS 08] R. Chand and P. Felber. Scalable distribution of XML content with XNet. In IEEE Transactions on Parallel and Distributed Systems, Volume 19, Issue 4, pp. 447-461, April 2008.

[TPDS 09] S. Bianchi, P. Felber, and M. Gradinariu. Stabilizing Distributed R-trees for Peer-to-Peer Content Routing. In IEEE Transactions on Parallel and Distributed Systems, 2009.

Page 82: What is an overlay network?

[Figure: overlay network with nodes A, B, C layered on top of a physical network]

- Focus on the application layer
- Treat multiple hops through the IP network as one hop in the overlay network
- Overlay paths may be longer than IP paths (large "stretch")