Sven Bittner and Annika Hinze, 2 November 2005
Talk at the 13th International Conference on Cooperative Information Systems (CoopIS 2005)
A Detailed Investigation of Memory Requirements for Publish/Subscribe Filtering Algorithms
Motivation: Publish/Subscribe
• Subscribers register subscriptions
• Publishers send event messages
• System informs using notifications
[Figure: publishers (EBay, TradeMe) send pub(item, price, timeLeft, …) events to the pub/sub system; the user registers a subscription; the system filters the events and notifies the user about items of interest.]
Annika Hinze – Expressive Event Filtering in Distributed Systems
Motivation: Application Scenario
• A subscriber is interested in French books whose title contains the phrase “Harry Potter”.
• Depending on the condition of the copy (new, used), she wants to pay at most NZ$15.0 or NZ$10.0, respectively.
• To avoid unnecessary notifications, the subscriber will be notified not earlier than one day before the auction ends.
Subscription as a Boolean expression:
title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND ((condition = NEW AND price < 15.0) OR (condition = USED AND price < 10.0))
Motivation: Research Question
• Current approaches only support conjunctions
Canonical conversion (DNF) required
+ Fast filtering process (no Boolean expressions)
− High memory usage (exponentially-sized DNF)
Effective in DBMS, but also in pub/sub?
Motivation: Canonical Conversion
Original subscription:
title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND ((condition = NEW AND price < 15.0) OR (condition = USED AND price < 10.0))
Canonical conversion yields two conjunctive subscriptions:
• title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = NEW AND price < 15.0
• title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = USED AND price < 10.0
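The canonical conversion can be sketched in a few lines of Python. The tree encoding (predicates as strings, inner nodes as `(op, children)` tuples) and the helper names `to_dnf` and `wide` are assumptions for this illustration; the sketch only shows why the converted form can grow exponentially.

```python
from itertools import product

# Hypothetical encoding: a predicate is a string, an inner node is (op, [children]).
def to_dnf(node):
    """Return a list of conjunctions; each conjunction is a list of predicates."""
    if isinstance(node, str):
        return [[node]]
    op, children = node
    parts = [to_dnf(child) for child in children]
    if op == "OR":
        # A disjunction contributes the conjunctions of all of its children.
        return [conj for part in parts for conj in part]
    # AND: pick one conjunction from each child (Cartesian product) and merge them.
    return [[p for conj in combo for p in conj] for combo in product(*parts)]

subscription = ("AND", [
    'title like "Harry Potter"', "endingWithin < 1 day", "language = FRENCH",
    ("OR", [("AND", ["condition = NEW", "price < 15.0"]),
            ("AND", ["condition = USED", "price < 10.0"])]),
])

conjunctions = to_dnf(subscription)  # two conjunctions of five predicates each

def wide(n):
    """n binary disjunctions combined by AND: the DNF has 2**n conjunctions."""
    return ("AND", [("OR", [f"a{i}", f"b{i}"]) for i in range(n)])
```

For the example subscription the conversion duplicates the three shared predicates; for `wide(10)` a subscription with 20 predicates becomes 1024 conjunctions.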
Motivation: Goal
• Analyse influence of conversions on memory scalability (and efficiency)
  – Define scheme to characterise subscriptions
    • Describe structure of subscriptions
    • Abstraction from specific application scenario
  – Derive memory requirements of algorithms
Structure
• Motivation
• Characterisation Scheme/Algorithms
• Theoretical Analysis and Comparison
• Practical Analysis
• Summary and Outlook
Characterisation Scheme (1)
• Fourteen parameters in four classes
  – Subscription-related (S)
    • Characteristics of subscriptions
  – Algorithm-related (A)
    • Influence internal storage
  – Conversion-related (C)
    • Canonical conversions
  – Subscription-event-related (E)
    • Relation between events and subscriptions
Motivation Characterisation/Algorithms Theoretical Analysis Experiments Outlook
Characterisation Scheme (2)
Characterisation Scheme: Example
$|p| = 7$ (number of predicates)
$|op| = 4$ (number of Boolean operators)
$op_r = |op|/|p| = 4/7 \approx 0.6$ (relative number of Boolean operators)
$S_s = 2$ (disjunctively combined elements after conversion)
$s_p = (3 \cdot 2 + 4 \cdot 1)/7 = 10/7 \approx 1.4$ (conjunctive elements per predicate)
$s_r = s_p/S_s = (10/7)/2 = 5/7 \approx 0.7$ (relative conjunctive elements per predicate)
Original subscription:
title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND ((condition = NEW AND price < 15.0) OR (condition = USED AND price < 10.0))
Converted subscriptions:
• title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = NEW AND price < 15.0
• title like “Harry Potter” AND endingWithin < 1 day AND language = FRENCH AND condition = USED AND price < 10.0
After conversion, the three predicates on title, endingWithin and language each occur in 2 conjunctions; the four condition and price predicates each occur in 1.
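The example values can be recomputed from the subscription itself. The following Python sketch assumes a simple tree encoding (predicates as strings, inner nodes as `(op, children)` tuples); the helper names are hypothetical, and only the parameter definitions come from the slide.

```python
from fractions import Fraction
from itertools import product

# Hypothetical encoding: a predicate is a string, an inner node is (op, [children]).
def to_dnf(node):
    """Return the DNF as a list of conjunctions (lists of predicates)."""
    if isinstance(node, str):
        return [[node]]
    op, children = node
    parts = [to_dnf(child) for child in children]
    if op == "OR":
        return [conj for part in parts for conj in part]
    return [[p for conj in combo for p in conj] for combo in product(*parts)]

def predicates(node):
    """All leaf predicates of the subscription tree."""
    if isinstance(node, str):
        return [node]
    return [p for child in node[1] for p in predicates(child)]

def count_operators(node):
    """Number of Boolean operators (inner nodes) in the tree."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_operators(child) for child in node[1])

subscription = ("AND", [
    'title like "Harry Potter"', "endingWithin < 1 day", "language = FRENCH",
    ("OR", [("AND", ["condition = NEW", "price < 15.0"]),
            ("AND", ["condition = USED", "price < 10.0"])]),
])

preds = predicates(subscription)
dnf = to_dnf(subscription)

n_p = len(preds)                          # |p|
n_op = count_operators(subscription)      # |op|
op_r = Fraction(n_op, n_p)                # relative number of Boolean operators
S_s = len(dnf)                            # disjunctively combined elements
# For each predicate, count the conjunctions it occurs in, then average.
s_p = Fraction(sum(sum(1 for conj in dnf if p in conj) for p in preds), n_p)
s_r = s_p / S_s                           # relative conjunctive elements per predicate
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions 4/7, 10/7 and 5/7 given above.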
Analysed Algorithms
• Three filtering algorithms
  – Canonical approaches (conjunctions)
    • Counting algorithm [Ashayer02,Yan94]
    • Cluster algorithm [Fabret01,Hanson90]
  – Non-canonical approach (Boolean subscriptions)
    • Subscription-tree-based filtering approach [Bittner05a,Bittner05b]
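The general idea behind counting-based filtering of conjunctive subscriptions can be sketched as follows: count the fulfilled predicates per subscription and report a match when the count equals the subscription's number of predicates. The class `CountingFilter`, its data layout, and the event field names are assumptions for this illustration and do not reproduce the data structures of the cited papers.

```python
from collections import defaultdict

class CountingFilter:
    """Simplified counting-style matcher for conjunctive subscriptions."""

    def __init__(self):
        self.predicates = {}              # predicate id -> (attribute, test)
        self.subs_of = defaultdict(list)  # predicate id -> subscription ids
        self.size = {}                    # subscription id -> number of predicates

    def subscribe(self, sub_id, conjunction):
        """conjunction: dict mapping predicate id -> (attribute, test function)."""
        self.size[sub_id] = len(conjunction)
        for pid, (attr, test) in conjunction.items():
            self.predicates[pid] = (attr, test)   # shared predicates are stored once
            self.subs_of[pid].append(sub_id)

    def match(self, event):
        hits = defaultdict(int)
        for pid, (attr, test) in self.predicates.items():
            if attr in event and test(event[attr]):
                for sub_id in self.subs_of[pid]:
                    hits[sub_id] += 1
        # A conjunction matches when all of its predicates are fulfilled.
        return {s for s, n in hits.items() if n == self.size[s]}

# The two conjunctions obtained from the example subscription after conversion
common = {"title": ("title", lambda v: "Harry Potter" in v),
          "ending": ("ending_within_days", lambda v: v < 1),
          "lang": ("language", lambda v: v == "FRENCH")}
f = CountingFilter()
f.subscribe("sub_new", {**common,
                        "new": ("condition", lambda v: v == "NEW"),
                        "p15": ("price", lambda v: v < 15.0)})
f.subscribe("sub_used", {**common,
                         "used": ("condition", lambda v: v == "USED"),
                         "p10": ("price", lambda v: v < 10.0)})

event = {"title": "Harry Potter et la Coupe de Feu", "ending_within_days": 0.5,
         "language": "FRENCH", "condition": "USED", "price": 9.0}
matched = f.match(event)
```

Each converted conjunction needs its own counter and its own association entries, which is where the memory cost of the conversion shows up in this kind of approach.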
Memory Usage: Analysis
• Counting algorithm
• Cluster algorithm
• Non-canonical algorithm
Memory Usage: Comparison (1)
• All formulae
  – grow linearly with $|s|$
  – intersect the ordinate at zero
Comparison of first derivatives with respect to $|s|$ is sufficient
• Assumptions (fewer parameters)
  – Reasonable values for algorithm-related parameters (A)
  – Usage of relative parameters
• Determine the turning point at which the non-canonical approach (NCA) requires less memory than the canonical solutions
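Why comparing first derivatives suffices can be written out briefly. The slope symbols $a_{\mathrm{can}}$ and $a_{\mathrm{NCA}}$ are placeholders introduced here for illustration; they stand for the derivatives of the respective memory formulae with respect to $|s|$, which are not reproduced in this text.

```latex
\begin{align*}
M_{\mathrm{can}}(|s|) &= a_{\mathrm{can}} \cdot |s|, \qquad
M_{\mathrm{NCA}}(|s|) = a_{\mathrm{NCA}} \cdot |s|, \\
M_{\mathrm{NCA}}(|s|) < M_{\mathrm{can}}(|s|)
  &\iff a_{\mathrm{NCA}} < a_{\mathrm{can}}
  \qquad \text{for all } |s| > 0 .
\end{align*}
```

Because both functions are linear and pass through the origin, the ordering of the slopes decides the ordering of the memory requirements for every $|s| > 0$.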
Memory Usage: Comparison (2)
• Description of turning point by number of disjunctive elements in DNF ($S_s$)
  – Beneficial behaviour of NCA
  – Boolean subscriptions worthwhile
Counting requires less memory than cluster algorithm
Memory Usage: Example
$|p| = 7$ (number of predicates)
$op_r = 4/7$ (relative number of Boolean operators)
$s_r = 5/7$ (relative conjunctive elements per predicate)
Turning points: $89/49 \approx 1.82$ and $89/56 \approx 1.59$
Practice: $S_s = 2$
NCA uses less memory (turning point less than one disjunction, i.e. below $S_s = 2$)
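The comparison in this example is plain arithmetic and can be checked with exact fractions. The sketch below only uses the two turning-point values and the value of $S_s$ stated on the slide; the variable names are chosen for this illustration, and the formulae that produce the turning points are not reconstructed here.

```python
from fractions import Fraction

# Turning points as stated on the slide, expressed as values of S_s
turning_points = [Fraction(89, 49), Fraction(89, 56)]

# Disjunctively combined elements of the example subscription after conversion
S_s = 2

# NCA needs less memory once S_s exceeds the turning point
nca_uses_less_memory = all(S_s > t for t in turning_points)
rounded = [round(float(t), 2) for t in turning_points]
```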
Memory Usage: Illustration
• Setting
  – half as many operators as predicates ($op_r = 0.5$)
  – conjunctions per predicate vary ($s_r$)
Only one disjunction per subscription results in lower memory requirements for the non-canonical approach.
[Plots: Counting vs. non-canonical; Cluster vs. non-canonical]
Practical Analysis
• Verification of theoretical results
• More memory required for management of data structures, e.g.,
  – Lists
  – Dynamic arrays
  – Hash tables
Overhead for different algorithms similar?
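That container management costs memory beyond the stored payload can be observed in any runtime. The following Python sketch uses `sys.getsizeof`, whose numbers are specific to the interpreter in use; it only illustrates the kind of overhead meant here and says nothing about the implementation measured in the talk. The helper names are hypothetical.

```python
import struct
import sys

POINTER = struct.calcsize("P")  # bytes per object reference on this platform

def list_overhead(n):
    """Bytes a list of n references occupies beyond the n references themselves."""
    return sys.getsizeof([None] * n) - n * POINTER

def dict_bytes_per_entry(n):
    """Average container bytes per entry of a hash table with n entries."""
    return sys.getsizeof({i: None for i in range(n)}) / n

empty_list_overhead = list_overhead(0)
per_entry = dict_bytes_per_entry(1000)
```

A hash table typically needs noticeably more than the two references (key and value) per entry, so an algorithm that relies heavily on hash tables can pay a different overhead than one built on arrays.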
Practical Analysis: Results (1)
• $s_r = 0.3$ (predicates in few conjunctions)
• Consistent behaviour in theory/practice
Practical Analysis: Efficiency
• Nearly identical efficiency properties
Overhead of converted (= more) subscriptions outweighs more efficient filtering (time and space)
Summary (1)
• Characterisation scheme
  – Describe subscriptions
  – Calculate memory requirements of filter algorithms
• Theoretical analysis and comparison
  – Three algorithms
  – Determination of point when NCA requires less memory
Even one disjunction might favour NCA
Summary (2)
• Practical analysis
  – Memory in practical settings
  – Correlation of efficiency properties
Theoretical results hold in practice
NCA is equally/more time efficient
NCA is preferable algorithm if subscriptions include disjunctions
Future Work
• Distribute algorithm
  – Optimise event and subscription routing
  – Problem: Current routing optimisations only work for conjunctive subscriptions (covering, merging)
Design novel routing optimisations
• Support arbitrary subscriptions
• Subscription tree pruning
• Predicate replacement
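Covering, one of the routing optimisations named above, is easy to state for conjunctive subscriptions, which helps to see why it does not carry over directly to arbitrary Boolean subscriptions. The sketch below assumes a deliberately simple encoding (each attribute constrained by one non-empty open interval); the function `covers` and the attribute names are hypothetical and not taken from the talk.

```python
import math

# Hypothetical encoding: a conjunctive subscription maps an attribute to an
# open interval (low, high); unconstrained attributes are simply absent.
def covers(general, specific):
    """True if every event matching `specific` also matches `general`."""
    for attr, (low, high) in general.items():
        if attr not in specific:
            return False      # `specific` accepts any value for this attribute
        s_low, s_high = specific[attr]
        if s_low < low or s_high > high:
            return False      # `specific` accepts values outside the general range
    return True

cheap = {"price": (-math.inf, 15.0)}
cheaper_and_soon = {"price": (-math.inf, 10.0),
                    "ending_within_days": (-math.inf, 1.0)}

forward_covered = covers(cheap, cheaper_and_soon)
reverse_covered = covers(cheaper_and_soon, cheap)
```

A broker that already forwards `cheap` does not need to forward `cheaper_and_soon` as well. The per-attribute comparison relies on both subscriptions being flat conjunctions, which is the restriction the future work aims to lift.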
Thank you for your attention!
Contact:
Sven Bittner, Annika Hinze
{s.bittner, a.hinze}@cs.waikato.ac.nz
References
[Ashayer02] G. Ashayer, H.-A. Jacobsen, and H. Leung. Predicate Matching and Subscription Matching in Publish/Subscribe Systems. In Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW ’02), pages 539–548, Vienna, Austria, July 2–5 2002.
[Bittner05a] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW ’05), pages 451–457, Columbus, USA, June 6–10 2005.
[Bittner05b] S. Bittner and A. Hinze. On the Benefits of Non-Canonical Filtering in Publish/Subscribe Systems. In Proceedings of the 13th International Conference on Cooperative Information Systems (CoopIS 2005), Agia Napa, Cyprus, October 31–November 4 2005.
[Fabret01] F. Fabret, A. Jacobsen, F. Llirbat, J. Pereira, K. Ross, and D. Shasha. Filtering Algorithms and Implementation for Very Fast Publish/Subscribe Systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), pages 115–126, Santa Barbara, USA, May 21–24 2001.
[Hanson90] E. N. Hanson, M. Chaabouni, C.-H. Kim, and Y.-W. Wang. A Predicate Matching Algorithm for Database Rule Systems. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), pages 271–280, Atlantic City, USA, May 23–25 1990.
[Yan94] T. W. Yan and H. Garcia-Molina. Index Structures for Selective Dissemination of Information Under the Boolean Model. ACM Transactions on Database Systems (TODS), 19(2):332–364, 1994.