The Continuous Distributed Monitoring Model
Farzad Nozarian
Chalmers University of Technology
18/04/2016
Outline
Introduction
Countdown Problem
Monitoring Entropy
Geometric Approach
Sampling
What Is the Problem?
Several processing nodes receive streams of data items
The goal is to monitor a function over the union of these items with minimum communication cost
Examples of monitoring functions:
  Simple countdown
  Tracking the entropy
  Distinct elements
  Sampling
  Top-k items
Motivation and Applications
Monitoring the global health of the network in a large ISP
Tracking the usage of resources in distributed data centers by social networks
Tracking global changes by collecting information from sensors
What Are the Challenges?
Continuous Monitoring
  Real-time tracking, rather than a one-shot query
Streaming
  Data is received at a very high speed
Distributed Processing
  Each node only sees part of the global stream
  Communication cost is important
Trivial Solutions
Centralizing all the items
  High communication cost!
Periodic polling
  Frequent polling: high communication
  Infrequent polling: delay in identifying events
  Summarizing information in complex functions
  Parameter tuning for the frequency of the polling
The Countdown Problem
A threshold monitoring problem with many applications
Identifying when the total number of observations reaches a threshold $\tau$
Trivial solution: observers notify the coordinator by sending a bit whenever an event is observed
  $O(\tau)$ communication
But we can improve on it!
A First Approach
Idea: there are many events at each site before reaching the threshold
  With $k$ sites and threshold $\tau$, at least one site must see $\tau/k$ items before the threshold is reached
  Every site waits to see at least $\tau/k$ items before reporting to the coordinator
After receiving a report from an observer, the coordinator updates the remaining threshold (the slack) and informs all nodes
The total communication is $O(k^2 \log(\tau/k))$
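The slack-based idea can be simulated in a few lines. The sketch below is only an illustration, not the presenter's implementation: the function name `countdown_with_slack`, the message accounting (one report, then `k` poll replies and `k` broadcast messages per synchronization), and the fallback to one message per event once the slack drops below `k` are all assumptions made for the example.

```python
import random


def countdown_with_slack(events, k, tau):
    """Simulate a slack-based countdown; `events` yields site ids in [0, k).

    Returns (t, messages): the 1-based index of the event at which the
    coordinator detects the threshold, and the messages used under the
    assumed cost model.
    """
    local = [0] * k              # events seen at each site since the last sync
    known = 0                    # count known to the coordinator
    messages = 0
    bound = (tau - known) // k   # per-site share of the remaining slack
    for t, site in enumerate(events, start=1):
        if bound == 0:
            # Slack is below k: fall back to one message per event.
            known += 1
            messages += 1
        else:
            local[site] += 1
            if local[site] >= bound:
                # Assumed cost: 1 report + k poll replies + k broadcasts.
                messages += 1 + 2 * k
                known += sum(local)
                local = [0] * k
                bound = (tau - known) // k
        if known >= tau:
            return t, messages
    return None, messages


rng = random.Random(0)
k, tau = 10, 10_000
stream = [rng.randrange(k) for _ in range(2 * tau)]
t, messages = countdown_with_slack(stream, k, tau)
```

In this sketch the coordinator fires exactly when the `tau`-th event arrives, because the per-site bounds never allow the unreported events to exceed the remaining slack.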
A Quadratic Improvement
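Waiting for more updates before reporting to the coordinator
The protocol runs over $O(\log(\tau/k))$ rounds, for threshold $\tau$ and $k$ sites
In round $j$, all nodes wait to receive $\lfloor \tau/(2^j k) \rfloor$ items before reporting to the coordinator
The coordinator starts round $j+1$ after receiving $k$ messages
The total communication is $O(k \log(\tau/k))$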
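A round-based protocol of this kind can be sketched as follows. This is a hedged illustration under stated assumptions: the per-site quota in round `j` is taken to be `tau // (2**j * k)`, starting a new round is assumed to cost `k` broadcast messages, and once the quota reaches 1 the sketch simply sends one message per event. The name `countdown_in_rounds` is hypothetical.

```python
import random


def countdown_in_rounds(events, k, tau):
    """Simulate a round-based countdown; `events` yields site ids in [0, k).

    Returns (t, messages): the 1-based index of the event at which the
    coordinator detects the threshold, and the messages used.
    """
    unreported = [0] * k     # events each site has not yet reported
    reported = 0             # events the coordinator knows about
    messages = 0
    j = 1
    quota = max(1, tau // (2 ** j * k))
    received = 0             # reports received in the current round
    for t, site in enumerate(events, start=1):
        unreported[site] += 1
        to_check = [site]
        while to_check:
            s = to_check.pop()
            while unreported[s] >= quota:
                unreported[s] -= quota
                reported += quota
                messages += 1
                received += 1
                if quota > 1 and received == k:
                    # k reports end the round; halve the quota and broadcast.
                    j += 1
                    quota = max(1, tau // (2 ** j * k))
                    received = 0
                    messages += k
                    to_check = list(range(k))   # every site re-checks its count
        if reported >= tau:
            return t, messages
    return None, messages


rng = random.Random(0)
k, tau = 10, 10_000
stream = [rng.randrange(k) for _ in range(2 * tau)]
t, messages = countdown_in_rounds(stream, k, tau)
```

Each round costs only `k` reports plus the assumed broadcast, and the number of rounds grows logarithmically in `tau / k`, which is where the saving over the slack-based approach comes from.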
Monitoring Entropy
Monitoring non-monotone functions
Let $f_i$ denote the number of occurrences of item $i$
Let $m = \sum_i f_i$ denote the total number of items
The union of the input streams implicitly defines a probability distribution given by $p_i = f_i/m$
The goal is to monitor the entropy of this distribution, $H = \sum_i \frac{f_i}{m} \log \frac{m}{f_i}$
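As a concrete reference point, the quantity being monitored can be computed centrally when all streams are available. The sketch below assumes base-2 logarithms (entropy in bits); the function name `empirical_entropy` and the toy streams are illustrative only.

```python
import math
from collections import Counter


def empirical_entropy(streams):
    """Entropy (in bits) of the item distribution over the union of streams."""
    freq = Counter()
    for stream in streams:
        freq.update(stream)      # f_i: occurrences of item i across all sites
    m = sum(freq.values())       # m: total number of items
    return sum((f / m) * math.log2(m / f) for f in freq.values())


# Three sites, each seeing part of the global stream.
site_streams = [["a", "b", "a"], ["b", "c", "d"], ["c", "d"]]
h = empirical_entropy(site_streams)
```

A monitoring protocol aims to track this value without shipping every item to one place, which is what the centralized computation above requires.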
Entropy Protocol
The protocol proceeds in multiple rounds
In the first round, the coordinator collects a constant number of items from the sites
In each subsequent round, the coordinator does the following:
  Computes the parameter for the round
  Runs the approximate countdown protocol with a threshold derived from this parameter
  Collects the frequency distribution from all sites and computes the current entropy
The Geometric Approach
The Geometric Approach (1/2)
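Goal: monitoring of arbitrary non-linear threshold functions
A geometric fact: the convex hull of the sites' vectors can be fully covered by spheres, one per site
Idea: break down the testing of $f(\vec{v}) > \tau$ or $f(\vec{v}) \le \tau$ into local conditions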
The Geometric Approach (2/2)
Each site checks whether its sphere is monochromatic
When all the constraints are upheld:
  The query result remains unchanged
  No communication is required
When a constraint is violated:
  New data is gathered from the streams
  New constraints are set on the streams
Sampling
Given inputs of total size $n$, draw a sample of size $s$
  Uniform over all subsets of size $s$
Sampling cases
  Infinite windows
  Sliding windows
Sampling applications
  Approximate query answering
  Query planning
  Number of distinct elements
  Heavy hitters
Infinite Windows (1/2)
Each site associates a random weight with each observation
The coordinator maintains the following variables:
  A set $P$ of sampled items, each with weight no more than $u$
  A weight $u$: the $s$-th smallest weight seen so far in the system, where $s$ is the sample size
Each site $i$ only maintains $u_i$, its local view of the $s$-th smallest weight
Infinite Windows (2/2)
Protocol outline:
  Each site $i$ sends an element to the coordinator if its weight is smaller than the site's local threshold $u_i$
  The coordinator updates its sample set $P$ and threshold $u$ if the weight of the received item is smaller than $u$
  The coordinator replies to site $i$ with the current value of $u$
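The outline above can be sketched as a small simulation. This is an illustration under assumptions, not the presenter's code: weights are drawn uniformly from [0, 1), every local threshold starts at 1.0, and the class and function names (`Coordinator`, `simulate`) are hypothetical. The simulation also records every weight so the result can be compared against the true set of smallest weights.

```python
import heapq
import random


class Coordinator:
    def __init__(self, s):
        self.s = s
        self.heap = []    # max-heap via negated weights: (-weight, item)
        self.u = 1.0      # s-th smallest weight so far (1.0 until s items are held)

    def receive(self, weight, item):
        if weight < self.u:
            heapq.heappush(self.heap, (-weight, item))
            if len(self.heap) > self.s:
                heapq.heappop(self.heap)        # drop the largest weight
            if len(self.heap) == self.s:
                self.u = -self.heap[0][0]
        return self.u                           # reply with the current threshold


def simulate(k, n, s, seed=0):
    rng = random.Random(seed)
    coord = Coordinator(s)
    local_u = [1.0] * k        # each site's possibly stale view of u
    everything = []            # kept only to verify the result
    messages = 0
    for item in range(n):
        site = rng.randrange(k)
        weight = rng.random()
        everything.append((weight, item))
        if weight < local_u[site]:
            messages += 1
            local_u[site] = coord.receive(weight, item)
    sample = sorted((-neg, item) for neg, item in coord.heap)
    return sample, sorted(everything)[:s], messages


sample, expected, messages = simulate(k=5, n=10_000, s=10)
```

Because a stale local threshold is never smaller than the coordinator's current one, no item that belongs in the sample is ever withheld; staleness only costs a few unnecessary messages.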
Thank You :)
Support Slides
A First Approach (long Ver.)
Idea: there are many events at each site before reaching the threshold
  With threshold $\tau$ and $k$ sites, at least one site must see $\tau/k$ items before the threshold is reached
Algorithm steps:
  Initially, each site reports to the coordinator whenever its number of observed items exceeds $\tau/k$
  The coordinator computes the current slack from the sum of all local counts: $S = \tau - N$ ($N$ is the current count)
  Each site sets an upper bound of $S/k$ on its local count
The total communication is $O(k^2 \log(\tau/k))$
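A bound of this form can be recovered with a short calculation. The derivation below is a sketch under assumptions: $S_t$ denotes the slack after $t$ synchronizations with $S_0 = \tau$, a synchronization is triggered once some site has seen a $1/k$ share of the current slack, each synchronization costs $O(k)$ messages, and rounding is ignored.

```latex
\begin{align*}
S_{t+1} &\le S_t - \frac{S_t}{k} = S_t\left(1 - \frac{1}{k}\right), \qquad S_0 = \tau \\
S_t &\le \tau\left(1 - \frac{1}{k}\right)^{t} \le \tau e^{-t/k} \\
S_t &< k \quad\text{once}\quad t > k \ln(\tau/k) \\
\text{total cost} &= O(k) \times O\!\left(k \log(\tau/k)\right) = O\!\left(k^2 \log(\tau/k)\right)
\end{align*}
```

Once the slack is below $k$, reporting every remaining event individually adds fewer than $k$ further messages, which does not change the bound.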
Approximate Countdown
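Improve the cost by approximating the answer
Let $\epsilon$ be the approximation parameter and $\tau$ the threshold
  Report 0 if count $< (1-\epsilon)\tau$
  Report 1 if count $> \tau$
Similar to the previous approach, but now terminate when the bound on the unreported count reaches $\epsilon\tau$
The number of rounds is reduced to $O(\log(1/\epsilon))$
The total communication is $O(k \log(1/\epsilon))$ for $k$ sites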
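The round count follows from a one-line calculation, sketched here under the assumption that the round-based protocol leaves at most $\tau/2^{j}$ events unreported after round $j$ and that each round costs $O(k)$ messages:

```latex
\begin{align*}
\frac{\tau}{2^{j}} \le \epsilon\tau
  \iff 2^{j} \ge \frac{1}{\epsilon}
  \iff j \ge \log_2\frac{1}{\epsilon},
\qquad
\text{total cost} = O(k) \times O\!\left(\log\frac{1}{\epsilon}\right) = O\!\left(k \log\frac{1}{\epsilon}\right)
\end{align*}
```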
Randomized Countdown Protocol (1/2)
If the number of sites $k$ grows very large, the cost will be high
Allow the algorithm to give a wrong answer with small probability $\delta$
Randomization reduces the dependency on $k$ by introducing the parameter $\delta$
Randomized Countdown Protocol (2/2)
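With a randomization parameter determined by analysis:
  Each site collects a batch of observations
  After each batch, it sends a message with a fixed probability; otherwise it remains silent
  The coordinator waits until it has received a set number of messages, then terminates
The total communication cost is $O\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta}\right)$, for approximation parameter $\epsilon$ and failure probability $\delta$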
Geometric Computational Model (1/2)
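Each site $i$ has a $d$-dimensional vector $\vec{v}_i$, called its local statistics vector
Let $w_1, \dots, w_k$ be weights assigned to the streams
Define the global statistics vector $\vec{v}$ as the weighted average of the $\vec{v}_i$: $\vec{v} = \sum_i w_i \vec{v}_i / \sum_i w_i$
Let $f: \mathbb{R}^d \to \mathbb{R}$ be an arbitrary monitoring function
Goal: determining whether $f(\vec{v}) > \tau$ at any given time, for a threshold $\tau$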
Geometric Computational Model (2/2)
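$\vec{v}'_i$ is the last statistics vector collected from node $i$; $\vec{v}_i$ is its current one
The coordinator constructs the estimate vector $\vec{e}$, the weighted average of the $\vec{v}'_i$
Each node also maintains the following parameters:
  Delta vector: $\Delta\vec{v}_i = \vec{v}_i - \vec{v}'_i$
  Drift vector: $\vec{u}_i = \vec{e} + \Delta\vec{v}_i$
Decomposing relies on the following fact: the global statistics vector is the weighted average of the drift vectors, $\vec{v} = \sum_i w_i \vec{u}_i / \sum_i w_i$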
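The role of the drift vectors can be checked numerically. The sketch below assumes the usual definitions (delta vector = current local vector minus last collected vector; drift vector = estimate vector plus delta vector) and uses made-up weights and 2-dimensional vectors; the helper name `weighted_average` is illustrative.

```python
def weighted_average(vectors, weights):
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dim)]


weights = [1.0, 2.0, 1.0]
last_collected = [[1.0, 0.0], [2.0, 2.0], [0.0, 1.0]]   # vectors last sent to the coordinator
current = [[1.5, 0.5], [2.0, 3.0], [-1.0, 1.0]]         # current local statistics vectors

# Estimate vector: weighted average of the last collected vectors.
e = weighted_average(last_collected, weights)
# Delta vectors: how far each site has moved since it last reported.
delta = [[c - l for c, l in zip(cur, last)]
         for cur, last in zip(current, last_collected)]
# Drift vectors: the estimate shifted by each site's local change.
drift = [[ei + di for ei, di in zip(e, d)] for d in delta]

v_global = weighted_average(current, weights)
v_from_drift = weighted_average(drift, weights)
```

The two averages agree, so the true global vector always lies in the convex hull of the drift vectors, even though no single site knows the global vector itself.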
Geometric Interpretation
Geometric interpretation:
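The convex hull of the estimate vector $\vec{e}$ and the drift vectors $\vec{u}_i = \vec{e} + \Delta\vec{v}_i$ can be fully covered by spheres with radius $\|\Delta\vec{v}_i\|/2$ centered at $\vec{e} + \Delta\vec{v}_i/2$
[Figure: the estimate vector $\vec{e}$ and drift vectors $\vec{u}_1, \dots, \vec{u}_5$]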
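The covering property can be spot-checked numerically. The sketch below is a randomized sanity check rather than a proof: it assumes each sphere has the segment from the estimate vector `e` to one drift vector as its diameter, picks five arbitrary 2-dimensional drift vectors, and samples random convex combinations of `e` and the drift vectors. All names and values are illustrative.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def covered(point, e, drifts, tol=1e-9):
    """True if `point` lies in some ball whose diameter runs from e to a drift vector."""
    for u in drifts:
        center = [(a + b) / 2 for a, b in zip(e, u)]
        radius = dist(e, u) / 2
        if dist(point, center) <= radius + tol:
            return True
    return False


rng = random.Random(1)
e = [0.0, 0.0]
drifts = [[2.0, 0.5], [-1.0, 1.5], [0.5, -2.0], [-1.5, -1.0], [1.0, 2.0]]
vertices = [e] + drifts

all_covered = True
for _ in range(10_000):
    # Random convex combination of e and the drift vectors.
    raw = [rng.random() for _ in vertices]
    total = sum(raw)
    point = [sum(r / total * v[d] for r, v in zip(raw, vertices)) for d in range(2)]
    all_covered = all_covered and covered(point, e, drifts)
```

Every sampled point of the hull falls inside at least one ball, which is what lets each site test only its own ball locally.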