Predicting and Bypassing End-to-End Internet Service Degradation
Anat Bremler-Barr Edith Cohen Haim Kaplan Yishay Mansour
Tel-Aviv University AT&T Labs Tel-Aviv University
Talk: Omer Ben-Shalom, Tel-Aviv University
Outline:
• Degradation: deviation from “normal” (minimum) RTT.
• Predicting degradation: different predictors
• Performance evaluation: precision/recall methodology
• Suggested application: gateway selection
Motivating Application
• Gateway selection (intelligent routing device)
• Choosing peering links
[Figure: an intelligent routing device choosing between peering links; labels: AS 56, AS 123, AS 12, AS 41]
Data and Measurements: Sources
• Base measurements from 4 different locations (ASes), simulating 4 gateways:
– California (CA): AT&T (CA1) + ACIRI (CA2)
– New Jersey (NJ): AT&T (NJ1) + Princeton (NJ2)
Data and Measurements: Destinations
• Obtaining a representative set of web servers and weights (derived from proxy logs)
Data and Measurements: RTT
• Data: weekly RTT (SYN), end to end (path + server)
– Hourly measurements: 35,124 servers
– Once-a-minute measurements: weighted sample of 100 servers
Degradation: Definition
• Deviation from the minimum recorded RTT (propagation delay)
• Discrete degradation levels 1-6:
Level   Time above minimum RTT (ms)
1       +50
2       +100
3       +200
4       +400
5       +800
6       +1600
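The level definition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes a measurement is at level k when its deviation reaches the level-k threshold, and that a "failure at level l" means a degradation level of at least l.

```python
from bisect import bisect_right

# Thresholds in ms above the minimum RTT for levels 1..6 (from the slide's table).
LEVEL_THRESHOLDS_MS = [50, 100, 200, 400, 800, 1600]


def degradation_level(rtt_ms, min_rtt_ms):
    """Return the highest level whose threshold the deviation reaches (0 = none)."""
    deviation = rtt_ms - min_rtt_ms
    # bisect_right counts how many thresholds are <= deviation.
    return bisect_right(LEVEL_THRESHOLDS_MS, deviation)


def is_failure(rtt_ms, min_rtt_ms, level):
    """Assumption: a failure at `level` means the degradation level is >= level."""
    return degradation_level(rtt_ms, min_rtt_ms) >= level
```

For example, an RTT of 350 ms against a minimum of 100 ms deviates by 250 ms, which is level 3 under these assumptions.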
Objective: Avoiding degradation?
• Attempt to reroute through a different gateway
• Two conditions have to hold:
– We need to be able to predict a failure through a gateway
– We need to have a substitute gateway (low correlation between gateways)
• Blackout: a run of consecutive degraded measurements through one gateway
Blackout durations
• The longer the blackout, the easier it is to predict.
• The majority of blackouts are short: 1-3 consecutive points.
• However, a considerable fraction last longer.
[Figure: long-duration blackout]
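Blackout durations can be extracted from a per-gateway sequence of failure indicators as run lengths. The sketch below assumes the measurements are evenly spaced and already reduced to 0/1 failure bits; the function names are illustrative.

```python
from collections import Counter
from itertools import groupby


def blackout_durations(failures):
    """Lengths of maximal runs of consecutive failures (truthy entries)."""
    return [
        sum(1 for _ in run)
        for failed, run in groupby(failures, key=bool)
        if failed
    ]


def duration_histogram(failures):
    """Map each blackout duration to the number of blackouts of that length."""
    return Counter(blackout_durations(failures))
```

A histogram built this way shows directly how many blackouts fall in the short 1-3 point range and how many last longer.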
Gateways Correlation
• Gateways are correlated but often the correlation is not too strong
Gateways Correlation
• Longer blackouts are more likely to be shared
– failure closer to the server
• Majority of 2-gateways blackouts involved same-coast pairs
Building predictors
• For a given degradation level l.
• Prediction per IP.
• Input: previous RTT measurements for the IP address.
• Output: probability of a failure
• Predict “failure” if probability > Ф
Precision / Recall Methodology
[Figure: Venn diagram of the "Predicted degraded" and "Actual degraded" sets]
$$\text{Precision} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Predicted degraded}|}$$
$$\text{Recall} = \frac{|\text{Actual degraded} \cap \text{Predicted degraded}|}{|\text{Actual degraded}|}$$
Precision-recall curve
• Sweep the threshold Ф in [0,1] to obtain a precision-recall curve.
• In other words, let P(t) be the predicted failure probability at time t:
$$\mathrm{precision}(\Phi) = \Pr[\text{failure at time } t \mid P(t) > \Phi]$$
$$\mathrm{recall}(\Phi) = \Pr[P(t) > \Phi \mid \text{failure at time } t]$$
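The threshold sweep can be sketched as follows. This is an illustrative helper, not the evaluation code behind the slides: `probs` holds the predicted failure probabilities, `failures` the matching 0/1 outcomes, and a failure is predicted when the probability exceeds Ф, as on the "Building predictors" slide. Returning `None` when a ratio is undefined is a choice made here.

```python
def precision_recall(probs, failures, phi):
    """Precision and recall when predicting a failure iff probability > phi."""
    predicted = [p > phi for p in probs]
    true_pos = sum(1 for pred, f in zip(predicted, failures) if pred and f)
    n_predicted = sum(predicted)
    n_actual = sum(1 for f in failures if f)
    # A ratio is undefined when its denominator is zero; report None in that case.
    precision = true_pos / n_predicted if n_predicted else None
    recall = true_pos / n_actual if n_actual else None
    return precision, recall


def pr_curve(probs, failures, thresholds):
    """One (phi, precision, recall) point per threshold."""
    return [(phi, *precision_recall(probs, failures, phi)) for phi in thresholds]
```

Raising Ф shrinks the set of predicted failures, which typically trades recall for precision.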
What is important for prediction?
• Recency principle: more recent RTTs are more important.
• Quantity principle: the more measurements, the higher the accuracy.
Recency Principle: Importance
• Test case: single-measurement predictor
– predict according to the measurement taken x minutes ago.
– observe the change in the quality of the prediction.
• 15% difference between using the measurement from 1 minute ago and the one from 15 minutes ago
Minutes ago   NJ-2, failure level 6     NJ-1, failure level 3
              recall (= precision)      recall (= precision)
1             0.33                      0.52
2             0.31                      0.49
4             0.29                      0.48
7             0.28                      0.46
10            0.27                      0.45
15            0.26                      0.44
Quantity Principle: Importance
• Test case: Fixed-Window-Count (FWC)
– the prediction is the fraction of failures in the W most recent measurements
• With more measurements we can achieve better precision at high recall
[Figure: precision-recall curves for FWC 1, FWC 5, FWC 10, and FWC 50]
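A Fixed-Window-Count predictor is small enough to sketch directly. The class name and interface are illustrative; the sketch assumes that before W measurements have arrived the fraction is taken over what has been seen so far, and that the prediction is 0 when nothing has been seen.

```python
from collections import deque


class FixedWindowCount:
    """Predicts the fraction of failures among the W most recent measurements."""

    def __init__(self, window):
        # deque(maxlen=W) silently drops the oldest entry once W are stored.
        self.recent = deque(maxlen=window)

    def update(self, failed):
        self.recent.append(1 if failed else 0)

    def predict(self):
        if not self.recent:
            return 0.0  # assumption: no history means no predicted failure
        return sum(self.recent) / len(self.recent)
```

With W = 1 this reduces to the single-measurement predictor that repeats the last observation.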
Our predictors
• Exponential decay
• Polynomial decay
• Model-based predictors:
– VW-Cover: Variable Window Cover algorithm
– HMM: Hidden Markov Model
Exponential-decay predictors
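• The weight of each measurement decreases exponentially with its age, by a factor λ.
– Binary variable f_t represents a failure at time t.
• For consecutive measurements:
$$\mathrm{ExpDecay}(t) = (1-\lambda)\, f_t + \lambda\, \mathrm{ExpDecay}(t-1)$$
• In general,
$$\mathrm{ExpDecay}(t) = \frac{\sum_{t' \in H_t} \lambda^{\,t-t'} f_{t'}}{\sum_{t' \in H_t} \lambda^{\,t-t'}}$$
where H_t is the set of measurement times in the history available at time t.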
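For evenly spaced measurements, the update rule reads as ExpDecay(t) = (1 - λ) f_t + λ ExpDecay(t - 1), which needs only one stored number per destination. A minimal sketch under that reading follows; the class name is illustrative and the initial value of 0 is an assumption, not something the slides specify.

```python
class ExpDecay:
    """Exponentially decayed failure rate, updated once per measurement."""

    def __init__(self, lam, initial=0.0):
        if not 0.0 <= lam < 1.0:
            raise ValueError("lam must be in [0, 1)")
        self.lam = lam
        self.value = initial  # assumption: start with no predicted failure

    def update(self, failed):
        f_t = 1.0 if failed else 0.0
        # ExpDecay(t) = (1 - lam) * f_t + lam * ExpDecay(t - 1)
        self.value = (1.0 - self.lam) * f_t + self.lam * self.value
        return self.value
```

A λ close to 1 (such as the 0.95 and 0.99 used in the evaluation) gives old measurements more weight, while a small λ tracks only the most recent ones.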
Polynomial-decay predictors
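$$\mathrm{PolyDecay}(t) = \frac{\sum_{t' \in H_t} (t-t')^{-\alpha} f_{t'}}{\sum_{t' \in H_t} (t-t')^{-\alpha}}$$
where H_t is the set of measurement times in the history available at time t and α > 0 is the decay exponent.
• Exact computation requires maintaining the complete history.
• We approximated it.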
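The exact (non-approximated) computation can be sketched as a pass over the full history. This assumes each past measurement is weighted by its age raised to the power -alpha, which is one common form of polynomial decay; the exponent `alpha`, the function name, and the `(time, failed)` input format are assumptions for illustration. The approximation used in the talk is not described here and is not reproduced.

```python
def poly_decay(history, now, alpha=1.0):
    """Polynomially decayed failure rate at time `now`.

    history: iterable of (time, failed) pairs with time < now.
    """
    numerator = 0.0
    denominator = 0.0
    for t_prime, failed in history:
        age = now - t_prime
        if age <= 0:
            raise ValueError("measurements must precede the prediction time")
        weight = age ** -alpha  # assumed weight: age^(-alpha)
        denominator += weight
        if failed:
            numerator += weight
    return numerator / denominator if denominator else 0.0
```

Because every weight changes whenever `now` advances, the sum cannot be updated incrementally the way the exponential recurrence can, which is why the exact form needs the complete history.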
The VW-Cover predictor
• Consists of a list of pairs
(a1, b1), (a2, b2), …, (an, bn)
• Predict a failure if there exists an i such that at least bi of the previous ai measurements were failures
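The prediction rule translates almost directly into code. In this sketch `pairs` is the list of (a_i, b_i) pairs and `history` is a list of past 0/1 failure bits, oldest first; both names are illustrative.

```python
def vw_cover_predict(pairs, history):
    """Predict a failure if some pair (a, b) sees >= b failures in the last a bits."""
    for a, b in pairs:
        window = history[-a:] if a > 0 else []
        if sum(window) >= b:
            return True
    return False
```

For instance, with pairs [(2, 2), (5, 3)] a failure is predicted either after two failures in a row or after three failures among the last five measurements.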
VW-Cover predictor: Building
• Build the predictor greedily to cover the failures.
• Use a learning set of measurements
– Pick (a1, b1) to be the pair which maximizes precision
– Pick (ai, bi) to be the pair which maximizes precision among uncovered failures
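One way to read the greedy construction is sketched below. It is an interpretation, not the authors' algorithm: the candidate pair set, the stopping rule (`min_precision`, `max_pairs`), and the tie-breaking by number of newly covered failures are all assumptions introduced for illustration.

```python
def fires(pair, bits, t):
    """True if pair (a, b) sees at least b failures in the a bits before time t."""
    a, b = pair
    return sum(bits[max(0, t - a):t]) >= b


def build_vw_cover(bits, candidates, min_precision=0.5, max_pairs=10):
    """Greedily choose pairs that best predict the still-uncovered failures."""
    remaining = set(range(1, len(bits)))  # time points no chosen pair fires on yet
    chosen = []
    while len(chosen) < max_pairs:
        best = None
        for pair in candidates:
            if pair in chosen:
                continue
            hits = [t for t in remaining if fires(pair, bits, t)]
            covered = sum(bits[t] for t in hits)  # failures correctly predicted
            if covered == 0:
                continue
            score = (covered / len(hits), covered)  # (precision, tie-breaker)
            if best is None or score > best[0]:
                best = (score, pair, hits)
        if best is None or best[0][0] < min_precision:
            break
        chosen.append(best[1])
        remaining.difference_update(best[2])
    return chosen
```

On the learning sequence `[0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]` with candidates `[(1, 1), (2, 1), (2, 2), (3, 2)]`, this sketch selects only `(1, 1)`: the remaining uncovered failures are the first points of each blackout, which no candidate can predict from the preceding measurements.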
Hidden Markov Model
• Finite set of states S (we use 3 states)
• Output probabilities a_s(0), a_s(1) for each state s
• Transition function: determines the probability distribution of the next state.
• The probability of a failure:
$$\mathrm{HMM}(t) = \sum_{s \in S} a_s(1)\, p_s(t)$$
where p_s(t) is the probability of being in state s at time t. p_s(t) is updated according to the output at time t-1.
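The prediction and the state update can be sketched as a standard forward-filtering step. The class below is illustrative: the transition matrix and output probabilities are inputs, and how they are trained is outside this sketch. The Bayes-then-transition update is the usual HMM filter and is assumed here to match what the slide calls updating p_s(t) from the previous output.

```python
class HMMPredictor:
    """Failure predictor based on filtering the hidden state of an HMM."""

    def __init__(self, transition, emit_failure, initial):
        self.transition = transition      # transition[s][s2] = Pr[next = s2 | now = s]
        self.emit_failure = emit_failure  # emit_failure[s] = a_s(1)
        self.p = list(initial)            # p[s] = p_s(t), current state distribution

    def predict(self):
        # HMM(t) = sum over states of a_s(1) * p_s(t)
        return sum(a * p for a, p in zip(self.emit_failure, self.p))

    def update(self, failed):
        # Condition the state distribution on the observed output (Bayes rule).
        likelihood = [a if failed else 1.0 - a for a in self.emit_failure]
        posterior = [l * p for l, p in zip(likelihood, self.p)]
        total = sum(posterior)
        if total == 0.0:
            posterior, total = list(self.p), sum(self.p)  # impossible output: keep belief
        posterior = [x / total for x in posterior]
        # Advance one step through the transition function.
        n = len(posterior)
        self.p = [
            sum(posterior[s] * self.transition[s][s2] for s in range(n))
            for s2 in range(n)
        ]
```

A usage example with made-up parameters: three states with failure output probabilities 0.01, 0.3, and 0.9, starting in the first state, predict 0.01 initially, and the prediction rises after each observed failure as probability mass moves toward the high-failure states.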
Experimental Evaluation
Predictor Performance: Level 3
• At recall 0.5, precision is close to 0.9
[Figure: precision-recall curves for FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, and HMM]
Predictor Performance: Level 6
• Level-6 degradations are harder to predict: at recall 0.5, precision is 0.4
[Figure: precision-recall curves for FWC 10, FWC 50, ExpDecay 0.99, ExpDecay 0.95, VW-Cover, and HMM]
Predictor Performance: Conclusion
• The best predictors at levels 3 and 6 are VW-Cover and HMM
• But they only slightly outperform ExpDecay 0.95, which is considerably simpler to implement
Gateway Selection
Degradation rate by gateway-selection method:
Method               Level 6   Level 3
Best gateway         1.15%     3.45%
Worst gateway        3.29%     5.77%
Optimal              0.08%     0.45%
ExpDecay 0.95        0.52%     1.56%
VW-Cover             0.49%     1.50%
Static: IP gateway   0.86%     2.41%
Gateway Selection: Conclusion
• Active gateway selection resulted in a 50% reduction in the degradation rate relative to the best single gateway.
• Static gateway selection can avoid at most 25% of degradations.
• Again, ExpDecay 0.95 only slightly underperforms the best predictor (VW-Cover).
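Active gateway selection can be illustrated with a small simulation: keep one exponentially decayed predictor per gateway and route each request through the gateway with the lowest predicted failure probability. This is a sketch under simplifying assumptions, not the evaluation from the talk: every gateway is measured at every step, ties go to the first-listed gateway, and the function name and input format are hypothetical.

```python
def simulate_selection(failures_by_gateway, lam=0.95):
    """Degradation rate when routing via the gateway with the lowest prediction.

    failures_by_gateway: dict mapping gateway name to a list of 0/1 failure bits,
    all lists of equal length and aligned in time.
    """
    gateways = list(failures_by_gateway)
    n = len(failures_by_gateway[gateways[0]])
    score = {g: 0.0 for g in gateways}  # ExpDecay value per gateway
    degraded = 0
    for t in range(n):
        # Choose before seeing the outcome at time t; min() keeps the first on ties.
        chosen = min(gateways, key=lambda g: score[g])
        degraded += failures_by_gateway[chosen][t]
        # Assumption: all gateways are measured at every step.
        for g in gateways:
            f_t = failures_by_gateway[g][t]
            score[g] = (1.0 - lam) * f_t + lam * score[g]
    return degraded / n
```

With gateway A failing at steps 2-5 of an 8-step trace and gateway B never failing, the simulation hits only the first failure through A before switching to B, giving a rate of 1/8 compared with 4/8 for staying on A.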
Performance of gateway selection as a function of recency
Correlation between coasts
• Gateway selection on a same-coast pair resulted in only a 10% reduction. Choose independent gateways.
Level   NJ-2 + NJ-1                       CA-2 + NJ-2
        Best gateway   Best predictor     Best gateway   Best predictor
6       1.15%          1.05%              1.15%          0.54%
3       3.45%          3.05%              3.45%          1.78%
Controlling prediction overhead
• Types of measurements:
– Active measurements: initiate probes (SYN, ping, HTTP request); scalability problem.
– Passive measurements: collected from regular traffic.
• Controlling the prediction overhead:
– Use less-recent measurements.
– Take active measurements only to a small set of destinations that covers the majority of traffic.
– Cluster destinations: the measurements of one destination can be used to predict another.
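Restricting active probes to a small set of destinations that covers most of the traffic can be sketched as a greedy pick by weight. The function name, the `weights` mapping (for example, request counts derived from a proxy log), and the `coverage` parameter are illustrative assumptions.

```python
def destinations_to_probe(weights, coverage=0.5):
    """Smallest heaviest-first set of destinations reaching the coverage fraction."""
    total = sum(weights.values())
    selected = []
    covered = 0
    # Take destinations in decreasing order of traffic weight.
    for dest, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        selected.append(dest)
        covered += weight
    return selected
```

When traffic is heavily skewed toward a few servers, this keeps the probe set small while still covering the requested share of requests.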
Questions??
natali@cs.tau.ac.il, edith@research.att.com, haimk@cs.tau.ac.il, mansour@cs.tau.ac.il