1
Objective-Optimal Algorithms for Long-term Web Prefetching
Ajay Kshemkalyani (jointly with Bin Wu)
Univ. of Illinois at Chicago
2
Outline
• Prefetching: definition and background
• Survey of web prefetching algorithms
• Performance metrics
• Objective-Greedy algorithms (O(n) time)
– Hit rate greedy (also hit rate optimal)
– Bandwidth greedy (also bandwidth optimal)
– H/B greedy
• H/B-Optimal algorithm (expected O(n) time)
• Simulation results
• Variants under different constraints
3
Introduction
Web caching reduces user-perceived latency
– Client-server mode
– Bottleneck occurs at server side
– Means of improving performance:
• local cache, proxy server, server farm, …
– Cache management: LRU, Greedy dual-size, …
On-demand caching vs. (long-term) prefetching
– Prefetching is effective in dynamic environments.
– Clients subscribe to web objects
– Server "pushes" fresh copies into web caches
– Selection of prefetched objects is based on long-term statistical characteristics, maintained by content distribution servers (CDS)
4
Introduction
• Web prefetching
– Caches web objects in advance
– Updated by web server
– Reduces retrieval latency and user access time
– Requires more bandwidth and increases traffic.
• Performance metrics
– Hit rate
– Bandwidth usage
– Balance of the two
5
Object Selection Criteria
• Popularity (access frequency)
• Lifetime
• Good Fetch
• APL
6
Web Object Characteristics
• Access frequency
A Zipf-like request model is used in web traffic modeling.
The relationship between access frequency $p_i$ and popularity rank $i$ of a web object:
$$p_i = \frac{k}{i^{\alpha}}, \quad \text{where } k = \frac{1}{\sum_i 1/i^{\alpha}}$$
7
Web Object Characteristics
The generalized "Zipf-like" distribution of web requests is calculated as:
$$p_i = \frac{k}{i^{\alpha}}, \quad \text{where } k = \frac{1}{\sum_i 1/i^{\alpha}}$$
k is a normalization constant, i is the object ID (popularity rank), and α is the Zipf parameter, reported as:
0.986 (Cunha et al.),
0.75 (Nishikawa et al.) and
0.64 (Breslau et al.)
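The Zipf-like model above can be sketched in a few lines. This is only an illustration: the function name `zipf_probabilities` and the choice of 1,000 objects are assumptions, and α = 0.75 is simply one of the reported values.

```python
def zipf_probabilities(n, alpha):
    """Return [p_1, ..., p_n] with p_i = k / i**alpha for popularity ranks 1..n."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(weights)  # normalization constant: makes the p_i sum to 1
    return [k * w for w in weights]


# Hypothetical population of 1,000 objects with alpha = 0.75.
probs = zipf_probabilities(1000, 0.75)
```

A smaller α flattens the distribution, so less of the request mass is concentrated in the top-ranked objects.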
8
Web Object Characteristics
• Size of objects (heavy-tailed Pareto, lognormal)
– Average object size: 10–15 KB.
– No strong correlation between object size $s_i$ and its access frequency $p_i$.
• Access (read) pattern of objects (Poisson)
– Average access rate $a p_i$, where $a$ is the total access rate
• Lifetime of web objects (exponential)
– Average time interval between updates $l_i$
– Weak correlation between access frequency $p_i$ and lifetime $l_i$.
9
Caching/Prefetching Architecture
[Figure: a cache with its prefetching algorithm, and the content sources Reuters, NYSE, BBC, and BSE]
10
Caching Architecture
• Prefetching selection algorithms use these global statistics as input:
– estimates of object reference frequencies
– estimates of object lifetimes
• Content distribution servers cooperate to maintain these statistics
• When an object is updated at the origin server, the new version is sent to every cache that has subscribed to it.
11
Solution space for web prefetching
• Two extreme cases:
Passive caches (non-prefetching)
- Least network bandwidth and lowest cache hit rate
Prefetching all objects
- 100% cache hit rate
- Huge amount of unnecessary bandwidth
• Existing algorithms use different object-selection criteria and fetch objects exceeding some threshold.
14
Existing Prefetching Algorithms
• Popularity [Markatos et al.]
– Keeps the most popular objects in the system
– Updates these objects immediately when they change
– Criterion: object's popularity
– Expected to achieve high hit rate
• Lifetime [Jiang et al.]
– Keeps objects with the longest lifetimes
– Mostly considers the network resource demands
– Threshold: the expected lifetime of the object
– Expected to minimize bandwidth usage
15
Existing Prefetching Algorithms
• Good Fetch [Venkataramani et al.]
– Computes the probability that an object is accessed before it changes.
– Prefetches objects with "high probability of being accessed during their average lifetime"
– Prefetches object i if the probability exceeds a threshold.
– Objects with higher access frequencies and longer update intervals are more likely to be prefetched
– Balances the benefit (hit rate increase) against the cost (bandwidth increase) of keeping an object.
16
Existing Prefetching Algorithms
• APL [Jiang et al.]
– Computes apl values of web objects.
– apl of an object represents "expected number of accesses during its lifetime"
– Prefetches object i if its apl exceeds a threshold.
– Tends to improve hit rate; attempts to balance benefit (hit rate) against cost (bandwidth).
• Enhanced APL: $a p^k l$
– k>1: prefers objects with higher popularity (emphasizes hit rate)
– k<1: prefers objects with longer lifetime (emphasizes network bandwidth)
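As a rough illustration of threshold-based selection, the sketch below scores each object as a·p^k·l (k = 1 gives plain APL, the expected number of accesses during a lifetime) and keeps the objects whose score exceeds a threshold. The function name, the `(p, l)` tuple layout, and the sample numbers are assumptions made for this example, not details taken from Jiang et al.

```python
def apl_select(a, objects, threshold, k=1.0):
    """Return indices of objects whose a * p**k * l score exceeds the threshold.

    objects is a list of (p, l) pairs: access probability and mean lifetime.
    k = 1 gives plain APL; other values of k give the enhanced variant.
    """
    selected = []
    for i, (p, l) in enumerate(objects):
        score = a * (p ** k) * l
        if score > threshold:
            selected.append(i)
    return selected


# Hypothetical sample: total access rate a = 10 requests per time unit.
sample = [(0.5, 1.0), (0.1, 2.0), (0.01, 3.0)]
plain = apl_select(10.0, sample, threshold=1.5)
popularity_biased = apl_select(10.0, sample, threshold=1.5, k=2.0)
```

With k = 2 the less popular second object falls below the same threshold, which is the popularity bias the slide describes for k > 1.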
17
Objective-Greedy Algorithms
• Existing algorithms choose prefetching criteria based on intuitions
– not aimed at any specific performance metric
– consider only individual objects' characteristics, not the global impact
• None gives optimal performance based on any metric
– Simple counter-examples can be shown
18
Objective-Greedy Algorithms
• Objective-Greedy algorithms select criteria to intentionally improve performance based on various metrics.
• E.g., the Hit Rate-Greedy algorithm aims to improve the overall hit rate and thus reduce the latency of object requests.
19
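Steady State Properties
• Steady state hit rate for object i:
$$h_i = \begin{cases} 1 & \text{object } i \text{ is prefetched} \\ \dfrac{a p_i l_i}{a p_i l_i + 1} & \text{object } i \text{ is not prefetched} \end{cases}$$
• $\dfrac{a p_i l_i}{a p_i l_i + 1}$ is defined as the freshness factor, f(i)
• Overall hit rate:
$$H = \sum_i p_i h_i$$
• On-demand hit rate:
$$H_{demand} = \sum_i p_i f(i)$$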
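The steady-state hit rate can be computed directly from these definitions: the freshness factor is f(i) = a·p_i·l_i / (a·p_i·l_i + 1), a prefetched object always hits (h_i = 1), and any other object hits with probability f(i). The sketch below is a minimal illustration; the function names, the `(p, l)` tuple layout, and the two-object sample are assumptions.

```python
def freshness(a, p, l):
    """Freshness factor f(i) = a*p*l / (a*p*l + 1)."""
    x = a * p * l
    return x / (x + 1.0)


def overall_hit_rate(a, objects, prefetched):
    """H = sum_i p_i * h_i, with h_i = 1 if prefetched and f(i) otherwise.

    objects is a list of (p, l) pairs; prefetched is a set of object indices.
    """
    total = 0.0
    for i, (p, l) in enumerate(objects):
        h = 1.0 if i in prefetched else freshness(a, p, l)
        total += p * h
    return total


# Hypothetical two-object example with total access rate a = 1.
objs = [(0.5, 2.0), (0.5, 1.0)]
h_demand = overall_hit_rate(1.0, objs, prefetched=set())
h_all = overall_hit_rate(1.0, objs, prefetched={0, 1})
```

Passing an empty set gives the on-demand hit rate, and prefetching every object gives a hit rate of 1, matching the two extreme cases of the solution space.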
20
Steady State Properties
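• Steady state bandwidth for object i:
$$b_i = \begin{cases} \dfrac{s_i}{l_i} & \text{object } i \text{ is prefetched} \\ a p_i (1 - f(i)) s_i & \text{object } i \text{ is not prefetched} \end{cases}$$
• Total bandwidth:
$$BW = \sum_i b_i$$
• On-demand bandwidth:
$$BW_{demand} = \sum_i a p_i (1 - f(i)) s_i$$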
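The bandwidth side can be sketched the same way: a prefetched object costs s_i / l_i (one transfer per update), while an on-demand object costs a·p_i·(1 − f(i))·s_i (one transfer per miss), with f(i) = a·p_i·l_i / (a·p_i·l_i + 1). Function names, the `(p, l, s)` tuple layout, and the sample values are assumptions for illustration.

```python
def freshness(a, p, l):
    """Freshness factor f(i) = a*p*l / (a*p*l + 1)."""
    x = a * p * l
    return x / (x + 1.0)


def total_bandwidth(a, objects, prefetched):
    """BW = sum_i b_i for objects given as (p, l, s) triples.

    b_i = s/l if object i is prefetched, else a*p*(1 - f(i))*s.
    """
    total = 0.0
    for i, (p, l, s) in enumerate(objects):
        if i in prefetched:
            total += s / l  # every update is pushed to the cache
        else:
            total += a * p * (1.0 - freshness(a, p, l)) * s  # only misses are fetched
    return total


# Hypothetical single object: p = 0.5, lifetime 2, size 10, access rate a = 1.
objs = [(0.5, 2.0, 10.0)]
bw_demand = total_bandwidth(1.0, objs, prefetched=set())
bw_prefetch = total_bandwidth(1.0, objs, prefetched={0})
```

Since a·p·(1 − f)·s equals (s/l)·f, prefetching an object never lowers its bandwidth; it raises it by (s/l)·(1 − f).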
21
Objective Metrics
• Hit rate – benefit
• Bandwidth – cost
• H/B model – balance of benefit and cost
Basic H/B:
$$\frac{H}{B} = \frac{Hit_{Prefetching} / Hit_{Demand}}{BW_{Prefetching} / BW_{Demand}}$$
Enhanced H/B:
$$\frac{H^k}{B} = \frac{(Hit_{Prefetching} / Hit_{Demand})^k}{BW_{Prefetching} / BW_{Demand}}$$
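A minimal sketch of the metric, assuming the four aggregate quantities have already been measured or computed: it divides the relative hit-rate gain (raised to the power k in the enhanced form) by the relative bandwidth cost. The function name and the sample numbers are illustrative only.

```python
def hb_metric(hit_pref, hit_demand, bw_pref, bw_demand, k=1.0):
    """Return (hit_pref / hit_demand)**k / (bw_pref / bw_demand).

    k = 1 gives the basic H/B metric; k != 1 gives the enhanced form.
    """
    hit_gain = hit_pref / hit_demand
    bw_cost = bw_pref / bw_demand
    return (hit_gain ** k) / bw_cost


# Hypothetical numbers: hit rate doubles while bandwidth triples.
basic = hb_metric(0.8, 0.4, 30.0, 10.0)
hit_weighted = hb_metric(0.8, 0.4, 30.0, 10.0, k=2.0)
```

Pure on-demand caching scores exactly 1 under this definition, so a value above 1 means the hit-rate gain outweighs the extra bandwidth.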
22
H/B-Greedy Prefetching
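• Considers the H/B value of on-demand caching:
$$\left(\frac{H}{B}\right)_{demand} = \frac{Hit_{demand}}{BW_{demand}} = \frac{\sum_i p_i f(i)}{\sum_i \frac{s_i}{l_i} f(i)}$$
• If object j is prefetched, then H/B is updated to:
$$\frac{H}{B} = \frac{\sum_{i \in S} p_i f(i) + p_j (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i) + \frac{s_j}{l_j} (1 - f(j))} = \left(\frac{H}{B}\right)_{demand} \times \frac{1 + \dfrac{p_j (1 - f(j))}{\sum_{i \in S} p_i f(i)}}{1 + \dfrac{\frac{s_j}{l_j} (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i)}}$$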
23
H/B-Greedy Prefetching
• We define
$$incr(j) = \frac{1 + \dfrac{p_j (1 - f(j))}{\sum_{i \in S} p_i f(i)}}{1 + \dfrac{\frac{s_j}{l_j} (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i)}}$$
as the increase factor of object j.
• incr(j) indicates the factor by which H/B can be increased if object j is selected.
24
H/B-Greedy Prefetching
• H/B-Greedy prefetching prefetches those m objects with greatest increase factors.
• The selection is based on the effect of prefetching individual objects on the hit rate.
• H/B-Greedy is still not an optimal algorithm in terms of H/B value.
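The selection step can be sketched as follows. The increase factor used here is incr(j) = [1 + p_j(1 − f(j)) / Σ p_i f(i)] / [1 + (s_j/l_j)(1 − f(j)) / Σ (s_i/l_i) f(i)], with f(i) = a·p_i·l_i / (a·p_i·l_i + 1). The function names, the `(p, l, s)` layout, and the three sample objects are assumptions. Note also that `heapq.nlargest` runs in O(n log m); matching the O(n) bound claimed for the greedy algorithms would need a linear-time selection routine instead.

```python
import heapq


def freshness(a, p, l):
    """Freshness factor f(i) = a*p*l / (a*p*l + 1)."""
    x = a * p * l
    return x / (x + 1.0)


def hb_greedy(a, objects, m):
    """Return indices of the m objects with the greatest increase factor incr(j).

    objects is a list of (p, l, s) triples: access probability, lifetime, size.
    """
    f = [freshness(a, p, l) for p, l, _ in objects]
    hit_demand = sum(p * fi for (p, _, _), fi in zip(objects, f))
    bw_demand = sum((s / l) * fi for (_, l, s), fi in zip(objects, f))

    def incr(j):
        p, l, s = objects[j]
        gain = 1.0 + p * (1.0 - f[j]) / hit_demand
        cost = 1.0 + (s / l) * (1.0 - f[j]) / bw_demand
        return gain / cost

    return heapq.nlargest(m, range(len(objects)), key=incr)


# Hypothetical objects: index 0 is the most popular but very large.
objs = [(0.5, 1.0, 100.0), (0.3, 1.0, 1.0), (0.2, 1.0, 1.0)]
top_one = hb_greedy(1.0, objs, 1)
top_two = hb_greedy(1.0, objs, 2)
```

In this sample the most popular object is skipped because its size makes the bandwidth term dominate, which is the kind of trade-off a popularity-only criterion would miss.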
25
26
Hit Rate-Greedy Prefetching
• To maximize the overall hit rate given the number of objects to prefetch, m, we select the m objects with the greatest hit rate contribution:
$$HR\_Contr(i) = p_i (1 - f(i)) = \frac{p_i}{a p_i l_i + 1}$$
• This algorithm is optimal in terms of hit rate.
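A short sketch of this selection, using the hit rate contribution p_i / (a·p_i·l_i + 1). The function name, the `(p, l)` layout, and the sample are assumptions, and `heapq.nlargest` is used for brevity even though it is O(n log m) rather than O(n).

```python
import heapq


def hit_rate_greedy(a, objects, m):
    """Return indices of the m objects with the greatest hit rate contribution.

    objects is a list of (p, l) pairs; the contribution is p / (a*p*l + 1).
    """
    def hr_contr(i):
        p, l = objects[i]
        return p / (a * p * l + 1.0)

    return heapq.nlargest(m, range(len(objects)), key=hr_contr)


# Hypothetical pair: a popular, long-lived object and a less popular, short-lived one.
objs = [(0.5, 10.0), (0.3, 0.1)]
chosen = hit_rate_greedy(1.0, objs, 1)
```

The less popular object wins here because the popular one is already fresh most of the time under on-demand caching, so prefetching it adds little.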
27
Bandwidth-Greedy Prefetching
• To minimize the total bandwidth given m, the number of objects to prefetch, we select the m objects with the least bandwidth contribution:
$$BW\_Contr(i) = \frac{s_i}{l_i} (1 - f(i)) = \frac{s_i}{a p_i l_i^2 + l_i}$$
• Bandwidth-Greedy Prefetching is optimal in terms of bandwidth consumption.
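The bandwidth-greedy rule mirrors the hit rate one, taking the m smallest values of the bandwidth contribution s_i / (a·p_i·l_i² + l_i). Names, the `(p, l, s)` layout, and the sample values below are assumptions for illustration.

```python
import heapq


def bandwidth_greedy(a, objects, m):
    """Return indices of the m objects with the least bandwidth contribution.

    objects is a list of (p, l, s) triples; the contribution is s / (a*p*l**2 + l).
    """
    def bw_contr(i):
        p, l, s = objects[i]
        return s / (a * p * l * l + l)

    return heapq.nsmallest(m, range(len(objects)), key=bw_contr)


# Hypothetical objects: index 0 is large, the other two are small.
objs = [(0.5, 1.0, 100.0), (0.3, 1.0, 1.0), (0.2, 1.0, 1.0)]
cheapest_two = bandwidth_greedy(1.0, objs, 2)
```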
28
H/B-Optimal Prefetching
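• An optimal algorithm for the H/B metric is provided by a solution to the following selection problem:
$$S' = \arg\max_{S' \subset S,\ |S'| = m} \left(\frac{H}{B}\right)_{pref} = \arg\max_{S' \subset S,\ |S'| = m} \frac{\sum_{i \in S} p_i f(i) + \sum_{j \in S'} p_j (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i) + \sum_{j \in S'} \frac{s_j}{l_j} (1 - f(j))}$$
• This is equivalent to the maximum weighted average problem with pre-selected items.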
29
Maximum Weighted Average
Maximum Weighted Average Problem:
• n courses in total, with different credit hours and scores
• select m (m < n) courses
• maximize the GPA of the m selected courses
Solution:
• If m=1
Then select course with highest score
What if m>1? Misleading intuition: select m courses with highest scores.
30
A Course Selection Problem (example)
Courses        A    B    C    D    E    F    G    H
Credit hours   5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores         70   90   95   85   75   60   65   80

• If m=2
If we select the 2 courses with the highest scores, C and B, then GPA: 93.33
But if we select C and D, then GPA: 93.57
• Question: how to select m courses such that the GPA is maximized?
Answer: Eppstein & Hirschberg solved this
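The two GPAs quoted in this example can be checked by brute force. The sketch below enumerates every pair of courses, which is fine for 8 courses but is not the Eppstein and Hirschberg algorithm; the dictionary and function names are illustrative.

```python
from itertools import combinations

# Credit hours and scores from the course table (m = 2 example).
credit = {"A": 5.0, "B": 3.0, "C": 6.0, "D": 1.0,
          "E": 2.0, "F": 4.0, "G": 3.0, "H": 6.0}
score = {"A": 70, "B": 90, "C": 95, "D": 85,
         "E": 75, "F": 60, "G": 65, "H": 80}


def gpa(courses):
    """Credit-weighted average score of the given courses."""
    total_credit = sum(credit[c] for c in courses)
    return sum(credit[c] * score[c] for c in courses) / total_credit


# Exhaustive search over all pairs (exponential in general, used only as a check).
best_pair = max(combinations(sorted(credit), 2), key=gpa)
```

The search confirms that pairing C with the low-credit course D beats pairing it with the second-highest score B, because D pulls the weighted average down less.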
31
With Pre-selected Items
Maximum Weighted Average with pre-selected items:
• n courses in total, with different credit hours and scores
• Example:
– Courses A and E must be selected, plus:
– Select additional m (m is given, m<n) courses, such that the resulting GPA is maximized
(m=1): with D, GPA=77.7; with C, GPA=74.4; with B, GPA=77.0

Courses        A    B    C    D    E    F    G    H
Credit hours   5.0  3.0  1.0  6.0  2.0  4.0  3.0  6.0
Scores         70   90   95   85   75   60   65   80
32
Pre-selection clause … (example)
1) Selection domain B~I, no pre-selection, m=2
optimal subset: {B,C}, GPA: 88.33
2) Selection domain B~I, A is pre-selected, m=2
one candidate subset: {A,D,H}, GPA: 75.61
better than: {A,B,C}, GPA: 70.625
Conclusion: {B,C} is not contained in the optimal subset for the pre-selection problem.

Course   A    B    C    D     E    F    G    H    I
Credit   5.0  1.0  2.0  10.0  1.5  2.5  2.0  3.0  4.0
Score    60   95   85   83    63   71   80   77   65
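The effect of a pre-selected course can be reproduced with a small exhaustive search over the table above. This is only a check on the example, not the algorithm developed in the following slides, and the helper names are assumptions.

```python
from itertools import combinations

# Credit hours and scores from the pre-selection example table.
credit = {"A": 5.0, "B": 1.0, "C": 2.0, "D": 10.0, "E": 1.5,
          "F": 2.5, "G": 2.0, "H": 3.0, "I": 4.0}
score = {"A": 60, "B": 95, "C": 85, "D": 83, "E": 63,
         "F": 71, "G": 80, "H": 77, "I": 65}


def gpa(courses):
    """Credit-weighted average score of the given courses."""
    total_credit = sum(credit[c] for c in courses)
    return sum(credit[c] * score[c] for c in courses) / total_credit


def best_subset(candidates, m, preselected=()):
    """Exhaustively pick the m candidates maximizing the GPA with the pre-selected courses."""
    return max(combinations(candidates, m),
               key=lambda extra: gpa(tuple(preselected) + extra))


domain = "BCDEFGHI"
free_choice = best_subset(domain, 2)
with_a = best_subset(domain, 2, preselected=("A",))
```

Under these numbers the search returns {B, C} without pre-selection and {B, D} once A is forced in (GPA 76.5625), so the optimal subset changes as soon as a pre-selected item is present.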
33
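H/B-Optimal vs. Course Selection
• The problem is formulated as:
$$S' = \arg\max_{S' \subset S,\ |S'| = m} \frac{v_0 + \sum_{j \in S'} v_j}{w_0 + \sum_{j \in S'} w_j}$$
where $v_0 = 5.0 \times 70 + 2.0 \times 75 = 500$ and $w_0 = 5.0 + 2.0 = 7.0$ in the previous example.
• Equivalent to the H/B-Optimal selection problem:
$$S' = \arg\max_{S' \subset S,\ |S'| = m} \frac{\sum_{i \in S} p_i f(i) + \sum_{j \in S'} p_j (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i) + \sum_{j \in S'} \frac{s_j}{l_j} (1 - f(j))}$$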
34
H/B-Optimal vs. Course Selection
35
H/B-Optimal Algorithm Design
• The selection of m courses is not trivial
• For course i, we define the auxiliary function
$$r_i(x) = \left(v_i + \frac{v_0}{m}\right) - \left(w_i + \frac{w_0}{m}\right) x$$
• And for a given number m, we define a utility function
$$F(x) = \max_{S' \subset S,\ |S'| = m} \sum_{i \in S'} r_i(x)$$
36
H/B-Optimal Algorithm Design
• Lemma 1
Suppose A* is the maximum GPA we are computing. Then for any subset $S' \subseteq S$ with $|S'| = m$:
1) $\sum_{i \in S'} r_i(A^*) \le 0$;
2) $\sum_{i \in S'} r_i(A^*) = 0$ iff S' is the optimal subset.
Thus, the optimal subset contains those courses that have the m largest $r_i(A^*)$ values.
37
H/B-Optimal Algorithm Design
• n=6, m=4
• Each line is $r_i(x)$
• Assume we know A*
• The optimal subset has the 4 courses with the largest $r_i(A^*)$ values.
• Dilemma: A* is unknown
38
H/B-Optimal Algorithm Design
• Lemma 2:
1) $F(x) > 0$ iff $x < A^*$
2) $F(x) = 0$ iff $x = A^*$
3) $F(x) < 0$ iff $x > A^*$
• Lemma 2 is used to narrow the range of A*
$(x_l, x_r)$ is the current A*-range
39
H/B-Optimal Algorithm Design
• If $F(x_l) > 0$ and $F(x_r) < 0$, then A* is in $(x_l, x_r)$
• Compute $F((x_l + x_r)/2)$
- if $F((x_l + x_r)/2) > 0$, then $A^* > (x_l + x_r)/2$
- if $F((x_l + x_r)/2) < 0$, then $A^* < (x_l + x_r)/2$
- if $F((x_l + x_r)/2) = 0$, then $A^* = (x_l + x_r)/2$
(Lemma 2)
• Narrow the range of A* by half (use binary search)
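The halving step can be sketched numerically. The code below evaluates F(x) as the sum of the m largest values of r_i(x) = (v_i + v_0/m) − (w_i + w_0/m)·x and bisects on its sign, using the fact that F(x) > 0 exactly when x < A*. This is a plain floating-point bisection to a tolerance, not the randomized expected-linear-time algorithm of the later slides; the function names, the tolerance, and the choice of initial range (the smallest and largest individual ratios, which always bracket a weighted average) are assumptions.

```python
import heapq


def F(items, v0, w0, m, x):
    """Utility F(x): the largest possible sum of r_i(x) over m items.

    items is a list of (v, w) pairs with w > 0.
    """
    r_values = ((v + v0 / m) - (w + w0 / m) * x for v, w in items)
    return sum(heapq.nlargest(m, r_values))


def max_weighted_average(items, v0, w0, m, tol=1e-9):
    """Approximate A* = max (v0 + sum v_j) / (w0 + sum w_j) over m-subsets by bisection."""
    ratios = [v / w for v, w in items]
    if w0 > 0:
        ratios.append(v0 / w0)
    xl, xr = min(ratios), max(ratios)  # A* is a weighted average of these ratios
    while xr - xl > tol:
        mid = (xl + xr) / 2.0
        if F(items, v0, w0, m, mid) > 0:
            xl = mid  # F(mid) > 0 means A* > mid
        else:
            xr = mid  # F(mid) <= 0 means A* <= mid
    return (xl + xr) / 2.0


# Pre-selection example: A and E fixed (v0 = 500, w0 = 7), pick m = 1 more course
# from B, C, D, F, G, H given as (credit * score, credit).
candidates = [(3.0 * 90, 3.0), (1.0 * 95, 1.0), (6.0 * 85, 6.0),
              (4.0 * 60, 4.0), (3.0 * 65, 3.0), (6.0 * 80, 6.0)]
a_star = max_weighted_average(candidates, 500.0, 7.0, 1)
```

For the example with A and E pre-selected, the result agrees with the GPA obtained by adding course D, 1010/13 ≈ 77.7.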
40
H/B-Optimal Algorithm Design (Idea)
• Why keep on narrowing down the range of A* ?
– If the intersection of $r_j(x)$ and $r_k(x)$ falls outside the range, then
• the ordering of $r_j(x)$ and $r_k(x)$ is determined within the range, and so is the ordering of $r_j(A^*)$ and $r_k(A^*)$, by comparing their slopes.
– If the range is narrow enough that no two $r(x)$ lines intersect within it, then
• the total ordering of all $r(A^*)$ values is determined.
– Now our optimization problem is solved: just select the m candidates with the highest $r(A^*)$ values.
41
H/B-Optimal Algorithm Design
• However, the total ordering requires $O(n^2)$ time complexity
• A randomized approach is used instead. This randomized algorithm:
– Iteratively reduces the problem domain into a smaller one.
– Maintains 4 sets:
• X, Y, E, Z, initially empty
• (larger, smaller, equal, or undetermined r)
42
H/B-Optimal Algorithm Design
In each iteration, randomly select a course i and compare it with each other course k. One of 4 possibilities:
1) if $r_k(A^*) > r_i(A^*)$: insert k in set X
2) if $r_k(A^*) < r_i(A^*)$: insert k in set Y
3) if $w_k = w_i$ and $v_k = v_i$: insert k in set E
4) if undetermined: insert k in set Z
Now do the following loop:
loop:
narrow the range of A* by half
compare $r_i(A^*)$ with $r_{k'}(A^*)$ for k' in Z
if appropriate, move k' to X or Y, accordingly
until |Z| is sufficiently small (i.e., |Z| < |S|/32)
43
H/B-Optimal Algorithm Design
• After the loop, either X or Y has “enough” members to ensure speedy “convergence”.
• Next, examine and compare the sizes of X, Y and E:
– |X|+|E| > m // delete Y
– |Y|+|E| > |S|-m // combine X and E into 1 course
44
H/B-Optimal Algorithm Design
1). If |X|+|E| > m:
There are at least m courses whose r(A*) values are greater than the r(A*) values of all courses in Y, so all members of Y may be removed. Then: |S| = |S| - |Y|
45
H/B-Optimal Algorithm Design
2). If |Y|+|E| > |S|-m: All members in X are among the top m courses. All members in X must be in the optimal set. Collapse X into a single course (This course is included in the final optimal set). Then:
|S| = |S| - |X| + 1;
m = m - |X| + 1.
46
H/B-Optimal Algorithm Design
• In either case, the resulting domain has reduced size.
• By iteratively removing or collapsing courses, the problem domain finally has only one course remaining: formed by collapsing all courses in the optimal set.
• Expected time complexity:
(Assume $S_b$ is the domain before an iteration and $S_a$ the domain after it.)
1) Each iteration takes expected time $O(|S_b|)$
2) Expected size $|S_a| = (207/256)\,|S_b|$
The recurrence relation of the iteration: T(n) = O(n) + T((207/256)n)
Resolves to linear time complexity.
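To see why the recurrence is linear, it can be unrolled into a geometric series. Here c is an assumed constant bounding the O(n) work per iteration:

```latex
T(n) \le c\,n + T\!\left(\tfrac{207}{256}\,n\right)
     \le c\,n \sum_{k=0}^{\infty} \left(\tfrac{207}{256}\right)^{k}
     = \frac{c\,n}{1 - \tfrac{207}{256}}
     = \tfrac{256}{49}\,c\,n \approx 5.22\,c\,n
     = O(n)
```

Any constant shrink factor below 1 would give the same conclusion; 207/256 only determines the hidden constant.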
47
H/B-Greedy vs. H/B-Optimal
• H/B-Greedy is an approximation to H/B-Optimal
• H/B-Greedy achieves a higher H/B metric than any existing algorithm.
• H/B-Greedy is easier to implement than H/B-Optimal.
– Lower constant
– Easily adjusts to updates of object characteristics
48
Simulation Results
• Evaluation of H/B-Greedy prefetching
Figure 1: H/B, for total object number = 1,000.
Figure 2: H/B, for total object number = 10,000.
Figure 3: H/B, for total object number = 100,000.
Figure 4: H/B, for total object number = 1,000,000.
• Evaluation of H-Greedy and B-Greedy algorithms
Figure 5: H-Greedy algorithm.
Figure 6: B-Greedy algorithm.
Figure 7: B-Greedy algorithm, zoomed in.
49
Figure 1: H/B, for total object number=1,000
50
Figure 2: H/B, for total object number=10,000
51
Figure 3: H/B, total object number=100,000
52
Figure 4: H/B, total object number=1,000,000
53
Figure 5: H-Greedy algorithm
54
Figure 6: B-Greedy algorithm
55
Figure 7: B-Greedy, Bandwidth magnified
56
Performance Comparison
Table 1. Performance comparison of different algorithms in terms of various metrics. (Lower values represent better performance.)
57
Prefetching under Different Constraints
58
H under Cache Size Constraint
• Hit rate: Knapsack problem.
• Objective: maximize $\sum_i H\_contr(i)$ subject to $\sum_i s_i \le C$ (sums over the prefetched objects; C is the cache size)
Mapping:
H contribution → item value
size → item weight
cache size constraint → knapsack capacity
– Bandwidth: Not a problem
H/B-Greedy under Cache Size Constraint
– H/B-Greedy: Knapsack problem. Objective: maximize $\sum_i \frac{H}{B}\_contr(i)$ subject to $\sum_i s_i \le C$
Mapping:
H/B contribution → item value
size → item weight
cache size constraint → knapsack capacity
60
H/B-Optimal under Cache Size Constraint
– H/B-Optimal: Objective: maximize $\frac{H}{B}$ subject to $\sum_i s_i \le C$
Mapping? Not a knapsack problem
Why? The objective is not a sum of single-object properties
Solution? Yes
Polynomial algorithm? Open
61
H under Bandwidth Constraint
– Hit rate: Knapsack problem. Objective: maximize $\sum_i H\_contr(i)$ subject to $\sum_i B\_contr(i) \le BC$ (BC is the bandwidth constraint)
Mapping:
H contribution → item value
B contribution → item weight
Bandwidth constraint → knapsack capacity
– Bandwidth: Not a problem
62
H/B-Greedy under Bandwidth Constraint
– H/B-Greedy : Knapsack problem. Objective:
Mapping:
H/B contribution → item value
B contribution → item weight
Bandwidth constraint → Knapsack capacity
63
H/B-Optimal under Bandwidth Constraint
– H/B-Optimal: Objective: maximize $\frac{H}{B}$ subject to $\sum_i B\_contr(i) \le BC$
Mapping? Not a knapsack problem
Why? The objective is not a sum of single-object properties
Solution? Yes
Polynomial algorithm? Open
64
Conclusions
• Proposed Objective-Greedy prefetching algorithms that are superior to Popularity, Good Fetch, APL, and Lifetime
– Hit rate greedy (this is also optimal)
– Bandwidth greedy (this is also optimal)
– H/B greedy
• The proposed algorithms need O(n) time
• Proposed an H/B-Optimal algorithm that has O(n) expected time
65
Conclusions (contd.)
• Simulations show significant gains of the proposed algorithms over existing algorithms
• H/B-Greedy is almost as good as H/B-Optimal; both run in O(n) time
• Future work:
– Consider power control, BW, latency
– Address H/B-Optimal/C, H/B-Optimal/B
– Determine when H/B attains global max
66
Thank you!