Efficient Algorithms for Locating Web Proxies
Li-Chuan Chen, The MITRE Corporation
Co-author: Hyeong-Ah Choi, George Washington University
2001 Conference on Information Sciences and Systems (CISS01)
Slide 2
MITRE: Li-Chuan Chen
Outline
Research motivation.
Background.
Research goals.
Literature review, problem formulation, and results.
Summary.
Slide 3
Research Motivation
The increased popularity of the World Wide Web (WWW or Web) has caused a number of problems:
Servers are overloaded.
The Internet backbone is congested.
Access to Web services is slow.
Slide 4
Background
Approaches to reduce server load:
Mirror Web sites: Replicate Web server contents throughout the network. (The user must select a server.)
Distributed Web server: A cluster of distributed servers acting as a single server.
Web caching: Stores frequently requested Web documents in proxy servers or on users' machines.
Slide 5
Research Goals
To reduce Web server load and to increase the efficiency and reliability of Web system performance by caching frequently accessed documents at strategically chosen Web proxy locations throughout the network.
We consider the design of optimization algorithms for achieving these objectives.
Note that most formulations of these problems are NP-hard. We consider special cases and approximation algorithms for Proxy Location.
Slide 6
Proxy Location Problem
Popular Web sites have to cope with an enormous number of requests.
A Web proxy (cache) sits between users and servers.
The proxy returns the requested document to the user if it is in the cache; otherwise it requests the document from the server and stores it before returning it to the user.
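The hit/miss behavior described on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `ProxyCache` and the `fetch_from_server` callback are hypothetical, and the sketch ignores cache capacity, replacement, and consistency.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy Web proxy: serve from the cache on a hit, fetch and store on a miss."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        # fetch_from_server is a hypothetical stand-in for the origin server.
        self._fetch = fetch_from_server
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Cache hit: answer without contacting the server.
            self.hits += 1
            return self._store[url]
        # Cache miss: fetch from the server, store, then return.
        self.misses += 1
        document = self._fetch(url)
        self._store[url] = document
        return document
```

A second request for the same URL is served from `_store`, so the origin server sees each document at most once in this unbounded sketch.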
Slide 7
Proxy Location Problem
A popular Web site places its documents closer to users by replicating them on Web proxies throughout the network.
Goal: Locate k proxy servers throughout a network of n nodes to minimize the overall cost of accessing Web documents.
Slide 8
Proxy Location: Literature
Replacement algorithms: When the cache is full, how do you replace existing Web documents with new ones? [CK99, Ira97, AY97]
Cache consistency: Deals with the problem of keeping Web documents consistent with the original copy [LC98, Din96].
Proxy placement: Where should proxies be placed so that Web documents are closer to the user? [KRS00, LGIDS99, LDGS98, HT91]
Slide 9
Proxy Location: Problem Formulation
Given a network $G = (V, E)$ with $n$ nodes and an integer $k$.
Each node $v_i$ is associated with a number of document requests $w(v_i)$.
Let $D(v, u_i)$ denote the communication cost from node $v$ to proxy $u_i$.
Objective: Place $k$ proxies $U = \{u_1, u_2, \ldots, u_k\}$ and assign each node $v$ to its nearest proxy, so as to minimize $\sum_{v \in V} w(v) \min_{u_i \in U} D(v, u_i)$.
[Figure: linear topology and ring topology]
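As a small worked example of this objective, the sketch below evaluates the cost of one placement. The function name `placement_cost`, the distance-matrix input, and the five-node line with unit-length links are assumptions made for illustration only.

```python
from typing import List, Sequence


def placement_cost(
    weights: Sequence[float],
    dist: Sequence[Sequence[float]],
    proxies: Sequence[int],
) -> float:
    """Sum over nodes v of w(v) times the distance to v's nearest proxy."""
    return sum(
        w * min(dist[v][u] for u in proxies)
        for v, w in enumerate(weights)
    )


# Assumed toy instance: five nodes on a line with unit-length links.
n = 5
dist: List[List[int]] = [[abs(a - b) for b in range(n)] for a in range(n)]
weights = [1] * n

cost = placement_cost(weights, dist, [1, 3])  # nodes 0, 2, 4 each pay 1
```

With proxies at nodes 1 and 3 the cost is 3, while a single proxy at the middle node 2 costs 6.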
Slide 10
Proxy Location: History
Li, et al. [LDGS98] presented an $O(kn^2)$ algorithm for the linear unidirectional case.
We improved this to $O(n^2 \log k)$ and generalized it to the bidirectional case with the same running time.
Krishnan, et al. [KRS00] recently presented an $O(kn^3)$ time algorithm for the unidirectional case.
Later we discovered an $O(kn)$ time algorithm in the OR literature by Hassin and Tamir [HT91].
Slide 11
Proxy Allocation: Results
Uni- and bidirectional ring topologies: Compute the optimal proxy placement in $O(n^2)$ time. (Improves on the $O(kn^4)$ algorithm of Krishnan, et al. [KRS00].)
Slide 12
Proxy Location: Linear Topology
Dynamic programming formulation: Break the interval $[1, j]$ into subintervals $[1, j']$ and $[j'+1, j]$. Place one proxy in $[j'+1, j]$ and $k-1$ proxies in $[1, j']$. Find the best $j'$.
[Figure: interval from i = 1 to j, split between j' and j'+1]
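The split described above can be written as a straightforward dynamic program. The sketch below is a minimal illustration under stated assumptions, not the $O(kn)$ algorithm of [HT91] or the faster algorithm from the talk: it assumes a bidirectional line with unit-length links, nodes numbered from 0, and a naive interval-cost table. The names `interval_cost` and `linear_placement_cost` are hypothetical.

```python
from typing import List, Sequence

INF = float("inf")


def interval_cost(weights: Sequence[float], a: int, b: int) -> float:
    """Cheapest cost of serving nodes a..b (inclusive) from one proxy in a..b."""
    return min(
        sum(weights[v] * abs(v - u) for v in range(a, b + 1))
        for u in range(a, b + 1)
    )


def linear_placement_cost(weights: Sequence[float], k: int) -> float:
    """Minimum total cost of serving a line of nodes with exactly k proxies."""
    n = len(weights)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of nodes")
    # one[a][b]: cost of covering interval a..b with a single proxy (naive table).
    one: List[List[float]] = [
        [interval_cost(weights, a, b) if a <= b else INF for b in range(n)]
        for a in range(n)
    ]
    # best[m][j]: minimum cost of serving the first j nodes with m proxies.
    best: List[List[float]] = [[INF] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            # Split point jp: m-1 proxies serve nodes 0..jp-1, one serves jp..j-1.
            best[m][j] = min(
                best[m - 1][jp] + one[jp][j - 1] for jp in range(m - 1, j)
            )
    return best[k][n]
```

The recurrence itself does $O(kn^2)$ work; in this sketch the naive interval-cost table dominates the running time.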
Slide 13
Proxy Location: Ring Topology
Break the ring at any point and reduce to the linear case.
Solve the linear problem in $O(kn)$ time [HT91].
To get the optimal solution, we need to break the ring at an optimal break point.
A brute-force approach would result in an $O(kn^2)$ time algorithm.
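The brute-force reduction can be sketched as follows: cut the ring before every node in turn, solve the resulting linear problem, and keep the cheapest answer. This is an illustration only. It assumes a bidirectional ring with unit-length links, and it uses a simple quadratic dynamic program for the linear subproblem rather than the $O(kn)$ algorithm of [HT91], so it does not achieve the $O(kn^2)$ bound. All function names are hypothetical.

```python
from typing import List, Sequence

INF = float("inf")


def linear_cost(weights: Sequence[float], k: int) -> float:
    """Minimum cost of serving a line (unit-length links) with exactly k proxies."""
    n = len(weights)

    def one_proxy(a: int, b: int) -> float:
        # Best single proxy location for the interval a..b (inclusive).
        return min(
            sum(weights[v] * abs(v - u) for v in range(a, b + 1))
            for u in range(a, b + 1)
        )

    # best[m][j]: minimum cost of serving the first j nodes with m proxies.
    best: List[List[float]] = [[INF] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            best[m][j] = min(
                best[m - 1][jp] + one_proxy(jp, j - 1) for jp in range(m - 1, j)
            )
    return best[k][n]


def ring_cost_brute_force(weights: Sequence[float], k: int) -> float:
    """Try every break point of the ring and keep the cheapest linear solution."""
    w = list(weights)
    n = len(w)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of nodes")
    # Cutting the ring just before node s yields the line s, s+1, ..., s-1.
    return min(linear_cost(w[s:] + w[:s], k) for s in range(n))
```

For a ring of four nodes with unit demand, two proxies give a cost of 2 and one proxy gives a cost of 4.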
Slide 14
Ring Topology: Our Method
Rather than trying all $n$ possible choices for the optimal break point, we show that the optimal break point can be selected from a set of only $n/k$ candidate break points.
Interleaving Property: Let $x_1, x_2, \ldots, x_k$ denote the optimal break point sequence for the ring, and let $y_1, y_2, \ldots, y_k$ be the optimal linear break points resulting from an arbitrary cut of the ring. Then the two sequences interleave.
[Figure: break points $x_1, \ldots, x_4$ interleaved with $y_1, \ldots, y_4$ on the ring]
Slide 15
Ring Topology: Our Method
Using interleaving, we can find a set of $n/k$ candidate break points as follows.
We break the ring at an arbitrary point and compute the optimal linear break points.
Choose the interval that has the fewest nodes (at most $n/k$).
Break the ring at each of these positions, and solve the linear problem for each.
By interleaving, one of these will be optimal. Select the one with the lowest cost.
Slide 16
Heuristics and Performance Analysis
Many of our existing results are approximations or apply to special cases, because the underlying optimization problems are NP-hard.
We implemented heuristics for the proxy location problem for general Internet topologies given a fixed number of servers k.
Slide 17
Internet Topology Input Graph
We used the Tiers model, by Calvert, Doar and Zegura of Georgia Tech [CDZ97], for Internet topology generation.
Tiers is based on a 3-level hierarchical network (WAN, MAN, LAN).
20 random Internet graphs were generated for each of 63, 119, 267, 575, and 1144 nodes.
Slide 18
Internet Topology Input Graph
Example with n = 575 nodes:
Slide 19
Heuristics for Proxy Location Given Number of Servers k:
Random: Randomly select k servers and output the cost.
n-(and n log n)-Random-Pairs: Start with Random. Repeatedly select a node i at random and swap it with a random existing server. If the swap is profitable, then do it. The process is repeated n (or n log n) times.
(n log n)-Random-Clients: Similar to (n log n)-Random-Pairs, except that after randomly selecting node i, we swap it with the server that gives the best cost.
We assume all nodes have equal demand, $w(v_i) = 1$.
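A minimal sketch of the random-swap heuristics above, assuming unit demand as stated on the slide. The distance-matrix input, the function names, and the `try_all_servers` flag (which switches from Random-Pairs to Random-Clients behavior) are assumptions for illustration; the original implementation is not shown in the slides.

```python
import random
from typing import List, Sequence, Tuple


def total_cost(dist: Sequence[Sequence[float]], servers: Sequence[int]) -> float:
    """Unit-demand cost: each node pays the distance to its nearest server."""
    return sum(min(row[s] for s in servers) for row in dist)


def random_swaps(
    dist: Sequence[Sequence[float]],
    k: int,
    rounds: int,
    rng: random.Random,
    try_all_servers: bool = False,
) -> Tuple[List[int], float]:
    n = len(dist)
    servers = rng.sample(range(n), k)      # the "Random" starting solution
    cost = total_cost(dist, servers)
    for _ in range(rounds):                # rounds = n or about n log n
        i = rng.randrange(n)               # candidate node chosen at random
        if i in servers:
            continue
        # Random-Pairs tries one random server; Random-Clients tries them all.
        slots = range(k) if try_all_servers else [rng.randrange(k)]
        candidate = min(
            (servers[:s] + [i] + servers[s + 1:] for s in slots),
            key=lambda c: total_cost(dist, c),
        )
        candidate_cost = total_cost(dist, candidate)
        if candidate_cost < cost:          # keep the swap only if profitable
            servers, cost = candidate, candidate_cost
    return servers, cost
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing the variants on the same input graph.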
Slide 20
Heuristics for Proxy Location Given Number of Servers k: (continued)
Swap-to-Limit: Start with Random. For each existing server j, swap j with each client i. Select the swap that gives the best cost. Repeat until no swap improves the cost.
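Swap-to-Limit is a local search, and one way to read the description is sketched below. The sketch assumes unit demand and a distance-matrix input, and it takes the starting servers as a parameter instead of drawing them at random. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def total_cost(dist: Sequence[Sequence[float]], servers: Sequence[int]) -> float:
    """Unit-demand cost: each node pays the distance to its nearest server."""
    return sum(min(row[s] for s in servers) for row in dist)


def swap_to_limit(
    dist: Sequence[Sequence[float]], start: Sequence[int]
) -> Tuple[List[int], float]:
    n = len(dist)
    servers = list(start)
    cost = total_cost(dist, servers)
    while True:
        best_cost, best_servers = cost, None
        # Examine every (server, client) swap and remember the best one.
        for slot in range(len(servers)):
            for i in range(n):
                if i in servers:
                    continue
                candidate = servers[:slot] + [i] + servers[slot + 1:]
                candidate_cost = total_cost(dist, candidate)
                if candidate_cost < best_cost:
                    best_cost, best_servers = candidate_cost, candidate
        if best_servers is None:
            # No swap strictly improves the cost: local optimum reached.
            return servers, cost
        servers, cost = best_servers, best_cost
```

Because a swap is accepted only when it strictly lowers the cost, the loop always terminates.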
Slide 21
Simulation Results
Brute-Force Search: Computes the optimal solution by generating all k-node subsets of $\{1, 2, \ldots, n\}$ and computing the cost for each subset. Requires $O(n^k)$ time, and so is not practical for large values of k and n.
Given small values of k = 2, 3, 4, 5, 6 servers, we ran and compared the heuristics with the brute-force algorithm.
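The brute-force baseline is easy to sketch with `itertools.combinations`. As before, unit demand and a distance-matrix input are assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def total_cost(dist: Sequence[Sequence[float]], servers: Sequence[int]) -> float:
    """Unit-demand cost: each node pays the distance to its nearest server."""
    return sum(min(row[s] for s in servers) for row in dist)


def brute_force(
    dist: Sequence[Sequence[float]], k: int
) -> Tuple[float, Tuple[int, ...]]:
    """Evaluate every k-node subset and return (best cost, best subset)."""
    n = len(dist)
    return min(
        (total_cost(dist, subset), subset)
        for subset in combinations(range(n), k)
    )
```

The number of subsets grows as $\binom{n}{k}$, which is why the slides restrict this comparison to small k.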
Slide 22
Simulation Results: Brute force versus heuristics for k = 3: cost
Slide 23
Simulation Results: Brute force versus heuristics for k = 3: CPU
Slide 24
Simulation Results:
For larger values of k = 2, 4, 8, 16, 32 servers, we ran and compared the heuristics for proxy location given a fixed number of servers k.
We also collected statistics on the intermediate costs for n = 63, 119, 267, 575, 1144 and k = 2, 4, 8, 16, 32.
Slide 25
Simulation Results: Heuristics given number of servers for k = 32: cost
Slide 26
Simulation Results: Heuristics given number of servers for k = 32: CPU time
Slide 27
Simulation Results: Intermediate cost for n = 1144, k = 32:
Slide 28
Summary
We have introduced the problem of improving the efficiency of access to Web system services through the use of proxy location.
Most problem formulations are NP-hard.
We have presented algorithms for the ring topology.
We have implemented heuristics for the general case and presented simulations for performance evaluation.
Slide 29
Thank you!
Slide 30
Proxy Location: Ring Submodular:
Slide 31
Proxy Location: Ring Interleaving: X and Y interleave, but not X_opt and Y_opt
Slide 32
Heuristics for Proxy Location Given the cost of opening each server:
We assume that there is a fixed cost for opening each server.
Random: Opens a random server and computes the cost. Repeats as long as the cost decreases.
Greedy: Similar to Random, but repeatedly selects the server that gives the maximum cost reduction (never deletes a server). Repeats until no server can be added without increasing the cost.
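The Greedy heuristic can be sketched as follows. The sketch assumes unit demand, the same opening cost for every server, a distance-matrix input, and an empty starting set (so the first server opened is the best single location). These are illustration choices, not details confirmed by the slides.

```python
from typing import List, Sequence, Tuple


def access_cost(dist: Sequence[Sequence[float]], servers: Sequence[int]) -> float:
    """Unit-demand cost: each node pays the distance to its nearest server."""
    return sum(min(row[s] for s in servers) for row in dist)


def greedy_open(
    dist: Sequence[Sequence[float]], open_cost: float
) -> Tuple[List[int], float]:
    """Keep opening the server that lowers (opening + access) cost the most."""
    n = len(dist)
    servers: List[int] = []
    cost = float("inf")                    # no server opened yet
    while len(servers) < n:
        candidate_cost, candidate = min(
            (open_cost * (len(servers) + 1) + access_cost(dist, servers + [i]), i)
            for i in range(n)
            if i not in servers
        )
        if candidate_cost >= cost:
            break                          # adding any server would not help
        servers.append(candidate)          # servers are never closed again
        cost = candidate_cost
    return servers, cost
```

On a five-node line with unit-length links and an opening cost of 2, this sketch opens only the middle node for a total cost of 8; lowering the opening cost to 1 leads it to open three servers for a total cost of 5.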
Slide 33
Heuristics for Proxy Location
Run-(n log n): (Charikar and Guha [CG99]). Start with Random. Repeat n log n times: Select a node i at random as a new server location. For each existing server j, consider closing j and reassigning its clients to i; if this is profitable, do it. If the overall cost is lower, then open i and apply these changes; otherwise ignore i.
Run-to-Limit: Same as Run-(n log n), except that the algorithm terminates only when no more improvement can be made.
Slide 34
Simulation Results: Heuristics given a cost of $20K for opening each server:
Slide 35
Simulation Results: Heuristics given a cost of $20K for opening each server: CPU
Slide 36
Fault Tolerance
Possible faults: network failures, server failures, document demand changes, network transfer rate changes.
Constraints: After any single failure, a constant number of proxies may be relocated.
Goal: Design an algorithm that achieves an approximately optimal solution for restoring Web services when a server fails.
Slide 37
Fault Tolerance
Optimal placement for 5 proxies.
A proxy fails.
Optimal placement for 4 proxies: very costly.
Slide 38
Fault Tolerance: On-line Approach
On-line placement for 5 proxies: not optimal but good.
A proxy fails: move the last proxy to replace the failed server. The result is the same as the 4-proxy on-line placement.
[Figure: proxies labeled 1 to 5 before the failure and after the relocation]
Slide 39
Fault Tolerance: Server Failures
On-line algorithm: Makes decisions in response to a series of requests without knowledge of the entire input sequence.
Approximate optimality: For any m, our on-line algorithm with 2m proxies has cost less than that of the optimal algorithm with m proxies.
Strategy: Build an initial set of m proxies using the on-line algorithm. When a server x fails: if x is the last proxy added, then no action is needed. Otherwise let y be the last proxy, move y's documents to the node nearest to x, and remove y. We now have m-1 remaining servers, and approximate optimality.
Slide 40
Fault Tolerance: Other Problems
Network failures: How can network traffic be rerouted to make use of the existing set of proxies? How can we determine the best way to place proxies in the updated topology? (Such failures cannot be tolerated in a linear topology; this applies only to more general topologies.)
Link transfer rate changes: How should proxies be moved when such changes are present? (This can be modeled as a change in the distance function.)
Temporal variations: Demand rates and network transfer rates vary (e.g., lunch time, events). Determine solutions that are approximately optimal for each possible demand scenario and apply them accordingly.
Slide 41
Major Contributions
Efficient algorithms for optimal proxy location on ring topologies.
Use of submodularity to produce more efficient DP solutions.
Slide 42
Future Directions
Consider ways of strengthening our existing results, either by improving the efficiency of the algorithms or by eliminating some of the assumptions that are made.
Tree topology: Generalize the proxy location results from linear to tree topologies.
Non-homogeneous proxies/documents: We have assumed all proxies hold the same documents. An important generalization would be to determine both the placement of proxies and how documents are assigned to proxies.
Fault tolerance: How to deal with proxy failures and fluctuations in demand.