Squirrel: A peer-to-peer web cache
Sitaram Iyer (Rice University)
Joint work with Ant Rowstron (MSR Cambridge) and Peter Druschel (Rice University)
PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA
Web Caching
1. Latency, 2. External traffic, 3. Load on web servers and routers.
Deployed at: Corporate network boundaries, ISPs, Web Servers, etc.
Centralized Web Cache
[Diagram: client browsers with local caches in a corporate LAN share a single centralized web cache at the LAN boundary, which fetches from web servers over the Internet.]
Cooperative Web Cache
[Diagram: several cooperating web caches at the corporate LAN boundary serve client browsers and fetch from web servers over the Internet.]
Decentralized Web Cache
[Diagram: client machines in the corporate LAN pool their browser caches to form Squirrel; no dedicated cache sits between the LAN and the web server.]
Distributed Hash Table
Peer-to-peer location service: Pastry
• Completely decentralized and self-organizing
• Fault-tolerant, scalable, efficient
Operations: Insert(k,v), Lookup(k)
[Diagram: <key,value> pairs (k1,v1) through (k6,v6) mapped onto a ring of nodes by the peer-to-peer routing and location substrate.]
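As a rough sketch of the Insert(k,v)/Lookup(k) interface above (the class name, the SHA-1 key mapping, and the small id space are illustrative assumptions, not Pastry's actual routing), a consistent-hashing table could look like:

```python
import hashlib
from bisect import bisect_left

class DHT:
    """Toy distributed hash table: each key is mapped to the live node
    whose id follows the key's hash on a ring, echoing how Pastry maps
    object keys onto nodeIds."""

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)              # ring of live node ids
        self.store = {n: {} for n in self.nodes}   # per-node key storage

    def _hash(self, key):
        # Hash the key into the same id space as the nodes.
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 16)

    def _home(self, key):
        # The node responsible for a key: first node id at or after the hash.
        i = bisect_left(self.nodes, self._hash(key))
        return self.nodes[i % len(self.nodes)]

    def insert(self, key, value):
        self.store[self._home(key)][key] = value

    def lookup(self, key):
        return self.store[self._home(key)].get(key)
```

Both operations route to the same home node for a given key, which is the property Squirrel builds on.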
Why peer-to-peer?
1. Cost of dedicated web cache → No additional hardware
2. Administrative effort → Self-organizing network
3. Scaling implies upgrading → Resources grow with clients
Setting
• Corporate LAN
• 100 - 100,000 desktop machines
• Located in a single building or campus
• Each node runs an instance of Squirrel
• Sets it as the browser’s proxy
Mapping Squirrel onto Pastry
Two approaches:
• Home-store
• Directory
Home-store model
[Diagram: the client hashes the URL to find the object's home node within the LAN; on a miss, the home node fetches the object over the Internet and serves it back to the client.]
…that’s how it works!
Directory model
Client nodes always cache objects locally.
Home-store: home node also stores objects.
Directory: the home node only stores pointers to recent clients, and forwards requests.
Directory model
[Diagram: the client's request goes over the LAN to the home node, which fetches from the Internet if needed. For cached objects the home node keeps a table of recent clients and randomly chooses an entry from the table to serve the request.]
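The contrast between the two request paths can be sketched as follows (the function names, the delegate table, and the SHA-1 mapping are hypothetical; the real protocol also handles freshness checks and origin-server fetches):

```python
import hashlib
import random

def home_node(url, nodes):
    """Both schemes hash the URL to pick its home node."""
    h = int(hashlib.sha1(url.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

def request_home_store(url, nodes, storage):
    """Home-store: the home node itself caches and serves the object."""
    home = home_node(url, nodes)
    return storage[home].get(url)        # miss -> home fetches from origin

def request_directory(url, nodes, directory, client_caches):
    """Directory: the home node only keeps pointers to recent clients
    and forwards the request to a randomly chosen one."""
    home = home_node(url, nodes)
    delegates = directory[home].get(url)
    if not delegates:
        return None                      # no directory -> go to origin
    delegate = random.choice(delegates)
    return client_caches[delegate].get(url)
```

The key structural difference is visible in the code: home-store keeps the object at the home node, while directory adds one level of indirection through a client delegate.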
Directory: Advantages
Avoids storing unnecessary copies of objects.
Rapidly changing directory for popular objects seems to improve load balancing.
Home-store scheme can incur hotspots.
Directory: Disadvantages
Cache insertion only happens at clients, so:
• active clients store all the popular objects,
• inactive clients waste most of their storage.
Implications: 1. Reduced cache size. 2. Load imbalance.
Directory: Load spike example
• Web page with many embedded images, or
• Periods of heavy browsing.
Many home nodes point to such clients!
Evaluate …
Trace characteristics

Microsoft in:                Redmond        Cambridge
Total duration               1 day          31 days
Number of clients            36,782         105
Number of HTTP requests      16.41 million  0.971 million
Peak request rate            606 req/sec    186 req/sec
Number of objects            5.13 million   0.469 million
Number of cacheable objects  2.56 million   0.226 million
Mean cacheable object reuse  5.4 times      3.22 times
Total external traffic
[Plot, Redmond trace: total external traffic (GB, lower is better; roughly 85-105 GB) vs. per-node cache size (0.001-100 MB); curves for Directory, Home-store, No web cache, and Centralized cache.]
Total external traffic
[Plot, Cambridge trace: total external traffic (GB, lower is better; roughly 5.5-6.1 GB) vs. per-node cache size (0.001-100 MB); curves for Directory, Home-store, No web cache, and Centralized cache.]
LAN Hops
[Plot, Redmond trace: % of cacheable requests vs. total hops within the LAN (0-6); bars for Centralized, Home-store, and Directory.]
LAN Hops
[Plot, Cambridge trace: % of cacheable requests vs. total hops within the LAN (0-5); bars for Centralized, Home-store, and Directory.]
Load in requests per sec
[Plot, Redmond trace: number of times observed (log scale, 1-100,000) vs. max objects served per node per second (0-50); curves for Home-store and Directory.]
Load in requests per sec
[Plot, Cambridge trace: number of times observed (log scale, 1-10^7) vs. max objects served per node per second (0-50); curves for Home-store and Directory.]
Load in requests per min
[Plot, Redmond trace: number of times observed (log scale, 1-100) vs. max objects served per node per minute (0-350); curves for Home-store and Directory.]
Load in requests per min
[Plot, Cambridge trace: number of times observed (log scale, 1-10,000) vs. max objects served per node per minute (0-120); curves for Home-store and Directory.]
Fault tolerance
Sudden node failures result in partial loss of cached content.
Home-store: Proportional to failed nodes.
Directory: More vulnerable.
Fault tolerance

If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:

            Home-store            Directory
Redmond     Mean 1%, Max 1.77%    Mean 1.71%, Max 19.3%
Cambridge   Mean 1%, Max 3.52%    Mean 1.65%, Max 9.8%
Conclusions
• It is possible to decentralize web caching.
• Performance is comparable to a centralized web cache,
• it is better in terms of cost, scalability, and administration effort, and
• under our assumptions, the home-store scheme is superior to the directory scheme.
Other aspects of Squirrel
• Adaptive replication
  – Hotspot avoidance
  – Improved robustness
• Route caching
  – Fewer LAN hops
Thanks.
(backup) Storage utilization

Redmond         Home-store   Directory
Total           97641 MB     61652 MB
Mean per-node   2.6 MB       1.6 MB
Max per-node    1664 MB      1664 MB
(backup) Fault tolerance

            Home-store                  Directory
Equations   Mean H/O, Max Hmax/O        Mean (H+S)/O, Max max(Hmax, Smax)/O
Redmond     Mean 0.0027%, Max 0.0048%   Mean 0.198%, Max 1.5%
Cambridge   Mean 0.95%, Max 3.34%       Mean 1.68%, Max 12.4%
(backup) Full home-store protocol
[Diagram: the client routes its request (1) through other LAN nodes to the home node. 2a: the home returns the object or not-modified. 2b: otherwise the home forwards the request over the WAN to the origin server, which returns the object or not-modified (3b).]
(backup) Full directory protocol
[Diagram: the client routes its request (1) through other LAN nodes to the home node, which may hold a directory of delegate clients. Cases: with no directory, the client is sent to the origin server; the home forwards a request or cGET request to a randomly chosen delegate, which returns the object; or the origin server returns the object or not-modified.]
(backup) Peer-to-peer Computing
Decentralize a distributed protocol:
– Scalable
– Self-organizing
– Fault tolerant
– Load balanced
Not automatic!!
Decentralized Web Cache
[Diagram: browsers and per-node caches on client machines in the LAN together form the cache, fetching from the web server over the Internet.]
Challenge
Decentralized web caching algorithm:
Need to achieve those benefits in practice!
Need to keep overhead unnoticeably low.
Node failures should not become significant.
Peer-to-peer routing, e.g., Pastry
Peer-to-peer object location and routing substrate = Distributed Hash Table.
Reliably maps an object key to a live node.
Routes in log16(N) steps (e.g. 3-4 steps for 100,000 nodes).
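A quick numerical check of the routing-cost figure (a sketch only; actual Pastry hop counts also depend on the leaf set and how full the routing tables are):

```python
import math

def expected_hops(n_nodes, base=16):
    """Pastry routes in about log_base(N) steps; base 16 corresponds
    to matching one 4-bit digit of the key per hop."""
    return math.log(n_nodes, base)

# For a 100,000-node network this gives about 4.2, consistent with
# the slide's "3-4 steps" figure.
print(round(expected_hops(100_000), 2))  # prints 4.15
```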
Home-store is better!
Simpler home-store scheme achieves load balancing by hash function randomization.
Directory scheme implicitly relies on access patterns for load distribution.
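The hash-randomization argument can be illustrated with a small simulation (the URL set and node count are made up; real per-node load also depends on object popularity):

```python
import hashlib
from collections import Counter

def home_of(url, n_nodes):
    # Uniformly map a URL to one of n_nodes home nodes via SHA-1.
    return int(hashlib.sha1(url.encode()).hexdigest(), 16) % n_nodes

# Assign 100,000 synthetic URLs to 100 nodes and inspect the spread.
counts = Counter(home_of(f"http://example.com/page{i}", 100)
                 for i in range(100_000))
print(min(counts.values()), max(counts.values()))  # both close to the mean of 1000
```

Because the hash spreads URLs nearly uniformly, every node is home to roughly the same number of objects, without relying on the access pattern.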
Directory scheme seems better…
Avoids storing unnecessary copies of objects.
Rapidly changing directory for popular objects results in load balancing.
Interesting difference
Consider:
– Web page with many images, or
– Heavily browsing node
Directory: many pointers to some node.
Home-store: natural load balancing.
Evaluate …
Fault tolerance

When a single Squirrel node crashes, the fraction of lost cached content is:

            Home-store                  Directory
Redmond     Mean 0.0027%, Max 0.0048%   Mean 0.2%, Max 1.5%
Cambridge   Mean 0.95%, Max 3.34%       Mean 1.7%, Max 12.4%