Scaling Data Servers via Cooperative Caching
Siddhartha Annapureddy
New York University
Committee: David Mazières, Lakshminarayanan Subramanian, Jinyang Li,
Dennis Shasha, Zvi Kedem
Siddhartha Annapureddy Thesis Defense
Motivation
New functionality over the Internet
- Large numbers of users consume many (large) files
  - YouTube, Azureus, etc. for movies – DVDs no more
- Data dissemination to large distributed farms
Current approaches
[Diagram: central server with three clients]
- Scalability ↓ – the server's uplink saturates
- Operations take a long time to finish
Current approaches (contd.)
- Server farms replace the central server
  - Scalability problem persists
  - Notoriously expensive to maintain
- Content Distribution Networks (CDNs)
  - Still quite expensive
- Peer-to-peer (P2P) systems
  - A new development
Peer-to-peer (P2P) systems
- Low-end central server
- Large client population
- File usually divided into chunks/blocks
- Use client resources to disseminate the file
  - Bandwidth, disk space, etc.
- Clients organized into a connected overlay
  - Each client knows only a few other clients

How to design P2P systems to scale data servers?
P2P scalability – Design principles
To achieve scalability, we need to avoid hotspots:
- Keep the bandwidth load on the server to a minimum
  - The server gives data only to a few users
  - Leverage participants to distribute to other users
- Keep the maintenance overhead for users low
- Keep the bandwidth load on users to a minimum
  - Each node interacts with only a small subset of users
P2P systems – Themes
- Usability – Exploiting P2P for a range of systems
  - File systems, user-level apps, etc.
- Security issues in such wide-area P2P systems
  - Providing data integrity, privacy, authentication
- Cross file(system) sharing – Use other groups
  - Make use of all available sources
- Topology management – Arrange nodes efficiently
  - Underlying network (locality properties)
  - Node capacities (bandwidth, disk space, etc.)
  - Application-specific (certain nodes work well together)
- Scheduling dissemination of chunks efficiently
  - Nodes only have partial information
Shark – Motivation
Scenario – Want to test a new application on a hundred nodes
[Diagram: ten workstations]

Problem – Need to push software to all nodes and collect results
Current approaches
Distribute from a central server
[Diagram: server and clients across Network A and Network B]

rsync, unison, scp
– Server's network uplink saturates
– Wastes bandwidth along bottleneck links
Current approaches
File distribution mechanisms
BitTorrent, Bullet
+ Scales by offloading the burden from the server
– Clients may download from half-way across the world
Inherent problems with copying files
Users have to decide a priori what to ship
- Ship too much – wastes bandwidth, takes a long time
- Ship too little – hassle to work in a poor environment

Idle environments consume disk space
- Users are loath to clean up ⇒ redistribution
- Need a low-cost solution for refetching files

Manual management of software versions
- Point /usr/local to a central server
- Illusion of having the full development environment
- Programs fetched transparently on demand
Networked file systems
+ Know how to deploy these systems
+ Know how to administer such systems
+ Simple accountability mechanisms
E.g., NFS, AFS, SFS
Problem: Scalability
[Graph: time to finish a 40 MB read (sec) vs. number of unique nodes, SFS vs. Shark]
- SFS: very slow, ≈ 775 s
- Shark: much better, ≈ 88 s (9x faster)
Let’s design this new filesystem
[Diagram: client and central server]
Vanilla SFS
- Central server model
Scalability – Client-side caching
[Diagram: client with local cache, central server]
Large client cache à la AFS
- Whole-file caching
- Scales by avoiding redundant data transfers
- Leases to ensure NFS-style cache consistency
Scalability – Cooperative caching
Clients fetch data from each other and offload burden from server
[Diagram: clients fetching from each other and from the server]
- Shark clients maintain a distributed index
- Fetch a file from multiple other clients in parallel
- Read-heavy workloads – writes still go to the server
P2P systems – Themes
- Usability – Exploiting P2P for a range of systems
- Cross file(system) sharing – Use other groups
- Security issues in such wide-area P2P systems
- Topology management – Arrange nodes efficiently
- Scheduling dissemination of chunks efficiently
Cross file(system) sharing – Application
Linux distribution with LiveCDs
[Diagram: Knoppix and Slax clients running Sharkcd, with a Knoppix mirror]
- LiveCD – Run an entire OS without using the hard disk
  - But all your programs must fit on a CD-ROM
- Download dynamically from a server, but scalability problems
- Knoppix and Slax users form a global cache – relieves the servers
Cross file(system) sharing
Global cooperative cache regardless of origin servers
[Diagram: two client groups accessing the Slax and Knoppix servers]
- Two groups of clients accessing the Slax/Knoppix servers
- The client groups share a large amount of software
- Such clients automatically form a global cache
- BitTorrent, by contrast, supports only per-file groups
Content-based chunking
[Diagram: clients and server sharing a single overlay]
- A single overlay for all filesystems, with content-based chunks
  - Chunks – variable-sized blocks whose boundaries are based on content
- Chunks are better than fixed blocks – they exploit commonalities across files
  - LBFS-style chunks are preserved across versions and concatenations

[Muthitacharoen et al. A Low-Bandwidth Network File System. SOSP '01]
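The LBFS-style chunking referred to above can be sketched as follows. This is a simplified stand-in: LBFS itself uses Rabin fingerprints with a true rolling hash, and the window size, mask, and chunk-size limits here are illustrative.

```python
import hashlib

WINDOW = 48               # bytes examined at each position
MASK = (1 << 13) - 1      # expected chunk size about 8 KiB
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks.

    A boundary is declared when the hash of the last WINDOW bytes matches
    MASK, so boundaries depend only on local content and survive
    insertions or deletions elsewhere in the file."""
    start = 0
    for i in range(len(data)):
        length = i - start + 1
        window = data[max(0, i - WINDOW + 1):i + 1]
        h = int.from_bytes(hashlib.sha1(window).digest()[:4], "big")
        if length >= MAX_CHUNK or (length >= MIN_CHUNK and (h & MASK) == MASK):
            yield (start, i + 1)
            start = i + 1
    if start < len(data):
        yield (start, len(data))
```

Because a cut point depends only on the surrounding bytes, two files that share a run of data (different versions, or a concatenation) produce identical chunks for that run, which is what lets Shark index them under the same tokens.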
P2P systems – Themes
- Usability – Exploiting P2P for a range of systems
- Cross file(system) sharing – Use other groups
- Security issues in such wide-area P2P systems
- Topology management – Arrange nodes efficiently
- Scheduling dissemination of chunks efficiently
Security issues – Client authentication
[Diagram: traditional server authentication vs. an authenticator derived from the token]
- Traditionally, the server authenticated read requests using uids
- Challenge – interaction between mutually distrustful clients
Security issues – Client communication
- A client should be able to check the integrity of a downloaded chunk
- A client should not send chunks to unauthorized clients
- An eavesdropper shouldn't be able to obtain chunk contents
- Accomplish this without a complex PKI
- Keep the number of protocol rounds to a minimum
File metadata
Client → Server: GET_TOKEN(fh, offset, count); Server → Client: file metadata with chunk tokens

Chunk tokens
- Possession of a token implies the server's permission to read
- Tokens are a shared secret between authorized clients
- Tokens can be used to check the integrity of fetched data

Given a chunk B...
- Chunk token T_B = H(B)
- H is a collision-resistant hash function
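The token scheme above can be sketched directly; SHA-256 is used here as a stand-in for the collision-resistant hash H, which these slides leave unspecified.

```python
import hashlib, hmac

def chunk_token(chunk: bytes) -> bytes:
    """T_B = H(B): the token is both a read capability and an integrity check."""
    return hashlib.sha256(chunk).digest()

def verify_chunk(chunk: bytes, token: bytes) -> bool:
    """Integrity check: H(downloaded chunk) == T_B (constant-time compare)."""
    return hmac.compare_digest(hashlib.sha256(chunk).digest(), token)
```

Because T_B is derived from the chunk contents, a client holding the token can both prove it was authorized to read the chunk and verify whatever it later downloads from peers.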
Security protocol
[Message flow: Receiver C → Source P: R_C; P → C: R_P; C → P: I_B, Auth_C; P → C: E_K(B)]
- R_C, R_P – random nonces to ensure freshness
- Auth_C – authenticator proving the receiver has the token
  - Auth_C = MAC(T_B, "Auth C", C, P, R_C, R_P)
- K – key to encrypt the chunk contents
  - K = MAC(T_B, "Encryption", C, P, R_C, R_P)
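The two derivations can be sketched as below. HMAC-SHA256 and the field encoding are assumptions for illustration (the slides specify only a MAC keyed on T_B), and the identities and nonces are hypothetical.

```python
import hashlib, hmac, os

def mac(token: bytes, *fields: bytes) -> bytes:
    """MAC keyed on the chunk token T_B over labelled fields."""
    return hmac.new(token, b"|".join(fields), hashlib.sha256).digest()

# Hypothetical principals and per-session nonces.
T_B = hashlib.sha256(b"chunk B contents").digest()
C, P = b"receiver-addr", b"source-addr"
R_C, R_P = os.urandom(16), os.urandom(16)

# Auth_C proves the receiver holds T_B; K encrypts the chunk transfer.
auth_C = mac(T_B, b"Auth C", C, P, R_C, R_P)
K = mac(T_B, b"Encryption", C, P, R_C, R_P)
```

Both values are bound to the nonces, so neither can be replayed in a later session, and both are computable only by parties that already hold T_B.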
Security properties
[Message flow: Receiver C and Source P exchange R_C, R_P; then I_B, Auth_C; then E_K(B)]

The client can check the integrity of a downloaded chunk
- The client checks H(downloaded chunk) =? T_B

A source does not send chunks to unauthorized clients
- Malicious clients cannot produce a correct Auth_C

An eavesdropper doesn't get the chunk contents
- All communication is encrypted with K
P2P systems – Themes
- Usability – Exploiting P2P for a range of systems
- Cross file(system) sharing – Use other groups
- Security issues in such wide-area P2P systems
- Topology management – Arrange nodes efficiently
- Scheduling dissemination of chunks efficiently
Topology management
[Diagram: client fetches metadata from the server, then looks up other clients]
- Fetch metadata from the server
- Look up clients caching the needed chunks in the overlay
- The overlay is common to all file systems
- The overlay is organized like a DHT – Get and Put
Overlay – Get operation
[Diagram: client discovers other clients caching chunk B]
- For every chunk B, there is an indexing key I_B
  - I_B is used to index clients caching B
  - Cannot set I_B = T_B, as T_B is a secret
- I_B = MAC(T_B, "Indexing Key")
Overlay – Put operation
Register as a source
- The client now becomes a source for the downloaded chunk
- The client registers in the distributed index – PUT(I_B, Addr)

Chunk reconciliation
- Reuse the TCP connection to download more chunks
- Exchange mutually needed chunks without indexing overhead
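The indexing key and the Get/Put interface can be sketched together. The dictionary-backed index is a toy stand-in for the distributed overlay, and HMAC-SHA256 is an assumed MAC.

```python
import hashlib, hmac

def indexing_key(token: bytes) -> bytes:
    """I_B = MAC(T_B, "Indexing Key"): safe to publish for locating
    sources of a chunk without revealing the secret token T_B."""
    return hmac.new(token, b"Indexing Key", hashlib.sha256).digest()

class DistributedIndex:
    """Toy stand-in for the DHT-like overlay: Put registers a client as a
    source for a chunk; Get returns the known sources."""
    def __init__(self):
        self._sources = {}

    def put(self, key: bytes, addr: str) -> None:
        self._sources.setdefault(key, set()).add(addr)

    def get(self, key: bytes) -> set:
        return self._sources.get(key, set())
```

After downloading chunk B, a client calls put(indexing_key(T_B), its_address), which is what makes it discoverable by later readers of the same chunk.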
Overlay – Locality awareness
[Diagram: clients clustered within Network A and Network B]
- The overlay is organized as clusters based on latency
- Lookups preferentially return sources in the same cluster as the client
- Hence, chunks are usually transferred from nearby clients

[Freedman et al. Democratizing content publication with Coral. NSDI '04]
Topology management – Shark vs. BitTorrent
BitTorrent
- Neighbours "fixed"
- Makes cross-file sharing difficult

Shark
- Chunk-based overlay
- Neighbours based on data
- Enables cross-file sharing

[Diagram: server and clients in Shark's chunk-based overlay]
Evaluation
- How does Shark compare with SFS? With NFS?
- How scalable is the server?
- How fair is Shark across clients?
- Which fetch order is better – random or sequential?
Emulab – 100 Nodes on LAN
[Graph: percentage of clients that completed a 40 MB read within a given time (sec) – Shark (rand, negotiation), NFS, SFS]
- Shark – 88 s
- SFS – 775 s (Shark ≈ 9x better); NFS – 350 s (Shark ≈ 4x better)
- SFS is less fair because of TCP backoffs
PlanetLab – 185 Nodes
[Graph: percentage of clients that completed a 40 MB read within a given time (sec) – Shark vs. SFS]
- Shark ≈ 7 min at the 95th percentile
- SFS ≈ 39 min at the 95th percentile (Shark 5x better)
- NFS – triggered kernel panics in the server
Data pushed by server
[Bar chart: normalized bandwidth pushed by the server – SFS 7400 vs. Shark 954]
- Shark vs. SFS – the server pushed 23 file copies vs. 185 copies (8x better)
- The overlay was not responsive enough
- Torn connections and timeouts force clients back to the server
Data served by Clients
[Graph: bandwidth transmitted (MB) by each unique Shark proxy on PlanetLab, 40 MB file]
- Maximum contribution ≈ 3.5 copies
- Median contribution ≈ 0.75 copies
P2P systems – Themes
- Usability – Exploiting P2P for a range of systems
- Cross file(system) sharing – Use other groups
- Security issues in such wide-area P2P systems
- Topology management – Arrange nodes efficiently
- Scheduling dissemination of chunks efficiently
Fetching Chunks – Order Matters
- Scheduling problem – In what order should the chunks of a file be fetched?
- Natural choices – random or sequential

Intuitively, when many clients start simultaneously:
- Random
  - All clients fetch independent chunks
  - More chunks become available in the cooperative cache
- Sequential
  - Better disk I/O scheduling on the server
  - Only the client that has downloaded the most chunks adds new chunks to the cache
Emulab – 100 Nodes on LAN
[Graph: percentage of clients completing a 40 MB read within a given time – Shark rand vs. Shark seq]
- Random – 133 s
- Sequential – 203 s
- Random wins – 35% better
Performance characteristics of workloads
- For Shark, what matters is throughput
  - The operation should finish as soon as possible
[Plot: utility vs. throughput]
- Another interesting class of applications
  - Video streaming – RedCarpet
[Plot: utility vs. throughput]
- Study the scheduling problem for Video-on-Demand
  - Press a button, wait a little while, and watch playback
Challenges for VoD
- Near VoD
  - Small setup time before the video plays
- High sustainable goodput
  - The largest slope of a line osculating the block-arrival curve
  - The highest video encoding rate the system can support

Goal: do block scheduling to achieve low setup time and high goodput
System design – Outline
- Components based on BitTorrent
  - Central server (tracker + seed)
  - Users interested in the data
- Central server
  - Maintains a list of nodes
- Each node finds neighbours
  - Contacts the server for this
  - Another option – use gossip protocols
- Nodes exchange data with neighbours
  - Connections are bi-directional
Simulator – Outline
- All nodes have unit bandwidth capacity
- 500 nodes in most of our experiments
- Each node has 4-6 neighbours
- The simulator is round-based
  - Block exchanges are done at each round
  - The system moves as a whole to the next round
- The file is divided into 250 blocks
  - A segment is a set of consecutive blocks; segment size = 10
- Maximum possible goodput – 1 block/round
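The round-based setup above can be sketched as a toy simulator. This is a reconstruction, not the thesis simulator: it ignores per-node upload caps, uses a pluggable block-selection policy, and shrinks the defaults so it runs quickly.

```python
import random

NUM_BLOCKS = 250  # file divided into 250 blocks, as in the slides

def simulate(num_nodes=50, degree=4, rounds=400, policy=None, seed=1):
    """Round-based exchange: each round the server seeds one node with one
    block, then every node pulls one useful block from a random neighbour."""
    rng = random.Random(seed)
    # default policy: pick a random needed block (the "Random" policy)
    policy = policy or (lambda have, offered: rng.choice(sorted(offered)))
    have = [set() for _ in range(num_nodes)]
    neighbours = [rng.sample([j for j in range(num_nodes) if j != i], degree)
                  for i in range(num_nodes)]
    all_blocks = set(range(NUM_BLOCKS))
    for r in range(rounds):
        target = r % num_nodes           # low-end server serves one node/round
        missing = all_blocks - have[target]
        if missing:
            have[target].add(policy(have[target], missing))
        for i in rng.sample(range(num_nodes), num_nodes):
            n = rng.choice(neighbours[i])
            useful = have[n] - have[i]   # blocks the neighbour can offer
            if useful:
                have[i].add(policy(have[i], useful))
    return have
```

Swapping in a different `policy` function is enough to compare the random, sequential, and segment-based schedules discussed next.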
Why 95th percentile?
[Graph: number of nodes (out of 500) achieving at least a given goodput (blocks/round)]
- A point (x, y) means y nodes have goodput at least x
- The red line at the top marks the 95th percentile
- The 95th percentile is a good indicator of goodput
  - Most nodes have at least that goodput
Feasibility of VoD – What does block/round mean?
- 0.6 blocks/round with 35 rounds of setup time
- Two independent parameters
  - 1 round ≈ 10 seconds
  - 1 block/round ≈ 512 kbps
- Consider 35 rounds a good setup time
  - 35 rounds ≈ 6 min
- Goodput = 0.6 blocks/round
  - Max encoding rate = 0.6 × 512 > 300 kbps
- Length of video = 250/0.6 rounds ≈ 70 min
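The feasibility arithmetic above, worked out explicitly:

```python
# Feasibility arithmetic: what 0.6 blocks/round means in real units.
ROUND_SECONDS = 10      # 1 round ~ 10 s
BLOCK_KBPS = 512        # 1 block/round ~ 512 kbps
NUM_BLOCKS = 250
goodput = 0.6           # blocks/round
setup_rounds = 35

setup_minutes = setup_rounds * ROUND_SECONDS / 60            # ~5.8 min
max_encoding_kbps = goodput * BLOCK_KBPS                     # ~307 kbps > 300
video_minutes = NUM_BLOCKS / goodput * ROUND_SECONDS / 60    # ~69 min
```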
Naïve scheduling policies
- Segment-Random policy
  - Divide the file into segments
  - Fetch blocks in random order within the same segment
  - Download segments in order
Naïve approaches – Random

[Graph: goodput (blocks/round) vs. setup time (rounds)]
- Each node fetches a block at random
- Throughput – high, as nodes fetch disjoint blocks
  - More opportunities for block exchanges
- Goodput – low, as nodes do not receive blocks in playback order
  - 0 blocks/round
Naïve approaches – Sequential

[Graph: goodput vs. setup time – Sequential (≈ 0.2 blocks/round) vs. Random]
- Each node fetches blocks in order of playback
- Throughput – low, as there are fewer opportunities for exchange
  - Need to increase this
Naïve approaches – Segment-Random policy

[Graph: goodput vs. setup time – Segment-Random (≈ 0.4 blocks/round) vs. Sequential vs. Random]
- Segment-Random
  - Fetch a random block from the segment containing the earliest needed block
- Increases the number of blocks propagating in the system
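The Segment-Random choice can be written as a block-selection policy; the function name and data shapes are illustrative.

```python
import random

NUM_BLOCKS, SEGMENT_SIZE = 250, 10

def segment_random_policy(have, offered, rng=random):
    """Pick a random offered block from the segment that contains the
    earliest block this node still needs; None if that segment is not on offer."""
    missing = [b for b in range(NUM_BLOCKS) if b not in have]
    if not missing:
        return None
    seg = missing[0] // SEGMENT_SIZE     # segment of the earliest needed block
    in_seg = [b for b in offered if b // SEGMENT_SIZE == seg and b not in have]
    return rng.choice(in_seg) if in_seg else None
```

Restricting randomness to one segment keeps downloads roughly in playback order while still diversifying which blocks each node holds.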
Network coding – How it helps

[Diagram: A holds blocks 1 and 2; B downloads from A; C downloads block 1 in parallel; B and C are linked]
- A has blocks 1 and 2
- B gets 1 or 2 with equal probability from A
- C gets 1 in parallel
- If B downloaded 1, the link B–C becomes useless
- Network coding instead routinely sends 1 ⊕ 2
Network coding – Mechanics
- Coding over blocks B_1, B_2, ..., B_n
- Choose coefficients c_1, c_2, ..., c_n from a finite field
- Generate an encoded block: E_1 = Σ_{i=1}^{n} c_i × B_i
- Without n coded blocks, decoding cannot be done
  - Setup time is at least the time to fetch a segment

[Gkantsidis et al. Network Coding for Large Scale Content Distribution. InfoCom '05]
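A minimal sketch of random linear coding and decoding over GF(2), where the field multiplication and addition both reduce to XOR. Real deployments, including the Gkantsidis et al. work, use a larger field such as GF(2^8), so treat this as illustrative.

```python
import random

def encode(blocks, rng=random):
    """Return (coefficients, coded) where coded = XOR of the chosen blocks."""
    n, size = len(blocks), len(blocks[0])
    coeffs = [rng.getrandbits(1) for _ in range(n)]
    if not any(coeffs):
        coeffs[rng.randrange(n)] = 1          # skip the useless zero combination
    coded = bytes(size)
    for c, b in zip(coeffs, blocks):
        if c:
            coded = bytes(x ^ y for x, y in zip(coded, b))
    return coeffs, coded

def decode(packets, n):
    """Gauss-Jordan elimination over GF(2); packets = [(coeffs, coded), ...].
    Returns the n original blocks, or None while the rank is below n."""
    rows = [(list(c), bytes(d)) for c, d in packets]
    for col in range(n):
        piv = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if piv is None:
            return None                       # not enough independent packets
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           bytes(x ^ y for x, y in zip(rows[r][1], rows[col][1])))
    return [rows[i][1] for i in range(n)]
```

The key property from the slides shows up directly: any n linearly independent coded packets suffice, but fewer than n yield nothing, which is why setup time is at least one segment's fetch time.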
Benefits of network coding
[Graph: goodput vs. setup time – Segment-Netcoding, Segment-Random, Sequential, Random]
- Goodput
  - ≈ 0.6 blocks/round with netcoding
  - 0.4 blocks/round with seg-rand
Segment scheduling
- Netcoding is good for fetching blocks within a segment, but cannot be used across segments
  - Because decoding requires all the encoded blocks of a segment
- Problem: how to schedule the fetching of segments?
  - Until now, sequential segment scheduling
  - Results in rare segments when nodes arrive at different times
- A segment scheduling algorithm to avoid rare segments
  - Evaluation with a C# prototype
Segment scheduling
- A has 75% of the file
- A flash crowd of 20 Bs joins with empty caches
- The server is shared between A and the Bs
- Throughput to A decreases, as the server is busy serving the Bs
- The initial segments are served repeatedly
- Throughput of the Bs is low because the same segments are fetched from the server
Segment scheduling – Problem
- 8 segments in the video; A has 75% = 6 segments, the Bs have none
- Green is popularity 2, red is popularity 1
  - Popularity = number of full copies in the system
- Instead of serving A, the server gives segments to the Bs
Segment scheduling – Problem
[Graph: number of data blocks downloaded over time (sec), naive scheduling – A vs. B1..n]
- A's goodput plummets
- Get the best of both worlds – improve both A and the Bs
- Idea: each node serves only the rarest segments in the system
Segment scheduling – Algorithm
- When a dst node connects to the src node...
  - Here: dst is a B, src is the server
- The src node sorts segments in order of popularity
  - Segments 7 and 8 are least popular, at 1
  - Segments 1-6 are equally popular, at 2
- src considers segments in sorted order, one by one, and serves dst the first segment that is
  - either completely available at src, or
  - the first segment required by dst
- The server thus injects blocks from the segments needed by A into the system
  - Avoids wasting bandwidth serving the initial segments multiple times
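The serving rule on these slides can be sketched as a function; the data shapes (popularity map, sets of segments) are illustrative, not the C# prototype's API.

```python
def choose_segment(popularity, src_complete, dst_needed):
    """Pick the segment src should serve dst, considering rarest first.

    popularity:   {segment: number of full copies in the system}
    src_complete: segments fully available at src
    dst_needed:   segments dst still needs, in playback order
    """
    if not dst_needed:
        return None
    first_needed, needed = dst_needed[0], set(dst_needed)
    for seg in sorted(popularity, key=popularity.get):  # rarest first
        if seg in needed and (seg in src_complete or seg == first_needed):
            return seg
    return first_needed
```

With the slides' example (segments 1-6 at popularity 2, segments 7-8 at popularity 1, and the server holding everything), the server serves a B the rare segment 7 rather than yet another copy of segment 1.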
Segment scheduling – Popularities
- How does src figure out popularities?
- Centrally available at the server
  - Our implementation uses this technique
- Alternatively, each node maintains popularities for segments
  - Could use a gossiping protocol for aggregation
Segment scheduling
[Graph: number of data blocks downloaded over time (sec) – naive vs. new algorithm, for A and B1..n]
- Note that the goodput of both A and the Bs improves
Conclusions
- Problem: designing scalable data-intensive applications
  - Enabled low-end servers to scale to large numbers of users
- Shark: scales a file server for read-heavy workloads
- RedCarpet: scales a video server for near-Video-on-Demand
- Exploited P2P for file systems; solved the security issues
- Techniques to improve scalability
  - Cross file(system) sharing
  - Topology management – data-based, locality awareness
  - Scheduling of chunks – network coding