Transcript
Page 1: Scaling Data Servers via Cooperative Caching

Scaling Data Servers via Cooperative Caching

Siddhartha Annapureddy

New York University

Committee: David Mazieres, Lakshminarayanan Subramanian, Jinyang Li,

Dennis Shasha, Zvi Kedem

Siddhartha Annapureddy Thesis Defense

Page 2: Scaling Data Servers via Cooperative Caching

Motivation

New functionality over the Internet

I Large number of users consume many (large) files
  I YouTube, Azureus, etc. for movies – DVDs no more
I Data dissemination to large distributed farms

Siddhartha Annapureddy Thesis Defense

Page 3: Scaling Data Servers via Cooperative Caching

Current approaches

Server

Client Client Client

I Scalability ↓ – Server’s uplink saturates

I Operation takes a long time to finish

Siddhartha Annapureddy Thesis Defense


Page 5: Scaling Data Servers via Cooperative Caching

Current approaches (contd.)

I Server farms replace central server
  I Scalability problem persists
  I Notoriously expensive to maintain

I Content Distribution Networks (CDNs)
  I Still quite expensive

I Peer-to-peer (P2P) systems
  I A new development

Siddhartha Annapureddy Thesis Defense

Page 6: Scaling Data Servers via Cooperative Caching

Peer-to-peer (P2P) systems

I Low-end central server

I Large client population

I File usually divided into chunks/blocks
I Use client resources to disseminate file
  I Bandwidth, disk space, etc.

I Clients organized into a connected overlay
  I Each client knows only a few other clients

How to design P2P systems to scale data servers?

Siddhartha Annapureddy Thesis Defense

Page 7: Scaling Data Servers via Cooperative Caching

P2P scalability – Design principles

To achieve scalability, hotspots must be avoided

I Keep bandwidth load on the server to a minimum
  I Server gives data only to a few users
  I Leverage participants to distribute data to the other users

I Keep maintenance overhead for users low

I Keep bandwidth load on each user to a minimum
  I Each node interacts with only a small subset of users

Siddhartha Annapureddy Thesis Defense

Page 8: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

I Usability – Exploiting P2P for a range of systems
  I File systems, user-level apps, etc.

I Security issues in such wide-area P2P systems
  I Providing data integrity, privacy, authentication

I Cross file(system) sharing – Use other groups
  I Make use of all available sources

I Topology management – Arrange nodes efficiently
  I Underlying network (locality properties)
  I Node capacities (bandwidth, disk space, etc.)
  I Application-specific (certain nodes together work well)

I Scheduling dissemination of chunks efficiently
  I Nodes only have partial information

Siddhartha Annapureddy Thesis Defense


Page 12: Scaling Data Servers via Cooperative Caching

Shark – Motivation

Scenario – Want to test a new application on a hundred nodes

(Diagram: workstations spread across the network)

Problem – Need to push software to all nodes and collect results

Siddhartha Annapureddy Thesis Defense

Page 13: Scaling Data Servers via Cooperative Caching

Current approaches

Distribute from a central server

Client

Client

Client

Network A Network B

Server

rsync, unison, scp

– Server’s network uplink saturates

– Wastes bandwidth along bottleneck links

Siddhartha Annapureddy Thesis Defense

Page 14: Scaling Data Servers via Cooperative Caching

Current approaches

File distribution mechanisms

BitTorrent, Bullet

+ Scales by offloading the burden from the server

– Client downloads from half-way across the world

Siddhartha Annapureddy Thesis Defense

Page 15: Scaling Data Servers via Cooperative Caching

Inherent problems with copying files

Users have to decide a priori what to ship

I Ship too much – Waste bandwidth, takes long time

I Ship too little – Hassle to work in a poor environment

Idle environments consume disk space

I Users are loath to clean up ⇒ Redistribution

I Need low cost solution for refetching files

Manual management of software versions

I Point /usr/local to a central server

Illusion of having the development environment
Programs fetched transparently on demand

Siddhartha Annapureddy Thesis Defense

Page 16: Scaling Data Servers via Cooperative Caching

Networked file systems

+ Know how to deploy these systems

+ Know how to administer such systems

+ Simple accountability mechanisms

E.g., NFS, AFS, SFS, etc.

Siddhartha Annapureddy Thesis Defense


Page 18: Scaling Data Servers via Cooperative Caching

Problem: Scalability

(Graph: time to finish a 40 MB read (sec) vs. number of unique nodes, for SFS and Shark)

I SFS – Very slow, ≈ 775 s
I Shark – Much better, ≈ 88 s (9x faster)

Siddhartha Annapureddy Thesis Defense

Page 19: Scaling Data Servers via Cooperative Caching

Let’s design this new filesystem

Server  Client

Vanilla SFS

I Central server model

Siddhartha Annapureddy Thesis Defense

Page 20: Scaling Data Servers via Cooperative Caching

Scalability – Client-side caching

Server  Client

Cache

Large client cache a la AFS

I Whole file caching

I Scales by avoiding redundant data transfers

I Leases to ensure NFS-style cache consistency

Siddhartha Annapureddy Thesis Defense

Page 21: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

Client

Client

Client

Server

Client  Client

I With multiple clients, must address bandwidth concerns

Siddhartha Annapureddy Thesis Defense

Page 22: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

Client

Client

Client

Server

Client  Client

I Shark clients maintain distributed index

Siddhartha Annapureddy Thesis Defense

Page 23: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

Client

Client

Client

Server

Client  Client

I Shark clients maintain distributed index

I Fetch a file from multiple other clients in parallel

Siddhartha Annapureddy Thesis Defense

Page 24: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

Client

Client

Client

Server

Client  Client

I Shark clients maintain distributed index

I Fetch a file from multiple other clients in parallel

I Read-heavy workloads – Writes still go to server
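
To make the read path concrete, here is a schematic sketch in Python (not Shark's actual code, which is built on the SFS toolkit). The server, index, cache, and peer objects and their methods are hypothetical interfaces standing in for the file server RPCs, the distributed index, the local client cache, and other Shark clients; the hash is assumed to be SHA-1.

    import hashlib
    from concurrent.futures import ThreadPoolExecutor

    def read_file(fh, server, index, cache, my_addr):
        """Fetch every chunk of file `fh`, preferring cached peers over the server."""

        def fetch(chunk_id, token):
            if cache.has(chunk_id):                       # local cache first
                return cache.get(chunk_id)
            for peer in index.get(chunk_id):              # clients already caching it
                data = peer.download(chunk_id, token)     # token-based mutual auth
                if data is not None and hashlib.sha1(data).digest() == token:
                    return data                           # integrity verified
            return server.read_chunk(chunk_id)            # fall back to the server

        metadata = server.get_tokens(fh)                   # [(chunk_id, token), ...]
        with ThreadPoolExecutor(max_workers=4) as pool:    # fetch from peers in parallel
            futures = {cid: pool.submit(fetch, cid, tok) for cid, tok in metadata}
        chunks = {}
        for cid, fut in futures.items():
            chunks[cid] = fut.result()
            cache.put(cid, chunks[cid])
            index.put(cid, my_addr)                        # register as a new source
        return b"".join(chunks[cid] for cid, _ in metadata)

Writes bypass all of this and go straight to the server, which is why the approach targets read-heavy workloads.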

Siddhartha Annapureddy Thesis Defense

Page 25: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

I Usability – Exploiting P2P for a range of systems

I Cross file(system) sharing – Use other groups

I Security issues in such wide-area P2P systems

I Topology management – Arrange nodes efficiently

I Scheduling dissemination of chunks efficiently

Siddhartha Annapureddy Thesis Defense

Page 26: Scaling Data Servers via Cooperative Caching

Cross file(system) sharing – Application

Linux distribution with LiveCDs

(Diagram: Sharkcd clients downloading from Knoppix mirrors and Slax servers)

I LiveCD – Run an entire OS without using the hard disk
I But all your programs must fit on a CD-ROM
I Download dynamically from a server – but scalability problems
I Knoppix and Slax users form a global cache – Relieves servers

Siddhartha Annapureddy Thesis Defense

Page 27: Scaling Data Servers via Cooperative Caching

Cross file(system) sharing

Global cooperative cache regardless of origin servers

(Diagram: one group of clients accessing the Slax server, another accessing the Knoppix server)

I Two groups of clients accessing Slax/Knoppix servers

I Client groups share a large amount of software

I Such clients automatically form a global cache

Siddhartha Annapureddy Thesis Defense


Page 29: Scaling Data Servers via Cooperative Caching

Cross file(system) sharing

Global cooperative cache regardless of origin servers

(Diagram: the Slax and Knoppix client groups forming a single shared cache)

I Two groups of clients accessing Slax/Knoppix servers

I Client groups share a large amount of software

I BitTorrent supports only per-file groups

Siddhartha Annapureddy Thesis Defense

Page 30: Scaling Data Servers via Cooperative Caching

Content-based chunking

Client

Client

Client

Server

Client  Client

I Single overlay for all filesystems and content-based chunks
  I Chunks – Variable-sized blocks based on content

I Chunks are better – Exploit commonalities across files
  I LBFS-style chunks preserved across versions, concatenations

[Muthitacharoen et al. A Low-Bandwidth Network File System. SOSP ’01]
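
As an illustration of content-based chunking, here is a simplified sketch. LBFS computes a Rabin fingerprint over a sliding 48-byte window and declares a chunk boundary when the low 13 bits of the fingerprint hit a fixed value (about 8 KB chunks on average, bounded between 2 KB and 64 KB); the sketch below substitutes a simple polynomial rolling hash for the Rabin fingerprint, so the exact boundaries differ from LBFS, but the resynchronization property is the same.

    def chunk(data, window=48, mask=(1 << 13) - 1,
              min_size=2 * 1024, max_size=64 * 1024, base=257, mod=(1 << 31) - 1):
        """Split data at content-defined boundaries: a boundary is declared when the
        rolling hash of the last `window` bytes has its low 13 bits all set."""
        out, start, h = [], 0, 0
        top = pow(base, window - 1, mod)
        for i, b in enumerate(data):
            h = (h * base + b) % mod                            # bring in the new byte
            if i >= window:
                h = (h - data[i - window] * top * base) % mod   # drop the oldest byte
            size = i - start + 1
            if size >= min_size and ((h & mask) == mask or size >= max_size):
                out.append(data[start:i + 1])
                start = i + 1
        if start < len(data):
            out.append(data[start:])
        return out

Because boundaries depend only on a small window of content, inserting bytes near the start of a file shifts only nearby chunks; later chunks keep their boundaries and hashes, so they can still be found in other clients' caches.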

Siddhartha Annapureddy Thesis Defense

Page 31: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

I Usability – Exploiting P2P for a range of systems

I Cross file(system) sharing – Use other groups

I Security issues in such wide-area P2P systems

I Topology management – Arrange nodes efficiently

I Scheduling dissemination of chunks efficiently

Siddhartha Annapureddy Thesis Defense

Page 32: Scaling Data Servers via Cooperative Caching

Security issues – Client authentication

Client Client

Client

Server

Client

Client

Traditional Authentication

I Traditionally, server authenticated read requests using uids

I Challenge – Interaction between distrustful clients

Siddhartha Annapureddy Thesis Defense

Page 33: Scaling Data Servers via Cooperative Caching

Security issues – Client authentication

Client Client

Client

Server

Client

Client

Traditional Authentication
Authenticator Derived From Token

I Traditionally, server authenticated read requests using uids

I Challenge – Interaction between distrustful clients

Siddhartha Annapureddy Thesis Defense

Page 34: Scaling Data Servers via Cooperative Caching

Security issues – Client communication

I Client should be able to check integrity of downloaded chunk

I Client should not send chunks to other unauthorized clients

I An eavesdropper shouldn’t be able to obtain chunk contents

I Accomplish this without a complex PKI

I Keep the number of rounds to a minimum

Siddhartha Annapureddy Thesis Defense

Page 35: Scaling Data Servers via Cooperative Caching

File metadata

(Diagram: Client sends GET_TOKEN(fh, offset, count) to the Server; the Server returns the file metadata – chunk tokens)

Chunk tokens

I Possession of token implies server permissions to read

I Tokens are a shared secret between authorized clients

I Tokens can be used to check integrity of fetched data

Given a chunk B...

I Chunk token T_B = H(B)

I H is a collision resistant hash function
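
In code, the token is just a collision-resistant hash of the chunk; a minimal sketch (SHA-1 shown here for concreteness, though any collision-resistant hash works):

    import hashlib

    def chunk_token(block: bytes) -> bytes:
        # T_B = H(B): handed out by the server only to authorized clients
        return hashlib.sha1(block).digest()

    def verify_chunk(block: bytes, token: bytes) -> bool:
        # A downloaded chunk is accepted only if its hash matches the token
        return hashlib.sha1(block).digest() == token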

Siddhartha Annapureddy Thesis Defense


Page 40: Scaling Data Servers via Cooperative Caching

Security protocol

Client

Receiver

Client

Source

I R_C, R_P – Random nonces to ensure freshness

I Auth_C – Authenticator to prove the receiver has the token
  I Auth_C = MAC(T_B, “Auth C”, C, P, R_C, R_P)

I K – Key to encrypt chunk contents
  I K = MAC(T_B, “Encryption”, C, P, R_C, R_P)

Siddhartha Annapureddy Thesis Defense

Page 41: Scaling Data Servers via Cooperative Caching

Security protocol

Client

Receiver

Client

Source

RC

RP

I R_C, R_P – Random nonces to ensure freshness

I Auth_C – Authenticator to prove the receiver has the token
  I Auth_C = MAC(T_B, “Auth C”, C, P, R_C, R_P)

I K – Key to encrypt chunk contents
  I K = MAC(T_B, “Encryption”, C, P, R_C, R_P)

Siddhartha Annapureddy Thesis Defense

Page 42: Scaling Data Servers via Cooperative Caching

Security protocol

Client

Receiver

Client

Source

RC

RP

IB, AuthC

EK(B)

I R_C, R_P – Random nonces to ensure freshness

I Auth_C – Authenticator to prove the receiver has the token
  I Auth_C = MAC(T_B, “Auth C”, C, P, R_C, R_P)

I K – Key to encrypt chunk contents
  I K = MAC(T_B, “Encryption”, C, P, R_C, R_P)

Siddhartha Annapureddy Thesis Defense


Page 44: Scaling Data Servers via Cooperative Caching

Security properties

Client

Receiver

Client

Source

RC

RP

IB, AuthC

EK(B)

Client can check integrity of downloaded chunk

I Client checks H(downloaded chunk) ?= T_B

Source should not send chunks to unauthorized clients

I Malicious clients cannot send a correct Auth_C

Eavesdropper shouldn’t get chunk contents

I All communication encrypted with K
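
A sketch of the cryptography behind the protocol, using HMAC-SHA-256 as the MAC (the framing and the exact MAC and cipher used by Shark may differ; client_id and source_id stand for the identities C and P on the slides, and in the real exchange the two nonces are generated by the two sides):

    import hmac, hashlib, os

    def mac(token: bytes, *fields: bytes) -> bytes:
        return hmac.new(token, b"|".join(fields), hashlib.sha256).digest()

    def handshake(token, client_id, source_id):
        r_c = os.urandom(16)                     # receiver's nonce R_C
        r_p = os.urandom(16)                     # source's nonce R_P
        # Receiver proves knowledge of T_B without revealing it:
        auth_c = mac(token, b"Auth C", client_id, source_id, r_c, r_p)
        # Both sides derive the same key for encrypting the chunk transfer:
        key = mac(token, b"Encryption", client_id, source_id, r_c, r_p)
        return r_c, r_p, auth_c, key

    def source_accepts(token, client_id, source_id, r_c, r_p, auth_c) -> bool:
        expected = mac(token, b"Auth C", client_id, source_id, r_c, r_p)
        return hmac.compare_digest(expected, auth_c)   # reject unauthorized clients

Because the MACs are keyed with the chunk token, only a client that already obtained T_B from the server can produce a valid Auth_C or recover the encryption key, and the fresh nonces prevent replaying an old handshake.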

Siddhartha Annapureddy Thesis Defense

Page 45: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

I Usability – Exploiting P2P for a range of systems

I Cross file(system) sharing – Use other groups

I Security issues in such wide-area P2P systems

I Topology management – Arrange nodes efficiently

I Scheduling dissemination of chunks efficiently

Siddhartha Annapureddy Thesis Defense

Page 46: Scaling Data Servers via Cooperative Caching

Topology management

Client Client

Client

Server

Client

Client

Fetch Metadata

I Fetch metadata from the server

I Look up clients caching needed chunks in overlay

I Overlay is common to all file systems

I Overlay organized like a DHT – Get and Put

Siddhartha Annapureddy Thesis Defense

Page 47: Scaling Data Servers via Cooperative Caching

Topology management

Client Client

Client

Server

Client

Client

Lookup Clients

I Fetch metadata from the server

I Look up clients caching needed chunks in overlay

I Overlay is common to all file systems

I Overlay organized like a DHT – Get and Put

Siddhartha Annapureddy Thesis Defense


Page 50: Scaling Data Servers via Cooperative Caching

Overlay – Get operation

Client Client

Client

Server

Client

Client

Discover Clients

I For every chunk B, there’s an indexing key I_B
  I I_B used to index clients caching B
  I Cannot set I_B = T_B, as T_B is a secret

I I_B = MAC(T_B, “Indexing Key”)
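
A sketch of how the public indexing key is derived from the secret token and used against the overlay (the overlay object with put/get is a hypothetical stand-in for the Coral-like distributed index):

    import hmac, hashlib

    def indexing_key(token: bytes) -> bytes:
        # I_B = MAC(T_B, "Indexing Key"): safe to expose in the index, since it
        # does not reveal T_B, yet only token holders can compute it.
        return hmac.new(token, b"Indexing Key", hashlib.sha256).digest()

    def announce(overlay, token, my_addr):
        overlay.put(indexing_key(token), my_addr)          # register as a source of B

    def find_sources(overlay, token, limit=8):
        return overlay.get(indexing_key(token))[:limit]    # nearby sources first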

Siddhartha Annapureddy Thesis Defense

Page 51: Scaling Data Servers via Cooperative Caching

Overlay – Put operation

Register as a source

I Client now becomes a source for the downloaded chunk

I Client registers in distributed index – PUT(IB,Addr)

Chunk Reconciliation

I Reuse TCP connection to download more chunks

I Exchange mutually needed chunks w/o indexing overhead

Siddhartha Annapureddy Thesis Defense

Page 52: Scaling Data Servers via Cooperative Caching

Overlay – Locality awareness

Client

Client

Client

Client

Network A Network B

I Overlay organized as clusters based on latency

I Preferentially returns sources in same cluster as the client

I Hence, chunks usually transferred from nearby clients

[Freedman et al. Democratizing content publication with Coral. NSDI ’04]

Siddhartha Annapureddy Thesis Defense

Page 53: Scaling Data Servers via Cooperative Caching

Topology management – Shark vs. BitTorrent

BitTorrent

I Neighbours “fixed”

I Makes cross file sharing difficult

Shark

I Chunk-based overlay

I Neighbours based on data

I Enables cross file sharing

Server

Client

Client  Client

Client

Siddhartha Annapureddy Thesis Defense

Page 54: Scaling Data Servers via Cooperative Caching

Evaluation

I How does Shark compare with SFS? With NFS?

I How scalable is the server?

I How fair is Shark across clients?

I Which order is better? Random or Sequential

Siddhartha Annapureddy Thesis Defense

Page 55: Scaling Data Servers via Cooperative Caching

Emulab – 100 Nodes on LAN

(Graph: percentage of clients finishing a 40 MB read vs. time since initialization, for Shark (random order, with negotiation), NFS, and SFS)

I Shark – 88 s
I SFS – 775 s (Shark ≈ 9x better), NFS – 350 s (≈ 4x better)
I SFS less fair because of TCP backoffs

Siddhartha Annapureddy Thesis Defense

Page 56: Scaling Data Servers via Cooperative Caching

PlanetLab – 185 Nodes

(Graph: percentage of clients finishing a 40 MB read vs. time since initialization, for Shark and SFS)

I Shark ≈ 7 min – 95th Percentile

I SFS ≈ 39 min – 95th percentile (Shark ≈ 5x better)

I NFS – Triggered kernel panics in server

Siddhartha Annapureddy Thesis Defense

Page 57: Scaling Data Servers via Cooperative Caching

Data pushed by server

(Bar chart: normalized bandwidth pushed by the server – SFS 7400 MB vs. Shark 954 MB)

I Shark vs. SFS – 23 copies vs. 185 copies (8x better)
I Overlay not responsive enough
I Torn connections, timeouts force going to the server

Siddhartha Annapureddy Thesis Defense

Page 58: Scaling Data Servers via Cooperative Caching

Data served by Clients

(Graph: bandwidth transmitted (MB) by each unique Shark proxy, for a 40 MB file on PlanetLab hosts)

I Maximum contribution ≈ 3.5 copies

I Median contribution ≈ 0.75 copies

Siddhartha Annapureddy Thesis Defense

Page 59: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

I Usability – Exploiting P2P for a range of systems

I Cross file(system) sharing – Use other groups

I Security issues in such wide-area P2P systems

I Topology management – Arrange nodes efficiently

I Scheduling dissemination of chunks efficiently

Siddhartha Annapureddy Thesis Defense

Page 60: Scaling Data Servers via Cooperative Caching

Fetching Chunks – Order Matters

I Scheduling problem – What order to fetch chunks of file?

I Natural choices – Random or Sequential

Intuitively, when many clients start simultaneously:

I Random
  I All clients fetch independent chunks
  I More chunks become available in the cooperative cache

I Sequential
  I Better disk I/O scheduling on the server
  I Only the client that has downloaded the most chunks adds to the cache

Siddhartha Annapureddy Thesis Defense

Page 61: Scaling Data Servers via Cooperative Caching

Emulab – 100 Nodes on LAN

(Graph: percentage of clients finishing a 40 MB read vs. time since initialization, Shark with random vs. sequential fetch order)

I Random – 133 s
I Sequential – 203 s
I Random wins – 35% better

Siddhartha Annapureddy Thesis Defense

Page 62: Scaling Data Servers via Cooperative Caching

Performance characteristics of workloads

I For Shark, what matters is throughput
  I Operation should finish as soon as possible

(Plot: utility vs. throughput for a bulk-transfer workload)

I Another interesting class of applications
  I Video streaming – RedCarpet

(Plot: utility vs. throughput for a streaming workload)

I Study the scheduling problem for Video-on-Demand
  I Press a button, wait a little while, and watch playback

Siddhartha Annapureddy Thesis Defense

Page 63: Scaling Data Servers via Cooperative Caching

Challenges for VoD

I Near VoD
  I Small setup time to play videos

I High sustainable goodput
  I Largest slope osculating the block arrival curve
  I Highest video encoding rate the system can support

Goal: Do block scheduling to achieve low setup time and high goodput

Siddhartha Annapureddy Thesis Defense

Page 64: Scaling Data Servers via Cooperative Caching

System design – Outline

I Components based on BitTorrent
  I Central server (tracker + seed)
  I Users interested in the data

I Central server
  I Maintains a list of nodes

I Each node finds neighbours
  I Contacts the server for this
  I Another option – Use gossip protocols

I Exchange data with neighbours
  I Connections are bi-directional

Siddhartha Annapureddy Thesis Defense

Page 65: Scaling Data Servers via Cooperative Caching

Simulator – Outline

I All nodes have unit bandwidth capacity
I 500 nodes in most of our experiments
I Each node has 4–6 neighbours

I Simulator is round-based
  I Block exchanges done at each round
  I System moves as a whole to the next round

I File divided into 250 blocks
  I Segment is a set of consecutive blocks
  I Segment size = 10

I Maximum possible goodput – 1 block/round
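
A skeleton of such a round-based simulator (an illustrative reconstruction from the parameters above, not the actual simulator; it also treats neighbour links as one-directional for brevity). The scheduling policy plugs in as pick_block, as in the policy sketches shown after the Segment-Random slide below.

    import random

    NODES, BLOCKS, DEGREE = 500, 250, (4, 6)

    class Node:
        def __init__(self):
            self.have = set()
            self.neighbours = []

    def run(rounds, pick_block):
        nodes = [Node() for _ in range(NODES)]
        server = Node()
        server.have = set(range(BLOCKS))             # the seed holds the whole file
        for n in nodes:                              # 4-6 random neighbours plus the server
            others = [m for m in nodes if m is not n]
            n.neighbours = random.sample(others, random.randint(*DEGREE)) + [server]
        for _ in range(rounds):
            for n in nodes:                          # unit capacity: one block per round
                src = random.choice(n.neighbours)
                b = pick_block(n.have, src.have)     # the scheduling policy decides
                if b is not None:
                    n.have.add(b)
        return nodes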

Siddhartha Annapureddy Thesis Defense

Page 66: Scaling Data Servers via Cooperative Caching

Why 95th percentile?

(Graph: number of nodes vs. goodput in blocks/round; a point (x, y) means y nodes have goodput at least x; the red line at the top marks the 95th percentile)

I The 95th percentile is a good indicator of goodput
  I Most nodes have at least that goodput

Siddhartha Annapureddy Thesis Defense

Page 67: Scaling Data Servers via Cooperative Caching

Feasibility of VoD – What does block/round mean?

I 0.6 blocks/round with 35 rounds of setup time

I Two independent parameters
  I 1 round ≈ 10 seconds
  I 1 block/round ≈ 512 kbps

I Consider 35 rounds a good setup time
  I 35 rounds ≈ 6 min

I Goodput = 0.6 blocks/round
  I Max encoding rate = 0.6 × 512 > 300 kbps

I Size of video = 250/0.6 ≈ 70 min
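
The arithmetic, spelled out with the two free parameters made explicit (the figures simply restate the slide's numbers):

    SECONDS_PER_ROUND = 10            # 1 round ≈ 10 seconds
    KBPS_PER_BLOCK_ROUND = 512        # 1 block/round ≈ 512 kbps
    goodput = 0.6                     # blocks/round, from the experiments
    setup_rounds, blocks = 35, 250

    setup_minutes = setup_rounds * SECONDS_PER_ROUND / 60        # ≈ 6 min
    max_encoding_kbps = goodput * KBPS_PER_BLOCK_ROUND           # ≈ 307 kbps > 300 kbps
    video_minutes = blocks / goodput * SECONDS_PER_ROUND / 60    # ≈ 70 min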

Siddhartha Annapureddy Thesis Defense

Page 68: Scaling Data Servers via Cooperative Caching

Naïve scheduling policies

I Segment-Random policy
  I Divide the file into segments
  I Fetch blocks in random order inside the same segment
  I Download segments in order
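
The three policies can be written as small block-selection functions (a sketch matching the simulator skeleton above; have is the set of blocks the node already holds, src_have is the sender's set, and blocks are numbered in playback order):

    import random

    SEGMENT, BLOCKS = 10, 250

    def pick_random(have, src_have):
        useful = list(src_have - have)
        return random.choice(useful) if useful else None

    def pick_sequential(have, src_have):
        useful = src_have - have
        return min(useful) if useful else None            # strictly in playback order

    def pick_segment_random(have, src_have):
        missing = set(range(BLOCKS)) - have
        if not missing:
            return None
        seg = min(missing) // SEGMENT                     # segment of earliest missing block
        candidates = [b for b in src_have - have if b // SEGMENT == seg]
        return random.choice(candidates) if candidates else None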

Siddhartha Annapureddy Thesis Defense

Page 69: Scaling Data Servers via Cooperative Caching

Naïve approaches – Random

(Graph: goodput in blocks/round vs. setup time in rounds, for the Random policy)

I Each node fetches a block at random
I Throughput – High, as nodes fetch disjoint blocks
  I More opportunities for block exchanges
I Goodput – Low, as nodes do not get blocks in order
  I 0 blocks/round

Siddhartha Annapureddy Thesis Defense

Page 70: Scaling Data Servers via Cooperative Caching

Naïve approaches – Sequential

(Graph: goodput vs. setup time for Sequential and Random – Sequential reaches 0.2 blocks/round)

I Each node fetches blocks in order of playback
I Throughput – Low, as there are fewer opportunities for exchange
  I Need to increase this

Siddhartha Annapureddy Thesis Defense

Page 71: Scaling Data Servers via Cooperative Caching

Naïve approaches – Segment-Random policy

(Graph: goodput vs. setup time for Segment-Random, Sequential, and Random – Segment-Random reaches 0.4 blocks/round)

I Segment-Random
  I Fetch a random block from the segment the earliest needed block falls in
I Increases the number of blocks propagating in the system

Siddhartha Annapureddy Thesis Defense

Page 72: Scaling Data Servers via Cooperative Caching

NetCoding – How does it help?

I A has blocks 1 and 2
I B gets 1 or 2 with equal probability from A
I C gets 1 in parallel
I If B downloaded 1, link B–C becomes useless
I Network coding routinely sends 1 ⊕ 2

Siddhartha Annapureddy Thesis Defense

Page 73: Scaling Data Servers via Cooperative Caching

Network coding – Mechanics

I Coding over blocks B_1, B_2, ..., B_n
I Choose coefficients c_1, c_2, ..., c_n from a finite field
I Generate encoded block: E_1 = Σ_{i=1..n} c_i × B_i

I Without n coded blocks, decoding cannot be done
I Setup time at least the time to fetch a segment

[Gkantsidis et al. Network Coding for Large Scale Content Distribution. InfoCom ’05]
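
A sketch of random linear coding for one segment, over GF(2) so that combining is plain XOR (real deployments code over a larger field, which makes random combinations independent with much higher probability):

    import random

    def encode(blocks):
        """Return (coefficient vector, coded block): a random XOR-combination."""
        coeffs = [random.randint(0, 1) for _ in blocks]
        if not any(coeffs):
            coeffs[random.randrange(len(blocks))] = 1
        coded = bytes(len(blocks[0]))
        for c, blk in zip(coeffs, blocks):
            if c:
                coded = bytes(a ^ b for a, b in zip(coded, blk))
        return coeffs, coded

    def add_if_innovative(coeffs, basis):
        """basis maps pivot index -> stored row (first 1 at the pivot). Returns True,
        and stores the row, iff the coded block is independent of those already held;
        the segment can be decoded once len(basis) equals the number of blocks."""
        v = list(coeffs)
        for i in range(len(v)):                           # Gaussian elimination over GF(2)
            if v[i] and i in basis:
                v = [a ^ b for a, b in zip(v, basis[i])]
        for i, bit in enumerate(v):
            if bit:
                basis[i] = v
                return True
        return False

As in the 1 ⊕ 2 example on the previous slide, a coded block is useful to a neighbour with high probability regardless of which plain blocks that neighbour already holds.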

Siddhartha Annapureddy Thesis Defense

Page 74: Scaling Data Servers via Cooperative Caching

Benefits of network coding

(Graph: goodput vs. setup time for Segment-Netcoding, Segment-Random, Sequential, and Random)

I Goodput
  I ≈ 0.6 blocks/round with netcoding
  I 0.4 blocks/round with seg-rand

Siddhartha Annapureddy Thesis Defense

Page 75: Scaling Data Servers via Cooperative Caching

Segment scheduling

I Netcoding is good for fetching blocks within a segment, but cannot be used across segments
  I Because decoding requires all encoded blocks of the segment

I Problem: How to schedule the fetching of segments?
  I Until now, sequential segment scheduling
  I Results in rare segments when nodes arrive at different times

I Segment scheduling algorithm to avoid rare segments
  I Evaluation with a C# prototype

Siddhartha Annapureddy Thesis Defense

Page 76: Scaling Data Servers via Cooperative Caching

Segment scheduling

I A has 75% of the file
I Flash crowd of 20 Bs join with empty caches
I Server shared between A and Bs

I Throughput to A decreases, as the server is busy serving Bs
I Initial segments served repeatedly
I Throughput of Bs low because the same segments are fetched from the server

Siddhartha Annapureddy Thesis Defense

Page 77: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

I 8 segments in the video; A has 75% = 6 segments, Bs have none
I Green is popularity 2, red is popularity 1
  I Popularity = number of full copies in the system

Siddhartha Annapureddy Thesis Defense

Page 78: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

I Instead of serving A...

Siddhartha Annapureddy Thesis Defense

Page 79: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

I The server gives segments to Bs

Siddhartha Annapureddy Thesis Defense

Page 80: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

(Graph: number of data blocks downloaded over time, for A and for B1..Bn, under the naive policy)

I A’s goodput plummets
I Get the best of both worlds – Improve both A and the Bs

I Idea: Each node serves only rarest segments in system

Siddhartha Annapureddy Thesis Defense

Page 81: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

I When the dst node connects to the src node...
  I Here: dst is B, src is the server

Siddhartha Annapureddy Thesis Defense

Page 82: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

I src node sorts segments in order of popularity
  I Segments 7, 8 least popular, at 1
  I Segments 1–6 equally popular, at 2

Siddhartha Annapureddy Thesis Defense

Page 83: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

I src considers segments in sorted order one by one and serves dst a segment that is
  I Either completely available at src, or
  I The first segment required by dst
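
A sketch of the selection rule (hypothetical data structures: popularity maps each segment to the number of full copies known, src_has and dst_has are sets of complete segments, and dst_needs_first is the first segment dst still needs for playback):

    def choose_segment(src_has, dst_has, dst_needs_first, popularity):
        """Scan segments from least to most popular; serve the first one that dst
        lacks and that src holds completely, or that dst needs next for playback."""
        for seg in sorted(popularity, key=popularity.get):
            if seg in dst_has:
                continue
            if seg in src_has or seg == dst_needs_first:
                return seg
        return None

In the example above, the server (which holds every segment) picks segment 7 or 8 with popularity 1, so the rare segments get injected into the system instead of the initial segments being served over and over.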

Siddhartha Annapureddy Thesis Defense

Page 84: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

I Server injects blocks from the segment needed by A into the system
I Avoids wasting bandwidth serving the initial segments multiple times

Siddhartha Annapureddy Thesis Defense

Page 85: Scaling Data Servers via Cooperative Caching

Segment scheduling – Popularities

I How does src figure out popularities?

I Centrally available at the server
  I Our implementation uses this technique

I Each node maintains popularities for segments
  I Could use a gossiping protocol for aggregation

Siddhartha Annapureddy Thesis Defense

Page 86: Scaling Data Servers via Cooperative Caching

Segment scheduling

(Graph: number of data blocks downloaded over time for A and B1..Bn – naive policy vs. the segment scheduling algorithm)

I Note that the goodput of both A and the Bs improves

Siddhartha Annapureddy Thesis Defense

Page 87: Scaling Data Servers via Cooperative Caching

Conclusions

I Problem: Designing scalable data-intensive applications
  I Enabled low-end servers to scale to large numbers of users

I Shark: Scales a file server for read-heavy workloads
I RedCarpet: Scales a video server for near-Video-on-Demand

I Exploited P2P for file systems, solved security issues
I Techniques to improve scalability
  I Cross file(system) sharing
  I Topology management – Data-based, locality awareness
  I Scheduling of chunks – Network coding

Siddhartha Annapureddy Thesis Defense

