
Scaling Data Servers via Cooperative Caching


DESCRIPTION

In this thesis, we present design techniques -- and systems that illustrate and validate these techniques -- for building data-intensive applications over the Internet. We enable the use of a traditional bandwidth-limited server in these applications. A large number of cooperating users contribute resources such as disk space and network bandwidth, and form the backbone of such applications. The applications we consider fall into one of two categories. The first type provides user-perceived utility in proportion to the data download rates of the participants; bulk data distribution systems are a typical example. The second type is usable only when the participants have data download rates above a certain threshold; video streaming is a prime example.


Page 1: Scaling Data Servers via Cooperative Caching

Scaling Data Servers via Cooperative Caching

Siddhartha Annapureddy

New York University

Committee: David Mazières, Lakshminarayanan Subramanian, Jinyang Li,

Dennis Shasha, Zvi Kedem


Page 2: Scaling Data Servers via Cooperative Caching

Motivation

New functionality over the Internet

- Large number of users consume many (large) files
  - YouTube, Azureus, etc. for movies – no more DVDs
- Data dissemination to large distributed farms

Page 3: Scaling Data Servers via Cooperative Caching

Current approaches

[Figure: a central server serving multiple clients directly]

- Scalability ↓ – the server's uplink saturates
- Operations take a long time to finish


Page 5: Scaling Data Servers via Cooperative Caching

Current approaches (contd.)

- Server farms replace the central server
  - Scalability problem persists
  - Notoriously expensive to maintain
- Content Distribution Networks (CDNs)
  - Still quite expensive
- Peer-to-peer (P2P) systems
  - A new development

Page 6: Scaling Data Servers via Cooperative Caching

Peer-to-peer (P2P) systems

- Low-end central server
- Large client population
- File usually divided into chunks/blocks
- Use client resources to disseminate the file
  - Bandwidth, disk space, etc.
- Clients organized into a connected overlay
  - Each client knows only a few other clients

How to design P2P systems to scale data servers?

Page 7: Scaling Data Servers via Cooperative Caching

P2P scalability – Design principles

To achieve scalability, we need to avoid hotspots

- Keep bandwidth load on the server to a minimum
  - The server gives data only to a few users
  - Leverage participants to distribute data to the other users
- Keep maintenance overhead on users low
- Keep bandwidth load on users to a minimum
  - Each node interacts with only a small subset of users

Page 8: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

- Usability – exploiting P2P for a range of systems
  - File systems, user-level apps, etc.
- Security issues in such wide-area P2P systems
  - Providing data integrity, privacy, authentication
- Cross file(system) sharing – use other groups
  - Make use of all available sources
- Topology management – arrange nodes efficiently
  - Underlying network (locality properties)
  - Node capacities (bandwidth, disk space, etc.)
  - Application-specific (certain nodes work well together)
- Scheduling dissemination of chunks efficiently
  - Nodes only have partial information


Page 12: Scaling Data Servers via Cooperative Caching

Shark – Motivation

Scenario – Want to test a new application on a hundred nodes

[Figure: a hundred workstations spread across the network]

Problem – Need to push software to all nodes and collect results

Page 13: Scaling Data Servers via Cooperative Caching

Current approaches

Distribute from a central server

[Figure: clients on Network A and Network B fetching from a central server]

rsync, unison, scp

– The server's network uplink saturates
– Wastes bandwidth along bottleneck links

Page 14: Scaling Data Servers via Cooperative Caching

Current approaches

File distribution mechanisms

BitTorrent, Bullet

+ Scales by offloading the burden from the server

– Clients download from halfway across the world

Page 15: Scaling Data Servers via Cooperative Caching

Inherent problems with copying files

Users have to decide a priori what to ship

- Ship too much – wastes bandwidth, takes a long time
- Ship too little – hassle to work in a poor environment

Idle environments consume disk space

- Users are loath to clean up ⇒ redistribution
- Need a low-cost solution for refetching files

Manual management of software versions

- Point /usr/local to a central server

Illusion of having a development environment – programs fetched transparently on demand

Page 16: Scaling Data Servers via Cooperative Caching

Networked file systems

+ Know how to deploy these systems

+ Know how to administer such systems

+ Simple accountability mechanisms

E.g., NFS, AFS, SFS, etc.

Page 17: Scaling Data Servers via Cooperative Caching

Problem: Scalability

[Plot: time to finish a 40 MB read (sec) vs. number of unique nodes, for SFS]

- SFS is very slow: ≈ 775 s

Page 18: Scaling Data Servers via Cooperative Caching

Problem: Scalability

[Plot: time to finish a 40 MB read (sec) vs. number of unique nodes, for SFS and Shark]

- SFS is very slow: ≈ 775 s
- Shark is much better: ≈ 88 s (≈ 9x faster)

Page 19: Scaling Data Servers via Cooperative Caching

Let’s design this new filesystem

[Figure: a client connected directly to the server]

Vanilla SFS

- Central server model

Page 20: Scaling Data Servers via Cooperative Caching

Scalability – Client-side caching

[Figure: a client with a local cache, backed by the server]

Large client cache à la AFS

- Whole-file caching
- Scales by avoiding redundant data transfers
- Leases to ensure NFS-style cache consistency

Page 21: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

[Figure: clients fetching chunks from each other as well as from the server]

- With multiple clients, must address bandwidth concerns

Page 22: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

[Figure: clients fetching chunks from each other as well as from the server]

- Shark clients maintain a distributed index

Page 23: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

[Figure: clients fetching chunks from each other as well as from the server]

- Shark clients maintain a distributed index
- Fetch a file from multiple other clients in parallel

Page 24: Scaling Data Servers via Cooperative Caching

Scalability – Cooperative caching

Clients fetch data from each other and offload burden from server

[Figure: clients fetching chunks from each other as well as from the server]

- Shark clients maintain a distributed index
- Fetch a file from multiple other clients in parallel
- Read-heavy workloads – writes still go to the server

Page 25: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

- Usability – exploiting P2P for a range of systems
- Cross file(system) sharing – use other groups
- Security issues in such wide-area P2P systems
- Topology management – arrange nodes efficiently
- Scheduling dissemination of chunks efficiently

Page 26: Scaling Data Servers via Cooperative Caching

Cross file(system) sharing – Application

Linux distribution with LiveCDs

[Figure: SharkCD clients alongside Knoppix mirrors and Slax servers]

- LiveCD – run an entire OS without using the hard disk
- But all your programs must fit on a CD-ROM
- Download dynamically from a server, but scalability problems
- Knoppix and Slax users form a global cache – relieves the servers

Page 27: Scaling Data Servers via Cooperative Caching

Cross file(system) sharing

Global cooperative cache regardless of origin servers

[Figure: two groups of clients, one served by the Slax server and one by the Knoppix server]

- Two groups of clients accessing Slax/Knoppix servers
- The client groups share a large amount of software
- Such clients automatically form a global cache


Page 29: Scaling Data Servers via Cooperative Caching

Cross file(system) sharing

Global cooperative cache regardless of origin servers

[Figure: two groups of clients, one served by the Slax server and one by the Knoppix server]

- Two groups of clients accessing Slax/Knoppix servers
- The client groups share a large amount of software
- BitTorrent supports only per-file groups

Page 30: Scaling Data Servers via Cooperative Caching

Content-based chunking

[Figure: a single overlay of clients and the server exchanging content-based chunks]

- Single overlay for all file systems and content-based chunks
  - Chunks – variable-sized blocks whose boundaries are determined by content (see the sketch below)
- Chunks are better – they exploit commonalities across files
  - LBFS-style chunks are preserved across versions and concatenations

[Muthitacharoen et al. A Low-Bandwidth Network File System. SOSP '01]
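To make content-defined chunking concrete, here is a minimal sketch of an LBFS-style chunker, assuming a polynomial rolling hash; the window size, modulus, boundary mask, and size limits are illustrative parameters, not Shark's actual ones.

```python
# Minimal sketch of LBFS-style content-defined chunking (illustrative parameters,
# not Shark's actual ones). A rolling hash over a sliding window picks boundaries
# from the data itself, so identical regions yield identical chunks (and tokens).

WINDOW = 48            # bytes in the rolling-hash window
PRIME = 1_000_000_007  # modulus for the polynomial rolling hash
BASE = 257             # polynomial base
MASK = (1 << 13) - 1   # boundary test -> roughly 8 KB average chunks
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

def chunk(data: bytes) -> list:
    """Return (offset, length) pairs of content-defined chunks of `data`."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW, PRIME)            # weight of the byte leaving the window
    for i, b in enumerate(data):
        h = (h * BASE + b) % PRIME            # slide the new byte in
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % PRIME   # slide the old byte out
        length = i + 1 - start
        if ((h & MASK) == MASK and length >= MIN_CHUNK) or length >= MAX_CHUNK:
            chunks.append((start, length))    # content chose this boundary
            start = i + 1
    if start < len(data):
        chunks.append((start, len(data) - start))
    return chunks
```

Because boundaries depend only on local content, inserting data into a file shifts chunk offsets but leaves most chunk contents, and hence their hashes, unchanged.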

Page 31: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

- Usability – exploiting P2P for a range of systems
- Cross file(system) sharing – use other groups
- Security issues in such wide-area P2P systems
- Topology management – arrange nodes efficiently
- Scheduling dissemination of chunks efficiently

Page 32: Scaling Data Servers via Cooperative Caching

Security issues – Client authentication

[Figure: the server authenticates clients directly (traditional authentication)]

- Traditionally, the server authenticated read requests using uids
- Challenge – interaction between mutually distrustful clients

Page 33: Scaling Data Servers via Cooperative Caching

Security issues – Client authentication

[Figure: traditional authentication with the server, plus an authenticator derived from a server-issued token for client-to-client requests]

- Traditionally, the server authenticated read requests using uids
- Challenge – interaction between mutually distrustful clients

Page 34: Scaling Data Servers via Cooperative Caching

Security issues – Client communication

- A client should be able to check the integrity of a downloaded chunk
- A client should not send chunks to unauthorized clients
- An eavesdropper shouldn't be able to obtain chunk contents
- Accomplish this without a complex PKI
- Keep the number of protocol rounds to a minimum

Page 35: Scaling Data Servers via Cooperative Caching

File metadata

[Figure: the client sends GET_TOKEN(fh, offset, count); the server returns the file metadata]

Chunk tokens

- Possession of a token implies server permission to read
- Tokens are a shared secret between authorized clients
- Tokens can be used to check the integrity of fetched data

Given a chunk B...

- Chunk token T_B = H(B) (see the sketch below)
- H is a collision-resistant hash function
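A minimal sketch of how chunk tokens could be computed and checked; SHA-1 here stands in for "some collision-resistant hash H" and is an assumption, not a statement of the system's exact choice.

```python
# Sketch of chunk tokens; the hash function is an illustrative assumption.
import hashlib

def chunk_token(chunk: bytes) -> bytes:
    """T_B = H(B): doubles as a shared secret and as an integrity check."""
    return hashlib.sha1(chunk).digest()

def verify_chunk(chunk: bytes, token: bytes) -> bool:
    """A client that fetched `chunk` from a peer checks it against the token
    obtained (over a secure channel) from the server."""
    return hashlib.sha1(chunk).digest() == token
```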


Page 40: Scaling Data Servers via Cooperative Caching

Security protocol

[Figure: a receiver client C requesting a chunk from a source client P]

- R_C, R_P – random nonces to ensure freshness
- Auth_C – authenticator to prove the receiver has the token
  - Auth_C = MAC(T_B, "Auth C", C, P, R_C, R_P)
- K – key to encrypt the chunk contents
  - K = MAC(T_B, "Encryption", C, P, R_C, R_P)

Page 41: Scaling Data Servers via Cooperative Caching

Security protocol

[Figure: receiver C sends nonce R_C to source P; P replies with nonce R_P]

- R_C, R_P – random nonces to ensure freshness
- Auth_C – authenticator to prove the receiver has the token
  - Auth_C = MAC(T_B, "Auth C", C, P, R_C, R_P)
- K – key to encrypt the chunk contents
  - K = MAC(T_B, "Encryption", C, P, R_C, R_P)

Page 42: Scaling Data Servers via Cooperative Caching

Security protocol

[Figure: C → P: R_C;  P → C: R_P;  C → P: I_B, Auth_C;  P → C: E_K(B)]

- R_C, R_P – random nonces to ensure freshness
- Auth_C – authenticator to prove the receiver has the token (see the sketch below)
  - Auth_C = MAC(T_B, "Auth C", C, P, R_C, R_P)
- K – key to encrypt the chunk contents
  - K = MAC(T_B, "Encryption", C, P, R_C, R_P)
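A hedged sketch of the key and authenticator derivation in this exchange; HMAC-SHA-256, the message framing, and the helper names are assumptions for illustration, not the actual wire format.

```python
# Sketch of the receiver/source chunk exchange; MAC construction, framing, and
# names are illustrative assumptions.
import hashlib, hmac, os

def mac(key: bytes, *fields: bytes) -> bytes:
    return hmac.new(key, b"|".join(fields), hashlib.sha256).digest()

def derive(T_B: bytes, C: bytes, P: bytes, R_C: bytes, R_P: bytes):
    """Both sides derive the authenticator and the chunk-encryption key from the
    shared chunk token T_B and the two exchanged nonces."""
    auth_C = mac(T_B, b"Auth C", C, P, R_C, R_P)      # proves the receiver knows T_B
    K = mac(T_B, b"Encryption", C, P, R_C, R_P)       # keys E_K(B)
    return auth_C, K

def source_accepts(T_B, C, P, R_C, R_P, auth_C_received) -> bool:
    """The source recomputes Auth_C and refuses receivers that cannot produce it."""
    expected, _ = derive(T_B, C, P, R_C, R_P)
    return hmac.compare_digest(expected, auth_C_received)

# The nonces travel in the first two messages (C -> P: R_C, then P -> C: R_P),
# followed by C -> P: I_B, Auth_C and finally P -> C: E_K(B).
R_C, R_P = os.urandom(16), os.urandom(16)
auth_C, K = derive(b"token-of-chunk-B", b"client-C", b"client-P", R_C, R_P)
```

Because both Auth_C and K are MACs keyed by T_B, only clients that obtained the token from the server can authenticate themselves or decrypt E_K(B).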


Page 44: Scaling Data Servers via Cooperative Caching

Security properties

[Figure: C → P: R_C;  P → C: R_P;  C → P: I_B, Auth_C;  P → C: E_K(B)]

A client can check the integrity of a downloaded chunk

- The client checks H(downloaded chunk) ?= T_B

The source should not send chunks to unauthorized clients

- Malicious clients cannot produce a correct Auth_C

An eavesdropper shouldn't get the chunk contents

- All chunk data is encrypted with K

Page 45: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

- Usability – exploiting P2P for a range of systems
- Cross file(system) sharing – use other groups
- Security issues in such wide-area P2P systems
- Topology management – arrange nodes efficiently
- Scheduling dissemination of chunks efficiently

Page 46: Scaling Data Servers via Cooperative Caching

Topology management

[Figure: the client fetches file metadata from the server]

- Fetch metadata from the server
- Look up clients caching the needed chunks in the overlay
- The overlay is common to all file systems
- Overlay organized like a DHT – get and put

Page 47: Scaling Data Servers via Cooperative Caching

Topology management

[Figure: the client looks up other clients caching the needed chunks]

- Fetch metadata from the server
- Look up clients caching the needed chunks in the overlay
- The overlay is common to all file systems
- Overlay organized like a DHT – get and put


Page 50: Scaling Data Servers via Cooperative Caching

Overlay – Get operation

[Figure: a client discovers other clients caching chunk B through the overlay]

- For every chunk B, there is an indexing key I_B
  - I_B is used to index clients caching B
  - Cannot set I_B = T_B, as T_B is a secret
- I_B = MAC(T_B, "Indexing Key") (see the sketch below)
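A small sketch of how the indexing key is derived and used against the distributed index; the `overlay.put`/`overlay.get` interface stands in for Coral-style DHT operations and is a hypothetical name, not the system's actual API.

```python
# Sketch of indexing-key derivation and use; the `overlay` object is hypothetical.
import hashlib, hmac

def indexing_key(T_B: bytes) -> bytes:
    """I_B = MAC(T_B, "Indexing Key"): safe to publish, reveals nothing about T_B."""
    return hmac.new(T_B, b"Indexing Key", hashlib.sha256).digest()

def announce(overlay, T_B: bytes, my_addr: str) -> None:
    """After downloading chunk B, register as a source: PUT(I_B, addr)."""
    overlay.put(indexing_key(T_B), my_addr)

def find_sources(overlay, T_B: bytes) -> list:
    """Look up other clients caching B: GET(I_B)."""
    return overlay.get(indexing_key(T_B))
```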

Page 51: Scaling Data Servers via Cooperative Caching

Overlay – Put operation

Register as a source

- The client now becomes a source for the downloaded chunk
- The client registers in the distributed index – PUT(I_B, addr)

Chunk Reconciliation

- Reuse the TCP connection to download more chunks
- Exchange mutually needed chunks without indexing overhead

Page 52: Scaling Data Servers via Cooperative Caching

Overlay – Locality awareness

[Figure: clients grouped into clusters, Network A and Network B]

- The overlay is organized into clusters based on latency
- Lookups preferentially return sources in the same cluster as the client
- Hence, chunks are usually transferred from nearby clients

[Freedman et al. Democratizing content publication with Coral. NSDI '04]

Page 53: Scaling Data Servers via Cooperative Caching

Topology management – Shark vs. BitTorrent

BitTorrent

- Neighbours are "fixed"
- Makes cross-file sharing difficult

Shark

- Chunk-based overlay
- Neighbours chosen based on the data they hold
- Enables cross-file sharing

[Figure: chunk-based overlay linking clients that cache the same chunks]

Page 54: Scaling Data Servers via Cooperative Caching

Evaluation

- How does Shark compare with SFS? With NFS?
- How scalable is the server?
- How fair is Shark across clients?
- Which fetch order is better – random or sequential?

Page 55: Scaling Data Servers via Cooperative Caching

Emulab – 100 Nodes on LAN

[Plot: percentage of clients completing a 40 MB read vs. time since initialization (sec), for Shark (random order, with negotiation), NFS, and SFS]

- Shark – 88 s
- SFS – 775 s (Shark ≈ 9x better), NFS – 350 s (Shark ≈ 4x better)
- SFS is less fair because of TCP backoffs

Page 56: Scaling Data Servers via Cooperative Caching

PlanetLab – 185 Nodes

[Plot: percentage of clients completing a 40 MB read vs. time since initialization (sec), for Shark and SFS]

- Shark ≈ 7 min – 95th percentile
- SFS ≈ 39 min – 95th percentile (Shark ≈ 5x better)
- NFS – triggered kernel panics on the server

Page 57: Scaling Data Servers via Cooperative Caching

Data pushed by server

[Bar chart: normalized bandwidth pushed by the server – SFS ≈ 7400 MB vs. Shark ≈ 954 MB]

- Shark vs. SFS – 23 copies vs. 185 copies (≈ 8x better)
- Overlay not responsive enough
- Torn connections and timeouts force clients to go back to the server

Page 58: Scaling Data Servers via Cooperative Caching

Data served by Clients

[Plot: bandwidth transmitted (MB) by each unique Shark proxy, PlanetLab hosts, 40 MB file]

- Maximum contribution ≈ 3.5 copies
- Median contribution ≈ 0.75 copies

Page 59: Scaling Data Servers via Cooperative Caching

P2P systems – Themes

- Usability – exploiting P2P for a range of systems
- Cross file(system) sharing – use other groups
- Security issues in such wide-area P2P systems
- Topology management – arrange nodes efficiently
- Scheduling dissemination of chunks efficiently

Page 60: Scaling Data Servers via Cooperative Caching

Fetching Chunks – Order Matters

- Scheduling problem – in what order should the chunks of a file be fetched?
- Natural choices – random or sequential

Intuitively, when many clients start simultaneously:

- Random
  - All clients fetch independent chunks
  - More chunks become available in the cooperative cache
- Sequential
  - Better disk I/O scheduling on the server
  - Only the client that has downloaded the most chunks adds new data to the cache

Page 61: Scaling Data Servers via Cooperative Caching

Emulab – 100 Nodes on LAN

[Plot: percentage of clients completing a 40 MB read vs. time since initialization (sec), Shark with random vs. sequential order]

- Random – 133 s
- Sequential – 203 s
- Random wins – ≈ 35% better

Page 62: Scaling Data Servers via Cooperative Caching

Performance characteristics of workloads

- For Shark, what matters is throughput
  - The operation should finish as soon as possible

[Plot: utility vs. throughput – utility grows steadily with throughput (bulk transfer)]

- Another interesting class of applications
  - Video streaming – RedCarpet

[Plot: utility vs. throughput – utility jumps once throughput crosses a threshold (streaming)]

- Study the scheduling problem for Video-on-Demand
  - Press a button, wait a little while, and watch playback

Page 63: Scaling Data Servers via Cooperative Caching

Challenges for VoD

- Near VoD
  - Small setup time before the video starts playing
- High sustainable goodput
  - The largest slope of a line osculating the block-arrival curve
  - The highest video encoding rate the system can support

Goal: do block scheduling to achieve low setup time and high goodput (see the sketch below for how goodput is read off the arrival curve)
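As a rough illustration of the goodput metric, the sketch below computes the largest playback rate a given block-arrival curve can sustain for a fixed setup time; this approximates the "largest osculating slope" definition and is not code from the thesis.

```python
# Rough illustration of sustainable goodput for a fixed setup time; an
# approximation of the definition above, not the thesis's code.
def sustainable_goodput(arrival_rounds: list, setup: float) -> float:
    """Largest playback rate r (blocks/round) such that block i, in playback order,
    has arrived by time setup + i/r. arrival_rounds[i-1] is the arrival of block i."""
    r = float("inf")
    for i, t in enumerate(arrival_rounds, start=1):
        if t > setup:                   # blocks arriving before setup never constrain r
            r = min(r, i / (t - setup))
    return r

# e.g. one block arriving per round starting at round 11, with 35 rounds of setup:
# sustainable_goodput([10 + i for i in range(1, 251)], setup=35) ~= 1.1 blocks/round
```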

Page 64: Scaling Data Servers via Cooperative Caching

System design – Outline

- Components based on BitTorrent
  - Central server (tracker + seed)
  - Users interested in the data
- Central server
  - Maintains a list of nodes
- Each node finds neighbours
  - Contacts the server for this
  - Another option – use gossip protocols
- Exchange data with neighbours
  - Connections are bi-directional

Page 65: Scaling Data Servers via Cooperative Caching

Simulator – Outline

- All nodes have unit bandwidth capacity
  - 500 nodes in most of our experiments
- Each node has 4-6 neighbours
- Simulator is round-based
  - Block exchanges are done at each round
  - The system moves as a whole to the next round
- File divided into 250 blocks
  - A segment is a set of consecutive blocks
  - Segment size = 10
- Maximum possible goodput – 1 block/round

Page 66: Scaling Data Servers via Cooperative Caching

Why 95th percentile?

[Plot: number of nodes (y, 0-500) with goodput of at least x blocks/round (x, 0-0.8); the line near the top marks the 95th percentile]

- A point (x, y) means y nodes have a goodput of at least x
- The 95th percentile is a good indicator of goodput
  - Most nodes have at least that goodput

Page 67: Scaling Data Servers via Cooperative Caching

Feasibility of VoD – What does block/round mean?

- 0.6 blocks/round with 35 rounds of setup time
- Two independent parameters
  - 1 round ≈ 10 seconds
  - 1 block/round ≈ 512 kbps
- Consider 35 rounds a good setup time
  - 35 rounds ≈ 6 min
- Goodput = 0.6 blocks/round
  - Max encoding rate = 0.6 × 512 > 300 kbps
- Size of video = 250/0.6 rounds ≈ 70 min

Page 68: Scaling Data Servers via Cooperative Caching

Naïve scheduling policies

- Segment-Random policy (see the sketch below)
  - Divide the file into segments
  - Fetch blocks in random order within the same segment
  - Download segments in order
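A minimal sketch of the three block-selection policies compared here; `have` is the set of blocks a node already holds, and the helper names are hypothetical.

```python
# Sketch of the three naive block-selection policies; names are hypothetical.
import random

SEGMENT_SIZE = 10   # blocks per segment, as in the simulator

def next_block_random(have: set, n_blocks: int):
    missing = [b for b in range(n_blocks) if b not in have]
    return random.choice(missing) if missing else None

def next_block_sequential(have: set, n_blocks: int):
    for b in range(n_blocks):           # earliest missing block, in playback order
        if b not in have:
            return b
    return None

def next_block_segment_random(have: set, n_blocks: int):
    earliest = next_block_sequential(have, n_blocks)
    if earliest is None:
        return None
    seg_start = (earliest // SEGMENT_SIZE) * SEGMENT_SIZE
    seg = range(seg_start, min(seg_start + SEGMENT_SIZE, n_blocks))
    return random.choice([b for b in seg if b not in have])   # random within the segment
```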

Page 69: Scaling Data Servers via Cooperative Caching

Naïve approaches – Random

[Plot: goodput (blocks/round) vs. setup time (in rounds) for the Random policy]

- Each node fetches a block at random
- Throughput – high, as nodes fetch disjoint blocks
  - More opportunities for block exchanges
- Goodput – low, as nodes do not get blocks in playback order
  - ≈ 0 blocks/round

Page 70: Scaling Data Servers via Cooperative Caching

Naïve approaches – Sequential

[Plot: goodput (blocks/round) vs. setup time (in rounds), Sequential vs. Random; Sequential reaches ≈ 0.2 blocks/round]

- Each node fetches blocks in playback order
- Throughput – low, as there are fewer opportunities for exchange
  - Need to increase this

Page 71: Scaling Data Servers via Cooperative Caching

Naïve approaches – Segment-Random policy

[Plot: goodput (blocks/round) vs. setup time (in rounds), Segment-Random vs. Sequential vs. Random; Segment-Random reaches ≈ 0.4 blocks/round]

- Segment-Random
  - Fetch a random block from the segment that the earliest missing block falls in
- Increases the number of blocks propagating in the system

Page 72: Scaling Data Servers via Cooperative Caching

Network coding – How it helps

- A has blocks 1 and 2
- B gets 1 or 2 with equal probability from A
- C gets 1 in parallel
- If B downloaded 1, the link B–C becomes useless
- Network coding routinely sends 1 ⊕ 2 instead

Page 73: Scaling Data Servers via Cooperative Caching

Network coding – Mechanics

- Coding over blocks B_1, B_2, ..., B_n
- Choose coefficients c_1, c_2, ..., c_n from a finite field
- Generate an encoded block: E_1 = Σ_{i=1}^{n} c_i × B_i (see the sketch below)
- Without n coded blocks, decoding cannot be done
  - Setup time is at least the time to fetch a segment

[Gkantsidis et al. Network Coding for Large Scale Content Distribution. InfoCom '05]
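A hedged sketch of random linear coding within one segment; for simplicity it works over a prime field on integer symbols, whereas real implementations typically operate on bytes over GF(2^8). This is illustrative, not the thesis's actual code.

```python
# Sketch of random linear network coding over a prime field (illustrative).
import random

P = 2_147_483_647   # prime modulus of the coefficient field

def encode(blocks):
    """Return (coefficients, coded block) with coded[j] = sum_i c_i * B_i[j] (mod P)."""
    n, m = len(blocks), len(blocks[0])
    coeffs = [random.randrange(P) for _ in range(n)]
    coded = [sum(c * blk[j] for c, blk in zip(coeffs, blocks)) % P for j in range(m)]
    return coeffs, coded

def decode(packets):
    """Recover the n original blocks from n coded (coeffs, payload) packets by
    Gaussian elimination over the field; with fewer than n packets this fails."""
    n = len(packets[0][0])
    rows = [coeffs[:] + payload[:] for coeffs, payload in packets[:n]]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)       # modular inverse (Fermat)
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]

# blocks = [[1, 2, 3], [4, 5, 6]]
# decode([encode(blocks) for _ in range(2)]) == blocks   (with high probability:
# the random coefficient matrix is almost surely invertible)
```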

Page 74: Scaling Data Servers via Cooperative Caching

Benefits of network coding

[Plot: goodput (blocks/round) vs. setup time (in rounds) for Segment-Netcoding, Segment-Random, Sequential, and Random]

- Goodput
  - ≈ 0.6 blocks/round with network coding
  - 0.4 blocks/round with segment-random

Page 75: Scaling Data Servers via Cooperative Caching

Segment scheduling

- Network coding is good for fetching blocks within a segment, but cannot be used across segments
  - Because decoding requires all of a segment's encoded blocks
- Problem: how to schedule the fetching of segments?
  - Until now, sequential segment scheduling
  - Results in rare segments when nodes arrive at different times
- Segment scheduling algorithm to avoid rare segments
  - Evaluation with a C# prototype

Page 76: Scaling Data Servers via Cooperative Caching

Segment scheduling

- A has 75% of the file
- A flash crowd of 20 Bs joins with empty caches
- The server is shared between A and the Bs
- Throughput to A decreases, as the server is busy serving the Bs
- The initial segments are served repeatedly
- Throughput of the Bs is low because the same segments are fetched from the server

Page 77: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

- 8 segments in the video; A has 75% = 6 segments, the Bs have none
- Green means popularity 2, red means popularity 1
  - Popularity = number of full copies in the system

Page 78: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

- Instead of serving A...

Page 79: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

- The server gives segments to the Bs

Page 80: Scaling Data Servers via Cooperative Caching

Segment scheduling – Problem

[Plot: number of data blocks downloaded vs. time (sec) under naive scheduling, for A and for B1..Bn]

- A's goodput plummets
- Get the best of both worlds – improve both A and the Bs
- Idea: each node serves only the rarest segments in the system

Page 81: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

- When the dst node connects to the src node...
  - Here, dst is a B and src is the server

Page 82: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

- The src node sorts segments in order of popularity
  - Segments 7 and 8 are least popular, at 1
  - Segments 1-6 are equally popular, at 2

Page 83: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

- src considers segments in sorted order one by one and serves dst a segment that is (see the sketch below)
  - either completely available at src, or
  - the first segment required by dst
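A sketch of the segment-selection rule just described, assuming segment popularities are known (tracked centrally at the server, as in the implementation); the function and parameter names are hypothetical.

```python
# Sketch of rarest-first segment selection; all names are hypothetical.
def choose_segment(src_has, dst_has, popularity, n_segments):
    """src_has/dst_has: seg id -> fraction of the segment held (1.0 = complete);
    popularity: seg id -> number of full copies in the system."""
    first_needed = next((s for s in range(n_segments) if dst_has.get(s, 0.0) < 1.0), None)
    # Walk segments from least to most popular; serve the first one that is either
    # fully available at the source or the earliest segment dst still needs.
    for seg in sorted(range(n_segments), key=lambda s: popularity.get(s, 0)):
        if dst_has.get(seg, 0.0) >= 1.0:
            continue                                  # dst already has it
        if src_has.get(seg, 0.0) >= 1.0 or seg == first_needed:
            return seg
    return None
```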

Page 84: Scaling Data Servers via Cooperative Caching

Segment scheduling – Algorithm

- The server injects blocks from the segment needed by A into the system
  - Avoids wasting bandwidth on serving the initial segments multiple times

Page 85: Scaling Data Servers via Cooperative Caching

Segment scheduling – Popularities

- How does src figure out popularities?
- Centrally available at the server
  - Our implementation uses this technique
- Each node maintains popularities for segments
  - Could use a gossiping protocol for aggregation

Page 86: Scaling Data Servers via Cooperative Caching

Segment scheduling

[Plot: number of data blocks downloaded vs. time (sec), naive vs. rarest-segment scheduling, for A and for B1..Bn]

- Note that the goodput of both A and the Bs improves

Page 87: Scaling Data Servers via Cooperative Caching

Conclusions

- Problem: designing scalable data-intensive applications
  - Enabled low-end servers to scale to large numbers of users
- Shark: scales a file server for read-heavy workloads
- RedCarpet: scales a video server for near-Video-on-Demand
- Exploited P2P for file systems, solved the security issues
- Techniques to improve scalability
  - Cross file(system) sharing
  - Topology management – data-based, locality awareness
  - Scheduling of chunks – network coding