45
Content Distribution Networks CPE 401 / 601 Computer Network Systems d from Ravi Sundaram, Janardhan R. Iyengar, and others

Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Embed Size (px)

Citation preview

Page 1: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Content Distribution Networks

CPE 401 / 601Computer Network Systems

Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Page 2: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Content and Internet TrafficInternet traffic:

1. Shifts seismically (emailFTPWebP2Pvideo)2. Has many small/unpopular and few large/popular flows

Zipf popularity distribution, 1/k Shows up as a line on log-log plot

Page 3: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Server Farms and Web Proxies (1)

Server farms enable large-scale Web servers: Front-end load-balances requests over servers Servers access the same backend database

Page 4: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Server Farms and Web Proxies (2)

Proxy caches help organizations to scale the Web Caches server content over clients for performance implements organization policies (e.g., access)

Page 5: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

HTTP Caching

Clients often cache documents When should origin be checked for changes? Every time? Every session? Date?

HTTP includes caching information in headers HTTP 0.9/1.0 used: “Expires: <date>”; “Pragma: no-cache” HTTP/1.1 has “Cache-Control”

• “No-Cache”, “Private”, “Max-age: <seconds>”• “E-tag: <opaque value>”

If not expired, use cached copy If expired, use condition GET request to origin

“If-Modified-Since: <date>”, “If-None-Match: <etag>” 304 (“Not Modified”) or 200 (“OK”) response

5

Page 6: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Web Proxy Caches

User configures browser: Web accesses via cache

Browser sends all HTTP requests to cache Object in cache: cache

returns object Else: cache requests object

from origin, then returns to client

6

client

Proxyserver

client

HTTP request

HTTP request

HTTP response

HTTP response

HTTP request

HTTP response

origin server

origin server

Page 7: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Caching Example (1)Assumptions Average object size = 100K bits Avg. request rate from browsers to

origin servers = 20/sec Delay from institutional router to

any origin server and back to router = 2 sec

Consequences Utilization on LAN = 20% Utilization on access link = 100% Total delay = Internet delay +

access delay + LAN delay = 2 sec + minutes + milliseconds

7

originservers

public Internet

institutionalnetwork 10 Mbps LAN

1.5 Mbps access link

Page 8: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Caching Example (2)Possible Solution Increase bandwidth of access

link to, say, 10 Mbps Often a costly upgrade

Consequences Utilization on LAN = 20% Utilization on access link = 20% Total delay = Internet delay +

access delay + LAN delay = 2 sec + milliseconds

8

originservers

public Internet

institutionalnetwork 10 Mbps LAN

10 Mbps access link

Page 9: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Caching Example (3)Install Cache Support hit rate is 60%

Consequences 60% requests satisfied almost

immediately (say 10 msec) 40% requests satisfied by origin Utilization of access link down to

53%, yielding negligible delays Weighted average of delays = .6*2 s + .4*10 ms < 1.3 s

9

originservers

public Internet

institutionalnetwork 10 Mbps LAN

1.5 Mbps access link

institutionalcache

Page 10: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

When a single cache isn’t enough What if the working set is > proxy disk?

Cooperation!

A static hierarchy Check local If miss, check siblings If miss, fetch through parent

Internet Cache Protocol (ICP) ICPv2 in RFC 2186 (& 2187) UDP-based, short timeout

10

public Internet

Parent

web cache

Page 11: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Problems with discussed approaches:Server farms and Caching proxies

• Server farms do nothing about problems due to network congestion, or to improve latency issues due to the network

• Caching proxies serve only their clients, not all users on the Internet

• Content providers (say, Web servers) cannot rely on existence and correct implementation of caching proxies

• Accounting issues with caching proxies. For instance, www.cnn.com needs to know the number of hits to the webpage for advertisements displayed on the webpage

Page 12: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Problems

Significant fraction (>50% ) of HTTP objects un-cacheable

Sources of dynamism? Dynamic data: Stock prices, scores, web cams CGI scripts: results based on passed parameters Cookies: results may be based on passed data SSL: encrypted data is not cacheable Advertising / analytics: owner wants to measure # hits

• Random strings in content to ensure unique counting

But…most dynamic content is small, while static content is large (images, video, .js, .css, etc.)

12

Page 13: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Content Distribution Networks (CDNs) Content providers are CDN customers

Content replication CDN company installs thousands of

servers throughout Internet In large datacenters or, close to users

CDN replicates customers’ content When provider updates content,

CDN updates servers

13

origin server

in North America

CDN distribution node

CDN server

in S. America CDN server

in Europe

CDN server

in Asia

Page 14: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Internet

ContentProviders

EndUsers

The Web:Simple on the Outside…

Page 15: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

NAP

NAP

UUNet

Qwest

AOL

Network Providers

ContentProviders

EndUsers

PeeringPoints

…But Problematic on the Inside

Page 16: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Why does my click not work

Latency - Browser takes a long time to load the page

Packet Loss - Browser hangs, user needs to hit refresh

Jitter - Streams are jerky Server load - Browser connects but

does not fully load the page Broken/missing content

Page 17: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

The Akamai SolutionServers

at Network EdgeContent

ProvidersEnd

Users

NAP

NAP

Page 18: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Benefits of CDNs

Improved end-user experience reduce latency reduce loss reduce jitter

Reduced network congestion Increased scalability Improved fault-tolerance Reduced vulnerability Reduced costs

Page 19: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

• Caches are used by ISPs to reduce bandwidth consumption, CDNs are used by content providers to improve quality of service to end users

• Caches are reactive, CDNs are proactive

• Caching proxies cater to their users (web clients) and not to content providers (web servers),

• CDNs cater to the content providers (web servers) and clients

• CDNs give control over the content to the content providers, caching proxies do not

CDN vs. Caching Proxies

Page 20: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

CDNs – Content Delivery Networks (1)

CDNs scale Web servers by having clients get content from a nearby CDN node (cache)

Page 21: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Content Delivery Networks (2)

Directing clients to nearby CDN nodes with DNS: Client query returns local CDN node as response Local CDN node caches content for nearby clients and reduces load on

the origin server

Page 22: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Content Delivery Networks (3)Origin server rewrites pages to serve content via CDN

Page that distributes content via CDN

Traditional Web page on server

Page 23: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Surrogate Surrogate

Request Routing

Infrastructure

Distributionand

Accounting Infrastructure

CDN

CDN Architecture

Origin Server

Client Client

Page 24: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

• Content Delivery Infrastructure: Delivering content to clients from surrogates

• Request Routing Infrastructure: Steering or directing content request from a client to a suitable surrogate

• Distribution Infrastructure: Moving or replicating content from content source (origin server, content provider) to surrogates

• Accounting Infrastructure: Logging and reporting of distribution and delivery activities

CDN Components

Page 25: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Server Interaction with CDN

DistributionInfrastructure

1

1. Origin server pushes new content to CDN OR CDN pulls content from origin server

Accounting Infrastructure

2

2. Origin server requests logs and other accounting info from CDN OR CDN provides logs and other accounting info to origin server

CDN

Origin Server

www.cnn.com

Page 26: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Request Routing

Infrastructure

Client Interaction with CDN

1

1. Hi! I need www.cnn.com/sochi

2

2. Go to surrogate delaware.cnn.akamai.com

3

3. Hi! I need content /sochi

Q:How did the CDN choose the Delaware surrogate over the California surrogate ?

Client

Surrogate(DE)

Surrogate(CA)

CDNcalifornia.cnn.akamai.com

delaware.cnn.akamai.com

Page 27: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Content Distribution Networks & Server Selection Replicate content on many servers

Challenges How to replicate content Where to replicate content How to find replicated content How to choose among known replicas How to direct clients towards replicas

27

Page 28: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Server Selection

Which server? Lowest load: to balance load on servers Best performance: to improve client performance

• Based on Geography? RTT? Throughput? Load? Any alive node: to provide fault tolerance

How to direct clients to a particular server? As part of routing: anycast, cluster load balancing As part of application: HTTP redirect As part of naming: DNS

28

Page 29: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Trade-offs between approaches

Routing based (IP anycast) Pros: Transparent to clients, circumvents many routing issues Cons: Little control, complex, scalability

Application based (HTTP redirects) Pros: Application-level, fine-grained control Cons: Additional load and RTTs, hard to cache

Naming based (DNS selection) Pros: Well-suitable for caching, reduce RTTs Cons: Request by resolver not client, request for domain not

URL, hidden load factor of resolver’s population• Much of this data can be estimated “over time”

29

Page 30: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

DNS based Request-Routing

• Common due to the ubiquity of DNS as a directory service

• Specialized DNS server inserted in DNS resolution process

• DNS server is capable of returning a different set of A, NS or CNAME records based on policies/metrics

Page 31: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

DNS based Request-Routing

Akamai DNS

DN

S q

uery

:w

ww

.cnn

.com

DN

S r

espo

nse:

A 1

45.1

55.1

0.15

Sess

ion

local DNS server (louie.udel.edu)128.4.4.12

DNS query:www.cnn.com

DNS response:A 145.155.10.15

www.cnn.com

Surrogate145.155.10.15

Surrogate58.15.100.152

AkamaiCDN

merlot.cis.udel.edu

128.4.30.15

delaware.cnn.akamai.com

california.cnn.akamai.com

Q:How does the Akamai DNS know which surrogate is closest ?

Page 32: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

DNS based Request-Routing

DN

S q

uery

DN

S r

espo

nse

Sess

ion

Akamai DNS

www.cnn.com

Surrogate

Surrogate

AkamaiCDN

merlot.cis.udel.edu

128.4.30.15

local DNS server (louie.udel.edu)

128.4.4.12

DNS query

DNS response

Measure

to

Client D

NS

Measure to Client DNS

Measurement results

Measurem

ent re

sults

Mea

surem

ents

Measurements

Page 33: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

What is the Mapping Problem

Problem of directing requests to servers so as to optimize end-user experience reduce latency reduce loss reduce jitter

Assumption - servers are fine Applicable to 2 mirrors or 1000s of

Akamai locations

Page 34: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Server Selection Metrics

• Network Proximity (Surrogate to Client): - Network hops (traceroute)- Internet mapping services (NetGeo, IDMaps)- …

• Surrogate Load: - Number of active TCP connections- HTTP request arrival rate- Other OS metrics- …

• Bandwidth Availability

• …

Page 35: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Attempt

Measure which is closer Closeness changes over time

Measure frequently Bothers people Too many to do

~500,000 unique nameservers on any given day10 sec per measurement cycle

Page 36: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Idea

Topology relatively static changes in BGP time order of hours if not days

Congestion dynamic changes in round-trip time order of milliseconds

Page 37: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Set cover

Let sets represent proxy points Let elements represent nameservers Find minimum collection of proxy points

covering nameserversX covers 1, 2, 3 and 4

X

1 2 3 4

Page 38: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Topology Discovery

At each mirror maintain list of partial paths to nameservers

At each epoch extend paths by 1, in randomized fashion, and exchange with other mirror

If the two (partial) paths to a namerver have intersected then declare that nameserver done.

If path has reached forbidden IP then wait Use pair of proxies in case of failure

Page 39: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Topology Discovery 500,000 nameservers

reduced to

90,000 proxy points (clusters)

Problem - Still too many measurements to do. 90,000 measurements every 10s with 32B packets requires a few Mbps per mirror. Solution - Importance based sampling• 7,000 account for 95% end-user load!

Maps built every 10s

Page 40: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

Value of a CDN

• Scale: Aggregate infrastructure size

• Reach: Diversity of content locations (diverse placement of surrogates)

• Request routing efficiency, delivery techniques

Page 41: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

How Akamai Works

Clients fetch html document from primary server E.g. fetch index.html from cnn.com

URLs for replicated content are replaced in HTML E.g. <img src=“http://cnn.com/af/x.gif”> replaced with

<img src=http://a73.g.akamai.net/7/23/cnn.com/af/x.gif> Or, cache.cnn.com, and CNN adds CNAME (alias) for

cache.cnn.com a73.g.akamai.net

Client resolves aXYZ.g.akamaitech.net hostname Maps to a server in one of Akamai’s clusters

42

Page 42: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

How Akamai Works Akamai only replicates static content

At least, simple version. Akamai also lets sites write code that run on their servers, but that’s a pretty different beast

Modified name contains original file name Akamai server is asked for content

1. Checks local cache2. Check other servers in local cluster3. Otherwise, requests from primary server and cache file

CDN is a large-scale, distributed network Akamai has ~25K servers spread over ~1K clusters world-wide Why do you want servers in many different locations? Why might video distribution architectures be different?

43

Page 43: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

How Akamai Works Root server gives NS record for akamai.net

This nameserver returns NS record for g.akamai.net Nameserver chosen to be in region of client’s name server TTL is large

g.akamai.net nameserver chooses server in region Should try to chose server that has file in cache Uses aXYZ name and hash TTL is small Small modification to before:

• CNAME cache.cnn.com cache.cnn.com.akamaidns.net• CNAME cache.cnn.com.akamaidns.net a73.g.akamai.net

44

Page 44: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

How Akamai Works – Already Cached

End-user

45

cnn.com (content provider) DNS root server Akamai server

1 2

Akamai high-level DNS server

Akamai low-level DNS server

Nearby

hash-chosenAkamai

server

7

8

9

10

GET index.html

GET /cnn.com/foo.jpg

Page 45: Content Distribution Networks CPE 401 / 601 Computer Network Systems Modified from Ravi Sundaram, Janardhan R. Iyengar, and others

How Akamai Works

End-user

46

cnn.com (content provider) DNS root server Akamai server

1 2 3

4

Akamai high-level DNS server

Akamai low-level DNS server

Nearby

hash-chosenAkamai

server

11

67

8

9

10

GET index.html

GET /cnn.com/foo.jpg

12

GET foo.jpg

5