Richard T. B. Ma
School of Computing
National University of Singapore
Content Delivery Networks
CS 4226: Internet Architecture
Motivation
Serving web content from a single location raises problems of:
- scalability: the "flash crowd" problem
- reliability
- performance
Key ideas: cache content and serve requests from multiple servers at the network edge
- reduce demand on the site's infrastructure
- provide faster service to users
Web cache and caching proxy
Replication and load balancing
The middle mile problem
- The last mile problem is largely solved by high levels of global broadband penetration, but this imposes a new question of scale driven by demand
- The first mile is easy in terms of performance and reliability
- Traffic gets stuck in the middle mile
Inside the Internet
[Figure: Internet topology showing Tier 1 ISPs interconnected at IXPs, Tier 2 ISPs attached beneath them, and large content distributors peering directly with ISPs and IXPs]
The middle mile problem: potential solutions
Stuck in the middle, the candidate solutions are:
- "big data center" CDNs
- highly distributed CDNs
- what about P2P?
The challenge
The "fat file" paradox: although bits travel at the speed of light,
- the distance between user and server is critical
- latency and throughput are coupled due to TCP
Distance (server to user) | Network RTT | Packet loss | Throughput | 4GB DVD download time
Local: <100 mi. | 1.6 ms | 0.6% | 44 Mbps (high-quality HDTV) | 12 min.
Regional: 500-1,000 mi. | 16 ms | 0.7% | 4 Mbps (basic HDTV) | 2.2 hrs.
Cross-continent: ~3,000 mi. | 48 ms | 1.0% | 1 Mbps (SD TV) | 8.2 hrs.
Multi-continent: ~6,000 mi. | 96 ms | 1.4% | 0.4 Mbps (poor) | 20 hrs.
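The latency/throughput coupling in the table can be sketched with the well-known Mathis approximation for steady-state TCP throughput, MSS / (RTT * sqrt(loss)). This is an upper bound, so the numbers it gives are more optimistic than the measured values in the table, but the trend (throughput falling with distance) is the same. All constants here are illustrative.

```python
import math

MSS_BITS = 1460 * 8  # a typical TCP maximum segment size, in bits

def mathis_throughput_mbps(rtt_s: float, loss: float) -> float:
    """Approximate steady-state TCP throughput (Mathis formula):
    throughput <= MSS / (RTT * sqrt(loss))."""
    return MSS_BITS / (rtt_s * math.sqrt(loss)) / 1e6

def download_hours(size_gb: float, mbps: float) -> float:
    """Time to transfer size_gb gigabytes at mbps megabits/second."""
    return size_gb * 8e3 / mbps / 3600

# Regional row of the table: 16 ms RTT, 0.7% loss (bound comes out higher
# than the 4 Mbps measured in practice)
print(round(mathis_throughput_mbps(0.016, 0.007), 1))
# 4GB DVD at the table's 4 Mbps regional throughput
print(round(download_hours(4, 4), 1))  # 2.2 (hours)
```

The download-time column follows directly: 4 GB at 4 Mbps is 32,000 megabits / 4 Mbps = 8,000 s, about 2.2 hours, matching the table.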
Major CDNs (by ‘15 revenue)
Limelight $174M $120M of CDN
Level 3 $8B $235M of CDN
tier-1 transit provider
Akamai $1.03B $700M of CDN
Amazon $6B $1.8B of CDN, but
big % on storage
cloud provider
EdgeCast $180M $125M of CDN
Fastly $60M $9M of CDN
Rest of smaller regional CDNs (MaxCDN, CDN77 etc.) $100M combined.
Major CDNs (by ‘15 revenue)
Highwinds $135M $95M of CDN
ChinaCache $270M $81M of CDN
also a cloud provider
References
- Cheng Huang, Angela Wang, Jin Li and Keith W. Ross, "Measuring and Evaluating Large-Scale CDNs," Internet Measurement Conference, 2008.
- Erik Nygren, Ramesh K. Sitaraman and Jennifer Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," ACM SIGOPS Operating Systems Review 44(3), July 2010.
How can we understand a CDN?
We don't know a CDN's internal structure, but we can "infer" it via a measurement approach.
We know that CDNs rely on a DNS trick. For example:
- an end user types www.youtube.com
- the IP address is resolved via the local DNS (LDNS) server
- the LDNS queries YouTube's authoritative DNS
- YouTube uses a CDN if the answer is a CNAME like a1105.b.akamai.net or move.vo.llnwd.net
- the LDNS then queries the CNAME's authoritative DNS server and gets the IP address of the content server
DNS records
DNS: a distributed database storing resource records (RRs)
RR format: (name, value, type, ttl)
- Type=A (Address): name is a hostname; value is its IP address
- Type=NS (Name Server): name is a domain (e.g., foo.com); value is the hostname of the authoritative name server for this domain
- Type=CNAME (Canonical NAME): name is an alias for some "canonical" (real) name, e.g., www.ibm.com is really servereast.backup2.ibm.com; value is the canonical name
- Type=MX (Mail eXchange): value is the name of the mail server associated with name
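The CNAME-then-A resolution path described above can be sketched over a toy RR table. The mapping and IP below are made up for illustration; TTLs are omitted.

```python
# Toy RR table: (name, type) -> value. Hypothetical records mimicking
# a CDN-hosted hostname resolved through a CNAME chain.
RRS = {
    ("www.youtube.com", "CNAME"): "a1105.b.akamai.net",  # illustrative alias
    ("a1105.b.akamai.net", "A"): "96.6.0.10",            # illustrative IP
}

def resolve(name: str, max_hops: int = 5) -> str:
    """Resolve name to an IP address, following CNAME aliases."""
    for _ in range(max_hops):
        if (name, "A") in RRS:
            return RRS[(name, "A")]     # reached the canonical A record
        name = RRS[(name, "CNAME")]     # follow the alias one hop
    raise RuntimeError("CNAME chain too long")

print(resolve("www.youtube.com"))  # 96.6.0.10
```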
Content Server Assignment
The returned content server will be close to the issuing local DNS (LDNS) server
Measurement Framework
Assumptions: the CDN chooses a nearby content server based on the location of the LDNS that originates the query; the same LDNS might get different content servers for the same query at different times.
1. Determine all the CNAMEs of a CDN
2. Query a large number of LDNSs all over the world, at different times of the day, for all of the CNAMEs found in step 1
Finding CNAMEs and LDNSs
Find all the CNAMEs of a CDN use over 16 million web hostnames
a DNS query tells if it resolves to a CNAME
whether the CNAME belongs to the target CDN
thousands of CNAMEs for Akamai and Limelight
Locate a large # of distributed LDNSs need open recursive DNS servers
use over 7 million unique client IP addresses and over 16 million web hostnames
reverse DNS lookup and test trial DNS queries
Open recursive DNS servers
many different DNS servers map into same IP addresses
obtain 282,700 unique open recursive DNS servers
Measurement Platform
300 PlanetLab nodes, each issuing 3 DNS queries per second; the measurement takes more than one day.
The Akamai Network
Type | CNAME pattern | # of CNAMEs | # of IPs | Usage
(a) | *.akamai.net | 1,964 | ~11,500 | conventional content distribution
(b) | *.akadns.net | 757 | a few per CNAME | load balancing for customers who have their own server networks
(c) | *.akamaiedge.net | 539 | ~36,000 | dynamic content distribution / secure services

- Type (a): returns 2 IP addresses, different for different locations; hundreds of IPs sit behind each CNAME, ~11,500 content servers in total
- Type (c): returns only 1 IP address, with 20-100 IPs per CNAME; the study guesses that virtualization is used to provide isolated environments
The Akamai Network
- ~27K content servers, ~6K of which also run DNS
- 60% in the US; 90% in the top 10 countries
- flat distribution across ISPs: only 15% in the top 7
The Limelight Network
Easier to measure, as Limelight is its own Autonomous System (AS): obtain the IP addresses of the AS directly; only ~4K servers.
Measuring performance
Two metrics:
- availability: how reliable are the CDN servers?
- delay: how fast can content be retrieved?
The performance results are controversial:
- do the metrics sufficiently match overall system performance goals?
- how does a performance metric map to a specific customer's perception of performance?
- both Akamai and Limelight issued statements to "correct" the research results
Availability
- monitor all servers for 2 months, pinging once every hour
- if a server does not respond for 2 consecutive hours, it is considered "down"
- but does a "down" server necessarily affect availability?
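The study's downtime heuristic can be sketched as a pass over hourly probe results: a server counts as down only inside a run of at least 2 consecutive missed probes, so an isolated missed ping is ignored.

```python
def downtime_hours(pings: list[bool]) -> int:
    """Count hours spent 'down', where down means being inside a run of
    >= 2 consecutive missed hourly probes (the paper's heuristic)."""
    down, run = 0, 0
    for ok in pings:
        run = 0 if ok else run + 1
        if run >= 2:
            # when the run first reaches 2, its first hour counts too
            down += 2 if run == 2 else 1
    return down

# 6 hourly probes: one isolated miss (ignored), then a 2-hour outage
print(downtime_hours([True, False, True, False, False, True]))  # 2
```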
Delay
Possible explanations for the observed differences: the number of content servers? the optimality (for delay) of routing?
A more detailed delay comparison follows in the study.
Akamai's statement
- availability cannot be judged from server uptime alone
- Akamai's CDN has more servers, but that does not necessarily make it harder to maintain
- the use of open resolvers misses many Akamai servers, hence over-estimating delay in Akamai's case
- Akamaiedge is not a "virtualized network"
Limelight's statement
- overall performance can't be represented by just two dimensions (availability and delay)
- server downtime does not necessarily affect availability; suggested a way to measure it, and claimed availability in the 99.9% range
- the RTT of a single packet can't represent delay for objects; suggested measuring with different object sizes
- a more authoritative performance study should be based on customer trials
Akamai vs. Limelight
Metric | Akamai | Limelight
# of servers | ~27K | ~4K
# of clusters | 1,158 | 18
95th-percentile delay | ~100 ms | ~200 ms
average delay | ~30 ms | ~80 ms
penetration in ISPs | high | low
cost | high | low
complexity | high | low
approach | highly distributed | "big data center"
Facts about Akamai (2014-2015)
- a CDN company that evolved from MIT research to invent better ways to deliver Internet content and tackle the "flash crowd" problem
- earned over US$1B in revenue in 2015, 25% of the whole CDN market
- runs on 150,000 servers in 1,200 networks across 92 countries
Internet delivery challenge
- the largest network carries only 5% of access traffic
- over 650 networks are needed to reach 90% of access traffic
- a "long tail" distribution of traffic across networks
[Figure: % of access traffic from the top networks]
Other challenges
- peering point congestion: little economic incentive to invest in the middle mile
- inefficient routing protocols: how does BGP work?
- unreliable networks, e.g., de-peering between ISPs
- inefficient communication protocols
- scalability
- application limitations and the slow rate of protocol adoption
Delivery network as a virtual network
Works as an overlay:
- compatible with the existing Internet
- transparent to users
- adaptive to changes
The untaken clean-slate alternative suffers from the adoption problem and high development cost.
The Akamai Network at ~2010
- a large distributed system consisting of ~60,000 servers in ~1,000 networks across ~70 countries
- can also be regarded as multiple delivery networks for different types of content: static web, streaming media, dynamic applications
Anatomy of a Delivery Network
- edge servers: globally deployed in thousands of sites
- mapping system: assigns requests to edge servers using historic data and current system conditions
- transport system: moves content from origin to edge; may cache data
- communication and control system: disseminates status and control messages, and configuration updates
- data collection and analysis system: collects and processes data (e.g., logs); used for monitoring, analytics, billing, etc.
- management portal: gives customers visibility and fine-grained control, and pushes updates to edge servers
System Design Principles
Goals:
- scalable and fast data collection and management
- safe, quick and consistent configuration updates
- enterprise visibility and fine-grained control
Assumption: a significant number of failures (machine, rack, cluster, connectivity, network) is expected to be occurring at all times.
Philosophy: failures are normal, and the delivery network must operate seamlessly despite them.
Principles:
- design for reliability: ~100% end-to-end availability, with full redundancy and fault-tolerance protocols
- design for scalability: handle large volumes of traffic, data, and control messages
- limit the need for human management: automation is needed to scale and to respond to faults
- design for performance: improve bottlenecks, response time, cache hit rate, resource utilization, and energy efficiency
Streaming and content delivery
Architectural considerations of CDNs for web content and streaming media.
Principle: minimize long-haul communication through the middle-mile bottleneck of the Internet; this is made feasible by pervasive, distributed architectures where servers sit as "close" to users as possible.
Key question: how distributed does it need to be?
How distributed does it need to be?
Akamai's approach: deploy server clusters not only in Tier 1 and Tier 2 data centers, but also at network edges, in thousands of locations, at the cost of more complexity.
Reasons:
- Internet traffic is highly fragmented, e.g., the top 45 networks account for only half of access traffic
- the distance between server and users is the bottleneck for video throughput, due to TCP
- P2P is not good for management and control
Video-grade scalability
Content providers' problem:
- YouTube receives 2 billion views per day
- video requires high rates, e.g., 2-40 Mbps for HDTV
- must scale with user requests
- over-provisioning to absorb on-demand spikes incurs high capital and operational costs
Akamai's throughput was 3.45 Tbps in April 2010; ~50-100 Tbps of throughput is now needed.
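A back-of-the-envelope calculation shows why the required throughput lands in the tens of terabits per second. The viewer count and bitrate below are hypothetical round numbers, not figures from the lecture:

```python
def aggregate_tbps(concurrent_viewers: int, mbps_per_stream: float) -> float:
    """Aggregate throughput (Tbps) to serve all viewers simultaneously."""
    return concurrent_viewers * mbps_per_stream / 1e6  # Mbps -> Tbps

# e.g., 10 million concurrent viewers each pulling a 5 Mbps HD stream
print(aggregate_tbps(10_000_000, 5))  # 50.0 (Tbps)
```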
Akamai’s challenges
need consider throughput along entire path
bottlenecks everywhere original data centers, peering points, network’s
backhaul capacity, ISP’s upstream connectivity
a data center’s egress capacity has little impact on real throughput to end users
even 50 well-provisioned, connected data centers cannot achieve ~100 Tbps
IP-layer multicast does not work in practice, needs its own transport system
Transport system for content
Tiered content distribution target for “cold” or infrequently-accessed
efficiency cache strategy with high hit rates
well-provisioned and highly connected “parent” clusters are utilized
original servers are offloaded in the high 90’s
helpful in flash crowds for large objects
Tiered distribution
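The tiered idea can be sketched as a two-level cache: an edge miss goes to a parent cluster, and only a parent miss reaches the origin. The class and the unbounded dict caches here are illustrative simplifications (a real system would evict entries and shard by content).

```python
class TieredCache:
    """Edge cache -> parent cluster -> origin, counting origin hits."""

    def __init__(self, origin: dict[str, str]):
        self.edge: dict[str, str] = {}
        self.parent: dict[str, str] = {}
        self.origin = origin
        self.origin_hits = 0  # requests that actually reached the origin

    def get(self, key: str) -> str:
        if key in self.edge:               # edge hit: fastest path
            return self.edge[key]
        if key not in self.parent:         # parent miss: fetch from origin
            self.origin_hits += 1
            self.parent[key] = self.origin[key]
        self.edge[key] = self.parent[key]  # fill the edge on the way back
        return self.edge[key]

cdn = TieredCache(origin={"/video.mp4": "bytes..."})
for _ in range(100):
    cdn.get("/video.mp4")
print(cdn.origin_hits)  # 1 -> the origin is offloaded for 99 of 100 requests
```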
Transport system for streaming
An overlay network for live streaming once a stream is captured & encoded, it’s sent
to a cluster of servers called the entrypoint
automatic failover among multiple entrypoints
within an entrypoint cluster, distributed leader election is used to tolerate machine failure
publish-subscribe (pub-sub) model: • entrypoint publishes available streams, and each edge
server subscribes to streams that it requires
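A minimal pub-sub sketch of that last bullet (illustrative, not Akamai's actual protocol): the entrypoint delivers stream chunks only to the edge servers that subscribed, so unpopular streams cost nothing downstream.

```python
from collections import defaultdict

class Entrypoint:
    """Publishes live-stream chunks to subscribed edge servers."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # stream name -> callbacks

    def subscribe(self, stream: str, edge_callback):
        self.subscribers[stream].append(edge_callback)

    def publish(self, stream: str, chunk: bytes):
        for deliver in self.subscribers[stream]:  # fan out to edges
            deliver(chunk)

received = []
ep = Entrypoint()
ep.subscribe("live/event1", received.append)  # one edge server subscribes
ep.publish("live/event1", b"chunk-0")         # delivered to that edge
ep.publish("live/other", b"chunk-x")          # no subscribers: not sent
print(received)  # [b'chunk-0']
```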
Transport system for streaming
An overlay network for live streaming reflectors act as intermediaries between the
entrypoints and the edge clusters
scaling: enables rapidly replicating a stream to a large number of edge clusters to serve popular events
quality: provides alternate paths between each entrypoint and edge cluster, enhancing end-to-end quality via path optimization
can use multiple link-disjoint paths
need efficient algorithms for path selection
Application delivery network
Target for dynamic web application and non-cacheable content
Two complementary approaches speed up long-haul communications by using the
Akamai platform as a high-performance overlay network, i.e., the transport system
pushes application logic from the origin server out to the edge of the Internet
Transport system for app acceleration
Path optimization overcome BGP, collect topology & performance
data from mapping system
dynamically select potential intermediate nodes for a particular path, or multiple paths
~30-50% performance improvement by overlay
used also for packet loss reduction
Middle East cable cut in 2008
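The core idea of overlay path selection can be sketched as one-hop detour routing: compare the measured RTT of the direct (BGP-chosen) path with routing via each candidate intermediate node. The node names and RTTs below are made up.

```python
def best_overlay_path(rtt, src, dst, intermediates):
    """rtt: dict[(a, b)] -> measured RTT in ms.
    Return (path, total_rtt) for the fastest direct or one-hop path."""
    best = ([src, dst], rtt[(src, dst)])          # direct path baseline
    for mid in intermediates:
        total = rtt[(src, mid)] + rtt[(mid, dst)]  # detour via mid
        if total < best[1]:
            best = ([src, mid, dst], total)
    return best

# The direct A->B path is congested; the overlay finds a faster detour via C.
rtt = {("A", "B"): 120, ("A", "C"): 40, ("C", "B"): 50}
print(best_overlay_path(rtt, "A", "B", ["C"]))  # (['A', 'C', 'B'], 90)
```

A real system would also weigh loss and capacity, not just RTT, and could keep several disjoint paths for redundancy, as the streaming overlay does.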
Transport system for app acceleration
Transport protocol optimizations proprietary transport-layer protocol
use pools of persistent connections to eliminate connection setup and teardown overhead
optimal TCP window sizing with global knowledge
intelligent retransmission after packet loss
Application optimizations parse HTML and prefetch embedded content
content compression reduces # of roundtrips
implement app logic at edge, e.g., authentication
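The prefetching bullet can be sketched with the standard-library HTML parser: scan a page for embedded resources (images, scripts, stylesheets) that an edge server could start fetching before the client requests them. The tag/attribute list is a simplification.

```python
from html.parser import HTMLParser

class EmbeddedURLs(HTMLParser):
    """Collect URLs of embedded resources worth prefetching."""
    PREFETCH = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = self.PREFETCH.get(tag)
        if attr:
            self.urls += [v for k, v in attrs if k == attr and v]

page = '<html><img src="/a.png"><script src="/b.js"></script></html>'
p = EmbeddedURLs()
p.feed(page)
print(p.urls)  # ['/a.png', '/b.js']
```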
Distributing applications to the edge
EdgeComputing Services of Akamai
E.g., deploy and execute request-driven Java J2EE apps on Akamai’s edge servers
Not all apps can be run entirely on the edge
Some use cases Content aggregation/transformation
Static databases
Data collection
Complex applications
Other platform components
- edge server platform
- mapping system
- communications and control system
- data collection and analysis system
- additional systems and services
Edge server platform
Functionalities are controlled by metadata:
- origin server location and response to failures
- cache control and indexing
- access control
- header alteration (HTTP)
- EdgeComputing
- performance optimization
Mapping system
Global traffic director uses historic and real-time data about the
health of the Akamai network and the Internet
objective: create maps that are used to direct traffic on the Akamai network in a reliable, efficient, and high performance manner
a fault-tolerant distributed platform: run in multiple independent sites and leader-elect based on the current health status of each site
two parts: scoring system + real-time mapping
Mapping system
Scoring system creates the current Internet topology
collects/processes data: ping, BGP, traceroute
monitors latency, loss, connectivity frequently
Real-time mapping creates the actual maps used to direct end
users’ requests to the best edge servers
selects intermediates for tiered distribution and the overlay network
first step: map to cluster • Based on scoring system info, updated every minute
second step: map to server • Based on content locality, load changes, and etc.
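The two-step assignment can be sketched as: pick the best-scoring cluster for the requesting LDNS, then pick a server inside it by hashing the content URL, so the same object keeps landing on the same server (content locality). Cluster names, scores, and IPs are hypothetical.

```python
import hashlib

CLUSTERS = {  # cluster -> (score for this LDNS, server list); illustrative
    "sg-edge": (0.9, ["10.0.0.1", "10.0.0.2"]),
    "us-east": (0.4, ["10.1.0.1", "10.1.0.2"]),
}

def map_request(url: str) -> str:
    # Step 1: map to the cluster with the best score from the scoring system.
    cluster = max(CLUSTERS, key=lambda c: CLUSTERS[c][0])
    # Step 2: map to a server within the cluster by content hash, so
    # repeated requests for the same object hit the same cache.
    servers = CLUSTERS[cluster][1]
    idx = int(hashlib.sha1(url.encode()).hexdigest(), 16) % len(servers)
    return servers[idx]

print(map_request("/video.mp4") in CLUSTERS["sg-edge"][1])  # True
```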
Communications and control system
Real-time distribution of status and control information small real-time message throughout the net
solution: pub-sub model
Point-to-point RPC and web services
Dynamic configuration updates quorum-based replication … another whole paper
Key management infrastructure
Software/machine config. management
Data collection and analysis system
Log collection over 10 million HTTP/sec 100TB/day
compression, aggregation, pipeline and filter …
reporting and billing
Real-time data collection and monitoring a distributed real-time relational database that
supports SQL query … another whole paper
Analytics and Reporting enable customers to view traffic & performance
uses log and Query system, & e.g., MapReduce