Richard T. B. Ma
School of Computing
National University of Singapore
Content Delivery Networks
CS 4226: Internet Architecture
Motivation
Serving web content from a single location raises problems of:
- scalability: the "flash crowd" problem
- reliability
- performance
Key ideas: cache content and serve requests from multiple servers at the network edge
- reduce demand on the site's infrastructure
- provide faster service to users
Web cache and caching proxy
Replication and load balancing
The middle mile problem
- The last mile problem is largely solved by high levels of global broadband penetration, but this imposes a new question of scale driven by demand
- The first mile is easy in terms of performance and reliability
- Traffic gets stuck in the middle mile
Inside the Internet
[Figure: Internet topology showing Tier 1 ISPs interconnected at IXPs, Tier 2 ISPs attached beneath them, and large content distributors peering directly with ISPs and IXPs]
The middle mile problem: potential solutions
Stuck in the middle, the candidate solutions are:
- "big data center" CDNs
- highly distributed CDNs
- what about P2P?
The challenge
The "fat file" paradox: although bits travel at the speed of light,
- the distance between user and server is critical
- latency and throughput are coupled due to TCP
Distance (server to user) | Network RTT | Packet loss | Throughput | 4GB DVD download time
Local: <100 mi. | 1.6 ms | 0.6% | 44 Mbps (high-quality HDTV) | 12 min.
Regional: 500-1,000 mi. | 16 ms | 0.7% | 4 Mbps (basic HDTV) | 2.2 hrs.
Cross-continent: ~3,000 mi. | 48 ms | 1.0% | 1 Mbps (SD TV) | 8.2 hrs.
Multi-continent: ~6,000 mi. | 96 ms | 1.4% | 0.4 Mbps (poor) | 20 hrs.
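The latency/throughput coupling in the table can be sketched with the well-known Mathis approximation for steady-state TCP throughput, MSS / (RTT * sqrt(loss)). This is an upper bound, so the numbers it gives are more optimistic than the measured values in the table, but the trend (throughput falling with distance) is the same. All constants here are illustrative.

```python
import math

MSS_BITS = 1460 * 8  # a typical TCP maximum segment size, in bits

def mathis_throughput_mbps(rtt_s: float, loss: float) -> float:
    """Approximate steady-state TCP throughput (Mathis formula):
    throughput <= MSS / (RTT * sqrt(loss))."""
    return MSS_BITS / (rtt_s * math.sqrt(loss)) / 1e6

def download_hours(size_gb: float, mbps: float) -> float:
    """Time to transfer size_gb gigabytes at mbps megabits/second."""
    return size_gb * 8e3 / mbps / 3600

# Regional row of the table: 16 ms RTT, 0.7% loss (bound comes out higher
# than the 4 Mbps measured in practice)
print(round(mathis_throughput_mbps(0.016, 0.007), 1))
# 4GB DVD at the table's 4 Mbps regional throughput
print(round(download_hours(4, 4), 1))  # 2.2 (hours)
```

The download-time column follows directly: 4 GB at 4 Mbps is 32,000 megabits / 4 Mbps = 8,000 s, about 2.2 hours, matching the table.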
Major CDNs (by ‘15 revenue)
Limelight $174M $120M of CDN
Level 3 $8B $235M of CDN
tier-1 transit provider
Akamai $1.03B $700M of CDN
Amazon $6B $1.8B of CDN, but
big % on storage
cloud provider
EdgeCast $180M $125M of CDN
Fastly $60M $9M of CDN
Rest of smaller regional CDNs (MaxCDN, CDN77 etc.) $100M combined.
Major CDNs (by ‘15 revenue)
Highwinds $135M $95M of CDN
ChinaCache $270M $81M of CDN
also a cloud provider
References
- Cheng Huang, Angela Wang, Jin Li and Keith W. Ross, "Measuring and Evaluating Large-Scale CDNs," Internet Measurement Conference, 2008.
- Erik Nygren, Ramesh K. Sitaraman and Jennifer Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," ACM SIGOPS Operating Systems Review 44(3), July 2010.
How can we understand a CDN?
We don't know a CDN's internal structure, but we can "infer" it via a measurement approach.
We know that CDNs rely on a DNS trick. For example:
- an end user types www.youtube.com
- the IP address is resolved via the local DNS (LDNS) server
- the LDNS queries YouTube's authoritative DNS
- YouTube uses a CDN if the answer is a CNAME like a1105.b.akamai.net or move.vo.llnwd.net
- the LDNS then queries the CNAME's authoritative DNS server and gets the IP address of the content server
DNS records
DNS: a distributed database storing resource records (RRs)
RR format: (name, value, type, ttl)
- Type=A (Address): name is a hostname; value is its IP address
- Type=NS (Name Server): name is a domain (e.g., foo.com); value is the hostname of the authoritative name server for this domain
- Type=CNAME (Canonical NAME): name is an alias for some "canonical" (real) name, e.g., www.ibm.com is really servereast.backup2.ibm.com; value is the canonical name
- Type=MX (Mail eXchange): value is the name of the mail server associated with name
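The CNAME-then-A resolution path described above can be sketched over a toy RR table. The mapping and IP below are made up for illustration; TTLs are omitted.

```python
# Toy RR table: (name, type) -> value. Hypothetical records mimicking
# a CDN-hosted hostname resolved through a CNAME chain.
RRS = {
    ("www.youtube.com", "CNAME"): "a1105.b.akamai.net",  # illustrative alias
    ("a1105.b.akamai.net", "A"): "96.6.0.10",            # illustrative IP
}

def resolve(name: str, max_hops: int = 5) -> str:
    """Resolve name to an IP address, following CNAME aliases."""
    for _ in range(max_hops):
        if (name, "A") in RRS:
            return RRS[(name, "A")]     # reached the canonical A record
        name = RRS[(name, "CNAME")]     # follow the alias one hop
    raise RuntimeError("CNAME chain too long")

print(resolve("www.youtube.com"))  # 96.6.0.10
```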
Content Server Assignment
The returned content server will be close to the issuing local DNS (LDNS) server
Measurement Framework
Assumptions: the CDN chooses a nearby content server based on the location of the LDNS that originates the query; the same LDNS might get different content servers for the same query at different times.
1. Determine all the CNAMEs of a CDN
2. Query a large number of LDNSs all over the world, at different times of the day, for all of the CNAMEs found in step 1
Finding CNAMEs and LDNSs
Find all the CNAMEs of a CDN use over 16 million web hostnames
a DNS query tells if it resolves to a CNAME
whether the CNAME belongs to the target CDN
thousands of CNAMEs for Akamai and Limelight
Locate a large # of distributed LDNSs need open recursive DNS servers
use over 7 million unique client IP addresses and over 16 million web hostnames
reverse DNS lookup and test trial DNS queries
Open recursive DNS servers
many different DNS servers map into same IP addresses
obtain 282,700 unique open recursive DNS servers
Measurement Platform
300 PlanetLab nodes, each issuing 3 DNS queries per second; the measurement takes more than one day.
The Akamai Network
Type | CNAME pattern | # of CNAMEs | # of IPs | Usage
(a) | *.akamai.net | 1,964 | ~11,500 | conventional content distribution
(b) | *.akadns.net | 757 | a few per CNAME | load balancing for customers who have their own server networks
(c) | *.akamaiedge.net | 539 | ~36,000 | dynamic content distribution / secure services

- Type (a): returns 2 IP addresses, different for different locations; hundreds of IPs sit behind each CNAME, ~11,500 content servers in total
- Type (c): returns only 1 IP address, with 20-100 IPs per CNAME; the study guesses that virtualization is used to provide isolated environments
The Akamai Network
- ~27K content servers, ~6K of which also run DNS
- 60% in the US; 90% in the top 10 countries
- flat distribution across ISPs: only 15% in the top 7
The Limelight Network
Easier to measure, as Limelight is its own Autonomous System (AS): obtain the IP addresses of the AS directly; only ~4K servers.
Measuring performance
Two metrics:
- availability: how reliable are the CDN servers?
- delay: how fast can content be retrieved?
The performance results are controversial:
- do the metrics sufficiently match overall system performance goals?
- how does a performance metric map to a specific customer's perception of performance?
- both Akamai and Limelight issued statements to "correct" the research results
Availability
- monitor all servers for 2 months, pinging once every hour
- if a server does not respond for 2 consecutive hours, it is considered "down"
- but does a "down" server necessarily affect availability?
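The study's downtime heuristic can be sketched as a pass over hourly probe results: a server counts as down only inside a run of at least 2 consecutive missed probes, so an isolated missed ping is ignored.

```python
def downtime_hours(pings: list[bool]) -> int:
    """Count hours spent 'down', where down means being inside a run of
    >= 2 consecutive missed hourly probes (the paper's heuristic)."""
    down, run = 0, 0
    for ok in pings:
        run = 0 if ok else run + 1
        if run >= 2:
            # when the run first reaches 2, its first hour counts too
            down += 2 if run == 2 else 1
    return down

# 6 hourly probes: one isolated miss (ignored), then a 2-hour outage
print(downtime_hours([True, False, True, False, False, True]))  # 2
```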
Delay
Possible explanations for the observed differences: the number of content servers? the optimality (for delay) of routing?
A more detailed delay comparison follows in the study.
Akamai's statement
- availability cannot be judged from server uptime alone
- Akamai's CDN has more servers, but that does not necessarily make it harder to maintain
- the use of open resolvers misses many Akamai servers, hence over-estimating delay in Akamai's case
- Akamaiedge is not a "virtualized network"
Limelight's statement
- overall performance can't be represented by just two dimensions (availability and delay)
- server downtime does not necessarily affect availability; suggested a way to measure it, and claimed availability in the 99.9% range
- the RTT of a single packet can't represent delay for objects; suggested measuring with different object sizes
- a more authoritative performance study should be based on customer trials
Akamai vs. Limelight
Metric | Akamai | Limelight
# of servers | ~27K | ~4K
# of clusters | 1,158 | 18
95th-percentile delay | ~100 ms | ~200 ms
average delay | ~30 ms | ~80 ms
penetration in ISPs | high | low
cost | high | low
complexity | high | low
approach | highly distributed | "big data center"
Facts about Akamai (2014-2015)
- a CDN company that evolved from MIT research to invent better ways to deliver Internet content and tackle the "flash crowd" problem
- earned over US$1B in revenue in 2015, 25% of the whole CDN market
- runs on 150,000 servers in 1,200 networks across 92 countries
Internet delivery challenge
- the largest network carries only 5% of access traffic
- over 650 networks are needed to reach 90% of access traffic
- a "long tail" distribution of traffic across networks
[Figure: % of access traffic from the top networks]
Other challenges
- peering point congestion: little economic incentive to invest in the middle mile
- inefficient routing protocols: how does BGP work?
- unreliable networks, e.g., de-peering between ISPs
- inefficient communication protocols
- scalability
- application limitations and the slow rate of protocol adoption
Delivery network as a virtual network
Works as an overlay:
- compatible with the existing Internet
- transparent to users
- adaptive to changes
The untaken clean-slate alternative suffers from the adoption problem and high development cost.
The Akamai Network at ~2010
- a large distributed system consisting of ~60,000 servers in ~1,000 networks across ~70 countries
- can also be regarded as multiple delivery networks for different types of content: static web, streaming media, dynamic applications
Anatomy of a Delivery Network
- edge servers: globally deployed in thousands of sites
- mapping system: assigns requests to edge servers using historic data and current system conditions
- transport system: moves content from origin to edge; may cache data
- communication and control system: disseminates status and control messages, and configuration updates
- data collection and analysis system: collects and processes data (e.g., logs); used for monitoring, analytics, billing, etc.
- management portal: gives customers visibility and fine-grained control, and pushes updates to edge servers
System Design Principles
Goals:
- scalable and fast data collection and management
- safe, quick and consistent configuration updates
- enterprise visibility and fine-grained control
Assumption: a significant number of failures (machine, rack, cluster, connectivity, network) is expected to be occurring at all times.
Philosophy: failures are normal, and the delivery network must operate seamlessly despite them.
Principles:
- design for reliability: ~100% end-to-end availability, with full redundancy and fault-tolerance protocols
- design for scalability: handle large volumes of traffic, data, and control messages
- limit the need for human management: automation is needed to scale and to respond to faults
- design for performance: improve bottlenecks, response time, cache hit rate, resource utilization, and energy efficiency
Streaming and content delivery
Architectural considerations of CDNs for web content and streaming media.
Principle: minimize long-haul communication through the middle-mile bottleneck of the Internet; this is made feasible by pervasive, distributed architectures where servers sit as "close" to users as possible.
Key question: how distributed does it need to be?
How distributed does it need to be?
Akamai's approach: deploy server clusters not only in Tier 1 and Tier 2 data centers, but also at network edges, in thousands of locations, at the cost of more complexity.
Reasons:
- Internet traffic is highly fragmented, e.g., the top 45 networks account for only half of access traffic
- the distance between server and users is the bottleneck for video throughput, due to TCP
- P2P is not good for management and control
Video-grade scalability
Content providers' problem:
- YouTube receives 2 billion views per day
- video requires high rates, e.g., 2-40 Mbps for HDTV
- must scale with user requests
- over-provisioning to absorb on-demand spikes incurs high capital and operational costs
Akamai's throughput was 3.45 Tbps in April 2010; ~50-100 Tbps of throughput is now needed.
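A back-of-the-envelope calculation shows why the required throughput lands in the tens of terabits per second. The viewer count and bitrate below are hypothetical round numbers, not figures from the lecture:

```python
def aggregate_tbps(concurrent_viewers: int, mbps_per_stream: float) -> float:
    """Aggregate throughput (Tbps) to serve all viewers simultaneously."""
    return concurrent_viewers * mbps_per_stream / 1e6  # Mbps -> Tbps

# e.g., 10 million concurrent viewers each pulling a 5 Mbps HD stream
print(aggregate_tbps(10_000_000, 5))  # 50.0 (Tbps)
```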
Akamai’s challenges
need consider throughput along entire path
bottlenecks everywhere original data centers, peering points, network’s
backhaul capacity, ISP’s upstream connectivity
a data center’s egress capacity has little impact on real throughput to end users
even 50 well-provisioned, connected data centers cannot achieve ~100 Tbps
IP-layer multicast does not work in practice, needs its own transport system
Transport system for content
Tiered content distribution target for “cold” or infrequently-accessed
efficiency cache strategy with high hit rates
well-provisioned and highly connected “parent” clusters are utilized
original servers are offloaded in the high 90’s
helpful in flash crowds for large objects
Tiered distribution
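The tiered idea can be sketched as a two-level cache: an edge miss goes to a parent cluster, and only a parent miss reaches the origin. The class and the unbounded dict caches here are illustrative simplifications (a real system would evict entries and shard by content).

```python
class TieredCache:
    """Edge cache -> parent cluster -> origin, counting origin hits."""

    def __init__(self, origin: dict[str, str]):
        self.edge: dict[str, str] = {}
        self.parent: dict[str, str] = {}
        self.origin = origin
        self.origin_hits = 0  # requests that actually reached the origin

    def get(self, key: str) -> str:
        if key in self.edge:               # edge hit: fastest path
            return self.edge[key]
        if key not in self.parent:         # parent miss: fetch from origin
            self.origin_hits += 1
            self.parent[key] = self.origin[key]
        self.edge[key] = self.parent[key]  # fill the edge on the way back
        return self.edge[key]

cdn = TieredCache(origin={"/video.mp4": "bytes..."})
for _ in range(100):
    cdn.get("/video.mp4")
print(cdn.origin_hits)  # 1 -> the origin is offloaded for 99 of 100 requests
```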
Transport system for streaming
An overlay network for live streaming once a stream is captured & encoded, it’s sent
to a cluster of servers called the entrypoint
automatic failover among multiple entrypoints
within an entrypoint cluster, distributed leader election is used to tolerate machine failure
publish-subscribe (pub-sub) model: • entrypoint publishes available streams, and each edge
server subscribes to streams that it requires
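A minimal pub-sub sketch of that last bullet (illustrative, not Akamai's actual protocol): the entrypoint delivers stream chunks only to the edge servers that subscribed, so unpopular streams cost nothing downstream.

```python
from collections import defaultdict

class Entrypoint:
    """Publishes live-stream chunks to subscribed edge servers."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # stream name -> callbacks

    def subscribe(self, stream: str, edge_callback):
        self.subscribers[stream].append(edge_callback)

    def publish(self, stream: str, chunk: bytes):
        for deliver in self.subscribers[stream]:  # fan out to edges
            deliver(chunk)

received = []
ep = Entrypoint()
ep.subscribe("live/event1", received.append)  # one edge server subscribes
ep.publish("live/event1", b"chunk-0")         # delivered to that edge
ep.publish("live/other", b"chunk-x")          # no subscribers: not sent
print(received)  # [b'chunk-0']
```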
Transport system for streaming
An overlay network for live streaming reflectors act as intermediaries between the
entrypoints and the edge clusters
scaling: enables rapidly replicating a stream to a large number of edge clusters to serve popular events
quality: provides alternate paths between each entrypoint and edge cluster, enhancing end-to-end quality via path optimization
can use multiple link-disjoint paths
need efficient algorithms for path selection
Application delivery network
Target for dynamic web application and non-cacheable content
Two complementary approaches speed up long-haul communications by using the
Akamai platform as a high-performance overlay network, i.e., the transport system
pushes application logic from the origin server out to the edge of the Internet
Transport system for app acceleration
Path optimization overcome BGP, collect topology & performance
data from mapping system
dynamically select potential intermediate nodes for a particular path, or multiple paths
~30-50% performance improvement by overlay
used also for packet loss reduction
Middle East cable cut in 2008
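The core idea of overlay path selection can be sketched as one-hop detour routing: compare the measured RTT of the direct (BGP-chosen) path with routing via each candidate intermediate node. The node names and RTTs below are made up.

```python
def best_overlay_path(rtt, src, dst, intermediates):
    """rtt: dict[(a, b)] -> measured RTT in ms.
    Return (path, total_rtt) for the fastest direct or one-hop path."""
    best = ([src, dst], rtt[(src, dst)])          # direct path baseline
    for mid in intermediates:
        total = rtt[(src, mid)] + rtt[(mid, dst)]  # detour via mid
        if total < best[1]:
            best = ([src, mid, dst], total)
    return best

# The direct A->B path is congested; the overlay finds a faster detour via C.
rtt = {("A", "B"): 120, ("A", "C"): 40, ("C", "B"): 50}
print(best_overlay_path(rtt, "A", "B", ["C"]))  # (['A', 'C', 'B'], 90)
```

A real system would also weigh loss and capacity, not just RTT, and could keep several disjoint paths for redundancy, as the streaming overlay does.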
Transport system for app acceleration
Transport protocol optimizations proprietary transport-layer protocol
use pools of persistent connections to eliminate connection setup and teardown overhead
optimal TCP window sizing with global knowledge
intelligent retransmission after packet loss
Application optimizations parse HTML and prefetch embedded content
content compression reduces # of roundtrips
implement app logic at edge, e.g., authentication
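The prefetching bullet can be sketched with the standard-library HTML parser: scan a page for embedded resources (images, scripts, stylesheets) that an edge server could start fetching before the client requests them. The tag/attribute list is a simplification.

```python
from html.parser import HTMLParser

class EmbeddedURLs(HTMLParser):
    """Collect URLs of embedded resources worth prefetching."""
    PREFETCH = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = self.PREFETCH.get(tag)
        if attr:
            self.urls += [v for k, v in attrs if k == attr and v]

page = '<html><img src="/a.png"><script src="/b.js"></script></html>'
p = EmbeddedURLs()
p.feed(page)
print(p.urls)  # ['/a.png', '/b.js']
```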
Distributing applications to the edge
EdgeComputing Services of Akamai
E.g., deploy and execute request-driven Java J2EE apps on Akamai’s edge servers
Not all apps can be run entirely on the edge
Some use cases Content aggregation/transformation
Static databases
Data collection
Complex applications
Other platform components
- edge server platform
- mapping system
- communications and control system
- data collection and analysis system
- additional systems and services
Edge server platform
Functionalities are controlled by metadata:
- origin server location and response to failures
- cache control and indexing
- access control
- header alteration (HTTP)
- EdgeComputing
- performance optimization
Mapping system
Global traffic director uses historic and real-time data about the
health of the Akamai network and the Internet
objective: create maps that are used to direct traffic on the Akamai network in a reliable, efficient, and high performance manner
a fault-tolerant distributed platform: run in multiple independent sites and leader-elect based on the current health status of each site
two parts: scoring system + real-time mapping
Mapping system
Scoring system creates the current Internet topology
collects/processes data: ping, BGP, traceroute
monitors latency, loss, connectivity frequently
Real-time mapping creates the actual maps used to direct end
users’ requests to the best edge servers
selects intermediates for tiered distribution and the overlay network
first step: map to cluster • Based on scoring system info, updated every minute
second step: map to server • Based on content locality, load changes, and etc.
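The two-step assignment can be sketched as: pick the best-scoring cluster for the requesting LDNS, then pick a server inside it by hashing the content URL, so the same object keeps landing on the same server (content locality). Cluster names, scores, and IPs are hypothetical.

```python
import hashlib

CLUSTERS = {  # cluster -> (score for this LDNS, server list); illustrative
    "sg-edge": (0.9, ["10.0.0.1", "10.0.0.2"]),
    "us-east": (0.4, ["10.1.0.1", "10.1.0.2"]),
}

def map_request(url: str) -> str:
    # Step 1: map to the cluster with the best score from the scoring system.
    cluster = max(CLUSTERS, key=lambda c: CLUSTERS[c][0])
    # Step 2: map to a server within the cluster by content hash, so
    # repeated requests for the same object hit the same cache.
    servers = CLUSTERS[cluster][1]
    idx = int(hashlib.sha1(url.encode()).hexdigest(), 16) % len(servers)
    return servers[idx]

print(map_request("/video.mp4") in CLUSTERS["sg-edge"][1])  # True
```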
Communications and control system
Real-time distribution of status and control information small real-time message throughout the net
solution: pub-sub model
Point-to-point RPC and web services
Dynamic configuration updates quorum-based replication … another whole paper
Key management infrastructure
Software/machine config. management
Data collection and analysis system
Log collection over 10 million HTTP/sec 100TB/day
compression, aggregation, pipeline and filter …
reporting and billing
Real-time data collection and monitoring a distributed real-time relational database that
supports SQL query … another whole paper
Analytics and Reporting enable customers to view traffic & performance
uses log and Query system, & e.g., MapReduce