Internet Quality-of-Service (QoS)
Henning Schulzrinne
Columbia University
Fall 2003
Quality of Service
Motivation
Service availability
Elementary queueing theory
Traffic characterization & control
Integrated services (RSVP, NSIS)
Differentiated services (DiffServ)
What is quality of service?
Many applications are sensitive to the effects of delay (+ jitter) and packet loss
– may have a “floor” below which utility drops to zero
The existing Internet architecture provides a best-effort service.
– All traffic is treated equally (generally, FIFO queuing)
– No mechanism for distinguishing between delay-sensitive and best-effort traffic
Original IP architecture (IPv4) has a TOS (type-of-service) byte in the packet header
– RFC 795: defined multiple axes (delay, throughput, reliability)
– rarely used outside some (rumored) military networks
[Figure: utility ($) as a function of bandwidth]
Motivation
QoS service availability
– not good enough if all but 2 minutes of my phone call sound perfect
Support mission-critical applications that can’t tolerate disruption
– VoIP
– VPNs (LAN emulation)
– high-availability computing
Charge more for business applications vs. consumer applications
Service availability
Users do not care about QoS
– at least not about packet loss, jitter, delay
– rather, it’s service availability: how likely is it that I can place a call and not get interrupted?
availability = MTBF / (MTBF + MTTR)
– MTBF = mean time between failures
– MTTR = mean time to repair
availability = successful calls / first call attempts
– equipment availability: 99.999% (“5 nines”), i.e., about 5 minutes/year of downtime
– Sprint IP frame relay SLA: 99.5%
– AT&T (2003):
Service             | Availability
Long-distance voice | 99.978%
ATM data            | 99.999%
Frame relay data    | 99.998%
IP                  | 99.991%
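The availability formula above can be turned into a quick calculation. This is a minimal sketch: the function names and the 365-day year are illustrative assumptions, and only the availability percentages come from the slide.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("5 nines", 0.99999), ("Sprint IP frame relay SLA", 0.995)]:
        print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")
```

At "5 nines" this gives a little over 5 minutes per year, while a 99.5% SLA allows well over a day of downtime per year.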
Availability – PSTN metrics
PSTN metrics (Worldbank study):
– fault rate: “should be less than 0.2 per main line”
– fault clearance (~ MTTR): “next business day”
– call completion rate during network busy hour: “varies from about 60% - 75%”
– dial tone delay
Example PSTN statistics
Source: Worldbank
Measurement setup
Node name      | Location                          | Connectivity | Network
columbia       | Columbia University, NY           | >= OC3       | I2
wustl          | Washington U., St. Louis          |              | I2
unm            | Univ. of New Mexico               |              | I2
epfl           | EPFL, Lausanne, CH                |              | I2+
hut            | Helsinki University of Technology |              | I2+
rr             | NYC                               | cable modem  | ISP
rrqueens       | Queens, NY                        | cable modem  | ISP
njcable        | New Jersey                        | cable modem  | ISP
newport        | New Jersey                        | ADSL         | ISP
sanjose        | San Jose, California              | cable modem  | ISP
suna           | Kitakyushu, Japan                 | 3 Mb/s       | ISP
sh             | Shanghai, China                   | cable modem  | ISP
Shanghaihome   | Shanghai, China                   | cable modem  | ISP
Shanghaioffice | Shanghai, China                   | ADSL         | ISP
Measurement setup
Active measurements
call duration 3 or 7 minutes
UDP packets:
– 36 bytes alternating with 72 bytes (FEC)
– 40 ms spacing
September 10 to December 6, 2002
13,500 call hours
Call success probability
62,027 calls succeeded, 292 failed: 99.53% availability
roughly constant across I2, I2+, commercial ISPs
Paths                    | Call success
All                      | 99.53%
Internet2                | 99.52%
Internet2+               | 99.56%
Commercial               | 99.51%
Domestic (US)            | 99.45%
International            | 99.58%
Domestic commercial      | 99.39%
International commercial | 99.59%
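The overall figure follows directly from the call counts on this slide, using the "successful calls / first call attempts" definition. A minimal sketch (the function name is an illustrative choice):

```python
def call_availability(succeeded: int, failed: int) -> float:
    """Fraction of call attempts that succeeded."""
    attempts = succeeded + failed
    if attempts == 0:
        raise ValueError("no call attempts")
    return succeeded / attempts


if __name__ == "__main__":
    # Counts taken from the slide: 62,027 succeeded, 292 failed.
    print(f"{100 * call_availability(62_027, 292):.2f}%")  # prints 99.53%
```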
Overall network loss
PSTN: once connected, call usually of good quality
– exception: mobile phones
compute periods of time below loss threshold
– 5% causes degradation for many codecs
– others acceptable till 20%
loss    | 0%   | 5%    | 10%   | 20%
All     | 82.3 | 97.48 | 99.16 | 99.75
ISP     | 78.6 | 96.72 | 99.04 | 99.74
I2      | 97.7 | 99.67 | 99.77 | 99.79
I2+     | 86.8 | 98.41 | 99.32 | 99.76
US      | 83.6 | 96.95 | 99.27 | 99.79
Int.    | 81.7 | 97.73 | 99.11 | 99.73
US ISP  | 73.6 | 95.03 | 98.92 | 99.79
Int. ISP| 81.2 | 97.60 | 99.10 | 99.71
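One way to arrive at numbers of this kind is to split a trace into fixed intervals, compute the loss rate in each, and count the fraction of intervals at or below each threshold. The sketch below assumes that interpretation (including "at or below", so that the 0% column means loss-free intervals); the sample loss rates are made up for illustration and are not measurement data.

```python
from typing import Sequence


def fraction_at_or_below(loss_rates: Sequence[float], threshold: float) -> float:
    """Fraction of measurement intervals whose loss rate is <= threshold."""
    if not loss_rates:
        raise ValueError("no intervals")
    ok = sum(1 for rate in loss_rates if rate <= threshold)
    return ok / len(loss_rates)


if __name__ == "__main__":
    # Hypothetical per-interval loss rates (fractions, not percentages).
    sample = [0.0, 0.0, 0.03, 0.08, 0.25]
    for threshold in (0.0, 0.05, 0.10, 0.20):
        share = 100 * fraction_at_or_below(sample, threshold)
        print(f"loss <= {threshold:.0%}: {share:.1f}% of intervals")
```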
Network outages
sustained packet losses
– arbitrarily defined at 8 packets
– far beyond any recoverable loss (FEC, interpolation)
23% outages make up significant part of 0.25% unavailability
symmetric: A→B and B→A
spatially correlated: A→B and A→X
not correlated across networks (e.g., I2 and commercial)
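The outage definition can be sketched as a scan for runs of consecutive lost packets. This assumes "defined at 8 packets" means a run of at least 8 losses; the function name and the trace representation (one boolean per expected packet) are illustrative choices. The 40 ms spacing is taken from the measurement setup.

```python
from typing import Iterable, List, Tuple

OUTAGE_MIN_PACKETS = 8    # threshold from the slide (assumed to mean ">= 8")
PACKET_SPACING_S = 0.040  # 40 ms packet spacing from the measurement setup


def find_outages(received: Iterable[bool],
                 min_len: int = OUTAGE_MIN_PACKETS) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of >= min_len lost packets."""
    outages: List[Tuple[int, int]] = []
    run_start = 0
    run_len = 0
    for i, ok in enumerate(received):
        if not ok:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_len:
                outages.append((run_start, run_len))
            run_len = 0
    if run_len >= min_len:  # trace ended during an outage
        outages.append((run_start, run_len))
    return outages


if __name__ == "__main__":
    trace = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 10 + [True]
    for start, length in find_outages(trace):
        print(f"outage at packet {start}: {length} packets, "
              f"{length * PACKET_SPACING_S:.2f} s")
```

The short 3-packet loss burst is ignored; only the 10-packet run counts as an outage.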
Network outages
[Figure: complementary CDF (log scale) of outage duration (sec), 0 to 400 s; curves for US domestic paths and international paths]
[Figure: complementary CDF (log scale) of outage duration (sec), 0 to 400 s; curves for all paths and Internet2]
Network outages
Paths | no. of outages | % symmetric | duration (mean, packets) | duration (median, packets) | total (all, h:m) | outages > 1000 packets (h:m)
all   | 10,753         | 30%         | 145                      | 25                         | 17:20            | 10:58
I2    | 819            | 14.5%       | 360                      | 25                         | 3:17             | 2:33
I2+   | 2,708          | 10%         | 259                      | 26                         | 7:47             | 5:37
ISP   | 8,045          | 37%         | 107                      | 24                         | 9:33             | 4:58
US    | 1,777          | 18%         | 269                      | 20                         | 5:18             | 3:53
Int.  | 8,976          | 33%         | 121                      | 26                         | 12:02            | 6:42
Outage-induced call abortion probability
Long interruption: user likely to abandon call
from E.855 survey: P[holding] = e^{-t/17.26} (t in seconds)
half the users will abandon call after 12 s
2,566 calls have at least one outage
946 of 2,566 expected to be dropped: 1.53% of all calls
Paths    | Expected dropped calls
all      | 1.53%
I2       | 1.16%
I2+      | 1.15%
ISP      | 1.82%
US       | 0.99%
Int.     | 1.78%
US ISP   | 0.86%
Int. ISP | 2.30%
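The holding-time model can be evaluated directly. The sketch below uses the 17.26 s constant from the slide; how a call with several outages is scored is not stated, so treating each call by its longest outage is an assumption, and the function names are illustrative.

```python
import math
from typing import Iterable

HOLDING_TIME_CONSTANT_S = 17.26  # from the E.855-based model on the slide


def p_holding(outage_s: float) -> float:
    """Probability that a user is still holding after an outage of outage_s seconds."""
    return math.exp(-outage_s / HOLDING_TIME_CONSTANT_S)


def median_abandon_time_s() -> float:
    """Outage duration after which half the users have abandoned the call."""
    return HOLDING_TIME_CONSTANT_S * math.log(2)


def expected_dropped_calls(longest_outage_per_call_s: Iterable[float]) -> float:
    """Expected number of abandoned calls (assumption: one outage per call matters)."""
    return sum(1.0 - p_holding(t) for t in longest_outage_per_call_s)


if __name__ == "__main__":
    print(f"median abandon time: {median_abandon_time_s():.2f} s")
    # Hypothetical outage durations in seconds, for illustration only.
    print(f"expected drops: {expected_dropped_calls([1.0, 12.0, 60.0]):.2f}")
```

The median abandon time comes out just under 12 s, consistent with the "half the users will abandon call after 12 s" statement.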
Conclusions from measurement
Availability in space is (mostly) solved; availability in time restricts usability for new applications
initial investigation into service availability for VoIP
need to define metrics for, say, web access
unify packet loss and “no Internet dial tone”
far less than “5 nines”
working on identifying fault sources and locations
looking for additional measurement sites
What’s next?
Existing SLAs are mostly useless
– too many exceptions
– wrong time scales: month vs. minutes
– no guarantees for interconnects
Existing measurements similarly dubious
Limited ability to learn from mistakes
– what are the primary causes of service unavailability?
– what can I do to protect myself: multi-homing via same fiber? diverse access mechanisms?
Consumers of services have no good ways to compare service availability
– only some very large customers may get access to carrier-internal data
Thus, market failure
Need published metrics
– similar to switch availability reporting
What's hard to scale (and not)
Signaling does not have to be hard:
– one message, on a reliable peering channel or IP router alert option
– NSIS effort in the IETF?
YESSIR: RTCP-based signaling
– 700 MHz Celeron processor
– 10,000 flow setups/second; 300,000 soft-state flows
If scaling matters, sink-tree based reservation (BGRP)
Diversity is good
Unlike routing, no need for single signaling protocol:
– multicast is much harder
– dumb end devices
– edge "pop-up" only show up in edge nodes
AAA
Signaling can easily be done in ASIC (no harder than IP), but
– need cryptographic verification of request
– need interface to Authentication, Authorization, Accounting (AAA)
– cross-domain authentication hard, but 3G networks will do it anyway
– easier if both sides ask their own access router
– see also: iPass for dial-up, OSP (open settlement protocol)
AAA example
[Figure: source, AR1, Internet, AR2, destination; labels: "signs request", "reserves for both directions"]
Cell phone model: both sides pay
Reservation scaling
Example: every long-distance call in the US uses VoIP with per-flow resource reservation
2000: 567.4 billion minutes @ 10 minutes each = about 1,800 calls/second
single MySQL server can sustain 500 to 2,000 queries+updates/second
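The call-rate estimate is straightforward arithmetic. A minimal check, assuming a 365-day year and calls spread evenly over time (peak-hour rates would be higher):

```python
MINUTES_IN_2000 = 567.4e9   # long-distance minutes, from the slide
AVG_CALL_MINUTES = 10       # average call length, from the slide
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year, uniform load

calls_per_year = MINUTES_IN_2000 / AVG_CALL_MINUTES
calls_per_second = calls_per_year / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"{calls_per_second:.0f} call setups/second on average")
```

The result lands within the 500 to 2,000 operations/second range quoted for a single database server, which is the point of the slide.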
Business models don't work
Most of the time, "tin" service is no worse than "platinum" service
– can't impress others with platinum AmEx card
– no frequent flyer bonuses
everybody switches only when the network is in bad shape
Resource control & reservation
[Figure: router resource-control architecture. Blocks: Application, Reservation Protocol, Admission Control, Traffic Control DB, Routing Protocols & DBs, Classifier & route selection, Packet Scheduler, QoS queuing, Best-effort queuing. Labels: Data, Tspec, Y/N. Source: USC EE-S 555]
RED (Random Early Detection)
[Figure: queue with thresholds THmin and THmax. Below THmin: do not discard. Between THmin and THmax: discard with increasing probability Pd. Above THmax: discard]
TCP synchronization effect: during overload, many connections lose packets and go into slow start
RED: start dropping based on average queue occupancy (vs. instantaneous queue occupancy)
Parameter setting critical and non-trivial
See also RFC 2309
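The drop decision described on this slide can be sketched as follows. This is a simplified illustration, not the full published RED algorithm: it omits the per-packet count adjustment, and the `max_p` and `weight` defaults are illustrative assumptions (the slide itself notes that parameter setting is critical and non-trivial).

```python
import random


class RedQueue:
    """Simplified RED: drop probability based on the *average* queue length."""

    def __init__(self, th_min: float, th_max: float,
                 max_p: float = 0.1, weight: float = 0.002) -> None:
        if not 0 <= th_min < th_max:
            raise ValueError("need 0 <= th_min < th_max")
        self.th_min = th_min
        self.th_max = th_max
        self.max_p = max_p    # assumed: drop probability reached at th_max
        self.weight = weight  # assumed: EWMA weight for the average
        self.avg = 0.0

    def update_average(self, instantaneous_len: int) -> None:
        """Exponentially weighted moving average of the queue length."""
        self.avg = (1 - self.weight) * self.avg + self.weight * instantaneous_len

    def drop_probability(self) -> float:
        if self.avg < self.th_min:
            return 0.0  # do not discard
        if self.avg >= self.th_max:
            return 1.0  # discard
        # discard with probability Pd rising linearly between the thresholds
        return self.max_p * (self.avg - self.th_min) / (self.th_max - self.th_min)

    def should_drop(self, instantaneous_len: int) -> bool:
        self.update_average(instantaneous_len)
        return random.random() < self.drop_probability()
```

Because the decision uses the averaged occupancy, a short burst does not trigger drops, while sustained load gradually raises Pd and spreads losses across connections instead of hitting them all at once.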
ECN (Explicit Congestion Notification)
Extension of RED: mark instead of drop
RFC 2481 (“A Proposal to add Explicit Congestion Notification (ECN) to IP”)
IP TOS bit 6 (ECT) indicates support for the mechanism
IP TOS bit 7 (CE, shown as "ECN" in the figure) indicates congestion
Needs cooperation of TCP (or similar protocols)
TCP should act almost as if packet was dropped
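– ½ congestion window
– but don’t do slow-start
[Figure: sender transmits packets with ECT=1, ECN=0; a congested router forwards them with ECT=1, ECN=1; the receiver returns a TCP ACK carrying the ECN echo]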
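The mark-instead-of-drop behavior and the sender's reaction can be sketched in a few lines. This is a toy model under stated assumptions: the `Packet` class, function names, and the one-segment window floor are illustrative, and `congested` stands in for whatever decision the router's RED-style logic makes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ect: bool = False  # sender's transport supports ECN
    ce: bool = False   # congestion indication ("ECN=1" in the slide's figure)


def router_forward(pkt: Packet, congested: bool) -> Optional[Packet]:
    """Mark ECN-capable packets under congestion; drop the others."""
    if not congested:
        return pkt
    if pkt.ect:
        pkt.ce = True
        return pkt
    return None  # non-ECN traffic is dropped, as with plain RED


def sender_on_ack(cwnd: float, ecn_echo: bool) -> float:
    """React almost as if a packet was dropped: halve cwnd, no slow start."""
    if ecn_echo:
        return max(cwnd / 2.0, 1.0)  # assumed floor of one segment
    return cwnd


if __name__ == "__main__":
    marked = router_forward(Packet(ect=True), congested=True)
    echo = marked is not None and marked.ce
    print("new cwnd:", sender_on_ack(16.0, ecn_echo=echo))
```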
Next steps in signaling (NSIS)
RSVP not widely used for resource reservation
– but is used for MPLS path setup
– design heavily biased by multicast needs
– marginal and after-the-fact security
– limited support for IP mobility
Thus, IETF NSIS working group developing new framework for general state management protocol
– resource reservation
– NAT and firewall control
– traffic and QoS measurement
– MPLS and lambda path setup
Split into two components:
– NSLP: services
– NTLP: transport
NSIS
On-path vs. off-path
– off-path bandwidth brokers
Discovery of next NTLP or NSLP hop
– use router alert option
[Figure: NSIS protocol stack: QoS, NAT/FW, and measurement services on top of NTLP, which runs over UDP, TCP, or SCTP]