QoS Measurement and Management for VoIP. Wenyu Jiang, IRT Lab, March 5, 2003. Introduction to VoIP & IP Telephony: transport of voice packets over IP networks; cost savings (consolidates voice and data networks, avoids leased lines and long-distance toll calls); smart and new services.
QoS Measurement and Management for VoIP
Wenyu Jiang
IRT Lab, March 5, 2003
Introduction to VoIP & IP Telephony
Transport of voice packets over IP networks
Cost savings
– Consolidates voice and data networks
– Avoids leased lines, long-distance toll calls
Smart and new services
– Call management (filtering, TOD forwarding): CPL
– Better than PSTN quality: wide-band codecs
Protocols and Standards
– Signaling: SIP (IETF), H.323 (ITU-T)
– Transport: RTP/RTCP (IETF)
Practical Issues in VoIP
Quality of Service (QoS)
– Internet is a best-effort network
   Loss, delay and jitter
   Users expect at least PSTN quality for VoIP!
Ease of deployment
– Requires seamless integration with legacy networks (PSTN/PBX)
– Security is a must
High yardstick of service availability
– Can your network achieve 99.999% uptime?
Outline
QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality
QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation
Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through Internet measurement
Workings of a VoIP Client
Audio is packetized, encoded and transmitted
Forward error correction (FEC) may be used to recover lost packets
Playout control smooths out jitter to minimize late losses; coupled with FEC
Packet loss concealment (PLC)
– Last line of “defense” after FEC and playout
FEC affects playout control
[Figure: VoIP receiver pipeline. Labels: multimedia packets with FEC; Internet (added loss, jitter); FEC recovery; playout delay control (added late losses); loss concealment & decoding (losses unrecoverable by FEC).]
LBR: An Alternative to FEC
An (n,k) block FEC code can recover up to n-k losses per block
Low Bit-rate Redundancy (LBR)
– Transmit a lower bit-rate version of original audio
– No notion of “blocks”
– Not bit-exact recovery
[Figure: packet timelines along transmission time. FEC: packets A to F grouped into FEC block 1 and FEC block 2, each block followed by FEC data. LBR: packets A to F with piggybacked LBR data a', b', c', d'.]
Objective QoS Metrics: Loss
Internet packet loss is often bursty
– May degrade voice quality more than random (Bernoulli) loss
Characterization of packet loss
– 2-state Markov (Gilbert) model: conditional loss prob.
– More detailed models, but more states!
   Extended Gilbert model, nth order Markov model
   Hidden Markov model, Gilbert-Elliott model, inter-loss distance
– More states → larger test set, loss of big picture
   Adaptive applications can trade off model accuracy for fast feedback
   Gilbert model provides an acceptable compromise
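[Figure: Gilbert model state diagram. State 0 (non-loss), state 1 (loss). Transitions: 0→0 with probability 1-p, 0→1 with p, 1→0 with q, 1→1 with 1-q = p_c.]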
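The two-state chain can be sketched in a few lines of Python. This is an illustrative simulation only, not the author's tooling: the function names and the seed are hypothetical, and the conversion from overall loss rate p_u to p assumes the chain is in its stationary regime, where the loss probability is p / (p + q).

```python
import random

def gilbert_params(p_u, p_c):
    """Return (p, q) of a Gilbert model from the overall loss rate p_u
    and the conditional loss probability p_c = 1 - q."""
    q = 1.0 - p_c
    # Stationary loss probability is p / (p + q), hence p = p_u * q / (1 - p_u).
    p = p_u * q / (1.0 - p_u)
    return p, q

def simulate_gilbert(p, q, n, seed=0):
    """Generate n loss indicators (True = lost) from the 2-state chain."""
    rng = random.Random(seed)
    lost = False
    out = []
    for _ in range(n):
        if lost:
            lost = rng.random() >= q   # stay in the loss state with prob 1 - q
        else:
            lost = rng.random() < p    # enter the loss state with prob p
        out.append(lost)
    return out

if __name__ == "__main__":
    p, q = gilbert_params(p_u=0.10, p_c=0.30)
    trace = simulate_gilbert(p, q, 200_000)
    print("overall loss rate:", sum(trace) / len(trace))
```

Setting p_c equal to p_u makes the next loss independent of the previous one, which reduces the model to Bernoulli loss.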
Effect of Gilbert Loss Model
Loss burst distribution of a packet trace
– Roughly, though not exactly, exponential
Loss burstiness on FEC performance
– FEC less efficient under bursty loss
[Figure: number of occurrences (log scale) vs. loss burst length; curves: packet trace, Gilbert model.]
[Figure: p_f, final loss % after FEC, vs. conditional loss p_c (%); curves: Gilbert, Bernoulli.]
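The claim that FEC is less efficient under bursty loss can be checked with a small simulation. The sketch below is a hypothetical experiment, not the one behind the figures: it assumes an (n,k) block code whose first k packets per block are data, that a block is fully repaired when at most n-k of its packets are lost, and that no repair happens otherwise.

```python
import random

def loss_trace(p_u, p_c, n, rng):
    """Loss indicators with overall rate p_u and conditional loss prob p_c.
    Setting p_c == p_u gives Bernoulli (independent) loss."""
    p = p_u * (1.0 - p_c) / (1.0 - p_u)   # P(non-loss -> loss)
    lost, out = False, []
    for _ in range(n):
        lost = rng.random() < (p_c if lost else p)
        out.append(lost)
    return out

def final_loss_after_fec(trace, n=3, k=2):
    """Fraction of data packets still missing after (n,k) block FEC.
    Assumes the first k packets of each block are data, the rest parity."""
    lost_data = total_data = 0
    for start in range(0, len(trace) - n + 1, n):
        block = trace[start:start + n]
        total_data += k
        if sum(block) > n - k:             # too many losses: nothing recovered
            lost_data += sum(block[:k])
    return lost_data / total_data

if __name__ == "__main__":
    rng = random.Random(1)
    bernoulli = final_loss_after_fec(loss_trace(0.10, 0.10, 300_000, rng))
    gilbert = final_loss_after_fec(loss_trace(0.10, 0.30, 300_000, rng))
    print("final loss, Bernoulli:", bernoulli)
    print("final loss, Gilbert  :", gilbert)
```

With the same overall loss rate, the bursty trace leaves more data packets unrecovered, because losses cluster inside a block and exceed its n-k repair capacity more often.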
Objective QoS Metrics: Delay
Complementary Conditional CDF (C3DF)
– More descriptive than auto-correlation function (ACF)
– Delay correlation rises rapidly beyond a threshold
– Approximates conditional late loss probability
[Figure: C3DF; probability (y) vs. delay in seconds (x); curves: lag = 1, 2, 3, 5, 10, 20 and unconditional.]
$f(t) = P[d_i > t \mid d_{i-l} > t]$, lag $l = 1, 2, 3, \ldots$; $d_i$: delay of packet $i$
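As a rough illustration, the C3DF can be estimated from a delay trace by counting, among packets whose predecessor at a given lag exceeded the threshold t, how many also exceeded t. The function names below are hypothetical, and the estimator assumes one delay sample per packet in sequence order.

```python
def c3df(delays, t, lag):
    """Estimate P[d_i > t | d_(i-lag) > t] from per-packet delays.
    Returns None when no packet `lag` positions back exceeded t."""
    cond = hits = 0
    for i in range(lag, len(delays)):
        if delays[i - lag] > t:
            cond += 1
            if delays[i] > t:
                hits += 1
    return hits / cond if cond else None

def ccdf(delays, t):
    """Unconditional complementary CDF, P[d_i > t]."""
    return sum(d > t for d in delays) / len(delays)

if __name__ == "__main__":
    delays = [0.05, 0.20, 0.25, 0.05, 0.05, 0.30, 0.05, 0.05]  # seconds, made up
    print("C3DF, lag 1:", c3df(delays, t=0.1, lag=1))
    print("unconditional:", ccdf(delays, t=0.1))
```

If t is read as a playout deadline, the conditional value approximates the probability that a packet is late given that the packet `lag` positions earlier was late.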
Subjective QoS Metrics
Perceived quality
– Mean Opinion Score (MOS)
   ITU-T P.800/830
   Obtained via listening tests
– MOS variations
   DMOS (Degradation)
   CMOS (Comparison)
   MOSc (Conversational): considers delay
   A/B preference
Pros: more meaningful to end users
Cons: time consuming, labor intensive
MOS Grade Score
Excellent 5
Good 4
Fair 3
Poor 2
Bad 1
Effect of Loss Model on Perceived Quality
Codec: G.729 (8 kb/s ITU std)
Random (Bernoulli) vs. bursty (Gilbert) loss
– Bursty loss gives lower MOS
– True even when FEC or LBR is used
[Figure: Effect of random vs. bursty loss on MOS quality; MOS vs. loss probability; curves: random (Bernoulli) loss, bursty (Gilbert) loss.]
[Figure: Random vs. bursty loss on FEC (G.723.1) quality; MOS vs. loss probability; curves: FEC (3,2) (Gilbert), FEC (3,2) (Bernoulli).]
Going Further: Bridging Objective and Subjective Metrics
The E-model (ITU-T G.107/108)
– Originally for telephone network planning
– Considers various impairments
– Reduces to delay and loss impairment when adapted for VoIP
Objective quality estimation algorithms
– Suitable when network stats are not available, e.g., phone-to-phone service with IP in between
– Speech recognition performance may be used as a quality predictor, by comparing with original text
The E-model
Map from loss and delay to impairment scores (Ie, Id)
Compute a gross score (R value) and map to MOSc
Limited number of codec loss impairment mappings
[Figure: Ie (loss impairment) vs. average loss probability; G.729, T=20ms, random loss.]
[Figure: R to MOS mapping; MOS vs. R value.]
[Figure: E-model Id; Id (delay impairment) vs. delay (ms).]
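The last step of the E-model, mapping an R value to MOS, uses the cubic conversion published in ITU-T G.107. The sketch below implements only that conversion plus a reduced R computation. The base value 93.2 is the commonly quoted G.107 default and is an assumption here, as are the function names. The Ie and Id inputs must come from codec-specific and delay-specific curves such as those plotted above, and the sample values in the usage lines are made up.

```python
def r_to_mos(r):
    """ITU-T G.107 conversion from the R value to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

def r_value(ie, id_, r_base=93.2):
    """Reduced E-model for VoIP: only loss (Ie) and delay (Id) impairments.
    r_base is an assumed default, not a value taken from the slides."""
    return r_base - id_ - ie

if __name__ == "__main__":
    # Hypothetical impairment values, for illustration only.
    r = r_value(ie=20.0, id_=5.0)
    print("R =", r, "MOS =", round(r_to_mos(r), 2))
```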
Using Speech Recognition to Predict MOS
Evaluation of automatic speech recognition (ASR) based MOS prediction
– IBM ViaVoice Linux version
– Codec used: G.729
– Performance metric
   absolute word recognition ratio:
   $R_{abs} = \dfrac{\text{number of correctly recognized words}}{\text{total number of spoken words}}$
   relative word recognition ratio:
   $R_{rel}(p) = \dfrac{R_{abs}(p)}{R_{abs}(0\%)}$, where $p$ is the loss probability
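The two ratios are simple to compute once the recognizer output has been aligned with the original text. The slides do not say how words were matched, so the sketch below assumes a longest-matching-blocks alignment from Python's difflib as a stand-in. The function names and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher

def correctly_recognized(reference_words, recognized_words):
    """Count reference words matched, in order, by the recognizer output.
    The alignment method is an assumption, not the one used in the study."""
    matcher = SequenceMatcher(None, reference_words, recognized_words, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())

def r_abs(reference_words, recognized_words):
    """Absolute word recognition ratio."""
    return correctly_recognized(reference_words, recognized_words) / len(reference_words)

def r_rel(r_abs_at_p, r_abs_at_zero_loss):
    """Relative word recognition ratio: R_abs(p) / R_abs(0%)."""
    return r_abs_at_p / r_abs_at_zero_loss

if __name__ == "__main__":
    reference = "the quick brown fox jumps".split()
    recognized = "the quick crown fox".split()
    print("R_abs =", r_abs(reference, recognized))
```

Dividing by the zero-loss ratio removes each speaker's baseline recognition accuracy from the measure.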
Recognition Ratio vs. MOS
Both MOS and Rabs decrease w.r.t. loss
Then, eliminate middle variable p
[Figure: Impact of packet loss on audio quality; MOS vs. loss rate (%); G.729 codec.]
[Figure: Impact of packet loss on automatic speech recognition; word recognition ratio (%) vs. loss rate (%); G.729 codec.]
[Figure: Mapping from speech recognition performance to MOS; MOS vs. word recognition ratio (%).]
Speaker Dependency
Absolute performance is speaker-dependent
But relative word recognition ratio is not
Suitable for MOS prediction
[Figure: relative word recognition ratio R_rel vs. packet loss probability p (%); curves: Speaker A, Speaker B, Speaker C.]
[Figure: MOS vs. relative word recognition ratio R_rel; curves: Speaker A, Speaker B, Speaker C.]
[Figure: word recognition ratio vs. packet loss probability p (%); curves: Speaker A, Speaker B, Speaker C.]
Summary of QoS Measurement
Loss burstiness:
– Affects (generally worsens) perceived quality as well as FEC performance
– May be described with, e.g., a Gilbert model
Delay correlation:
– Increases rapidly beyond a threshold, revealed through Complementary Conditional CDF (C3DF)
– Late losses are also bursty
Perceived quality (MOS) estimation
– Analytical: the E-model
– If network statistics N/A: relative word recognition ratio can provide speaker-independent MOS prediction
Outline
QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality
QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation
Reality check
– Performance of VoIP end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through Internet measurement
Quality of FEC vs. LBR
FEC is substantially and consistently better
– At comparable bandwidth overhead
– Across all codec configurations tested
[Figure: FEC vs. LBR based on G.723.1; MOS vs. loss probability; curves: J: FEC (2,1), I: G.723.1 LBR. Panel label: G.729+G.723.1 LBR.]
[Figure: FEC vs. LBR based on AMR; MOS vs. loss probability; curves: N: AMR12.2+FEC (3,2), M: AMR12.2+6.7 LBR. Panel label: AMR LBR.]
Quality of FEC under Bursty Loss
Packet interval T has a stronger effect on MOS with FEC than without FEC
[Figure: MOS (Mean Opinion Score) vs. p_u (overall loss rate), conditional loss probability p_c = 30%; curves: T=20ms, T=40ms, T=20ms with FEC, T=40ms with FEC; annotation: 0.5-0.6 MOS.]
FEC MOS Optimization Considering Delay Effect
Larger T → higher FEC efficiency, but more delay
Optimizing T with the E-model
– Calculate final loss probability after FEC, apply delay impairment of FEC, map to MOSc
Prediction close to FEC MOS test results
– Suitable for analytical perceived quality prediction
[Figure: FEC MOS optimization, Id != 0, d=3*T; MOS_c vs. packet interval T (ms); curves: p_u = 4%, 8%, 12%, 16%.]
[Figure: FEC MOS prediction, p_c=30%; MOS_c vs. original loss rate (%); curves: E-model prediction T=40ms, real MOS test T=40ms.]
Trade-off Analysis between Codec Robustness and FEC
3 loss repair options
– FEC, LBR, PLC
Loss-resilient codec
– Better PLC
   iLBC (IETF)
– But higher bit-rate
– Better than FEC?
[Figure: MOS vs. average loss probability; curves: iLBC 14kb/s, G.729 8kb/s, G.723.1 6.3kb/s.]
Observations and Results
When considering delay:
– iLBC is usually preferred in low loss conditions
– G.729 or G.723.1 + FEC better for high loss
Example: max bandwidth 14 kb/s
– Consider delay impairment (use MOSc)
[Figure: Max BW: 14 kb/s; MOS_c vs. average loss probability; curves: iLBC, no FEC; G.729+(5,3); G.723.1+(2,1), T=60ms.]
Effect of Max Bandwidth on Achievable Quality
14 to 21 kb/s: significant improvement in MOSc
From 21 to 28 kb/s: marginal change due to increasing delay impairment by FEC
[Figure: MOS_c vs. average loss probability; curves: Max BW: 14 kb/s, Max BW: 21 kb/s, Max BW: 28 kb/s.]
Provisioning a VoIP Network
Silence detection/suppression
– Transmit only during On period, saves bandwidth
– Allows traffic aggregation through statistical multiplexing
Characteristics of On/Off patterns in VoIP
– Traditionally found to be exponentially distributed
– Modern silence detectors (G.729B VAD, NeVoT SD) produce different patterns
[Figure: talk-spurt/gap distribution, G.729B VAD; complementary CDF (log scale) vs. spurt/gap duration (in 10 ms frames); curves: real spurt CDF, exponential spurt CDF, real gap CDF, exponential gap CDF.]
[Figure: talk-spurt/gap distribution, NeVoT SD (default setting); complementary CDF (log scale) vs. spurt/gap duration (in 10 ms frames); curves: real spurt CDF, exponential spurt CDF, real gap CDF, exponential gap CDF.]
Traffic Aggregation Simulation
Token bucket filter with N sources, R: reserved to peak BW ratio
CDF model resembles trace model in most cases
Exponential (traditional) model
– Under-predicts out-of-profile packet probability
– Under-prediction ratio depends on token buffer size B
Similar results for NeVoT SD
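A toy version of such an aggregation experiment is sketched below. It is not the simulator used for these results. It assumes one packet per active source per frame, geometric (discrete exponential) spurt and gap lengths, and a token bucket refilled once per frame. All names and parameter values are hypothetical.

```python
import random

def on_off_source(mean_on, mean_off, n_frames, rng):
    """Per-frame activity (1 = packet sent) with geometric talk-spurt and
    gap lengths, measured in frames (means must be >= 1)."""
    active, out = rng.random() < mean_on / (mean_on + mean_off), []
    for _ in range(n_frames):
        out.append(1 if active else 0)
        leave = 1.0 / (mean_on if active else mean_off)
        if rng.random() < leave:       # end the current spurt or gap
            active = not active
    return out

def out_of_profile_fraction(sources, rate, depth):
    """Pass the aggregate through a token bucket that gains `rate` tokens
    per frame and holds at most `depth` tokens."""
    tokens, sent, dropped = depth, 0, 0
    for frame in zip(*sources):
        tokens = min(depth, tokens + rate)
        for _ in range(sum(frame)):
            sent += 1
            if tokens >= 1.0:
                tokens -= 1.0
            else:
                dropped += 1           # out-of-profile packet
    return dropped / sent if sent else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    n_sources, ratio = 10, 0.5         # ratio plays the role of R (reserved/peak)
    sources = [on_off_source(40, 60, 50_000, rng) for _ in range(n_sources)]
    print(out_of_profile_fraction(sources, rate=ratio * n_sources, depth=20))
```

Replacing `on_off_source` with spurt and gap lengths drawn from a measured distribution would be the way to compare the exponential model against a trace-based one.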
Summary of QoS Management
End-to-End
– FEC is superior in quality to LBR
– Codec robustness is better than FEC in low loss conditions
   Combining both schemes brings the best of both sides
Network provisioning
– Observation: New silence detectors (G.729B, NeVoT SD) produce non-exponential voice On/Off patterns
– Result: performance of voice traffic aggregation under new On/Off patterns
– Important in traffic engineering and Service Level Agreement (SLA) validation
Outline
QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality
QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation
Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Assessment of VoIP service availability through Internet measurement
Mouth-to-ear Delay of VoIP End-points
All receivers can adjust M2E delay adaptively whenever it is too low or too high
M2E delay depends mainly on receiver (esp. RAT)
HW phones have relatively low delay (~45-90ms)
[Figure: M2E delay (ms) vs. time (sec); curves: experiment 1-1, experiment 1-2; silence gaps marked.]
[Figure: Effect of Sender and Receiver; M2E delay (ms) by receiver (3Com, Cisco, Mediatrix, Pingtel, RAT); one series per sender: 3Com, Cisco, Mediatrix, Pingtel, RAT.]
But Adaptiveness ≠ Perfection
Symptom of playout buffer underflow
Waveforms are dropped
Occurred at point of delay adjustment
Bugs in software?
LAN perfect quality?
Major Observations
Overall: end-points matter a lot!
HW IP phones: 45-90ms average M2E delay
SW clients:
– Messenger 2000 lowest (68ms), XP (96-120ms)
   c.f. GSM-PSTN: 110ms either direction
– NetMeeting very bad (> 400ms)
PLC robustness
– Acceptable in all 3 IP phones tested, Cisco phone more robust
Silence detection/suppression
– Works for speech input
– Often fails for non-speech (e.g., music) input
   Generates many unnatural gaps
   Not good for customer support center (on-hold music)!
Acoustic echo cancellation (AEC):
– Good on most IP phones (Echo Return Loss > 40 dB)
– But some do not implement AEC at all
Reality Check #2: IP Telephony Deployment
Localized deployment at Columbia Univ.
[Figure: deployment architecture. Labels: Core Server; sipd (SIP proxy, redirect server); SQL database; Conference Server; Voicemail Server; Web Server (web based configuration, server status monitoring); IP Phones; SIP/PSTN Gateway; RTP/SIP; T1/E1; Telephone Switch/PBX; Regular phone.]
Issues and Lessons Learned
PSTN/PBX integration
– Requires full understanding of legacy networks
   Lower layer (e.g., T1 line configuration)
– Parameters must match on both PSTN/PBX and gateway!
   PBX access configurations
– To ensure calls go through in both directions
   Address translation (dial-plan) in both directions
– Previous lessons/experiences can help greatly
   E.g., second gateway installed in weeks instead of months
Security
– Issue: SIP/PSTN gateway has no authentication feature
– Solution:
   Use gateway’s access control lists to block direct calls
   SIP proxy server handles authentication using record-route
Reality Check #3: VoIP Service Availability
Focus on availability rather than traditional QoS
– Delay is a minor issue; FEC recovers most isolated losses
– Ability to make a call is vital, especially in emergency
Internet measurement sites:
– 14 nodes worldwide, not just Internet2 and alike
Definitions:
– Availability = MTBF / (MTBF + MTTR)
– Availability = successful calls / first call attempts
Equipment availability: 99.999% (“5 nines”) → 5 minutes/year
AT&T: 99.98% availability (1997)
IP frame relay SLA: 99.9%
UK mobile phone survey: 97.1-98.8%
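The two availability definitions and the "5 nines" downtime figure can be reproduced with a few lines of arithmetic. The function names are hypothetical, and the MTBF/MTTR values in the usage lines are made up for illustration. The call counts are the ones reported elsewhere in this deck (62,027 succeeded, 292 failed).

```python
def availability_from_mtbf(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR); both in the same time unit."""
    return mtbf / (mtbf + mttr)

def availability_from_calls(succeeded, failed):
    """Availability = successful calls / first call attempts."""
    return succeeded / (succeeded + failed)

def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60

if __name__ == "__main__":
    print(downtime_minutes_per_year(0.99999))            # about 5.3 minutes
    print(availability_from_calls(62_027, 292))          # about 0.9953
    print(availability_from_mtbf(mtbf=999.0, mttr=1.0))  # illustrative values
```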
First Look at Availability
Call success probability:
– 62,027 calls succeeded, 292 failed → 99.53% availability
– Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58%
Overall network loss
– PSTN: once connected, call usually of good quality
   exception: mobile phones
– Compute % time below loss threshold
   5% loss causes degradation for many codecs
   others acceptable till 20%

% of time below loss threshold, by path group:
loss      0%     5%     10%    20%
All       82.3   97.48  99.16  99.75
ISP       78.6   96.72  99.04  99.74
I2        97.7   99.67  99.77  99.79
I2+       86.8   98.41  99.32  99.76
US        83.6   96.95  99.27  99.79
Int.      81.7   97.73  99.11  99.73
US ISP    73.6   95.03  98.92  99.79
Int. ISP  81.2   97.60  99.10  99.71
Network Outages
Sustained packet losses
– arbitrarily defined at 8 consecutive packets
– far beyond recoverable (FEC, interpolation)
23% of packet losses are outages
Make up significant part of 0.25% unavailability
Symmetric: A→B ⇔ B→A
Spatially correlated: A→B ⇒ A→X
Not correlated across networks (e.g., I2 and commercial)
Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time
[Figure: complementary CDF (log scale) vs. outage duration (sec); curves: US domestic paths, international paths.]
[Figure: complementary CDF (log scale) vs. outage duration (sec); curves: all paths, Internet2.]
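Extracting outages from a loss trace amounts to finding runs of consecutive losses at or above the threshold. The sketch below assumes the trace is a list of per-packet loss indicators and uses the 8-packet threshold from the slide. The function names and the sample trace are hypothetical.

```python
def find_outages(lost, min_run=8):
    """Return (start_index, length) for every run of at least `min_run`
    consecutive lost packets in a sequence of loss indicators."""
    outages, run_start = [], None
    # A trailing False sentinel closes a run that reaches the end of the trace.
    for i, is_lost in enumerate(list(lost) + [False]):
        if is_lost and run_start is None:
            run_start = i
        elif not is_lost and run_start is not None:
            if i - run_start >= min_run:
                outages.append((run_start, i - run_start))
            run_start = None
    return outages

def outage_share_of_losses(lost, min_run=8):
    """Fraction of all lost packets that fall inside an outage."""
    total = sum(bool(x) for x in lost)
    in_outage = sum(length for _, length in find_outages(lost, min_run))
    return in_outage / total if total else 0.0

if __name__ == "__main__":
    trace = [0] * 5 + [1] * 3 + [0] * 2 + [1] * 10 + [0] * 4 + [1] * 8
    print(find_outages(trace))
    print(outage_share_of_losses(trace))
```

Multiplying a run length by the packet interval gives the outage duration in seconds.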
Outage-induced Call Abortion Probability
Long interruption → user likely to abandon call
– from E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ ($t$ in seconds)
– half the users will abandon call after 12s
2,566 calls have at least one outage
946 of 2,566 expected to be dropped → 1.53% of all calls
all 1.53%
I2 1.16%
I2+ 1.15%
ISP 1.82%
US 0.99%
Int. 1.78%
US ISP 0.86%
Int. ISP 2.30%
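The holding model is a plain exponential, so the 12-second figure follows from its median, 17.26 × ln 2 ≈ 11.96 s. The sketch below evaluates the model and sums abandonment probabilities over a list of outage durations. Using one outage duration per affected call is an assumption made here for illustration, as are the function names and the sample durations.

```python
import math

TAU = 17.26  # seconds, from the fit P[holding] = exp(-t / 17.26)

def p_holding(t):
    """Probability that a user is still holding after t seconds of silence."""
    return math.exp(-t / TAU)

def median_holding_time():
    """Time after which half of the users have abandoned the call."""
    return TAU * math.log(2.0)

def expected_dropped_calls(outage_durations):
    """Expected number of abandoned calls, one outage duration (s) per call.
    Reducing each call to a single duration is an assumed simplification."""
    return sum(1.0 - p_holding(t) for t in outage_durations)

if __name__ == "__main__":
    print(median_holding_time())                     # about 11.96 s
    print(expected_dropped_calls([2.0, 5.0, 60.0]))  # made-up durations
```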
Summary of Service Availability
Through several metrics, one can translate from network loss to VoIP service availability (no Internet dial-tone)
Current results show availability far below five 9’s, but comparable to mobile telephony
– Outage statistics are similar in research and ISP networks
Working on identifying fault sources and locations
Additional measurement sites are welcome
Conclusions
Measuring QoS
– Loss burstiness and delay correlation affect (generally worsen) perceived quality
– Bridging objective and subjective metrics: the E-model, or speech recognition based MOS prediction
– Performance of real products: IP phones and soft clients
Ensuring/improving QoS
– Network provisioning (voice traffic aggregation)
   Efficient, but may be expensive to deploy and manage
– End-to-End (FEC > LBR, PLC)
   Easier to deploy, but must control overhead of FEC
Reality Check
– Good implementation at the end-point (e.g., IP phones) is vital
– VoIP deployment requires PSTN integration and security
– Service availability is crucial for VoIP, but still far from 99.999% over the Internet
Ongoing and Future Work
Sampling Internet performance
– Where do the problems reside?
   Access networks (Cable, DSL), or international paths?
– How can we solve these problems?
   Can adaptive FEC react fast enough to changes in network conditions?
Playout delay behaviors of VoIP end-points
– How well do they react to jitter, delay spikes?