QoS Measurement and Management for VoIP


Wenyu Jiang

IRT Lab

March 5, 2003

Introduction to VoIP & IP Telephony

Transport of voice packets over IP networks

Cost savings
– Consolidates voice and data networks
– Avoids leased lines, long-distance toll calls

Smart and new services
– Call management (filtering, TOD forwarding): CPL
– Better than PSTN quality: wide-band codecs

Protocols and Standards
– Signaling: SIP (IETF), H.323 (ITU-T)
– Transport: RTP/RTCP (IETF)

Practical Issues in VoIP

Quality of Service (QoS)
– Internet is a best-effort network
    Loss, delay and jitter
    Users expect at least PSTN quality for VoIP!

Ease of deployment
– Requires seamless integration with legacy networks (PSTN/PBX)
– Security is a must

High yardstick of service availability
– Can your network achieve 99.999% up time?

Outline

QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality

QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation

Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through Internet measurement

Workings of a VoIP Client

Audio is packetized, encoded and transmitted
Forward error correction (FEC) may be used to recover lost packets
Playout control smoothes out jitter to minimize late losses; coupled with FEC
Packet loss concealment (PLC)
– Last line of “defense” after FEC and playout

FEC affects playout control

[Figure: client pipeline. Multimedia packets with FEC cross the Internet (added loss, jitter), then pass through FEC recovery and playout delay control (added late losses); losses unrecoverable by FEC go to loss concealment & decoding.]

LBR: An Alternative to FEC

An (n,k) block FEC code can recover up to n-k losses per block
Low Bit-rate Redundancy (LBR)
– Transmit a lower bit-rate version of original audio
– No notion of “blocks”
– Not bit-exact recovery

[Figure: two transmission timelines. FEC: data packets A to F are grouped into FEC block 1 and FEC block 2, each followed by FEC data. LBR: data packets A to F each travel with LBR data (a', b', c', d'), a low bit-rate copy of another packet.]
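A minimal sketch of the block FEC idea, assuming the simplest possible code: one XOR parity packet per block, i.e. an (n, n-1) code that repairs a single loss. The slides do not say which FEC code was used; the function names and the equal-length packet assumption are illustrative only.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_block(data_packets):
    """Return the k data packets followed by one XOR parity packet (n = k + 1)."""
    parity = reduce(xor_bytes, data_packets)
    return list(data_packets) + [parity]


def decode_block(received):
    """Recover the data packets of one block.

    `received` holds n entries in transmission order; a lost packet is None.
    With a single parity packet, at most n - k = 1 loss can be repaired.
    Returns the k data packets, with None for any packet that stays lost.
    """
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) == 1:
        present = [pkt for pkt in received if pkt is not None]
        repaired = list(received)
        # XOR of all surviving packets reproduces the single missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
        received = repaired
    return list(received[:-1])


block = encode_block([b"AAAA", b"BBBB"])          # a (3,2) block
one_loss = decode_block([None, block[1], block[2]])
two_losses = decode_block([None, None, block[2]])
```

With one loss the block is fully repaired; with two losses in the same block nothing can be recovered, which is why bursty loss hurts FEC.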

Objective QoS Metrics: Loss

Internet packet loss is often bursty
– May degrade voice quality more than random (Bernoulli) loss

Characterization of packet loss
– 2-state Markov (Gilbert) model: conditional loss prob.
– More detailed models, but more states!
    Extended Gilbert model, nth order Markov model
    Hidden Markov model, Gilbert-Elliot model, inter-loss distance
– More states → larger test set, loss of big picture
– Adaptive applications can trade off model accuracy for fast feedback
– Gilbert model provides an acceptable compromise

[Figure: Gilbert model state diagram. State 0 (non-loss) moves to state 1 (loss) with probability p and stays with probability 1-p; state 1 returns to state 0 with probability q and stays with probability 1-q = p_c, the conditional loss probability.]
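The Gilbert model can be sketched in a few lines. This is an illustration under the 2-state definition above (p: non-loss to loss, q: loss to non-loss, p_c = 1-q); the function names, the seed, and the choice to start the chain in its stationary distribution are assumptions, not part of the slides.

```python
import random


def gilbert_params(p_u: float, p_c: float):
    """Return (p, q) for unconditional loss rate p_u and conditional loss p_c."""
    q = 1.0 - p_c
    # Stationary loss rate of the chain is p_u = p / (p + q); solve for p.
    p = p_u * q / (1.0 - p_u)
    return p, q


def mean_burst_length(q: float) -> float:
    """Loss burst lengths are geometric, so the mean burst length is 1 / q."""
    return 1.0 / q


def simulate_losses(p: float, q: float, count: int, seed: int = 0):
    """Generate `count` loss indicators (True = lost) from the Gilbert model."""
    rng = random.Random(seed)
    lost = rng.random() < p / (p + q)      # start in the stationary distribution
    trace = []
    for _ in range(count):
        trace.append(lost)
        if lost:
            lost = rng.random() >= q       # stay in the loss state with prob. 1 - q
        else:
            lost = rng.random() < p        # enter the loss state with prob. p
    return trace


p, q = gilbert_params(p_u=0.10, p_c=0.30)
trace = simulate_losses(p, q, count=1000)
```

Setting p_c equal to p_u makes consecutive losses independent, which reduces the model to Bernoulli loss.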

Effect of Gilbert Loss Model

Loss burst distribution of a packet trace
– Roughly, though not exactly exponential

Loss burstiness on FEC performance
– FEC less efficient under bursty loss

[Figure: number of occurrences (log scale) vs. loss burst length; packet trace vs. Gilbert model.]

[Figure: p_f, final loss % after FEC, vs. conditional loss p_c (%); Gilbert vs. Bernoulli.]
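Why burstiness hurts FEC can be checked with a small exact calculation. The sketch below enumerates every loss pattern of one block under Gilbert loss and sums the data packets that stay lost. It assumes an ideal systematic (n,k) code (data packets first, any k of the n packets suffice) whose packets are sent back to back; these assumptions and the function name are illustrative and not taken from the slides.

```python
from itertools import product


def residual_loss(n: int, k: int, p_u: float, p_c: float) -> float:
    """Exact data-packet loss rate after (n,k) block FEC under Gilbert loss.

    p_u is the unconditional loss rate and p_c the conditional loss
    probability; p_c == p_u gives Bernoulli (independent) loss.
    """
    q = 1.0 - p_c                       # loss -> non-loss
    p = p_u * q / (1.0 - p_u)           # non-loss -> loss
    total = 0.0
    for pattern in product((0, 1), repeat=n):       # 1 marks a lost packet
        prob = p_u if pattern[0] else 1.0 - p_u     # stationary start
        for prev, cur in zip(pattern, pattern[1:]):
            if prev:
                prob *= p_c if cur else q
            else:
                prob *= p if cur else 1.0 - p
        if sum(pattern) > n - k:                    # block cannot be repaired
            total += prob * sum(pattern[:k])        # lost data packets stay lost
    return total / k


bernoulli = residual_loss(3, 2, p_u=0.10, p_c=0.10)
bursty = residual_loss(3, 2, p_u=0.10, p_c=0.50)
```

For the same 10% overall loss, the (3,2) code leaves more residual loss when p_c is 50% than when losses are independent.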

Objective QoS Metrics: Delay

Complementary Conditional CDF (C3DF)
– More descriptive than auto-correlation function (ACF)
– Delay correlation rises rapidly beyond a threshold
– Approximates conditional late loss probability

$f(t) = P[d_i > t \mid d_{i-l} > t]$, lag $l = 1, 2, 3, \ldots$; $d_i$: delay of packet $i$

[Figure: C3DF, probability (y) vs. delay in seconds (x); curves for lag = 1, 2, 3, 5, 10, 20 and the unconditional distribution.]
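An empirical C3DF can be estimated directly from a delay trace. The sketch below assumes the definition f(t) = P[d_i > t | d_{i-l} > t] and a plain list of per-packet delays in seconds; the function names and the sample trace are made up for illustration.

```python
def c3df(delays, t, lag):
    """Estimate P[d_i > t | d_(i-lag) > t] from a list of packet delays."""
    # Pair each delay d_(i-lag) with d_i and keep the pairs where the
    # earlier packet exceeded the threshold.
    conditioned = [cur for prev, cur in zip(delays, delays[lag:]) if prev > t]
    if not conditioned:
        return None                     # the condition never occurs in this trace
    return sum(cur > t for cur in conditioned) / len(conditioned)


def ccdf(delays, t):
    """Unconditional complementary CDF, P[d_i > t]."""
    return sum(d > t for d in delays) / len(delays)


sample = [0.05, 0.20, 0.25, 0.05, 0.05, 0.30, 0.05, 0.05]   # hypothetical delays (sec)
conditional = c3df(sample, t=0.1, lag=1)
unconditional = ccdf(sample, t=0.1)
```

Comparing `conditional` with `unconditional` for a range of thresholds and lags reproduces the kind of curves the C3DF plot shows.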

Subjective QoS Metrics

Perceived quality
– Mean Opinion Score (MOS)
    ITU-T P.800/830
    Obtained via listening tests
– MOS variations
    DMOS (Degradation)
    CMOS (Comparison)
    MOSc (Conversational): considers delay
    A/B preference

Pros: more meaningful to end users
Cons: time consuming, labor intensive

MOS Grade    Score
Excellent    5
Good         4
Fair         3
Poor         2
Bad          1

Effect of Loss Model on Perceived Quality

Codec: G.729 (8 kb/s ITU std)
Random (Bernoulli) vs. bursty (Gilbert) loss
– Bursty loss → lower MOS
– True even when FEC or LBR is used

[Figure: "Effect of random vs. bursty loss on MOS quality"; MOS vs. loss probability; random (Bernoulli) loss vs. bursty (Gilbert) loss.]

[Figure: "random vs. bursty loss on FEC (G.723.1) quality"; MOS vs. loss probability; FEC (3,2) under Gilbert vs. Bernoulli loss.]

Going Further: Bridging Objective and Subjective Metrics

The E-model (ITU-T G.107/108)
– Originally for telephone network planning
– Considers various impairments
– Reduces to delay and loss impairment when adapted for VoIP

Objective quality estimation algorithms
– Suitable when network stats are not available, e.g., phone-to-phone service with IP in between
– Speech recognition performance may be used as a quality predictor, by comparing with original text

The E-model

Map from loss and delay to impairment scores (Ie, Id)
Compute a gross score (R value) and map to MOSc
Limited number of codec loss impairment mappings

[Figure: Ie (loss impairment) vs. average loss probability; G.729, T=20ms, random loss.]

[Figure: "R to MOS mapping"; MOS vs. R value.]

[Figure: "E-model Id"; Id (delay impairment) vs. delay (ms).]
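The last step of the E-model, mapping an R value to MOS, can be sketched as follows. The cubic is the standard R-to-MOS conversion given in ITU-T G.107. The reduced form R = base - Id - Ie and the base value 93.2 are assumptions for illustration (the slides give neither), and the Id and Ie inputs must come from the delay and codec loss mappings, which are not reproduced here.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model R value to MOS using the G.107 conversion formula."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def r_value(i_d: float, i_e: float, r_base: float = 93.2) -> float:
    """Reduced E-model: subtract delay (Id) and loss (Ie) impairments.

    r_base is an assumed default, not a value taken from the slides.
    """
    return r_base - i_d - i_e


# Hypothetical impairments: Id = 10 (delay), Ie = 20 (codec plus loss).
mos_c = r_to_mos(r_value(i_d=10.0, i_e=20.0))
```

Because the mapping is flat near both ends, the same impairment costs more MOS in the middle of the R range than near R = 100.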

Using Speech Recognition to Predict MOS

Evaluation of automatic speech recognition (ASR) based MOS prediction
– IBM ViaVoice Linux version
– Codec used: G.729
– Performance metric
    absolute word recognition ratio
    $R_{abs} = \dfrac{\text{number of correctly recognized words}}{\text{total number of spoken words}}$
    relative word recognition ratio
    $R_{rel}(p) = \dfrac{R_{abs}(p)}{R_{abs}(0\%)}$, where $p$ is the loss probability
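The two ratios are easy to compute once a transcript is available. In this sketch R_abs is the number of correctly recognized words divided by the total number of spoken words, and R_rel(p) is R_abs(p) divided by R_abs at 0% loss. How "correctly recognized" is counted is an assumption: here words are aligned with Python's `difflib.SequenceMatcher`, which is not necessarily how the original study scored ViaVoice output.

```python
from difflib import SequenceMatcher


def absolute_ratio(spoken_words, recognized_words) -> float:
    """R_abs: correctly recognized words / total spoken words."""
    if not spoken_words:
        raise ValueError("reference transcript is empty")
    matcher = SequenceMatcher(None, spoken_words, recognized_words, autojunk=False)
    # Matching blocks are in-order runs of identical words.
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct / len(spoken_words)


def relative_ratio(r_abs_at_p: float, r_abs_at_zero: float) -> float:
    """R_rel(p) = R_abs(p) / R_abs(0%)."""
    return r_abs_at_p / r_abs_at_zero


spoken = "the quick brown fox jumps".split()
heard = "the quick fox jumps".split()          # hypothetical recognizer output
r_abs = absolute_ratio(spoken, heard)
r_rel = relative_ratio(0.30, 0.40)
```

Dividing by the loss-free ratio removes the recognizer's baseline accuracy for a given speaker, which is what makes R_rel comparable across speakers.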

Recognition Ratio vs. MOS

Both MOS and Rabs decrease w.r.t. loss
Then, eliminate middle variable p

[Figure: "Impact of packet loss on audio quality"; MOS vs. loss rate (%); G.729 codec.]

[Figure: "Impact of packet loss on automatic speech recognition"; word recognition ratio (%) vs. loss rate (%); G.729 codec.]

[Figure: "mapping from speech recognition performance to MOS"; MOS vs. word recognition ratio (%).]

Speaker Dependency

Absolute performance is speaker-dependent
But relative word recognition ratio is not
Suitable for MOS prediction

[Figure: relative word recognition ratio R_rel vs. packet loss probability p (%); Speakers A, B and C.]

[Figure: MOS vs. relative word recognition ratio R_rel.]

[Figure: word recognition ratio vs. packet loss probability p (%).]


Summary of QoS Measurement

Loss burstiness:
– Affects (generally worsens) perceived quality as well as FEC performance
– May be described with, e.g., a Gilbert model

Delay correlation:
– Increases rapidly beyond a threshold, revealed through Complementary Conditional CDF (C3DF)
– Late losses are also bursty

Perceived quality (MOS) estimation
– Analytical: the E-model
– If network statistics N/A: relative word recognition ratio can provide speaker-independent MOS prediction





Quality of FEC vs. LBR

FEC is substantially and consistently better
– At comparable bandwidth overhead
– Across all codec configurations tested

[Figure: "FEC vs. LBR based on G.723.1" (panel: G.729+G.723.1 LBR); MOS vs. loss probability; J: FEC (2,1) vs. I: G.723.1 LBR.]

[Figure: "FEC vs. LBR based on AMR" (panel: AMR LBR); MOS vs. loss probability; N: AMR12.2+FEC (3,2) vs. M: AMR12.2+6.7 LBR.]

Quality of FEC under Bursty Loss

Packet interval T has a stronger effect on MOS with FEC than without FEC

[Figure: MOS (Mean Opinion Score) vs. p_u (overall loss rate) at conditional loss probability p_c = 30%; curves for T=20ms, T=40ms, T=20ms with FEC and T=40ms with FEC; annotated difference of 0.5-0.6 MOS.]

FEC MOS Optimization Considering Delay Effect

Larger T → higher FEC efficiency, but higher delay
Optimizing T with the E-model
– Calculate final loss probability after FEC, apply delay impairment of FEC, map to MOSc

Prediction close to FEC MOS test results
– Suitable for analytical perceived quality prediction

[Figure: "FEC MOS optimization, Id != 0, d=3*T"; MOS_c vs. packet interval T (ms); curves for p_u = 4%, 8%, 12% and 16%.]

[Figure: "FEC MOS prediction, p_c=30%"; MOS_c vs. original loss rate (%); E-model prediction vs. real MOS test, both at T=40ms.]

Trade-off Analysis between Codec Robustness and FEC

3 loss repair options
– FEC, LBR, PLC

Loss-resilient codec
– Better PLC
    iLBC (IETF)
– But higher bit rate
– Better than FEC?

[Figure: MOS vs. loss (0 to 0.15); iLBC 14 kb/s, G.729 8 kb/s and G.723.1 6.3 kb/s.]

Observations and Results

When considering delay:
– iLBC is usually preferred in low loss conditions
– G.729 or G.723.1 + FEC better for high loss

Example: max bandwidth 14 kb/s
– Consider delay impairment (use MOSc)

[Figure: MOS_c vs. loss (0 to 0.15); iLBC without FEC, G.729+(5,3), and G.723.1+(2,1) with T=60ms.]

[Figure: "Max BW: 14 kb/s"; MOS_c vs. loss (0 to 0.15); annotated with iLBC, G.729+(5,3), and G.723.1+(2,1) with T=60ms.]

Effect of Max Bandwidth on Achievable Quality

14 to 21 kb/s: significant improvement in MOSc

From 21 to 28 kb/s: marginal change due to increasing delay impairment by FEC

[Figure: MOS_c vs. loss (0 to 0.15); Max BW: 14 kb/s, 21 kb/s and 28 kb/s.]

Provisioning a VoIP Network

Silence detection/suppression
– Transmit only during On period, saves bandwidth
– Allows traffic aggregation through statistical multiplexing

Characteristics of On/Off patterns in VoIP
– Traditionally found to be exponentially distributed
– Modern silence detectors (G.729B VAD, NeVoT SD) produce different patterns

[Figure: "talk-spurt/gap distribution, G.729B VAD"; complementary CDF (log scale) vs. spurt/gap duration (in 10 ms frames); real vs. exponential spurt CDF, real vs. exponential gap CDF.]

[Figure: "talk-spurt/gap distribution, NeVoT SD (default setting)"; complementary CDF (log scale) vs. spurt/gap duration (in 10 ms frames); real vs. exponential spurt CDF, real vs. exponential gap CDF.]

Traffic Aggregation Simulation

Token bucket filter with N sources, R: reserved to peak BW ratio
CDF model resembles trace model in most cases
Exponential (traditional) model
– Under-predicts out-of-profile packet probability
– Under-prediction ratio increases as token buffer size B increases

Similar results for NeVoT SD
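The out-of-profile packet probability of a token bucket filter can be measured with a short slot-based loop. This is a sketch under simplifying assumptions that are not from the slides: time is slotted, the aggregate of the N sources is given as a packet count per slot, one token admits one packet, and the bucket starts full.

```python
def out_of_profile_fraction(arrivals, rate, bucket_size):
    """Fraction of packets that find no token in a token bucket filter.

    arrivals: packets arriving in each time slot (aggregate of all sources).
    rate: tokens added per slot (the reserved bandwidth, in packets per slot).
    bucket_size: token buffer size B.
    """
    tokens = float(bucket_size)                 # assumed to start full
    total = 0
    out_of_profile = 0
    for count in arrivals:
        tokens = min(float(bucket_size), tokens + rate)
        for _ in range(count):
            total += 1
            if tokens >= 1.0:
                tokens -= 1.0                   # in profile: consume one token
            else:
                out_of_profile += 1             # no token left for this packet
    return out_of_profile / total if total else 0.0


burst = [3, 3, 0, 0]                            # hypothetical aggregate arrivals
small_bucket = out_of_profile_fraction(burst, rate=1, bucket_size=2)
large_bucket = out_of_profile_fraction(burst, rate=1, bucket_size=4)
```

Feeding the loop with arrivals generated from exponential On/Off periods and from a measured trace is one way to compare the two models for a given B and R.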

Summary of QoS Management

End-to-End
– FEC is superior in quality to LBR
– Codec robustness is better than FEC in low loss conditions
    Combining both schemes brings the best of both sides

Network provisioning
– Observation: New silence detectors (G.729B, NeVoT SD) produce non-exponential voice On/Off patterns
– Result: performance of voice traffic aggregation under new On/Off patterns
– Important in traffic engineering and Service Level Agreement (SLA) validation




Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Assessment of VoIP service availability through Internet measurement

Mouth-to-ear Delay of VoIP End-points

All receivers can adjust M2E delay adaptively whenever it is too low or too high

M2E delay depends mainly on receiver (esp. RAT)
HW phones have relatively low delay (~45-90ms)

[Figure: M2E delay (ms) vs. time (sec); experiment 1-1 and experiment 1-2; silence gaps marked.]

[Figure: "Effect of Sender and Receiver"; M2E delay (ms) by receiver (3Com, Cisco, Mediatrix, Pingtel, RAT), one series per sender (3Com, Cisco, Mediatrix, Pingtel, RAT).]

But Adaptiveness ≠ Perfection

Symptom of playout buffer underflow

Waveforms are dropped

Occurred at point of delay adjustment

Bugs in software?

LAN perfect quality?

Major Observations

Overall: end-points matter a lot!
HW IP phones: 45-90ms average M2E delay
SW clients:
– Messenger 2000 lowest (68ms), XP (96-120ms)
    cf. GSM to PSTN: 110ms either direction
– NetMeeting very bad (> 400ms)

PLC robustness
– Acceptable in all 3 IP phones tested, Cisco phone more robust

Silence detection/suppression
– Works for speech input
– Often fails for non-speech (e.g., music) input
    Generates many unnatural gaps
    Not good for customer support center (on-hold music)!

Acoustic echo cancellation (AEC):
– Good on most IP phones (Echo Return Loss > 40 dB)
– But some do not implement AEC at all

Reality Check #2: IP Telephony Deployment

Localized deployment at Columbia Univ.

[Figure: deployment architecture. Core server: sipd (SIP proxy, redirect server) with SQL database; conference server; voicemail server; web server (web based configuration, server status monitoring); IP phones; SIP/PSTN gateway connecting RTP/SIP to T1/E1 toward the telephone switch/PBX and regular phones.]

Issues and Lessons Learned

PSTN/PBX integration
– Requires full understanding of legacy networks
    Lower layer (e.g., T1 line configuration)
      – Parameters must match on both PSTN/PBX and gateway!
    PBX access configurations
      – To ensure calls go through in both directions
    Address translation (dial-plan) in both directions
– Previous lessons/experiences can help greatly
    E.g., second gateway installed in weeks instead of months

Security
– Issue: SIP/PSTN gateway has no authentication feature
– Solution:
    Use gateway’s access control lists to block direct calls
    SIP proxy server handles authentication using record-route

Reality Check #3: VoIP Service Availability

Focus on availability rather than traditional QoS
– Delay is a minor issue; FEC recovers most isolated losses
– Ability to make a call is vital, especially in emergency

Internet measurement sites:
– 14 nodes worldwide, not just Internet2 and alike

Definitions:
– Availability = MTBF / (MTBF + MTTR)
– Availability = successful calls / first call attempts

Equipment availability: 99.999% (“5 nines”) → about 5 minutes/year of downtime
AT&T: 99.98% availability (1997)
IP frame relay SLA: 99.9%
UK mobile phone survey: 97.1-98.8%
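Both availability definitions, and the "5 nines means about 5 minutes a year" rule of thumb, reduce to one-line calculations. The sketch below is only arithmetic on the definitions above; the function names are illustrative, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60        # assumes a 365-day year


def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def availability_from_calls(successful: int, attempts: int) -> float:
    """Availability = successful calls / first call attempts."""
    return successful / attempts


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes in one year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)
# Call counts reported in this deck: 62,027 succeeded and 292 failed.
measured = availability_from_calls(62027, 62027 + 292)
```

The same function shows that the measured call success rate corresponds to more than a day and a half of unavailability per year.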

First Look of Availability

Call success probability:
– 62,027 calls succeeded, 292 failed → 99.53% availability
– Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58%

Overall network loss
– PSTN: once connected, call usually of good quality
    exception: mobile phones
– Compute % time below loss threshold
    5% loss causes degradation for many codecs
    others acceptable till 20%

Percentage of time below each loss threshold:

loss        0%      5%      10%     20%
All         82.3    97.48   99.16   99.75
ISP         78.6    96.72   99.04   99.74
I2          97.7    99.67   99.77   99.79
I2+         86.8    98.41   99.32   99.76
US          83.6    96.95   99.27   99.79
Int.        81.7    97.73   99.11   99.73
US ISP      73.6    95.03   98.92   99.79
Int. ISP    81.2    97.60   99.10   99.71

Network Outages

Sustained packet losses
– arbitrarily defined at 8 packets
– far beyond recoverable (FEC, interpolation)

23% of packet losses are outages
Make up significant part of 0.25% unavailability
Symmetric: A→B ⇔ B→A
Spatially correlated: A→B ⇔ A→X
Not correlated across networks (e.g., I2 and commercial)
Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time

[Figure: complementary CDF (log scale) vs. outage duration (sec); US domestic paths vs. international paths.]

[Figure: complementary CDF (log scale) vs. outage duration (sec); all paths vs. Internet2.]

Outage-induced Call Abortion Probability

Long interruption → user likely to abandon call
from E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds)
half the users will abandon call after 12s

2,566 calls have at least one outage
946 of 2,566 expected to be dropped → 1.53% of all calls

Path type    Calls expected to be dropped
all          1.53%
I2           1.16%
I2+          1.15%
ISP          1.82%
US           0.99%
Int.         1.78%
US ISP       0.86%
Int. ISP     2.30%
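The holding-time model can be checked numerically. The sketch below uses P[holding] = exp(-t/17.26) from the slide; treating several outages within one call as independent is an added assumption, and the function names are illustrative.

```python
import math

HOLDING_TIME_CONSTANT = 17.26           # seconds, from the E.855-based formula


def holding_probability(t: float) -> float:
    """Probability that a user is still holding after an interruption of t seconds."""
    return math.exp(-t / HOLDING_TIME_CONSTANT)


def median_abandon_time() -> float:
    """Interruption length at which half the users have abandoned the call."""
    return HOLDING_TIME_CONSTANT * math.log(2)


def drop_probability(outage_durations) -> float:
    """Probability that a call is abandoned during any of its outages.

    Assumes the user's decision at each outage is independent.
    """
    survive = 1.0
    for t in outage_durations:
        survive *= holding_probability(t)
    return 1.0 - survive


half_life = median_abandon_time()
two_outages = drop_probability([half_life, half_life])
```

Summing `drop_probability` over all calls with at least one outage gives the expected number of dropped calls reported on the slide.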

Summary of Service Availability

Through several metrics, one can translate from network loss to VoIP service availability (no Internet dial-tone)

Current results show availability far below five 9’s, but comparable to mobile telephony
– Outage statistics are similar in research and ISP networks

Working on identifying fault sources and locations
Additional measurement sites are welcome

Conclusions

Measuring QoS
– Loss burstiness and delay correlation affect (generally worsen) perceived quality
– Bridging objective and subjective metrics: the E-model, or speech recognition based MOS prediction
– Performance of real products: IP phones and soft clients

Ensuring/improving QoS
– Network provisioning (voice traffic aggregation)
    Efficient, but may be expensive to deploy and manage
– End-to-End (FEC > LBR, PLC)
    Easier to deploy, but must control overhead of FEC

Reality Check
– Good implementation at the end-point (e.g., IP phones) is vital
– VoIP deployment requires PSTN integration and security
– Service availability is crucial for VoIP, but still far from 99.999% over the Internet

Ongoing and Future Work

Sampling Internet performance
– Where do the problems reside?
    Access networks (Cable, DSL), or International paths?
– How can we solve these problems?
    Can adaptive FEC react fast enough to changes in network conditions?

Playout delay behaviors of VoIP end-points
– How well do they react to jitter, delay spikes?
