University of Plymouth United Kingdom {L.Sun; E.Ifeachor}@plymouth.ac.uk

New Models for Perceived Voice Quality Prediction and their Applications in Playout Buffer Optimization for VoIP Networks


Dr. Lingfen Sun
Prof Emmanuel Ifeachor

ICC 2004, Paris, France, 20-24 June 2004

Outline

Background
  o Speech quality for VoIP networks
  o Current status
  o Aims of the project

Main Contributions
  o Novel non-intrusive voice quality prediction models
  o Novel perceptual-based speech quality optimization (e.g. jitter buffer optimization) mechanism

Conclusions and Future Work


Background – Speech Quality for VoIP Networks

VoIP speech quality: end-user perceived quality (MOS), an important metric.

Affected by IP network impairments and other impairments.

Voice quality measurement: subjective (MOS) or objective (intrusive or non-intrusive).

[Figure: SCN, gateway, IP network, gateway, SCN, with end-to-end perceived speech quality. An intrusive measurement (reference speech and degraded speech) and a non-intrusive measurement each output MOS.]

SCN: Switched Comm. Networks (PSTN, ISDN, GSM …)


Current Status and Problems

Lack of an efficient non-intrusive speech quality measurement method
  o E-model (a complicated computational model)
  o Based on subjective tests to derive models/parameters, which is time-consuming and expensive
  o Only limited models exist

Lack of perceptual optimization control methods
  o Only based on individual network parameters for buffer optimization and QoS control purposes
  o Not perceptual-based optimization control


Aims of the Project

[Figure: VoIP system. Sender (voice source, encoder, packetizer), IP network, receiver (de-packetizer, jitter buffer, decoder, voice receiver). A non-intrusive measurement outputs MOS, the end-to-end perceived voice quality.]

To develop novel and efficient methods/models for non-intrusive quality prediction.

To apply the models to perceptual-based optimization control (e.g. buffer optimization or adaptive sender-bit-rate QoS control).


Novel Non-intrusive Voice Quality Prediction

Uses intrusive quality measurement (e.g. PESQ) to derive models that predict voice quality non-intrusively, which avoids subjective tests.

A generic method which can be applied to audio, image and video.

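[Figure: Intrusive method: reference speech and degraded speech from the VoIP network go to PESQ, giving MOS (PESQ), which is combined with delay in the E-model to give Measured MOSc. Non-intrusive method: the new model (regression or ANN models) takes network parameters (packet loss, delay, codec …) and gives Predicted MOSc.]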


New Structure to Obtain MOSc

PESQ can only predict one-way listening speech quality (expressed as MOS).

By a new combined PESQ/E-model structure, a conversational speech quality (MOSc) can be obtained as Measured MOSc.
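The combined PESQ/E-model structure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the R-to-MOS mapping is the standard ITU-T G.107 formula, the base rating `R0 = 93.2` is assumed as the default when only Ie and Id are considered, the delay impairment reuses the coefficients 0.024, 0.11 and 177.3 that appear later in the deck, and the function names are hypothetical.

```python
R0 = 93.2  # assumption: default E-model rating with no Ie or Id impairment


def r_to_mos(r: float) -> float:
    """ITU-T G.107 mapping from rating factor R to MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def mos_to_r(mos: float) -> float:
    """Invert r_to_mos by bisection on [6.5, 100], where it is increasing."""
    lo, hi = 6.5, 100.0
    mos = min(max(mos, r_to_mos(lo)), r_to_mos(hi))
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delay_impairment(d_ms: float) -> float:
    """Id as a function of end-to-end delay d in milliseconds."""
    excess = d_ms - 177.3
    return 0.024 * d_ms + (0.11 * excess if excess >= 0 else 0.0)


def measured_mosc(pesq_mos: float, delay_ms: float) -> float:
    """PESQ MOS -> R -> Ie, then combine Ie with Id to get MOSc."""
    ie = R0 - mos_to_r(pesq_mos)             # measured Ie from the listening score
    r = R0 - ie - delay_impairment(delay_ms)
    return r_to_mos(r)
```

With zero delay the conversational score equals the PESQ listening score, and it falls as the end-to-end delay grows.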

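[Figure: Reference speech and degraded speech go to PESQ, giving MOS (PESQ); a MOS to R to Ie conversion gives Ie; the end-to-end delay goes through the delay model to give Id; the E-model combines Ie and Id to give MOSc.]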


Regression based Models (1)

Nonlinear regression models are derived for Ie based on PESQ/PESQ-LQ

Further combine Ie with Id to obtain MOSc.

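[Figure (a): Codec and packet loss go to the Ie model, giving Ie; delay (d) goes to the Id model, giving Id; the E-model combines Ie and Id to give MOSc.]

[Figure (b): Speech from the speech database passes through encoder, loss model and decoder; reference speech and degraded speech go to PESQ/PESQ-LQ, giving MOS (PESQ); a MOS to R to Ie conversion gives the measured Ie, which is used to fit the nonlinear regression model (Ie model) that gives the predicted Ie.]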


Regression based Models (2)

Ie can be modelled by a logarithm fitting function with the form of

$$I_e = a\ln(1 + b\rho) + c$$

where $\rho$ is the packet loss rate in %.

Parameters for different codecs (PESQ)

Parameters  AMR(H)  AMR(L)  G.729  G.723.1  iLBC
a           16.68   30.86   21.14  20.06    12.59
b*100       30.11   4.26    12.73  10.24    9.45
c           14.96   31.66   22.45  25.63    20.42
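As a worked illustration, the fitted function can be evaluated directly from the table. The sketch below assumes the loss rate is expressed in percent and divides the tabulated `b*100` values by 100; the dictionary and function names are hypothetical.

```python
import math

# (a, b, c) per codec, copied from the table; b is the tabulated b*100 divided by 100.
IE_PARAMS = {
    "AMR(H)": (16.68, 0.3011, 14.96),
    "AMR(L)": (30.86, 0.0426, 31.66),
    "G.729": (21.14, 0.1273, 22.45),
    "G.723.1": (20.06, 0.1024, 25.63),
    "iLBC": (12.59, 0.0945, 20.42),
}


def equipment_impairment(codec: str, loss_percent: float) -> float:
    """Ie = a * ln(1 + b * rho) + c, with rho the packet loss rate in %."""
    a, b, c = IE_PARAMS[codec]
    return a * math.log(1.0 + b * loss_percent) + c
```

For example, `equipment_impairment("AMR(H)", 0.0)` returns the constant term 14.96, and the value grows logarithmically as loss increases.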


Regression Models for AMR (12.2Kb/s)

e.g. for AMR (12.2 kb/s),

$$I_e = 16.68\ln(1 + 0.3011\rho) + 14.96$$

where $\rho$ is the packet loss rate in %.

The goodness of fit is:

SSE = 2.83 and $R^2$ = 0.998

[Figure: MOS vs. packet loss and delay]


Perceptual-based Buffer Optimization

Motivation
  o Existing buffer algorithms are based only on individual network parameters (e.g. delay or loss), targeting only minimum average delay or minimum late arrival loss, not maximum MOS.
  o There is a need to design a buffer algorithm that achieves optimum perceived speech quality.

Contribution: a perceptual-based optimization jitter buffer algorithm
  o Use regression based models for buffer optimization
  o Use a minimum impairment criterion instead of the traditional maximum MOS score
  o A Weibull delay distribution based on trace analysis
  o A perceptual-based optimization of the playout buffer algorithm


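Impairment Function Im

Define: impairment function Im

$$I_m = f(d, \rho) = I_d + I_e = 0.024d + 0.11(d - 177.3)H(d - 177.3) + a\ln(1 + b\rho) + c$$

where $H(x) = 0$ if $x < 0$, $H(x) = 1$ if $x \ge 0$, and $a$, $b$ and $c$ are codec related parameters.

The playout delay $d$ determines the buffer loss $\rho_b$. With network loss $\rho_n$ (in %) and network delay $X$ following a Weibull distribution with parameters $(\mu, \alpha, \gamma)$:

$$\rho = \rho_n + \rho_b = \rho_n + (100 - \rho_n)P(X > d) = \rho_n + (100 - \rho_n)e^{-((d - \mu)/\alpha)^{\gamma}}$$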
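The impairment function can be sketched in Python as follows. It assumes a three-parameter Weibull delay model with location `mu`, scale `alpha` and shape `gamma`, treats every packet as late when the playout delay does not exceed `mu`, and uses the same `d` for the delay impairment and for the late-loss calculation. All function and argument names are hypothetical.

```python
import math


def delay_impairment(d_ms: float) -> float:
    """Id = 0.024 d + 0.11 (d - 177.3) H(d - 177.3)."""
    excess = d_ms - 177.3
    return 0.024 * d_ms + (0.11 * excess if excess >= 0 else 0.0)


def late_probability(d_ms: float, mu: float, alpha: float, gamma: float) -> float:
    """P(X > d) for Weibull-distributed network delay X."""
    if d_ms <= mu:
        return 1.0  # playout delay below the minimum network delay
    return math.exp(-(((d_ms - mu) / alpha) ** gamma))


def total_loss(d_ms, net_loss_pct, mu, alpha, gamma):
    """Network loss plus buffer (late arrival) loss, in percent."""
    late = late_probability(d_ms, mu, alpha, gamma)
    return net_loss_pct + (100.0 - net_loss_pct) * late


def impairment(d_ms, net_loss_pct, weibull, codec_abc):
    """Im = Id + Ie for playout delay d, given Weibull and codec parameters."""
    mu, alpha, gamma = weibull
    a, b, c = codec_abc
    rho = total_loss(d_ms, net_loss_pct, mu, alpha, gamma)
    return delay_impairment(d_ms) + a * math.log(1.0 + b * rho) + c
```

A small playout delay keeps Id low but inflates the late-loss term inside Ie, while a large one does the opposite, which is the trade-off the buffer algorithm has to balance.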


Minimum Impairment Criterion

Define: minimum impairment criterion

Given: network delay $d_n$, network loss $\rho_n$ and codec type

Estimate: an optimized playout delay $d_{opt}$

Such that: the minimum $I_m$ is reached.

[Figure: Im curves against playout delay, with candidate delays d1, d2, d3, d4 and the minimum Im marked.]
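One simple way to apply the criterion is a grid search over candidate playout delays. The sketch below is an assumption about how the search could be done, not the authors' method: the 1 ms step, the 400 ms upper bound and the Weibull parameter values used in the test are hypothetical, and the impairment function is restated so that the block is self-contained.

```python
import math


def impairment(d, rho_n, mu, alpha, gamma, a, b, c):
    """Im = Id + Ie for playout delay d (ms) and network loss rho_n (%)."""
    late = 1.0 if d <= mu else math.exp(-(((d - mu) / alpha) ** gamma))
    rho = rho_n + (100.0 - rho_n) * late
    excess = d - 177.3
    i_d = 0.024 * d + (0.11 * excess if excess >= 0 else 0.0)
    return i_d + a * math.log(1.0 + b * rho) + c


def optimal_playout_delay(rho_n, mu, alpha, gamma, a, b, c,
                          d_max=400.0, step=1.0):
    """Return (d_opt, Im_min) found by scanning d from mu up to d_max."""
    best_d, best_im = mu, math.inf
    for k in range(int((d_max - mu) / step) + 1):
        d = mu + k * step
        im = impairment(d, rho_n, mu, alpha, gamma, a, b, c)
        if im < best_im:
            best_d, best_im = d, im
    return best_d, best_im
```

A grid search is used here because it makes no assumption about the shape of the Im curve; a finer step or a one-dimensional minimizer could replace it if the curve is known to be unimodal.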


Perceptual-based Optimization Buffer Algorithm

For every packet i received, calculate network delay ni

    if mode == SPIKE then
        if ni <= tail*old_d then
            mode = NORMAL
        endif
    elseif ni > head*di then
        mode = SPIKE; old_d = di
    else
        - update delay records for the past W packets
    endif

At the beginning of a talkspurt

    if mode == SPIKE then
        di = ni
    else
        - obtain (μ, α, γ) for the Weibull distribution for the past W packets
        - search for the playout delay d which meets the minimum Im criterion
    endif
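The control flow of the algorithm can be sketched as a small Python class. This is a minimal sketch under assumptions: the default values for `head`, `tail` and the window `W` are hypothetical, and the Weibull fit plus minimum Im search is hidden behind a caller-supplied `choose_delay` function rather than implemented here.

```python
from collections import deque
from typing import Callable, List


class PlayoutBuffer:
    """Spike-aware playout delay controller following the pseudocode above."""

    def __init__(self, choose_delay: Callable[[List[float]], float],
                 initial_delay: float, head: float = 4.0,
                 tail: float = 2.0, window: int = 50):
        self.choose_delay = choose_delay      # Weibull fit + minimum Im search
        self.head, self.tail = head, tail     # hypothetical spike thresholds
        self.mode = "NORMAL"
        self.d = initial_delay                # current playout delay di
        self.old_d = initial_delay            # playout delay when the spike began
        self.records = deque(maxlen=window)   # delays of the past W packets

    def on_packet(self, n_i: float) -> None:
        """Update the mode and delay records for one received packet."""
        if self.mode == "SPIKE":
            if n_i <= self.tail * self.old_d:
                self.mode = "NORMAL"
        elif n_i > self.head * self.d:
            self.mode = "SPIKE"
            self.old_d = self.d
        else:
            self.records.append(n_i)

    def on_talkspurt_start(self, n_i: float) -> float:
        """Pick the playout delay for the talkspurt that starts now."""
        if self.mode == "SPIKE":
            self.d = n_i
        elif self.records:
            self.d = self.choose_delay(list(self.records))
        return self.d
```

Passing the delay search in as a function keeps the spike handling separate from the perceptual optimization, so a different fitting method can be swapped in without touching the mode logic.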


Performance Analysis and Comparison (1)

Selected five traces from UoP to CU (USA), DUT (Germany), BUPT (China), and NC (China).

Traces 1 and 3 with high delay variation and traces 2, 4, 5 with low delay variation

Trace  Delay (ms)  Jitter (ms)  Loss (%)
1      153         16.2         1.1
2      46          0.8          0.3
3      186         19.5         14.3
4      16          0.7          4.4
5      150         0.2          0.2


Performance Analysis and Comparison (2)

“p-optimum” algorithm achieves the optimum voice quality for all traces.

“adaptive” algorithm achieves sub-optimum quality with low complexity.

[Figure: Performance comparison for buffer algorithms. MOS (axis from 0.5 to 4) for traces 1 to 5; algorithms: exp-avg, fast-exp, min-delay, spk-delay, adaptive, p-optimum.]


Conclusions and Future Work

Conclusions
  o Developed a new methodology and regression models to predict voice quality non-intrusively.
  o Demonstrated the application of the new non-intrusive voice quality prediction models to perceptual-based optimization of playout buffer algorithms.

Future Work
  o To consider buffer adaptation during a talkspurt in order to achieve the best trade-off between delay, loss and end-to-end jitter.
  o To extend the work to improve the performance of multimedia services (e.g. audio/image/video) over IP networks.


Contact Details

http://www.tech.plymouth.ac.uk/spmc

Dr. Lingfen Sun: L.Sun@plymouth.ac.uk

Prof Emmanuel Ifeachor: E.Ifeachor@plymouth.ac.uk

Any questions?

Thank you!
