New Models for Perceived Voice Quality Prediction and their Applications in Playout Buffer Optimization for VoIP Networks

Dr. Lingfen Sun, Prof Emmanuel Ifeachor
University of Plymouth, United Kingdom
{L.Sun; E.Ifeachor}@plymouth.ac.uk


ICC 2004, Paris France, 20-24 June 2004 2

Outline

  • Background
      o Speech quality for VoIP networks
      o Current status
      o Aims of the project
  • Main Contributions
      o Novel non-intrusive voice quality prediction models
      o Novel perceptual-based speech quality optimization (e.g. jitter buffer optimization) mechanism
  • Conclusions and Future Work


Background – Speech Quality for VoIP Networks

  • VoIP speech quality: end-user perceived quality, expressed as a Mean Opinion Score (MOS), is an important metric.
  • It is affected by IP network impairments and other impairments.
  • Voice quality measurement: subjective (MOS) or objective (intrusive or non-intrusive).

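[Figure: end-to-end perceived speech quality over the path SCN, Gateway, IP Network, Gateway, SCN. Intrusive measurement derives MOS from the reference speech and the degraded speech; non-intrusive measurement derives MOS without the reference speech.]

SCN: Switched Communication Networks (PSTN, ISDN, GSM, ...)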


Current Status and Problems

  • Lack of an efficient non-intrusive speech quality measurement method
      o E-model (a complicated computational model)
      o Based on subjective tests to derive models/parameters, which is time-consuming and expensive
      o Only limited models exist
  • Lack of perceptual optimization control methods
      o Existing buffer optimization and QoS control are based only on individual network parameters
      o They are not perceptual-based optimization control


Aims of the Project

[Figure: VoIP system. Sender: voice source, encoder, packetizer. IP network. Receiver: de-packetizer, jitter buffer, decoder, voice receiver. Non-intrusive measurement gives the end-to-end perceived voice quality (MOS).]

  • To develop novel and efficient methods/models for non-intrusive quality prediction.
  • To apply the models to perceptual-based optimization control (e.g. buffer optimization or adaptive sender-bit-rate QoS control).


Novel Non-intrusive Voice Quality Prediction

  • Uses intrusive quality measurement (e.g. PESQ) to derive models that predict voice quality non-intrusively, which avoids subjective tests.
  • A generic method which can be applied to audio, image and video.

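[Figure: intrusive method: reference speech and degraded speech from the VoIP network feed PESQ, giving MOS (PESQ), which is combined with delay through the E-model to give the measured MOSc. Non-intrusive method: a new model (regression or ANN models) maps network parameters (packet loss, delay, codec, ...) to the predicted MOSc.]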


New Structure to Obtain MOSc

  • PESQ can only predict one-way listening speech quality (expressed as MOS).
  • With a new combined PESQ/E-model structure, conversational speech quality (MOSc) can be obtained; this is the measured MOSc.

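[Figure: reference speech and degraded speech feed PESQ, giving MOS (PESQ); MOS is converted to R and then to Ie; end-to-end delay feeds the delay model, giving Id; the E-model combines Ie and Id into MOSc.]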
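The last step of this structure, combining Ie and Id into MOSc, can be sketched as follows. The slides do not spell out the E-model equations, so this is a minimal sketch under two assumptions: the simplified rating R = 93.2 - Id - Ie with default E-model settings, and the R-to-MOS mapping of ITU-T G.107. The function names are illustrative only.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to MOS (ITU-T G.107 mapping)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def conversational_mos(ie: float, id_: float, r0: float = 93.2) -> float:
    """Combine Ie and Id into MOSc.

    Assumption: R = r0 - Id - Ie, with r0 = 93.2 standing in for the
    E-model defaults; the slides do not state this value.
    """
    return r_to_mos(r0 - id_ - ie)


if __name__ == "__main__":
    # No impairment versus a moderate equipment and delay impairment.
    print(round(conversational_mos(ie=0.0, id_=0.0), 2))
    print(round(conversational_mos(ie=20.0, id_=5.0), 2))
```

Because the mapping is increasing over the usual operating range of R, a larger combined impairment always gives a lower MOSc in this sketch.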


Regression based Models (1)

  • Nonlinear regression models are derived for Ie based on PESQ/PESQ-LQ.
  • Ie is then combined with Id to obtain MOSc.

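[Figure (a): the Ie model (inputs: codec, packet loss) and the Id model (input: delay d) feed the E-model, which gives MOSc.]

[Figure (b): derivation of the Ie model. Speech from a speech database passes through encoder, loss model and decoder; reference speech and degraded speech feed PESQ/PESQ-LQ; MOS is converted to R and then to the measured Ie, which is used to fit the nonlinear regression model (Ie model) that gives the predicted Ie.]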
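The MOS to R to Ie conversion used to obtain the measured Ie can be illustrated by inverting the E-model's R-to-MOS mapping numerically. This is only a sketch: the slides do not say how the conversion is implemented, the bisection search is a choice made here, and the reference rating r0 = 93.2 (so that Ie = r0 - R) is an assumption.

```python
def r_to_mos(r: float) -> float:
    """ITU-T G.107 mapping from rating factor R to MOS, for 0 < R < 100."""
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def mos_to_r(mos: float, lo: float = 6.5, hi: float = 100.0) -> float:
    """Invert r_to_mos by bisection on [lo, hi], where the mapping is increasing."""
    mos = min(max(mos, r_to_mos(lo)), r_to_mos(hi))  # clamp to the invertible range
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def measured_ie(pesq_mos: float, r0: float = 93.2) -> float:
    """Measured Ie from a PESQ MOS score, assuming Ie = r0 - R (r0 is an assumption)."""
    return r0 - mos_to_r(pesq_mos)


if __name__ == "__main__":
    for score in (4.0, 3.5, 3.0):
        print(score, round(measured_ie(score), 2))
```

Pairs of (packet loss, measured Ie) obtained this way are what a nonlinear regression would then be fitted to.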


Regression based Models (2)

Ie can be modelled by a logarithmic fitting function of the form

$$ I_e = a \ln(1 + b\rho) + c $$

where ρ is the packet loss rate in percent.

Parameters for different codecs (PESQ)

Parameters   AMR(H)   AMR(L)   G.729   G.723.1   iLBC
a            16.68    30.86    21.14   20.06     12.59
b*100        30.11    4.26     12.73   10.24     9.45
c            14.96    31.66    22.45   25.63     20.42
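The fitted function and the parameter table can be evaluated directly. The sketch below copies the table values (with b taken as the tabulated b*100 divided by 100) and assumes the loss rate is given in percent; the dictionary and function names are illustrative.

```python
import math

# (a, b, c) per codec from the table; b is the tabulated b*100 divided by 100.
IE_PARAMS = {
    "AMR(H)": (16.68, 0.3011, 14.96),
    "AMR(L)": (30.86, 0.0426, 31.66),
    "G.729": (21.14, 0.1273, 22.45),
    "G.723.1": (20.06, 0.1024, 25.63),
    "iLBC": (12.59, 0.0945, 20.42),
}


def ie_from_loss(codec: str, loss_pct: float) -> float:
    """Predicted Ie = a*ln(1 + b*loss) + c, with loss assumed to be in percent."""
    a, b, c = IE_PARAMS[codec]
    return a * math.log(1.0 + b * loss_pct) + c


if __name__ == "__main__":
    for codec in IE_PARAMS:
        print(codec, [round(ie_from_loss(codec, p), 1) for p in (0, 2, 5, 10)])
```

At zero loss the logarithm vanishes, so c is the codec's Ie without packet loss.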


Regression Models for AMR (12.2 kb/s)

e.g. for AMR (12.2 kb/s),

$$ I_e = 16.68 \ln(1 + 0.3011\rho) + 14.96 $$

The goodness of fit is:

SSE = 2.83 and R² = 0.998

[Figure: MOS vs. packet loss and delay]


Perceptual-based Buffer Optimization

  • Motivation
      o Existing buffer algorithms are based only on individual network parameters (e.g. delay or loss), targeting only minimum average delay or minimum late arrival loss, not maximum MOS.
      o There is a need to design a buffer algorithm that achieves optimum perceived speech quality.
  • Contribution: a perceptual-based optimization jitter buffer algorithm
      o Use regression based models for buffer optimization
      o Use a minimum impairment criterion instead of the traditional maximum MOS score
      o A Weibull delay distribution based on trace analysis
      o A perceptual-based optimization of playout buffer algorithm


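Impairment Function Im

Define: impairment function Im

$$ I_m = f(d, \rho) = I_d + I_e = 0.024 d + 0.11 (d - 177.3) H(d - 177.3) + a \ln(1 + b\rho) + c $$

where H(x) = 0 if x < 0, H(x) = 1 if x ≥ 0, and a, b and c are codec related parameters.

The total loss ρ is the network loss ρ_n plus the buffer loss ρ_b for playout delay d, with the network delay X following a Weibull distribution (location μ, scale α, shape γ):

$$ \rho = \rho_n + \rho_b = \rho_n + (100 - \rho_n) P(X > d) = \rho_n + (100 - \rho_n) e^{-((d - \mu)/\alpha)^{\gamma}} $$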
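The impairment function can be written out as a short sketch. It follows the Id term with the 177.3 ms step, the logarithmic Ie term, and a Weibull tail for the buffer loss; the Weibull symbol names (mu, alpha, gamma) and all function names are assumptions made for illustration, and losses are taken in percent.

```python
import math


def heaviside(x: float) -> float:
    """H(x) = 0 for x < 0 and 1 for x >= 0."""
    return 1.0 if x >= 0.0 else 0.0


def delay_impairment(d_ms: float) -> float:
    """Id = 0.024*d + 0.11*(d - 177.3)*H(d - 177.3), with d in milliseconds."""
    return 0.024 * d_ms + 0.11 * (d_ms - 177.3) * heaviside(d_ms - 177.3)


def total_loss_pct(d_ms, net_loss_pct, mu, alpha, gamma):
    """Network loss plus late-arrival (buffer) loss, both in percent.

    Assumes network delay X is Weibull with location mu, scale alpha and
    shape gamma, so P(X > d) = exp(-((d - mu)/alpha)**gamma) for d > mu.
    """
    tail = 1.0 if d_ms <= mu else math.exp(-(((d_ms - mu) / alpha) ** gamma))
    return net_loss_pct + (100.0 - net_loss_pct) * tail


def impairment(d_ms, net_loss_pct, weibull, codec_params):
    """Im = Id + Ie for playout delay d_ms."""
    mu, alpha, gamma = weibull
    a, b, c = codec_params
    rho = total_loss_pct(d_ms, net_loss_pct, mu, alpha, gamma)
    return delay_impairment(d_ms) + a * math.log(1.0 + b * rho) + c


if __name__ == "__main__":
    amr_h = (16.68, 0.3011, 14.96)   # AMR 12.2 kb/s parameters from the slides
    weibull = (50.0, 20.0, 1.5)      # hypothetical values, not from the slides
    for d in (60, 100, 150, 250):
        print(d, round(impairment(d, 1.0, weibull, amr_h), 2))
```

A short playout delay is penalized through the buffer loss inside Ie, a long one through Id, which is why Im has a minimum in between.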


Minimum Impairment Criterion

Define: minimum impairment criterion

Given: network delay dn, network loss ρn and codec type

Estimate: an optimized playout delay dopt

Such that: the minimum Im is reached.

[Figure: candidate playout delays d1, d2, d3 and d4, with the minimum Im marked.]
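One simple way to apply the criterion is an exhaustive search over candidate playout delays. The slides do not state which search method is used, so the 1 ms grid below is an assumption, as are the function names, the Weibull symbol names and the example parameter values.

```python
import math


def impairment(d_ms, net_loss_pct, mu, alpha, gamma, a, b, c):
    """Im = Id + Ie for playout delay d_ms (losses in percent)."""
    i_d = 0.024 * d_ms + (0.11 * (d_ms - 177.3) if d_ms >= 177.3 else 0.0)
    tail = 1.0 if d_ms <= mu else math.exp(-(((d_ms - mu) / alpha) ** gamma))
    rho = net_loss_pct + (100.0 - net_loss_pct) * tail
    return i_d + a * math.log(1.0 + b * rho) + c


def find_optimal_playout_delay(net_loss_pct, mu, alpha, gamma, a, b, c,
                               d_max_ms=400, step_ms=1):
    """Grid search for the playout delay that minimizes Im.

    The grid runs from the Weibull location mu (no packet arrives earlier)
    up to d_max_ms; both bounds and the step are illustrative choices.
    """
    start = int(math.ceil(mu))
    candidates = range(start, d_max_ms + 1, step_ms)
    return min(candidates,
               key=lambda d: impairment(d, net_loss_pct, mu, alpha, gamma, a, b, c))


if __name__ == "__main__":
    amr_h = (16.68, 0.3011, 14.96)           # AMR 12.2 kb/s, from the slides
    d_opt = find_optimal_playout_delay(1.0, 50.0, 20.0, 1.5, *amr_h)
    print(d_opt, round(impairment(d_opt, 1.0, 50.0, 20.0, 1.5, *amr_h), 2))
```

A grid search is cheap here because Im is a closed-form function of d; a finer step or a bracketing search could replace it without changing the criterion.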


Perceptual-based Optimization Buffer Algorithm

For every packet i received, calculate network delay ni

    if mode == SPIKE then
        if ni <= tail*old_d then
            mode = NORMAL
        endif
    elseif ni > head*di then
        mode = SPIKE; old_d = di
    else
        - update delay records for the past W packets
    endif

At the beginning of a talkspurt

    if mode == SPIKE then
        di = ni
    else
        - obtain the Weibull distribution parameters (location μ, scale α, shape γ) from the past W packets
        - search for the playout delay d which meets the minimum Im criterion
    endif
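The control flow of the algorithm can be sketched as a small state machine. This is not the authors' implementation: the head, tail and W values are placeholders (the slides give none), the guard against an unset playout delay is an addition, and the Weibull fit plus minimum-Im search is abstracted into a caller-supplied `search` function.

```python
from collections import deque

NORMAL, SPIKE = "NORMAL", "SPIKE"


class PlayoutBuffer:
    """Spike-aware playout delay controller following the slide's pseudocode."""

    def __init__(self, head=4.0, tail=2.0, window=100):
        # head, tail and window (W) are placeholder values, not from the slides.
        self.head, self.tail = head, tail
        self.mode = NORMAL
        self.playout_delay = 0.0              # d_i in the pseudocode
        self.old_d = 0.0
        self.history = deque(maxlen=window)   # delay records for the past W packets

    def on_packet(self, n_i):
        """Update mode and delay records for a packet with network delay n_i."""
        if self.mode == SPIKE:
            if n_i <= self.tail * self.old_d:
                self.mode = NORMAL
        elif self.playout_delay > 0 and n_i > self.head * self.playout_delay:
            # The playout_delay > 0 guard is an addition so that the first
            # packets are not classified as a spike before d_i is set.
            self.mode = SPIKE
            self.old_d = self.playout_delay
        else:
            self.history.append(n_i)

    def on_talkspurt_start(self, n_i, search):
        """Choose the playout delay for a new talkspurt.

        `search` stands in for "fit the Weibull distribution to the recorded
        delays and search for the d that minimizes Im".
        """
        if self.mode == SPIKE:
            self.playout_delay = n_i
        elif self.history:
            self.playout_delay = search(list(self.history))
        return self.playout_delay


if __name__ == "__main__":
    buf = PlayoutBuffer()
    for delay in (50, 52, 51):
        buf.on_packet(delay)
    print(buf.on_talkspurt_start(51, search=max))   # max() is only a stand-in
```

Keeping the search behind a callback separates the spike detection logic from the perceptual optimization, so either part can be tested on its own.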


Performance Analysis and Comparison (1)

  • Five traces were selected, from UoP to CU (USA), DUT (Germany), BUPT (China), and NC (China).
  • Traces 1 and 3 have high delay variation; traces 2, 4 and 5 have low delay variation.

Trace   Delay (ms)   Jitter (ms)   Loss (%)
1       153          16.2          1.1
2       46           0.8           0.3
3       186          19.5          14.3
4       16           0.7           4.4
5       150          0.2           0.2


Performance Analysis and Comparison (2)

“p-optimum” algorithm achieves the optimum voice quality for all traces.

“adaptive” algorithm achieves sub-optimum quality with low complexity.

[Figure: Performance comparison for buffer algorithms. MOS (y-axis, 0.5 to 4) for traces 1 to 5 (x-axis), comparing the exp-avg, fast-exp, min-delay, spk-delay, adaptive and p-optimum algorithms.]


Conclusions and Future Work

  • Conclusions
      o Developed a new methodology and regression models to predict voice quality non-intrusively.
      o Demonstrated the application of the new non-intrusive voice quality prediction models to perceptual-based optimization of playout buffer algorithms.
  • Future Work
      o To consider buffer adaptation during a talkspurt in order to achieve the best trade-off between delay, loss and end-to-end jitter.
      o To extend the work to improve the performance of multimedia services (e.g. audio/image/video) over IP networks.


Contact Details

http://www.tech.plymouth.ac.uk/spmc

Dr. Lingfen Sun: L.Sun@plymouth.ac.uk

Prof Emmanuel Ifeachor: E.Ifeachor@plymouth.ac.uk

Any questions?

Thank you!