©Dingli (27/7/2013) DL1AMOSWP Rev1 1 / 37
Speech Quality Testing Solution
(MOS) Whitepaper
Revision History
Date | Version | Author | Description
2013-05-06 | 1.0 | Geng Xiaoming | First Edition
2013-07-27 | 1.1 | Zhang Qifei | Change document organization, optimize MOS testing solution, and add POLQA parameters
Contents
1 Introduction
1.1 Document Organization
1.2 Abbreviations and Acronyms
2 Development of Speech Quality Testing
3 PESQ Algorithm
3.1 Introduction
3.2 PESQ Algorithm Structure
4 POLQA Algorithm
4.1 Introduction
4.2 Evolution of POLQA Based on PESQ
4.3 Super-Wideband and Narrowband in POLQA
4.4 POLQA Perceptual Model and Structure
4.5 Ideal POLQA MOS Values in Different Network Environment
5 Dingli MOS Testing Solution
5.1 Product History
5.2 Added Value of Dingli MOS Testing System
5.3 Dingli MOS Test System
5.4 MOS Testing Solutions
5.4.1 MOS Testing Solution Based on Pilot Pioneer
5.4.2 MOS Analysis Solution Based on Pilot Navigator
5.4.3 MOS Testing Solution Based on Pilot Fleet (RCU)
5.4.4 MOS Testing Solution Based on Pilot Walktour
5.5 Details About Dingli MOS Testing Solution
5.5.1 Impact of Network Factors on PESQ MOS Value
5.5.2 Parameters in POLQA Testing Result
5.5.3 Reference PESQ-LQ Value: Some practical examples
6 Appendix 1: ITU-T P.862 Standard Relevant Information
7 References
1 Introduction
This document describes the development and evolution of speech quality testing
technologies in telecommunications networks, and focuses on two objective
testing methods: PESQ and POLQA.
1.1 Document Organization
Chapter 1: Document objectives, document organization, and abbreviations and
acronyms
Chapter 2: Basics and development of speech quality testing
Chapter 3: Objective speech quality testing method (PESQ algorithm)
Chapter 4: New speech quality testing method recommended by ITU (POLQA
algorithm)
Chapter 5: Dingli's speech quality evaluation solutions
1.2 Abbreviations and Acronyms
Table 1 Abbreviations and acronyms
Item | Full Spelling | Description
ITU | International Telecommunication Union |
MOS | Mean Opinion Score |
MOS-LQO | MOS - Listening Quality Objective |
AMR | Adaptive Multi-Rate Codec |
AAC | Advanced Audio Coding |
FER | Frame Error Rate |
BER | Bit Error Rate |
POTS | Plain Old Telephone Service |
GSM | Global System for Mobile Communications |
CDMA | Code Division Multiple Access |
LTE | Long Term Evolution |
PAMS | Perceptual Analysis Measurement System | Recommendation of ITU-T P.800
PSQM | Perceptual Speech Quality Measure | Recommendation of ITU-T P.861
PESQ | Perceptual Evaluation of Speech Quality | Recommendation of ITU-T P.862
POLQA | Perceptual Objective Listening Quality Assessment | Recommendation of ITU-T P.863
VoIP | Voice Over IP |
P.56 | ITU-T Rec. P.56 | Recommendation of ITU-T P.56
2 Development of Speech Quality Testing
Testing speech quality on different networks against a unified standard can be a
challenge because the networks use different communication technologies. For example,
in a GSM network, RxQual (based on BER) is used to evaluate speech quality, whereas in
a CDMA network, FER is used. In addition, even within the same network, a single
RxQual or FER value cannot represent the true speech quality. A professional speech
quality testing system should be end-to-end and user-based. Therefore, a
general testing method is required to perform direct comparative testing across different
networks.
Based on the test subject, speech quality testing can be divided into two categories:
subjective tests and objective tests.
A subjective test depends on listeners to evaluate the performance of the communication
system. According to ITU-T P.800 and ITU-T P.830, about 40 to 60 trained listeners
are required to compare the reference signal and the degraded signal perceptually,
based on detailed criteria. Each listener scores the degraded signal according to
the MOS scoring standard (from 1 to 5, see Table 2), and the scores are averaged to
obtain the final MOS value.
Table 2 MOS scoring standard
Degree | MOS Value | User's Satisfaction
Excellent | 5.0 | Very clear; no distortion; no delay
Good | 4.0 | Clear; small delay; little noise
Fair | 3.0 | Unclear; some delay; some noise; some distortion
Poor | 2.0 | Unclear; loud noise or interruptions; serious distortion
Bad | 1.0 | Silence or totally unclear; very loud noise
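As a minimal sketch of how a subjective MOS value is formed, the example below averages the individual listener ratings for one degraded sample on the five grades of Table 2. The function name and the ratings are hypothetical and serve only as an illustration; a real listening test following ITU-T P.800 involves far more controls than this arithmetic.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average individual listener ratings (1 to 5) into one MOS value."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1 to 5 scale")
    return mean(ratings)


# Hypothetical ratings given by eight listeners to one degraded sample.
listener_ratings = [4, 4, 3, 5, 4, 3, 4, 4]
print(mean_opinion_score(listener_ratings))
```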
The subjective testing result is the most reliable, and this method can be used to evaluate
network performance and speech quality with any speech coding mode. However, its
disadvantages are obvious. Factors such as the evaluation environment and the
listeners must be strictly controlled, and the speech material must be carefully selected;
otherwise, the final result may be affected. All of this makes the test time-consuming,
laborious, difficult to organize, and poorly repeatable. As a result, a more efficient and
repeatable method is required in actual testing, that is, an objective testing method.
In practice, objective testing compares parameters of the reference and degraded
speech signals in the time and frequency domains, and the test result is calculated
by hardware or software. Objective testing methods such as PAMS and PSQM were
introduced during the research on objective speech quality testing. However, these
methods have significant limitations. The test result is affected by the particular speech
codec, and in some cases it differs considerably from the MOS value of subjective testing.
With ITU-T P.862 (2001), the core speech quality testing method was upgraded to the PESQ
algorithm, which integrates the advantages of the previous algorithms. The PESQ test result is
very close to the MOS value of subjective testing, and the PESQ algorithm is widely
accepted and applied.
Later, with the development and evolution of new communication technologies, the POLQA
algorithm was developed to support new speech codecs and super-wideband speech, and
to handle time-scaling effects in VoIP. Compared with previous algorithms, applying POLQA
to today's unified and complex communication networks achieves significant
improvements in accuracy and credibility. In addition, its ability to handle time scaling
makes the POLQA algorithm applicable to virtually any speech quality testing scenario. The
POLQA algorithm includes two modes, NB (narrowband) and SWB (super-wideband),
which correspond to speech samples with different frequency ranges. POLQA is the
algorithm recommended in ITU-T P.863 (2011).
Figure 2.1 shows the evolution of ITU-T recommendations for speech quality testing. The
information is obtained from http://www.polqa.info/index.html.
Figure 2.1 Evolution of ITU-T recommendations for speech quality testing
3 PESQ Algorithm
3.1 Introduction
The PESQ algorithm became an ITU-T recommendation (P.862) for objective speech
quality testing in February 2001. Due to its innovative features and correlation accuracy,
PESQ quickly became the mainstream speech quality testing method. The PESQ
algorithm is applicable to end-to-end speech quality testing. It takes into account
various perceptual factors (such as codec distortion, errors, packet loss, variable delay,
jitter and filtering) to test the speech quality objectively, and provides a fully
quantifiable speech quality testing method.
Figure 3.1 PESQ reference signal and degraded signal
3.2 PESQ Algorithm Structure
Figure 3.2 shows the entire PESQ algorithm structure. The model begins by level
aligning both signals to a standard listening level. They are filtered (using an FFT)
with an input filter to model a standard telephone handset. The signals are aligned in
time and then processed through an auditory transform similar to that of PSQM. The
transformation also involves equalizing for linear filtering in the system and for gain
variation. Two distortion parameters are extracted from the disturbance (the difference
between the transforms of the signals), and are aggregated in frequency and time and
mapped to a prediction of subjective MOS. Generally, the greater the difference
between the degraded signal and the reference signal, the lower the speech quality
score.
Figure 3.2 PESQ algorithm structure
[Block diagram. Blocks: reference signal; system under test; degraded signal; level align (both signals); input filter (both signals); time align and equalise; auditory transform (both signals); disturbance processing; cognitive modelling; identify bad intervals; re-align bad intervals; prediction of perceived speech quality.]
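The first step of the structure described above, aligning both signals to a common level, can be illustrated with a simplified sketch. It assumes plain RMS normalisation to an arbitrary target value; the function names and the target are hypothetical, and the level alignment defined in P.862 is more specific than this.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def level_align(signal, target_rms=0.1):
    """Scale a signal so that its RMS level equals target_rms (assumed value)."""
    current = rms(signal)
    if current == 0.0:
        return list(signal)  # silence cannot be scaled to a target level
    gain = target_rms / current
    return [x * gain for x in signal]


# Hypothetical samples: the degraded signal is an attenuated copy of the reference.
reference = [0.0, 0.4, -0.3, 0.2, -0.1, 0.05]
degraded = [0.5 * x for x in reference]

aligned_reference = level_align(reference)
aligned_degraded = level_align(degraded)
print(rms(aligned_reference), rms(aligned_degraded))
```

After this step a pure gain difference between the two signals no longer counts as a disturbance, which is why level alignment precedes the perceptual comparison.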
PSQM and measuring normalizing blocks (MNB) were only recommended for use in
narrowband codec assessment and were known to produce inaccurate predictions
with certain types of codec, background noise, and end-to-end effects such as filtering
and variable delay. The scope of PESQ is therefore very much wider. In addition,
PESQ provides significantly higher correlation with subjective opinion than the
P.861 models, PSQM and MNB. Results indicate that it gives accurate
predictions of subjective quality in a very wide range of conditions, including those with
background noise, analogue filtering, and/or variable delay.
Table 3 Correlation of different speech quality testing methods
Type (corr. coeff.) | PESQ | PAMS | PSQM | PSQM+ | MNB
Mobile network, average | 0.962 | 0.954 | 0.924 | 0.935 | 0.884
Mobile network, worst case | 0.905 | 0.895 | 0.843 | 0.859 | 0.731
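The correlation coefficients in Table 3 express how closely each objective method tracks subjective MOS values. The sketch below computes a plain Pearson correlation coefficient between objective and subjective scores. The sample scores are hypothetical, and the exact procedure behind the figures in Table 3 is not described in this document.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical scores for five test conditions.
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.3]
objective_score = [1.9, 2.3, 3.3, 3.8, 4.2]
print(pearson(subjective_mos, objective_score))
```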
According to related ITU-T information, the PESQ algorithm provides very accurate
predictions and was applicable to all network technologies known at that time (such as
GSM, CDMA, and 3G).
At the time of its standardization, PESQ was the most sophisticated and accurate
objective speech quality testing method, and its test results conform closely to users'
subjective perceptions.
There are three kinds of PESQ speech quality testing values:
PESQ MOS: mean opinion score of perceptual evaluation of speech quality (value range: 1.0 to 5.0, where 1.0 represents the lowest quality)
PESQ SCORE: score of perceptual evaluation of speech quality (value range: -0.5 to 4.5, where -0.5 represents the lowest quality)
PESQ LQ: listening quality of perceptual evaluation of speech quality (value range: 1.0 to 4.5, where 1.0 represents the lowest quality)
The differences between the three values are as follows:
The value of PESQ SCORE is calculated directly by the algorithm; the value of
PESQ MOS is a subjective mean opinion score. If the speech quality is poor, the value
of PESQ SCORE is always higher than the value of PESQ MOS, which is
unreasonable. For this reason, ITU introduced PESQ LQ, whose value is closer to the
subjective value. In other words, PESQ SCORE is the ideal value calculated by the
algorithm, and PESQ LQ is a more reasonable value obtained through adjustment.
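To illustrate how a raw algorithm score can be adjusted onto a listening-quality scale, the sketch below implements the raw-score to MOS-LQO mapping published in ITU-T P.862.1. This is offered as an example of such an adjustment only: the document does not state which mapping produces the PESQ LQ value in Dingli's tools, so the correspondence is an assumption, and the function name is hypothetical.

```python
import math


def pesq_raw_to_mos_lqo(raw_score):
    """Map a raw PESQ score (-0.5 to 4.5) to MOS-LQO using the P.862.1 function."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


# Print the mapped value for a few raw scores across the raw range.
for raw in (-0.5, 1.0, 2.0, 3.0, 4.0, 4.5):
    print(raw, round(pesq_raw_to_mos_lqo(raw), 3))
```

The mapping is monotonic, so it changes the numeric scale without changing the ranking of the tested samples.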
Based on simulation and actual tests, Figure 3.3 shows ideal PESQ values under
various network conditions and codecs. These results assume transmission without
errors or packet loss. In real networks, the test results may approach these values,
depending on the test environment.
Figure 3.3 Typical PESQ score under various network conditions
4 POLQA Algorithm
4.1 Introduction
The POLQA standard was developed from 2006 to 2010 by leading experts in a competition
carried out by ITU-T Study Group 12, in order to define a technology update for the
P.862 'PESQ' standard. POLQA, which stands for Perceptual Objective Listening
Quality Analysis, offers a new level of benchmarking capability to determine the
speech quality of mobile network services. The POLQA perceptual measurement
algorithm is a joint development by OPTICOM, SwissQual and TNO. POLQA has been
selected to form the new ITU-T speech quality testing standard, P.863, and is intended
for use with HD Voice, 3G and 4G/LTE.
4.2 Evolution of POLQA Based on PESQ
PESQ itself has a very wide range of applications, such as fixed and wireless
network testing, POTS (Plain Old Telephone Service), VoIP, and 3G. Compared to
PESQ, POLQA makes a variety of improvements to cover scenarios for which PESQ is
not suitable.
The major improvements of POLQA are as follows:
Applicable to new speech codecs (such as AAC and MP3) in 3G/LTE/4G network technologies
Applicable to speech enhancement systems (such as VQE and VED) that use non-linear processing to increase intelligibility
Applicable to speech codecs (such as SBR) that change or extend the voice bandwidth
Supports testing with very high background noise
Models the effects of variable sound presentation levels correctly
Provides NB (narrowband) and SWB (super-wideband) modes
Handles time-scaling and time-warping as seen in VoIP and 3G
Applies correct weighting of reverberation and of linear and non-linear filtering
Supports direct comparison between AMR (in GSM/CDMA) and EVRC (in CDMA) encoded speech
Applying POLQA to today's complex, unified networks will give a significant boost in
accuracy and reliability compared to current standards. Due to its ability to handle time
scaling effects, POLQA can be used in virtually any scenario today, from video
telephony to lab testing, and from codec or network optimization to the development and
maintenance of Unified Collaboration and Communication services.
As with all methods of ITU-T’s P.86X series, POLQA compares a known speech signal
to the degraded voice signal by simulating human hearing. Unlike methods estimating
quality based on network parameters or packet data analysis, this guarantees utmost
accuracy and applicability independent of the underlying network technology and
therefore provides measurement of true end-to-end quality.
POLQA is thus not only a full replacement for PESQ, but it is also a significant
enhancement to the applicability of end-to-end speech quality testing. Since it is not
considerably more complex than PESQ, POLQA is a logical upgrade path for all
customers using PESQ today.
4.3 Super-Wideband and Narrowband in POLQA
Telecom industries are now initiating the evolution from narrow-band telephony to
wideband speech transmission. The codecs for wide band are ready. Current
developments of voice codecs are processing the so-called super-wideband (up to
14,000 Hz) or even higher (‘full-band’), up to approx. 24,000 Hz. However, the
perceived difference between super-wideband and full-band can be ignored in the
case of human speech.
In the speech quality testing, users will face corresponding bandwidth problems. In
traditional telephony scenarios, the expectation is set to a perfect narrow-band voice
signal. A signal that is close or identical to such a signal is scored subjectively by
human listeners with a high quality value (usually a MOS-LQ of around 4.5 on a
five-point scale). Within a super-wideband scenario the situation is different. The
expectation of excellent quality is a perfect super-wideband speech signal. Since the
same five-point scale is used, such a perfect super-wideband signal is also
subjectively scored close to excellent in the range of 4.5. Obviously, a narrow-band
signal in that super-wideband context will not fulfil the expectation of high quality due to
its band limitation. Consequently, it will be scored lower in this context.
Since the range of the scores is the same but the meaning is different depending on
the context, the two are named as different scales: narrow-band or super-wideband.
Broadly the main difference is that narrow-band signals will be scored lower in a
super-wideband context than in narrow-band experiments, since the band-limitation is
scored as degradation. Hence, scores given on the two different scales must not be
mixed or directly compared.
To cover both application areas, narrowband telephony and super-wideband
communication, POLQA supports two operational modes. The application of POLQA is
exactly the same in both cases; the change between the two modes just requires the
use of a control flag. All required adjustments are made automatically by POLQA
internally. Consequently, in narrowband mode POLQA scores on a five-point
narrowband scale, and in super-wideband mode on a five-point super-wideband scale.
The MOS-LQO value ranges of the two modes differ: 1 to 4.75 for
super-wideband mode and 1 to 4.5 for narrowband mode.
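Because scores on the two scales must not be mixed or compared directly, analysis code that handles POLQA results can carry the mode together with each score. The sketch below shows one way to do this, using the value ranges stated above. All class and function names are hypothetical and do not correspond to any POLQA or Dingli API.

```python
from dataclasses import dataclass
from enum import Enum


class PolqaMode(Enum):
    NARROWBAND = "NB"
    SUPER_WIDEBAND = "SWB"


# MOS-LQO value ranges per mode, as stated in the text above.
MOS_LQO_RANGE = {
    PolqaMode.NARROWBAND: (1.0, 4.5),
    PolqaMode.SUPER_WIDEBAND: (1.0, 4.75),
}


@dataclass(frozen=True)
class PolqaScore:
    mode: PolqaMode
    mos_lqo: float

    def __post_init__(self):
        low, high = MOS_LQO_RANGE[self.mode]
        if not low <= self.mos_lqo <= high:
            raise ValueError(
                f"{self.mos_lqo} is outside the {self.mode.value} range {low} to {high}"
            )


def score_difference(first, second):
    """Difference between two scores; refuses to compare across scales."""
    if first.mode is not second.mode:
        raise ValueError("NB and SWB scores are on different scales")
    return first.mos_lqo - second.mos_lqo


# Hypothetical scores from two narrowband measurements.
before = PolqaScore(PolqaMode.NARROWBAND, 3.2)
after = PolqaScore(PolqaMode.NARROWBAND, 3.9)
print(score_difference(after, before))
```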
4.4 POLQA Perceptual Model and Structure
POLQA uses an advanced psycho-acoustic model to emulate human perception
and to transform the sound into an internal neuronal representation. As a full
reference approach, POLQA compares the input (high quality) reference signal with the
associated degraded signal under test. This process is shown in Figure 4.1. POLQA
takes into account the masking effects of human hearing and uses the concept of
idealization of both input signals in multiple steps. This ensures that only the relevant
speech information is used for comparison and that any unwanted signal
components are discarded.
Figure 4.1 shows the POLQA algorithm structure. The first module performs time
alignment of the reference signal and the degraded signal, which estimates the
delay and the sample rate differences between the two signals. Once the correct delay is
determined and the sample rate differences have been compensated, the signals and
the delay information are passed on to the core model, which calculates the
perceptibility as well as the annoyance of the distortions and maps them to a MOS
scale.
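The delay estimation mentioned above can be illustrated with a basic cross-correlation search for a single global delay. This is a deliberately simplified sketch with hypothetical names and sample values. The alignment in POLQA is considerably more elaborate and also compensates sample rate differences, which this example does not attempt.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag in samples at which degraded best matches reference.

    A positive result means the degraded signal is delayed relative to the
    reference signal.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for index, ref_sample in enumerate(reference):
            shifted = index + lag
            if 0 <= shifted < len(degraded):
                score += ref_sample * degraded[shifted]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical signals: the degraded signal is the reference delayed by 3 samples.
reference = [0.0, 0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0]
degraded = [0.0, 0.0, 0.0] + reference[:-3]
print(estimate_delay(reference, degraded, max_lag=5))
```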
Figure 4.1 POLQA algorithm structure
4.5 Ideal POLQA MOS Values in Different Network Environment
Figure 4.2 shows typical values to be expected from POLQA. These were confirmed by
subjective listening experiments. In an actual network, the MOS values may approach
these values, depending on environmental factors.
Figure 4.2 Typical values to be expected from POLQA
5 Dingli MOS Testing Solution
5.1 Product History
Domestic partner of the PESQ patent owner, and the earliest manufacturer to purchase the PESQ patent
Started the R&D of speech quality testing in 2002
Developed PESQ speech quality testing for drive tests in June 2003
Developed PESQ speech quality testing in Pilot Pioneer, based on multiple test mobile phones and commercial mobile phones, in 2004
Released the Pilot Pioneer test tool with POLQA support at the end of 2012
As an integrated test platform, Pilot Pioneer can be upgraded to the version with MOS
test functions just by adding a separate audio MOS box.
Note: In the remainder of this document, MOS value refers to the speech quality testing
score based on PESQ or POLQA unless stated otherwise, and MOS box refers to the
independent hardware system used by Dingli to test speech quality.
5.2 Added Value of Dingli MOS Testing System
In addition to providing the latest speech quality test tool, Dingli also focuses on
speech quality optimization in practice. Dingli’s solution analyzes and explores the
exact impact of wireless environment factors on the MOS values to provide the most
accurate and credible theoretical and practical reference information for network
optimization. The main research content includes:
Impact of different voice coding rate on the MOS value
Impact of handover frequency and handover failure on the MOS value
Impact of DTX (Discontinuous Transmission) functions on the MOS value
Impact of C/I (Carrier/Interference) on the MOS value within the same coverage
Impact of signal strength on the MOS value when the C/I is good
Impact of transmission quality on the MOS value
Impact of different network technologies on the MOS value
MOS testing, optimization methods and flow
The Dingli MOS test system is a comprehensive solution that includes both an automatic
and a traditional drive test system, and it has been widely used by various operators. It
helps operators to test, locate, and solve speech quality related issues, and provides
extensive experience in MOS testing, which helps to optimize the MOS test
system.
5.3 Dingli MOS Test System
The Dingli MOS box is an accessory specially designed for wireless network speech
quality testing. Users may use Pilot Pioneer and the MOS box with different test terminals
to complete speech quality testing for various networks in various scenarios. In
practice, the MOS box has a variety of technical advantages, listed in Table 4.
Table 4 Technical advantages of Dingli MOS box
Item | Description
Compact | 45 (L) x 13 (H) x 38 (W) cm
 | Standard weight: 5 kg
 | Power: built-in battery or external power supply
 | Easy to carry
Protective Material | Aluminum alloy material, compression-resistant and shock-resistant
 | Plastic and protective film lining for the mobile phone slots
 | Protects test terminals from damage
Terminal Distance | The distance between test terminals is at least 15 cm, which avoids mutual interference between the terminals and ensures test accuracy.
Simple Panel | The data and headphone cables of the test terminals are located in the lower section of the MOS box. Only the connectors are visible in the upper layer. This makes the MOS box more presentable and helps to reduce the impact of twisted or broken cables during testing.
Easy Operation | All test terminals connected to the MOS box are connected to a computer via three cables, which simplifies the connection.
Humanized Design | Test terminals can be elevated up to 75° in a fixed slot, which can improve the received signal strength and dissipate heat from the mobile phones.
Table 5 describes the technical specifications of Dingli MOS box.
Table 5 Technical specifications of Dingli MOS box
Item | Description
Physical features | Input voltage: 10 V to 18 V
 | Power: ≤ 2.1 W
 | Working temperature: 0℃ to 55℃
 | Storage temperature: -20℃ to 85℃
 | S/N ratio (signal-to-noise ratio): ≥ 42 dB
 | Record or playback distortion: ≤ 3%
 | Output power: ≥ 50 mW (headphone drive)
 | Record or playback echo rejection ratio: ≥ 40 dB
System features | Supports mobile-to-mobile and mobile-to-land tests
 | Supports a maximum of eight test terminals simultaneously
 | Provides standard 8 s, 12 s and 4 s male speech sample files (bit rate: 128 K)
 | Displays the current audio file waveform and MOS value, saves audio files in real time, and saves either the entire audio file or the audio file based on settings
 | Provides a high performance recording and playback device to avoid noise
 | Supports device abnormality alarms (audio or notification)
 | Shows the relevant in-depth analysis parameters provided by the MOS algorithm
 | Supports multi-network comparison (such as GSM / CDMA / WCDMA / TD-SCDMA / LTE CSFB)
 | Supports various types of terminals (such as SAMSUNG S3, SAMSUNG Note I, ZTE V8860+, Huawei 8800 series)
Background processing | Voice call report (Coverage, Connection Rate, Call-drop Rate, Call Setup Time)
 | Voice call details (outgoing and incoming call details)
 | Diagnostic event details (such as details on dropped calls, lost calls, and low MOS values)
 | Parameter list details (various parameter sample rate)
 | Display of MOS value on a map
 | Graphical display of various parameters
5.4 MOS Testing Solutions
MOS testing includes three test solutions, based on Pilot Pioneer, Pilot RCU, and Pilot
Walktour respectively, and one analysis solution based on Pilot Navigator. All three
test solutions support the PESQ and POLQA algorithms. Users only need to select PESQ or
POLQA during configuration.
5.4.1 MOS Testing Solution Based on Pilot Pioneer
1. Mobile-to-Mobile
This mode supports speech quality and benchmarking tests for operators, with a
maximum of four networks simultaneously. The test terminals can be combined
freely across network technologies (such as 2G1C and 3W1C).
Users may initiate a call from one mobile phone to another. After the call is
connected, the calling party plays a speech sample, which is transmitted to the
called party through the base station. The called party records the received
speech and compares it with the standard speech sample to obtain the uplink
value of the calling party (which is also the downlink value of the called party). Then
the called party plays the speech sample, which is transmitted to the calling party
through the base station. The calling party records the received speech and compares
it with the standard speech sample to obtain the uplink value of the called party (which
is also the downlink value of the calling party). The two terminals alternate in this way,
and the test can loop indefinitely.
Note: The description above is valid for PESQ. For POLQA, Pilot Pioneer currently
supports the POLQA score only for the calling party.
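The alternating mobile-to-mobile procedure described above can be sketched as a small loop that records which link directions each measurement represents. The `measure` callback stands in for the play, record, and score step. It and all other names are hypothetical and are not part of Pilot Pioneer.

```python
def attribute_score(playing_party, mos):
    """Map one measurement to the link directions it represents."""
    if playing_party == "calling":
        # The calling party plays; the called party records and scores.
        return {"calling_uplink": mos, "called_downlink": mos}
    if playing_party == "called":
        # The called party plays; the calling party records and scores.
        return {"called_uplink": mos, "calling_downlink": mos}
    raise ValueError(f"unknown party: {playing_party}")


def run_cycles(measure, cycles):
    """Alternate the playing party for a number of cycles and collect results."""
    results = []
    for cycle in range(cycles):
        playing_party = "calling" if cycle % 2 == 0 else "called"
        results.append(attribute_score(playing_party, measure(playing_party)))
    return results


def fake_measure(playing_party):
    """Hypothetical stand-in that returns fixed MOS values for illustration."""
    return 3.6 if playing_party == "calling" else 3.4


for result in run_cycles(fake_measure, cycles=4):
    print(result)
```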
Figure 5.1 Pioneer+MOS (2W1C)
2. Mobile-to-Land
Users may conduct mobile-to-land MOS tests based on network type, or customize
the solution by defining the terminal type and quantity according to network type.
Currently, this test mode supports a maximum of four networks simultaneously.
Note: When the POLQA algorithm is used, Pilot Pioneer currently supports only
8 kHz speech sampling and the downlink POLQA score.
Figure 5.2 Mobile-to-Land
5.4.2 MOS Analysis Solution Based on Pilot Navigator
Pilot Navigator is a professional post processing tool developed by Dingli
Communications. It provides multiple professional analysis reports which include
speech quality test results.
1. Speech Quality Report
In the Speech Quality Report, the PESQ or POLQA Mean Value and Middle Value
are displayed. Figure 5.3 is an example.
Figure 5.3 Speech Quality Report provided by Pilot Navigator
Speech Quality
Speech Quality | General Result | MOC Result | MTC Result
PESQ MOS Mean Value | 3.63 | 3.63 | 3.63
PESQ MOS Middle Value | 3.71 | 3.71 | 3.71
POLQA ScoreNB Mean Value | 0.00 | 0.00 | 0.00
POLQA ScoreNB Middle Value | | |
POLQA ScoreSWB Mean Value | 0.00 | 0.00 | 0.00
POLQA ScoreSWB Middle Value | | |
PESQ Score Average Value | 3.58 | 3.58 | 3.58
PESQ Score Medium Value | 3.62 | 3.62 | 3.62
PESQ LQ Average Value | 3.58 | 3.58 | 3.58
PESQ LQ Medium Value | 3.67 | 3.67 | 3.67
BLER<=3% | 138255 | 55741 | 82514
BLER<=5% | 0 | 0 | 0
BLER<=10% | 1075 | 451 | 624
BLER<=100% | 464 | 188 | 276
Speech Quality(by BLER) | 98.90% | 98.87% | 98.92%
Average BLER | 0.20% | 0.21% | 0.20%
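The report does not define how the "Speech Quality(by BLER)" row is calculated. The figures in Figure 5.3 are, however, consistent with the share of samples that fall into the BLER<=3% bin, assuming the four BLER rows count disjoint ranges. The sketch below checks that reading against the reported percentages. The function and dictionary names are hypothetical.

```python
def speech_quality_by_bler(bin_counts):
    """Percentage of samples in the BLER<=3% bin (assumed definition)."""
    total = sum(bin_counts.values())
    return 100.0 * bin_counts["<=3%"] / total


# Sample counts taken from Figure 5.3.
report = {
    "General": {"<=3%": 138255, "<=5%": 0, "<=10%": 1075, "<=100%": 464},
    "MOC": {"<=3%": 55741, "<=5%": 0, "<=10%": 451, "<=100%": 188},
    "MTC": {"<=3%": 82514, "<=5%": 0, "<=10%": 624, "<=100%": 276},
}

for column, counts in report.items():
    print(column, round(speech_quality_by_bler(counts), 2))
```

Under the same reading, the General column is the sum of the MOC and MTC columns for every BLER bin.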
2. Abnormal MOS Value Details Report
In the Abnormal MOS Value Details report, the information about MOS Time,
Network, and MOS Value is provided for abnormality cause analysis. Figure 5.4 is
an example for Abnormal MOS Value Details report.
Figure 5.4 Abnormal MOS Value Details Report provided by Pilot Navigator
Abnormal MOS Value Details
Outgoing Call
NO. | MOS Abnormal Type | MOS Value | File Name | Network | MOS Time | Freq. (BCCH) | PSC (BSIC) | Ec/Io (TCH C/I) | TxPower_W (TxPower_G) | Rx Power (Rxlevel Sub) | BLER (Rxqual Sub) | RSCP (TA) | Position: Longitude | Position: Latitude
1 | | (10.00) | UMTS4_TD+WCDMA | WCDMA | 14:28:02.578 | 265 | 0 | -11.51 | 1.12 | -68.62 | 0.96 | -87.64 | 116.28183 | 39.95340
2 | | (10.00) | UMTS5_TD+WCDMA | GSM | 14:28:11.515 | 122 | 17 | 17.95 | 39.00 | -80.90 | 0.87 | 1.00 | 116.28103 | 39.95344
3 | | (10.00) | UMTS6_TD+WCDMA | GSM | 14:32:59.296 | 640 | 65 | 9.42 | 30.00 | -84.62 | 2.76 | 2.00 | 116.27477 | 39.95549
4 | | -9.999 | UMTS7_TD+WCDMA | GSM | 14:33:07.671 | 664 | 23 | 12.52 | 29.70 | -77.14 | 0.34 | 0.86 | 116.27481 | 39.95503
5 | | (10.00) | UMTS8_TD+WCDMA | GSM | 14:33:16.171 | 655 | 46 | 17.27 | 30.00 | -80.11 | 0.47 | 0.32 | 116.27484 | 39.95457
6 | | -9.999 | UMTS9_TD+WCDMA | GSM | 14:33:41.265 | 666 | 37 | 9.77 | 26.52 | -83.58 | 3.77 | 0.40 | 116.27499 | 39.95317
7 | | -9.999 | UMTS10_TD+WCDMA | GSM | 14:33:49.781 | 122 | 50 | 7.32 | 20.00 | -82.43 | 6.15 | 1.00 | 116.27543 | 39.95309
8 | | -9.999 | UMTS11_TD+WCDMA | GSM | 14:50:43.359 | 664 | 61 | 18.14 | 31.66 | -69.02 | 0.33 | 2.00 | 116.28235 | 39.95263
9 | | -9.999 | UMTS12_TD+WCDMA | GSM | 14:50:51.765 | 651 | 40 | 19.71 | 34.91 | -74.94 | 0.65 | 2.00 | 116.28238 | 39.95331
10 | | -9.999 | UMTS13_TD+WCDMA | GSM | 14:54:32.625 | 643 | 13 | 22.35 | 35.80 | -64.37 | 0.00 | 0.86 | 116.28572 | 39.95820
11 | | -9.999 | UMTS14_TD+WCDMA | GSM | 14:54:41.109 | 121 | 75 | 20.95 | 33.49 | -59.98 | 0.00 | 0.49 | 116.28576 | 39.95795
12 | | -9.999 | UMTS15_TD+WCDMA | GSM | 15:51:57.218 | 666 | 37 | 21.42 | 27.46 | -57.95 | 0.00 | 0.50 | 116.27474 | 39.95573
13 | | (10.00) | UMTS16_TD+WCDMA | GSM | 15:52:05.562 | 655 | 46 | 21.72 | 24.69 | -60.57 | 0.00 | 0.61 | 116.27471 | 39.95601
5.4.3 MOS Testing Solution Based on Pilot Fleet (RCU)
Dingli Pilot Fleet (RCU) supports automatic speech quality testing. It can
integrate multiple MOS test modules internally to perform single-network MOS testing
or multi-network MOS benchmarking testing. It offers a variety of end-to-end testing
methods, such as from a remote test module to a fixed-line phone, from one test module
to another, and between test modules in different cities. This in
turn allows users to test the pure uplink or downlink MOS value, the user-perceived MOS
value, and the long-distance speech quality MOS value.
1. From an RCU Test Module to Server
Figure 5.5 shows the general architecture of the speech quality testing from a MOS
test module built in RCU to a server. By using E1 cables on the server side to set up
multiple channels for speech quality testing, multiple RCU MOS test modules can
simultaneously call the server number. In this way, multi-channel MOS values are
obtained. Since the called party is a fixed-line phone number, the transmission loss is
extremely small. As a result, the MOS values obtained on the server side and the
RCU side can be considered as pure uplink and downlink MOS values.
Note: For POLQA, this mode supports only the downlink score and a maximum
of three dual-core RCU test modules simultaneously.
Figure 5.5 Calling from an RCU MOS Test Module to server
2. From an RCU MOS Test Module to an RCU MOS Test Module
Figure 5.6 shows the general structure of speech quality testing from one RCU
MOS test module to another. Users may initiate calls from one MOS test module to
another to perform MOS comparison testing. The speech sample is played at one end
and recorded at the other, and the MOS value is then calculated by the MOS
algorithm. In addition, users may initiate calls from one test module to another in
the same RCU. This testing method is relatively simple because no speech quality
testing server needs to be installed. It is therefore suitable for speech quality
testing in remote communication.
Figure 5.6 Calling from an RCU MOS Test Module to an RCU MOS Test Module
The testing process is almost the same as that from an RCU test module to a
server. The difference is that the calls are between two RCU test modules or between
two test modules in the same RCU. This solution also supports multi-channel MOS
testing.
Note: When the POLQA algorithm is used, this mode supports a maximum of two
dual-core RCU test modules dialing each other.
5.4.4 MOS Testing Solution Based on Pilot Walktour
1. Mobile to Mobile
As shown in Figure 5.7, Dingli Pilot Walktour supports mobile-to-mobile calls for
MOS testing. Using the software kernel and MOS algorithm integrated in the mobile
phone, users may play the speech sample on one mobile phone, record it on the
other, and perform speech quality testing.
Note: In this mode, the iOS Walktour POLQA score cannot be viewed on the
mobile phone, only when analyzed with Pilot Navigator; Android Walktour
supports the POLQA score only on the mobile phone of the calling party.
Figure 5.7 MOS testing based on Pilot Walktour
2. Mobile-to-Land
Users may conduct mobile-to-land MOS tests based on network type, or customize
the solution by defining the terminal type and quantity according to network type.
Note: In this mode, iOS Walktour supports the POLQA score only when analyzed
with Pilot Navigator, not on the mobile phone. Android Walktour supports the
POLQA score only on the mobile phone of the calling party.
5.5 Details About Dingli MOS Testing Solution
5.5.1 Impact of Network Factors on PESQ MOS Value
1. Impact of Voice Coding Mode on MOS Value
GSM supports multiple voice coding modes, such as Enhanced Full Rate (EFR),
Full Rate (FR), and Half Rate (HR). Based on the coding principles, the expected
ordering is:
MOS value in the EFR mode > MOS value in the FR mode > MOS value in the HR
mode
The following is a group of speech quality testing results (from a mobile phone to a
fixed-line phone) without environmental interference.
Uplink PESQ MOS value: value in the EFR mode (max 4.20) > value in the
FR mode (max 3.94) > value in the HR mode (max 3.69)
Downlink PESQ MOS value: value in the EFR mode (max 4.255) > value in
the FR mode (max 3.940) > value in the HR mode (max 3.728)
In commercial networks, because of network interference and other factors, the test
results obtained with different coding rates differ considerably from those obtained
in an ideal environment. Table 6 shows example average values from an actual speech
quality test in an urban environment.
Table 6 Impact of HR in GSM Network on PESQ result
PESQ Result                HR Percentage = 0%    HR Percentage = 40%    HR Percentage = 100%
Uplink PESQ MOS value      3.588                 3.428                  3.331
Downlink PESQ MOS value    3.418                 3.325                  3.259
2. Impact of Link and Routing on MOS Value
In speech quality testing, links and routing are also important factors that affect
the PESQ MOS value. For example, if the called party is on a different network
(e.g. CDMA to GSM) rather than on the same radio access network, the result may
differ.
In one actual network test, the measured MOS value was lower than usual. After
ruling out equipment and radio causes, the problem was traced to long-distance
switching and routing. In that scenario, the PESQ speech quality test used two
routes. Each call was then locked to a single route to determine which one produced
the lower MOS value. One of the routes was in fact introducing transmission errors,
which led to the lower MOS value. The reported MOS value returned to normal after
the issue was resolved.
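The route-isolation step described above can be sketched as follows. This is only an illustration, assuming each test call has already been tagged with the route it was locked to; the route labels and MOS samples are hypothetical and are not measurements from the test.

```python
from collections import defaultdict
from statistics import mean


def mean_mos_by_route(samples):
    """Group (route, mos) pairs by route and return {route: mean MOS}."""
    grouped = defaultdict(list)
    for route, mos in samples:
        grouped[route].append(mos)
    return {route: mean(values) for route, values in grouped.items()}


def worst_route(samples):
    """Return the route label with the lowest mean MOS."""
    means = mean_mos_by_route(samples)
    return min(means, key=means.get)


# Hypothetical samples: (route label, PESQ MOS of one locked call)
samples = [
    ("route_a", 3.8),
    ("route_b", 2.9),
    ("route_a", 3.9),
    ("route_b", 3.1),
]
print(mean_mos_by_route(samples))
print(worst_route(samples))
```

Comparing per-route means only points to the suspect route; the root cause still has to be confirmed on the transmission side.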
3. Impact of Abis Compression on MOS Values
In January 2005, Dingli tested the impact of Abis compression (bandwidth savings
on the Abis interface) on the MOS value in a live network in Beijing. The test was
divided into three parts: no compression, 5-slot compression, and 6-slot
compression. See Table 7.
Table 7 Impact of Abis compression on MOS value
Item Average Downlink Value Average Uplink Value
No compression 3.964 3.938
5 Time-slot compression 3.424 3.464
6 Time-slot compression 3.364 3.367
Conclusion: The greater the compression, the lower the MOS value.
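To make the size of the effect explicit, the short sketch below computes the MOS drop of each compression setting relative to the uncompressed case. The values are copied from Table 7; the dictionary layout and the function name are illustrative choices, not part of any Dingli tool.

```python
# Average PESQ MOS values copied from Table 7
TABLE_7 = {
    "No compression": {"downlink": 3.964, "uplink": 3.938},
    "5 Time-slot compression": {"downlink": 3.424, "uplink": 3.464},
    "6 Time-slot compression": {"downlink": 3.364, "uplink": 3.367},
}


def mos_drop(item, direction, baseline="No compression"):
    """Return how far the MOS of `item` falls below the baseline row."""
    return round(TABLE_7[baseline][direction] - TABLE_7[item][direction], 3)


for item in ("5 Time-slot compression", "6 Time-slot compression"):
    for direction in ("downlink", "uplink"):
        print(item, direction, mos_drop(item, direction))
```

In this data set the step from no compression to 5-slot compression accounts for most of the loss; the additional loss from 5-slot to 6-slot compression is comparatively small.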
4. Impact of Network Quality on MOS Value
The PESQ MOS value is related to the network quality (RxQual Sub for GSM,
FFER for CDMA).
When the GSM RxQual Sub value range is [0, 2], the PESQ MOS value
range is [3, 4.1].
When the CDMA FFER value range is [0, 3%], the PESQ MOS value range is
[3, 4.1].
Generally, if the RxQual Sub/FFER value is high, the MOS value is low. However, if
the MOS value is low, the RxQual Sub/FFER value is not necessarily high (the MOS
value may be affected by other factors).
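The asymmetry described above (poor radio quality implies low MOS, but low MOS does not imply poor radio quality) can be turned into a simple triage rule. The sketch below is an assumption-laden illustration for GSM: it reuses the RxQual Sub range [0, 2] and the lower MOS bound of 3 quoted above as default thresholds, and the function name and result strings are hypothetical.

```python
def triage(rxqual_sub, pesq_mos, good_rxqual_max=2.0, expected_mos_min=3.0):
    """Classify one GSM sample by comparing radio quality and PESQ MOS.

    The default thresholds come from the ranges quoted in the text
    (RxQual Sub in [0, 2] corresponds to PESQ MOS in [3, 4.1]).
    """
    if pesq_mos >= expected_mos_min:
        return "normal"
    if rxqual_sub <= good_rxqual_max:
        # Radio quality is good, so something else lowered the MOS
        # (for example codec mode, routing, or handover).
        return "low MOS, radio quality good: check other factors"
    return "low MOS, consistent with poor radio quality"


print(triage(rxqual_sub=0.5, pesq_mos=3.6))
print(triage(rxqual_sub=0.5, pesq_mos=2.4))
print(triage(rxqual_sub=5.0, pesq_mos=2.4))
```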
A good network environment is the basis of good wireless communication. Low C/I
and C/A (carrier-to-adjacent) values result in a high BER, which degrades call
quality or triggers dropped calls. High BER caused by frequency interference has
been a major concern in network optimization. From the user's perspective, a
transient BER increase does not affect the listening experience, while a sustained
high BER causes frame loss and seriously affects listening.
The following conclusions were drawn from practical tests:
In EFR mode, when the downlink RXQUAL value is greater than 4.8, the
downlink PESQ MOS value begins to decrease; when the downlink RXQUAL value is
greater than 5.4, the downlink PESQ MOS value is lower than 3.3; when the
downlink RXQUAL value is greater than 6, the downlink PESQ MOS value
drops sharply to an unacceptable level.
In FR mode, when the downlink RXQUAL value is greater than 5.1, the
downlink PESQ MOS value begins to decrease; when the downlink RXQUAL value is
greater than 5.6, the downlink PESQ MOS value is lower than 3.3; when the
downlink RXQUAL value is greater than 6, the downlink PESQ MOS value
drops sharply to an unacceptable level.
In HR mode, when the downlink RXQUAL value is greater than 4.8, the
downlink PESQ MOS value begins to decrease; when the downlink RXQUAL value is
greater than 5.2, the downlink PESQ MOS value is lower than 3.3; when the
downlink RXQUAL value is greater than 6, the downlink PESQ MOS value
drops sharply to an unacceptable level.
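The per-codec thresholds listed above lend themselves to a small lookup. The following sketch encodes them as stated; the function name and the category labels are hypothetical and are not terms used by the test tools.

```python
# Downlink RXQUAL thresholds from the text, per coding mode:
# (MOS starts to decrease, MOS below 3.3, MOS collapses)
THRESHOLDS = {
    "EFR": (4.8, 5.4, 6.0),
    "FR": (5.1, 5.6, 6.0),
    "HR": (4.8, 5.2, 6.0),
}


def classify_downlink(codec, rxqual):
    """Map a downlink RXQUAL value to the expected PESQ MOS behaviour."""
    degrade, below_3_3, collapse = THRESHOLDS[codec]
    if rxqual > collapse:
        return "severe"
    if rxqual > below_3_3:
        return "below 3.3"
    if rxqual > degrade:
        return "degrading"
    return "normal"


print(classify_downlink("EFR", 5.0))
print(classify_downlink("FR", 5.0))
print(classify_downlink("HR", 5.3))
```

The same RXQUAL value can fall into different categories depending on the coding mode, which is why the codec has to be known before a MOS result is interpreted.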
5. Impact of Network Handover on MOS Value
Handover is one of the most basic and important features of GSM. During
handover, TCH (traffic channel) frames are used as FACCH (Fast Associated
Control Channel) to carry the handover signaling. This temporary interruption
maintains the connection and completes the handover to a more suitable cell.
However, it comes at the expense of voice continuity and affects speech quality to
a certain extent. When frequent ping-pong handover occurs, the successive frames
used as FACCH cause voice interruptions similar to those caused by frame loss.
The following conclusions were drawn from practical tests:
Handover has a serious impact on the PESQ MOS value. As the handover
frequency increases, the speech quality may drop to an intolerable level.
In EFR mode, if a handover occurs once every six seconds while uplink and
downlink voice is being transferred, the average uplink and downlink PESQ MOS
value is 1 lower than the maximum value. If a handover occurs twice every
six seconds, the average uplink and downlink PESQ MOS value is 2 lower than
the maximum value.
In FR mode, if a handover occurs once every six seconds while uplink and
downlink voice is being transferred, the average uplink and downlink PESQ MOS
value is 1 lower than the maximum value. If a handover occurs twice every
six seconds, the average uplink and downlink PESQ MOS value is 1.5 lower than
the maximum value.
In EFR / FR / HR mode, when handover failures occur (e.g. similar to
ping-pong handover), the drop in PESQ MOS value grows as the handover count
increases (the PESQ MOS value may drop close to 1).
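As a worked example of the handover figures above, the sketch below subtracts the reported penalty from a maximum MOS value. It covers only the four cases stated in the text and deliberately does not interpolate to other handover rates; the function name is hypothetical, and the maximum values used in the example are the uplink maxima quoted earlier for EFR (4.20) and FR (3.94).

```python
# Reported drop below the maximum PESQ MOS, keyed by
# (coding mode, handovers per six seconds)
PENALTY = {
    ("EFR", 1): 1.0,
    ("EFR", 2): 2.0,
    ("FR", 1): 1.0,
    ("FR", 2): 1.5,
}


def expected_mean_mos(codec, handovers_per_6s, max_mos):
    """Return the expected average PESQ MOS for a tabulated case."""
    key = (codec, handovers_per_6s)
    if key not in PENALTY:
        raise ValueError(f"no reported figure for {key}")
    return max_mos - PENALTY[key]


print(expected_mean_mos("EFR", 2, 4.20))
print(expected_mean_mos("FR", 2, 3.94))
```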
6. Impact of Signal Strength on MOS Value
When the signal strength changes but the BER / FER remains 0, the RXQUAL
Sub value also remains 0. In this case, the speech quality is theoretically
not affected.
The following conclusion was drawn from practical tests:
When the C/I value does not change, signal strength fluctuation does not affect the
PESQ MOS value; when the signal strength fluctuation affects the C/I value, the
PESQ MOS value is also affected.
5.5.2 Parameters in POLQA Testing Result
In theory, the factors that affect PESQ MOS values also affect POLQA test
results. POLQA research is currently ongoing. This section describes the POLQA
test interface and parameters in Pilot Pioneer.
Figure 5.8 shows the POLQA test interface in Pilot Pioneer.
Figure 5.8 POLQA test interface in Pilot Pioneer
The reference and degraded waveforms are displayed in the upper part of the
interface, and the POLQA test results are displayed in the lower part, including
Library Version, Processing Mode (NB/SWB), Mean Delay, Minimum Delay,
Maximum Delay, etc. For detailed information, see Table 8.
Table 8 POLQA parameters in Pilot Pioneer
Parameter                       Description
Library Version                 POLQA OEM lib version.
P863 Version                    ITU P.863 reference version code.
Processing Mode                 Mode, including Narrowband (NB) or Super Wideband (SWB).
Sample Rate(Hz)                 Sample rate of the reference speech signal in Hz. Typical sample rates of NB and SWB are 8 kHz and 48 kHz respectively.
POLQA Score                     POLQA score according to P.863, 1 to 4.5 for NB mode, 1 to 4.75 for SWB mode.
Mean Delay (ms)                 The average delay between the reference and the test signal in ms.
Minimum Delay (ms)              The minimum delay between the reference and the test signal in ms.
Maximum Delay (ms)              The maximum delay between the reference and the test signal in ms.
Attenuation(dB)                 The attenuation between the two input signals in dB.
R value(E Model)                The POLQA score mapped as a G.107 (E-Model) Ie value.
Estimated Sample Rate(Hz)       The sample rate of the degraded signal as measured by POLQA.
Resampling Applied              If the sample rates of the reference and the degraded signal differ by more than 0.5%, POLQA will down sample the signal with the higher sample rate. If this happens, Resampling Applied will be set to 1.
Direction                       Uplink or Downlink. Downlink means the device is playing the reference signal.
Level Reference (dBov)          The Level of the reference signal in dBov (averaged over the entire signal).
Level Degraded (dBov)           The Level of the degraded signal in dBov (averaged over the entire signal).
ASL Reference (dBov)            The active speech level of the reference signal in dBov, measured similar to P.56.
ASL Degraded (dBov)             The active speech level of the degraded signal in dBov, measured similar to P.56.
Pause Level Reference (dBov)    The silence level of the reference signal in dBov, measured similar to P.56.
Pause Level Degraded (dBov)     The silence level of the degraded signal in dBov, measured similar to P.56.
SNR Reference(dB)               The SNR in dB of the reference signal.
SNR Degraded(dB)                The SNR in dB of the degraded signal.
ASR Reference                   The active Speech Ratio (ASR) of the reference signal. ASR is the ratio of the length of the active speech signal parts and the total signal length [0...1].
ASR Degraded                    The active Speech Ratio (ASR) of the degraded signal. ASR is the ratio of the length of the active speech signal parts and the total signal length [0...1].
Pitch Reference(Hz)             The average pitch frequency of the reference signal.
Pitch Degraded(Hz)              The average pitch frequency of the degraded signal.
Record file                     The recorded degraded signal file, along with its location.
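Two of the parameters in Table 8 are defined by simple rules that can be illustrated in a few lines. The sketch below is not the POLQA library's implementation: the function names are hypothetical, and measuring the 0.5% difference relative to the reference sample rate is an assumption, since the table does not say which rate serves as the base.

```python
def resampling_applied(ref_rate_hz, deg_rate_hz, tolerance=0.005):
    """Return 1 if the two sample rates differ by more than 0.5%, else 0.

    Assumption: the difference is taken relative to the reference rate.
    """
    relative_diff = abs(ref_rate_hz - deg_rate_hz) / ref_rate_hz
    return 1 if relative_diff > tolerance else 0


def active_speech_ratio(active_seconds, total_seconds):
    """ASR: length of the active speech parts divided by the total length."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return active_seconds / total_seconds


print(resampling_applied(48000, 48000))
print(resampling_applied(48000, 47000))
print(active_speech_ratio(4.0, 8.0))
```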
5.5.3 Reference PESQ-LQ Value: Some practical examples
Table 9, Table 10, and Table 11 show the mean value and the excellent value ratio
(excellent value range: 3.0 to 4.5) for three live networks in different scenarios,
each from a single test.
Table 9 Testing result of Operator A
Scenario            PESQ-LQ Mean Value    3.0-4.5 Percentage
Urban DT Voice      3.27                  88.17%
Highway DT Voice    3.25                  86.00%
Table 10 Testing result of Operator B
Scenario            PESQ-LQ Mean Value    3.0-4.5 Percentage
Urban DT Voice      3.49                  79.69%
Highway DT Voice    3.36                  75.64%
Table 11 Testing result of Operator C
Scenario            PESQ-LQ Mean Value    3.0-4.5 Percentage
Urban DT Voice      3.46                  78.42%
Highway DT Voice    3.43                  78.11%
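The two statistics reported in these tables can be computed from a list of per-sample PESQ-LQ values as sketched below. The sample list is hypothetical, and treating both ends of the 3.0 to 4.5 range as inclusive is an assumption, since the tables do not state how boundary values are counted.

```python
from statistics import mean


def summarize(values, low=3.0, high=4.5):
    """Return (mean value, percentage of samples within [low, high])."""
    if not values:
        raise ValueError("values must not be empty")
    in_range = sum(1 for v in values if low <= v <= high)
    return mean(values), 100.0 * in_range / len(values)


# Hypothetical PESQ-LQ samples from one drive test
samples = [3.2, 2.8, 3.6, 4.0]
mean_value, excellent_pct = summarize(samples)
print(f"mean={mean_value:.2f} excellent={excellent_pct:.2f}%")
```

The tables show why both figures are reported: Operator A has the lowest mean value but the highest percentage of samples in the excellent range.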
6 Appendix 1: ITU-T P.862 Standard Relevant Information
1. Factors for which PESQ had demonstrated acceptable accuracy
2. Factors for which PESQ had demonstrated acceptable accuracy (concluded)
3. PESQ is known to provide inaccurate predictions when used in conjunction
with these variables, or is otherwise not intended to be used with these variables
4. For further study. Factors, technologies and applications for which PESQ has
not currently been validated
5. For further study. Factors, technologies and applications for which PESQ has
not currently been validated (concluded)
6. ITU-T Temporal structure and duration of source material
7 References
[1]. OPTICOM GmbH & SwissQual AG, Perceptual Objective Listening Quality Analysis,
Technical White Paper.
[2]. Psytechnics Limited, PESQ: An Introduction, White Paper, September 2001.
[3]. ITU-T Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ), an
Objective Method for End-To-End Speech Quality Assessment of Narrowband
Telephone Networks and Speech Codecs, February 2001.
[4]. ITU-T Recommendation P.862.1, Mapping Function for Transforming P.862 Raw
Result Scores to MOS-LQO, November 2003.
[5]. ITU-T Recommendation P.863, Perceptual objective listening quality assessment,
January 2011.
[6]. ITU-T Recommendation P.862, Table 1/P.862, Table 2/P.862, Table 3/P.862, and
Temporal structure and duration of source material (8.1.2), February 2001.
Dingli Communications Corp., Ltd
Address: No.8, the Fifth Technology Road, Gangwan Avenue, Zhuhai,
P.R.China, 519085.
Tel: 0756-3391933 Fax: 0756-3391900
Technical Hotline: 4008878822
Email: [email protected]