8/12/2019 Manual - Reference Speech Signals
1/15
WHEN QUALITY MATTERS
Reference Speech Signals for SQuad Measurements
Manual
November 2010
SwissQual License AG, Allmendweg 8, CH-4528 Zuchwil, Switzerland
t +41 32 686 65 65  f +41 32 686 65 66  e [email protected]
Part Number: 16-070-200349/3 Rev 1.3
SwissQual has made every effort to ensure that any instructions contained in the document are adequate and free of errors and omissions. SwissQual will, if necessary, explain issues which may not be covered by the documents. SwissQual's liability for any errors in the documents is limited to the correction of errors and the aforementioned advisory services.
Copyright 2000 - 2010 SwissQual AG. All rights reserved.
No part of this publication may be copied, distributed, transmitted, transcribed, stored in a retrieval system, or translated
into any human or computer language without the prior written permission of SwissQual AG.
Confidential materials.
All information in this document is regarded as commercially valuable, protected and privileged intellectual property, and is provided under the terms of existing Non-Disclosure Agreements or as commercial-in-confidence material.
When you refer to a SwissQual technology or product, you must acknowledge the respective text or logo trademark somewhere in your text.
SwissQual, Seven.Five, SQuad, QualiPoc, NetQual, VQuad, Diversity as well as the following logos are registered trademarks of SwissQual AG.
Diversity Explorer, Diversity Ranger, Diversity Unattended, NiNA+, NiNA, NQAgent, NQComm, NQDI, NQTM, NQView, NQWeb, QPControl, QPView, QualiPoc Freerider, QualiPoc iQ, QualiPoc Mobile, QualiPoc Static, QualiWatch-M, QualiWatch-S, SystemInspector, TestManager, VMon, VQuad-HD are trademarks of SwissQual AG.
SwissQual acknowledges the following trademarks for company names and products:
Adobe, Adobe Acrobat, and Adobe Postscript are trademarks of Adobe Systems Incorporated.
Apple is a trademark of Apple Computer, Inc.
DIMENSION, LATITUDE, and OPTIPLEX are registered trademarks of Dell Inc.
ELEKTROBIT is a registered trademark of Elektrobit Group Plc.
Google is a registered trademark of Google Inc.
Intel, Intel Itanium, Intel Pentium, and Intel Xeon are trademarks or registered trademarks of Intel Corporation.
INTERNET EXPLORER, SMARTPHONE, TABLET are registered trademarks of Microsoft Corporation.
Java is a U.S. trademark of Sun Microsystems, Inc.
Linux is a registered trademark of Linus Torvalds.
Microsoft, Microsoft Windows, Microsoft Windows NT, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
NOKIA is a registered trademark of Nokia Corporation.
Oracle is a registered US trademark of Oracle Corporation, Redwood City, California.
SAMSUNG is a registered trademark of Samsung Corporation.
SIERRA WIRELESS is a registered trademark of Sierra Wireless, Inc.
TRIMBLE is a registered trademark of Trimble Navigation Limited.
U-BLOX is a registered trademark of u-blox Holding AG.
UNIX is a registered trademark of The Open Group.
Reference Speech Signals for SQuad Measurements Manual
2000 - 2010 SwissQual AG
CONFIDENTIAL MATERIALS
Contents
1 Pre-Filtering of Reference Speech Material
    Narrowband (Telephony) Applications
    Wideband (Telephony) Applications
2 SQuad-LQ Speech Quality Measurements
    Basics
    SQuad-LQ Speech - Design of Samples
    SwissQual Speech Material - Narrowband
    SwissQual Speech Material - Wideband
3 SQuad-NS Noise Suppression Measurement
    Basics
    SQuad-NS Speech Material
    SwissQual Speech Material
4 SQuad-AEC (Passive) - Passive Echo Disturbance Measurement
    Basics
    SQuad-AEC (Passive) Speech Material
    SwissQual Speech Material
5 SQuad-AEC (Active) - Active Echo Disturbance Measurement
    Basics
    SQuad-AEC (Active) Speech Material
    SwissQual Speech Material
6 SQuad-RTT Round Trip Time Measurement
    Basics
    Speech-Like Sequences
    SwissQual Speech Material
Tables
Table 2-1 Description of the settings for an SQuad-LQ measurement
Table 2-2 Description of the IRS pre-filtered reference speech samples for narrowband samples
Table 2-3 Description of the non-IRS pre-filtered reference speech samples for wideband scenarios
Table 2-4 Description of the WB-IRS pre-filtered reference speech samples for wideband scenarios
Table 3-1 Description of the settings for an SQuad-NS measurement
Table 3-2 Description of the prefix for a reference speech sample
Table 4-1 Description of the settings for an SQuad-AEC (Passive) measurement
Table 5-1 Description of the settings for an SQuad-AEC (Active) measurement
Table 5-2 Description of the settings for a double-talk SQuad-AEC (Active) measurement
Table 5-3 Description of the double-talk reference speech samples
Table 6-1 Description of the characteristics for a SQuad-RTT measurement
1 Pre-Filtering of Reference Speech Material
The Quality of Service (QoS) measurements that you perform with SwissQual equipment are designed to reflect the same level of quality that a subscriber experiences. For best results, SwissQual recommends that you use human speech references for most of your measurements and speech-like test signals for special measurements. Only human speech samples allow for an in-band transmission of the reference signal and ensure that the transmission component reacts correctly.
For best results, use the following guidelines when you create a speech sample:
Record in a low noise environment with high quality equipment
Avoid long reverberation times in the complete frequency range of the speaker's environment
Include male and female voices as well as human utterances that are typical for telephone conversations
Ensure that the text is well-balanced from a phonological point of view
SwissQual equipment is designed to be connected to the electrical interface of the sending side. Accordingly, the acoustical behaviour of a sending device has to be modelled by the measurement equipment in use.
Narrowband (Telephony) Applications
Conventionally shaped handsets tend to show a weak high-pass characteristic, or pre-emphasis, in the sending direction, which means that the terminal filters the real spoken voice at the microphone before the signal is transmitted. If the sending interface of the measurement equipment is the network termination point (two-wire analog or ISDN), the filtering that is normally done by the handset must be modelled by the measurement equipment. To create this model, the ITU-T recommends an IRS (Intermediate Reference System) characteristic within Recommendations P.48 and P.830.
The IRS (send) filter is defined for traditional narrowband applications (up to 3.4 kHz) and for traditional wideband applications (up to 7 kHz).
In narrowband scenarios (traditional telephony band), the usual behaviour of a handset is similar to the IRS (send). To realize a normative input signal, SwissQual recommends that you use the IRS pre-filtered signals as the input signal to the headset connector. Along with a built-in filter in a Diversity MCM (Mobile Connect Module), this connector can be considered as flat. When you use an IRS (send) pre-filtered input signal, exactly one IRS handset is emulated.
For this reason, SwissQual provides pre-filtered reference files. These files can be sent directly from the electrical interface of the connection to emulate a microphone.
These pre-filtered speech signals should also be used as a high quality reference for the SQuad-LQ measurements in narrowband scenarios. In this way, the differences between the optimal speech signal in a telephone connection (IRS pre-filtered, but completely undistorted) and the transmitted one can be taken into account. An MCM (Radio-Interface-Manager) provides an interface that is similar to a 4-wire network termination point and can also be used with IRS pre-filtered speech signals.
Wideband (Telephony) Applications
The ITU-T also defines an IRS (send) filter for traditional wideband scenarios. However, typical test cases for wideband telephony services tend to prefer unfiltered flat signals.
To serve wideband and super-wideband (up to 14 kHz) test cases, SwissQual provides all reference speech material for the speech-Wideband test type. This material has a sampling frequency of 32 kHz and an effective audio bandwidth of 50 to 14000 Hz. This signal can be used to directly feed the electrical headset connector of a Diversity MCM.
For special wideband applications, the use of IRS (send) filtered speech material might be required. For such scenarios, SwissQual provides wideband IRS (send) signals.
2 SQuad-LQ Speech Quality Measurements
Basics
The measurement of listening quality is based on a comparison between a high quality, undegraded speech sample, which is used as the input signal, and the transmitted and probably distorted signal that is recorded at the output of the connection. A psychoacoustic model is then applied to both signals, after which all perceptible differences are measured. The result of these measurements forms the overall listening quality score. Since linear distortions (frequency response) also influence the score, the selection of the input signal can also depend on the sending interface that is used.
SQuad-LQ Speech - Design of Samples
The SQuad-LQ algorithm calculates the listening quality for any arbitrary speech signal, but the characteristics of the speech material that you use can influence the end result.
To obtain representative and reproducible measurements, the speech sample should reflect typical human utterances in a telephone conversation.
For auditory tests that are in accordance with ITU-T P.800, short sentence pairs are used. Both sentences are spoken by one speaker. The average of the scores given to several of these sentence pairs forms the mean opinion score.
For network measuring purposes, the transmission of separate files over a longer period is not an acceptable solution. For this reason, SwissQual recommends speech clips that contain at least two sentences from a male and a female native speaker.
Note: The sentences are selected to avoid QoS dependencies on the text.
Table 2-1 Description of the settings for an SQuad-LQ measurement
Setting              Description
Length               6.0 s
Speech Activity      approximately 70 %
Structure            Two sentences, pause between sentences > 0.5 s
Speaker              Male and female native speakers
Sampling frequency   16 kHz (for narrowband telephony)
                     32 kHz (for wideband telephony)
File Format          WAVE, 16 bit, INTEL
Level                -26.0 dB OVL
Pre-Filtering        ITU-T Rec. P.830, mod. IRS (send)
If you want to use your own speech material, SwissQual strongly recommends a minimum sample length of 5 seconds, of which at least 50 % contains speech activity.
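The basic file requirements for custom material can be checked before a measurement. The following is a minimal sketch in Python and not a SwissQual tool: it uses only the standard wave module, and the function name check_reference_sample and the constant names are hypothetical. It checks the sampling frequency and sample width from Table 2-1 and the recommended minimum length of 5 seconds. It does not check speech activity, level, or pre-filtering.

```python
import wave

# Values taken from Table 2-1 and from the recommendation for custom material.
ALLOWED_RATES_HZ = (16000, 32000)   # narrowband / wideband reference files
REQUIRED_SAMPLE_WIDTH_BYTES = 2     # 16 bit
MIN_LENGTH_S = 5.0                  # recommended minimum for custom samples


def check_reference_sample(path):
    """Return a list of problems found in a WAVE file (empty list = none found)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()

    problems = []
    if rate not in ALLOWED_RATES_HZ:
        problems.append(f"sampling frequency {rate} Hz is not 16 kHz or 32 kHz")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bit, expected 16 bit")
    duration_s = frames / rate
    if duration_s < MIN_LENGTH_S:
        problems.append(f"length {duration_s:.2f} s is below {MIN_LENGTH_S} s")
    return problems
```

An empty result only means that these three properties match; the remaining settings in Table 2-1 still have to be verified separately.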
SwissQual Speech Material - Narrowband
For illustration purposes, SwissQual provides speech material in different languages in accordance with the ITU-T P.800 and ITU-T P.862.3 recommendations. For consistent results, SwissQual recommends the IRS pre-filtered speech samples in Table 2-2 for narrowband telephony scenarios.
Table 2-2 Description of the IRS pre-filtered reference speech samples for narrowband samples
Reference Sample     Description
am_fm_IRS.wav        American English, male+female
ar_fm_IRS.wav        Arabic, male+female
ch_fm_IRS.wav        German, Swiss pronunciation, male+female
cn_fm_IRS.wav        Chinese Mandarin, male+female
en2_fm_IRS.wav       British English, male+female
                     Note: The en2_fm_IRS.wav file replaces en_fm_IRS.wav, which is also included for existing deployments. For new deployments, use the en2_fm_IRS.wav reference sample. [1]
fr_fm_IRS.wav        French, male+female
                     Note: This sample systematically yields slightly lower MOS values than the other language reference samples in comparable situations. This discrepancy might be the result of the generation and recording process of the French source material.
ge_fm_IRS.wav        German, male+female
gr_fm_IRS.wav        Greek, male+female
hu_fm_IRS.wav        Hungarian, male+female
it_fm_IRS.wav        Italian, male+female
jp_fm_IRS.wav        Japanese, male+female
pl_fm_IRS.wav        Polish, male+female
pt_fm_IRS.wav        Portuguese, male+female
ru_fm_IRS.wav        Russian, male+female
sp_fm_IRS.wav        Spanish, male+female
tk_fm_IRS.wav        Turkish, male+female
On request, SwissQual can provide all speech material for narrowband as unfiltered (flat) source material. In addition to the 6 s samples, an 11 s sample in American English (AM_CallQual_IRS.wav) is also provided, which you can use for Call Quality measurements.
SwissQual Speech Material - Wideband
For illustration purposes, SwissQual provides speech material in different languages in accordance with the ITU-T P.800 and ITU-T P.862.3 recommendations. SwissQual recommends the non-IRS pre-filtered speech samples in Table 2-3 for wideband telephony scenarios.
Table 2-3 Description of the non-IRS pre-filtered reference speech samples for wideband scenarios
Reference Sample     Description
am_fm_wide.wav       American English, male+female
du_fm_wide.wav       Dutch, male+female
ch_fm_wide.wav       German, Swiss pronunciation, male+female
en_fm_wide.wav       British English, male+female [2]
ge_fm_wide.wav       German, male+female
it_fm_wide.wav       Italian, male+female
[1] SwissQual would like to thank Psytechnics Ltd, UK, for their kind permission to use their British English source material to generate the new speech sample.
For special purposes, all speech material in wideband is also available with WB-IRS (send) pre-filtering.
Note: The use of wideband material with IRS pre-filtering can lead to a recognizable limitation in audio bandwidth. In speech wideband test cases, this limitation is scored as a degradation.
Table 2-4 Description of the WB-IRS pre-filtered reference speech samples for wideband scenarios
Reference Sample     Description
am_fm_IRS_wide.wav   American English, male+female
du_fm_IRS_wide.wav   Dutch, male+female
ch_fm_IRS_wide.wav   German, Swiss pronunciation, male+female
en_fm_IRS_wide.wav   British English, male+female
ge_fm_IRS_wide.wav   German, male+female
it_fm_IRS_wide.wav   Italian, male+female
[2] SwissQual would like to thank Psytechnics Ltd, UK, for their kind permission to use their British English source material to generate the new speech sample.
3 SQuad-NS Noise Suppression Measurement
Basics
The main parameter of the SQuad Noise Suppression (NS) measurement assesses the improvement or degradation of the noisy speech sample during transmission by comparing the input of the noisy speech sample to the output sample. Other parameters assess the noise suppression and level deviations. The SQuad-NS measurement requires the following reference signals:
Noise-free reference speech signal
Same noise-free signal but mixed with an additive noise signal
SQuad-NS Speech Material
You can use an arbitrary speech signal in combination with a noise signal to run SQuad-NS. However, for a proper SQuad-NS measurement, you need to include both the noise-free (clean) reference sample and the sample with the background noise (noisy sample). Furthermore, the speech signal must conform to the signals that are described for SQuad-LQ. The initial pause of the signal must be a minimum of 2.0 s. To account for the so-called Lombard effect, the speech level must be 3 dB above the recommended value for SQuad-LQ.
Table 3-1 Description of the settings for an SQuad-NS measurement
Setting              Description
Length               8.0 s
Speech Activity      approx. 50 %
Structure            Two sentences
                     initial pause > 2.0 s
                     pause between the sentences > 0.5 s
Speaker              Male and/or female native speakers
Sampling frequency   16 kHz
File Format          WAVE, 16 bit, INTEL
Speech level         -23.0 dB OVL
Noise level          -23.0 to -50.0 dB OVL (recommended)
Pre-Filtering        ITU-T Rec. P.830, mod. IRS (send)
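For custom material, the speech activity and the initial pause can be estimated roughly from the sample values. The following Python sketch is an illustration under stated assumptions and not the method used by SQuad-NS: it classifies 20 ms frames as active with a simple energy threshold, it interprets levels as dB relative to the full scale of a 16-bit sample, and the -50 dB threshold and all function names are hypothetical. A standardized active speech level method is described in ITU-T P.56 and can give different results.

```python
import math

FULL_SCALE = 32768.0  # assumed overload point of a 16-bit sample


def frame_levels_db(samples, sample_rate_hz, frame_s=0.02):
    """RMS level of consecutive frames in dB relative to 16-bit full scale."""
    frame_len = int(round(sample_rate_hz * frame_s))
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf"))
    return levels


def speech_activity(samples, sample_rate_hz, threshold_db=-50.0):
    """Fraction of frames (0.0 to 1.0) whose level exceeds the threshold."""
    levels = frame_levels_db(samples, sample_rate_hz)
    if not levels:
        return 0.0
    return sum(1 for level in levels if level > threshold_db) / len(levels)


def initial_pause_s(samples, sample_rate_hz, threshold_db=-50.0, frame_s=0.02):
    """Time in seconds before the first frame that exceeds the threshold."""
    levels = frame_levels_db(samples, sample_rate_hz, frame_s)
    for index, level in enumerate(levels):
        if level > threshold_db:
            return index * frame_s
    return len(levels) * frame_s
```

The samples argument is a list of signed 16-bit integer sample values of one channel. For a sample that matches Table 3-1, speech_activity should return a value near 0.5 and initial_pause_s a value above 2.0.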
SwissQual Speech Material
SwissQual provides IRS pre-filtered speech material in different languages. This material is based on the reference samples that are recommended for SQuad-LQ. The reference speech samples are available with the following types of background noise:
In-car (stationary)
Street-noise (non-stationary)
Each of these noise types is mixed with each of the speech samples in four different level steps:
-26 dB OVL noise level (SNR = 3 dB)
-32 dB OVL noise level (SNR = 9 dB)
-38 dB OVL noise level (SNR = 15 dB)
-44 dB OVL noise level (SNR = 21 dB)
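The SNR values in this list follow from the speech level of -23.0 dB OVL in Table 3-1: the SNR in dB is the speech level minus the noise level. The short Python sketch below reproduces this arithmetic. The function names are hypothetical, and gain_factor is an added helper that shows which linear amplitude factor would move a signal from one level to another, assuming levels are expressed in dB.

```python
SPEECH_LEVEL_DB_OVL = -23.0                          # Table 3-1
NOISE_LEVELS_DB_OVL = (-26.0, -32.0, -38.0, -44.0)   # the four level steps


def snr_db(speech_level_db, noise_level_db):
    """Signal-to-noise ratio in dB for two levels given in dB."""
    return speech_level_db - noise_level_db


def gain_factor(current_level_db, target_level_db):
    """Linear amplitude factor that moves a signal from current to target level."""
    return 10.0 ** ((target_level_db - current_level_db) / 20.0)


for noise_level in NOISE_LEVELS_DB_OVL:
    snr = snr_db(SPEECH_LEVEL_DB_OVL, noise_level)
    print(f"{noise_level:.0f} dB OVL noise level (SNR = {snr:.0f} dB)")
```

For example, a noise recording measured at -20 dB OVL would need a factor of about 0.5 (gain_factor(-20.0, -26.0)) to reach the -26 dB OVL step.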
The filenames of the reference speech samples provide a description of the file content. For example, the Am_fm_IRS_16k_car_32.wav file is the American English speech sample that has been mixed with car noise at -32 dB OVL. The Am_fm_IRS_16k_car_32_clean.wav file is the corresponding speech sample without the background noise.
For measurements in English, the following reference samples are recommended:
Am_fm_IRS_16k_car_26.wav / Am_fm_IRS_16k_car_26_clean.wav
Am_fm_IRS_16k_car_44.wav / Am_fm_IRS_16k_car_44_clean.wav
The same speech signals are also available mixed with street noise:
Am_fm_IRS_16k_str_26.wav / Am_fm_IRS_16k_str_26_clean.wav
Am_fm_IRS_16k_str_44.wav / Am_fm_IRS_16k_str_44_clean.wav
The following table lists the other languages for which SwissQual provides similar reference samples.
Table 3-2 Description of the prefix for a reference speech sample
Reference Sample     Language
En_*.wav             English
Ge_*.wav             German
Gr_*.wav             Greek
It_*.wav             Italian
Jp_*.wav             Japanese
Ru_*.wav             Russian
Sp_*.wav             Spanish
For best results, use the SwissQual reference sample files.
Note: You can use custom reference material if the material fulfills the defined requirements.
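The naming convention of the SQuad-NS reference samples can be decoded programmatically, for example to pair a noisy sample with its clean counterpart. The Python sketch below is an illustration only. It assumes that every file follows the pattern of the American English examples (prefix, fm, IRS, 16k, noise type, noise level, optional clean suffix) and that the number in the name is the magnitude of the noise level in dB OVL; the function name parse_ns_filename and the dictionary keys are hypothetical.

```python
import re

# Prefixes from Table 3-2, plus "Am" from the American English examples.
LANGUAGES = {
    "Am": "American English", "En": "English", "Ge": "German", "Gr": "Greek",
    "It": "Italian", "Jp": "Japanese", "Ru": "Russian", "Sp": "Spanish",
}
NOISE_TYPES = {"car": "in-car", "str": "street"}
SPEECH_LEVEL_DB_OVL = -23.0  # Table 3-1

_PATTERN = re.compile(r"^([A-Za-z]{2})_fm_IRS_16k_(car|str)_(\d+)(_clean)?\.wav$")


def parse_ns_filename(name):
    """Decode a reference sample filename; raise ValueError if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    prefix, noise, level, clean = match.groups()
    noise_level = -float(level)  # assumed: "32" in the name means -32 dB OVL
    return {
        "language": LANGUAGES.get(prefix, prefix),
        "noise_type": NOISE_TYPES[noise],
        # For a clean file, this is the noise level of the paired noisy file.
        "noise_level_db_ovl": noise_level,
        "snr_db": SPEECH_LEVEL_DB_OVL - noise_level,
        "clean": clean is not None,
    }
```

With these assumptions, parse_ns_filename("Am_fm_IRS_16k_car_32.wav") reports American English, in-car noise at -32 dB OVL and an SNR of 9 dB.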
4 SQuad-AEC (Passive) - Passive Echo Disturbance Measurement
Basics
The SQuad-AEC (passive) algorithm searches for reflections (echoes) of a sent speech signal. If a reflection is present, the algorithm calculates its delay with respect to the sent signal as well as the echo return loss of the reflected signal. Side-tones, that is, reflections with a delay of less than 20 ms, are ignored by both SQuad-AEC algorithms (passive and active), but are still reported.
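The two quantities named above, echo delay and echo return loss, can be illustrated with a simple cross-correlation search. The Python sketch below is not the SQuad-AEC algorithm, whose internals are not described in this manual. It assumes a single reflection that is a scaled, delayed copy of the sent signal, skips lags below 20 ms to mirror the side-tone rule, and uses a low 1 kHz sampling frequency and a synthetic noise-like signal only to keep the pure-Python demonstration fast. All names are hypothetical.

```python
import math
import random


def estimate_echo(sent, received, sample_rate_hz, min_delay_s=0.020):
    """Return (delay in s, echo return loss in dB) of the strongest reflection,
    or None if no correlated reflection is found at or above min_delay_s."""
    min_lag = int(round(min_delay_s * sample_rate_hz))
    max_lag = len(received) - len(sent)
    sent_energy = sum(x * x for x in sent)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(s * received[lag + i] for i, s in enumerate(sent))
        if abs(corr) > abs(best_corr):
            best_lag, best_corr = lag, corr
    if best_lag is None:
        return None
    # Least-squares amplitude of the reflection relative to the sent signal.
    echo_gain = abs(best_corr) / sent_energy
    return best_lag / sample_rate_hz, -20.0 * math.log10(echo_gain)


# Synthetic demonstration: an echo delayed by 150 ms and attenuated by 20 dB.
random.seed(1)
RATE_HZ = 1000
sent = [random.gauss(0.0, 1.0) for _ in range(200)]
received = [0.0] * 600
for i, s in enumerate(sent):
    received[150 + i] += 0.1 * s

delay_s, erl_db = estimate_echo(sent, received, RATE_HZ)
print(f"delay = {delay_s * 1000:.0f} ms, echo return loss = {erl_db:.1f} dB")
```

For the synthetic signal, the sketch recovers the delay of 150 ms and the echo return loss of 20 dB. Real connections add noise, coding distortion, and multiple reflections, which this simple search does not handle.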
SQuad-AEC (Passive) Speech Material
You can use an arbitrary speech signal to measure the passive echo disturbance with the SQuad-AEC (passive) algorithm.
Table 4-1 Description of the settings for an SQuad-AEC (Passive) measurement
Setting              Description
Length               > 12.0 s
Speech Activity      > 90 %
Structure            Continuous speech
Speaker              Male and/or female native speakers
Sampling frequency   16 kHz
File Format          WAVE, 16 bit, INTEL
Speech level         -23.0 to -29.0 dB OVL
Pre-Filtering        ITU-T Rec. P.830, mod. IRS (send)
SwissQual Speech Material
squad_aec.wav
5 SQuad-AEC (Active) - Active Echo Disturbance Measurement
Basics
The SQuad-AEC (active) algorithm searches for reflections (echoes) of a sent speech signal that is generated actively by the far-end side and calculates the echo delay as well as the echo return loss of the residual echo. Side-tones (reflections with a delay of less than 20 ms) are ignored by both SQuad-AEC algorithms (passive and active), but are still reported.
SQuad-AEC (Active) Speech Material
Due to the complexity of the measurement, the file length of the active measurement is shorter than that of the passive measurement. You can use an arbitrary speech signal with an exact length of 6 s to measure the echo disturbance with SQuad-AEC (active).
Table 5-1 Description of the settings for an SQuad-AEC (Active) measurement
Setting              Description
Length               6.0 s
Speech Activity      > 90 %
Structure            Continuous speech
Speaker              Male and/or female native speakers
Sampling frequency   16 kHz
File Format          WAVE, 16 bit, INTEL
Speech level         -26.0 dB OVL
Pre-Filtering        ITU-T Rec. P.830, mod. IRS (send)
SwissQual also provides 6-second speech clips to generate double talk at the far-end side. The speech activity of these clips is lower than that of the default speech clips and is focused on speech bursts. Although you can use the default clips (length = 6 s), some echo mis-spotting can occur.
Table 5-2 Description of the settings for a double-talk SQuad-AEC (Active) measurement
Setting              Description
Length               6.0 s
Speech Activity      10 to 50 %
Structure            Isolated utterances
Speaker              Male or female native speakers
Sampling frequency   8 kHz (!)
File Format          WAVE, 16 bit, INTEL
Speech level         -26.0 dB OVL
Pre-Filtering        ITU-T Rec. P.830, mod. IRS (send)
SwissQual Speech Material
SQuadAECact.wav
SwissQual strongly recommends that you use the default reference sample files as they are optimally adjusted to avoid interactions between the files.
Table 5-3 Description of the double-talk reference speech samples
Reference Sample     Language
dt_10_8kHz.wav       (10% speech activity, female Croatian)
dt_25_8kHz.wav       (25% speech activity, female Croatian)
dt_50_8kHz.wav       (50% speech activity, female Croatian)
6 SQuad-RTT Round Trip Time Measurement
Basics
The measurement of the round trip time is based on an in-band transmission of short voice-like sequences. During the measurement, one sequence is sent repeatedly from the A-side to the B-side and, after the signal is received, a different sequence is sent back from the B-side to the A-side. SwissQual strongly recommends that you use the default reference speech samples RTTvoice_A.wav and RTTvoice_B.wav.
Speech-Like Sequences
For the in-band RTT measurement, two different sequences are necessary, where each sequence must fulfil the technical characteristics in Table 6-1.
Table 6-1 Description of the characteristics for a SQuad-RTT measurement
Setting              Description
Length               0.5 s to 0.6 s
Speech Activity      > 80 %
Sampling frequency   16 kHz
File Format          WAVE, 16 bit, INTEL
Level                -27.0 to -23.0 dB OVL
Pre-Filtering        ITU-T Rec. P.830, mod. IRS (send)
SwissQual Speech Material
RTTvoice_A.wav
RTTvoice_B.wav