8/10/2019 Ch8 - Choosing Speech Codecs for Mobile Communication
CHAPTER 8
SPEECH CODING
Choosing Speech Codecs for Mobile Communication
An important step in the design of a digital mobile communication system is choosing the right speech codec.
Available bandwidth is limited, so speech must be compressed to maximize the number of users on the system.
The choice must also take into account the end-to-end encoding delay, the algorithmic complexity of the coder, the DC power requirements, compatibility with existing standards, and the robustness of the encoded speech to transmission errors.
The choice of speech coder also depends on the cell size used. When the cell size is small enough that high spectral efficiency is achieved through frequency reuse, a simple high-rate speech codec may be sufficient.
The GSM Codec
The original speech coder used in the pan-European digital cellular standard GSM goes by the rather grandiose name of regular pulse excited long-term prediction (RPE-LTP) codec. This codec has a net bit rate of 13 kbps.
It combines the advantages of the earlier French-proposed baseband RELP codec with those of the multipulse excited long-term prediction (MPE-LTP) codec proposed by Germany.
The advantage of the RELP codec is that it provides good-quality speech at low complexity.
The MPE-LTP technique produces excellent speech quality at high complexity and is not much affected by bit errors in the channel.
By modifying the RELP codec to incorporate the features of the MPE-LTP codec, the net bit rate was reduced from 14.77 kbps to 13 kbps without loss of quality.
The most important modification was the addition of a long-term prediction loop.
The GSM codec is complex.
Fig. 8.10 shows a block diagram of the speech encoder.
The encoder consists of four major processing blocks.
The speech sequence is first pre-emphasized, ordered into segments of 20 ms duration, and then Hamming windowed.
This is followed by short-term prediction (STP) filtering analysis, where the logarithmic area ratios (LARs) of the reflection coefficients r_n(k) (eight in number) are computed.
The eight LARs have different dynamic ranges and probability distribution functions, so they are not all encoded with the same number of bits for transmission.
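The front end described above can be sketched as follows. This is a minimal illustration, not the standardized GSM routine: the 8 kHz sampling rate, the pre-emphasis coefficient `alpha`, the helper names, and the textbook definition LAR(k) = log10((1 + r(k)) / (1 - r(k))) are assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 8000                            # assumed telephony sampling rate
FRAME_MS = 20                                    # segment duration from the text
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000    # samples per 20 ms segment


def pre_emphasize(x, alpha=0.86):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is illustrative)."""
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def frames(x, frame_len=FRAME_LEN):
    """Split x into non-overlapping full frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def hamming(n_points):
    """Hamming window coefficients for a window of n_points samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame):
    """Multiply a frame sample-by-sample with a Hamming window of equal length."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


def log_area_ratios(reflection):
    """Textbook LAR definition; each reflection coefficient must satisfy |r| < 1."""
    return [math.log10((1 + r) / (1 - r)) for r in reflection]
```

Because the LAR mapping expands values of r(k) close to ±1, each of the eight parameters ends up with its own dynamic range, which is why a per-parameter bit allocation is natural.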
The LAR parameters are also decoded by the LPC inverse filter so as to minimize the error e_n.
LTP analysis, which involves finding the pitch period p_n and gain factor g_n, is then carried out such that the LTP residual r_n is minimized.
Pitch extraction is done by determining the value of the delay D that maximizes the cross-correlation between the current STP error sample e_n and a previous error sample e_(n-D).
The extracted pitch p_n and gain g_n are encoded and transmitted at a rate of 3.6 kbps.
The LTP residual r_n is weighted and decomposed into three candidate excitation sequences.
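The pitch search can be illustrated with a short sketch. It assumes the delay D is the lag that maximizes the cross-correlation between the current block of the STP error signal and the same block D samples earlier, and that the gain is the least-squares gain for that lag. The function names and the lag range of 40 to 120 samples are assumptions for the example, and no quantization is modeled.

```python
def ltp_search(e, start, length, min_lag=40, max_lag=120):
    """Return (lag, gain) for the block e[start:start+length].

    e is the short-term prediction error signal; the lag limits are
    illustrative assumptions, not values taken from the text.
    """
    if start < max_lag:
        raise ValueError("need at least max_lag samples of history before start")
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Cross-correlation of the current block with the block `lag` samples back.
        corr = sum(e[start + i] * e[start + i - lag] for i in range(length))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Least-squares gain for the chosen lag (zero if the past block is silent).
    energy = sum(e[start + i - best_lag] ** 2 for i in range(length))
    gain = best_corr / energy if energy > 0 else 0.0
    return best_lag, gain


def ltp_residual(e, start, length, lag, gain):
    """Long-term prediction residual r[n] = e[n] - gain * e[n - lag]."""
    return [e[start + i] - gain * e[start + i - lag] for i in range(length)]
```

For a perfectly periodic error signal the search returns the period as the lag and the residual vanishes, which is the sense in which the LTP loop removes pitch redundancy.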
Fig. 8.11 shows a block diagram of the GSM speech decoder.
It consists of four processing blocks.
The received excitation parameters are RPE decoded and passed to the LTP synthesis filter, which uses the pitch and gain parameters to synthesize the long-term signal.
Short-term synthesis is then carried out using the received reflection coefficients to recreate the original speech signal.
Every 260 bits of the coder output (i.e., each 20 ms block of speech) are ordered, according to their importance, into groups of 50, 132, and 78 bits.
The bits in the first group are the most important bits and are called type Ia bits; the next 132 bits are type Ib bits, and the last 78 bits are type II bits.
Since type Ia bits are the ones that affect speech quality the most, they have error-detection CRC bits added.
Both Ia and Ib bits are convolutionally encoded for forward error correction.
The least significant type II bits have no error correction or detection.
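The bit budget above can be checked with a little arithmetic. The class sizes and the 20 ms frame come from the text; the 3 CRC parity bits, 4 tail bits, and rate-1/2 convolutional code are assumptions taken from the usual description of GSM full-rate channel coding, and the helper name `split_frame` is hypothetical.

```python
FRAME_MS = 20
CLASS_BITS = {"Ia": 50, "Ib": 132, "II": 78}     # importance classes from the text

# Assumptions not stated in the text (commonly quoted full-rate GSM values):
CRC_BITS = 3        # parity bits protecting the class Ia bits
TAIL_BITS = 4       # bits that flush the convolutional encoder
CODE_EXPANSION = 2  # a rate-1/2 code doubles the protected bits


def split_frame(bits):
    """Split 260 importance-ordered bits into the Ia, Ib and II groups."""
    if len(bits) != sum(CLASS_BITS.values()):
        raise ValueError("expected one 260-bit speech frame")
    ia_end = CLASS_BITS["Ia"]
    ib_end = ia_end + CLASS_BITS["Ib"]
    return bits[:ia_end], bits[ia_end:ib_end], bits[ib_end:]


def rate_bps(bits_per_frame, frame_ms=FRAME_MS):
    """Bit rate in bits per second for a given number of bits per frame."""
    return bits_per_frame * 1000 // frame_ms


net_bits = sum(CLASS_BITS.values())
protected_bits = CLASS_BITS["Ia"] + CRC_BITS + CLASS_BITS["Ib"] + TAIL_BITS
gross_bits = protected_bits * CODE_EXPANSION + CLASS_BITS["II"]

print(net_bits, rate_bps(net_bits))      # net bits per frame and net rate
print(gross_bits, rate_bps(gross_bits))  # coded bits per frame and gross rate
```

Under these assumptions the 260 net bits per frame correspond to the 13 kbps net rate, and unequal protection raises the frame to 456 coded bits.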
The USDC Codec
The US digital cellular system (IS-136) uses a vector sum excited linear predictive (VSELP) coder.
It operates at a data rate of 7950 bps and a total rate of 13 kbps after channel coding.
It is a variant of the CELP-type vocoders.
This coder was designed to accomplish three goals: highest speech quality, modest computational complexity, and robustness to channel errors.
The code books in the VSELP encoder are organized with a predefined structure such that a brute-force search is avoided.
This significantly reduces the time required for the optimum code word search.
Fig. 8.12 shows a block diagram of the VSELP encoder.
The 8 kbps VSELP codec uses three excitation sources.
The first is from the long-term (pitch) predictor state, or adaptive code book.
The second and third sources are from the two VSELP excitation code books.
Each of these VSELP code books contains the equivalent of 128 vectors.
These three excitation sequences are multiplied by their corresponding gain terms and summed to give the combined excitation sequence.
After each subframe, the combined excitation sequence is used to update the long-term filter state.
The synthesis filter is a direct-form 10th-order LPC all-pole filter. The LPC coefficients are coded once per 20 ms frame and updated in each 5 ms subframe.
The number of samples in a subframe is 40 at an 8 kHz sampling rate.
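The frame figures quoted for the VSELP coder follow from simple arithmetic, sketched below. The helper names are hypothetical; the frame and subframe durations, the sampling rate, and the bit rates are the values given in the text.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SUBFRAME_MS = 5


def samples_in(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in a span of duration_ms milliseconds."""
    return sample_rate_hz * duration_ms // 1000


def bits_in(duration_ms, bit_rate_bps):
    """Number of bits produced in duration_ms at the given bit rate."""
    return bit_rate_bps * duration_ms // 1000


samples_per_subframe = samples_in(SUBFRAME_MS)
subframes_per_frame = FRAME_MS // SUBFRAME_MS
speech_bits_per_frame = bits_in(FRAME_MS, 7950)    # source-coded bits
total_bits_per_frame = bits_in(FRAME_MS, 13000)    # after channel coding

print(samples_per_subframe, subframes_per_frame)
print(speech_bits_per_frame, total_bits_per_frame)
```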
The decoder is shown in Fig 8.13
Performance Evaluation of Speech Coders
There are two approaches to evaluating the performance of a speech coder in terms of its ability to preserve signal quality: objective measures and subjective listening tests.
Objective measures have the general nature of a signal-to-noise ratio (SNR) and provide a quantitative value of how well the reconstructed speech approximates the original speech.
Mean square error (MSE) distortion, frequency-weighted MSE, segmented SNR, and the articulation index are examples of objective measures.
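Two of the objective measures named above can be sketched in a few lines. This is an illustrative implementation under simple assumptions: the signals are time-aligned lists of equal length, the segment length of 160 samples (20 ms at 8 kHz) is a choice made for the example, and segments with zero signal or zero error energy are skipped.

```python
import math


def mse(reference, decoded):
    """Mean square error between the original and reconstructed signals."""
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def snr_db(reference, decoded):
    """Overall SNR in dB; infinite when the reconstruction is exact."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise == 0:
        return float("inf")
    return 10 * math.log10(signal / noise)


def segmental_snr_db(reference, decoded, seg_len=160):
    """Average of per-segment SNRs in dB over full segments of seg_len samples."""
    values = []
    for i in range(0, len(reference) - seg_len + 1, seg_len):
        ref = reference[i:i + seg_len]
        dec = decoded[i:i + seg_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal > 0 and noise > 0:      # skip silent or error-free segments
            values.append(10 * math.log10(signal / noise))
    return sum(values) / len(values) if values else float("nan")
```

Averaging the SNR per segment rather than over the whole utterance keeps loud passages from dominating the result, so errors in quiet passages still count.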
Speech coders are highly speaker dependent in that the quality varies with the age and gender of the speaker, the speed at which the speaker speaks, and other factors.
The diagnostic acceptability measure (DAM) is another test that evaluates the acceptability of speech coding systems.
The most popular ranking system is known as the mean opinion score, or MOS ranking.
One of the most difficult conditions for speech coders to perform well in is the case where a digital speech-coded signal is transmitted from the mobile to the base station and then demodulated into an analog signal, which is then speech coded again for retransmission as a digital signal over a landline or wireless link. This situation is called tandem signaling.
The MOS rating of a speech coder generally decreases with decreasing bit rate.
Table 8.3 gives the performance of some of the popular speech coders on the MOS scale.
Table 8.3 Performance of Coders