Ch8 - Choosing Speech Codecs for Mobile Communication

8/10/2019 Ch8 - Choosing Speech Codecs for Mobile Communication


CHAPTER 8

SPEECH CODING



Choosing Speech Codecs for Mobile Communication

An important step in the design of a digital mobile communication system is choosing the right speech codec.

Available bandwidth is limited, so speech must be compressed to maximize the number of users on the system.

The choice must also take into account the end-to-end encoding delay, the algorithmic complexity of the coder, the dc power requirements, compatibility with existing standards, and the robustness of the encoded speech to transmission errors.

The choice of speech coder also depends on the cell size used. When the cell size is small enough that high spectral efficiency is achieved through frequency reuse, a simple high-rate speech codec may be sufficient.



The GSM Codec

The original speech coder used in the pan-European digital cellular standard GSM goes by the rather grandiose name of regular pulse excited long-term prediction (RPE-LTP) codec. This codec has a net bit rate of 13 kbps.

It combines the advantages of the earlier French-proposed baseband RELP codec with those of the multipulse excited long-term prediction (MPE-LTP) codec proposed by Germany.

The advantage of the RELP codec is that it provides good-quality speech at low complexity.

The MPE-LTP technique produces excellent speech quality at high complexity and is not much affected by bit errors in the channel.

By modifying the RELP codec to incorporate features of the MPE-LTP codec, the net bit rate was reduced from 14.77 kbps to 13 kbps without loss of quality.

The most important modification was the addition of a long-term prediction loop.



The GSM codec is relatively complex.

Fig 8.10 shows a block diagram of the speech encoder

Encoder consists of four major processing blocks

The speech sequence is first pre-emphasized, ordered into segments of 20 ms duration, and then Hamming windowed.

This is followed by short-term prediction (STP) filtering analysis, in which the logarithmic area ratios (LARs) of the eight reflection coefficients r_n(k) are computed.

The eight LARs have different dynamic ranges and probability distribution functions, so they are not all encoded with the same number of bits for transmission.
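
The pre-processing and LAR steps above can be sketched as follows. This is a minimal illustration, not the codec's reference implementation: the 8 kHz sampling rate, the pre-emphasis coefficient `alpha`, and all function names are assumptions made for the example, and the reflection coefficients are taken as given rather than derived from the windowed frame. The LAR mapping used is the common form log10((1 + r) / (1 - r)).

```python
import math

FRAME_LEN = 160  # 20 ms of speech, assuming an 8 kHz sampling rate


def pre_emphasize(x, alpha=0.9):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def split_frames(x, frame_len=FRAME_LEN):
    """Cut the signal into consecutive, non-overlapping frames; a short tail is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def hamming_window(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]


def log_area_ratios(reflection):
    """Map reflection coefficients r(k), with |r(k)| < 1, to log area ratios."""
    return [math.log10((1 + r) / (1 - r)) for r in reflection]
```

Coefficients close to +1 or -1 map to large LAR magnitudes, which is one way to see why the eight LARs end up with different dynamic ranges.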



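The LAR parameters are also decoded by the LPC inverse filter so as to minimize the error e_n.

LTP analysis, which involves finding the pitch period p_n and gain factor g_n, is then carried out such that the LTP residual r_n is minimized.

Pitch extraction is done by determining the value of delay D that maximizes the cross-correlation between the current STP error sample e_n and a previous error sample e_{n-D}.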

The extracted pitch p_n and gain g_n are encoded and transmitted at a rate of 3.6 kbps.
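
A pitch search of this kind can be sketched as below. It assumes the delay is chosen to maximize the cross-correlation between the current block of the STP error signal and the same signal delayed by D, and that the gain is the matching least-squares factor. The 40-sample block length and the 40 to 120 sample lag range are assumptions for the example, and the function names are hypothetical.

```python
def ltp_search(e, start, length=40, d_min=40, d_max=120):
    """Return (delay, gain) for the block e[start:start+length].

    The delay maximizes sum(e[n] * e[n - D]) over the block; the gain is
    that correlation divided by the energy of the delayed block.
    """
    if start < d_max or start + length > len(e):
        raise ValueError("block needs d_max samples of history and must fit in e")
    best_d, best_corr = d_min, float("-inf")
    for d in range(d_min, d_max + 1):
        corr = sum(e[start + n] * e[start + n - d] for n in range(length))
        if corr > best_corr:  # strict comparison keeps the smallest delay on ties
            best_d, best_corr = d, corr
    energy = sum(e[start + n - best_d] ** 2 for n in range(length))
    gain = best_corr / energy if energy > 0 else 0.0
    return best_d, gain


def ltp_residual(e, start, length, delay, gain):
    """Residual r[n] = e[n] - gain * e[n - delay] over the block."""
    return [e[start + n] - gain * e[start + n - delay] for n in range(length)]
```

For a perfectly periodic error signal the search returns the period as the delay and a gain of 1, and the residual is zero.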

The LTP residual, r_n, is weighted and decomposed into three sequences.
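
One way to picture the decomposition is sketched below. The text does not say how the three sequences are formed or how one is chosen, so this example makes two assumptions: the sequences are the interleaved sub-sequences obtained by taking every third sample at offsets 0, 1 and 2, and the sequence with the highest energy is kept and normalized to its peak amplitude. The weighting filter is omitted, and the function names are hypothetical.

```python
def decompose(residual, n_seq=3):
    """Split the residual into n_seq interleaved candidate sequences."""
    return [residual[k::n_seq] for k in range(n_seq)]


def select_excitation(residual, n_seq=3):
    """Pick the highest-energy candidate and normalize it to its peak.

    Returns (candidate index, peak amplitude, normalized pulses).
    """
    candidates = decompose(residual, n_seq)
    energies = [sum(s * s for s in c) for c in candidates]
    best = max(range(n_seq), key=lambda k: energies[k])
    peak = max((abs(s) for s in candidates[best]), default=0.0)
    if peak > 0:
        pulses = [s / peak for s in candidates[best]]
    else:
        pulses = list(candidates[best])
    return best, peak, pulses
```

Only the index of the chosen sequence, the peak amplitude, and the normalized pulses would then need to be quantized.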



Fig 8.11 shows a block diagram of the GSM speech decoder.

The decoder consists of four blocks that perform operations complementary to those of the encoder.

The received excitation parameters are RPE decoded and

passed to the LTP synthesis filter which uses the pitch and

gain parameter to synthesize the long-term signal

Short-term synthesis is carried out using the received

reflection coefficients to recreate the original speech signal
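
The two synthesis stages can be sketched as follows, assuming the RPE-decoded excitation is already available as a list of samples. The long-term stage adds back a scaled, delayed copy of its own output, and the short-term stage is an all-pole lattice filter driven by the reflection coefficients. The sign convention of the lattice and the function names are assumptions for this example.

```python
def ltp_synthesis(excitation, lag, gain, history):
    """Long-term synthesis: y[n] = x[n] + gain * y[n - lag].

    `history` holds past output samples and must contain at least `lag` values.
    """
    buf = list(history)
    out = []
    for x in excitation:
        y = x + gain * buf[-lag]
        buf.append(y)
        out.append(y)
    return out


def lattice_synthesis(signal, reflection):
    """Short-term synthesis with an all-pole lattice filter.

    Per sample, for stages i = p down to 1:
        f[i-1] = f[i] - k[i] * b[i-1] (previous sample)
        b[i]   = b[i-1] (previous sample) + k[i] * f[i-1]
    """
    p = len(reflection)
    b = [0.0] * (p + 1)  # backward signals from the previous sample
    out = []
    for x in signal:
        f = x
        new_b = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f - reflection[i - 1] * b[i - 1]
            new_b[i] = b[i - 1] + reflection[i - 1] * f
        new_b[0] = f
        b = new_b
        out.append(f)
    return out
```

With a single reflection coefficient k the lattice reduces to y[n] = x[n] - k * y[n-1], which gives a quick sanity check on the recursion.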

Every 260 bits of the coder output (i.e., 20ms blocks of

speech) are ordered, depending on their importance, into

groups of 50, 132, and 78 bits each
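
The numbers above can be checked with a few lines of arithmetic: 260 bits every 20 ms is the 13 kbps net rate, and the three groups account for every bit in the frame. The sketch below assumes the bits have already been ordered by importance; the function names are hypothetical.

```python
GROUP_SIZES = (50, 132, 78)  # most important group first
FRAME_MS = 20


def net_bit_rate_bps(bits_per_frame=sum(GROUP_SIZES), frame_ms=FRAME_MS):
    """Net coder rate in bits per second."""
    return bits_per_frame * 1000 // frame_ms


def split_by_importance(ordered_bits, sizes=GROUP_SIZES):
    """Split an importance-ordered frame into consecutive groups."""
    if len(ordered_bits) != sum(sizes):
        raise ValueError("expected %d bits" % sum(sizes))
    groups, start = [], 0
    for size in sizes:
        groups.append(ordered_bits[start:start + size])
        start += size
    return groups
```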



The bits in the first group are the most important and are called type Ia bits; the next 132 bits are type Ib bits, and the last 78 bits are type II bits.

Since type Ia bits are the ones that affect speech quality the most, they have error-detection CRC bits added.

Both Ia and Ib bits are convolutionally encoded for forward error correction.

The least significant type II bits have no error correction or detection.
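
The unequal protection described above can be sketched as below. Several details are assumptions not stated in the text and are taken from the GSM full-rate channel coding as commonly described: 3 CRC bits on the Ia bits with generator D^3 + D + 1, 4 tail bits, and a rate-1/2 convolutional code with generators 1 + D^3 + D^4 and 1 + D + D^3 + D^4. The bit ordering is simplified and no interleaving is shown, so treat this as an illustration of the bit budget rather than a conforming encoder.

```python
def crc3(bits, poly=(1, 0, 1, 1)):
    """Remainder of bits * D^3 divided by D^3 + D + 1 (assumed generator)."""
    reg = list(bits) + [0, 0, 0]
    for i in range(len(bits)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-3:]


def conv_encode(bits, g0=(1, 0, 0, 1, 1), g1=(1, 1, 0, 1, 1)):
    """Rate-1/2 convolutional encoder; taps are listed for delays 0 to 4."""
    state = [0, 0, 0, 0]  # previous four inputs, most recent first
    out = []
    for u in bits:
        window = [u] + state
        out.append(sum(a & b for a, b in zip(g0, window)) % 2)
        out.append(sum(a & b for a, b in zip(g1, window)) % 2)
        state = window[:4]
    return out


def protect_frame(frame):
    """Apply CRC and convolutional coding to class I bits; pass class II through."""
    if len(frame) != 260:
        raise ValueError("expected a 260-bit frame")
    ia, ib, ii = frame[:50], frame[50:182], frame[182:]
    class1 = ia + crc3(ia) + ib + [0, 0, 0, 0]  # 4 tail bits flush the encoder
    return conv_encode(class1) + ii
```

Under these assumptions each 20 ms frame grows from 260 to 2 * (50 + 3 + 132 + 4) + 78 = 456 bits, a gross rate of 22.8 kbps.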



The USDC Codec

The US digital cellular system (IS-136) uses a vector sum excited linear predictive (VSELP) coder.

It operates at a data rate of 7950 bps and a total rate of 13 kbps after channel coding.

It is a variant of the CELP type vocoders.

This coder was designed to accomplish three goals: the highest speech quality, modest computational complexity, and robustness to channel errors.

The code books in the VSELP encoder are organized with a

predefined structure such that a brute force search is avoided.

This significantly reduces the time required for the optimum

code word search
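
The idea of a structured code book can be sketched as follows. The example assumes the vector-sum construction, in which each code vector is a sum of a small set of basis vectors with signs +1 or -1 given by the bits of the code word, so M basis vectors define 2^M code vectors (M = 7 would give 128). The selection rule shown, maximizing plain correlation with a target, is a deliberate simplification of a real search criterion, and all names are hypothetical.

```python
def codevector(index, basis):
    """Code vector = sum of basis vectors, sign +1 where the index bit is 1, else -1."""
    signs = [1 if (index >> m) & 1 else -1 for m in range(len(basis))]
    return [sum(s * v[n] for s, v in zip(signs, basis))
            for n in range(len(basis[0]))]


def best_index(target, basis):
    """Structured search: one inner product per basis vector.

    The correlation with code vector i is sum(sign_im * c_m), so it is
    maximized by setting each sign to the sign of c_m.
    """
    index = 0
    for m, v in enumerate(basis):
        c_m = sum(t * x for t, x in zip(target, v))
        if c_m > 0:
            index |= 1 << m
    return index


def brute_force_index(target, basis):
    """Reference search over all 2^M code vectors, for comparison only."""
    def corr(i):
        return sum(t * x for t, x in zip(target, codevector(i, basis)))
    return max(range(2 ** len(basis)), key=corr)
```

The structured search needs M inner products where the brute-force search needs 2^M, which is the kind of saving the predefined structure is meant to provide.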



Fig 8.12 shows a block diagram of VSELP encoder

The 8 kbps VSELP codec utilizes three excitation sources:

One from the long-term (pitch) predictor state, or adaptive code book

The second and third from the two VSELP excitation code books

Each of these VSELP code books contains the equivalent of 128 vectors.

These three excitation sequences are multiplied by their

corresponding gain terms and summed to give the combined

excitation sequence.

After each sub frame the combined excitation sequence is

used to update the long term filter state.
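
The combination and state update can be sketched as below. The example assumes the adaptive code book vector is read from the long-term filter state at a given lag and repeated periodically when the lag is shorter than the subframe, which is one common approach and not necessarily what the standard specifies. The function names and the gain ordering are hypothetical.

```python
def adaptive_vector(state, lag, length):
    """Read `length` samples starting `lag` samples back in the state.

    If lag < length, the last `lag` samples are repeated periodically.
    """
    if lag < 1 or lag > len(state):
        raise ValueError("lag must be between 1 and len(state)")
    base = len(state) - lag
    return [state[base + (n % lag)] for n in range(length)]


def combined_excitation(adaptive, code1, code2, gains):
    """Gain-weighted sum of the three excitation sequences."""
    beta, g1, g2 = gains
    return [beta * a + g1 * c1 + g2 * c2
            for a, c1, c2 in zip(adaptive, code1, code2)]


def update_state(state, excitation):
    """Shift the new excitation into the long-term filter state."""
    return (list(state) + list(excitation))[-len(state):]
```

Calling `update_state` once per subframe keeps the adaptive code book in step with the excitation the decoder will reconstruct.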



The synthesis filter is a direct-form 10th-order LPC all-pole filter. The LPC coefficients are coded once per 20 ms frame and updated in each 5 ms subframe. The number of samples in a subframe is 40 at an 8 kHz sampling rate.

The decoder is shown in Fig 8.13
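
The timing and rate figures quoted for the USDC codec can be cross-checked with simple arithmetic, using only the 20 ms frame, 5 ms subframe, 8 kHz sampling rate, and the 7950 bps and 13 kbps rates given in this chapter. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SUBFRAME_MS = 5
SOURCE_RATE_BPS = 7950   # speech coder output
GROSS_RATE_BPS = 13000   # after channel coding


def samples(duration_ms, rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in a block of the given duration."""
    return rate_hz * duration_ms // 1000


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of bits produced per frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


samples_per_frame = samples(FRAME_MS)                    # 160
samples_per_subframe = samples(SUBFRAME_MS)              # 40
subframes_per_frame = FRAME_MS // SUBFRAME_MS            # 4
speech_bits_per_frame = bits_per_frame(SOURCE_RATE_BPS)  # 159
coded_bits_per_frame = bits_per_frame(GROSS_RATE_BPS)    # 260
```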



Performance Evaluation of Speech Coders

There are two approaches to evaluating the performance of a speech coder in terms of its ability to preserve signal quality: objective measures and subjective measures.

Objective measures have the general nature of a signal-to-noise ratio (SNR) and provide a quantitative value of how well the reconstructed speech approximates the original speech.

Mean square error (MSE) distortion, frequency-weighted MSE, segmented SNR, and the articulation index are examples of objective measures.
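
Three of these objective measures can be sketched as follows. The 160-sample segment length is an assumption (20 ms at 8 kHz), and segments with zero signal or zero noise energy are skipped in the segmented SNR, which is one of several conventions in use. The function names are hypothetical.

```python
import math


def mse(original, decoded):
    """Mean square error between original and reconstructed samples."""
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def snr_db(original, decoded):
    """Overall signal-to-noise ratio in dB."""
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, decoded))
    if noise == 0:
        return float("inf")
    if signal == 0:
        return float("-inf")
    return 10 * math.log10(signal / noise)


def segmental_snr_db(original, decoded, seg_len=160):
    """Average of per-segment SNRs in dB over full-length segments."""
    values = []
    for i in range(0, len(original) - seg_len + 1, seg_len):
        o, d = original[i:i + seg_len], decoded[i:i + seg_len]
        signal = sum(a * a for a in o)
        noise = sum((a - b) ** 2 for a, b in zip(o, d))
        if signal > 0 and noise > 0:
            values.append(10 * math.log10(signal / noise))
    return sum(values) / len(values) if values else float("nan")
```

Averaging in dB per segment gives quiet passages the same weight as loud ones, which an overall SNR does not.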



Speech coders are highly speaker dependent in that the quality varies with the age and gender of the speaker, the speed at which the speaker speaks, and other factors.

The diagnostic acceptability measure (DAM) is another test

that evaluates acceptability of speech coding systems.

The most popular ranking system is known as the mean opinion score, or MOS ranking.
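
A MOS is simply the average of listener ratings. The sketch below assumes the conventional five-point scale (1 = bad, 5 = excellent), which is not spelled out in the text above; the function name is hypothetical.

```python
from statistics import mean

VALID_RATINGS = (1, 2, 3, 4, 5)  # assumed scale: 1 = bad ... 5 = excellent


def mean_opinion_score(ratings):
    """Average the listeners' ratings for one coder under one test condition."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)
```

For example, ratings of 5, 4, 4, 3 and 4 from five listeners give a MOS of 4.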

One of the most difficult conditions for speech coders to perform well in is the case where a digital speech-coded signal is transmitted from the mobile to the base station and then demodulated into an analog signal, which is then speech coded for retransmission as a digital signal over a landline or wireless link. This situation is called tandem signaling.



The MOS rating of a speech codec generally decreases with decreasing bit rate.

Table 8.3 gives the performance of some of the popular

speech coders on the MOS scale

Table 8.3 Performance of Coders
