Multi-Channel Implementation of G.729 A/B on FPGA

Habibullah Jamal
Vice Chancellor
University of Engineering and Technology, Taxila, Pakistan
Email: [email protected]

Ali Adnan Malik
Department of Computer Engineering
University of Engineering and Technology, Taxila, Pakistan
Email: [email protected]

Abstract—Traditionally, G.729 Annexes A and B, like other speech codecs, have been implemented on programmable Digital Signal Processors (DSPs). This paper presents an implementation of G.729 A/B on a Field Programmable Gate Array (FPGA). The implementation improves performance in terms of computational delay and memory compared to DSP implementations. The reduction in computational delay allows additional channels within the 10 ms time frame of G.729. The architecture for G.729 A/B is highly optimized and is used on a time-shared basis, so that the same hardware encodes and decodes the speech signal frames of multiple channels. It is implemented on Xilinx's Virtex-II Pro FPGA.

I. INTRODUCTION

Sophisticated speech compression algorithms are used today to compress speech signals for efficient storage and transmission. They are used in applications requiring high-quality speech compression such as Voice over IP (VoIP), videophones, digital satellite systems, digital mobile radios, Integrated Services Digital Network, land digital mobile radio, digital simultaneous voice and data, and other applications. The requirements for a good speech coder are low bit rate, low coding delay, low algorithmic complexity, and high speech quality. G.729 is a low bit rate, toll-quality codec using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) [1]. The quality of this 8 kbit/s codec is equivalent to that of a 32 kbit/s ADPCM codec under most operating conditions. G.729 is very computationally intensive. Its Annex A is a reduced-complexity version that marginally degrades speech quality [2]. Its Annex B specifies voice activity detection (VAD), discontinuous transmission (DTX), and comfort noise generation algorithms, which reduce the transmission rate during silence periods of speech [3]. Annexes A and B are the most widely used in industry.

Traditionally, G.729 A/B has been implemented on programmable Digital Signal Processors (DSPs) [4, 5, 6]. However, Field Programmable Gate Arrays (FPGAs) can also be used to implement it for even greater performance, greatly reducing the computational delay. The reduction in computational delay allows additional channels within the 10 ms time frame of G.729.

II. G.729 A/B ENCODING & DECODING

The general encoding/decoding principle of G.729A is the same as that of G.729 [2]. The coder is based on the code-excited linear prediction (CELP) coding model. In CELP, locally decoded speech is compared against the original speech samples, and the coder parameters are selected such that the mean squared weighted error between the original and reconstructed speech is minimized.
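As a rough illustration of this selection criterion, consider the following C sketch. Assuming the target and each candidate's locally synthesized signal have already been passed through the perceptual weighting filter (not shown), codebook selection reduces to picking the candidate with the smallest squared error; the function name and types here are ours, not the reference code's.

    /* squared error between weighted target and weighted synthesis */
    long long sq_error(const short *target_w, const short *synth_w, int len)
    {
        long long err = 0;
        for (int n = 0; n < len; n++) {
            long long d = (long long)target_w[n] - synth_w[n];
            err += d * d;
        }
        return err; /* the candidate minimizing err is selected */
    }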

The coder operates on a speech frame of 10 ms, using a 5 ms look-ahead for linear prediction (LP) analysis. The input and output are represented as 16-bit linear PCM. The encoding principle is shown in Figure 1. After the input samples are processed by a 140 Hz high-pass filter, a 10th-order LP analysis is performed, and the LP parameters are quantized in the line spectral pair (LSP) domain with 18 bits. The input frame is divided into two sub-frames of 5 ms each. The use of sub-frames allows better tracking of pitch and gain parameters and reduces the complexity of the codebook searches. The quantized and unquantized filter coefficients are used for the second sub-frame, while interpolated LP filter coefficients are used in the first sub-frame. For each sub-frame, the excitation is represented by an adaptive codebook and a fixed codebook, and the adaptive- and fixed-codebook parameters are transmitted every sub-frame.
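At the 8 kHz sampling rate of G.729, these durations translate into the following sample counts (a minimal sketch; the constant names follow the conventions of the reference C code):

    #define L_FRAME  80   /* 10 ms frame     = 80 samples */
    #define L_SUBFR  40   /* 5 ms sub-frame  = 40 samples */
    #define L_NEXT   40   /* 5 ms look-ahead for LP analysis */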

The adaptive-codebook index is encoded with 8 bits in the first sub-frame and differentially encoded with 5 bits in the second sub-frame. To improve the robustness of the codec to channel errors, a parity bit is added over the six most significant bits of the adaptive-codebook index of the first sub-frame. The target signal is then updated by removing the adaptive-codebook contribution, and this new target is used in the fixed-codebook search. The fixed codebook is a 17-bit algebraic codebook. The gains of the adaptive and fixed codebooks are vector quantized with 7 bits using a conjugate-structure codebook, with moving-average (MA) prediction applied to the fixed-codebook gain.
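The parity protection mentioned above can be sketched as follows (the function name is ours; whether the transmitted bit makes the overall parity even or odd is fixed by the standard, while plain XOR parity is shown here):

    /* parity over the six MSBs of the 8-bit pitch index, 1st sub-frame */
    static int pitch_parity(unsigned index8)
    {
        unsigned msb6 = (index8 >> 2) & 0x3Fu; /* six most significant bits */
        int p = 0;
        while (msb6) { p ^= (int)(msb6 & 1u); msb6 >>= 1; }
        return p; /* receiver recomputes and compares to detect errors */
    }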


Fig. 1 Block Diagram of Encoder [1]

The decoder synthesizes the output speech samples from the received bitstream. First, the parameter indices are extracted from the received bitstream. These indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive- and fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each sub-frame. Then, for each 5 ms sub-frame, the excitation is constructed by adding the adaptive- and fixed-codebook vectors scaled by their respective gains, the speech is reconstructed by filtering the excitation through the LP synthesis filter, and the reconstructed speech signal is passed through an adaptive postfilter to enhance the speech quality.
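A minimal sketch of this per-sub-frame reconstruction is shown below. Floating point is used for readability (the reference code is 16-bit fixed point), the postfilter is omitted, and all names are illustrative:

    #define L_SUBFR 40
    #define M 10  /* LP order */

    /* excitation = gp*adaptive + gc*fixed, then synthesis filter 1/A(z) */
    void synth_subframe(const float *adaptive, const float *fixed_cb,
                        float gp, float gc,
                        const float a[M + 1],  /* LP coeffs, a[0] = 1.0 */
                        float mem[M],          /* filter state s[n-1..n-M] */
                        float out[L_SUBFR])
    {
        for (int n = 0; n < L_SUBFR; n++) {
            float s = gp * adaptive[n] + gc * fixed_cb[n]; /* excitation */
            for (int i = 0; i < M; i++)
                s -= a[i + 1] * mem[i];  /* s[n] = exc[n] - sum a_i*s[n-i] */
            for (int i = M - 1; i > 0; i--)  /* shift the filter memory */
                mem[i] = mem[i - 1];
            mem[0] = s;
            out[n] = s;
        }
    }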

Annex B specifies the voice activity detection (VAD), discontinuous transmission (DTX), and comfort noise generation (CNG) algorithms. The VAD algorithm makes a voice activity decision every 10 ms, in accordance with the frame size of the G.729A speech coder; Figure 2 shows a speech communication system with voice activity detection. A set of parameters is extracted and used for an initial decision: the full-band energy, the low-band energy, the zero-crossing rate, and a spectral measure. Long-term averages of these parameters during non-active voice segments follow the changing nature of the background noise. A set of differential parameters is then obtained at each frame as the difference between each parameter and its respective long-term average. The initial voice activity decision is obtained using a piecewise linear decision boundary between each pair of differential parameters, and a final voice activity decision is obtained by smoothing the initial decision.
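The shape of this decision logic can be sketched as follows. This is a simplified illustration, not the G.729B VAD: the thresholds and boundary coefficients are invented for the example, the smoothing stage is omitted, and only the four parameters come from the text above.

    typedef struct {
        float full_band_energy;
        float low_band_energy;
        float zero_crossing_rate;
        float spectral_measure;
    } VadParams;

    /* initial decision from differential parameters (illustrative only) */
    int vad_initial_decision(const VadParams *cur, const VadParams *avg)
    {
        /* differential parameters: current frame minus long-term average */
        float dE  = cur->full_band_energy   - avg->full_band_energy;
        float dEl = cur->low_band_energy    - avg->low_band_energy;
        float dZC = cur->zero_crossing_rate - avg->zero_crossing_rate;
        float dS  = cur->spectral_measure   - avg->spectral_measure;

        /* piecewise linear boundaries over parameter pairs; the real
         * algorithm tests a fixed set of such boundaries with
         * standardized coefficients (these are made up) */
        if (dS > 0.5f - 0.8f * dE)    return 1;  /* active voice */
        if (dE > 3.0f || dEl > 3.0f)  return 1;
        if (dZC < -0.1f && dE > 1.0f) return 1;
        return 0;                                /* non-active voice */
    }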

The output of the VAD module is either 1 or 0, indicating the presence or absence of voice activity, respectively. If the VAD output is 1, the G.729A speech codec is invoked to code/decode the active voice frames. However, if the VAD output is 0, the DTX/CNG algorithms are used to code/decode the non-active voice frames.

Fig. 2 Speech communication system with Voice Activity Detection [3]

Traditional speech coders and decoders use comfort noise to simulate the background noise in the non-active voice frames. If the background noise is not stationary, mere comfort noise insertion does not reproduce the naturalness of the original background noise. It is therefore desirable to intermittently send some information about the background noise in order to obtain better quality when non-active voice frames are detected. Coding efficiency for the non-active voice frames is achieved by coding the energy of the frame and its spectrum with as few as fifteen bits. These bits are not automatically transmitted whenever a non-active voice frame is detected; rather, they are transmitted only when an appreciable change has been detected with respect to the last transmitted non-active voice frame.

At the decoder side, the received bit stream is decoded. If the VAD output is 1, the G.729A decoder is invoked to synthesize the reconstructed active voice frames. If the VAD output is 0, the CNG module is called to reproduce the non-active voice frames.
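The resulting per-frame dispatch on the encoder side can be summarized in a few lines of C (a sketch under assumed names; these are not the entry points of the reference code):

    extern int  vad_decision(const short *speech);  /* 1 = active voice */
    extern void g729a_encode(const short *speech, unsigned char *bits);
    extern void dtx_encode(const short *speech, unsigned char *bits);

    /* one 10 ms frame on the encoder side */
    void encode_frame(const short *speech, unsigned char *bits)
    {
        if (vad_decision(speech))
            g729a_encode(speech, bits);  /* full CS-ACELP frame */
        else
            dtx_encode(speech, bits);    /* SID update, or nothing */
    }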

III. ARCHITECTURE

As the standard gives precedence to its 16-bit fixed-point ANSI C code over the mathematical description of the algorithm, we referred to the ANSI C code for the implementation. For an efficient implementation of the algorithm on Xilinx's Virtex-II Pro FPGA, a highly optimized time-shared architecture was developed. This architecture is used to encode and decode multiple channels of speech, and it uses resources more efficiently than an implementation based on a fully dedicated architecture [7].

The block diagram of the architecture is shown in Figure 3. Due to the large memory requirement, the dual-port block memory of the FPGA is used.

Fig. 3 Block Diagram of Codec Architecture

To obtain multiple operands in one cycle, the memory is divided into 4 banks with a 16-bit word size, providing eight 16-bit operands per cycle. The Functional Unit performs the operations defined by the functions in BASIC_OP.C and OPER_32B.C of the ANSI C code. These operations are highly optimized because the performance of the coder and decoder depends on them. Multiple instances of each operation unit are used in the Functional Unit to operate efficiently on the 8 operands obtained from memory. These units are connected in different configurations to form fully pipelined paths that help to optimize the loops of the ANSI C code. The input/output buffers of the encoder and decoder are maintained in a separate memory block, so the delay of buffering channels in and out does not add to the main computational activity of encoding or decoding.
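For reference, two of the 16-bit basic operators that the Functional Unit implements in hardware are modeled below in behavioral C (our own model of the saturating semantics of BASIC_OP.C, not the reference code itself):

    #include <stdint.h>

    typedef int16_t Word16;
    typedef int32_t Word32;

    /* add(): 16-bit addition with saturation */
    static Word16 add16(Word16 a, Word16 b)
    {
        int32_t s = (int32_t)a + b;
        if (s > INT16_MAX) return INT16_MAX;
        if (s < INT16_MIN) return INT16_MIN;
        return (Word16)s;
    }

    /* L_mac(): acc + 2*a*b (Q15 multiply-accumulate) with saturation */
    static Word32 L_mac(Word32 acc, Word16 a, Word16 b)
    {
        int64_t r = (int64_t)acc + 2 * (int64_t)a * b;
        if (r > INT32_MAX) return INT32_MAX;
        if (r < INT32_MIN) return INT32_MIN;
        return (Word32)r;
    }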

The upper portion of the RAM is initialized with the lookup tables and constants for both the encoder and the decoder. The scratch memory space provides the temporary memory required for either encoding or decoding. Due to the adaptive nature of the algorithm, a large amount of static memory is used in the C code; therefore, memory is allocated and initialized in RAM for each encoding/decoding channel. A separate space for the channels' static memory pointers is allocated in the lookup tables and constants portion of the RAM; these pointers point to the channel static memories in RAM, and the number of channels is also maintained in that space. The memory for the input/output buffers is organized to support multiple channels, with a double-buffering scheme used for both input and output buffers. The memory requirements for the lookup tables, scratch RAM, and channel static RAM are shown in Table I: each additional channel requires 2448 bytes of static memory for encoding and 2368 bytes for decoding.
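This layout can be modeled as follows (the offsets and structure are our illustration; only the section sizes come from Table I):

    #define LOOKUP_BYTES  7808   /* lookup tables & constants     */
    #define SCRATCH_BYTES 4992   /* shared scratch memory         */
    #define ENC_STATIC    2448   /* static memory per encoder ch. */
    #define DEC_STATIC    2368   /* static memory per decoder ch. */

    struct codec_ram_map {
        unsigned lookup_base;   /* tables, constants, channel pointers */
        unsigned scratch_base;  /* temporary space for enc/dec        */
        unsigned chan_base;     /* first per-channel static block     */
        unsigned num_channels;
    };

    /* base address of channel i's static memory (encoder + decoder) */
    static unsigned chan_static_base(const struct codec_ram_map *m,
                                     unsigned i)
    {
        return m->chan_base + i * (ENC_STATIC + DEC_STATIC);
    }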

As the codec operates on 10 ms of speech, the same architecture is used for both encoding and decoding on a time-shared basis. The G.729 A/B Controller performs encoding or decoding for one channel at a time, depending on which channel is scheduled by the Scheduler. The Input/output Buffers Controller is responsible for buffering each channel in and out. The Scheduler schedules the channels' I/O buffers for encoding or decoding on a round-robin basis, giving equal priority to each channel. The architecture is optimized such that more than one channel can be accommodated within the 10 ms time frame of G.729; thus, channels can be added to the design simply by adding their corresponding static memory in RAM and their I/O buffers.
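A behavioral model of this round-robin scheduling takes only a few lines (the names are ours, not signals of the actual Scheduler block):

    struct scheduler {
        unsigned num_channels;
        unsigned next;  /* channel to serve in the next slot */
    };

    /* return the channel for this slot and advance the round-robin pointer */
    static unsigned schedule_next(struct scheduler *s)
    {
        unsigned ch = s->next;
        s->next = (s->next + 1) % s->num_channels;
        return ch;      /* the slot runs one encode or decode pass */
    }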

TABLE I
MEMORY REQUIREMENT FOR CODER

Memory for                      Memory Required (Bytes)   Memory in Each Bank (Bytes)
Lookup tables & constants       7808                      3904
Scratch RAM                     4992                      1248
Encoder static RAM (each ch.)   2448                      612
Decoder static RAM (each ch.)   2368                      592

IV. SYNTHESIS RESULTS

The design is coded in Verilog HDL and synthesized for Xilinx's Virtex-II Pro FPGA. It was simulated in ModelSim, and the encoding and decoding results were fully verified against the ANSI C code of G.729 A/B. The synthesis results for resource utilization are shown in Table II. To achieve efficiency, the built-in multipliers of the Virtex-II Pro are used. The design is able to run at 161 MHz.

TABLE II
RESOURCE UTILIZATION USING VIRTEX-II PRO

No. of Slices                   3411
No. of Slice Flip-Flops         2571
No. of 4-Input LUTs             9969
No. of BRAMs (for 1 channel)    2 (1K blocks)
No. of MULT18X18s               4

The number of cycles and the time required to encode and decode using the TSTSEQ1 sequence of the TESTVCI G.729 A/B test vectors, running at 161 MHz, are shown in Table III.


TABLE III
CYCLES REQUIRED USING TSTSEQ1 OF G.729 A/B

Operation   Cycles Required   Time Required
Encoding    40,112            0.249 ms
Decoding    38,634            0.239 ms

The total delay of encoding plus decoding is 0.249 ms + 0.239 ms = 0.488 ms, which means that about 20 channels of encoding and 20 channels of decoding (10 ms / 0.488 ms ≈ 20.5) can be accommodated within the 10 ms time frame. As the architecture is fully optimized to support multiple channels, the number of channels in 10 ms can be chosen as required.
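The estimate follows from a one-line computation over the Table III figures:

    #include <stdio.h>

    int main(void)
    {
        double t_enc = 0.249, t_dec = 0.239, frame = 10.0;  /* ms */
        int channels = (int)(frame / (t_enc + t_dec));      /* = 20 */
        printf("full-duplex channels per 10 ms frame: %d\n", channels);
        return 0;
    }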

V. CONCLUSION

The FPGA implementation of G.729 A/B has improved performance in terms of computational delay and memory compared to its implementation on DSPs. The reduction in computational delay allows additional channels within the 10 ms time frame of G.729. As the time-shared architecture is fully optimized to support multiple channels, the number of channels in 10 ms can be chosen as required. This implementation uses resources more efficiently than an implementation based on a fully dedicated architecture.

REFERENCES

[1] ITU-T Recommendation G.729, "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)".
[2] ITU-T Recommendation G.729 Annex A, "Reduced complexity 8 kbit/s CS-ACELP speech codec".
[3] ITU-T Recommendation G.729 Annex B, "A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70".
[4] M. Arora, N. Lahane, and A. Prakash, "All assembly implementation of G.729 Annex B speech codec on a fixed point DSP", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002, Conference Proceedings.
[5] M. Banerjee, B. A. Vani, and G. Radha Krishna, "Optimal real time DSP implementation of ITU G.729 speech codec", IEEE 60th Vehicular Technology Conference, 2004.
[6] J. Kim, H. Kim, S. Choi, and Y. You, "The implementation of G.729 speech coder on a 16-bit DSP chip for the CDMA IMT-2000 system", IEEE Transactions on Consumer Electronics, Vol. 45, No. 2, May 1999, pp. 443-448.
[7] N. Mobini, B. Vahdat, and M. H. Radfar, "An FPGA based implementation of G.729", IEEE International Symposium on Circuits and Systems (ISCAS), 23-26 May 2005, Vol. 4, pp. 3571-3574.

ACKNOWLEDGEMENTS

Simulations and syntheses for this work were conducted in the ASIC Design and DSP Laboratory, University of Engineering and Technology Taxila.
