Abstract—Traditionally, G.729 Annexes A and B and other speech codecs are implemented on programmable Digital Signal Processors (DSPs). This paper presents an implementation of G.729 A/B on a Field Programmable Gate Array (FPGA). This implementation improves performance in terms of computational delay and memory compared to DSP implementations. The reduction in computational delay allows additional channels within the 10 ms time frame of G.729. The architecture for G.729 A/B is highly optimized and operates on a time-shared basis, so that the same hardware encodes and decodes speech frames. It is implemented on Xilinx's Virtex-II Pro FPGA.
I. INTRODUCTION
Sophisticated speech compression algorithms are used today to compress speech signals for efficient storage and transmission. They are used in applications requiring high-quality speech compression, such as Voice over IP (VoIP), videophones, digital satellite systems, digital mobile radios, Integrated Services Digital Network, land digital mobile radio, Digital Simultaneous Voice and Data, and others. The requirements for a good speech coder are a low bit rate, low coding delay, low algorithmic complexity, and high speech quality. G.729 is a low bit rate, toll-quality codec using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) [1]. The quality of this 8 kbit/s codec is equivalent to that of a 32 kbit/s ADPCM codec under most operating conditions. G.729 is very computationally intensive. Its Annex A is a reduced-complexity version that marginally degrades speech quality [2]. Its Annex B specifies voice activity detection (VAD), discontinuous transmission (DTX), and comfort noise generation algorithms, which reduce the transmission rate during silence periods of speech [3]. Annexes A and B are the most widely used in industry.
Traditionally, G.729 A/B has been implemented on programmable Digital Signal Processors (DSPs) [4, 5, 6]. However, Field Programmable Gate Arrays (FPGAs) can also be used to achieve even greater performance, greatly reducing the computational delay. This reduction allows additional channels within the 10 ms time frame of G.729.
II. G.729 A/B ENCODING & DECODING
The general encoding/decoding principle of G.729A is similar to that of G.729 [2]. The coder is based on the code-excited linear prediction (CELP) coding model. In CELP, locally decoded speech is compared against the original speech samples, and the coder parameters are selected such that the mean squared weighted error between the original and reconstructed speech is minimized.
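The analysis-by-synthesis selection described above can be sketched in C. This is an illustrative reduction only: it omits the perceptual weighting and gain optimization of the real coder, and all names are hypothetical.

```c
#include <stddef.h>

/* Sketch of the CELP selection rule: among candidate excitations,
   pick the one whose synthesized output minimizes the squared error
   against the target. Candidate length n must be <= 8 here. */
static size_t best_candidate(const float target[], size_t n,
                             const float synth[][8], size_t n_cand)
{
    size_t best = 0;
    float best_err = 1e30f;
    for (size_t c = 0; c < n_cand; c++) {
        float err = 0.0f;
        for (size_t i = 0; i < n; i++) {
            float d = target[i] - synth[c][i];
            err += d * d;          /* accumulate squared error */
        }
        if (err < best_err) { best_err = err; best = c; }
    }
    return best;                   /* index of best-matching candidate */
}
```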
The coder operates on 10 ms speech frames using 5 ms of look-ahead for linear prediction (LP) analysis. The input and output are represented as 16-bit linear PCM. The encoding principle is shown in Fig. 1. After the input samples are passed through a 140 Hz high-pass filter, a 10th-order LP analysis is performed, and the LP parameters are quantized as line spectral pairs (LSPs) with 18 bits. The input frame is divided into two sub-frames of 5 ms each. The use of sub-frames allows better tracking of the pitch and gain parameters and reduces the complexity of the codebook searches. The quantized and unquantized filter coefficients are used for the second sub-frame, while interpolated LP filter coefficients are used in the first sub-frame. For each sub-frame, the excitation is represented by an adaptive codebook and a fixed codebook, whose parameters are transmitted every sub-frame.
The adaptive-codebook index is encoded with 8 bits in the first sub-frame and differentially encoded with 5 bits in the second sub-frame. To improve the robustness of the codec to channel errors, a parity bit is added over the six most significant bits of the adaptive-codebook index in the first sub-frame. The target signal is updated by removing the adaptive-codebook contribution, and this new target is used in the fixed-codebook search. The fixed codebook is a 17-bit algebraic codebook. The gains of the adaptive and fixed codebooks are vector quantized with 7 bits using a conjugate-structure codebook, with moving-average (MA) prediction applied to the fixed-codebook gain.
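The parity protection of the first-subframe index can be illustrated with a short C sketch. The even-parity convention and the function name are assumptions for illustration, not taken from the reference code.

```c
/* Compute an even-parity bit over the six most significant bits of
   the 8-bit first-subframe adaptive-codebook index, as described
   above for channel-error robustness. */
static int parity_of_msbs(unsigned index8)
{
    unsigned bits = index8 >> 2;   /* keep the 6 MSBs of the 8-bit index */
    int parity = 0;
    for (int i = 0; i < 6; i++)
        parity ^= (bits >> i) & 1; /* XOR-fold the six bits */
    return parity;                 /* 1 if an odd number of bits are set */
}
```

At the decoder, recomputing this bit over the received index and comparing it with the transmitted parity bit flags a channel error in the pitch delay.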
Multi-Channel Implementation of G.729 A/B on FPGA

Habibullah Jamal, Vice Chancellor, University of Engineering and Technology, Taxila, Pakistan. Email: [email protected]
Ali Adnan Malik, Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan. Email: [email protected]

978-1-4244-1847-3/07/$25.00 ©2007 IEEE. IEEE ICM - December 2007
Fig. 1 Block Diagram of Encoder [1]
The decoder synthesizes the output speech samples from the received bitstream. First, the parameter indices are extracted from the received bitstream. These indices are decoded to obtain the coder parameters for a 10 ms speech frame. These parameters are the LSP coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive- and fixed-codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each sub-frame. Then, for each 5 ms sub-frame, the excitation is constructed by adding the adaptive- and fixed-codebook vectors scaled by their respective gains, the speech is reconstructed by filtering the excitation through the LP synthesis filter, and the reconstructed speech signal is passed through an adaptive postfilter to enhance the speech quality.
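The excitation construction for one sub-frame can be sketched directly from the description above. The 40-sample sub-frame length follows from 5 ms at 8 kHz; the function and parameter names are illustrative.

```c
#define SUBFRAME_LEN 40  /* 5 ms at 8 kHz sampling */

/* Build the decoder excitation for one sub-frame: the adaptive and
   fixed codebook vectors, scaled by their decoded gains, are summed
   before LP synthesis filtering. */
static void build_excitation(float exc[SUBFRAME_LEN],
                             const float adaptive[SUBFRAME_LEN],
                             const float fixed_cb[SUBFRAME_LEN],
                             float g_pitch, float g_code)
{
    for (int i = 0; i < SUBFRAME_LEN; i++)
        exc[i] = g_pitch * adaptive[i] + g_code * fixed_cb[i];
}
```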
Annex B specifies voice activity detection (VAD), discontinuous transmission (DTX), and comfort noise generation algorithms. The VAD algorithm makes a voice activity decision every 10 ms, in accordance with the frame size of the G.729A speech coder. Figure 2 shows a speech communication system with voice activity detection. A set of parameters is extracted and used for an initial decision; these are the full-band energy, the low-band energy, the zero-crossing rate, and a spectral measure. The long-term averages of these parameters during non-active voice segments follow the changing nature of the background noise. A set of differential parameters is obtained at each frame as the difference between each parameter and its respective long-term average. The initial voice activity decision is obtained using a piecewise linear decision boundary between each pair of differential parameters, and a final voice activity decision is obtained by smoothing the initial decision.
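The differential-parameter idea can be sketched in C with two of the four parameters. The 0.9/0.1 update weight and the thresholds are assumptions for illustration, not the constants of the standard, and the smoothing stage is omitted.

```c
/* Long-term averages of background-noise parameters, updated only
   during non-active frames so they track the changing noise floor. */
typedef struct { float energy_avg, zc_avg; } NoiseStats;

/* Initial VAD decision from differential parameters: the difference
   between each frame parameter and its long-term average. */
static int vad_initial_decision(NoiseStats *s, float energy, float zc)
{
    float d_energy = energy - s->energy_avg;   /* differential parameters */
    float d_zc     = zc     - s->zc_avg;
    int active = (d_energy > 6.0f) || (d_zc > 0.2f);
    if (!active) {                             /* track background only */
        s->energy_avg = 0.9f * s->energy_avg + 0.1f * energy;
        s->zc_avg     = 0.9f * s->zc_avg     + 0.1f * zc;
    }
    return active;                             /* 1 = voice, 0 = silence */
}
```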
The output of the VAD module is either 1 or 0, indicating the presence or absence of voice activity respectively. If the VAD output is 1, the G.729A speech codec is invoked to code/decode the active voice frames. If the VAD output is 0, the DTX/CNG algorithms are used to code/decode the non-active voice frames.

Fig. 2 Speech communication system with Voice Activity Detection [3]

Traditional speech coders and decoders use comfort noise to simulate the background noise in the non-active voice frames. If the background noise is not stationary, mere comfort noise insertion does not reproduce the naturalness of the original background noise. It is therefore desirable to intermittently send some information about the background noise to obtain better quality when non-active voice frames are detected. Coding efficiency for the non-active voice frames is achieved by coding the energy of the frame and its spectrum with as few as fifteen bits. These bits are not automatically transmitted whenever a non-active voice frame is detected; rather, they are transmitted only when an appreciable change has been detected with respect to the last transmitted non-active voice frame.
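The transmit-on-change rule for the noise description can be sketched as follows. The 3 dB threshold and the single-parameter (energy-only) comparison are assumptions for illustration; the standard compares both energy and spectrum.

```c
#include <math.h>

/* DTX sketch: during non-active frames, the ~15-bit noise description
   (SID) is transmitted only when the frame energy has changed
   appreciably since the last transmitted SID frame. */
static int should_send_sid(float frame_energy_db, float *last_sent_db)
{
    if (fabsf(frame_energy_db - *last_sent_db) > 3.0f) {
        *last_sent_db = frame_energy_db;   /* remember what we sent */
        return 1;                          /* transmit an updated SID */
    }
    return 0;                              /* send nothing; decoder keeps CNG */
}
```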
At the decoder side, the received bitstream is decoded. If the VAD output is 1, the G.729A decoder is invoked to synthesize the reconstructed active voice frames. If the VAD output is 0, the CNG module is called to reproduce the non-active voice frames.
III. ARCHITECTURE
As the standard gives precedence to its 16-bit fixed-point ANSI C code over the mathematical description of the algorithm, we referred to the ANSI C code for the implementation. For an efficient implementation on Xilinx's Virtex-II Pro FPGA, a highly optimized time-shared architecture was developed. This architecture encodes and decodes multiple channels of speech very efficiently, using resources more economically than an implementation based on a fully dedicated architecture [7].
The block diagram of the architecture is shown in Fig. 3. Because of the large memory requirement, the dual-port block memory of the FPGA is used.
Fig. 3 Block Diagram of Codec Architecture (RAM and input/output buffers, connected by 64-bit and 16-bit buses)
To fetch multiple operands in one cycle, the memory is divided into four banks with a 16-bit word size; with dual-port access, this provides eight 16-bit operands per cycle. The Functional Unit performs the operations defined in BASIC_OP.C and OPER_32B.C of the ANSI C code. These operations are highly optimized because the performance of the coder and decoder depends on them. Multiple instances of each operation unit are used in the Functional Unit to operate efficiently on the eight operands obtained from memory. These units are connected in different configurations to form fully pipelined paths that help optimize the loops in the ANSI C code. The input/output buffers of the encoder and decoder are maintained in a separate memory block, so buffering channels in and out does not add delay to the main computational activity of encoding or decoding.
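One plausible address mapping consistent with the four-bank organization is a word interleave: consecutive 16-bit words fall in consecutive banks, so one access on each port of each bank yields eight operands per cycle. The mapping below is an assumption sketched from the text, not the paper's documented scheme.

```c
/* Interleaved mapping of a flat 16-bit word address onto 4 banks:
   the low two address bits select the bank, the rest index into it. */
typedef struct { unsigned bank, offset; } BankAddr;

static BankAddr map_word(unsigned word_addr)
{
    BankAddr a;
    a.bank   = word_addr & 3u;  /* low 2 bits: one of 4 banks */
    a.offset = word_addr >> 2;  /* remaining bits: offset within bank */
    return a;
}
```

With this interleave, a unit-stride loop over a buffer touches each bank once every four words, which is what lets the pipelined datapaths stream operands without bank conflicts.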
The upper portion of the RAM is initialized with lookup tables and constants for both the encoder and the decoder. The scratch memory space serves as temporary storage for encoding or decoding. Due to the adaptive nature of the algorithm, the C code uses a large amount of static memory; therefore, static memory for each encoding/decoding channel is allocated and initialized in RAM. A separate space for the channels' static-memory pointers is allocated in the lookup-table and constants portion of the RAM; these pointers point to the channel static memories, and the number of channels is also maintained in that space. The input/output buffer memory is organized to support multiple channels, and a double-buffering scheme is used for both input and output buffers. The memory requirements for the lookup tables, scratch RAM, and channel static RAM are shown in Table I: each additional encoding and decoding channel requires 2448 and 2368 bytes of static memory respectively.
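The per-channel state layout can be sketched with the sizes from Table I. The packing of encoder state followed by decoder state for each channel is an assumption for illustration; the paper does not specify the exact ordering.

```c
/* Per-channel static-memory sizes, from Table I. */
#define ENC_STATE_BYTES 2448
#define DEC_STATE_BYTES 2368

/* Byte offset of channel ch's encoder state within the channel
   static-memory region starting at ram_base. */
static unsigned enc_state_base(unsigned ram_base, unsigned ch)
{
    return ram_base + ch * (ENC_STATE_BYTES + DEC_STATE_BYTES);
}

/* Decoder state assumed to follow the encoder state of the same channel. */
static unsigned dec_state_base(unsigned ram_base, unsigned ch)
{
    return enc_state_base(ram_base, ch) + ENC_STATE_BYTES;
}
```

A pointer table built from these bases, stored in the constants region as the text describes, lets the controller locate any channel's state in one lookup.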
As the codec operates on 10 ms of speech, the same architecture is used for both encoding and decoding on a time-shared basis. The G.729 A/B Controller performs encoding or decoding at any given time, depending on which channel is scheduled by the scheduler. The Input/output Buffers Controller is responsible for buffering each channel in and out. The scheduler schedules the channels' I/O buffers for encoding or decoding on a round-robin basis, giving equal priority to each channel. The architecture is optimized such that more than one channel can be added within the 10 ms time frame of G.729. Thus, multiple channels can be added to the design simply by adding their corresponding static memory in RAM and their I/O buffers.
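The round-robin policy described above amounts to picking the next ready channel after the current one, wrapping around the channel list. A minimal sketch, with illustrative `ready[]` flags standing in for the I/O-buffer status:

```c
/* Round-robin channel selection with equal priority: scan forward
   from the current channel, wrapping around, and return the first
   channel whose buffer is ready; -1 if none is ready. */
static int next_channel(int current, const int ready[], int n)
{
    for (int step = 1; step <= n; step++) {
        int ch = (current + step) % n;   /* wrap around the channel list */
        if (ready[ch])
            return ch;
    }
    return -1;                           /* nothing to schedule this pass */
}
```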
TABLE I
MEMORY REQUIREMENT FOR CODER

Memory for                     Memory Required (Bytes)   Memory in each bank (Bytes)
Lookup tables & constants      7808                      3904
Scratch RAM                    4992                      1248
Encoder Static RAM (each ch)   2448                      612
Decoder Static RAM (each ch)   2368                      592
IV. SYNTHESIS RESULTS
The design is coded in Verilog HDL and synthesized for Xilinx's Virtex-II Pro FPGA. It was simulated in ModelSim, and the encoding and decoding results are fully verified against the ANSI C code of G.729 A/B. The resource utilization from synthesis is shown in Table II. For efficiency, the built-in multipliers of the Virtex-II Pro are used. The design runs at 161 MHz.
TABLE II
RESOURCE UTILIZATION ON VIRTEX-II PRO

No. of Slices                   3411
No. of Slice Flip Flops         2571
No. of 4-Input LUTs             9969
No. of BRAMs (for 1 channel)    2 (1K blocks)
No. of MULT18X18s               4
The number of cycles and the time required to encode and decode using the TSTSEQ1 test sequence of the G.729 A/B test vectors, running at 161 MHz, are shown in Table III.
TABLE III
CYCLES REQUIRED USING TSTSEQ1 OF G.729 A/B

Operation   Cycles Required   Time Required
Encoding    40,112            0.249 ms
Decoding    38,634            0.239 ms
The total delay of encoding plus decoding is 0.249 ms + 0.239 ms = 0.488 ms, which means that about 20 channels of encoding and 20 channels of decoding can be accommodated in the 10 ms time frame. As the architecture is fully optimized to support multiple channels, the number of channels in 10 ms can be chosen as required.
V. CONCLUSION
The FPGA implementation of G.729 A/B improves performance in terms of computational delay and memory compared to DSP implementations. The reduction in computational delay allows additional channels within the 10 ms time frame of G.729. As the time-shared architecture is fully optimized to support multiple channels, the number of channels in 10 ms can be chosen as required. This implementation uses resources more efficiently than an implementation based on a fully dedicated architecture.
REFERENCES
[1] ITU-T Recommendation G.729, "Coding of speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP)".
[2] ITU-T Recommendation G.729 Annex A, "Reduced complexity 8 kbit/s CS-ACELP speech codec".
[3] ITU-T Recommendation G.729 Annex B, "A silence compression scheme for G.729 optimized for terminals conforming to Recommendation V.70".
[4] M. Arora, N. Lahane, and A. Prakash, "All assembly implementation of G.729 Annex B speech codec on a fixed point DSP", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002.
[5] M. Banerjee, Vani B. A., and G. Radha Krishna, "Optimal real time DSP implementation of ITU G.729 speech codec", IEEE 60th Vehicular Technology Conference, 2004.
[6] J. Kim, H. Kim, S. Choi, and Y. You, "The implementation of G.729 speech coder on a 16-bit DSP chip for the CDMA IMT-2000 system", IEEE Transactions on Consumer Electronics, vol. 45, no. 2, pp. 443-448, May 1999.
[7] N. Mobini, B. Vahdat, and M. H. Radfar, "An FPGA based implementation of G.729", IEEE International Symposium on Circuits and Systems (ISCAS), 23-26 May 2005, vol. 4, pp. 3571-3574.
ACKNOWLEDGEMENTS
Simulations and syntheses for this work were conducted in the ASIC Design and DSP Laboratory, University of Engineering and Technology, Taxila.