5 FFT DESIGN METHODOLOGY
5.1 General
The fast Fourier transform (FFT) provides a fast approach to processing data in wireless transmission. It is an efficient way of converting time-domain data into frequency-domain data, requiring less hardware and less computation time than direct evaluation of the discrete Fourier transform (DFT).
5.2 Fast Fourier Transform
Conventional signal and image processing applications require high computational power based on the Fast Fourier Transform (FFT), as well as the ability to choose a suitable algorithm and architecture. When comparing alternative FFT implementations, the criteria to consider are execution speed, programming effort, hardware design effort, system cost, flexibility and precision. For real-time signal processing, however, the main concern is execution speed.
The implementation has been made on a Field Programmable Gate Array (FPGA) as a way of obtaining high performance at low cost and with a short development time. The FPGA implementation can use pipelined (segmented) arithmetic with any number of pipeline stages in order to raise the operating frequency.
5.3 Fixed-Radix FFT Algorithms
In this section we introduce several FFT algorithms: radix-2, radix-4, radix-8, split-radix, mixed-radix 4-2, R2MDC and the proposed modified R2MDC.
5.3.1 Radix-2 FFT Algorithm
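The radix-2 decimation-in-frequency (DIF) FFT algorithm is obtained by using the divide-and-conquer approach to split the computation of the output sequence X(k) into two summations [87], one over the first N/2 data points and the other over the last N/2 data points. With $W_N = e^{-j2\pi/N}$, we obtain

$$X(k) = \sum_{n=0}^{N/2-1} x(n)W_N^{kn} + \sum_{n=N/2}^{N-1} x(n)W_N^{kn} = \sum_{n=0}^{N/2-1}\left[x(n) + (-1)^k\,x\!\left(n+\frac{N}{2}\right)\right]W_N^{kn} \qquad (5.1)$$

where we have used $W_N^{kN/2} = (-1)^k$.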
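Now, let us restrict the discussion to N a power of 2 and compute the even-numbered and the odd-numbered frequency samples separately. Since $W_N^{2kn} = W_{N/2}^{kn}$, the even-numbered frequency samples are

$$X(2k) = \sum_{n=0}^{N/2-1}\left[x(n) + x\!\left(n+\frac{N}{2}\right)\right]W_{N/2}^{kn}, \qquad k = 0,1,\ldots,\frac{N}{2}-1$$

and the odd-numbered frequency samples are

$$X(2k+1) = \sum_{n=0}^{N/2-1}\left\{\left[x(n) - x\!\left(n+\frac{N}{2}\right)\right]W_N^{n}\right\}W_{N/2}^{kn}, \qquad k = 0,1,\ldots,\frac{N}{2}-1 \qquad (5.2)$$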
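Equation (5.2) gives two N/2-point DFTs: X(2k) is the N/2-point DFT of the sequence obtained by adding the bottom half of the input sequence to the upper half, and X(2k+1) is the N/2-point DFT of the sequence obtained by subtracting the bottom half of the input sequence from the upper half and multiplying the resulting sequence by $W_N^n$. If we define the N/2-point sequences g(n) and h(n) as

$$g(n) = x(n) + x\!\left(n+\frac{N}{2}\right), \qquad h(n) = \left[x(n) - x\!\left(n+\frac{N}{2}\right)\right]W_N^{n}, \qquad n = 0,1,\ldots,\frac{N}{2}-1 \qquad (5.3)$$

then X(2k) and X(2k+1) are the N/2-point DFTs of g(n) and h(n), respectively.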
Figure 5.1 Signal flow graph of a typical 2-point DFT
The computation of the sequences g(n) and h(n) according to Equation (5.3), and the subsequent use of these sequences to compute the N/2-point DFTs, are depicted in Figure 5.1. For the 64-point DFT [1], repeating this decomposition reduces the computation to a set of 2-point DFTs. Inserting the computation of Figure 5.1 at every stage yields the complete signal flow graph for the 64-point DFT, shown in Figure 5.2.
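The decomposition of Equations (5.2) and (5.3) can be checked numerically. The following Python sketch is an illustrative model, not the hardware design, and the function name `fft_dif_radix2` is hypothetical. It forms g(n) and h(n), recurses on each N/2-point half and interleaves the results, so, unlike the in-place hardware, it returns X(k) in natural order.

```python
import cmath

def fft_dif_radix2(x):
    """Recursive radix-2 DIF FFT following Eq. (5.3); len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # g(n) = x(n) + x(n + N/2)
    g = [x[n] + x[n + half] for n in range(half)]
    # h(n) = [x(n) - x(n + N/2)] * W_N^n, with W_N = exp(-j*2*pi/N)
    h = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N) for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif_radix2(g)  # even-numbered samples X(2k)
    X[1::2] = fft_dif_radix2(h)  # odd-numbered samples X(2k+1)
    return X
```

Interleaving at every level is what replaces the explicit bit-reversal step of the in-place signal flow graph.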
As Figure 5.2 shows, proceeding from one stage to the next always involves the basic computation of Figure 5.1: a pair of values in one stage is obtained from a pair of values in the preceding stage, where the coefficients are always powers of $W_N$ and the exponents are separated by N/2. Because of the shape of its signal flow graph, this elementary computation is called a butterfly [84]. Every stage contains the same number of butterflies, N/2, so the structure is regular. The basic butterfly of Figure 5.1, which is used throughout Figure 5.2, requires only one complex multiplication and two complex additions.
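The operation count behind this structure can be tallied with a short sketch. The helper names below are hypothetical, and every twiddle multiplication is counted, including trivial ones such as $W_N^0$.

```python
def dif_butterfly(a, b, w):
    """Radix-2 DIF butterfly of Fig. 5.1: two complex additions, one complex multiplication."""
    return a + b, (a - b) * w

def radix2_cost(N):
    """Return (stages, butterflies per stage, complex multiplications) for N = 2**m."""
    stages = N.bit_length() - 1   # log2 N
    per_stage = N // 2            # N/2 butterflies in every stage
    return stages, per_stage, stages * per_stage

# 64-point example: 6 stages x 32 butterflies = 192 complex multiplications,
# versus 64 * 64 = 4096 for direct evaluation of the DFT.
print(radix2_cost(64))  # (6, 32, 192)
```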
Figure 5.2 Signal flow graph of the 64-point radix-2 DIF FFT (Stages 1 to 6)
From Figure 5.2, the time-domain input data x(n) occur in natural order, but the frequency-domain output X(k) occurs in bit-reversed order. The computations are also performed in place: the memory read and the memory write of each butterfly use the same memory locations, which minimizes the required memory space. Figure 5.2 also shows that, in a one-dimensional memory array, the output data index $[k_0 k_1 \ldots k_{\log_2 N-2}\, k_{\log_2 N-1}]_2$ is mapped to index $[k_{\log_2 N-1}\, k_{\log_2 N-2} \ldots k_1 k_0]_2$.
For example, in the 16-point radix-2 DIF FFT the output index 1011 is mapped to index 1101 of the memory array (in the 64-point case, which uses six-bit indices, 001011 is mapped to 110100). For the 16-point radix-2 DIF FFT signal flow graph, the relationship between normal order and bit-reversed order is shown in Figure 5.3.
Figure 5.3 Bit-reversed order
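The index mapping can be sketched as a small bit-reversal routine; `bit_reverse` and `bit_reversed_order` are hypothetical helper names. For N = 8 it also reproduces the output order 0, 4, 2, 6, 1, 5, 3, 7 discussed for the mixed-radix 4-2 structure in Section 5.3.5.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits: [k0 k1 ... k_{m-1}] -> [k_{m-1} ... k1 k0]."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the current LSB into the result
        index >>= 1
    return result

def bit_reversed_order(N):
    """Memory order of the outputs of an N-point in-place radix-2 DIF FFT."""
    bits = N.bit_length() - 1
    return [bit_reverse(k, bits) for k in range(N)]

print(format(bit_reverse(0b1011, 4), "04b"))  # 1101
print(bit_reversed_order(8))                  # [0, 4, 2, 6, 1, 5, 3, 7]
```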
However, it is possible to reconfigure the decimation-in-frequency algorithm so that
the input sequence occurs in bit-reversed order while the output DFT occurs in
normal order. Furthermore, if we abandon the requirement that the computations be
done in place, it is also possible to have both the input data and the output DFT in
normal order. This case is called out-of-place mode.
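The in-place and out-of-place modes can be contrasted with an iterative sketch (hypothetical names, floating-point Python rather than fixed-point hardware). `fft_dif_inplace` overwrites each butterfly's two memory locations with its own outputs, leaving X(k) in bit-reversed order; `to_natural_order` then performs the reordering out of place.

```python
import cmath

def fft_dif_inplace(x):
    """Iterative radix-2 DIF FFT computed in place on a copy of x (natural-order input).

    The result is left in bit-reversed order, as in Fig. 5.2.
    """
    a = list(x)
    N = len(a)
    span = N // 2
    while span >= 1:
        # Each block of 2*span entries is one sub-DFT of length 2*span.
        for start in range(0, N, 2 * span):
            for n in range(span):
                w = cmath.exp(-2j * cmath.pi * n / (2 * span))
                i, j = start + n, start + n + span
                # Read a[i], a[j]; write both results back to the same locations.
                a[i], a[j] = a[i] + a[j], (a[i] - a[j]) * w
        span //= 2
    return a

def to_natural_order(a):
    """Out-of-place reordering: X(k) is stored at the bit-reversed address of k."""
    N = len(a)
    bits = N.bit_length() - 1
    return [a[int(format(k, f"0{bits}b")[::-1], 2)] for k in range(N)]
```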
5.3.2 Radix 4 FFT
A radix-4 common-factor FFT algorithm can be employed when $N = 4^k$ by recursively reorganizing sequences into 4 × N/4 arrays. The development of a radix-4 algorithm is similar to that of the radix-2 FFT, and both DIT and DIF versions are possible. Rabiner and Gold (1975) provide more details on radix-4 algorithms [89].
The radix-4 decimation-in-time butterfly is represented in Figure 5.4. As with the radix-2 butterfly, the radix-4 butterfly is formed by merging a 4-point DFT with the associated twiddle factors that would otherwise lie between DFT stages. The four inputs A, B, C and D are on the left side of the butterfly diagram, and the latter three are multiplied by the complex coefficients Wb, Wc and Wd, respectively. These coefficients all have the same form; they are given different subscripts only to distinguish the three that appear in a single butterfly.
Figure 5.4 Radix-4 DIT butterfly
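A behavioral model of the radix-4 DIT butterfly, assuming plain complex arithmetic; the function name is hypothetical. Apart from the three twiddle multiplications, only additions, subtractions and multiplications by ±j are needed, and with Wb = Wc = Wd = 1 the butterfly reduces to a 4-point DFT of (A, B, C, D).

```python
def radix4_dit_butterfly(A, B, C, D, Wb, Wc, Wd):
    """Radix-4 DIT butterfly (Fig. 5.4): twiddle B, C, D, then a 4-point DFT."""
    b, c, d = B * Wb, C * Wc, D * Wd   # the only general complex multiplications
    y0 = A + b + c + d
    y1 = A - 1j * b - c + 1j * d       # +/-j factors: swap and sign change in hardware
    y2 = A - b + c - d
    y3 = A + 1j * b - c - 1j * d
    return y0, y1, y2, y3
```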
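When the number of data points N in the DFT is a power of 4 (i.e., $N = 4^v$), one can always use a radix-2 algorithm for the computation. However, it is computationally more efficient to employ a radix-4 FFT algorithm. As in the radix-2 case, we use the divide-and-conquer approach and split the N-point DFT into four sums of N/4 points each:

$$X(k) = \sum_{n=0}^{N/4-1} x(n)W_N^{kn} + W_N^{Nk/4}\sum_{n=0}^{N/4-1} x\!\left(n+\frac{N}{4}\right)W_N^{kn} + W_N^{Nk/2}\sum_{n=0}^{N/4-1} x\!\left(n+\frac{N}{2}\right)W_N^{kn} + W_N^{3Nk/4}\sum_{n=0}^{N/4-1} x\!\left(n+\frac{3N}{4}\right)W_N^{kn} \qquad (5.4)$$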
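From the definition of the twiddle factors, we have

$$W_N^{Nk/4} = (-j)^k, \qquad W_N^{Nk/2} = (-1)^k, \qquad W_N^{3Nk/4} = (j)^k$$

so that

$$X(k) = \sum_{n=0}^{N/4-1}\left[x(n) + (-j)^k x\!\left(n+\frac{N}{4}\right) + (-1)^k x\!\left(n+\frac{N}{2}\right) + (j)^k x\!\left(n+\frac{3N}{4}\right)\right]W_N^{kn} \qquad (5.5)$$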
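Relation (5.5) is not an N/4-point DFT because the twiddle factor [90] depends on N and not on N/4. To convert it into N/4-point DFTs, we subdivide the DFT sequence into four N/4-point subsequences X(4k), X(4k+1), X(4k+2) and X(4k+3), k = 0, 1, ..., N/4 - 1. Thus we obtain the radix-4 decimation-in-frequency DFT as

$$X(4k) = \sum_{n=0}^{N/4-1}\left[x(n) + x\!\left(n+\tfrac{N}{4}\right) + x\!\left(n+\tfrac{N}{2}\right) + x\!\left(n+\tfrac{3N}{4}\right)\right]W_N^{0}\,W_{N/4}^{kn}$$
$$X(4k+1) = \sum_{n=0}^{N/4-1}\left[x(n) - j\,x\!\left(n+\tfrac{N}{4}\right) - x\!\left(n+\tfrac{N}{2}\right) + j\,x\!\left(n+\tfrac{3N}{4}\right)\right]W_N^{n}\,W_{N/4}^{kn}$$
$$X(4k+2) = \sum_{n=0}^{N/4-1}\left[x(n) - x\!\left(n+\tfrac{N}{4}\right) + x\!\left(n+\tfrac{N}{2}\right) - x\!\left(n+\tfrac{3N}{4}\right)\right]W_N^{2n}\,W_{N/4}^{kn}$$
$$X(4k+3) = \sum_{n=0}^{N/4-1}\left[x(n) + j\,x\!\left(n+\tfrac{N}{4}\right) - x\!\left(n+\tfrac{N}{2}\right) - j\,x\!\left(n+\tfrac{3N}{4}\right)\right]W_N^{3n}\,W_{N/4}^{kn} \qquad (5.6)$$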
where we have used the property $W_N^{4kn} = W_{N/4}^{kn}$. Note that the input to each N/4-point DFT is a linear combination of four signal samples scaled by a twiddle factor. This procedure is repeated v times, where $v = \log_4 N$.
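A recursive Python sketch of the radix-4 DIF decomposition in Equation (5.6), assuming N is a power of 4; `fft_dif_radix4` is a hypothetical name. Each level forms the four combinations of x(n), x(n+N/4), x(n+N/2) and x(n+3N/4), applies the twiddle factor $W_N^{pn}$ and recurses, giving $v = \log_4 N$ levels in total.

```python
import cmath

def fft_dif_radix4(x):
    """Recursive radix-4 DIF FFT following Eq. (5.6); len(x) must be a power of 4."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    subs = [[], [], [], []]
    for n in range(q):
        a, b, c, d = x[n], x[n + q], x[n + 2 * q], x[n + 3 * q]
        # Four-point combinations of Eq. (5.6): only +/-1 and +/-j factors.
        combos = (a + b + c + d,
                  a - 1j * b - c + 1j * d,
                  a - b + c - d,
                  a + 1j * b - c - 1j * d)
        for p in range(4):
            # Twiddle W_N^{pn}, then feed the N/4-point DFT that yields X(4k + p).
            subs[p].append(combos[p] * cmath.exp(-2j * cmath.pi * p * n / N))
    X = [0j] * N
    for p in range(4):
        X[p::4] = fft_dif_radix4(subs[p])
    return X
```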
5.3.3 Radix 8 FFT
A radix-8 common-factor FFT algorithm can be employed, similarly to radix-4, when $N = 8^k$ by recursively reorganizing sequences into 8 × N/8 arrays. Its development is also similar to that of the radix-4 FFT. Since the radix-8 FFT is beyond the scope of this thesis, it is not described further.
5.3.4 Split Radix FFT
After studying the fixed-radix (radix-2 and radix-4) algorithms, it is interesting to note that, for radix-2, the even-numbered points of the DFT can be computed independently of the odd-numbered points [91, 92, 20]. This suggests using different computational methods for independent parts of the algorithm with the objective of reducing the number of computations.
The split-radix FFT (SRFFT) algorithms exploit this idea by using different fixed-radix decompositions in the same FFT algorithm. The split-radix approach was first proposed by Duhamel and Hollmann in 1984 [52]. An SRFFT algorithm can be developed by mixing two or more fixed-radix decompositions, such as split-radix 2/4, split-radix 2/8 or split-radix 2/4/8. Only split-radix 2/4 is considered here. When fixed radices are mixed, radix-2 is the basic component, because radix-2 can compute any DFT whose length is a power of 2.
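We illustrate this approach with a DIF SRFFT algorithm. First, recall that in the radix-2 DIF FFT algorithm the even-numbered samples of the N-point DFT are given as

$$X(2k) = \sum_{n=0}^{N/2-1}\left[x(n) + x\!\left(n+\frac{N}{2}\right)\right]W_{N/2}^{kn}, \qquad k = 0,1,\ldots,\frac{N}{2}-1 \qquad (5.7)$$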
A radix-2 decomposition suffices for this computation. The odd-numbered samples {X(2k+1)} of the DFT require pre-multiplication of the input sequence by the twiddle factors $W_N^n$. For these samples, a radix-4 decomposition produces some computational efficiency, because the four-point DFT is the largest multiplication-free butterfly. Indeed, it can be shown that using a radix greater than 4 does not result in a significant further reduction in computational complexity.
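If we use a radix-4 decimation-in-frequency FFT algorithm for the odd-numbered samples of the N-point DFT, we obtain the following N/4-point DFTs:

$$X(4k+1) = \sum_{n=0}^{N/4-1}\left\{\left[x(n) - x\!\left(n+\tfrac{N}{2}\right)\right] - j\left[x\!\left(n+\tfrac{N}{4}\right) - x\!\left(n+\tfrac{3N}{4}\right)\right]\right\}W_N^{n}\,W_{N/4}^{kn}$$
$$X(4k+3) = \sum_{n=0}^{N/4-1}\left\{\left[x(n) - x\!\left(n+\tfrac{N}{2}\right)\right] + j\left[x\!\left(n+\tfrac{N}{4}\right) - x\!\left(n+\tfrac{3N}{4}\right)\right]\right\}W_N^{3n}\,W_{N/4}^{kn} \qquad (5.8)$$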
Thus the N-point DFT is decomposed into one N/2-point DFT [93] without additional twiddle factors and two N/4-point DFTs with twiddle factors. The N-point DFT is obtained by successive use of these decompositions up to the last stage. In this way we obtain a DIF split-radix 2/4 algorithm [6]. The signal flow graph of the basic butterfly cell of the split-radix 2/4 DIF FFT algorithm is shown in Figure 5.5.
Figure 5.5 Signal flow graph of basic butterfly cell of split-radix 2/4 DIF FFT
We have,
(5.9)
As a result, the even and odd frequency samples of each basic processing block are not produced in the same stage of the complete signal flow graph. This property makes the signal flow graph irregular, because it has an “L”-shaped topology.
Note also that the number of butterflies is not the same in every stage, as it is in the radix-2 and radix-4 FFT algorithms, and the arrangement of the coefficients is very irregular too, so split-radix requires more implementation effort than the other FFT algorithms.
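A recursive sketch of the split-radix 2/4 DIF decomposition of Equations (5.7) and (5.8), assuming N is a power of 2; the function name is hypothetical. The uneven recursion, one N/2-point call and two N/4-point calls, is what produces the “L”-shaped, irregular signal flow graph.

```python
import cmath

def fft_split_radix(x):
    """Recursive split-radix 2/4 DIF FFT following Eqs. (5.7) and (5.8).

    len(x) must be a power of 2. Each call spawns one N/2-point DFT (even outputs)
    and two N/4-point DFTs (outputs 4k+1 and 4k+3).
    """
    N = len(x)
    if N == 1:
        return list(x)
    if N == 2:
        return [x[0] + x[1], x[0] - x[1]]
    h, q = N // 2, N // 4
    even = [x[n] + x[n + h] for n in range(h)]          # Eq. (5.7), radix-2 part
    odd1, odd3 = [], []
    for n in range(q):
        u = x[n] - x[n + h]
        v = x[n + q] - x[n + 3 * q]
        odd1.append((u - 1j * v) * cmath.exp(-2j * cmath.pi * n / N))      # W_N^n
        odd3.append((u + 1j * v) * cmath.exp(-2j * cmath.pi * 3 * n / N))  # W_N^{3n}
    X = [0j] * N
    X[0::2] = fft_split_radix(even)
    X[1::4] = fft_split_radix(odd1)
    X[3::4] = fft_split_radix(odd3)
    return X
```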
5.3.5 Mixed Radix 4-2 FFT
The mixed-radix 4/2 butterfly unit is shown in Figure 5.6. It uses both the radix-2^2 and the radix-2 algorithms and can therefore process FFTs whose length is not a power of four. The mixed-radix 4/2 butterfly [2], [3], [4] calculates four butterfly outputs from the inputs X(0) to X(3). The proposed butterfly unit has three complex multipliers and eight complex adders. Four multiplexers, represented by the solid boxes, select either the radix-4 or the radix-2 calculation.
Figure 5.6 The basic butterfly for mixed-radix 4/2 DIF FFT algorithm
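The mode switching can be modeled behaviorally. The sketch below is a hypothetical model consistent with the stated resources (eight complex adders in two levels, four output multiplexers, three complex multipliers); the pairing of inputs in radix-2 mode, (X(0), X(2)) and (X(1), X(3)), and the function name are assumptions rather than details taken from Figure 5.6.

```python
def mixed_radix42_butterfly(x0, x1, x2, x3, radix4=True, w1=1, w2=1, w3=1):
    """Behavioral sketch of a mixed-radix 4/2 butterfly unit (cf. Fig. 5.6)."""
    # First adder level (4 adders): two radix-2 butterflies on (x0, x2) and (x1, x3).
    t0, t1 = x0 + x2, x0 - x2
    t2, t3 = x1 + x3, x1 - x3
    # Second adder level (4 adders) completes the 4-point DFT; in hardware the
    # +/-j factor is a real/imaginary swap with a sign change.
    r0, r1 = t0 + t2, t1 - 1j * t3
    r2, r3 = t0 - t2, t1 + 1j * t3
    # Four multiplexers: radix-4 results, or the first-level radix-2 results.
    y0, y1, y2, y3 = (r0, r1, r2, r3) if radix4 else (t0, t1, t2, t3)
    # Three complex multipliers apply twiddle factors to outputs 1..3.
    return y0, y1 * w1, y2 * w2, y3 * w3
```

In radix-4 mode with unit twiddles the outputs are the 4-point DFT of the inputs in natural order; in radix-2 mode they are the sums and differences of the two input pairs.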
To verify the proposed scheme, a 64-point FFT based on the proposed mixed-radix 4-2 butterfly, with simple bit reversal for ordering the output sequences, is given as an example. As shown in Figure 5.7, the block diagram of the 64-point FFT is composed of sixteen mixed-radix 4-2 butterflies in total. In the first stage, the 64-point input sequence is divided into 8 groups, corresponding to n3 = 0, 1, 2, 3, 4, 5, 6 and 7, respectively. Each group is the input sequence of one mixed-radix 4-2 butterfly. After the input sequences pass the first mixed-radix 4-2 butterfly stage, the order of the output values is indicated by the small numbers below each butterfly output line in Figure 5.7. The proposed mixed-radix 4-2 butterfly is composed of two radix-4 butterflies and four radix-2 butterflies [98], [99].
In the first stage, the input data of the two radix-4 butterflies, which are expressed in equation (5.9), are grouped as x(n2), x(N/2±n2), x(N/4±n2), x(3N/4±n2) and x(N/8±n2), x(5N/8±n2), x(3N/8±n2), x(7N/8±n2), respectively. After each input group passes the first radix-4 butterflies, the output data are multiplied by special twiddle factors. These output sequences are then fed into the second stage, which is composed of radix-2 butterflies. After the second-stage radix-2 butterflies, the output data are multiplied by the twiddle factors. This twiddle factor multiplier, WQ(1+k), is the only multiplier unit in the proposed mixed-radix 4-2 butterfly with simple bit reversal of the output sequences [99]. The order of the output sequences is also shown in Figure 5.6.
The order of the output sequence is 0, 4, 2, 6, 1, 5, 3 and 7, which is exactly the simple binary bit-reversed order of the pure-radix butterfly structure. Consequently, the proposed mixed-radix 4-2 butterfly with simple bit-reversed output ordering includes two radix-4 butterflies, four radix-2 butterflies, one multiplier unit and an additional shift unit for the special twiddle factors [98], [99], [100].
The mixed-radix 4-2 butterfly structure with simple bit reversal for ordering the output sequences, derived by index decomposition techniques, is shown in Figure 5.7. It uses the same number of multipliers as the radix-2^3 and split-radix 2/4/8 algorithms. However, the split-radix 2/4/8 butterfly [88] does not have a regular shape, so its realization is very complicated.
Figure 5.7 The Mixed-Radix 4-2 butterfly structure
5.3.6 R2MDC FFT Algorithm
This section investigates a pipelined radix-2 FFT architecture for MIMO-OFDM. The radix-2 multipath delay commutator (R2MDC) is one of the commutated architectures for the radix-2 FFT algorithm; it commutates the FFT inputs so that the values can be processed as fast as possible. One of the most straightforward approaches to pipelined implementation of the radix-2 FFT algorithm is the R2MDC architecture [94], which is the simplest way to rearrange data for the FFT/IFFT algorithm. The input data sequence is broken into two parallel data streams flowing forward, with the correct distance between the data elements entering each butterfly scheduled by proper delays. At each stage of this architecture, half of the data flow is delayed via the memory (Reg) and processed with the second half of the data stream. For the 8-point case, the delays for the three stages are 4, 2 and 1, respectively.
In this R2MDC architecture, both the butterflies (BF) and the multipliers are idle half of the time, waiting for new inputs. The 8-point FFT/IFFT processor has one multiplier, three radix-2 butterflies, 10 registers (R) (delay elements) and 2 switches (S).
Figure 5.8 R2MDC architecture
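The resource figures quoted above follow a simple pattern. The sketch below assumes the usual R2MDC organization (an N/2 delay on one input path, then a commutator with equal delays on both of its paths after every butterfly stage except the last); the function name and dictionary keys are hypothetical. For N = 8 it reproduces the 3 butterflies, 1 multiplier, 2 switches, delays of 4, 2 and 1, and 10 registers.

```python
def r2mdc_resources(N):
    """Resource count for an N-point R2MDC pipeline, N = 2**m with m >= 2."""
    m = N.bit_length() - 1                            # radix-2 butterfly stages
    stage_delays = [N >> s for s in range(1, m + 1)]  # N/2, N/4, ..., 1
    # One input path delayed by N/2, then both commutator paths delayed per stage.
    registers = stage_delays[0] + 2 * sum(stage_delays[1:])
    return {
        "butterflies": m,
        # No multiplier after the last stage; after stage m-1 the twiddles
        # are only 1 and -j, so only m-2 general multipliers remain.
        "multipliers": m - 2,
        "switches": m - 1,
        "stage_delays": stage_delays,
        "registers": registers,                       # equals 3N/2 - 2
    }

print(r2mdc_resources(8))
# {'butterflies': 3, 'multipliers': 1, 'switches': 2, 'stage_delays': [4, 2, 1], 'registers': 10}
```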
The A input comes from the preceding twiddle factor multiplier (TFM), and the B output is fed to the next component, normally BFII. In the first cycles, the multiplexors direct the input data to the feedback registers until they are filled (position “0”). In the following cycles, the multiplexors select the outputs of the adders/subtractors (position “1”), and the butterfly computes a 2-point DFT of the incoming data and the data stored in the feedback registers [94]. The detailed structure of BFI is shown in Fig. 5.9 (a).
Figure 5.9 (a) BF I Structure (b) BF II Structure
The B input comes from the previous component, BFI, and the Z output is fed to the next component, normally a TFM. In the first cycles, the multiplexors direct the input data to the feedback registers until they are filled (position “0”). In the following cycles, the multiplexors select the outputs of the adders/subtractors (position “1”), and the butterfly computes a 2-point DFT of the incoming data and the data stored in the feedback registers. Multiplication by -j involves swapping the real and imaginary parts and inverting a sign. The real-imaginary swap is handled efficiently by the multiplexors (MUX), and the sign inversion is handled by switching the adding and subtracting operations by means of a MUX. When multiplication by -j is needed, all multiplexors switch to position “1”, the real and imaginary data are swapped, and the adding and subtracting operations are exchanged. The detailed structures of BFI and BFII are shown in Figure 5.9 (a) and (b). The adders and subtractors in BFI and BFII are fully pipelined and are followed by divide-by-2 and rounding [94]. The approach used here is to commutate the radix-2 algorithm in the IFFT architecture, replacing it with the R2MDC architecture in order to obtain a lower area than the existing system.
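A fixed-point sketch of one BFII output step, assuming integer (re, im) pairs, multiplication by -j applied to the incoming sample, and round-half-up for the divide-by-2 (all assumptions; the function names are hypothetical). Since $(a + jb)(-j) = b - ja$, the -j product needs no multiplier: the real and imaginary parts are swapped and one sign is inverted.

```python
def mul_minus_j(re, im):
    """(re + j*im) * (-j) = im - j*re: swap real/imaginary parts, negate the new imaginary part."""
    return im, -re

def half_round(v):
    """Divide an integer by 2 with round-half-up (assumed rounding mode)."""
    return (v + 1) >> 1

def bf2(a, b, apply_minus_j):
    """Hypothetical BFII step: a = stored sample, b = incoming sample, both (re, im) ints.

    Outputs the sum and difference, each divided by 2 and rounded.
    """
    if apply_minus_j:
        b = mul_minus_j(*b)   # multiplexor swap plus sign inversion, no multiplier
    s = (half_round(a[0] + b[0]), half_round(a[1] + b[1]))
    d = (half_round(a[0] - b[0]), half_round(a[1] - b[1]))
    return s, d
```

The divide-by-2 after each butterfly keeps the word length from growing across the pipeline stages.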
5.3.7 Proposed Modified R2MDC FFT
The radix-2 butterfly processor consists of a complex adder and a complex subtractor. In addition, a complex multiplier for the twiddle factors $W_N$ is implemented. Each complex multiplication by a twiddle factor requires four real multiplications and two real additions/subtractions.
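The four-multiplication, two-addition count corresponds to the textbook expansion $(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$. A minimal sketch with a hypothetical function name:

```python
def complex_mul(a_re, a_im, w_re, w_im):
    """(a_re + j*a_im) * (w_re + j*w_im) with four real multiplications and two add/subtracts."""
    re = a_re * w_re - a_im * w_im   # 2 multiplications, 1 subtraction
    im = a_re * w_im + a_im * w_re   # 2 multiplications, 1 addition
    return re, im

print(complex_mul(1, 2, 3, 4))  # (-5, 10)
```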
BFI operates as described for the R2MDC architecture in Section 5.3.6: its multiplexors first fill the feedback registers (position “0”) and then select the adder/subtractor outputs (position “1”) to compute a 2-point DFT of the incoming and stored data. The detailed structure of BFI in the modified design is shown in Fig. 5.10 (a).
The architectures of BFI and BFII supporting two receive chains are shown in Fig. 5.10 (a) and Fig. 5.10 (b). In the BFI structure, the sample-routing MUXs and DEMUXs at the input and output of the BF_RAMs are controlled by the c2 and c3 control signals, while the computation unit is controlled by the c1 control signal. The control signals are issued by the BFI controller. Depending on the programmed number of receive chains, the extra BF_RAMs are enabled. WiMAX supports 1Rx and 2Rx, and LTE supports 1Rx, 2Rx and 4Rx. Depending on the requirement, extra buffers can be added to the existing BF structure.
Figure 5.10 (a) BF I Structure (b) BF II Structure
Since the multiplications by -1, +j and -j are handled inside the BFII structure, two control signals, c1 and c2, are used in the basic computation unit. The MUXs and DEMUXs are controlled by the c3 and c4 control signals. The product with the -j term is implemented by swapping the real and imaginary parts, taking the sign of the sample into account. The approach used here is to commutate the radix-2 algorithm in the IFFT architecture [94].
To optimize the processor, the proposed shift-and-add method eliminates the non-trivial complex multiplications by the twiddle factors $W_8^1$ and $W_8^3$, so the processor is implemented without complex multipliers. The proposed butterfly processor performs multiplication by the trivial factor $W_8^2 = -j$ by swapping the real and imaginary parts (with a sign change), and multiplication by $W_8^0 = 1$ with a simple wire. For the non-trivial factors $W_8^1 = e^{-j\pi/4} = (1-j)/\sqrt{2}$ and $W_8^3 = e^{-j3\pi/4} = -(1+j)/\sqrt{2}$, the processor realizes the multiplication by the factor $1/\sqrt{2}$ using hardwired shift-and-add operations, as shown in Figure 5.11.
Figure 5.11 MOD-R2MDC Butterfly FFT with no complex multiplication.
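A sketch of the multiplierless twiddle handling, assuming integer samples and a five-term shift-and-add approximation of $1/\sqrt{2}$ chosen here for illustration (the actual constant and word length of the design in Figure 5.11 are not given in this section); the function names are hypothetical. It uses $W_8^1(a + jb) = [(a + b) + j(b - a)]/\sqrt{2}$ and $W_8^3(a + jb) = [(b - a) - j(a + b)]/\sqrt{2}$.

```python
def mul_inv_sqrt2(v):
    """Approximate v / sqrt(2) for integer v using hardwired shifts and adds.

    1/sqrt(2) ~ 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125 (exact: 0.7071067...).
    Python's >> floors toward -infinity, so results carry a small truncation bias.
    """
    return (v >> 1) + (v >> 3) + (v >> 4) + (v >> 6) + (v >> 8)

def mul_w8(re, im, k):
    """Multiply (re + j*im) by W_8^k, k in 0..3, without a general complex multiplier."""
    if k == 0:   # W8^0 = 1: a simple wire
        return re, im
    if k == 2:   # W8^2 = -j: swap real/imaginary parts, negate
        return im, -re
    if k == 1:   # W8^1 = (1 - j)/sqrt(2)
        return mul_inv_sqrt2(re + im), mul_inv_sqrt2(im - re)
    if k == 3:   # W8^3 = (-1 - j)/sqrt(2)
        return mul_inv_sqrt2(im - re), mul_inv_sqrt2(-(re + im))
    raise ValueError("k must be 0, 1, 2 or 3")

print(mul_w8(4096, 0, 1))  # (2896, -2896); exact value is about 2896.3 * (1 - j)
```

More shift-and-add terms reduce the approximation error at the cost of extra adders; the five terms here are only an example.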
5.4 Summary
This chapter has described the FFT design methodologies for the radix-2, radix-4, radix-8, mixed-radix 4-2, split-radix, R2MDC and modified R2MDC FFTs.