MSE: Hardware Algorithms
Parallelization
Marcel Jacomet
Introduction
Parallelization
Unfolding
Hardware Rules
OCT Example
OCT Introduction
Parallelization at OCT Example
Data-Path Unfolding
FiFo Unfolding
DFT Unfolding
MSE: Hardware Algorithms
Parallelization
Marcel Jacomet, Josef Goette
Bern University of Applied Sciences, BFH-TI HuCE-microLab, Biel/Bienne
huce.ti.bfh.ch/microlab
October 11, 2017
Introduction
Parallelization
Unfolding
Hardware Rules
OCT Example
  Introduction to OCT
  Parallelization at OCT Example
    Data-Path Unfolding
    FiFo Unfolding
    DFT Unfolding
Textbooks
◮ VLSI Digital Signal Processing Systems: Design and Implementation, Keshab K. Parhi, John Wiley & Sons, ISBN 0-471-24186-5, 1999, USD 135
◮ OCT texts discussing the lab example can be found on the web
Parallelization Principles 1
◮ parallelization of degree p speeds up hardware algorithms by up to a factor of p
◮ hardware can basically be parallelized in two ways:
  ◮ p identical hardware paths processing time-delayed data streams in parallel
  ◮ p interlinked hardware paths processing a stream of data vectors, each holding p data sets, in parallel
◮ the first approach is a straightforward implementation that uses p times the hardware of the non-parallelized design
◮ the second approach is more challenging: it uses p times the operators of the non-parallelized hardware, but only the same number of storage elements
Parallelization Principles: Parallel Streams
Parallelization Principles: Parallel Sets
[Figure: data sampling channels 1 to 5 together form one data sample (a 5-set vector); interlinked parallel processing of samples (vectors)]
Dataflow Graph Representation
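y[n] = a · x[n] + b · x[n − 1] + c · x[n − 2]
◮ block diagram of 3-tap FIR filter
[Figure: block diagram with two 1/z unit delays producing x[n-1] and x[n-2] from x[n], coefficients a, b, c, and output y[n]]
◮ data-flow diagram of 3-tap FIR filter
[Figure: data-flow graph with input x[n], coefficient nodes a, b, c, delay edges D and 2D, and output y[n]]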
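A minimal behavioral sketch of the 3-tap FIR equation above, assuming zero initial register contents. The function name `fir3` and the variables `d1`, `d2` (standing for the two 1/z registers) are illustrative and not taken from the slides.

```python
def fir3(x, a, b, c):
    """3-tap FIR filter: y[n] = a*x[n] + b*x[n-1] + c*x[n-2]."""
    d1 = 0  # register holding x[n-1] (assumed to start at zero)
    d2 = 0  # register holding x[n-2] (assumed to start at zero)
    y = []
    for sample in x:
        y.append(a * sample + b * d1 + c * d2)
        d2, d1 = d1, sample  # clock edge: shift the delay line
    return y


# Feeding a unit impulse returns the coefficients in order.
impulse_response = fir3([1, 0, 0, 0], a=2, b=3, c=5)
```

The impulse response exposes the tap order, which is a quick check that the delays D and 2D sit on the b and c branches.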
Dataflow Graph: Pipelining
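◮ pipelining is done by introducing additional delay elements (registers)
◮ pipelining delay elements can only be placed in feed-forward paths
[Figure: data-flow graph of the 3-tap FIR filter with input x[n], coefficients a, b, c, delay edges D and 2D, and output y[n]]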
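[Figure: pipelined data-flow graph of the 3-tap FIR filter with input x[n], coefficients a, b, c, delay labels D, 3D, and D, and output y[n]]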
Dataflow Graph: Pipelining for Speedup
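◮ pipelining to increase clock frequency
◮ retiming theory (Bellman-Ford or Floyd-Warshall algorithms)
◮ FIR example: the critical path runs through one multiplier (2u) and two adders (1u each), so the clock frequency is 1/(4u)
[Figure: data-flow graph of the 3-tap FIR filter with delay edges D and 2D; the three multipliers are annotated (2u), the two adders (1u)]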
◮ FIR example: frequency is 1/(2u) instead of 1/(4u)
[Figure: pipelined data-flow graph with the delay edges D and 2D plus three additional delays D; multipliers (2u), adders (1u)]
[Figure: retimed data-flow graph with delay edges D and D in place of D and 2D, plus three further delays D; multipliers (2u), adders (1u)]
Unfolding 1
◮ unfolding or loop unrolling
◮ example
y[n] = a · y[n − 9] + x[n]
for i ← 1 to ∞ do
    y[i] ← a · y[i − 9] + x[i]
◮ replacing index n by 2k, and n + 1 by 2k + 1
◮ together, the 2 equations describe the same algorithm
y[2k] = a · y[2k − 9] + x[2k]
y[2k + 1] = a · y[2k − 8] + x[2k + 1]
Unfolding 2
◮ parallelization degree: J-slow
◮ J-slow means that for an input x[kJ + m] the output after a delay is x[(k − 1)J + m]
◮ thus we get:
y[2k] = a · y[2(k − 5) + 1] + x[2k]
y[2k + 1] = a · y[2(k − 4) + 0] + x[2k + 1]
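The two unfolded equations can be checked against the original recursion with a short simulation. This is a sketch under the assumption that y[n] = 0 for n < 0; the function names `original` and `unfolded_2` are illustrative only.

```python
def original(x, a, delay=9):
    """Reference recursion y[n] = a*y[n-delay] + x[n], with y[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        past = y[n - delay] if n >= delay else 0
        y.append(a * past + x[n])
    return y


def unfolded_2(x, a):
    """2-parallel form: iteration k produces y[2k] and y[2k+1] together."""
    assert len(x) % 2 == 0, "sketch assumes an even number of samples"
    y = [0] * len(x)

    def past(i):
        # Samples before the start of the stream are assumed to be zero.
        return y[i] if i >= 0 else 0

    for k in range(len(x) // 2):
        y[2 * k] = a * past(2 * (k - 5) + 1) + x[2 * k]
        y[2 * k + 1] = a * past(2 * (k - 4) + 0) + x[2 * k + 1]
    return y


samples = list(range(1, 41))
same = original(samples, a=2) == unfolded_2(samples, a=2)
```

In the unfolded loop one iteration index k covers two samples, so the 9 sample delays of the original show up as 5 and 4 iteration delays.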
Unfolding 3
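◮ data flow graph of example
◮ algorithm of example (2-slow)
[Figure: data-flow graph of y[n] = a · y[n − 9] + x[n] with input x[n], output y[n], and a feedback path through multiplier a with delay 9D]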
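[Figure: 2-unfolded data-flow graph with inputs x[2k] and x[2k+1], outputs y[2k] and y[2k+1], two multipliers a, and feedback edges with delays 4D and 5D]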
Unfolding Design Procedure
◮ for each node U in the original DFG, draw the J nodes $U_0, U_1, \ldots, U_{J-1}$
◮ for each edge U → V with w delays in the original DFG, draw the J edges $U_i \to V_{(i+w) \bmod J}$ with $\lfloor (i+w)/J \rfloor$ delays for $i = 0, 1, 2, \ldots, J-1$
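[Figure: original DFG with nodes U, V, T and edge delays D, 6D, 5D, next to its 3-unfolded DFG with nodes U0 to U2, V0 to V2, T0 to T2 and edge delays D, D, and 2D (five times)]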
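The design procedure translates almost directly into code. The sketch below assumes the DFG is given as a list of `(source, target, delays)` tuples; this representation and the function name `unfold` are illustrative choices, not part of the slides.

```python
def unfold(edges, J):
    """Unfold a data-flow graph by factor J.

    edges: list of (u, v, w) tuples, an edge u -> v carrying w delays.
    Returns a list of ((u, i), (v, j), delays) tuples for the unfolded graph.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            target_index = (i + w) % J  # which copy of v the edge reaches
            delays = (i + w) // J       # floor((i + w) / J) delays remain
            unfolded.append(((u, i), (v, target_index), delays))
    return unfolded


# The feedback loop of y[n] = a*y[n-9] + x[n], unfolded by 2.
loop = unfold([("y", "y", 9)], J=2)
```

For the 9D loop this yields an edge from copy 0 to copy 1 with 4 delays and an edge from copy 1 to copy 0 with 5 delays, which agrees with the 4D and 5D of the 2-slow example; the total number of delays stays at 9.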
Signal Processing Hardware Rules: "No Control Path"
◮ the 1/z register stores a new input sample at every clock cycle
◮ an if clause asks for controllable registers (with enable)
◮ let's build it in Simulink: hardware rule
[Figure: Simulink Unit Delay block (1/z) and the corresponding hardware register with ports D, clk, Q]
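[Figure: Simulink enabled subsystem with input u, enable E, Unit Delay (1/z), and output y, next to the plain Unit Delay; corresponding hardware: register (D, clk, Q) and enabled register (D, clk, Q, ena)]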
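[Figure: enabled register modeled without an enabled subsystem, using a Unit Delay (1/z) and a Switch (criterion ~= 0) with signals ena, D, and Q; corresponding hardware: register (D, clk, Q) and enabled register (D, clk, Q, ena)]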
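A behavioral sketch of the idea behind the Unit Delay plus Switch model: an enabled register is a plain register whose input multiplexer selects either the new sample or the value already stored. The wiring is assumed from the block names in the figure, and the class names `Register` and `EnabledRegister` are illustrative.

```python
class Register:
    """Plain 1/z register: stores its D input at every clock."""

    def __init__(self, init=0):
        self.q = init

    def clock(self, d):
        self.q = d
        return self.q


class EnabledRegister:
    """Register with enable, built from a plain register and a switch."""

    def __init__(self, init=0):
        self.reg = Register(init)

    def clock(self, d, ena):
        # Switch with criterion "ena != 0": pass D, otherwise feed back Q.
        selected = d if ena != 0 else self.reg.q
        return self.reg.clock(selected)


reg = EnabledRegister()
trace = [reg.clock(d, ena) for d, ena in [(5, 1), (6, 0), (7, 0), (8, 1)]]
```

The inner register is still clocked unconditionally, so the enable lives in the data path as a multiplexer and the DFG keeps only plain delay elements.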
Introduction to OCT: Features
◮ OCT is an optical signal acquisition and processing method
◮ micrometer resolution in 3-D images
◮ optical scattering/reflecting media: biological tissues
◮ interferometric technique with near-infrared laser
◮ reflection is caused by refractive index changes at tissue boundaries
◮ recent OCT technology is frequency-domain OCT, which provides improved SNR and high-speed signal acquisition
Introduction to OCT: Applications
◮ applications in medicine: ophthalmology, ...
◮ depth penetration of 1 to 3 mm (A-scan)
◮ speeds of 100 k depth scans (A-scans) per second at 2048 pixels each, i.e. ≥ 200 MS/s
◮ OCT image of pig eye at HuCE-optoLab (left), OCT setup with Gecko platform at HuCE-microLab (right)
Introduction to OCT: Principle
◮ low coherence source (LCS)
◮ beam splitter (BS)
◮ reference arm (REF) and sample arm (SMP)
◮ diffraction grating (DG) and full-field camera (CAM) as spectrometer (source: wiki)
Introduction to OCT: Signals
◮ top: captured Fourier-domain OCT signals of A-scan
◮ middle: signals after filtering and remapping
◮ bottom: final A-scan image after inverse FFT
[Figure: three plots of intensity (a.u.): top versus wave length [nm], middle versus wave number [1/um], bottom versus depth z [um]]
Signal Processing in OCT: Remapping 1
◮ OCT input signals are captured in the λ (wavelength) domain
◮ they have to be transformed into the k (wave number) domain
◮ this process is called remapping
[Figure: comparison of k (linear) and k = 2*pi/lambda(n), plotted over linear k from 7.25 to 7.75]
Signal Processing in OCT: Remapping 2
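◮ λ (wavelength) from 810 nm to 870 nm
◮ λ equidistant sampling in wavelength: $L_n$
◮ λ equidistant sampling in wave number: $L_m$
[Figure: input signal on the grid $L_{n-1}, L_n, L_{n+1}, L_{n+2}$ (equidistant in L, spacing Lstep) and remapped signal on the grid $L_{m-1}, L_m, L_{m+1}$ (equidistant in k); out(m) lies between the input values valA and valB]
◮ relation is: k = 2π/λ with
\[ L_{step} = \frac{\lambda_{max} - \lambda_{min}}{N}, \qquad L_n = \lambda_{min} + n \cdot L_{step} \]
\[ k_{step} = \frac{\frac{2\pi}{\lambda_{min}} - \frac{2\pi}{\lambda_{max}}}{N}, \qquad L_m = \frac{2\pi}{k_{max} - m \cdot k_{step}} \]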
Signal Processing in OCT: Remapping 3
◮ signal processing with look-up table
◮ no division with iteration
◮ no error due to continuous summing
[Figure: input signal on the grid $L_{n-1}, L_n, L_{n+1}, L_{n+2}$ (equidistant in L, spacing Lstep) and remapped signal on the grid $L_{m-1}, L_m, L_{m+1}$ (equidistant in k); out(m) lies between the input values valA and valB]
\[ out_m = valA + \frac{valB - valA}{L_{step}} \cdot (L_m - L_n) \]
\[ out_m = valA + (valB - valA) \cdot LUT_k(addr) \]
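A sketch of how the look-up table could be precomputed from the remapping relations, so that the data path only needs one subtraction, one multiplication, and one addition per output sample. The function names, the choice N = number of samples minus 1, and the stored input index per table entry are assumptions for illustration; the lab design selects the input samples with a control look-up table instead.

```python
import math


def build_remap_lut(N, lam_min=810.0, lam_max=870.0):
    """Precompute, for every output sample m, the input index n and the
    interpolation weight (L_m - L_n) / L_step."""
    l_step = (lam_max - lam_min) / N
    k_max = 2 * math.pi / lam_min
    k_step = (2 * math.pi / lam_min - 2 * math.pi / lam_max) / N
    lut = []
    for m in range(N + 1):
        l_m = 2 * math.pi / (k_max - m * k_step)  # equidistant in k
        n = int((l_m - lam_min) // l_step)        # left neighbour on the L grid
        n = min(max(n, 0), N - 1)                 # clamp against rounding at the ends
        l_n = lam_min + n * l_step
        lut.append((n, (l_m - l_n) / l_step))
    return lut


def remap(inp, lut):
    """Data path: out_m = valA + (valB - valA) * LUT_k(addr)."""
    return [inp[n] + (inp[n + 1] - inp[n]) * weight for n, weight in lut]


# Check with a signal that equals its own wavelength: linear interpolation is
# then exact, so the remapped samples must equal the wavelengths L_m.
N = 16
lut = build_remap_lut(N)
wavelengths = [810.0 + n * 60.0 / N for n in range(N + 1)]
remapped = remap(wavelengths, lut)
```

All divisions happen once while the table is built, and each weight is computed directly from m rather than by accumulating steps, which matches the two bullet points on division and summing errors.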
Signal Processing in OCT: Control Path
◮ signal processing: data path and control path
◮ a for clause would be perfect
◮ an if clause in the code asks for a control path
◮ control can also be done by look-up tables
[Figure: input signal (equidistant sampling in wavelength) on the grid $L_{n-1}, L_n, L_{n+1}, L_{n+2}$ with spacing Lstep, and remapped signal (equidistant sampling in wave number) on the grid $L_{m-1}, L_m, L_{m+1}, L_{m+2}$; out(m) lies between the input values valA and valB]
Signal Processing in OCT: Datapath and Control Path
i ← 1, j ← 1, m ← 1, adr ← 1
while m ≤ 1024 do
    varA ← inp[i]
    varB ← inp[i + 1]
    if lutCtr(adr − 1) ≠ 2 then
        outm(j) ← varA + (varB − varA) ∗ lutK(adr)
    if lutCtr(adr) = 0 then (increment input and output sample index)
        m ← m + 1
        i ← i + 1
    else if lutCtr(adr) = 3 then (keep, do not load new input sample)
        m ← m + 1
    else if lutCtr(adr) = 2 then (skip, do not generate output sample)
        i ← i + 1
    adr ← adr + 1
Signal Processing in OCT: Simulink
Signal Processing in OCT: "No Control Path"
\[ out_m = valA + (valB - valA) \cdot LUT_k(addr) \]
Signal Processing in OCT: Simplifications in Control Path
\[ out_m = valA + (valB - valA) \cdot LUT_k(addr) \]
Unfolding: OCT Example 1
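◮ OCT data flow graph for interpolation
◮ exercise: design a 4-slow unfolding
◮ simulate it with Matlab/Simulink
[Figure: data-flow graph of the interpolation with input in, two multiplexers with write control wr, subtractor, multiplier, adder, output out, seven delay elements D, and the look-up tables lutK and lutCTR]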
Unfolding: How to Model the FiFo?
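◮ OCT data flow graph for interpolation
◮ exercise: 4-slow unfolding including control path
◮ what about the FiFos?
[Figure: data-flow graph of the interpolation including the control path: input in, multiplexers with write control wr, adders, subtractor, multiplier, comparisons "not 3" and "not 2", look-up tables LUT ctr and LUT k, delay elements (D, 2D, 3D), output out, and two FiFos with push and pop ports whose delays are marked "??"]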
FiFo Model
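◮ DFG model of a FiFo
◮ the FiFo has to be decomposed down to delay elements and combinational logic
[Figure: FiFo with push and pop ports and its DFG model: a dual-port RAM (ports in, out, adrW, adrR) with address logic built from multiplexers (wr), delay elements D, and +1 increments, controlled by push and pop]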
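A behavioral sketch of the decomposition named in the figure: a dual-port RAM plus a write-address and a read-address register, each advanced by 1 when push or pop is active. Full and empty handling is left out, and the class name `RamFifo` and the read-before-write ordering are assumptions for illustration.

```python
class RamFifo:
    """FiFo modeled as dual-port RAM with two address registers."""

    def __init__(self, depth):
        self.ram = [0] * depth
        self.adr_w = 0  # write address register
        self.adr_r = 0  # read address register

    def clock(self, data_in, push, pop):
        # Read port: present the word at the current read address.
        data_out = self.ram[self.adr_r]
        # Write port and address update: mux selects adr + 1 only on push.
        if push:
            self.ram[self.adr_w] = data_in
            self.adr_w = (self.adr_w + 1) % len(self.ram)
        # Read address update: mux selects adr + 1 only on pop.
        if pop:
            self.adr_r = (self.adr_r + 1) % len(self.ram)
        return data_out


fifo = RamFifo(depth=4)
for value in (1, 2, 3):
    fifo.clock(value, push=True, pop=False)
popped = [fifo.clock(0, push=False, pop=True) for _ in range(3)]
```

The only state outside the RAM is the pair of address registers, which are the delay elements that the unfolding procedure then has to distribute.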
Unfolding the FiFo Model
◮ DFG model of a 2-slow unfolding of FiFos
◮ impossible to recompose FiFos from the unfolded graph
◮ shall we start to re-implement all IP cores?
[Figure: 2-unfolded DFG of the FiFo model: two copies of the dual-port RAM (in, out, adrW, adrR) with push/pop multiplexers (wr), +1 increments, and the delay elements D distributed between the two copies]
DFT (DTFS): Discrete Fourier Transform
◮ natural parallelization by FFT algorithms
◮ N-point DFT
\[ X[k] = \sum_{n=0}^{N-1} x[n] \, W_N^{kn}, \qquad k = 0, 1, 2, \ldots, N-1 \]
where $W_N \mathrel{\hat{=}}$ N-th root of unity
\[ W_N = \sqrt[N]{1} = e^{-j(2\pi/N)} \]
◮ inverse transform
\[ x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, W_N^{-kn}, \qquad n = 0, 1, 2, \ldots, N-1 \]
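The two sums can be evaluated directly as a reference model, which is useful for checking any parallelized implementation later. This is a plain O(N²) sketch using only the standard library; the function names `dft` and `idft` are illustrative.

```python
import cmath


def dft(x):
    """N-point DFT: X[k] = sum over n of x[n] * W_N^(k*n)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # W_N = e^{-j 2 pi / N}
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]


def idft(X):
    """Inverse transform: x[n] = (1/N) * sum over k of X[k] * W_N^(-k*n)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]


spectrum = dft([1, 2, 3, 4])
recovered = idft(spectrum)
```

For the input (1, 2, 3, 4) the spectrum is (10, −2+2j, −2, −2−2j) up to rounding, and the inverse transform returns the input.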
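DFT: Matrix Form
◮ denote the vector of input samples by
\[ \mathbf{x} = \bigl( x[0],\ x[1],\ x[2],\ \ldots,\ x[N-1] \bigr)^T \]
◮ denote the vector of spectral samples by
\[ \mathbf{X} = \bigl( X[0],\ X[1],\ X[2],\ \ldots,\ X[N-1] \bigr)^T \]
◮ then the DFT can be written as
\[ \mathbf{X} = \mathrm{DFT}(\mathbf{x}) = \mathbf{F}\mathbf{x} \]
with
\[ \mathbf{F} \mathrel{\hat{=}} \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{2 \cdot 2} & \cdots & W_N^{2 \cdot (N-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & W_N^{N-1} & W_N^{(N-1) \cdot 2} & \cdots & W_N^{(N-1) \cdot (N-1)}
\end{pmatrix} \]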
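DFT: Low-Order Fourier Matrix Examples
◮ for N = 2: $W_N = W_2 = \sqrt[2]{1} = e^{-j2\pi/2} = e^{-j\pi} = -1$
\[ F_2 \mathrel{\hat{=}} \begin{pmatrix} 1 & 1 \\ 1 & W_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]
◮ for N = 4: $W_N = W_4 = \sqrt[4]{1} = e^{-j2\pi/4} = e^{-j\pi/2} = -j$
\[ F_4 \mathrel{\hat{=}} \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & W_4 & W_4^{2} & W_4^{3} \\
1 & W_4^{2} & W_4^{2 \cdot 2} & W_4^{2 \cdot 3} \\
1 & W_4^{3} & W_4^{3 \cdot 2} & W_4^{3 \cdot 3}
\end{pmatrix} = \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & -j & -1 & j \\
1 & -1 & 1 & -1 \\
1 & j & -1 & -j
\end{pmatrix} \]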
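The low-order matrices can be reproduced numerically from the definition of the matrix entries as powers of W_N. A small sketch follows; the function name `fourier_matrix` is illustrative, and the matrix is stored as a list of rows.

```python
import cmath


def fourier_matrix(N):
    """N x N Fourier matrix with entry W_N^(row * column)."""
    W = cmath.exp(-2j * cmath.pi / N)
    return [[W ** (row * col) for col in range(N)] for row in range(N)]


F2 = fourier_matrix(2)
F4 = fourier_matrix(4)

# Closed-form values from the slide, used for comparison.
F4_expected = [
    [1, 1, 1, 1],
    [1, -1j, -1, 1j],
    [1, -1, 1, -1],
    [1, 1j, -1, -1j],
]
max_error = max(
    abs(F4[r][c] - F4_expected[r][c]) for r in range(4) for c in range(4)
)
```

The computed entries differ from the closed-form values only by floating-point rounding, so `max_error` stays close to machine precision.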
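DFT: Matrix Factorization FFT
◮ for example N = 1024:
\[ F_{1024} \mathrel{\hat{=}} \begin{pmatrix} I_{512} & D_{512} \\ I_{512} & -D_{512} \end{pmatrix} \cdot \begin{pmatrix} F_{512} & O \\ O & F_{512} \end{pmatrix} \cdot \begin{pmatrix} \text{even} \\ \text{odd} \end{pmatrix} \]
where
\[ I_{512} \mathrel{\hat{=}} \text{identity matrix} \]
\[ D_{512} \mathrel{\hat{=}} \operatorname{diag}\bigl\{ 1,\ W_{1024},\ W_{1024}^{2},\ \ldots,\ W_{1024}^{511} \bigr\} \]
\[ F_{512} \mathrel{\hat{=}} \text{512-point Fourier matrix} \]
the permutation at the end separates the even and odd parts:
\[ (\downarrow)\, x = \bigl( x[0],\ x[2],\ \ldots \bigr) \]
\[ (\downarrow)(z)\, x = \bigl( x[1],\ x[3],\ \ldots \bigr) \]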
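One stage of this factorization can be checked on a small N. The sketch below applies the three factors from right to left: the even/odd permutation, two half-size DFTs, and the butterfly with the diagonal twiddle matrix D. The function names `dft` and `fft_stage` are illustrative, and the half-size transforms are computed directly instead of recursively to keep the example short.

```python
import cmath


def dft(x):
    """Direct N-point DFT, used both as reference and as half-size transform."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]


def fft_stage(x):
    """One factorization stage: [[I, D], [I, -D]] * diag(F, F) * [even; odd]."""
    N = len(x)
    assert N % 2 == 0, "sketch assumes an even length"
    even, odd = x[0::2], x[1::2]       # permutation into even and odd parts
    E, O = dft(even), dft(odd)         # block-diagonal half-size transforms
    W = cmath.exp(-2j * cmath.pi / N)  # entries of D are 1, W_N, W_N^2, ...
    top = [E[k] + W ** k * O[k] for k in range(N // 2)]     # (I,  D) row
    bottom = [E[k] - W ** k * O[k] for k in range(N // 2)]  # (I, -D) row
    return top + bottom


signal = [3, 1, 4, 1, 5, 9, 2, 6]
staged = fft_stage(signal)
direct = dft(signal)
stage_error = max(abs(a - b) for a, b in zip(staged, direct))
```

The two half-size transforms are independent of each other, which is the natural parallelism of the FFT; applying the same split again inside each half gives the full radix-2 algorithm.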