Mse: Hardware Algorithms Parallelization

Marcel Jacomet, Josef Goette

Bern University of Applied Sciences, Bfh-Ti HuCE-microLab, Biel/Bienne

[email protected]

huce.ti.bfh.ch/microlab

October 11, 2017

Introduction

Parallelization

Unfolding

Hardware Rules

OCT Example
◮ Introduction to OCT

Parallelization at OCT Example
◮ Data-Path Unfolding
◮ FiFo Unfolding
◮ DFT Unfolding

Textbooks

◮ VLSI Digital Signal Processing Systems: Design and Implementation, Keshab K. Parhi, John Wiley & Sons, ISBN 0-471-24186-5, 1999, USD 135

◮ OCT texts discussing the lab example can be found on the web

Parallelization Principles 1

◮ parallelization of degree p speeds up hardware algorithms by up to a factor of p

◮ hardware can basically be parallelized in two ways:
  ◮ p identical hardware paths processing time-delayed data streams in parallel
  ◮ p interlinked hardware paths processing a stream of data vectors, each holding p data sets, in parallel

◮ the first approach is a straightforward implementation that uses p times the hardware of the non-parallelized design

◮ the second approach is more challenging: it uses p times the operators of the non-parallelized hardware, but only the same number of storage elements

Parallelization Principles: Parallel Streams

Parallelization Principles: Parallel Sets

[Figure: data sampling channels 1 to 5 together deliver one data sample (5-set vector); interlinked parallel processing of samples (vectors)]

Dataflow Graph Representation

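y[n] = a · x[n] + b · x[n − 1] + c · x[n − 2]

◮ block diagram of 3-tap FIR filter

[Figure: x[n] passes through two unit delays (1/z) to give x[n-1] and x[n-2]; the three signals are weighted by a, b, c and summed to y[n]]

◮ data-flow diagram of 3-tap FIR filter

[Figure: nodes x[n], a, b, c and y[n]; delay labels D and 2D on the edges]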
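The difference equation above can be simulated directly. The following is a minimal Python sketch, not part of the original slides; the function name `fir3` and the zero initial state of the two registers are assumptions made for illustration.

```python
def fir3(x, a, b, c):
    """3-tap FIR filter y[n] = a*x[n] + b*x[n-1] + c*x[n-2].

    The two unit delays (1/z) are modelled as registers d1 and d2,
    both assumed to start at zero.
    """
    d1 = 0  # register holding x[n-1]
    d2 = 0  # register holding x[n-2]
    y = []
    for sample in x:
        y.append(a * sample + b * d1 + c * d2)
        d2, d1 = d1, sample  # clock edge: shift the delay line
    return y


# Feeding a unit impulse returns the coefficients a, b, c in order.
impulse_response = fir3([1, 0, 0, 0], a=2, b=3, c=5)
```

The impulse response makes the role of the delays visible: each coefficient appears one clock cycle after the previous one.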

Dataflow Graph: Pipelining

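◮ pipelining is done by introducing additional delay elements (registers)

◮ pipelining delay elements can only be placed in feed-forward paths

[Figure: data-flow graph of the 3-tap FIR filter before pipelining; nodes x[n], a, b, c, y[n]; delay labels D and 2D]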

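[Figure: data-flow graph of the 3-tap FIR filter after pipelining with additional delay elements; nodes x[n], a, b, c, y[n]; delay labels D, 3D and D]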

Dataflow Graph: Pipelining for Speedup

◮ pipelining to increase the clock frequency

◮ retiming theory (Bellman-Ford or Floyd-Warshall algorithms)

◮ FIR example: frequency 1/(4u)

[Figure: data-flow graph of the 3-tap FIR filter with delay labels D and 2D; computation times (2u) at a, b, c and (1u) at the two adders]

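◮ FIR example after pipelining: frequency is 1/(2u) instead of 1/(4u)

[Figure: pipelined data-flow graph; delay labels D and 2D plus three additional delay elements D; computation times (2u) at a, b, c and (1u) at the two adders]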
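The two clock frequencies can be reproduced by computing the critical path, i.e. the longest chain of operators that is not interrupted by a delay element. The sketch below is an illustration only: it assumes that the labels in the figure mean 2u per multiplier and 1u per adder, that the adders form a chain, and that the pipeline registers sit between the multipliers and the adders. The node names are hypothetical.

```python
from functools import lru_cache


def critical_path(times, edges):
    """Longest computation time along any path that crosses no delay element.

    times: {node: computation time in units u}
    edges: list of (src, dst, delays); an edge with delays > 0 ends the path.
    Assumes the zero-delay part of the graph has no cycles.
    """
    succ = {node: [] for node in times}
    for src, dst, delays in edges:
        if delays == 0:
            succ[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        return times[node] + max((longest_from(s) for s in succ[node]), default=0)

    return max(longest_from(node) for node in times)


times = {"mul_a": 2, "mul_b": 2, "mul_c": 2, "add_1": 1, "add_2": 1}

# Without pipelining: multiplier -> adder -> adder.
plain = [("mul_a", "add_1", 0), ("mul_b", "add_1", 0),
         ("add_1", "add_2", 0), ("mul_c", "add_2", 0)]

# Assumed pipelining cut: one register on every multiplier output.
pipelined = [("mul_a", "add_1", 1), ("mul_b", "add_1", 1),
             ("add_1", "add_2", 0), ("mul_c", "add_2", 1)]

t_plain = critical_path(times, plain)          # 2 + 1 + 1
t_pipelined = critical_path(times, pipelined)  # max(2, 1 + 1)
```

Under these assumptions the critical path drops from 4u to 2u, which matches the frequencies 1/(4u) and 1/(2u) quoted on the slides.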

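[Figure: final build of the pipelined data-flow graph; delay labels D and D plus three additional delay elements D; computation times (2u) at a, b, c and (1u) at the two adders]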

Unfolding 1

◮ unfolding or loop unrolling

◮ example

y[n] = a · y[n − 9] + x[n]

for i ← 1 to ∞ do
    y[i] ← a · y[i − 9] + x[i]

◮ replacing index n by 2k and n + 1 by 2k + 1

◮ together, the 2 equations describe the same algorithm

y[2k] = a · y[2k − 9] + x[2k]

y[2k + 1] = a · y[2k − 8] + x[2k + 1]

Unfolding 2

◮ parallelization degree: J-slow

◮ J-slow means that for an input x[kJ + m] the output after a delay is x[(k − 1)J + m]

◮ thus we get:

y[2k] = a · y[2(k − 5) + 1] + x[2k]

y[2k + 1] = a · y[2(k − 4) + 0] + x[2k + 1]
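That the two unfolded equations compute the same sequence as the original recursion can be checked numerically. This is a small Python sketch; the function names and the assumption y[n] = 0 for n < 0 are illustrative and not taken from the slides.

```python
def serial(x, a):
    """Original recursion y[n] = a*y[n-9] + x[n], with y[n] = 0 for n < 0."""
    y = []
    for n, xn in enumerate(x):
        y.append(a * (y[n - 9] if n >= 9 else 0) + xn)
    return y


def unfolded2(x, a):
    """2-unfolded form: y[2k] and y[2k+1] are produced in the same iteration k."""
    assert len(x) % 2 == 0
    even, odd = [], []  # even[k] = y[2k], odd[k] = y[2k+1]
    for k in range(len(x) // 2):
        # y[2k]   = a*y[2(k-5)+1] + x[2k]    -> odd output, 5 iterations back
        y_even = a * (odd[k - 5] if k >= 5 else 0) + x[2 * k]
        # y[2k+1] = a*y[2(k-4)+0] + x[2k+1]  -> even output, 4 iterations back
        y_odd = a * (even[k - 4] if k >= 4 else 0) + x[2 * k + 1]
        even.append(y_even)
        odd.append(y_odd)
    # interleave the two output streams back into one sequence
    return [v for pair in zip(even, odd) for v in pair]


x = list(range(1, 41))
same = serial(x, 2) == unfolded2(x, 2)
```

Each loop iteration of `unfolded2` corresponds to one clock cycle of the 2-parallel hardware, so the same 40 outputs are produced in 20 iterations.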

Unfolding 3

◮ data flow graph of example

◮ algorithm of example (2-slow)

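[Figure: original data-flow graph: x[n] is added to a · y[n − 9] to give y[n]; the feedback edge carries 9 delays (9D)]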

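[Figure: 2-unfolded data-flow graph with inputs x[2k], x[2k+1] and outputs y[2k], y[2k+1]; the two feedback edges through the multipliers a carry 4 delays (4D) and 5 delays (5D)]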

Unfolding Design Procedure

◮ for each node U in the original DFG, draw the J nodes $U_0, U_1, \ldots, U_{J-1}$

◮ for each edge U → V with w delays in the original DFG, draw the J edges $U_i \to V_{(i+w) \bmod J}$ with $\lfloor (i+w)/J \rfloor$ delays for $i = 0, 1, 2, \ldots, J-1$

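[Figure: example with J = 3: the original graph with nodes U, V, T and edge delays D, 6D and 5D is unfolded into the nodes U0, U1, U2, V0, V1, V2, T0, T1, T2 with edge delays D, D and five times 2D]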
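The edge rule of the unfolding design procedure is mechanical enough to write down in a few lines. The sketch below is illustrative: the function name `unfold` and the tuple representation of nodes and edges are assumptions, not notation from the slides.

```python
def unfold(edges, J):
    """Apply the unfolding rule to every edge of a data-flow graph.

    edges: list of (U, V, w) meaning an edge U -> V with w delays.
    Returns a list of ((U, i), (V, (i + w) % J), (i + w) // J),
    i.e. the edge U_i -> V_((i+w) mod J) with floor((i+w)/J) delays.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


# The recursion y[n] = a*y[n-9] + x[n] has one feedback edge with 9 delays.
feedback = unfold([("y", "y", 9)], J=2)

# Edges with 1, 6 and 5 delays unfolded by J = 3.
delays_j3 = sorted(d for _, _, d in unfold([("A", "B", 1), ("B", "C", 6), ("C", "A", 5)], J=3))
```

For the feedback edge with 9 delays and J = 2 this yields one edge with 4 delays and one with 5 delays, in line with the 2-unfolded example. The rule also preserves the total delay count of each original edge: the J unfolded edges together carry exactly w delays.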

Signal Processing Hardware Rules: "No Control Path"

◮ a 1/z register stores a new input sample at every clock cycle

◮ an if clause asks for controllable registers (with enable)

◮ let's build it in Simulink: hardware rule

[Figure: Simulink Unit Delay block (1/z) next to the equivalent register with inputs D, clk and output Q]

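[Figure: Simulink enabled subsystem (input u, enable E, Unit Delay 1/z, output y) next to the equivalent enabled register with inputs D, clk, ena and output Q; the plain Unit Delay and register are shown for comparison]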

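[Figure: the enabled register (D, clk, ena, Q) rebuilt from a plain Unit Delay (1/z) and a Switch block (criterion ~=0) controlled by ena; the plain register and the enabled register are shown for comparison]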
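The hardware rule can be mimicked in software: a register that loads on every clock, plus a multiplexer in front of it, behaves like a register with enable. The following Python sketch rests on an assumption about the figure, namely that the Switch selects the new input when ena is non-zero and the stored value otherwise. The class and function names are hypothetical.

```python
class Register:
    """Plain D register (Unit Delay 1/z): stores its input on every clock."""

    def __init__(self, init=0):
        self.q = init

    def clock(self, d):
        self.q = d
        return self.q


def enabled_register_step(reg, d, ena):
    """One clock cycle of an enabled register built without an enable pin.

    A 2:1 mux (the Switch with criterion ~=0) feeds the register either
    the new input d (ena != 0) or its own output q (ena == 0).
    """
    mux_out = d if ena != 0 else reg.q
    return reg.clock(mux_out)


reg = Register()
trace = [enabled_register_step(reg, d, ena)
         for d, ena in [(5, 1), (7, 0), (9, 1), (11, 0)]]
```

The register itself never skips a clock; holding a value is expressed purely in the data path, which is the point of the "no control path" rule.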

Introduction to OCT: Features

◮ Oct is an optical signal acquisition and processing method

◮ micro-meter resolution in 3-D images

◮ optical scattering/reflecting media: biological tissues

◮ interferometric technique with near infrared laser

◮ reflection is caused by refractive index changes at tissue boundaries

◮ recent OCT technology is frequency-domain OCT, which provides a better SNR and high-speed signal acquisition

Introduction to OCT: Applications

◮ applications in medicine: ophthalmology, ...

◮ depth penetration of 1 to 3 mm (A-scan)

◮ speeds of 100 k depth scans per second at 2048 pixels each, i.e. ≥ 200 MS/s

◮ [Figure: OCT image of a pig eye at HuCE-optoLab (left), OCT setup with Gecko platform at HuCE-microLab (right)]

Introduction to OCT: Principle

◮ low coherence source (Lcs)

◮ beam splitter (Bs)

◮ reference (Ref) and sample arm (Smp)

◮ diffraction grating (Dg) and full-field camera (Cam) as spectrometer (source: wiki)

Introduction to OCT: Signals

◮ top: captured Fourier-domain OCT signals of an A-scan

◮ middle: signals after filtering and remapping

◮ bottom: final A-scan image after inverse FFT

[Figure: three plots of intensity (a.u.): top versus wave length [nm], middle versus wave number [1/um], bottom versus depth z [um]]

Signal Processing in OCT: Remapping 1

◮ Oct input signals are captured in λ (wave length) domain

◮ they have to be transformed into k (wave number) domain

◮ this process is called remapping

[Figure: comparison of k (linear) and k = 2*pi/lambda(n), plotted over linear k from 7.25 to 7.75]

Signal Processing in OCT: Remapping 2

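◮ λ (wave length) from 810 nm to 870 nm

◮ λ equidistant sampling in wave length: Ln

◮ λ equidistant sampling in wave number: Lm

[Figure: input signal on the grid Ln-1, Ln, Ln+1, Ln+2 (equidistant in L, spacing Lstep) and remapped signal on the grid Lm-1, Lm, Lm+1 (equidistant in k); out(m) is obtained from the neighbouring input values valA and valB]

◮ relation is: k = 2π/λ with

$$L_{step} = \frac{\lambda_{max} - \lambda_{min}}{N}, \qquad L_n = \lambda_{min} + n \cdot L_{step}$$

$$k_{step} = \frac{\frac{2\pi}{\lambda_{min}} - \frac{2\pi}{\lambda_{max}}}{N}, \qquad L_m = \frac{2\pi}{k_{max} - m \cdot k_{step}}$$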
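The two sampling grids follow directly from these relations. The sketch below evaluates them in Python for the 810 nm to 870 nm range; the function name, the choice N = 8 and the index range 0 to N (so that both end points are included) are assumptions made for illustration, and $k_{max} = 2\pi/\lambda_{min}$ is taken from the relation k = 2π/λ.

```python
import math


def remap_grids(lam_min, lam_max, N):
    """Return (L_n, L_m): wave lengths equidistant in lambda and equidistant in k."""
    L_step = (lam_max - lam_min) / N
    k_max = 2 * math.pi / lam_min
    k_step = (2 * math.pi / lam_min - 2 * math.pi / lam_max) / N
    # assumption: n and m run from 0 to N inclusive
    L_n = [lam_min + n * L_step for n in range(N + 1)]
    L_m = [2 * math.pi / (k_max - m * k_step) for m in range(N + 1)]
    return L_n, L_m


L_n, L_m = remap_grids(810.0, 870.0, 8)
```

Both grids share the same end points, but in between the k-equidistant wave lengths lie below the λ-equidistant ones; the middle point of `L_m` is the harmonic mean of 810 and 870 rather than 840.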

Signal Processing in OCT: Remapping 3

◮ signal processing with look-up table

◮ no division with iteration

◮ no error due to continuous summing

[Figure: input signal on the grid Ln-1, Ln, Ln+1, Ln+2 (equidistant in L, spacing Lstep) and remapped signal on the grid Lm-1, Lm, Lm+1 (equidistant in k), with valA, valB and out(m)]

$$out_m = valA + \frac{valB - valA}{L_{step}} \cdot (L_m - L_n)$$

$$out_m = valA + (valB - valA) \cdot LUT_k(addr)$$
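A possible reading of the look-up-table idea is that the factor $(L_m - L_n)/L_{step}$ is computed once, offline, so that the run-time data path only needs one subtraction, one multiplication and one addition per output sample. The Python sketch below follows that reading; the table layout (one input index and one fraction per output sample), the index range 0 to N and all names are assumptions, not the lab's actual implementation.

```python
import math


def build_lut(lam_min, lam_max, N):
    """Offline step: for every output index m, store the input index n of valA
    and the fraction (L_m - L_n) / L_step used as LUT_k entry."""
    L_step = (lam_max - lam_min) / N
    k_max = 2 * math.pi / lam_min
    k_step = (2 * math.pi / lam_min - 2 * math.pi / lam_max) / N
    idx, lut_k = [], []
    for m in range(N + 1):
        L_m = 2 * math.pi / (k_max - m * k_step)
        pos = (L_m - lam_min) / L_step   # position of L_m on the input grid
        n = min(int(pos), N - 1)         # left neighbour, clamped at the end
        idx.append(n)
        lut_k.append(pos - n)
    return idx, lut_k


def remap(inp, idx, lut_k):
    """Run-time step: out_m = valA + (valB - valA) * LUT_k, no division."""
    return [inp[n] + (inp[n + 1] - inp[n]) * f for n, f in zip(idx, lut_k)]


N = 64
idx, lut_k = build_lut(810.0, 870.0, N)
ramp = [810.0 + n * 60.0 / N for n in range(N + 1)]  # test signal: the wave length itself
out = remap(ramp, idx, lut_k)
```

Using the wave length itself as the input signal is a convenient check: linear interpolation is exact for a ramp, so the remapped output must reproduce the k-equidistant wave lengths $L_m$.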

Signal Processing in OCT: Control Path

◮ signal processing: data path and control path

◮ for clause would be perfect

◮ if clause in code asks for control path

◮ control can also be done by look-up tables

[Figure: input signal (equidistant sampling in wave length) on the grid Ln-1, Ln, Ln+1, Ln+2 with spacing Lstep, and remapped signal (equidistant sampling in wave number) on the grid Lm-1, Lm, Lm+1, Lm+2, with valA, valB and out(m)]

Signal Processing in OCT: Datapath and Control Path

i ← 1, j ← 1, m ← 1, adr ← 1
while m ≤ 1024 do
    varA ← inp[i]
    varB ← inp[i + 1]
    if lutCtr(adr − 1) ≠ 2 then
        outm(j) ← varA + (varB − varA) * lutK(adr)
    if lutCtr(adr) = 0 then        ▷ increment input and output sample index
        m ← m + 1
        i ← i + 1
    else if lutCtr(adr) = 3 then   ▷ keep, do not load new input sample
        m ← m + 1
    else if lutCtr(adr) = 2 then   ▷ skip, do not generate output sample
        i ← i + 1
    adr ← adr + 1

Signal Processing in OCT: Simulink

Signal Processing in OCT: "No Control Path"

$$out_m = valA + (valB - valA) \cdot LUT_k(addr)$$

Signal Processing in OCT: Simplifications in Control Path

$$out_m = valA + (valB - valA) \cdot LUT_k(addr)$$

Unfolding: OCT Example 1

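◮ OCT data flow graph for interpolation

◮ exercise: design a 4-slow unfolding

◮ simulate it with Matlab/Simulink

[Figure: data flow graph of the interpolation data path: input in, two Mux blocks with write enable (wr), subtractor, multiplier, adder, output out, delay elements D, and the look-up tables lutK and lutCTR]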

Unfolding: How to Model the FiFo?

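◮ OCT data flow graph for interpolation

◮ exercise: 4-slow unfolding including control path

◮ what about the FiFos?

[Figure: interpolation data flow graph including the control path: input in, Mux blocks with write enable (wr), subtractor, multiplier, adders, output out, look-up tables LUT ctr and LUT k, comparisons "not 3" and "not 2", delay elements D, 2D and 3D, and two FiFos with push and pop inputs whose model is still open (??)]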

FiFo Model

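◮ DFG model of a FiFo

◮ the FiFo has to be decomposed down to delay elements and combinational logic

[Figure: FiFo with push and pop, modelled as a dual-port RAM (in, out, adrW, adrR) whose write and read addresses come from registers (D) with an increment by 1 and a Mux with write enable (wr) driven by push and pop]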
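The decomposition can be tried out in software before drawing the data-flow graph. The Python sketch below is one reading of the figure: a dual-port RAM plus two address registers that either hold their value or increment by 1, selected by push and pop. The class name, the wrap-around addressing and the missing full/empty handling are assumptions for illustration.

```python
class FifoModel:
    """FiFo decomposed into a dual-port RAM and two enabled address registers."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [0] * depth  # dual-port RAM
        self.adr_w = 0          # write address register
        self.adr_r = 0          # read address register

    def clock(self, data_in, push, pop):
        """One clock cycle; returns the value on the RAM read port."""
        out = self.ram[self.adr_r]
        if push:
            self.ram[self.adr_w] = data_in
        # Mux in front of each address register: address + 1 when enabled, else hold.
        self.adr_w = (self.adr_w + 1) % self.depth if push else self.adr_w
        self.adr_r = (self.adr_r + 1) % self.depth if pop else self.adr_r
        return out


fifo = FifoModel(depth=4)
fifo.clock(10, push=1, pop=0)
fifo.clock(20, push=1, pop=0)
first = fifo.clock(0, push=0, pop=1)
second = fifo.clock(30, push=1, pop=1)
third = fifo.clock(0, push=0, pop=1)
```

All state lives in the RAM and the two address registers, so the model consists only of delay elements and combinational logic, which is the form the unfolding procedure needs.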

Unfolding the FiFo Model

◮ DFG model of a 2-slow unfolding of FiFos

◮ impossible to compose FiFos again

◮ shall we start to re-implement all IP cores?

[Figure: 2-unfolded FiFo model with two dual-port RAMs (in, out, adrW, adrR), Mux blocks with write enable (wr), push and pop inputs, increments by 1, and delay elements D]

Dft (Dtfs): Discrete Fourier Transform

◮ natural parallelization by FFT algorithms

◮ N-point DFT

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad k = 0, 1, 2, \ldots, N-1$$

where $W_N \,\hat{=}\,$ Nth root of unity

$$W_N = \sqrt[N]{1} = e^{-j(2\pi/N)}$$

◮ inverse transform

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \qquad n = 0, 1, 2, \ldots, N-1$$
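Both sums translate almost literally into code. The following Python sketch uses only the standard library; the function names `dft` and `idft` are illustrative, and the direct evaluation costs on the order of N² operations, so it serves as a reference rather than as an FFT.

```python
import cmath


def dft(x):
    """N-point DFT: X[k] = sum_n x[n] * W_N^(k*n) with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]


def idft(X):
    """Inverse transform: x[n] = (1/N) * sum_k X[k] * W_N^(-k*n)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]


x = [1, 2, 3, 4]
X = dft(x)
x_back = idft(X)
```

Applying `idft` to the result of `dft` returns the input up to floating-point rounding, which is a quick sanity check on the sign convention of $W_N$.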

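Dft: Matrix Form

◮ denote the vector of input samples by

$$\mathbf{x} = \bigl( x[0],\, x[1],\, x[2],\, \ldots,\, x[N-1] \bigr)^T$$

◮ denote the vector of spectral samples by

$$\mathbf{X} = \bigl( X[0],\, X[1],\, X[2],\, \ldots,\, X[N-1] \bigr)^T$$

◮ then the DFT can be written as

$$\mathbf{X} = \mathrm{DFT}(\mathbf{x}) = \mathbf{F}\,\mathbf{x}$$

with

$$\mathbf{F} \,\hat{=}\, \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\ 1 & W_N^{2} & W_N^{2\cdot 2} & \cdots & W_N^{2\cdot(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W_N^{N-1} & W_N^{(N-1)\cdot 2} & \cdots & W_N^{(N-1)\cdot(N-1)} \end{pmatrix}$$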

Dft: Low-Order Fourier Matrix Examples

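◮ for N = 2: $W_N = W_2 = \sqrt[2]{1} = e^{-j2\pi/2} = e^{-j\pi} = -1$

$$F_2 \,\hat{=}\, \begin{pmatrix} 1 & 1 \\ 1 & W_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

◮ for N = 4: $W_N = W_4 = \sqrt[4]{1} = e^{-j2\pi/4} = e^{-j\pi/2} = -j$

$$F_4 \,\hat{=}\, \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & W_4 & W_4^{2} & W_4^{3} \\ 1 & W_4^{2} & W_4^{2\cdot 2} & W_4^{2\cdot 3} \\ 1 & W_4^{3} & W_4^{3\cdot 2} & W_4^{3\cdot 3} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix}$$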
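The low-order matrices can be generated and checked numerically. This is a short Python sketch with illustrative names (`fourier_matrix`, `matvec`); it builds the matrix entry by entry as $W_N^{r \cdot c}$ and applies it to a sample vector.

```python
import cmath


def fourier_matrix(N):
    """N x N Fourier matrix with entries W_N^(row*col), W_N = exp(-j*2*pi/N)."""
    W = cmath.exp(-2j * cmath.pi / N)
    return [[W ** (r * c) for c in range(N)] for r in range(N)]


def matvec(F, x):
    """Matrix-vector product X = F x."""
    return [sum(f * v for f, v in zip(row, x)) for row in F]


F2 = fourier_matrix(2)
F4 = fourier_matrix(4)
X = matvec(F4, [1, 2, 3, 4])
```

Up to rounding, `F4` contains only the values 1, -1, j and -j, so a 4-point DFT needs no real multiplications at all.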

Dft: Matrix Factorization Fft

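◮ for example N = 1024:

$$F_{1024} \,\hat{=}\, \begin{pmatrix} I_{512} & D_{512} \\ I_{512} & -D_{512} \end{pmatrix} \cdot \begin{pmatrix} F_{512} & O \\ O & F_{512} \end{pmatrix} \cdot \begin{pmatrix} \text{even} \\ \text{odd} \end{pmatrix}$$

where $I_{512} \,\hat{=}\,$ identity matrix

$D_{512} \,\hat{=}\, \mathrm{diag}\{1, W_{1024}, W_{1024}^{2}, \ldots, W_{1024}^{511}\}$

$F_{512} \,\hat{=}\,$ 512-point Fourier matrix

permutation at end separates even and odd part:

$(\downarrow)\, \mathbf{x} = \bigl( x[0],\, x[2],\, \ldots \bigr)$

$(\downarrow)(z)\, \mathbf{x} = \bigl( x[1],\, x[3],\, \ldots \bigr)$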
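One step of this factorization can be verified on a small example. The Python sketch below applies the three factors from right to left for N = 8 instead of N = 1024: the even/odd permutation, two half-size DFTs, and the butterfly with the diagonal matrix D. The function names are illustrative, and the half-size transforms are evaluated directly rather than recursively.

```python
import cmath


def dft(x):
    """Direct N-point DFT, used both as F_{N/2} and as the reference result."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]


def fft_step(x):
    """One factorization step F_N x = [[I, D], [I, -D]] [[F, O], [O, F]] [even; odd]."""
    N = len(x)
    half = N // 2
    even = dft(x[0::2])  # F_{N/2} applied to x[0], x[2], ...
    odd = dft(x[1::2])   # F_{N/2} applied to x[1], x[3], ...
    W = cmath.exp(-2j * cmath.pi / N)
    d_odd = [W ** k * odd[k] for k in range(half)]        # D_{N/2} times odd part
    top = [even[k] + d_odd[k] for k in range(half)]       # rows (I,  D)
    bottom = [even[k] - d_odd[k] for k in range(half)]    # rows (I, -D)
    return top + bottom


x = [3, 1, 4, 1, 5, 9, 2, 6]
max_error = max(abs(a - b) for a, b in zip(fft_step(x), dft(x)))
```

The two half-size transforms are independent of each other, which is where the natural parallelism of the FFT comes from; applying the same step recursively to each half gives the full radix-2 algorithm.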