
Wang Chen, Dr. Miriam Leeser, Dr. Carey Rappaport

wchen@ece.neu.edu, mel@ece.neu.edu, rappaport@ece.neu.edu

Goal

Speed up the 3D Finite-Difference Time-Domain (FDTD) algorithm through the use of Field Programmable Gate Arrays (FPGAs).

We previously implemented the 2D FDTD on the FIREBIRD™/PCI board. The new WILDSTAR™-II PRO/PCI board from Annapolis Micro Systems, Inc. is the target for our 3D FDTD hardware implementation.

Reconfigurable Hardware

Performance Result

FDTD Hardware Design Structure

Current Work

• We quantize the double-precision floating-point data to fixed-point data for the hardware implementation, guided by an analysis of the data.

• The Forward Model simulates the whole electromagnetic space and wave propagation in the model space with Ground Penetrating Radar, dispersive soil, and a rough air-soil surface.

• We compare the relative error between the floating-point Fortran code and the fixed-point C code.

• We choose a suitable bit-width by weighing the trade-off between accuracy and area.
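The bit-width study described in these bullets can be sketched in a few lines. This is a minimal illustration, not the authors' Fortran or C code: the helper names `quantize` and `max_relative_error`, the sample values, and the round-to-nearest policy are all assumptions made for the example.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_relative_error(samples, frac_bits):
    """Largest relative error introduced by quantizing the non-zero samples."""
    return max(
        abs(quantize(x, frac_bits) - x) / abs(x)
        for x in samples
        if x != 0.0
    )


# Hypothetical field values; real data would come from the floating-point model.
samples = [0.731, -0.0042, 1.5e-5, 0.25]

# Sweep the number of bits after the binary point and report the worst-case error.
for bits in (8, 16, 24, 32):
    print(bits, max_relative_error(samples, bits))
```

Small field values dominate the relative error, which is why the number of bits after the binary point, rather than the total width, is the quantity swept here.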

[Figure: FDTD Hardware Design Structure. The PC host (memory in PC) connects over the PCI bus to the WILDSTAR-II PRO/PCI board. On-board memory holds the material parameters, the source data, and the simulated electromagnetic space. The Xilinx Virtex-II Pro FPGA design contains a memory interface, an electric field pipeline module, and a magnetic field pipeline module.]

Abstract

Understanding and predicting electromagnetic behavior is increasingly needed in modern technology. The Finite-Difference Time-Domain (FDTD) method is a powerful computational electromagnetic technique for modelling electromagnetic space. However, the computation of this method is complex and time consuming. Implementing this algorithm in hardware will greatly increase its computational speed and widen its usage.

We present the first fixed-point 3D FDTD FPGA accelerator, which supports a wide range of materials including dispersive media. By analyzing the performance of fixed-point arithmetic in both soil-based media and human tissue media, we choose a fixed-point representation that minimizes the relative error between fixed-point and floating-point results. The FPGA accelerator supports the UPML absorbing boundary conditions, which perform better in dispersive soil and human tissue media than PML boundary conditions.

The 3D FDTD design is implemented on a WildStar-II Pro FPGA board and experimental results are provided. The speedup is due to pipelining, parallelism, use of fixed-point arithmetic, and careful memory architecture design.

Acceleration of the 3D FDTD Algorithm in Fixed-point Arithmetic using Reconfigurable Hardware

[Figure: WILDSTAR™-II PRO/PCI block diagram. Two processing elements, PE 1 and PE 2 (Virtex™-II Pro XC2VP70, 100, or 125), are surrounded by DDR II/QDR II SRAM ports (36 bits each), DDR DRAM, I/O connectors (differential pairs and single ended), switches, and RocketIO, and connect to the host through a PCI bus interface (32/64 bits, 33/66/133 MHz).]

Fixed-point components are faster in hardware designs.

The data range of the FDTD algorithm is well suited to fixed-point representation.

Block Diagram Architecture

[Figure: EM field memories and BlockRAM feed six update pipelines: Hxs, Hys, and Hzs for the magnetic field, and Exs, Eys, and Ezs for the electric field.]

[Figure: Electric Field Updating Pipeline. Updating the E field is drawn for three material cases: free space and lossy dielectric, lossy soil or dispersive media, and perfect conductor. Each datapath reads from memory, passes through multipliers, adders, and subtractors, and writes to memory. The diagram also carries the label "Set all P to zero".]

Features of the WILDSTAR™-II PRO/PCI board:

• Uses two Xilinx® Virtex-II Pro™ XC2VP70 FPGAs (33,088 slices and 5,904 Kb of BlockRAM)

• 12 ports of DDR II SRAM totaling 48 MBytes, and 2 ports of DDR SDRAM totaling 256 MBytes

• 11 GBytes/sec memory bandwidth

FDTD Application Models

• The 3D FDTD Buried Object Detection Forward Model and Breast Cancer Detection Forward Model were developed by Panos Kosmas and Dr. Carey Rappaport of Northeastern University.

3D UPML FDTD Hardware Implementation

• Schneider et al. implement the 1D FDTD in hardware, but the architecture is too simple.

• Durbano et al. implement the 3D FDTD in hardware, but their design uses a floating-point representation, which sacrifices speed for precision.

Memory Interface

This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award Number EEC-9986821).

This work is a part of CenSSIS Research Thrust R3A. Forward modeling of large, complex scattering geometries is too slow for real-time applications or for the iterative solution of inverse problems. Our goal is to develop a hardware/software implementation of forward modeling to achieve real-time inversion.

Research Level 1 Thrust R3A

• Our 3D FDTD implementation achieves a 16-times speedup over a 3.0 GHz PC. It uses a fixed-point representation and supports dispersive media and UPML boundary conditions.

State of the Art

[1] Ryan N. Schneider et al., "Application of FPGA Technology to Accelerate the Finite-Difference Time-Domain (FDTD) Method", Proceedings of the FPGA 2002, pp. 97-105.

[2] J. P. Durbano et al., "FPGA-Based Acceleration of the 3D Finite-Difference Time-Domain Method", Proceedings of the FCCM 2004, pp. 156-163.

Publications Acknowledging NSF Support

[1] W. Chen, P. Kosmas, M. Leeser, C. Rappaport, "An FPGA Implementation of the Two-Dimensional Finite-Difference Time-Domain (FDTD) Algorithm", Proceedings of the 2004 ACM International Symposium on Field-Programmable Gate Arrays, February 2004, Monterey, CA, USA, pp.213-222.

[2] P. Kosmas, Y. Wang, C. Rappaport, "Three-Dimensional FDTD Model for GPR Detection of Objects Buried in Realistic Dispersive Soil", SPIE Aerosense Conference, Orlando, FL, April 2002, pp. 330-338.

[Figure: CenSSIS research plan diagram. Labels: Fundamental Science, Validating TestBEDs, levels L1, L2, L3, research thrusts R1, R2, R3, and systems S1 to S5 (Bio-Med, Enviro-Civil).]

[Figure panels: Breast Cancer Detection Forward Model; Spiral Antenna Model (spiral antenna floorplan and FDTD simulated 2D space); geometry views on X, Y, Z axes showing a mine or object, a transmitting antenna, and a receiving antenna.]

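[Flowchart: FDTD algorithm. Initialization: initialize the parameters of the model space and the time step, and load all the EM space data into memory. Each time step contains the blocks Excitation, Calculate H Field, Calculate E Field, and Exterior Boundary Conditions. Decision "Time over?": if yes, end; if no, go to the next time step (n = n + 1).]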
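The time-stepping loop in the flowchart can be illustrated with a much smaller problem. The sketch below is a 1D free-space FDTD in normalized units, not the poster's 3D UPML design: the function name `run_fdtd_1d`, the grid size, the Gaussian source, the Courant number of 0.5, and the order of the blocks inside the loop are all assumptions, and simple reflecting ends stand in for the UPML boundary conditions.

```python
import math


def run_fdtd_1d(num_cells=200, num_steps=300, source_cell=100):
    """1D free-space FDTD in normalized units with a Courant number of 0.5."""
    # Initialization: all field samples start at zero.
    ez = [0.0] * num_cells  # electric field samples
    hy = [0.0] * num_cells  # magnetic field samples

    for n in range(num_steps):  # "Time over?" is the loop condition
        # Calculate H field from the spatial difference of E.
        for k in range(num_cells - 1):
            hy[k] += 0.5 * (ez[k + 1] - ez[k])

        # Calculate E field from the spatial difference of H.
        for k in range(1, num_cells):
            ez[k] += 0.5 * (hy[k] - hy[k - 1])

        # Excitation: additive Gaussian pulse at the source cell.
        ez[source_cell] += math.exp(-(((n - 30) / 10.0) ** 2))

        # Exterior boundary: ez[0] and hy[-1] are never updated, so both
        # ends reflect. A real design would apply UPML here instead.

    return ez, hy


ez, hy = run_fdtd_1d()
print(max(abs(v) for v in ez))
```

Each inner loop touches every cell once per time step with only additions and multiplications, which is the structure that makes the algorithm a good fit for deep hardware pipelines.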

Buried Object Detection Forward Model

[Figure: Error Analysis on Fixed-point Representation. Relative error between fixed-point and floating-point representation (vertical axis, 0.000% to 3.500%) versus bit-width after the binary point (horizontal axis, 29 to 35), plotted for the electric field value Ex and the magnetic field values Hy and Hz.]

[Figure panels: Geometry Map; Simulated Model Space; Spiral Antenna Floorplan.]

Breast Cancer Detection Forward Model: Accurate computational modelling of microwaves in human tissue with the FDTD method is very helpful for breast cancer detection research. This model uses the modified 3D FDTD algorithm and the modified UPML ABC for better performance in dispersive human tissue.

Spiral Antenna Model: The FDTD method is used to simulate the radiation of the Archimedean spiral antenna.

Performance Result

[Figure: bar chart of executing time in seconds (axis from 0 to 50) for two runs, A and B, of the 3D UPML FDTD algorithm. Model space: 50*50*50 cells, iterated for 500 time steps. Total task: 62.5 million nodes.]

A: Software, floating-point Fortran code on a 3.0 GHz PC, approximately 49 s

B: Hardware, WildStar-II Pro, design running at 90 MHz, approximately 2.98 s

16X speedup
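The reported figures can be cross-checked with a little arithmetic. The inputs below are taken from the poster; the derived throughput and updates-per-cycle values are not reported there and are only a rough reading, since the poster does not say whether the 2.98 s includes data transfer. Variable names are chosen for this example.

```python
cells_per_axis = 50
time_steps = 500
software_seconds = 49.0   # floating-point Fortran code on the 3.0 GHz PC
hardware_seconds = 2.98   # WildStar-II Pro design
clock_hz = 90e6           # hardware design clock

# Total node updates: every cell is updated once per time step.
total_node_updates = cells_per_axis ** 3 * time_steps
speedup = software_seconds / hardware_seconds

# Derived values (assumptions, not poster results).
hardware_rate = total_node_updates / hardware_seconds  # node updates per second
updates_per_cycle = hardware_rate / clock_hz           # node updates per clock

print(total_node_updates)   # 62500000
print(round(speedup, 1))    # 16.4
print(hardware_rate, updates_per_cycle)
```

The node count matches the "62.5 Million Nodes" total task, and the ratio of the two run times matches the quoted 16X speedup.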

• Optimize the 3D FDTD implementation

• Two-FPGA parallel computing on board

• 3D UPML FDTD accelerator for general FDTD problems

• More generic hardware design

• Support for more complex sources

• Better user interface

[Figure: Memory Interface. On-board SRAM serves as a level 2 cache and on-chip BlockRAM as level 1 input and output caches. One input cache holds 4 x 3 rows of data and reads approximately 2 rows while 1 row is calculated. Another holds 4 rows of data, reading 1 row while 1 row is calculated and 1 row is written out at the same time. E and H values flow in through the input caches; new H and new E, H values flow out through the output caches.]
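The row-buffering idea in the memory interface figure can be mimicked in software with a rolling buffer. This is only a sequential sketch under stated assumptions: the function name `stream_rows`, the 4-row capacity, and the row-to-row difference used as the "calculation" are illustrative choices, and real hardware would overlap the read, calculate, and write stages rather than run them in turn.

```python
from collections import deque


def stream_rows(rows, capacity=4):
    """Stream rows through a small buffer and emit row-to-row differences."""
    buffer = deque(maxlen=capacity)  # stands in for the BlockRAM input cache
    max_occupancy = 0
    results = []

    for row in rows:  # each iteration models reading one row from SRAM
        buffer.append(row)  # the oldest row is evicted once the buffer is full
        max_occupancy = max(max_occupancy, len(buffer))

        if len(buffer) >= 2:
            previous, current = buffer[-2], buffer[-1]
            # Calculate one output row from neighbouring rows and write it out.
            results.append([c - p for p, c in zip(previous, current)])

    return results, max_occupancy


rows = [[i * 10 + j for j in range(3)] for i in range(6)]
results, max_occupancy = stream_rows(rows)
print(len(results), max_occupancy)
```

Because only a few neighbouring rows are needed at any moment, the working set stays bounded by the buffer capacity no matter how many rows are streamed.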