Wang Chen, Dr. Miriam Leeser, Dr. Carey Rappaport
Goal
Speed up the 3D Finite-Difference Time-Domain (FDTD) algorithm through the use of Field-Programmable Gate Arrays (FPGAs).
We previously implemented the 2D FDTD algorithm on the FIREBIRD™/PCI board. The new WILDSTAR™-II PRO/PCI board from Annapolis Micro Systems, Inc. is now the target for our 3D FDTD hardware implementation.
Reconfigurable Hardware
Performance Result
FDTD Hardware Design Structure
Current Work
•We quantize the double-precision floating-point data to fixed-point data for the hardware implementation, guided by an analysis of the data.
•The forward model simulates the entire electromagnetic space and wave propagation in the model space, including a Ground Penetrating Radar (GPR) source, dispersive soil, and a rough air-soil surface.
•We compare the relative error between the floating-point Fortran code and the fixed-point C code.
•We choose a suitable bit width based on the trade-off between accuracy and area.
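The quantization step above can be sketched as follows. This is a minimal Python illustration, not the authors' C code: it assumes round-to-nearest quantization and a norm-based relative error (||x_fixed - x|| / ||x||), neither of which the poster specifies, and it measures only the error of representing the values, not the error that accumulates over many time steps of a full fixed-point simulation. The helper names `to_fixed`, `from_fixed`, and `relative_error` are hypothetical.

```python
import math
from typing import Sequence


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer with `frac_bits` bits after
    the binary point (Python's round() rounds to nearest, ties to even)."""
    return round(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float for comparison."""
    return q / (1 << frac_bits)


def relative_error(reference: Sequence[float], frac_bits: int) -> float:
    """Assumed metric: ||quantized - reference||_2 / ||reference||_2."""
    num = sum((from_fixed(to_fixed(x, frac_bits), frac_bits) - x) ** 2
              for x in reference)
    den = sum(x * x for x in reference)
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Hypothetical field samples; real ones would come from the floating-point reference run.
    samples = [0.123456789, -0.000987654321, 0.5, -0.333333333]
    for bits in (29, 31, 33, 35):
        print(bits, relative_error(samples, bits))
```

Each additional fractional bit halves the quantization step (and so the worst-case rounding error) at the cost of one more bit of datapath width, which is the accuracy-versus-area trade-off described above.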
Figure (FDTD Hardware Design Structure): the PC host and its memory connect over the PCI bus to the WILDSTAR-II PRO/PCI board. On-board memory holds the material parameters and source data and the simulated electromagnetic space; on the Xilinx Virtex-II Pro FPGA, a memory interface feeds the Electric Field Pipeline Module and the Magnetic Field Pipeline Module.
Abstract
Understanding and predicting electromagnetic behavior is increasingly needed in modern technology. The Finite-Difference Time-Domain (FDTD) method is a powerful computational electromagnetics technique for modelling electromagnetic space. However, the method is computationally intensive and time-consuming. Implementing this algorithm in hardware can greatly increase its computational speed and widen its usage.
We present the first fixed-point 3D FDTD FPGA accelerator, which supports a wide range of materials, including dispersive media. By analyzing the performance of fixed-point arithmetic in both soil and human tissue media, we choose a fixed-point representation that minimizes the relative error between fixed-point and floating-point results. The FPGA accelerator supports UPML (uniaxial perfectly matched layer) absorbing boundary conditions, which perform better than PML boundary conditions in dispersive soil and human tissue media.
The 3D FDTD design is implemented on a WildStar-II Pro FPGA board, and experimental results are provided. The speedup comes from pipelining, parallelism, fixed-point arithmetic, and a carefully designed memory architecture.
Acceleration of the 3D FDTD Algorithm in Fixed-Point Arithmetic using Reconfigurable Hardware
Figure (WILDSTAR™-II PRO/PCI block diagram): two processing elements, PE 1 and PE 2 (Virtex™-II Pro XC2VP70, XC2VP100, or XC2VP125), with RocketIO links; DDR DRAM and DDR II/QDR II SRAM memory banks; switches; differential-pair and single-ended I/O; 32/64-bit, 33/66/133 MHz PCI bus.
Fixed-point components are faster in hardware.
The data range of the FDTD algorithm is well suited to fixed-point representation.
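The second point can be made concrete: once the range of the field values is known from the data analysis, the number of integer bits is fixed, and the remaining bits can go to the fraction. The sketch below assumes a two's-complement word made of one sign bit, some integer bits, and `frac_bits` fractional bits; this layout and the helper names `integer_bits_needed` and `word_width` are illustrative assumptions, not the poster's documented format.

```python
from typing import Iterable


def integer_bits_needed(values: Iterable[float]) -> int:
    """Smallest i >= 0 such that every |v| < 2**i (sign bit counted separately)."""
    max_abs = max(abs(v) for v in values)
    i = 0
    while max_abs >= (1 << i):
        i += 1
    return i


def word_width(values: Iterable[float], frac_bits: int) -> int:
    """Total word width: 1 sign bit + integer bits + fractional bits."""
    return 1 + integer_bits_needed(values) + frac_bits


if __name__ == "__main__":
    # Hypothetical normalized field samples that stay below 1 in magnitude.
    field = [0.91, -0.47, 0.02, -0.88]
    print(integer_bits_needed(field))   # -> 0
    print(word_width(field, 31))        # -> 32
```

When the data are bounded below 1 in magnitude, as in this hypothetical example, every bit except the sign bit can be spent on fractional precision, which is why a bounded data range suits fixed-point hardware.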
Block Diagram Architecture
Figure: EM field memories and BlockRAM feed six field-update pipelines (Hx, Hy, Hz, Ex, Ey, Ez). The E-field update is drawn with separate multiply and add/subtract datapaths for three material cases: free space and lossy dielectric; lossy soil or dispersive media; and perfect conductor ("set all P to zero").
Electric Field Updating Pipeline
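To show how the material type changes the E-field update, here is a 1D sketch of the standard Yee update for a lossy dielectric, with a perfect conductor handled by zeroing the field. It is a simplification under stated assumptions: one dimension (Ez with Hy), conductivity loss only, and no dispersive-media term (the poster's dispersive soil and tissue models need additional auxiliary-field updates that are not reproduced here). `Material`, `update_coeffs`, and `update_e` are hypothetical names. Precomputing `ca` and `cb` per cell reduces each update to two multiplies, a subtraction, and an add, the kind of operation that maps naturally onto a hardware pipeline.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EPS0 = 8.854187817e-12  # vacuum permittivity, F/m


@dataclass
class Material:
    eps_r: float        # relative permittivity
    sigma: float        # conductivity, S/m
    pec: bool = False   # perfect electric conductor


def update_coeffs(m: Material, dt: float, dx: float) -> Tuple[float, float]:
    """Yee coefficients (ca, cb) for E_new = ca*E + cb*(H[i] - H[i-1])."""
    if m.pec:
        return 0.0, 0.0  # E is forced to zero inside a perfect conductor
    eps = m.eps_r * EPS0
    loss = m.sigma * dt / (2.0 * eps)
    ca = (1.0 - loss) / (1.0 + loss)
    cb = (dt / (eps * dx)) / (1.0 + loss)
    return ca, cb


def update_e(e: List[float], h: Sequence[float], mats: Sequence[Material],
             dt: float, dx: float) -> None:
    """In-place 1D update of Ez; e[i] sits between h[i-1] and h[i].
    The two end cells are left to the exterior boundary condition."""
    for i in range(1, len(e) - 1):
        ca, cb = update_coeffs(mats[i], dt, dx)
        e[i] = ca * e[i] + cb * (h[i] - h[i - 1])
```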
Features of the WILDSTAR™-II PRO/PCI boards:
• Uses two Xilinx® Virtex-II Pro™ XC2VP70 FPGAs (33,088 slices and 5,904 Kb of BlockRAM per device)
• 12 ports of DDR II SRAM (48 MB total) and 2 ports of DDR SDRAM (256 MB total)
• 11 GB/s memory bandwidth
FDTD Application Models
•The 3D FDTD Buried Object Detection Forward Model and Breast Cancer Detection Forward Model were developed by Panos Kosmas and Dr. Carey Rappaport of Northeastern University.
3D UPML FDTD Hardware Implementation
• Schneider et al. implemented the 1D FDTD in hardware, but their architecture is too simple.
• Durbano et al. implemented the 3D FDTD in hardware, but their design uses floating-point representation, which sacrifices speed for precision.
Memory Interface
This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award Number EEC-9986821).
This work is part of CenSSIS Research Thrust R3A. Forward modeling of large, complex scattering geometries is too slow for real-time applications or for the iterative solution of inverse problems. Our goal is to develop a hardware/software implementation of forward modeling to achieve real-time inversion.
Research Level 1 Thrust R3A
• Our 3D FDTD implementation achieves a 16× speedup over a 3.0 GHz PC while using fixed-point representation and supporting dispersive media and UPML boundary conditions.
State of the Art
[1] Ryan N. Schneider et al., "Application of FPGA Technology to Accelerate the Finite-Difference Time-Domain (FDTD) Method," Proceedings of FPGA 2002, pp. 97-105.
[2] J. P. Durbano et al., "FPGA-Based Acceleration of the 3D Finite-Difference Time-Domain Method," Proceedings of FCCM 2004, pp. 156-163.
Publications Acknowledging NSF Support
[1] W. Chen, P. Kosmas, M. Leeser, C. Rappaport, "An FPGA Implementation of the Two-Dimensional Finite-Difference Time-Domain (FDTD) Algorithm," Proceedings of the 2004 ACM International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, February 2004, pp. 213-222.
[2] P. Kosmas, Y. Wang, C. Rappaport, "Three-Dimensional FDTD Model for GPR Detection of Objects Buried in Realistic Dispersive Soil," SPIE AeroSense Conference, Orlando, FL, April 2002, pp. 330-338.
Figure (CenSSIS research framework): Fundamental Science; Validating TestBEDs; levels L1, L2, L3; thrusts R1, R2, R3; S1 to S5; Bio-Med and Enviro-Civil.
Breast Cancer Detection Forward Model
Spiral Antenna Model
Figures: spiral antenna floorplan; FDTD simulated 2D space; buried object detection geometry showing a mine (object) in X, Y, Z coordinates with transmitting and receiving antennas.
Figure (FDTD flowchart): Initialization (initialize the model-space parameters and time step; load all the EM space data into memory) → Excitation → Calculate H field → Calculate E field → Exterior boundary conditions → Time over? If no, n = n + 1 and go to the next time step; if yes, end.
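The time-stepping loop of the flowchart can be illustrated with a 1D free-space sketch. It follows the same cycle (excitation, H update, E update, exterior boundary, next time step) but is not the poster's 3D UPML code: the Gaussian source parameters, the grid spacing, the Courant number of 0.5, and the perfect-conductor walls standing in for the UPML absorbing boundary are all assumptions chosen for illustration, and `run_fdtd_1d` is a hypothetical name.

```python
import math
from typing import List


def run_fdtd_1d(n_cells: int, n_steps: int, src: int) -> List[float]:
    """Leapfrog 1D FDTD (Ez/Hy) in free space; returns Ez after n_steps."""
    c0 = 299792458.0                 # speed of light, m/s
    mu0 = 4e-7 * math.pi             # vacuum permeability, H/m
    eps0 = 1.0 / (mu0 * c0 * c0)     # vacuum permittivity, F/m
    dx = 1e-3                        # assumed cell size, m
    dt = 0.5 * dx / c0               # Courant number 0.5 (stable in 1D)
    e = [0.0] * n_cells
    h = [0.0] * n_cells
    for n in range(n_steps):         # n = n + 1 each pass: next time step
        # Excitation: hypothetical soft Gaussian source
        e[src] += math.exp(-((n - 30) / 10.0) ** 2)
        # Calculate H field from the spatial difference of E
        for i in range(n_cells - 1):
            h[i] += dt / (mu0 * dx) * (e[i + 1] - e[i])
        # Calculate E field from the spatial difference of H
        for i in range(1, n_cells - 1):
            e[i] += dt / (eps0 * dx) * (h[i] - h[i - 1])
        # Exterior boundary: PEC walls as a stand-in for UPML
        e[0] = 0.0
        e[-1] = 0.0
    return e


if __name__ == "__main__":
    ez = run_fdtd_1d(n_cells=100, n_steps=60, src=50)
    print(max(abs(v) for v in ez))
```

Note that H is always computed from the latest E and then E from the new H; this staggering in time is what lets the hardware split the work into separate H and E pipelines.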
Buried Object Detection Forward Model
Figure: relative error between fixed-point and floating-point representation (vertical axis, 0.000% to 3.500%) versus bit width after the binary point (horizontal axis, 29 to 35) for the electric field value Ex and the magnetic field values Hy and Hz.
Error Analysis on Fixed-point Representation
Figures: geometry map; simulated model space.
Accurate computational modelling of microwaves in human tissue with the FDTD method is very helpful for breast cancer detection research. This model uses a modified 3D FDTD algorithm and a modified UPML absorbing boundary condition (ABC) for better performance in dispersive human tissue.
The FDTD method is used to simulate the radiation of an Archimedean spiral antenna.
Performance Result
Figure: execution time (seconds) for two configurations:
A  Software: floating-point Fortran code on a 3.0 GHz PC, approximately 49 s
B  Hardware: WildStar-II Pro design running at 90 MHz, approximately 2.98 s (16× speedup)
Benchmark: 3D UPML FDTD algorithm, 50 × 50 × 50-cell model space, 500 time steps (total task: 62.5 million nodes).
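The benchmark numbers can be checked with a little arithmetic. The sketch below takes the poster's figures (a 50 × 50 × 50 grid, 500 time steps, about 49 s in software and about 2.98 s in hardware at 90 MHz) and assumes that a "node" means one cell update in one time step, which is what makes 50³ × 500 equal 62.5 million. `fdtd_throughput` is a hypothetical helper.

```python
from typing import Dict


def fdtd_throughput(nx: int, ny: int, nz: int, steps: int,
                    t_sw: float, t_hw: float, f_clk_hz: float) -> Dict[str, float]:
    """Derived performance figures for one FDTD run (node = one cell, one step)."""
    nodes = nx * ny * nz * steps
    return {
        "nodes": nodes,
        "speedup": t_sw / t_hw,
        "sw_nodes_per_s": nodes / t_sw,
        "hw_nodes_per_s": nodes / t_hw,
        "hw_cycles_per_node": f_clk_hz * t_hw / nodes,
    }


if __name__ == "__main__":
    r = fdtd_throughput(50, 50, 50, 500, t_sw=49.0, t_hw=2.98, f_clk_hz=90e6)
    for key, value in r.items():
        print(f"{key}: {value:.4g}")
```

With these inputs the speedup works out to about 16.4 (consistent with the reported 16×) and the hardware averages roughly 4.3 clock cycles per node update at 90 MHz. Because both times are approximate and the poster does not say whether the 2.98 s includes host-to-board data transfer, the cycles-per-node figure is best read as a rough average that would be lower for the computation alone if transfer time is included.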
• Optimize the 3D FDTD implementation
• Two-FPGA parallel computing on the board
• A 3D UPML FDTD accelerator for general FDTD problems
• A more generic hardware design
• Support for more complex sources
• A better user interface
Figure (memory interface): on-board SRAM serves as the level 2 cache and FPGA BlockRAM as the level 1 cache, with separate level 1 input and output buffers. Two buffering schemes are shown:
• 4 × 3 rows of data: read approximately 2 rows while calculating 1 row
• 4 rows of data: read 1 row while calculating 1 row and write out 1 row of data at the same time
Data labels: E, H, new H, new E and H.
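The second buffering scheme (4 rows: read 1 row while calculating 1 row and writing out 1 row) behaves like a small ring buffer. The sketch below models it sequentially in Python under assumptions the poster does not state: the update of row i needs only rows i and i+1 (a forward difference, as in a Yee curl term), the 4 rows are read as 3 input slots plus 1 output row, each step loads one new row into the slot freed by the oldest row, and the row computed in one step is written out in the next. `stream_rows` and the slot assignment are hypothetical; the point is that every input row is fetched from the level 2 memory exactly once.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = List[float]


def stream_rows(rows: Sequence[Row],
                stencil: Callable[[Row, Row], Row]) -> Tuple[List[Row], int]:
    """Compute stencil(rows[i], rows[i + 1]) for every i, reading each input row once.

    At step t (sequential here, concurrent in hardware):
      * the row computed at step t - 1 is written out,
      * row t - 2 is computed from ring slots (t - 2) % 3 and (t - 1) % 3,
      * row t is read into ring slot t % 3 (its old row is no longer needed).
    """
    n = len(rows)
    ring: List[Optional[Row]] = [None] * 3   # level 1 input buffer: 3 row slots
    pending: Optional[Row] = None            # the 4th row: level 1 output buffer
    out: List[Row] = []
    reads = 0
    for t in range(n + 2):                   # two extra steps drain the pipeline
        if pending is not None:
            out.append(pending)              # write out 1 row
            pending = None
        i = t - 2
        if 0 <= i < n - 1:
            a, b = ring[i % 3], ring[(i + 1) % 3]
            assert a is not None and b is not None  # slots are filled before use
            pending = stencil(a, b)          # calculate 1 row
        if t < n:
            ring[t % 3] = list(rows[t])      # read 1 row from level 2 memory
            reads += 1
    return out, reads
```

For example, `stream_rows(rows, lambda a, b: [y - x for x, y in zip(a, b)])` returns the row-to-row forward differences together with a read count equal to `len(rows)`.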