
Graphical Design Environment for a Reconfigurable Processor

IAmE

Abstract

The Field Programmable Processor Array (FPPA) is a new reconfigurable architecture developed by NASA/GSFC and the University of Idaho under ESTO funding. The FPPA architecture promises high-throughput, radiation-tolerant, low-power data processing for spacecraft instruments.

FPPA implements a synchronous integer data flow computational model, which is not easily captured in procedural languages like C, but is easy to represent graphically. This motivates our Simulink-based design environment for the FPPA. In a process familiar to all Simulink users, the algorithm designer selects functional blocks from the menu, places them on a work screen, and connects them by drawing interconnect lines. A click of a button executes the simulation. The goals of this effort are to implement the following:

1. Verify the algorithm: this is the familiar Simulink operating mode, which runs the simulation, invoking the underlying Matlab functions to verify the functional correctness of the program.

2. Translate to FPPA: incorporating design parameters such as value ranges and topology, the software will translate the floating-point Matlab representation into the FPPA fixed-point representation in an optimal fashion and generate an interface to the FPPASim simulator software.

3. Verify the FPPA implementation: the designer now executes a simulation that invokes the FPPASim program, which faithfully duplicates the FPPA behavior.

4. Generate FPPA code: once the implementation has been verified, the software will map the design to FPPA configuration and run-time code, enabling the design to be ported to FPPA chips.
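Step 2 hinges on mapping floating-point values onto the FPPA's fixed-point words. The following is a minimal Python sketch of that idea, assuming a signed 16-bit word (matching the 16-bit data paths) and a simple rule that derives the number of fractional bits from the designer's value range. The names `choose_frac_bits`, `to_fixed`, and `from_fixed` and the bit-allocation rule are hypothetical; this is not the translation performed by the actual tool chain.

```python
import math

WORD_BITS = 16  # FPPA data paths and I/O ports are 16 bits wide


def choose_frac_bits(max_abs: float, word_bits: int = WORD_BITS) -> int:
    """Fractional bits for a signed word that holds values up to +/-max_abs.

    Assumed rule: reserve one sign bit plus enough integer bits for max_abs,
    and give every remaining bit to the fraction (at most word_bits - 1).
    """
    int_bits = 0
    if max_abs > 0:
        int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    frac_bits = word_bits - 1 - int_bits
    if frac_bits < 0:
        raise ValueError("value range does not fit in the word")
    return frac_bits


def to_fixed(x: float, frac_bits: int, word_bits: int = WORD_BITS) -> int:
    """Quantize x to a fixed-point integer code, saturating at the word limits."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    code = round(x * (1 << frac_bits))  # Python's round: nearest, ties to even
    return max(lo, min(hi, code))


def from_fixed(code: int, frac_bits: int) -> float:
    """Convert a fixed-point code back to a float for comparison."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    frac = choose_frac_bits(1.0)
    print(frac, [to_fixed(v, frac) for v in (0.5, -1.0, 10.0)])
    # prints: 14 [8192, -16384, 32767]
```

Saturating rather than wrapping on out-of-range inputs is a design choice made here for safety; the real hardware's overflow behavior is not specified in this poster.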

FPPA architecture

An embedded data processor VLSI chip for spacecraft:
• Radiation-tolerant, 0.25 µm CMOS process
• Fixed-point processing elements
• Implements a reconfigurable synchronous data flow processor
  1. Run-time reconfigurable
  2. Extensible by tiling multiple chips
• Serves as an accelerator to a host CPU

Features:
• 16 configurable on-board Processing Elements
• Four 16-bit-wide, bidirectional I/O ports
• One 16-bit-wide dedicated output port
• On-board program memory and execution unit

Application development:
• Text-based development
  1. Configuration and Run-Time compilers
  2. Standalone functional simulator
• FPPA Simulink graphical design environment (GUI)

Processing Element components

Components:
• 17-bit multipliers
• ALU
• Data format
• Primary and secondary output
• Conditional output select module
• Delay elements

Design Flow

Note: the SIFOpt tool is a result of David M. Buehler's dissertation at the University of Idaho.

[Figure 1 diagram: sixteen processing elements, PE00 to PE33, in four groups of four; blocks labeled LBUS0 to LBUS3, BSEL0 to BSEL3, XBAR, IOM0 to IOM3, and DOM.]

Figure 1: FPPA architecture

General model of the Processing Element (PE)

Behavior of the PE: The PE works in two modes, configuration and runtime. During configuration mode, the constants (C0, C1), the Data Path, and the Run Time shown in Figure 4 are configured to a given topology, together with a sequence of enable and disable states for the PE.

During runtime mode, the PE takes input data (X, Y, and W in Figure 4) and produces an output based on the configured topology and on the status of the PE, i.e., whether it is enabled or disabled.
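The two modes can be illustrated with a toy Python model: `configure` loads the constants, a data-path function, and an enable sequence, and `step` consumes one set of X, Y, W inputs per cycle. This is a hypothetical sketch, not the FPPA simulator. The class and method names, the string form of the enable sequence, 16-bit two's-complement wraparound, and the rule that a disabled PE holds its previous output are all assumptions.

```python
from typing import Callable

Datapath = Callable[[int, int, int, int, int], int]  # (x, y, w, c0, c1) -> value


def wrap16(v: int) -> int:
    """Two's-complement wrap to signed 16 bits (assumed overflow behavior)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


class ProcessingElement:
    """Toy PE with a configuration mode and a runtime mode (hypothetical model)."""

    def __init__(self) -> None:
        self.configure(c0=0, c1=0, datapath=lambda x, y, w, c0, c1: 0, runtime="1")

    def configure(self, c0: int, c1: int, datapath: Datapath, runtime: str) -> None:
        """Configuration mode: load constants, data path, and enable sequence."""
        if not runtime or set(runtime) - {"0", "1"}:
            raise ValueError("runtime must be a non-empty string of '0'/'1'")
        self.c0, self.c1 = c0, c1
        self.datapath = datapath
        self.runtime = runtime
        self.cycle = 0
        self.output = 0

    def step(self, x: int, y: int, w: int) -> int:
        """Runtime mode: compute only on cycles where the sequence enables the PE."""
        enabled = self.runtime[self.cycle % len(self.runtime)] == "1"
        self.cycle += 1
        if enabled:
            self.output = wrap16(self.datapath(x, y, w, self.c0, self.c1))
        return self.output  # a disabled PE holds its last output (assumption)


if __name__ == "__main__":
    pe = ProcessingElement()
    pe.configure(c0=3, c1=0, datapath=lambda x, y, w, c0, c1: x * y + c0, runtime="01")
    print([pe.step(x, 2, 0) for x in range(1, 5)])  # prints: [0, 7, 7, 11]
```

With the enable sequence "01", the PE computes X*Y + C0 only on every second cycle, which is the kind of behavior the runtime configuration is meant to control.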

Configure PE with Simulink Graphical Design Environment

The PE can perform numerical as well as logical computation. Figure 5 shows a sample of the numerical and logical operations that the PE can be configured to perform.

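[Figure 2 diagram: labeled blocks and signals include the multiplier inputs mul_X and mul_Y with output MUL_OUT; the ALU inputs alu_X and alu_Y with output ALU_OUT; two Format stages; the Conditional Output Select; Delay Elements; the Inputs; and the Primary, Secondary, and Control Outputs, with 16-bit data paths.]

Figure 2: A look at the Processing Element architecture and its components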

Tu Le, Institute of Advanced Microelectronics, ECE/CAMBR, University of Idaho

Gregory Donohoe, Institute of Advanced Microelectronics, ECE/CAMBR, University of Idaho

David M. Buehler, Institute of Advanced Microelectronics, ECE/CAMBR, University of Idaho

Pen-Shu Yeh, NASA GSFC Code 567

[Figure 4 diagram: the Processing Element (PE) computes Function (X, Y, W, C0, C1, DP, RT) => output. The diagram shows the data inputs X, Y, and W, three 1/Z delay elements, and the Configuration inputs Data Path (DP), Run Time (RT), and Constants (C0, C1).]

Figure 4: General model of the Processing Element

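[Figure 3 diagram: the algorithm is entered as a Simulink model (floating point), where the designer designs the Data Path, provides input data, and sets the data format and Run Time. The flow then uses PERL (floating to fixed point), SIFOpt, and ConfigASM to prepare the Data Path, data format, and Run Time for the FPPA C++ simulator (fixed-point model). A validation step compares the simulator result with the golden model result.]

Figure 3: Design flow of the graphical design environment for a reconfigurable processor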
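The validation step in Figure 3 compares the fixed-point simulator result with the floating-point golden model. The poster does not state the comparison metric, so the Python sketch below simply assumes a worst-case absolute-error check against a tolerance after converting simulator codes back to real values. The function name `validate` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def validate(golden: Sequence[float], fixed_codes: Sequence[int],
             frac_bits: int, tol: float) -> Tuple[bool, float]:
    """Compare fixed-point simulator output against the floating-point golden model.

    Returns (passed, worst_error), where worst_error is the largest absolute
    difference after converting the fixed-point codes back to real values.
    """
    if len(golden) != len(fixed_codes):
        raise ValueError("golden and simulator outputs differ in length")
    scale = 1 << frac_bits
    worst = max((abs(g - c / scale) for g, c in zip(golden, fixed_codes)), default=0.0)
    return worst <= tol, worst


if __name__ == "__main__":
    golden = [0.5, -0.25, 0.1]
    simulated = [8192, -4096, 1638]  # codes with 14 fractional bits
    ok, worst = validate(golden, simulated, frac_bits=14, tol=2 ** -14)
    print(ok)  # prints: True
```

A tolerance of one least-significant bit, as in the usage line, is only one possible choice; accumulated rounding in longer data paths may call for a looser bound.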

1. Unconditional PE

• Delay

• Shift right or left

• (X + Y)

• (X – Y)

• (X + Y) * Z

• (X - Y) * Z

• (X*Y – Z)

• (X*Y + Z)

• C0, C1

2. Conditional PE

If (condition) then Perform task A Else Perform task B

Figure 5: Unconditional and conditional PE configuration window
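The operations listed in Figure 5 can be summarized as a small dispatch table. The Python sketch below evaluates the arithmetic and shift entries on 16-bit integers and implements the conditional form as a choice between two of them. The operation names, the use of `z` as the third operand, the meaning of the shift amount, and 16-bit wraparound are assumptions; the delay and constant (C0, C1) entries are omitted because they depend on PE state rather than on the current inputs.

```python
from typing import Callable, Dict


def wrap16(v: int) -> int:
    """Two's-complement wrap to signed 16 bits (assumed overflow behavior)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


# Hypothetical names for the unconditional entries shown in Figure 5.
UNCONDITIONAL_OPS: Dict[str, Callable[[int, int, int], int]] = {
    "shift_left": lambda x, y, z: x << y,       # shift X left by Y bits
    "shift_right": lambda x, y, z: x >> y,      # arithmetic shift for negative X
    "add": lambda x, y, z: x + y,               # (X + Y)
    "sub": lambda x, y, z: x - y,               # (X - Y)
    "add_mul": lambda x, y, z: (x + y) * z,     # (X + Y) * Z
    "sub_mul": lambda x, y, z: (x - y) * z,     # (X - Y) * Z
    "mul_sub": lambda x, y, z: x * y - z,       # (X*Y - Z)
    "mul_add": lambda x, y, z: x * y + z,       # (X*Y + Z)
}


def unconditional(op: str, x: int, y: int, z: int = 0) -> int:
    """Evaluate one unconditional PE operation and wrap the result to 16 bits."""
    return wrap16(UNCONDITIONAL_OPS[op](x, y, z))


def conditional(cond: Callable[[int, int, int], bool], task_a: str, task_b: str,
                x: int, y: int, z: int = 0) -> int:
    """If (condition) then perform task A, else perform task B."""
    return unconditional(task_a if cond(x, y, z) else task_b, x, y, z)


if __name__ == "__main__":
    print(unconditional("mul_add", 3, 4, 5))   # prints: 17
    print(unconditional("add", 32767, 1))      # prints: -32768
    print(conditional(lambda x, y, z: x > y, "sub", "add", 5, 3))  # prints: 2
```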

Example Application using the FPPA

Multi-rate filter bank: Each of the low-pass filters shown in Figure 6 is a four-tap FIR filter with Daubechies coefficients.

Figure 6 shows a filter bank, which is a portion of a circuit that implements data compression using Daubechies coefficients, a filter bank, and wavelet decomposition. Figure 7 shows that the FPPA can easily be configured as this filter bank with the help of the Simulink graphical interface. Down sampling at each level (L1, L2, L3, and L4) is accomplished by enabling or disabling the appropriate processing elements. Figure 8 shows the source signal and the output at each down-sampling level.
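As a floating-point reference for this structure (similar in spirit to the Simulink golden model), the Python sketch below cascades four stages, each a four-tap Daubechies (D4) low-pass FIR filter followed by decimation by 2. The coefficients are the standard D4 scaling-filter values normalized to sum to √2. The zero initial filter state and the choice to keep the odd-indexed samples at each decimation are assumptions, and the sketch says nothing about how the filters are mapped onto FPPA processing elements.

```python
import math
from typing import List, Sequence

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Standard Daubechies 4-tap (D4) low-pass coefficients; they sum to sqrt(2).
D4_LOWPASS = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
              (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter, y[n] = sum_k taps[k] * x[n - k], with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_bank(signal: Sequence[float], levels: int = 4,
                taps: Sequence[float] = D4_LOWPASS) -> List[List[float]]:
    """Cascade of low-pass + decimate-by-2 stages; returns [L1, ..., L_levels]."""
    outputs = []
    x = list(signal)
    for _ in range(levels):
        x = fir(x, taps)[1::2]  # keep every second filtered sample (odd phase)
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    levels = filter_bank([1.0] * 64)
    print([len(lvl) for lvl in levels])  # prints: [32, 16, 8, 4]
```

Because the coefficients sum to √2, a constant input settles to a gain of √2 per level, while an alternating +1/−1 input (the highest frequency) is rejected completely once the filter is full.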

[Figure 6 diagram: the Source feeds a cascade of four stages, each a Low Pass (L) filter followed by Dec by 2, producing the outputs L1, L2, L3, and L4.]

Figure 6: Block diagram of the multi-rate filter bank

[Figure 7 diagram: Run Time: Down Sampling, with the enable patterns 01, 0001, 00000001, and 0000000000000001.]

Figure 7: Multi-rate filter implementation using the FPPA
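The Run Time patterns in Figure 7 appear to follow a simple rule: the pattern for level k has length 2^k and enables the PE only on its last cycle, so a level-k PE fires once every 2^k input samples, matching the cumulative decimation by 2^k. Assuming that reading (it is an interpretation of the figure, not a stated specification), the Python sketch below generates the patterns and lists the cycles on which a PE using a given pattern would fire. The function names are hypothetical.

```python
from typing import List


def enable_pattern(level: int) -> str:
    """Assumed Run Time pattern for level k: 2**k - 1 disabled cycles, then one enabled."""
    if level < 1:
        raise ValueError("level must be >= 1")
    period = 1 << level
    return "0" * (period - 1) + "1"


def enabled_cycles(pattern: str, n_cycles: int) -> List[int]:
    """Input-rate cycle indices on which a PE with this repeating pattern fires."""
    return [t for t in range(n_cycles) if pattern[t % len(pattern)] == "1"]


if __name__ == "__main__":
    print([enable_pattern(k) for k in range(1, 5)])
    # prints: ['01', '0001', '00000001', '0000000000000001']
    print(enabled_cycles("0001", 12))  # prints: [3, 7, 11]
```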

Figure 8: Source signal and the output signals at each of the down sampling levels