Paper Review I Coarse Grained Reconfigurable Arrays Presented By: Matthew Mayhew I.D.# 0234815 ENG*6530 Tues, June, 10, 2008 1


References

1. Link 2: Chapter 2: Coarse-Grained Reconfigurable Architectures

2. Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.; MorphoSys: A Coarse Grain Reconfigurable Architecture for Multimedia Applications, Euro-Par 2002 Parallel Processing. 8th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol. 2400), 2002, pp. 844-848


3. Sadasivam, M.; Hong, S.; Application Specific Coarse-Grained FPGA for Processing Element in Real-Time Parallel Particle Filters, Proceedings 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications, 2003, pp. 116-119

4. Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.; Custom Implementation of the Coarse-Grained Reconfigurable ADRES Architecture for Multimedia Purposes, Proceedings. 2005 International Conference on Field Programmable Logic and Applications (IEEE Cat. No. 05EX1155), 2005, pp. 106-111


Overview

- Introduction
  - Basic Concepts
  - Classifications
  - General Architectures
- Research Architectures
  - MorphoSys
  - Architecture for Dynamically Reconfigurable Embedded Systems (ADRES)
  - Coarse-Grained FPGA for Parallel Particle Processing
- Project
- Summary


Problems with Fine-Grained FPGAs

Wide datapaths must be constructed from bit-level elements, which exist to allow processing of individual bits.

A high volume of reconfiguration data is required for the processing elements and routing switches.

Mapping from high-level languages is difficult because of the difference in granularity.


Coarse-Grained Architectures

Constructed from multi-bit-wide datapaths and complex operators.

The wide datapath allows complex operators to be implemented directly, reducing routing overhead.

Connections between CGRA processing elements are multiple bits wide. Each connection therefore takes more area, but fewer connections are needed.
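A simplified model of this routing trade-off, for illustration only: assume (hypothetically) that every programmable switch is controlled by one configuration bit, that a bit-level fabric routes each bit of a word through its own switch, and that a coarse-grained fabric routes the whole word through one multi-bit switch. The function name and the numbers are illustrative, not taken from the reviewed papers.

```python
def routing_config_bits(num_connections: int, word_width: int, coarse: bool) -> int:
    """Configuration bits needed to route `num_connections` word-wide connections.

    Simplified model (assumption): one configuration bit per programmable switch;
    a fine-grained fabric needs one switch per routed bit, a coarse-grained
    fabric needs one switch per routed word.
    """
    switches_per_connection = 1 if coarse else word_width
    return num_connections * switches_per_connection


# Hypothetical example: 100 connections carrying 16-bit words.
fine = routing_config_bits(100, 16, coarse=False)
coarse = routing_config_bits(100, 16, coarse=True)
print(f"fine-grained: {fine} bits, coarse-grained: {coarse} bits")
```

Under these assumptions the configuration data shrinks by a factor equal to the word width. The model deliberately ignores that each multi-bit switch occupies more area, which is the cost noted on the slide.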


Classification of Architectures

Coarse-grained architectures are classified based on three criteria:

- Interconnect structure: mesh-based, linear array, or crossbar
- Datapath width: a tradeoff between flexibility and area consumption
- Reconfiguration method: static or dynamic


Basic Architectures: Mesh-Based

Processing Elements arranged in a rectangular array with horizontal and vertical connections.


Mesh-Based Continued

Structure allows for good parallelism and use of communication resources.

Requires good tools for Place and Route.

Arrangement encourages Nearest Neighbour (NN) links, but generally also provides lines for longer connections.
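A minimal sketch of nearest-neighbour connectivity in a mesh, assuming a rows x cols array with only north, south, west and east links (the longer lines mentioned above are not modelled). The helper names are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def mesh_neighbours(row: int, col: int, rows: int, cols: int) -> List[Cell]:
    """North, south, west and east neighbours of (row, col) that lie inside the array."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def count_nn_links(rows: int, cols: int) -> int:
    """Number of bidirectional nearest-neighbour links in a rows x cols mesh."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return horizontal + vertical


print(mesh_neighbours(0, 0, 4, 4))   # a corner PE has only two neighbours
print(count_nn_links(4, 4))
```

Edge and corner PEs have fewer neighbours than interior ones, which is one reason placement tools matter for this topology.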


Basic Architectures: Linear Array

Processing elements arranged in a linear fashion with neighbours generally connected.

Generally designed for the implementation of pipelined processes.
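The pipelined use of a linear array can be sketched as a cycle-level simulation in which each PE applies one stage and passes its result to its right-hand neighbour. The function name and the stage functions are hypothetical placeholders; the point is that once the pipeline fills, one result leaves the array per cycle, so n stages process k inputs in n + k - 1 cycles.

```python
from typing import Callable, List, Optional, Tuple


def run_linear_array(stages: List[Callable[[int], int]],
                     inputs: List[int]) -> Tuple[List[int], int]:
    """Simulate a linear array with one PE per pipeline stage.

    Returns (outputs, cycles). Each cycle a new input enters the first PE and
    every other PE consumes what its left neighbour produced the cycle before.
    """
    if not stages:
        raise ValueError("need at least one stage")
    n = len(stages)
    regs: List[Optional[int]] = [None] * n   # output register of each PE
    pending = list(inputs)
    outputs: List[int] = []
    cycles = 0
    while pending or any(r is not None for r in regs[:-1]):
        new_regs: List[Optional[int]] = [None] * n
        if pending:
            new_regs[0] = stages[0](pending.pop(0))
        for i in range(1, n):
            if regs[i - 1] is not None:
                new_regs[i] = stages[i](regs[i - 1])
        if new_regs[-1] is not None:          # result leaves the last PE
            outputs.append(new_regs[-1])
        regs = new_regs
        cycles += 1
    return outputs, cycles


outs, cyc = run_linear_array([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3],
                             [1, 2, 3, 4])
print(outs, cyc)
```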


Basic Architectures: Crossbar

All Processing Elements connected by a matrix of switches, allowing for arbitrary connections.

Routing is a simple task.

Due to implementation restrictions (a full crossbar grows with the square of the number of PEs), reduced crossbars that connect clusters of PEs are more common.
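A rough count of crosspoint switches shows why full crossbars are rarely built at scale. The clustered variant below is a simplifying assumption for illustration: each cluster has its own full local crossbar, and one global crossbar gives every cluster a single port. The function names are hypothetical.

```python
def full_crossbar_switches(n: int) -> int:
    """Crosspoints in a full n-by-n crossbar: any PE output can reach any PE input."""
    return n * n


def clustered_crossbar_switches(n: int, cluster_size: int) -> int:
    """Crosspoints with full local crossbars per cluster plus a global crossbar
    that gives each cluster a single port (simplifying assumption)."""
    if n % cluster_size != 0:
        raise ValueError("n must be a multiple of cluster_size")
    clusters = n // cluster_size
    local = clusters * cluster_size ** 2
    global_links = clusters ** 2
    return local + global_links


print(full_crossbar_switches(64))          # 4096
print(clustered_crossbar_switches(64, 8))  # 576
```

The saving comes at the cost of flexibility: under this model, traffic between clusters must share one port per cluster instead of having arbitrary point-to-point paths.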


MorphoSys

Designed to handle multimedia applications.

Because of the varied tasks and the large amount of input/output data, ASIC solutions are generally expensive to develop and general-purpose processors (GPPs) are inefficient.

Currently in version M2, with research ongoing.


System Architecture

The system-level architecture of MorphoSys is shown below:

Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.


RC Cell Architecture

The layout of an individual reconfigurable cell is shown below:

Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.

Benefits of MorphoSys

The combination of both fine- and coarse-grained reconfigurable elements allows for customization and optimization depending on the application.

Memory structure designed to accommodate the high demand for data movement in multimedia applications.
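One common technique for keeping a compute array supplied with data is double (ping-pong) buffering: while the array processes one block from one buffer, the next block is loaded into the other. The timing model below is a generic sketch of that technique with hypothetical load and compute times per block; it is not a description of the MorphoSys memory system itself.

```python
def single_buffer_time(num_blocks: int, load: int, compute: int) -> int:
    """One buffer: loading and computing strictly alternate."""
    return num_blocks * (load + compute)


def double_buffer_time(num_blocks: int, load: int, compute: int) -> int:
    """Two buffers: while block i is processed, block i+1 is loaded into the
    other buffer, so each block after the first costs max(load, compute)."""
    if num_blocks == 0:
        return 0
    return load + (num_blocks - 1) * max(load, compute) + compute


# Hypothetical workload: 4 blocks, 3 cycles to load and 5 cycles to process each.
print(single_buffer_time(4, 3, 5), double_buffer_time(4, 3, 5))
```

When loading is no slower than computing, overlapping the two hides almost all of the data-movement time.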


Evaluation

Tested with several operations common in multimedia and DSP applications.

Tested against dedicated DSP boards.

Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.


ADRES

Designed to achieve specified performance and power consumption targets for portable wireless media applications.

Test application for the architecture was an H.264/AVC decoder.

The ADRES architecture consists of a VLIW processor coupled with an array of coarse grained processing cells for acceleration.


ADRES Architecture

The VLIW processor is optimized for load/store and control operations.

The accelerator component is optimized for data flow, with branching supported.

Each reconfigurable cell contains a local register file, allowing for iterative data processing and data delay.

Each reconfigurable cell can communicate with all cells in its row and column, as well as with neighbouring cells within its quadrant.
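To make the described connectivity concrete, the sketch below lists the cells a given cell can reach. It assumes (hypothetically; the slides do not give these details) an 8x8 array split into four 4x4 quadrants, and treats "neighbouring" as the eight surrounding cells.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def reachable_cells(row: int, col: int, size: int = 8, quad: int = 4) -> Set[Cell]:
    """Cells that (row, col) can send data to: its whole row, its whole column,
    and its surrounding cells that lie in the same quadrant (assumed layout)."""
    same_row = {(row, c) for c in range(size)}
    same_col = {(r, col) for r in range(size)}
    quadrant = (row // quad, col // quad)
    neighbours = {
        (row + dr, col + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if 0 <= row + dr < size and 0 <= col + dc < size
        and ((row + dr) // quad, (col + dc) // quad) == quadrant
    }
    return (same_row | same_col | neighbours) - {(row, col)}


print(len(reachable_cells(3, 3)))
```

Under these assumptions, cell (3, 3) can reach the diagonal cell (2, 2) in its own quadrant but not (4, 4), which sits across the quadrant boundary.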


System-Level View

When running in acceleration mode, an 8x8 array can be formed by configuring the VLIW elements.

Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.


ADRES Reconfigurable Cell

While the configuration memory is assumed to be static during execution, dynamic reconfiguration is possible using a pointer into the configuration memory.

Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.
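A toy model of pointer-based reconfiguration, with hypothetical names and operations: the configuration memory holds a fixed list of contexts that is never rewritten during execution, while a pointer that advances every cycle selects which context the cell executes, so the cell's behaviour still changes from cycle to cycle. Wrapping the pointer around mimics a repeating loop body. This is an illustrative assumption, not the ADRES configuration format.

```python
from typing import Callable, Dict, List

# Hypothetical operation set for one reconfigurable cell.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


class ReconfigurableCell:
    """Configuration memory stays fixed; a pointer selects the active context."""

    def __init__(self, contexts: List[str]):
        self.config_memory = list(contexts)   # static during execution
        self.pointer = 0                      # index of the active context

    def step(self, a: int, b: int) -> int:
        op = self.config_memory[self.pointer]
        result = OPS[op](a, b)
        # Advance and wrap the pointer so the context sequence repeats.
        self.pointer = (self.pointer + 1) % len(self.config_memory)
        return result


cell = ReconfigurableCell(["add", "mul", "sub"])
print([cell.step(6, 3) for _ in range(4)])
```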


Performance and Implementation

ADRES found to be 88% faster overall in a full decoding cycle than a standard VLIW processor.

Layout study performed using 0.13 μm technology standard cells.

Each reconfigurable cell consumes approximately 0.196 mm².

Configuration memory accounts for around 50% of a cell, with 83% of the area in the full implementation used for various storage elements.
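As a rough worked estimate from these figures, assuming (not stated in the slides) that all 64 cells of the 8x8 array have the quoted area, and ignoring interconnect and any other blocks:

```python
CELL_AREA_MM2 = 0.196     # per reconfigurable cell, from the slide
CONFIG_FRACTION = 0.50    # configuration memory share of a cell, "around 50%"
NUM_CELLS = 8 * 8         # assumption: every cell of the 8x8 array has this area

array_area = NUM_CELLS * CELL_AREA_MM2
config_area = array_area * CONFIG_FRACTION
print(f"array: {array_area:.3f} mm^2, of which config memory: {config_area:.3f} mm^2")
```

Roughly half of a 12.5 mm² cell array would then be configuration storage alone, which is consistent with storage elements dominating the full implementation.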


Parallel Particle Filter Processor

Particle filters are used in non-linear problems where the goal is to track or detect dynamic signals.

The target application of the designed system is real-time bearings-only tracking, where the goal is to determine the coordinates and velocity of a target from a measured input angle.

Need to generate new particles, determine appropriate weights, and resample.
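The three steps can be sketched in software as a generic bootstrap particle filter. Everything here is an assumption for illustration rather than the paper's hardware design: a constant-velocity motion model, a Gaussian bearing likelihood, systematic resampling, the noise values, and the function names.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

State = Tuple[float, float, float, float]   # (x, y, vx, vy)
T = TypeVar("T")


def generate(particles: List[State], dt: float, q: float,
             rng: random.Random) -> List[State]:
    """Step 1: propagate each particle with a constant-velocity model plus noise."""
    out = []
    for x, y, vx, vy in particles:
        vx += rng.gauss(0.0, q)
        vy += rng.gauss(0.0, q)
        out.append((x + vx * dt, y + vy * dt, vx, vy))
    return out


def weigh(particles: List[State], bearing: float, sigma: float) -> List[float]:
    """Step 2: weight each particle by how well its bearing matches the measurement."""
    w = []
    for x, y, _, _ in particles:
        err = bearing - math.atan2(y, x)
        err = math.atan2(math.sin(err), math.cos(err))   # wrap to [-pi, pi]
        w.append(math.exp(-0.5 * (err / sigma) ** 2))
    total = sum(w)
    if total == 0.0:                  # all particles implausible: fall back to uniform
        return [1.0 / len(w)] * len(w)
    return [wi / total for wi in w]


def resample(particles: Sequence[T], weights: Sequence[float],
             rng: random.Random) -> List[T]:
    """Step 3: systematic resampling; likely particles are duplicated, unlikely ones dropped."""
    n = len(particles)
    start = rng.random() / n
    out: List[T] = []
    i, cumulative = 0, weights[0]
    for k in range(n):
        u = start + k / n             # evenly spaced sampling points
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
    return out


# One filter iteration for a hypothetical target at (10, 5) observed by angle only.
rng = random.Random(1)
particles = [(rng.uniform(5, 15), rng.uniform(0, 10), 0.0, 0.0) for _ in range(500)]
particles = generate(particles, dt=1.0, q=0.1, rng=rng)
weights = weigh(particles, bearing=math.atan2(5.0, 10.0), sigma=0.05)
particles = resample(particles, weights, rng)
```

In a real-time tracker these three steps repeat for every new angle measurement; the weighting and propagation loops are independent per particle, which is what makes them attractive for parallel processing elements.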


Operations

Both the generation of new particles and determining the weights are performed using processing elements.

This involves the calculation of w(m), which is the weight of a particle, and f(m), which is determined by the application.


System-Level Architecture

Consists of both parallel and sequential data flows, with a buffer to synchronize their behaviour.

Sadasivam, M.; Hong, S.


Sequential Flow Reconfigurable Slice (SFRS)

Responsible for the calculation of f(m), with direct access to the buffer unit.

Sadasivam, M.; Hong, S.


Parallel Flow Reconfigurable Slice (PFRS)

Handles updating, creating, and outputting the particles.

Sadasivam, M.; Hong, S.


Reconfiguration

The architecture can be altered by changing:
- the way in which particles are generated
- the way in which particles are updated
- the output method

The update of particles can be altered by reconfiguring the CORDIC unit used in the calculation of f(m), which also stores needed constants and MUX controls.

The control unit is used to control the interconnects in the SFRS to implement the desired function.
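CORDIC computes trigonometric functions using only shifts, additions, and a small table of stored arctangent constants, which fits the note that the unit stores constants. The floating-point sketch below is the textbook algorithm, not the paper's hardware unit; the iteration count and function names are assumptions. In a bearings-only tracker, vectoring mode could supply a particle's predicted bearing atan2(y, x), but that use is also an assumption, since the slides do not define f(m).

```python
import math
from typing import Tuple

N_ITER = 24                                              # assumption: precision vs latency
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]    # stored arctangent constants
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_vectoring(x: float, y: float) -> Tuple[float, float]:
    """Vectoring mode: rotate (x, y) onto the x-axis; return (magnitude, angle).
    Assumes x > 0."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0            # rotate towards the x-axis
        # In fixed-point hardware, multiplying by 2**-i is just a right shift.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]                     # accumulate the angle rotated away
    return x / GAIN, z


def cordic_rotation(angle: float) -> Tuple[float, float]:
    """Rotation mode: return (cos(angle), sin(angle)) for |angle| up to about 1.74 rad."""
    x, y, z = 1.0 / GAIN, 0.0, angle           # pre-scale to cancel the CORDIC gain
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0            # rotate towards the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x, y
```

Reconfiguring such a unit between vectoring and rotation modes is a matter of changing which quantity drives the direction choice, which is the kind of MUX control the slide mentions.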


Performance

Tested against both a DSP processor and a general-purpose FPGA.

The authors reported that the general-purpose FPGA did not have enough logic elements to map all the required PEs.

The table below shows the calculation times for both f(m) and w(m).


Conclusions

Coarse-grained reconfigurable architectures are generally used in either computation-heavy or I/O-heavy applications.

There is no single best design; the architecture layout is highly dependent on design goals.

Performance is generally favourable when compared to dedicated processors and general-purpose FPGAs.


Project

Goal: Implementation of the Advanced Encryption Standard (AES) algorithm using VHDL.

Secondary Goal: Implement the algorithm in such a way as to reduce the area consumption and computation time.


Progress

The algorithm was examined to identify where parallelism and alternative implementations can be considered.

While individual rounds must be performed sequentially, “blocks” of data within a given operation can be acted upon in parallel.

Efficient implementation of the S-box and MixColumns operations is crucial to a good design.
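Both the S-box and MixColumns rest on arithmetic in GF(2^8). The Python sketch below is only a software reference for checking results, not the project's VHDL design, and its function names are hypothetical. It computes the S-box from its definition (field inverse followed by an affine transform) rather than from a lookup table, one of the area/speed alternatives a hardware design could weigh, and it shows why the bytes and columns within one round can be processed in parallel.

```python
from typing import List

AES_POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1


def xtime(a: int) -> int:
    """Multiply a byte by {02} in GF(2^8), reducing by the AES polynomial."""
    a <<= 1
    return a ^ AES_POLY if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (since a^255 = 1 for a != 0); maps 0 to 0."""
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(a: int) -> int:
    """S-box: GF(2^8) inverse followed by the affine transform with constant 0x63."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def mix_single_column(col: List[int]) -> List[int]:
    """MixColumns on one column: multiply by the circulant matrix (2 3 1 1)."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def sub_bytes(state: List[int]) -> List[int]:
    # All 16 bytes are independent, so hardware could use 16 S-box units in parallel.
    return [sbox(b) for b in state]


def mix_columns(state: List[int]) -> List[int]:
    # The state is stored column by column; the 4 columns are independent.
    out: List[int] = []
    for c in range(4):
        out.extend(mix_single_column(state[4 * c:4 * c + 4]))
    return out
```

A lookup-table S-box trades this arithmetic for 256 bytes of storage per parallel S-box instance, which is exactly the kind of area versus computation-time choice the project goal targets.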


Thank you for your time. Questions?
