
Systolic Array Architecture


April 11, 2023


Definition: Systolic

Definition 1. sys·to·le (sĭs′tə-lē), noun

The rhythmic contraction of the heart, especially of the ventricles, by which blood is driven through the aorta and pulmonary artery after each dilation, or diastole.

Definition 2. Data flows from memory in a rhythmic fashion, passing through many processing elements before it returns to memory.


What Is a Systolic Array?

Imagine n simple processors arranged in a row or an array and connected so that each processor can exchange information only with its neighbours to the right and left. The processors at either end of the row are used for input and output. Such a machine is the simplest example of a systolic array.


Basic principle of systolic architecture

• A systolic system consists of a set of interconnected cells, each capable of performing some simple operation.

• The systolic approach reduces computation time by spreading the work across many cells, without complicating the individual cells.

• In a systolic array in particular, we achieve higher computation throughput without increasing memory bandwidth.


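Basic principle of systolic architecture

[Figure: a memory feeding a single PE, compared with the same memory feeding a chain of six PEs]

Instead of: memory (100 ns) <-> PE, which gives 5 million operations per second at most.

We have: memory (100 ns) <-> PE -> PE -> PE -> PE -> PE -> PE (the systolic array), which makes 30 MOPS possible.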
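The two rates in the figure follow from simple arithmetic. The sketch below is only an illustration: the 100 ns figure and the 5 and 30 MOPS rates come from the slide, while the factor of two memory accesses per operation and the function names are assumptions made here to reproduce those numbers.

```python
MEMORY_ACCESS_NS = 100   # from the slide
ACCESSES_PER_OP = 2      # assumption: each operation costs two memory accesses
NUM_PES = 6              # number of PEs drawn in the systolic chain


def mops_single_pe(access_ns, accesses_per_op):
    """Peak rate (millions of operations/s) when every operation goes to memory."""
    accesses_per_second = 1e9 / access_ns
    return accesses_per_second / accesses_per_op / 1e6


def mops_systolic(access_ns, accesses_per_op, num_pes):
    """Peak rate when each fetched item is reused once by every PE in the chain."""
    return num_pes * mops_single_pe(access_ns, accesses_per_op)


print(mops_single_pe(MEMORY_ACCESS_NS, ACCESSES_PER_OP))          # 5.0
print(mops_systolic(MEMORY_ACCESS_NS, ACCESSES_PER_OP, NUM_PES))  # 30.0
```

The memory is no faster in the second case; the gain comes entirely from reusing each fetched item in several PEs before it returns to memory.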


Typical structures

• 1D linear array
• 1D linear array with 2D I/O
• Bi-directional two-dimensional network
• Hexagonal network


Systolic Computing

The systolic approach uses both pipelining and parallelism.

With pipelining, processing can proceed concurrently with input and output, and consequently overall execution time is minimized. Pipelining plus multiprocessing at each stage of a pipeline should lead to the best possible performance.
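As a rough illustration of the pipelining gain, consider an idealized model that is not taken from the slides: a pipeline of k stages, each taking one clock tick, processing m data items. The symbols k, m and T are introduced here only for this sketch.

\[
T_{\text{sequential}} = k \cdot m, \qquad T_{\text{pipelined}} = k + m - 1
\]

For k = 3 stages and m = 100 items this is 300 ticks against 102 ticks. A systolic array adds parallelism on top of this by running several such pipelines side by side.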


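Matrix Multiplication

\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
\times
\begin{pmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & c_{13} \\
c_{21} & c_{22} & c_{23} \\
c_{31} & c_{32} & c_{33}
\end{pmatrix}
\]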

Conventional method: O(N^3) multiply-add steps

    For I = 1 to N
        For J = 1 to N
            For K = 1 to N
                C[I,J] = C[I,J] + A[I,K] * B[K,J];
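The conventional triple loop can be written as a runnable sketch. The function name `matmul_naive` is made up for this illustration, and the code assumes square matrices stored as lists of lists.

```python
def matmul_naive(a, b):
    """Multiply two N x N matrices with the conventional triple loop (N^3 multiply-adds)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Row i of A is paired with column j of B.
                c[i][j] += a[i][k] * b[k][j]
    return c


print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

On a single processor every one of the N^3 multiply-adds runs one after another, which is the cost the systolic method spreads over many cells.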


Systolic Method

This will run in O(N) time!

To run in O(N) time we need N × N processing units; in this case we need 9.

    P1 P2 P3
    P4 P5 P6
    P7 P8 P9


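We need to modify the input data, like so:

Flip columns 1 & 3 of A:

\[
\begin{pmatrix}
a_{13} & a_{12} & a_{11} \\
a_{23} & a_{22} & a_{21} \\
a_{33} & a_{32} & a_{31}
\end{pmatrix}
\]

Flip rows 1 & 3 of B:

\[
\begin{pmatrix}
b_{31} & b_{32} & b_{33} \\
b_{21} & b_{22} & b_{23} \\
b_{11} & b_{12} & b_{13}
\end{pmatrix}
\]

and finally stagger the data sets for input.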
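The flip and stagger can be expressed as per-tick input streams. This is a minimal sketch, assuming row i of A and column j of B are each delayed by one extra tick per index step; the function name `stagger_inputs` and the use of `None` for "no data this tick" are choices made for this illustration. The flip on the slide describes drawing order (the first element sits nearest the array), so in stream order the first element of each row or column simply enters first.

```python
def stagger_inputs(a, b):
    """Build per-tick input streams for an N x N array.

    row_streams[i][t] is the element of A entering array row i at tick t (0-based);
    col_streams[j][t] is the element of B entering array column j at tick t.
    None means nothing enters on that tick.
    """
    n = len(a)
    ticks = 2 * n - 1
    row_streams = [[None] * ticks for _ in range(n)]
    col_streams = [[None] * ticks for _ in range(n)]
    for i in range(n):
        for k in range(n):
            row_streams[i][i + k] = a[i][k]  # row i is delayed by i ticks
            col_streams[i][i + k] = b[k][i]  # column i is delayed by i ticks
    return row_streams, col_streams


a = [["a11", "a12", "a13"], ["a21", "a22", "a23"], ["a31", "a32", "a33"]]
b = [["b11", "b12", "b13"], ["b21", "b22", "b23"], ["b31", "b32", "b33"]]
rows, cols = stagger_inputs(a, b)
for stream in rows:
    print(stream)
# ['a11', 'a12', 'a13', None, None]
# [None, 'a21', 'a22', 'a23', None]
# [None, None, 'a31', 'a32', 'a33']
```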


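[Figure: the 3 × 3 array with its staggered inputs. Values of A enter from the left and move right along each row; values of B enter from the top and move down each column. Each successive row of A and column of B is delayed by one more tick.]

                                b33
                           b32  b23
                      b31  b22  b13
                      b21  b12
                      b11
        a13 a12 a11   P1   P2   P3
    a23 a22 a21       P4   P5   P6
a33 a32 a31           P7   P8   P9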

At every tick of the global system clock, data is passed to each processor from two different directions; the two values are multiplied, and the product is added to the running sum saved in that processor's register.
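The per-tick behaviour can be simulated in software. This is a sketch under stated assumptions, not a hardware description: values of A move one cell to the right per tick, values of B move one cell down per tick, each cell keeps its own running sum, and the inputs are staggered by one tick per row and per column. The function name `systolic_matmul` and the register names are invented for this illustration; the 3 × 3 matrix is the one used in this deck's worked example.

```python
def systolic_matmul(a, b):
    """Simulate an N x N systolic array; return the result and the per-tick sums."""
    n = len(a)
    acc = [[0] * n for _ in range(n)]       # running sum held in each cell
    a_reg = [[None] * n for _ in range(n)]  # A value currently in each cell
    b_reg = [[None] * n for _ in range(n)]  # B value currently in each cell
    history = []
    for t in range(3 * n - 2):              # ticks needed by this schedule
        # Shift A values one cell to the right and B values one cell down.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the staggered inputs at the left and top edges.
        for i in range(n):
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < n else None
        for j in range(n):
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < n else None
        # Every cell holding two values multiplies them and accumulates.
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
        history.append([value for row in acc for value in row])
    return acc, history


m = [[3, 4, 2], [2, 5, 3], [3, 2, 5]]
result, history = systolic_matmul(m, m)
for tick, sums in enumerate(history, start=1):
    print(tick, sums)
print(result)  # [[23, 36, 28], [25, 39, 34], [28, 32, 37]]
```

Each entry of `history` lists the sums in the order P1 to P9, so the printed lines can be compared with the clock-tick slides of the worked example.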


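\[
\begin{pmatrix}
3 & 4 & 2 \\
2 & 5 & 3 \\
3 & 2 & 5
\end{pmatrix}
\times
\begin{pmatrix}
3 & 4 & 2 \\
2 & 5 & 3 \\
3 & 2 & 5
\end{pmatrix}
=
\begin{pmatrix}
23 & 36 & 28 \\
25 & 39 & 34 \\
28 & 32 & 37
\end{pmatrix}
\]

Let's try this using a systolic array.

[Figure: the array with the flipped and staggered inputs of the example]

                      5
                 2    3
            3    5    2
            2    4
            3
    2 4 3   P1   P2   P3
  3 5 2     P4   P5   P6
5 2 3       P7   P8   P9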


Clock tick: 1

Products: P1 = 3*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
 9  0  0  0  0  0  0  0  0

Clock tick: 2

Products: P1 = 4*2, P2 = 3*4, P4 = 2*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
17 12  0  6  0  0  0  0  0

Clock tick: 3

Products: P1 = 2*3, P2 = 4*5, P3 = 3*2, P4 = 5*2, P5 = 2*4, P7 = 3*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 32  6 16  8  0  9  0  0

Clock tick: 4

Products: P2 = 2*2, P3 = 4*3, P4 = 3*3, P5 = 5*5, P6 = 2*2, P7 = 2*2, P8 = 3*4

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 18 25 33  4 13 12  0

Clock tick: 5

Products: P3 = 2*5, P5 = 3*2, P6 = 5*3, P7 = 5*3, P8 = 2*5, P9 = 3*2

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 19 28 22  6

Clock tick: 6

Products: P6 = 3*5, P8 = 5*2, P9 = 2*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 34 28 32 12

Clock tick: 7

Products: P9 = 5*5

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 34 28 32 37


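Reading the accumulated values out of the array row by row:

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 34 28 32 37

\[
C =
\begin{pmatrix}
23 & 36 & 28 \\
25 & 39 & 34 \\
28 & 32 & 37
\end{pmatrix}
\]

Same answer, in only 7 clock ticks! For an N × N product this schedule needs 3N − 2 ticks, which is O(N).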
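The tick count can be checked with a short derivation. This is a sketch for the schedule traced in this example, assuming each successive row of A and column of B is delayed by one more tick; the symbols t and T are introduced here only for illustration. The cell that accumulates c_ij multiplies a_ik by b_kj at tick

\[
t(i, j, k) = i + j + k - 2, \qquad 1 \le i, j, k \le N .
\]

The first product (i = j = k = 1) happens at tick 1 and the last one (i = j = k = N) at tick

\[
T(N) = 3N - 2, \qquad T(3) = 7 .
\]

Each of the N × N cells still performs N multiply-adds, so the total work stays at N^3 operations; only the elapsed time drops to O(N).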


Extension to other applications

The concepts used in matrix-vector multiplication can be extended to compute more complex functions.

Some of these functions include the multiplication of multiple matrices and n-dimensional applications.

Systolic lattice filters, used for speech and seismic signal processing, are another example.
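For comparison, matrix-vector multiplication needs only a linear (1D) array. The sketch below is an illustration under assumptions made here: one cell per output element, the vector x passing from cell to cell one step per tick, and row i of the matrix fed into cell i with a delay of i ticks. The function name `systolic_matvec` is invented for this example.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = A x for a square matrix A."""
    n = len(x)
    y = [0] * n            # running sum held in each cell
    x_reg = [None] * n     # x value currently in each cell
    for t in range(2 * n - 1):
        # Shift the x values one cell along the row, then feed the next one in.
        for i in range(n - 1, 0, -1):
            x_reg[i] = x_reg[i - 1]
        x_reg[0] = x[t] if t < n else None
        # Cell i holds x[t - i]; the matching matrix entry a[i][t - i] arrives from outside.
        for i in range(n):
            if x_reg[i] is not None:
                y[i] += a[i][t - i] * x_reg[i]
    return y


print(systolic_matvec([[3, 4, 2], [2, 5, 3], [3, 2, 5]], [1, 2, 3]))  # [17, 21, 22]
```

With n cells the product finishes in 2n − 1 ticks instead of the n² steps a single processor would need.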


Reconfigurable systolic array

An array of systolic elements that can be configured at the lowest level.

Relatively new field-programmable gate array (FPGA) technology permits a reconfigurable architecture, as opposed to a reprogrammable architecture.


Pipelining Vs. Systolic Array

Unlike a simple pipeline, in a systolic array:

• Input data is not consumed.
• Input data streams can flow in different directions.
• Modules may be organized in a two-dimensional (or higher) configuration.
• The array is configurable: different array configurations are available for different processing purposes.


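Why Systolic?

• Extremely fast.
• Easily scalable architecture.
• Can reach throughput on some tasks that single-processor machines cannot attain.
• Cuts the running time of some problems by a polynomial factor, for example from O(N³) to O(N) for matrix multiplication on N × N cells.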


Why Not Systolic?

• Expensive.
• Not needed in most applications; they are a highly specialized processor type.
• Difficult to implement and build.
• No generalized structure, hence algorithm specific.


Summary

• Systolic arrays offer a substantial reduction in computation time.
• They are expensive and sometimes complex, but yield enormous throughput.
• Reconfigurability of systolic arrays can be achieved using FPGA technology.




Thank You! Any Questions?