Many-core processors: the integrated approach to the computational and execution models
Lorenzo Verdoscia and Roberto Vaccaro, Institute for High Performance Computing and Networking, National Research Council, Italy. [email protected]
CNR-Bioinformatics, Dec. 19, Napoli


Page 1: Many-core processors:  the integrated approach to the computational and execution models

Many-core processors: the integrated approach to the computational and execution models

Lorenzo Verdoscia and Roberto Vaccaro

Institute for High Performance Computing and Networking

National Research Council, Italy

[email protected]

CNR-Bioinformatics, Dec. 19, Napoli

Page 2: Many-core processors:  the integrated approach to the computational and execution models


The Landscape of Parallel Computing Research: A View From Berkeley http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html

Page 3: Many-core processors:  the integrated approach to the computational and execution models

From our architectural point of view, this new trend raises at least two questions: how to exploit such spatial parallelism, and how to program such systems.

What is D3AS

Page 4: Many-core processors:  the integrated approach to the computational and execution models

The first question leads us to seriously reconsider the dataflow paradigm, given the fine-grained nature of its operations. Instead of carrying out a set of operations in sequence, as a von Neumann processor does, a many-core dataflow processor could compute a function by first connecting and configuring a number of identical simple cores as a dataflow graph and then letting data flow asynchronously through them.

What is D3AS

Page 5: Many-core processors:  the integrated approach to the computational and execution models

The second question leads us to seriously reconsider the functional programming style, given its intrinsic simplicity in writing parallel programs. Functional languages have three key properties that make them attractive for parallel programming:

  • they have powerful mechanisms for abstracting over both computation and coordination;
  • they eliminate unnecessary dependencies;
  • their high-level coordination achieves a largely architecture-independent style of parallelism.

What is D3AS

Page 6: Many-core processors:  the integrated approach to the computational and execution models


Agenda

  • The hHLDS model
  • CHIARA language
  • Dataflow graph generation and mapping
  • D3AS general architecture
  • Future work

Page 7: Many-core processors:  the integrated approach to the computational and execution models

D3AS (Demand Data Driven Architecture System):

a high-performance reconfigurable computing system demonstrator that exploits FPGA technology, where

  • the computational model is functional
  • the execution model is dataflow

and whose architecture is highly scalable, with nodes characterized by

  • dynamic configurability
  • transparent hardware reconfiguration

Design methodology:

develop the right computation model alongside languages and hardware

Page 8: Many-core processors:  the integrated approach to the computational and execution models


The methodological approach

[Diagram: the Computational Model pairs a dataflow (data-driven) side, the homogeneous High-Level Dataflow System, with a functional (demand-driven) side, the CHIARA language; the Physical Model (reconfigurable) is the Real Machine, made of hundreds of thousands of identical MPFUs.]

Page 9: Many-core processors:  the integrated approach to the computational and execution models

Firing rules in the classical model

Let $A = \{a_1, \dots, a_n\}$ be the set of actors and $L = \{l_1, \dots, l_n\}$ be the set of links.

A dataflow graph is a labelled directed graph $G = (N, E)$, where

$N = A \cup L$ is the set of nodes

$E \subseteq (A \times L) \cup (L \times A)$ is the set of edges

Firing of an actor: a token on each input link and no token on each output link

Effect: consumes all input tokens and produces a token on its output link
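The classical firing rule can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class name `ClassicalGraph`, its methods, and the string names used for actors and links are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ClassicalGraph:
    """Bipartite dataflow graph: every edge joins an actor to a link or a link to an actor."""
    actors: set
    links: set
    edges: set                                   # pairs (actor, link) or (link, actor)
    tokens: dict = field(default_factory=dict)   # link -> value of the token it holds

    def is_well_formed(self):
        # E must be a subset of (A x L) union (L x A).
        return all(
            (u in self.actors and v in self.links)
            or (u in self.links and v in self.actors)
            for u, v in self.edges
        )

    def inputs(self, actor):
        return sorted(u for u, v in self.edges if v == actor)

    def outputs(self, actor):
        return sorted(v for u, v in self.edges if u == actor)

    def can_fire(self, actor):
        # Classical rule: a token on each input link and no token on each output link.
        return (all(l in self.tokens for l in self.inputs(actor))
                and all(l not in self.tokens for l in self.outputs(actor)))

    def fire(self, actor, op):
        """Consume all input tokens and place op(inputs) on each output link."""
        if not self.can_fire(actor):
            return False
        args = [self.tokens.pop(l) for l in self.inputs(actor)]
        result = op(*args)
        for l in self.outputs(actor):
            self.tokens[l] = result
        return True


g = ClassicalGraph(
    actors={"add"},
    links={"l1", "l2", "l3"},
    edges={("l1", "add"), ("l2", "add"), ("add", "l3")},
    tokens={"l1": 2, "l2": 3},
)
fired = g.fire("add", lambda x, y: x + y)
```

After the firing, the only token left is the sum on `l3`; the actor cannot fire again until that output link has been emptied, even if new tokens arrive on `l1` and `l2`.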

The homogeneous High Level Dataflow System (hHLDS) model

Page 10: Many-core processors:  the integrated approach to the computational and execution models

Special actors in the classical model

[Figure: the Merge, Switch, Decider, and Gate actors.]

These actors are characterized by having heterogeneous I/O conditions.

The hHLDS model

Page 11: Many-core processors:  the integrated approach to the computational and execution models

homogeneous High Level Dataflow System

Any actor has two input links and one output link, and consumes and produces only data tokens.

Firing of an actor: a token on each input link

Effect: consumes all input tokens and can produce a token on its output link

[Figure: example hHLDS graphs over the inputs a, b, and c, including one for the conditional "if b ≤ c then a".]
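A minimal Python sketch of a homogeneous actor follows. It assumes that "can produce" means the actor may consume its inputs without emitting a token, modelled here by the hypothetical sentinel `NO_TOKEN`. The two-actor wiring used for "if b ≤ c then a" is a guess at the figure, not a transcription of it.

```python
NO_TOKEN = None  # hypothetical stand-in for "no token on this link"


class HActor:
    """Homogeneous actor: two input links, one output link, data tokens only."""

    def __init__(self, op):
        self.op = op
        self.left = NO_TOKEN
        self.right = NO_TOKEN

    def ready(self):
        # hHLDS firing rule: a token on each input link (outputs are not checked).
        return self.left is not NO_TOKEN and self.right is not NO_TOKEN

    def fire(self):
        """Consume both input tokens; the result may be NO_TOKEN."""
        if not self.ready():
            return NO_TOKEN
        a, b = self.left, self.right
        self.left = self.right = NO_TOKEN
        return self.op(a, b)


def if_leq_then(a, b, c):
    """Assumed wiring: a comparison actor feeds an actor that passes a only on a true test."""
    leq = HActor(lambda x, y: x <= y)
    gate = HActor(lambda value, test: value if test else NO_TOKEN)
    leq.left, leq.right = b, c
    gate.left, gate.right = a, leq.fire()
    return gate.fire()
```

With these definitions, `if_leq_then(10, 2, 3)` yields the token 10, while `if_leq_then(10, 5, 3)` consumes its inputs and yields no token.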

Page 12: Many-core processors:  the integrated approach to the computational and execution models

Comparison between the two models

[Figure: the program below drawn as two dataflow graphs, (a) and (b), one for each model; the actors of one graph are numbered 1 to 14.]

input (a, c)
b := 1;
repeat
    if a > 1 then a := a \ 2
    else a := a * 5
    b := b * 3;
until b = c;
output (d)

The hHLDS model

Page 13: Many-core processors:  the integrated approach to the computational and execution models

CHIARA language

A dialect of Backus's FP, described by the tuple $(O, F, \mathcal{F}, :, D)$ where:

  • $O$ is a set of objects;
  • $F$ is a set of functions (or operators) from objects to objects;
  • $\mathcal{F}$ is a set of functional forms (functionals) from functions to functions;
  • $:$ is the application operation;
  • $D$ is a set of function definitions.

Page 14: Many-core processors:  the integrated approach to the computational and execution models

CHIARA objects

  • Atoms: include integer, fixed-point, and floating-point numbers, Boolean constants, characters, and strings
  • Sequences: denoted with angle brackets, e.g. < 1, 2, 3 >
  • The empty sequence <> is the only object which is both an atom and a sequence
  • Undefined: a special object ⊥ (or UDF) called bottom, which is usually used to denote errors or exceptions. Sequences are bottom-preserving: < 1; 2; < 3; 5 >; ⊥ > = ⊥
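The object model can be mimicked in Python as a rough sketch. The names `Bottom`, `BOTTOM`, `seq`, `is_atom`, and `is_sequence` are illustrative assumptions; tuples stand in for CHIARA sequences.

```python
class Bottom:
    """The undefined object (bottom); a single shared instance is used."""

    def __repr__(self):
        return "BOTTOM"


BOTTOM = Bottom()


def seq(*items):
    """Build a sequence; it is bottom-preserving, so any bottom element makes it bottom."""
    if any(item is BOTTOM for item in items):
        return BOTTOM
    return tuple(items)


def is_sequence(x):
    return isinstance(x, tuple)


def is_atom(x):
    # The empty sequence counts as an atom as well as a sequence.
    return x == () or isinstance(x, (bool, int, float, str))


nested = seq(1, 2, seq(3, 5))
broken = seq(1, 2, seq(3, 5), BOTTOM)
```

Here `nested` is an ordinary nested sequence, while `broken` collapses to `BOTTOM` because one of its elements is undefined.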

CHIARA language

Page 15: Many-core processors:  the integrated approach to the computational and execution models

CHIARA functions

Two kinds of operators can be applied to objects:

  • Elementary: the commonly used binary operators and some new ones
  • Combinator: operators that affect the structure of the objects to which they are applied (combine sequences, transpose sequences of sequences, etc.).

CHIARA language

Page 16: Many-core processors:  the integrated approach to the computational and execution models

Elementary operators

Keyword: +
Name: arithmetic addition
Action: if x is a pair of numbers, + : x produces their sum, otherwise ⊥.
Definition: + : < x1; x2 > = x1 + x2

Keyword: -
Name: arithmetic subtraction
Action: if x is a pair of numbers, - : x produces their difference, otherwise ⊥.
Definition: - : < x1; x2 > = x1 - x2

Keyword: *
Name: arithmetic multiplication
Action: if x is a pair of numbers, * : x produces their product, otherwise ⊥.
Definition: * : < x1; x2 > = x1 * x2

Keyword: /
Name: arithmetic division
Action: if x is a pair of numbers, / : x produces their quotient, otherwise ⊥.
Definition: / : < x1; x2 > = x1 / x2

Keyword: lt
Name: less than
Action: if x is a pair of objects, lt : x produces T if the first object is less than the second, otherwise ⊥.
Definition: lt : < x1, x2 >

CHIARA language

Page 17: Many-core processors:  the integrated approach to the computational and execution models

Elementary operators

Keyword: gt
Name: greater than
Action: if x is a pair of objects, gt : x produces T if the first object is greater than the second, otherwise ⊥.
Definition: gt : < x1, x2 >

… … … …

Keyword: lst
Name: loop start
Action: if x is a pair of objects, lst : x produces the first object when applied the first time, otherwise the second object.
Definition: lst : < x1; x2 >

Keyword: sL
Name: select left
Action: if x is a pair of objects, sL : x produces the first object, otherwise ⊥.
Definition: sL : < x1; x2 > = x1

Keyword: sR
Name: select right
Action: if x is a pair of objects, sR : x produces the second object, otherwise ⊥.
Definition: sR : < x1; x2 > = x2
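The elementary operators can be approximated in Python as follows. This is a sketch under several assumptions that the slides do not state: `None` stands in for the undefined object, comparisons return `False` when the test fails on a well-formed pair, division by zero is treated as undefined, and `lst` keeps its "first application" state in a closure created by the hypothetical helper `make_lst`.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def _is_pair(x):
    return isinstance(x, tuple) and len(x) == 2


def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _arith(op):
    """Lift a binary numeric function to an operator on pairs of numbers."""
    def apply(x):
        if _is_pair(x) and _is_number(x[0]) and _is_number(x[1]):
            return op(x[0], x[1])
        return BOTTOM
    return apply


def _compare(op):
    """Lift a binary test to an operator on pairs of comparable objects."""
    def apply(x):
        if not _is_pair(x):
            return BOTTOM
        try:
            return op(x[0], x[1])
        except TypeError:
            return BOTTOM
    return apply


add = _arith(lambda a, b: a + b)
sub = _arith(lambda a, b: a - b)
mul = _arith(lambda a, b: a * b)
div = _arith(lambda a, b: a / b if b != 0 else BOTTOM)  # assumption: x / 0 is undefined
lt = _compare(lambda a, b: a < b)
gt = _compare(lambda a, b: a > b)


def sL(x):
    return x[0] if _is_pair(x) else BOTTOM


def sR(x):
    return x[1] if _is_pair(x) else BOTTOM


def make_lst():
    """Loop start: the first application selects x1, every later one selects x2."""
    first = True

    def lst(x):
        nonlocal first
        if not _is_pair(x):
            return BOTTOM
        if first:
            first = False
            return x[0]
        return x[1]

    return lst
```

A loop-start actor built with `make_lst()` therefore lets the initial value through once and the fed-back value on every later firing.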

CHIARA language

Page 18: Many-core processors:  the integrated approach to the computational and execution models

Combinator operators

Keyword: i
Name: selector
Action: if i and n are two natural numbers and x is a non-empty sequence of n ≥ i elements, i : x produces the i-th element, otherwise ⊥.
Definition: i : < x1, .., xi, .., xn > = xi

Keyword: id
Name: identity
Action: if x is an object, id : x produces x.
Definition: id : x = x

Keyword: apndL
Name: append left
Action: if x is a pair of objects, where the second one is a sequence, apndL : x produces a sequence, otherwise ⊥.
Definition: apndL : < z, < y1, ..., yn > > = < z, y1, ..., yn >

Keyword: apndR
Name: append right
Action: if x is a pair of objects, where the first one is a sequence, apndR : x produces a sequence, otherwise ⊥.
Definition: apndR : < < y1, ..., yn >, z > = < y1, ..., yn, z >

… … ... …

Page 19: Many-core processors:  the integrated approach to the computational and execution models

Combinator operators

Keyword: concat
Name: concatenate
Action: if x is a pair of sequences, concat : x produces a sequence where the second one is concatenated to the first one, otherwise ⊥.
Definition: concat : < < x11, ..., x1n >, < x21, ..., x2n > > = < x11, ..., x1n, x21, ..., x2n >

Keyword: distL
Name: distribute from left
Action: if x is a pair of objects, where the second one is a sequence, distL : x produces a sequence of pairs, otherwise ⊥.
Definition: distL : < z, < y1, .., yn > > = < < z, y1 >, .., < z, yn > >

Keyword: trans
Name: transpose
Action: if x is a sequence of sequences, trans : x produces the transposition of this sequence, otherwise ⊥.
Definition: trans : < < x11, ..., x1m >, …, < xn1, ..., xnm > > = < < x11, ..., xn1 >, …, < x1m, ..., xnm > >

… … ... …
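The combinators listed in the two tables above translate naturally to operations on Python tuples. The sketch below assumes tuples for sequences and `None` for the undefined object; the selector is written `sel(i)` because a bare number cannot be a Python function name, and `ident` stands for `id`.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def _is_seq(x):
    return isinstance(x, tuple)


def _is_pair(x):
    return _is_seq(x) and len(x) == 2


def sel(i):
    """Selector i: pick the i-th element (1-based) of a sequence with at least i elements."""
    return lambda x: x[i - 1] if _is_seq(x) and 1 <= i <= len(x) else BOTTOM


def ident(x):
    return x


def apndL(x):
    return (x[0],) + x[1] if _is_pair(x) and _is_seq(x[1]) else BOTTOM


def apndR(x):
    return x[0] + (x[1],) if _is_pair(x) and _is_seq(x[0]) else BOTTOM


def concat(x):
    return x[0] + x[1] if _is_pair(x) and _is_seq(x[0]) and _is_seq(x[1]) else BOTTOM


def distL(x):
    return tuple((x[0], y) for y in x[1]) if _is_pair(x) and _is_seq(x[1]) else BOTTOM


def trans(x):
    """Transpose a non-empty sequence of equally long sequences."""
    rows_ok = _is_seq(x) and len(x) > 0 and all(_is_seq(row) for row in x)
    if not rows_ok or len({len(row) for row in x}) != 1:
        return BOTTOM
    return tuple(zip(*x))
```

For instance, `distL((0, (1, 2)))` gives `((0, 1), (0, 2))` and `trans(((1, 2, 3), (4, 5, 6)))` gives `((1, 4), (2, 5), (3, 6))`.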

Page 20: Many-core processors:  the integrated approach to the computational and execution models


Functional forms

CHIARA functional forms are used to define new functions from existing functions and combinators

Functionals in CHIARA include the functional forms of Backus’s FP and some new ones

Page 21: Many-core processors:  the integrated approach to the computational and execution models

Functional forms

Keyword: %
Name: constant
Action: if x is a value and y is an object, %x : y produces x, otherwise ⊥.
Definition: %x : y = x

Keyword: °
Name: composition
Action: it permits the application of the composition of two functions to an object x. The composition of two functions, f ° g, is the function obtained by applying first g and then f to an object x.
Definition: (f ° g) : x = f : (g : x)

Keyword: -->
Name: conditional
Action: it permits the application of one of the two functions q and r to an object according to the Boolean value of a condition p.
Definition: (p --> q; r) : x = q : x  if p : x = T
                             = r : x  if p : x = F
                             = ⊥      otherwise

case … ... …
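As an illustration, the three forms above can be written as higher-order Python functions. The names `const`, `compose`, and `cond` are hypothetical, `None` again stands in for the undefined object, and the bottom-preserving behaviour of `const` follows Backus's FP rather than anything stated on the slide.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def const(value):
    """%value : y = value, unless y itself is undefined (assumption taken from Backus's FP)."""
    return lambda y: BOTTOM if y is BOTTOM else value


def compose(f, g):
    """(f ° g) : x = f : (g : x)"""
    return lambda x: f(g(x))


def cond(p, q, r):
    """(p --> q; r) : x applies q or r depending on the Boolean value of p : x."""
    def form(x):
        test = p(x)
        if test is True:
            return q(x)
        if test is False:
            return r(x)
        return BOTTOM  # the condition did not produce a Boolean
    return form


double_then_increment = compose(lambda x: x + 1, lambda x: x * 2)
sign = cond(lambda x: x > 0, const("positive"), const("not positive"))
```

The two example functions at the end show how new functions are obtained purely by combining existing ones, without naming their arguments.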

Page 22: Many-core processors:  the integrated approach to the computational and execution models

Functional forms

Keyword: [...]
Name: construction
Action: it permits the application of a sequence of functions f1, ..., fn to an object x.
Definition: [f1, ..., fn] : x = < f1 : x, ..., fn : x >

Keyword: &
Name: apply to all
Action: it permits the application of the same function f to each element of a sequence x.
Definition: &f : < x1, ..., xn > = < f : x1, ..., f : xn >

Keyword: |
Name: insert
Action: if x is a sequence of at least two elements, it recursively applies the same function f to the pair formed by the head and the result for the tail, stopping when the tail contains one object; otherwise ⊥.
Definition: |f : < x1, x2, ..., xn > = f : < x1, |f : < x2, ..., xn > > = f : < x1, f : < x2, |f : < x3, ..., xn > > >

… … … …
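A Python sketch of construction, apply to all, and insert follows. The function names are hypothetical; pairs are passed to `f` as 2-tuples to mirror the `f : < x1, x2 >` notation, and `None` stands in for the undefined object.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def construction(*functions):
    """[f1, ..., fn] : x = < f1 : x, ..., fn : x >"""
    return lambda x: tuple(f(x) for f in functions)


def apply_to_all(f):
    """&f : < x1, ..., xn > = < f : x1, ..., f : xn >"""
    return lambda x: tuple(f(item) for item in x) if isinstance(x, tuple) else BOTTOM


def insert(f):
    """|f folds from the right: f : < x1, |f : < x2, ..., xn > >."""
    def fold(x):
        if len(x) == 1:          # the tail holds a single object: recursion stops
            return x[0]
        return f((x[0], fold(x[1:])))

    return lambda x: fold(x) if isinstance(x, tuple) and len(x) >= 2 else BOTTOM


add = lambda pair: pair[0] + pair[1]
sub = lambda pair: pair[0] - pair[1]

total = insert(add)((1, 2, 3, 4))
right_nested = insert(sub)((1, 2, 3, 4))   # 1 - (2 - (3 - 4))
```

Using subtraction makes the right-nested order of evaluation visible, since it is not associative.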

Page 23: Many-core processors:  the integrated approach to the computational and execution models

Functional forms

Keyword: !
Name: binary insert
Action: it breaks the argument up into pairs and applies the functional parameter f to each pair, then applies itself recursively to the sequence of results. It stops when the object contains one pair.
Definition: !f : < x1, x2, ..., xn > = !f : < f : < x1, x2 >, f : < x3, x4 >, ..., f : < xn-1, xn > > = …

Keyword: while
Definition: (while p, f) : x = (while p, f) : (f : x)  if p : x = T
                             = x                        if p : x = F

repeat …
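Binary insert amounts to a pairwise (tree-shaped) reduction, and `while` to ordinary iteration. The Python sketch below uses hypothetical names; the handling of an odd number of elements is an assumption, since the slide only shows the even case.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def binary_insert(f):
    """!f: reduce a sequence level by level, applying f to adjacent pairs."""
    def reduce_pairs(x):
        if not isinstance(x, tuple) or len(x) < 2:
            return BOTTOM
        while len(x) > 1:
            paired = tuple(f((x[i], x[i + 1])) for i in range(0, len(x) - 1, 2))
            if len(x) % 2:
                # Assumption: an unpaired last element is carried to the next level.
                paired += (x[-1],)
            x = paired
        return x[0]

    return reduce_pairs


def while_form(p, f):
    """(while p, f) : x keeps applying f as long as p : x is true."""
    def run(x):
        while p(x) is True:
            x = f(x)
        return x

    return run


add = lambda pair: pair[0] + pair[1]
sub = lambda pair: pair[0] - pair[1]

tree_sum = binary_insert(add)((1, 2, 3, 4, 5, 6, 7, 8))
tree_diff = binary_insert(sub)((1, 2, 3, 4))          # (1 - 2) - (3 - 4)
doubled = while_form(lambda v: v < 100, lambda v: v * 2)(3)
```

For n elements the pairwise scheme needs about log2(n) levels, each of which can be evaluated in parallel, whereas the plain insert form chains n - 1 dependent applications.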

Page 24: Many-core processors:  the integrated approach to the computational and execution models

The assembly language

  • a functionally complete subset of the elementary operators is the assembly language for a D3AS many-core processor
  • more complex functions are obtained by applying the rule of metacomposition
  • the dataflow graphs that are produced can be directly mapped onto the hardware and executed

CHIARA language

Page 25: Many-core processors:  the integrated approach to the computational and execution models


New functions

The def construct permits the definition of new functions from existing functions, combinators, functional forms, and other already defined functions.

For example:

def max = (gt ° [1,2] --> 1;2)

max:<5,6> = 6

[Figure: dataflow graph of max with inputs a and b.]
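The definition of max can be replayed in Python by composing small higher-order functions. This is an illustrative sketch, not CHIARA itself: `sel`, `construction`, `compose`, `cond`, and `chiara_max` are hypothetical names, and `None` stands in for the undefined object.

```python
BOTTOM = None  # hypothetical stand-in for the undefined object


def sel(i):
    """Selector i (1-based)."""
    return lambda x: x[i - 1] if isinstance(x, tuple) and 1 <= i <= len(x) else BOTTOM


def gt(x):
    return x[0] > x[1]


def construction(*functions):
    return lambda x: tuple(f(x) for f in functions)


def compose(f, g):
    return lambda x: f(g(x))


def cond(p, q, r):
    def form(x):
        test = p(x)
        if test is True:
            return q(x)
        if test is False:
            return r(x)
        return BOTTOM
    return form


# def max = (gt ° [1,2] --> 1;2)
chiara_max = cond(compose(gt, construction(sel(1), sel(2))), sel(1), sel(2))

result = chiara_max((5, 6))
```

Applied to the pair `(5, 6)`, the test `gt` is false, so the second selector is applied and the result is 6, matching `max:<5,6> = 6`.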

Page 26: Many-core processors:  the integrated approach to the computational and execution models

Dataflow graph mapping

  • communication between many-core processors is slower than communication within a many-core processor
  • the mapping problem is NP-hard

Dataflow graph generation and mapping

Page 27: Many-core processors:  the integrated approach to the computational and execution models


Compilation process

The whole compilation process is composed of two steps:

compilation, producing the dataflow graph from CHIARA programs (function definitions plus expressions to be evaluated)

mapping, aimed at implementing the produced dataflow graph onto the D3AS prototype

Dataflow graph generation and mapping

Page 28: Many-core processors:  the integrated approach to the computational and execution models

Dataflow graph generation

The CHIARA compiler, in conjunction with front-end tools, generates the Global Dataflow Graph Table (GDGT).

Dataflow graph generation and mapping

Page 29: Many-core processors:  the integrated approach to the computational and execution models

Global Dataflow Graph Table (GDGT)

Node#  Func  Apply  Constr  Insert  Left  Right  Out
             level  level   level   In    In
..     ...   .      .       .       ..    ..     ..
43     MUL   1      0       0       %1    %30    47
44     MUL   1      0       0       %2    %30    47
45     MUL   1      0       0       %3    %30    48
46     MUL   1      0       0       %4    %30    48
47     ADD   0      0       1       43    44     49
48     ADD   0      0       1       45    46     49
49     ADD   0      0       2       47    48     out
50     MUL   1      0       0       %1    %40    54
51     MUL   1      0       0       %2    %40    54
52     MUL   1      0       0       %3    %40    55
53     MUL   1      0       0       %4    %40    55
54     ADD   0      0       1       50    51     56
55     ADD   0      0       1       52    53     56
56     ADD   0      0       2       54    55     out
..     ...   .      .       .       ..    ..     ..
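To make the table concrete, the following Python sketch evaluates the rows shown for nodes 43 to 56. It rests on assumptions: an operand prefixed with `%` is read as a constant, any other operand as the number of the node that produces it, and nodes are evaluated in increasing node order, which happens to be a valid order for these rows. The level columns are ignored, and `evaluate` is a hypothetical helper, not part of the CHIARA compiler.

```python
import operator

OPS = {"MUL": operator.mul, "ADD": operator.add}

# (node, func, left input, right input, out), copied from the rows shown in the table.
GDGT = [
    (43, "MUL", "%1", "%30", "47"),
    (44, "MUL", "%2", "%30", "47"),
    (45, "MUL", "%3", "%30", "48"),
    (46, "MUL", "%4", "%30", "48"),
    (47, "ADD", "43", "44", "49"),
    (48, "ADD", "45", "46", "49"),
    (49, "ADD", "47", "48", "out"),
    (50, "MUL", "%1", "%40", "54"),
    (51, "MUL", "%2", "%40", "54"),
    (52, "MUL", "%3", "%40", "55"),
    (53, "MUL", "%4", "%40", "55"),
    (54, "ADD", "50", "51", "56"),
    (55, "ADD", "52", "53", "56"),
    (56, "ADD", "54", "55", "out"),
]


def evaluate(table):
    """Return {node: value} for every node whose Out column is 'out'."""
    values = {}
    outputs = {}

    def operand(ref):
        # Assumption: '%v' denotes the constant v, anything else a producing node.
        return float(ref[1:]) if ref.startswith("%") else values[int(ref)]

    for node, func, left, right, out in sorted(table):
        values[node] = OPS[func](operand(left), operand(right))
        if out == "out":
            outputs[node] = values[node]
    return outputs


results = evaluate(GDGT)
```

Under this reading, each group of seven rows multiplies the constants 1 to 4 by a common factor and sums the four products with a two-level adder tree, so node 49 yields 300 and node 56 yields 400.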

Page 30: Many-core processors:  the integrated approach to the computational and execution models


Visualization of Compiler Graph

Page 31: Many-core processors:  the integrated approach to the computational and execution models

The next step

The compiler extracts two tables from the GDGT:

  • the Dataflow Graph Description (DGD) table, which contains, for each node, the binary operation and interconnection codes for the Graph Setter of a Processing Subsystem
  • the Initial Input Value (IIV) table, which contains the binary information about the input program data tokens

Dataflow graph generation and mapping

Page 32: Many-core processors:  the integrated approach to the computational and execution models


Dataflow graph mapping

The presence of functionals:

permits the adoption of strategies that try to cluster parallelism exploitation

suggests handy ways to partition the dataflow graph into smaller, loosely connected graphs that can be run on the single platform-processors

Page 33: Many-core processors:  the integrated approach to the computational and execution models

D3AS general architecture

Reconfigurable Hardware System (RHS)

Capable of mapping and executing, in a completely asynchronous manner, dataflow graphs created with the hHLDS model.

Constituted by three subsystems:

■ Actor Realization Subsystem (ARS)

Capable of creating a one-to-one correspondence between graph actors and Functional Units.

■ Token flow Realization Subsystem (TRS)

Implementing graph edges.

■ Graph Mapping Subsystem (GMS)

Devoted to storing the RHS context information.

Page 34: Many-core processors:  the integrated approach to the computational and execution models

■ ARS: constituted by N identical Multipurpose Functional Units (MPFUs).

■ TRS: constituted by 3 sets of N buffer registers and a crossbar switch interconnect.

■ GMS: constituted by a set of buffers and logic circuitry.

D3AS general architecture

Page 35: Many-core processors:  the integrated approach to the computational and execution models

D3AS general architecture

Critical parameters in the RHS design:

■ N_MPFU: the number of MPFUs constituting the ARS;

■ C_MPFU: the logical and functional complexity of the MPFUs;

■ INT_RS: the type of interconnect for the TRS.

The number of MPFUs implementable on a VLSI device depends on:

■ interconnect complexity;

■ logical and functional complexity of the MPFU.

Page 36: Many-core processors:  the integrated approach to the computational and execution models

D3AS general architecture

RHS/D3AS Fundamental Building Block

Many-core Dataflow Processor (MDP)

A many-core chip replicating the D3AS general architecture, with n MPFUs interconnected via a non-blocking crossbar switch network.

Page 37: Many-core processors:  the integrated approach to the computational and execution models

D3AS general architecture

Architecture with globally pure dataflow model

N: number of graph actors

n: number of MPFUs in an MDP

The RHS is configured by interconnecting K = N/n MDPs with a second-level non-blocking crossbar switch interconnection network.

Page 38: Many-core processors:  the integrated approach to the computational and execution models

D3AS general architecture with Hybrid Dataflow Model

N > n

The graph is partitioned into subgraphs, and the RHS is configured by interconnecting m = N/n MDPs with a second-level message-passing interconnection network.

Dataflow graph edges between subgraphs mapped onto different MDPs are virtualized by messages routed through the network.

Communicating Dataflow Processes (CDP)
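The idea of partitioning a graph across MDPs and turning cut edges into messages can be sketched in Python. The placement used here (filling MDPs in actor order) is a deliberately naive assumption and not the mapping strategy of D3AS; `partition` is a hypothetical helper, and the number of MDPs is rounded up when N is not a multiple of n.

```python
import math


def partition(actors, edges, n):
    """Assign actors to MDPs of capacity n and list the edges that cross MDPs."""
    mdp_count = math.ceil(len(actors) / n)
    home = {actor: index // n for index, actor in enumerate(actors)}
    # Edges whose endpoints sit on different MDPs would be carried by messages.
    messages = [(src, dst) for src, dst in edges if home[src] != home[dst]]
    return mdp_count, home, messages


# A seven-actor graph: four multipliers feeding a two-level adder tree.
actors = [43, 44, 45, 46, 47, 48, 49]
edges = [(43, 47), (44, 47), (45, 48), (46, 48), (47, 49), (48, 49)]

mdp_count, home, messages = partition(actors, edges, n=4)
```

With n = 4, the four multipliers land on one MDP and the three adders on another, so the four multiplier-to-adder edges become messages while the adder-to-adder edges stay local. A better partition would try to keep strongly connected actors together.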

Page 39: Many-core processors:  the integrated approach to the computational and execution models


D3AS general architecture demonstrator

GIDEL board

Page 40: Many-core processors:  the integrated approach to the computational and execution models

[Figure: block diagram with a Routing Subsystem, a Kernel Subsystem, and a Processing Subsystem. Labelled blocks: Packet Assembler, Packet Deassembler, WK-recursive Message Manager, Destination List, Control Unit, Graph Table, Graph Setter, Graph Configuration Table, Control Section, MPFU Interconnect, MPFU #1 to MPFU #n, and the Token_In A, Token_In B, and Token Out ensemble buffers. Labelled signals: Enable Signals, Token_In A, Token_In B, Token_Out, Message_In, Message_Out, Interconnect Code, MPFU OP Code.]

Page 41: Many-core processors:  the integrated approach to the computational and execution models

[Figure: internal structure of an MPFU and its connections. Labelled blocks: Control, Operation Code Register, ALU+MULT, Latches, LST Test, Validity Test. Labelled signals: Enable In, Enable Out, Validity, Test result, inputs from the Graph Setter and from the Token_in A buffer. The units shown are MPFU #1, #k, #m, and #n.]

Page 42: Many-core processors:  the integrated approach to the computational and execution models

Some results

Matrix Multiplication

Given two matrices A(n,n) and B(n,n), their product is a matrix C(n,n) whose generic element is given by the following formula:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad i, j = 1 \dots n$$
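The formula can be checked with a few lines of Python written in a functional style: each element of C is the inner product of a row of A with a column of B, the columns being obtained by transposing B. The names `inner_product` and `mat_mul` are illustrative, and matrices are represented as tuples of rows.

```python
def inner_product(pair):
    """Sum of the element-wise products of two equally long sequences."""
    u, v = pair
    return sum(a * b for a, b in zip(u, v))


def mat_mul(A, B):
    """C[i][j] = sum over k of A[i][k] * B[k][j]."""
    columns = tuple(zip(*B))   # transpose B so that its columns become sequences
    return tuple(
        tuple(inner_product((row, col)) for col in columns)
        for row in A
    )


A = ((1, 2), (3, 4))
B = ((5, 6), (7, 8))
C = mat_mul(A, B)
```

All n² inner products are independent of one another, and within each one the n products are independent too, which is the parallelism a dataflow graph can expose.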

Page 43: Many-core processors:  the integrated approach to the computational and execution models

Matrix Multiplication

We used two values of n: n = 32 and n = 64.

Some results

Page 44: Many-core processors:  the integrated approach to the computational and execution models

Matrix Multiplication

We compared the performance of a platform-processor with that of an IA-32 Pentium IV.

We measured performance in terms of CPI (cycles per instruction) because our FPGA platform-processor executes an operation in 30 ns, against 0.5 ns for the Pentium.

Some results

Page 45: Many-core processors:  the integrated approach to the computational and execution models

IA-32 Pentium IV vs D3AS

       Pentium                        Platform-Processor
       (cycles per instruction)       (cycles per instruction)
n      Products  Sums    Total        Products  Sums   Total
32     8192      7939    16131        -         -      1027
64     65561     64537   130098       4096      4108   8204

Some results

Page 46: Many-core processors:  the integrated approach to the computational and execution models

Zeroes of a function (f = x*x + 3x - 1.75)

Assembly code generated by compiling the C source code: 122 sequential lines of assembly code.

Some results

Page 47: Many-core processors:  the integrated approach to the computational and execution models

Zeroes of a function

Our compiler generates a GDGT with only 28 micro-instructions organized in 12 sequential steps.

Node  Func  Apply  Constr  Insert  Left      Right   Output
            level  level   level   input     input
1     LST   0      0       0       26        -1%     5-5-6-3-17
2     LST   0      0       0       27        1%      3-21
3     ADD   0      0       0       1         2       4
4     DIV   0      3       0       3         2%      7-7-8-18-19-20-22
5     MUL   0      0       0       1         1       9
6     MUL   0      0       0       1         3%      9
7     MUL   0      0       0       4         4       10
8     MUL   0      0       0       4         3%      10
9     ADD   0      0       0       5         6       11
10    ADD   0      0       0       7         8       12
11    SUB   0      0       0       9         1.75%   13
12    SUB   0      0       0       10        1.75%   13
13    MUL   0      0       0       11        12      14-15-16
14    LT    0      0       0       13        0%      17-18
15    EQ    0      0       0       13        0%      19-20
16    GT    0      0       0       13        0%      30-22
17    ADD   0      0       0       1         14      30
18    ADD   0      0       0       14        4       28
19    ADD   0      0       0       4         15      30
20    ADD   0      0       0       15        4       28
21    ADD   0      0       0       2         16      28
22    ADD   0      0       0       16        4       30
23    SUB   0      0       0       30        28      24-25
24    GEQ   0      0       0       23        0.01%   26-27
25    LT    0      0       0       23        0.01%   29
26    ADD   0      0       0       30        24      1
27    ADD   0      0       0       28        24      2
28    MRG   0      0       0       18-20-21          23-27-29
29    ADD   0      0       0       28        25      Out
30    MRG   0      0       0       17-19-22          23-26
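For comparison, a conventional sequential bisection for the same function is sketched below in Python. It is not a translation of the table: the starting interval [-1, 1] and the tolerance 0.01 are read off the constants that appear in the table (`-1%`, `1%`, `0.01%`), and that reading is an assumption. The names `f` and `bisect_zero` are hypothetical.

```python
def f(x):
    return x * x + 3 * x - 1.75


def bisect_zero(func, left, right, tolerance):
    """Halve [left, right] until it is narrower than tolerance; return the right end."""
    while abs(right - left) >= tolerance:
        middle = (left + right) / 2
        product = func(left) * func(middle)
        if product < 0:        # sign change in the left half
            right = middle
        elif product == 0:     # an end point of the left half is a zero
            left = right = middle
        else:                  # no sign change: continue in the right half
            left = middle
    return right


zero = bisect_zero(f, -1.0, 1.0, 0.01)
```

The exact zeroes of f are 0.5 and -3.5, so on the assumed interval the search converges to 0.5. Each iteration evaluates f at two points, compares the sign of their product with zero, and selects a new interval, which matches the kinds of operations (MUL, ADD, SUB, LT, EQ, GT) that appear in the table.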

Some results

Page 48: Many-core processors:  the integrated approach to the computational and execution models

Future work

To evaluate which applications perform better on the architecture with the globally pure dataflow model and which on the architecture with the hybrid dataflow model.

To study how to generalize pipelining inside the MDP.