Implementing a NoMC on the Gidel platform: end-semester presentation

Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab
Instructor: Evgeny Fiksman
Students: Meir Cohen, Daniel Marcovitch
Winter 2009



Page 2: Table of Contents

• Project goals
• Previous router
• Our routers
• Software design
• Obstacles
• Testing
• Time tables

Page 3: Project goals

• Implementing a parallel processing system that contains several NoCs, each chip containing several sub-networks of processors.
• Converting the existing router to support the Altera platform.
• Expanding the router to enable communication between similar sub-networks.
• Implementing a processor network that supports communication with the PC, enabling:
  o Use of the PC's CPU as part of the processing network.
  o Simple I/O between the PC and the rest of the processing network.

Page 4: Top-level structure of the expanded network

• Each white square represents a single FPGA on the Gidel board.
• FPGA-to-FPGA and FPGA-to-PC routes go through designated gateway routers (GW).
• The GWs use the same design and protocols as the internal routers.

Page 5: Router from previous project

[Figure: "Cross Bar – Low Level" block diagram: Permission Unit, Timer & Enable Unit, Port #1 to Port #4 FSMs with TO\FROM FSL interfaces (Fsl_S_Data, Fsl_S_Read, Fsl_S_Control, Fsl_S_HasData, Fsl_M_Data, Fsl_M_Write, Fsl_M_Control, Fsl_M_Full), Bus I Interface, Bus II & Data Bus Interface, Control Bus I/II, 32-bit Data Bus, BcastPriority]

• Two main units: the Permission Unit and the Port FSM.
• Time-limited round-robin arbiter.
• Port-to-port transfers and broadcasting.
• Smart connectivity: router to router (R–R) and router to core (R–Core).
• Modular design.

Page 6: Permission process

• Round-robin arbiter: ports are serviced in the order given by a loop counter.
• Check that DEST is not busy.
• Grant permission for a 'time slot'.
• If a port is not requesting, service the next requesting port.
• The BUSY ports and the LAST writing port are saved.
• Check the COMM of each message and direct it to the relevant port according to the COMMs table.
• Broadcast priority ensures that only one broadcast is active at a time.

[Figure: Permission Unit block diagram: CONTROLLER, BUSY PORTS and LAST WRITING PORT registers, MUX 4x2, MUX 4x1, Timer & Enable Unit (TimeOver), COMMs table, BcastPriority Unit; inputs Req1–Req4 and Bcast1–Bcast4 from the Port FSMs]

Page 7: Our changes for the router

Changes:
• Fifth port
• Routing table
• Broadcast table

New router types:
• Local router (LR)
• Fabric router (FR)
• Primary/secondary inter-chip router (P/S-ICR)
• PC router (PCR)

Page 8: Fifth port

[Figure: "Cross Bar – Low Level" block diagram (as on Page 5) with a 5th Port module added]

Adding the fifth port just means adding another port module to the ring.

Page 9: Routing

[Figure: address format with fields PC | C C | F F | L L (PC, chip, fabric, local), divided into a comm part and a rank part]

• Local router: messages for the same comm are routed by rank; messages for other comms are sent to the 5th port.
• Other routers: routing by comm only.
• Result: smaller routing tables.

Page 10: Routing

[Figure: Permission Unit block diagram (as on Page 6), including the COMMs table]

Components that do not exist yet still need to be added.

Page 11: Broadcast table

[Figure: "Cross Bar – Low Level" block diagram (as on Page 5) with a broadcast table attached to the Port FSMs]

• Broadcasting only along spanning-tree branches.
• The table tags branch ports with the value '1', e.g. 0 1 1 0 1.
• The table is connected to the "Port FSM" unit of each port.

Page 12: Software design

Software layers:
• Application layer: MPI function interface.
• Network layer: hardware-independent implementation of these functions.
• Data layer: relies on command bit fields.
• Physical layer: designed for the FSL bus.

Changes:
• Adjusted to conform with the Altera interface.
• Using DMA transfers.
• Added asynchronous functions.
• Adjusted for the new comm size.

Page 13: Message Passing Flow

[Figure: Source Buffer, Network, Auxiliary Receive Buffer (constant) and Destination Buffer, linked by DMA transfers]

Sending:
• MPI_Isend only adds a send request (destination, tag, buffer address, size) to the sending list.
• DMA sends the data asynchronously.

Receiving:
• MPI_Irecv only adds a receive request (source, tag, buffer address, size) to the receiving list.
• DMA receives the data asynchronously and transfers it into the buffer in the background.

Page 14: Obstacle 1 - Memory bottleneck

• Each Nios II uses ~13 Kb of on-chip memory.
• The FPGA has only ~70 Kb of on-chip memory.
• Since 70 / 13 ≈ 5.4, only 5 processors fit.

Solutions:
• Off-chip memory: slow.
• Reducing the program footprint.
• Using a bigger FPGA for the whole network.

Page 15: Obstacle 2 - Cache coherency

[Figure: a DMA buffer in memory spanning several cache lines]

A cache flush is necessary but not sufficient: cache lines that the DMA buffer only partly covers remain incoherent, because they also hold unrelated data that the CPU accesses through the cache.

Solutions:
• Not using the cache: ineffective for an asynchronous system.
• Disabling the cache in the buffer area: the cache cannot be used after the DMA transfer.
• Aligning DMA buffers to cache lines (using memalign).

Page 16: Local router testing

[Figure: test setup with the local router, four Nios II processors with PIOs, Simple FIFO* blocks, and the PC running the testing program]
* Simple FIFO: PIO-to-FIFO connector.

• The PIOs output debug information, the data sent/received, and results.
• The test program prints the PIO data on screen.
• In simulation, the PIO can be read directly from the waveform.

Page 17: Application

Multiple matrix multiplication.

[Figure: several matrices with entries a_{0,0} … a_{m,n} processed by four MUL units]

Page 19: Questions