49
1 A methodology for design space exploration of on-chip networks Luciano Lavagno Politecnico di Torino, Italy [email protected] Cadence Berkeley Labs, CA http://polimage.polito.it/~lavagno Laura Vanzago STMicroelectronics laura [email protected]

1 A methodology for design space exploration of on-chip networks Luciano Lavagno Politecnico di Torino, Italy [email protected] Cadence Berkeley Labs,

Embed Size (px)

Citation preview

1

A methodology for design space exploration of on-chip networks

Luciano LavagnoPolitecnico di Torino, Italy [email protected] Berkeley Labs, CA http://polimage.polito.it/~lavagno

Laura VanzagoSTMicroelectronics [email protected]

MPSOC summer school, Chateau de Pizay, July 2002

MPSOC 2002 - 2Luciano Lavagno ©

OutlineSystem-on-chip design flow

–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation

Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration

MPSOC 2002 - 3Luciano Lavagno ©

The System-On-Chip Design FlowSpecify:

–What does the customer really want?Architect:

–What is the most cost and performance effective architecture to implement it?

–What existing components can I adapt and re-use?Evaluate:

–What is the performance impact of a cheaper architecture? Implement:

–What can I generate automatically from libraries and customization?

Idea: separate computation, communication and performance

MPSOC 2002 - 4Luciano Lavagno ©

The System-On-Chip Design Flow

CommunicationRefinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

MPSOC 2002 - 5Luciano Lavagno ©

The System-On-Chip Design Flow

ŒAnnotation

of architectural timing and energy

onto behavior

PerformanceSimulation

behavior annotated with architectural effects

ŽAnalyze / Visualize

Results

MPSOC 2002 - 6Luciano Lavagno ©

Functional Modeling

MPEG Decoder

VLD IDCT MC DISPLAYBAIZ,IQ

BAMEM

BAMEMM M M M MM

M

MPSOC 2002 - 7Luciano Lavagno ©

Communication Refinement

VLD IDCT MC DISPLAYBAIZ,IQ

BAMEM

BAMEMM M M M MM

M

VLD IDCT MC DISPLAYBAIZ,IQ

BAMEM

BAMEMM MM

M

REAS

BUS

M

SEG

M

REAS

M

M

SEG

M

SEG

M

REAS

MPSOC 2002 - 8Luciano Lavagno ©

Optimization

VLD IDCT DISPLAYBAIZ,IQ

BAMEMM MMC BA

MEMM

M

REAS

BUS

M

SEG

M

REAS

M

M

SEG

M

SEG

M

REAS

MPSOC 2002 - 9Luciano Lavagno ©

Functional modeling

CommunicationRefinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

MPSOC 2002 - 10Luciano Lavagno ©

Architectural Modeling

CommunicationRefinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

MPSOC 2002 - 11Luciano Lavagno ©

MappingCommunication

Refinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

MPSOC 2002 - 12Luciano Lavagno ©

OutlineSystem-on-chip design flow

–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation

Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration

MPSOC 2002 - 13Luciano Lavagno ©

Separate Delay ModelSeparate Delay Model

Annotated IP Functional Model

my_ip() {f = x.read();r = f * k;__DelayCycles(2);y.write(r); }

IP Functional Modelmy_ip() {f = x.read();r = f * k;y.write(r); }

Delay Script// HW implemdelay() {input(x);run();delay(2.0 / cps);output(y); }

Inline Delay ModelInline Delay Model

Functional Modelmy_ip() {f = x.read();r = f * k;y.write(r); }

Performance ModelingCommunication

Refinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

MPSOC 2002 - 14Luciano Lavagno ©

Software Performance Estimation

PerformanceEstimation

Compile

generated C andrun natively

ld ld op ld li op ts -- br

Analyse

basic blockscompute delays

Virtual MachineInstructions

ArchitectureCharacterization

ŽGenerate new C

with delay annotations

v__st_tmp = v__st;__DELAY(LI+LI+LI+LI+LI+LI+OPc);startup(proc);if (events[proc][0] & 1) {__DELAY(OPi+LD+LI+OPc+LD+OPi+OPi+IF); goto L16;}

ŒSpecify behavior

and I/O

ANSI CInput

v__st_tmp = v__st;startup(proc);if (events[proc][0] & 1) goto L16;

MPSOC 2002 - 15Luciano Lavagno ©

Communication refinement

P C

Module Interface

Bus independent Virtual Component Interface Write, Read (address, bus-able data chunk...) VCI

P C

VCI to Physical-Bus Wrapper

Physical Bus Transferse.g. Arbitrated PIBus protocol PHY

P C

Process

APPDelay Independent APIe.g. unbounded FIFO Write, Read (vector of «any» type )

HW/SW Independent System Communications e.g. Bounded FIFO

P C

Module

SYS

MPSOC 2002 - 16Luciano Lavagno ©

Communication refinement

fifowtr

dest

ITC

fifowriter

MEM

prod-ucer

chan.writer

wakeup

writer

filter

chan.reader

RTOSqueue

fiforeader

n

CPU

RTOS board support package

wrapper

hw tasksw task

Interrupt service routine

src

VCI

PHY

SYS

SYS

MPSOC 2002 - 17Luciano Lavagno ©

Communication RefinementA B

CPU Mem

RTOS Computation

mutex_lckmemcpy

signal

Post(5)

write

busRequest

arbiterReq/Release

busResponse

setEnabled

Value()

waitmemcpy

signal

read

busRequest

arbiterReq/Release

busResponse

SwMutexes

BusMaster SlaveAdapter

BusArbiter

MemoryAccess Arch. Services

SemaphoreProtectedSemProt_Send SemProt_Recv

Comm. Services

MPSOC 2002 - 18Luciano Lavagno ©

Mapping communication links to a pattern

MPSOC 2002 - 19Luciano Lavagno ©

Mapping communication links to a pattern

MPSOC 2002 - 20Luciano Lavagno ©

Communication abstraction layer refinement

HW SW

CommunicationRefinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

RP E -LTPE nc oder

ChannelCoder

Interleaver

M odulator

Dem odulatorDe-interleaver

ChannelDec oder

RP E -LTPDec oder

F ro m R F

T o R F

S p e e c hO ut

P rotoc olS tac k

Con tro lData

M ult ip lex er

De-m ult ip lex er

S p e e c h In

C o n tro lD a ta

P rotoc olS tac kFigure 1

D SPD S P

Interna lRA M

BUS

M ic ro p ro c e s s o r RA MD e d ic ate dHard ware

Figure 2

Figure 3

Ad

dre

ss

de

co

de

r

Dri

ve

r

HW SW

SemProt_Send

A B

CPU Mem

RTOS

mutex_lock;

memcpy;

signal

Post(5)

write

busRequest

arbiterRequest/Release

busIndication

setEnabled

Value()

wait;

memcpy;

signal

read

SemaphoreProtected

busRequest

arbiterRequest/Release

busIndication

User Visible SwMutexes

BusMaster SlaveAdapter

BusArbiter

MemoryAccess Architecture Services

SemProt_Send SemProt_Recv

Pattern Services

MPSOC 2002 - 21Luciano Lavagno ©

Performance simulation by mapping

F1 F2 F3 Function

A1 A2 A3Architecture

Mapping

F1 F2 F3Comm Comm

Arbiter

Performancesimulation

model

MPSOC 2002 - 22Luciano Lavagno ©

Performance simulation

Software Gantt Charts

Architecture Analysis

CommunicationRefinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4Flow To Implementation

MPSOC 2002 - 23Luciano Lavagno ©

Exploring Design Trade Offs

Iteration through different mapping experimentsGradual refinement of the designEvaluation

–of the "refined" design–of system performance after implementation

Export implementation to–Testbench and top-level netlist–Hardware netlist–Software RTOS customization

MPSOC 2002 - 24Luciano Lavagno ©

OutlineSystem-on-chip design flow

–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation

Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration

MPSOC 2002 - 25Luciano Lavagno ©

Implementation by mapping

F1 F2 F3

Intfc Intfc

BUS

F1 F2 F3

A1 A2 A3

Function

Architecture

Intfc Intfc

Mapping

Implementationmodel

MPSOC 2002 - 26Luciano Lavagno ©

CommunicationRefinement

Mapping

SystemBehavior

SystemArchitecture

PerformanceSimulation

BehaviorSimulation

21

3

4

Flow to Implementation

Flow To Implementation

Flow To Implementation

HardwareTop-level

SystemTest Bench

Softwareon RTOS

System ExplorationCommunication Refinement

Export refined design to co-verification and implementation tools

MPSOC 2002 - 27Luciano Lavagno ©

Flow to Implementation

Architecture

TaskX

TaskY

RTOSISRW

CPUASIC

bus

RAMROM

Behavior

A

B

C

D

E F

G

IJ

H

MPSOC 2002 - 28Luciano Lavagno ©

TaskX

TaskY

RTOS

CPU

A

B

C

D

E

Customizing RTOS$<StandardHeader,'RTOS rootialization'>$<RtosAndCpuIncludes>$<BoardSupportPackageIncludes>$<LynxSwIncludes>/* Device Driver includes/device handle decls */$<LynxDriverIncludes>$<LynxDeviceHandleDecs>/* Mutex semaphore per protected data-buffer */$<MutexVariableDefinitions>/* Define an identifier for each task */$<TaskIdDefinitions>

void root(void ) {/* Mutex semaphore per protected data-buffer */ $<CreateMutexes>/* Create each software task */ $<CreateTasks>/* Register interrupt service routines */ $<RegisterInterrupts>/* Schedule each software task. */ $<StartTasks>/* Delete or suspend the root task. */ $<DeleteSelf>}

#include <psos.h>#include "init.h"#include "tasks.h"/* Device Driver includes/device handle decls */#include "drivers.h"/* Mutex semaphore per protected data-buffer */unsigned long I_24_I_50_MainDisp_mutex;unsigned long I_24_I_50_SubDisp_mutex;/* Define an identifier for each task */unsigned long task_I_13_I_6__ready;unsigned long task_I_26__ready;

void root(void ) {/* Mutex semaphore per protected data-buffer */k_fatal(0x20000004, K_LOCAL);k_fatal(0x20000004, K_LOCAL);/* Create each software task */ if (t_create("T0", 10, 1024, 1024, T_LOCAL|,&task_I_13_I_6__ready)) k_fatal(0x20000001, if (t_create("T1", 11, 1024, 1024, T_LOCAL|,&task_I_26__ready)) k_fatal(0x20000001,…

MPSOC 2002 - 29Luciano Lavagno ©

TaskX

TaskY

RTOS

CPU

A

B

C

D

E

Creating SW Communication Code #include <psos.h> #define LYNX_BEGIN_ATOMIC() OSDisableInt() #define LYNX_END_ATOMIC() OSEnableInt() #define LYNX_SET_PENDING(taskEventName) ev_receive(allevents, \

(EV_ANY || EV_NOWAIT), 0, events_r) #define LYNX_SET_READY(taskEventName) ev_send(taskEventName, allevents) #define LYNX_MUTEX_REQUEST(mutex) sm_p(mutex, SM_WAIT, 0) #define LYNX_MUTEX_RELEASE(mutex) sm_v(mutex) #define LYNX_ISR_ENTER() OSEnterISR() #define LYNX_ISR_EXIT() OSExitISR()

void lynx_Run(lynx_inst_ident_t inst_id){ char buffinput[10] = ""; if (lynx_Enabled(inst_id,in)){ lynx_Value(inst_id, in, &buffinput); ... behaviour d functionality .... lynx_Post(inst_id, out, &buffinput);}#define I_31_I_64_Value_MainDisp(inst_id, buff_p) \

( ( \ (LYNX_MUTEX_REQUEST(I_3_DM_1_X_mutex)), \ (LYNX_MEMCPY(buff_p,&I_3_DM_1_X,sizeof(I_3_DM_1_X))), \ (Probe_I_31_I_64_Value_MainDisp), \ (LYNX_MUTEX_RELEASE(I_3_DM_1_X_mutex)) ), &I_3_DM_1_X\)

MPSOC 2002 - 30Luciano Lavagno ©

Creating HW Communication CodeD G

I

J

H

TaskY

RTOSISRW

CPU

ASIC

bus

InterruptRegisterMapped

Data bus (16)

ASICCPUAddressr bus

DBus Intface

64

ISR

Interrupt bus

= data buffer

= presence bit

= memcpy (64 bit read)

= frozen data buffer

= LYNX

InterruptRegisterMapped16

G

I

H8

32

InterruptRegisterMapped16

J

64

MPSOC 2002 - 31Luciano Lavagno ©

Creating Testbench

A

C B D

Test1

Test2

System-level Simulation

Results DB

Results DB

A

C B D

Co-Verification

Source

Source

MPSOC 2002 - 32Luciano Lavagno ©

Comparing Results

MPSOC 2002 - 33Luciano Lavagno ©

OutlineSystem-on-chip design flow

–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation

Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration

MPSOC 2002 - 34Luciano Lavagno ©

Case study: wireless LAN physical layer

OFDM Physical Layer/Digital BB

MAC

OFDM TX OFDM RXDynamicReconfiguration

Network

Application

HiperLan/2

PicoRadio

Protocol Stack

MultiMedia WirelessNetworks;High Rate: 10 Mb/secLow Power: 10-100 mW

Ad Hoc Networks:Low Rate: b/sec - kb/secLow Power: 100W

MPSOC 2002 - 36Luciano Lavagno ©Implementation

Design Flow Application Specification

Algorithm Exploration

Functional Simulation and Refinement

Architecture Exploration:Performance Simulation

Architecture Refinement

COSSAP/C (Matlab/Simulink, …)

English (UML, …)

VCC (SystemStudio, …)

VCC (SystemStudio, …)

TX

OFDM RXOFDM TX

RX

OFDM Physical Layer

MAC, Network and Higher Layers

Functional IP Reuse

Mapping

C

Mapping

Functional Partitioning

MPSOC 2002 - 37Luciano Lavagno ©

Top-level Hiperlan/2 Functional Model

MPSOC 2002 - 38Luciano Lavagno ©

Hiperlan/2 OFDM Transmitter

MPSOC 2002 - 39Luciano Lavagno ©

Hiperlan/2 OFDM Receiver

MPSOC 2002 - 40Luciano Lavagno ©

Heterogeneous BehaviorMAC

GoT GoR

Preamble

DataPath

TX – Static Dataflow RX – Dynamic Dataflow

Sym

DataPath

Sync

CostToRCostToT

RXTX

IdleState

GoT GoR

GoT/GoR

Control-FSM

N N

MPSOC 2002 - 41Luciano Lavagno ©

void CPP_MODEL_IMPLEMENTATION::Init() { ….; Length = LenghtPar.Value(); // read parameter // Set data rate on 2 input ports: Real and Imag Real.SetDataRate(Length); Imag.SetDataRate(Length);}// Run() is executed every time the firing rule is satisfiedvoid CPP_MODEL_IMPLEMENTATION::Run() { for (i=0; i<Real.GetDataRate(); i++) {

// Read data from the input ports data[i] [0] = Real.Value(); data[i] [1] = Imag.Value(); } // Call the FFT procedure (C functional model) fft_cns_rot_bfp(data,….); // Write data to two output ports (OutReal, OutImag) for( i=0; i< Lenght; i++) {

OutReal.Post(data[i][0]);OutImag.Post(data[i][1]);}}

}}

Example of functional block

FFTReal

Imag

OutReal

OutImag

LenghtPar = 64

Imported from Cossap environment

64

64

MPSOC 2002 - 42Luciano Lavagno ©

OutlineSystem-on-chip design flow

–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation

Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration

MPSOC 2002 - 43Luciano Lavagno ©

Wireless LAN physical layer SOC architecture

FPGA

FFT

FIR

UART

BUFFER

FPGA config. mem.

Int. bridge

Micro -

Clock

gen.

SPS2

(instruction/

data RAM)

XBARInterface

Processor busInterface

XBAR

Processor bus

JtagInterface

DPR2/SPS2

Bridge

TEST(0..2)

Ck, reset

CK2 CK1 MCK VDD VSS Reset

I/D caches

Datapath

MPSOC 2002 - 44Luciano Lavagno ©

Crossbar featuresThe crossbar model is flexible in the number of supported masters and slaves (evaluated at simulation initialization time)

A prioritized FIFO is used to arbitrate multiple master requests for each slave

Number of parallel slave accesses defined through a parameter

A transmission can be suspended by higher priority requests (preemptive)

Arbitration overhead and slave access delays are parameterized

MPSOC 2002 - 45Luciano Lavagno ©

Crossbar Architecture Service Structure

Beh1 Beh2Post() Value() Behavior

Network

XBAR

FPGA MEM FFTArchitecture Network

DFSHAREDMEMORY (HW->HW)

Communication Pattern

XBarArbiter

Sender Receiver

XBarMaster

XBarSlave

Memory

XBarMasterFPGA PORT

MEM

MEM PORT

FFT PORT

BUS

Service Stack

Mem0Mem1

MPSOC 2002 - 46Luciano Lavagno ©

Crossbar Architecture Service Structure

Beh1 Beh2Post() Value() Behavior

Network

XBAR

FPGA MEM FFTArchitecture Network

DFREGISTERDIRECT (HW->HW)

Communication Pattern

XBarArbiter

Sender Receiver

XBarMaster XBarMasterFPGA PORT FFT PORT

BUS

Service Stack

MPSOC 2002 - 48Luciano Lavagno ©

OutlineSystem-on-chip design flow

–Functional and architectural modeling–Mapping–Performance simulation–Communication refinement– Implementation

Case study: wireless LAN architectural exploration– functional model–on-chip communication architectural model–design space exploration

MPSOC 2002 - 49Luciano Lavagno ©

Design Space Exploration

Explored several computation/communication architecture configurations

–FFT throughput (1 item / 4 clock cycles vs 1 item / 1 clock cycle)

–Number of buffer ports FFT FIR on crossbar–FPGA FFT communication pattern

–Shared Memory–Register Direct

MPSOC 2002 - 50Luciano Lavagno ©

Exploration Results

0

500

1000

1500

2000

2500

3000

3500

4000

SimA1 SimB1 SimC1 SimD1 SimA2 SimB2 SimC2 SimD2

FPGA FFT FIR

FFT:1/4 FFT:1/1

RD SM RDSM

5.8 7.2 8.2 8.2 9.6 13.7 12.5 15.6BitRate (Mb/s) 12

1P 2P 1P 2P 1P 2P 1P 2P

Hiperlan/2spec.

MPSOC 2002 - 51Luciano Lavagno ©

Conclusion

System-On-Chip Design requires methodology, tools and libraries

Separate computation, communication and architecture–computation: compiled and scheduled–communication: refined via patterns

Map computation and communication onto platform–simulate performance–generate implementation model for HW, SW and

communication