Methods for Evaluation of Embedded Systems Simon Künzli, Alex Maxiaguine Institute TIK, ETH Zurich


System-Level Analysis

RISC

DSP

LookUp

Cipher

IP Telephony

Secure FTP

Multimedia streaming

Web browsing

Memory ?

Clock Rate ?

Bus Load ?

Packet Delays ?

Resource Utilization ?

Problems for Performance Estimation

RISC

DSP

SDRAM

Arbiter

• Distributed processing of applications on different resources

• Interaction of different applications on different resources

• Heterogeneity, HW-SW

A “nice-to-have” performance model

• measuring what we want

• high accuracy

• high speed

• full coverage

• based on unified formal specification model

• composability & parameterization

• reusable across different abstraction levels, or at least easy to refine

Overview of Existing Approaches

[Chart: existing approaches positioned along speed vs. accuracy axes: RTL, Benini, SPADE, Jerraya, Lahiri, Givargis, Ernst, Thiele]

Discrete-event Simulation

System Model

• Architecture and Behavior
• Components/Actors/Processes
• Communication channels/Signals

Event Scheduler

• Event queue

© The MathWorks

future events (e.g. signal changes)

actions to be executed

Accuracy vs. Speed:

How many events are simulated?
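The event-queue mechanism above can be sketched as a time-ordered priority queue; the names below (EventScheduler, schedule, run) are illustrative, not taken from any particular simulator:

```python
import heapq

class EventScheduler:
    """Minimal discrete-event kernel: a time-ordered queue of (time, action)."""
    def __init__(self):
        self._queue = []   # heap of (time, seq, action)
        self._seq = 0      # tie-breaker for events at the same time
        self.now = 0.0

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()

# usage: a "reset" event that triggers a dependent "ready" event
log = []
sched = EventScheduler()
sched.schedule(2.0, lambda: log.append(("clk", sched.now)))
sched.schedule(1.0, lambda: (log.append(("reset", sched.now)),
                             sched.schedule(0.5, lambda: log.append(("ready", sched.now)))))
sched.run()
# events fire in time order: reset (t=1.0), ready (t=1.5), clk (t=2.0)
```

The accuracy/speed trade-off shows up directly here: every scheduled event costs one pass through the loop, so coarser models schedule fewer events and run faster.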

Discrete-event Simulation

“The design space”:

Time resolution

Modeling communication

Modeling timing of data-dependent execution

Time Resolution

[Figure: a signal x(t) sampled at time points t1 … t7 in continuous time vs. discrete time; coarser time resolution means fewer simulated events but lower accuracy]

• Continuous time, e.g. gate-level simulation
• Discrete time or "cycle-accurate", e.g. Register Transfer Level (RTL) simulation, system-level performance analysis

Modeling communication

• Pin-level model: all signals are modeled explicitly; often combined with RTL
• Transaction-level model (TLM): protocol details are abstracted, e.g. burst-mode transfers
• A TLM simulator of the AMBA bus is about 100× faster than a pin-level model

Caldari et al. Transaction-Level Models for AMBA Bus Architecture Using SystemC 2.0. DATE 2003

[Figure: pin-level: components C1 and C2 connected by a ready signal and data lines d0, d1, d2; transaction-level: C1 issues a <write> transaction to C2, which returns true/false]
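The speed gap can be illustrated by counting simulation events per transfer; this is a hypothetical sketch of the two abstraction levels, not the AMBA model from the cited paper:

```python
# Pin-level: every signal change is a separate simulation event.
def pin_level_write(word, bus_width=8):
    events = [("ready", 1)]                      # assert the handshake line
    for bit in range(bus_width):                 # drive each data line d0..d7
        events.append((f"d{bit}", (word >> bit) & 1))
    events.append(("ready", 0))                  # release the handshake line
    return events

# Transaction-level: the whole transfer is a single event.
def tlm_write(word):
    return [("write", word, True)]               # transaction + completion status

# one byte transfer: 10 pin-level events vs. 1 TLM event
```

Since simulator time is roughly proportional to the number of events processed, collapsing a multi-signal handshake into one transaction is what buys the reported order-of-magnitude speedups.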

Modeling timing of data-dependent execution

Problem: how to model the timing of data-dependent functionality inside a component?

Possible solution: Estimate and annotate delays in the functional/behavioral model:

[Flowchart: a = read(in) → branch on a > b → task1() or task2() → write(out, c); delays d1, d2 annotated on the two branches]

a = read(in);
if (a > b) {
    task1();
    delay(d1);
} else {
    task2();
    delay(d2);
}
write(out, c);

• this approach works well for HW but may be too coarse for modeling SW

HW/SW Cosimulation Options

Application SW...

• … is delay-annotated & natively executes on workstation as a part of HW simulator

• … is compiled for the target processor and its code is used as stimulus to a processor model that is part of the HW simulator

• … is not a part of the HW simulator -- a complete separation of Application and Architecture models

Processor Models: Simulation Environment

[Figure: a processor model (RTL, microarchitecture simulator, or ISS) sits behind a wrapper inside the HW simulator, which models the rest of the system; application SW in C/C++ is compiled into program code executed by the processor model]

Processor Models

• RTL model: cycle-accurate or continuous time; all details are modeled (e.g. synthesizable)
• Microarchitecture simulator: cycle-accurate; models pipeline effects, etc.; can be generated automatically (e.g. Liberty, LISA, …)
• Instruction Set Simulator: provides instruction count; functional models of instructions (e.g. SimpleScalar)

Multiprocessor System Simulator

L Benini, U Bologna

SystemC model

Cycle-accurate ISS

SystemC Wrapper

Comparison of HW/SW Co-simulation techniques

simulator speed (instructions/sec):

continuous time (nanosecond-accurate): 1 – 100
cycle-accurate: 50 – 1,000
instruction level: 2,000 – 20,000

J. Rowson, "Hardware/Software Co-Simulation", Proceedings of the 31st DAC, USA, 1994

HW/SW Co-simulation Options

Application SW...

• … is delay-annotated & natively executes on workstation as a part of HW simulator

• … is compiled for the target processor and its code is used as stimulus to a processor model that is part of the HW simulator

• … is not a part of the HW simulator -- a complete separation of Application and Architecture models

Independent Application and Architecture Models (“Separation of Concerns”)

RISC DSP

SRAM

Application

Architecture

Mapping

WORKLOAD

RESOURCES

Co-simulation of Application and Architecture Models

Basic principle:
• the application (or functional) simulator drives the architecture (or hardware) simulator
• the models interact via traces of actions
• the traces are produced on-line or off-line

Advantages:
• system-level view
• flexible choice of abstraction level
• the models and the mapping can be easily altered
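The principle can be sketched as follows: the application model emits an untimed trace of abstract actions, and the architecture model assigns each action a latency. All action names and latency figures below are invented for illustration:

```python
# Application (functional) model: emits an untimed trace of abstract actions.
def application_trace():
    yield ("read", 64)       # read 64 bytes from the input channel
    yield ("execute", 1000)  # 1000 abstract operations
    yield ("write", 64)      # write 64 bytes to the output channel

# Architecture model: maps each abstract action to a cycle cost
# (hypothetical latencies for a hypothetical platform).
LATENCY = {
    "read":    lambda n: 10 + n,   # bus setup + one cycle per byte
    "execute": lambda n: n // 2,   # a 2-ops-per-cycle processor
    "write":   lambda n: 10 + n,
}

def replay(trace):
    cycles = 0
    for action, amount in trace:
        cycles += LATENCY[action](amount)
    return cycles

# replay(application_trace()) -> (10+64) + 500 + (10+64) = 648 cycles
```

Changing the mapping or the architecture means swapping the latency table; the application trace is reused unchanged, which is exactly the flexibility the separation of concerns is after.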

Trace-driven Simulation

SPADE: System level Performance Analysis and Design space Exploration

Application model

Architecture model

P. Lieverse et al., U Delft & Philips

Trace-driven Simulation (SPADE)

Lieverse et al., U Delft & Philips

Going away from discrete-event simulation…

Analysis for Communication Systems (Lahiri et al., UC San Diego)

A two-step approach:

1. simulation without communication (e.g. using ISS)

2. analysis for different communication architectures

K. Lahiri, UCSD

Overview

K. Lahiri, UCSD

Analytical Methods for Power Estimation

• Givargis et al. UC Riverside

• Analytical models for power consumption of caches and buses
• Two-step approach for fast power evaluation: collect intermediate data using simulation; use equations to rapidly predict power; couple with a fast bus estimation approach

Approach Overview Givargis, UC Riverside

• Bus equation:
• m items/second (denotes the traffic N on the bus)
• n bits/item
• k-bit wide bus
• bus-invert encoding
• random data assumption

P_bus = 1/2 · C · V_dd² · m · ⌈n/k⌉ · E[transitions per transfer]

with E[transitions per transfer] at most k/2 for a k-bit bus under the random-data assumption; bus-invert encoding reduces it below k/2.
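As a sketch of the idea (not the exact equation from Givargis et al., and ignoring the bus-invert correction term), the random-data assumption lets expected bus power be computed directly from m, n, k, and the line capacitance:

```python
import math

def bus_power(m, n, k, C, Vdd):
    """Expected bus power: m items/s, n bits/item, k-bit bus,
    line capacitance C, supply voltage Vdd.
    Random-data assumption: each of the k lines toggles with
    probability 1/2, i.e. k/2 expected transitions per transfer."""
    transfers_per_s = m * math.ceil(n / k)   # bus cycles needed to move the traffic
    avg_transitions = k / 2                  # random data, no encoding correction
    return 0.5 * C * Vdd ** 2 * avg_transitions * transfers_per_s
```

Evaluating such a closed-form expression for each candidate bus width is what makes the second step of the two-step approach so much faster than re-simulating every configuration.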

Experiment Setup Givargis, UC Riverside

[Block diagram: a C program feeds a trace generator and an ISS; the trace drives a cache simulator and a bus simulator; CPU power, memory power, and I/D cache power models combine with the ISS results into overall performance + power]

• Dinero [Edler, Hill]

• CPU power [Tiwari96]

Analytical Method

[Figure: events e1, e2 scheduled on CPU1 under scheduling discipline 1; events e3, e4 on CPU2 under scheduling discipline 2; the timing at the component interfaces is unknown]

Workload ?

• periodic: period T
• periodic with jitter: period T, jitter J
• periodic with burst: period T, burst length b, minimum inter-arrival distance t
• sporadic: minimum inter-arrival distance t
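These models can be evaluated as event-count bounds: for a periodic-with-jitter stream, standard formulas bound the number of events in any time window of length Δ. A sketch following the usual η⁺/η⁻ definitions (function names are illustrative):

```python
import math

def eta_plus(delta, T, J=0.0):
    """Upper bound on the number of events of a periodic-with-jitter
    stream (period T, jitter J) in any time window of length delta."""
    return math.ceil((delta + J) / T)

def eta_minus(delta, T, J=0.0):
    """Lower bound on the number of events in any window of length delta."""
    return max(0, math.floor((delta - J) / T))
```

Setting J = 0 recovers the strictly periodic model, which is why periodic streams embed losslessly into the periodic-with-jitter model.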

Event Model Interface Classification Ernst, TU Braunschweig

[Figure: hierarchy of event models: periodic is the special case jitter J = 0 and burst length b = 1 (with t = T); lossless EMIFs map toward more expressive models, while mapping to a less expressive model (e.g. T = T, t = T, b = 1, or T = T, J = 0) is only possible under such constraints]

Example: EMIFs & EAFs

[Figure: the two-CPU example with events e1, e2 on CPU1 (scheduling discipline 1) and e3, e4 on CPU2 (scheduling discipline 2); between components, an EMIF (event model interface) is inserted where a lossless conversion between event models exists, and an EAF (event adaptation function) where the models must be adapted]

Use standard scheduling analysis for single components.

General Framework

Functional Task Model

Abstract Task Model

Architecture Model

Abstract Components (Run-Time Environment)

T1 T2 T3

ARM9 DSP

Abstract Architecture

[Figure: load scenarios, resource units, mapping relations, functional units, and event streams are abstracted into abstract load scenarios, abstract resource units, abstract functional units, and abstract event streams]

Event & Resource Models

• use arrival curves to capture event streams
• use service curves to capture processing capacity

[Figure: upper/lower arrival curves (α^u, α^l): number of packets vs. window length t; e.g. max 1 / min 0 packets, max 2 / min 0 packets, max 3 / min 1 packet for increasing window lengths]
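A sketch of both notions, assuming events are given as a timestamp list and the resource as a simple rate-latency server (function names are illustrative):

```python
def upper_arrival(timestamps, delta):
    """alpha^u(delta): maximum number of events observed in any
    sliding window of length delta over the given trace."""
    return max(sum(1 for t in timestamps if t0 <= t < t0 + delta)
               for t0 in timestamps)

def rate_latency_service(delta, rate, latency):
    """beta^l(delta): minimum service of a resource guaranteeing
    `rate` units of work per time unit after an initial `latency`."""
    return max(0.0, rate * (delta - latency))
```

The curves abstract away individual event times: any trace whose window counts stay between α^l and α^u is covered by the same analysis.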

Analysis for a Single Component

[Figure: a single component transforms incoming arrival curves α^u, α^l and service curves β^u, β^l into outgoing arrival curves and remaining service curves]

Analysis – Bounds on Delay & Memory

[Figure: upper arrival curve α^u vs. lower service curve β^l; the maximum horizontal distance between the curves bounds the delay d, the maximum vertical distance bounds the backlog (memory) b]
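Given the two curves sampled on a common grid Δ = 0, 1, 2, …, the bounds are the maximum vertical distance (backlog) and the maximum horizontal distance (delay); a minimal sketch:

```python
def rtc_bounds(alpha_u, beta_l):
    """alpha_u, beta_l: the curves sampled at window lengths 0, 1, 2, ...
    Returns (delay bound, backlog bound)."""
    # backlog: maximum vertical distance between the curves
    backlog = max(a - b for a, b in zip(alpha_u, beta_l))
    # delay: maximum horizontal distance
    delay = 0
    for i, a in enumerate(alpha_u):
        # smallest shift tau such that beta_l(i + tau) >= alpha_u(i)
        tau = next((j - i for j in range(i, len(beta_l)) if beta_l[j] >= a),
                   len(beta_l) - i)
        delay = max(delay, tau)
    return delay, backlog
```

Because these are worst-case bounds over all windows, they hold for every trace conforming to the curves, which is the coverage advantage of the analytical method discussed next.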

Comparison between diff. Approaches

Simulation-Based

• can answer virtually any question about performance
• can model arbitrarily complex systems
• average case (single instance)
• time-consuming
• accurate

Analytical Methods

• questions that can be answered are limited by the method
• restricted by the underlying models
• good coverage (worst case)
• fast
• coarse

Example: IBM Network Processor

Comparison RTC vs. Simulation

[Chart: utilization [%] (0–100) vs. linespeed (100 Mbps – 400 Mbps) for OPB, PLB write, and PLB read; simulation results vs. the analytical method]

Experiment Results Givargis, UC Riverside

[Chart: execution time (sec), 0 – 0.3, for configurations Conf 0 – Conf 9]

• Diesel application's performance
• Blue is obtained using full simulation
• Red is obtained using our equations

4% error, 320× faster

Concluding Remarks

Backup

Metropolis Framework

Cadence Berkeley Lab & UC Berkeley