System-level Exploration for Pareto-optimal Configurations in Parameterized Systems-on-a-chip Architectures
Tony Givargis (Frank Vahid, Jörg Henkel)
Center for Embedded Computer Systems
University of California
Irvine, CA 92697
[email protected]


Overview

Given:
– Parameterized SOC architecture
– Fixed application

Explore:
– Automatically explore the design space
– Find optimal points w/respect to power and performance

Application: void main(){ while(1){ Receive(); Decode(); Display(); }}

[Figure: SOC block diagram with CPU, Memory, JPEG CODEC, Math/FPU, UART, I$-D$, and BRIDGE. Cache parameters: Size = {1K, 4K, 8K}, Line = {4, 8, 16}, Assoc = {1, 2, 4}]

[Figure: plot of Power (uW) vs. Execution Time (us)]
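As a rough illustration of how quickly such parameter sets multiply, the sketch below enumerates the cache configurations listed on this slide. The dictionary keys and variable names are illustrative choices, not taken from the original work.

```python
from itertools import product

# Cache parameter values as listed on the slide (units as given there).
cache_params = {
    "size": ["1K", "4K", "8K"],
    "line": [4, 8, 16],
    "assoc": [1, 2, 4],
}

# Every configuration is one choice of value per parameter.
names = list(cache_params)
configurations = [dict(zip(names, values))
                  for values in product(*cache_params.values())]

print(len(configurations))  # 3 * 3 * 3 = 27
```

Three parameters with three values each already give 27 points; each additional parameter multiplies the count again.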

Motivation

Design trends:
– Growing demand for portable devices
– Growing demand for low power design
– Increased application complexity
– Shrinking time-to-market windows

Technology trends:
– Increased chip capacity
– Increased I/O pins
– Improved on-chip integration techniques (storage, digital, analog, …)
– SOC era

Need for greater designer productivity!

Motivation

One approach: reuse of existing IP
– IP selection?
– IP integration?
– SOC verification?
– Multi-source IP licensing
– More…

[Figure: SOC block diagram (CPU, Memory, JPEG CODEC, Math/FPU, UART, MMX, BRIDGE) with candidate IP for the open slots: MIPS, ARM; RAM, SRAM, DRAM; JPEG CODEC1, JPEG CODEC2; Math/FPU; UART, USB; ISA BRIDGE, AMBA BRIDGE]

Motivation

Alternate approach: reuse of SOC
– Designed, integrated, tested
– Domain specific
– Parameterized

Designed by firms specializing in SOC

User: map application, then “configure-and-execute” (successors to microcontrollers!)

[Figure: parameterized SOC block diagram (CPU, Memory, JPEG CODEC, Math/FPU, UART, MMX, BRIDGE)]

Motivation

Composed of 100s of cores

Cores are “configurable”

Configurations impact power/performance

Large number of total configurations!

Architecture is otherwise fixed!

[Figure: parameterized SOC block diagram (CPU, Memory, JPEG CODEC, Math/FPU, UART, MMX, BRIDGE)]

Motivation

ATI Technologies – XILLEON™ 220 SOC for Digital Set-top Box Market

Tensilica – Xtensa™ 1040 configurable processor cores

Philips Semiconductors – Velocity RSP9™ SOC platforms

Adelante Technologies – offers complete SOC customizable platforms for DSP domains

More…

Outline

Previous work
Target architecture
Power/performance estimation
Parameter space exploration
Experiments
Conclusion


Previous Work

Parameterized SOC design
– [Malik00], [Veidenbaum99], [Vahid99], [Stan95]

Power/performance evaluation
– [Barndolese00], [Simunic99], [Li98], [Tiwari94]

Design space exploration (manual)
– [Givargis99], [Lieverse99]

Design space exploration (automatic)
– Focus of this work…

Previous Work

[Figure: Y-chart [Lieverse99] with blocks Architecture, Applications, Mapping, Analysis, and Numbers; annotation: Auto]


Target Architecture

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]

Target Architecture

Configurable parameters:
– Voltage scale
– Size, line, associativity
– Bus width, encoding (gray, invert)
– UART tx/rx buffer size
– DCT resolution

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]
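The "invert" encoding option can be illustrated with bus-invert coding in the style of [Stan95]: if sending a word would toggle more than half of the bus lines, its complement is sent instead and an extra invert line is raised. The following is a simplified sketch of that idea, assuming an 8-bit bus that starts at zero and ignoring the invert line in the toggle count; the function names are illustrative, and this is not claimed to be the encoder used in the target architecture.

```python
def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_bit) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the bus lines
    encoded = []
    for word in words:
        # Number of data lines that would toggle if the word were sent as-is.
        toggles = bin((word ^ prev) & mask).count("1")
        if toggles > width // 2:
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word & mask, 0
        encoded.append((bus, invert))
        prev = bus
    return encoded


def count_transitions(values):
    """Count bit toggles between consecutive bus values, starting from 0."""
    prev, total = 0, 0
    for value in values:
        total += bin(value ^ prev).count("1")
        prev = value
    return total


words = [0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words)
print(count_transitions(words))                        # unencoded data-line toggles
print(count_transitions([bus for bus, _ in encoded]))  # encoded data-line toggles
```

The receiver recovers each word by complementing the bus value whenever the invert bit is set.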


Target Architecture

26 parameters

10^{14} configurations

What are the optimal configurations (given a fixed application)?

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]

Problem Summary

What are the possible power/performance tradeoffs? (100 trillion configurations)

Need to efficiently evaluate power/performance (1/sec → 150,000 years)

Need to explore the configuration space

[Figure: parameterized SOC block diagram (CPU, Memory, JPEG CODEC, Math/FPU, UART, MMX, BRIDGE)]


Power Evaluation

Exploration works with:
– Chip instrumentation (real-time)
– System-level simulation
– RTL simulation
– Gate-level simulation
– Circuit-level simulation

Relative accuracy required!

[Figure: bar chart for a digital camera application mapped on our SOC, capturing 1 image. Categories: Chip, System, RTL, Gate, Circuit. Bar labels, in order: 1, 440, 5400, 28800, 180000. Vertical axis: 0 to 180000]


Power Evaluation - Processor

Instruction-level model of [Tiwari94/00]

Measure watt/inst

Account for stalls + dependency

Apply traces

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]
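The instruction-level approach can be sketched as a table lookup over an instruction trace: each instruction contributes a measured base cost, with extra terms for stalls and for the overhead between consecutive instructions. The sketch works in energy (power times time) for simplicity. All numeric values and names below are made-up placeholders for illustration, not measurements from [Tiwari94/00] or from this work.

```python
# Hypothetical per-instruction energy costs in nanojoules (placeholders).
BASE_ENERGY_NJ = {"add": 1.0, "lw": 2.0, "mul": 3.0}
STALL_ENERGY_NJ = 0.5      # assumed cost per stall cycle
SWITCH_OVERHEAD_NJ = 0.25  # assumed cost when the opcode changes


def trace_energy_nj(trace):
    """Sum energy over a trace of (opcode, stall_cycles) pairs."""
    total = 0.0
    prev_opcode = None
    for opcode, stall_cycles in trace:
        total += BASE_ENERGY_NJ[opcode]
        total += stall_cycles * STALL_ENERGY_NJ
        if prev_opcode is not None and opcode != prev_opcode:
            total += SWITCH_OVERHEAD_NJ  # inter-instruction dependency term
        prev_opcode = opcode
    return total


trace = [("lw", 2), ("add", 0), ("add", 0), ("mul", 1)]
print(trace_energy_nj(trace))  # 9.0
```

Because the cost table is measured once per processor, evaluating a new configuration only requires replaying the trace.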

Power Evaluation – Cache/Mem.

[Evans95]

Capacitance model of sub-components

Switching obtained via simulation (parameter dependent)

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]

Power Evaluation – Buses

[Chern92]

Model bus capacitance

Switching derived from I/O traffic (parameter dependent)

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]
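A common first-order way to turn switching activity into energy is E = 1/2 · C · V² per line transition, where C would come from a capacitance model such as [Chern92]. The sketch below applies that formula to a short address trace and compares plain binary with Gray encoding. The capacitance, voltage, and trace values are assumptions chosen only for illustration.

```python
def bit_transitions(trace):
    """Count bit toggles between consecutive values on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(trace, trace[1:]))


def to_gray(value):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def switching_energy_joules(trace, cap_per_line_f, vdd):
    """First-order dynamic energy: 0.5 * C * V^2 per line transition."""
    return 0.5 * cap_per_line_f * vdd ** 2 * bit_transitions(trace)


# Assumed values: 1 pF per bus line, 2.0 V supply, sequential addresses.
addresses = [0, 1, 2, 3, 4]
gray_addresses = [to_gray(a) for a in addresses]

print(bit_transitions(addresses))       # 1 + 2 + 1 + 3 = 7
print(bit_transitions(gray_addresses))  # one toggle per step = 4
print(switching_energy_joules(addresses, 1e-12, 2.0))
```

Sequential address streams are where Gray encoding helps most, which is why the benefit of an encoding depends on the traffic of the mapped application.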

Power Evaluation – Peripherals

Observation: cores execute instructions!

Apply a technique similar to that used for processors!

[Figure: block diagram of the target SOC: MIPS, I-Cache, D-Cache, Memory, Bridge, Peripheral Bus, UART, DCT CODEC, DMA]

Power Evaluation – Summary

[Figure: block diagram of the target SOC annotated per core: UART (5%), MIPS (10%), I-Cache (8%), D-Cache (8%), Bridge (5%), DCT CODEC (5%), Memory (8%), DMA (5%); Peripheral Bus is not annotated]

~50-100K instructions/second! (Platune)


Exploration

Problem formulation
– Parameters P1, P2, … , Pn
– A configuration (point) is an assignment of values to all parameters

How to efficiently generate all Pareto-optimal configurations?

[Figure: plot of Power (uW) vs. Execution Time (us)]

Exploration

Algorithm idea, for parameters A (10 values), B (32 values), C (32 values):
– Exhaustive: A * B * C = 10240 points
– A and B interdependent: A * B = 320 points
– A and C are independent: A + C = 42 points
– C and B are independent: C + B = 64 points
– 138 points

With knowledge about dependency we prune 98.6%

[Figure: directed graph over A, B, and C]
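The counts on this slide follow from one rule: interdependent parameters must be explored jointly (their value counts multiply), while independent parameters can be explored one at a time (their value counts add). A short check of the arithmetic, using only the numbers given on the slide:

```python
# Value counts from the slide: A has 10 values, B and C have 32 each.
A, B, C = 10, 32, 32

exhaustive = A * B * C      # all three treated as interdependent
interdependent_ab = A * B   # a dependent pair is explored jointly
independent_ac = A + C      # independent parameters explored one at a time
independent_cb = C + B

print(exhaustive, interdependent_ab, independent_ac, independent_cb)
# 10240 320 42 64

# The slide reports 138 points after pruning; the corresponding fraction
# of the exhaustive space that is never visited:
pruned_fraction = 1 - 138 / exhaustive
print(pruned_fraction)
```

The pruned fraction comes out just above 0.986, in line with the 98.6% quoted on the slide.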

Exploration

A → B: Pareto-optimal configurations of B are calculated after the Pareto-optimal configurations of the nodes along the path A → B

A → B → A (cycle): Pareto-optimal configurations of all the parameters on the cycle are calculated simultaneously

A: Pareto-optimal configurations are calculated in isolation

Exploration

Dependency Graph

[Figure: dependency graph over nodes A to Z]

Node  Core            Parameter
A     MIPS            Voltage scale
B     I$              Total size
C     I$              Line size
D     I$              Associativity
E     D$              Total size
F     D$              Line size
G     D$              Associativity
H     CPU I$ bus      Data bus width
I     CPU I$ bus      Data bus code
J     CPU I$ bus      Addr bus width
K     CPU I$ bus      Addr bus code
L     CPU D$ bus      Data bus width
M     CPU D$ bus      Data bus code
N     CPU D$ bus      Addr bus width
O     CPU D$ bus      Addr bus code
P     I/D$ Mem bus    Data bus width
Q     I/D$ Mem bus    Data bus code
R     I/D$ Mem bus    Addr bus width
S     I/D$ Mem bus    Addr bus code
T     Peripheral bus  Data bus width
U     Peripheral bus  Data bus code
V     Peripheral bus  Addr bus width
W     Peripheral bus  Addr bus code
X     UART            Tx buffer size
Y     UART            Rx buffer size
Z     DCT CODEC       Pixel resolution

Exploration

Dependency graph
– Based on designer knowledge
– Computed by simulating all pairs of nodes (quadratic time complexity, approx.)
– One time effort

[Figure: dependency graph over nodes A to Z]

Exploration – Algorithm

Step 1: Clustering followed by simulation

[Figure: dependency graph over nodes A to Z, grouped into clusters]
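Since parameters on a cycle have to be explored together, one natural way to form clusters is to take the strongly connected components of the dependency graph. The sketch below does this with Kosaraju's algorithm. Using strongly connected components here is an assumption about how the clustering could be done, and the edges in the toy graph are hypothetical; they are not the real dependency graph from the slides.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps node -> iterable of successors."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                visit(nxt)
        order.append(node)  # record node when its subtree is finished

    for node in graph:
        if node not in seen:
            visit(node)

    # Build the graph with every edge reversed.
    reverse = {node: [] for node in graph}
    for node, successors in graph.items():
        for nxt in successors:
            reverse.setdefault(nxt, []).append(node)

    clusters, assigned = [], set()
    for node in reversed(order):  # latest finish time first
        if node in assigned:
            continue
        stack, cluster = [node], set()
        assigned.add(node)
        while stack:
            cur = stack.pop()
            cluster.add(cur)
            for prev in reverse.get(cur, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        clusters.append(cluster)
    return clusters


# Hypothetical edges, NOT the real dependency graph from the slides.
toy_graph = {"A": ["H"], "H": ["I"], "I": ["A"],
             "B": ["C"], "C": ["B"], "Z": []}
clusters = strongly_connected_components(toy_graph)
print(sorted(sorted(c) for c in clusters))
# [['A', 'H', 'I'], ['B', 'C'], ['Z']]
```

Each resulting cluster is small enough to be simulated exhaustively, and an isolated node such as Z forms a cluster of its own.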

Exploration – Algorithm

Step 2: Pair-wise merge followed by simulation

Initial clusters: {A,H,I} {B,C,D,E,F,G} {J,K,T,U} {Z} {L,M,P,Q} {N,O,V,W} {X,Y,R,S}

After merge 1: {A,H,I,B,C,D,E,F,G} {J,K,T,U,Z} {L,M,P,Q,N,O,V,W} {X,Y,R,S}

After merge 2: {A,H,I,B,C,D,E,F,G,J,K,T,U,Z} {L,M,P,Q,N,O,V,W,X,Y,R,S}

After merge 3: {A,H,I,B,C,D,E,F,G,J,K,T,U,Z,L,M,P,Q,N,O,V,W,X,Y,R,S}
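The merge order on this slide can be reproduced by repeatedly pairing neighbouring clusters until a single cluster remains. The sketch below only tracks which parameters end up together at each level; in the real flow, each merge would be followed by simulation of the combined Pareto sets, which is omitted here. The function name and the string representation of clusters are illustrative.

```python
def pairwise_merge_levels(clusters):
    """Merge neighbouring clusters two at a time until one remains."""
    levels = [list(clusters)]
    current = list(clusters)
    while len(current) > 1:
        merged = []
        for i in range(0, len(current), 2):
            pair = current[i:i + 2]       # the last cluster may have no partner
            merged.append("".join(pair))  # simulation and Pareto filtering would go here
        levels.append(merged)
        current = merged
    return levels


# Cluster order chosen to reproduce the pairing shown on the slide.
clusters = ["AHI", "BCDEFG", "JKTU", "Z", "LMPQ", "NOVW", "XYRS"]
levels = pairwise_merge_levels(clusters)
for level in levels:
    print(level)
```

Seven clusters shrink to four, then two, then one, so the number of merge rounds grows only logarithmically with the number of clusters.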

Exploration

Exhaustive solution (only works for 1-4 parameters!)
– Evaluate all points
– Sort by increasing execution time
– Walk through the space, eliminate points with power > minimum seen so far!

Substitute heuristics

[Figure: plot of Power (uW) vs. Execution Time (us)]
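The sort-and-walk filter can be sketched in a few lines. Visiting the fastest points first guarantees that every point already seen is at least as fast as the current one, so the current point is dominated exactly when its power is not below the minimum seen so far. The sample data and the function name are made up for illustration.

```python
def pareto_front(points):
    """points: iterable of (exec_time, power); lower is better for both."""
    front = []
    best_power = float("inf")
    # Visit fastest points first; ties on time are ordered by power.
    for time, power in sorted(points):
        if power < best_power:  # not dominated by any faster point
            front.append((time, power))
            best_power = power
    return front


# Made-up (execution time in us, power in uW) samples.
samples = [(200, 1200), (400, 800), (500, 900),
           (900, 300), (1000, 300), (1600, 100)]
print(pareto_front(samples))
# [(200, 1200), (400, 800), (900, 300), (1600, 100)]
```

The cost is dominated by the sort, but every point still has to be evaluated first, which is what limits the exhaustive approach to small clusters.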

Exploration

Complexity: O((K + log(K)) * 2^{N/K})
– K is the number of clusters
– N is the number of parameters
– 2^{N/K} bounds the exhaustive computation
– (K + log(K)) bounds the number of iterations
– Worst case K=1, best case K=N
– 2^{N/K} decreases rapidly as K increases (e.g., 2^{26/2} + 2^{26/2} is much smaller than 2^{26}!)
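To make the example on this slide concrete, the snippet below evaluates both sides of the comparison and the bound for a few cluster counts. The slide does not state the base of the logarithm; base 2 is assumed here, and the function name is illustrative.

```python
import math


def bound(n_params, k_clusters):
    """Evaluate (K + log2(K)) * 2^(N/K); the log base is an assumption."""
    return (k_clusters + math.log2(k_clusters)) * 2 ** (n_params / k_clusters)


# Two clusters of 13 two-valued parameters each versus one cluster of 26.
print(2 ** 13 + 2 ** 13)  # 16384
print(2 ** 26)            # 67108864

for k in (1, 2, 13, 26):
    print(k, round(bound(26, k)))
```

Splitting 26 parameters into just two clusters already reduces the exhaustive part by a factor of 4096.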


Exploration – Results

JPEG
– Exploration time: 29.1 min
– Config. visited: 12352 (141)
– 5.10x exe. time
– 7.51x power
– 2.73x energy
– Pruning ratio > 0.99997

[Figure: JPEG, plot of Power (uW) vs. Execution time (usec)]

Exploration – Results

CKEY
– Exploration time: 108 min
– Config. visited: 15890 (223)
– 8.31x exe. time
– 6.08x power
– 2.57x energy
– Pruning ratio > 0.99993

[Figure: CKEY, plot of Power (uW) vs. Execution time (usec)]

Exploration – Results

IMAGE
– Exploration time: 50.2 min
– Config. visited: 10135 (80)
– 8.29x exe. time
– 8.57x power
– 1.81x energy
– Pruning ratio > 0.99998

[Figure: IMAGE, plot of Power (uW) vs. Execution time (usec)]

Exploration – Results

MATRIX
– Exploration time: 73.6 min
– Config. visited: 12623 (84)
– 10.7x exe. time
– 8.16x power
– 3.18x energy
– Pruning ratio > 0.99997

[Figure: MATRIX, plot of Power (uW) vs. Execution time (usec)]

Exploration – Results

[Figure: JPEG, plot of Energy (uJ) vs. Execution time (usec)]

[Figure: JPEG, plot of Power (uW) vs. Execution time (usec)]
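The two JPEG plots are related through energy = power × time, assuming the plotted power is the average over the run. The conversion below uses made-up sample points, picked only because they fall inside the axis ranges of the plots; they are not data from the experiments.

```python
def energy_uj(power_uw, time_us):
    """Energy in microjoules from average power (uW) and execution time (us)."""
    # 1 uW * 1 us = 1e-12 J = 1e-6 uJ
    return power_uw * time_us * 1e-6


# Made-up points: a fast, power-hungry configuration and a slow, frugal one.
fast = energy_uj(1200, 250)
slow = energy_uj(200, 1700)
print(fast, slow)
```

Because time and power move in opposite directions along a Pareto curve, the spread in energy is narrower than the spread in either quantity alone, which is consistent with the smaller energy ratios reported in the results.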

Conclusion

Gave a system-level algorithm for exploring the solution space of an application mapped to a parameterized SOC architecture
– Given a dependency graph we extensively prune the solution space
– Pruning ratio > 0.99997 in experiments

Future work:
– Automatically compute the dependency model
– Replace the exhaustive sub-algorithm with a heuristic (e.g., gradient search, GA)