Parallel Hierarchical Cross Entropy Optimization for On-Chip Decap Budgeting

Xueqian Zhao, Yonghe Guo, Zhuo Feng, Shiyan Hu

Design Automation Group, Department of Electrical & Computer Engineering, Michigan Technological University

47th DAC, June 17th, 2010 (2010 ACM/EDAC/IEEE Design Automation Conference)



Page 2

Outline

Introduction

Problem Formulation

The Cross Entropy Based Algorithm

• Importance Sampling Based
• Hierarchical Optimization
• Sensitivity Guided
• Parallelized in GPU environment

Experimental Results

Conclusion

Page 3

Power Supply Network

Power supply grid is one of the most important sources of noise.

[Figure: power supply network, with labels Vdd, Interconnect wire, Current, Node, Functional gate]

Page 4

Power Supply Noise

Supply voltage variation can result in supply noise which can lead to problems related to logic error, spurious transitions and delay variations.

[Figure: voltage drop waveform V(t) over 0 to T, with levels Vdd and Vth; the noise g_j is the area where V(t) lies below Vth, between t1 and t2]

$$g_j(c_1, \ldots, c_m) = \int_0^T \max\left(V_{th} - v_j(t),\, 0\right) dt$$

H. Su, S. Sapatnekar, and S. Nassif. Optimal decoupling capacitor sizing and placement for standard-cell layout designs. (IEEE Trans. on CAD, '03)
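As a rough illustration of the noise metric on this slide, the integral can be approximated from a sampled node waveform. This is a minimal sketch, not the authors' implementation: the function name `supply_noise`, the uniform time step, and the toy waveform are assumptions made for this example.

```python
def supply_noise(v, dt, v_th):
    """Approximate g = integral of max(v_th - v(t), 0) dt with the trapezoidal rule.

    v    : voltage samples v(0), v(dt), ..., v(T) at one node (hypothetical input)
    dt   : uniform time step between samples
    v_th : noise threshold voltage
    """
    # Only the part of the waveform below v_th counts as noise.
    drop = [max(v_th - vk, 0.0) for vk in v]
    # Trapezoidal rule over consecutive sample pairs.
    return sum(0.5 * (a + b) * dt for a, b in zip(drop, drop[1:]))


# Toy waveform (assumed values): the node sits at 1.0 V, dips to 0.8 V, then recovers.
waveform = [1.0, 1.0, 0.8, 0.8, 1.0, 1.0]
noise = supply_noise(waveform, dt=1.0, v_th=0.9)
```

Clipping the samples before integrating only approximates the exact threshold-crossing times, so the result depends on the time step.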

Page 5

Power Grid Transient Analysis

Power grid transient analysis is used to identify the power supply noise. In simulation, gates are replaced by pulse current sources.

$$C \frac{dv(t)}{dt} + G v(t) = b(t)$$

[Figure: Equivalent Power Grid Model, with Vdd supplies and pulse current sources Ids(t) in place of the gates]
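One common way to integrate C dv/dt + G v = b(t) numerically is backward Euler, which solves (C/h + G) v_{k+1} = (C/h) v_k + b(t_{k+1}) at every step of size h. The slides do not say which integration scheme is used, so the following is only a sketch under that assumption, on a hypothetical two-node grid. The pad and wire conductances, the capacitances, and the pulse load are invented values.

```python
def solve(a, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def transient(cap, g, b_of_t, v0, h, steps):
    """Backward Euler for C dv/dt + G v = b(t), with C = diag(cap)."""
    n = len(v0)
    # The left-hand matrix (C/h + G) is constant for a fixed step h.
    lhs = [[g[i][j] + (cap[i] / h if i == j else 0.0) for j in range(n)] for i in range(n)]
    v, history = v0[:], [v0[:]]
    for k in range(1, steps + 1):
        b = b_of_t(k * h)
        rhs = [cap[i] / h * v[i] + b[i] for i in range(n)]
        v = solve(lhs, rhs)
        history.append(v)
    return history


# Hypothetical grid: Vdd pad -> node 0 -> node 1, with a pulse load drawn at node 1.
VDD, G_PAD, G_WIRE = 1.0, 10.0, 5.0
G = [[G_PAD + G_WIRE, -G_WIRE], [-G_WIRE, G_WIRE]]


def b_of_t(t):
    i_load = 1.0 if 0.1 <= t <= 0.3 else 0.0  # pulse current source replacing a gate
    return [G_PAD * VDD, -i_load]


history = transient(cap=[0.1, 0.1], g=G, b_of_t=b_of_t, v0=[VDD, VDD], h=0.01, steps=100)
v_load = [v[1] for v in history]  # waveform at the loaded node
```

In this toy setup the loaded node droops toward Vdd minus the resistive drop while the pulse is active and recovers afterwards. A waveform such as `v_load` is the kind of input a noise integral would consume.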

Page 6

Decoupling Capacitor (Decap)

Decap insertion and effect on the supply voltage variation

Current is partially supplied by decap.

[Figure: grid with decaps attached, and supply voltage waveforms in two panels: Before Applying Decap, After Applying Decap]

Page 7

Budget Constrained Decap Optimization

Our objective is to minimize the total noise subject to the global and local constraints.

$$\min \sum_{j=1}^{n} g_j(c_1, \ldots, c_m) \qquad \text{($m$ candidate decap locations/nodes)}$$

$$\text{s.t.} \quad c_i \le C_i^{u} \qquad \text{(local size constraint)}$$

$$\sum_{i=1}^{m} c_i \le C_{tot} \qquad \text{(global budget constraint)}$$

Constraints: limited empty space in the chip; leakage power; impact in routing of interconnected wires, etc.

H. Su, S. Sapatnekar, and S. Nassif. Optimal decoupling capacitor sizing and placement for standard-cell layout designs, IEEE Trans. on CAD, '03
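To make the two constraints concrete, a candidate decap assignment can be checked for feasibility before its noise is evaluated. This is a small sketch: the helper name `is_feasible` and the sample numbers are hypothetical, and the non-negativity check is an added assumption that the slide does not state.

```python
def is_feasible(c, upper, c_tot):
    """Check a decap assignment against the local and global constraints.

    c     : decap size at each of the m candidate nodes
    upper : per-node upper bound (local size constraint)
    c_tot : total decap budget (global budget constraint)
    """
    # Local constraint; c_i >= 0 is an assumption added for this sketch.
    local_ok = all(0.0 <= ci <= ui for ci, ui in zip(c, upper))
    # Global constraint on the summed decap.
    global_ok = sum(c) <= c_tot
    return local_ok and global_ok


ok = is_feasible([1.0, 2.0], upper=[2.0, 2.0], c_tot=3.0)
too_big_locally = is_feasible([1.0, 2.5], upper=[2.0, 2.0], c_tot=4.0)
over_budget = is_feasible([2.0, 2.0], upper=[2.0, 2.0], c_tot=3.0)
```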

Page 8

Motivation

Sensitivity-guided Cross Entropy Based Optimization (SCE)
– Relative sensitivity
– Importance sampling
– Easy to parallelize

Hierarchical Optimization
– Different strategies for block-level and node-level decap budgeting

Parallel Acceleration
– GPU acceleration for power grid simulation
– Parallel sample evaluation on multi-core/many-core platforms

Page 9

Decap Sensitivity

• More efficient rule for decap budgeting.

• Decap Sensitivity:

$$s_{i,all} = \frac{\partial \sum_{j=1}^{n} g_j(c_1, \ldots, c_m)}{\partial c_i}$$

The above formula cannot be directly used for sensitivity computation:
1. m candidate nodes need m transient analyses (time consuming)
2. $\partial c_i$ is difficult to determine

Page 10

Efficient Sensitivity Computation

Adjoint sensitivity computation: needs only one original network transient analysis and one adjoint network transient analysis.

The two networks have the same topology but different source setups.

[Figure: Original Network (Vdd) and Adjoint Network (Gnd), each with the violating node marked]

Page 11

Efficient Sensitivity Computation (cont.)

Adjoint sensitivity computation: convolution of the two voltage waveforms obtained from each network.

$$\text{Original} \Rightarrow v_i(t)$$

$$\text{Adjoint} \Rightarrow V^{*}_{i,all}(t)$$

$$s_{i,all} = \int_0^T V^{*}_{i,all}(T - t)\, v_i(t)\, dt$$

L. Pillage, R. Rohrer and C. Visweswariah, Electronic circuit & system simulation methods, McGraw-Hill, 1995.
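The convolution integral on this slide can be approximated numerically once both waveforms are available as samples. The sketch below assumes both networks were simulated on the same uniform time grid from 0 to T. The function name `adjoint_sensitivity` and the test waveforms are hypothetical, and it simply evaluates the integral of one waveform reversed in time against the other.

```python
def adjoint_sensitivity(v_adj, v_orig, dt):
    """Trapezoidal approximation of s = integral_0^T v_adj(T - t) * v_orig(t) dt.

    v_adj  : adjoint-network waveform sampled at t = 0, dt, ..., T
    v_orig : original-network waveform sampled on the same grid
    """
    if len(v_adj) != len(v_orig):
        raise ValueError("waveforms must have the same number of samples")
    n = len(v_orig)
    # Sample k of the original pairs with sample n-1-k of the adjoint (time T - t).
    prod = [v_adj[n - 1 - k] * v_orig[k] for k in range(n)]
    return sum(0.5 * (a + b) * dt for a, b in zip(prod, prod[1:]))


# Check on made-up waveforms: v_adj(t) = t and v_orig(t) = 1 on [0, 1],
# so the integral of (1 - t) over [0, 1] is 0.5.
s = adjoint_sensitivity([0.0, 0.25, 0.5, 0.75, 1.0], [1.0] * 5, dt=0.25)
```

A single pass over the two stored waveforms gives the sensitivity for one candidate node, which is why only two transient analyses are needed regardless of the number of candidates.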

Page 12

Partitioning

• Reduces the solution space from a large number of candidate nodes to a smaller number of candidate blocks.
• Forms the foundation of hierarchical optimization at the block level and the node level.

[Figure: power grid partitioned into candidate blocks, each containing candidate nodes]
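The slides do not describe the partitioning scheme itself. As one possible illustration, a regular grid can be cut into square blocks of a fixed dimension. Everything in this sketch (the uniform square tiling, the helper names `block_index` and `partition`, and the 4 x 4 toy grid) is an assumption.

```python
def block_index(x, y, block_dim, grid_width):
    """Map grid node (x, y) to the id of the square block that contains it."""
    blocks_per_row = -(-grid_width // block_dim)  # ceiling division
    return (y // block_dim) * blocks_per_row + (x // block_dim)


def partition(nodes, block_dim, grid_width):
    """Group candidate nodes by block id."""
    blocks = {}
    for (x, y) in nodes:
        blocks.setdefault(block_index(x, y, block_dim, grid_width), []).append((x, y))
    return blocks


# Toy 4 x 4 grid split into 2 x 2 blocks: 16 candidate nodes become 4 candidate blocks.
nodes = [(x, y) for y in range(4) for x in range(4)]
blocks = partition(nodes, block_dim=2, grid_width=4)
```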

Page 13

Main Idea

Hierarchical Optimization
– Different strategies for block-level and node-level optimization

[Figure: two-level scheme. Cross Entropy based Block-Level Decap Budgeting determines the decap assigned at block n; Relative sensitivity based Node-Level Decap Budgeting then distributes it among the nodes of that block]

Page 14

Node-level Relative-sensitivity based Optimization

• Relative sensitivity based optimization
• Relative sensitivity is approximately constant within a small block
• No need to re-evaluate the sensitivity after each iteration

$$\frac{s_a}{s_b} \approx \text{constant}$$

The relative impact on noise reduction between nearby decaps stays the same before and after decap budgeting.
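If the sensitivity ratios inside a block stay roughly constant, one plausible way to use them is to split a block's decap budget among its nodes in proportion to their sensitivities, capping each node at its local limit and redistributing the excess. The slides do not give the exact allocation rule, so the proportional rule, the function name `allocate_block_budget`, and the numbers below are assumptions for illustration only.

```python
def allocate_block_budget(budget, sens, upper):
    """Split `budget` among nodes in proportion to sensitivity, capped at `upper`."""
    alloc = [0.0] * len(sens)
    active = [i for i, s in enumerate(sens) if s > 0.0]
    # Never hand out more than the active nodes can hold.
    remaining = min(budget, sum(upper[i] for i in active))
    while active and remaining > 1e-12:
        total_s = sum(sens[i] for i in active)
        share = {i: remaining * sens[i] / total_s for i in active}
        capped = [i for i in active if share[i] >= upper[i]]
        if not capped:
            # Every proportional share fits under its cap: finish.
            for i in active:
                alloc[i] = share[i]
            break
        # Saturate the capped nodes and redistribute what is left.
        for i in capped:
            alloc[i] = upper[i]
            remaining -= upper[i]
        active = [i for i in active if i not in capped]
    return alloc


# Node 0 is the most sensitive but can hold only 4 units, so the other nodes share the rest.
alloc = allocate_block_budget(budget=10.0, sens=[3.0, 1.0, 1.0], upper=[4.0, 10.0, 10.0])
```

Because the ratios are treated as fixed, the sensitivities are computed once and reused for every block-level sample.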

Page 15

Empirical Validation

The figure shows the relative sensitivities before and after decap insertion within a block of size 30 x 30.

Page 16

Block-level Cross Entropy based Optimization

Cross Entropy Method (CE)
– A general Monte Carlo approach using importance sampling technique
– Rare event probability estimation
– In any optimization problem, optimum solution can be considered as a rare event

$$\delta(a) = P[f(X) \le a] = E\left[I_{f(X) \le a}\right]$$

f(x) representing the objective function
g(x) denoting the PDF for general Monte Carlo method
X being a set of samples generated from g(x)
a denoting the threshold

$$\text{minimize } a \quad \text{s.t. } \delta(a) \to 0$$

Page 17

Importance Sampling

General Monte Carlo: sampling from g(x) needs a large number of randomly generated samples and may still not give an accurate result (possibly no sample falls into the rare event region).

Importance sampling is used to reduce the number of samples.

Use a different PDF k(x), not g(x), to estimate δ(a) as β(a). Most of the samples generated by k(x) will fall into the rare event region. Thus, only a few samples are needed.

$$\beta(a) = \frac{1}{n} \sum_{i=1}^{n} I_{f(X_i) \le a}\, \frac{g(X_i)}{k(X_i)}$$

$$\delta(a) = \beta(a) \;\Leftrightarrow\; k^{*}(x) = \frac{I_{f(x) \le a}\, g(x)}{\delta(a)}$$
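The estimator β(a) can be tried on a textbook rare event. The sketch below estimates P[X ≤ -4] for a standard normal X, drawing samples from a normal density shifted to the threshold. The choice of f(x) = x, the two densities, the sample count, and all function names are assumptions picked for illustration and are not taken from the slides.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, a, g_pdf, k_pdf, k_sampler, n, rng):
    """beta(a) = (1/n) * sum of I[f(X_i) <= a] * g(X_i) / k(X_i), with X_i drawn from k."""
    total = 0.0
    for _ in range(n):
        x = k_sampler(rng)
        if f(x) <= a:  # indicator of the rare event
            total += g_pdf(x) / k_pdf(x)  # likelihood-ratio weight
    return total / n


rng = random.Random(0)
a = -4.0
beta = importance_estimate(
    f=lambda x: x,
    a=a,
    g_pdf=lambda x: normal_pdf(x, 0.0, 1.0),  # original PDF g(x)
    k_pdf=lambda x: normal_pdf(x, a, 1.0),    # sampling PDF k(x), centred on the threshold
    k_sampler=lambda r: r.gauss(a, 1.0),
    n=20000,
    rng=rng,
)
exact = 0.5 * math.erfc(-a / math.sqrt(2.0))  # closed form of P[X <= -4]
```

With plain Monte Carlo from g(x), 20000 samples would contain less than one hit on average for this event. Sampling from k(x) puts about half of the samples inside the rare event region.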

Page 18

CE Based Decap Insertion

CE consists of two phases in a nutshell
– Generate a series of random data samples according to an initial specified PDF
– Update E(x), δ²(x), etc. of the PDF based on the previous "good" samples to produce "better" samples in the next iteration.

k(x): PDF in solution space
x*: Optimal solution

[Figure: PDF k(x) over the solution space (x1, x2) and the updated PDF k'(x), concentrated closer to x*]

Page 19

CE Algorithmic Flow (For 2-block variables)

1. Iteration 1: generate samples from k(x).
2. Pick top solutions with smallest noise to update PDF.
3. Iteration 2: generate another group of samples from the updated k(x).
4. Repeat until convergence to k*(x), concentrated at the optimum x*.

[Figure: samples plotted in the plane of decap budget at block 1 versus decap budget at block 2, for iteration 1, iteration 2, and at convergence]
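The iteration on this slide can be sketched for two block-level variables as follows. This is a minimal illustration rather than the authors' code: it assumes an independent Gaussian PDF per block, a fixed elite count, and a toy quadratic function standing in for the simulated noise. The function names and all numeric settings are hypothetical.

```python
import random
import statistics


def cross_entropy_minimize(noise, lower, upper, n_samples=200, n_elite=20, iters=20, seed=0):
    """Minimize `noise` over box-bounded variables with a Gaussian cross entropy loop."""
    rng = random.Random(seed)
    dim = len(lower)
    # Initial PDF k(x): centred in the box, wide enough to cover it.
    mean = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    std = [(hi - lo) / 2.0 for lo, hi in zip(lower, upper)]
    for _ in range(iters):
        # Phase 1: draw samples from the current PDF, clipped to the local bounds.
        samples = [
            [min(max(rng.gauss(mean[d], std[d]), lower[d]), upper[d]) for d in range(dim)]
            for _ in range(n_samples)
        ]
        # Phase 2: keep the samples with the smallest noise and refit the PDF to them.
        elite = sorted(samples, key=noise)[:n_elite]
        mean = [statistics.fmean([s[d] for s in elite]) for d in range(dim)]
        std = [statistics.pstdev([s[d] for s in elite]) for d in range(dim)]
    return mean


def toy_noise(c):
    # Stand-in for the simulated total noise; its minimum is at budgets (3, 1).
    return (c[0] - 3.0) ** 2 + (c[1] - 1.0) ** 2


best = cross_entropy_minimize(toy_noise, lower=[0.0, 0.0], upper=[5.0, 5.0])
```

In the real flow each call to `noise` would be a full power grid simulation, which is why sample evaluation dominates the runtime.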

Page 20

Parallel Decap Budgeting

Evaluate noise of each solution: this is the bottleneck of the computation, but it can be easily parallelized.

[Figure: samples drawn from g(x) in the plane of decap budget at block 1 versus decap budget at block 2, with groups of samples assigned to Core 1 and Core 2]

Page 21

Multi/Many-Core Based Parallelization

The graph shows the flow of multi-threaded SCE sample processing with multiple GPUs.

[Figure: flow chart. Generate n samples → Thread 1 ... Thread k → n/k samples processed on GPU 1 ... n/k samples processed on GPU k → Pick Top Best Ones]

Z. Feng and P. Li. Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms, ICCAD'08.
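The split-evaluate-select pattern in this flow can be sketched with a thread pool. In the slides each thread drives a GPU simulation; here the noise function is an ordinary Python callable, so the threads only illustrate how the n samples are divided among k workers and merged again. The helper names and the toy data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_chunk(noise, chunk):
    """Evaluate every sample in one worker's chunk."""
    return [(noise(s), s) for s in chunk]


def parallel_top(samples, noise, k_workers, n_top):
    """Split the samples over k workers, evaluate them, and keep the n_top best."""
    # Round-robin split into k chunks of about n/k samples each.
    chunks = [samples[i::k_workers] for i in range(k_workers)]
    with ThreadPoolExecutor(max_workers=k_workers) as pool:
        results = list(pool.map(lambda ch: evaluate_chunk(noise, ch), chunks))
    # Merge the per-worker results and pick the samples with the smallest noise.
    scored = [pair for part in results for pair in part]
    scored.sort(key=lambda p: p[0])
    return [s for _, s in scored[:n_top]]


samples = [[float(i)] for i in range(10)]
top = parallel_top(samples, noise=lambda s: (s[0] - 4.2) ** 2, k_workers=2, n_top=3)
```

Because the samples are independent, the only synchronization point is the final merge before the best solutions are selected.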

Page 22

Complete Sensitivity-guided CE (SCE) Algorithm Flow

1. Power Grid Partition & Sensitivity Calculation
2. Build up a PDF for Solutions
3. Generate Block-Level Decap Budgeting samples using PDF
4. Determine Decap Size for Each Node Based on Relative Sensitivity
5. Evaluate Solutions on Multi-Core Multi-GPU
6. If not converged, iterate again; once converged, proceed to Result Comparison

Page 23

Experimental Setup

Hardware Platform Setup
– IBM Power Grid Benchmarks (S. Nassif, ASPDAC '08)
– C++ / GPU CUDA
– Intel Quad-Core CPU, 2.66 GHz
– Two NVIDIA GeForce GTX285 Graphics Cards
– Ubuntu 8.04, 64-bit

Compare to a recent conjugate gradient based decap optimization approach (iCG)
– H. Li, J. Fan, Z. Qi, S. Tan, L. Wu, Y. Cai and X. Hong, Partitioning-Based Approach to Fast On-Chip Decoupling Capacitor Budgeting and Minimization. (TCAD '06).

Page 24

Comparison - I

The figure shows total noise after decap insertion under different budgets and methods.

Page 25

Noise-Decap Budget Tradeoff

Using our SCE method, a 70% decap budget can eliminate most of the power supply noise.

Page 26

Comparison – II-1

The table compares runtime, total noise and number of iterations among different methods. Budget: 50%. Partition-Based SCE is reported for block dimensions 10x10 and 25x25.

| CKT | #vio.N | iCG N(%) | iCG Iter. | iCG T(s) | CE N(%) | CE Iter. | CE T(s) | SCE 10x10 N(%) | SCE 10x10 Iter. | SCE 10x10 T(s) | SCE 25x25 N(%) | SCE 25x25 Iter. | SCE 25x25 T(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ibm2 | 481 | 19.7 | 15 | 62 | 35.1 | 20 | 316 | 14.8 | 3 | 38 | 15.8 | 2 | 25 |
| ibm4 | 1,829 | 24.2 | 15 | 638 | -- | -- | -- | 19.2 | 4 | 401 | 20.3 | 3 | 300 |
| ibm5 | 1,809 | 47.2 | 15 | 1265 | -- | -- | -- | 38.1 | 4 | 1026 | 42.1 | 3 | 729 |
| ibm6 | 1,926 | 30.1 | 15 | 1409 | -- | -- | -- | 27.7 | 5 | 1258 | 28.1 | 3 | 771 |

Page 27

Comparison – II-2

The table compares runtime, total noise and number of iterations among different methods. Budget: 70%. Partition-Based SCE is reported for block dimensions 10x10 and 25x25.

| CKT | #vio.N | iCG N(%) | iCG Iter. | iCG T(s) | CE N(%) | CE Iter. | CE T(s) | SCE 10x10 N(%) | SCE 10x10 Iter. | SCE 10x10 T(s) | SCE 25x25 N(%) | SCE 25x25 Iter. | SCE 25x25 T(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ibm2 | 481 | 1.83 | 13 | 55 | 12.4 | 20 | 312.1 | 1.77 | 3 | 38 | 1.78 | 2 | 25 |
| ibm4 | 1,829 | 7.1 | 14 | 592 | -- | -- | -- | 1.7 | 3 | 307 | 3.7 | 2 | 203 |
| ibm5 | 1,809 | 38.3 | 15 | 1286 | -- | -- | -- | 24.1 | 4 | 1028 | 23.8 | 3 | 735 |
| ibm6 | 1,926 | 6.4 | 15 | 1430 | -- | -- | -- | 5.1 | 5 | 1219 | 6.0 | 3 | 769 |

Page 28

Comparison – II-3

The table compares runtime, total noise and number of iterations among different methods. Budget: 90%. Partition-Based SCE is reported for block dimensions 10x10 and 25x25.

| CKT | #vio.N | iCG N(%) | iCG Iter. | iCG T(s) | CE N(%) | CE Iter. | CE T(s) | SCE 10x10 N(%) | SCE 10x10 Iter. | SCE 10x10 T(s) | SCE 25x25 N(%) | SCE 25x25 Iter. | SCE 25x25 T(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ibm2 | 481 | 0.02 | 16 | 65 | 0.02 | 19 | 294 | 0.004 | 5 | 63 | 0.01 | 3 | 37 |
| ibm4 | 1,829 | 0.00 | 16 | 617 | -- | -- | -- | 0.00 | 3 | 299 | 0.00 | 4 | 398 |
| ibm5 | 1,809 | 31.2 | 17 | 1459 | -- | -- | -- | 7.1 | 5 | 1251 | 8.4 | 3 | 1119 |
| ibm6 | 1,926 | 0.00 | 1 | 151 | -- | -- | -- | 0.00 | 1 | 354 | 0.00 | 1 | 356 |

Page 29

Speedup Between Different Setups

The figure below compares the time cost of decap simulation with a single GPU and with two GPUs.

Page 30

Conclusion

■ A novel cross entropy based optimization technique is proposed for the decoupling capacitor budgeting problem.
  ■ Sensitivity Guided
  ■ Hierarchical Optimization
  ■ Parallelization-friendly for multi-/many-core platforms

■ Experimental results demonstrate that our algorithm runs 2x faster than the prior approach and obtains 25% better results in the final decap budgeting solutions.

Page 31

Thanks!