Implementing Useful Clock Skew Using Skew Groups

Matthew Mei

Cisco Systems

Outline

• Overview of skew
• Example design affected by skew
• What is useful skew
• Using skew groups to achieve useful skew
• Experimental results of trials on example design
• Inserting clock buffers to achieve useful skew
• Comparing skew groups and buffer insertion
• Conclusions

Skew

(Diagram labels: Clock Port, Launch Flip Flop, Capture Flip Flop)

• Skew equals insertion delay at capture minus insertion delay at launch
• The insertion delay is obtained from: report_clock_timing -to <pin> -type latency -setup
• Common path pessimism removal is obtained from: report_crpr -from <pin1> -to <pin2> -setup

The Example Design

• A 40 nm technology was used
• The block was about 8000 µm × 4000 µm
• Block utilization was about 75%, while standard cell utilization was only about 20% (~600K cells)
• The block was mostly Ternary Content Addressable Memories (TCAMs), which are large memory macros used for fast searches

Example Failing Path (Diagram)

(Diagram: clk_core drives the memory and the capture flip flops. Annotated values: clock latency to the memory 1.1783 ns, memory clock-to-output delay 1.4831 ns, net delay 0.0000 ns, clock latency to the capture flip flops 1.0460 ns.)

• Thus, the skew is equal to: 1.0460 ns - 1.1783 ns = -0.132 ns
• Therefore, this timing path has -132 ps of skew
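The subtraction on this slide can be checked in a few lines of Python. This is only a sketch: the function name `clock_skew_ns` is illustrative and not part of any tool, and the two latencies are the values annotated on the diagram.

```python
def clock_skew_ns(capture_latency_ns: float, launch_latency_ns: float) -> float:
    """Skew as defined in this deck: capture insertion delay minus launch insertion delay."""
    return capture_latency_ns - launch_latency_ns

# Insertion delays of the example failing path (values from the slide).
launch_latency_ns = 1.1783   # clock arrival at the memory (launch) clock pin
capture_latency_ns = 1.0460  # clock arrival at the capture flip flop clock pin

skew_ns = clock_skew_ns(capture_latency_ns, launch_latency_ns)
print(f"skew = {skew_ns:.4f} ns ({skew_ns * 1000:.0f} ps)")
# prints: skew = -0.1323 ns (-132 ps)
```

A negative result means the capture clock arrives earlier than the launch clock, which reduces the time available for setup.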

Example Failing Path (Timing Report)

Path Type: max

Point                                          Incr        Path
-----------------------------------------------------------------
clock clk_core (rise edge)                   0.0000      0.0000
clock network delay (propagated)             1.1783      1.1783
w/m_36x1/CLK                                 0.0000      1.1783 r
w/m_36x1/QXY[13]                             1.4831      2.6614 f
w/r0_data_read1_s_36x1_13_ (net)             0.0000      2.6614 f
w/r1_data_read1_s_36x1_reg_13_/D             0.0000 &    2.6614 f
data arrival time                                        2.6614

clock clk_core (rise edge)                   1.6670      1.6670
clock network delay (propagated)             1.0460      2.7130
clock uncertainty                           -0.0580      2.6550
w/r1_data_read1_s_36x1_reg_13_/CK            0.0000      2.6550 r
library setup time                          -0.1197      2.5353
data required time                                       2.5353
-----------------------------------------------------------------
data required time                                       2.5353
data arrival time                                       -2.6614
-----------------------------------------------------------------
slack (VIOLATED)                                        -0.1261
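The slack in the report follows directly from its rows. The sketch below recomputes arrival time, required time, and slack from the reported increments; the variable names are chosen here for readability and do not come from the tool.

```python
# Increments copied from the timing report (all values in ns).
launch_edge = 0.0000
launch_clock_latency = 1.1783    # clock network delay to w/m_36x1/CLK
memory_clk_to_q = 1.4831         # w/m_36x1 CLK -> QXY[13]
net_delay = 0.0000

capture_edge = 1.6670            # capturing rise edge of clk_core
capture_clock_latency = 1.0460   # clock network delay to the capture flop CK pin
clock_uncertainty = 0.0580
library_setup = 0.1197

arrival = launch_edge + launch_clock_latency + memory_clk_to_q + net_delay
required = capture_edge + capture_clock_latency - clock_uncertainty - library_setup
slack = required - arrival

print(f"data arrival time  = {arrival:.4f} ns")
print(f"data required time = {required:.4f} ns")
print(f"slack              = {slack:.4f} ns")
```

The memory clock-to-output delay alone (1.4831 ns) consumes most of the 1.6670 ns between the launch and capture edges, which is why the 132 ps of negative skew is enough to make the path fail.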

Example Failing Path (Layout)

• Pipeline flops were already added and magnet placed

Using Skew Groups to Achieve Useful Skew

(Diagram labels: clk_core, TCAMs, Pipeline Flip Flops, Target Skew)

• To improve the setup timing performance, delay can be added to the red clock path
• Skew groups were tried first to achieve the target skew
• Manual buffer insertion was also tried (covered later)
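As a first-order illustration of why added clock delay helps setup timing, the sketch below shifts the capture clock of the example failing path by a few candidate amounts. It assumes the red clock path is the branch feeding the capturing pipeline flip flops, that the added delay moves only the required time, and that common path pessimism and tool re-optimization are ignored. The function name is hypothetical, and the numbers it prints are idealized, not measured results.

```python
def setup_slack_with_capture_delay(slack_ns: float, added_capture_delay_ns: float) -> float:
    """Idealized model: delaying the capture clock moves the required time later by the same amount."""
    return slack_ns + added_capture_delay_ns

baseline_slack_ns = -0.1261  # setup slack of the example failing path

# Candidate capture-clock delays in ns (the target skews tried in this deck).
for added_ns in (0.05, 0.12, 0.20, 0.24, 0.30):
    new_slack = setup_slack_with_capture_delay(baseline_slack_ns, added_ns)
    status = "MET" if new_slack >= 0 else "VIOLATED"
    print(f"added delay {added_ns:.2f} ns -> idealized slack {new_slack:+.4f} ns ({status})")
```

In this model the path closes once the added delay exceeds 0.1261 ns. The same delay reduces hold margin on paths captured by those flip flops and setup margin on paths they launch, so the amount of skew that is actually useful is bounded.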

Skew Groups

• Skew groups were defined before clock tree synthesis
• The following commands were used before clock_opt to create a skew group:
    set_skew_group -name <name> -target_skew <skew> <pins list>
    report_skew_group -name <name>
    commit_skew_group
• The pins list in the example design included the clock pins of about 8000 flip flops
• Target skews of 50 ps, 120 ps, 200 ps, 240 ps, and 300 ps were tried
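For a sweep like this, the command template can be filled in once per trial. The sketch below only builds the command text from the template on this slide. The group name format and the Tcl variable `$pipeline_ck_pins` are hypothetical, and the conversion assumes the tool's time unit is nanoseconds.

```python
TARGET_SKEWS_PS = [50, 120, 200, 240, 300]  # trial values listed on the slide

def skew_group_commands(target_ps: int, pins_var: str = "$pipeline_ck_pins") -> list[str]:
    """Fill in the slide's command template for one trial run.

    pins_var is a hypothetical Tcl variable holding the flip flop clock pins;
    the group name format is also made up for this sketch.
    """
    name = f"useful_skew_{target_ps}ps"
    target_ns = target_ps / 1000  # ASSUMPTION: the tool's time unit is ns
    return [
        f"set_skew_group -name {name} -target_skew {target_ns:.3f} {pins_var}",
        f"report_skew_group -name {name}",
        "commit_skew_group",
    ]

# One skew group per trial run; each trial is a separate clock tree synthesis run.
for ps in TARGET_SKEWS_PS:
    print("\n".join(skew_group_commands(ps)))
    print()
```

Each generated group of commands would be run before clock_opt in its own trial, as described above.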

Skew Groups: Effective Skew vs. Target Skew

(Figure: Effective Skew (ns) vs. Target Skew (ns). Series: Clock Opt Effective Skew, Route Opt Effective Skew, Post Route Effective Skew.)

Skew Groups: Setup Timing Performance

(Figure: Negative Slack (ns) vs. Effective Skew (ns). Series: WNS, TNS.)

(Figure: Failing Paths vs. Effective Skew (ns).)

Skew Groups: Hold Timing Performance

(Figure: Failing Hold Paths vs. Effective Skew (ns).)

(Figure: Negative Hold Slack (ns) vs. Effective Skew (ns). Series: Worst Hold, Total Hold.)

Skew Groups: Path Skew Distribution

(Figure: Cumulative Distribution of Path Skew Among Skew Group Flip Flops. Number of Flops (Cumulative) vs. Skew of Individual Path (ns). Series: Effective Skew 0.005 ns, 0.085 ns, 0.121 ns, 0.138 ns.)

Skew Groups: Effects on Clock Tree

• Using skew groups causes the clock tree to branch out at an early level
• The TCAMs and the pipeline flip flops had zero common path pessimism removed
• The result is a more complex clock tree, with more cells and routing

Skew Groups: Clock Tree Cells and Buffer Area

(Figure: Clock Tree vs. Target Skew. Buffer Area (µm²) and Number of Clock Cells vs. Target Skew (ns) for Control, 0.05, 0.12, 0.2, 0.24, 0.3. Series: Buffer Area, Clock Cells.)

• Clock tree size increased by about 250 cells

Skew Groups: Power Consumption

(Figure: Power Increase vs. Target Skew. Increase in Total Power (%) and Increase in Clock Tree Power (%) vs. Target Skew (ns) for 0.05, 0.12, 0.2, 0.24, 0.3. Series: Percent Total Power Increase, Percent Clock Tree Power Increase.)

• On average, clock tree power increased by 5.16% and total block power by 0.66%

Manual Buffer Insertion to Achieve Useful Skew

(Diagram labels: clk_core, TCAMs, Pipeline Flip Flops, Target Skew)

• The instinctive way of inserting delay is to manually insert clock buffers:
    insert_buffer -no_of_cells <num buffers> <pins list> <buffer type>
• The target skew is determined by the number and type of buffers, not by a numerical value

Manual Buffer Insertion

• Clock buffers were inserted right before clock tree routing
• Two buffers of low drive strength were used. Each buffer added about 40 ps of delay
• The pins list in the example design included the clock pins of the same ~8000 flip flops
• The clock buffer insertion resulted in a “Post Route Effective Skew” of about 0.084 ns
• The TCAMs and the flip flops had on average 38 ps of common path pessimism removed
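The nominal delay implied by the buffer choice can be compared with the measured result. This is a small arithmetic sketch using the approximate figures on this slide; the variable names are illustrative.

```python
buffers_per_pin = 2
delay_per_buffer_ps = 40          # approximate delay of each low-drive buffer
measured_effective_skew_ps = 84   # post-route effective skew of about 0.084 ns

nominal_delay_ps = buffers_per_pin * delay_per_buffer_ps
difference_ps = measured_effective_skew_ps - nominal_delay_ps

print(f"nominal inserted delay:  {nominal_delay_ps} ps")
print(f"measured effective skew: {measured_effective_skew_ps} ps ({difference_ps:+d} ps vs. nominal)")
```

The measured value lands close to the nominal 80 ps, but the skew is fixed by the buffer count and type rather than specified directly as a number.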

Manual Buffer Insertion: Setup Timing Performance

(Figure: Negative Slack (ns) vs. Effective Skew (ns). Series: WNS, WNS (clkbuf), TNS, TNS (clkbuf).)

(Figure: Failing Paths vs. Effective Skew (ns). Series: Failing Paths, Failing Paths (clkbuf).)

Manual Buffer Insertion: Hold Timing Performance

(Figure: Failing Hold Paths vs. Effective Skew (ns). Series: Failing Paths, Failing Paths (clkbuf).)

(Figure: Negative Hold Slack (ns) vs. Effective Skew (ns). Series: Worst Hold, Worst Hold (clkbuf), Total Hold, Total Hold (clkbuf).)

Manual Buffer Insertion: Path Skew Distribution

(Figure: Cumulative Distribution of Path Skew Among Skew Group Flip Flops. Number of Flops (Cumulative) vs. Path Skew (ns). Series: Effective Skew 0.005 ns, 0.085 ns, 0.121 ns, 0.138 ns, and clkbuf.)

Manual Buffer Insertion: Power Consumption

• Buffer insertion resulted in about 22000 clock cells, dramatically increasing power

(Figure: Power Increase vs. Target Skew. Increase in Total Power (%) and Increase in Clock Tree Power (%) vs. Target Skew (ns) for 0.05, 0.12, 0.2, 0.24, 0.3, and clkbuf. Series: Percent Total Power Increase, Percent Clock Tree Power Increase.)
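The clock cell count for buffer insertion is roughly what the earlier slides imply: two buffers in front of each of about 8000 clock pins. The sketch below works through that estimate. The baseline of 6000 clock cells is an assumption read off the axis range of the clock cell chart; it is not stated in the deck.

```python
flops_in_pin_list = 8000              # approximate count from the slides
buffers_per_pin = 2                   # two low-drive buffers per clock pin
assumed_baseline_clock_cells = 6000   # ASSUMPTION: read off the chart axis, not stated in the deck
skew_group_added_cells = 250          # approximate increase reported for skew groups

added_buffers = flops_in_pin_list * buffers_per_pin
estimated_total = assumed_baseline_clock_cells + added_buffers
ratio = added_buffers / skew_group_added_cells

print(f"buffers added by manual insertion: {added_buffers}")
print(f"estimated total clock cells:       {estimated_total}")
print(f"added cells, buffers vs. skew groups: {ratio:.0f}x")
```

Under these assumptions the estimate matches the reported figure of about 22000 clock cells, and the per-pin buffers account for nearly all of the increase.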

Conclusions

• Both methods are easy to set up in IC Compiler
• Skew groups:
  – Easy to specify target skew
  – Result in a smaller increase in cells, power, and area
• Manual buffer insertion:
  – Relies on past experience for buffer selection
  – Results in a larger increase in cells, power, and area

Questions?