A Random Walk from Async to Sync Paul Cunningham & Steev Wilcox

A Random Walk from Async to Sync - USC Viterbi · ee.usc.edu/async2015/web/.../2015/03/paulsteev_async2015_v07_relea…



A Random Walk from Async to Sync

Paul Cunningham & Steev Wilcox


Thank You Ivan


In the Beginning…

March 2002


Azuro Day 1

Some money in the bank from Angel Investors

2 employees

Small Office rented from Cambridge University Computer Lab

…So why is Asynchronous design low power!?

Vision to automatically compile mainstream synchronous designs into low power asynchronous equivalents


Level Sensitive Device

Azuro Day 100

…after several meetings with chip companies, VCs and lots of whiteboard time

[Figure: transistor-level schematics of two gates with inputs A, B and output Z; the edge-sensitive gate is drawn with the symbol C]

LEVEL SENSITIVE AND GATE: 4 transistors

EDGE SENSITIVE AND GATE: 12 transistors (3X)

Costs power to implement

Hard to “sign-off” across different process corners, voltages and temperatures

[Figure: two latch stages around level-sensitive logic; their ctrl blocks exchange request and acknowledge through a delay element D, and the logic has maximum delay Gmax]

Gmax < D

[Figure: four latch stages separated by level-sensitive logic; each latch has a ctrl block, and adjacent ctrl blocks are linked by request and acknowledge signals]

[Figure: the latch pipeline without request/acknowledge handshakes; every ctrl block is driven by a common “clock”; annotations Gmax, Gmin, W, P]

Hold Check: Gmin > W

Setup Check: Gmax < P
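
The two checks above can be sketched as a small predicate over pipeline stages. This is only an illustration: the slide gives the symbols without definitions, so the code assumes W is the clock pulse width, P is the clock period, and Gmin/Gmax are the fastest and slowest logic delays of a stage. The `Stage` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    gmin_ps: float  # assumed: fastest path through the stage's logic
    gmax_ps: float  # assumed: slowest path through the stage's logic


def hold_ok(stage, pulse_width_ps):
    # Hold check from the slide: Gmin > W
    return stage.gmin_ps > pulse_width_ps


def setup_ok(stage, period_ps):
    # Setup check from the slide: Gmax < P
    return stage.gmax_ps < period_ps


def failing_stages(stages, pulse_width_ps, period_ps):
    """Return (stage index, check name) for every violated check."""
    failures = []
    for i, stage in enumerate(stages):
        if not hold_ok(stage, pulse_width_ps):
            failures.append((i, "hold"))
        if not setup_ok(stage, period_ps):
            failures.append((i, "setup"))
    return failures
```

With a 50ps pulse and a 1000ps period, a stage whose fastest path is 40ps fails hold and a stage whose slowest path is 1100ps fails setup, which is why a small, predictable W matters.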


Pulse-flopped Synchronous Design

[Figure: the same latch pipeline, with the common “clock” now supplied by an off-chip crystal oscillator; annotations Gmax, Gmin, W, P]

Hold Check: Gmin > W

Setup Check: Gmax < P

[Figure: the pulse-flopped pipeline driven by an off-chip crystal oscillator, with the same hold check (Gmin > W) and setup check (Gmax < P)]

How do you shift values in and out during chip test without violating the hold check?

How do you make sure the pulse width is predictable across multiple process corners, temperatures and voltages?


Small W

[Figure: three level-sensitive logic stages with eight latches, all driven by “clock”]

Hold Check: Gmin > W

Setup Check: Gmax < P

[Figure: three level-sensitive logic stages separated by banks of paired latches (L L), all driven by “clock”]

Hold Check: Gmin > W

Very small and predictable W

Setup Check: Gmax < P

Small local circuit which can be carefully crafted and characterized across multiple process corners, voltages, temperatures

“D-type flip-flop” (DFF)


Azuro Day 1

Some money in the bank from Angel Investors

2 employees

Small Office rented from Cambridge University Computer Lab

Vision to automatically compile mainstream synchronous designs into low power asynchronous equivalents

Vision to bring the low power benefits of asynchronous design to the mainstream synchronous community

[Figure: the asynchronous latch pipeline, with request and acknowledge signals between the ctrl blocks of adjacent stages]

These delays don’t all have to be the same

No data no latching (consume power only when needed)


Power Saving Ideas

Power in the clock

− Is wasted if the register did not capture new data

− Clock gating

Power in the datapath

− Is wasted if the result of computation is not captured

− Guarding

[Figure: register a feeding an adder (+) whose result is captured by register b]

If register will not accept data:

− Don’t clock register

− Don’t send data

if (…) b = a + …


First “Real” Idea: Guard Flops

Saw that flop can be decomposed

So create “Guard Flop”

[Figure: flop decomposed into elements labelled T; one group controls accepting data, the other controls routing of data]

First “Real” Idea: Guard Flops

Gained two US patents

Reception was that it was still too disruptive

− Needed to create new standard cells

− New Methodologies …

− Still too Async?

Needed to come closer to the mainstream


Focus on Clock Gating

Clock gates are like AND gates with one rising-edge sensitive input

[Figure: clock gate with Enable and Clock in inputs and Gated clock out output; waveforms for Clock in, Enable in and Clock out]

Focus on Clock Gating

Clock Gating in 2002 was primitive

− Instantiated directly from RTL expressions only: if (a) b <= c;

Three ways to improve:

− Find and exploit hierarchy

− Find new register-level expressions not in netlist

− Find sub-register-level expressions
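
To make the `if (a) b <= c;` pattern concrete, here is a minimal cycle-level sketch comparing a register that is clocked every cycle (holding its value through a feedback mux) with one whose clock is gated by `a`. It is a behavioural toy, not a model of any real clock-gating cell; the function names and the pulse counters are assumptions made for illustration.

```python
def run_ungated(enables, data, reset=0):
    """Register clocked every cycle; a mux recirculates b when a is 0."""
    b, trace, clock_pulses = reset, [], 0
    for a, c in zip(enables, data):
        clock_pulses += 1          # the flop sees every clock edge
        b = c if a else b          # mux: load c or hold b
        trace.append(b)
    return trace, clock_pulses


def run_gated(enables, data, reset=0):
    """Register whose clock edge is delivered only when a is 1."""
    b, trace, clock_pulses = reset, [], 0
    for a, c in zip(enables, data):
        if a:                      # clock gate passes the edge
            clock_pulses += 1
            b = c
        trace.append(b)
    return trace, clock_pulses
```

Both versions produce the same sequence of register values; the gated one simply receives fewer clock pulses, which is where the clock power saving comes from.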


Hierarchical Gating

Finding expressions requires BDD-based netlist traversal

− Rarely as simple as observing an AND gate in the netlist!

Multiple levels of hierarchy usually available

Heuristics to limit levels to 2 or 3 to avoid clock impact

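[Figure: two clock gates with enables A & C and A & B, compared with a hierarchy in which a clock gate enabled by A feeds two clock gates enabled by C and B]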
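
The factoring in the figure can be sketched with a toy representation. The slides say the real analysis needs BDD-based netlist traversal; the code below swaps that for something much simpler and assumes each enable is a plain conjunction of named signals stored as a `frozenset`. All names are hypothetical.

```python
from itertools import product


def common_factor(enables):
    """Signals shared by every enable conjunction: the candidate root gate."""
    return frozenset.intersection(*enables.values())


def hierarchical(enables):
    """Split flat enables into a shared root enable and per-register leaf enables."""
    root = common_factor(enables)
    leaves = {reg: lits - root for reg, lits in enables.items()}
    return root, leaves


def evaluate(lits, values):
    # A conjunction is true when all of its signals are true.
    return all(values[name] for name in lits)


def equivalent(enables, root, leaves):
    """Exhaustively check that root AND leaf equals the flat enable."""
    names = sorted(set().union(*enables.values()))
    for bits in product([False, True], repeat=len(names)):
        values = dict(zip(names, bits))
        for reg, lits in enables.items():
            flat = evaluate(lits, values)
            hier = evaluate(root, values) and evaluate(leaves[reg], values)
            if flat != hier:
                return False
    return True
```

For enables A & B and A & C this yields a root gate on A with leaf gates on B and C, and the exhaustive check confirms the two structures enable the registers identically.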


Symbolic Gating

Use BDDs to decompose existing D-side function

− Gating function for whole register

− Next-state function assuming gating function pulled out

May need to avoid decomposition

− If timing is tight, might need to pull out CG late in the day, and put back in old (green) function

[Figure: register whose CG is driven by the gating function and whose D input is driven by the next-state function]
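
A small sketch of the decomposition idea, under stated assumptions: the D-side function here is a made-up 2-bit load-or-hold register, and exhaustive enumeration stands in for the BDD machinery the slide mentions. The gating function is taken to be "some bit would change", which is one possible choice rather than necessarily the tool's.

```python
from itertools import product


def d_side(q, load, x):
    """Hypothetical D-side function of a 2-bit register: load x, else hold q."""
    return x if load else q


def gating_function(q, load, x):
    """The register needs a clock edge only when its value would change."""
    return d_side(q, load, x) != q


def next_state_when_gated(q, load, x):
    """Next-state logic once holding is left to the clock gate.
    Whenever the gate is open, d_side equals x, so the hold mux disappears."""
    return x


def check_decomposition():
    """Verify gated register == original D-side function for every input."""
    for q, load, x in product(range(4), (0, 1), range(4)):
        original = d_side(q, load, x)
        if gating_function(q, load, x):
            gated = next_state_when_gated(q, load, x)
        else:
            gated = q              # no clock edge: register keeps its value
        if gated != original:
            return False
    return True


def open_count():
    """Number of input combinations for which the clock gate is open."""
    return sum(gating_function(q, load, x)
               for q, load, x in product(range(4), (0, 1), range(4)))
```

In this toy the gate is open for 12 of the 32 input combinations, and the next-state logic shrinks to a wire from x.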


Structural Gating

Generalization of previously-published work:

On its own, this is useless

− Clock capacitance of the CG is as large as, or larger than, that of the flop!

But can be combined with NAND, OR, and across subsets of registers

[Figure: register with a CG]

Structural Gating

[Figure: register split into clusters, each with its own CG]

Register split into a number of clusters, depending on simulation information (SAIF, VCD)

Some parts of the register may stay at logic 0 more than logic 1

− Use OR gate for comparison

Key is to make the whole thing activity-driven

First Product: PowerCentric

VICTORY!

First Product: PowerCentric

[Figure: NASDAQ stock price, 2007 to 2010, vertical axis 1000 to 3000; annotation: Azuro ready to launch!]

First Product: PowerCentric

[Figure: NASDAQ stock price, 2007 to 2010, vertical axis 1000 to 3000; annotation: Houston, we have a problem …]

Launch abort! Hunker down for phase 2

[Figure: the asynchronous latch pipeline again, with request and acknowledge signals between the ctrl blocks of adjacent stages]

No data no latching (consume power only when needed)

These delays don’t all have to be the same

2004, In Azuro R&D

Could now gate clock far more efficiently than any competitor

However, we couldn’t estimate effect on the clock tree

Needed to generate our own clock tree synthesis algorithm

− Naturally, we used our Async experience …


Azuro CTS

Bottom-up clustering approach

− Cluster lower drivers together first

− More flexible, less “synchronous” in a way

Ideally suited to deep clock gating

− Also ~10% lower power

− But has other benefits


Azuro CTS

Azuro CTS allowed us to:

− Tame clock complexity

− Manage on-chip variation

− Productize useful skew

Clock Complexity

Innovated LP-based scheduling algorithm to solve complex clock tree constraints

Do these flops need to be balanced?


On-Chip Variation

OCV derates hidden until CTS


OCV + Clock Complexity = Clock Timing Gap


Useful Skew

Why does the clock need to arrive at every point at the same time?

Cycle time: 1200ps

Think Async: What would we do?

[Figure: three pipeline stages on a common CLK: short stage (600ps), short stage (600ps), long stage (1200ps)]

Useful Skew

[Figure: the same three stages with skewed CLK arrivals: each short stage becomes 600 + 200ps and the long stage becomes 1200ps – 400ps; one clock signal is advanced 200ps and another is delayed 200ps]

Cycle time now: 800ps

− 50% frequency improvement!
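
The arithmetic behind the 800ps figure can be sketched as follows. This is a simplified model with loud assumptions: a linear chain of stages, setup constraints only (no hold checks, no on-chip variation), and the first and last registers kept aligned so that time borrowed by one stage must be repaid by another. The function names are hypothetical.

```python
def min_cycle_balanced(stage_delays):
    """With a zero-skew clock the slowest stage sets the cycle time."""
    return max(stage_delays)


def min_cycle_with_skew(stage_delays):
    """With free intermediate skew and aligned endpoints, the average sets it."""
    return sum(stage_delays) / len(stage_delays)


def skew_schedule(stage_delays, cycle):
    """Clock arrival offset (ps) at each register, first register at 0.
    Stage i launches at offsets[i] and captures at offsets[i + 1] + cycle,
    so each stage gets exactly its own delay as available time."""
    offsets = [0.0]
    for delay in stage_delays:
        offsets.append(offsets[-1] + delay - cycle)
    return offsets


def frequency_gain(stage_delays):
    """Fractional frequency improvement from skewing versus balancing."""
    return min_cycle_balanced(stage_delays) / min_cycle_with_skew(stage_delays) - 1
```

For stage delays of 600ps, 600ps and 1200ps this gives a cycle time of 800ps instead of 1200ps and a gain of 0.5, matching the slide. The offsets come out as 0, -200, -400, 0; adding a constant to all of them describes the same schedule, so the slide's "advance 200ps, delay 200ps" labelling is one equivalent way to draw it.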


Useful Skew

Has been tried before

− EDA vendors had useful skew engines

− Numerous academic papers on this

Focus was either too lowbrow or too highbrow

− Traditional approach was adding individual buffers. No ability to decrease delay. Buffers cost power.

− Academic approach was solving chip-wide LP-style problems. Too hard for “real” chips (1M+ instances).

− No consideration of complex clocks

− No consideration of on-chip variation

?

Azuro CTS

− Tame clock complexity

− Manage on-chip variation

− Productize useful skew

[Figure: a chain of three paths with delays D1, D2, D3]

Traditional Design

− Worst Path Closure

− Speed limited by max D

Clock Concurrent Design

− Worst Chain Closure

− Speed limited by avg D

[Figure: the chain of paths D1, D2, D3 with slack1, slack2, slack3 and a “?” marker]

slack1 = slack2 = slack3

These need to be propagated-clock slacks, not ideal-clock slacks!

Clock Concurrent Optimization (CCOpt)

[Figure: Logic and Clock optimized together around path D3]

Skew clocks and optimize logic concurrently

Always using propagated-clock slacks

Azuro Rubix (CCOpt)

Azuro CTS

− Tame clock complexity

− Manage on-chip variation

− Productize useful skew

Thank You