Lec 19 Design for Skew

Embed Size (px)

Citation preview

  • 7/28/2019 Lec 19 Design for Skew

    1/29

    Introduction to

    CMOS VLSIDesign

    Design for Skew

  • 7/28/2019 Lec 19 Design for Skew

    2/29

    CMOS VLSI DesignDesign for Skew Slide 2

    Outline

    Clock Distribution

    Clock Skew

    Skew-Tolerant Static Circuits

    Traditional Domino Circuits

    Skew-Tolerant Domino Circuits

  • 7/28/2019 Lec 19 Design for Skew

    3/29

    CMOS VLSI DesignDesign for Skew Slide 3

    Clocking

    Synchronous systems use a clock to keep

    operations in sequence

    Distinguish this fromprevious ornext

    Determine speed at which machine operates

    Clock must be distributed to all the sequencing

    elements

    Flip-flops and latches

    Also distribute clock to other elements Domino circuits and memories

  • 7/28/2019 Lec 19 Design for Skew

    4/29

    CMOS VLSI DesignDesign for Skew Slide 4

    Clock Distribution

    On a small chip, the clock distribution network is just

    a wire

    And possibly an inverter for clkb

    On practical chips, the RC delay of the wire

    resistance and gate load is very long

    Variations in this delay cause clock to get to

    different elements at different times

    This is called clock skew

    Most chips use repeaters to buffer the clock and

    equalize the delay

    Reduces but doesnt eliminate skew

  • 7/28/2019 Lec 19 Design for Skew

    5/29

    CMOS VLSI DesignDesign for Skew Slide 5

    Example

    Skew comes from differences in gate and wire delay

    With right buffer sizing, clk1 and clk2 could ideally

    arrive at the same time.

    But power supply noise changes buffer delays

    clk2 and clk3 will always see RC skew

    3 mm

    1.3 pF

    3.1 mmgclk

    clk1

    0.5 mm

    clk2

    clk3

    0.4 pF 0.4 pF

  • 7/28/2019 Lec 19 Design for Skew

    6/29

    CMOS VLSI DesignDesign for Skew Slide 6

    Review: Skew Impact

    F1

    F2

    clk

    clk clk

    Combinational Logic

    Tc

    Q1 D2

    Q1

    D2

    tskew

    CL

    Q1

    D2

    F1

    clk

    Q1

    F2

    clk

    D2

    clk

    tskew

    tsetup

    tpcq

    tpdq

    tcd

    thold

    tccq

    setup skewsequencing overhead

    hold skew

    pd c pcq

    cd ccq

    t T t t t

    t t t t

    Ideally full cycle is

    available for work

    Skew adds sequencing

    overhead Increases hold time too

  • 7/28/2019 Lec 19 Design for Skew

    7/29

    CMOS VLSI DesignDesign for Skew Slide 7

    Cycle Time Trends

    Much of CPU performance comes from higherf

    fis improving faster than simple process shrinks

    Sequencing overhead is bigger part of cycle

    0.01

    0.1

    1

    10

    100

    8038680486PentiumPentium II / III

    SpecInt95

    1985 1988 1991 1994 1997 2000

    1.2 0.8 0.6 0.352.0

    Process

    100

    200

    500VDD = 5

    VDD = 3.3

    VDD = 2.5

    50Fanout-of-4(FO4)InverterD

    elay(ps)

    0.25

    10

    100

    1000

    8038680486PentiumPentium II / III

    MHz

    1988 1991 1994 1997 20001985

    10

    100

    1985 1988 1991 1994 1997

    8038680486PentiumPentium II / IIIF

    O4inverterdelays/cycle

    50

    20

    2000

  • 7/28/2019 Lec 19 Design for Skew

    8/29

    CMOS VLSI DesignDesign for Skew Slide 8

    Solutions

    Reduce clock skew

    Careful clock distribution network design

    Plenty of metal wiring resources

    Analyze clock skew Only budget actual, not worst case skews

    Local vs. global skew budgets

    Tolerate clock skew

    Choose circuit structures insensitive to skew

  • 7/28/2019 Lec 19 Design for Skew

    9/29

    CMOS VLSI DesignDesign for Skew Slide 9

    Clock Dist. Networks

    Ad hoc

    Grids

    H-tree

    Hybrid

  • 7/28/2019 Lec 19 Design for Skew

    10/29

    CMOS VLSI DesignDesign for Skew Slide 10

    Clock Grids

    Use grid on two or more levels to carry clock

    Make wires wide to reduce RC delay

    Ensures low skew between nearby points

    But possibly large skew across die

  • 7/28/2019 Lec 19 Design for Skew

    11/29

    CMOS VLSI DesignDesign for Skew Slide 11

    Alpha Clock Grids

    PLL

    gclk grid

    Alpha 21064 Alpha 21164 Alpha 21264

    gclk grid

    Alpha 21064 Alpha 21164 Alpha 21264

  • 7/28/2019 Lec 19 Design for Skew

    12/29

    CMOS VLSI DesignDesign for Skew Slide 12

    H-Trees

    Fractal structure

    Gets clock arbitrarily close to any point

    Matched delay along all paths

    Delay variations cause skew

    A and B might see big skew A B

  • 7/28/2019 Lec 19 Design for Skew

    13/29

    CMOS VLSI DesignDesign for Skew Slide 13

    Itanium 2 H-Tree

    Four levels of buffering:

    Primary driver

    Repeater

    Second-levelclock buffer

    Gater

    Route around

    obstructionsPrimary Buffer

    Repeaters

    Typical SLCB

    Locations

  • 7/28/2019 Lec 19 Design for Skew

    14/29

    CMOS VLSI DesignDesign for Skew Slide 14

    Hybrid Networks

    Use H-tree to distribute clock to many points

    Tie these points together with a grid

    Ex: IBM Power4, PowerPC H-tree drives 16-64 sector buffers

    Buffers drive total of 1024 points

    All points shorted together with grid

  • 7/28/2019 Lec 19 Design for Skew

    15/29

    CMOS VLSI DesignDesign for Skew Slide 15

    Skew Tolerance

    Flip-flops are sensitive to skew because ofhard edges

    Data launches at latest rising edge of clock

    Must setup before earliest next rising edge of clock

    Overhead would shrink if we can soften edge

    Latches tolerate moderate amounts of skew

    Data can arrive anytime latch is transparent

  • 7/28/2019 Lec 19 Design for Skew

    16/29

    CMOS VLSI DesignDesign for Skew Slide 16

    Skew: Latches

    Q1L1

    1

    2

    L2

    L3

    1

    1

    2

    Combinational

    Logic 1

    Combinational

    Logic 2

    Q2 Q3D1 D2 D3

    sequencing overhead

    1 2 hold nonoverlap skew

    borrow setup nonoverlap skew

    2

    ,

    2

    pd c pdq

    cd cd ccq

    c

    t T t

    t t t t t t

    Tt t t t

    2-Phase Latches

    setup skew

    sequencing overhead

    hold skew

    borrow setup skew

    max ,pd c pdq pcq pw

    cd pw ccq

    pw

    t T t t t t t

    t t t t t

    t t t t

    Pulsed Latches

  • 7/28/2019 Lec 19 Design for Skew

    17/29

    CMOS VLSI DesignDesign for Skew Slide 17

    Dynamic Circuit Review

    Static circuits are slow because fat pMOS load input

    Dynamic gates use precharge to remove pMOS

    transistors from the inputs

    Precharge: = 0 output forced high

    Evaluate: = 1 output may pull low

    A B

    Y

    C DY

    A B C D

    A

    B

    CD

  • 7/28/2019 Lec 19 Design for Skew

    18/29

    CMOS VLSI DesignDesign for Skew Slide 18

    Domino Circuits

    Dynamic inputs must monotonically rise during

    evaluation

    Place inverting stage between each dynamic gate

    Dynamic / static pair called domino gate

    Domino gates can be safely cascaded

    A

    W

    B

    X

    domino AND

    dynamic

    NAND

    static

    inverter

  • 7/28/2019 Lec 19 Design for Skew

    19/29

    CMOS VLSI DesignDesign for Skew Slide 19

    Domino Timing

    Domino gates are 1.5 2x faster than static CMOS

    Lower logical effort because of reduced Cin

    Challenge is to keep precharge off critical path

    Look at clocking schemes for precharge and eval Traditional schemes have severe overhead

    Skew-tolerant domino hides this overhead

  • 7/28/2019 Lec 19 Design for Skew

    20/29

    CMOS VLSI DesignDesign for Skew Slide 20

    Traditional Domino Ckts

    Hide precharge time by ping-ponging between half-

    cycles

    One evaluates while other precharges

    Latches hold results during precharge

    Tc

    Static

    Dynamic

    Latch

    clk

    Static

    Dynamic

    clk

    Static

    Dynamic

    clk

    Dynamic

    clk clk

    Static

    Dynamic

    Latch

    Static

    Dynamic

    Static

    Dynamic

    Dynamic

    clk clk clk clk clk

    clk

    clk

    tpdq

    tpdq

    2pd c pdqt T t

  • 7/28/2019 Lec 19 Design for Skew

    21/29

    CMOS VLSI DesignDesign for Skew Slide 21

    Clock Skew

    Skew increases sequencing overhead

    Traditional domino has hard edges

    Evaluate at latest rising edge

    Setup at latch by earliest falling edge

    Static

    Dynamic

    Latch

    clk

    Static

    Dynamic

    clk

    Dynamic

    clk clk

    Static

    Dynamic

    Latch

    Static

    Dynamic

    Dynamic

    clk clk clk clk

    clk

    clk

    tskew

    tsetup

    setup skew2 2pd ct T t t

  • 7/28/2019 Lec 19 Design for Skew

    22/29

    CMOS VLSI DesignDesign for Skew Slide 22

    Time Borrowing

    Logic may not exactly fit half-cycle

    No flexibility to borrow time to balance logic

    between half cycles

    Traditional domino sequencing overhead is about

    25% of cycle time in fast systems!

    Static

    Dynamic

    Latch

    clk

    Static

    Dynamic

    clk clk

    Static

    Dynamic

    Latch

    Static

    Dynamic

    clk clk clk

    clk

    clk

    tskew

    tsetup

  • 7/28/2019 Lec 19 Design for Skew

    23/29

    CMOS VLSI DesignDesign for Skew Slide 23

    Relaxing the Timing

    Sequencing overhead caused by hard edges

    Data departs dynamic gate on late rising edge

    Must setup at latch on early falling edge

    Latch functions

    Prevent glitches on inputs of domino gates

    Holds results during precharge

    Is the latch really necessary?

    No glitches if inputs come from other domino Can we hold the results in another way?

  • 7/28/2019 Lec 19 Design for Skew

    24/29

    CMOS VLSI DesignDesign for Skew Slide 24

    Skew-Tolerant Domino

    Use overlapping clocks to eliminate latches at phase

    boundaries.

    Second phase evaluates using results of first

    a

    Static

    Dynamic1

    Static

    Dynamic2

    b c d

    a

    1

    2

    b

    c

    a

    1

    2

    b

    c

    No latch at

    phase boundary

  • 7/28/2019 Lec 19 Design for Skew

    25/29

    CMOS VLSI DesignDesign for Skew Slide 25

    Full Keeper

    After second phase evaluates, first phase precharges

    Input to second phase falls

    Violates monotonicity?

    But we no longer need the value

    Now the second gate has a floating output

    Need full keeper to hold it either high or low

    weak full

    keeper

    transistors

    f

    X

    H

  • 7/28/2019 Lec 19 Design for Skew

    26/29

    CMOS VLSI DesignDesign for Skew Slide 26

    Time Borrowing

    Overlap can be used to

    Tolerate clock skew

    Permit time borrowing

    No sequencing overhead

    tskew

    Static

    Dynamic

    Static

    Dynamic

    Static

    Dynamic

    Dynamic

    Static

    Dynamic

    Static

    Dynamic

    Static

    Dynamic

    Dynamic

    Static

    Static

    1

    2

    1

    1

    1

    1

    1

    2

    2

    2

    Phase 1 Phase 2

    toverlap

    tborrow

    pd ct T

  • 7/28/2019 Lec 19 Design for Skew

    27/29

    CMOS VLSI DesignDesign for Skew Slide 27

    Multiple Phases

    With more clock phases, each phase overlaps more

    Permits more skew tolerance and time borrowing

    Static

    Dynamic

    Static

    Dynamic

    Static

    Dynamic

    Dynamic

    Static

    Dynamic

    Static

    Dynamic

    Static

    Dynamic

    Dynamic

    Static

    Static

    3

    4

    1 1 2 2 3 3 4 4

    Phase 1 Phase 2 Phase 3 Phase 4

    1

    2

  • 7/28/2019 Lec 19 Design for Skew

    28/29

    CMOS VLSI DesignDesign for Skew Slide 28

    Clock Generation

    clken

    1

    2

    3

    4

  • 7/28/2019 Lec 19 Design for Skew

    29/29

    CMOS VLSI DesignDesign for Skew Slide 29

    Summary

    Clock skew effectively increases setup and holdtimes in systems with hard edges

    Managing skew

    Reduce: good clock distribution network

    Analyze: local vs. global skew

    Tolerate: use systems with soft edges

    Flip-flops and traditional domino are costly

    Latches and skew-tolerant domino perform at fullspeed even with moderate clock skews.