62
® Page 1 Design of Clock Distribution in High Performance Processors Ian Young Intel Senior Fellow and Director, Advanced Circuits and Technology Integration (ACTI) Technology Manufacturing Group (TMG) Intel Corporation, Hillsboro, Oregon

Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 1

Design of Clock Distribution in High Performance Processors

Ian Young

Intel Senior Fellow and Director, Advanced Circuits and Technology Integration (ACTI)

Technology Manufacturing Group (TMG)Intel Corporation, Hillsboro, Oregon

Page 2: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 2Clock Distribution in Microprocessors I. Young 3/30/2005

Desktop Clock Frequency Trend

10 MHz

100 MHz

1985 1990 1995

Year

1 GHz

2000 2005

10 GHz

Clo

ck F

requ

ency

PIIPentium

PIII

P4

XX

X

486 X

X

Page 3: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 3Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

Introduction to Synchronous Logic and clock distribution

Manufacturing effects

Early history of clocking: 80486, Pentium and Pentium II

Itanium active deskewing clock distribution

Pentium4 clock distribution

Montecito (Itanium family next gen.) clock distribution

Summary

Page 4: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 4Clock Distribution in Microprocessors I. Young 3/30/2005

Synchronous Logic

Logic progresses at a rate controlled by the clock— Retiming removes the effects of different logic and wire delays— Slows down signals that arrive too fast

Requires a state element— Latch stores Input when clock is low— Flip-Flop stores Input when clock rises

Requires a precise clock

Enables CPU pipelining and high through-put CPUs

Page 5: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 5Clock Distribution in Microprocessors I. Young 3/30/2005

Flip-Flop: Set-up and Hold Times

Setup Time:— time before the clock signal, that a data signal must be valid in

order to be stored.

Hold Time:— time after the clock signal, that a data signal must be valid in

order to be stored.

Clock

Data In

Data Out

Setup time

Hold time

Page 6: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 6Clock Distribution in Microprocessors I. Young 3/30/2005

The Clock Distribution Problem

Deliver the clock signal from the source (PLL) to all the receivers with the best timing precision. CLOCK SKEW is the inaccuracy of the same clock edge arriving at various locations in the chip (spatial separation)CLOCK JITTER is the inaccuracy of consecutive clock edges arriving at the same location (temporal separation)

Skew

AB

JitterC

A

B

C

PLL

Page 7: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 7Clock Distribution in Microprocessors I. Young 3/30/2005

Clock and Logic Structure & Operation

Clock

Data OutData Out

U1 U2

Data InData In

C

Buffer controlling U1 data capture & transmit

ComputationCircuitry

Data path

Buffer Controlling U2 data capture & transmit

B1

B2Clock path 2

CLOCK

Data OutData Out

FFU1

FFU2

Data InData In

C

Buffer controlling U1 data capture & transmit

ComputationCircuitry

Clock path 1

Data path

Buffer Controlling U2 data capture & transmit

B1

Clock path 2

Clock Delay 1 = delay from Clock to the Clock input of U1.Clock Delay 2 = delay from Clock to the Clock input of U2.Clock Skew = Clock Delay 2 – Clock Delay 1 (should be zero).

Page 8: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 8Clock Distribution in Microprocessors I. Young 3/30/2005

SETUP violation: Data arrives at U2 too late, anddoesn’t get captured by U2 clock cycle.

Combinational logic

CU1 U2DATA IN DATA OUT

DATA IN

CLOCK

0

0

1

1

1

1

DATA IN Data ArrivesToo Late

CLOCK

DATA OUT

DATA IN

DATA OUT

CLOCK

U1

bU2

CLOCK SKEWCLOCK SKEW

Page 9: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 9Clock Distribution in Microprocessors I. Young 3/30/2005

HOLD violation: Data arrives at U2 too soon, not held long enough to be captured by U2 clock cycle.

CCombinational logic

U1 U2DATA IN DATA OUT

CLOCK

0

0

1

1

1

1

DATA IN Data ArrivesToo SoonU1 CLOCK

DATA OUT

DATA IN

U2DATA OUT

CLOCK

CLOCK SKEWCLOCK SKEW

Page 10: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 10Clock Distribution in Microprocessors I. Young 3/30/2005

Challenges for Microprocessor Clock Design

Rapid increases in Core Clock Frequencies— 1991: 100 MHz (0.8mm)— 1997: 400 MHz (0.35mm)— 2001: 2.0GHz (0.13mm)— 2005: 3.8GHz (90nm)

Increasing Clock Load— as indicated by total transistors/die— 1991: 1.2 million transistors (0.8mm) — 1997: 7.5 million transistors (0.35mm)— 2001: 42 million transistors (0.13mm) — 2005: 1.7 billion transistor (90nm)

Worsening within-die process variations— Lithography and Etch — Supply Noise— Hot Spots

Page 11: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 11Clock Distribution in Microprocessors I. Young 3/30/2005

Clock Distribution H-Tree (2 level)Global / LocalSkew

L1Ext Clk PLL

(100MHz)

(2 GHz)

L2

L3

Page 12: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 12Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

Introduction to Synchronous Logic and clock distribution

Manufacturing effects

Early history of clocking: 80486, Pentium and Pentium II

Itanium active deskewing clock distribution

Pentium4 clock distribution

Montecito (Itanium family next gen.) clock distribution

Summary

Page 13: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 13Clock Distribution in Microprocessors I. Young 3/30/2005

Poly Gate Transistor Length Variation Sources

Long-range Within-die— Stepper lens aberrations

Proximity effects (systematic)— Nest or isolated

Random component— Stepper lens— Poly gate line edge roughness— Threshold voltage Lgate

Source Drain

Page 14: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 14Clock Distribution in Microprocessors I. Young 3/30/2005

Transistor Vth Variation Sources

Die-to-die

Random component— From Random Dopant Fluctuations f (W, L)

Short channel component.

Well and Halo doping

Page 15: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 15Clock Distribution in Microprocessors I. Young 3/30/2005

Vth Variation Model

Model relates variability to device size:

Where We and Le are the effective device width and length. C1 and C2 relate to some physical device parameters such as Tox, junction depth, etc.

WeZeCCVt 2

1)( +=∆σ

Page 16: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 16Clock Distribution in Microprocessors I. Young 3/30/2005

Interconnect Variability

Conductor width and space

Conductor Thickness

Dielectric Thickness

Inductance — needs to be analyzed and modeled for busses

Page 17: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 17Clock Distribution in Microprocessors I. Young 3/30/2005

Interconnect Variability

Conductor Thickness Dielectric Thickness Power and Global SignalInductance (needs to be analyzed and modeled)

Metal LineVia

Insulating Dielectric

M1M2

M5

M4

M3

M7

M6

Page 18: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 18Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

Introduction to Synchronous Logic and clock distribution

Manufacturing effects

Early history of clocking: 80486, Pentium and Pentium II

Itanium active deskewing clock distribution

Pentium4 clock distribution

Montecito (Itanium family next gen.) clock distribution

Summary

Page 19: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 19Clock Distribution in Microprocessors I. Young 3/30/2005

The 50MHz 80486 Microprocessor (1991)

PLL

• 3 Layer Metal• 2 Phase Clocking • RC clock skew• On-chip PLL

Page 20: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 20Clock Distribution in Microprocessors I. Young 3/30/2005

Clock Buffers80486 Microprocessor: PLL

PhaseFreqDet

CP VCOVcntl Div

2

Clkin

Clkout

Ph1/Ph2Ext Clk

Fb Clk

50% duty-cycle 2φ clocks, Clkin, Clkout compatible with prior gen.Internal Clock Skews Between Chips Reduced by ~2nsEnables 0-1ns Hold Time for frequency ScalabilityVCO Designed with: Wide Frequency Range (5-130MHz)

Supply Voltage Noise Rejection

Conventional With PLLSetup 3ns 1.5nsHold 2.5ns 1.0nsOutput Valid 9.0ns 7.5ns

Page 21: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 21Clock Distribution in Microprocessors I. Young 3/30/2005

Pentium Microprocessor Clock DistributionSingle 50% Duty-cycle Clock66-133MHz Internal Frequency Internal clock freq. vs. External bus freq. Ratios of 2:1, 3:2, 1:1

To Global Clock Driver

PLL Clock Generator

To Global Clock Driver

Local Clock Enable

Local Clock Enable

T=0ns

Global Clock DriversLocated in the Pad Ring

Serpentines for RC matching

Page 22: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 22Clock Distribution in Microprocessors I. Young 3/30/2005

Pentium II (P6) Microprocessor Clock Distribution

Buffer Network designed for > 300 MHzMinimized the Propagation Delay Minimized Global Clock Skew Global Clock Power DownSupply di/dt noise reduction

— Vdd / Vss decoupling capacitance— Minimize Vdd / Vss DC Resistance (IR drop) — Minimize Vdd / Vss AC Resistance and Inductance

Page 23: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 23Clock Distribution in Microprocessors I. Young 3/30/2005

P L L

C lo c k _ e nL o c a l C lo c k D r iv e rs

G lo b a l C lo c k D riv e rs

L o c a l C lo c k D r iv e rs

L o c a l C lo c k D r iv e rs

1 .2 n s

C lo c k D is t r ib u tion N e tw o rk

g c lk # c lk

2 n s

Global

Page 24: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 24Clock Distribution in Microprocessors I. Young 3/30/2005

Pentium II Global Clock Skew measured test chip results

SK= -592ps

SK= -460ps

SK = -424ps SK = -548ps

SK= -564ps

SK= -476ps with M4 Metal Strapping Ring

SK= -488ps

Input Point to Local Buffers

5 Level Driver for 500pF load

with clock gating

Floor Plan of measured skew SK = Skew relative to feedback point from local bufferSkew across M4 Global Distribution = 140ps

Page 25: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 25Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

Introduction to Synchronous Logic and clock distribution

Manufacturing effects

Early history of clocking: 80486, Pentium and Pentium II

Itanium active deskewing clock distribution

Pentium4 clock distribution

Montecito (Itanium family next gen.) clock distribution

Summary

Page 26: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 26Clock Distribution in Microprocessors I. Young 3/30/2005

Itanium Processor Clocking

IA-64 architecture0.18µm CMOS6 metal layers25.4M transistors800MHz frequency

FPUIA-32Control

Instr.Fetch &Decode

Cache

Cache

TLB

Integer Units

IA-64 Control

Bus

Ref: S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, I. Young JSSC Nov 2000

Page 27: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 27Clock Distribution in Microprocessors I. Young 3/30/2005

Clock Distribution Hierarchy

GlobalDistribution

VCC/2

DSK

DSK

DSKPLL

CLKPCLKN

RCD

RCDDLCLK

OTB

RegionalDistribution

LocalDistribution

ReferenceClock

MainClock

Page 28: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 28Clock Distribution in Microprocessors I. Young 3/30/2005

Itanium Global Clock Distribution

Balanced H-tree routed in M5 and M6Lateral shieldingDistributes both main and reference clockOptimized to account for inductive effectsPLL

DSK

DSK

DSK

DSKDSK

DSKDSK

DSK

Page 29: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 29Clock Distribution in Microprocessors I. Young 3/30/2005

Itanium Regional Clock Distribution

Distributed array of deskew buffers to reduce process related skew

— 8 deskew clusters each holding up to 4 buffers

Regional clock grids driven by modular Regional Clock Drivers

— M4-M5 grid tailored for the clock load densityof the underlying block

— Full support for scan and clock gating

DSK = Cluster of 4 deskew buffers

DSK

DSK

DSK

DSKDSK

DSKDSK

DSK

CDC

CDC = Central Deskew Controller

Page 30: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 30Clock Distribution in Microprocessors I. Young 3/30/2005

Deskew Buffer: Block Diagram

Regional Feedback Clock

Ref. ClockLocal Controller

RCD

RCD

Global Clock

TAP I/F

Deskew Buffer

RegionalClock Grid

DelayCircuit

Deskew covers the entire clock distribution up to the input of the local clock buffer

Page 31: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 31Clock Distribution in Microprocessors I. Young 3/30/2005

Deskew Buffer: Delay Circuit

Enable

20-bit Delay Control Register

Output

Input

TAP I/F Step size = 8.5psDeskew range = 170ps

Small step size enables fine granularity skew control over a wide rangeTAP read/write access to Control Register enables faster timing debug and performance tuning

Page 32: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 32Clock Distribution in Microprocessors I. Young 3/30/2005

Deskew Buffer: Local Controller

Referenceclock 16-to-1

CounterEnable

Phase Detector

Digital Low-Pass Filter

To Deskew Buffer Register

Feedbackclock

Lead/Lag

Phase detector output sampled every 16 core cycles6-tap digital low pass filter reduces comparison noiseLocal controller ensures stable deskew operation

Page 33: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 33Clock Distribution in Microprocessors I. Young 3/30/2005

Itanium Skew Measurements

0102030405060708090

100110120

R01 R03 R11 R20 R22 R31 R33 R41 R43 R51 R53 R63 R71

Regional Clock

Dis

trib

utio

n Sk

ew (p

s) Projected max skew without deskew mechanism = 110ps

Max skew with deskew mechanism = 28ps

Page 34: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 34Clock Distribution in Microprocessors I. Young 3/30/2005

Clock Skew Timing Budgets

Common LCB Ts1

Common RCD Ts2

Common Reference Ts3

Common Main Clock Ts4

Category Skew Budget

Ts1 < Ts2 < Ts3 < Ts4

DSK

CommonReference

LCBRCD

CB

CB

CB

MainClock

CB

DSK

DSK

RCD

RCD

LCB

CommonReference

Multiple skew budgets minimize the skew penaltyand enable timing optimization

Page 35: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 35Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

Introduction to Synchronous Logic and clock distribution

Manufacturing effects

Early history of clocking: 80486, Pentium and Pentium II

Itanium active deskewing clock distribution

Pentium4 clock distribution

Montecito (Itanium family next gen.) clock distribution

Summary

Page 36: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 36Clock Distribution in Microprocessors I. Young 3/30/2005

Pentium®4 Clock Challenges

Enable Netburst™ micro-architecture for 0.18um technology

≥ 2GHz clock for Hyper Pipelined Technology core≥ 4GHz clock for Rapid Execution Engine≥ 400MHz I/O clock for fast data transfer

— < 10% clock inaccuracy

Enable clock gating for low powerClock observability and controllability for fast debug

Reference: Kurd, Barkatullah, Dizon, Fletcher, Madland JSSC Nov 2001

Page 37: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 37Clock Distribution in Microprocessors I. Young 3/30/2005

Clock Generation & Distribution

I/OPLL

Clock Dist

CorePLL

Clock Dist LCDs

systemclocks

Local ClockDrivers

Cor

e C

lock

s

LCDs

LCDs

LCDs

LCDs

LCDs I/O C

loc k

s

2x

1x

½x

1x

2x

4x

4GHz

2GHz

1GHz100MHz

100MHz

400MHz

200MHz

Page 38: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 38Clock Distribution in Microprocessors I. Young 3/30/2005

Clock Distribution

I/OPLL

Clock Dist

CorePLL

Clock Dist LCDs

systemclocks

Local ClockDrivers

Cor

e C

lock

s

LCDs

LCDs

LCDs

LCDs

LCDs I/O C

loc k

s

2x

1x

½x

1x

2x

4x

Page 39: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 39Clock Distribution in Microprocessors I. Young 3/30/2005

Binary distribution tree in three spines

F r o m P L L

Page 40: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 40Clock Distribution in Microprocessors I. Young 3/30/2005

Triple Clock Spines

Top Spine

Middle Spine

Bottom Spine PLL

BinaryDist.Tree

SkewOptimizer

To LocalClock Drivers

From PLLSp

ine

Page 41: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 41Clock Distribution in Microprocessors I. Young 3/30/2005

Skew Optimization Scheme

PLL

binarytreeof

clockrepea-

ters

DB47

DB1

DB2

DB3

PDDB46

PD

To TestAccess Port Local Clock

Macro

LCDPD

PD Local ClockMacro

Local ClockMacroLCD

LCD

LCD

LCD

SE

SE

SE

SE

SE

AdjustableDomain Buffers

PhaseDetectors

LocalClock

Drivers

SequentialElements

FilteredVCC

Skew Optimizer

Page 42: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 42Clock Distribution in Microprocessors I. Young 3/30/2005

Skew Profile Graph

-40.0

-30.0

-20.0

-10.0

0.0

10.0

20.0

30.0

40.0

Left Right

Rel

ativ

e Ti

min

g (p

s)

TopMiddleBottom

REFERENCEAFTER

BEFORE

Page 43: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 43Clock Distribution in Microprocessors I. Young 3/30/2005

Skew Control Scheme

For the particular die example shown— Pre-compensation max skew ~ ±32ps— Post-compensation max skew ±8ps

Side benefits— Provides a within-die skew profile— Deliberately skew clocks for performance

– 200MHz frequency increase obtained

Page 44: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 44Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

Introduction to Synchronous Logic and clock distribution

Manufacturing effects

Early history of clocking: 80486, Pentium and Pentium II

Itanium active deskewing clock distribution

Pentium4 clock distribution

Montecito (Itanium family next gen.) clock distribution

Summary

Page 45: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 45

Montecito Clock System Floorplan

PLL / PLL / translatiotranslation table / n table / clock clock controlcontrol

FSB FSB DFDsDFDs

Foxton Controller DFDFoxton Controller DFD

FSB FSB DFDsDFDs

Core Core DFDsDFDs

RVDsRVDs

CORE 1CORE 1

CORE CORE 00

Bus Logic DFDBus Logic DFD

References: ISSCC 2005 Paper 16.1, T. Fischer et al, ISSCC 2005 Paper 16.2, P. Mahoney et al

Page 46: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 46Clock Distribution in Microprocessors I. Young 3/30/2005

Clock SystemArchitecture

FixedSupply

VariableSupply

Bus Clock

I/Os

Foxton

Bus Logic

Core0

Core1

1/N

CVD

RAD

Gater

1/1

GaterCVD

RVD

DFD

DFD

SLCB CVD GaterPLL

DFD

SLCB CVD Gater

SLCB

DFD

CVD Gater

1/M

DFD

DFD

1/NRAD

SLCB

SLCB

Frequency Translation Table

DivisorsFuses

PinsBalanced

Tree ClockDistribution

Phase Aligner

Page 47: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 47Clock Distribution in Microprocessors I. Young 3/30/2005

Montecito Clock Distribution Summary (1)

Core Clock Frequency controlled real-time based upon DC and transient power supply voltage

— Montecito/Foxton Power delivery sets DC supply voltage based upon power dissipation (temp. sensors around the chip) of 100W. No worst case power.

— Ldi/dt supply noise transients slow critical paths and reduce the operating frequency for a few cycles. Constantly varying frequency responds to the core supply voltage transient behavior.

Regional Active Deskewing system reduces the process voltage and temperature sources of skew across the 21.5mm x 27.7 mm die.Clock Venier Devices (CVD) inserted at each local clock buffer allow 70ps of adjustment via Scan control.The clock distribution system consumes less than 25W for the 30mm route from PLL through the clock tree to all the Latches.

Page 48: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 48

Montecito Clock System Floorplan

PLL / PLL / translation translation table / table / clock clock controlcontrol

FSB FSB DFDsDFDs

Foxton Controller DFDFoxton Controller DFD

FSB FSB DFDsDFDs

Core Core DFDsDFDs

RVDsRVDs

CORE 1CORE 1

CORE 0CORE 0

Bus Logic DFDBus Logic DFD

Page 49: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 49Clock Distribution in Microprocessors I. Young 3/30/2005

Overview with block diagram

Bus Clock

IOs

Foxton

Bus Logic

REPEATERS

core0

core1

SLCB

RAD

RAD

GATERSCVDSLCB

DFD

DFD

PLL

DFD

DFD

DFD

DFD

L0 route L1 route L2 route

Latches

Latches

Latches

Latches

Latches

Latches

Latches

SLCB

SLCB

SLCB

GATERSCVD

GATERSCVD

GATERSCVD

GATERSCVD

L3 Route

Fixed frequencyLow Voltage Swings

Variable Frequency Full Rail Transitions

Differential Single Ended

Page 50: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 50Clock Distribution in Microprocessors I. Young 3/30/2005

Montecito Clock Distribution Summary (2)

The LO clock route is differential with 400mV swig and resistive load at the end of each line (length = 20mm)

— Line width tapering is used— A self-biased differential amplifier is the repeater

L1 route (length = 2mm) from the Digital Frequency Divider (DFD) to the Second Level Clock Buffer (SLCB) is distributed as a half frequency 0o / 90o clocks that are XOR’d in the SLCB and not duty cycle sensitiveL2 route (length=3mm) from SLCBs to the Clock Venier Device has < 6ps skew using optimization with a CAD timing tool.L3 route (length = 2mm), from the CVDs through the clock gaters to the Latches, provides an overall clock skew adjustment to < 10ps to the Latches controlled by test scan

Page 51: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 51Clock Distribution in Microprocessors I. Young 3/30/2005

Digital Frequency Divider (DFD) Block Diagram

PLL CLOCK DIFFERENTIAL INPUT

64 PHASES

STATE MACHINE

FULL FREQUENCY DIFFERENTIAL “UTILITY”CLOCK ROUTES TO CLOCK SYSTEM

PCSM

PERIOD ADJUST +2 TO -1

16-PHASE DLL AND INTERPOLATION

TO / FROM SAME-CORE PCSMS

STARTUP CONTROL

RVD UP / DOWN REQUESTS

DIVIDE BY 2

DIVIDE BY 2

ODCS CONTROL

SCAN AND TRIGGERS

½ FREQUENCY QUADRATURE DIFFERENTIAL CLOCK ROUTES TO SLCBS

Page 52: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 52Clock Distribution in Microprocessors I. Young 3/30/2005

Voltage-to-Frequency Conversion (VFC)Per Core

DFD1 L1 Clock Route To SLCBS

L0 Clock Route from PLL

22

4

RVD4 RVD5 RVD6 RVD7

2

2

DFD2

RVD11

L1 Clock Route To SLCBS

4

RVD8 RVD9 RVD10

2

8

8

DFD0

RVD3

L1 Clock Route To SLCBS

2

4

RVD0 RVD1 RVD2

2

8

Utility Clocks

Utility Clocks

Utility Clocks

Page 53: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 53Clock Distribution in Microprocessors I. Young 3/30/2005

Variable Frequency Mode:CMOS Critical Path Scaling

0.5

0.7

0.91.1

1.3

1.5

1.7

1.92.1

2.3

2.5

600 800 1000 1200

Supply Voltage (mV)

Del

ay (n

orm

to 1

100

mV)

circuit1circuit2circuit3circuit4circuit5

Page 54: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 54Clock Distribution in Microprocessors I. Young 3/30/2005

RVD Coarse Delay Element

I

O

I

I

I

I I

I

Metal 1 Serpentine Resistor

out

VDD

nout

nrun

run

nrunfbn

runfbp

GND

VDD

even

clear

GND

nrun fetconfig_fet nfet

nrunrun

GND

VDDfbp

nclear

nodd

innfet

Page 55: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 55Clock Distribution in Microprocessors I. Young 3/30/2005

RVD Block Diagram

Delay Line 0A

Delay Line 1A

Delay Line 0B

Delay Line 1B

RV

D F

SM

dly0in

dly1in

dly0out

dly1out

HOLD

eval0

DOWN

eval1eval0

eval1

clk

additional delay creates deadzone

Page 56: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 56Clock Distribution in Microprocessors I. Young 3/30/2005

Example VFC Supply Droop Response

DFD Output Clock

Vcore

RVD Delay Line Clock

Period “UP” to DFD

Clock period increasedNo Adjust needed this cycle

Droop increases RVD delay line delay

Increased delay asserts period “UP” for one cycle

3 4 51 2

Page 57: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 57Clock Distribution in Microprocessors I. Young 3/30/2005

SLCB Block Diagram

SLCBO

PCb

Setback dutycycle

inainbincind

PCa

128 Bit Delay Element

128 Bit Thermometer

ShifterShifter ControlFilter

Zone Summer

Output Buffer &Duty Cycle Control

PRESET

DutyCycleScan Registers

Set-backScanRegisters

Page 58: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 58Clock Distribution in Microprocessors I. Young 3/30/2005

CVD Circuit and Operation

I O

I I I

SLCBOx cvdoSLCBO

mid

highlow Drive fight with feedback is

attenuated with pass gate settings change the delay as desired

SPICE simulationsshowing low, mid andhigh delay settingsfor SLCBOx (top graph)and CVDO (bottom graph)

Page 59: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 59Clock Distribution in Microprocessors I. Young 3/30/2005

Route Statistics and Power

Route Terminals Distance Delay

L0 14 20mm 640ps

L1 71 5mm 215ps

L2 14500 2-3.3mm 60ps

L3 ~5 million 0-1.5mm 12ps

Power dissipation contribution by route

Total CPU

L3

L2L1L0

Route statistics

•Highest load and most power dissipated in the L3 route.•Future research into low-power clock distribution should focus on last section of route.

Page 60: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 60Clock Distribution in Microprocessors I. Young 3/30/2005

Evolution in Clock Distribution• 486: PLL on-chip to remove the large clock distribution delay (zero

delay buffer). Clock RC skew minimized across chip with metal.• Pentium: Clock tree with length “tuning” for skew balancing.• Pentium II: Clock Binary Tree in center Spine with branch length

“tuning” to local clock buffer for skew balancing.• Itanium I: Lightly loaded “balanced” reference clock routed with the

highly loaded “unbalanced” clock tree - actively adjust clock buffer delay for low skew (at product test).

• Pentium IV: Three binary tree Spines with “tunable delays” and Phase Detectors distributed across the die. Blow fuses (based upon compare algorithm at test).

• Itanium (Next Gen):- Differential global clock distribution (20mm). - Digital Frequency Divider (synthesizer) adjusts frequency in 1.6% steps within 2 cycles based upon measured local supply.

- Regional Active Deskew adjusts Second Level Clock Buffer delay for low skew ( done during test)

- Local clock buffer variable delay adjust to load (flip-flops) by design (time borrowing)

1991

1993

1995

1998

2000

2004

Complexity

Page 61: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 61Clock Distribution in Microprocessors I. Young 3/30/2005

Summary / Future Directions

Clocking systems have evolved with even more complex electrical methods

— Trimming and active feedback de-skewing circuits developed— Transient Frequency adjust based upon local supply voltage

Design the micro-architecture with interconnect delay in mind Exploit locality for frequency scaling

— Logic / clock domains

Clock Distribution Power will take a larger % of the total chip power

Page 62: Design of Clock Distribution in High Performance …...Pentium Microprocessor Clock Distribution Single 50% Duty-cycle Clock 66-133MHz Internal Frequency Internal clock freq. vs. External

Page 62Clock Distribution in Microprocessors I. Young 3/30/2005

Acknowledge the contributions of

Keng Wong, TMG/LTD Design Simon Tam and the Itanium clock design teamNasser Kurd and the Pentium 4 clock design team Patrick Mahoney, Tim Fischer and Montecito clock team