20
Clock Distribution on a Dual- core Multi-threaded Itanium- Family Microprocessor

Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Clock Distribution on a Dual-core Multi-threaded Itanium™-

Family Microprocessor

����������� ��������� �� ������� ����� ��

� �� �� ������ �

��� ��� ��� ��������������� ������ ��� �

Page 2: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Topics• Overview of distribution• L0 route – from PLL to Digital Frequency Divider

(DFD)• L1 route – from DFD to 2nd Level Clock Buffer (SLCB)• Regional Active Deskewing (RAD) system• L2 route – from SLCB to CVD’s• Clock Vernier Device (CVD) description• L3 route – from CVDs to gaters to latches• Summary

Page 3: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Overview with block diagram

Variable Frequency Full Rail Transitions

Bus Clock

IOs

Fixed frequencyLow Voltage Swings

Foxton

Bus Logic

REPEATERS

core0

core1

SLCB

RAD

RAD

GATERSCVDSLCB

DFD

DFD

PLL

DFD

DFD

DFD

DFD

L0 route L1 route L2 route

Latches

Latches

Latches

Latches

Latches

Latches

Latches

SLCB

SLCB

SLCB

GATERSCVD

GATERSCVD

GATERSCVD

GATERSCVD

L3 Route

Differential Single Ended

Page 4: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Level 0 Route: PLL to DFD• Connects PLL to the DFDs• 400mV Low-voltage swing, constant frequency• 20mm long, repeated every 5mm• Balanced h-tree tapered route with matched impedance

resistively terminated.• No duty cycle correction in repeaters, DFD not sensitive

GNDCLK -CLK +GND

GND VDD VDD VDDGND GND N-2 layer

Page 5: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

L0 Repeater Circuit

outb

outa

ina

inb

Feedback improves edge rates

O

II

Ooutinout in

GND

VDDVDDVDD

GND

VDD

VDD

VDD

GND

VDD VDD

GND

GND

VDD

GND

VDD

ayax

ina inboutaoutb noutb nouta

Page 6: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Level 1 Route: DFD to SLCB

• Connects DFD to the Second Level Clock Buffer (SLCB)• Full rail signal w/ varying supply and varying frequency• Runs at half core frequency using differential clocks.• Balanced h-tree tapered route with matched impedance

GND VDD VDD VDDGND GND

CLK + CLK - CLK90 + CLK90 -G GG

N-2 layer

Page 7: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

SLCBO

PCb

SLCB Block Diagram

Setback dutycycle

inainbincind

PCa

128 Bit Delay Element

128 Bit Thermometer

ShifterShifter ControlFilter

Zone Summer

Output Buffer &Duty Cycle Control

PRESET

DutyCycleScan Registers

Set-backScanRegisters

Page 8: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

ina

inb

Feedback turns on

inc

ind

outa

Route Feedback

XOR output generator

Differential to single ended converter

L1 Route Topology

O

I

I

O

I

I

VDD

GND

GND

VDD

GND

GND

VDD

VDD

p12

p12b

outa

outb

ind

inb

ina

inc, ind

inc

ina, inb

Page 9: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

SLCB Delay Operation

0

40

80

120

160

200

0 50 100 150trim setting (in ticks)

dela

y (p

s)adjusted delay

Measured silicon data showing adjusted delay vs. trim setting

4:1mux

Delay line

Delay line

Delay line

Delay line

Delay line

Delay line

Delay line diagram

Page 10: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

RAD System Overview

11 12 13 14 15

21 2223 24 25 36

31 32 34

33

35

41

42

43

44

52

45

53

46

48

4762

Reference Zone Delay Centered

Each arrow in the zone diagram represents the below phase comparator

I

O

I

O

CVD

CVD

VDDVDD

GND

VDD

GND

VDD

cvdacvdb

inb

up

ina

down

Page 11: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

RAD Circuit Operation

RAD system disabled.

RAD system enabled.

Page 12: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

L2 route: SLCB to CVD• Connects SLCB to Clock Vernier Device (CVD)• Full rail signal w/ varying supply and varying frequency• Runs at full core frequency and is single-ended. • H-tree route 2.1-3.3mm in length matched in RLC delay• Extracted RLC simulation with a target skew of 6ps• Each SLCB typically drives several hundred CVD’s• Route tapered to reduce power and RC flight

GNDCLKVDD

Page 13: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

L2 Clock Routing Tool• In house tool called EZRoute used for balanced h-tree

creation.• Uses two methods of modeling: Elmore delay and

SPICE.• Uses recursive algorithm to adjust wire widths to obtain

lowest skew for a given target.• Tool inputs to are the location of clock inputs and their

loading. The direct output is the layout mask of a clock tree.

Page 14: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Die micrograph showing L2 grid

Page 15: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Clock Vernier Devices (CVD’s)

• CVD’s act as local clock buffers and allow scanable delay insertion

• 14500 individually controllable CVD’s on die for localized clock skew adjustment - RAD CVD’sfor unit-level adjustments

• Skew can be added to the clock– Automate speedpath detection– Find and fix speedpaths in software via scan– Find and fix races in software via scan– In combination with scan latches and on-die clock

shrink, a powerful tool for rapid speedpath debug..

Page 16: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

CVD Circuit and Operation

I O

I I I

SLCBOx cvdoSLCBO

mid

high

low Drive fight with feedback is

attenuated with pass gate settings change the delay as desired

SPICE simulationsshowing low, mid andhigh delay settingsfor SLCBOx (top graph)and CVDO (bottom graph)

Page 17: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Level 3 Route: CVDs to latches

• Connects CVDs to the clock-gating buffers and then latches.

• Routed by individual circuit designers – not clock team• Clocks are gated by logic wherever possible.• Up to 1.5mm long in the extreme case, but typically a

few hundred microns. • Shielded route with a worst-case delay of 12ps.• Routes not correctable by RAD system• Routes and loads are extracted and simulated in SPICE

using “picoskew” custom tool and CVD and gater sizing determined by loading.

Page 18: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Route Statistics and Power

Power dissipation contribution by route

640ps20mm14L0

12ps0-1.5mm~5 millionL3

60ps2-3.3mm14500L2

215ps5mm71L1

DelayDistanceTerminalsRoute

Route statistics

•Highest load and most power dissipated in the L3 route.•Future research into low-power clock distribution should focus on last section of route.

Total CPU

L3

L2L1L0

Page 19: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Summary

• Four level clock distribution system operating in a variable frequency, variable voltage domain at 2GHz.

• Deskewing system minimizes routing skew to 10ps or less - constantly - across PVT.

• CVD’s allow software detection and repair of setup or hold timing violations.

• Distribution system dissipates about 25W of power

Page 20: Clock Distribution on a Dual- core Multi-threaded Itanium ... · Overview with block diagram Variable Frequency Full Rail Transitions Bus Clock IOs Fixed frequency Low Voltage Swings

Acknowledgements

The authors wish to recognize the efforts of Steve Hall and Erin Francom and the others whose contributions made this design possible.