Low-overhead solutions for clock generation and synchronization. Gord Allan PhD Candidate Carleton University Monday, March 10/ 2003 A presentation in the series on ULSI Configurable Systems.

Low-overhead solutions for clock generation and ...gallan/pdf/timing.pdf · Clock and Data Recovery: Frequency Re-Synthesis Data Line Set 1 2 CLK 1)The training pulse fires off an



Low-overhead solutions for clock generation and synchronization.

Gord Allan, PhD Candidate, Carleton University

Monday, March 10, 2003

A presentation in the series on ULSI Configurable Systems.


Outline

• Ultra Large Scale (ULSI) Configurable Systems

• Adjustable Delay Elements

• A Pausable Numerically Controlled Oscillator (NCO)

• All-Digital Phase-Locked Loops and Frequency Synthesis

• Hybrid Analog and Digital Extreme Range PLLs

• Calibrated Delay Lines and DLLs

• Clock-Data Recovery (CDR)

• Single Cycle Acquisition PLLs

• Frequency Re-synthesis

• Phase Adjustment

• Skew Compensation

• High-speed reconfigurable links

Presentation Progress

• Architecture • Delays • NCOs • Digital PLLs • Hybrid PLLs

• DLLs • Quick PLL • CDR • System IO


Ultra Large Scale Configurable Systems

[Figure: example configurable system containing ARM cores, a uP, local storage, caches, DMA, a systolic array, FIRs, IIRs, I/FFT, FP units, a modulator, FEC, crypto, ADC, DAC, switched-capacitor filters, PLL, DDFS, USB, peripherals, an interface block, FPGA fabric, and system memory.]

• Many sub-systems on a chip/board.

• Subsystems are fully isolated from one another – IO is configured via software.

• On demand system configuration and processing.

Timing Issues

• Subsystems operate on independent clocks, which calls for low-overhead clock generation.

• Communications require fault-tolerant, high-throughput re-synchronization.

Our architecture:


System Timing: A Key Component

Adjustable Delay Element

• Low power – ideally no static current.
• Low area
• Wide range, fine resolution
• High operating frequency
• Mixed-signal control
• Linear delay characteristic
• Composed of standard library elements
• Low noise – more on this later…

[Figure: delay element block, Digital In → Digital Out, with the delay set by a Control Word.]

Delay = 100 ps to 1000 ps, in increments of 10 ps.
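The target specification implies a simple linear map from control word to delay. A minimal sketch, assuming an ideal, perfectly linear element where control word 0 gives the 100 ps minimum and each increment adds 10 ps; the names `delay_ps` and `code_for` and the unsigned control-word encoding are illustrative assumptions, not the author's circuit:

```python
import math

# Assumed ideal, perfectly linear delay element (range and step from the spec).
D_MIN_PS = 100
D_MAX_PS = 1000
STEP_PS = 10

N_CODES = (D_MAX_PS - D_MIN_PS) // STEP_PS + 1   # number of distinct delay settings
CTRL_BITS = math.ceil(math.log2(N_CODES))        # width of the control word


def delay_ps(code: int) -> int:
    """Ideal delay for a given control word."""
    if not 0 <= code < N_CODES:
        raise ValueError(f"control word must be in 0..{N_CODES - 1}")
    return D_MIN_PS + code * STEP_PS


def code_for(target_ps: float) -> int:
    """Nearest control word for a requested delay, clamped to the element's range."""
    code = round((target_ps - D_MIN_PS) / STEP_PS)
    return max(0, min(N_CODES - 1, code))


print(N_CODES, CTRL_BITS)
print(delay_ps(code_for(437)))
```

With 91 settings a 7-bit control word suffices; in this model the remaining codes are simply rejected.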


Some Conventional Options

Starved Inverter (references RFP, RFN)
Typical delay 100 ps → ∞
• Bad: power, area; non-linear; no digital control; touchy analog references.
• Good: wide range; potentially low noise.

Standard Cell (AOI)
Typical delay 100 ps → 120 ps
• Bad: low range; affects edges differently; little alg. control.
• Good: area, power; high speed; digital library element.

Switched Paths
Typical delay 300 ps → 3000 ps
• Bad: power; definite glitching; poor resolution; no alg. control.
• Good: wide range; standard cells; low noise (fast mode).

Switched Capacitance
Typical delay 200 ps → 600 ps
• Bad: power penalty; lower max. speed; glitching.
• Good: wide range; nearly linear; potential alg. control; standard cells; low noise.


More Advanced Approaches

• Bad: low range; custom tweaking required.
• Good: fine resolution; potentially linear.

Adjustable Drive
Typical delay 200 ps → 300 ps

Glitchless Switched Capacitor
• Bad: power penalty; larger area; lower max. speed.
• Good: wide range; nearly linear; potential alg. control; standard cells; low noise.

TX Gate Bleeder
Typical delay 400 ps → 3000 ps
• Bad: low resolution; low max. speed; noise sensitive; area, power.
• Good: wide range; linearity; mixed-signal control; externally tunable.

Self-Starved Inverter
Typical delay 120 ps → 1000 ps
• Bad: ~area; ~power; medium range; poor alg. control.
• Good: medium resolution (~30 ps); potentially linear; standard cells.

Parallel Inverter Chain
Typical delay 200 ps → 400 ps


Delay Element Recap


• Hybrids are not only possible, but are encouraged.

• The delay element to use depends largely on the application requirements.

• Is it passing high- or low-frequency signals?

• What is the required delay range?

• Does glitching matter?

• Fine or coarse control resolution?


Numerically Controlled Oscillator

All-digital, numerically controlled oscillator.

An example delay element:

Typical power:
• This is one of the more power-hungry elements.
• At f = 1.25 GHz, 1.8 V → 720 µW → 580 fW/Hz (about 0.58 pJ per cycle).
• ~6 flip-flops
• With custom sizing, power can be nearly halved.

[Figure: NCO with a Freq adj. input, CLK output and RUN/PASS control; the output spans f = 2.3 GHz down to f = 1.1 GHz in ~150 MHz intervals.]

T = 2*(Tdelay + Tinv)


Frequency (GHz) by speed setting:

Speed setting          8     7     6     5     4     3     2     1     0     AVG   Variation
Typical case (27 °C)   2.26  2.05  1.86  1.69  1.53  1.40  1.28  1.19  1.11  1.51
Worst case (100 °C)    1.55  1.41  1.28  1.17  1.07  0.99  0.91  0.85  0.79  1.06  -30%
Best case (0 °C)       2.76  2.52  2.29  2.08  1.89  1.73  1.58  1.46  1.35  1.86  +23%
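The AVG column is not the arithmetic mean of each row (for the typical case that would be about 1.60 GHz). It does match the harmonic mean, i.e. the frequency of the average period, to the two decimals shown, and Variation matches each corner's AVG relative to the typical AVG. A short check under that reading, which is an assumption rather than something stated on the slide:

```python
from statistics import harmonic_mean

# Frequencies in GHz for speed settings 8..0, copied from the table.
TYPICAL_27C = [2.26, 2.05, 1.86, 1.69, 1.53, 1.40, 1.28, 1.19, 1.11]
WORST_100C = [1.55, 1.41, 1.28, 1.17, 1.07, 0.99, 0.91, 0.85, 0.79]
BEST_0C = [2.76, 2.52, 2.29, 2.08, 1.89, 1.73, 1.58, 1.46, 1.35]


def avg_freq(freqs_ghz):
    """Frequency of the mean period, i.e. the harmonic mean of the frequencies."""
    return harmonic_mean(freqs_ghz)


def variation_pct(corner_ghz, typical_ghz):
    """Corner AVG relative to the typical AVG, in percent."""
    return 100.0 * (avg_freq(corner_ghz) / avg_freq(typical_ghz) - 1.0)


for name, row in [("typical", TYPICAL_27C), ("worst", WORST_100C), ("best", BEST_0C)]:
    print(f"{name:8s} AVG {avg_freq(row):.2f} GHz, "
          f"variation {variation_pct(row, TYPICAL_27C):+.0f}%")
```

Averaging periods rather than frequencies is the natural choice if the goal is the mean cycle time across settings.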


An Aside… MOS Noise

• Ring oscillators are highly non-linear and difficult to analyze.
• Noise causes the delay through each stage to vary randomly about the mean.
• This translates to jitter in the output frequency.
• Single MOS: thermal noise + flicker (1/f) noise:

$$\overline{V_i^2}(f) = 4kT \cdot \frac{2}{3} \cdot \frac{1}{g_m} + \frac{K}{W L C_{ox}} \cdot \frac{1}{f}$$

• Moral: for noise immunity, add capacitance instead of resistance.
• In this work we make the conscious choice to focus on power and area, not noise.
• For circuits which interface to RF, noise must be a priority.

• Large transistors (WL) → poorer area, power, speed
• High W/L ratios → poorer area, power, range
• Fewer stages → poorer range
• Larger Cox → tox gets smaller as technology scales.

[Figure: two RC stages with the same RC = 100 ps: 1 kΩ with 100 fF (low noise) versus 10 kΩ with 10 fF (low power, area).]
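The "add capacitance instead of resistance" moral can be checked with the textbook kT/C result: the total thermal noise voltage on an RC node is sqrt(kT/C), independent of R. This is a swapped-in first-order model, not the MOS noise expression above, and the 300 K temperature is an assumption. A sketch comparing the two RC = 100 ps options:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
TEMP_K = 300.0       # assumed operating temperature


def rc_tau_ps(r_ohm, c_farad):
    """RC time constant in picoseconds."""
    return r_ohm * c_farad * 1e12


def ktc_noise_uv(c_farad, temp_k=TEMP_K):
    """RMS thermal noise on the capacitor, sqrt(kT/C), in microvolts; R cancels out."""
    return math.sqrt(K_B * temp_k / c_farad) * 1e6


for r, c, label in [(1e3, 100e-15, "low noise"), (10e3, 10e-15, "low power, area")]:
    print(f"{label:16s} RC = {rc_tau_ps(r, c):.0f} ps, "
          f"noise = {ktc_noise_uv(c):.0f} uV rms")
```

Both stages have the same time constant, but the 10 fF node carries sqrt(10) ≈ 3.2 times more noise voltage.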


A Poor Man’s PLL: Digitally ‘Matching’ a Reference

• Start the ring oscillator running at full speed.

• The frequency detector will force the NCO delay to rise, thus lowering the frequency.

• Once the frequency falls below the reference, the NCO will be commanded to reverse course, and raise the frequency.

• Provided the delay through the feedback loop is controlled, we then know that the frequencies are roughly aligned, and we can use a similar approach to align the phase.

• Once locked, the oscillator will toggle between the two ‘digital’ frequencies that surround the reference frequency; this introduces quantization-based jitter.

[Figure: NCO driven by a frequency detector (FreqUP/FreqDN into Freq adj.) and a phase detector (Advance/Recede into Phase Adjust), both comparing the NCO output with Ref.]
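A behavioural sketch of the frequency-matching loop, assuming one FreqUP/FreqDN decision per step and using the typical-case (27 °C) NCO frequencies from the speed-setting table as the available ‘digital’ frequencies. The 1.60 GHz reference and the function name are hypothetical, and phase alignment is not modelled:

```python
# Typical-case (27 C) NCO frequencies in GHz, indexed by speed setting 0..8.
NCO_FREQ_GHZ = [1.11, 1.19, 1.28, 1.40, 1.53, 1.69, 1.86, 2.05, 2.26]


def frequency_lock(f_ref_ghz, steps=20):
    """Bang-bang frequency matching; returns the speed setting after each decision."""
    top = len(NCO_FREQ_GHZ) - 1
    setting = top                            # start the ring oscillator at full speed
    history = [setting]
    for _ in range(steps):
        if NCO_FREQ_GHZ[setting] > f_ref_ghz:
            setting = max(setting - 1, 0)    # FreqDN: raise the delay, lower the frequency
        else:
            setting = min(setting + 1, top)  # FreqUP: lower the delay, raise the frequency
        history.append(setting)
    return history


history = frequency_lock(1.60)
print(history)
print(sorted({NCO_FREQ_GHZ[s] for s in history[4:]}))
```

The loop walks down from full speed and then toggles between 1.53 GHz and 1.69 GHz, the two settings that bracket the reference; that toggling is the quantization jitter described in the last bullet.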


A Poor Man’s PLL: Clock Multiplication

• The same approach as for an analog PLL.

• Add a clock divider into the feedback path.

• The frequency and phase detectors will work to keep the two signals they ‘see’ matched.

• Because the detectors see the divided-down NCO output, the frequency detector forces the NCO to put out a frequency faster than the reference.

• Thus, by inserting a simple counter in the feedback path, we can generate integer multiples of the reference frequency.

[Figure: the digital PLL with a Divider (Nfdbk) inserted in the feedback path between the NCO output and the frequency and phase detectors; Ref drives the other detector inputs.]


A Poor Man’s PLL: Digital Clock Synthesis

• Including a divider on the reference, we can generate nearly arbitrary frequencies:

fclk = fref * (Nfdbk/Mref)

[Figure: digital PLL with dividers on both the reference and the feedback path; FreqUP/FreqDN drive the NCO's Freq adj., and Advance/Recede drive Phase Adjust.]

• However, if the integers N and M are large, special care must be taken to ensure the stability of the system.

• There is still the problem of quantization induced jitter…

RST not shown.
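One way to choose Nfdbk and Mref for a target frequency is a best rational approximation with a bounded denominator, which also keeps M small in line with the stability concern above. A sketch using Python's `fractions.Fraction.limit_denominator`; the function name, the bound `max_m` and the example frequencies are hypothetical:

```python
from fractions import Fraction


def synth_ratio(f_ref_hz: int, f_target_hz: int, max_m: int):
    """Pick Nfdbk/Mref closest to f_target/f_ref with Mref <= max_m.

    Returns (Nfdbk, Mref, fclk), where fclk = fref * (Nfdbk / Mref) exactly.
    """
    ratio = Fraction(f_target_hz, f_ref_hz).limit_denominator(max_m)
    n_fdbk, m_ref = ratio.numerator, ratio.denominator
    f_clk = Fraction(f_ref_hz) * ratio
    return n_fdbk, m_ref, f_clk


# Exact case: 1.31 GHz from a 25 MHz reference needs 262/5.
print(synth_ratio(25_000_000, 1_310_000_000, max_m=16))
# Approximate case: 1 GHz from 20.33 MHz with Mref limited to 8.
n, m, f = synth_ratio(20_330_000, 1_000_000_000, max_m=8)
print(n, m, float(f))
```

Bounding Mref only bounds Nfdbk indirectly (Nfdbk ≈ ratio × Mref); the trade-off is a small static frequency error, here 236 kHz at 1 GHz.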


Improving Jitter: Hybrid PLL

• No quantization based noise – potentially suitable for RF and switched cap filtering.

• Wide range, quick locking, and more stability than a conventional analog PLL.

[Figure: oscillator frequency vs. time (frequency axis 0.1M to 100M) during a lock to a 20.33 MHz reference.]

Lock some units digitally, and pass on only the required number to analog control.


[Figure: hybrid PLL in which a digital frequency detector and an analog control path (PFD, CP, filter) both compare against Ref.]


Hybrid Analog/Digital PLL Example

• Simulates a hybrid lock to a 12.5 MHz reference frequency.
• The NCO is composed of 7 self-starved delay elements → 8 speed settings from 8 to 35 MHz in ~3 MHz steps.


• Start running quickly
• Slow down when told
• We are no longer ‘too fast’
• Lock, and give analog control

• Roughly locked – analog takes over.
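The lock sequence above can be sketched as a coarse digital search followed by a hand-off. The intermediate speed settings below are hypothetical (only the 8 MHz and 35 MHz endpoints and the 12.5 MHz reference come from the slide), and the analog loop is assumed to cover whatever residual offset remains:

```python
# Hypothetical speed settings (MHz); only the 8 MHz and 35 MHz endpoints come from the slide.
SETTINGS_MHZ = [8.0, 11.0, 14.0, 17.0, 21.0, 25.0, 30.0, 35.0]


def hybrid_lock(f_ref_mhz, settings=SETTINGS_MHZ):
    """Coarse digital lock, then hand the leftover offset to the analog loop."""
    idx = len(settings) - 1                  # start running quickly (fastest setting)
    trace = [idx]
    while idx > 0 and settings[idx] > f_ref_mhz:
        idx -= 1                             # slow down when told
        trace.append(idx)
    # No longer 'too fast': freeze the digital word. The analog path (PFD, CP,
    # filter) is assumed to cover the residual, which is less than one step here.
    residual_mhz = f_ref_mhz - settings[idx]
    return trace, idx, residual_mhz


trace, idx, residual = hybrid_lock(12.5)
print(trace, SETTINGS_MHZ[idx], residual)
```

Because the digital word is frozen before the hand-off, the analog loop only needs a tuning range of about one digital step.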


Calibrated Delay Line / Delay Locked Loop

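[Figure: delay line tapped by D flip-flops, with the REFERENCE signal propagating through the chain.]

When the reference edge peeks out of the delay chain, decide whether to increase or decrease the delay through the line.

A snapshot of the values along the chain is taken at the rising (+ve) clock edge.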

Note: FSM must be tolerant to metastable inputs.

• The single-sided form is useful for clock manipulation: generating offset phases, etc.

• With dual rails, we can force an external signal to undergo the ‘same’ delay as the reference.

• Useful for highly accurate timing measurements, as in an ADC.

Logic: Is the falling edge at the tail of the line? Yes → locked; no → increment or decrement the delay as appropriate.


• Can be used in place of a PLL in many cases.

• Locks the delay through a line to a particular reference.
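The lock logic can be sketched as a function of the latched snapshot. The bit encoding is an assumption (tap 0 nearest the input; taps the falling edge has already passed read 0), the names are hypothetical, and the metastability protection the FSM needs is not modelled:

```python
def dll_decision(snapshot):
    """INC / DEC / LOCKED decision from a delay-line snapshot (assumed encoding).

    snapshot[k] is the bit latched at tap k on the rising clock edge, with k = 0
    nearest the line input. Taps the falling edge has already passed read 0, so
    the first 1 marks where the falling edge currently sits.
    """
    if 1 not in snapshot:
        return "INC"                 # edge already ran off the tail: line too fast
    edge = snapshot.index(1)         # first tap still high, just past the falling edge
    if edge == len(snapshot) - 1:
        return "LOCKED"              # falling edge is at the tail of the line
    return "DEC"                     # edge has not reached the tail: line too slow


for snap in ([0, 0, 0, 1], [0, 1, 1, 1], [0, 0, 0, 0]):
    print(snap, dll_decision(snap))
```

A real loop would apply the INC/DEC result to the delay control word once per reference cycle until LOCKED is reported.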


Clock and Data Recovery: Single Cycle Acquisition PLLs

[Figure: incoming data (D0, D1) on Line IN and the local oscillator.]


• Typical clock-recovery solutions require a PLL to be trained before data can be sampled.

• An analog PLL would typically require hundreds of training pulses.

• A typical digital PLL would require 10s of pulses.

• There should be NO REASON why we can’t lock to a transmitter’s timing in ONE training pulse.


Clock and Data Recovery: Single Cycle Acquisition PLLs

[Figure: Line IN (carrying D0, D1), a delay chain tapped by D flip-flops, and the Local Oscillator output.]

A snapshot of the values along the chain is taken at the rising (+ve) clock edge.

When the training pulse peeks out of the delay chain:

• find the falling edge
• turn on the feedback loop
• this creates a ring oscillator with the period of the training pulse

Extra precautions (not shown) protect against metastability.

Note: We can use further transition information in the data to adjust the frequency accordingly.

Concerns:

• Resolution of MUXs
• Range vs. area and power
• Logic delay compensation

Logic: Search for the falling edge and set that MUX into feedback (requires 2 gates + a latch for each stage).
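The falling-edge search can be sketched as below. The encoding is an assumption (taps behind the falling edge read 0, taps still inside the training pulse read 1), as are the 8-tap chain and 30 ps tap delay; the result is quantized to one tap, which is the MUX-resolution concern listed above. Logic-delay compensation is not modelled:

```python
def select_feedback_tap(snapshot, tap_delay_ps):
    """Choose the MUX tap at the training pulse's falling edge (assumed encoding).

    snapshot[k] is latched at tap k (k = 0 nearest Line IN) once the pulse peeks out
    of the end of the chain: taps behind the falling edge read 0, taps still inside
    the pulse read 1. Returns the selected tap and the pulse width it implies.
    """
    if 1 not in snapshot:
        raise ValueError("training pulse not captured in the delay chain")
    tap = snapshot.index(1)                           # first high tap = falling edge
    width_ps = (len(snapshot) - tap) * tap_delay_ps   # quantized to one tap delay
    return tap, width_ps


# Hypothetical 8-tap chain with 30 ps per tap; the pulse occupies the last 5 taps.
print(select_feedback_tap([0, 0, 0, 1, 1, 1, 1, 1], tap_delay_ps=30))
```

Range and resolution trade directly: covering a longer training pulse at the same tap delay needs more stages, each with its own gates and latch.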


Clock and Data Recovery: Frequency Re-Synthesis

[Figure: Data Line and recovered CLK waveforms, with the Set event and steps 1 and 2 marked.]

1) The training pulse fires off an oscillator at the ‘same’ frequency as the transmitter.

2) We then use transition information in the data to update the frequency up or down.

Problems:

• Very fine resolution NCO.

• Transition activity must be enforced.

• First pulse is slow to set off the clock.

• Stability is an issue.


[Figure: Line IN (D0, D1), Start Oscillator, Transition Update and Glitcher blocks, and the Local Oscillator.]


Clock Data Recovery: Phase Alignment

Phase Alignment

• The data and clock will arrive within T seconds of each other.

• Attempt to pull the signals into phase before they are sampled.

• Two paths ‘race’ each other through interlocked latches → extra delay is added to whichever path is ahead.

• With slower clocks, the maximum pull-in range extends → more ‘racer’ stages are required at lower speeds.

• Reasonable for T = 0.5 to 2 ns, i.e. 500 Mbps to 2 Gbps.

[Figure: clock and data waveforms (bits D0, D1) offset by up to T; a transition in the ‘Danger’ region has delay removed to avoid the clock edge.]

Assume a global, frequency-locked clock that has a random phase relationship across the chip.

Phase Mis-Alignment

• Rather than pulling the signals into phase, just ensure they are far enough out of phase.

• Prevent transitions ‘near’ a clock edge → danger window T = setup + hold time.

• If the synchronizing bit transitions near an edge → bump it out of the way.

• Practical concerns add to the safety window, and the maximum speed of operation = 1 / (2*Twindow).

• Suitable below 750 Mbps.
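The mis-alignment rules can be sketched numerically. All times are in picoseconds; the 200 ps setup and hold values are hypothetical, and moving a transition to the nearer window boundary is an assumed policy (the slide only says to bump it out of the way). At 750 Mbps the formula allows a total window of about 667 ps:

```python
def max_rate_bps(t_window_ps):
    """Maximum speed of operation = 1 / (2 * Twindow), with Twindow in ps."""
    return 1e12 / (2.0 * t_window_ps)


def in_danger(t_ps, edge_ps, setup_ps, hold_ps):
    """True if a data transition lands inside the setup/hold window of a clock edge."""
    return edge_ps - setup_ps < t_ps < edge_ps + hold_ps


def bump(t_ps, edge_ps, setup_ps, hold_ps):
    """Move a transition out of the danger window, to the nearer boundary (assumed policy)."""
    if not in_danger(t_ps, edge_ps, setup_ps, hold_ps):
        return t_ps
    early, late = edge_ps - setup_ps, edge_ps + hold_ps
    return early if t_ps - early <= late - t_ps else late


print(round(max_rate_bps(2000 / 3) / 1e6), "Mbps")
for t in (900, 1050, 1300):
    print(t, "->", bump(t, edge_ps=1000, setup_ps=200, hold_ps=200))
```

Widening the window for practical margins lowers the maximum rate in direct proportion.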


Clock Data Recovery: Skew Adjustment

[Figure: Clock and Data Channel(s) crossing an interconnect with skew, followed by Skew Correction.]


• A timing signal is transmitted along with the data.
• Bussed signals may not be routed together, and therefore experience different delays → simultaneously transmitted signals may arrive many clock cycles apart.

To solve this:
• After a handshake, an initial training pulse (all ones) is sent simultaneously along every channel.
• The receiver ‘measures’ the relative delays in each path and compensates to remove skew.
• The transmitter can then send bit/nibble serially at ~750 Mbps per channel.

• Measurement is performed with delay elements and interlocked sets of RS latches.
• Can compensate for arbitrary skew across the interconnect → limited only by how much maximum compensation one wishes to add.
• Lowering the clock speed will solve any skew beyond the hardware limits.
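Once the training pulse's arrival time on each channel has been measured, compensation amounts to delaying every channel up to the latest arrival. A minimal sketch; the arrival times and the `max_comp_ps` hardware limit are hypothetical values, not measurements from this work:

```python
def skew_compensation(arrival_ps, max_comp_ps):
    """Extra delay per channel so every channel lines up with the latest arrival.

    arrival_ps[i] is the measured arrival time of the training pulse on channel i.
    """
    latest = max(arrival_ps)
    comp = [latest - a for a in arrival_ps]
    if max(comp) > max_comp_ps:
        # Beyond the hardware's compensation range; the slide's remedy is to
        # lower the clock speed.
        raise ValueError("skew exceeds the available compensation")
    return comp


arrivals = [1200, 2950, 1800, 4100]           # hypothetical measurements, ps
comp = skew_compensation(arrivals, max_comp_ps=5000)
bit_ps = 1e12 / 750e6                          # ~1333 ps per bit at 750 Mbps
print(comp, [round(c / bit_ps, 2) for c in comp])
```

At 750 Mbps the 2900 ps correction on the first channel is more than two bit periods, which is why simultaneously sent bits can arrive cycles apart.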


High-Speed Self-Synchronizing IO

[Figure: Source Subsystem, Route Controller and Sink Subsystem, with three variable-frequency pausable oscillators; signals Data, WREN, RDREQ, SYNC, ACK and Serial Data.]

• Low-power, low-area, pausable NCOs
• Efficient CDR schemes for Gbps serial links
• Skew correction across an arbitrary interconnect

Using these timing circuits permits:

• Serial
• Low area/power
• Configurable


Within the architecture we can provide:
• Dynamic routing
• Flow control
• Error detection and ARQ
• A simple ‘synchronous’ interface to subsystems.



Conclusion

• Pausable clocks of ‘arbitrary’ frequency

• Fully digital → low-area, low-power local oscillators

• Fast, ‘rough’ timing locks without analog circuits

• High-throughput, error-tolerant, bit-serial links across domains

• Provide simple ‘synchronous’ interfaces to generic IP modules

Potentially unsuitable for RF mixing and other jitter-intolerant systems.

For Questions Offline:

Gord Allan – gallan.doe.carleton.ca

Web: www.doe.carleton.ca/~gallan


Appendix: Timing-Based Analog-Digital Conversion

Case 1: An analog voltage into the VCO produces a variable-rate clock. The clock period is measured and converted to a digital word.

Case 2: An analog voltage charges a capacitor at a variable rate. We measure the time it takes to discharge.

The time is measured in ‘ticks’ of a very high frequency reference.
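Case 1 can be sketched as counting reference ticks over one VCO period. The linear VCO model (10 MHz plus 10 MHz/V) and the 2 GHz tick reference are hypothetical numbers chosen for illustration:

```python
import math


def count_ticks(interval_s, f_tick_hz):
    """Whole ticks of the high-frequency reference that fit in the measured interval."""
    return math.floor(interval_s * f_tick_hz)


def vco_adc(v_in, f0_hz, kvco_hz_per_v, f_tick_hz):
    """Case 1 sketch: assumed linear VCO, one period measured in reference ticks."""
    f_vco = f0_hz + kvco_hz_per_v * v_in
    return count_ticks(1.0 / f_vco, f_tick_hz)


# Hypothetical numbers: 10 MHz + 10 MHz/V VCO, 2 GHz tick reference.
for v in (0.25, 0.5, 0.75):
    print(v, vco_adc(v, 10e6, 10e6, 2e9))
```

Because the count is proportional to the period, the output code falls as the input rises and is non-linear in v_in even with a linear VCO; the resolution is set by the tick frequency.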
