Low-overhead solutions for clock generation and synchronization.
Gord Allan, PhD Candidate, Carleton University
Monday, March 10, 2003
A presentation in the series on ULSI Configurable Systems.
Outline
• Ultra Large Scale (ULSI) Configurable Systems
• Adjustable Delay Elements
• A Pausable Numerically Controlled Oscillator (NCO)
• All digital Phase-Locked Loops and Frequency Synthesis
• Hybrid Analog and Digital Extreme Range PLLs
• Calibrated Delay Lines and DLLs
• Clock-Data Recovery (CDR)
• Single Cycle Acquisition PLLs
• Frequency Re-synthesis
• Phase Adjustment
• Skew Compensation
• High speed reconfigurable links
Presentation Progress
• Architecture • Delays • NCOs • Digital PLLs • Hybrid PLLs
• DLLs • Quick PLL • CDR • System IO
Ultra Large Scale Configurable Systems
[Figure: example configurable system. Blocks shown: ARM cores, uP, local storage, caches, DMA, systolic array, FIRs, IIRs, I/FFT, FP, ADC, DAC, crypto, FEC, modulator, switched-cap filters, PLL, DDFS, USB, peripherals, interface, FPGA fabric, system memory...]
• Many sub-systems on a chip/board.
• Subsystems are fully isolated from one another; IO is configured via software.
• On demand system configuration and processing.
Timing Issues
• Subsystems operate on independent clocks.
  - Low-overhead clock generation.
• Communications require fault-tolerant, high-throughput re-synchronization.
Our architecture:
System Timing: A Key Component
Adjustable Delay Element
• Low power: ideally no static current.
• Low area
• Wide range, fine resolution
• High operating frequency
• Mixed-signal control
• Linear delay characteristic
• Composed of standard library elements
• Low noise (more on this later…)
[Figure: delay element with Digital In, Digital Out, and a Control Word input. Delay = 100 ps to 1000 ps, in increments of 10 ps.]
Some Conventional Options

Starved Inverter (figure labels: RFP, RFN)
Typical delay: 100 ps to ∞
• Bad: power, area; non-linear; no dig. control; touchy analog references
• Good: wide range; potentially low noise

Standard Cell (AOI)
Typical delay: 100 ps to 120 ps
• Bad: low range; affects edges differently; little alg. control; area, power
• Good: high speed; dig. library element

Switched Paths
Typical delay: 300 ps to 3000 ps
• Bad: power; definite glitching; poor resolution; no alg. ctrl.
• Good: wide range; std cells; low noise (fast mode)

Switched Capacitance
Typical delay: 200 ps to 600 ps
• Bad: power penalty; lower max. speed; glitching
• Good: wide range; nearly linear; potential alg. ctrl.; std cells; low noise
More Advanced Approaches

Adjustable Drive
Typical delay: 200 ps to 300 ps
• Bad: low range; custom tweaking reqd.
• Good: fine resolution; potentially linear

Glitchless Switched Capacitor (figure labels: TX Gate, Bleeder)
Typical delay: 400 ps to 3000 ps
• Bad: power penalty; larger area; lower max. speed
• Good: wide range; nearly linear; potential alg. ctrl.; std cells; low noise

Self-Starved Inverter
Typical delay: 120 ps to 1000 ps
• Bad: low resolution; low max. speed; noise sensitive; area, power
• Good: wide range; linearity; mixed-signal ctrl; externally tunable

Parallel Inverter Chain
Typical delay: 200 ps to 400 ps
• Bad: ~area; ~power; med. range; poor alg. control
• Good: med. resolution (~30 ps); potentially linear; std cells
Delay Element Recap
• Hybrids are not only possible, but are encouraged.
• The delay element to use depends largely on the application requirements.
• Is it passing High or Low frequency signals?
• What is the required delay range?
• Does glitching matter?
• Fine or coarse control resolution?
Numerically Controlled Oscillator
All digital, Numerically Controlled Oscillator.
An example Delay Element:
Typical Power:
• This is one of the more power hungry elements.
• At f = 1.25 GHz, 1.8 V → 720 µW → 580 fW/Hz
• ~6 flip-flops
• With custom sizing, power can be nearly halved.
[Figure: NCO block with a Freq adj. input, RUN and PASS labels, and a CLK output; f = 2.3 GHz down to f = 1.1 GHz in ~150 MHz intervals.]
T = 2*(Tdelay + Tinv)
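As a rough illustration of the period relation above, the following Python sketch converts a delay setting into an oscillation frequency. The function name and the 20 ps inverter delay are assumptions made for the example, not values from the talk.

```python
def nco_frequency_hz(t_delay_s: float, t_inv_s: float) -> float:
    """Frequency of a ring built from one delay element and one inverter.

    Uses T = 2 * (Tdelay + Tinv): an edge has to travel around the loop
    twice (once rising, once falling) to complete one period.
    """
    period_s = 2.0 * (t_delay_s + t_inv_s)
    return 1.0 / period_s


if __name__ == "__main__":
    T_INV_S = 20e-12  # assumed inverter delay, not taken from the slides
    for t_delay_ps in (200, 300, 400):
        f_hz = nco_frequency_hz(t_delay_ps * 1e-12, T_INV_S)
        print(f"Tdelay = {t_delay_ps} ps -> f = {f_hz / 1e9:.2f} GHz")
```

Because frequency goes as the reciprocal of delay, equal delay increments give unequal frequency steps, with the largest steps at the fast end.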
NCO frequency (GHz) by speed setting:

Speed Setting        8     7     6     5     4     3     2     1     0     AVG   Variation
Typical Case: 27C    2.26  2.05  1.86  1.69  1.53  1.40  1.28  1.19  1.11  1.51
Worst Case: 100C     1.55  1.41  1.28  1.17  1.07  0.99  0.91  0.85  0.79  1.06  -30%
Best Case: 0C        2.76  2.52  2.29  2.08  1.89  1.73  1.58  1.46  1.35  1.86  23%
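The AVG and Variation columns can be reproduced from the per-setting frequencies. The numbers appear consistent with an average taken over period rather than frequency (a harmonic mean). That reading is inferred from the values and is not stated on the slide, so the sketch below should be treated as a plausibility check only.

```python
from statistics import harmonic_mean

# Frequencies in GHz for speed settings 8..0, copied from the table.
FREQ_GHZ = {
    "typical_27C": [2.26, 2.05, 1.86, 1.69, 1.53, 1.40, 1.28, 1.19, 1.11],
    "worst_100C": [1.55, 1.41, 1.28, 1.17, 1.07, 0.99, 0.91, 0.85, 0.79],
    "best_0C": [2.76, 2.52, 2.29, 2.08, 1.89, 1.73, 1.58, 1.46, 1.35],
}


def average_ghz(corner: str) -> float:
    """Harmonic mean of the frequencies, i.e. 1 / (mean period)."""
    return harmonic_mean(FREQ_GHZ[corner])


def variation_percent(corner: str, reference: str = "typical_27C") -> float:
    """Relative change of a corner's average against the typical case."""
    return 100.0 * (average_ghz(corner) / average_ghz(reference) - 1.0)


if __name__ == "__main__":
    for corner in FREQ_GHZ:
        print(corner, round(average_ghz(corner), 2),
              round(variation_percent(corner)))
```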
An Aside… MOS Noise
• Ring oscillators are highly non-linear and difficult to analyze.
• Noise causes the delay through each stage to vary randomly about the mean.
• This translates to jitter in the output frequency.
• Single MOS: thermal noise + flicker (1/f) noise:

$$V_i^2(f) = 4kT \cdot \frac{2}{3} \cdot \frac{1}{g_m} + \frac{K}{W L C_{ox}} \cdot \frac{1}{f}$$
• Moral: For noise immunity, add capacitance instead of resistance.
• In this work we make the conscious choice to focus on power and area, not noise.
• For circuits which interface to RF, noise must be a priority.
• Large transistors (WL) → poorer area, power, speed
• High W/L ratios → poorer area, power, range
• Fewer stages → poorer range
• Larger Cox → tox is smaller as technology scales.
[Figure: two RC delay stages with the same time constant.]
• 1 kΩ, 100 fF: low noise, RC = 100 ps
• 10 kΩ, 10 fF: low power and area, RC = 100 ps
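The two RC stages above can be compared numerically. The sketch below checks that both give RC = 100 ps and estimates the integrated thermal noise on the capacitor with the standard kT/C result. Using kT/C and a 300 K temperature is an assumption added here for illustration; the slides do not state which noise model was used.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def rc_time_constant_s(r_ohm: float, c_farad: float) -> float:
    """Time constant of a single RC stage."""
    return r_ohm * c_farad


def ktc_noise_vrms(c_farad: float, temp_k: float = 300.0) -> float:
    """Total integrated thermal noise voltage on a capacitor: sqrt(kT/C).

    The resistor value drops out, which is why adding capacitance
    (rather than resistance) is the lower-noise way to add delay.
    """
    return math.sqrt(BOLTZMANN_J_PER_K * temp_k / c_farad)


if __name__ == "__main__":
    for r_ohm, c_farad in ((1e3, 100e-15), (10e3, 10e-15)):
        tau_ps = rc_time_constant_s(r_ohm, c_farad) * 1e12
        noise_uv = ktc_noise_vrms(c_farad) * 1e6
        print(f"R = {r_ohm:.0f} ohm, C = {c_farad * 1e15:.0f} fF: "
              f"RC = {tau_ps:.0f} ps, noise = {noise_uv:.0f} uV rms")
```

Under these assumptions the 10 fF stage carries about sqrt(10) times more noise voltage than the 100 fF stage for the same delay.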
A Poor Man’s PLL: Digitally ‘Matching’ a Reference
• Start the ring oscillator running at full speed.
• The frequency detector will force the NCO delay to rise, thus lowering the frequency.
• Once the frequency falls below the reference, the NCO will be commanded to reverse course, and raise the frequency.
• Provided the delay through the feedback loop is controlled, we then know that the frequencies are roughly aligned, and we can use a similar approach to align the phase.
• Once locked, the oscillator will toggle between the two ‘digital’ frequencies that surround the reference frequency. This introduces a quantization-based jitter.
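The frequency-matching steps above can be sketched as a simple behavioural model. The model below uses the typical-case frequencies from the speed-setting table earlier in the talk. The function name, the one-step-per-comparison update, and the ideal frequency detector are assumptions, and phase alignment is not modelled.

```python
# Typical-case NCO frequencies (GHz) for speed settings 0..8.
NCO_GHZ = [1.11, 1.19, 1.28, 1.40, 1.53, 1.69, 1.86, 2.05, 2.26]


def match_reference(ref_ghz: float, steps: int = 20) -> list:
    """Return the speed settings visited while tracking ref_ghz."""
    setting = len(NCO_GHZ) - 1              # start at full speed
    history = [setting]
    for _ in range(steps):
        if NCO_GHZ[setting] > ref_ghz:      # too fast: raise the delay
            setting = max(setting - 1, 0)
        else:                               # too slow: lower the delay
            setting = min(setting + 1, len(NCO_GHZ) - 1)
        history.append(setting)
    return history


if __name__ == "__main__":
    visited = match_reference(1.60)
    print(visited)
    print("locked between", sorted(NCO_GHZ[s] for s in set(visited[-2:])))
```

For a 1.60 GHz reference, the model steps down from setting 8 and then alternates between settings 4 and 5 (1.53 GHz and 1.69 GHz), which is the toggling that produces the quantization jitter.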
[Figure: NCO with a Freq adj. input (FreqUP/FreqDN from Freq Detect) and a Phase Adjust input (Advance/Recede from Phase Detect); both detectors compare against Ref.]
A Poor Man’s PLL: Clock Multiplication
• The same approach as for an analog PLL.
• Add a clock divider into the feedback path.
• The frequency and phase detectors will work to keep the two signals they ‘see’ matched.
• The frequency detector forces the NCO to put out a frequency faster than the reference.
• Thus, by inserting a simple counter in the feedback path, we can generate integer multiples of the reference frequency.
[Figure: the same loop with a Divider (Nfdbk) in the feedback path between the NCO output and the Freq Detect and Phase Detect blocks.]
A Poor Man’s PLL: Digital Clock Synthesis
• Including a divider on the reference, we can generate nearly arbitrary frequencies:
fclk = fref * (Nfdbk/Mref)
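A short sketch of the synthesis relation, with one possible way to pick the two divider values for a target frequency. The helper names and the use of a bounded-denominator rational approximation are assumptions for illustration; the talk does not say how N and M are chosen.

```python
from fractions import Fraction


def synthesized_hz(f_ref_hz: float, n_fdbk: int, m_ref: int) -> float:
    """fclk = fref * (Nfdbk / Mref)."""
    return f_ref_hz * n_fdbk / m_ref


def choose_dividers(f_ref_hz: float, f_target_hz: float, max_m: int = 64):
    """Pick (Nfdbk, Mref) approximating f_target / f_ref with Mref <= max_m."""
    ratio = Fraction(f_target_hz) / Fraction(f_ref_hz)
    best = ratio.limit_denominator(max_m)
    return best.numerator, best.denominator


if __name__ == "__main__":
    n, m = choose_dividers(10e6, 25e6, max_m=16)
    print(n, m, synthesized_hz(10e6, n, m))
```

Keeping `max_m` small limits how far the reference is divided down, and so how rarely the detectors get an update.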
[Figure: the same loop with a divider on the reference input and a divider in the feedback path, driving the FreqUP/FreqDN and Advance/Recede controls of the NCO.]
• However, if the divider values N and M are large, then special considerations must be taken to ensure stability of the system.
• There is still the problem of quantization induced jitter…
Note: RST is not shown in the figure.
Improving Jitter: Hybrid PLL
• No quantization based noise – potentially suitable for RF and switched cap filtering.
• Wide Range, Quick Locking, More stability than conventional analog PLL
[Figure: oscillator frequency vs. time (axis marked 0.1M to 100M) settling to a 20.33 MHz reference.]
Lock some units digitally, pass on only the required number to analog control.
[Figure: hybrid PLL block diagram. Labels: Ref, PFD, CP, Filter, Analog Control, Digital Freq Detect.]
Hybrid Analog/Digital PLL Example
• Simulates a hybrid lock to a 12.5 MHz reference frequency.
• NCO is composed of 7 self-starved delay elements → 8 speed settings from 8 to 35 MHz in ~3 MHz steps.
• Start running quickly
• Slow down when told
• We are no longer ‘too fast’
• Lock, and give analog control
• Roughly locked: analog takes over.
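The coarse-digital, fine-analog hand-off described by these annotations can be sketched as follows. The evenly spaced settings between 8 and 35 MHz are an assumption (the real spacing depends on the delay elements), and the "analog" stage is reduced to reporting the residual frequency error it would have to absorb.

```python
def coarse_settings_mhz(f_min: float = 8.0, f_max: float = 35.0,
                        count: int = 8) -> list:
    """Hypothetical, evenly spaced digital speed settings in MHz."""
    step = (f_max - f_min) / (count - 1)
    return [f_min + i * step for i in range(count)]


def hybrid_lock(ref_mhz: float, settings: list):
    """Digital coarse search, then hand the remainder to analog control.

    Returns (setting index, residual MHz left for the analog loop).
    """
    idx = len(settings) - 1                  # start running quickly
    while idx > 0 and settings[idx] > ref_mhz:
        idx -= 1                             # slow down when told
    residual_mhz = ref_mhz - settings[idx]   # no longer 'too fast'
    return idx, residual_mhz


if __name__ == "__main__":
    idx, residual = hybrid_lock(12.5, coarse_settings_mhz())
    print(f"digital setting {idx}, analog trims {residual:.2f} MHz")
```

The analog loop only has to cover one digital step, which is one way to read the claim of wide range with quick locking.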
Calibrated Delay Line / Delay Locked Loop
[Figure: delay chain whose taps are sampled by a row of D flip-flops.]
When reference edge peeks out of the delay chain, decide whether to increase or decrease the delay through the line.
Snapshot of values through the chain at +ve edge clk.
Note: FSM must be tolerant to metastable inputs.
• Single-sided form is useful for clock manipulation: generating offset phases, etc.
• With dual rails, we can force an external signal to undergo the ‘same’ delay as the reference.
• Useful for highly accurate timing measurements, as in an ADC.
Logic: Is the falling edge at the tail of the line? Yes → Locked; No → INC or DEC delay appropriately.
• Can be used in place of a PLL in many cases.
• Locks the delay through a line to a particular reference.
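The lock logic on this slide (is the falling edge at the tail of the line?) can be modelled behaviourally. The sketch assumes a 50% duty-cycle reference, ideal identical taps, a fixed 1 ps adjustment step, and no metastability; all names and parameters are illustrative.

```python
def snapshot(period_ps: int, tap_delay_ps: int, taps: int) -> list:
    """Tap values captured at a rising reference edge.

    Tap i holds the reference value from (i + 1) * tap_delay_ps ago:
    0 while that still falls in the preceding low half-period, 1 once
    it reaches back past the falling edge into the high half-period.
    """
    values = []
    for i in range(1, taps + 1):
        age_ps = (i * tap_delay_ps) % period_ps
        values.append(1 if age_ps > period_ps / 2 else 0)
    return values


def dll_decision(values: list) -> str:
    """LOCKED if the falling edge sits at the tail, otherwise INC or DEC."""
    if 1 not in values:
        return "INC"                  # edge not in the line yet: too short
    edge_tap = values.index(1)
    return "LOCKED" if edge_tap == len(values) - 1 else "DEC"


def lock(period_ps: int, taps: int = 8, tap_delay_ps: int = 20,
         step_ps: int = 1, max_iters: int = 1000) -> int:
    """Adjust the per-tap delay until the line reports LOCKED."""
    for _ in range(max_iters):
        action = dll_decision(snapshot(period_ps, tap_delay_ps, taps))
        if action == "LOCKED":
            return tap_delay_ps
        tap_delay_ps += step_ps if action == "INC" else -step_ps
    raise RuntimeError("delay line did not lock")
```

With a 1000 ps reference and 8 taps, the model settles at 63 ps per tap when started too short and at 71 ps when started too long, so the lock point is only known to within one tap.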
Clock and Data Recovery: Single Cycle Acquisition PLLs
[Figure labels: Line IN, Local Oscillator, D0, D1.]
• Typical clock-recovery solutions require that a PLL be trained before sampling data.
• An analog PLL would typically require hundreds of training pulses.
• A typical digital PLL would require tens of pulses.
• There should be NO REASON why we can’t lock to a transmitter’s timing in ONE training pulse.
Clock and Data Recovery: Single Cycle Acquisition PLLs
[Figure: Line IN feeding a delay chain sampled by D flip-flops, with the Local Oscillator and D0, D1.]
Snapshot of values through the chain at +ve edge clk.
When the training pulse peeks out of the delay chain:
• find the falling edge
• turn on the feedback loop
• this creates a ring oscillator with the period of the training pulse
Extra precautions (not shown) protect against metastability.
Note: We can use further transition information in the data to adjust the frequency accordingly.
Concerns:
• Resolution of MUXs
• Range vs. area and power
• Logic delay compensation
Logic: Search for the falling edge, set that MUX into feedback. (Requires 2 gates + a latch for each stage.)
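The edge search above can be illustrated with a small model: capture the chain when the pulse reaches the tail, find where the snapshot changes, and mark that stage's MUX. Which stages end up inside the feedback loop, and all names used here, are assumptions made for the example. Metastability and logic delay are ignored.

```python
def capture_snapshot(pulse_width_ps: int, tap_delay_ps: int, taps: int) -> list:
    """Tap values when the pulse's leading edge reaches the last tap.

    Tap i (0 = input side) holds the line input from
    (taps - 1 - i) * tap_delay_ps after the leading edge entered.
    """
    return [1 if (taps - 1 - i) * tap_delay_ps < pulse_width_ps else 0
            for i in range(taps)]


def select_feedback_tap(snapshot: list) -> list:
    """One-hot MUX select at the stage where the falling edge sits."""
    select = [0] * len(snapshot)
    for i in range(len(snapshot) - 1):
        # Per-stage check: this tap already low, next tap still high.
        if snapshot[i] == 0 and snapshot[i + 1] == 1:
            select[i + 1] = 1
            return select
    raise ValueError("no falling edge in the line: pulse outlasts the chain")


def captured_width_ps(select: list, tap_delay_ps: int) -> int:
    """Pulse width as measured by the chain, in whole tap delays."""
    stages = len(select) - select.index(1)
    return stages * tap_delay_ps
```

For a 330 ps pulse and 50 ps taps the chain reports 350 ps, so the error is bounded by one tap delay. That is the MUX resolution concern listed above.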
Clock and Data Recovery: Frequency Re-Synthesis
[Figure labels: Data Line, Set, CLK, with steps 1 and 2 marked.]
1) The training pulse fires off an oscillator at the ‘same’ frequency as the transmitter.
2) We then use transition information in the data to update the frequency up or down.
Problems:
• Very fine resolution NCO.
• Transition activity must be enforced.
• First pulse is slow to set off the clock.
• Stability is an issue.
[Figure labels: Line IN, Glitcher, Start Oscillator, Transition Update, Local Oscillator, D0, D1.]
Clock Data Recovery: Phase Alignment
Phase Alignment
• The data and clock will arrive within T seconds of each other.
• Attempt to pull the signals into phase before being sampled.
• Two paths ‘race’ each other through interlocked latches → extra delay is added to the path that is ahead.
• With slower clocks, the maximum pull-in range extends → requires more ‘racer’ stages at lower speeds.
• Reasonable for T = 0.5 to 2 ns, f = 500 Mbps to 2 Gbps.
[Figure: Clock and Data waveforms (D0, D1) arriving up to T apart. Annotation: "Danger: removes delay to avoid clock edge."]
Assume a global frequency locked clock which has random phase relation across the chip.
Phase Mis-Alignment
• Rather than pull into phase, just ensure they are far enough out of phase.
• Prevent transitions ‘near’ a clock edge → danger window T = set-up + hold time.
• If the synchronizing bit transitions near an edge → bump it out of the way.
• Practical concerns add to the safety window, and → the maximum speed of operation = 1 / (2*Twindow).
• Suitable below 750 Mbps.
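A quick worked example of the speed limit above, relating the safety window to the maximum bit rate. The function names are illustrative.

```python
def max_rate_bps(t_window_s: float) -> float:
    """Maximum speed of operation = 1 / (2 * Twindow)."""
    return 1.0 / (2.0 * t_window_s)


def window_for_rate_s(rate_bps: float) -> float:
    """Largest safety window that still permits the given bit rate."""
    return 1.0 / (2.0 * rate_bps)


if __name__ == "__main__":
    window_ps = window_for_rate_s(750e6) * 1e12
    print(f"750 Mbps allows a safety window of about {window_ps:.0f} ps")
```

By this relation, the 750 Mbps figure corresponds to a total safety window of roughly 667 ps.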
Clock Data Recovery: Skew Adjustment
[Figure: Data Channel(s) and Clock pass through interconnect skew, followed by Skew Correction.]
• Timing signal is transmitted along with the data.
• Bussed signals may not be routed together, and therefore experience different delays → simultaneously transmitted signals may arrive many clock cycles apart.
To solve:
• After a handshake, an initial training pulse (all ones) is sent simultaneously along every channel.
• The receiver ‘measures’ the relative delays in each path and compensates to remove skew.
• The transmitter can then send bit/nibble serially at ~750 Mbps/channel.
• Measurement is performed with delay elements and interlocked sets of RS latches.
• Can compensate for arbitrary skew across the interconnect → limited only by how much maximum compensation one wishes to add.
• Lowering clock speed will solve any skew beyond HW limits.
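The measure-and-compensate step can be sketched as follows, assuming the receiver has already measured a training-pulse arrival time for each channel. Aligning every channel to the latest arrival is one natural choice, not necessarily the one used in this work. The names and numbers are illustrative.

```python
def skew_compensation(arrival_ps: list) -> list:
    """Extra delay per channel so all channels line up with the latest one."""
    latest = max(arrival_ps)
    return [latest - t for t in arrival_ps]


def within_hw_limit(compensation_ps: list, max_comp_ps: int) -> list:
    """Per channel: can the hardware supply the required compensation?"""
    return [c <= max_comp_ps for c in compensation_ps]


if __name__ == "__main__":
    arrivals = [120, 480, 300]          # hypothetical measured arrivals (ps)
    comp = skew_compensation(arrivals)
    print(comp, within_hw_limit(comp, max_comp_ps=400))
```

A channel that fails the limit check corresponds to the case where the clock speed has to be lowered.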
High-Speed Self-Synchronizing IO
[Figure: a Source Subsystem and a Sink Subsystem, each with a variable-frequency pausable oscillator, connected through a Route Controller with its own variable-frequency pausable oscillator. Signals: Data, WREN, RDREQ, SYNC, ACK, Serial Data.]
• Low power and area, pausable NCOs
• Efficient CDR schemes for Gbps serial links.
• Skew-correction across an arbitrary interconnect.
Using the timing circuits permits:
• Serial
• Low area/power
• Configurable
Within the architecture we can provide:
• Dynamic routing
• Flow control
• Error detection and ARQ
• A simple ‘synchronous’ interface to subsystems.
Conclusion
• Pausable clocks of ‘arbitrary’ frequency
• Fully digital → low-area, low-power local oscillators.
• Fast, ‘Rough’, timing locks without analog circuits
• High-throughput, error-tolerant, bit-serial links across domains
• Provide simple ‘synchronous’ interfaces to generic IP modules
Potentially unsuitable for RF mixing, and other jitter intolerant systems
For Questions Offline:
Gord Allan – gallan.doe.carleton.ca
Web: www.doe.carleton.ca/~gallan
Appendix: Timing Based Analog-Digital Conversion
Case 1: An analog voltage into the VCO produces a variable rate clock. The clock period is measured, and converted to a digital word.
Case 2: An analog voltage charges a capacitor at a variable rate. We measure the time it takes to discharge.
The time is measured in ‘ticks’ of a very high frequency reference.
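A minimal sketch of the tick-counting step, using integer picoseconds to avoid floating-point rounding. The function names and the example interval are illustrative. Mapping the voltage to a time interval (via the VCO or the capacitor) is outside this sketch.

```python
def count_ticks(interval_ps: int, tick_ps: int) -> int:
    """Whole reference ticks that fit in the measured interval."""
    return interval_ps // tick_ps


def quantization_error_ps(interval_ps: int, tick_ps: int) -> int:
    """Part of the interval that the tick count cannot resolve."""
    return interval_ps - count_ticks(interval_ps, tick_ps) * tick_ps


if __name__ == "__main__":
    # Hypothetical numbers: a 10.25 ns interval and a 500 ps reference tick.
    print(count_ticks(10_250, 500), quantization_error_ps(10_250, 500))
```

The digital word is the tick count, so the conversion resolution is set by the reference tick period.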