PLL-Based Active Optical Clock Distribution

by

Alexandra M. Kern

A.B., Engineering Sciences

B.E., Electrical Engineering

Dartmouth College, 2002

Submitted to the Department of Electrical Engineering and Computer Science

in Partial Fulfillment of the Requirements for the Degree of

Master of Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2004

© 2004 Massachusetts Institute of Technology

All Rights Reserved

Author
Department of Electrical Engineering and Computer Science
July 23, 2004

Certified by
Anantha P. Chandrakasan
Professor of Electrical Engineering
Thesis Supervisor

Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Students

PLL-Based Active Optical Clock Distribution

by

Alexandra M. Kern

Submitted to the Department of Electrical Engineering and Computer Science

on July 23, 2004, in partial fulfillment of the requirements for the degree of

Master of Science

Abstract

Reducing the timing uncertainty associated with clock edges has become an exceedingly difficult problem as clock frequencies in high-performance processors increase past several gigahertz. Absolute quantities of skew and jitter that were insignificant at lower frequencies now consume an increasingly large percentage of each clock cycle and directly reduce the time available for logic propagation. Processor designers currently employ several types of electrical deskew mechanisms to combat this problem in order to delay the inevitable need for more radical clocking solutions.

Optical clock distribution has the potential to deliver extremely high precision global clocks across large chips. However, traditional transimpedance amplifier approaches to optical-electrical conversion introduce so much timing uncertainty that the accuracy gained through optical global distribution is lost at the global-to-local clock domain interface.

This thesis analyzes the feasibility of a phase-locked loop (PLL) based approach to the optical-electrical clock signal conversion. The proposed small-signal current-steering optical-electrical phase detector extracts timing information from the optical reference without explicit optical-electrical conversion. This phase detector is integrated with a loop filter, LC VCO, and frequency divider to form a complete optical-electrical PLL system capable of generating 1.6 GHz local electrical clocks from a 200 MHz global optical reference. The insights gained through the design and implementation of this system are used as the basis for a broader analysis of the advantages and challenges of PLL-based optical clock distribution systems.

Thesis Supervisor: Anantha P. Chandrakasan
Title: Professor of Electrical Engineering

Acknowledgments

I would like to thank my advisor, Dr. Anantha Chandrakasan, for the patient guidance and insight he provided as I learned from each of the many rewarding challenges I encountered during the first two years of my graduate career at MIT. Researching and writing this thesis under his expert supervision has taught me many invaluable lessons about the pursuit of academic research.

Many other extraordinary individuals have also shaped my academic development over the past six years. Mr. David Kneedler, Mr. Ian Fink, and Mr. John Sledziewski supervised my first two undergraduate internships and introduced me to the possibilities of the semiconductor industry. Dr. Charles Sullivan and Dr. Edmond Cooley were my mentors during my early undergraduate years and guided me through my first experiences in academic research. Dr. Ian Young and Mr. Thomas Thomas gave me the opportunity to apply my analog design skills in an industrial setting and provided the perfect balance of expert guidance and independence. My colleagues and friends in my research group always made time to discuss ideas and provide valuable insights. I am grateful to these mentors for providing the opportunities and encouragement that helped me build the foundations of my future career.

My friends have shared and enhanced my experiences at MIT. Coffee breaks when

we could least afford to waste the time, extended midnight phone conversations about

everything and nothing, heated lunchtime lab discussions on topics ranging from

politics to genetics, visits and calls from out-of-town friends who dare to believe that

the world does not revolve around MIT, and excursions both into the city and into

the wilderness have balanced the academic challenges of the past two years. I would

especially like to thank Vin Scarlata, Johnna Powell, Julia Cline, Elizabeth Basha,

Alicia Messmer, Joanna Lisker, Emily Halpern, Anne Thompson, and Devika Gopal.

The unconditional love and support of my family has allowed me to pursue increasingly challenging goals with the reassuring knowledge that they will always be there to catch me. My grandparents, Ja and Martha Densmore and Florence Kern, are all unique role models and their stories have always inspired me. My brother, Christopher, is unusually adept at filtering out the various pressures and expectations of society and pursuing objectives of his own choosing and I have learned from his positive example. My parents, Edward and Priscilla, supported me and tolerated my trademark indecisiveness through brief interests in various other fields before I finally chose engineering. They have always encouraged me to pursue my own dreams, find a career that makes me truly happy, and create my own definition of success. I am eternally thankful for their love and their unwavering belief that I can accomplish anything I choose.

Contents

1 Introduction
    1.1 Motivation for Optical Clocking
    1.2 Current Electrical Clock Distribution Practices
    1.3 Prior Work on Optical Clock Distribution
    1.4 Objective of This Work

2 VCO and Divider Circuits
    2.1 Divider
    2.2 VCO

3 Optoelectronics
    3.1 Photodiode Background
    3.2 Standard CMOS Silicon Photodiodes
        3.2.1 Possible Diode Structures in Standard CMOS
        3.2.2 Photodiode Junction Capacitance
        3.2.3 Transit Time
    3.3 Photodiodes in SOI and Custom Processes
        3.3.1 SOI Photodiode Receivers
        3.3.2 CMOS-Compatible Custom Photodiode Processes
    3.4 Waveguides
    3.5 Conclusions

4 Analysis of Phase Detectors
    4.1 Current-Steering Phase Detector
        4.1.1 Basic Current-Steering Topology and Operation
        4.1.2 Sources of Phase Offset
    4.2 Extensions of Current Steering Topology
        4.2.1 Current Mirrors
        4.2.2 Photodiode in Feedback
    4.3 Topologies with Alternate Phase Detector Cores
        4.3.1 Bang-Bang Phase Detector
    4.4 Conclusions

5 PLL Loop Dynamics and Complete Circuit Simulations
    5.1 Optical PLL Analysis
        5.1.1 Acquisition Range
        5.1.2 Small-Signal Stability Analysis
    5.2 Final Simulated Results

6 On-Chip Skew Measurement
    6.1 TDC Concept
    6.2 Critical Path
    6.3 Control and State Machine
    6.4 Implementation
    6.5 Additional Qualitative Verification
    6.6 Conclusions

7 Conclusions
    7.1 Summary
    7.2 Simulation Results
    7.3 Future Work
        7.3.1 Optoelectronics
        7.3.2 Circuits
        7.3.3 Complete System
    7.4 Conclusion

List of Figures

1-1 H-tree clock distribution.
1-2 Receiverless optical clocking.
1-3 Proposed optical PLL system.
1-4 Original optical PLL clocking proposal - Clymer/Goodman.

2-1 Divider architecture.
2-2 Circuit schematic of embedded XOR register block of Figure 2-1.
2-3 Divider output.
2-4 VCO core and buffer circuits.
2-5 ASITIC Π model.
2-6 VCO gain.
2-7 VCO output for control voltage of 0.4 V.

3-1 Possible CMOS diode structures.
3-2 Junction capacitance versus reverse bias.
3-3 Depletion width versus reverse bias.
3-4 Junction capacitance versus intrinsic width.
3-5 Illustration of transit time.
3-6 Transit time versus intrinsic width.

4-1 Basic current-steering phase detector topology and operation.
4-2 Phase difference versus average current transfer function of current-steering phase detector.
4-3 Simplification of phase-detector structure.
4-4 Ideal output, actual output at TT/27 °C and matched idealized output (solid lines), and actual output over FF/SS/100 °C (dashed lines).
4-5 Skew for locked PLL across SS, FF and TT process corners.
4-6 Skew for locked PLL at 27 °C and 100 °C.
4-7 Skew over SS/FF corners with ideal amplifier and real CMOS switches (solid lines) and optical reference (dashed line).
4-8 Skew over SS/FF corners with real amplifier and ideal switches (solid lines) and optical reference (dashed line).
4-9 Skew over SS/FF corners with ideal amplifier and triple-sized CMOS switches (solid lines) and optical reference (dashed line).
4-10 Phase difference for locked PLLs at TT/27 °C with 10 µA/20 µA versus 10 µA/10 µA differential current mismatch.
4-11 Skew for locked PLLs at TT/27 °C with 10 µA versus 12 µA common-mode current mismatch.
4-12 Current mirror approach.
4-13 Feedback amplifier approach.
4-14 Bang-bang phase detector approach.

5-1 Well-damped, overdamped, and underdamped loop dynamics.
5-2 Typical characteristics of a phase detector (top) and a phase-frequency detector (bottom).
5-3 Loop filter topology.
5-4 Photodiode capacitance of 500 fF limits lock range.
5-5 Cycle slipping: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 100 fF/20 kΩ loop filter.
5-6 Complete layout of the PLL.
5-7 Well-damped locking: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 800 fF/44 kΩ loop filter.
5-8 PLL locking from both extremes of input voltage range.
5-9 VCO output clock (dotted), optical reference (dashed, 10 µA amplitude scaled for comparison), and divider output (solid) shown at the end of the locking transient of Figure 5-7.

6-1 Time-to-digital converter.
6-2 Split-output TSPC latches.

List of Tables

2.1 VCO inductor Π model values.
4.1 Summary of skew sources.
6.1 State table for the microcoded state machine.

Chapter 1

Introduction

1.1 Motivation for Optical Clocking

As clock frequencies in high-performance processor applications increase past several gigahertz, meeting the increasingly rigorous skew and jitter requirements with traditional electrical clock distribution systems will become prohibitively difficult. Uncertainty in the clock edge due to skew and jitter directly reduces the time available for logic propagation and therefore limits the maximum logic depth and increases the required hold time. An absolute quantity of skew that could be tolerated at slower clock frequencies will occupy a much more significant percentage of the total clock period at higher frequencies, so skew and jitter limits are typically specified in percentages instead of absolute terms. Typical systems require that the combined effect of skew and jitter not exceed 10 percent of the clock period, though recent stretching of that budget to 20 percent of the clock cycle is evidence of the fact that the challenges of precise electrical clock distribution are intensifying [1]. Replacing the global levels of the clock distribution with optical waveguides will likely be required in the future, but high-speed, high-precision optoelectronic conversion will be required to maximize the advantages of optical distribution. This thesis will analyze the feasibility of using an optical PLL receiver circuit to generate local electrical clocks from a global optical reference.
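To make the percentage budgets above concrete, the following short sketch converts a fractional skew-plus-jitter budget into an absolute time allowance. The function name and the choice of example frequencies are illustrative assumptions; the frequencies are ones quoted elsewhere in this chapter, and the calculation is nothing more than a fraction of the clock period.

```python
def timing_budget_ps(clock_hz, budget_fraction):
    """Return the combined skew-plus-jitter allowance in picoseconds."""
    period_ps = 1e12 / clock_hz          # clock period in picoseconds
    return budget_fraction * period_ps


if __name__ == "__main__":
    # Example frequencies (assumed for illustration): a 100 MHz clock,
    # the 1.6 GHz local clock targeted in this work, and a 4 GHz clock.
    for clock_hz in (100e6, 1.6e9, 4e9):
        for fraction in (0.10, 0.20):
            budget = timing_budget_ps(clock_hz, fraction)
            print(f"{clock_hz / 1e9:5.2f} GHz, {fraction:.0%} budget: {budget:8.1f} ps")
```

Under these assumptions, a 10 percent budget at 1.6 GHz leaves 62.5 ps for all skew and jitter sources combined, compared with a full nanosecond at 100 MHz.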

1.2 Current Electrical Clock Distribution Practices

Before beginning an analysis of optical alternatives, it is instructive to examine the electrical clock distribution methods currently employed in today's state-of-the-art microprocessors. This is beneficial not only because it provides the background necessary to understand the relative advantages of an optical system, but also because many of the methods utilized in an optical system are derived from electrical system precursors.

Active techniques for distributing precise clocks across chips began to appear in significant quantities in the literature around 1990. Prior to that date, clock frequencies below 100 MHz allowed designers to distribute clocks with sufficient precision using passive networks. In 1992, Intel introduced the now-prevalent concept of using phase-locked loops to generate on-chip clocks from lower frequency off-chip references in order to overcome package bandwidth limitations and improve precision [2].

Once the clock is on chip, many sources in an electrical clock distribution network contribute skew and jitter. Variations in clock buffer speed due to device and supply variation, differences in capacitive coupling to adjacent lines, unmatched load capacitance, and variations in the resistance and the capacitance of the lines themselves all introduce timing variation [3]. Distributing all clocks from a central point on the chip through paths of matched length is one obvious measure that is almost always used to reduce skew. This can be accomplished either with a symmetric H-tree distribution, illustrated in Figure 1-1, or with an asymmetric matched-length routed path scheme. However, path length matching alone is no longer a sufficient solution. While matching lengths may be possible, matching capacitive load and coupling across a complicated chip is practically impossible. Therefore, most modern clock distribution schemes use some type of matched-length distribution in conjunction with other deskew methods.

Figure 1-1: H-tree clock distribution.

Figure 1-1 shows a top-level H-tree distribution and sixteen local grids, with the clock buffers that would certainly be present omitted for simplicity. The small black boxes at the global-to-local interface might represent any number of deskew mechanisms. The first Itanium processor used an active deskew scheme, but designers of subsequent generations cited manufacturing concerns as a reason for reverting to passive fuse-based deskew [4] [5]. Active deskew provides the capability of adjusting for temperature- and supply-induced skew, but at the cost of possible stability concerns. Despite their implementation differences, both of these processors use H-tree distribution at the global level with an array of deskew circuits interfacing the global network to the local grids.

1.3 Prior Work on Optical Clock Distribution

Using an H-tree optical waveguide to distribute a global optical timing reference to several optoelectronic receivers across a chip initially appears to be a perfect solution to the problem of skew and jitter. This is true in the sense that the optical signal arriving at those receivers has imperceptible skew and the jitter is limited only by the extremely precise laser source. The difficulty lies in converting that signal to an electrical clock. Transimpedance amplifiers (TIAs) are commonly employed to convert photocurrent inputs to voltage outputs, but converting these small currents to full-scale logic voltages requires a high-gain transimpedance stage followed by several stages of voltage amplification. Process and supply variation can introduce significant skew and jitter in these circuits, often negating the benefits of optical top-level distribution.

A comprehensive review of prior work, technical challenges, and possible benefits of optical clocking is presented in [6]. The extensive reference list is an indispensable introductory resource encompassing all aspects of the optical interconnect challenge. That review correctly points out that optical interconnects must become CMOS compatible, high-density, precise, and economically attractive in order to succeed. Intel recently considered many of these criteria in an analysis of TIA-based approaches for clocking and interconnect applications [1]. Despite their rather aggressive assumptions about the future performance of integrated optical components, they concluded that optical clocking will not be a practical replacement for electrical clocking. They argued that the area requirements and radius of curvature of optical waveguides will limit optical distribution to the highest levels of the global clock domain, which do not account for a significant portion of the timing mismatch, and that therefore optical clocking will not provide significant performance enhancement. They compared the performance to both scaled and unscaled copper interconnects and concluded that, while optical clocking did outperform scaled copper, using unscaled copper to achieve the same performance was more cost effective and less disruptive to the manufacturing process.

These conclusions, however, are restricted to a system analysis assuming that the

global clock frequency is 4 GHz and that TIAs are used for signal conversion. As

frequencies continue to increase, electrical clock distribution will eventually reach a

fundamental limit. It may be true that TIAs are not the ideal solution, but optical

clocking may still be a viable solution as researchers are currently exploring many

alternative methods of achieving optical-electrical signal conversion.


Figure 1-2: Receiverless optical clocking.

Researchers at Stanford have proposed a "receiverless" method of optical clocking, shown in Figure 1-2 [7]. In this scheme, a high-energy, short-duration pulse is generated by a mode-locked laser. This signal power is split and one branch is delayed by T. These signals are used to drive two photodiodes which respectively charge and discharge the CMOS gates to be clocked. This approach has low skew and jitter if the register gates are driven directly. However, since the optical power required to do this would be prohibitively large, intermediate buffers would most likely be required for a practical clock network and would introduce uncertainties.

Figure 1-3: Proposed optical PLL system.

Using a PLL to generate local clocks from a global optical reference allows both active deskew of the gate-level clocks and local generation of low-jitter clocks. Figure 1-3 shows the architecture of the proposed optical PLL system. A complete optical clock distribution using this system would replace the top-level distribution in Figure 1-1 with an optical waveguide structure and use one instance of the optical PLL to generate the clocks for each local network. The 1.6 GHz low-jitter local clocks generated by the LC VCO are buffered as much as the clock load requires and delivered to the registers. Because the clocks are generated locally, there are no global distribution clock buffer chains to introduce skew and jitter. The PLL eliminates any skew generated in the forward path and the clock buffers, so skew between instances is introduced only by variations in phase detector offset and divider delay. Jitter is determined by the VCO phase noise and the jitter generated by the short forward-path buffer chain used to amplify the clock power from the VCO to the gates. Using the optical signal as a precise reference, instead of directly sensing and amplifying the optical input power in order to generate a full-swing signal as in the other two approaches, therefore provides a significant advantage. In addition, using a divider in the feedback path allows generation of high-frequency local clocks from a lower frequency external optical reference.

As with any complex circuit system, there are many topology choices that affect the final implementation of a PLL. Clearly the topologies for main blocks such as the divider and VCO will be chosen for their performance in the particular application. However, the three most fundamental decisions are the choice of a Type I or Type II loop, the selection of phase or phase-frequency detection, and the determination of the loop order. A Type I PLL uses a phase detector that generates an output voltage proportional to the phase difference of the reference signal and the feedback signal. This output voltage is simply low-pass filtered to generate the control voltage. This type of PLL therefore has a finite possible voltage range and may exhibit static phase error during lock, as there is no integration in the phase detector. A Type II PLL uses a phase detector that adds a second integrator to the forward path. This is typically implemented by using a phase detector in conjunction with a charge pump. In a charge pump PLL, the phase error signal generated by the phase detector is used to issue "UP" and "DOWN" pulses to a charge pump, which responds by incrementally increasing or decreasing the voltage on the loop filter. Because of the second integrator there can be no steady-state error between the two signals, but because the phase detector and the VCO both contribute integrators to the forward path, the loop filter must be carefully designed to stabilize the loop.

Simple phase detection and phase-frequency detection result in very different loop dynamics. A simple phase detector detects only phase difference, not frequency difference, and its gain changes sign if the phase deviates too far from lock. Therefore, if the VCO frequency is too far from the reference then the phase cycles through the positive and negative gain regions faster than the detector can control the VCO and the PLL does not achieve lock. A phase-frequency detector (PFD), however, uses knowledge of all the clock edges to detect phase and frequency, so the detector constantly drives the loop toward lock with much improved dynamics and no limit to range. Type II PLLs with PFDs are used in most modern systems due to their improved range and tracking. A concise and accessible tutorial on the basics of PLL design is available in [8].

Finally, the loop filter must be chosen to work with the selected phase detector or PFD. Charge pump PLLs require more complex filters to stabilize the loop due to the additional integrator introduced by the charge pump.
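The static phase error of a Type I loop and the limited range of a simple phase detector can both be seen in a very small behavioral model. The sketch below is not the circuit analyzed in this thesis: it assumes a first-order loop with a sinusoidal phase detector characteristic and normalized units, so the phase error obeys d(phi)/dt = delta_omega - loop_gain * sin(phi). The function name, parameter names, and numerical values are hypothetical.

```python
import math


def simulate_first_order_pll(delta_omega, loop_gain, t_end=50.0, dt=1e-3):
    """Integrate d(phi)/dt = delta_omega - loop_gain * sin(phi) (forward Euler).

    delta_omega: initial frequency offset between reference and VCO (rad/s).
    loop_gain:   combined detector and VCO gain (rad/s), an assumed lumped value.
    Returns the final phase error in radians and the number of cycle slips.
    """
    phi = 0.0
    for _ in range(round(t_end / dt)):
        phi += dt * (delta_omega - loop_gain * math.sin(phi))
    cycle_slips = int(abs(phi) // (2.0 * math.pi))
    return phi, cycle_slips


if __name__ == "__main__":
    # Offset inside the loop gain: the loop locks with a static phase error.
    phi, slips = simulate_first_order_pll(delta_omega=0.5, loop_gain=1.0)
    print(f"offset 0.5: phase error {phi:.4f} rad, cycle slips {slips}")

    # Offset beyond the loop gain: the phase keeps advancing and never locks.
    phi, slips = simulate_first_order_pll(delta_omega=1.5, loop_gain=1.0)
    print(f"offset 1.5: phase error {phi:.1f} rad, cycle slips {slips}")
```

In this idealized model the loop settles at a phase error of asin(delta_omega / loop_gain) when the offset is smaller than the loop gain, which is the static error a Type I loop tolerates, and it slips cycles indefinitely when the offset is larger because the detector gain alternates in sign.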

Researchers proposed the idea of optical clock distribution as early as the 1980s. One of the first papers suggested using the optical clock as a reference for local phase-locked loops [9]. The authors compared this approach to a transimpedance amplifier and concluded that it improved performance and reduced power consumption. The proposed phase detector is shown in Figure 1-4.

Figure 1-4: Original optical PLL clocking proposal - Clymer/Goodman.

The details of operation can be found in the paper, but the basic functionality is simple to understand. The phase detector has three possible output voltages. When the VCO output is low, the diode is forward biased and the phase detector output voltage is determined by the diode drop. When the VCO output is high, the diode is reverse biased and the voltage is the result of the simple resistive voltage divider if the optical signal is low or a resistive divider with an additional current source if the optical signal is high. In this way, the filtered average output voltage of the phase detector indicates what percentage of the total period was spent in each state. This is a Type I PLL and achieves only 12.8 MHz of locking range with an output center frequency near 100 MHz. This range is simply not sufficient for a modern or future clocking system so a Type II loop should be used instead to improve range. Furthermore, the performance of this circuit at higher speeds will be RC limited since the photodiode current is driven directly into a resistor to generate the error signal voltage. The full potential of PLL-based optical clock distribution for future applications may be realized by investigating small-signal charge pump phase detectors and the resulting Type II loops.

1.4 Objective of This Work

The objective of this work is to explore the feasibility of a small-signal, current-steering Type II phase detector as the central component of a clock distribution network in a modern standard CMOS process. While design of VCOs and frequency dividers is well understood, this type of phase detector is relatively unexplored and there are many remaining challenges. This work will present an analysis of the challenges and possibilities for the design of such a phase detector, deriving examples from the lessons learned during design of one particular topology and then discussing the advantages and disadvantages of some other possible topologies. Simulation results of a full custom layout implementation of the optical-electrical PLL clock system with full parasitic capacitance extraction are presented.

Chapter 2

VCO and Divider Circuits

The implementation of a PLL with frequency multiplication requires a phase detector,

loop filter, VCO, and divider. The optical-electrical phase detector and loop filter are

the central parts of this work and will be analyzed in detail, but it is also necessary

to briefly summarize the choices of VCO and divider topologies.

2.1 Divider

In a PLL with frequency multiplication, the PLL output is controlled indirectly by

locking the feedback divider output to the reference signal. Any variation in divider

delay between two instances of the PLL will result in phase error of the generated

clocks even when the divider outputs are perfectly matched. The low-bandwidth loop

filter attenuates jitter introduced by the divider in the feedback path, but skew due

to process and temperature variations across the dividers will be directly translated

into skew between the generated 1.6 GHz clock outputs.

Figure 2-1: Divider architecture.

Both synchronous and asynchronous dividers are commonly used in PLL feedback, often in a hybrid combination employing an asynchronous prescaler stage followed by a synchronous divider with a large divide value. Cascaded divide-by-two asynchronous dividers have a speed advantage as the logic depth is much shallower than a larger value synchronous divider, but the data is latched many times instead of once, which increases the potential for skew introduction. For example, consider a divider with N=8. If a synchronous implementation with one clock-Q delay experienced X seconds of skew due to a particular process or temperature variation, an asynchronous divider with three stages would experience 3X seconds. Because minimizing skew is imperative for this application, the divider architecture should be fully synchronous if possible, while staying within reasonable logic style boundaries. Furthermore, a fully differential circuit style should be used to maximize resistance to skew. High-speed RF logic styles, such as resistively loaded SCL with small-signal outputs, are not suitable for use in this design since full-swing outputs are required.
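The N=8 comparison above amounts to counting how many clock-to-Q delays sit in series between the divider input and its output. A minimal sketch of that bookkeeping follows; the function names are hypothetical, and the model assumes, as the text does, that every latching stage contributes the same skew X and that the asynchronous divider is a cascade of divide-by-two stages.

```python
def ripple_stage_count(divide_ratio):
    """Number of cascaded divide-by-two stages needed for a power-of-two ratio."""
    if divide_ratio < 2 or divide_ratio & (divide_ratio - 1):
        raise ValueError("divide_ratio must be a power of two, at least 2")
    return divide_ratio.bit_length() - 1


def divider_skew(per_stage_skew, divide_ratio, synchronous):
    """Skew accumulated through the divider, in the units of per_stage_skew.

    A fully synchronous divider latches the data once (one clock-to-Q delay);
    an asynchronous ripple divider latches it once per divide-by-two stage.
    """
    stages = 1 if synchronous else ripple_stage_count(divide_ratio)
    return stages * per_stage_skew


if __name__ == "__main__":
    x_ps = 5.0  # assumed per-stage skew in picoseconds, for illustration only
    print("synchronous  /8:", divider_skew(x_ps, 8, synchronous=True), "ps")
    print("asynchronous /8:", divider_skew(x_ps, 8, synchronous=False), "ps")
```

With the assumed 5 ps per stage, the asynchronous divide-by-eight accumulates three times the skew of the synchronous one, matching the X versus 3X argument in the text.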

Since the maximum output frequency of the VCO in the typical corner is 1.8 GHz,

the divider is required to function properly at 2.0 GHz in the slow corner in order to

allow for a reasonable safety margin. Several circuit styles were considered, but none

allowed a fully synchronous divide-by-eight at these speeds without using resistively

loaded circuit styles. Figure 2-1 shows the chosen divider architecture, which consists

of a divide-by-two prescaler and a synchronous divide-by-four. The divide-by-two

circuits are implemented with registers in the standard feedback configuration such

that the output changes state at each positive clock edge and generates an output

at half the input frequency. The XOR register in the synchronous divider includes

XOR logic embedded in the first latch in order to increase speed performance and is

shown in Figure 2-2. Both the simple registers and the embedded XOR register are

implemented in source-coupled logic (SCL) with cross-coupled PMOS loads.
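A behavioral model can help check the counting behavior of this architecture independently of the circuit style. The sketch below is an assumption-laden illustration rather than a description of the SCL implementation: it models the prescaler as a toggle register and the synchronous divide-by-four as a two-bit counter whose upper bit uses XOR next-state logic, which is one conventional way an embedded XOR register can be used. All names are hypothetical.

```python
def divide_by_eight(num_input_edges):
    """Return the divider output sampled after each rising input clock edge."""
    prescaler_q = 0      # divide-by-two prescaler: toggles on every input edge
    q0, q1 = 0, 0        # synchronous divide-by-four state (two-bit counter)
    output = []
    for _ in range(num_input_edges):
        previous = prescaler_q
        prescaler_q ^= 1
        if previous == 0 and prescaler_q == 1:
            # Rising edge of the prescaler output clocks both registers at
            # once; the right-hand side uses the old state of q0 and q1.
            q0, q1 = q0 ^ 1, q1 ^ q0
        output.append(q1)
    return output


if __name__ == "__main__":
    waveform = divide_by_eight(32)
    rising = [i for i in range(1, len(waveform))
              if waveform[i - 1] == 0 and waveform[i] == 1]
    periods = [b - a for a, b in zip(rising, rising[1:])]
    print("output:", "".join(str(bit) for bit in waveform))
    print("output period in input cycles:", periods)
    print("1.6 GHz input gives", 1.6e9 / 8 / 1e6, "MHz output")
```

In this model the output completes one cycle for every eight input edges, so a 1.6 GHz VCO clock is divided to the 200 MHz rate of the optical reference.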

The extracted and simulated divider output waveforms, shown in Figure 2-3,

demonstrate that the divider does function properly at 2 GHz as designed.


Figure 2-2: Circuit schematic of embedded XOR register block of Figure 2-1.

[Plot: "Divider Input and Output"; horizontal axis: Time (ns), 0 to 4.]

Figure 2-3: Divider output.

2.2 VCO

The VCO must provide excellent jitter performance and tolerance of supply noise in order to generate precise output clocks. Furthermore, because the PLL uses only a phase detector instead of a phase-frequency detector, the VCO must have a small enough tuning range that the loop cannot initialize itself too far from the correct frequency to acquire lock or accidentally lock to harmonics of the reference. The combination of these two criteria suggests that an LC VCO is the best option to obtain the desired performance. However, given that area is a major concern in production level microprocessors, it is important to note that there are circuit techniques available to obtain acceptable performance without large passive devices if necessary [10].

Figure 2-4 shows the VCO used in the PLL. The VCO core is a standard topology

with both PMOS and NMOS cross-coupled gain devices and a varactor-tunable LC

tank [11]. The application requires a relatively narrow tuning range, but making

the range arbitrarily small is dangerous because small process variations might then

cause the VCO range to shift away from including the target center frequency. For

this reason, the VCO was designed with a range large enough to accommodate a

10 percent frequency shift in either direction and still include the 1.6 GHz target

frequency.

VCO simulations showed that obtaining this range and center frequency requires

a 6.1 nH inductor. The inductor was designed and verified with an external simulator

and incorporated into the circuit simulations through an equivalent circuit

model. This tool, developed at Berkeley and abbreviated ASITIC for "Analysis and

Simulation of Spiral Inductors and Transformers for ICs", is optimized for simulating

integrated inductor structures [12]. The tool provides an equivalent "Π model" of the

spiral structure, which has the form shown in Figure 2-5. The VCO used a standard

square spiral inductor with six turns in M6; the Π model values generated by ASITIC

for this structure are listed in Table 2.1.
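As a rough cross-check of the inductor value (not the simulation flow used in this work), the ideal LC resonance formula $f = 1/(2\pi\sqrt{LC})$ gives the total tank capacitance implied by a 6.1 nH inductor at the frequencies of interest. The sketch below assumes a lossless tank and lumps the varactor, device, and wiring capacitance into a single value; the function names are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal (lossless) parallel LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def tank_capacitance_f(inductance_h, frequency_hz):
    """Total tank capacitance needed to resonate at the given frequency."""
    return 1.0 / ((2.0 * math.pi * frequency_hz) ** 2 * inductance_h)


INDUCTANCE_H = 6.1e-9  # inductor value quoted in the text

for f_ghz in (1.4, 1.6, 1.8):
    c = tank_capacitance_f(INDUCTANCE_H, f_ghz * 1e9)
    print(f"{f_ghz:.1f} GHz -> {c * 1e12:.2f} pF total tank capacitance")
```

The spread in capacitance between the lowest and highest frequency indicates how much of the tank capacitance the varactor would have to tune under this idealization.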

A graph of the resulting VCO gain is shown in Figure 2-6. The total range is

approximately 1.4 GHz to 1.8 GHz, with the highest gain in the control voltage range


Figure 2-4: VCO core and buffer circuits.

[Schematic labels: L, C1, C2, R2, R3.]

Figure 2-5: ASITIC II model.


L         C1      C2       C3       RS       R2      R3
6.12 nH   25 fF   125 fF   125 fF   11.5 Ω   5.0 Ω   5.0 Ω

Table 2.1: VCO inductor Π model values.

[Plot: VCO voltage-to-frequency gain; horizontal axis VCO Control Voltage (V).]

Figure 2-6: VCO gain.

of 0 V to 0.9 V. The 1.6 GHz center frequency is generated by a control voltage of

about 0.4 V, near the center of the range.
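The range requirement can be checked with a few lines of arithmetic. The sketch below assumes that a process-induced "frequency shift" scales both ends of the tuning range by the same factor, which is an interpretation rather than something stated in the text; the function names are hypothetical.

```python
def range_after_shift(f_min_hz, f_max_hz, fractional_shift):
    """Scale both ends of the tuning range by (1 + fractional_shift)."""
    scale = 1.0 + fractional_shift
    return f_min_hz * scale, f_max_hz * scale


def still_covers(f_min_hz, f_max_hz, target_hz, fractional_shift):
    """True if the shifted tuning range still contains the target frequency."""
    low, high = range_after_shift(f_min_hz, f_max_hz, fractional_shift)
    return low <= target_hz <= high


F_MIN_HZ = 1.4e9   # approximate lower end of the simulated range
F_MAX_HZ = 1.8e9   # approximate upper end of the simulated range
TARGET_HZ = 1.6e9  # target center frequency

covered = {
    shift: still_covers(F_MIN_HZ, F_MAX_HZ, TARGET_HZ, shift)
    for shift in (-0.10, 0.0, 0.10)
}
print(covered)
```

With the approximate 1.4 GHz to 1.8 GHz range, the target stays inside the range for a 10 percent shift in either direction, though with little margin on the downward shift.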

As the divider is fully differential to minimize jitter generation, the VCO should

have symmetric differential full-swing outputs. The VCO core produces a differential

output with a common-mode around 0.4 V. Therefore, the first two stages of the

symmetric buffers shift the DC level of the signal to near the half-rail voltage of

0.9 V. This signal drives an inverter with resistive feedback to prevent saturation

in order to generate a full-swing signal, which may then be used to drive standard

inverters and logic. The VCO core waveform, first stage buffer waveform, and final

output are shown in Figure 2-7. This LC VCO provides a low-jitter local reference

which, when locked to the optical reference signal by a PLL, can be used to generate

a high-precision local clock.


[Plot: VCO core and buffer output waveforms; horizontal axis Time (ns).]

Figure 2-7: VCO output for control voltage of 0.4 V.


Chapter 3

Optoelectronics

Monolithic integration of optics and electronics is one of the primary challenges in

any optical clocking scheme. The complete system requires integrated waveguides,

integrated photodiodes, and standard CMOS logic. Clearly, the ideal solution would

meet all the specifications and be completely fabricated in a standard CMOS process.

Failing that, the remaining functionality should at least be obtained through a

CMOS-compatible post-process. The post-process approach is feasible

for a production application that requires optical clocking, but the prototype is limited

to strictly CMOS design. Therefore, this chapter will consider both cases and

attempt to provide an accurate review of the performance attainable given both sets

of constraints.

3.1 Photodiode Background

This section briefly reviews the simple concepts and metrics that

must be understood before embarking on an analysis of various photodiode structures.

Fundamentally, any diode becomes a photodiode and produces current when

illuminated. Incoming photons generate electron-hole pairs in the semiconductor and

some of these electrical carriers drift or diffuse across the diode junction before re-

combining, thereby generating electrical current. The speed and efficiency at which

this process occurs is a function of the material properties and the geometric features


of the photodiode.

A depletion region, with depth varying as a function of doping concentrations and

reverse bias voltage, is formed at any PN junction. The carriers generated in the

depleted region appear at the terminals fastest because they drift to the appropriate

terminal. Those generated within the P or N regions may recombine or diffuse to the

junction and then drift the remaining distance. Drift is much faster than diffusion,

so in order to obtain a photodiode with a fast response, it is preferable to generate

the majority of the carriers in the depleted region. One way to achieve this result

is to put a very lightly doped "intrinsic" region between the P and N regions. In a

well-designed case, this low doping allows the applied reverse bias to fully deplete the

entire intrinsic region and all the carriers generated in that region will drift to the

appropriate terminals. These structures and doping levels are not, however, available

in a standard CMOS process and the diodes obtained through this type of process do

not obtain the best performance achievable in silicon.

In a fully depleted PIN diode where the P and N regions are masked so that no

carriers are generated there, the speed of the diode is no longer limited by diffusion

transit time so the effect of drift transit time becomes significant. In this case, the

speed of the diode becomes a function of the intrinsic region width. These custom

process diodes have historically employed a vertical PIN structure; the layers are

stacked one on top of the other with the intrinsic region sandwiched between the P and

N. This structure presents a tradeoff between efficiency and speed. The concentration

of remaining photons decreases exponentially with distance from the surface of the

semiconductor as they are absorbed and converted into electron-hole pairs and each

semiconductor has a different absorption depth. Materials in which the photons

are absorbed very near the surface can obtain higher performance because better

efficiency is obtained for a given intrinsic width, leading to lower transit time for

a given efficiency requirement. For materials such as silicon, which are relatively

inefficient at absorbing photons, a deeper intrinsic region is required to absorb the

majority of the photons and obtain reasonable efficiency. Unfortunately, a deeper

intrinsic region will also increase transit time for the carriers generated farthest from


their terminals and therefore slow diode performance. This tradeoff is present for

all materials, but those with shorter absorption lengths are able to achieve better

performance.
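The efficiency side of this tradeoff follows from the exponential absorption profile: the fraction of photons absorbed within a depth d is $1 - e^{-d/L_{abs}}$, where $L_{abs}$ is the absorption length. The sketch below is illustrative only. It assumes an absorption length of 15 µm, the lower end of the 15-20 µm figure quoted later in this chapter for silicon at 850 nm; the wavelength and the depths evaluated are assumptions, not design values.

```python
import math


def fraction_absorbed(depth_um, absorption_length_um):
    """Fraction of incident photons absorbed between the surface and depth_um."""
    return 1.0 - math.exp(-depth_um / absorption_length_um)


ABSORPTION_LENGTH_UM = 15.0  # assumed value for silicon near 850 nm

for depth_um in (1.0, 2.0, 8.0, 15.0):
    fraction = fraction_absorbed(depth_um, ABSORPTION_LENGTH_UM)
    print(f"depth {depth_um:5.1f} um -> {100.0 * fraction:5.1f} % absorbed")
```

Under this assumption only a small fraction of the light is absorbed within the first micron or two, which is why a shallow collection region is inefficient in silicon and a deep one is slow.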

Previous discussion considers vertical PIN diodes fabricated in customized pro-

cesses. In a standard CMOS process, lateral partially-depleted PIN diodes are fabri-

cated by using existing PN junctions such as the P+/NWELL. If we consider a lateral

PIN CMOS diode structure and assume that a certain fixed amount of intrinsic area

is required and that the minimum dimension of the P and N regions is fixed, then

the total junction area is inversely proportional to the intrinsic region width. In a

standard CMOS lateral PIN, the doping levels are not optimized for photodiodes and

the intrinsic region will not be fully depleted. Since carriers must therefore diffuse

to the terminals, the width must be very limited to obtain reasonable performance.

As the width is decreased, however, the total junction area will increase and cause

an increase in junction capacitance. The diode performance will therefore be ca-

pacitance limited for small intrinsic region width and transit time limited for large

intrinsic region width. An analysis of the performance obtainable in a standard 0.18

µm process will be presented in this chapter.

3.2 Standard CMOS Silicon Photodiodes

3.2.1 Possible Diode Structures in Standard CMOS

Because this design uses a standard mixed-signal 0.18 µm CMOS process without

special photodiode process features, the photodiodes must be created from the PN

junctions between existing layers: P substrate, NWELL, DNWELL, PWELL, N+, and P+. The

DNWELL and PWELL are available only in the RF process, not the standard digital

0.18 µm process, and are included for completeness but should be avoided in the

design if possible in order to demonstrate the achievable performance in standard

CMOS. The P substrate must always be grounded, so substrate diodes may not be

used in the stacked diode phase detector and are not considered here.


Though non-substrate diodes may be connected to arbitrary potentials, the para-

sitic diodes of each structure contribute differently to the output for different po-

tentials. Figure 3-1 shows the physical cross section for both NWELL/P+ and

DNWELL/PWELL/P+ diodes, and the schematic representation of the intentional

and parasitic diodes for connections from VDD to VX and VX to GND. To obtain a

larger diode, multiple N+/P+ finger pairs would be added in the same well to in-

crease the total intrinsic area without making the intrinsic width too large. Because

the typical depth of an NWELL is on the order of 1-2 µm, much less than the absorption

depth of silicon, a large quantity of carriers will be generated in the substrate

in addition to those generated in the well, and the parasitic diodes may easily pro-

duce more current than the intentional diodes. Furthermore, these parasitic diodes

will have slow tails in their responses caused by long diffusion lengths from the deep

substrate. For cases A and B, the NWELL/P+ junction forms the intentional diode

while the NWELL/P-SUB diode is the parasitic diode. In A, the parasitic diode is

connected from VDD to GND and the current is not seen at the output node. In B,

however, the parasitic diode is connected from the output to ground and the currents

add in parallel. Therefore, in the diode stack configuration, the bottom diode current

would be much larger than the top diode current for equal illumination. Cases C and

D, using the RF process DNWELL and PWELL, exhibit similar problems. In C, the

intended N+/PWELL pull-up diode may actually be smaller than the DNWELL/P-

SUB parasitic pull-down diode, leading to a net pull-down effect. In D, the parasitic

diodes are shorted together to GND and eliminated.

Using the mismatched diodes of A and B in the proposed phase detector would

result in significant phase offset from quadrature proportional to the difference in

current. If the top and bottom diodes were consistently mismatched across the chip,

then the DC mismatch itself, temporarily neglecting transit time and capacitance

concerns, might present only a minor problem. The use of diodes C and D, however,

would likely result in complete failure of the PLL if both diodes presented a net

pull-down current and the loop had no way to gain voltage. Given this analysis and

the more universal availability of standard CMOS fabrication, diodes A and B were


chosen for the design and further analysis will be based on these two structures.

3.2.2 Photodiode Junction Capacitance

Capacitance, transit time, and DC current output are the three major design criteria

for integrated photodiodes. In the case of a TIA, the system will be directly limited by

the RC bandwidth of the photodiode capacitance and the feedback resistance. Since

the gain required to achieve a given output swing is inversely proportional to the input

current, the metric of $I_{PD}/C_{PD}$ is typically used to assess photodiode performance. The

proposed phase detector is not constrained by the traditional RC bandwidth limit,

but the same ratio metric is still valid for reasons relating to loop stability which will

be discussed in Chapter 5. Therefore, characterization of the diode capacitance is

critical.

This photodiode is simply the illuminated version of a P+/NWELL diode. There-

fore, capacitance simulations are available in a standard design flow. In addition to

simply determining the capacitance of the proposed diode structure, the capacitance

as a function of reverse bias can be used to determine the depletion layer width as a

function of reverse bias, since doping concentrations are rarely available to designers.

Figure 3-2 shows the capacitance of a 35 µm square diode as a function of reverse

bias voltage. A large square diode is used for this test, as opposed to a fingered diode,

in order to guarantee that the sidewall capacitance is an insignificant portion of the

overall capacitance and therefore make the depletion width numbers more accurate.

As expected, the photodiode exhibits a capacitance that decreases with reverse bias

voltage.

The depletion width can be approximately determined by using the simple plate

capacitor formula: $W = \epsilon_{si} A / C$. Using the constants $\epsilon_0 = 8.85 \times 10^{-12}$ F/m and $\epsilon_{si} = 11.7\,\epsilon_0$, the

capacitance data is easily used to obtain the depletion width data shown in Figure 3-3.
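The conversion from capacitance to depletion width can be sketched as follows, assuming the parallel-plate relation $W = \epsilon_{si} A / C$ with sidewall capacitance neglected. The 1 pF capacitance used here is a hypothetical placeholder, not a value read from Figure 3-2; only the 35 µm square geometry and the permittivity constants come from the text.

```python
EPSILON_0_F_PER_M = 8.85e-12                 # permittivity of free space
EPSILON_SI_F_PER_M = 11.7 * EPSILON_0_F_PER_M  # permittivity of silicon


def depletion_width_m(capacitance_f, area_m2):
    """Depletion width from the parallel-plate formula W = eps_si * A / C."""
    return EPSILON_SI_F_PER_M * area_m2 / capacitance_f


SIDE_M = 35e-6                   # 35 um square test diode
AREA_M2 = SIDE_M ** 2
HYPOTHETICAL_CAPACITANCE_F = 1.0e-12  # placeholder, not measured data

width_m = depletion_width_m(HYPOTHETICAL_CAPACITANCE_F, AREA_M2)
print(f"depletion width = {width_m * 1e6:.3f} um")
```

Applying the same function to each point of a capacitance-versus-bias sweep would produce a depletion-width-versus-bias curve of the kind the text describes.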

This analysis of the depletion width versus reverse bias shows that, even if the

intrinsic region width is reduced to the 0.23 µm minimum allowed by the design

rules, the intrinsic region will not be fully depleted at the 0.9 V reverse bias expected

at steady-state in the stacked diode phase detector topology. Therefore, the transit


[Figure panels A through D: diode cross sections and schematics with nodes VDD, VX, and GND; layers NWELL and P-Sub.]

Figure 3-1: Possible CMOS diode structures.


[Plot: junction capacitance; horizontal axis Reverse Bias (V).]

Figure 3-2: Junction capacitance versus reverse bias.

[Plot: depletion width; horizontal axis Reverse Bias (V).]

Figure 3-3: Depletion width versus reverse bias.

time cannot be neglected as in a fully depleted PIN, and the performance of the diode

will be determined by the combined effects of the capacitance and the transit time

through the undepleted regions.

The junction capacitance of the photodiode structure is proportional to the total

P+/NWELL junction area. Assuming that the P+ and N+ implants will be fixed

at the minimum width and that a fixed intrinsic area is required to produce the

required current, then the total junction area will be inversely proportional to the

intrinsic region width. As the intrinsic width is increased, more intrinsic area is

enclosed between each fixed size P+/N+ finger pair. Based on measurements of the

P+/NWELL diodes previously fabricated in the same process, it was determined that

1250 µm² of intrinsic area will be needed to obtain the required 10 µA of photocurrent

with reasonable power [13]. A capacitance/area value derived from the 0.9 V bias

point of Figure 3-2 was then used to determine the total capacitance of a fingered diode

structure with this fixed intrinsic area as the intrinsic width was varied. The results of

this calculation, shown in Figure 3-4, indicate that the intrinsic region must be nearly

1 µm wide to obtain a total photodiode capacitance as low as the 200 fF target of the

phase detector structure. In fact, even from an area efficiency perspective, intrinsic

width much lower than 1 µm seems unreasonable, given that the contact N+/P+

areas on either side will total to about 0.5 µm. However, though capacitance analysis

alone would suggest that the photodiode is optimized by arbitrarily increasing the

intrinsic region width, the transit time through the undepleted region is oppositely

optimized and therefore requires that some intermediate intrinsic width be chosen for

a reasonable compromise between the two performance criteria.
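The inverse relationship between junction capacitance and intrinsic width can be sketched with a simple geometric model. For a fixed intrinsic area, the total finger length is the intrinsic area divided by the intrinsic width, so the P+ junction area (finger length times finger width) scales as 1/W. The P+ finger width and the capacitance per unit area below are hypothetical placeholders, chosen only so that the sketch lands near the roughly 200 fF at 1 µm quoted in the text; they are not process data, and sidewall capacitance is ignored.

```python
INTRINSIC_AREA_UM2 = 1250.0     # fixed intrinsic area from the text
P_PLUS_WIDTH_UM = 0.25          # hypothetical minimum P+ finger width
CAP_PER_AREA_FF_PER_UM2 = 0.64  # hypothetical capacitance per unit area


def junction_area_um2(intrinsic_width_um):
    """P+/NWELL junction area for a fixed total intrinsic area."""
    total_finger_length_um = INTRINSIC_AREA_UM2 / intrinsic_width_um
    return total_finger_length_um * P_PLUS_WIDTH_UM


def junction_capacitance_ff(intrinsic_width_um):
    """Total junction capacitance, bottom-plate area only."""
    return CAP_PER_AREA_FF_PER_UM2 * junction_area_um2(intrinsic_width_um)


for width_um in (0.23, 0.3, 0.5, 1.0):
    print(f"W = {width_um:4.2f} um -> {junction_capacitance_ff(width_um):6.1f} fF")
```

Halving the intrinsic width doubles the capacitance in this model, which is the trend that pushes the design toward wide intrinsic regions before transit time is considered.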

3.2.3 Transit Time

The electron-hole pairs generated by incoming photons may experience two modes of

transport to their respective junction destinations. Carriers in a depletion region will

be accelerated by the relatively high electric field and transported by the rapid drift

due to that field. Carriers in an undepleted region will diffuse slowly due to carrier

gradients and will either recombine or reach a junction. Because drift is much faster

[Plot: junction capacitance versus intrinsic width for fixed total intrinsic area; horizontal axis Intrinsic Region Width (µm).]

Figure 3-4: Junction capacitance versus intrinsic width.

than diffusion, it is desirable to obtain a fully depleted PIN diode and generate all

the carriers in the depleted region. When this is not possible, careful consideration of

the total distance the carriers must diffuse and the resulting transit time is required.

In this situation, the electrical current produced as a result of an incident optical

square wave will appear qualitatively similar to the waveform shown in Figure 3-5.

The carriers generated in the intrinsic region will drift to the terminals very rapidly

and produce nearly a step change in photodiode current. The carriers generated in

the undepleted region will gradually diffuse to the junction and introduce a slow tail.

Clearly, in order to obtain an output current that approximates a square wave, the

transit time should be very short compared to the signal period.

The transit time is a function of distance, temperature, and carrier mobility.

Because the electrical carriers in this diode are generated in the NWELL, the mobility

of holes in the NWELL should be used for these calculations. A typical value of hole

mobility in a 0.18 µm process is 110 cm²/V·s [14]. The diffusion coefficient, D, may

be calculated from the Einstein relation, $D = \mu_p kT/q$, where $\mu_p$ is the hole mobility,

[Plot: illustration of the effect of transit time; horizontal axis Normalized Time, 0 to 1.]

Figure 3-5: Illustration of transit time.

k is Boltzmann's constant, T is the temperature in Kelvin, and q is the electron

charge. The transit time, $\tau$, is calculated according to $\tau = W^2/(2D)$ and a plot of $\tau$ as

a function of W is shown in Figure 3-6. The 200 MHz optical reference clock has

a period of 5 ns, so it is clear that the 1.8 ns transit time for a diode with 1 µm

intrinsic width is unacceptable. However, Figure 3-4 shows that the capacitance of a

diode with sufficiently small transit time will be too large. This analysis shows that

there is not a photodiode structure available in this 0.18 µm CMOS process capable

of simultaneously meeting the defined power, current, capacitance, and transit time

requirements.
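The transit time figure can be reproduced with a short calculation. The sketch below assumes the diffusion transit time takes the form $\tau = W^2/(2D)$ with $D = \mu_p kT/q$ and a temperature of 300 K; both the form of the expression and the temperature are assumptions, adopted because they give approximately the 1.8 ns quoted for a 1 µm intrinsic width. The function names are hypothetical.

```python
BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19
REFERENCE_PERIOD_S = 5e-9  # 200 MHz optical reference clock


def diffusion_coefficient_m2_per_s(mobility_cm2_per_vs, temperature_k=300.0):
    """Einstein relation D = mu * k * T / q, with mobility given in cm^2/V-s."""
    mobility_m2_per_vs = mobility_cm2_per_vs * 1e-4
    return mobility_m2_per_vs * BOLTZMANN_J_PER_K * temperature_k / ELECTRON_CHARGE_C


def transit_time_s(width_um, mobility_cm2_per_vs=110.0, temperature_k=300.0):
    """Assumed diffusion transit time tau = W^2 / (2 * D)."""
    d = diffusion_coefficient_m2_per_s(mobility_cm2_per_vs, temperature_k)
    width_m = width_um * 1e-6
    return width_m ** 2 / (2.0 * d)


for width_um in (0.23, 0.5, 1.0):
    tau = transit_time_s(width_um)
    print(f"W = {width_um:4.2f} um -> tau = {tau * 1e9:5.2f} ns "
          f"({100.0 * tau / REFERENCE_PERIOD_S:4.1f} % of the period)")
```

Because the transit time grows with the square of the width while the capacitance falls only as its inverse, the two requirements pull the intrinsic width in opposite directions.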

In the context of a research prototype, more photocurrent can be generated with

less photodiode area by using a well-focused high-power source. Reducing the diode

area by 50-75 percent would allow an acceptable tradeoff between capacitance and

transit time between fingers. If it were not for the deep substrate effects, this com-

promise would provide an acceptable diode for this application.

However, carrier generation in the deep substrate causes two problems that are

[Plot: transit time versus intrinsic width; horizontal axis Intrinsic Width (µm).]

Figure 3-6: Transit time versus intrinsic width.

beyond the control of the designer: diode mismatch and transit time from the deep

substrate. Overall pull-up/pull-down diode mismatch is caused by the parasitic

NWELL/PSUB diode, which appears in parallel with the pull-down diode but is

shorted from VDD to GND for the pull-up diode. (Refer back to Figure 3-1 for illus-

trations and schematics.) Furthermore, the transit time for most of the carriers in the

parasitic diode is independent of the finger spacing. A carrier generated deep in the

substrate may need to travel up to 10 µm to the NWELL/PSUB junction, whether

the spacing of the N+/P+ fingers is 0.1 µm or 10 µm. In fact, as the spacing of these

fingers becomes smaller than the depth of the NWELL, this effect will even begin to

become apparent in NWELL/P+ diodes due to carriers generated near the vertical

center of the NWELL.

Diodes A and D from Figure 3-1 form a set of pull-up/pull-down diodes in which

all parasitic diodes are shorted between the supplies and do not contribute their long

tail currents to the output. Using these two in combination could ameliorate the

problem of transit time from the deep substrate, but the photocurrents of the two


completely different diode structures would not be matched. This would be the best

available solution if a test chip were to be fabricated in this RF process, but diode D

is not available in standard CMOS. To first order, the mismatch of the two diodes will

simply produce a systematic offset from quadrature of the generated clocks, which is

not necessarily undesirable so long as it is consistent between instances. Chapter 5

presents a more complete analysis of the effects of diode characteristics on overall

PLL dynamics with the originally proposed phase detector.

3.3 Photodiodes in SOI and Custom Processes

Although this design will consider only photodiodes fabricated in a standard CMOS

process, understanding the future potential of optical clock distribution requires a

brief analysis of more optimized, higher performance silicon photodiodes.

3.3.1 SOI Photodiode Receivers

Significant effort has recently been dedicated to finding ways to integrate high per-

formance photodiodes into a standard CMOS chip. The majority of the approaches

use a process that is somehow modified or amended to provide enhanced photodiode

functionality. Although the test chip must be designed with the diodes available in a

standard CMOS process, a production optical clocking system could introduce a few

additional process steps in order to obtain the improved performance of these types

of photodiodes.

Because of the parasitic diodes and substrate carrier generation, discussed in Sec-

tion 3.2, none of these diodes are fabricated using a standard process. However, an

SOI process offers some advantages in optoelectronics and researchers have pursued

the idea of SOI photodiodes [15]. The authors of this reference also examined high-

resistivity non-SOI as a possible candidate and found that the photodiodes in this

process required 30 V bias to achieve 1.0 Gb/s. Even with this bias, they still had a

low frequency tail response due to diffusion from deep substrate carrier generation.

Unless the photodiode is somehow isolated from the deep substrate, the electron-hole


pairs generated by the last photons absorbed deep in the bulk will gradually diffuse

to a junction and appear at the terminals as a long tail. This problem prompted the

authors to examine the SOI photodiodes in the referenced paper. They found that

by using a 3.0 µm silicon layer on a buried oxide they could achieve the required performance,

both in terms of efficiency and bandwidth, without resorting to extremely

large bias voltages. The receiver, fabricated in a 1.0 µm SOI process, achieved 1.5

Gb/s and 622 Mb/s maximum speed operation at 5 V and 3 V single supply voltage,

respectively. Later work by the same group demonstrated improved results by using

the same techniques in an unmodified 0.13 µm SOI process [16]. This work achieved

8 Gb/s receiver operation with the photodiode biased at 24 V.

3.3.2 CMOS-Compatible Custom Photodiode Processes

Freedom to add custom process steps allows optimization for higher performance

photodiode topologies. IBM has focused a major research effort on developing lateral

trench detectors in silicon. By etching deep trenches in the silicon and filling them

with N-type and P-type polysilicon, they are able to decouple the transit distance

and absorption depth in order to obtain both high speed and high responsivity [17]

[18]. The trenches extend many microns into the substrate but are placed relatively

close together, so carriers generated in the intrinsic region many microns below the

silicon surface are still rapidly collected by the nearby terminals. The photodiodes

created through this process exhibited 6-dB bandwidth of 1.5 GHz at 3.0 V single

supply voltage and quantum efficiency of 68 percent at 845 nm.

Even for 8 µm deep trenches, some carriers are still generated in the substrate

beyond the trenches due to the 15-20 µm absorption length of silicon at 850 nm

and generate long tails in the photodiode response [19]. Therefore, this extension of

the previous work explored the idea of using deep trench detectors in an epitaxial

layer of opposite type from the substrate, thereby isolating the carriers generated

below that junction and improving response time. The work reports that the use of a

junction substrate lateral trench detector can improve the bandwidth to 6 GHz from

the 100 MHz obtained with a bulk lateral trench detector in the same process.


There are many possible structures besides the lateral trench PIN, though it ap-

pears to be one of the most promising ones reported in recent literature. Researchers

are also in the early stages of investigating the possibility of using materials other

than silicon to create even higher performance diodes and then using self-assembly

to place these diodes into recesses left in the silicon wafer, though the results are not

yet published.

Some combination of these techniques will eventually produce high-performance,

CMOS-compatible photodiodes for use in future optical clocking systems. Therefore,

while the analysis of attainable performance of photodiodes in CMOS will reflect the

true attainable results, parts of the phase detector and PLL analysis will assume

higher performance to demonstrate the feasibility of the concepts in future technolo-

gies.

3.4 Waveguides

Integrated waveguides may be fabricated in a dedicated CMOS-compatible post-

process. These waveguides are created by fabricating a core/cladding structure that

operates on the same principle as a multimode optical fiber. The difference in refractive

index causes the light to remain contained in the core and proceed through

the waveguide. Some materials that have so far been considered are SiON or SiONy

for the core and SiO2 for the cladding. A 49:51 worst-case split power mismatch has

been achieved by using these materials and shaping the split points to minimize loss

at these junctions and improve matching [20]. This reference also describes a method

for integrating photodiodes in a way that will evanescently couple to the waveguides.

Though it is possible to fabricate diodes in a standard CMOS process, these diodes

will not likely be easily coupled to the waveguides. Therefore, in the case where the

wafer will already be post-processed to add the waveguides it is logical to include a

few extra steps to integrate higher performance diodes that will couple directly to the

waveguides. Since waveguides cannot be fabricated in a standard CMOS process, this

design relies on free space optics and the silicon photodiodes available in a standard


CMOS process.

3.5 Conclusions

Obtaining acceptable performance from CMOS diodes is extremely challenging for

most applications. Even if a high-power source is used and the intrinsic area reduced

in order to find an acceptable optimum between transit time and capacitance limi-

tation, the deep substrate effects are significant. In a standard CMOS process, it is

not possible to generate both pull-up and pull-down diodes unaffected by the slow

current tails of the parasitic diodes.

In the future, when optical clocking becomes the only practical way to deliver high

precision timing references, processes will be modified to include higher performance

diodes. Intel, in their analysis of the feasibility of optical clocking, assumed the

availability of photodiodes producing 100 µA with only 5 fF capacitance. So, although

actual prototype designs in standard CMOS may be photodiode limited, the analysis

and design of the optical PLL should instead consider the circuits and systems that

will become possible when photodiode performance improves. Therefore, the PLL

dynamics analysis will assume the availability of 200 fF photodiodes with transit

time much less than the period, an assumption that is not unreasonable given recent

progress in the field.


Chapter 4

Analysis of Phase Detectors

A PLL-based optical clock distribution system with an optical-electrical small-signal

phase detector has the potential to generate low-jitter, low-skew local clocks. Assum-

ing that the phase detector is implemented in a way that does not introduce excessive

ripple to the loop filter, the jitter of the overall system is primarily determined by the

VCO, which may be minimized by using an LC VCO or a low-jitter, self-biased ring

oscillator [10]. This decoupling of the jitter from the optoelectronic conversion stage

potentially provides a significant advantage over a TIA system that may introduce

large jitter at this interface. However, like the TIA system, the steady-state offset of

the output clocks from the optical input signal is non-zero. In a traditional receiver,

this offset would be contributed by the TIA and limit-amplifier delay. In this case, it

is determined by the small-signal characteristics of the phase detector. If the sources

of this difference from the ideal case are independent of process and supply then all

instances of the PLL will experience the same offset and no skew will result. It there-

fore becomes important to characterize the source of the offsets accurately for each

phase detector topology considered in order to determine the impact.

This chapter analyzes the proposed current-steering phase detector in detail and

discusses basic operation, sources of phase offset, and silicon optoelectronics consid-

erations. The analysis is extended to include suggestions of other topologies with

different advantages and disadvantages.


4.1 Current-Steering Phase Detector

4.1.1 Basic Current-Steering Topology and Operation

The basic topology for the proposed current-steering phase detector is shown in Fig-

ure 4-1. The circuit provides the functionality of both a phase detector and a charge

pump by using the electrical feedback clock from the divided VCO output (EC) to

steer the current generated in the photodiode by the optical input (OC) on and off of

the loop filter. The circuit is similar to the charge-pump structures used in previous

works [2], but in this case the current sources are replaced with photodiodes and

controlled with optical input signals. The photodiodes are illuminated with the same

fifty-percent duty cycle optical reference clock, OC. We assume for the moment that

the loop filter is simply a capacitor. Although the actual filter will be more complex

in order to stabilize the PLL, this assumption simplifies visualization of the phase

detector operation and the intuition gained from this exercise is directly applicable to

the more complex filters. When the electrical clock generated by the feedback divider

(EC) is high, the current from the upper photodiode flows into the loop filter and

increases the output voltage, while the unity gain feedback buffer absorbs the current

from the lower photodiode. When the electrical clock goes low, the switch settings

are reversed and current flows out of the loop filter and decreases the output voltage.

Since charge is the integral of current, if we define P to be the percent of the optical

clock (OC) high period for which the electrical clock (EC) is also high, then the net

charge on the loop filter after a cycle is given by $Q = I_{CP}\,(T/2)\,(P - (1 - P))$, where T is the clock period. It follows

that the net change in loop filter voltage over one cycle is zero when the optical

and electrical signals are locked in quadrature and P=0.5. The feedback amplifier is

required to prevent the parasitic capacitance of the photodiodes from simply storing

the charge that should be steered away from the loop filter and delivering it through

charge sharing when the switches transition.
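The charge balance described above can be checked with a minimal numerical sketch. It is an illustration under stated assumptions, not part of the design flow: the loop filter is an ideal capacitor, the switches and amplifier are ideal, the optical clock is high for half of its period, and the names `net_charge_per_cycle`, `i_pd`, and `period` are hypothetical. The 10 µA and 5 ns values are illustrative, chosen to match the photocurrent and 200 MHz feedback clock quoted for the simulations later in this chapter.

```python
def net_charge_per_cycle(p, i_pd=10e-6, period=5e-9):
    """Net charge (coulombs) delivered to the loop filter in one cycle.

    p      : fraction of the OC high period during which EC is also high (0..1)
    i_pd   : photocurrent amplitude in amperes (illustrative assumption)
    period : optical clock period in seconds (illustrative assumption)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    t_high = period / 2.0                # OC is high for half the period
    q_up = i_pd * t_high * p             # charge steered onto the filter while EC is high
    q_down = i_pd * t_high * (1.0 - p)   # charge removed from the filter while EC is low
    return q_up - q_down


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P = {p:4.2f}  ->  Q = {net_charge_per_cycle(p) * 1e15:+7.2f} fC")
```

The net charge changes sign at P = 0.5, which is the quadrature lock point described in the text.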

In PLL analysis, the phase detector is characterized by the transfer function from

phase difference to average current. This model is a linearization of the actual phase

detector characteristics and is valid only when the loop has pulled the oscillator into

Figure 4-1: Basic current-steering phase detector topology and operation. (Waveforms shown versus time: EC, OC, ICP, VCP.)

the small-signal locking range, but this type of linearization is necessary for loop

dynamics analysis. The phase-current transfer function for this topology is shown in

Figure 4-2. The relationship between net charge per cycle and relative timing of EC

with respect to OC has just been established and the average current is simply the charge per cycle divided by the period, $Q/T$,

so it follows that $I_{AVG} = I_{PD}\,(1/2)\,(P - (1 - P))$. We define the two signals to have

zero phase error at quadrature. Therefore, the average current is zero when there

is no phase difference and reaches its maximum of $I_{PD}/2$ when the phase difference is

$\pi/2$. This analysis again assumes that the optical clock has fifty percent duty cycle.

Reducing or increasing the duty cycle of the optical clock will result in a maximum

phase detector gain of $D\,I_{PD}$.
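The linearized transfer function can be cross-checked by averaging idealized waveforms directly. The sketch below is a numerical illustration only. It assumes ideal square waves with fifty percent duty cycle, ideal current steering, and a sign convention (positive phase error moves the EC rising edge earlier) that is chosen here for convenience; the function name `average_current` and the 10 µA amplitude are hypothetical.

```python
import math


def average_current(phase_error_rad, i_pd=10e-6, steps=100_000):
    """Numerically average the steered photocurrent over one clock period.

    phase_error_rad : EC phase relative to quadrature, in radians; positive
                      values move the EC rising edge earlier (assumed convention)
    i_pd            : photocurrent amplitude in amperes (illustrative)
    steps           : number of midpoint samples across one period
    """
    # Time is normalized to one period. OC is high during [0, 0.5).
    # At quadrature the EC rising edge sits at 0.25, the middle of the OC high time.
    t_rise = 0.25 - phase_error_rad / (2.0 * math.pi)
    net = 0  # samples that source current minus samples that sink current
    for k in range(steps):
        t = (k + 0.5) / steps
        oc_high = t < 0.5
        ec_high = ((t - t_rise) % 1.0) < 0.5
        if oc_high:
            net += 1 if ec_high else -1
    return i_pd * net / steps


for phi in (-math.pi / 2, -math.pi / 4, 0.0, math.pi / 4, math.pi / 2):
    numeric = average_current(phi)
    closed_form = 10e-6 * phi / math.pi  # valid for |phi| <= pi/2 in this idealized model
    print(f"phi = {phi:+.3f} rad  numeric = {numeric * 1e6:+.3f} uA  "
          f"closed form = {closed_form * 1e6:+.3f} uA")
```

Within a quarter period of quadrature, the averaged current in this idealized model is linear in phase error and saturates at half the photocurrent amplitude.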

4.1.2 Sources of Phase Offset

The conceptual phase detector analysis implicitly assumes the availability of ideal

switches, photodiodes, and amplifiers by representing the circuit of Figure 4-3.1 with the

model of Figure 4-3.2. This simplification is appropriate and necessary for loop dynamics

Figure 4-2: Phase difference versus average current transfer function of current-steering phase detector. (Axes: average current versus phase difference.)

modeling, as it provides a good first-order model of the phase detector, which can be

described mathematically and used in LTI system analysis to characterize loop sta-

bility and damping. A closed-form mathematical description including nonidealities

would be prohibitively complicated, as many of the nonidealities are nonlinear with

respect to output voltage as well as phase difference. Furthermore, such a model is

unnecessary as the mathematical stability analysis is correct to first-order with the

simpler model and all higher-order effects are verified in circuit level simulations.

Nevertheless, it is important to understand the qualitative effect that each signifi-

cant nonideality will have on overall circuit performance. Simulations in the following

sections show that amplifier gain error and switch resistance collectively account for

the vast majority of second-order effects present in the phase detector structure. A

brief examination of Figure 4-3.1 reveals the origin of each contribution. The feedback amplifier

is intended to prevent unwanted charge sharing by holding each photodiode

parasitic capacitance at the output voltage when the electrical clock signal alter-

nately isolates each photodiode from the loop filter. The switches ideally provide

zero-resistance paths between the circuit components. The simplified circuit of Figure 4-3.2

does not model the effect of deviations from these idealized assumptions. The follow-

ing sections analyze how circuit performance changes when the amplifier gain is not

Figure 4-3: Simplification of phase-detector structure.

exactly unity and the switches have non-zero on-resistance.

Amplifier Gain Error

The unity gain feedback amplifier is included in the circuit so that both photodiode

parasitic capacitances will always be held at the output voltage and undesirable charge

sharing does not occur when EC changes state. If there is gain error in the feedback

amplifier, however, the photodiode capacitance will be held at a slightly different

voltage when the switch configuration isolates it from the output filter and this voltage

differential will result in charge sharing when the switches change state to short the

photodiode capacitance to the loop filter. The impact of any voltage differential

introduced by the amplifier is scaled by a factor related to the photodiode parasitic

capacitance and the loop filter capacitance. When the switch closes and the two

capacitances are shorted together, the voltage will change according to the basic

principles of charge sharing shown in Equation 4.1, which simplifies to Equation 4.2. These equations

assume that the damping resistance in series with the loop filter capacitor COUT is

zero because this does not alter the steady-state result of the charge sharing.

$$V_{OUT} + \Delta V_{OUT} = \frac{C_P\,(V_{OUT} + \Delta V_{AMP}) + C_{OUT}\,V_{OUT}}{C_P + C_{OUT}} \tag{4.1}$$

$$\Delta V_{OUT} = \frac{C_P\,\Delta V_{AMP}}{C_P + C_{OUT}} \tag{4.2}$$

Equation 4.2 clearly shows that as the parasitic capacitance approaches zero, the output

voltage is not affected by gain error because even large voltage differences on rela-

tively small capacitors will contribute very little charge. Conversely, as the parasitic

capacitance approaches infinity, any gain error will appear directly at the output

node. In a realistic implementation, the parasitic capacitance might be twenty-five

percent of the output capacitance and the influence of the gain offset would be scaled

accordingly. The choice of loop filter component sizing with respect to photodiode

capacitance for this design, discussed in Chapter 5, is consistent with this general

rule.
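As a worked illustration of the charge-sharing relations, the following sketch evaluates the output step for a parasitic capacitance equal to twenty-five percent of the output capacitance. The function names, the 800 fF output capacitance, and the 10 mV amplifier error are hypothetical values chosen only to make the arithmetic concrete; they are not design values from this work.

```python
def charge_sharing_step(c_p, c_out, dv_amp):
    """Change in output voltage when C_P is shorted to C_OUT (form of Equation 4.2)."""
    return c_p * dv_amp / (c_p + c_out)


def shared_voltage(c_p, c_out, v_out, dv_amp):
    """Final common voltage from charge conservation (form of Equation 4.1)."""
    return (c_p * (v_out + dv_amp) + c_out * v_out) / (c_p + c_out)


c_out = 800e-15        # illustrative output capacitance (assumption)
c_p = 0.25 * c_out     # parasitic capacitance at 25 percent of the output capacitance
dv_amp = 10e-3         # hypothetical 10 mV amplifier error

step = charge_sharing_step(c_p, c_out, dv_amp)
print(f"scale factor = {c_p / (c_p + c_out):.2f}, output step = {step * 1e3:.2f} mV")
```

With these assumed values, one fifth of the amplifier error appears at the output node each time the parasitic capacitance is reconnected.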

The steady-state gain error introduced by the amplifier will be identical for both

the upper and lower diode parasitic capacitances, but the amplifier may also have

different up and down slew rates. Each parasitic capacitance is shorted to the output

once per cycle. One parasitic capacitance is shorted to the output when EC changes

state while OC is high and the loop filter voltage is ramping. The other capacitance

is shorted to the output when EC returns to the starting state while OC is low and

the loop filter voltage is stable. Therefore, when the respective capacitors share their

charge with the output node, one will be set to the steady-state offset voltage and

the other will be set to either the up or down ramping error voltage.

The result of this charge sharing is that some fixed quantity of charge will be in-

jected onto the loop filter each cycle. If this quantity is positive, the loop filter voltage

will gradually increase if the inputs are in quadrature and the loop will therefore lock

with the electrical clock transition positioned slightly away from quadrature to allow

the loop to discharge for longer than it charges in order to obtain a steady-state loop

filter voltage. Therefore, any charge sharing due to amplifier nonidealities translates

directly into phase offset from quadrature and variations in the amplifier nonidealities

across temperature and process corners translate into skew.

In this implementation of the PLL, the feedback amplifier is implemented with

a simple open-loop unity-gain buffer. Replacing this circuit with a very high gain

amplifier configured in unity-gain feedback could nearly eliminate the steady-state

errors. However, the slew rate problem would not be eliminated, new stability con-

cerns would be introduced, and any variation of the feedback resistors across process

or temperature would still introduce skew between instances. This option is therefore

not obviously superior to the existing open-loop, unity-gain buffer.

Switch On-Resistance Error

Non-zero on-resistance of the CMOS switches also contributes to phase offset from

quadrature. When the optical signal is off and the feedback amplifier is holding one

of the photodiode parasitic capacitances at the steady-state output voltage, there

is no current through the switch connecting the two and the voltages are therefore

equal regardless of switch resistance. When the optical signal is on and the voltages

are ramping, however, the switches must carry the full photodiode current. With an

on-resistance of 1 kΩ, this will generate a voltage difference of 10 mV in addition to

the amplifier ramping error. 1 kΩ is approximately the resistance of a transmission

gate with a 1 µm NMOS and 3 µm PMOS, both with minimum channel length, in the

current process and at the biases expected for steady-state operation of this circuit.

Increasing the size of these switches will reduce the resistance, but this approach is

limited by the drive capability of the feedback divider. Introducing buffering stages

at the divider output also potentially introduces skew, so the advantage of increasing

switch size to the point where buffering is required is unclear.
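The voltage drop quoted above follows from Ohm's law, as the short sketch below shows. It assumes the 10 µA photocurrent used for the simulations later in this section and, as a first-order simplification introduced here, an on-resistance that scales inversely with switch width; the function names are hypothetical.

```python
def switch_voltage_drop(i_photo, r_on):
    """Voltage across a switch carrying the full photodiode current."""
    return i_photo * r_on


def scaled_on_resistance(r_on, width_factor):
    """First-order assumption: on-resistance is inversely proportional to width."""
    return r_on / width_factor


i_photo = 10e-6   # photodiode current in amperes
r_on = 1e3        # on-resistance of the baseline transmission gate in ohms

for width_factor in (1, 2, 3):
    drop = switch_voltage_drop(i_photo, scaled_on_resistance(r_on, width_factor))
    print(f"{width_factor}x width: drop = {drop * 1e3:.2f} mV")
```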

A brief consideration of the direction of current flows through the switches reveals

that, when OC is high and the voltages are ramping, the upper parasitic capacitance

voltage will be above the amplifier output voltage and the lower parasitic capacitance

will be below the amplifier output voltage. We will temporarily assume that the

amplifier itself has perfect unity gain in order to simplify this discussion because, in

any case, the errors contributed by the two sources may simply be summed to obtain

the total error. In this case, if the output voltage ramps up and then down in steady

state, the lower parasitic capacitance will be shorted to the output when its voltage

is below the output voltage and the upper parasitic capacitance will be shorted to

the output when the two voltages are equal. This will cause a net downward ramp

in output voltage if the electrical and optical signals are in perfect quadrature. The

reverse case, when the voltage ramps down and then up, clearly causes a net upward

ramp for two signals in quadrature. As with the error due to the amplifier gain error,

the effect of switch on-resistance is proportional to the capacitance ratios as described

by Equation 4.2.

Constructive and Destructive Summing of Errors

The sign of the offset due to switch on-resistance is dependent on whether EC and

OC are positioned such that the output voltage ramps in an up-down or down-up

order. The offset due to the amplifier gain error, however, has the same sign for both

cases. Therefore, the two will sum constructively for one charge-discharge order and

destructively for the other. The amplifier used in this phase detector implementation

exhibits a small positive gain error at the locked steady-state voltage. Therefore, if

the voltage ramps up and then down and the error introduced by the on-resistance

is negative, the two errors add destructively and could cancel each other if properly

ratioed. On the other hand, if the voltage ramps down and then up and the error

introduced by the on-resistance is positive, the two will add constructively to create

a faster upward ramp for quadrature signals.

In order to maintain generality, both cases of ramping order have been analyzed

and their differences compared. In fact, however, for each PLL topology only one

order is stable while the other is metastable. In this case, the VCO gain is negative

since increasing the control voltage lowers the output frequency. If the loop is locked

in quadrature with the voltage ramping up and then down and some factor slightly

slows the electrical feedback clock, the electrical signal in the next cycle will arrive

later in the on-period of the optical signal and allow it to ramp up longer than down,

thereby further increasing the control voltage and slowing the output clock. This is

positive feedback and therefore unstable. In contrast, if the PLL began in a locked

state ramping down and then up, the same slowing of the electrical clock would result

in negative feedback to speed it up again. The loop therefore locks to the down-up

charging order and the offsets add instead of canceling, but this is not necessarily

problematic since skew due to offset-variation, and not steady-state offset itself, is

the primary concern.

Matching to Idealized Model

In order to verify that these two sources account for the significant majority of the

phase offset from quadrature, simulations of the extracted phase detector containing

MOSFET switches and a real amplifier were compared to simulations of an idealized

phase detector containing perfect switches with a specified on-resistance and a voltage-

controlled voltage source with a non-unity gain. Figure 4-4 shows the results from

these simulations.

The solid line with no slow ramp shows the output of an "ideal" phase detector

with a perfect unity gain feedback amplifier and zero-resistance switches. The two

other solid lines show the TT/27 °C outputs of the real phase detector and an idealized

phase detector with an amplifier gain of 1.014 and on-resistance of 1.16 kΩ. The close

matching of these results clearly indicates that these two factors together account for

nearly all phase error. The dashed lines show the output of the real phase detector

over a variety of temperature and process corner conditions.

The rate of the ramp for quadrature inputs is proportional to the phase difference

required to maintain zero net change in the output during lock. The ramp rate vari-

ation evident in these results indicates that the presence of process and temperature

variations will introduce skew between instances of the PLL on distant parts of the

chip. Simulations of SS, FF, and TT corners resulted in the skew shown in Figure 4-5.

The 200 MHz divider feedback clock variation, instead of the output clock variation,

is shown because this is the signal that directly locks to the optical signal and because

the skew may be seen much more clearly on this time scale.

Figure 4-4: Ideal output, actual output at TT/27 °C and matched idealized output (solid lines), and actual output over FF/SS/100 °C (dashed lines). Horizontal axis: Time (s).

Figure 4-5: Skew for locked PLL across SS, FF and TT process corners. Plot title: Skew Across Process Corners (TT, SS, FF); horizontal axis: Time (ns).

Figure 4-6: Skew for locked PLL at 27 °C and 100 °C. Plot title: Skew Across Temperature (27 °C and 100 °C); horizontal axis: Time (ns).

These simulations used the full extracted PLL with parasitic capacitance at 27 °C,

a loop filter of 800 pF and 44 kΩ, and modeled the photodiodes as 10 µA pulses

with 200 fF parasitic capacitance. The skew between the SS/FF corners due to the

combination of amplifier gain error and switch resistance is 450 ps, with the TT corner

roughly centered between the two. Although a single chip would not likely contain

both process extremes, this range is large enough that the skew across opposite ends

of the chip might easily exceed 10 percent of the 625 ps period.

Figure 4-6 shows the skew generated when temperature is varied from 27 °C to

100 °C. This condition is more likely to occur within a chip and, as shown in the

figure, produces 114 ps of skew between the two divider feedback clocks.
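To put these skew figures in context, the sketch below expresses them as fractions of the 625 ps output clock period. It is simple arithmetic on the numbers reported above; the names `skew_fraction` and `PERIOD_PS` are hypothetical.

```python
PERIOD_PS = 625.0  # output clock period quoted in the text, in picoseconds


def skew_fraction(skew_ps, period_ps=PERIOD_PS):
    """Return skew as a fraction of the clock period."""
    return skew_ps / period_ps


for label, skew_ps in (("SS/FF process corners", 450.0), ("27 C versus 100 C", 114.0)):
    print(f"{label}: {skew_ps:.0f} ps = {100 * skew_fraction(skew_ps):.1f}% of the period")

budget_ps = 0.10 * PERIOD_PS  # a 10 percent skew budget expressed in picoseconds
print(f"10% budget = {budget_ps:.1f} ps")
```

Both simulated values exceed a ten percent budget of 62.5 ps, which is consistent with the concern raised in the text.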

The results just presented include the effects of process and temperature varia-

tion on the switch resistance and the amplifier gain error. It is also important to

understand what percentage of this total skew is introduced by each of these two

sources. Therefore, another set of simulations was completed to isolate the effect of

switch resistance and amplifier gain error variation across the SS and FF corners. In

order to obtain these results, the phase detector was removed from the layout and

the new layout was extracted with parasitic capacitance and pins for attaching an

external schematic-view phase detector. Two versions of this external phase detector

were created. One phase detector was implemented with CMOS switches but the

feedback amplifier was replaced with a voltage-controlled voltage source with perfect

unity gain. Another phase detector was implemented with the real amplifier but the

CMOS switches were replaced with ideal switches with 1 kΩ resistance. Simulation

convergence issues prevented the use of switches with zero resistance, but the use of

switches with 1 kΩ resistance is completely acceptable for this test because the resistance

is not process dependent and will therefore introduce only steady-state phase

offset and not skew.

Figure 4-7 shows the results of the corner simulations with an ideal amplifier and

real CMOS switches. Figure 4-8 shows the results of the same corner simulations with

the real amplifier and ideal 1 kΩ switches. This result clearly shows that the switch

on-resistance is responsible for nearly all of the skew generated across process corners.

The data presented in these figures shows that the skew is 447 ps across SS/FF with

an ideal amplifier and real switches but only 38 ps with a real amplifier and ideal

switches. A more process-independent amplifier could be designed to reduce the

38 ps skew introduced by that circuit, but further analysis in this work will focus on

the effect of reducing skew by decreasing switch on-resistance. The sum of these two

skew components is not identical to the results obtained for the complete extracted

phase detector, but the difference is relatively small and can be explained by the fact

that the phase detector parasitic capacitance was not extracted for the later set of

simulations.
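The comparison between the isolated components and the complete phase detector can be made explicit with a few lines of arithmetic. The sketch assumes, purely for illustration, that the two skew contributions add linearly; the variable names are hypothetical and the numbers are the simulated values reported above.

```python
switch_only_ps = 447.0      # real switches, ideal amplifier
amplifier_only_ps = 38.0    # ideal switches, real amplifier
full_extracted_ps = 450.0   # complete extracted phase detector

linear_sum_ps = switch_only_ps + amplifier_only_ps     # assumes simple addition
discrepancy_ps = linear_sum_ps - full_extracted_ps     # gap relative to the full simulation
switch_share = switch_only_ps / linear_sum_ps          # portion attributed to the switches

print(f"linear sum = {linear_sum_ps:.0f} ps, discrepancy = {discrepancy_ps:.0f} ps")
print(f"switch share of summed skew = {100 * switch_share:.1f}%")
```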

As discussed earlier in the chapter, increasing the width of the switches proportion-

ally decreases the switch on-resistance. This method cannot be extended indefinitely

as the CMOS switch parasitic capacitances will eventually become large enough that

the feedback divider outputs require significant buffering and the parasitic capacitance

also begins to interfere with the loop filter and PLL dynamics. However, moderate

increases in the switch size are possible. In order to determine the skew reduction attainable

Figure 4-7: Skew over SS/FF corners with ideal amplifier and real CMOS switches (solid lines) and optical reference (dashed line). Plot title: Skew Across Corners SS/FF: Real Switches-Ideal Amplifier; horizontal axis: Time (ns).

Figure 4-8: Skew over SS/FF corners with real amplifier and ideal switches (solid lines) and optical reference (dashed line). Plot title: Skew Across Corners SS/FF: Ideal Switches-Real Amplifier; horizontal axis: Time (ns).

Figure 4-9: Skew over SS/FF corners with ideal amplifier and triple-sized CMOS switches (solid lines) and optical reference (dashed line). Plot title: Skew Across Corners SS/FF: Triple-Size Real Switches-Ideal Amplifier; horizontal axis: Time (ns).

by this method, the corner simulations were repeated with an ideal feedback

amplifier and transmission gates three times larger than those used in the original

phase detector. The results of this simulation are shown in Figure 4-9 and the skew

between the SS/FF corners is 275 ps. This is clearly an improvement from the earlier

result, but the improvement is not proportional to the increase in switch width. This

could be partially due to increases in parasitic capacitances and the resulting increase

in the feedback clock rise time, an effect which could be reduced by buffering the di-

vider feedback clock, but the buffers themselves would increase skew so this potential

solution should be approached with caution. In either case, it is unlikely that the

switch size can practically be increased enough to reduce skew to reasonable levels in

this process.
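The gap between proportional scaling and the simulated result can be quantified as follows. The sketch assumes, as a first-order idealization introduced here, that skew scales with on-resistance and therefore inversely with switch width; the function name is hypothetical and the skew values are those reported above.

```python
def skew_if_proportional(baseline_ps, width_factor):
    """Skew expected if it scaled inversely with switch width (idealized assumption)."""
    return baseline_ps / width_factor


baseline_ps = 447.0          # real switches, ideal amplifier
simulated_triple_ps = 275.0  # triple-size switches, ideal amplifier

expected_ps = skew_if_proportional(baseline_ps, 3)
achieved_factor = baseline_ps / simulated_triple_ps

print(f"proportional estimate = {expected_ps:.0f} ps, simulated = {simulated_triple_ps:.0f} ps")
print(f"achieved reduction factor = {achieved_factor:.2f} for a 3x width increase")
```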

Table 4.1 shows a summary of the skew for each of the above simulation pairs.

These results do not compare favorably with the 26 ps skew obtained in [5] by pas-

sive fuse-based electrical deskew methods at 1.5 GHz, but the disparity may not be

Simulation Description                                          Simulated Skew
TT/SS/FF corners - Real switches and real amplifier             450 ps
27 °C - 100 °C - Real switches and real amplifier               114 ps
SS/FF corners - Real switches and ideal amplifier               447 ps
SS/FF corners - Ideal switches and real amplifier               38 ps
SS/FF corners - Triple-size real switches and ideal amplifier   275 ps

Table 4.1: Summary of skew sources.

quite as large as it appears at first glance. The reported results use a faster process

and consider only the actual variations across the chip, not the worst-case corners.

However, even these factors cannot realistically account for the entire performance dif-

ference. In addition, these simulations modeled the photodiodes as idealized current

sources with capacitors in parallel. This model neglects transit time and photodiode

matching, which must also be analyzed.

Silicon Optics Considerations

A few of the relevant challenges of silicon integrated optics are discussed in Chapter 3.

Many of these nonidealities will also shift the steady-state lock position away from

quadrature. If, for example, the bottom diode provides twice as much current as the

top diode, then the electrical clock must divide the optical signal such that current is

added to the loop filter for 67 percent of the cycle and subtracted for 33 percent of the

cycle. Variations in transit time will also significantly affect the equal charge division

point. In order to maintain a steady-state voltage as the transit time increases, the

electrical signal must divide the optical signal farther to the right on the time axis, a

fact which may be easily observed in Figure 3-5. These effects are difficult to analyze

and, for all intents and purposes, impossible to simulate within a traditional circuit

simulation environment. Furthermore, integrated optical waveguide technology is still

an emerging field and obtaining equal power splitting is still a challenge. Splitting

ratios of 49/51 percent or better have been reported, but even such a small difference

introduces some error and even ideal photodiodes would produce different currents

due to power mismatch [20].
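The shift in lock point caused by unequal photocurrents can be estimated from the zero-net-charge condition. The sketch below is a first-order illustration that ignores transit time, amplifier error, and switch resistance. The function names are hypothetical, and the 2.5 ns optical high time assumes a 200 MHz optical reference with fifty percent duty cycle.

```python
def charge_fraction(i_top, i_bottom):
    """Fraction of the OC high period spent charging so that net charge is zero.

    Solves i_top * p = i_bottom * (1 - p) for p.
    """
    return i_bottom / (i_top + i_bottom)


def edge_shift(i_top, i_bottom, t_high):
    """EC edge displacement from quadrature, in seconds, for OC high time t_high."""
    return (charge_fraction(i_top, i_bottom) - 0.5) * t_high


T_HIGH = 2.5e-9  # assumed OC high time: half of a 5 ns (200 MHz) period

print(f"2:1 mismatch: charge for {100 * charge_fraction(10e-6, 20e-6):.0f}% of the OC high time")
print(f"49/51 split:  edge shift = {edge_shift(0.49, 0.51, T_HIGH) * 1e12:.1f} ps")
```

In this idealized model, even a 49/51 power split moves the lock point by tens of picoseconds at the assumed reference frequency.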

Figure 4-10: Phase difference for locked PLLs at TT/27 °C with 10 µA/20 µA versus 10 µA/10 µA differential current mismatch. Plot title: Skew With Diode Mismatch: 10 µA Top Diode - 20 µA Bottom Diode; horizontal axis: Time (ns).

We will consider two types of photodiode mismatch. First, there is mismatch

due to the parasitic diodes, which we will call "differential current mismatch". In

this case, the top diode in a particular instance of a phase detector has a different

current than the bottom diode in the same phase detector. If two instances of the

phase detector have different differential current mismatch then clearly there will

be skew between the two generated clocks. This is a predictable, first order effect.

Second, skew will be introduced if both photocurrents in one instance of a PLL are

proportionally scaled with respect to both photocurrents in another PLL, a condition

that we will call "common-mode current mismatch". For example, in a given pair

of PLLs, both diodes in one phase detector might produce 10 µA and both diodes

in the other phase detector might produce 12 µA. In this case, skew is generated as a result

of second-order effects, such as the feedback amplifier ramping nonidealities and the

variation of the voltage drop across the switch resistances with varying current.

Accurate determination of the differential current mismatch requires a more exact

process description than that which is available. However, given that the NWELL

is only about 1-2 µm deep while the absorption length of silicon is on the order of

10 µm, it is reasonable to estimate that the parasitic photodiode would produce the

same current as the intentional NWELL diode. This would result in a system in

which the top diode produced 10 µA and the bottom diode produced 20 µA. If this

ratio were constant across all instances of the PLL, the result would not be skew, only

common-mode phase offset from quadrature. Unfortunately, characterizing the prob-

able variation corners for differential mismatch is even more complex than the initial

estimate of differential mismatch magnitude, requiring knowledge of such parameters

as NWELL depth variation and NWELL and substrate doping concentrations. Since

realistic estimates of variation are not possible, the system was simulated over the

variation range of 10 µA/20 µA versus 10 µA/10 µA in order to determine the phase

offset between the ideal case and the estimated differential current mismatch case

and gain a somewhat quantitative understanding of how smaller variations around

the operating point might contribute to skew. Figure 4-10 shows the results of the

simulations and indicates that there will be 500 ps phase difference between the two

cases.

This result reports a common-mode phase difference and would be impractically

pessimistic as a worst-case skew result. However, it is difficult to determine what

might be a possible worst-case amount of skew generated by this effect. The diode

mismatch should be relatively constant across the chip, meaning that one chip would

certainly not contain both the 10 µA/10 µA and 10 µA/20 µA cases compared here.

However, as discussed in Chapter 3, the ratio might vary due to factors such as

NWELL depth mismatch, doping mismatch and other process variations. Simulations

that consider these factors are not possible within an IC design flow and predicting

the actual skew they introduce is therefore not practically possible.

Common-mode current mismatch could be produced by optical power mismatch

and variations in photodiode sensitivity due to process and temperature. Again, it

is difficult to estimate the potential variation introduced by these sources so an arbi-

trary mismatch percentage was simulated. Figure 4-11 shows the skew generated by

Figure 4-11: Skew for locked PLLs at TT/27 °C with 10 µA versus 12 µA common-mode current mismatch. Plot title: Both Diodes with 10 µA v Both Diodes with 12 µA; horizontal axis: Time (ns).

common-mode mismatch between two PLLs in which one has two 10 µA photodiodes

and the other has two 12 µA photodiodes. It is reasonable to estimate that two sets of

diodes on opposite corners of a chip might have such a worst-case current difference.

Both simulations were conducted at 27 °C in the TT corner and the resulting skew

was 34 ps. As expected, the skew introduced by this second-order effect is relatively

insignificant compared with the skew potentially introduced by the first-order effect

of differential current mismatch.

Because characterization of the variations of optical components was not possible,

this analysis of phase offset and skew is based on highly speculative estimates of po-

tential levels of optical current mismatch. For this reason, these results are presented

only for their value in approximate determination of the amount of skew potentially

introduced by a given quantity of current mismatch. They are not based on actual

knowledge of the effect of corners and temperature on photodiodes and should not

be interpreted as the predicted skew for this particular PLL implementation.

4.2 Extensions of Current Steering Topology

The sources of skew in the phase detector can be divided into the two main cate-

gories of photodiode mismatch and variation in the switches and feedback amplifier.

Choosing a topology that requires only one photodiode could mitigate the first of

these problems, but such topologies may introduce other sources of skew.

4.2.1 Current Mirrors

Using current mirrors is one obvious idea that comes to mind when attempting to

make a similar phase detector using only one photodiode. The topology shown in Fig-

ure 4-12 appears, at first glance, to provide a solution whereby the single photocurrent

is perfectly copied thereby eliminating the problem of diode mismatch. However, the

mirroring is not symmetrical since the UP current is mirrored once and the DOWN

current twice, which introduces a delay between the two currents. Furthermore, both

currents vary with output voltage. The voltage dependence could be somewhat im-

proved by using cascode current sources, at the expense of decreased headroom for

the remainder of the phase detector. However, when the current begins to pulse on

and off during normal operation, other problems become evident. The original phase

detector used a certain amount of the parasitic capacitance as a necessary part of the

loop filter and the photocurrent was divided between the loop filter and the parasitic

capacitance according to the capacitance ratio. In the mirrored case, however, the

current through the first leg of the mirror is simply a function of the voltage on the

photodiode capacitance, which effectively creates an integrator. The mirrored cur-

rent will therefore ramp up and down in a triangle wave as the photocurrent proceeds

through the square wave pattern.

This additional integrator alone might not necessarily be a problem. The phase-

to-current conversion ratio is reduced, but adjustments to the loop filter will correct

for this change. The larger concern, however, is a possible increase in skew. The

sources of skew inherent in the phase detector itself are not eliminated by this topol-

ogy modification since this circuit still contains the same basic phase detector core.

Figure 4-12: Current mirror approach.

In addition, whereas previous analysis of the skew introduced by process variation

modeled the photodiodes as perfect current sources, the current mirrors add another

level of circuitry affected by process and temperature.

Therefore, while the current mirror does eliminate the problem of differential pho-

todiode mismatch, it introduces new sources of skew to a topology that already had

too many. Implementing this additional circuitry to correct for photodiode mismatch

when inherent phase detector mismatch already exceeds the specifications is mis-

guided. It is more productive to take a step back from this topology and explore the

design space for a topology with fewer inherent sources of timing mismatch.

4.2.2 Photodiode in Feedback

It is worthwhile to examine the topology shown in Figure 4-13 [21]. This circuit

partially eliminates the ramping effect that occurs in the current mirror topology.

When the photodiode is illuminated, the increased photocurrent decreases the voltage

on the connected NMOS gate and therefore increases the output voltage. This voltage

is capacitively divided and fed back to the NMOS device biasing the photodiode,

which increases the current to match the photodiode current and hold the photodiode


Figure 4-13: Feedback amplifier approach.

voltage steady. For a step increase in photocurrent, the diode parasitic capacitance is

therefore able to reach a steady-state voltage much faster. This circuit, as shown here,

was originally intended as a direct current-voltage amplifier. Instead, the current from

the bias stage could be mirrored into the phase detector.

This topology reduces the ramping problem, but does not solve other problems.

The inherent phase detector skew remains and, as in the simple mirror case, the new

circuits add more potential sources of skew. In addition, the feedback configuration of

this amplifier requires analysis of damping and stability. The complete analysis shows

that it is not possible to obtain much gain from the structure without accepting a

certain amount of gain peaking [21]. Effectively, this topology minimizes the ramping

problem at the expense of introducing a stability problem while keeping the other skew

sources of the current mirror approach and core phase detector relatively constant.

This topology is preferable to the current mirror scheme if increasing speed is a

primary concern, but it does not provide any skew advantages.


4.3 Topologies with Alternate Phase Detector Cores

All of the topologies described in the previous section use the same basic phase detec-

tor core with various modifications. Therefore, they all retain at least the minimum

skew generated in that core due to amplifier gain error and on-resistance. Using a

high-gain op-amp in feedback would lower the gain error, at the expense of some

additional complexity and stability concerns, but the on-resistance problem is much

more fundamental. In addition, adding mirror devices may improve the photodiode

mismatch problems but, in the process, introduces more possible skew sources. Ex-

amining topologies that depart from this phase detector core structure may prove

more promising.

4.3.1 Bang-Bang Phase Detector

The topologies considered until now are all linear phase detectors. That is, the error

signal is linearly proportional to the phase difference. A bang-bang phase detector, in

contrast, simply generates a constant magnitude signal indicating that the feedback is

either early or late with respect to the reference. Bang-bang phase detectors of various

topologies are commonly used in clock and data recovery (CDR) applications.

A possible circuit implementation of an optical-electrical bang-bang phase detector

is shown in Figure 4-14. This circuit bears some resemblance to an electrical current

integrator circuit presented in [22], but there are also significant differences. Instead of

a photodiode, the original circuit employed an electrical current source with relatively

limited parasitic capacitance, so the issues of charge sharing and capacitor reset were

not as centrally important. Replacing the current source in the original circuit with

a photodiode provides the capability of steering the optical current from a single

photodiode onto two separate capacitors and comparing their voltages to determine

the relative phase of the optical and electrical clocks.

During the reset phase, both capacitors are shorted to the supply voltage and the

voltage across the terminals is reset to zero. Then, during the sample phase, each

capacitor is discharged by an amount proportional to the time that the feedback signal


Figure 4-14: Bang-bang phase detector approach.

is high. Ideally, the capacitor voltages would be discharged by identical amounts over

a single cycle when the optical and electrical signals were in perfect quadrature. A

series of amplifiers and latches, described in the reference and carefully designed to

minimize systematic offsets, is used to generate a full-swing signal indicating early

or late feedback signal arrival based on the determination of which capacitor was

discharged for longer.
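
The integrate-and-compare decision described above can be illustrated with a minimal behavioral sketch. The function name, the sign convention (+1 for an early edge), and the default component values (10 µA photocurrent, 100 fF storage capacitors, 1.8 V supply) are assumptions chosen for illustration; the amplifier and latch chain is idealized as a simple comparison.

```python
def bang_bang_decision(edge_time, pulse_width, i_photo=10e-6,
                       c_store=100e-15, vdd=1.8):
    """Classify the electrical edge as early (+1), late (-1), or centered (0).

    edge_time   -- time of the electrical clock edge, measured from the start
                   of the optical pulse (seconds)
    pulse_width -- duration of the optical pulse (seconds)
    All default values are hypothetical.
    """
    if not 0.0 <= edge_time <= pulse_width:
        raise ValueError("edge must fall inside the optical pulse")
    # Both storage nodes start at VDD after the reset phase.
    # Node A integrates the photocurrent before the edge, node B after it.
    v_a = vdd - i_photo * edge_time / c_store
    v_b = vdd - i_photo * (pulse_width - edge_time) / c_store
    if v_a > v_b:
        return +1  # node A was discharged for less time: edge arrived early
    if v_a < v_b:
        return -1  # node B was discharged for less time: edge arrived late
    return 0
```

The output carries only the sign of the timing error, so an edge that is slightly early and one that is far too early produce the same result. This is the loss of linear phase information discussed in the text.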

The requirement that various nodes be reset each cycle introduces complexities

that are not present in the originally proposed phase detector. In addition, the

mismatches associated with switch on-resistance are not significantly changed. In

light of these facts, and given that this circuit gives up the linear phase detection

characteristic of the other topology, it does not provide a significant advantage.

Charge Sharing and Parasitic Node Voltage Reset

In order for the first-order model of quadrature phase lock to be applicable, the

parasitic capacitance of the photodiode must be reset to VDD when the electrical

clocks switch in the center of the optical pulse. If there were no mid-cycle voltage

reset, the circuit had perfect quadrature optical/electrical inputs, and all internal


nodes were initially precharged to VDD, then one node would begin to discharge

when the optical signal went high. After half of the optical high period, the electrical

clocks would switch. At this time, the parasitic capacitance would have some voltage

below VDD and this charge would immediately share with the newly switched node and

effectively initialize it to some voltage lower than VDD. Since the signals are in perfect

quadrature, this second node will now be discharged for the same amount of time as

the first node and will therefore have a final voltage lower than the first node by

an amount proportional to the ratio of parasitic and storage capacitance. Given the

lower voltage initialization of the second node, the first node must remain higher for

longer than the second to generate equal voltage on the two outputs.
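
The charge-sharing argument can be checked with a short first-order calculation. The sketch below assumes ideal switches, a constant photocurrent, and a voltage-independent parasitic capacitance; the function name and the example values in the usage note are hypothetical.

```python
def final_node_voltages(c_par, c_store, i_photo, half_pulse, vdd=1.8,
                        reset=False):
    """Return (v1, v2), the two storage-node voltages after one optical pulse.

    c_par      -- photodiode parasitic capacitance (F)
    c_store    -- storage capacitance on each node (F)
    half_pulse -- half of the optical high time (s); with quadrature inputs
                  each node is connected to the photodiode for this long
    reset      -- if True, the parasitic node is returned to VDD at the switch
    """
    # Discharge of (storage + parasitic) capacitance during one half pulse.
    dv = i_photo * half_pulse / (c_store + c_par)
    v1 = vdd - dv
    # Voltage left on the parasitic capacitance when the clocks switch.
    v_par = vdd if reset else v1
    # Charge sharing between the parasitic and the second node (still at VDD).
    v2_start = (c_store * vdd + c_par * v_par) / (c_store + c_par)
    v2 = v2_start - dv
    return v1, v2
```

Called with, for example, `c_par=200e-15`, `c_store=800e-15`, `i_photo=10e-6`, and `half_pulse=1.25e-9`, the model leaves the second node lower than the first by `dv * c_par / (c_store + c_par)` when no reset is applied, and leaves the two nodes equal when `reset=True`.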

There are also asymmetries introduced because the capacitance of the photodiode

is variable with voltage. If the voltage were not reset, the voltage variable capacitance

of the photodiode would discharge over different average voltage ranges for the two

sides and therefore present different average capacitance. This would clearly lead to

a differential in the required active time of each side.

It is therefore necessary to reset the parasitic capacitance to VDD at switch time.

Because there is only one clock edge available, a pulse-based approach must be used.

A relatively large PMOS could be connected to the photodiode node and driven by a short ON pulse

occurring just at the edge of the electrical feedback clock. This pulse could be created

by feeding the electrical clock into an inverter chain and then taking the XOR of the

output and the original signal, a technique commonly employed in various types of

high-performance pulse registers.
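
A minimal discrete-time sketch of this delay-and-XOR pulse generator follows. It models the inverter chain as a pure, non-inverting delay of a whole number of samples (an assumption; a chain with an odd number of inverters would call for an XNOR instead), and the function name is hypothetical.

```python
def edge_pulses(clock, delay):
    """XOR a sampled clock with a copy of itself delayed by `delay` samples.

    clock -- list of 0/1 samples of the electrical clock
    delay -- inverter-chain delay expressed in samples (non-negative int)
    """
    if delay < 0 or delay > len(clock):
        raise ValueError("delay must be between 0 and len(clock)")
    # Delayed copy: hold the first sample for `delay` steps, then follow.
    delayed = [clock[0]] * delay + clock[:len(clock) - delay]
    # The two signals differ only for `delay` samples after each transition.
    return [a ^ b for a, b in zip(clock, delayed)]
```

In this model a pulse appears after both the rising and the falling edge, and its width equals the chain delay, which is why the process and temperature dependence of the inverter delay maps directly onto the reset pulse width.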

These reset techniques do not completely eliminate the phase offsets. The reset

pulse cannot occur until after the first node is isolated from the photodiode by the

electrical clock; earlier reset would contaminate the data on that storage node. There-

fore, the time required for the reset pulse will be taken from the half period in which

the second node is discharged by the photodiode. The second node will therefore

be discharged slightly less than the first for perfect quadrature inputs, driving the

PLL to lock with the electrical clock switching before the prescribed time in order to

compensate and equalize the voltages.


Initialization Reset Timing

After each evaluation cycle, the differential voltage on the two capacitors must be

sampled and all internal voltages must be reset to VDD. Ideally, this evaluation and

reset would occur just after the optical reference goes low. However, the optical

clock input is small-signal, not full-swing logic level, and cannot therefore be used for

timing events. Once the loop is in lock, an additional fully electrical PLL or DLL

could be used to generate an electrical clock in phase with the optical clock. Since,

however, this clock would be referenced to the electrical feedback clock, it is just as

effective to use the edge of the electrical clock that occurs during the optical low

half-cycle in steady-state. However, during the locking transient of the PLL, the phase

relationship of the optical and electrical clocks is undefined. Therefore, during the

settling time, the reset signal will likely be asserted many times in the middle of the

optical high half-cycle. There is no way to avoid this situation, given that the relative

position of the optical clock is unknown during this period, but it will likely have a

significant and analytically complex effect on loop dynamics. A complete analysis of

these effects would be required prior to the implementation of a PLL with this type

of phase detector.

Other Potential Sources of Phase Offset

This topology also has switched small-signal currents, which means that the on-

resistance problems encountered in the previous topologies will also be a concern

here. The problem of the feedback amplifier gain error is eliminated since nodes are

simply reset to VDD, but the first stage differential amplifier offset is also a concern.

Susceptibility to Skew

While it initially appears that the choice of a bang-bang phase detector would allow

enough circuit simplification to minimize potential sources of skew, there are still

several issues to consider. The differential amplifier may have process, mismatch,

and temperature dependent input offset leading to a different definition of when the


capacitor voltages are equal. The delay in the inverter chains in the pulse-based reset

circuit will be temperature and process dependent and the skew will be a function

of this pulse length. The on-resistance of the switches will vary with process and

temperature. In short, the majority of the main skew sources in the original phase

detector have translated to this circuit.

4.4 Conclusions

Many topologies have been explored, but none provide the precision and insuscepti-

bility to process and temperature that is required of a receiver for high-speed clocking

applications. On-resistance, amplifier gain error, inverter delay, and numerous other

parameters vary so significantly with these process and temperature variations that

it is extremely difficult to find a topology that is not affected by the skew poten-

tially introduced by these sources. At first investigation, many of these topologies

appear quite promising, but the same sources return over and over again to intro-

duce skew. TIA approaches are traditionally criticized for being overly susceptible

to skew and jitter. Perhaps the examination of these topologies simply shows that

it is not the TIA topology, but rather the small-signal to logic-level conversion that

is inherently prone to skew and jitter introduction. In this case, simply finding a

topology that appears simple and elegant at first glance may not solve the problem.

Instead, attention to designing TIA structures or small-signal phase detectors that

contain explicit calibration for mismatch and variation will be required to provide an

acceptable optical-electrical conversion solution.


Chapter 5

PLL Loop Dynamics and Complete Circuit Simulations

The PLL dynamics are of primary importance in designing a functional and reliable

optical clocking system. The analysis of PLL operation requires careful considera-

tion of both large-signal characteristics such as locking range and small-signal loop

stability and damping. These general concerns are relevant to the design of all PLL

topologies, but the specific analysis of the effects is dependent on loop type and order

and therefore varies significantly for PLLs with different phase or phase-frequency

detectors and loop filters. The analysis in this chapter will focus on a PLL using

the original Type II charge steering optoelectronic phase detector and a second-order

loop filter.

5.1 Optical PLL Analysis

The two most critical design criteria for the dynamics of the optoelectronic PLL are

acquisition range and linearized loop stability and damping. Qualitative illustrations

of these two possible failure modes, along with an illustration of idealized proper

operation, are shown in Figure 5-1. Normal operation of a well-damped loop within

the acquisition range is shown in the top frame; in this case the PLL frequency

initializes within the acquisition range and the well-damped loop drives the frequency


to lock with minimal ringing. The second frame shows an illustration of a severely

underdamped PLL; the loop initializes within the acquisition range but, instead of

locking, simply oscillates around the desired control voltage. The third frame shows

an illustration of a waveform that the PLL might generate if the loop acquisition

range were much smaller than the VCO tuning range and the VCO initialized to a

frequency far from the reference; the PLL produces a sinusoidal waveform at the beat

frequency of the feedback waveform and the reference, which never approaches the

control voltage that would be required to match the frequencies. Obviously, careful

analysis of acquisition range and stability is required to guarantee functionality of

the PLL. A complete and concise tutorial on PLL design and stability is presented in

[8].

For a phase detector PLL, the acquisition range and stability are determined by

the loop filter transfer function, VCO gain, divider ratio, and phase detector gain.

In addition, range and stability are typically affected in opposite ways by changes

to these parameters such that improvements in one will require compromises in the

other. The following analysis sections will first analyze the factors that determine lock

range and stability and then present simulated results of the PLL demonstrating that

the loop is stable and capable of locking for loop filter voltages initialized throughout

the possible range.

5.1.1 Acquisition Range

The acquisition range of the PLL must encompass the entire VCO range or the

control voltage may initialize outside of the acquisition range and cause the loop to

fail to lock. Quantitative calculations of acquisition range are extremely impractical

because, while the loop is acquiring lock, the phase relationship of the two signals

cycles through a wide range and cannot be linearized around the lock point to create an

LTI system model for analysis. In certain specific cases, various simplifications allow

relatively accurate approximate analytical determination of the range [8].

Qualitative understanding of the acquisition process, however, is much more ac-

cessible. Both phase and phase-frequency detectors have nonlinear transfer functions

Figure 5-1: Well-damped, overdamped, and underdamped loop dynamics. (Panel titles: Well Damped, Over Damped, Under Damped; horizontal axis: Normalized Time.)

when considered across the entire range of phase offset. Figure 5-2 compares the

transfer functions of the optoelectrical phase detector and a PFD. While both ex-

hibit nonlinear characteristics, the PFD provides negative feedback through the en-

tire phase range while the simple phase detector reverses sign and provides positive

feedback if the phase is too far from lock.
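
The sign reversal can be made concrete with idealized characteristics. The sketch below assumes a triangular average-current characteristic for the simple phase detector and a linear characteristic over $\pm 2\pi$ for the PFD; both shapes, the normalization to a unit peak, and the function names are illustrative assumptions rather than extracted circuit behavior.

```python
import math


def pd_current(phase_error, i_peak=1.0):
    """Idealized simple phase detector: triangular, period 2*pi.

    Zero output and positive slope at phase_error = 0 (the lock point).
    """
    x = (phase_error + math.pi / 2) % (2 * math.pi)  # x lies in [0, 2*pi)
    if x < math.pi:
        return i_peak * (2 * x / math.pi - 1)  # rising half: correct sign
    return i_peak * (3 - 2 * x / math.pi)      # falling half: slope reversed


def pfd_current(phase_error, i_peak=1.0):
    """Idealized PFD: linear over the open interval (-2*pi, 2*pi)."""
    if abs(phase_error) >= 2 * math.pi:
        raise ValueError("static characteristic defined only within +/-2*pi")
    return i_peak * phase_error / (2 * math.pi)
```

For a phase error of about 4.4 rad, for instance, `pfd_current` still returns a positive value, whereas `pd_current` returns a negative one, so the simple detector pushes the loop away from the nearest lock point.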

If the VCO control voltage initializes far outside the acquisition range and the

feedback and reference frequencies are dramatically different, the phase relationship

rapidly cycles through the portions of the phase detector range that alternately push

the loop towards lock and away from lock. This behavior generates a sinusoid on

the VCO control voltage with a frequency determined by the beat frequency of the

feedback waveform and the reference waveform. Signals near lock will generate slow

sinusoids while those far from lock generate fast sinusoids. Furthermore, the loop

filter is driven by a fixed-current charge-pump type structure so the magnitude of

the sinusoid is proportional to the period of the error signal. Explained qualitatively, if

the two frequencies are very close together, the phase relationship changes slowly and

Figure 5-2: Typical characteristics of a phase detector (top) and a phase-frequency detector (bottom). (Horizontal axis: Phase Difference.)

the phase detector remains in the positive gain region for a long time and increases

the loop filter voltage significantly before transitioning to the negative region, and

vice versa. If, however, the frequencies are far apart, the phase relationship changes

rapidly and the phase detector will produce a few UP pulses followed by a few DOWN

pulses and the resulting error signal will have small magnitude.
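
The inverse relationship between frequency offset and error-signal magnitude can be sketched numerically. The model below is a deliberate simplification: the loop filter is reduced to a single ideal capacitor, the phase detector is a unit triangle wave in phase, and the frequency offset is held constant (no loop action). The function names and default values are hypothetical.

```python
def triangle(cycles):
    """Unit triangle wave with period 1 in `cycles`, starting at -1."""
    x = cycles % 1.0
    return 4 * x - 1 if x < 0.5 else 3 - 4 * x


def beat_ripple_pp(delta_f, i_peak=5e-6, cap=1e-12,
                   steps_per_beat=2000, beats=3):
    """Peak-to-peak voltage on `cap` for a fixed frequency offset delta_f."""
    dt = 1.0 / (delta_f * steps_per_beat)
    v = v_min = v_max = 0.0
    for n in range(steps_per_beat * beats):
        t = (n + 0.5) * dt  # midpoint rule
        # The phase error advances by one full cycle per beat period.
        v += i_peak * triangle(delta_f * t) * dt / cap
        v_min = min(v_min, v)
        v_max = max(v_max, v)
    return v_max - v_min
```

Under these assumptions the peak-to-peak excursion is approximately `i_peak / (4 * cap * delta_f)`, so a tenfold increase in frequency offset reduces the swing on the control node by roughly a factor of ten.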

Assuming that the phase detector topology, divider ratio, and VCO gain and

range are fixed, the acquisition range is determined by the loop filter and the

photocurrent-to-capacitance ratio of the photodiode. If this ratio is too large, additional parallel capacitance may

be added to C2 to adjust the filter appropriately. However, it is not possible to add

components to appropriately retune the filter if this ratio is too small. The impulse

response of this filter is shown in Equation 5.1. It is obvious, by inspection, that the

impulse must initially charge C2 and then redistribute to balance the voltage on C1

and C2 with a time constant determined by all three passive components. The loop

filter voltage consists of fast voltage ripple due to the charging transients and a slower

acquisition waveform based on the integral portion of the impulse response. Increasing

C2 decreases the cycle-to-cycle voltage ripple associated with the equal charging and

discharging behavior of the phase detector, but also reduces the magnitude of the


semi-sinusoidal locking transient thereby decreasing the acquisition range. Increasing

C1 also decreases the acquisition range but without reducing ripple, as the initial

charge must all be integrated on C2. Increasing R damps the loop and improves

stability at the expense of added voltage ripple. However, some amount of voltage

ripple and reduction in range must be tolerated to obtain acceptable damping of the

overall loop dynamics.

The poor sensitivity of silicon integrated photodiodes makes it difficult to achieve

good performance. Diodes fabricated and tested in the same 0.18 µm process

produced 26 mA when illuminated with mW of optical power [13]; the analysis in

Chapter 4 shows that to obtain 10 µA with similar illumination power and acceptable

transit time would require a diode with nearly 500 fF of parasitic junction capaci-

tance. Figure 5-4 shows that, even with C1 minimized, this parasitic capacitance

of C2 is enough to limit the acquisition range to the point where the loop will not

lock. Fortunately, diode performance is expected to improve in the future and higher

power, bench-based laser sources may be used to obtain a higher photocurrent-to-capacitance ratio in the context of

research where cost and manufacturability are not primary concerns. It is reasonable

to estimate that, with such a source, the required 10 µA could be obtained from

a photodiode with 200 fF of junction capacitance. All subsequent simulations and

analysis will therefore assume a diode with these characteristics.

$$h(t) = \frac{C_1}{C_2\,(C_1 + C_2)}\; e^{-\frac{C_1 + C_2}{R\,C_1 C_2}\,t} + \frac{1}{C_1 + C_2}, \qquad t \ge 0 \tag{5.1}$$
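
As a numerical check of the behavior described for the filter, the sketch below evaluates the impulse response of a capacitor C2 in parallel with a series R-C1 branch. This topology is an assumption inferred from the text (the impulse first charges C2 and then redistributes onto C1), and the function name is hypothetical.

```python
import math


def loop_filter_impulse_response(t, r, c1, c2):
    """Voltage response (V per coulomb) to a unit charge impulse, t >= 0.

    Assumed topology: C2 in parallel with the series combination of R and C1.
    """
    if t < 0:
        return 0.0
    # Redistribution time constant set by all three passive components.
    tau = r * c1 * c2 / (c1 + c2)
    # Starts at 1/C2 (all charge on C2), settles to 1/(C1 + C2).
    return (1.0 + (c1 / c2) * math.exp(-t / tau)) / (c1 + c2)
```

With R = 44 kΩ, C1 = 800 fF, and C2 = 200 fF, for example, the response starts at 1/C2 and decays to 1/(C1 + C2) with a time constant of R·C1·C2/(C1 + C2). The initial step corresponds to the cycle-to-cycle ripple and the final value to the integral portion of the response.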

The simplified non-locking case of a near-sinusoidal waveform on the loop filter

will only occur if the reference and feedback frequencies are so far from lock that

the loop gain is insufficient to exert any noticeable locking force on the loop. For a

realistically designed loop, the more common scenario is cycle slipping. In this case,

the initial feedback frequency is still too far from the reference for the loop to acquire

lock without transitioning into the positive feedback portion of the phase detector

range. However, the two frequencies are close enough that the VCO control voltage

waveform has lower frequency and higher magnitude. This response brings the control

Figure 5-3: Loop filter topology.

Figure 5-4: Photodiode capacitance of 500 fF limits lock range. (Plot title: Control Voltage for Out-of-Range Signal; horizontal axis: Time (µs).)

Figure 5-5: Cycle slipping: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 100 fF/20 kΩ loop filter. (Plot title: Cycle Slipping and Locking; horizontal axis: Time (µs).)

voltage close enough to the edge of the capture range that the loop gain increases

and exerts a net effect in the correct direction on the loop filter before the sign of the

feedback reverses. Therefore, although the loop does not acquire lock immediately,

each cycle slip will bring the VCO control voltage incrementally closer to the voltage

required to match the reference and the loop will eventually stop cycle slipping and

lock to the reference.

Figure 5-5 shows an example of cycle-slipping in a version of the extracted PLL

simulated with a 200 fF parasitic capacitance and a 100 fF/20 kΩ loop filter. Reducing

the photodiode parasitic capacitance to 200 fF sufficiently increases the acquisition

range so that the loop acquires lock from its natural startup voltage. This figure

also intentionally illustrates the effects of poor loop filter design. Although the loop

is at least stable in this case, unlike the illustration in Figure 5-1, it is extremely

underdamped and the oscillations around the operating point decay slowly. Careful

analysis and design of the loop filter can be used to obtain significantly improved


settling dynamics.

5.1.2 Small-Signal Stability Analysis

A formal analysis of stability and damping provides both qualitative and quantitative

understanding of the PLL dynamics. The first step is the linearization of the phase

detector. Within the range of $\pm\pi$, the detector is linear and has a transfer function

given by 5.2. The VCO voltage-phase transfer function and the divider phase transfer

function are easily obtained by inspection and are shown for completeness in 5.3 and

5.4, respectively.

$$H(s) = \frac{I}{\pi} \tag{5.2}$$

$$H(s) = \frac{K_{VCO}}{s} \tag{5.3}$$

$$H(s) = \frac{1}{N} \tag{5.4}$$

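The transfer function of the loop filter is also obtained by basic Laplace methods.

For simplicity of expression, we will define the variables $b$, $\tau_2$, and $K$ in 5.5, 5.6, and

5.7. This analysis is similar to the results presented in [2], though the presence of a

feedback divider introduces some differences.

$$b = 1 + \frac{C_1}{C_2} \tag{5.5}$$

$$\tau_2 = R\,C_1 \tag{5.6}$$

$$K = \left(\frac{I}{\pi C_1}\right) K_{VCO}\,\tau_2 \tag{5.7}$$

It is then straightforward to derive the loop filter transfer function of 5.8 in terms

of these defined variables. This filter has one zero and two poles, so the application of

Black's Formula to derive the closed-loop transfer function results in the third-order

system of 5.9.

$$H(s) = \left(\frac{b-1}{b}\right) \frac{s\tau_2 + 1}{s\,C_1\left(\frac{s\tau_2}{b} + 1\right)} \tag{5.8}$$

$$H(s) = \frac{K\left(\frac{b-1}{b}\right)\left(s + \frac{1}{\tau_2}\right)}{\frac{s^3\tau_2}{b} + s^2 + \left(\frac{K}{N}\right)\left(\frac{b-1}{b}\right)s + \left(\frac{K}{N}\right)\left(\frac{b-1}{b}\right)\frac{1}{\tau_2}} \tag{5.9}$$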

The Routh-Hurwitz stability criterion may be used to demonstrate the guaranteed

stability of the system. This criterion states that a third-order system with the

characteristic equation $s^3 + a_2 s^2 + a_1 s + a_0$ will be stable, though not necessarily well-damped,

on the condition that $a_2, a_1, a_0 > 0$ and $a_2 a_1 > a_0$. The characteristic equation of the

closed-loop PLL, obtained by multiplying the numerator and denominator by $b/\tau_2$, is given in

5.10.

$$s^3 + s^2\left(\frac{b}{\tau_2}\right) + s\left(\frac{K}{N}\right)\left(\frac{b-1}{\tau_2}\right) + \left(\frac{K}{N}\right)\left(\frac{b-1}{\tau_2^{\,2}}\right) \tag{5.10}$$

If C1 and C2 are both nonzero and finite, then $b > 1$ and the necessary condition

that $b - 1 > 0$ is satisfied. $N$ must always be greater than zero, as must $\tau_2$, since

resistors and capacitors must have positive values. Finally, $K$ will be greater than

zero if the phase detector gain and VCO gain have the same sign. In this case, the

VCO gain is negative and the phase detector range contains metastable points with

both positive and negative gain. The loop will lock to the point within the negative

gain range and $K$ will therefore be positive. Since the product of any number of

positive numbers is clearly positive, this set of criteria is sufficient to determine that

all the coefficients of the characteristic equation are positive. By inspection,

$a_2 a_1 = b\,a_0$, so the condition $a_2 a_1 > a_0$ is automatically satisfied since $b > 1$. This proves

that the loop must always be stable.
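
The stability argument can be spot-checked numerically. The sketch below assumes a characteristic equation of the form $s^3 + (b/\tau_2)s^2 + (K/N)((b-1)/\tau_2)s + (K/N)((b-1)/\tau_2^2)$ with $b = 1 + C_1/C_2$ and $\tau_2 = RC_1$. The function names are hypothetical, and any numeric value of $K$ passed in is an assumption, since it depends on a VCO gain that is not stated in this section.

```python
def characteristic_coefficients(k, n, r, c1, c2):
    """Return (a2, a1, a0) of s^3 + a2*s^2 + a1*s + a0 for the assumed loop."""
    b = 1.0 + c1 / c2   # capacitor-ratio parameter
    tau2 = r * c1       # zero time constant of the loop filter
    a2 = b / tau2
    a1 = (k / n) * (b - 1.0) / tau2
    a0 = (k / n) * (b - 1.0) / tau2 ** 2
    return a2, a1, a0


def routh_hurwitz_stable(a2, a1, a0):
    """Routh-Hurwitz test for a monic cubic: all positive and a2*a1 > a0."""
    return a2 > 0 and a1 > 0 and a0 > 0 and a2 * a1 > a0
```

For example, with R = 44 kΩ, C1 = 800 fF, C2 = 200 fF, N = 8 (the ratio of the 1.6 GHz output to the 200 MHz reference), and an illustrative K of 1.76e8 s⁻¹, the product `a2 * a1` equals `b * a0` with b = 5, so the test passes, consistent with the algebraic argument. The test says nothing about damping, which still has to be examined separately.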

The loop must be well-damped as well as stable. Because this is a third-order sys-

tem, the closed-form analytical methods for obtaining critical damping in a second-

order system cannot be applied. Furthermore, the loop dynamics cannot be con-


sidered and optimized without regard to the acquisition range, since increasing the

damping of the complex poles results in decreased range and simply designing the

filter for optimal damping might result in unacceptably narrow acquisition range.

Nevertheless, a loop filter design may be obtained by keeping both stability and ac-

quisition range in mind, using Matlab to perform root locus analysis of the system,

and verifying acquisition range and stability with circuit-level simulations. Using

these methods, and assuming a parasitic capacitance of 200 fF, the values of 800 fF

and 44 kΩ were chosen for the loop filter components.

5.2 Final Simulated Results

The final PLL layout was completed, including the chosen loop filter values, and is

shown in Figure 5-6. The photodiodes were modeled by ideal current sources and

200 fF capacitors. The entire PLL layout was extracted with parasitic capacitance

included and simulated. Figure 5-7 shows that this choice of loop filter does produce

a PLL with well-damped dynamics.

The loop filter also provides sufficient acquisition range. In Figure 5-7, the initial

conditions of the loop were determined by the DC bias point analysis of the simulator.

This is likely to be the starting point experienced by the real loop, but since any state

is possible at startup it is necessary to guarantee that the loop can lock from any

possible initial loop filter voltage. A source follower shifts the VCO control voltage

about 0.5 V below the loop filter voltage, which is already slightly limited in voltage

range by the feedback amplifier. Therefore, it is reasonable to assume that control

voltage on the VCO itself will initialize somewhere between 0.1 V and 1.0 V. Figure 5-

8 shows that the loop is able to acquire lock for initial voltages at both extremes of

this range.

If area were a primary consideration, the LC VCO could be replaced with a ring

oscillator structure. The structure should be composed of low-jitter, self-biased ele-

ments such as those proposed in [10] and the tuning range of the resulting VCO should

be carefully analyzed. Ring oscillator VCOs typically have wide tuning ranges, but it

Figure 5-6: Complete layout of the PLL.

Figure 5-7: Well-damped locking: Simulation of the PLL with the diodes modeled as current sources with 200 fF parallel capacitance and a 800 fF/44 kΩ loop filter. (Plot title: Cycle Slipping and Locking - Well Damped; horizontal axis: Time (µs).)

is possible to limit the range by employing current-starved oscillator elements. If the

tuning range of the selected oscillator were too large, the oscillator could potentially

initialize outside of the acquisition range and the loop would never acquire lock. The

identical concern was addressed in this work for the PLL with an LC VCO, but the

magnitude of the potential problem is larger for a ring oscillator VCO with larger

tuning range. The small-signal stability analysis of a PLL with a different oscillator

would remain unchanged with the exception of the VCO gain constant.

Finally, Figure 5-9 shows the 200 MHz reference and feedback waveforms and the

1.6 GHz output clock waveform generated by the PLL. Due to the nonidealities of

the phase detector, discussed in Chapter 4, the 200 MHz waveforms are not locked in

perfect quadrature. However, the phase difference between the two has settled to a

constant value and they are therefore locked, albeit to a phase with some phase offset

from quadrature.

While there is still good reason to choose a PFD over a phase detector given

Figure 5-8: PLL locking from both extremes of input voltage range. (Plot title: PLL Locking Transient; horizontal axis: Time (µs).)

Figure 5-9: VCO output clock (dotted), optical reference (dashed, 10 µA amplitude scaled for comparison), and divider output (solid) shown at the end of the locking transient of Figure 5-7. (Plot title: Optical Reference, VCO Output, and Divider Output; horizontal axis: Time (ns).)

the option, these results show that practical range and resolution for this applica-

tion will be achievable with the optical-electrical phase detector when the expected

improvements in photodiode performance are realized.


Chapter 6

On-Chip Skew Measurement

Accurate characterization of the skew between two instances of the optical PLL is required in order to assess the performance of the optical clock distribution system. At 1.6 GHz, the output clock frequency is too high to simply drive the clocks off chip through pads and measure them externally. Even if this were possible, the required output buffers could potentially introduce significant skew between the outputs and cast doubt on the accuracy of the results. High-speed probing of the outputs would be required to obtain the accuracy necessary for jitter characterization and could also be used to measure skew very accurately. However, this type of test setup is quite complex, sensitive, costly, and far more accurate than necessary to obtain reasonable skew characterization. It is desirable to implement on-chip test circuitry to determine skew without probing and with precision on the order of 50 ps. A standard time-to-digital converter (TDC) is a practical way to obtain these measurements and provides an appropriate balance between resolution, range, and complexity.

6.1 TDC Concept

A time-to-digital converter (TDC) uses a set of incrementally delayed clock edges to sample a waveform many times at equally spaced intervals and generate a digital output describing the waveform. Skew can be measured by sampling the two signals with two TDCs using the same delayed clocks and observing the relative transition positions. Figure 6-1 shows a block-level schematic of the dual TDC designed to characterize the skew of the PLL output clocks. A state machine, described in detail later in the chapter, controls the converter operation. In the first stage of operation, the sample waveform (SAM) goes high and the state change propagates down the inverter chain, clocking each of the sampling registers (SA) with a progressively delayed clock and sampling the two input signals, IN1 and IN2. When the sample output signal (SAO) goes high, indicating that the sampling stage is complete, the control logic sets the multiplexer control signal (S/L) so that each shift register (SH) takes its D input from the Q output of the adjacent sample register (SA) and then generates one clock pulse on the shift register clock (CLK) to load the data. Finally, the control logic sets S/L so that each shift register takes its input from the output of the previous shift register, clocks the shift registers with CLK until all the data in the shift register chain has been shifted off chip via signals SHO1 and SHO2, and then repeats the entire process from the beginning.

Skew may be determined by comparing the sampled versions of each of the two PLL output clocks, since the off-chip waveforms generated by the TDC outputs are sampled and time-scaled representations of the on-chip 1.6 GHz clock signals. The time-scaling factor is proportional to the ratio of the sampling rate and the shifting rate. The sampling rate may be determined by configuring a replica of the TDC critical path as a ring oscillator, measuring the period, and dividing by the number of stages to determine the delay between sampling edges. The shifting rate is defined by the frequency of the shift register clock (CLK).
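The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the two example bit strings are invented, the 40 ps stage delay is taken from Section 6.2, and the 20 ns shift bit period assumes two 100 MHz control cycles per shifted bit.

```python
def first_rising_edge(bits):
    """Return the index of the first 0->1 transition in a sampled bit string."""
    for i in range(1, len(bits)):
        if bits[i - 1] == "0" and bits[i] == "1":
            return i
    return None


def skew_from_samples(bits_in1, bits_in2, stage_delay_s):
    """Skew of IN2 relative to IN1, in seconds, from two TDC captures
    taken with the same delayed sampling clocks."""
    edge1 = first_rising_edge(bits_in1)
    edge2 = first_rising_edge(bits_in2)
    if edge1 is None or edge2 is None:
        raise ValueError("no rising edge captured in one of the streams")
    return (edge2 - edge1) * stage_delay_s


def time_scale_factor(shift_bit_period_s, stage_delay_s):
    """Ratio between off-chip time per bit and on-chip time per sample."""
    return shift_bit_period_s / stage_delay_s


STAGE_DELAY_S = 40e-12       # assumed: delay between adjacent sampling edges
SHIFT_BIT_PERIOD_S = 20e-9   # assumed: two 100 MHz control cycles per bit

# Invented example captures: IN2 rises one sampling stage later than IN1.
in1 = "0000111111110000"
in2 = "0000011111111000"

skew = skew_from_samples(in1, in2, STAGE_DELAY_S)
scale = time_scale_factor(SHIFT_BIT_PERIOD_S, STAGE_DELAY_S)
print(f"skew = {skew * 1e12:.0f} ps, time scale = {scale:.0f}x")
```

Because both inputs are sampled by the same delayed clocks, any error in the absolute stage delay scales the measured skew but does not change its sign.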

6.2 Critical Path

The resolution of the TDC is determined by the inverter chain and registers which comprise the critical path. In this 0.18 µm process, the FO1 ring oscillator delay is about 40 ps in the case where the devices are large compared to any interconnect parasitic capacitance. Making the inverters in the delayed clock generation chain large enough that the sampling register load is only a small percentage of the next inverter load approximates the FO1 case and therefore provides 40 ps resolution.

[Figure: block diagram with signals CLK, SAM, and S/L]

Figure 6-1: Time-to-digital converter.

This resolution figure assumes that the data will be sampled at every available sampling phase, which requires that the sampling registers be single phase and alternately positive and negative edge triggered. The true single-phase clock (TSPC) register is a common topology that meets these initial criteria [23]. However, if the positive and negative edge triggered registers have substantially different setup and hold times, then the accuracy is reduced and a single transition in the data might even result in a "1111010000" or "0000101111" style "bubble" in the output data. This type of glitch occurs when an earlier register has a shorter setup time and captures a transition while the next register has a much longer setup time and either completely misses the transition and stores the old value or enters a metastable state. Standard TSPC registers exhibit this problem because opposite polarities of registers have different setup times, since data must propagate through either one or two logic stages before the clock edge. The more symmetrical split-output TSPC style, proposed in the same paper and shown in Figure 6-2, provides better matched setup and hold times and is the best choice for the TDC sampling register topology. The metastability problem is not entirely eliminated because this type of system provides no way to guarantee that the setup and hold times will not be violated by the delayed clocks, but this latch style reduces the problem and provides the best available compromise between the various design criteria.

Figure 6-2: Split-output TSPC latches.
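Any bubbles that remain in the shifted-out data could also be handled in off-chip post-processing. The sketch below is not part of the implemented design. It applies a three-tap majority filter, a technique commonly used for thermometer-code bubble correction, to the two bubble patterns quoted above. The function names are hypothetical.

```python
def find_bubbles(bits):
    """Indices of isolated samples that disagree with both neighbours."""
    return [
        i
        for i in range(1, len(bits) - 1)
        if bits[i] != bits[i - 1] and bits[i] != bits[i + 1]
    ]


def majority_filter(bits):
    """Replace each interior sample with the majority of itself and its two
    neighbours; the first and last samples are left unchanged."""
    out = [bits[0]]
    for i in range(1, len(bits) - 1):
        ones = (bits[i - 1], bits[i], bits[i + 1]).count("1")
        out.append("1" if ones >= 2 else "0")
    out.append(bits[-1])
    return "".join(out)


for raw in ("1111010000", "0000101111"):
    print(raw, find_bubbles(raw), majority_filter(raw))
```

The filter restores a single clean transition in each pattern, but it cannot say on which side of the bubble the true edge fell, so the corrected edge position still carries an uncertainty of roughly one sampling stage.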

The total time period sampled by the critical path is the product of the single-stage delay and the total number of stages. The TDC should be designed to sample a minimum of two periods of the clocks to show two consecutive positive and negative edges in context. However, the TDC critical path layout is modular and compact so there is no significant area or design time penalty for extending the time range to show the generated output clocks over a full period of the input reference and provide a means for characterizing the effect of the steady-state ripple on the control voltage. For this implementation, a 128-stage TDC was designed and provides 5 ns of sampling range in order to sample about 8 periods of the generated clock at the nominal 625 ps clock period.
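The range figures quoted above follow directly from the stage count and stage delay. The short calculation below restates that arithmetic, assuming the nominal 40 ps stage delay holds for every stage; the variable names are illustrative.

```python
import math

STAGE_DELAY_PS = 40    # nominal delay per sampling stage
N_STAGES = 128         # stages in the implemented TDC
CLOCK_PERIOD_PS = 625  # nominal 1.6 GHz clock period

# Total sampled window is the product of stage delay and stage count.
window_ps = N_STAGES * STAGE_DELAY_PS

# Number of generated clock periods visible in one capture.
periods_in_window = window_ps / CLOCK_PERIOD_PS

# Smallest stage count that still covers two full clock periods.
min_stages_two_periods = math.ceil(2 * CLOCK_PERIOD_PS / STAGE_DELAY_PS)

print(f"window = {window_ps / 1000:.2f} ns")
print(f"clock periods captured = {periods_in_window:.2f}")
print(f"minimum stages for two periods = {min_stages_two_periods}")
```

The window works out to slightly more than the quoted 5 ns, which matches "about 8 periods" of the 625 ps clock.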

6.3 Control and State Machine

A microcoded state machine (MSM) is the best way to run the TDC and guarantee the appropriate sequential manipulation of the TDC critical path control. The relatively simple state machine table is shown in Table 6.1.

The state machine controls the shift register clock (CLK), the shift/load signal for the multiplexers (S/L), and the sample pulse for the TDC critical path (SAM). An additional counter, external to the state machine, is used to count the shift pulses and is reset at the appropriate time by the S/L signal, which happens to be asserted at the correct time to be reused for this purpose.

A complete cycle of the TDC begins when the MSM enters state "Sample" and issues a pulse to the inverter delay line. At this time, the multiplexers are configured to load, in preparation for the next state. The MSM advances to the "Load1" state when the "SAMPLEDONE" signal, generated from the XOR of the delay line input and output, indicates that the signal has propagated through the entire delay line. In "Load1" and "Load2", the MSM generates one complete pulse on the output "CLK" in order to load the data into the shift registers. Because S/L is asserted and also serves as the external counter reset enable signal, this pulse also resets the external counter to the zero state. The next state, "Switch1", is a dummy state to switch S/L from 1 to 0. The TDC will spend the vast majority of time in states "Shift1" and "Shift2", cycling between the two states until the "SHIFTDONE" signal generated by the external counter indicates that all 128 bits have been shifted off the chip. The MSM then progresses through the final two dummy states and restarts the sample/shift cycle. Because this control is integrated, the test setup need only observe the two shifted outputs on an oscilloscope to determine the skew of the two PLL output clocks.
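The sequencing described above can be checked with a small behavioral model of the microcode in Table 6.1. This is a sketch rather than the synthesized Verilog. It assumes that the MSM branches to the conditional next state (CNS) when the selected input equals the Polarity bit and otherwise advances to the next state in order. It also assumes that the external counter is cleared whenever S/L is high and increments on each CLK pulse while S/L is low. The names `ROM`, `next_state`, and `run_one_cycle` are hypothetical.

```python
# One row per state: (name, CLK, S/L, SAM, CNS, input select, polarity).
# CNS is None where the table lists X (don't care).
ROM = [
    ("Sample",  0, 1, 1, 0,    "SAMPLEDONE", 0),
    ("Load1",   1, 1, 0, None, "ALWAYS0",    1),
    ("Load2",   0, 1, 0, None, "ALWAYS0",    1),
    ("Switch1", 0, 0, 0, None, "ALWAYS0",    1),
    ("Shift1",  1, 0, 0, None, "ALWAYS0",    1),
    ("Shift2",  0, 0, 0, 4,    "SHIFTDONE",  0),
    ("Switch2", 0, 1, 0, 0,    "ALWAYS0",    0),
    ("Dummy",   0, 1, 0, 0,    "ALWAYS0",    0),
]


def next_state(state, inputs):
    """Assumed branch rule: jump to CNS when the selected input equals the
    polarity bit, otherwise fall through to the following state."""
    _, _, _, _, cns, select, polarity = ROM[state]
    if cns is not None and inputs[select] == polarity:
        return cns
    return (state + 1) % len(ROM)


def run_one_cycle(n_bits=128, sample_latency=1):
    """Return the list of state names visited in one sample/shift cycle."""
    state, count, waited, trace = 0, 0, 0, []
    while True:
        name, clk, sl, sam, *_ = ROM[state]
        trace.append(name)
        if sl:                # S/L high: external counter held in reset
            count = 0
        elif clk:             # shift pulse: one bit leaves the chip
            count += 1
        if sam:
            waited += 1       # control cycles spent waiting on the delay line
        inputs = {
            "ALWAYS0": 0,
            "SAMPLEDONE": int(waited >= sample_latency),
            "SHIFTDONE": int(count >= n_bits),
        }
        state = next_state(state, inputs)
        if state == 0 and name != "Sample":
            return trace


trace = run_one_cycle()
print(trace[:6], "...", trace[-2:])
print("shift pulses:", trace.count("Shift1"))
```

Under the assumed branch rule, Switch2 returns directly to Sample and the Dummy row is never visited, so the second of the two final dummy states mentioned in the text may depend on a detail of the branch logic that this sketch does not capture.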

6.4 Implementation

The TDC critical path, including the delay line, sampling registers, and shift registers, was custom designed through the layout stage in order to guarantee the required performance in timing, symmetry, matching, and speed. The 8-by-9 ROM was too small to be generated with the commercially available ROM tools and was therefore laid out by hand. A multiplexer implementation would also have been possible, but the ROM structure was chosen for simplicity and ease of integration. The control logic is clocked at only 100 MHz and the performance and timing demands are therefore relaxed, allowing Verilog design and synthesis of the control logic, MSM, and external counter. A fully functional behavioral Verilog model was written and simulated, compiled to RTL, and finally synthesized using Silicon Ensemble.

State  Name     CLK  S/L  SAM  CNS[2:0]  Input Select[1:0]  Polarity
000    Sample   0    1    1    Sample    SAMPLEDONE         0
001    Load1    1    1    0    X         Always0            1
010    Load2    0    1    0    X         Always0            1
011    Switch1  0    0    0    X         Always0            1
100    Shift1   1    0    0    X         Always0            1
101    Shift2   0    0    0    Shift1    SHIFTDONE          0
110    Switch2  0    1    0    Sample    Always0            0
111    Dummy    0    1    0    Sample    Always0            0

Table 6.1: State table for the microcoded state machine.

Even using Nanosim instead of SPICE, the simulation and verification of the complete TDC is challenging because it requires both a long total simulation time (many microseconds) and a small simulation step size (around 10-20 ps) in order to capture the TDC critical path accurately. In order to solve this problem, the TDC was simulated in two parts. First, a miniature TDC with a very short critical path was simulated with high resolution to test the critical path. Second, the complete TDC was simulated at lower resolution to test the overall logical operation while losing some of the accuracy in the critical path simulation. The combination of these two simulations fully verified the functionality of the TDC.

6.5 Additional Qualitative Verification

In order to obtain a qualitative measurement of the settling dynamics of the PLL, it may be useful to include other measurement circuits in addition to the TDC. For example, placing a unity gain buffer at the VCO control voltage, low-pass filtering the output, and sending it off chip would allow the observation of the low-frequency component of the locking transient and provide useful information about the damping and acquisition time. Observing the low-pass filtered output of the XOR of the clocks generated by two PLL instances would allow qualitative determination of whether the two loops had arrived at steady-state relative to each other. Finally, though some timing information would be lost, synchronously dividing the outputs by 4 or 8 and buffering them out to a pad would provide a way to view a real-time, frequency-proportional representation of the circuit operation. While the TDC is clearly still required, these additional measurement methods would provide valuable extra information to inform the test process.
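As a rough guide to what the filtered XOR output would show, the sketch below computes the fraction of time that two ideal square waves differ for a given skew. It assumes 50% duty cycle, identical frequency, and instantaneous edges, none of which the real clocks will satisfy exactly. The function name is hypothetical.

```python
def xor_duty(skew, period):
    """Fraction of a period for which two ideal 50% duty square waves of the
    same frequency differ, given their skew (same units as period)."""
    s = abs(skew) % period
    s = min(s, period - s)  # skew beyond half a period wraps back
    return 2.0 * s / period


# Example: skew equal to one TDC resolution step at the nominal clock period.
print(f"{xor_duty(40, 625):.3f}")  # both arguments in picoseconds
```

A settled low-pass filtered XOR level therefore indicates a fixed phase relationship between the two loops, while a drifting level indicates that at least one loop is still acquiring.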

6.6 Conclusions

On-chip measurement provides a method for determining the skew between two instances of the PLL to within 40 ps. While there are more accurate methods, such as arbiter-array skew/jitter measurement [24] and optical skew measurement, the TDC provides the appropriate combination of range, resolution, and complexity for this application. The additional test methods provide a simple way to obtain a complete qualitative and intuitive view of the circuit operation to supplement the quantitative results provided by the TDC.


Chapter 7

Conclusions

This thesis presents a complete design and simulation of an optical PLL clock distribution system using a current-steering optical-electrical phase detector. The contribution of the work, however, also includes insights into the present and future advantages and challenges of optical clock distribution.

7.1 Summary

An optical-electrical PLL for clock distribution was designed through the layout stage, extracted, and simulated. The optical current-steering phase detector proposed and implemented provides direct phase comparison by using the PLL feedback clocks to steer the photocurrent in order to deliver a current-mode error signal to the loop filter and drive the PLL towards lock. This phase detector and PLL take the place of a traditional transimpedance amplifier optical receiver and thereby eliminate a circuit block that is known to introduce unacceptable levels of skew and jitter.

The phase detector detects the phase difference between the local electrical clock and the global optical reference by using the state of the divided electrical feedback clock to determine whether to add or subtract the optical input current from the loop filter. The resulting change in voltage on the loop filter provides negative feedback to the VCO and forces the signals to synchronize. This charge steering method provides simple phase detection, not phase-frequency detection, so the PLL acquisition range and stability are of critical importance. Complete analysis and simulation at the circuit level show that the final PLL design has sufficient acquisition range to acquire lock from any possible initialization voltage and that the loop has well-damped dynamics.
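The add-or-subtract behavior of the charge steering can be illustrated with an idealized numerical model. The sketch below assumes a 50% duty cycle square optical reference of 10 µA amplitude, a 5 ns reference period (1.6 GHz divided by 8), ideal switches, and an arbitrary sign convention; none of these values is taken from the extracted circuit. The function name is hypothetical.

```python
def net_charge_per_period(phase_offset, n=3600, i_peak=10e-6, period=5e-9):
    """Net charge (coulombs) delivered to the loop filter over one reference
    period. phase_offset is the delay of the feedback clock relative to the
    optical reference, expressed as a fraction of the period."""
    dt = period / n
    charge = 0.0
    for k in range(n):
        frac = (k + 0.5) / n                   # midpoint of each time slice
        light_on = frac < 0.5                  # optical pulse in first half
        clk_frac = (frac - phase_offset) % 1.0
        steer = 1 if clk_frac < 0.5 else -1    # clock high adds, low subtracts
        if light_on:
            charge += steer * i_peak * dt
    return charge


for offset in (0.0, 0.125, 0.25, 0.375, 0.5):
    q = net_charge_per_period(offset)
    print(f"offset {offset:.3f} of a period: {q * 1e15:+.2f} fC")
```

In this idealized model the net charge is zero when the feedback clock is a quarter period away from the reference, which corresponds to a quadrature lock point, and the sign of the error on either side of that point supplies the negative feedback.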

Various topologies from the literature were studied in order to find the most suitable circuits for use in each of the required PLL subcomponents. A standard LC VCO topology [11] was chosen to generate the 1.6 GHz local clocks, both because the clocks must have low jitter and because the PLL uses a simple phase detector and must therefore use a VCO with relatively low tuning range. In order to minimize skew and jitter, the feedback divider should be differential and should ideally be implemented in a single synchronous stage. However, even high-speed SCL divider stages optimized for division speed by embedding the required synchronous divide logic into the first latch could not provide full-swing outputs at 2 GHz as required unless resistive loading were used in place of cross-coupled PMOS loads. The risks of this type of topology outweigh the advantages for this application and the divider was therefore implemented as a divide-by-two prescaler followed by a synchronous divide-by-four circuit.

A time-to-digital converter (TDC) was implemented to provide on-chip skew characterization capability. The critical path was designed at the circuit level and implemented with custom layout, while the control state machine was written in Verilog and synthesized. This TDC provides 40 ps resolution over a 5 ns sample window.

7.2 Simulation Results

The majority of interesting simulation results and findings pertain to the optical-electrical phase detector block and the optical-electrical PLL as a whole. Although the current-steering phase detector appears after initial analysis to provide a high-accuracy alternative to transimpedance amplifier receivers, further simulations provide insight into several second-order effects that introduce unacceptable levels of skew between instances of the PLL. Feedback amplifier gain error and CMOS switch resistance account for the majority of the phase offsets in the initially proposed phase detector. The nominal common-mode offsets from quadrature lock introduced by these factors are merely inconvenient, but the temperature and process sensitivity of these offsets results in the skew exceeding reasonable specifications. While the feedback amplifier problem is topology-specific and may be mitigated by using a more complex and accurate amplifier, the switch resistance problem is fundamental. Any phase detector topology based on the concept of steering photocurrents with an electrical feedback signal will be limited by the reality that switching photocurrents requires the use of CMOS switches and that the on-resistance of these switches is extremely process and temperature dependent.

A simple and elegant small-signal phase detector topology capable of leveraging the precision of a global optical clock to generate low-skew local electrical clocks may still be discovered. However, most topologies will be adversely affected by the same process and temperature induced variations that limit the performance of the transimpedance amplifiers they attempt to replace. While it may still be possible to find a topology that is somehow immune to these effects, designers should realize that such a topology may not exist and prepare for that possibility by simultaneously devoting some effort to implementing explicit process and temperature variation compensation and cancellation within known optical-electrical conversion circuits.

7.3 Future Work

Despite current limitations of optoelectronic receiver circuitry, optical clocks retain their inherent potential for high-accuracy global clock distribution. Advances in optoelectronics, circuits, and system architectures may converge to make optical clock distribution on microprocessors feasible in the future.

7.3.1 Optoelectronics

The field of integrated optoelectronic systems is advancing rapidly and a variety of new and improved integrated components will be available to optoelectronic IC designers within the next decade. High-speed photodiodes have already been demonstrated in custom silicon processes, research efforts to self-assemble high-speed, non-silicon photodiodes onto silicon wafers are underway, and waveguide matching has improved dramatically in the past several years. The photodiode advances are particularly critical as they will improve the [illegible]/C ratio of integrated photodiodes to reasonable levels for both current-steering phase detectors and RC limited transimpedance amplifier systems.

The progress of integrated optical modulators is particularly interesting due to the potential application in a current-steering phase detector. An optical modulator is an electrically controlled component capable of modulating the intensity of an incoming optical signal. One method of achieving this modulation is to split the incoming optical power and route it through two parallel optical phase shifters with variable phase relationships. The signals will add constructively when the phase shifters have equal delay and cancel when they provide a relative phase shift of π. The proposed phase detector effectively completes this modulation in the electrical domain through resistive CMOS switches and this process is the source of much of the skew introduced by the phase detector. Implementing phase detection in the integrated optical domain could provide a path around the fundamental problem of skew generation by the CMOS switches due to process and temperature variations. This type of system has been demonstrated in the discrete domain and recent advances in silicon optical modulator technology may facilitate an integrated version of this solution in the future [25].
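The constructive and destructive cases described above are the two extremes of the standard two-arm interferometer response. The sketch below evaluates that response for an ideal, lossless modulator with an equal power split, which is an assumption made for illustration and not a model of the device in [25]. The function name is hypothetical.

```python
import math


def modulator_transmission(delta_phi):
    """Normalized output intensity of an ideal two-arm interferometric
    modulator for a relative phase shift delta_phi (radians) between arms."""
    return 0.5 * (1.0 + math.cos(delta_phi))


for label, phi in (("equal delay", 0.0),
                   ("quarter wave", math.pi / 2),
                   ("relative shift of pi", math.pi)):
    print(f"{label}: {modulator_transmission(phi):.3f}")
```

Driving the relative phase shift from the feedback clock would gate the optical power itself before it reaches the photodiode, so the steering would take place ahead of any CMOS switch.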

7.3.2 Circuits

The VCO, frequency divider, and TDC blocks were implemented with the best known circuit topologies given time and complexity constraints, but because the phase detector and the overall system were the primary focus of this work, these supporting blocks were not fully optimized. An LC VCO was used to minimize jitter and reduce VCO tuning range, but the implementation of many inductors on a processor is undesirable from an area perspective. While more complex to implement, low-jitter self-biased delay element based oscillators would also provide suitable jitter performance and their tuning range can be limited by employing current-starving techniques. For the 1.6 GHz prototype chip implementation, the divider was implemented as a divide-by-two prescaler followed by a synchronous divide-by-four block. At frequencies approaching 10 GHz, where optical clocking will become even more attractive and higher divider ratios may be required, the speed of synchronous dividers will likely fall even farther behind the local clock frequency. Further investigation of synchronous topologies to meet this challenge and of skew and jitter robust asynchronous divider stages will result in improved performance of future optical PLL implementations. Finally, the implemented TDC has a resolution of 40 ps. While this is acceptable for basic characterization of skew in this system, future low-skew systems will require higher accuracy measurement to fully characterize circuit performance. Existing high-resolution methods, however, also have a short sampling window [24]. Therefore, new circuits for high-resolution measurement of skew over a reasonable measurement window should be developed.

7.3.3 Complete System

If the photodiode capacitance and switch resistance problems are resolved by advances in optoelectronics and a new skew-resistant phase detector is developed, an optical PLL clock distribution system will have significant advantages over a transimpedance amplifier clock distribution system. The local clocks in the PLL system are generated from local low-jitter sources, whereas the transimpedance amplifier relies on converting the optical signal to create the local clocks and introduces jitter as well as skew in this process. Because either of these systems would at least initially be situated relatively high in the H-tree distribution, the total skew-reduction capability of a transimpedance amplifier system would be limited by the introduction of skew at these lower levels. In the PLL system, however, the feedback clock is chosen directly from the gate level so the system can also compensate for any skew generated in the process of buffering the VCO clocks.

7.4 Conclusion

Although the results obtained in this work were limited by the performance of integrated optical components, optical clocks nevertheless have significant potential to deliver high-speed, high-accuracy global timing signals. Over the past decade, optical signaling schemes have been employed in progressively smaller scale applications. Techniques that originated for use only in long-haul optical networks are now applied in optical backplanes for high-performance computers. The continued shrinking and integration of optical components facilitates feasible solutions to challenges created by ever increasing bandwidth requirements.

Clock speeds on microprocessors have not yet reached speeds that absolutely mandate optical clock distribution, but as speeds continue to increase there will inevitably be a point where electrical clocks can no longer meet the performance challenges and a radical solution will be required. If optoelectronic components continue to become more integrated and optical signaling is extended into even smaller systems, the integrated optoelectronics technology may very well advance fast enough to make optical clock distribution feasible before electrical clock distribution fails.

Bibliography

[1] M.J. Kobrinsky, B. Block, J. Zheng, B. Barnett, E. Mohammed, M. Reshotko, F. Robertson, S. List, I. Young, and K. Cadien. On-chip optical interconnects. Intel Technology Journal, 8(2):128-142, May 2004.

[2] I.A. Young, J.K. Greason, and K.L. Wong. A PLL clock generator with 5-110 MHz of lock range for microprocessors. IEEE Journal of Solid-State Circuits, 27(11):1599-1607, November 1992.

[3] J.M. Rabaey, A.P. Chandrakasan, and B. Nikolić. Digital Integrated Circuits: A Design Perspective. Prentice Hall Electronics and VLSI Series. Pearson Education, Upper Saddle River, NJ, second edition, 2003.

[4] S. Tam, S. Rusu, U.N. Desai, R. Kim, J. Zhang, and I. Young. Clock generation and distribution for the first IA-64 microprocessor. IEEE Journal of Solid-State Circuits, 35(11):1545-1552, November 2000.

[5] S. Tam, R.D. Limaye, and U.N. Desai. Clock generation and distribution for the 130-nm Itanium 2 processor with 6-MB on-die L3 cache. IEEE Journal of Solid-State Circuits, 39(4):636-642, April 2004.

[6] D.A.B. Miller. Rationale and challenges for optical interconnects to electrical chips. Proceedings of the IEEE, 88(6):728-749, June 2000.

[7] A. Bhatnagar, C. Debaes, R. Chen, N.C. Hellman, G.A. Keeler, D. Agarwal, H. Thienpont, and D.A.B. Miller. Receiverless clocking of a CMOS digital circuit using short optical pulses. In The 15th Annual Meeting of the IEEE Lasers and Electro-Optics Society, volume 1, pages 127-128. IEEE, November 2002.

[8] B. Razavi, editor. Monolithic Phase-Locked Loops and Clock Recovery Systems. IEEE Press, New York, 1996.

[9] B.D. Clymer and J.W. Goodman. Timing uncertainty for receivers in optical clock distribution for VLSI. Optical Engineering, 27(11):944-954, November 1988.

[10] J.G. Maneatis. Low-jitter process-independent DLL and PLL based on self-biased techniques. IEEE Journal of Solid-State Circuits, 31(11):1723-1732, November 1996.

[11] D.D. Wentzloff. Design and layout of LC VCO core. Unpublished, 2003.

[12] A.M. Niknejad. ASITIC: Analysis of spiral inductors and transformers for ICs. http://rfic.eecs.berkeley.edu/ niknejad/asitic.html, 2004.

[13] Travis L. Simpkins. Active optical clock distribution. Master's thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2002.

[14] The MOSIS Service. Wafer electrical test data and SPICE model parameters. http://www.mosis.com/Technical/Testdata/tsmc-018-prm.html, 2004.

[15] J.D. Schaub, R. Li, S.M. Csutak, and J.C. Campbell. High-speed monolithic silicon photoreceivers on high resistivity and SOI substrates. IEEE Journal of Lightwave Technology, 19(2):272-278, February 2001.

[16] S.M. Csutak, J.D. Schaub, W.E. Wu, R. Shimer, and J.C. Campbell. High-speed monolithically integrated silicon photoreceivers fabricated in 130-nm CMOS technology. IEEE Journal of Lightwave Technology, 20(9):1724-1729, September 2002.

[17] M. Yang, K. Rim, D.L. Rogers, J.D. Schaub, J.J. Welser, D.M. Kuchta, D.C. Boyd, F. Rodier, P.A. Rabidoux, J.T. Marsh, A.D. Ticknor, Q. Yang, A. Upham, and S.C. Ramac. A high-speed, high-sensitivity silicon lateral trench photodetector. IEEE Electron Device Letters, 23(7):395-397, July 2002.

[18] M. Yang, K. Rim, D. Rogers, J. Schaub, J. Welser, D. Kuchta, and D. Boyd. A CMOS-compatible high-speed silicon lateral trench photodetector. In Device Research Conference, pages 153-154. IEEE, June 2001.

[19] Q. Ouyang and J.D. Schaub. High speed lateral trench detectors with a junction substrate. In Device Research Conference, pages 73-74. IEEE, June 2003.

[20] D. Ahn, J. Michel, K. Wada, and L.C. Kimerling. Waveguides and integrated photodetectors for on-chip optical clock signal distribution. http://photonics.mit.edu/research/2003/opt clock.html, 2003.

[21] R. Sarpeshkar. Adaptive photoreceptor: 6.376 lecture. MIT 6.376 Lecture Notes, October 2003.

[22] S. Sidiropoulos and M. Horowitz. Current integrating receivers for high speed system interconnects. In Proceedings of the IEEE Custom Integrated Circuits Conference, pages 107-110. IEEE, May 1995.

[23] J. Yuan and C. Svensson. High-speed CMOS circuit technique. IEEE Journal of Solid-State Circuits, 24(1):62-70, February 1989.

[24] V. Gutnik and A.P. Chandrakasan. On-chip picosecond time measurement. In Symposium on VLSI Circuits Digest of Technical Papers, pages 52-53. IEEE, June 2000.

[25] A. Liu, R. Jones, L. Liao, D. Samara-Rubio, D. Rubin, O. Cohen, R. Nicolaescu, and M. Paniccia. A high-speed silicon optical modulator based on a metal-oxide-semiconductor capacitor. Nature, 427(12):615-618, February 2004.
