

IEEE TRANSACTIONS ON ADVANCED PACKAGING, VOL. 23, NO. 3, AUGUST 2000 521

Design of a High Speed Processor System Bus for Notebook Computers

Pochang Hsu, Yanmei Tian, and Gerald Pasdast

Abstract: A 133 MHz processor system bus (PSB) has been designed and developed for notebook computer systems running at core frequencies of 500 MHz and beyond, based on an enhanced Gunning transceiver logic. We describe the design flow and highlight the design challenges unique to notebook computers. It is shown that with careful I/O circuit design, transmission line analysis and reliability consideration, the design target can be achieved. A similar approach can be applied to notebook computers with even higher bus frequencies.

Index Terms: Gunning transceiver logic, notebook computer, processor system bus.

I. INTRODUCTION

The processor system bus (PSB) is the bus between the CPU and the core logic chipset (with system memory controller) in a computer system. It consists of data, address and control signals. As the clock frequency (processing capability) of current microprocessors increases, the data transactions between the CPU and core logic, and between core logic and system memory, become the bottleneck of overall system performance. Take Intel's Pentium® II based desktop PC system as an example: with the CPU core running at over 400 MHz, the progression from a 66 MHz to a 100 MHz processor system bus increases the data throughput between core logic and system memory from 528 to 800 Mbytes/s, to meet the needs of the higher processing capability of the CPU core.
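The 528 and 800 Mbytes/s figures are consistent with an 8-byte (64-bit) data path that transfers once per bus clock. The short sketch below reproduces them under that assumption; the bus width and the function name are illustrative and are not stated in the text.

```python
def peak_throughput_mbytes_per_s(bus_mhz: float, bus_width_bytes: int = 8) -> float:
    """Peak throughput of a common-clocked bus, one transfer per clock.

    bus_width_bytes = 8 is an assumption inferred from 528 / 66 = 8.
    """
    return bus_mhz * bus_width_bytes


if __name__ == "__main__":
    for mhz in (66, 100, 133):
        print(f"{mhz} MHz bus: {peak_throughput_mbytes_per_s(mhz):.0f} Mbytes/s")
```

Under the same assumption, the 133 MHz bus discussed in this paper would peak at roughly 1064 Mbytes/s.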

One of the implementation choices for the PSB is the Gunning transceiver logic (GTL) technology [1]. GTL was designed, and has been used, as a low voltage swing transceiver suitable for a transmission line environment. Unlike a traditional CMOS bus, it does not need protocol provisions to avoid a floating bus (when no bus agent is driving), so it is suitable for computer systems that must scale to multiple processors. The technology has also been used extensively in uni-processor (UP) systems, including laptop and desktop personal computers (PCs).

A general discussion of processor system bus design based on GTL technology for desktop personal computers can be found in [2]. The focus of that literature has been on interconnect design and layout guidelines for larger form factor, less costly motherboard designs. The trade-off between cost and performance is the main driving force in the desktop world. For notebook computers, performance is still important; however, it is limited by the amount of heat the system can effectively dissipate. To meet the thermal constraint, a mobile processor usually has a higher junction temperature, lower core voltage and slower performance. For a mini-notebook (sub-notebook) computer system, the power constraint is tighter and therefore results in an even lower processor core voltage and frequency. Since scaling the processor core voltage affects the characteristics of the I/O circuits, one needs to pay attention to its impact on the timing requirements of the processor system bus when designing a notebook computer. In addition to a lower processor core voltage, notebook computers most likely come with smaller form factors than desktop systems. The processor and chipset are placed much closer to each other than in a desktop configuration. In this situation, the PSB traces can be too short to meet hold time requirements. To illustrate the scenario, the longest and shortest PSB traces of a typical notebook computer motherboard are highlighted in Fig. 1. In this particular design, the serpentine-shaped shortest trace is almost double its original length in order to meet the hold time requirement. Other differences between notebook and desktop computers related to PSB design include substrate stack-up, topology and GTL+ design. For example, desktop PCs usually use 4-layer board technology (two signal layers and two reference layers), whereas a notebook PC requires 6 to 8 layers of substrate technology depending on the design.

Manuscript received December 22, 1998; revised March 3, 2000.
The authors are with the Mobile Computing Group, Intel Corporation, Santa Clara, CA 95052 USA.
Publisher Item Identifier S 1521-3323(00)04621-9.

Fig. 1. Shortest and longest PSB traces in a typical notebook computer motherboard. To satisfy the hold time requirement, a layout designer often has to find extra space to work around the short trace routing.

In this paper, we describe the design and development of a 133 MHz common clocked GTL+ (a variation of GTL with a higher termination voltage of 1.5 V versus 1.2 V) processor system bus for notebook computers. The focus of this paper is the pre-layout stage of the design. Unique considerations for mobile systems are also highlighted. At the pre-layout design stage, the goal is to explore enough design trade-offs in relatively short cycles or iterations. Studies done at this stage speed up the transition to the final (post-layout) design stage, during which the PSB traces are actually decided and laid out. The pre-layout design thus needs to generate a reasonable timing budget, a high-confidence I/O circuit design, preliminary I/O and clocking design specifications and a preliminary PSB trace layout guideline. None of these intermediate results is finalized until silicon and systems are actually built, manufactured and validated. The expectation, however, is that far fewer iterations will be needed at that later stage, where each cycle is longer and more expensive.

1521–3323/00$10.00 © 2000 IEEE

Fig. 2. Topology of a GTL based processor system bus in a typical notebook computer system.

In the next section, the PSB topology used in notebook computers is described. Based on the topology, we define and work out the timing budget for the processor system bus in Section III. We cover the GTL circuit design for mobile systems in Section IV. Pre-layout and signal quality studies are described in Section V. Section VI draws the conclusion of this paper.

II. PROCESSOR SYSTEM BUS INTERCONNECT AND TOPOLOGY

A typical GTL based processor system bus for a notebook computer system is shown in Fig. 2. The packages of the processor and core logic chipset are omitted from the figure for simplicity. As opposed to the dual edge terminated (DET) scheme usually seen in multiple-load systems such as servers or desktop PCs, a notebook computer has only two loads: the processor and the core logic chipset. In particular, a single edge terminated (SET) scheme, with one pull-up resistor per data/address bit (see Fig. 2), is used in mobile systems to save power and cost, and the resistor is usually placed at the processor side of the interconnection. Since an external resistor introduces a stub in the interconnection and occupies board area, the pull-up resistor can also be implemented in the silicon so that an ideal point-to-point interconnect scheme can be achieved. For our discussion, we assume the termination resistor is on the processor silicon. $V_{REF}$ in Fig. 2 is the reference voltage of the input sense amplifier of the GTL. The clock that synchronizes data transfer on the bus comes from an external common clock generator, and the clocking method is therefore usually called the common clocking scheme. As shown in Fig. 2, a T-topology is usually used for clock net routing to minimize the clock driver's output pin-to-pin skew.

III. PSB TIMING BUDGET

In order for the system bus to perform a fully functional signal transaction between the processor and the core logic chipset, the setup and hold time specifications of the receiving latches need to be met. Since the processor system bus is bi-directional, the receiver can be at either end, the processor or the core logic chipset. The timing budget among the components (in this case the processor and the core logic chipset), interconnect and clocking needs to be allocated and balanced appropriately. The timing relation of a two-load GTL+ PSB is governed by

$$\text{Setup Margin} = T_{cycle} - T_{co,max} - T_{flight,max} - T_{su} - T_{skew} - T_{jitter} \tag{1}$$

$$\text{Hold Margin} = T_{co,min} + T_{flight,min} - T_{hold} - T_{skew} \tag{2}$$

For the reader's convenience, we summarize the meaning of each parameter in the equations below, although a similar description can be found in [2]:

$T_{co,max}$: maximum clock to output of the GTL;
$T_{co,min}$: minimum clock to output of the GTL;
$T_{flight,max}$: maximum flight time;
$T_{flight,min}$: minimum flight time;
$T_{su}$: minimum required time specified to setup before the clock;
$T_{hold}$: minimum specified input hold time;
$T_{cycle}$: clock period of the processor system bus. For example, the clock period is 10 ns for a 100 MHz processor system bus;
$T_{skew}$: maximum variation between components receiving the same clock edge;
$T_{jitter}$: maximum clock edge to edge variation.
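The setup and hold relations (1) and (2) can be evaluated with a few lines of code. The sketch below assumes the usual common-clock form of these relations (setup margin is the clock period minus all worst-case delays and clock uncertainties; hold margin is the minimum data delay minus the hold requirement and clock skew). The field names and all numeric values are hypothetical and are not taken from Table I.

```python
from dataclasses import dataclass


@dataclass
class BusTiming:
    """All values in ns. Field names are illustrative, not from the paper."""
    t_co_max: float      # maximum clock to output
    t_co_min: float      # minimum clock to output
    t_flight_max: float  # maximum flight time
    t_flight_min: float  # minimum flight time
    t_su: float          # required setup time at the receiver
    t_hold: float        # required hold time at the receiver
    t_cycle: float       # bus clock period
    t_skew: float        # clock skew between components
    t_jitter: float      # clock edge to edge jitter


def setup_margin(t: BusTiming) -> float:
    # Assumed form of (1): everything that eats into one clock period.
    return (t.t_cycle - t.t_co_max - t.t_flight_max
            - t.t_su - t.t_skew - t.t_jitter)


def hold_margin(t: BusTiming) -> float:
    # Assumed form of (2): earliest data arrival versus hold requirement.
    return t.t_co_min + t.t_flight_min - t.t_hold - t.t_skew


# Hypothetical budget, chosen only to show a positive setup margin
# together with a slightly negative hold margin (short traces).
example = BusTiming(t_co_max=3.0, t_co_min=0.5, t_flight_max=1.5,
                    t_flight_min=0.3, t_su=1.5, t_hold=0.6,
                    t_cycle=7.5, t_skew=0.3, t_jitter=0.2)

if __name__ == "__main__":
    print(f"setup margin: {setup_margin(example):+.2f} ns")
    print(f"hold margin:  {hold_margin(example):+.2f} ns")
```

A negative hold margin of this kind is the situation that the delay tuning circuit of Section IV is meant to correct.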


TABLE I
INITIAL TIMING BUDGET OF A 133 MHz PSB

Equations (1) and (2) apply to both rising and falling edges of the signal, since the GTL output waveform is usually asymmetrical due to its open-drain circuit structure and termination resistor. The maximum and minimum values of the parameters (such as $T_{co,max}$, $T_{co,min}$, $T_{flight,max}$ and $T_{flight,min}$) account for silicon process, circuit board design and manufacturing skews or corners. For example, $T_{co,min}$ is the clock-to-output value when the silicon is at the fast corner (lower temperature and higher voltage), and $T_{flight,max}$ corresponds to a board trace with the smaller characteristic impedance and the longest length. This gives the worst-case (longest) flight time, which is used to decide whether the setup margin will be met.

Readers should refer to [2] for the detailed definition of flight time, since it is commonly misunderstood as the time required for a signal to travel from one end of the interconnect to the other. A better way to interpret its physical meaning is as the total delay that the interconnect plus loads add to the component timing.

Starting in this section, we explain how we designed a 133 MHz PSB, beginning with an initial timing budget in spreadsheet form, as shown in Table I. The timing budget will be refined in Section IV.

Readers should be able to verify the setup and hold margins in Table I using (1) and (2) with the following additional timing information:

Clock driver jitter ns
PCB clock skew ns
Clock period ns (133 MHz)

The initial budget for each parameter in the timing table comes from a different owner. It is derived from previously working systems based on older technology (e.g., 100 MHz PSB, 0.25 µm processor) and then adjusted for the differences between that and the newer technology (e.g., 0.18 µm processor). In other words, it is the best "guess" of the 133 MHz PSB timing for a common notebook computer. The clock driver jitter, PCB clock skew and flight time estimates are inputs from system designers, based on clock chip technology, clock net routing, PCB technology and PSB trace length estimation. Usually we try to keep this portion of the budget the same as in the previous design, unless the I/O designs of the components (CPU, core logic or clock chip) really need their budget (requirements) loosened. I/O circuit designers, on the other hand, provide the AC timing of the processor and core logic. We will explain one way of resolving the hold marginality problem of Table I in the next section.

Fig. 3. Top-level block diagram of the GTL+ buffer.

To sum up, timing budgeting is a planning and negotiation process. We use it to identify the parameters in the timing table that can be further improved to make the PSB work. If the bottleneck is on the I/O design side, for example, then the timing margin may need to be absorbed by the interconnect and clocking on the motherboard, or vice versa. The final budget allocated to I/O circuits, clocking and interconnect, in a timing table similar to Table I, later becomes the design target or specification for circuit and system designers. Note also that circuit analysis and system simulations are necessary even for the initial timing budget.

IV. GTL CIRCUIT DESIGN

This section describes the GTL I/O circuit designs of both the processor and the core logic chipset for the processor system bus. The details of the I/O circuits are usually transparent to bus or board designers; however, they play a very important role in a successful design. This section focuses on the circuit features that differentiate the requirements for mobile and desktop PCs. Fig. 3 shows a top-level block diagram of the GTL+ buffer in the processor. One of the key elements implemented in the GTL buffer of the microprocessor is the circuit used to adjust the delay of the output signal to the core logic chipset (which affects $T_{co}$ in (1) and (2)). The delay tuning circuit resolves the potential hold time marginality problem caused by the short traces of a notebook computer platform by adding delay to the entire valid timing of the signal. However, the designers have to make sure that enough setup margin remains for the processor system bus in doing so. As shown in Fig. 4, the delay tuning circuit consists of a chain of delay stages just before the pre-driver stage of the GTL+. Each stage adds an equal amount of delay to the output (data out in Fig. 4). The actual number of delay stages depends on the $T_{co}$ range needed for operation over all platforms and PCB designs.

The ideal setup/hold adjustment circuit would reduce $T_{co}$ in the slow corner of operation (slow silicon, low supply voltage and high operating temperature) for improved setup margin, and at the same time would increase $T_{co}$ in the fast corner of operation (fast silicon, high supply voltage and low operating temperature) for improved hold margin. Designing such a delay element is difficult and usually not possible given the $T_{co}$ range needed for 133 MHz operation. The next best approach is a delay element whose propagation delay varies as little as possible across all operating corners. Fig. 5 shows the design of the individual delay elements. The delay chain itself is a series of muxed inverters. Each delay stage also has a delay compensation element that helps reduce the delay variation of the stage across corners. The delay compensation circuit is feasible because MOS capacitance varies less over operating conditions than transistor conductance does. In the fast corner, the delay inverters "see" more capacitance because the passgate conducts strongly, and thus slow down. In the slow corner, the passgate is weaker, so the inverters "see" less capacitance on the output node and are not slowed down as much. The delay of an inverter by itself in a chain can vary by as much as 3–3.5×. With the simple passive delay compensation circuit shown here, the delay variation can be reduced to less than 2×. With the help of this circuit, less setup margin is given up when increasing $T_{co}$ for hold fixes.

Fig. 4. Programmable delay tuning circuit implemented in the GTL to adjust the output signal delay.

Fig. 5. Delay stage design with delay compensation circuit for helping reduce delay variation over process and operating conditions.

The delay tuning circuit in Fig. 4 is controlled by a 3-to-8 decoder whose input bits are programmable through fuse or metal options within the processor. If all the delay stages are selected, seven more stage delays are added than in the default case, in which the data goes only through Delay stage 1 as shown in the figure. Each stage adds 100 ps to $T_{co,min}$ and 200 ps to $T_{co,max}$ (due to process, voltage and temperature variations).
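The effect of the programmable delay chain on clock-to-output timing can be sketched as a small lookup. The mapping from the 3-bit select code to the number of extra stages (code k adds k stages beyond the default Delay stage 1) is an assumption made for illustration; only the per-stage 100 ps / 200 ps figures and the eight-stage range come from the text.

```python
from typing import Tuple

STAGE_MIN_PS = 100   # added to the fast-corner (minimum) clock to output
STAGE_MAX_PS = 200   # added to the slow-corner (maximum) clock to output
NUM_SETTINGS = 8     # 3-to-8 decoder


def added_delay_ps(select_code: int) -> Tuple[int, int]:
    """Return (extra minimum delay, extra maximum delay) in ps.

    Assumed mapping: code 0 is the default (data passes through
    Delay stage 1 only); code 7 selects all stages.
    """
    if not 0 <= select_code < NUM_SETTINGS:
        raise ValueError("select code must be in 0..7")
    extra_stages = select_code
    return extra_stages * STAGE_MIN_PS, extra_stages * STAGE_MAX_PS


if __name__ == "__main__":
    for code in range(NUM_SETTINGS):
        dmin, dmax = added_delay_ps(code)
        print(f"code {code}: +{dmin} ps (fast corner), +{dmax} ps (slow corner)")
```

The gap between the two columns is the setup margin given up for each hold fix, which is why the compensation circuit tries to keep the fast-to-slow ratio small.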

TABLE II
PCB TIMINGS OF SYSTEM AFTER APPLYING $T_{co}$ ADJUSTMENT

Recall from Table I in the previous section that, because mobile systems have shorter PCB traces, the hold margins in both directions (processor driving chipset and vice versa) are negative. Table II shows the timings after $T_{co}$ adjustments are made (on both the processor and chipset sides). The hold margin was −0.06 ns when the processor drove the chipset. To fix the timing, a fuse setting change was made to add 100 ps of delay to the processor $T_{co}$ (one delay bit). Notice that the setup margin was 0.61 ns and was reduced to 0.41 ns, since the delay circuit has a 2× delay variation from the fast to the slow corner. Had no delay compensation circuit been added, the variation would have been 3× and the setup margin would have been 0.31 ns. Similarly, the hold margin was −0.38 ns when the chipset drove the processor. After adding 400 ps to the chipset $T_{co}$ (four delay stages), the hold time was fixed. The setup margin was reduced from 0.9 ns to 0.1 ns (four delay stages add 800 ps of delay in the slow corner).
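The margin arithmetic in this paragraph can be checked with a short sketch. It takes the two initial hold margins as −0.06 ns and −0.38 ns, consistent with the statement that both were negative, and models each delay stage as 100 ps at the fast corner multiplied by a fast-to-slow variation factor at the slow corner. The function name and signature are illustrative.

```python
from typing import Tuple


def apply_tco_tuning(setup_ns: float, hold_ns: float, stages: int,
                     stage_min_ps: float = 100.0,
                     variation: float = 2.0) -> Tuple[float, float]:
    """Return (setup margin, hold margin) in ns after adding delay stages.

    Hold margin gains the fast-corner delay; setup margin loses the
    slow-corner delay, taken as variation times the fast-corner delay.
    """
    added_min_ns = stages * stage_min_ps / 1000.0
    added_max_ns = added_min_ns * variation
    return round(setup_ns - added_max_ns, 3), round(hold_ns + added_min_ns, 3)


if __name__ == "__main__":
    # Processor driving chipset: one stage, compensated (2x) delay element.
    print(apply_tco_tuning(0.61, -0.06, stages=1))
    # Same fix without the compensation circuit (3x variation).
    print(apply_tco_tuning(0.61, -0.06, stages=1, variation=3.0))
    # Chipset driving processor: four stages.
    print(apply_tco_tuning(0.90, -0.38, stages=4))
```

With these inputs the setup margins come out as 0.41 ns, 0.31 ns and 0.1 ns, matching the figures quoted in the text, while the hold margins become slightly positive.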

Fig. 6. GTL design in the core logic.

Fig. 7. Pre-driver design.

For mobile systems, a special pre-driver is designed into the GTL to ensure signal quality and save power. Fig. 6 shows the GTL design in the core logic. When the core logic is configured for mobile systems, a "mobile" signal is asserted (it is permanently asserted for mobile parts). This ensures that the mobile pre-driver is used in the GTL. The pre-driver design is shown in Fig. 7. The pre-driver is a set of tristate inverters connected in parallel. A global active compensation circuit distributes compensation bits to all the individual buffers. The purpose of this circuit is to turn individual legs of the pre-driver on or off depending on the process and operating conditions of the chip; thus it compensates for process, voltage and temperature. More pre-driver legs are turned on in the slow corner and fewer in the fast corner. The individual pre-driver legs are binary weighted, so four compensation bits provide 16 different strength levels. The benefits of this design are twofold. First, it helps to reduce the power of the I/O devices (the saturated current for the pull-down n-MOS device can be around 150 mA and the nonsaturated current 30–90 mA less than those of desktop GTL). Second, the slower edge rate due to the weaker pre-driver creates less overshoot and undershoot without sacrificing the timing requirements at the receiving end of the processor system bus. This reduces the oxide stress at the receiver of the processor GTL. Our previous experience showed that the signal quality and oxide integrity would not be acceptable at the processor side of the mobile GTL bus if the desktop pre-driver were used in the core logic.
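The binary-weighted pre-driver legs described above can be illustrated with a small sketch that maps a 4-bit compensation code to the legs it enables. The unit weights 1, 2, 4 and 8 and the bit ordering are assumptions for illustration; a real buffer may also keep some drive permanently enabled, which is not modeled here.

```python
from typing import List

LEG_WEIGHTS = (1, 2, 4, 8)  # relative drive per leg, least significant bit first


def enabled_legs(code: int) -> List[int]:
    """Weights of the pre-driver legs turned on by a 4-bit compensation code."""
    if not 0 <= code < 16:
        raise ValueError("compensation code must be in 0..15")
    return [w for bit, w in enumerate(LEG_WEIGHTS) if (code >> bit) & 1]


def relative_strength(code: int) -> int:
    """Total relative drive strength for a compensation code."""
    return sum(enabled_legs(code))


if __name__ == "__main__":
    for code in (0b0011, 0b1010, 0b1111):
        print(f"code {code:04b}: legs {enabled_legs(code)}, "
              f"strength {relative_strength(code)}")
```

Because the weights are powers of two, every code gives a distinct strength, which is how four bits yield 16 levels. A slow-corner chip would be given a higher code than a fast-corner chip.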

A closer look at the circuitry in Fig. 6 also reveals that, to improve signal quality and assure the reliability of the GTL in the core logic, early clamping circuits are included for both the ground and power paths. They dampen overshoot and undershoot with respect to ringback, and help prevent the incoming signal from ringing back into the input threshold region of the receiver. For the ground clamp, the "nbias" signal in Fig. 6 is biased close to the threshold voltage of the n-MOS device. It helps absorb the energy of the input waveform as it goes below 0 V (undershoot). For the power clamp, the "pbias" signal is biased at a level set by the threshold voltage of the p-MOS device. Because of process, temperature and voltage variation, the leakage current of these clamping structures must be traded off against the clamping benefits. For notebook computer PSB design, we usually turn off these clamping circuits when the core logic GTL is driving. When the core logic GTL is receiving, the ground clamp is enabled, which helps with the undershoot. Again, this conclusion comes from our early assessment of the PSB design. By doing this, we save around 20–50 mW of power, which further helps to extend the battery life of a notebook computer.

V. INTERCONNECT DESIGN

From a timing perspective, the previous section produced a preliminary solution that addresses the hold time violations. We now need to translate the flight time constraints of the timing equations, such as those in Table II, into an actual trace design, including the specification of the trace length range for the shortest and longest PSB traces. Unlike the timing budget, signal integrity issues such as overshoot and undershoot cannot be addressed with a simple spreadsheet approach; detailed system-level circuit simulation is required. Overshoot and undershoot are particularly critical for deep sub-micron devices since they directly affect the oxide wear-out of the I/O circuits. In this section, we build a system model and use a circuit simulator such as SPICE [3] to characterize the flight time and noise of the PSB. At the pre-layout design stage, because our goal is to conduct as many trade-off studies as possible, it is necessary to minimize the simulation run time without sacrificing accuracy. An approach such as SPICE simulation is thus preferred, since the transistor-level I/O circuit model can be combined directly with the PSB transmission line model. This way we do not have to generate behavioral I/O models such as IBIS [4] every time we modify or optimize the I/O circuits. Similarly, for the oxide wear-out study, only a circuit simulator such as SPICE lets us observe the overshoot and undershoot at the I/O pads, which are at the device level.

Fig. 8. Model of the system bus used for flight time simulation.

Fig. 9. Stack-up of a typical circuit board design for high-performance laptop computers.

A. Flight Time, Trace Length Constraints and Pre-Layout Routing Guideline

To determine the trace length constraints (in this case our PSB pre-layout guideline), we built the system model shown in Fig. 8 to simulate the flight time. The PSB traces are represented by three coupled, lossless transmission lines so that coupling effects such as crosstalk can be taken into account. The packages of the processor and the core logic chipset are also included in the system model. In this model, we also assume that the GTL+ pull-up termination resistance is implemented on the processor as an active device, so it is not shown in Fig. 8. The floating nodes in this simulation model are used to characterize the waveforms when the GTL+ is driving a test (reference) load. The value used at the core logic end is the same as that implemented in the processor.

Signal transactions can happen in two directions, and both are considered in the simulation: (1) processor driving core logic chipset and (2) core logic driving processor. The simulation includes both rising and falling edges, since the edge rates differ and affect the flight time measurement. The length and spacing of the traces between the processor and the core logic chipset are varied to study their impact on flight time. In this particular study, the trace length ranged from 1 cm to 15 cm for three different trace-to-trace spacings: 4 mil, 8 mil and 12 mil. Fast corner models for the I/O, package, substrate and PCB are used for minimum flight time simulations, and slow corner models for maximum flight time simulations.

The layer stack of the printed circuit board used for our study is shown in Fig. 9. This is in general the minimum stack-up required for a notebook computer, as opposed to the four-layer board used in a desktop system. It is a six-layer FR-4 board with two signal layers of micro-strip line, two signal layers of strip line and two reference layers for power supply. The width of the copper trace is 4 mils for all the signal layers. The thicknesses of the copper trace/plane and FR-4 core or pre-preg layers of the stack-up, from top to bottom, are 0.709 mil, 2.362 mil, 0.709 mil, 3.937 mil, 0.709 mil, 3.937 mil, 0.709 mil, 2.362 mil and 0.709 mil, respectively. It is further assumed that the dimensional variation (due to manufacturing, material properties, etc.) is negligible. The micro-strip line represents the fast corner of the interconnect since its characteristic impedance is higher (63.5 Ω). The strip line (L3 and L4) then represents the slow corner of the interconnect (47 Ω).

Fig. 10. A 3-D graph shows the flight time and trace length relationship at different trace spacing when core logic drives processor starting with falling edge. It is assumed that the process and interconnect are at slow corner.

Fig. 11. Flight time simulation for different trace length. This is the case that chipset is driving the processor at the falling edge and slow corner. The trace spacing is 4 mil.

To cover the entire range of the possible solution space, eight sets of simulations, covering the permutations of process corners, falling/rising edges and buffer driving/receiving directions, need to be performed. However, if only the extreme corners of the interconnect need to be simulated for validation purposes, the eight sets can be reduced to eight individual simulations. In this case, the fast corner of the interconnect corresponds to the shortest trace (1 cm) with the high impedance (63.5 Ω), and the slow corner corresponds to the longest trace (15 cm) with the low characteristic impedance (47 Ω). One of the eight simulation sets, showing the dependence of flight time on spacing and trace length, is shown in Fig. 10. This is the case in which the core logic drives the processor starting with a falling edge (the nMOS pull-down of the GTL is turned on first) and the silicon and interconnect are all at the slow corner. Its 2-D representation at a spacing of 4 mil is shown in Fig. 11.
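The eight extreme-corner runs follow from two corners, two edges and two driving directions. The sketch below enumerates them and attaches the trace length and impedance that the text associates with each interconnect corner; the dictionary layout and names are illustrative and do not correspond to any particular simulator input format.

```python
from itertools import product
from typing import Dict, List

CORNERS = ("fast", "slow")
EDGES = ("rising", "falling")
DIRECTIONS = ("processor drives core logic", "core logic drives processor")

# Interconnect extremes quoted in the text: (trace length in cm, impedance in ohm).
INTERCONNECT = {"fast": (1.0, 63.5), "slow": (15.0, 47.0)}


def extreme_corner_runs() -> List[Dict[str, object]]:
    """List one simulation setup per corner/edge/direction combination."""
    runs = []
    for corner, edge, direction in product(CORNERS, EDGES, DIRECTIONS):
        length_cm, z0_ohm = INTERCONNECT[corner]
        runs.append({
            "corner": corner,          # applies to silicon and interconnect
            "edge": edge,
            "direction": direction,
            "length_cm": length_cm,
            "z0_ohm": z0_ohm,
            # Fast corner gives minimum flight time (hold check),
            # slow corner gives maximum flight time (setup check).
            "checks": "hold" if corner == "fast" else "setup",
        })
    return runs


if __name__ == "__main__":
    for run in extreme_corner_runs():
        print(run)
```

Four of the runs (slow corner) bound the longest trace through the setup check, and the other four (fast corner) bound the shortest trace through the hold check.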

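To determine the trace length constraints, we derive the following equations from (1) and (2):

$$T_{flight,max} = T_{cycle} - T_{co,max} - T_{su} - T_{skew} - T_{jitter} - \text{Setup Margin} \tag{3}$$

$$T_{flight,min} = T_{hold} + T_{skew} - T_{co,min} + \text{Hold Margin} \tag{4}$$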

If we design into zero setup and hold margin, then the tracelength that corresponds to will the upper bound, andthe trace length that corresponds to will be lowerbound of our trace length constraints. We will use the constraintsas the pre-layout routing guideline. To illustrate the concept fur-ther, let’s use the results in Table II to derive the constraints. Forsetup case, when core logic drives, is 1.16 ns andsetup margin 0.1 ns and thus we have 1.26 ns. Ifwe assume the trace-to-trace spacing is 4 mils, from Figs. 10and 11, we can find out the trace length can not exceed 14.5 cm.Of course, this resulting trace length constraint needs to be com-pared with other three different scenarios at the slow corner, i.e.,

1) Processor driving core logic, rising edge.2) Core logic driving processor, rising edge,.3) Core logic driving processor, falling edge.

The shortest trace length among the four situations will be the final upper bound on the longest trace we can route on the motherboard based on 4 mil trace-to-trace spacing. Please also note that, in Fig. 10, the impact of crosstalk on flight time becomes more severe when the coupled length is longer and the spacing is smaller (4 mils, for example). Similarly, the shortest trace length (lower bound) that can be routed to meet the hold time requirement will be derived from four other simulation sets and their minimum flight times, i.e.,

1) Processor driving core logic at rising edge.
2) Processor driving core logic at falling edge.
3) Core logic driving processor at rising edge.
4) Core logic driving processor at falling edge.


Fig. 12. The waveform observed at the GTL+ input pad of the processor when the chipset is driving.

In these four cases, the processor, core logic, and interconnect are all assumed to be at the fast corner.
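The derivation of the trace length bounds can be sketched as follows, assuming that flight time increases monotonically with trace length and that simulated points may be interpolated linearly. Only the 1.16 ns and 0.1 ns figures come from the worked example; the curve samples, scenario names, and function names are made-up placeholders and are not data from Figs. 10 and 11. Taking the longest of the per-scenario minimum lengths as the lower bound is likewise an assumption made by symmetry with the upper bound.

```python
def allowed_flight_time(flight_ns, margin_ns):
    """Flight time limit when the design is pushed to zero setup margin."""
    return flight_ns + margin_ns


def length_at_flight_time(curve, t_ns):
    """Invert an increasing list of (length_cm, flight_ns) points by
    linear interpolation. Raises ValueError if t_ns is off the curve."""
    for (l0, t0), (l1, t1) in zip(curve, curve[1:]):
        if t0 <= t_ns <= t1:
            if t1 == t0:
                return l0
            return l0 + (l1 - l0) * (t_ns - t0) / (t1 - t0)
    raise ValueError("flight time outside the simulated range")


def upper_bound_cm(slow_curves, t_max_ns):
    """Longest routable trace: shortest of the per-scenario maximum lengths."""
    return min(length_at_flight_time(c, t_max_ns) for c in slow_curves.values())


def lower_bound_cm(fast_curves, t_min_ns):
    """Shortest routable trace: longest of the per-scenario minimum lengths
    (an assumption made by symmetry with the upper bound)."""
    return max(length_at_flight_time(c, t_min_ns) for c in fast_curves.values())


# Placeholder slow-corner curves (hypothetical, two of the four scenarios).
SLOW_CURVES = {
    "core_logic_to_processor_falling": [(0.0, 0.0), (10.0, 1.0), (20.0, 2.0)],
    "processor_to_core_logic_rising": [(0.0, 0.0), (10.0, 0.9), (20.0, 1.8)],
}

if __name__ == "__main__":
    t_max = allowed_flight_time(1.16, 0.1)  # values from the worked example
    print(f"max flight time {t_max:.2f} ns, "
          f"upper bound {upper_bound_cm(SLOW_CURVES, t_max):.1f} cm")
```

In practice each curve would come from the simulation sweep at the chosen trace spacing, and the minimum flight time for the hold case would be taken from the timing budget rather than derived here.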

B. Signal Quality and Reliability

Before we recommend the trace length constraints derived in Section V-A as a pre-layout guideline to system layout designers, we need to verify that the overshoot and undershoot at the I/O pads do not violate the reliability criteria. The overshoot and undershoot at the input pad of the receiver directly impact the reliability and lifetime of the I/O devices. Device failure such as gate oxide breakdown [5] is due to overstress or understress at the NMOS or PMOS device gate over time. The time to failure (TTF) is defined as the time at which the gate leakage current has increased by a certain factor (such as 1.5× in a 0.25 μm CMOS technology) and is heavily dependent on process technology [6].

Gate oxide breakdown is assumed to follow quasi-static behavior, meaning that the oxide damage due to voltage stress is cumulative and can be integrated over time. The detailed numerical method and formula used to integrate the input waveform at the die pad are beyond the scope of this paper. Generally speaking, after analyzing the overshoot and undershoot at the input pad, if the TTF passes a predetermined threshold (100 000 hours, for example) set by a certain process, then that particular circuit passes the gate oxide integrity requirements. If it does not pass, the voltage overshoot or undershoot needs to be reduced by techniques such as tuning the driving strength of the I/O circuit or improving the impedance matching of the interconnect design. Besides ac analysis (integrating the input waveform over a certain period of time), strict dc rules are usually set by the process technology as well. For example, the peak voltage at the gate must be less than or equal to a certain voltage (2.1 V, for example, for the 1.5 V I/O supply in our study) at any instant of time. Also, if a transistor is biased in a dc mode, the voltage across the gate oxide cannot exceed a certain preset voltage (such as 2.1 V for a 0.18 μm technology) determined by the process technology.
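Since the integration method itself is outside the scope of the paper, the following is only a rough illustration of the quasi-static idea: damage is accumulated as the time-weighted sum of 1/TTF(V) over one period of the waveform. The exponential voltage-acceleration model and the constants `V_REF`, `TTF_REF_HOURS`, and `GAMMA_PER_VOLT` are hypothetical assumptions, not values from the paper or from any process. Only the 2.1 V limit and the 100 000 hour threshold are quoted from the text.

```python
import math

# Hypothetical model constants (illustrative only, not process data).
V_REF = 1.5             # reference oxide voltage in volts
TTF_REF_HOURS = 1.0e6   # assumed time to failure under constant V_REF
GAMMA_PER_VOLT = 10.0   # assumed voltage-acceleration factor
DC_LIMIT_V = 2.1        # peak-voltage limit quoted in the text


def ttf_hours(v_ox):
    """Assumed time to failure under a constant oxide voltage v_ox."""
    return TTF_REF_HOURS * math.exp(-GAMMA_PER_VOLT * (abs(v_ox) - V_REF))


def waveform_ttf_hours(samples):
    """Quasi-static TTF estimate for a periodic waveform.

    samples is a list of (fraction_of_period, oxide_voltage) pairs whose
    fractions sum to 1. The damage rates 1/TTF(V) are time-averaged.
    """
    damage_rate = sum(frac / ttf_hours(v) for frac, v in samples)
    return 1.0 / damage_rate


def passes(samples, threshold_hours=100_000):
    """Apply both the dc peak rule and the integrated TTF rule."""
    peak = max(abs(v) for _, v in samples)
    return peak <= DC_LIMIT_V and waveform_ttf_hours(samples) >= threshold_hours
```

A waveform sampled at the die pad would first be reduced to such (fraction, voltage) pairs; a short excursion above the nominal level then contributes damage in proportion to both its duration and its assumed acceleration.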

Fig. 12 shows a waveform observed at the processor’s input pad of the GTL+ bus when the chipset is driving. It was simulated under an extremely fast condition: both processor and chipset are at the fast corner, the characteristic impedance of the interconnect equals 64 Ω, and the trace length is 1 inch. In this case, the undershoot at the receiving pad of the processor (−0.1 V) did not exceed the 2.1 V hard limit for a MOS device with its source tied to the 1.5 V I/O supply voltage (the resulting stress is 1.6 V). The in-house reliability simulation also predicts that the current bus design passes a predetermined TTF threshold for the process technology used. A similar analysis needs to be performed at the chipset side of the GTL+ input pad to make sure the overshoot or undershoot meets the requirement of oxide integrity. However, the criteria may be different if a different process technology is used for the chipset.
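The dc check in this example reduces to one subtraction. The sketch below assumes the undershoot level is −0.1 V, which is the reading consistent with the 1.6 V figure in the text; the function name is hypothetical.

```python
DC_LIMIT_V = 2.1  # hard limit quoted in the text


def oxide_stress_v(pad_v, source_v):
    """Magnitude of the voltage between the input pad (gate) and the source."""
    return abs(source_v - pad_v)


# Assumed undershoot of -0.1 V at the pad, source tied to the 1.5 V I/O supply.
stress = oxide_stress_v(pad_v=-0.1, source_v=1.5)
print(f"{stress:.1f} V, within limit: {stress <= DC_LIMIT_V}")
```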

VI. CONCLUSION

The design flow of a high speed GTL+ bus for notebook PCs has been described. The unique problems of a mobile system have been addressed. The GTL+ design has been optimized for mobile applications. An example has also been given to illustrate the timing budget process and to explain the derivation of the pre-layout guideline based on system level circuit simulation. The approach discussed in this paper can be extended to system bus design at even higher frequencies such as 166/200 MHz if designers pay attention to the balance of the timing budget among I/O components, interconnect, and clocking. A source synchronous clocking scheme [7] can also be considered for future PSB design. The scheme in general reduces clock skew and hence improves the bus frequency. Even though the technology itself requires more sophisticated development in the chip and interconnect areas, it is the future trend of PSB design to enhance PC performance.

ACKNOWLEDGMENT

The authors would like to thank D. Alexander and R. G. Anderson for providing the board layer stack in this study and S. Natu and M. Pitta for their support.

REFERENCES

[1] B. Gunning, L. Yuan, T. Nguyen, and T. Wong, “A CMOS low-voltage-swing transmission line transceiver,” in Proc. 1992 IEEE Int. Solid-State Circuits Conf., pp. 58–59.

[2] Pentium® II Processor GTL+ Guidelines, Application Note AP-585, Intel Corp., May 1997.

[3] P. W. Tuinenga, A Guide to Circuit Simulation and Analysis Using PSpice. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[4] I/O Buffer Information Specification (IBIS), Version 3.2, Jan. 15, 1999.

[5] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Reading, MA: Addison-Wesley, 1990, ch. 2.

[6] P. Aminzadeh, private communication, 1998.

[7] S. Dabral and T. Maloney, Basic ESD and I/O Design. New York: Wiley, 1998, ch. 4.

Pochang Hsu received the Ph.D. degree in electrical engineering from the University of Arizona, Tucson, in 1993.

He joined Intel Corporation, Santa Clara, CA, in 1995 and was a Circuit Designer for its 0.35 μm and 0.25 μm Pentium II processor families. He is currently managing a circuit design group in the Mobile Computing Group for I/O, clocking, low power, and other emerging technologies used in mobile computing systems. Prior to joining Intel, he worked on the electrical design of high performance ASIC packaging at LSI Logic Corporation, Milpitas, CA.


Yanmei Tian received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1991 and the M.S. degree in electrical and computer engineering from the University of Illinois, Urbana-Champaign, in 1995.

She worked as a Research Assistant in the Coordinated Science Lab, University of Illinois. Since then, she has been with the Mobile Computing Group, Intel Corporation, Santa Clara, CA, where she is currently a Senior Component Design Engineer. She worked on the design and verification of the Pentium® II microprocessor. She also worked on the design and development of SRAM and low voltage differential signaling (LVDS) buffers for core logic chipsets. Her professional interests are in the design of high speed, mixed signal, analog, and digital integrated circuits for various applications.

Gerald Pasdast received the B.S.E.E. degree from San Jose State University, San Jose, CA, in 1996.

He worked at Cypress Semiconductor, Santa Clara, CA, for one year as a Product Engineer. He then joined Intel Corporation, Santa Clara, in 1997 as a Circuit Design Engineer concentrating on high speed I/O circuits and bus design. He is currently working on the second generation IA-64 processor.