Ch5 Process Variation

Embed Size (px)

Citation preview

  • 8/3/2019 Ch5 Process Variation

    1/21

    1

    Chapter 5

    Finding Peak Performance in a ProcessProcess Variation and Operating Conditions

    D. G. Chinnery, K. KeutzerDepartment of Electrical Engineering and Computer Sciences,

    University of California at Berkeley

    One of the traditional advantages of using an ASIC design methodologywas that the ASIC designer was insulated from semiconductor processingissues. By restricting the circuit design to use only the cells offered in theASIC library, and by using the golden tool flow provided by the ASICvendor in the ASIC Design Kit, a designer could be assured that if the toolssaid that the IC would run at a certain speed then the manufactured partwould as well. As the delay of integrated circuits began to be impacted bywiring delay, the simplicity of this approach was compromised. It becameimpossible to determine the final delay of the circuit without layoutinformation. A number of different approaches were taken to address thisproblem, but many ASIC design groups resolved to take over theresponsibility for physical design as well.

    In our investigation of the sources of performance differences between

    ASIC and custom designs, semiconductor processing proved to be one of themost significant factors. ASIC designers pay a high price in circuitperformance for the luxury of ignoring processing details. This Chaptergives a tutorial introduction to the many sources of performance variation insemiconductor processing. One approach to modelling performancevariation and reducing it is discussed in Chapter 14 (Orshansky Chapter).

    Semiconductor companies have their own implementations of what isgenerally known as the same process generation, such as 0.18 micron. Inaddition to differences in their ability to manage variation in the criticaldimensions, there can be other significant differences betweenimplementations of a process by semiconductor companies: whetheraluminium or copper is used for the metal interconnect; the dielectricconstant of the insulator to reduce crosstalk between interconnect; the

  • 8/3/2019 Ch5 Process Variation

    2/21

    2 Chapter 5number of metal layers; gate oxide thickness; and silicon-on-insulator orbulk CMOS. Thus there can be differences of 40%, or more, between thespeeds of the fastest chips produced in one foundry versus the fastest chipsproduced in another (see Section 3). Also an individual semiconductorcompany may have several different versions of the same processtechnology, and the highest clock frequency chips produced in a fabrication

    plant may be 20% to 40% faster then the slowest chips produced. (Thisestimate was based on the spread between fast and slow process conditionsreported in [1] and communications with Professor Costas Spanos, ProfessorChenming Hu and Michael Orshansky at UC Berkeley.) Section 2 discussesdifferences in process technology within a company.

    Thus we see there are numerous sources for performance variation of ICdesigns arising from processing-related issues. Based on the variation inperformance of chips produced within a process technology, a foundryreleases models for worst-case, typical, and high-speed performance. Thebehaviour of the circuitry also depends on the operating conditions.

    1. PROCESS AND OPERATING CONDITIONSStandard cell libraries for a given foundry are characterized for several

    different process and operating conditions. The operating conditions aredefined by the nominal supply voltage and operating temperature. Theprocess conditions correspond to the speed of chips fabricated based onmodels provided by the foundry: slow (worst case), typical, and fast.

    ASICs synthesized for slow process conditions will give a high yield atthe post-layout clock frequency. There will be less chips of a higher clockfrequency corresponding to fast process conditions. ASIC vendors areultimately paid for packaged parts that work at the required speed, so ASICdesigners must design for process conditions that will ensure a high yield.Thus ASIC designers typically assume worst case process conditions.

    The specification for the operating conditions varies. Slow operatingconditions are at a lower supply voltage and higher temperature, typically100C or 125C. Sometimes 90% of the nominal supply voltage VDD is usedfor the worst case supply voltage. Note that 90% ofVDD is about what maybe caused by variations in supply voltage in normal operation (e.g. due toground bounce) [Andrew Chang, personal communication]. Similarly,100C to 125C are typical worst case operating temperatures for someembedded applications [8]. Thus slow operating conditions may be quiterealistic and not pessimistic it depends on the application. Typical

    operating conditions are often specified at the nominal supply voltage and

  • 8/3/2019 Ch5 Process Variation

    3/21

    5. Finding Peak Performance in a Process 3room temperature (25C). Fast operating conditions are at a higher supplyvoltage and below room temperature.

    Whether operating conditions are optimistic or pessimistic depends onthe actual range of conditions within which the chips will be used.

    The conditions for worst-case power consumption are different to worstcase speed conditions (worst case process and worst case operating

    conditions). The worst-case dynamic power consumption is at the highestoperating clock frequency (e.g. at fast process, high supply voltage and lowtemperature conditions). The worst-case leakage power is at highertemperature and higher supply voltage.

    1.1 Supply VoltageChapter 4, Section 5, has a more detailed discussion of the energy

    dissipated by gates. We present the essential information here to discuss thesupply voltage operating condition.

    Consider the impact of supply voltage on the drive current. Thesaturation drive currentIDSAT is given by [46]

    (1) )( THDDDSAT VVBL

    WI =

    Where W is the effective channel width and L is the effective channellength of an equivalent inverter. VDD is the supply voltage to the gate, andVTH is the threshold voltage of the transistors. B and are empiricallydetermined constants, where is between 1 and 2 depending on thetechnology.

    From (1), the drive current, and hence circuit speed, can be increased byincreasing the width of transistors, the supply voltage, or reducing thethreshold voltage. Gates with wider transistors and higher drive strength are

    provided in a standard cell library, and are utilized in synthesis and layout toreduce the delay on critical paths (Section 3.5 of Chapter 4 discussesstandard cell libraries).

    Increasing the supply voltage increases the drive current, but the dynamicpower dissipation Pdynamicdischarging or charging a capacitance C, switchingat frequencyftoggle, is [63]

    (2) toggleDDdynamic fCVP 2= When static CMOS gates switch, there is also dynamic power

    consumption due to short-circuit current. When the gate switches, there is a

    period of time when both the pull-up and pull-down networks are conducting

    and there is current from the supply to ground through the gate. Short-circuitcurrent is not significant if the input rise (fall) time is close to the output fall

  • 8/3/2019 Ch5 Process Variation

    4/21

    4 Chapter 5(rise) time [12], and is less than 10% of the dynamic power consumption ifthe input and output transition times are similar in a well-designed processtechnology [14]. Therefore the dynamic power is dominated by thecapacitance charging and discharging power consumption for a well-designed circuit.

    Higher supply voltage increases the drive current, speeding up the circuit,

    at the expense of increased dynamic power consumption. The charge storedon a capacitance C in the circuit is CVDD, so the increase in speed withvoltage is roughly linear or worse, as the increase in drive current withsupply voltage is at best quadratic. For typical high-speed operatingconditions (not low voltage for low power) in 0.13um and 0.18um, thedependence of the circuit speed on voltage is approximately linear [20][57][61]. From (2), the increase in power consumption is quadratic.

    The transistor threshold voltage can also be decreased, but there is staticpower consumption due to the subthreshold leakage current Ioff, whichdepends exponentially on the threshold voltage [54]:

    (3) subTH VVoff kI /10= Where kis the reverse saturation current [63], and Vsub is the subthreshold

    slope in mV/decade. It is common for high-speed designs to take advantageof low threshold voltage on critical paths. The exponential dependence ofleakage current on threshold voltage limits reduction in the thresholdvoltage, and static power dissipation is becoming of increasing concern [54].

    The transistor threshold voltages are controlled by the threshold-adjustimplants in processing. Generally ASIC designers have only several choicesof standard cell library threshold voltage (e.g. a designated low thresholdvoltage library corresponding to high speed and high leakage, and adesignated slower library with higher threshold and lower leakage). By

    modifying the threshold adjust implantation in the process, custom designerscan optimize the transistor threshold voltages for the design [54]. As supplyvoltage is not as limited by the foundrys process, ASIC designers canchoose a fairly wide range of supply voltages for a given standard celllibrary, to trade off performance versus power.

    1.2 TemperatureA higher operating temperature increases wire resistance. It also changes

    the transistor threshold voltage and decreases the carrier mobility, increasingthe resistance of transistors and decreasing the drive current [13]. Thushigher temperatures reduce circuit speed.

    The theoretical limit for the subthreshold slope Vsub is

    (1) 60/10lVsub == qkToge mv/decade at room temperature

  • 8/3/2019 Ch5 Process Variation

    5/21

    5. Finding Peak Performance in a Process 5

    Where k is Boltzmanns constant, T is the temperature, and q is thecharge of an electron. In general, the subthreshold slope is proportional tothe temperature. Thus increasing the temperature also increases static powerconsumption, because the leakage increases as the subthreshold slope isproportional to the temperature [41]. The leakage current can increase by afactor of a hundred as the temperature increases from 25C to 125C [19].

    While the chip temperature may be lower than worst case operatingconditions, 125C is a conservative estimate for modelling standard celllibrary delays, which allows for spot temperatures that are higher than theaverage temperature on the chip.

    Operating temperature is limited by the ambient temperature andavailable cooling methods [8]. The ambient temperature near a car engine is165C. Many embedded applications are limited by low cost or batterylifetime (if portable) and cannot afford to use cooling devices, operating atup to 125C with passive cooling. A higher power chip will have higheroperating temperature, as more heat is dissipated.

    ASICFrequency(GH

    z)

    Technology(um

    )

    Voltage(V)

    Temperature

    (C)

    Operating

    Conditions

    Process

    Conditions

    IncreaseinClock

    Frequnecy

    Tensilica Xtensa (Base) 0.150 0.25 2.30 125 worst worst

    0.175 0.25 2.30 125 worst typical 17%

    0.250 0.25 2.50 25 typical typical 43%

    Tensilica Xtensa (Base) 0.200 0.18 1.60 125 worst worst

    0.250 0.18 1.60 125 worst typical 25%

    0.320 0.18 1.80 25 typical typical 28%

    Lexra LX4380 0.360 0.13 *1.20 125 worst worst

    0.420 0.13 *1.20 125 worst typical 17%

    Lexra LX5280 0.180 0.18 1.62 125 worst worst

    0.230 0.18 1.62 125 worst typical 28%

    Lexra LX5280 0.266 0.13 0.90 125 worst worst

    0.330 0.13 0.90 125 worst typical 24%

    LX5380 0.360 0.13 worst worst

    0.420 0.13 worst typical 17%

    Table 1. ASIC clock frequency variation with process and operatingconditions as specified [7][31][32]. *The worst case operatingvoltage of the LX4380 was not known, this is the nominal voltage.

  • 8/3/2019 Ch5 Process Variation

    6/21

    6 Chapter 51.3 Comparison of ASIC Process and Operating Conditions

    Table 1 lists the clock frequency for some high speed ASIC CPUs, fordifferent technologies, and different process and operating conditions. Theresults reported for Lexra and Tensilica show 17% to 28% difference inclock frequency comparing worst case and typical process conditions. For

    typical process conditions of the Tensilica Xtensa Base configuration, thedifference between worst case operating conditions and typical operatingconditions is 28% and 43%.

    We can further quantify the difference between process conditions byexamining variation in clock frequencies from foundries for more expensivecustom processors. The cost overhead for testing chips at different clockfrequencies is worthwhile if you can sell the faster chips at higher prices.Generally, ASIC chips are sold for fairly low prices per chip. Costs of $0.25for speed testing parts have been reported [Professor Borivoje Nikolic,personal communication]. Thus ASIC vendors typically cant afford theadditional cost per chip of testing at different clock frequencies.

    After testing at several different clock frequencies, chips are speed

    binnedinto groups of chips that have the same valid operating frequency.

    mask with four dice

    die

    field of

    view

    wafer

    optics focus the mask

    onto the field of view

    three steps of the fieldof view across the wafer

    mask with four dice

    die

    field of

    view

    wafer

    optics focus the mask

    onto the field of view

    three steps of the fieldof view across the wafer

    Figure 1. A simplified diagram of the mask, field of view, and die. The fieldof view is stepped across the wafer by stepper motors.

    2. CLOCK FREQUENCY VARIATION OF CHIPSFROM CUSTOM FOUNDRIES

    A large foundry may have several fabrication plants producing chips inthe same process technology, but the implementation of the same process at

    the fabrication plants may differ. Within a fabrication plant, there aremultiple production lines processing silicon wafers to produce chips in the

  • 8/3/2019 Ch5 Process Variation

    7/21

    5. Finding Peak Performance in a Process 7same technology. There are sometimes differences between a foundrysfabrication plants for the same process technology, though to controlvariability it is advantageous to ensure implementations of a process are assimilar as possible [34].

    To process a wafer, light is shone through a mask then focused by opticsonto the wafer to expose selected areas of photoresist, as shown in Figure 1.

    Each die on a mask has the complete pattern of photoresist to be exposedcorresponding to an individual chip. Within the field of view through theoptics, the area of several die is exposed simultaneously. Steppers then moveto the next location to expose the wafer with the same mask. After the entirewafer has been exposed, the area left unprotected by the photoresist isprocessed. The sequence is repeated to process the next layer [65].

    Process variation can occur within a plant for multiple reasons, includingnon-uniform ion implantation or photoresist exposure, or lack of uniformityin oxidation, diffusion, or polishing. Variation between the production linescan occur because of differences in masks, steppers, and the optics at eachline. There can be variations between each exposed field on a wafer, because

    of slight changes in the distance between the wafer and the optics, due to thesteppers or the wafer not being perfectly flat. Varying illumination and lensaberrations can cause intra-field variation on the same wafer [68], leading toa large intra-field variation of effective channel length and speed. The intra-field variation can cause a 15% to 20% difference in ring oscillator speedsacross the field [37]. Optical proximity effects will also cause identicalpatterns on the mask to be printed differently, because of optical interferencefrom different neighbouring patterns [68]. Chapter 14 discusses the impactof process variation on circuit performance in greater detail.

    These process variations cause the delays of wires and gates within a chipto vary, and chips are produced with a range of working speeds. Some chipswith minor imperfections will only operate correctly at slower speeds. We

    can estimate the total variation in clock speed caused by these processfactors, by looking at the range of clock frequencies of custom processors.

    Table 2 lists Intel and AMD x86 chips at the time of introduction of anarchitecture or technology from 0.35um to 0.13um. The difference between

    the slowest and fastest clock frequencies available at the time of introductionis up to 47%. Disregarding the four bold cases where higher supply voltageswere used to achieve higher clock frequency, the difference between slowestand faster frequencies is up to 38%.

    For marketing purposes, AMD and Intel will sometimes bin processors toslower clock frequencies than necessary for correct operation, and thesechips are easily over clocked [64]. Hence the range of maximum operating

    clock frequencies may be less. The variation also decreases as the processmatures, as factors causing sub-optimal performance are more tightly

  • 8/3/2019 Ch5 Process Variation

    8/21

    8 Chapter 5controlled. The Pentium II (Deschutes) was sold at clock frequencies of266MHz (25% slower!) and 300MHz to meet demand, after initially beingreleased with at 333MHz [52]. The Athlon (K75) was initially available onlyat 750MHz [2], and was sold at a slower frequency later. The Pentium 4(Northwood) 1.6A has been over-clocked from its nominal clock frequencyof 1.6 GHz to 2.2 GHz [23], which is expected given Intels claims of

    0.13um circuits being up to 65% faster than in 0.18um [29].

    Intel Dateof

    Introduction

    Technology(um)

    SlowestClock

    Frequency(GHz)

    CoreVoltage(V)

    FastestClock

    Frequency(GHz)

    CoreVoltage(V)

    Differencein

    Frequnecy

    Pentium II (Klamath) 1997/05/07 0.35 0.233 2.80 0.300 2.80 29%

    Pentium II (Deschutes) 1998/01/26 0.25 0.333 2.00 0.333 2.00 0%Pentium III (Katmai) 1999/02/26 0.25 0.450 2.00 0.500 2.00 11%

    Pentium III (Coppermine) 1999/10/25 0.18 0.500 1.60 0.733 1.65 47%

    Pentium III (Tualatin) 2001/07/18 0.13 1.000 1.48 1.200 1.48 20%

    Pentium 4 (Willamette) 2000/11/20 0.18 1.300 1.70 1.500 1.70 15%Pentium 4 (Northwood) 2002/01/07 0.13 1.600 1.50 2.200 1.50 38%

    AMD

    K6 (Model 6) 1997/05/07 0.35 0.166 2.90 0.233 3.20 40%

    K6 (Little Foot - Model 7) 1997/01/06 0.25 0.233 2.20 0.266 2.20 14%K6-2 (Chompers - Model 8) 1998/05/28 0.25 0.266 2.20 0.333 2.20 25%K6-2 (Chompers - Model 8 CXT) 1998/11/16 0.25 0.300 2.20 0.400 2.20 33%

    K6-3 (Sharptooth - Model 9) 1999/02/22 0.25 0.400 2.40 0.450 2.40 13%Athlon (K7) 1999/07/09 0.25 0.500 1.60 0.650 1.60 30%Athlon (K75) 1999/11/29 0.18 0.750 1.60 0.750 1.60 0%

    Athlon (Thunderbird) 2000/06/05 0.18 0.750 1.70 1.000 1.75 33%Athlon (Palomino) 2001/06/05 0.18 1.000 1.75 1.200 1.75 20%

    Table 2.

    Clock frequencies of custom chips at the official introduction dateof a new model, in different technologies [24][25][35][36]. Thefirst version of each architecture is indicated in bold. If the corevoltages of slow and fast versions differ, those voltages areindicated in bold.

    Trying to remove any bias caused by speed binning to lower speeds thannecessary, we can look at the first processor introduced in each architecture,listed in bold in Table 2 (the K6-2 and K6-3 may or may not be consideredsufficiently different from the K6 to include in this group, but the range isnot any larger by including these). Then disregarding the K6 (Model 6)where a higher supply voltage was used to achieve higher clock frequency,

    the difference between slowest and faster frequencies is up to 30%.

  • 8/3/2019 Ch5 Process Variation

    9/21

    5. Finding Peak Performance in a Process 9

    Process

    Technology(um)

    ITRSPrinted

    GateLength(um)

    ITRSPhysical

    GateLength(um)

    IntelGateLength

    (um)

    AMDGateLength

    (um)

    0.25 0.200 0.200 0.180 0.1600.18 0.140 0.120 0.100 0.1000.13 0.090 0.065 0.060 0.080

    Table 3. Comparison of reported Intel and AMD gate lengths with theInternational Technology Roadmap for Semiconductors [17][20][21][40][48][47][49][57].

    2.1 Process Improvements in a Technology GenerationBeyond minimizing process variation, there are also improvements to the

    process technology and the designs within a given process generation. TheNational and International Technology Roadmaps for Semiconductorspredicted printed gate lengths for a given process technology, but Intel andAMD have aggressively scaled gate lengths beyond this, as shown in Table3. To correspond with the effective gate lengths from aggressive scaling, theroadmap has been updated to include the more aggressive scaling of theeffective gate length.

    Intel Process Process

    Technology(um)

    OriginalGate

    Length(um)

    Imp

    rovedGate

    Length(um)

    Speed

    Imp

    rovement

    Driv

    eCurrent

    Imp

    rovement

    P856 to P856.5 0.25 0.18 18% 21%P858 to P858.5 0.18 0.14 0.10 10% 11%P860 0.13 0.07 0.06 10%

    Table 4. Intels process speed and drive current improvements withreductions in gate width [9][10][20][57][61][67] . The 0.13umprocesses will show further improvements beyond what is detailedhere as the technology matures.

    Table 4 details publications of Intels technology scaling within eachgeneration, beyond the initial gate length in the process [16]. Intel's 0.25um

    P856 process, dimensions were shrunk by 5%, and along with othermodifications, this gave a speed improvement of 18% in the Pentium II [10].

  • 8/3/2019 Ch5 Process Variation

    10/21

    10 Chapter 5The 11% improvement reported in 0.18um was for the speed of a ringoscillator [20].

    AMD has released less information about its process technology. AMDs0.25um process had 0.16um effective gate length [21], and reportedly AMDhas been aggressively scaling drawn gate length from 0.12um to 0.10um inits 0.18um process technology [17]. AMD has produced 0.18um Athlon

    (Thunderbird) chips with aluminium interconnect at Fab 25 in Austin, andfaster copper interconnect at Fab 30 in Dresden [3]. Thus AMD also hassubstantial variation within a process generation.

    Table 5 illustrates the total range between slowest and fastest clockfrequencies of processors sold by Intel and AMD. There is up to 126%difference between the slowest and fastest chips sold for a specific designwithin the same process generation, but this is not indicative of only processimprovements. This difference is due to several reasons: some chips beingbinned to slower clock frequencies; higher supply voltage; processimprovements, such as using copper rather than aluminium, or using lowerthreshold voltages to increase gate drive strength; and some of the

    improvement is from changes in the design to improve critical paths.

    Intel Technology(um)

    SlowestClock

    Frequency(GHz)

    CoreVoltage(V)

    DateFastest

    Introduced

    FastestClock

    Frequency(GHz)

    CoreVoltage(V)

    Differencein

    Frequnecy

    Pentium II (Klamath) 0.35 0.233 2.80 1997/05/07 0.300 2.80 29%

    Pentium II (Deschutes) 0.25 0.333 2.00 1998/08/24 0.450 2.00 35%Pentium III (Katmai) 0.25 0.450 2.00 1999/08/02 0.600 2.05 33%

    Pentium III (Coppermine) 0.18 0.500 1.60 2001/07/30 1.130 1.80 126%

    Pentium III (Tualatin) 0.13 1.000 1.48 2002/01/08 1.400 1.45 40%

    Pentium 4 (Willamette) 0.18 1.300 1.70 2001/08/27 2.000 1.75 54%Pentium 4 (Northwood) 0.13 1.600 1.50 2002/01/07 2.200 1.50 38%

    AMD

    K6 (Model 6) 0.35 0.166 2.90 1997/05/07 0.233 3.20 40%

    K6 (Little Foot - Model 7) 0.25 0.233 2.20 1998/04/07 0.300 2.20 29%K6-2 (Chompers - Model 8) 0.25 0.266 2.20 1998/08/27 0.350 2.20 32%K6-2 (Chompers - Model 8 CXT) 0.25 0.300 2.20 2000/02/22 0.550 2.30 83%

    K6-3 (Sharptooth - Model 9) 0.25 0.400 2.40 1999/02/22 0.450 2.40 13%Athlon (K7) 0.25 0.500 1.60 1999/10/04 0.700 1.60 40%Athlon (K75) 0.18 0.750 1.60 2000/03/06 1.000 1.80 33%

    Athlon (Thunderbird) 0.18 0.750 1.70 2001/06/06 1.400 1.75 87%Athlon (Palomino) 0.18 1.000 1.75 2002/01/07 1.667 1.75 67%

    Table 5. Comparison of slowest clock frequencies of custom chips at theofficial introduction date of a new model versus the fastestproduced [24][25][35][36]

  • 8/3/2019 Ch5 Process Variation

    11/21

    5. Finding Peak Performance in a Process 112.2 Custom versus ASIC Operating Conditions

    It is difficult to infer the impact of increasing supply voltage on the speedof PC processors. The faster processors will be able to run at the clock speedof the lower speed bins with lower supply voltage, but some of the chips inthe lower speed bins will need a higher supply voltage to operate correctly.

    For example, when process variation causes transistors to be slower, andhence the design operates at lower clock frequency for the same supplyvoltage. Increasing the supply voltage is a standard technique used whenover-clocking chips [64].

    Temperature is the other important operating condition. Increasing thetemperature of an integrated circuit requires increasing the capability of thepackage to withstand and dissipate heat. With an ambient room temperature(nominally 25C) and appropriate fans and heat sinks, AMD chips have runat operating temperatures in the range of 65C to 95C [35], and Intel chipshave run at 55C to 100C [36]. More exotic cooling devices can be used tolower the temperature and improve performance. KryoTech sells systemswith Freon-based refrigeration, and has shown speed improvements in K6,

    K6-2, Athlon, Pentium Pro, Pentium II and Alpha 21164 chips by 25% to43%, by cooling to 40C and increasing the core supply voltage[30][18][4][50].

    Table 6 lists a variety of cooling devices used for Intel and AMDprocessors. Peltier thermoelectric conductors conduct heat better from the

    chip, but there can be condensation. The Peltier thermoelectric conductorsconsume a substantial amount of power, and the hot side of the conductor ismuch hotter than when using water as a coolant [45]. Water and Freoncoolants can dissipate substantially more heat than air-based coolingapproaches, but are very expensive, requiring pumps and a method ofcooling the recirculated coolant. Water is cooled via radiators exchanging

    heat with the surrounding air. Freon is cooled by the same technique as usedin refrigerators, expanding the Freon from a liquid to a gas to cool it.Condensation causes device failure, and solutions to avoid condensation areexpensive, thus cheaper water and Freon cooling methods operate at roomtemperature.

    Exotic cooling techniques, such as Freon refrigeration, have regularlybeen used only in very expensive mainframes, such as the Cray [15]. Theperformance improvement does not generally justify the cost for use ofrefrigeration even in high-performance CPUs, as a 40% speed improvement

    is out-paced by improvements in modern processor speeds within a year. Ifthe refrigerated processor can be upgraded without replacing therefrigeration system, then the cost of purchasing the refrigeration system can

    be amortized over a longer time period. However, Peltier devices and

  • 8/3/2019 Ch5 Process Variation

    12/21

    12 Chapter 5refrigeration systems also have high power consumption, which must befactored into the cost.

    Both heat sinks and fans significantly increase the overall system cost byaround $10 to $50 in the current market. This cost increase will only beacceptable to a few ASIC designs, as many ASICs sell for as little as severaldollars. Cost-conscious ASIC designs will be limited to plastic packaging,

    with only passive power dissipation, dissipating around 2 Watt for hand-heldappliances [8].

    Cooling Method Cost Coolant PowerConsumption(W)

    Amb

    ientTemperature(C)

    Chip

    Temperature(C)

    heatsink and fan $10 to $50 air 1 to 4 25 40 to 80Peltier thermoelectric

    conductor, heatsink and fan $50 to $100 air 50 to 80 25 0 to 10recirculated water cooling $150 to $230 water 15 25 40freon vapor phase cooling at

    room temperature $350 to $400 freon 140 25 30freon vapor phase cooling $1,000 to $2,000 freon 160 -50 -40

    Table 6. Costs and temperature ranges of some available cooling devicesfor Intel and AMD processors [5][39][42][44][45][50][51][53][56][58][66]. The power consumption of the water cooler was

    estimated based on the radiator and pump components.

  • 8/3/2019 Ch5 Process Variation

    13/21

    5. Finding Peak Performance in a Process 13

    Intel Tech

    nology(um)

    Phys

    icalGateLength

    (um)

    Core

    Voltage(V)

    n-channelMOSFET

    Thre

    sholdVoltage(V)

    MetalLayers

    Metal

    Dielectric

    Ring

    Oscillator

    Delay/Stage(ps)

    WorstCaseIoff

    (na/u

    m)

    P858.5 0.18 0.10 1.5 0.40 6 Al FSG (3.6) 10 10P860 0.13 0.06 1.4 0.27 6 Cu FSG (3.6) 6 10AMD

    HiP7 0.13 0.08 1.2 9 Cu low-k (< 3.0) 10

    TSMC

    CL018LV 0.18 0.13 1.5 low 6 AlCu low-k 25 1CL015HS 0.15 0.11 1.5 low 7 Cu low-k 14 17CL013HS+ 0.13 0.08 1.2 low 8 Cu FSG/low-k 10 15

    UMC

    L180 0.18 1.8 6 Al 4, Cu 2 27 0.01

    L150 MPU 0.15 0.10 1.5 0.44 7 Al 5, Cu 2 14 7

    L130 MPU 0.13 0.08 1.2 0.36 8 Cu SiLK (2.7) 11 < 10

    IBM 2inputNAND

    delay(ps)

    SA-27E 0.15 0.11 1.8 6 Cu SiO2 33Cu-11 0.11 0.08 1.5 7 Cu low-k 27

    Table 7. A comparison of fastest custom processes from Intel and AMD,with fastest ASIC foundry processes reported from TSMC, UMCand IBM. Blanks appear where data was unavailable. [20][27][57][60][61][62][67]. HiP7 is a silicon-on-insulator process developedby AMD and Motorola [40]; the other processes are bulk CMOS.

    3. VARIATION BETWEEN FOUNDRIESIntel didnt use copper in its 0.18um process, whereas AMD did, which

    will have some impact on performance [22]. In 0.13um both companiesprocesses use copper interconnect; Intels process is bulk CMOS with0.06um gate length and dielectric with relative permittivity of 3.6 [57],

    whereas AMD and Motorola in conjunction have a 0.13um silicon-on-

    insulator process with 0.08um gate length and dielectric with relativepermittivity of less than 3.0 [40]. Copper has lower resistance than

  • 8/3/2019 Ch5 Process Variation

    14/21

    14 Chapter 5aluminium. A dielectric with smaller permittivity reduces the capacitancebetween wires, and thus decreases crosstalk. According to IBM, silicon-on-insulator is 20% to 25% faster, and it is also lower power [28]. Also, smallergate lengths reduce the resistance of a transistor. All these factors affect thespeed and differentiate foundries processes in the same process technologygeneration.

    Table 7 illustrates process differences between ASIC and customfoundries. A ring of inverters oscillating, with their output into the nextinverter, has been used to characterize the speed of most of thesetechnologies. However, the delay of a ring oscillator is independent ofdevice sizing and has negligible interconnect [55], so it is not a particulargood metric for comparing technologies. The difference between aluminiumand copper interconnect, and the choice of dielectric, will not be apparentfrom the ring oscillator stage delay. Furthermore, the process conditions andoperating conditions affect the delay, but what the process and operatingconditions are is not stated.

    Comparing Intels reported ring oscillator stage delays with TSMC and

    UMC, Intels processes have been significantly faster. The custom foundriesin Table 7 have not reported intermediate 0.15um process generations,whereas the ASIC foundries more regularly make technologies available tocustomers as another faster or lower power choice within a generation, oras an intermediate process generation. Considering when ASIC processeshave been available to customers (e.g. IBMs SA27E process has beenavailable since 1999), Intels 0.18um P858.5 process should be compared tocontemporary 0.15um or 0.13um processes, which have between 0% and40% larger ring oscillator stage delay.

    Accurate comparisons are difficult because the delays reported do notaccount for interconnect delay. We can infer that the fastest processes usedmay be up to 20% to 40% faster than the fastest process available from some

    ASIC foundries, by considering the differences due to gate length, dielectric,silicon-on-insulator, and use of copper. Given the variety of techniquesbeing considered and developed for future high-speed processes, we canexpect this range to continue.

    Slower processes will generally be cheaper. ASIC foundries offer severalprocesses within a process generation: slower, low power applications; highdensity processes; and high-speed alternatives.

    3.1 The Effects of Process and Tuning (Contributed byAndrew Chang and William J. Dally)

    Advances in process technology combined with additional tuning canmeasurably increase the performance of a given design. Table 8 summarizesthe benefit of a wide selection of the possible options. The listed options are

  • 8/3/2019 Ch5 Process Variation

    15/21

    5. Finding Peak Performance in a Process 15equally applicable to ASICs and custom chips, though they have firstappeared in custom designs, which can afford to use more expensiveprocesses. For the average ASIC (60 FO4 delays), a 10% improvement inperformance is equivalent to a 6 FO4 reduction in delay.

    Technique Benefit Reference/Notes

    L eff shrinks 10% Hurd [26]Low VTH devices 10% Buchholtz[11]

    SOI 5-25% Anderson [6]

    15% for static

    5% for dynamic

    >20% for pass gates

    Copper + low K dielectric

  • 8/3/2019 Ch5 Process Variation

    16/21

    16 Chapter 5

    Thus synthesizable ASIC designs should not lag custom designs becauseof process improvements and differences in process technology, if the fastestprocess available with standard cell libraries and regular updates isaffordable. This depends on how much the chip can be sold for. Often it ismore profitable to sell a slower chip that is produced in a substantiallycheaper process.

    It is must easier to retarget a standard cell ASIC to a different librarywhen a new technology becomes available. To migrate to a different processgeneration, custom designs require substantial work from a large team ofengineers as they cannot simply target the synthesis tools to a differentstandard cell library.

    In custom designs, individual transistors will need resizing and circuitsmay need to be redesigned to account for the design rules, supply voltage,current, and power not scaling linearly with the process shrink. However,custom optimizations can take full advantage of a process, whereas standardcell libraries may be more conservative.

    5. SUMMARYFigure 2 gives an overview of the speed differences due to operating and

    process conditions. Speeds at best case operating conditions can be a factorof 1.6 faster than speeds at worst case operating conditions. The smallpercentage of chips that are produced at fast process conditions may be afactor of1.4 faster than the speed of chips estimated for worst case processconditions. Thus fast process and fast operating conditions can be more thana factor of 2.2 faster than worst case process and worst case operatingconditions. In addition, there may be a 20% improvement within a processgeneration, and 20% to 40% difference between foundries. Overall due toprocess and operating conditions, the fastest custom processor could be afactor of3 faster than an ASIC produced in an older, slower process, forworst case operating and process conditions.

    This is pessimistic, as operating temperature does not contribute to theclock frequency difference between custom and ASIC designs, if they aredesigned for the same temperature. Comparing only process conditions, thefastest custom processor could be a factor of 2 faster than an ASICproduced in an older, slower process, for worst case process conditions.Careful manual design can create custom designs that are lower power thanASICs, and hence may be able to use a higher supply voltage to increaseperformance compared to an ASIC at the same power consumption and

    operating temperature.

  • 8/3/2019 Ch5 Process Variation

    17/21

    5. Finding Peak Performance in a Process 17

    worstprocess

    fastprocess

    typicalprocess

    Process

    Conditions

    Operating

    Conditions

    125C, lowersupply voltage

    55C to 100C,typical supply

    voltage

    25C, typicalsupply voltage

    17% to 28%

    (Lexra, Tensilica)

    28% to 43%(Tensilica)

    up to 30% atintroduction

    (Intel, AMD),20% intra-field

    variation

    -40C, highersupply voltage

    improvedprocess

    18% Intel P856improvement

    25% to 43%

    (Kryotech)

    worstprocess

    fastprocess

    typicalprocess

    Process

    Conditions

    Operating

    Conditions

    125C, lowersupply voltage

    55C to 100C,typical supply

    voltage

    25C, typicalsupply voltage

    17% to 28%

    (Lexra, Tensilica)

    28% to 43%(Tensilica)

    up to 30% atintroduction

    (Intel, AMD),20% intra-field

    variation

    -40C, highersupply voltage

    improvedprocess

    18% Intel P856improvement

    25% to 43%

    (Kryotech)

    Figure 2. Diagram of speed improvement with better process and operatingconditions for a given process technology generation. Anapproximate process yield curve is shaded at the bottom: typicalprocess conditions correspond to acceptable yield, maybe only10% of processors are fast enough for the fastest speed bin; and afew non-defective chips will only work correctly under the worstcase process conditions. The temperatures listed are the operatingtemperature on the chip, not the ambient surrounding temperature.

    Many embedded designs are used in a wide range of operating conditionsand are limited by the worst operating conditions for which they areexpected to operate correctly. Thus when compared to designs that are

    optimized for a less pessimistic operating point, they will not perform aswell. An advantage of synthesizable designs is that they can be easilyoptimized for different operating conditions, by redoing the synthesis andlayout the price is fabricating different chips for different operatingconditions, which may be too costly.

    Fabrication plants won't offer ASIC customers the top chip speed off theproduction line, as they cannot guarantee a sufficiently high yield for this tobe profitable. The fabrication plant guarantees that they can produce anASIC chip with a certain speed. This speed is limited by the worst speeds offthe production line, but chips capable of faster speeds are produced. Also,

  • 8/3/2019 Ch5 Process Variation

    18/21

    18 Chapter 5ASIC designers may not have access to the best fabrication plants in aparticular technology generation for production of their chips.

    6. CONCLUSIONSTo improve ASIC performance, designers can test chips to see if

    performance is better than worst case process conditions, and higher speedare available at lower yields. ASIC designs are not limited by the worstspeeds off the production line. Produced chips can be tested to verify correctoperation at higher speeds in typical process conditions. This may allow a20% to 30% improvement in speed over worst-case speeds (see Table 1).This requires discussion with the semiconductor foundry and exploration ofthe yield curve for speeds of the given design. Foundries generally requiresign off at worst case process conditions, so that they can guarantee a certainlevel of performance.

    Chips at typical process speeds lag the fastest speeds produced by 20% to

    30%. Random variation in the maximum clock frequency produced in aprocess cannot be accounted for, but systematic intra-field variation can bemodelled and reduced (reference Orshanksy Chapter). Alternatively, if thereis a sufficient premium for faster chips, it can be worthwhile to tests ASICchips at two clock frequencies and speed bin them.

    If operating temperature and power consumption are not limited, then thesupply voltage can also be increased to speed up the chip. Otherwise, it maynot be possible to improve the chips performance by allowing it to operateat a higher temperature, because cooling devices become increasinglyexpensive.

    7. REFERENCES[1] Allen, D. H., et al. Custom circuit design as a driver of microprocessor performance.IBM Journal of Research and Development, Vol. 44, No. 6, November 2000.

    [2] AMD,AMD Athlon Processor Sets New Performance Leadership Mark At 750MHzBuilt On Advanced 0.18 Micron Technology, November 29, 1999.http://www.amd.com/us-

    en/Corporate/VirtualPressRoom/0,,51_104_543_553~713,00.html

    [3] AMD,AMD Showcases High-Powered Workstations At Solidworks World 2001.February 12, 2001. http://www.amd.com/us-

    en/Corporate/VirtualPressRoom/0,,51_104_543_4493~640,00.html

    [4] AnandTech, Kryotech Super G2 Athlon 1.86 GHz. August 2001.http://www.anandtech.com/showdoc.html?i=1515&p=1

    [5] AnandTech.MC1000 & MC2000 Peltier Coolers. August 1999.http://www.anandtech.com/cooling/showdoc.html?i=1012&p=1

  • 8/3/2019 Ch5 Process Variation

    19/21

    5. Finding Peak Performance in a Process 19[6] Anderson, C, et al. Physical Design of A Fourth-Generation POWER GHzMicroprocessor. 2001 IEEE International Solid-State Circuits Conference.

    [7] Archide, R. (ed.).Xtensa Application Specific Microprocessor Solutions OverviewHandbook, a Summary of the Xtensa Data Sheet. February 2000.

    http://www.tensilica.com/dl/handbook.pdf

    [8] Bar-Cohen, A. Thermal Packaging for the 21st Century: Challenges and Options.International Workshop Thermal Investigations of ICs and Systems, Rome, Italy, 10/3-6,

    1999.[9] Bohr, M., et al.A high performance 0.25um logic technology optimized for 1.8 Voperation. Technical Digest of the International Electron Devices Meeting 1996, 847-850,

    960.

    [10]Brand, A., et al.Intel's 0.25 Micron, 2.0 Volts Logic Process Technology, IntelTechnology Journal, Q3 1998. http://developer.intel.com/technology/itj/q31998/pdf/p856.pdf.

    [11]Buchholtz, T. et al. A 660MHz 64b SOI Processor with Cu Interconnects. 2000 IEEEInternational Solid-State Circuits Conference.

    [12]Burd, T.Low-Power CMOS Library Design Methodology, M.S. Report, University ofCalifornia, Berkeley, UCB/ERL M94/89, 1994.

    http://bwrc.eecs.berkeley.edu/Publications/theses/low.power.CMOS.library.MS/

    [13]Chain, K., et al.A MOSFET Electron Mobility Model of Wide Temperature Range (77-400 K) for IC Simulation. Semiconductor Science and Technology, vol.12-4, April 1997. 355-

    358.

    [14]Chatterjee, A., et al. An Investigation of the Impact of Technology Scaling on PowerWasted as Short-Circuit Current in Low Voltage Static CMOS Circuits. International

    Symposium on Low Power Electronics and Design, 1996, 145-150.

    [15]Cray. Cray Inc. History. 2002. http://www.cray.com/company/history.html[16]Chau, R., Marcyk, G. Intel Technology Briefing. December 11, 2000.http://www.intel.com/research/silicon/30nmpress.pdf

    [17]De Gelas, J. AMDs Roadmap. February 28, 2000.http://www.aceshardware.com/Spades/read.php?article_id=119

    [18]Dicarlo, L. Alpha Chills to 767, in PC Week, vol. 15-15, April 13, 1998.http://www.zdnet.com/eweek/acrobat/1998/98pcweek/apr13mb.pdf

    [19]Europractice, UMC 0.18 1P6M Logic process with VST libraries. August 2000.http://www.europractice.imec.be/europractice/on-line-

    docs/prototyping/ti/ti_VST_UMC18_logic.html

    [20]Ghani, T., et al. 100 nm Gate Length High Performance / Low Power CMOS TransistorStructure. Technical Digest of the International Electron Devices Meeting 1999, 415-418.

    [21]Golden, M., et al. A Seventh-Generation x86 Microprocessor. IEEE Journal of Solid-State Circuits, Vol. 34-11, November 1999, 1466-1477.

    [22]Green, P.K. A GHz IA-32 Architecture Microprocessor Implemented on 0.18umTechnology with Aluminum Interconnect, International Solid-State Circuits Conference.

    Digest of Technical Papers, 2000, 98-99.

    [23][H]ard|OCP, Intel Pentium 4 Northwood 1.6A CPU Overclock, February 1, 2002.http://www.hardocp.com/reviews/cpus/intel/p416a/

    [24]Hare, C. 586/686 Processors Chart. February 2002. http://users.erols.com/chare/586.htm[25]Hare, C. 786 Processors Chart. February 2002. http://users.erols.com/chare/786.htm[26]Hurd, K. A 600MHz 64b PA-RISC Microprocessor. 2000 IEEE International Solid-StateCircuits Conference.

    [27]IBM, IBM ASIC Standard Cell/Gate Array Products, January 2002. http://www-3.ibm.com/chips/products/asics/products/stdcell.html

  • 8/3/2019 Ch5 Process Variation

    20/21

    20 Chapter 5[28]IBM, SOI Technology: IBMs Next Advance In Chip Design, January 2002. http://www-3.ibm.com/chips/bluelogic/showcase/soi/soipaper.pdf

    [29]Intel, Intel Completes 0.13 Micron Process Technology Development, November 7,2000. http://www.intel.com/pressroom/archive/releases/cn110700.htm

    [30]Kryotech, Kryotech Announces 1.86GHz SuperG2 PC Platform, August 21, 2001.http://www.kryotech.com/news_release.html

    [31]Lexra, Lexra LX4380 Product Brief. June 2001. http://www.lexra.com/LX4380_PB.pdf[32]Lexra, Lexra LX5180, LX5280 Product Brief. December 2002.http://www.lexra.com/LEX_ProductBrief_LX5K.pdf

    [33]Liou, F. We See the Future and Its Copper. Fabless Semiconductor Association FablessForum, September 1999. http://www.fsa.org/fablessforum/0999/liou.pdf

    [34]McDonald, C. The Evolution of Intel's Copy Exactly! Technology Transfer Method,Intel Technology Journal, Q4'98. http://developer.intel.com/technology/

    itj/q41998/pdf/copyexactly.pdf.

    [35]MTEK Computer Consulting, AMD CPU Roster, January 2002.http://www.cpuscorecard.com/cpuprices/head_amd.htm

    [36]MTEK Computer Consulting, Intel CPU Roster, January 2002.http://www.cpuscorecard.com/cpuprices/head_intel.htm

    [37]Orshansky, M., et al. Intra-Field Gate CD Variability and Its Impact on CircuitPerformance. International Electron Devices Meeting 1999, 479-82, 943.

    [38]Orshansky, M., et al. Characterization of spatial CD variability, spatial mask-levelcorrection, and improvement of circuit performance. Proceedings of the International Society

    for Optical Engineering, vol.4000, pt.1-2, 2000, 602-611.

    [39]Overclockers Australia. SocketA Cooler Roundup. May 2001.http://www.overclockers.com.au/techstuff/r_socketa/index.shtml

    [40]Perera, A.H., et al. A versatile 0.13um CMOS Platform Technology supporting HighPerformance and Low Power Applications. Technical Digest of the International Electron

    Devices Meeting. 2000, 571-574.

    [41]Rabaey, J.M. Digital Integrated Circuits. Prentice-Hall, 1996.[42]Rolfe, J. Chipset Cooler Roundup. February 2001.http://www.overclockers.com.au/techstuff/r_gpu_cool/index.shtml

    [43]Rohrer, N. et al. A 480 MHz RISC Microprocessor in 0.12um Leff CMOS Technologywith Copper Interconnects. 1998 IEEE International Solid-State Circuits Conference.

    [44]Rutter, D. CPU coolers compared! May 2001. http://www.dansdata.com/coolercomp.htm[45]Rutter, D. Picking a Peltier. March 2001. http://www.dansdata.com/pelt.htm[46]Sakurai, T., Newton, R. Delay Analysis of Series-Connected MOSFET Circuits. Journalof Solid-State Circuits, vol. 26-2, February 1991, 122-131.

    [47]Semiconductor Industry Association, International Technology Roadmap forSemiconductors 2000 Update: Overall Roadmap Technology Characteristics. December 2000.

    http://public.itrs.net/Files/2000UpdateFinal/ORTC2000final.pdf

    [48]Semiconductor Industry Association, International Technology Roadmap forSemiconductors 2001 Edition: Executive Summary. 2001.

    http://public.itrs.net/Files/2001ITRS/Home.htm

    [49]Semiconductor Industry Association. National Technology Roadmap forSemiconductors: Technology Needs 1997 Edition. December 1997.

    [50]Steinbrecher, T. The Heatsink Guide: KryoTech Athlon-900 review. May 2000.http://www.heatsink-guide.com/kryoathlon.htm

    [51]Steinbrecher, T. The Heatsink Guide: KryoTech Renegade review. May 1999.http://www.heatsink-guide.com/renegade.htm

  • 8/3/2019 Ch5 Process Variation

    21/21

    5. Finding Peak Performance in a Process 21[52]Stiller, A. Processor Talk: Of Tumbling Markets and Survival Strategies, translation byE. Wolfram from ct magazing, vol. 18/98, September 15, 1998.

    http://www.heise.de/ct/english/98/18/024/

    [53]Swiftech. Swiftech Home Page. September 2001. http://www.swiftnets.com/[54]Sylvester, D., Kaul, H. Power-Driven Challenges In Nanometer Design. IEEE Designand Test of Computers, vol.18-6, 2001, 12-21.

    [55]Sylvester, D., Keutzer, K. Getting to the Bottom of Deep Submicron. Presentation at theInternational Conference on Computer-Aided Design, November 8-12, 1998. http://www-device.eecs.berkeley.edu/~dennis/talks/iccad.pdf

    [56]The Card Cooler. CPU Coolers. October 2001.http://www.thecardcooler.com/shopcart/CPU_Cooling/cpu_cooling.html

    [57]Thompson, S., et al. An Enhanced 130 nm Generation Logic Technology Featuring 60nm Transistors Optimized for High Performance and Low Power at 0.7 1.4 V. Technical

    Digest of the International Electron Devices Meeting 2001.

    [58]Toms Hardware Guide. Water Coolers: Four Power Kits Starting at 200 Dollars.January 2002. http://www6.tomshardware.com/cpu/02q1/020102/

    [59]Tschanz, J. et al. Adaptive Body Bias for Reducing the Impacts of Die-to-Die andWithin-Die Parameter Variations on Microprocessor Frequency and Leakage. 2002 IEEE

    International Solid-State Circuits Conference.

    [60]TSMC, Technology and Manufacturing, January 2002.http://www.tsmc.com/technology/

    [61]Tyagi, S., et al. A 130 nm generation logic technology featuring 70 nm transistors, dualVt transistors and 6 layers of Cu interconnects. Technical Digest of the International Electron

    Devices Meeting 2000, 567-570, 871.

    [62]UMC, Foundry Service Guide, 2001.http://www.umc.com/english/pdf/1_Technology.pdf

    [63]Weste, N., and Eshraghian, K. Principles of CMOS VLSI Design. Addison-Wesley,1992.

    [64]Wojnarowicz, J. Overclocking Basics Guide, March 30, 2000.http://firingsquad.gamers.com/guides/overclockingbasicsver4/default.asp

    [65]Wolf, S., and Tauber, R. N. Silicon Processing for the VLSI Era, Volume 1: ProcessTechnology, Lattice Press, 1986.

    [66]Wyntec, Coolers for CPU, Hard Disks, Systems, Notebooks, Monitors and Chipsets.January 2002. http://www.wyntec.com.au/coolers.htm

    [67]Yang, S., et al. A High Performance 180 nm Generation Logic Technology. TechnicalDigest of the International Electron Devices Meeting 1998, 197-200.

    [68]Yu, C., et al. Use of Short-Loop Electrical Measurements for Yield Improvement, IEEETrans. On Semiconductor Manufacturing, Vol. 8-2, May 1995, 150-159.