Statistical Usage Models in Mobile Processor Thermal Design and Testing

by

Thomas C. Evans

B.S. Mechanical Engineering, Case Western Reserve University 1994

Submitted to the Department of Mechanical Engineering and the Sloan School of Management in Partial Fulfillment of the Requirements for the Degrees of

Master of Science in Mechanical Engineering and Master of Business Administration

In Conjunction with the Leaders for Manufacturing Program at the Massachusetts Institute of Technology

June 2003

©2003 Massachusetts Institute of Technology. All rights reserved.

Signature of Author
Department of Mechanical Engineering
Sloan School of Management
May 2003

Certified by
Daniel E. Whitney
Senior Research Scientist
Thesis Supervisor

Certified by
Roy E. Welsch
Professor of Statistics and Management Science
Thesis Supervisor

Accepted by
Ain Sonin, Chairman, Graduate Committee
Department of Mechanical Engineering

Accepted by
Margaret Andrews, Executive Director of Masters Program
Sloan School of Management

Statistical Usage Models in Mobile Processor Thermal Design and Testing

by

Thomas C. Evans

Submitted to the Department of Mechanical Engineering and the Sloan School of Management on May 9, 2003 in partial fulfillment of the Requirements for the Degrees of Master of Science in Mechanical Engineering and Master of Business Administration

Abstract

The performance, quality and reliability of microprocessors are highly dependent upon their operating temperature. Intel microprocessors are equipped with a thermal monitor feature that reduces power to the chip when the maximum temperature is reached to prevent overheating. This is commonly referred to as "throttling". Worst-case thermal design methodologies provide highly robust thermal solutions that keep processors running below their maximum temperature. OEMs do not always follow Intel's recommendations in this regard, particularly when they use desktop processors in mobile form factors. The processors in these systems run hotter and are more likely to throttle.

A methodology that uses the principles of statistical tolerancing is developed to quantify the performance impact of throttling on thermally under-designed mobile systems. Customer usage models are developed from market survey data, and used with Monte Carlo simulation techniques to calculate the distributions of processor temperature and performance in use. Simulation results from both worst-case operating conditions and a statistical usage model are analyzed and compared. The statistical usage model is then used to compare the theoretical operation and performance of a true mobile system and a desktop processor transportable system. Finally, an analysis is performed to evaluate the effect of reducing the throttle set point on a mobile processor.

Results show that the predicted occurrence of throttling drops by two orders of magnitude when comparing the statistical usage model to worst-case, that approximately 10 percent of the desktop transportable systems would throttle severely, and that total quality events could be reduced by 37 percent by lowering the throttle set point.

The implications for new product positioning and increasing integrality of mobile computer systems are discussed.

Thesis Supervisor: Daniel E. Whitney
Title: Senior Research Scientist

Thesis Supervisor: Roy E. Welsch
Title: Professor of Statistics and Management Science

Acknowledgements

I would like to acknowledge the Leaders for Manufacturing Program for its support of

this work.

I would like to acknowledge Vivek Phanse, my supervisor at Intel, for his support and

guidance throughout my internship.

I would like to acknowledge Professors Whitney and Welsch for their constructive ideas.

I dedicate this thesis to my wife, Nancy; without her constant support and encouragement

none of my success would have been possible.

Table of Contents

Abstract
Acknowledgements
Table of Contents
Chapter 1 - Overview
    1.1 Background
    1.2 Project Goals
    1.3 Approach
Chapter 2 - Background and Project Setting
    2.1 Statistical Tolerancing
    2.2 Intel Mobile Platform Group
    2.3 Thermal Enabling
        2.3.1 Thermal Design Power
        2.3.2 TDP Ratio and Application Ratio (AR)
Chapter 3 - Product Overview
    3.1 Thermal Monitor
    3.2 Thermal Impact
        3.2.1 Temperature Effect on Processor Frequency
        3.2.2 Temperature Effect on Processor Reliability
    3.3 Power Consumption
        3.3.1 Dynamic Power
        3.3.2 Leakage Current Power
    3.4 Heat Dissipation
Chapter 4 - Simulation Process
    4.1 Monte Carlo Simulator
    4.2 Simulator Modifications
        4.2.1 Application Ratio and Ambient Temperature Distribution
        4.2.2 Performance Impact Modeling
Chapter 5 - Usage Models
    5.1 Mobile Choice Survey
    5.2 Defining a "User"
        5.2.1 Software Characterization - Application Ratio
        5.2.2 User Characterization
    5.3 Usage Model Matrix
        5.3.1 Worst-Case Peak 5-Second Average Usage Model
        5.3.2 Expected Peak 5-Second Average Usage Model
        5.3.3 Expected Average Usage Model
        5.3.4 Worst-Case Average Usage Model
    5.4 Application Ratio Distribution Parameters
    5.5 Ambient Temperature Distributions
Chapter 6 - Simulation Results
    6.1 Worst-Case Operating Conditions vs. Statistical Usage Model
    6.2 Mobile System vs. Desktop Arbitrage System
    6.3 Recommended Change to Throttle Set-Point
Chapter 7 - Recommendations and Future Use
    7.1 Validation
    7.2 Market Segments and Product Positioning
    7.3 Industry Dynamics and Strategic Partnerships
Appendix A: Selected Questions and Sample Answers from End-User Mobile Choice Survey
Bibliography

Chapter 1 - Overview

1.1 Background

Intel Corporation is the leading supplier of microprocessors to the computing industry.

Intel supplies microprocessors to many market segments, including both the desktop

market segment and the mobile market segment. The appearance of a new mobile

computing market segment, the desktop replacement or "transportable", has resulted in

higher-power desktop processors installed in laptop form factor chassis. The high power

processors are pushing the limits of the cooling systems that can be used in these form

factors, and processors are running hotter.

The performance, quality and reliability of microprocessors are highly dependent upon

their operating temperature. Intel microprocessors are equipped with a thermal monitor

feature that reduces power to the chip when the maximum temperature is reached to

prevent overheating. This is commonly referred to as "throttling".

Intel uses statistical methods and tolerances for many aspects of estimating product

quality and to determine product-testing specifications as part of its standard business

practice. However, the usage model input, one of the major factors, is typically entered

as a "worst-case realistic" value. The thermal design process is also based on worst-case

tolerances. Intel sets its thermal specifications and recommendations such that processor

"throttling" is an unlikely event, happening only under extreme circumstances. However,

some original equipment manufacturers (OEMs) are disregarding Intel's

recommendations, and are producing systems with less capable thermal solutions. These

systems are considered thermally under-designed, and throttle more often because the

processors are not adequately cooled and run hotter. With increases in processing

power, thermal under-design is expected to increase in both quantity and magnitude. The

challenge facing Intel is to realistically predict or simulate

the amount of throttling that occurs in the under-designed systems.

1.2 Project Goals

The project goals are to answer the following questions:

* What is the expected occurrence of throttling with thermally under-designed systems?

    o How many units will throttle/suffer performance impact?

    o How significant is the performance impact?

    o How often is the performance impacted?

* How can Intel incorporate the usage model of mobile computers into product specs and recommendations and/or process parameters in order to maximize value?

1.3 Approach

Chapter 2 begins with a brief description of statistical tolerancing and a comparison to

worst-case tolerancing. It then looks at the project setting, the mobile computer market

segment, and the Thermal Enabling function at Intel. Chapter 3 goes into some depth

about the microprocessor product characteristics that are germane to this thesis. Chapter

4 discusses the simulation process that is the basis for making the throttling predictions.

Chapter 5 develops the statistical usage model that is used as an input to the simulator to

provide realistic throttling estimations. Chapter 6 reviews the results of the three

simulation comparisons: worst-case operating conditions vs. statistical usage model,

mobile system vs. desktop arbitrage system, and 100 °C vs. 95 °C throttle set point.

Chapter 7 concludes with recommendations and future uses for the methodology

developed.

Chapter 2 - Background and Project Setting

2.1 Statistical Tolerancing

Statistical tolerancing is a methodology that allows an increase in the manufacturability

of a product while achieving the required performance and quality by understanding the

statistical nature of the factor deviations. Statistical tolerancing works because the

chance that all factors will experience maximum deviations in the same direction, what is

called worst-case stack-up, is small. In fact, with more factors involved the chance of

worst-case stack-up typically becomes smaller. This is in direct contrast to worst-case

tolerancing, which always assumes worst-case stack-up. As a simple example, imagine a

five-inch stack of blocks made of five one-inch blocks as shown in Figure 2.1.

Figure 2.1: Example of stack-up tolerances (five blocks of 1.000" +/- ? each, stacked to 5.000" +/- 0.050)

If the tolerance for the stack is 0.050 inches, what should the tolerance on each individual

block be? Worst-case tolerancing says 0.010 inches. That way the sum of the tolerances can

never be more than 0.050. If the size of the blocks is normally distributed with a mean of

one and standard deviation of 0.005 (i.e. N(1, 0.005)), then five percent of the blocks

would be outside of the 0.010 individual tolerance and would need to be scrapped.

However, if all of the blocks are N(1, 0.005) and the deviations are independent, then the

stack can be assembled with all of the blocks (none of the blocks are scrapped), and only

one in 100,000 of the assembled stacks will fall outside of the 0.050 stack tolerance. This

is because the chance that all five blocks will have large deviations in the same direction

is small. This is the principle that makes statistical tolerancing work. One caveat of

statistical tolerancing is that there are no longer any absolutes. One cannot say that the

stack will never be greater than X. There is always some probability, however

infinitesimal, that it could happen. Of course, it is still possible to inspect the assembly

and scrap any assemblies that do not meet the final specifications.
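
The figures in the block example can be checked with the tail probability of the normal distribution. The following is a minimal sketch and not part of the original analysis: the function name `two_sided_tail` and the decision to count both tails are assumptions made for illustration. Because the result depends on that convention and on rounding, it should be compared with the rounded figures quoted in the text only in order of magnitude.

```python
import math


def two_sided_tail(limit, sigma):
    """Probability that a zero-mean normal deviation falls outside +/- limit."""
    return math.erfc(limit / (sigma * math.sqrt(2.0)))


BLOCK_SIGMA = 0.005  # standard deviation of one block, inches (from the example)
BLOCK_TOL = 0.010    # worst-case tolerance per block, inches
STACK_TOL = 0.050    # tolerance on the five-block stack, inches
N_BLOCKS = 5

# Worst-case tolerancing: fraction of individual blocks that must be scrapped.
block_scrap = two_sided_tail(BLOCK_TOL, BLOCK_SIGMA)

# Statistical tolerancing: independent deviations add in quadrature,
# so the stack standard deviation grows with sqrt(N), not with N.
stack_sigma = BLOCK_SIGMA * math.sqrt(N_BLOCKS)
stack_reject = two_sided_tail(STACK_TOL, stack_sigma)

print(f"blocks outside +/-{BLOCK_TOL}: {block_scrap:.2%}")
print(f"stacks outside +/-{STACK_TOL}: {stack_reject * 1e5:.2f} per 100,000")
```

The key step is the square-root growth of the stack standard deviation, which is what makes the 0.050 stack tolerance so much easier to meet than five stacked worst-case block tolerances would suggest.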

Intel uses this kind of statistical approach to estimate product quality and to determine

product-testing specifications as part of its standard business practice. However, the

usage model input, one of the major factors, which contains two input variables, is

typically entered as a "worst-case realistic" value.

Let us return to the block example and see what happens if we assume that two of the

blocks are always at the worst-case tolerance of 0.010, even though they are still actually

N(1,0.005). We now need to inspect the assembly to 3 inches +/- 0.030 after stacking the

three blocks we know to be normally distributed. Now 30 in 100,000 three-block

assemblies would measure out of tolerance and need to be scrapped. The scrap rate

increased by a factor of 30 because worst-case assumptions were made for two blocks out

of five. This is analogous to the approach Intel is taking with its quality simulations, and

why it is important to correctly identify the distributions for the last two parameters.
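
The three-block case can also be explored by sampling, in the same spirit as the Monte Carlo approach used later in this thesis. The sketch below is illustrative only: the function name, the trial count and the seed are assumptions, and the estimate is subject to sampling noise and to the choice of counting both tails, so it should agree with the rounded figure in the text only in order of magnitude.

```python
import random


def out_of_tolerance_fraction(n_blocks, tol, trials, sigma=0.005, seed=1):
    """Estimate the fraction of stacks whose height misses nominal by more than tol."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    nominal = float(n_blocks)      # each block is nominally 1.000 inch
    rejects = 0
    for _ in range(trials):
        height = sum(rng.gauss(1.0, sigma) for _ in range(n_blocks))
        if abs(height - nominal) > tol:
            rejects += 1
    return rejects / trials


# Two blocks are assumed to sit at the worst-case 0.010 each, which leaves
# 3 inches +/- 0.030 for the three blocks treated as normally distributed.
three_block = out_of_tolerance_fraction(n_blocks=3, tol=0.030, trials=200_000)
print(f"three-block assemblies out of tolerance: {three_block * 1e5:.0f} per 100,000")
```

Sampling is far less efficient than a closed-form tail calculation for such rare events, but it extends directly to inputs that are not normally distributed, which is the situation the usage model addresses.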

2.2 Intel Mobile Platform Group

Intel Mobile Platform Group (MPG) is one of Intel's six strategic business units. MPG

is responsible for developing and marketing microprocessors, chipsets and related

hardware and software for laptop and other mobile computing systems. Products for this

market segment have features not found in desktop products. These features, most of

which revolve around reducing power consumption, make them particularly valuable for

mobile applications. For this reason, mobile products command a premium in the

market.

Over the last several years the mobile market segment has become much more

complicated as the traditional notebook computer form factor has given way to a variety

of form factors as shown in Figure 2.2.

Figure 2.2: The variety of mobile form factors currently available (notebook computer, thin & light, transportable, mini-note). Source: Intel Mobile Platforms Group

In addition to the proliferation of form factors, mobile system powers have been

increasing rapidly, as shown in Figure 2.3.

Figure 2.3: Average total mobile system powers, 1994 to 2001 (vertical axis: system power, 0 to 60). Source: Intel Mobile Thermal Enabling

The active/passive line refers to the cooling threshold described in Chapter 3. The

combination of smaller form factors and higher system powers continues to make heat

removal more challenging. In addition, OEMs have been installing higher-power

desktop processors in laptop form factor chassis, creating a new mobile computing

market segment, the desktop replacement or "transportable". This practice is referred to

as "desktop arbitrage", as desktop processors are replacing premium priced mobile

processors in what is traditionally thought of as a mobile form factor.

Obviously, "desktop arbitrage" is of great concern to MPG as it is directly losing

sales of its mobile products. It also negatively impacts the revenue of Intel as a whole as

sales of less expensive desktop processors replace those of the premium priced mobile

processors. In addition to the negative financial implications for Intel, "desktop

arbitrage" also presents significant technical hurdles. The higher power processors

exceed the capabilities of the cooling systems used in these mobile form factors.

2.3 Thermal Enabling

The role of the Mobile Thermal Enabling team is to provide guidance and make it as easy

as possible for customers (OEMs) to design Intel products into their systems. They test

solutions from suppliers and create example thermal solutions. They also publish design

collateral on the best design practices. Most interesting for this thesis is the first part of

the thermal enabling process: defining the Thermal Design Power (TDP) for a processor.

2.3.1 Thermal Design Power

TDP is the value that Intel recommends the computer OEMs use in designing their

cooling solutions. TDP is not the theoretical maximum power for a given chip design,

but is an empirical value (adjusted for anticipated leakage) derived from testing the

processor with the most intensive commercially available software application. A wide

variety of software applications are screened to find the one that is most processor

intensive.

During TDP testing the power consumed by the processor is measured and recorded at

the millisecond timescale. However, power spikes at that timescale are of no thermal

interest. Due to the thermal inertia of the processor and the cooling solution, the

temperature of the processor responds on a much slower time scale. Therefore the

maximum power of interest for a given application is not the maximum power spike.

Laboratory experiments, computer modeling and experience have shown that a rolling 5-

second average power trace is a good indicator of the temperature that a mobile processor

will reach. The rolling 5-second average power is post-processed from the data. Each

data point is given the average value of the previous five seconds worth of data. This

simulates the thermal inertia of the system. Since the TDP value is intended for thermal

calculations it makes sense to use a measurement that corresponds with temperature.

Because the maximum temperature that the processor reaches is the value of interest, the

peak 5-second average is used as the rating. Figure 2.4 shows a power trace for a

software application test.

Figure 2.4: Software application power trace (3Dmark2000) showing power spikes, 5-second rolling average power, and mean power (horizontal axis: time). Source: Intel Mobile Power Lab
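
The post-processing described above, where each data point takes the average of the previous five seconds of data and the peak of that average becomes the rating, can be sketched as follows. This is an illustration under stated assumptions rather than Intel's tooling: the function names, the one-sample-per-second trace and its values are all hypothetical, and the first points, for which fewer than five seconds of history exist, are excluded from the peak.

```python
from collections import deque


def rolling_average(samples, window):
    """Trailing mean: each output is the average of the last `window` samples."""
    buf = deque()
    total = 0.0
    averaged = []
    for value in samples:
        buf.append(value)
        total += value
        if len(buf) > window:
            total -= buf.popleft()
        averaged.append(total / len(buf))
    return averaged


def peak_rolling_average(samples, sample_period_s, window_s=5.0):
    """Peak of the trailing average, using full windows only when available."""
    window = max(1, round(window_s / sample_period_s))
    averaged = rolling_average(samples, window)
    if len(samples) >= window:
        averaged = averaged[window - 1:]  # skip the warm-up region
    return max(averaged)


# Hypothetical trace in watts, one sample per second, with a 2-second spike.
trace = [10.0] * 10 + [30.0] * 2 + [10.0] * 10
peak = peak_rolling_average(trace, sample_period_s=1.0)
print(f"maximum spike: {max(trace):.1f} W, peak 5-second average: {peak:.1f} W")
```

In this synthetic trace the short spike raises the 5-second average far less than its instantaneous height, which is the behavior the thermal inertia argument relies on.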

After the peak 5-second average value is calculated, the TDP rating still needs to be

corrected for leakage power. Leakage power varies from chip to chip (see Section 3.3.2).

The leakage power is measured for the TDP test chip when it is at idle. To make the

correction, the test leakage value is subtracted from the measured peak 5-second average

power and the maximum leakage power for that processor specification is added back.

This ensures that the TDP corresponds to the highest leakage part for a given

specification.
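
The leakage correction is simple arithmetic, sketched below. The function name and the wattages are hypothetical and chosen only to show the direction of the adjustment.

```python
def tdp_rating(peak_5s_avg_w, test_chip_leakage_w, max_spec_leakage_w):
    """Swap the test chip's idle leakage for the maximum leakage allowed by the spec."""
    return peak_5s_avg_w - test_chip_leakage_w + max_spec_leakage_w


# Hypothetical values in watts.
rating = tdp_rating(peak_5s_avg_w=20.0, test_chip_leakage_w=2.0, max_spec_leakage_w=3.5)
print(f"TDP rating: {rating:.1f} W")
```

Whenever the test chip leaks less than the specified maximum, the corrected rating is higher than the measured peak, so the rating covers the highest-leakage part.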

2.3.2 TDP Ratio and Application Ratio (AR)

TDP is expressed in watts. However, it can also be useful to think of it in terms of a ratio

of the maximum power. The maximum power of a processor is determined by running a

"power virus", a piece of specifically developed in-house software code that has no

functionality other than to apply maximum stress to the CPU. The TDP ratio is then

defined as shown in Equation 2.1.

$$\text{TDP ratio} = \frac{\text{TDP} - \text{Leakage power}}{\text{Maximum power} - \text{Leakage power}} \tag{2.1}$$

TDP ratios typically fall between 0.7 and 0.85. The TDP ratio can be particularly useful in

modeling simulations and in comparing processors. Software applications can also be

defined in terms of an application ratio (AR). In mobile computing the process is the

same as defining the TDP ratio. AR is given by Equation 2.2.

$$\text{AR} = \frac{\text{Peak 5-second average power} - \text{Leakage power}}{\text{Maximum power} - \text{Leakage power}} \tag{2.2}$$
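
Equations 2.1 and 2.2 share the same form, so a single helper can evaluate both. The sketch below uses hypothetical wattages, and the function name `power_ratio` is an assumption made for illustration.

```python
def power_ratio(power_w, leakage_w, max_power_w):
    """Leakage-corrected power as a fraction of the leakage-corrected maximum."""
    dynamic_range = max_power_w - leakage_w
    if dynamic_range <= 0:
        raise ValueError("maximum power must exceed leakage power")
    return (power_w - leakage_w) / dynamic_range


# Hypothetical values in watts.
LEAKAGE_W = 4.0
MAX_POWER_W = 29.0   # measured while running the power virus

tdp_ratio = power_ratio(24.0, LEAKAGE_W, MAX_POWER_W)  # Eq. 2.1 with TDP = 24 W
app_ratio = power_ratio(16.5, LEAKAGE_W, MAX_POWER_W)  # Eq. 2.2 with a 16.5 W peak 5-second average
print(f"TDP ratio: {tdp_ratio:.2f}, AR: {app_ratio:.2f}")
```

Expressing applications as ratios makes them comparable across processors with different maximum powers, which is what allows AR to serve as a simulation input.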

This chapter looks at the fundamentals of statistical tolerancing, provides an overview of

Intel Mobile Platform Group and gives some insight into the functions of the Mobile

Thermal Enabling team. The next chapter focuses on the features and characteristics of

Intel microprocessors that are important for this thesis.

Chapter 3 - Product Overview

The central feature of the Intel microprocessor that is relevant to this work is the thermal

monitor. The thermal monitor attempts to control the processor temperature by initiating

the thermal control circuit, which reduces the processor frequency, and hence its power

dissipation. This is commonly referred to as "throttling". This chapter will provide a

brief discussion of the thermal monitor, why it is necessary, how heat is generated in a

semiconductor, and how that heat is removed from a mobile computer system.

3.1 Thermal Monitor

Intel® Pentium® 4 processors are equipped with a thermal monitor feature. This consists

of an on-die temperature sensor and a built-in control circuit which can stop the processor

clock input to the central processing unit (CPU) core, placing the microprocessor in a

lower power state [1]. When the microprocessor reaches the maximum operating

temperature (the Throttle Set Point) the thermal monitor toggles the stop clock feature.

Modulating the clock reduces the power dissipated in the CPU linearly with the percentage of

time the clock is stopped, preventing the processor from overheating. While the

clock is stopped the CPU is also prevented from executing instructions, which reduces its

operating performance. This is commonly referred to as "throttling".
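
The linear relationship described above can be sketched as a simple duty-cycle model. This is a simplification for illustration only: it assumes that all CPU power, and all instruction throughput, scale with the fraction of time the clock is running, and the function name and values are hypothetical.

```python
def throttled_state(unthrottled_power_w, stopped_fraction):
    """Return (power, relative performance) for a given stop-clock fraction."""
    if not 0.0 <= stopped_fraction <= 1.0:
        raise ValueError("stopped_fraction must lie between 0 and 1")
    running_fraction = 1.0 - stopped_fraction
    return unthrottled_power_w * running_fraction, running_fraction


# Hypothetical case: a 30 W workload with the clock stopped 25% of the time.
power_w, relative_performance = throttled_state(30.0, 0.25)
print(f"power: {power_w:.1f} W, performance: {relative_performance:.0%} of unthrottled")
```

The model shows why throttling is a trade: every watt removed this way is paid for with a proportional loss of executed instructions.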

Based on the characteristics described above, this project set out to answer three questions:

1. How many mobile CPUs will throttle and suffer performance impact?

2. How significant is the performance impact?

3. How often does the performance impact occur?

3.2 Thermal Impact

Why is a thermal monitor necessary? The thermal monitor is designed to provide non-

traumatic temporary performance decreases that are virtually unnoticeable to the end user

in the place of more severe thermal failures. Without a thermal monitor a microprocessor

can literally overheat and destroy itself [2]. Temperature affects the operation of a

microprocessor in several ways. A microprocessor's maximum frequency is a function

of its temperature. Changes in temperature can have an instantaneous effect on how fast

the processor can perform operations. Temperature also has a cumulative impact. The

long-term reliability of a microprocessor is also a function of its temperature. Elevated

temperatures increase the rate of all chip-related failure mechanisms [3]. These two

temperature effects are discussed below. The purpose of the thermal monitor is to

substitute short-term performance losses in place of more severe thermal failures.

3.2.1 Temperature Effect on Processor Frequency

The maximum frequency that a microprocessor can operate at before it loses functionality

is referred to as Fmax [4]. Fmax is a function of both input voltage and temperature.

Fmax increases with increasing voltage, and decreases with increasing temperature.

Fmax can be measured on an individual part either in the laboratory or on the production

line with the proper equipment. Due to manufacturing variation there is a distribution of

Fmax on any given product. An example of this distribution is shown in Figure 3.2.

Figure 3.2: Processor frequency maximum (Fmax) due to manufacturing variation (horizontal axis: Fmax in GHz). Source: Intel Corporation

Before a device is shipped its frequency is tested at a prescribed temperature and voltage

at the factory. Based on the results of this test the device is "binned", and its speed is

preset. If a microprocessor's temperature increases (or voltage decreases) to the point

that Fmax is less than the pre-programmed frequency, the processor will "lock-up" and

stop operating. This is known as an Fmax failure. The speed is preset on each device

such that as long as the temperature specifications are met, an Fmax failure would be an

extremely rare event. Throttling is a mechanism that is intended to prevent a catastrophic

Fmax failure by virtue of trying to limit the temperature.

3.2.2 Temperature Effect on Processor Reliability

In addition to the instantaneous Fmax failures, elevated temperatures also have a

cumulative negative effect on the long-term reliability of a semiconductor

microprocessor.

All chip-related failure mechanisms are aggravated by high temperature. One example is

dopant diffusion, which is the diffusion of contaminants within the semiconductor

structure. Increases in temperature increase the diffusion rate, which in turn reduces the

mean time to fail. Another example is gate oxide, where temperature increases the rate of

the oxidation reaction [3]. In addition, a large number of assembly-related and

operation-induced failure mechanisms are also aggravated by high temperature. One example is

electromigration, an operation-induced failure where higher temperatures make it easier

for metal atoms to be moved by the impact of current flow [3]. These effects also make it

desirable to limit the silicon temperature.

3.3 Power Consumption

Heat is generated in a semiconductor circuit, like any electrical circuit, when a voltage is

applied across a resistance and a current flows. The heat generated is equal to the power

consumed. For a simple approximation, the power consumed by a digital logic circuit is

entirely due to charging and discharging circuit capacitor nodes [5]. The power

consumed by any node is small, but multiplied by millions of transistors per chip

switching billions of times per second, it adds up rapidly [1]. This results in Equation 3.1

for power:

$$\text{Power} = C_{total} V^{2} f \tag{3.1}$$

where:
    $C_{total}$ is the average capacitance charged per cycle
    $V$ is the supply voltage
    $f$ is the operating frequency.
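
Equation 3.1 can be evaluated directly to see how strongly each term affects power. The capacitance, voltage and frequency below are hypothetical values chosen for round numbers, not data for any Intel product.

```python
def dynamic_power(c_total_f, voltage_v, frequency_hz):
    """Dynamic power from Eq. 3.1: capacitance times voltage squared times frequency."""
    return c_total_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating point.
C_TOTAL_F = 10e-9     # average capacitance charged per cycle, farads
VOLTAGE_V = 1.5       # supply voltage, volts
FREQUENCY_HZ = 2.0e9  # operating frequency, hertz

baseline = dynamic_power(C_TOTAL_F, VOLTAGE_V, FREQUENCY_HZ)
lower_voltage = dynamic_power(C_TOTAL_F, 0.9 * VOLTAGE_V, FREQUENCY_HZ)
half_frequency = dynamic_power(C_TOTAL_F, VOLTAGE_V, 0.5 * FREQUENCY_HZ)

print(f"baseline: {baseline:.1f} W")
print(f"10% lower voltage: {lower_voltage / baseline:.0%} of baseline")
print(f"half frequency: {half_frequency / baseline:.0%} of baseline")
```

Because voltage enters as a square, a modest voltage reduction saves proportionally more power than the same relative reduction in frequency.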

Equation 3.1 only accounts for the dynamic power that occurs from changes in transistor

states. There is also a leakage current power component that in theory should be zero,

but has become more significant as processor geometries have shrunk and incorporated

more gates [1]. Both dynamic power and leakage current power are detailed below.

3.3.1 Dynamic Power

Dynamic power is typically the major source of power dissipation in microprocessors [6].

The dynamic power consists of the dynamic switching power and short-circuit current

power. The dynamic switching power is the power consumed in switching a transistor

gate, and the short-circuit current occurs during a transition when both the input and

output gates are partially open. In a well-designed circuit the short-circuit power

dissipation can be limited to 5-10% of the total dynamic power [6]. For this thesis the

most important aspects of dynamic power are that it is the dominant source of power

dissipation in microprocessors and it is determined by the design of the microprocessor

(the number of transistors and their capacitance), the operations it is performing (how

many transistors are switching), the supply voltage, and operating frequency.

It is also useful to think in terms of the dynamic current, which is the dynamic power

divided by the supply voltage.
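
As a rough illustration of Equation 3.1 and of the dynamic current, the following Python sketch evaluates both at a single operating point. The capacitance, voltage, and frequency values are assumptions chosen only to give plausible magnitudes; they are not data from this work.

```python
def dynamic_power(c_total, v_supply, frequency):
    """Dynamic switching power per Eq. 3.1: P = C_total * V^2 * f (watts)."""
    return c_total * v_supply ** 2 * frequency


def dynamic_current(power, v_supply):
    """Dynamic current: dynamic power divided by the supply voltage (amps)."""
    return power / v_supply


# Hypothetical operating point (illustrative values, not from the thesis).
C_TOTAL = 5e-9      # farads charged per cycle
V_SUPPLY = 1.3      # volts
FREQUENCY = 2.2e9   # hertz

power = dynamic_power(C_TOTAL, V_SUPPLY, FREQUENCY)
current = dynamic_current(power, V_SUPPLY)

# Because power scales with V^2, a 10% lower supply voltage at the same
# frequency leaves 0.9^2 = 81% of the original dynamic power.
scaled_ratio = dynamic_power(C_TOTAL, 0.9 * V_SUPPLY, FREQUENCY) / power

print(f"power = {power:.2f} W, current = {current:.2f} A, ratio = {scaled_ratio:.2f}")
```

The quadratic dependence on voltage is why a modest voltage reduction has a larger effect on dynamic power than the same relative reduction in frequency.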

3.3.2 Leakage Current Power

The leakage, or standby, current (Isb) is, as the name implies, the current that leaks across

transistors when they are turned off. Therefore, leakage current is independent of the

operations performed. Leakage current is temperature sensitive: Isb can increase

dramatically at higher temperatures [6]. As a result of manufacturing variability, the Isb

can vary significantly. The distribution of Isb for a particular product will also change

over time as the manufacturing process matures. Also, there are process parameters that

can be used to tune other desirable properties of the microprocessors that also affect

leakage current, and will add to the shifts in Isb distribution. For each product there is a

maximum specified Isb, and all parts are screened to below that level. The leakage

current power is the product of the supply voltage and the leakage current:

P_{leakage} = V * I_{sb} (Eq. 3.2)

3.4 Heat Dissipation

All of the heat generated within a system needs to be dissipated, otherwise the

temperature will continue to rise. A normally sized mobile computer can accommodate

approximately 15 W of passive cooling [7] from radiation and natural convection.

Systems that exceed this total require an active cooling solution. In particular, the CPU,

which has been one of the higher power devices in the system, requires a dedicated active

thermal solution. A remote heat exchanger (RHE), as depicted in Figure 3.3, is the most

common technology used to remove heat from the CPU in mobile computer systems.

[Figure 3.3 diagram labels: attachment block, CPU (junction), heat pipe, heat exchanger (HX), air inlet, air exhaust.]

Figure 3.3 [8]: Remote heat exchanger for mobile computer systems.

The heat generated at the transistors in the processor is conducted through the silicon into

the attachment block. There is typically a thermal interface material between the CPU

and the attachment block to improve conduction across this boundary. The attachment

block also serves as a heat spreader to equalize temperatures across the silicon die. The

heat then flows through the heat pipe. Heat pipes are extremely efficient. They operate

by vapor/liquid phase change and mass transfer [7]. The heat is then conducted into the

fins of the heat exchanger, where it is removed via forced convection from the fan.

The heat flow path can be modeled as a series of thermal resistors as shown in Figure 3.4.

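[Figure 3.4 diagram: three thermal resistances in series, \theta_{j-heat pipe}, \theta_{heat pipe-HX}, and \theta_{HX-ambient}, which together make up \theta_{j-a}.]

Figure 3.4: Thermal resistance model of remote heat exchanger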

The total thermal resistance is empirically defined in a steady state condition as shown in

Equation 3.3.

\theta_{j-a} = (T_j - T_a) / Power (Eq. 3.3)

where \theta_{j-a} is the thermal resistance in °C/W

T_j is the transistor junction temperature of the CPU in °C

T_a is the ambient temperature in °C

Power is the power dissipated by the CPU in watts

Lower thermal resistance allows greater cooling capacity for a given temperature

difference. To reduce the total thermal resistance it is necessary to reduce one or more of

the resistances in series. The most common method to reduce \theta_{j-heat pipe} is to use a lower

resistance thermal interface material. The heat pipe thermal resistance can be reduced

with a larger cross-section heat pipe. The easiest way to reduce the total resistance is to

reduce \theta_{HX-ambient} by using a heat exchanger with greater surface area, a larger more

powerful fan to flow more air, or both. However, those options are in direct contradiction

to the desire to make the systems smaller, thinner, and lighter.
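
The series-resistance model and Equation 3.3 can be sketched in a few lines of Python. The resistance values, ambient temperature, and junction limit below are hypothetical and serve only to show how a lower heat exchanger resistance raises the power that can be dissipated for a given temperature difference.

```python
def total_thermal_resistance(theta_j_hp, theta_hp_hx, theta_hx_amb):
    """Series resistances add (Figure 3.4), giving theta_j-a in deg C/W."""
    return theta_j_hp + theta_hp_hx + theta_hx_amb


def junction_temperature(t_ambient, theta_ja, power):
    """Eq. 3.3 rearranged: Tj = Ta + theta_j-a * Power."""
    return t_ambient + theta_ja * power


def max_power(t_junction_max, t_ambient, theta_ja):
    """Largest steady-state power that keeps Tj at or below t_junction_max."""
    return (t_junction_max - t_ambient) / theta_ja


# Hypothetical resistances in deg C/W (illustrative only).
theta_ja = total_thermal_resistance(0.5, 0.3, 1.2)
budget = max_power(100.0, 35.0, theta_ja)

# Shrinking only the heat exchanger term, for example with more fin area.
theta_ja_improved = total_thermal_resistance(0.5, 0.3, 0.8)
budget_improved = max_power(100.0, 35.0, theta_ja_improved)

print(f"{theta_ja:.1f} C/W -> {budget:.1f} W; "
      f"{theta_ja_improved:.1f} C/W -> {budget_improved:.1f} W")
```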

This chapter reviews the thermal monitor feature of Intel microprocessors and the thermal

impact and power consumption characteristics of semi-conductor microprocessors. The

next chapter looks at how those characteristics and features are used in a simulator for product

quality.

Chapter 4 - Simulation Process

The simulation tool used as the basis of this work was developed by Intel Corporation.

The simulation runs within Excel (Microsoft Corp.) and uses JMP (SAS Institute Inc.) for

data analysis and graphing. The tool is used to understand the impact of different test

conditions for microprocessors and how they affect factory yield and quality defects in

the field. Quality defects are defined broadly as when a device does not perform as

expected. Setting the test conditions too conservatively results in unnecessarily scrapping

microprocessors at the factory that would have performed flawlessly in the field. Setting

the test conditions too aggressively increases the yield at the factory, but will result in

quality defects in the field. This chapter provides an overview of how the simulator

works at a very high level, and the modifications that were made to it.

4.1 Monte Carlo Simulator

In Monte Carlo simulations, a model is evaluated repeatedly using parameter values that

are randomly drawn from statistical distributions. The results of the repeated simulations

are then evaluated in terms of their statistical distributions. Monte Carlo methods are

particularly useful when dealing with complex systems and a variety of failure rate

models [9].

The tool used in this project simulates the microprocessor product creation, testing, and

end use in a virtual environment. The simulator uses the Monte Carlo technique to

randomly sample the products, testers and end use conditions from a statistical

description of the characteristics of each. Millions of parts can be sampled, tested and

evaluated under use conditions in this virtual environment leading to estimates of failure

rates in defects per million (DPM). By treating parts as samples of a distribution, worst

case assumptions can be avoided and more accurate assessments of expected field failure

rates are obtained.

First, the product is created by randomly choosing parameter values for Fmax, Dynamic

current, and Isb. Distributions for those parameters for a product are developed from

production data and engineering characterization data. The randomly chosen parameters

are fed into a set of equations that define the operating characteristics of the

microprocessor.

After the program creates a simulated product it is virtually tested by choosing a set of

parameters that define the test environment. The automated test equipment (ATE) is

described by its operating tolerances and the test condition settings. Statistics and

distributions of the product characteristics that pass the testing can be calculated.

If the device under test passes, it is then simulated in the end-use environment. The end-

use environment is characterized by tolerances for the electrical and thermal

specifications of the system platform or chassis, the ambient air temperature, and degree

to which the microprocessor is utilized or stressed. Statistics and distributions of the

product characteristics in the field can be calculated. Figure 4.1 is a diagram showing an

overview of the simulation.

[Figure 4.1 diagram; labels only partially recoverable: Product, ATE settings, End-Use Characteristics, Producer Impact.]

Figure 4.1: Simulation overview. Source: Intel Corporation

This work focuses on the thermal aspects of the End-Use Characteristics and the

Customer Impact. From this perspective the most important value that the simulator

calculates is the temperature of the microprocessor (Tj) in use as defined in Equation 4.1.

T_j = T_{air} + T_{sys} + \theta * Power (Eq. 4.1)

where: T_{air} is the ambient temperature that the system is operating in.

T_{sys} is the temperature rise within the system due to other electrical components.

\theta is the thermal resistance of the OEM thermal solution in °C/W.

Power = V * (I_{sb} + AR * I_{dyn})

I_{sb} is the leakage or standby current

AR is the software application ratio

I_{dyn} is the dynamic current

The above equations have been simplified for presentation here. Temperature, current

and voltage are all coupled in the simulation model.
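
To make the sampling idea concrete, the following Python sketch applies the Monte Carlo technique to the simplified form of Equation 4.1. It is not the Intel simulator: every distribution, constant, and function name is a hypothetical placeholder, and the coupling between temperature, current, and voltage is ignored.

```python
import random


def junction_temp(t_air, t_sys, theta, v, i_sb, ar, i_dyn):
    """Simplified Eq. 4.1: Tj = Tair + Tsys + theta * V * (Isb + AR * Idyn)."""
    power = v * (i_sb + ar * i_dyn)
    return t_air + t_sys + theta * power


def fraction_over_limit(n_parts, t_limit, seed=0):
    """Sample n_parts virtual parts and return the fraction with Tj > t_limit."""
    rng = random.Random(seed)
    over = 0
    for _ in range(n_parts):
        # Every distribution and constant here is a hypothetical placeholder.
        i_sb = rng.lognormvariate(0.0, 0.4)   # leakage current, amps
        i_dyn = rng.gauss(20.0, 1.5)          # dynamic current, amps
        theta = rng.gauss(1.6, 0.1)           # thermal resistance, deg C/W
        t_air = rng.gauss(28.0, 3.0)          # ambient temperature, deg C
        ar = rng.uniform(0.2, 0.9)            # application ratio
        if junction_temp(t_air, 5.0, theta, 1.3, i_sb, ar, i_dyn) > t_limit:
            over += 1
    return over / n_parts


frac = fraction_over_limit(100_000, t_limit=100.0)
print(f"{frac * 1e6:.0f} parts per million exceed the limit")
```

Because each simulated part is a draw from the input distributions, the result is a rate over the population rather than a single worst-case verdict.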

4.2 Simulator Modifications

Several comments need to be made about the simulator methodology. The methodology

is not concerned with behavior integrated over time. It mainly focuses on instantaneous

catastrophic events such as Fmax. Therefore, no time dimension is included in the

model. For inputs that will vary over the lifetime of a device, such as software

application ratio, and in the case of mobile computers, the ambient environment, the

methodology is only concerned with the worst-case instances that may lead to a failure.

For these reasons, the application ratio is set at the TDP ratio and the ambient

temperature is set at 35 °C.

However, throttling is not an instantaneous catastrophic event. It can be a recurring

event that happens frequently over the lifetime of a device. One of the goals of this

project is to determine how often throttling will occur for a particular device. This

explicitly requires some inclusion of a time or frequency dimension. The application

ratio and ambient temperature variables are used to bring that time dimension into the

simulator.

4.2.1 Application Ratio and Ambient Temperature Distribution

With assistance from the programmers at Intel the simulation code was modified to

accept statistical distribution parameters for the application ratio that are sampled by the

Monte Carlo method. The application ratio parameters are developed to include an

estimation of the amount of time spent along each point on the curve (see Chapter 5).

The simulator already had the capability to sample from a statistical bi-modal ambient

temperature distribution. A time-based model of this distribution for mobile computers is

developed in Chapter 5.

4.2.2 Performance Impact Modeling

Another goal of this project is to estimate the performance impact of throttling. When the

simulator calculates Tj for a device that is greater than the throttle set point temperature,

it then back calculates the AR required to maintain the maximum temperature for the

device. This reduction in AR is translated into a performance loss based on performance

data collected in the lab. The problem was that all of the performance loss data was

focused on the TDP application ratio. With the application ratio distribution model,

information about performance loss from a much greater range of applications is

required.
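
The back calculation of AR follows from rearranging the simplified Equation 4.1 for the application ratio at the throttle set point. The Python sketch below shows that rearrangement under stated assumptions: the function names and numerical values are hypothetical, and the temperature dependence of the leakage current is ignored.

```python
def junction_temp(t_air, t_sys, theta, v, i_sb, ar, i_dyn):
    """Simplified Eq. 4.1: Tj = Tair + Tsys + theta * V * (Isb + AR * Idyn)."""
    return t_air + t_sys + theta * v * (i_sb + ar * i_dyn)


def throttled_ar(t_set, t_air, t_sys, theta, v, i_sb, ar, i_dyn):
    """Return the AR that can be sustained without exceeding t_set.

    If the requested AR keeps Tj at or below the set point it is returned
    unchanged; otherwise Eq. 4.1 is solved for the AR that gives Tj = t_set.
    """
    if junction_temp(t_air, t_sys, theta, v, i_sb, ar, i_dyn) <= t_set:
        return ar
    power_budget = (t_set - t_air - t_sys) / theta   # watts allowed at t_set
    ar_limit = (power_budget / v - i_sb) / i_dyn
    return max(0.0, ar_limit)


# Hypothetical device and platform values (illustrative only).
requested = 0.8
sustained = throttled_ar(t_set=100.0, t_air=35.0, t_sys=5.0, theta=2.0,
                         v=1.0, i_sb=2.0, ar=requested, i_dyn=40.0)
ar_reduction = (requested - sustained) / requested

print(f"sustained AR = {sustained:.2f}, AR reduction = {ar_reduction:.1%}")
```

The AR reduction computed this way is the quantity that the performance loss model takes as its input.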

Lab testing of a broad range of software benchmarks under throttling conditions showed

consistent performance decreases for similar AR reductions. A model is fit to this

performance data. The model, shown in Figure 4.2, is exponential and is included as a

post processor in JMP.

[Figure 4.2 plot: performance loss versus AR reduction; axis scales not recoverable.]

Figure 4.2: Performance loss model.

The most important attributes of Figure 4.2 are that for small levels of AR reduction

(throttling) the performance loss is negligible, and that there is a rapid transition from

small performance loss to very significant performance loss. The implications are three

regimes: limited throttling produces almost no performance impact; throttling can

produce moderate performance decrease and the system will still function; performance

loss increases rapidly to the point where system functionality may be compromised.

This chapter looks at the framework of a product quality simulator that uses Monte Carlo

methods, and how to incorporate a time dimension into the simulation by using statistical

usage models. The next chapter looks at the development of the user models in detail.

Chapter 5 - Usage Models

In order to predict the quality of a product some determination of its intended use has to

be established. These are termed usage models and typically contain information

regarding how the product is used, how often and for how long, and under what

conditions. Software benchmark companies have conducted a significant amount of

research and data collection on how people use software [10,11]. This thesis will build

on that work and focus on which software applications are being used, the processing power

those applications utilize, and the environment and ambient temperature in which the

computer is being used.

5.1 Mobile Choice Survey

The basis for this usage model was data from a survey of mobile computer end users that

Intel Corporate Marketing Research had conducted in the spring of 2002. Over 900 users

were surveyed from multiple countries across the globe. Embedded in the survey were

five questions that related to the manner in which people used their mobile computers.

These questions included:

What environments?

Hours per week on?

Hours per week from batteries?

Hours per week actively using?

What type of activities (software applications)?

Appendix A contains the actual survey questions as well as sample tabulated answers.

Another useful feature of the survey data is that it included sample-weighting parameters.

These are used to more accurately aggregate results among various sub-groups. For

example, consider 200 users surveyed in two different countries. If one country has a

total user population of 10,000,000 users and the other country only has 100,000 users,

simply combining the survey results would not provide an accurate picture of the entire

10,100,000-user population. Differences in the ratios between the number of users

surveyed in a subpopulation and the total number of users in a sub-population are

accounted for by the sample weighting parameters.
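
The effect of sample weighting in the two-country example can be sketched as follows. The sketch assumes 200 respondents in each country, and the per-country survey means are hypothetical values used only to show the arithmetic.

```python
def sample_weight(population, n_surveyed):
    """Number of users in the population represented by each respondent."""
    return population / n_surveyed


def weighted_mean(values, weights):
    """Mean of values, with each value counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two countries from the example above, assuming 200 respondents in each.
weights = [sample_weight(10_000_000, 200), sample_weight(100_000, 200)]

# Hypothetical per-country survey means (illustrative, not survey data).
country_means = [0.30, 0.60]

unweighted = sum(country_means) / len(country_means)
weighted = weighted_mean(country_means, weights)

print(f"unweighted = {unweighted:.3f}, weighted = {weighted:.3f}")
```

With these numbers the weighted result stays close to the value for the larger country, which is the behavior expected for the combined 10,100,000-user population.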

5.2 Defining a "User"

The definition of a user for this thesis is focused on defining an application ratio or

application ratio distribution that can be used as an input to the simulation model.

Having the raw data on the types of applications that are employed by mobile computer

users was the first step. In order to be utilized in the existing modeling framework, the

data needed to be distilled down to a single statistical distribution, while maintaining an

accurate representation of the variety of user profiles. Several methods of defining

the usage distribution were considered, both from the standpoint of how to characterize

each software type and how to characterize each user.

5.2.1 Software Characterization - Application Ratio

As discussed in Chapter 2, the dynamic power that a CPU consumes running a particular

software application varies significantly over time as it performs different operations. It

is useful to think of software's dynamic power consumption in terms of an application

ratio (AR).

Software applications, many of which are industry standard benchmarks that

corresponded to the 14 software types listed in the survey, were tested for their power

consumption. Two measures of power consumption were considered: average power and

the peak 5-second rolling average power. The power measurements for each software

type were converted to application ratios. The results are shown in Table 5.1.

Table 5.1: Software application ratios. Source: Intel Mobile Power Lab

Software Type                      Average App Ratio   Peak 5-Sec Ave App Ratio
Word processing                    0.29                0.78
Spreadsheets                       0.29                0.78
Presentation Creation              0.29                0.78
Email                              0.18                0.18
Internet browsing                  0.22                0.22
Games                              0.71                0.77
Audio/MP3                          0.32                0.37
Video/Viewing DVDs                 0.27                0.39
Video/Downloading from Internet    0.47                0.83
Video/Inputting to Devices         0.86                0.88
Graphics/CAD                       0.59                0.74
Web content creation               0.78                0.79
Programming tools                  0.77                0.77
Simulations/math models            0.69                0.70

5.2.2 User Characterization

With information about the types of software applications that each respondent uses the

next challenge is to define a way to characterize each user. Two approaches are taken.

The first approach followed in the tradition of the worst-case methodology; each user is

defined by the most strenuous software type that he/she reported in the survey. A single

application ratio can then be associated with each user based on the application ratio

definition.

The second approach involved calculating the expected value of the application ratio for

each user. This is the average of the application ratios for the software types the user reported using, weighted by

the percent time reported for each software type.

The expected application ratio calculated for each surveyed user is given by Equation 5.1:

E(AR)_i = A_1 * T_{i,1} + A_2 * T_{i,2} + ... + A_{14} * T_{i,14} (Eq. 5.1)

where A_1 is the application ratio for software type 1

and T_{i,1} is the % time user i reports for software type 1

Both methods produce different results depending on the definition of software

application ratio.
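
The two user characterizations can be compared with a short Python sketch. It uses three rows of Table 5.1 and a hypothetical user who spends 10 percent of the time word processing and splits the rest equally between email and Internet browsing; the function and variable names are illustrative.

```python
# (average AR, peak 5-second average AR) for three software types in Table 5.1.
APP_RATIOS = {
    "Word processing": (0.29, 0.78),
    "Email": (0.18, 0.18),
    "Internet browsing": (0.22, 0.22),
}
AVERAGE, PEAK_5_SEC = 0, 1


def worst_case_ar(time_fractions, definition):
    """Worst-case user: AR of the most strenuous software type reported."""
    return max(APP_RATIOS[name][definition]
               for name, share in time_fractions.items() if share > 0)


def expected_ar(time_fractions, definition):
    """Expected user (Eq. 5.1): time-weighted sum of the application ratios."""
    return sum(APP_RATIOS[name][definition] * share
               for name, share in time_fractions.items())


# Hypothetical user; time shares are fractions that sum to one.
user = {"Word processing": 0.10, "Email": 0.45, "Internet browsing": 0.45}

for label, definition in (("peak 5-sec", PEAK_5_SEC), ("average", AVERAGE)):
    print(f"{label}: worst-case = {worst_case_ar(user, definition):.3f}, "
          f"expected = {expected_ar(user, definition):.3f}")
```

For this user the worst-case value is set entirely by word processing, while the expected value is dominated by the low power applications that take up most of the time.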

5.3 Usage Model Matrix

A two by two matrix can be formed with the two definitions of software application ratio

and user characterization, as shown in Figure 5.1.

                                   Software Application Ratio
User Characterization    Peak 5-Sec Average         Average
Worst Case               Most conservative
                         Will it ever happen?
Expected                                            Least conservative
                                                    Will it happen most of the time?

Figure 5.1: Usage Model Matrix

The nomenclature to describe each quadrant is the user characterization (Worst-Case,

Expected) followed by the software application ratio (Peak 5-Second Average, Average).

Each quadrant of the matrix encompasses a different degree of conservatism, and

represents a different time scale or frequency of events when used in the simulator. Each

of the quadrants is explored in detail, starting in the upper left and going counter-

clockwise.

5.3.1 Worst-Case Peak 5-Second Average Usage Model

The application ratio distribution that results from the worst-case 5-second average usage

model is displayed in Figure 5.2.

[Figure 5.2 plot, titled "Worst Case 5-second": distribution over Application Ratio (x-axis, 0 to 1).]

Figure 5.2: Worst-case 5-second average usage model application ratio distribution.

This application ratio distribution is very similar to the single value worst-case

assumption. This is because almost all users will at some point use software types that

have the potential to consume a high fraction of the processor power. This analysis

validates the choice of TDP application ratio for a quality analysis at worst-case

conditions.

However, this type of distribution offers no insight to how often a recurring event, such

as throttling, would occur. This type of input to the quality simulator will only answer

one time-based question - "Will it ever happen?" This type of analysis is appropriate for

catastrophic unrecoverable events, and may be appropriate for catastrophic recoverable

events (e.g. Fmax). This input is also useful to calculate the absolute maximum

temperature a CPU will see in use. However, it provides absolutely no information on

the frequency of events: would the event happen once a year, once a day, once an hour, or

once a minute? No conclusions can be drawn.

Does it make sense to use this as the standard for events where the consequences are

transitory and functionality is maintained? For example, a simulated person who used

their mobile computer for email, Internet browsing, and word processing would be

assigned an application ratio of 0.78 under this model. If that 0.78 application ratio, in

combination with the CPU, system, and ambient factors, created a throttling event, then

the result of the simulation would record the user as suffering from a throttling system.

However, if that user was word processing only 10 percent of the time, and word

processing software reaches its peak power infrequently and for short durations, throttling

may be imperceptible, if it occurs at all. In such a case, the user would never experience

a quality event. For these reasons the worst-case 5-second average usage model is not

useful for calculating throttling or other transient events, but it does provide an upper

bound on the number of units that could be affected.

5.3.2 Expected Peak 5-Second Average Usage Model

The application ratio distribution that results from the expected 5-second average usage

model is displayed in Figure 5.3.

[Figure 5.3 plot, titled "Expected 5-Second Average": distribution over Application Ratio (x-axis, 0 to 1).]

Figure 5.3: Expected 5-second average usage model application ratio distribution.

This is an interesting distribution that uses the peak 5-second average application ratio

definition, accounting for the highest powers (and hence temperatures) that each software

type can generate. The peak 5-second average application ratio definition also removes

any influence of processor idle time in the calculation of the application ratio. At the same

time, by using the expected user characterization, "credit" is given for time spent in low

power applications.

Running the simulation with this usage model input should produce a typical distribution

of maximum temperatures. Note that this is different than the absolute maximum

temperatures, which are calculated using the worst-case 5-second average usage model.

The expected 5-second average usage model also provides some insight into the

frequency of events. Basically, simulating with this type of usage model answers the

question, "Is it expected to ever happen?" This is actually quite different than "Will it

ever happen?" A processor could spend minutes throttling over its 1000's of hours of

operating lifetime. Did the processor ever throttle? Yes. Would the processor be

expected to throttle? No.

Let's return to the example person who used his/her mobile computer for email, Internet

browsing, and word processing. If this person spent 10 percent of his/her time word

processing and split the remaining time equally between email and Internet browsing,

his/her expected 5-second average application ratio (from Equation 5.1 and Table 5.1) would be:

E(AR) = 0.78*0.10 + 0.18*0.45 + 0.22*0.45 = 0.258

With a 0.258 application ratio it is highly unlikely that the simulator would calculate a

throttling event. The interpretation of this result is that the user would not be expected to

be affected by throttling. This does not mean the user would never experience throttling.

For a different user with a higher expected 5-second average application ratio the

simulator may predict throttling. The interpretation of a throttling result is that the CPU

would be throttling "sometimes", not all of the time, but more than infrequently.

5.3.3 Expected Average Usage Model

The application ratio distribution that results from the expected average usage model is

displayed in Figure 5.4.

[Figure 5.4 plot, titled "Expected Average": distribution over Application Ratio (x-axis, 0 to 1).]

Figure 5.4: Expected average usage model application ratio distribution.

This distribution includes the effect of processor idle time when running applications, and

includes the influence of time spent in low power applications. Inputting this usage

model produces the expected average temperature distribution, but no information about

peak transient temperatures.

Using this distribution will only predict events that happen most of the time. This may be

useful for predicting the number of laptops that would never work properly due to

an excessive rate of throttling. If a simulated CPU is predicted to throttle under this model,

the interpretation is that on average it would be throttling to some degree for its active

life. Because this model will predict the long-term average temperature distribution, it

may be useful in long-term reliability modeling.

5.3.4 Worst-Case Average Usage Model

The application ratio distribution that results from the worst-case average usage model is

displayed in Figure 5.5.

[Figure 5.5 plot, titled "Worst Case Average": distribution over Application Ratio (x-axis, 0 to 1).]

Figure 5.5: Worst-case average usage model application ratio distribution.

This model only accounts for the most strenuous application that each respondent

reported using in the survey. It then defines each of those applications by their average

application ratio, including processor idle and low-power time. Using this model will

predict the number of users that experience events on average when running their worst-

case applications. However, it does not provide any information on how often these

events will happen because it does not include any information on how often the worst-

case applications are used.

5.4 Application Ratio Distribution Parameters

To utilize an application ratio distribution model in the simulation program it is necessary

to define it in terms of statistical distribution parameters. The first step was to choose a

usage model from the matrix. The expected 5-second average application ratio usage

model is chosen because it provides the best balance between the maximum power that

each software type can consume and the percentage of time spent in both high and low

power applications. It is a good compromise for throttling simulations between the

highly conservative worst-case 5-second average model, which will count throttling

events that happen so infrequently that a user may never notice, and the expected average

model, which only counts events if they are occurring more than half of the time.

Because the application ratio (AR) is bounded between zero and one, the data is

transformed into the Z variable as shown in Equation 5.2.

Z_i = \log_{10}[E(AR)_i / (1 - E(AR)_i)] (Eq. 5.2)

The Z variable is then approximated by a normal distribution, and the mean (\hat{\mu}) and standard

deviation (\hat{\sigma}) are calculated.

The normal parameters are also adjusted by different weighting factors for each user (w)

from the survey data. The equations for the mean (\hat{\mu}) and standard deviation (\hat{\sigma}) are

given in Equations 5.3 and 5.4 respectively.

\hat{\mu} = \sum_i (w_i * Z_i) / \sum_i w_i (Eq. 5.3)

\hat{\sigma} = \sqrt{ \sum_i w_i (Z_i - \hat{\mu})^2 / \sum_i w_i } (Eq. 5.4)

Multiple weighting factors and combinations are examined. The sample population

weighting factors (as described in 5.1), the hours of actual use reported by each user, and

the combination of the two are each used to calculate a set of normal distribution

parameters. The Z normal parameters are entered directly into the simulation program,

which transforms the Z values back into AR as shown in Equation 5.5.

AR = 10^Z / (10^Z + 1) (Eq. 5.5)
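
The transformation, the weighted normal parameters, and the back transformation can be sketched in Python as follows. The user data and weights are hypothetical, and the weighted standard deviation uses the common form normalized by the sum of the weights, which is an assumption about the exact formula used in this work.

```python
import math


def to_z(ar):
    """Eq. 5.2: map an application ratio in (0, 1) onto the real line."""
    return math.log10(ar / (1.0 - ar))


def to_ar(z):
    """Eq. 5.5: map a Z value back to an application ratio in (0, 1)."""
    return 10.0 ** z / (10.0 ** z + 1.0)


def weighted_normal_params(expected_ars, weights):
    """Weighted mean and standard deviation of Z.

    The standard deviation divides by the sum of the weights (assumed form).
    """
    zs = [to_z(ar) for ar in expected_ars]
    w_sum = sum(weights)
    mu = sum(w * z for w, z in zip(weights, zs)) / w_sum
    var = sum(w * (z - mu) ** 2 for w, z in zip(weights, zs)) / w_sum
    return mu, math.sqrt(var)


# Hypothetical users: expected AR and weekly hours of active use as weights.
ars = [0.258, 0.45, 0.60]
hours = [10.0, 25.0, 40.0]

mu, sigma = weighted_normal_params(ars, hours)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, AR at the mean of Z = {to_ar(mu):.3f}")
```

Working in Z keeps every value drawn from the fitted normal inside the interval from zero to one once it is transformed back to AR.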

The four resulting application ratio distributions (raw, weighted by population factor, by

hours, and by population factor*hours) are shown in Figure 5.6.

[Figure 5.6 plot, titled "Expected 5-Second Average App Ratio": data and fitted curves over Application Ratio (x-axis); legend: raw fit, weighted by hours, weighted by population factors, weighted by population factors*hours.]

Figure 5.6: Expected 5-second average application ratio data and normal functions.

Weighting the data by the hours of active use that each user reported makes the

distribution more accurately reflect the probability that a CPU would be operating at one

condition compared with another. Take the example of two users; one operates his/her

computer for 10 hours per week at an AR of 0.35; the second operates his/her computer

for 40 hours at an AR of 0.70. If we were to randomly sample the operating condition of

a CPU in this example, 0.70 would be four times as likely as 0.35. Weighting the raw data by

the hours of active use accounts for this effect. It is interesting to note that weighting the

data by hours reduces the standard deviation. This suggests that people who spend more

time actively using their mobile computer also tend to have an expected 5-second

average application ratio that is closer to the mean of the distribution.

As discussed in Section 5.1, weighting the data by the sample population factors accounts

for differences in the ratio of sample size to total population for different subgroups. It is

interesting to note that weighting the data by these factors does not significantly shift the

mean or standard deviation. This suggests that mobile computer users have similar

software type usage patterns across the geographies and sub-groups surveyed.

The result of the above analysis is that both the hours of active use and the sample

population factors are used in developing the normal parameters to be used in the

simulation model.

5.5 Ambient Temperature Distributions

A model of the mobile computer ambient temperature distribution is also developed from

the Mobile Choice Survey data. The base for this mobile ambient temperature model is

a desktop model that was developed by Corporate Marketing Research. The desktop

model focuses on indoor ambient conditions, and concludes conservatively that 40

percent of desktop computers are used in an air-conditioned environment. The ambient

temperature estimations for non air-conditioned environments are based on daily

maximum summer temperatures.

Analysis of the mobile survey data shows that 87 percent of the active use time fell into

one of the four indoor categories: Office, Home Office, Home, Hotel. All of these

environments conservatively fit into the desktop indoor pattern described above. The

remaining 13 percent of the active use time (Airport, Car, Dorm, Classroom, Outdoors,

Other) was placed entirely in the non air-conditioned category. The resulting bi-modal

ambient temperature distribution for a mobile computer is shown in Figure 5.7.

[Figure 5.7 plot: bi-modal distribution over Ambient Temperature (deg C), x-axis from 18 to 36.]

Figure 5.7: Mobile computer ambient temperature distribution model.
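
The mixing weights of the bi-modal distribution follow from the percentages above, on the reading that the 87 percent of indoor use time inherits the 40 percent air-conditioned share of the desktop model. The Python sketch below works through that arithmetic and samples from a two-mode mixture. The means and standard deviations of the two modes are hypothetical placeholders, not the parameters of the model in Figure 5.7.

```python
import random

INDOOR_SHARE = 0.87      # active-use time in the four indoor categories
INDOOR_AC_SHARE = 0.40   # desktop model: share of indoor use air-conditioned

# Only indoor use can be air-conditioned; the other 13% is treated as not.
p_air_conditioned = INDOOR_SHARE * INDOOR_AC_SHARE
p_not_air_conditioned = 1.0 - p_air_conditioned

# Hypothetical (mean, standard deviation) of each mode in deg C.
AC_MODE = (22.0, 1.5)
NON_AC_MODE = (30.0, 2.5)


def sample_ambient(rng):
    """Draw one ambient temperature from the two-mode mixture."""
    mean, sd = AC_MODE if rng.random() < p_air_conditioned else NON_AC_MODE
    return rng.gauss(mean, sd)


rng = random.Random(0)
samples = [sample_ambient(rng) for _ in range(50_000)]
share_below_26 = sum(t < 26.0 for t in samples) / len(samples)

print(f"air-conditioned weight = {p_air_conditioned:.3f}, "
      f"share of samples below 26 C = {share_below_26:.3f}")
```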

This chapter details the development of mobile computer usage models from end-user

survey data and software application ratios. A two by two matrix of usage models is

developed with different levels of conservatism in each quadrant. A model for the

mobile ambient operating temperature is also developed. In the next chapter the usage

and ambient temperature models are used in product quality simulations to predict

throttling events.

Chapter 6 - Simulation Results

A Monte Carlo simulation program is used to evaluate the expected operating

characteristics of microprocessors in mobile computer systems. Simulation results from

both worst-case operating conditions and a statistical usage model are analyzed and

compared. The statistical usage model is then used to compare the theoretical operation

and performance of a true mobile system and a desktop arbitrage transportable system.

Finally, an analysis is performed to evaluate the effect of reducing the throttle set point

on a mobile CPU, and how this could reduce the number of quality events in the field.

One hundred thousand simulations are used for all cases to tabulate results.

6.1 Worst-Case Operating Conditions vs. Statistical Usage

Model

The first case investigated was a simulation of a 2.2 GHz mobile CPU in a theoretical traditional thin & light mobile system with a thermal solution capability designed to Intel's recommendations for that CPU. Results obtained with the TDP application ratio user model were compared with results obtained with the expected 5-second average application ratio user model and the mobile ambient temperature distribution.

As shown in Figures 6.1a and 6.1b, the average predicted CPU temperature (Tj) drops by over 20 degrees C when evaluated with the expected 5-second average application ratio.


The range of Tj also increases dramatically, by more than a factor of four, and the

distribution is better described by a normal distribution as predicted by the Central Limit

Theorem. This is the expected result of convolving an additional normal distribution in

the calculation in place of a single worst-case value.
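The widening of the Tj distribution can be sketched with a toy Monte Carlo run. The example below is not the simulation program used in this work; it assumes a simple linear model, Tj = Ta + theta * P * AR, and every numeric parameter (`THETA_JA`, `P_MAX`, the worst-case application ratio of 0.75, and the means and standard deviations of the input distributions) is a hypothetical placeholder chosen only to show the effect of replacing fixed worst-case inputs with distributions.

```python
import random
import statistics

THETA_JA = 1.5   # hypothetical junction-to-ambient thermal resistance, deg C per W
P_MAX = 30.0     # hypothetical CPU power at application ratio 1.0, W


def junction_temp(ambient_c, app_ratio):
    """Assumed linear thermal model: Tj = Ta + theta * P * AR."""
    return ambient_c + THETA_JA * P_MAX * app_ratio


def simulate(n, rng, worst_case):
    """Return n simulated junction temperatures."""
    temps = []
    for _ in range(n):
        if worst_case:
            # Fixed inputs: 35 deg C ambient and a hypothetical TDP-like ratio.
            ambient, ratio = 35.0, 0.75
        else:
            # Distributed inputs (hypothetical parameters), ratio clipped to [0, 1].
            ambient = rng.gauss(26.0, 4.0)
            ratio = min(max(rng.gauss(0.45, 0.12), 0.0), 1.0)
        # Part-to-part power variation is present in both cases.
        power_scale = rng.gauss(1.0, 0.03)
        temps.append(junction_temp(ambient, ratio * power_scale))
    return temps


if __name__ == "__main__":
    rng = random.Random(42)
    for label, flag in (("worst-case", True), ("statistical", False)):
        tj = simulate(100_000, rng, flag)
        print(label, round(statistics.mean(tj), 1), round(max(tj) - min(tj), 1))
```

With fixed ambient temperature and application ratio, only part-to-part variation spreads the result; once both inputs are sampled, the mean falls and the range grows, which is the qualitative behavior described above.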

[Figure 6.1 plots: panels (a) and (b), each a histogram with x-axis Tj in Use (deg C), 40 to 100.]

Figure 6.1a & b: Predicted CPU temperature distribution (Tj) in use for a 2.2 GHz mobile CPU in a thin & light system using (a) worst-case operating conditions (b) statistical usage model.

The impact of this significant shift in predicted CPU temperature on throttling, power reduction, and performance is summarized in Table 6.1.


Table 6.1: Predicted operating characteristics of 2.2 GHz mobile CPU thin & light systems.

                                               Worst-case operating   Statistical usage
                                               conditions             model
CPUs throttling (90% upper confidence limit)   0.91%                  0.01%
Maximum power reduction required               9.1%                   9.7%
Maximum performance reduction                  5.1%                   16.4%

The 90 percent upper confidence limit on the percentage of CPUs that will experience throttling in the field drops by almost two orders of magnitude.
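For readers who want to see how an upper confidence limit on a throttling fraction can be computed from simulation counts, the sketch below implements a one-sided exact binomial (Clopper-Pearson) bound by bisection. The text does not state which method was used to produce the limits in Table 6.1, so this choice is an assumption, and the function names `binom_cdf` and `upper_limit` are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def upper_limit(k, n, confidence=0.90):
    """One-sided upper confidence bound on a proportion, given k events in n trials."""
    if k >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = k / n, 1.0
    # The CDF decreases as p increases, so bisect for the p where it equals alpha.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical counts: 3 throttling units observed in 1,000 simulated units.
    print(upper_limit(3, 1000))
```

The bound is always above the observed fraction k/n, which is why a confidence limit rather than the raw simulated fraction is the more conservative figure to quote.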

It is interesting to note that to maintain the CPU temperature limit, the maximum power

reduction required is slightly greater when using the expected 5-second average

application ratio distribution in the simulation. This is true because, with the application

ratio distribution, it is possible to simulate an application ratio higher than the TDP

application ratio. The higher application ratio results in higher power consumption,

which then requires a greater power reduction to maintain the temperature limit under

certain conditions. Also of interest is that the slightly greater maximum power reduction required in the application ratio distribution simulation results in more than three times the performance loss. This is due to the highly non-linear relationship between power reduction and performance loss.

It is possible to examine in greater detail the impact on the CPUs that were predicted to throttle. Figure 6.2 shows the fraction of the power reduced to maintain the CPU temperature limit with the TDP application ratio for the subset of the CPUs that are


throttled in simulation. It is important to remember that the throttling subset represents

only 0.91% of the total population.

[Figure 6.2 plots: panels (a) and (b), each with x-axis Power Reduction, 0 to .09.]

Figure 6.2a & b: Predicted power reductions for throttling 2.2 GHz mobile CPU in thin & light systems using worst-case operating conditions (a) Histogram (b) Cumulative distribution function.

As can be seen from Figure 6.2b, half of the CPUs that are simulated to throttle under the TDP usage model require less than two percent power reduction to maintain the temperature limit. Figures 6.3a and b show the CPU performance reduction as a result of the power reductions of Figures 6.2a and b.


[Figure 6.3 plots: panels (a) and (b), each with x-axis Performance Impact, .01 to .05.]

Figure 6.3a & b: Predicted performance reductions for throttling 2.2 GHz mobile CPU in thin & light systems using worst-case operating conditions (a) Histogram (b) Cumulative distribution function.

Due to the inexact nature of the performance correlation there is a slight offset at zero. What is important to note is that the majority of the CPUs that are predicted to throttle under the TDP application ratio assumption experience less than one percent performance impact. In fact, more than 90 percent of the throttling CPUs would suffer less than two percent performance impact.

This validates Intel's current mobile thermal design process. Intel starts with the

assumption of the TDP application ratio user model and a 35 degree centigrade ambient

temperature. Its success criteria are that throttling will only occur in rare circumstances

and the performance impact should be practically undetectable. The simulation validates that, using those assumptions, Intel meets its goals.


The picture looks very different with the expected 5-second application ratio distribution. Only one unit out of the 100,000 simulated with the application ratio distribution indicated throttling. This occurred only in the extremely rare instance of a high ambient temperature (Ta > 33 degrees C), an extremely high application ratio (AR > 0.85), and a high power CPU.

6.2 Mobile System vs. Desktop Arbitrage System

Simulations were run to compare the performance and occurrences of throttling between two theoretical systems: a thin & light system with a 2.2 GHz mobile processor, as in Section 6.1, and a transportable system with a 3.06 GHz desktop processor. The desktop

processor can consume up to 146 percent more power than the mobile processor, and that

power needs to be dissipated. The larger physical size of the transportable system allows

for a thermal solution with a greater cooling capacity. For these simulations the thermal

solution capability of the transportable system is 72 percent greater than the thin & light

system, which is typical. However, the transportable system with the desktop processor is still 31 percent thermally under-designed per Intel's current design recommendations.

The OEMs may design and build thermally under-designed systems because they realize

Intel is using a worst-case design methodology, and the OEMs may believe they can

relax the thermal design conditions (i.e. the ambient temperature and processor power)

without suffering any consequences.

The first result of the simulation that needs to be mentioned is that with the TDP

application ratio user model and a 35 degree C ambient temperature, Intel's current

design assumptions, the simulation predicts that all of the desktop processor transportable


systems will experience severe throttling. They would all require a 10% - 50% power

reduction to maintain the CPU temperature limit. If such extreme conditions are realistic

it is safe to say that many of the systems would fail to operate at all.

Running the simulation with the expected 5-second average application ratio and the

ambient temperature distribution use model produces results that are more indicative of

real world use.

As shown in Figure 6.4a and 6.4b the average predicted CPU temperature (Tj) is

approximately 15 degrees C higher for the "transportable" desktop CPU system when

compared to the "thin & light" mobile CPU system.


[Figure 6.4 plots: panels (a) and (b), each a histogram with x-axis Tj in Use (deg C), 40 to 100.]

Figure 6.4a & b: Predicted CPU temperature distribution (Tj) in use for (a) a 2.2 GHz mobile CPU in a thin & light system and (b) a 3.06 GHz desktop CPU in a transportable system.

The effect of the thermal monitor can clearly be seen in Figure 6.4b. The thermal

monitor limits the right hand side of the temperature distribution to approximately 100

degrees C.

The impacts of these predicted CPU temperature distributions are summarized in Table

6.2.


Table 6.2: Operating characteristics predicted with statistical usage model of theoretical mobile and transportable computer systems.

                                               2.2 GHz mobile CPU     3.06 GHz desktop CPU
                                               thin & light system    transportable system
CPUs throttling (90% upper confidence limit)   0.01%                  19.6%
Maximum power reduction required               9.7%                   49.1%
Maximum performance reduction                  16.4%                  Some systems would not operate properly

The 90 percent upper confidence limit shows that nearly 20 percent of the desktop

processor transportable systems would be expected to experience some degree of

throttling. This also means that 80 percent of the systems, if equally distributed among

standard mobile computer users, would not be expected to experience any significant

throttling. However transportable system users may have a different profile than standard

mobile computer users.

The distributed usage model still predicts that some of the desktop processor transportable systems are thermally constrained to the point that not only does processor performance suffer, but the systems may also fail to operate entirely.

We can investigate in more detail the impact on the 19.6 percent of the desktop CPUs that are predicted to throttle. Figures 6.5a & b show the fraction of power reduction required to maintain the temperature limit for the subset of the CPUs that throttle.


[Figure 6.5 plots: panels (a) and (b), each with x-axis Power Reduction, 0 to .5 in (a) and 0 to .4 in (b).]

Figure 6.5a & b: Power reduction predictions under the statistical usage model for throttling 3.06 GHz desktop CPU in transportable systems (a) Histogram (b) Cumulative distribution function.

From Figure 6.5b, 50 percent of the throttling desktop CPUs require a power reduction of 10% or less. These systems would continue to function, although at varying fractions of the full performance of an equivalent un-throttled system. Analysis of the performance of the entire throttling sub-population is shown in Figures 6.6a & b.

[Figure 6.6 plots: panels (a) and (b), each with x-axis Performance Reduction.]

Figure 6.6a & b: Performance reduction predictions under the statistical usage model for throttling 3.06 GHz desktop CPU in transportable systems (a) Histogram (b) Cumulative distribution function.


Fifty percent of the throttling desktop units, which represent approximately 10 percent of

the total population, suffer less than a five percent performance reduction. However

beyond that the performance of the remaining sub population falls off dramatically. This

is due to the highly non-linear nature of performance loss.

The performance loss correlation is capped at 40%, which results in non-uniform

distribution for highly thermally stressed systems, such as in Figure 6.6a & b. The reality

is that at high levels of thermal stress, throttling and power reduction (>>10%), the

performance of the processor is no longer the critical issue, rather function of the system

becomes the critical issue. There are documented cases [12] of desktop processors used

in mobile systems that resulted not only in reduced performance, but lack of function.

With this data we can now make a direct comparison between the performance of a 2.2 GHz mobile CPU thin & light system and a 3.06 GHz desktop CPU transportable system. For

the mobile systems 99.99 percent would operate at their full rated performance. Only

under very rare and extreme use conditions would they experience any significant

throttling. In comparison, 80 percent of the desktop transportable systems would perform

at their full performance level. However, the full performance of the desktop

transportable system is greater than the mobile system's performance due to the desktop

CPU's higher speed rating.


6.3 Recommended Change to Throttle Set-Point

Another interesting analysis that can be performed with this methodology is the impact of

the throttle temperature on the number of quality events that will be experienced in the

field. Returning to the theoretical 2.2 GHz mobile CPU thin & light systems, multiple

simulations were run where the only variable changed was the temperature at which the

CPU began to throttle. The system parameters were held constant to reflect no change in

the thermal solution. By definition the simulated systems with the lower throttle set point

are thermally under-designed because there is less of a temperature difference to drive the

heat out of the system (see Eq 3.3).

There are two types of events that are of concern in this analysis: frequency maximum (Fmax) events and throttling events. Fmax events occur when, due to elevated

temperature, the processor cannot run at its prescribed frequency. Fmax events increase

with increasing CPU temperature. Throttling events occur when a CPU reaches the

throttle set point temperature, and a power reduction is required to not exceed that

temperature. For a given CPU temperature distribution reducing the throttle set point

increases the number of throttling events.
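The last statement can be illustrated with a short calculation. The sketch below approximates the un-throttled CPU temperature distribution as normal and evaluates the fraction of units at or above a given set point. The normal approximation, the mean of 70 degrees C, the standard deviation of 9 degrees C, and the function name `fraction_above` are all assumptions made for illustration; the sketch does not attempt to reproduce the simulated throttling percentages.

```python
import math


def fraction_above(set_point_c, mean_c, sd_c):
    """Upper-tail probability of a normal distribution at the set point."""
    z = (set_point_c - mean_c) / sd_c
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    MEAN_C, SD_C = 70.0, 9.0   # hypothetical temperature distribution
    for set_point in (100.0, 98.0, 95.0, 92.0, 90.0):
        share = fraction_above(set_point, MEAN_C, SD_C)
        print(f"set point {set_point:5.1f} C -> {share:.5%} of units would throttle")
```

Because the set point sits in the thin right tail of the distribution, each degree of reduction increases the throttling fraction by a growing amount.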

Figure 6.7 shows the simulation results of how the two events vary as a function of the throttle temperature set point when evaluated with the standard worst-case TDP application ratio and 35 °C ambient temperature.


[Figure 6.7 plot: curves for Fmax, Throttle, and Total events; x-axis: Throttle Set Point, 88 to 102.]

Figure 6.7: Predicted quality events as a function of throttle set point for a 2.2 GHz mobile CPU thin & light system under worst-case operating conditions.

As can be seen in Figure 6.7, in this worst-case scenario throttling events dominate the

Fmax events. To minimize the total number of quality events the throttle set point should

be set at 100 (set points above 100 were not considered).

Using a statistical usage model to provide a more realistic view of the number of

throttling events provides a very different picture. Figure 6.8 represents the simulation

results when the expected five-second average application ratio distribution and ambient

temperature distribution are used to calculate the CPU temperatures and hence the

occurrences of throttling.


[Figure 6.8 plot: curves for Fmax, Throttle, and Total events; x-axis: Throttle Set Point.]

Figure 6.8: Predicted quality events as a function of throttle set point for a 2.2 GHz mobile CPU thin & light system under statistical usage model.

As shown in Figure 6.8, the number of predicted throttling events no longer dominates the Fmax events when evaluated with the distributed usage model. With the throttle set point at 100, the expected number of throttling events is one fifth of that predicted with the worst-case use model. Moreover, as the throttle set point is reduced from 100, the number of Fmax events declines more rapidly than the number of throttling events increases.

Therefore to minimize the total number of quality events the throttle set point should be

reduced to 95. This will reduce the total number of quality events by 37 percent and the

number of catastrophic events by 57 percent.

This is counter-intuitive to the traditional thermal design mindset. Recall that no changes

are made to the definition of the thermal solution for this theoretical system. The thermal


solution is appropriately sized for a CPU with a maximum temperature of 100. As the CPU temperature limit is reduced, thermal design principles state that a larger thermal solution is required. Otherwise the system will be considered thermally under-designed.

We know that thermally under-designed systems generate a greater number of quality events for a given throttle set point. However, by adjusting the throttle set point a trade-off can be made between Fmax and throttling quality events. As shown in Figure 6.8, the number of events can go down when the system is thermally under-designed by virtue of a lower throttle set point. Of course, lowering the throttle set point and specifying a larger thermal solution could achieve even lower event rates. That would place an additional burden on system designers and OEMs, and runs counter to the trend of making systems smaller and lighter.
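The trade-off amounts to choosing the set point that minimizes the sum of the two event counts. The sketch below shows that selection step only. The event counts in it are made-up placeholders that mimic the qualitative shape discussed above; they are not values read from Figure 6.8, and the function name `best_set_point` is hypothetical.

```python
# Hypothetical placeholder counts per throttle set point (deg C).
# These are NOT taken from Figure 6.8; they only mimic its qualitative shape:
# Fmax events fall and throttling events rise as the set point is lowered.
FMAX_EVENTS = {90: 2, 95: 5, 100: 20}
THROTTLE_EVENTS = {90: 30, 95: 6, 100: 1}


def best_set_point(fmax_events, throttle_events):
    """Return the set point with the fewest total quality events, and all totals."""
    totals = {sp: fmax_events[sp] + throttle_events[sp] for sp in fmax_events}
    best = min(totals, key=totals.get)
    return best, totals


if __name__ == "__main__":
    best, totals = best_set_point(FMAX_EVENTS, THROTTLE_EVENTS)
    print("totals by set point:", totals)
    print("set point with fewest total events:", best)
```

In practice the two curves would come from simulation runs at each candidate set point, and the events could be weighted by severity before summing.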

Perhaps the most important aspect of these results is that the thermal design process need

no longer be one-dimensional. The size and capacity of the thermal solution is not the

only variable Intel can manipulate in its thermal design and specification process. By

using the throttle set point, the thermal solution size, a realistic representation and

understanding of throttling and Fmax events, and an understanding of the OEM thermal

solution capabilities, Intel can better optimize the system.

It is also important to understand the impact on the CPUs of the proposed change to a 95 °C throttle set point. Figures 6.9 and 6.10 show detailed information on the simulated CPU temperature distributions for a 2.2 GHz mobile CPU thin & light system with a throttle set point of 100 °C and 95 °C, respectively.


[Figure 6.9 plot: histogram with x-axis Tj in Use (deg C), 40 to 100, accompanied by the summary statistics below.]

Quantiles
100.0%   maximum    98.723
 99.5%              92.469
 97.5%              88.130
 90.0%              82.639
 75.0%   quartile   77.197
 50.0%   median     70.920
 25.0%   quartile   [illegible]
 10.0%              58.973
  2.5%              53.011
  0.5%              48.213
  0.0%   minimum    38.949

Moments
Mean              70.83124
Std Dev            9.0753978
Std Err Mean       0.0514445
upper 95% Mean    70.932074
lower 95% Mean    70.730407
N                 31121

Figure 6.9: Predicted CPU temperature distribution (Tj) in use for a 2.2 GHz mobile CPU with throttle set point at 100 °C in a thin & light system


[Figure 6.10 plot: histogram with x-axis Tj in Use (deg C), accompanied by the summary statistics below.]

Quantiles
100.0%   maximum    97.162
 99.5%              92.418
 97.5%              88.130
 90.0%              82.639
 75.0%   quartile   77.197
 50.0%   median     70.920
 25.0%   quartile   64.548
 10.0%              58.973
  2.5%              53.011
  0.5%              48.213
  0.0%   minimum    38.949

Moments
Mean              70.828178
Std Dev            9.0673384
Std Err Mean       0.0513988
upper 95% Mean    70.928921
lower 95% Mean    70.727434
N                 31121

Figure 6.10: Predicted CPU temperature distribution (Tj) in use for a 2.2 GHz mobile CPU with throttle set point at 95 °C in a thin & light system

As can be seen in a comparison of Figures 6.9 and 6.10, reducing the throttle set point from 100 °C to 95 °C has a negligible impact on the overall temperature distribution. This is expected as the throttle set point only affects the far right tail of the distribution.

Despite the minor effect on the overall temperature distribution there is a noteworthy

effect on the number of units that throttle, and the performance impact of the throttling.

A comparison of the summary data is presented in Table 6.3.


Table 6.3: Operating characteristics predicted with statistical usage model of 2.2 GHz mobile CPU thin & light systems.

                                               Throttle Set Point     Throttle Set Point
                                               100 °C                 95 °C
CPUs throttling (90% upper confidence limit)   0.01%                  0.15%
Maximum power reduction required               9.7%                   19.9%
Maximum performance reduction                  16.4%                  43.4%

The 90 percent upper confidence limit of the number of units throttling has increased by

a factor of 15, but it is still less than 1 in 650. The maximum power reduction and the

associated performance loss to keep all of the CPU temperatures within specification

have also more than doubled.

Figures 6.11a & b show the distribution of the power reduction required to keep the throttling CPUs at the 95 °C temperature limit.


[Figure 6.11 plots: panels (a) and (b), each with x-axis Power Reduction, 0 to .2.]

Figure 6.11a & b: Predicted power reductions for throttling 2.2 GHz mobile CPU with throttle set point at 95 °C in thin & light systems using statistical usage model (a) Histogram (b) Cumulative distribution function.

Fifty percent of the CPUs that throttle need less than a five percent power reduction, and more than 90 percent of the CPUs that throttle need less than a 10 percent power reduction to stay within the maximum temperature specification. Again, the one outlier at 20 percent power reduction is the result of the extremely rare instance of a high ambient temperature (Ta > 33 degrees C), an extremely high application ratio (AR > 0.85), and a high power CPU.

Figures 6.12a & b show the results of translating the power reductions into performance reduction metrics. Over 90 percent of the throttling CPUs would suffer less than five percent performance loss. The remaining few percent of the throttling population may suffer significant performance loss, or even experience some functional problems; however, these would be at most 1 in 100,000 of the total population.


[Figure 6.12 plots: panels (a) and (b), each with x-axis Performance Reduction, 0 to .4 in (a) and 0 to .45 in (b).]

Figure 6.12a & b: Predicted performance reductions for throttling 2.2 GHz mobile CPU with throttle set point at 95 °C in thin & light systems using statistical usage model (a) Histogram (b) Cumulative distribution function.

This chapter reviews the results of the three simulation comparisons: worst-case

operating conditions vs. statistical usage model, mobile system vs. desktop arbitrage

system, and 100 °C vs. 95 °C throttle set point. The next chapter concludes with

recommendations and future uses for the methodology developed.


Chapter 7 - Recommendations and Future Use

The results of this work point towards three topics for future consideration. The first

focuses on validating the process. The second involves new ways of segmenting the

market and opportunities for new products. The third focuses on strategic relationships

within the industry.

7.1 Validation

One of the barriers to adopting this usage model based methodology is that some people, engineers in particular, do not believe that users can accurately report on a survey the percentage of time that they use each software type. Admittedly the data are not precise, as most respondents parsed their time down to five or ten percent increments, and the accuracy of any one user is questionable. However, in the opinion of the author, the data accurately represent the mobile user population as a whole. We could not determine any reason that there would be systematic bias in the data.

To overcome this barrier a validation of the survey results is desirable. One method is to

place software on the laptops of a representative group of mobile users that records the

applications being used. The software can periodically sample and record the application

data from the Windows Task Manager shown in Figure 7.1.


Figure 7.1: Windows Task Manager showing applications being used.

The aggregate data sampled from the Windows Task Manager would then be compared

to the aggregate data from the end user survey. The distribution of software use should be

similar. The group of users having their Windows Task Manager sampled could also be

given a use survey to fill out. A direct comparison of reported use pattern to measured

use pattern on a per user basis could be performed. In many ways this validation

approach is the simplest because it is independent of the computer system hardware. It

focuses only on the software applications.
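The aggregate comparison described above can be sketched as follows. The example turns a log of periodically sampled application categories into usage shares and compares them with survey-reported shares using the total variation distance. The distance metric, the function names `usage_shares` and `total_variation`, and the sample data are all assumptions for illustration; the thesis does not prescribe a comparison statistic, and collecting the samples themselves is outside this sketch.

```python
from collections import Counter


def usage_shares(samples):
    """Convert a list of sampled application categories into fractional shares."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {app: count / total for app, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two share dictionaries (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    # Hypothetical periodic samples of the application in use on one laptop.
    sampled = ["email"] * 45 + ["word processing"] * 30 + ["internet"] * 25
    # Hypothetical survey-reported shares for the same user or group.
    survey = {"email": 0.40, "word processing": 0.30, "internet": 0.20, "games": 0.10}
    print(round(total_variation(usage_shares(sampled), survey), 3))
```

The same two functions serve both comparisons mentioned above: population aggregate against survey aggregate, and measured against reported shares for a single user.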


It would be tempting to sample the CPU Usage function from the Windows Task

Manager and use that as a proxy for the software application ratio. This would skip the

step of measuring and assigning an application ratio to each software type. Unfortunately

the CPU Usage in the Windows Task Manager does not correlate well with the power

consumed by the CPU [8].

The most important output of this methodology and simulation is the CPU temperature

distribution profile. This is the variable that drives throttling and performance impact.

Ideally one would like to validate the temperature distribution output from the simulation against real-world data. This is possible: software can be placed on a computer that samples and records the CPU temperature. However, to make a valid comparison one would need an appropriately sized representative group of mobile users, all of whom had the same computer system and processor. The logistics of coordinating such a study are daunting.

7.2 Market Segments and Product Positioning

Section 6.2 shows that a 3.06 GHz desktop processor in a transportable form factor may not perform to its full potential for all users. In fact, because they are thermally under-designed and cannot properly cool the microprocessor, as many as 10 percent of those systems may fail to function properly for their end-users. However, there is significant

demand for higher power processors in the transportable segment as evidenced by the

increase in desktop arbitrage. Using the simulation and methodology presented here as a

path-finding tool, Intel could determine how powerful a processor can operate adequately


in a transportable form factor. Intel could then expand its mobile product offerings to include processors up to that power level that also have other mobile specific features enabled.

Section 6.3 shows that reducing the throttle set point to 95 °C on a traditional mobile processor can reduce the number of total quality events that the general population experiences by 37 percent and the catastrophic events by 57 percent. However, this would leave a small group of high power users, approximately 0.1 percent of the population, who would be stressing their processor to the point that they would suffer significant performance loss.

One solution is to allow those users to identify themselves as high power users and offer

them a processor and/or system that meets their needs for a premium price. From a

technical perspective there are several ways to satisfy their demands. Intel could select

processors from the screening at the end of the production line that can operate reliably at

higher temperatures, set the throttle set point on those processors at 100 °C, and sell them into the premium segment. Or Intel could select the processors that have the lowest leakage current, and therefore run at the lowest power and lower temperatures, and sell them into the premium segment. A third option is that Intel could enter into closer partnerships with one or more of the OEMs and have them offer a premium system with

a more robust thermal solution for the power user segment. The importance of these

partnerships is discussed in the next section.


7.3 Industry Dynamics and Strategic Partnerships

Much of Intel's success in the computer industry is attributed to the modular product and

supply chain architecture of the personal computer [13]. The relationship between

system thermal performance and CPU performance analyzed in this thesis points to a

product architecture that is becoming more integral. In particular, as processing power

increases and mobile systems become smaller, overall system design becomes more

important and needs to be better integrated, as predicted by Whitney [14].

This could be problematic for Intel in the future. Intel has built the "Intel Inside®" brand

on the premise that the processor is the most important feature of a PC, more so than even

the OEM brand. However if overall system design begins to be the constraint on

performance, consumers may place less emphasis on processor brand and more on the

OEM or system designer. This would be a significant shift in the power of the industry

players and a major loss for Intel.

Intel should use its brand strength to leverage itself into other areas of the mobile PC architecture, particularly in areas where integration is becoming more important. A recent example is the launch of Intel® Centrino™ mobile technology [15]. The Intel® Centrino™ mobile technology is a package consisting of a CPU, chipset, and wireless network connection. We suggest that Intel also form closer partnerships with the OEMs and system designers. As the CPU performance gains greater dependence on the


thermal system performance, it becomes in Intel's best interest to do all that it can to help

its customers design successful systems. Closer relationships with the OEMs would

allow Intel more input to the design and verification of thermal solutions, and perhaps the

next incarnation of "Intel Inside®" will include not only a CPU, chipset and network connection, but a promise of thermal system and processor performance as well. This may be crucial for Intel's survival if the double helix industry/product structure, as described by Fine in Clockspeed, continues to head towards more integration [13].


Appendix A: Selected Questions and Sample Answers from End-User Mobile Choice Survey

© 2002 Intel Corporation

8. Thinking of a typical week, please allocate your notebook usage across each of the following environments, so that the total equals 100%. Please enter '0' to any option not applicable to.

   1. Office (not home office) %
   2. Office setting in home %
   3. Home (other than desk/office setting) %
   4. Airport %
   5. Car, train, bus, or airplane %
   6. Hotel %
   7. Dorm room %
   8. Classroom or library %
   9. Outdoors %
   10. Other location (please specify) %

12. Approximately, how many hours each week is your notebook computer turned on? Please provide your best estimate and record whole hours only.

13. When it is turned on, approximately, how many hours each week do you run your notebook computer from its batteries, rather than plugging it into a wall outlet? Please record whole hours only.

14. Approximately, how many hours each week are you actively using your notebook computer? Please provide your best estimate and record whole hours only.

21. Of the time that you, personally, spend using your laptop in an average week, what percent of the time do you spend on each of the following activities, for either business or personal use? Please allocate your time spent on each activity so that the total adds up to 100%. Please enter '0' for any activity that does not apply to you.

   1. Word processing (e.g. Microsoft Word) %
   2. Spreadsheets (e.g. Excel) %
   3. Presentation creation (e.g. Powerpoint) %
   4. Email %
   5. Internet browsing (excluding email) %
   6. Games %
   7. Audio/MP3 %
   8. Video - Viewing DVD's %
   9. Video - Downloading video from the Internet %
   10. Video - Inputting video files to other devices, e.g. camcorders or digital cameras %
   11. Graphics/CAD %
   12. Web content creation %
   13. Programming Tools %
   14. Simulations/Math Models %


[Tables: sample answers from survey respondents. The cell values are not legible in the source scan; only the table titles and column headings are recoverable.]

NOTEBOOK USAGE: % time in each environment
Columns: User ID, Office, Home Office, Home, Airport, Car etc., Hotel, Dorm, Classroom, Outdoors, Other, hrs on/wk, hrs batt/wk, hrs use/wk

NOTEBOOK USAGE: % time using respective application
Columns: User ID, WP, Spreadsheet, Pres creation, Email, Internet browsing, Games, Audio/MP3, Viewing DVD, Video download, Video input, CAD/graphics, Web content, Prog tools, Math/Sim


Bibliography

[1] Building the power-efficient PC: a developer's guide to ACPI power management / Hillsboro, OR: Intel Press, c2001.

[2] Pabst, Thomas and Volkel, Frank "Hot Spot - How Modern Processors Cope with Heat Emergencies" September 17, 2001, Tom's Hardware Guide, accessed at http://www6.tomshardware.com/cpu/20010917/index.html

[3] Rajsuman, Rochit. Digital hardware testing: transistor-level fault modeling and testing / Boston: Artech House, c1992.

[4] Reliability challenges in a large microprocessor below quarter micron technology / [Cambridge, Mass.]: MIT Microsystems Technology Laboratories, c1998.

[5] Design of high-performance microprocessor circuits / New York: IEEE Press, c2001.

[6] Burd, Thomas D. Energy efficient microprocessor design / Boston: Kluwer Academic Publishers, c2002.

[7] Phanse, Vivek "Enabling Technology for Mobile Computing" MS thesis, MIT, c2000.

[8] Intel Corporation, Jim Hermerding

[9] Pukite, Jan. Modeling for reliability analysis: Markov modeling for reliability, maintainability, safety, and supportability analyses of complex computer systems / New York: IEEE Press, c1998.

[10] An overview of MobileMark 2002: Business Applications Performance Corporation, accessed at http://www.bapco.com/MobileMark2002WhitePaper.pdf

[11] An overview of SYSmark 2002: Business Applications Performance Corporation, accessed at http://www.bapco.com/SYSmark2002Methodology.pdf

[12] Los Angeles Superior Court, Case # BC278163, Class action v. Toshiba America Incorporated, July 23, 2002

[13] Fine, Charles. Clockspeed / Reading: Perseus Books, c1998.

[14] Whitney, Daniel "Why Mechanical Design Cannot Be Like VLSI Design" April 4, 1996, accessed at http://web.mit.edu/ctpid/www/Whitney/papers.html

[15] http://www.intel.com/products/mobiletechnology/index.htm?iid=sr+centrino&
