Probabilistic Design for Reliability (PDfR) in Electronics, Part 2 of 2

Probabilistic Design for Reliability (PDfR) in Electronics, Part II

Dr. E. Suhir

©2011 ASQ & Presentation Suhir. Presented live on January 3-6, 2011

http://reliabilitycalendar.org/The_Reliability_Calendar/Short_Courses/Short_Courses.html

ASQ Reliability Division Short Course Series

One of the monthly webinars on topics of interest to reliability engineers.

To view the recorded webinar (available to ASQ Reliability Division members only), visit asq.org/reliability

To sign up for the free live webinars, which are open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events.


Dr. E. Suhir Page 1

PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT,

the Roles of Failure Oriented Accelerated Testing (FOAT)

and Predictive Modeling (PM), and

a Novel Approach to Qualification Testing (QT)

“You can see a lot by observing”

Yogi Berra, American Baseball Player

“It is easy to see, it is hard to foresee”

Benjamin Franklin, American Scientist and Statesman

E. Suhir, Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret),

University of California, Dept. of Electrical Engineering, Santa Cruz, CA,

University of Maryland, Dept. of Mechanical Engineering, College Park, MD, and

ERS Co. LLC, 727 Alvina Ct., Los Altos, CA, 94024, USA

Tel. 650-969-1530, cell. 408-410-0886, e-mail: [email protected]

Four-hour ASQ-IEEE RS webinar short course

January 3-6, 2011


Contents

Session I

1. Introduction: background, motivation, incentive

2. Reliability engineering as part of applied probability and Probabilistic Risk Management (PRM) bodies of knowledge

3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls and interaction with other accelerated test categories

Session II

4. Predictive Modeling (PM): FOAT cannot do without it

5. Example of a FOAT: physics, modeling, experimentation, prediction

Session III

6. Probabilistic Design for Reliability (PDfR), its role and significance

Session IV

7. General PDfR approach using probability density functions (pdf)

8. Twelve steps to be conducted to add value to the existing practice

9. Do electronic industries need new approaches to qualify their devices into products?

10. Concluding remarks


Session III

6. PROBABILISTIC DESIGN FOR RELIABILITY,

ITS ROLE and SIGNIFICANCE

“Probable is what usually happens”

Aristotle, Greek philosopher

“Probability is the very guide of life”

Marcus Tullius Cicero,

Roman philosopher and statesman


Design-for-Reliability

• Design for reliability (DfR) is a set of approaches, methods and best practices that are supposed to be used during the design phase of the product to minimize the likelihood (risk) that the product will not meet the reliability requirements, objectives and expectations.

• While 50% of the total actual cost of an electronic product is due to the cost of materials, 15% to the cost of labor, 30% to the overhead costs and only 5% to the design effort, this effort influences about 70% of the total cost of the product ("Six Sigma", M. Harry and R. Schroeder).

• If reliability is taken care of during the design phase, the final cost of the product does not go up. If a reliability problem is detected during engineering, the cost of the product goes up by a factor of 10. If the problem is caught in the production phase, the cost of the product increases by a factor of 100 or more.


Deterministic approach

• The deterministic approach is based on the concept that reliability is assured by introducing a sufficiently high deterministic safety factor, defined as the ratio of the capacity ("strength") C of the system to the demand ("load") D:

$$SF = \delta = \frac{C}{D}.$$

• The level of the safety factor SF is chosen depending on the consequences of failure, acceptable risks, the available and trustworthy information about the capacity and the demand, the accuracy with which these characteristics are determined, possible costs and social benefits, variability of materials and structural parameters, construction (manufacturing, fabrication) procedures, etc.

• In a particular problem, the capacity and demand need not be the strength and load: their roles can be played by, say, the acceptable and actual current, voltage, light intensity or electrical resistance; traffic capacity and traffic flow; culvert size and the quantity of water; critical (buckling) and actual compressive stresses; etc.

• Safety factors in engineering are established from previous experience with the considered system in its anticipated environmental or operating conditions.


Probabilistic approach

• The probabilistic DfR (PDfR) approach is based on the probabilistic risk management (PRM) concept and, if applied broadly and consistently, brings the probability measure (dimension) into each of the design characteristics of interest. Using accelerated test (AT) data, particularly FOAT data, together with PM techniques, it enables one to establish the probability of the possible (anticipated) failure under the given operation conditions and for the given moment of time in operation.

• After the probabilistic PMs are developed, one should use sensitivity analyses to determine the most feasible materials and geometric characteristics of the design, so that the lowest probability of failure is achieved.

• In other cases, the PDfR approach enables one to find the most feasible compromise between the reliability and the cost effectiveness of the product.

• When the PDfR (PRM) approach is used, the reliability criteria (specifications) are based on the acceptable (allowable) probability of failure for the given product.


Basic Principles Underlying our PDfR Approach-1

• Not all products require the PDfR approach, only those for which high reliability is crucial and for which there is reason to believe that the probability of non-failure might not be high enough for the particular application.

• Nobody and nothing is perfect. The difference between a reliable and an unreliable system (device) is in the level of the probability of failure in the field under the given (anticipated) loading (environmental) conditions and after the given (specified) time in operation.

• The probability of failure in the field is the ultimate and "reliable" criterion ("judge") of the product's reliability.

• This probability can be established through a specially designed and carefully conducted DFOAT aimed at understanding the physics of failure and choosing the right predictive DFOAT model (e.g., Arrhenius, Coffin-Manson, crack propagation, demand-capacity "interference", etc.) for the anticipated loading conditions or their combination (say, thermal + vibrations).



• The reliability of a product is determined by the reliability of its one or two most vulnerable (least reliable) functional or structural elements, and it is for these elements that an adequate DFOAT should be designed and conducted.

• Sensitivity analyses are a must after the physics of the anticipated failure is established, the appropriate predictive model is agreed upon, and the acceptable probability of failure in the field is specified, but prior to the final decision about launching the mass production of the product.

• DFOAT is not necessarily a destructive test, but it is always a test to failure: a test to determine the limits of reliable operation and the probability that these limits are exceeded.

• DFOAT cannot do without predictive modeling, and it is only through predictive modeling that the probability of failure in the field can be established.



• Time- and labor-consuming a posteriori "statistics-of-failure" can be successfully replaced, to a great extent, by the anticipated a priori "probability-of-failure", confirmed by some statistical data (for the mean and STD values of the probability distribution of interest, but not for the probability distribution function itself).

• The PDfR concept enables one to qualify a viable device (system) into a reliable-in-the-field product, with the predicted, prescribed (specified) and even, if necessary, controlled probability of failure in the field.

• Technical diagnostics, prognostication and health monitoring could be effective means to anticipate, establish and prevent possible field failures.

• PDfR has to do with DfR, and not with Manufacturing-for-Reliability (MfR).

• Burn-ins could be viewed as a special type of FOAT intended for MfR objectives and are always a must, whatever DfR approach is considered.


Reliability function

• The simplest objects (items) in reliability engineering are those that do not lend themselves to restoration (repair) and have to be replaced after the first failure. The reliability of such items is due entirely to their dependability, i.e., probability of non-failure, which is the probability that no failure occurs during the given period of time. The dependence of this probability on time is known as the reliability function.

• As with any other probability, for a sufficiently large population of non-repairable items the dependability can be replaced by the relative frequency, and therefore the reliability function can be estimated as

$$R(t) = \frac{N_s(t)}{N_0},$$

where $N_0$ is the total number of items being tested and $N_s(t)$ is the number of items that are still sound by the time t.
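A minimal Python sketch of the statistical definition $R(t) = N_s(t)/N_0$: given the times at which each of the $N_0$ tested items failed, R(t) is the fraction of items still sound at t. The failure times and the function name `empirical_reliability` are hypothetical, for illustration only.

```python
# Sketch: statistical (frequency) estimate of the reliability function
# R(t) = N_s(t) / N_0 from the observed failure times of N_0 non-repairable items.
# The failure times below are hypothetical illustration data, not test results.

def empirical_reliability(failure_times, t):
    """Fraction of the N_0 tested items still sound (not yet failed) at time t."""
    n0 = len(failure_times)                          # N_0: total number of items tested
    ns = sum(1 for tf in failure_times if tf > t)    # N_s(t): items surviving beyond t
    return ns / n0

# Hypothetical failure times (hours) of 10 items run to failure
times = [120.0, 340.0, 410.0, 560.0, 700.0, 850.0, 990.0, 1200.0, 1500.0, 2100.0]

for t in (0.0, 500.0, 1000.0, 2500.0):
    print(f"R({t:.0f} h) = {empirical_reliability(times, t):.1f}")
```

The estimate is a step function of time; it approaches the true R(t) only for a sufficiently large population, as the text notes.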


Failure rate

• Differentiating the relationship $R(t) = N_s(t)/N_0$ with respect to time t, we have:

$$\frac{dR(t)}{dt} = \frac{1}{N_0}\frac{dN_s(t)}{dt} = -\frac{1}{N_0}\frac{dN_f(t)}{dt},$$

where $N_f(t) = N_0 - N_s(t)$ is the number of the failed items.

• The failure rate is introduced as follows:

$$\lambda(t) = \frac{1}{N_s(t)}\frac{dN_f(t)}{dt}.$$

As evident from this formula, the failure rate is the number of items failing per unit time at the moment t, divided by the number of items that are still sound by this moment. The failure rate characterizes the change in the dependability of an item in the course of its lifetime.
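The finite-difference form of the failure-rate definition can be sketched in Python as follows, assuming failure times are recorded for every item (hypothetical data from a 10-item run-to-failure test); `failure_rate` is an illustrative name, not an established API.

```python
# Sketch: finite-difference estimate of the failure rate,
# lambda(t) ~ [N_f(t + dt) - N_f(t)] / (N_s(t) * dt),
# i.e., failures per unit time divided by the number of items still sound at t.

def failure_rate(failure_times, t, dt):
    """Estimate lambda(t) over the interval (t, t + dt] from observed failure times."""
    ns = sum(1 for tf in failure_times if tf > t)               # N_s(t): still sound at t
    if ns == 0:
        raise ValueError("no items are still sound at time t")
    dnf = sum(1 for tf in failure_times if t < tf <= t + dt)    # failures in (t, t + dt]
    return dnf / (ns * dt)

# Hypothetical failure times (hours) of 10 items run to failure
times = [120.0, 340.0, 410.0, 560.0, 700.0, 850.0, 990.0, 1200.0, 1500.0, 2100.0]

print(failure_rate(times, 0.0, 500.0))    # 3 failures, 10 items sound at t = 0
print(failure_rate(times, 500.0, 500.0))  # 4 failures, 7 items sound at t = 500 h
```

With only ten items such estimates are very noisy; a real analysis would use a much larger population or a fitted model.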


Bathtub curve


Probabilistic and statistical definitions

of the reliability function

Considering the relationship $\frac{dR(t)}{dt} = -\frac{1}{N_0}\frac{dN_f(t)}{dt}$, the formula $\lambda(t) = \frac{1}{N_s(t)}\frac{dN_f(t)}{dt}$ yields:

$$\frac{dR(t)}{dt} = -\frac{N_s(t)}{N_0}\lambda(t) = -\lambda(t)R(t),$$

or

$$\frac{dR(t)}{R(t)} = -\lambda(t)\,dt,$$

so that $\ln R(t) = -\int_0^t \lambda(\tau)\,d\tau$. Hence,

$$R(t) = \exp\left[-\int_0^t \lambda(\tau)\,d\tau\right].$$

The reliability function R(t) satisfies the obvious initial condition R(0) = 1. The above formula for the reliability function expresses the probabilistic definition of this function, while the formula

$$R(t) = \frac{N_s(t)}{N_0}$$

provides its statistical definition.
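A quick numerical check of the probabilistic definition, assuming a Weibull failure rate $\lambda(t) = (\beta/\eta)(t/\eta)^{\beta-1}$ with hypothetical β = 3 and η = 1000 h, for which the closed form is $R(t) = \exp[-(t/\eta)^\beta]$. The sketch integrates λ(t) by the trapezoidal rule and compares the two results; the function names are illustrative.

```python
import math

def reliability_from_hazard(hazard, t, steps=10_000):
    """Probabilistic definition R(t) = exp(-integral_0^t lambda(tau) dtau),
    with the integral evaluated by the trapezoidal rule."""
    h = t / steps
    integral = 0.5 * (hazard(0.0) + hazard(t))
    integral += sum(hazard(i * h) for i in range(1, steps))
    return math.exp(-integral * h)

# Hypothetical Weibull failure rate lambda(t) = (beta/eta) (t/eta)^(beta - 1)
beta, eta = 3.0, 1_000.0

def weibull_hazard(t):
    return (beta / eta) * (t / eta) ** (beta - 1.0)

t = 500.0
print(reliability_from_hazard(weibull_hazard, t))   # numerical
print(math.exp(-(t / eta) ** beta))                 # closed form exp[-(t/eta)^beta]
```

Any other time-dependent failure rate (e.g., one fitted to the bathtub curve) can be passed as `hazard` in the same way.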


Exponential formula of reliability (revisited). Probability of failure

When the failure rate is time independent, the formula

$$R(t) = \exp\left[-\int_0^t \lambda(\tau)\,d\tau\right]$$

leads to the exponential formula of reliability:

$$R(t) = e^{-\lambda t}.$$

The function

$$f(t) = -\frac{dR(t)}{dt} = \lambda(t)\exp\left[-\int_0^t \lambda(\tau)\,d\tau\right]$$

is the probability density function for the flow of failures, or the failure frequency.

The probability of a failure during the time t can be evaluated as

$$Q(t) = 1 - R(t) = \int_0^t f(\tau)\,d\tau.$$
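For a constant failure rate, the relations between R(t), f(t) and Q(t) can be checked numerically. The sketch below uses a hypothetical λ = 10⁻⁴ 1/h and t = 5,000 h, and evaluates $Q(t) = \int_0^t f(\tau)\,d\tau$ by the midpoint rule for comparison with $1 - R(t)$.

```python
import math

lam = 1.0e-4    # hypothetical constant failure rate, 1/h
t = 5_000.0     # hypothetical operating time, h

R = math.exp(-lam * t)           # exponential formula of reliability
f = lam * math.exp(-lam * t)     # failure density f(t) = -dR/dt
Q = 1.0 - R                      # probability of failure during time t

# Cross-check Q(t) = integral_0^t f(tau) dtau with the midpoint rule
n = 10_000
h = t / n
Q_num = sum(lam * math.exp(-lam * (i + 0.5) * h) for i in range(n)) * h
print(f"R = {R:.5f}, f = {f:.3e} 1/h, Q = {Q:.5f}, Q (numerical) = {Q_num:.5f}")
```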

Stress-strength ("interference") concept

The curve on the right should be obtained experimentally, based on accelerated life testing and on accumulated experience. The bearing capacity of the structure should be such that the probability of failure, P(t), is sufficiently low, and the safety factor (SF) is not lower than the specified value, say, SF = 1.4. In a simplified analysis, the curve on the right could be replaced by a constant value which, if a conservative approach is taken, should be sufficiently low.

[Figure: two overlapping probability density curves. "Demand", D: probability density function for a particular mechanical or thermal characteristic (response) of the tile structure to the given environmental factor at the given moment of time. "Capacity", C (right): capability of the tile structure with respect to the particular mechanical or thermal loading (may or may not be time-dependent); in the current analysis the bearing capacity for a particular reliability characteristic is assumed to be either a constant value or a normally distributed random variable with a known (evaluated) mean and standard deviation.]

The larger the overlap of these two curves, the higher the probability of failure and the lower the safety factor. After these two curves are evaluated (established) for each reliability characteristic of interest and for each moment of time (separately for the take-off and landing processes), we evaluate the probability density function, f(ψ), for the safety margin, ψ = C − D, its mean, <ψ>, and standard deviation, ŝ, and the safety factor, SF = <ψ>/ŝ. The SF should not be lower than the specified value, say, SF = 1.4.

Probability of non-failure (dependability)

• The "reliability" (actually, "dependability") of a non-repairable item is defined as the probability of non-failure, P = P{C > D}, i.e., as the probability that the item's bearing capacity ("strength"), C, during the time, t, of operation under the given stress conditions, will always be greater than the demand ("loading"), D.

• Although the probability of failure is never zero, it can be made, if a probabilistic approach is used, as low as necessary. If the probability distributions f(C) and g(D) (probability density functions) for the random variables C and D are known, then the probability, P, of non-failure (reliability, dependability) can be evaluated as

$$P = \int_0^\infty f(\psi)\,d\psi,$$

where f(ψ) is the probability density function of the margin of safety ψ = C − D, which is also a random variable.
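A minimal sketch of $P = \int_0^\infty f(\psi)\,d\psi$, assuming (hypothetically) that C and D are independent and normally distributed: the Monte Carlo estimate simply counts samples with ψ = C − D > 0, and for this special case it can be compared with the closed form. The numbers and function names are illustrative only; the Monte Carlo routine works unchanged for any other sampling laws.

```python
import random
from statistics import NormalDist

def p_nonfailure_mc(mu_c, s_c, mu_d, s_d, n=200_000, seed=1):
    """Monte Carlo estimate of P = P{psi > 0}, psi = C - D, for independent normal C and D."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(mu_c, s_c) - rng.gauss(mu_d, s_d) > 0.0)
    return hits / n

def p_nonfailure_normal(mu_c, s_c, mu_d, s_d):
    """Closed form for independent normal C and D: psi is normal with mean mu_c - mu_d
    and variance s_c^2 + s_d^2, so P = integral_0^inf f(psi) dpsi = Phi(<psi>/s_psi)."""
    mean = mu_c - mu_d
    std = (s_c**2 + s_d**2) ** 0.5
    return NormalDist().cdf(mean / std)

# Hypothetical capacity and demand (means and standard deviations, same units)
args = (10.0, 1.2, 7.0, 1.6)
print(p_nonfailure_mc(*args), p_nonfailure_normal(*args))
```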

Safety factor -1

• Direct use of the probability of non-failure is often inconvenient, since for highly reliable items this probability is expressed by a number very close to one, and, for this reason, even significant changes in the item's (system's) design, which have an appreciable impact on the item's reliability, may have only a minor effect on the probability of non-failure.

• In those cases when both the mean value, <ψ>, and the standard deviation, ŝ, of the margin of safety (or of any other suitable characteristic of the item's reliability, such as stress, temperature, displacement, affected area, etc.) are available, the safety factor (safety index, reliability index)

$$SF = \delta = \frac{\langle\psi\rangle}{\hat{s}}$$

can be used as a suitable reliability criterion.


Safety factor-2

• After the capacity and demand curves are established for each reliability characteristic of interest and for each moment of time, the probability density function $f(\psi)$ for the safety margin $\Psi = C - D$ should be determined. Then, for normally distributed capacity and demand, the mean value of the safety margin

$$\langle\psi\rangle = \int_{-\infty}^{\infty} \psi f(\psi)\,d\psi$$

and its standard deviation

$$\hat{s} = \sqrt{\int_{-\infty}^{\infty} (\psi - \langle\psi\rangle)^2 f(\psi)\,d\psi}$$

should be evaluated.

• The safety factor can then be found as the ratio of the mean value of the safety margin to its standard deviation:

$$SF = \delta = \frac{\langle\psi\rangle}{\hat{s}}.$$

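For independent C and D, the moments of the safety margin follow directly from those of capacity and demand: $\langle\psi\rangle = \langle C\rangle - \langle D\rangle$ and $\hat{s}^2 = \hat{s}_C^2 + \hat{s}_D^2$. The sketch below, with hypothetical numbers, computes the SF this way; independence of C and D is an assumption, and `safety_factor` is an illustrative name.

```python
import math

def safety_factor(mu_c, s_c, mu_d, s_d):
    """SF = <psi>/s_psi for the safety margin psi = C - D, with C and D independent.
    Means subtract and variances add for independent variables; normality is needed
    only later, to convert the SF into a probability of non-failure."""
    mean_psi = mu_c - mu_d          # <psi> = <C> - <D>
    std_psi = math.hypot(s_c, s_d)  # s_psi = sqrt(s_C^2 + s_D^2)
    return mean_psi / std_psi

# Two hypothetical designs with the same SF
print(safety_factor(10.0, 0.6, 7.0, 0.8))   # <psi> = 3, s_psi = 1
print(safety_factor(13.0, 1.2, 7.0, 1.6))   # <psi> = 6, s_psi = 2
```

Both hypothetical designs give SF = 3: the second has twice the mean margin and is allowed twice the scatter.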

Safety factor-3

• The SF should not be lower than its specified value for the characteristic of interest.

• This value should reflect the state of the art in the given area of engineering, cost and time-to-market considerations, and should account for the consequences of failure.

• If the computed SF does not meet the specification requirements, the design should be revised (improved) until the required level of safety (reliability) is met.

• The required level of safety could also be established based on the level of the probability of non-failure

$$P(\psi) = \int_{\psi}^{\infty} f(\psi')\,d\psi'.$$

This formula defines the probability that the safety margin $\Psi = C - D$ is found between the given value ψ and infinity, i.e., is higher than the given (specified) value of this margin.

Normal law

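• The SF and the probability of exceeding a certain level of the safety margin are related. If the reliability characteristic of interest (such as, e.g., the safety margin ψ) is distributed in accordance with the normal law

$$f(\psi) = \frac{1}{\sqrt{2\pi D_\psi}}\exp\left[-\frac{(\psi - \langle\psi\rangle)^2}{2D_\psi}\right],$$

where $D_\psi = \hat{s}^2$ is the variance, then the probability of non-failure is related to the safety factor SF as

$$P = \frac{1}{2}\left[1 + \Phi(SF)\right],$$

where

$$\Phi(\alpha) = \sqrt{\frac{2}{\pi}}\int_0^{\alpha} e^{-t^2/2}\,dt$$

is the probability integral (Laplace function). Equivalently, P is the standard normal cumulative distribution function evaluated at SF.

P           SF
0.999       3.0902
0.9999      3.7190
0.99999     4.2649
0.999999    4.7534
1.0         ∞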
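With the Laplace function defined as above, $P = \frac{1}{2}[1 + \Phi(SF)]$ is the standard normal cumulative distribution function evaluated at SF, so the table can be regenerated with Python's `statistics.NormalDist` (Python 3.8+). This is a sketch for checking the numbers, not part of the original method; the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def p_from_sf(sf):
    """P = 1/2 [1 + Phi(SF)] with the Laplace function, i.e. the standard normal CDF at SF."""
    return STD_NORMAL.cdf(sf)

def sf_from_p(p):
    """Inverse relation: the SF required for a specified probability of non-failure P."""
    return STD_NORMAL.inv_cdf(p)

for p in (0.999, 0.9999, 0.99999, 0.999999):
    print(f"P = {p:<9} SF = {sf_from_p(p):.4f}")
```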


Safety factor-4

• SF establishes both the upper limit of the reliability characteristic of interest (through the mean value of the corresponding margin of safety) and the accuracy with which this characteristic is defined (through the corresponding standard deviation).

• The structure of the SF indicates that it is acceptable for a system characterized by a high mean value of the safety margin (i.e., a system whose bearing capacity with respect to a certain stress/reliability characteristic, not necessarily mechanical, is significantly higher than the level of loading) to have a less accurately defined deviation from this mean value than a system characterized by a low mean value of the safety margin (i.e., a system whose bearing capacity is much closer to the possible level of loading). In other words, the uncertainty in the evaluation of the safety margin should be smaller for a more vulnerable design.


Safety factor (SF) and coefficient of variability (COV)

• The safety factor (SF) is the reciprocal of the coefficient of variability (COV). The latter is defined as the ratio of the standard deviation to the mean value of the random variable of interest.

• While the COV characterizes the uncertainty of the random variable of interest, the SF characterizes the certainty of the random parameter (stress-at-failure, the highest possible temperature, the ultimate displacement, the affected area, etc.) that is responsible for the non-failure of the item.

• If the reliability characteristic of interest (for a non-repairable item) is a random variable that is determined by just two independent non-random quantities (say, the mean value and the standard deviation), then the safety factor, SF, completely determines the probability of non-failure (reliability): the larger the SF, the higher the probability of non-failure.

Time-to-failure (TTF), MTTF and the corresponding SF

• Usually the capacity (strength), C, and/or the demand (loading), D, change in time. Failure occurs when the demand (loading), D, becomes equal to or larger than the bearing capacity (strength), C, of the item. The moment of this random event is the time-at-failure (TAF), and the duration of operation until this moment is the random variable known as time-to-failure (TTF).

• Thus, TTF is the time from the beginning of operation until the moment of time when the demand (loading) D becomes equal to or higher than the bearing capacity C, i.e., when the safety margin Ψ = C − D becomes zero or negative.

• The corresponding safety factor, SF, is the ratio of the MTTF to the STD of the TTF:

$$SF = \frac{MTTF}{STD}.$$


Mean time-to-failure and reliability function

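Mean-time-to-failure (MTTF) is the mean time of the item operation until it fails. Hence, it can be computed as $\bar{t} = \int_0^\infty t f(t)\,dt$. Since

$$f(t) = -\frac{dR(t)}{dt},$$

we have (using integration by parts):

$$\bar{t} = \int_0^\infty t f(t)\,dt = -\int_0^\infty t\,\frac{dR(t)}{dt}\,dt = -\left[tR(t)\right]_0^\infty + \int_0^\infty R(t)\,dt = \int_0^\infty R(t)\,dt,$$

and the variance of the TTF can be found as

$$D_t = \int_0^\infty (t - \bar{t})^2 f(t)\,dt = \int_0^\infty t^2 f(t)\,dt - \bar{t}^2 = 2\int_0^\infty t R(t)\,dt - \bar{t}^2.$$

The corresponding SF is

$$SF = \delta = \frac{MTTF}{STD} = \frac{\bar{t}}{\sqrt{D_t}}.$$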
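The MTTF and TTF variance integrals can be evaluated numerically from any reliability function. The sketch below, assuming a hypothetical exponential law with λ = 10⁻³ 1/h, truncates the infinite upper limit at a time where R(t) is negligible and uses the midpoint rule; for the exponential law it should reproduce MTTF = 1/λ and SF = 1. The function name `mttf_and_sf` is illustrative.

```python
import math

def mttf_and_sf(R, t_max, steps=200_000):
    """MTTF = int_0^inf R dt, D_t = 2 int_0^inf t R dt - MTTF^2, SF = MTTF / sqrt(D_t).
    The infinite upper limit is truncated at t_max, chosen so that R(t_max) is negligible."""
    h = t_max / steps
    m0 = m1 = 0.0
    for i in range(steps):
        t = (i + 0.5) * h            # midpoint rule
        r = R(t)
        m0 += r                      # accumulates int R dt
        m1 += t * r                  # accumulates int t R dt
    mttf = m0 * h
    var = 2.0 * m1 * h - mttf**2
    return mttf, mttf / math.sqrt(var)

# Hypothetical exponential law with lambda = 1e-3 1/h: expect MTTF = 1000 h and SF = 1
lam = 1.0e-3
mttf, sf = mttf_and_sf(lambda t: math.exp(-lam * t), t_max=40_000.0)
print(f"MTTF = {mttf:.3f} h, SF = {sf:.5f}")
```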

Example #1

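As a simple example, examine a device whose MTTF, τ, during steady-state operation is described by the Boltzmann-Arrhenius equation

$$\tau = \tau_0 \exp\left(\frac{U}{kT}\right),$$

where U is the activation energy, k is Boltzmann's constant and T is the absolute temperature. The failure rate is therefore

$$\lambda = \frac{1}{\tau} = \frac{1}{\tau_0}\exp\left(-\frac{U}{kT}\right).$$

If the Weibull law is used to predict the probability of failure, then the probability of non-failure (dependability) can be evaluated on the basis of the following probability distribution function:

$$P = \exp\left[-(\lambda t)^\beta\right] = \exp\left[-\left(\frac{t}{\tau_0}\exp\left(-\frac{U}{kT}\right)\right)^\beta\right],$$

where β is a shape parameter. Solving this equation for the absolute temperature T, we obtain:

$$T = \frac{U/k}{\ln(t/\tau_0) - \frac{1}{\beta}\ln(-\ln P)}.$$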

Example #1 (cont)

Let, for the given type of failure (say, surface charge accumulation), the ratio U/k be 11,600 K, the τ0 value predicted on the basis of the ALT be τ0 = 5 × 10⁻⁸ hours, and the shape parameter β be close to β = 2 (Rayleigh distribution). Let the allowable (specified) probability of failure at the end of the device's service time of, say, t = 40,000 hours be Q = 10⁻⁵ (it is acceptable that one out of a hundred thousand devices fails). Then the above formula, with P = 1 − Q, indicates that the steady-state operation temperature should not exceed T = 349.8 K ≈ 76.6 °C, and the thermal management tools should be designed accordingly. This rather elementary example gives a feeling of how the PDfR concept works and what kind of information one could expect using it.
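The temperature formula of Example #1 is easy to evaluate directly. This Python sketch uses the input data given above (U/k = 11,600 K, τ0 = 5 × 10⁻⁸ h, t = 40,000 h, Q = 10⁻⁵, β = 2) and also includes the forward Weibull model, so the result can be checked by substitution; the function names are illustrative.

```python
import math

def max_operating_temperature(U_over_k, tau0, t, Q, beta):
    """Highest steady-state temperature T (K) such that P = 1 - Q at time t, from
    P = exp[-((t/tau0) exp(-U/kT))^beta], solved as
    T = (U/k) / [ln(t/tau0) - (1/beta) ln(-ln P)]."""
    P = 1.0 - Q
    return U_over_k / (math.log(t / tau0) - math.log(-math.log(P)) / beta)

def dependability(U_over_k, tau0, t, T, beta):
    """Forward model: P = exp[-(lambda t)^beta] with lambda = exp(-U/kT) / tau0."""
    lam = math.exp(-U_over_k / T) / tau0
    return math.exp(-(lam * t) ** beta)

# Input data of Example #1
T = max_operating_temperature(U_over_k=11_600.0, tau0=5.0e-8, t=40_000.0, Q=1.0e-5, beta=2.0)
print(f"T = {T:.1f} K = {T - 273.15:.1f} C")
```

Substituting the computed T back into `dependability` should return P ≈ 1 − 10⁻⁵, a convenient sanity check when the input data change.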


Example #2

Let, for instance, the absolute temperature T be distributed in accordance with the Rayleigh law, so that the probability that a certain level $T_*$ is exceeded is determined as

$$P(T > T_*) = \exp\left(-\frac{T_*^2}{2T_0^2}\right),$$

where $T_0$ is the most likely value of the absolute temperature T. Then, using the Boltzmann-Arrhenius relationship

$$\tau = \tau_0 \exp\left(\frac{U_a}{kT}\right),$$

we conclude that the probability that the random MTTF τ ("random", because of the uncertainty in the level of the most likely temperature) is below a certain level $\tau_*$ (probability of failure is defined in this case as the probability that the specified level is not achieved) can be found as


Example #2 (cont)

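$$P(\tau < \tau_*) = P(T > T_*) = \exp\left[-\frac{U_a^2}{2k^2T_0^2\ln^2(\tau_*/\tau_0)}\right],$$

where $T_* = U_a/[k\ln(\tau_*/\tau_0)]$ is the temperature at which τ = τ*; since τ decreases with T, the MTTF falls below τ* exactly when T exceeds T*.

Solving this equation for the most likely (specified) value $T_0$, we find:

$$T_0 = \frac{U_a}{k\ln(\tau_*/\tau_0)\sqrt{-2\ln P}}.$$

This formula indicates how the (most likely) level of the device temperature should be established, so that the probability that the specified level of the MTTF is not achieved is sufficiently low.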
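The result of Example #2 can be checked by simulation. The sketch below computes $T_0$ from the formula above and then samples Rayleigh-distributed temperatures with that most likely value, counting how often the Arrhenius MTTF falls below τ*. The numbers (U_a/k = 11,600 K, τ0 = 5 × 10⁻⁸ h, τ* = 40,000 h, P = 0.01) are hypothetical, borrowed from Example #1 for convenience, and the function names are illustrative.

```python
import math
import random

def most_likely_temperature(U_over_k, tau0, tau_star, P):
    """T0 = (U/k) / [ln(tau*/tau0) * sqrt(-2 ln P)], P = allowable probability that MTTF < tau*."""
    return U_over_k / (math.log(tau_star / tau0) * math.sqrt(-2.0 * math.log(P)))

def prob_mttf_below(U_over_k, tau0, tau_star, T0, n=100_000, seed=7):
    """Monte Carlo check: sample Rayleigh T (most likely value T0), count tau < tau*."""
    rng = random.Random(seed)
    log_limit = math.log(tau_star / tau0)
    hits = 0
    for _ in range(n):
        # inverse-CDF sample of the Rayleigh law, F(T) = 1 - exp(-T^2 / (2 T0^2))
        T = T0 * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        # compare ln(tau/tau0) = U/(kT) with ln(tau*/tau0) to avoid overflow at small T
        if T > 0.0 and U_over_k / T < log_limit:
            hits += 1
    return hits / n

T0 = most_likely_temperature(11_600.0, 5.0e-8, 40_000.0, 0.01)
p_mc = prob_mttf_below(11_600.0, 5.0e-8, 40_000.0, T0)
print(f"T0 = {T0:.1f} K, simulated P(MTTF < tau*) = {p_mc:.4f}")
```

The comparison is made in logarithmic form because τ itself would overflow floating-point arithmetic at low sampled temperatures.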


Reliability of repairable items

• The reliability of complex items (products) depends not only on their dependability, but on their repairability as well.

• It is important that products be designed in such a way that their gradual and potential failures can be easily detected and eliminated in due time, and that detected damages (defects), such as, say, fatigue cracks, can be removed before a catastrophic failure process commences.

• The reliability of complex products is characterized, first of all, by their availability, which is defined as the ability of an item (system) to perform its required function at the given time or over a stated period of time, with consideration of its dependability, repairability, maintainability and maintenance support.

• A high level of reliability of complex products can be achieved by employing the most feasible combination of dependability, on the one hand, and repairability, maintainability and maintenance support, on the other.


Availability index-1

• The non-steady-state (time-dependent) operational availability index K(t) is defined as the probability that the item of interest will be available to the user at the given moment of time and will operate failure-free during the given time t beginning with this moment.

• The steady-state availability index K is the time-independent probability that the item will operate (will be available) failure-free during the time T, beginning with an arbitrary moment t of time that is sufficiently remote from the beginning of operations (so that the "infant mortality" portion of the "bathtub" curve is excluded).

• The most often used availability characteristic of Class II and Class III items, whose normal operation includes regular repairs (say, workstations or other complex and expensive electronic systems), is the availability index $K_a$, defined as the steady-state probability that the item will be available at an arbitrary moment of time taken between the preplanned preventive maintenance activities.


Availability index-2

• The availability index can be computed by the formula

$$K_a = \frac{1}{1 + \sum_{i=1}^{n} t_r^i / t_f^i},$$

where $t_f^i$ is the mean time between successive failures for the i-th item in the system, and $t_r^i$ is the mean-time-to-repair for this item.

• The index $K_a$ indicates the percentage of time during which the system is in the working (available) condition.

• The use of the index $K_a$ enables one to assess the unforeseen idle times and to consider these times at early stages of the design of the product.
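A minimal sketch of the availability-index formula, assuming a hypothetical system of three items, each characterized by its mean time between failures $t_f^i$ and mean time to repair $t_r^i$ (hours); `availability_index` is an illustrative name.

```python
def availability_index(items):
    """K_a = 1 / (1 + sum_i t_r^i / t_f^i) for (t_f, t_r) pairs of the system's items."""
    return 1.0 / (1.0 + sum(t_r / t_f for t_f, t_r in items))

# Hypothetical system of three items: (mean time between failures, mean time to repair), h
items = [(1_000.0, 10.0), (2_000.0, 20.0), (5_000.0, 50.0)]
print(f"K_a = {availability_index(items):.4f}")
```

For these numbers the sum of $t_r^i/t_f^i$ is 0.03, so $K_a = 1/1.03 \approx 0.971$: the system is expected to be available about 97% of the time.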


Operational Availability Index

• The operational availability index K(t) can be calculated, for situations when the probability of failure-free operation during the time interval t is independent of the beginning of this interval, by the formula

$$K(t) = K_a R(t),$$

where R(t) is the dependability of the item.

• This formula determines the probability that two events take place: 1) the item is available at an arbitrary moment of time, with the probability $K_a$, and 2) it will operate failure-free during the time period of duration t.
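Combining $K_a$ with a reliability function gives the operational availability. The sketch below assumes, hypothetically, $K_a = 0.97$ and an exponential dependability $R(t) = e^{-\lambda t}$ with λ = 2 × 10⁻⁴ 1/h, evaluated for a mission of t = 1,000 h.

```python
import math

def operational_availability(K_a, R, t):
    """K(t) = K_a * R(t): available at an arbitrary moment AND failure-free for duration t."""
    return K_a * R(t)

K_a = 0.97                       # hypothetical steady-state availability index
lam = 2.0e-4                     # hypothetical constant failure rate, 1/h
K = operational_availability(K_a, lambda t: math.exp(-lam * t), 1_000.0)
print(f"K(1000 h) = {K:.3f}")
```

Here K(t) = 0.97 e^{-0.2} ≈ 0.794: the mission requirement, not only the steady-state availability, governs the result.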


Session IV

7. GENERAL PDfR APPROACH

USING PROBABILITY DENSITY FUNCTIONS (PDF)

“Education is man’s going forward from cocksure ignorance to thoughtful uncertainty”,

Donald B. Clark, Australian author, “Scrapbook”

“There are things in this world, far more important than the most splendid discoveries –

It is the methods by which they were made”

Gottfried Leibnitz, German mathematician


PDfR Characteristics

• The appropriate electrical, optical, mechanical, thermal, and other physical characteristics that determine the functional performance, mechanical (physical/structural) reliability and/or environmental durability of the design/device/apparatus of interest should be established.

• Examples of such characteristics are: appropriate electrical parameters (current, voltage, etc.), light output, heat transfer capability, mechanical ultimate and fatigue strength, fracture toughness, maximum and/or minimum temperatures, maximum accelerations/decelerations, etc.


Factors that affect the PDfR characteristics-1

• Establish the electrical, optical, mechanical, thermal, environmental and other possible (say, human) stress (loading) factors (conditions) that might affect the reliability characteristics, i.e., the characteristics that determine the short- and long-term reliability of the object (structure) of interest.

• Examples are: high and/or low temperatures, high electrical current or voltage, electrical and/or optical properties of materials, mechanical and thermal stresses, displacements, maximum temperatures, size of the affected areas, etc.

• This should be done separately for each characteristic of interest and, if necessary, for each manufacturing process and for different phases of manufacturing, testing and/or operation.



• Based on the physical nature of the particular environmental/loading factor (electrical, optical, mechanical, environmental) and on the available information about it, establish whether this factor should be treated as a non-random (deterministic) value, or should/could be treated as a random variable with the given (assumed) probability distribution function.

• At this stage one could treat the random characteristics of interest as non-random functions of random factors, and establish the probability distribution functions for the random factors using experimental data, and/or Monte-Carlo simulations, and/or finite-element analyses (FEA), and/or evaluations based on analytical ("mathematical") modeling, etc.





Choose appropriate basic probability distributions-1

• After the reliability characteristics are established and the factors affecting these characteristics are selected, one should choose adequate probability distributions for the factors (conditions) that affect the short- and long-term reliability characteristics.

• For those factors (conditions) that should be treated as random variables, establish (accept) physically meaningful probability distribution laws.

• When actual experimental information is not available, assume, based on general physical considerations, the most suitable (or the most conservative) laws of probability distribution (e.g., uniform, exponential, normal, Weibull, Rayleigh, etc.).



Here are some general considerations that can be used in practical applications.

• Since the exponential distribution has the largest entropy (the largest uncertainty) of all the distributions of a non-negative random variable with the same mean, this distribution should be considered if no other information, except the expected (mean) value, is available. An exponentially distributed random variable is never negative. The safety factor for an exponentially distributed random variable is always "one".

• If the random process of failures can be treated as a simple Poisson flow with a constant intensity, then the time interval between two adjacent consecutive failures has an exponential distribution. The most likely value of an exponentially distributed random variable, t, is at the initial moment of time, t = 0.
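The maximum-entropy argument can be illustrated by comparing closed-form differential entropies of a few non-negative distributions adjusted to share the same mean μ. The choice of comparison laws (uniform, Rayleigh, half-normal) is ours, for illustration; each entropy equals a constant plus ln μ, so the ranking does not depend on μ.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def entropies_with_mean(mu):
    """Differential entropy (nats) of several non-negative laws sharing the mean mu."""
    sigma_r = mu * math.sqrt(2.0 / math.pi)   # Rayleigh scale: mean = sigma sqrt(pi/2)
    sigma_h = mu * math.sqrt(math.pi / 2.0)   # half-normal scale: mean = sigma sqrt(2/pi)
    return {
        "exponential": 1.0 + math.log(mu),
        "uniform on [0, 2 mu]": math.log(2.0 * mu),
        "Rayleigh": 1.0 + math.log(sigma_r / math.sqrt(2.0)) + EULER_GAMMA / 2.0,
        "half-normal": 0.5 * math.log(math.pi * math.e * sigma_h**2 / 2.0),
    }

for name, h in entropies_with_mean(1.0).items():
    print(f"{name:22s} H = {h:.4f}")
```

For μ = 1 the values are 1.0 (exponential), about 0.95 (half-normal), 0.72 (Rayleigh) and 0.69 (uniform).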



• If the physical nature of a random environmental factor is such that it can be only positive (e.g., acceleration during take-off of an aircraft, or the current in an electronic module) or only negative (e.g., deceleration during landing or during drop tests of a cell phone), its most likely value is certainly non-zero.

• If only this value (or the mean) is available, then the Rayleigh law could be employed. This law is also (like the exponential law) a single-parameter law.

• The safety factor, when the Rayleigh distribution is used, is always

$$\delta = \frac{1}{\sqrt{4/\pi - 1}} = \sqrt{\frac{\pi}{4 - \pi}} \approx 1.9131.$$
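A short check of the Rayleigh safety factor, using the standard mean $\sigma\sqrt{\pi/2}$ and standard deviation $\sigma\sqrt{(4-\pi)/2}$ of a Rayleigh variable with most likely value σ; the ratio does not depend on σ. The function name is illustrative.

```python
import math

def rayleigh_sf(sigma):
    """SF = mean/STD for the Rayleigh law with most likely value (scale) sigma."""
    mean = sigma * math.sqrt(math.pi / 2.0)
    std = sigma * math.sqrt((4.0 - math.pi) / 2.0)
    return mean / std

for sigma in (0.5, 1.0, 300.0):
    print(f"sigma = {sigma:>6}: SF = {rayleigh_sf(sigma):.4f}")
print(f"1/sqrt(4/pi - 1) = {1.0 / math.sqrt(4.0 / math.pi - 1.0):.4f}")
```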



• If a normally distributed random variable has a finite variance and zero mean, and changes periodically with a constant or nearly constant frequency, but with a random amplitude and random phase angle, then its amplitude obeys the Rayleigh law of distribution, while the corresponding energy, proportional to the squared amplitude, is exponentially distributed.

• If the expected (mean) value and the variance are known, and the physical nature of the random environmental factor is such that the probability density function is symmetric with respect to the mean value (which coincides with the median and the most likely value), then the normal distribution should be accepted, especially (but not necessarily) if the random variable can be either positive or negative.



• It is noteworthy that if the safety factor, defined as the ratio of the mean value of the safety margin to its standard deviation, is large (which is typically the case), then applying the normal law to the distribution of the safety margin is acceptable: its negative values, although possible in principle, have negligibly low probabilities and need not be considered.

• If the expected (mean) value and the variance are known, and the physical nature of the random environmental factor is such that the probability density function is highly asymmetric (skewed) with respect to its mean or most likely value, then the Weibull distribution, the distribution of the absolute value of a normal random variable, a truncated normal distribution, or a log-normal distribution can be used.
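The minimal sketch below shows why a large safety factor makes the negative tail negligible. It assumes a normally distributed safety margin with a hypothetical mean and standard deviation, and it computes the probability that the margin is negative, Φ(-SF), using only Python's standard library.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probability_of_failure(mean_margin, std_margin):
    """P(safety margin < 0) for a normally distributed safety margin."""
    sf = mean_margin / std_margin  # safety factor
    return normal_cdf(-sf)

# Hypothetical margin (capacity minus demand): mean 30 MPa, standard deviation 10 MPa
print(probability_of_failure(30.0, 10.0))  # SF = 3, about 1.35e-3
```

Doubling the safety factor to 6 drives this probability below one in a billion, so the negative tail can safely be ignored.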


Establish appropriate cumulative probability distributions-1

• Treating each reliability characteristic of interest as a non-random function (output) of a random argument (input) due to a particular external or internal factor, evaluate the probability density function of this characteristic for the assumed (accepted, determined) probability distribution law of the environmental factor.

• Time could enter the computed response as an independent parameter.

• For some factors, the input could be considered a non-random (deterministic) value.
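For a strictly monotonic response, the density of the output follows from the standard change-of-variables formula. The Python sketch below is a minimal illustration under assumed inputs. The input X is taken, hypothetically, to be normal with mean `MU` and standard deviation `S`, and the characteristic of interest is taken to be Y = exp(X). For this choice the result must reduce to the log-normal density mentioned above.

```python
import math

def transformed_pdf(y, pdf_x, g_inv, dg_inv_dy):
    """Density of Y = g(X) for a strictly monotonic, differentiable g:
    f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y) / dy|."""
    return pdf_x(g_inv(y)) * abs(dg_inv_dy(y))

MU, S = 0.2, 0.5  # hypothetical mean and standard deviation of the input X

def normal_pdf(x):
    return math.exp(-((x - MU) ** 2) / (2 * S**2)) / (S * math.sqrt(2 * math.pi))

def g_inv(y):
    # Hypothetical output characteristic Y = exp(X), hence X = ln(Y)
    return math.log(y)

def dg_inv_dy(y):
    return 1.0 / y

def lognormal_pdf(y):
    # Closed-form log-normal density, used here only as a check
    return math.exp(-((math.log(y) - MU) ** 2) / (2 * S**2)) / (y * S * math.sqrt(2 * math.pi))

y = 1.7
print(transformed_pdf(y, normal_pdf, g_inv, dg_inv_dy), lognormal_pdf(y))  # equal values
```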



• Determine the cumulative probability distribution functions for all the probability density functions that affect the given mechanical or thermal characteristic of interest.

• Such a convolution of the constituent distribution laws accounts, in the most accurate and non-conservative way, for the probabilistic input of each of the environmental parameters that affect the particular mechanical, electrical, optical, or thermal characteristic.

• Cumulative distributions account for the likelihood that the maxima of different important factors might not occur simultaneously.



• If the number of random variables does not exceed two, the convolution could be carried out analytically.

• If the number of random variables is three or more, one should “teach” a computer how to obtain the cumulative distribution law numerically.

• Since the above distributions are based on the transient responses of the mechanical (thermal) characteristics of interest to the time-dependent environmental excitations (parameters), these distributions determine the probability that, at the given moment of time, the given characteristic is below/above a given value.
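For three or more variables, one way to carry out the convolution numerically is sketched below. The sketch assumes independent, non-negative variables whose densities are tabulated on a uniform grid. It convolves three hypothetical unit-rate exponential densities with the trapezoidal rule and then integrates the result into a cumulative distribution. For this special case the result can be checked against the closed-form Erlang CDF.

```python
import math

def convolve_pdfs(p, q, dx):
    """Density of X + Y for independent X, Y >= 0 tabulated on the grid 0, dx, 2dx, ...
    (trapezoidal rule for the convolution integral; the value at 0 is taken as 0)."""
    n = len(p)
    out = [0.0] * n
    for i in range(1, n):
        s = 0.5 * (p[0] * q[i] + p[i] * q[0])
        for j in range(1, i):
            s += p[j] * q[i - j]
        out[i] = s * dx
    return out

def cdf_from_pdf(pdf_vals, dx):
    """Cumulative distribution by trapezoidal integration of a tabulated density."""
    cdf = [0.0]
    for i in range(1, len(pdf_vals)):
        cdf.append(cdf[-1] + 0.5 * (pdf_vals[i - 1] + pdf_vals[i]) * dx)
    return cdf

dx, n = 0.01, 601                                # grid covering 0 <= x <= 6
exp_pdf = [math.exp(-k * dx) for k in range(n)]  # hypothetical unit-rate exponential inputs
pdf_sum3 = convolve_pdfs(convolve_pdfs(exp_pdf, exp_pdf, dx), exp_pdf, dx)
cdf_sum3 = cdf_from_pdf(pdf_sum3, dx)

x = 3.0
exact = 1.0 - math.exp(-x) * (1.0 + x + x**2 / 2.0)  # Erlang(3) CDF, for comparison only
print(cdf_sum3[round(x / dx)], exact)
```

The same two functions apply unchanged to any tabulated densities, for example those obtained from the change-of-variables step.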


Probabilistic reliability criteria

Determine, for each point of time after the given duration of operation (mission):

• the safety factors and other reliability criteria for the characteristics that determine the performance, reliability, durability, and safety of the system;

• the probability of non-failure, P(t), for the established (accepted) safety factor at each point of time; and

• the mean time-to-failure, MTTF, for the established (accepted) safety factor, the standard deviation, STD, of the time-to-failure, and the safety factor SF = MTTF/STD for the time-to-failure.
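The sketch below illustrates these criteria. It assumes, hypothetically, that FOAT and predictive modeling have produced a two-parameter Weibull distribution of the time-to-failure, with shape `beta` and scale `eta`. From the standard Weibull formulas it evaluates P(t), the MTTF, the STD of the time-to-failure, and SF = MTTF/STD. The Weibull law is a stand-in chosen for illustration.

```python
import math

def weibull_criteria(beta, eta, t):
    """Reliability criteria for a Weibull time-to-failure (shape beta, scale eta)."""
    p_nonfailure = math.exp(-((t / eta) ** beta))   # probability of non-failure P(t)
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    mttf = eta * g1                                 # mean time-to-failure
    std = eta * math.sqrt(g2 - g1**2)               # STD of the time-to-failure
    return {"P(t)": p_nonfailure, "MTTF": mttf, "STD": std, "SF": mttf / std}

# Hypothetical parameters: wear-out (beta > 1), characteristic life 50,000 h, mission 10,000 h
print(weibull_criteria(beta=2.5, eta=50_000.0, t=10_000.0))
```

With `beta = 1` the Weibull law reduces to the exponential law, and SF = 1, which agrees with the exponential case above.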


8. Twelve steps to be conducted

to add value to the existing practice

“The man who removes a mountain begins by carrying away small stones”

Chinese saying

“Give me a fruitful error any time, full of seeds, bursting with

its own corrections. You can keep your sterile truth for yourself”

Vilfredo Pareto, Italian engineer, sociologist, and economist


Some important preliminary steps

• Establish, as the manufacturer of a particular product, the list of possible failures and suitable failure criteria, as far as functional, mechanical (physical), and environmental failures are concerned.

• Find out the similar requirements that the customer specifies (desires) regarding lifetimes (minimum and mean time to failure), failure rates (considering, for a particular product, if necessary, the wear-out portion of the bathtub curve), probability of failure (for non-repairable products), availability specifications, etc.

• Identify active and passive parts, repairable and non-repairable parts, the most vulnerable (least reliable) parts (e.g., solder joint interconnections, materials prone to creep or aging, etc.), the feasibility of introducing redundancy, etc.

• As a customer, evaluate the ability of a particular manufacturer to make parts with consistent quality, and, as a manufacturer, establish your company’s ability to produce such parts.


Twelve steps to be conducted to add value to the existing practice-1

1) Develop a detailed list of possible electrical, mechanical (structural), thermal, and environmental failures that should be considered, in one way or another, in the particular design (package, inverter, module, structure, etc.).

2) Based on existing experience and best practices, make a preliminary decision on the materials and geometries in the physical design and packaging of the product and its units/subunits/assemblies.

3) Conduct predictive modeling (using FEA or other simulation packages, as well as analytical (“mathematical”) modeling wherever possible) of the stresses and other failure criteria (say, elevated temperatures or electrical characteristics), considering steady-state and transient thermal, stress/strain, and electrical fields.

4) Consider possible loading in actual use conditions (electrical, thermal, mechanical, dynamic, and their combinations) and distinguish between short-term high-level loading (related to the ultimate strength of the structure) and long-term low-level loading (related to the fatigue strength of the structure).


Twelve steps to be conducted to add value to the existing practice-2

5) Review the existing qualification standards for similar structures, bearing in mind, however, that these standards were developed for similar but different (power, geometry, materials, use) conditions than the ones we will be dealing with; come up with preliminary levels of acceptable stresses, accelerations, temperatures, voltages, currents, etc.

6) Having FOAT procedures in mind, decide on the constitutive relationships (formulas, FEA procedures, plots) that govern the failure mechanisms in question (Arrhenius-type equations for high-temperature “baking”, Miner-type relationships for materials that are expected to work within the elastic range, Erdogan-Paris-type relationships for brittle materials, etc.).

7) Design, conduct, and interpret the results of the FOAT and, based on this testing, predict the reliability characteristics of the assemblies, joints, subunits, and units of interest.
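Two of the constitutive relationships named in step 6 can be sketched in a few lines. The Python code below is a minimal illustration, not the author's procedure. First, it computes the Arrhenius acceleration factor between a hypothetical FOAT temperature and a use temperature for an assumed activation energy. Second, it computes the Palmgren-Miner linear cumulative-damage sum for hypothetical cycle counts. All numerical inputs are placeholders.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def arrhenius_acceleration_factor(ea_ev, t_use_k, t_test_k):
    """Ratio of the time-to-failure at the use temperature to that at the test temperature."""
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_test_k))

def miner_damage(blocks):
    """Palmgren-Miner linear damage sum D = sum(n_i / N_i); failure is predicted at D >= 1.
    blocks: iterable of (applied cycles n_i, cycles to failure N_i) at each stress level."""
    return sum(n / n_fail for n, n_fail in blocks)

# Hypothetical inputs: Ea = 0.7 eV, use at 55 C, FOAT "baking" at 125 C
af = arrhenius_acceleration_factor(0.7, 55.0 + 273.15, 125.0 + 273.15)
foat_hours = 1_000.0  # hypothetical time-to-failure observed in FOAT
print(f"AF = {af:.1f}, predicted use-condition life = {af * foat_hours:.0f} h")

# Hypothetical load spectrum: (applied cycles, cycles to failure at that stress level)
damage = miner_damage([(2_000, 100_000), (500, 5_000)])
print(f"Miner damage D = {damage:.2f}", "-> failure predicted" if damage >= 1 else "-> survives")
```

The Erdogan-Paris crack-growth law could be added the same way, by integrating da/dN over the crack length.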


Twelve steps to be conducted to add value to the existing practice-3

8) Based on the obtained information, the state of the art in the area in question, and the requirements of the existing specifications, decide on the allowable (acceptable) values of the failure characteristics, considering the economically and technically feasible lifetime of the module and its major subassemblies.

9) Write the first draft of the qualification specs (in other words, revise, if necessary, the existing JEDEC specs) for the module and its units/subunits of interest.

10) Develop root cause analysis (RCA) methodologies.

11) Decide on the burn-in conditions and establish an adequate service for collecting field failures.

12) Conduct, on a permanent basis, revisions of the designs and the reliability specifications.

9. DO ELECTRONIC INDUSTRIES

NEED NEW APPROACHES

TO QUALIFY THEIR DEVICES INTO PRODUCTS?

“I do not need an everlasting pen. I do not intend to live forever”

I. Ilf and E. Petrov, “The Golden Calf” (in Russian)

“It is always better to be approximately right than precisely wrong”

Unknown Reliability Manager


Nobody and nothing is perfect:

probability of failure is never zero

• It should be widely recognized that the probability of failure is never zero, but it could be predicted and, if necessary, controlled and maintained at an acceptably low level.

• One effective way to achieve this is to implement the existing PRM methods and approaches and to develop adequate PDfR methodologies.

• These methodologies should be based mostly on FOAT and on a widely employed predictive modeling effort.

• FOAT should be carried out in a relatively narrow but highly focused and time-effective fashion for the most vulnerable elements of the design of interest.

• If the QT has a solid basis in FOAT, PM, and PDfR, then there is reason to believe that the product of interest will be sufficiently robust in the field.


QT could be viewed as “quasi-FOAT”

• The QT could be viewed as “quasi-FOAT,” a sort of “initial stage of FOAT” that more or less adequately replicates the initial non-destructive, yet full-scale, stage of FOAT.

• We believe that such an approach to qualifying devices into products will enable industry to specify, and manufacturers to assure, a predicted and sufficiently low probability of failure for a device that passed the QT and will be operated in the field under the given conditions for the given time.

• We expect that the suggested approach to the DfR and QT will be accepted by the engineering and manufacturing communities, implemented in engineering practice, and adequately reflected in future editions of the QT specifications and methodologies.


The PDfR-based QT will still be non-destructive

• Such QTs could therefore be designed as a sort of mini-FOAT that, unlike the actual “full-scale” FOAT, is non-destructive and conducted on a limited scale.

• The duration and conditions of such a “mini-FOAT” QT should be established based on the observed and recorded results of the actual FOAT, and should be limited to the stage at which no failures were observed in the actual full-scale FOAT.

• Prognostics and health management (PHM) technologies (such as “canaries”) should be tested concurrently to make sure that the safe limit is not exceeded.
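The limit on the QT duration can be turned into numbers in several ways. The sketch below illustrates one of them under explicitly hypothetical assumptions, none of which is prescribed by the source. The FOAT failure times are invented placeholders. The QT duration is capped at a hypothetical fraction `margin` of the earliest failure time observed in the full-scale FOAT. The probability of failure within that QT is then estimated from a hypothetical Weibull fit to the same FOAT data.

```python
import math

def max_qt_duration(foat_failure_times, margin=0.5):
    """Cap the non-destructive QT at a fraction (hypothetical margin) of the
    earliest failure time observed in the full-scale FOAT."""
    if not foat_failure_times:
        raise ValueError("FOAT failure data are required")
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be between 0 and 1")
    return margin * min(foat_failure_times)

def weibull_failure_probability(t, beta, eta):
    """Probability of failure by time t for a Weibull law fitted to FOAT data."""
    return 1.0 - math.exp(-((t / eta) ** beta))

foat_hours = [412.0, 530.0, 611.0, 702.0, 845.0]  # hypothetical FOAT times-to-failure, h
qt_hours = max_qt_duration(foat_hours, margin=0.5)
p_fail = weibull_failure_probability(qt_hours, beta=4.0, eta=700.0)  # hypothetical fit
print(f"QT duration <= {qt_hours:.0f} h; predicted probability of failure during QT = {p_fail:.2e}")
```

In practice, the PHM “canaries” mentioned above would provide the in-test confirmation that this safe limit is not exceeded.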


What should be done differently

• It is important to understand the reliability physics that underlies the mechanisms and modes of failure in electronic and photonic components and devices.

• FOAT should be thoroughly implemented, so that the QT is based on FOAT information and data.

• The PDfR concept should be widely employed.

• FOAT cannot do without predictive modeling; such modeling, both computer-aided and analytical (“mathematical”), plays a key role in making the suggested new approach to product qualification practical and successful.

10. CONCLUSIVE REMARKS

“Life is the art of drawing sufficient conclusions from insufficient premises”

Samuel Butler, British poet and satirist, “The Way of All Flesh”


Conclusions-1

• Improvements in the existing QT, as well as in the existing best QT practices, are indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept is thoroughly developed and the corresponding methodologies are employed.

• One effective way to improve the existing QT and specs is to:

  ◦ conduct, on a wide scale, Failure Oriented Accelerated Testing (FOAT) at the design stage (DFOAT) and at the manufacturing stage (MFOAT), and, since DFOAT cannot do without PM,

  ◦ carry out, whenever and wherever possible, predictive modeling (PM) to understand the physics of failure and to accumulate, when appropriate, failure statistics;

  ◦ revisit, review, and revise the existing QT and specs considering the DFOAT and, to a lesser extent, MFOAT data for the most vulnerable elements of the device of interest;

  ◦ develop and widely implement the PDfR methodologies, bearing in mind that “nobody and nothing is perfect”: the probability of failure is never zero, but it could be predicted and, if necessary, controlled and maintained during operation at an acceptably low level.


Conclusions-2

• We believe that our new approach to the qualification of electronic devices will enable industry to specify, and manufacturers to assure, a predicted and sufficiently low probability of failure for a device that passed the qualification specifications and will be operated under the given stress (not necessarily mechanical) conditions for the given time.

• We expect that the suggested new approaches to the DfR and QT will eventually be accepted by the engineering and manufacturing communities, implemented in a timely fashion into engineering practice, and adequately reflected in future editions of the qualification specifications and methodologies.


© 2009

Thank you for taking my course
