DESCRIPTION
This is a four-part lecture series. The course is designed for reliability engineers working in the electronics, opto-electronics and photonics industries. It explains the roles of Highly Accelerated Life Testing (HALT) in the design and manufacturing efforts, with the emphasis on design (HALT in manufacturing is the well-known approach of the late Gregg Hobbs), and teaches what could and should be done to design, when high reliability is a must, a product with a predicted, specified (“prescribed”) and, if necessary, even controlled low probability of field failure. Part 3: • Design for Reliability (DfR) • Probabilistic Design for Reliability (PDfR): role, attributes, challenges, pitfalls • Safety margin and safety factor • Practical examples: assemblies subjected to thermal and/or dynamic loading Part 4: • More general PDfR approach • New Qualification Approaches Needed? • One effective way to improve the existing QT practices and specifications
Probabilistic Design for Reliability (PDfR) in Electronics, Part II
Dr. E. Suhir
©2011 ASQ & Presentation Suhir
Presented live on Jan 03~06th, 2011
http://reliabilitycalendar.org/The_Reliability Calendar/Short Courses/Shliability_Calendar/Short_Courses/Short_Courses.html
ASQ Reliability Division Short Course Series
One of the monthly webinars on topics of interest to reliability engineers.
To view the recorded webinar (available to ASQ Reliability Division members only), visit asq.org/reliability
To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events
Dr. E. Suhir Page 1
PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT,
the Roles of Failure Oriented Accelerated Testing (FOAT)
and Predictive Modeling (PM), and
a Novel Approach to Qualification Testing (QT)
“You can see a lot by observing”
Yogi Berra, American Baseball Player
“It is easy to see, it is hard to foresee”
Benjamin Franklin, American Scientist and Statesman
E. Suhir Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret),
University of California, Dept. of Electrical Engineering, Santa Cruz, CA,
University of Maryland, Dept. of Mechanical Engineering, College Park, MD, and
ERS Co. LLC, 727 Alvina Ct. Los Altos, CA, 94024, USA
Tel. 650-969-1530, cell. 408-410-0886, e-mail: [email protected]
Four hour ASQ-IEEE RS Webinar short course
January 3-6, 2011
Contents
Session I
1. Introduction: background, motivation, incentive
2. Reliability engineering as part of applied probability and Probabilistic Risk
Management (PRM) bodies of knowledge
3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls
and interaction with other accelerated test categories
Session II
4. Predictive Modeling (PM): FOAT cannot do without it
5. Example of a FOAT: physics, modeling, experimentation, prediction
Session III
6. Probabilistic Design for Reliability (PDfR), its role and significance
Session IV
7. General PDfR approach using probability density functions (pdf)
8. Twelve steps to be conducted to add value to the existing practice
9. Do electronic industries need new approaches to qualify their devices into products?
10. Concluding remarks
Session III
6. PROBABILISTIC DESIGN FOR RELIABILITY,
ITS ROLE and SIGNIFICANCE
“Probable is what usually happens”
Aristotle, Greek philosopher
“Probability is the very guide of life”
Marcus Tullius Cicero,
Roman philosopher and statesman
Design-for-Reliability
• Design for reliability (DfR) is a set of approaches, methods and best practices that are supposed to be used during the design phase of the product to minimize the likelihood (risk) that the product will not meet the reliability requirements, objectives and expectations.
• While 50% of the total actual cost of an electronic product is due to the cost of materials, 15% to the cost of labor, 30% to the overhead costs and only 5% to the design effort, this effort influences about 70% of the total cost of the product (“Six Sigma”, M. Harry and R. Schroeder).
• If reliability is taken care of during the design phase, the final cost of the product does not go up. If a reliability problem is detected during engineering, the cost of the product goes up by a factor of 10. If the problem is caught in the production phase, the cost of the product increases by a factor of 100 or more.
Deterministic approach
• The deterministic approach is based on the concept that reliability is assured by introducing a sufficiently high deterministic safety factor, which is defined as the ratio of the capacity (“strength”) C of the system to the demand (“load”) D:
$$SF = \delta = \frac{C}{D}.$$
• The level of the safety factor SF is chosen depending on the consequences of failure, acceptable risks, the available and trustworthy information about the capacity and the demand, the accuracy with which these characteristics are determined, possible costs and social benefits, variability of materials and structural parameters, construction (manufacturing, fabrication) procedures, etc.
• In a particular problem the capacity and demand could be different from the strength and load, and the role of these characteristics can be played by, say, acceptable and actual current, voltage, light intensity, electrical resistance; traffic capacity and traffic flow; culvert size and the quantity of water; critical (buckling) and actual compressive stresses; etc.
• Safety factors in engineering are established from previous experience with the considered system in its anticipated environmental or operating conditions.
Probabilistic approach
• The probabilistic DfR (PDfR) approach is based on the probabilistic risk management (PRM) concept and, if applied broadly and consistently, brings the probability measure (dimension) to each of the design characteristics of interest. Using AT data, particularly FOAT data, and PM techniques, it enables one to establish the probability of the possible (anticipated) failure under the given operation conditions and for the given moment of time in operation.
• After the probabilistic PMs are developed, one should use sensitivity analyses to determine the most feasible materials and geometric characteristics of the design, so that the lowest probability of failure is achieved.
• In other cases, the probabilistic DfR approach enables one to find the most feasible compromise between the reliability and cost effectiveness of the product.
• When the probabilistic DfR (PRM) approach is used, the reliability criteria (specifications) are based on the acceptable (allowable) probability of failure for the given product.
Basic Principles Underlying our PDfR Approach-1
• Not all products require the PDfR approach, but only those for which high reliability is crucial and for which there is a reason to believe that this reliability might not be high enough for particular applications.
• Nobody and nothing is perfect. The difference between a reliable and an unreliable system (device) is in the level of the probability of failure in the field under the given (anticipated) loading (environmental) conditions and after the given (specified) time in operation.
• The probability of failure in the field is the ultimate and a “reliable” criterion (“judge”) of the product’s reliability.
• This probability can be established through a specially designed and carefully conducted DFOAT aimed at understanding the physics of failure and choosing the right predictive DFOAT model (e.g., Arrhenius, Coffin-Manson, crack propagation, demand-capacity “interference”, etc.) for the anticipated loading conditions or their combination (say, thermal+vibrations).
Basic Principles Underlying our PDfR Approach-2
• The reliability of a product is due to the reliability of its one or two most vulnerable (most unreliable) functional or structural elements, and it is for these elements that the adequate DFOAT should be designed and conducted.
• Sensitivity analyses are a must after the physics of the anticipated failure is established, the appropriate predictive model is agreed upon, and the acceptable probability of failure in the field is specified, but prior to the final decision about launching the mass production of the product.
• DFOAT is not necessarily a destructive test, but is always a test to failure, i.e., a test to determine the limits of reliable operation and the probability that these limits are exceeded.
• DFOAT cannot do without predictive modeling, and it is only through predictive modeling that the probability of failure in the field could be found (established).
Basic Principles Underlying our PDfR Approach-3
• Time- and labor-consuming a-posteriori “statistics-of-failure” can be successfully replaced, to a great extent, by the anticipated a-priori “probability-of-failure” confirmed by some statistical data (for the mean and STD values of the probability distribution of interest, but not for the probability-distribution function itself).
• The PDfR concept enables one to qualify a viable device (system) into a reliable-in-the-field product, with the predicted, prescribed (specified) and even, if necessary, controlled probability of failure in the field.
• Technical diagnostics, prognostication and health monitoring could be effective means to anticipate, establish and prevent possible field failures.
• PDfR has to do with the DfR, and not with the Manufacturing-for-Reliability (MfR).
• Burn-ins could be viewed as a special type of FOAT intended for MfR objectives and are always a must, whatever DfR approach is considered.
Reliability function
• The simplest objects (items) in reliability engineering are those that do not lend themselves to restoration (repair) and have to be replaced after the first failure. The reliability of such items is due entirely to their dependability, i.e., probability of non-failure, which is the probability that no failure occurs during the given period of time. The dependence of this probability on time is known as the reliability function.
• As with any other probability, the dependability of a sufficiently large population of non-repairable items can be substituted by the frequency, and therefore the reliability function can be sought as
$$R(t) = \frac{N_s(t)}{N_0},$$
where $N_0$ is the total number of items being tested and $N_s(t)$ is the number of items that are still sound by the time t.
Failure rate
• Differentiating the relationship
$$R(t) = \frac{N_s(t)}{N_0}$$
with respect to time t, we have:
$$\frac{dR(t)}{dt} = \frac{1}{N_0}\frac{dN_s(t)}{dt} = -\frac{1}{N_0}\frac{dN_f(t)}{dt},$$
where $N_f(t) = N_0 - N_s(t)$ is the number of the failed items.
• The failure rate is introduced as follows:
$$\lambda(t) = \frac{1}{N_s(t)}\frac{dN_f(t)}{dt}.$$
As evident from this formula, the failure rate is the ratio of the number of items failing per unit time at the time t to the number of items that remain sound at this time. The failure rate characterizes the change in the dependability of an item in the course of its lifetime.
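The statistical definitions of the reliability function and of the failure rate can be tried out on a small data set. The following is a minimal sketch only: the failure times, the function names and the finite-difference window `dt` are illustrative assumptions and do not come from the course.

```python
def reliability(failure_times, t):
    """Statistical reliability: fraction of items still sound at time t."""
    n0 = len(failure_times)
    n_sound = sum(1 for tf in failure_times if tf > t)
    return n_sound / n0


def failure_rate(failure_times, t, dt):
    """Finite-difference estimate of (1 / N_s(t)) * dN_f/dt."""
    n_sound = sum(1 for tf in failure_times if tf > t)
    if n_sound == 0:
        raise ValueError("no sound items left at time t")
    # Number of items failing in the window (t, t + dt].
    n_failing = sum(1 for tf in failure_times if t < tf <= t + dt)
    return n_failing / (n_sound * dt)


# Hypothetical failure times (hours) for ten non-repairable items.
times = [120, 340, 560, 610, 800, 950, 1100, 1300, 1700, 2500]
r_500 = reliability(times, 500)           # 8 of the 10 items are still sound
lam_500 = failure_rate(times, 500, 500)   # 4 failures in (500, 1000] among 8 sound items
```

With so few items the estimates are crude; the frequency interpretation only becomes trustworthy for a sufficiently large population.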
Bathtub curve
Probabilistic and statistical definitions of the reliability function
Considering
$$\lambda(t) = \frac{1}{N_s(t)}\frac{dN_f(t)}{dt},$$
the formula
$$\frac{dR(t)}{dt} = \frac{1}{N_0}\frac{dN_s(t)}{dt} = -\frac{1}{N_0}\frac{dN_f(t)}{dt}$$
yields:
$$\frac{dR(t)}{dt} = -\frac{1}{N_0}\lambda(t)N_s(t) = -\lambda(t)R(t),$$
or
$$\lambda(t)\,dt = -\frac{dR(t)}{R(t)},$$
so that
$$\ln R(t) = -\int_0^t \lambda(\tau)\,d\tau.$$
Hence,
$$R(t) = \exp\left[-\int_0^t \lambda(\tau)\,d\tau\right].$$
The reliability function R(t) satisfies the obvious initial condition R(0)=1. The above formula for the reliability function expresses the probabilistic definition of this function, while the formula
$$R(t) = \frac{N_s(t)}{N_0}$$
provides its statistical definition.
Exponential formula of reliability (revisited). Probability of failure
When the failure rate is time independent, the formula
$$R(t) = \exp\left[-\int_0^t \lambda(\tau)\,d\tau\right]$$
leads to the exponential formula of reliability:
$$R(t) = e^{-\lambda t}.$$
The function
$$f(t) = -\frac{dR(t)}{dt} = \lambda(t)\exp\left[-\int_0^t \lambda(\tau)\,d\tau\right]$$
is the probability density function for the flow of failures, or the failure frequency. The probability of a failure during the time t can be evaluated as
$$Q(t) = 1 - R(t) = \int_0^t f(\tau)\,d\tau.$$
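As a rough numerical illustration of these relationships, the sketch below integrates a failure rate with the trapezoidal rule to obtain the reliability, and then the probability of failure and the failure density. The rate value, the time in operation and the function name are assumptions made for the example.

```python
import math


def reliability_from_rate(rate, t, steps=1000):
    """R(t) = exp(-integral_0^t rate(tau) dtau), using the trapezoidal rule."""
    h = t / steps
    integral = 0.5 * (rate(0.0) + rate(t))
    for i in range(1, steps):
        integral += rate(i * h)
    integral *= h
    return math.exp(-integral)


lam = 1.0e-4     # assumed constant failure rate, 1/hour
t = 5000.0       # assumed time in operation, hours
r = reliability_from_rate(lambda tau: lam, t)   # agrees with exp(-lam * t)
q = 1.0 - r      # probability of failure Q(t)
f = lam * r      # failure density f(t) = lambda * R(t)
```

Passing a time-dependent function as `rate` (for example one that imitates the infant-mortality or wear-out portion of the bathtub curve) uses the same routine unchanged.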
Stress-strength (“interference”) concept
The curve on the right should be obtained experimentally, based on accelerated life testing and on the accumulated experience. The bearing capacity of the structure should be such that the probability of failure, P(t), is sufficiently low, and the safety factor (SF) is not lower than the specified value, say, SF=1.4. In a simplified analysis the curve on the right could be substituted, in particular, by a constant value, which, if a conservative approach is taken, should be sufficiently low.
“Demand”, D: probability density function for a particular mechanical or thermal characteristic (response) of the tile structure to the given environmental factor at the given moment of time.
“Capacity”, C: capability of the tile structure with respect to the particular mechanical or thermal loading (may or may not be time-dependent). In the current analysis we assume that the bearing capacity for a particular reliability characteristic is either a constant value or a normally distributed random variable with a known (evaluated) mean and standard deviation.
The larger the overlap of these two curves, the higher the probability of failure and the lower the safety factor. After these two curves are evaluated (established) for each reliability characteristic of interest and for each moment of time (separately for the take-off and landing processes), we evaluate the probability distribution function, f(ψ), for the safety margin, ψ=C-D, its mean, <ψ>, and standard deviation, ŝ, and the safety factor, SF=<ψ>/ŝ. It should not be lower than the specified value, say, SF=1.4.
Probability of non-failure (dependability)
• The “reliability” (actually, “dependability”) of a non-repairable item is defined as the probability of non-failure, P = P{C>D}, i.e., as the probability that the item’s bearing capacity (“strength”), C, during the time, t, of operation under the given stress conditions, will always be greater than the demand (“loading”), D.
• Although the probability of failure is never zero, it can be made, if a probabilistic approach is used, as low as necessary. If the probability density functions f(C) and g(D) for the random variables C and D are known, then the probability, P, of non-failure (reliability, dependability) can be evaluated as
$$P = \int_0^\infty f(\psi)\,d\psi,$$
where f(ψ) is the probability density function of the margin of safety ψ=C-D, which is also a random variable.
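For the common special case of normally distributed capacity and demand, the integral has a closed form. The sketch below assumes, in addition, that C and D are statistically independent, so that the margin ψ=C-D is normal with mean <C>-<D> and variance equal to the sum of the two variances. The numerical values and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def non_failure_probability(mean_c, std_c, mean_d, std_d):
    """Return (P{C > D}, SF) for independent, normally distributed C and D."""
    mean_psi = mean_c - mean_d            # mean safety margin <psi>
    std_psi = math.hypot(std_c, std_d)    # standard deviation of psi
    p_fail = NormalDist(mean_psi, std_psi).cdf(0.0)   # P{psi <= 0}
    return 1.0 - p_fail, mean_psi / std_psi


# Hypothetical capacity and demand, in the same (arbitrary) stress units.
p, sf = non_failure_probability(mean_c=100.0, std_c=8.0, mean_d=70.0, std_d=6.0)
```

Here the overlap of the two curves is governed by a single number, the ratio of the mean margin to its standard deviation, which is why that ratio is a convenient design criterion.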
Safety factor-1
• Direct use of the probability of non-failure is often inconvenient, since, for highly reliable items, this probability is expressed by a number which is very close to one, and, for this reason, even significant changes in the item’s (system’s) design, which have an appreciable impact on the item’s reliability, may have a minor effect on the probability of non-failure.
• In those cases when both the mean value, <ψ>, and the standard deviation, ŝ, of the margin of safety (or any other suitable characteristic of the item’s reliability, such as stress, temperature, displacement, affected area, etc.) are available, the safety factor (safety index, reliability index)
SF=δ=<ψ>/ŝ
can be used as a suitable reliability criterion.
Safety factor-2
• After the capacity and the demand curves are established for each reliability characteristic of interest and for each moment of time, the probability distribution function $f(\psi)$ for the safety margin $\Psi = C - D$ should be determined. Then, for normally distributed capacity and demand, the mean value
$$\langle\psi\rangle = \int_0^\infty \psi f(\psi)\,d\psi$$
of the safety margin and its standard deviation
$$\hat{s} = \sqrt{\int_0^\infty (\psi - \langle\psi\rangle)^2 f(\psi)\,d\psi}$$
should be evaluated.
• The safety factor could be found as the ratio of the mean value of the safety margin to its standard deviation:
$$SF = \delta = \frac{\langle\psi\rangle}{\hat{s}}.$$
Safety factor-3
• The SF should not be lower than its specified value for the characteristic of interest.
• This value should reflect the state-of-the-art in the given area of engineering, cost and time-to-market considerations, and should account for the consequences of failure.
• If the computed SF does not meet the specification requirements, the design should be revised (improved) until the required level of safety (reliability) is met.
• The required level of safety could also be established based on the level of the probability
$$P(\psi) = \int_\psi^\infty f(\psi)\,d\psi$$
of non-failure. This formula defines the probability that the safety margin $\Psi = C - D$ is found between the given value ψ and infinity, i.e., is higher than the given (specified) value of this margin.
Normal law
• The SF and the probability P(ψ) of exceeding a certain level of the safety margin are related. If the reliability characteristic of interest (such as, e.g., the safety margin, ψ) is distributed in accordance with the normal law
$$f(\psi) = \frac{1}{\sqrt{2\pi D_\psi}}\exp\left[-\frac{(\psi - \langle\psi\rangle)^2}{2D_\psi}\right],$$
where $D_\psi = \hat{s}^2$ is the variance, then the probability of non-failure is related to the safety factor SF as
P=½[1+Ф(SF/√2)],
where
$$\Phi(\alpha) = \frac{2}{\sqrt{\pi}}\int_0^\alpha e^{-t^2}\,dt$$
is the probability integral (Laplace function).
P          SF
0.999000   3.0902
0.999900   3.7190
0.999990   4.2649
0.999999   4.7534
1.0        ∞
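The relation between the probability of non-failure and the safety factor can be evaluated directly with the Python standard library. The sketch assumes a normally distributed safety margin, so that P is the standard normal cumulative distribution function evaluated at SF; the function names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # zero mean, unit standard deviation


def safety_factor_for(p_non_failure):
    """SF required for a given probability of non-failure (normal margin)."""
    return std_normal.inv_cdf(p_non_failure)


def non_failure_for(sf):
    """Probability of non-failure delivered by a given SF (normal margin)."""
    return std_normal.cdf(sf)


for p in (0.999, 0.9999, 0.99999, 0.999999):
    print(f"P = {p:.6f}  SF = {safety_factor_for(p):.4f}")
```

Each additional “nine” in the required probability costs a progressively smaller increment in SF, which is one reason the safety factor is a more convenient design variable than the probability itself.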
Safety factor-4
• SF establishes both the upper limit of the reliability characteristic of interest (through the mean value of the corresponding margin of safety) and the accuracy with which this characteristic is defined (through the corresponding standard deviation).
• The structure of the SF indicates that it is acceptable that a system characterized by a high mean value of the safety margin (i.e., a system whose bearing capacity with respect to a certain stress/reliability-characteristic, not necessarily mechanical, is significantly higher than the level of loading) has a less accurately defined deviation from this mean value than a system characterized by a low mean value of the safety margin (i.e., a system whose bearing capacity is much closer to the possible level of loading). In other words, the uncertainty in the evaluation of the safety margin should be smaller for a more vulnerable design.
Safety factor (SF) and coefficient of variability (COV)
• The safety factor (SF) is the reciprocal of the coefficient of variability (COV). The latter is defined as the ratio of the standard deviation to the mean value of the random variable of interest.
• While the COV is the characteristic of uncertainty of the random variable of interest, the SF is the characteristic of certainty of the random parameter (stress-at-failure, the highest possible temperature, the ultimate displacement, the affected area, etc.) that is responsible for the non-failure of the item.
• If the reliability characteristic of interest (for a non-repairable item) is a random variable that is determined by just two independent non-random quantities (say, the mean value and the standard deviation), then the safety factor, SF, determines completely the probability of non-failure (reliability): the larger the SF is, the higher is the probability of non-failure.
Time-to-failure (TTF), MTTF and the corresponding SF
• Usually the capacity (strength), C, and/or the demand (loading), D, change in time. Failure occurs when the demand (loading), D, becomes equal to or larger than the bearing capacity (strength), C, of the item. The moment at which this random event takes place is the time-at-failure (TAF), and the duration of operation until this moment is the random variable known as time-to-failure (TTF).
• Thus, TTF is the time from the beginning of operation until the moment of time when the demand (loading) D becomes equal to or higher than the bearing capacity C, i.e., when the safety margin $\Psi = C - D$ becomes zero or negative.
• The corresponding safety factor, SF, is the ratio of the MTTF to the STD of the TTF:
SF=MTTF/STD
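Mean time-to-failure and reliability function
Mean-time-to-failure (MTTF) is the mean time of the item operation until it fails. Hence, it can be computed as
$$\langle t\rangle = \int_0^\infty t f(t)\,dt.$$
Since
$$f(t) = -\frac{dR(t)}{dt},$$
we have (using integration by parts):
$$\langle t\rangle = \int_0^\infty t f(t)\,dt = -\int_0^\infty t\frac{dR(t)}{dt}\,dt = -\left[tR(t)\right]_0^\infty + \int_0^\infty R(t)\,dt = \int_0^\infty R(t)\,dt,$$
and the variance of the TTF can be found as
$$D_t = \int_0^\infty (t - \langle t\rangle)^2 f(t)\,dt = \int_0^\infty t^2 f(t)\,dt - \langle t\rangle^2 = 2\int_0^\infty tR(t)\,dt - \langle t\rangle^2.$$
The corresponding SF is
$$SF = \delta = \frac{MTTF}{STD} = \frac{\langle t\rangle}{\sqrt{D_t}}.$$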
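These integrals are easy to evaluate numerically for any reliability function. The sketch below uses the midpoint rule on a truncated time axis; the truncation point, the step count, the failure rate and the function name are assumptions chosen for the illustration, and the exponential law is used only because its answers are known.

```python
import math


def mttf_and_sf(reliability, t_max, steps=200_000):
    """Return (<t>, SF), with <t> = int R dt and D_t = 2 int t R dt - <t>^2."""
    h = t_max / steps
    mean = 0.0
    second_moment = 0.0
    for i in range(steps):
        t = (i + 0.5) * h               # midpoint of the i-th sub-interval
        r = reliability(t)
        mean += r * h
        second_moment += 2.0 * t * r * h
    variance = second_moment - mean ** 2
    return mean, mean / math.sqrt(variance)


lam = 1.0e-3    # assumed constant failure rate, 1/hour
mttf_exp, sf_exp = mttf_and_sf(lambda t: math.exp(-lam * t), t_max=40.0 / lam)
```

For the exponential law the routine returns an MTTF close to 1/λ and a safety factor close to one; replacing the lambda with another reliability function gives the corresponding values for that law, provided `t_max` is large enough for R(t) to have decayed.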
Example #1
As a simple example, examine a device whose MTTF, τ, during steady-state operation is described by the Boltzmann-Arrhenius equation
$$\tau = \tau_0\exp\left(\frac{U}{kT}\right).$$
The failure rate is therefore
$$\lambda = \frac{1}{\tau} = \frac{1}{\tau_0}\exp\left(-\frac{U}{kT}\right).$$
If the Weibull law is used to predict the probability of failure, then the probability of non-failure (dependability) can be evaluated on the basis of the following probability distribution function:
$$P = \exp\left[-(\lambda t)^\beta\right] = \exp\left[-\left(\frac{t}{\tau_0}\right)^\beta\exp\left(-\frac{\beta U}{kT}\right)\right],$$
where β is a shape parameter. Solving this equation for the absolute temperature T, we obtain:
$$T = -\frac{U/k}{\ln\left[\dfrac{\tau_0}{t}\left(-\ln P\right)^{1/\beta}\right]}.$$
Example #1 (cont)
Let, for the given type of failure (say, surface charge accumulation), the ratio U/k be $U/k = 11600$ K, the $\tau_0$ value predicted on the basis of the ALT be $\tau_0 = 5\times10^{-8}$ hours, and the shape parameter β turn out to be close to β=2 (Rayleigh distribution). Let the allowable (specified) probability of failure at the end of the device’s service time of, say, t=40,000 hours be $Q = 10^{-5}$ (it is acceptable that one out of a hundred thousand devices fails). Then the above formula indicates that the steady-state operation temperature should not exceed T = 349.8 K = 76.8 °C, and the thermal management tools should be designed accordingly. This rather elementary example gives a feeling of how the PDfR concept works and what kind of information one could expect using it.
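The arithmetic of Example #1 can be reproduced in a few lines. This is a sketch of the calculation described above, with the inputs of the example (U/k = 11600 K, τ0 = 5×10⁻⁸ hours, β = 2, t = 40,000 hours, Q = 10⁻⁵); the function and argument names are illustrative.

```python
import math


def max_operating_temperature(u_over_k, tau0, beta, t, q_allowed):
    """Solve P = exp(-(t / tau)^beta), tau = tau0 * exp(U / (k T)), for T in kelvin."""
    p = 1.0 - q_allowed                        # required probability of non-failure
    arg = (tau0 / t) * (-math.log(p)) ** (1.0 / beta)
    return -u_over_k / math.log(arg)


T = max_operating_temperature(u_over_k=11600.0, tau0=5.0e-8,
                              beta=2.0, t=40000.0, q_allowed=1.0e-5)
print(f"maximum steady-state temperature: {T:.1f} K")
```

Because the temperature enters through a logarithm, the result is only mildly sensitive to the allowable probability of failure: tightening `q_allowed` by an order of magnitude lowers the admissible temperature by a few kelvin rather than by a large factor.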
Example #2
Let, for instance, the absolute temperature T be distributed in accordance with the Rayleigh law, so that the probability that a certain level $T_*$ is exceeded is determined as
$$P(T > T_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right),$$
where $T_0$ is the most likely value of the absolute temperature T. Then, using the Boltzmann-Arrhenius relationship
$$\tau = \tau_0\exp\left(\frac{U_a}{kT}\right),$$
we conclude that the probability that the random MTTF τ (“random”, because of the uncertainty in the level of the most likely temperature) is below a certain level $\tau_*$ (probability of failure is defined in this case as the probability that the specified level $\tau_*$ is not achieved) can be found as
$$P(\tau < \tau_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right) = \exp\left[-\left(\frac{U_a}{kT_0\ln\dfrac{\tau_*}{\tau_0}}\right)^2\right].$$
Example #2 (cont)
Solving this equation for the most likely (specified) value $T_0$, we find:
$$T_0 = \frac{U_a}{k\,\ln\dfrac{\tau_*}{\tau_0}\,\sqrt{-\ln P}}.$$
This formula indicates how the (most likely) level of the device temperature should be established, so that the probability that the specified level $\tau_*$ of the MTTF is not achieved is sufficiently low.
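A short sketch of this calculation follows. It evaluates the most likely temperature for a prescribed probability and then substitutes it back as a consistency check. The input values are hypothetical (they reuse the orders of magnitude of Example #1), and the function names are illustrative.

```python
import math


def most_likely_temperature(ua_over_k, tau0, tau_star, p_allowed):
    """T0 = (Ua/k) / (ln(tau*/tau0) * sqrt(-ln P))."""
    return ua_over_k / (math.log(tau_star / tau0) * math.sqrt(-math.log(p_allowed)))


def prob_mttf_below(ua_over_k, tau0, tau_star, t0):
    """P(tau < tau*) = exp(-(T*/T0)^2), with T* = (Ua/k) / ln(tau*/tau0)."""
    t_star = ua_over_k / math.log(tau_star / tau0)
    return math.exp(-(t_star / t0) ** 2)


# Hypothetical inputs: Ua/k = 11600 K, tau0 = 5e-8 h, tau* = 40000 h, P = 1e-5.
T0 = most_likely_temperature(11600.0, 5.0e-8, 40000.0, 1.0e-5)
P_check = prob_mttf_below(11600.0, 5.0e-8, 40000.0, T0)   # should return 1e-5
```

The single-parameter Rayleigh law has a long upper tail, so a very low allowable probability forces the most likely temperature well below the critical level T*.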
Reliability of repairable items
• Reliability of complex items (products) depends not only on their dependability, but on their repairability as well.
• It is important that the products are designed in such a way that their gradual and potential failures could be easily detected and eliminated in due time, and that the detected damages (defects), such as, say, fatigue cracks, could be removed before a catastrophic failure process commences.
• The reliability of complex products is characterized, first of all, by their availability, which is defined as the ability of an item (system) to perform its required function at the given time or over a stated period of time, with consideration of its dependability, repairability, maintainability and maintenance support.
• A high level of reliability of complex products can be achieved by employing the most feasible combination of dependability, on the one hand, and repairability, maintainability and maintenance support, on the other.
Availability index-1
• The non-steady-state (time dependent) operational availability index $K(t)$ is defined as the probability that the item of interest will be available to the user at the given moment T of time and will operate failure-free during the given time beginning with the moment t.
• The steady-state availability index K is the time-independent probability that the item will operate (will be available) failure-free during the time T, beginning with an arbitrary moment t of time that is sufficiently remote from the beginning of operations (so that the “infant mortality” portion of the “bathtub” curve is excluded).
• The most often used availability characteristic of the Class II and Class III items, whose normal operation includes regular repairs (say, workstations or other complex and expensive electronic systems), is the availability index $K_a$ defined as the steady-state probability that the item will be available at an arbitrary moment of time taken between the preplanned preventive maintenance activities.
Availability index-2
• The availability index $K_a$ can be computed by the formula
$$K_a = \frac{1}{1 + \sum_{i=1}^{n}\dfrac{t_i^r}{t_i^f}},$$
where $t_i^f$ is the mean time between successive failures for the i-th item in the system, and $t_i^r$ is the mean-time-to-repair for this item.
• The index $K_a$ indicates the percentage of time during which the system is in the working (available) condition.
• The use of the index $K_a$ enables one to make assessments of the unforeseen idle times and to consider these times at early stages of the design of the product.
Operational Availability Index
• The operational availability index $K(t)$ can be calculated, for situations when the probability of failure-free operation during the time interval t is independent of the beginning of this interval, by the formula
$$K(t) = K_a R(t),$$
where R(t) is the dependability of the item.
• This formula determines the probability that two events take place:
1) the item is available at an arbitrary moment of time, with the probability $K_a$, and
2) it will operate failure-free during the time period of the duration t.
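Both availability formulas are straightforward to evaluate. The sketch below does so for a hypothetical three-item system; the times, the failure rate, the function names and the choice of an exponential dependability R(t) are assumptions made only for the illustration.

```python
import math


def availability_index(mtbf, mttr):
    """K_a = 1 / (1 + sum_i t_i^r / t_i^f)."""
    if len(mtbf) != len(mttr):
        raise ValueError("mtbf and mttr must have the same length")
    return 1.0 / (1.0 + sum(tr / tf for tf, tr in zip(mtbf, mttr)))


def operational_availability(k_a, failure_rate, t):
    """K(t) = K_a * R(t), here with an assumed exponential R(t)."""
    return k_a * math.exp(-failure_rate * t)


# Hypothetical three-item system, all times in hours.
k_a = availability_index(mtbf=[1000.0, 2000.0, 4000.0], mttr=[10.0, 10.0, 20.0])
k_t = operational_availability(k_a, failure_rate=1.0e-3, t=100.0)
```

With these inputs the repair-to-failure ratios sum to 0.02, so the system is available about 98% of the time, and the probability of being available and then running failure-free for 100 hours is correspondingly lower.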
Session IV
7. GENERAL PDfR APPROACH
USING PROBABILITY DENSITY FUNCTIONS (PDF)
“Education is man’s going forward from cocksure ignorance to thoughtful uncertainty”,
Donald B. Clark, Australian author, “Scrapbook”
“There are things in this world, far more important than the most splendid discoveries –
It is the methods by which they were made”
Gottfried Leibnitz, German mathematician
PDfR Characteristics
• The appropriate electrical, optical, mechanical, thermal, and other physical characteristics that determine the functional performance, mechanical (physical/structural) reliability and/or environmental durability of the design/device/apparatus of interest should be established.
• Examples are: appropriate electrical parameters (current, voltage, etc.), light output, heat transfer capability, mechanical ultimate and fatigue strength, fracture toughness, maximum and/or minimum temperatures, maximum accelerations/decelerations, etc.
Factors that affect the PDfR characteristics-1
• Establish the electrical, optical, mechanical, thermal, environmental and other possible (say, human) stress (loading) factors (conditions) that might affect the reliability characteristics, i.e., characteristics that determine (affect) the short- and long-term reliability of the object (structure) of interest.
• Examples are: high and/or low temperatures, high electrical current or voltage, electrical and/or optical properties of materials, mechanical and thermal stresses, displacements, maximum temperatures, size of the affected areas, etc.
• This should be done separately for each characteristic of interest and, if necessary, for each manufacturing process and for different phases of manufacturing, testing and/or operations.
Factors that affect the PDfR characteristics-2
• Based on the physical nature of the particular environmental/loading factor (electrical, optical, mechanical, environmental) and on the available information about it, establish if this factor should be treated as a non-random (deterministic) value, or should/could be treated as a random variable with the given (assumed) probability distribution function.
• At this stage one could treat random characteristics of interest as nonrandom functions of random factors, and establish the probability distribution functions for the random factors using experimental data, and/or Monte-Carlo simulations, and/or finite-element analyses (FEA), and/or evaluations based on analytical (“mathematical”) modeling, etc.
Factors that affect the PDfR characteristics-3
Let, for instance, the absolute temperature T be distributed in accordance with the Rayleigh law, so that the probability that a certain level $T_*$ is exceeded is determined as
$$P(T > T_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right),$$
where $T_0$ is the most likely value of the absolute temperature T. Then, using the Boltzmann-Arrhenius relationship
$$\tau = \tau_0\exp\left(\frac{U_a}{kT}\right),$$
we conclude that the probability that the random mean-time-to-failure τ (“random”, because of the uncertainty in the level of the most likely temperature) is below a certain level $\tau_*$
Factors that affect the PDfR characteristics-4
(probability of failure that is defined in this case as the probability $P(\tau < \tau_*)$ that the specified level is not achieved) can be found as
$$P(\tau < \tau_*) = \exp\left(-\frac{T_*^2}{T_0^2}\right) = \exp\left[-\left(\frac{U_a}{kT_0\ln\dfrac{\tau_*}{\tau_0}}\right)^2\right].$$
Solving this equation for the $T_0$ value, we find:
$$T_0 = \frac{U_a}{k\,\ln\dfrac{\tau_*}{\tau_0}\,\sqrt{-\ln P}}.$$
This formula indicates how the (most likely) level of the device temperature should be established, so that the probability that the specified level $\tau_*$ of the MTTF is not achieved is sufficiently low.
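Treating the MTTF as a non-random function of a random temperature also lends itself to a Monte-Carlo cross-check. The sketch below samples the temperature from the Rayleigh-type law used above by inverse-transform sampling and compares the sampled frequency with the closed-form probability. All numerical inputs, the sample size, the seed and the function names are assumptions; the most likely temperature is chosen so that the probability is large enough to be estimated from a moderate sample.

```python
import math
import random


def analytic_prob_mttf_below(ua_over_k, tau0, tau_star, t0):
    """P(tau < tau*) = exp(-(T*/T0)^2), with T* = (Ua/k) / ln(tau*/tau0)."""
    t_star = ua_over_k / math.log(tau_star / tau0)
    return math.exp(-(t_star / t0) ** 2)


def simulated_prob_mttf_below(ua_over_k, tau0, tau_star, t0, n=200_000, seed=1):
    """Monte-Carlo estimate of P(tau < tau*) for randomly distributed T."""
    rng = random.Random(seed)
    ln_ratio = math.log(tau_star / tau0)
    hits = 0
    for _ in range(n):
        u = 1.0 - rng.random()                  # uniform on (0, 1]
        temp = t0 * math.sqrt(-math.log(u))     # inverts P(T > x) = exp(-x^2 / T0^2)
        # tau < tau*  <=>  Ua / (k T) < ln(tau* / tau0); comparing logs avoids overflow.
        if temp > 0.0 and ua_over_k / temp < ln_ratio:
            hits += 1
    return hits / n


# Hypothetical inputs: Ua/k = 11600 K, tau0 = 5e-8 h, tau* = 40000 h, T0 = 300 K.
args = dict(ua_over_k=11600.0, tau0=5.0e-8, tau_star=40000.0, t0=300.0)
p_formula = analytic_prob_mttf_below(**args)
p_sampled = simulated_prob_mttf_below(**args)
```

The same sampling loop works when the temperature follows a law for which no closed-form answer is available, which is where the Monte-Carlo route becomes useful.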
Choose appropriate basic probability distributions-1
• After the reliability characteristics are established and the factors affecting these characteristics are selected, one should choose the adequate probability distributions for the factors (conditions) that affect the short- and long-term reliability characteristics.
• For those factors (conditions) that should be treated as random variables, establish (accept) the physically meaningful probability distribution laws.
• When the actual experimental information is not available, assume, based on general physical considerations, the most suitable (or the most conservative) laws of the probability distribution (e.g., uniform, exponential, normal, Weibull, Rayleigh, etc.).
Choose appropriate basic probability distributions-2
Here are some general considerations that can be used in practical applications.
• Since the exponential distribution has the largest entropy (the largest uncertainty) of all the distributions with the same mean, this distribution should be considered if no other information, except the expected (mean) value, is available. The exponentially distributed random variable is always positive. The safety factor for an exponentially distributed random variable is always “one”.
• If the random process of failures can be treated as a simple Poisson flow with a constant intensity, then the time interval between two adjacent consecutive failures has an exponential distribution. The most likely value of the exponentially distributed random variable, t, is at the initial moment of time t=0.
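The two statements about the exponential law (a safety factor of one and a most likely value at t=0) follow from its first two moments. A short derivation, written with the failure rate λ as the single parameter of the distribution (a notation assumed here for consistency with the reliability-function formulas):

```latex
f(t) = \lambda e^{-\lambda t}, \qquad t \ge 0

\langle t \rangle = \int_0^\infty t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}

D_t = \int_0^\infty t^2\,\lambda e^{-\lambda t}\,dt - \langle t \rangle^2
    = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}

SF = \frac{\langle t \rangle}{\sqrt{D_t}} = \frac{1/\lambda}{1/\lambda} = 1

\frac{df(t)}{dt} = -\lambda^2 e^{-\lambda t} < 0
\quad\Rightarrow\quad f(t) \text{ is largest at } t = 0
```

Since the result does not depend on λ, no choice of the mean can raise the safety factor of an exponentially distributed variable above one.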
Choose appropriate basic probability distributions-3
• If the physical nature of a random environmental factor is such that it can be only
positive (e.g., acceleration during take-off of an aircraft, or a current in an
electronic module) or only negative (e.g., deceleration during landing or during
drop tests of a cell phone), its most likely value is certainly non-zero.
• If only this value (or the mean) is available, then the Rayleigh law could be
employed. Like the exponential law, this is a single-parameter law.
• The safety factor, when the Rayleigh distribution is used, is always
$$\delta = \frac{1}{\sqrt{1 + 4/\pi}} = 0.6633$$
Choose appropriate basic probability distributions-4
• If a normally distributed random variable has a finite variance and zero mean, and
changes periodically with a constant or next-to-constant frequency, but with a
random amplitude and random phase angle, then these amplitudes and the
corresponding energies obey the Rayleigh law of distribution.
• If the expected (mean) value and the variance are known, and the physical nature
of the random environmental factor is such that the probability density function is
symmetric with respect to the mean value (which coincides with the median and
the most likely value), then the normal distribution should be accepted, especially
(but not necessarily) if the random variable can be either positive or negative.
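The statement about Rayleigh-distributed amplitudes can be checked numerically. This is a minimal sketch, assuming the narrow-band process is represented by two independent zero-mean normal quadrature components with the same standard deviation sigma (a common textbook representation, not taken from the slides). Under that assumption the amplitude follows the Rayleigh law, whose mean is sigma·sqrt(π/2). The value sigma = 2 and the function names are illustrative.

```python
import math
import random
import statistics


def random_amplitudes(sigma, n, seed=0):
    """Amplitudes built from two independent zero-mean normal quadrature components."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def rayleigh_mean(sigma):
    """Theoretical mean of a Rayleigh variable with scale parameter sigma."""
    return sigma * math.sqrt(math.pi / 2.0)


sigma = 2.0  # hypothetical standard deviation of each quadrature component
amplitudes = random_amplitudes(sigma, n=100_000)
sample_mean = statistics.fmean(amplitudes)
print(f"sample mean = {sample_mean:.3f}, Rayleigh mean = {rayleigh_mean(sigma):.3f}")
```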
Choose appropriate basic probability distributions-5
• It is noteworthy that if the safety factor, defined as the ratio of the mean value of
the safety margin to its standard deviation, is significant (which is typically the
case), then application of the normal law of the distribution of the safety factor is
acceptable: its negative values, although possible in principle, are
characterized by negligibly low probabilities and need not be considered.
• If the expected (mean) value and the variance are known, and the physical nature
of the random environmental factor is such that the probability density function is
highly asymmetric (skewed) with respect to its mean or its most likely value,
then the Weibull distribution, the distribution of the absolute value of a normal
random variable, a truncated normal distribution, or a log-normal distribution
can be used.
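To put a number on "negligibly low probabilities" in the first statement above, here is a small sketch, assuming the quantity in question is exactly normal with a mean-to-standard-deviation ratio delta. The probability of a negative value is then the standard normal cumulative distribution evaluated at -delta. The delta values in the loop are hypothetical.

```python
import math


def probability_of_negative_value(delta):
    """P(X < 0) for a normal X whose mean-to-standard-deviation ratio is `delta`."""
    # Standard normal CDF at -delta, written with the complementary error function.
    return 0.5 * math.erfc(delta / math.sqrt(2.0))


for delta in (1.0, 2.0, 3.0, 5.0):  # hypothetical safety factors
    print(f"delta = {delta:.0f}: P(X < 0) = {probability_of_negative_value(delta):.2e}")
```

The probability falls off very quickly with delta, which is why the negative tail can usually be ignored once the ratio is a few units or more.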
Establish appropriate cumulative probability distributions-1
• Treating each reliability characteristic of interest as a non-random function
(output) of a random argument (input) due to a particular external or internal
factor, evaluate the probability density function of this characteristic for the
assumed (accepted, determined) law of the probability distribution of the
environmental factor.
• Time could enter as an independent parameter into the computed response.
• For some factors, the input could be considered as a non-random (deterministic)
value.
Establish appropriate cumulative probability distributions-2
• Determine the cumulative probability distribution functions for all the probability
density functions that affect the given mechanical or thermal characteristic of
interest.
• Such a convolution of the constituent laws of distribution considers, in the most
accurate and non-conservative way, the probabilistic input of each of the
environmental parameters that affect the particular mechanical, electrical, optical
or thermal characteristic.
• Cumulative distributions consider the likelihood that the maxima of different
important factors might not occur simultaneously.
Establish appropriate cumulative probability distributions-3
• If the number of random variables does not exceed two, the convolution could be
carried out analytically.
• If the number of random variables is three or more, one should “teach” a
computer how to obtain a cumulative law of distribution.
• Since the above distributions are based on the transient responses of the
mechanical (thermal) characteristics of interest to the time-dependent
environmental excitations (parameters), these distributions determine the
probability that at the given moment of time the given characteristic is
below/above the given value of this characteristic.
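For three or more random factors, one common way to "teach" a computer to obtain the cumulative distribution is Monte Carlo simulation. The sketch below shows only that one option, under stated assumptions: the response function (a plain sum) and the three uniform input factors are hypothetical placeholders for a real response model and for the actual distributions of the loads.

```python
import random


def sample_response(rng):
    """Hypothetical response: sum of three independent factors, each uniform on [0, 1)."""
    thermal = rng.random()
    mechanical = rng.random()
    electrical = rng.random()
    return thermal + mechanical + electrical


def empirical_cdf(samples, level):
    """Fraction of simulated responses that do not exceed `level`."""
    return sum(1 for s in samples if s <= level) / len(samples)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
responses = [sample_response(rng) for _ in range(100_000)]
for level in (1.0, 1.5, 2.0, 2.5):
    print(f"P(response <= {level}) = {empirical_cdf(responses, level):.4f}")
```

In this toy model each factor alone exceeds 5/6 of its range one time in six, yet the sum exceeds 2.5 (5/6 of its maximum of 3) only about 2% of the time, which illustrates why the maxima of different factors should not simply be added.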
Probabilistic reliability criteria
Determine for each point of time, after the given duration of operation (mission):
• the safety factors and other reliability criteria for the characteristics that
determine the performance, reliability, durability and safety of the system,
• the probability of non-failure, P(t), for the established (accepted) safety factor, at
each point of time, and
• the mean time-to-failure, MTTF, for the established (accepted) safety factor; the
standard deviation, STD, of the time-to-failure; and the safety factor SF = MTTF/STD
for the time-to-failure.
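The criteria listed above can be estimated from a set of times-to-failure. The following is a minimal sketch, not the author's procedure: the times-to-failure are synthetic (drawn from a Weibull law with an assumed scale of 1000 h and shape of 2), and the function names are illustrative.

```python
import random
import statistics


def reliability_summary(times_to_failure):
    """Return (MTTF, STD, SF) with SF = MTTF / STD."""
    mttf = statistics.fmean(times_to_failure)
    std = statistics.stdev(times_to_failure)
    return mttf, std, mttf / std


def probability_of_non_failure(times_to_failure, t):
    """Empirical P(t): fraction of units that have not failed by time t."""
    return sum(1 for ttf in times_to_failure if ttf > t) / len(times_to_failure)


# Hypothetical data: Weibull times-to-failure, scale 1000 h, shape 2.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
ttf_hours = [rng.weibullvariate(1000.0, 2.0) for _ in range(50_000)]

mttf, std, sf = reliability_summary(ttf_hours)
print(f"MTTF = {mttf:.1f} h, STD = {std:.1f} h, SF = {sf:.3f}")
for t in (250.0, 500.0, 1000.0, 2000.0):
    print(f"P({t:.0f} h) = {probability_of_non_failure(ttf_hours, t):.4f}")
```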
8. Twelve steps to be conducted
to add value to the existing practice
“The man who removes a mountain begins by carrying away small stones”
Chinese saying
“Give me a fruitful error any time, full of seeds, bursting with
its own corrections. You can keep your sterile truth for yourself”
Vilfredo Pareto, Italian engineer, sociologist, and economist
Some important preliminary steps
• Establish, as the manufacturer of a particular product, the list of possible failures and
suitable failure criteria, as far as the functional, mechanical (physical) and
environmental failures are concerned.
• Find out the similar requirements that the customer specifies (desires) regarding
lifetimes (minimum and mean time to failure), failure rates (considering, for a particular
product, if necessary, the wear-out portion of the bath-tub curve), probability of failure
(for non-reparable products), availability specifications, etc.
• Identify active and passive parts, reparable and non-reparable parts, the most
vulnerable (least reliable) parts (e.g., solder joint interconnections, materials prone to
creep or aging), the feasibility of introducing redundancy, etc.
• As a customer, evaluate the ability of a particular manufacturer to make parts with
consistent quality, and, as a manufacturer, establish your company’s ability to
produce such parts.
Twelve steps to be conducted to add value to the existing practice-1
1) Develop a detailed list of possible electrical, mechanical (structural), thermal, and
environmental failures that should be considered, in one way or another, in the
particular design (package, inverter, module, structure, etc.)
2) Make, based on the existing experience and best practices, the preliminary decision on
the materials and geometries in the physical design and packaging of the product and
its units/subunits/assemblies
3) Conduct predictive modeling (using FEA or other simulation packages, as well as
analytical ("mathematical") modeling wherever possible) of the stresses and other failure
criteria (say, elevated temperatures or electrical characteristics), considering steady-state
and transient thermal, stress/strain and electrical fields
4) Consider possible loading in actual use conditions (electrical, thermal, mechanical,
dynamic, as well as their combinations) and distinguish between short-term high-
level loading (related to the ultimate strength of the structure) and long-term low-level
loading (related to the fatigue strength of the structure)
Twelve steps to be conducted to add value to the existing practice-2
5) Review the existing qualification standards for similar structures, bearing in mind,
however, that these standards were designed for conditions (power, geometry,
materials, use) that are similar to, but different from, those we will be dealing with; come
up with the preliminary level of acceptable stresses, accelerations, temperatures,
voltages, currents, etc.
6) Having in mind FOAT procedures, decide on the constitutive relationships (formulas,
FEA procedures, plots) that govern the failure mechanisms in question (Arrhenius-type
equations for high-temperature "baking", Miner-type for materials that are
expected to work within the elastic range, Erdogan-Paris-type for brittle materials,
etc.)
7) Design, conduct and interpret the results of the FOAT and, based on this testing,
predict the reliability characteristics of the assemblies, joints, subunits and units of
interest
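Two of the constitutive relationships named in step 6 can be sketched in a few lines. The sketch uses the standard textbook forms: the Arrhenius acceleration factor between a use temperature and a test temperature, and Miner's linear damage sum. The activation energy of 0.7 eV, the temperatures and the cycle counts are hypothetical values chosen for illustration and are not taken from the lecture.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, test_temp_c):
    """Ratio of the reaction rate at the test temperature to that at the use temperature."""
    use_k = use_temp_c + 273.15
    test_k = test_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / test_k)
    return math.exp(exponent)


def miner_damage(blocks):
    """Linear damage sum over (applied_cycles, cycles_to_failure) pairs; failure is predicted at 1."""
    return sum(applied / to_failure for applied, to_failure in blocks)


# Hypothetical inputs: 0.7 eV, 55 C in use, 125 C in the "baking" test.
af = arrhenius_acceleration_factor(0.7, use_temp_c=55.0, test_temp_c=125.0)
# Hypothetical loading history: two stress levels with their fatigue lives.
damage = miner_damage([(100, 1000), (200, 400)])
print(f"acceleration factor = {af:.1f}, accumulated damage = {damage:.2f}")
```

An acceleration factor obtained this way is only as good as the assumed activation energy, which is one reason the FOAT in step 7 is needed to confirm the governing failure mechanism.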
Twelve steps to be conducted to add value to the existing practice-3
8) Based on the obtained information, the state of the art in the area in question and the
requirements of the existing specifications, decide on the allowable (acceptable)
values of the characteristics of failure, with consideration of the economically and
technically feasible lifetime of the module and its major subassemblies
9) Write the first draft of the qualification specs (in other words, revise, if necessary, the
existing JEDEC specs) for the module and its units/subunits of interest
10) Develop root cause analysis (RCA) methodologies
11) Decide on the burn-in conditions and establish an adequate service for collecting field
failures
12) Conduct, on a permanent basis, revisions of the designs and the reliability
specifications.
9. DO ELECTRONIC INDUSTRIES
NEED NEW APPROACHES
TO QUALIFY THEIR DEVICES INTO PRODUCTS?
“I do not need an everlasting pen. I do not intend to live forever”
I. Ilf and E. Petrov, “The Golden Calf” (in Russian)
“It is always better to be approximately right than precisely wrong”
Unknown Reliability Manager
Nobody and nothing is perfect:
probability of failure is never zero
• It should be widely recognized that the probability of a failure is never zero, but could
be predicted and, if necessary, controlled and maintained at an acceptably low level.
• One effective way to achieve this is to implement the existing PRM methods and
approaches and to develop adequate PDfR methodologies.
• These methodologies should be based mostly on FOAT and on a widely employed
predictive modeling effort.
• FOAT should be carried out in a relatively narrow but highly focused and time-
effective fashion for the most vulnerable elements of the design of interest.
• If the QT has a solid basis in FOAT, PM and PDfR, then there is reason to believe that
the product of interest will be sufficiently robust in the field.
QT could be viewed as “quasi-FOAT”
• The QT could be viewed as “quasi-FOAT”, a sort of “initial stage of FOAT” that
more or less adequately replicates the initial non-destructive, yet full-scale, stage of
FOAT.
• We believe that such an approach to qualifying devices into products will enable industry
to specify, and manufacturers to assure, a predicted and sufficiently low probability
of failure for a device that passed the QT and will be operated in the field under the
given conditions for the given time.
• We expect that the suggested approach to the DfR and QT will be accepted by the
engineering and manufacturing communities, implemented in engineering
practice and adequately reflected in future editions of the QT specifications and
methodologies.
The PDfR-based QT will still be non-destructive
• Such QTs could be designed, therefore, as a sort of mini-FOAT that, unlike the actual,
“full-scale” FOAT, is non-destructive and conducted on a limited scale.
• The duration and conditions of such a “mini-FOAT” QT should be established based on
the observed and recorded results of the actual FOAT, and should be limited to the
stage at which no failures were observed in the actual full-scale FOAT.
• Prognostics and health management (PHM) technologies (such as “canaries”) should
be concurrently tested to make sure that the safe limit is not exceeded.
What should be done differently
• It is important to understand the reliability physics that underlies the mechanisms and
modes of failure in electronics and photonics components and devices.
• FOAT should be thoroughly implemented, so that the QT is based on the FOAT
information and data.
• The PDfR concept should be widely employed.
• FOAT cannot do without predictive modeling: such modeling, both
computer-aided and analytical (“mathematical”), plays an essential role in making the
suggested new approach to product qualification practical and successful.
10. CONCLUDING REMARKS
“Life is the art of drawing sufficient conclusions from insufficient premises”
Samuel Butler, British poet and satirist, “The Way of All Flesh”
Conclusions-1
• Improvements in the existing QT, as well as in the existing best QT practices, are
indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept
is thoroughly developed and the corresponding methodologies are employed.
• One effective way to improve the existing QT and specs is to
  • conduct, on a wide scale, Failure Oriented Accelerated Testing (FOAT) at the design
  stage (DFOAT) and at the manufacturing stage (MFOAT), and, since DFOAT cannot do
  without PM,
  • carry out, whenever and wherever possible, predictive modeling (PM) to understand
  the physics of failure and to accumulate, when appropriate, failure statistics;
  • revisit, review and revise the existing QT and specs considering the DFOAT and, to a
  lesser extent, MFOAT data for the most vulnerable elements of the device of interest;
  • develop and widely implement the PDfR methodologies, having in mind that “nobody
  and nothing is perfect”, that the probability of failure is never zero, but could be predicted
  and, if necessary, controlled and maintained during operation at an acceptably low
  level.
Conclusions-2
• We believe that our new approach to the qualification
of electronic devices will enable industry to
specify and the manufacturers to assure a predicted
and low enough probability of failure for a device that
passed the qualification specifications and will be
operated under the given stress (not necessarily
mechanical) conditions for the given time.
• We expect that eventually the suggested new
approaches to the DfR and QT will be accepted by the
engineering and manufacturing communities,
implemented in a timely fashion into the engineering
practice and be adequately reflected in the future
editions of the qualification specifications and
methodologies.
© 2009
Thank you for taking my course