Distributions

All the statistical distributions and their implementation in statistical analysis.

Probability Distributions

Probability distributions are a fundamental concept in statistics. They are used both on a theoretical level and a practical level.

Some practical uses of probability distributions are:

To calculate confidence intervals for parameters and to calculate critical regions for hypothesis tests. For univariate data, it is often useful to determine a reasonable distributional model for the data. Statistical intervals and hypothesis tests are often based on specific distributional assumptions.

Before computing an interval or test based on a distributional assumption, we need to verify that the assumption is justified for the given data set. In this case, the distribution does not need to be the best-fitting distribution for the data, but an adequate enough model so that the statistical technique yields valid conclusions.

Simulation studies often require random numbers generated from a specific probability distribution.

Related Distributions

Probability distributions are typically defined in terms of the probability density function. However, there are a number of probability functions used in applications.

Probability Density Function

For a continuous distribution, the probability density function (pdf) gives the probability density of the variate at the value x. Since for continuous distributions the probability at a single point is zero, the pdf is usually expressed in terms of an integral between two points:

$$\int_a^b f(x)\,dx = \Pr[a \le X \le b]$$

For a discrete distribution, the pdf is the probability that the variate takes the value x:

$$f(x) = \Pr[X = x]$$

The following is the plot of the normal probability density function.
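As a rough illustration of the integral form of the pdf, the following Python sketch computes the probability that a standard normal variate falls in an interval in two ways: by numerically integrating the pdf, and by differencing the cumulative distribution function. It uses `statistics.NormalDist` from the standard library; the endpoints `a` and `b` and the helper `integrate_pdf` are illustrative choices, not part of the original text.

```python
from statistics import NormalDist

def integrate_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

std_normal = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 1.0  # illustrative interval

# Probability of landing in [a, b], computed two ways.
area_under_pdf = integrate_pdf(std_normal, a, b)
cdf_difference = std_normal.cdf(b) - std_normal.cdf(a)

print(f"integral of pdf: {area_under_pdf:.6f}")
print(f"cdf difference:  {cdf_difference:.6f}")
```

The two numbers should agree to several decimal places, which is the sense in which a density only yields probabilities once it is integrated over an interval.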

Cumulative Distribution Function

The cumulative distribution function (cdf) is the probability that the variable takes a value less than or equal to x. That is,

$$F(x) = \Pr[X \le x] = \alpha$$

For a continuous distribution, this can be expressed mathematically as

$$F(x) = \int_{-\infty}^{x} f(\mu)\,d\mu$$

For a discrete distribution, the cdf can be expressed as

$$F(x) = \sum_{i=0}^{x} f(i)$$

The following is the plot of the normal cumulative distribution function.

The horizontal axis is the allowable domain for the given probability function. Since the vertical axis is a probability, it must fall between zero and one. It increases from zero to one as we go from left to right on the horizontal axis.

Percent Point Function

The percent point function (ppf) is the inverse of the cumulative distribution function. For this reason, the percent point function is also commonly referred to as the inverse distribution function. That is, for a distribution function we calculate the probability that the variable is less than or equal to x for a given x. For the percent point function, we start with the probability and compute the corresponding x for the cumulative distribution. Mathematically, this can be expressed as

$$\Pr[X \le G(\alpha)] = \alpha$$

or alternatively

$$x = G(\alpha) = G(F(x))$$

The following is the plot of the normal percent point function.
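The inverse relationship between the cdf and the percent point function can be checked numerically. The sketch below uses `NormalDist.cdf` and `NormalDist.inv_cdf` from the Python standard library for the standard normal distribution; the value `x = 1.5` is an arbitrary illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

# cdf: start from x, obtain a probability.
x = 1.5
p = std_normal.cdf(x)

# ppf: start from the probability, recover x.
x_recovered = std_normal.inv_cdf(p)

print(f"F({x}) = {p:.6f}")
print(f"G({p:.6f}) = {x_recovered:.6f}")

# A familiar use: the 97.5% point behind two-sided 95% intervals.
z_975 = std_normal.inv_cdf(0.975)
print(f"G(0.975) = {z_975:.4f}")
```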

Since the horizontal axis is a probability, it goes from zero to one. The vertical axis goes from the smallest to the largest value of the variate, that is, over the domain of the cumulative distribution function.

Hazard Function

The hazard function is the ratio of the probability density function to the survival function, S(x):

$$h(x) = \frac{f(x)}{S(x)} = \frac{f(x)}{1 - F(x)}$$

The following is the plot of the normal distribution hazard function.

Hazard plots are most commonly used in reliability applications. Note that Johnson, Kotz, and Balakrishnan refer to this as the conditional failure density function rather than the hazard function.

Cumulative Hazard Function

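The cumulative hazard function is the integral of the hazard function:

$$H(x) = \int_{-\infty}^{x} h(\mu)\,d\mu$$

It accumulates the hazard (the failure rate at time x given survival until time x) over all values up to x. It is not itself a probability and can exceed one.

This can alternatively be expressed as

$$H(x) = -\ln(1 - F(x))$$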

The following is the plot of the normal cumulative hazard function.

Cumulative hazard plots are most commonly used in reliability applications. Note that Johnson, Kotz, and Balakrishnan refer to this as the hazard function rather than the cumulative hazard function.

Survival Function

Survival functions are most often used in reliability and related fields. The survival function is the probability that the variate takes a value greater than x:

$$S(x) = \Pr[X > x] = 1 - F(x)$$

The following is the plot of the normal distribution survival function.
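The hazard, cumulative hazard, and survival functions are all derived from the pdf and cdf, so they can be sketched in a few lines. The Python example below does this for the standard normal distribution and checks numerically that integrating the hazard reproduces the negative log of the survival function. The function names and the lower integration limit of -8 (standing in for minus infinity) are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def survival(x):
    """S(x) = 1 - F(x)."""
    return 1.0 - std_normal.cdf(x)

def hazard(x):
    """h(x) = f(x) / S(x)."""
    return std_normal.pdf(x) / survival(x)

def cumulative_hazard(x):
    """H(x) = -ln(1 - F(x)) = -ln S(x)."""
    return -math.log(survival(x))

def integrate_hazard(lower, upper, steps=20_000):
    """Midpoint-rule approximation of the integral of h over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(hazard(lower + (i + 0.5) * width) for i in range(steps)) * width

x = 1.0
# Below -8 the standard normal hazard is negligible, so -8 stands in for -infinity.
h_integral = integrate_hazard(-8.0, x)

print(f"S({x}) = {survival(x):.6f}")
print(f"h({x}) = {hazard(x):.6f}")
print(f"H({x}) from -ln S(x):      {cumulative_hazard(x):.6f}")
print(f"H({x}) from integrating h: {h_integral:.6f}")
```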

For a survival function, the y value on the graph starts at 1 and monotonically decreases to zero. The survival function should be compared to the cumulative distribution function.

Inverse Survival Function

Just as the percent point function is the inverse of the cumulative distribution function, the survival function also has an inverse function. The inverse survival function can be defined in terms of the percent point function:

$$Z(\alpha) = G(1 - \alpha)$$

The following is the plot of the normal distribution inverse survival function.

As with the percent point function, the horizontal axis is a probability. Therefore the horizontal axis goes from 0 to 1 regardless of the particular distribution. The appearance is similar to the percent point function. However, instead of going from the smallest to the largest value on the vertical axis, it goes from the largest to the smallest value.
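A short Python sketch of the inverse survival function, built on the percent point function as described. It uses the standard normal distribution from the standard library; the helper name `inverse_survival` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def inverse_survival(p):
    """Z(p) = G(1 - p), where G is the percent point function."""
    return std_normal.inv_cdf(1.0 - p)

for p in (0.1, 0.5, 0.9):
    g = std_normal.inv_cdf(p)
    z = inverse_survival(p)
    print(f"p = {p:.1f}  G(p) = {g:+.4f}  Z(p) = {z:+.4f}")
```

For a symmetric distribution such as the normal, Z(p) is simply the mirror image of G(p), which is why the two plots look alike but run in opposite directions.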

Families of Distributions

Shape Parameters

Many probability distributions are not a single distribution, but are in fact a family of distributions. This is due to the distribution having one or more shape parameters.

Shape parameters allow a distribution to take on a variety of shapes, depending on the value of the shape parameter. These distributions are particularly useful in modeling applications since they are flexible enough to model a variety of data sets.

Example: Weibull Distribution

The Weibull distribution is an example of a distribution that has a shape parameter. The following graph plots the Weibull pdf with the following values for the shape parameter: 0.5, 1.0, 2.0, and 5.0.

The shapes above include an exponential distribution, a right-skewed distribution, and a relatively symmetric distribution.

The Weibull distribution has a relatively simple distributional form. However, the shape parameter allows the Weibull to assume a wide variety of shapes. This combination of simplicity and flexibility in the shape of the Weibull distribution has made it an effective distributional model in reliability applications. This ability to model a wide variety of distributional shapes using a relatively simple distributional form is possible with many other distributional families as well.
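To make the effect of the shape parameter concrete, the sketch below tabulates the standard Weibull pdf (location 0, scale 1) for the four shape values mentioned above. The formula used is the usual standard form, shape * x^(shape - 1) * exp(-x^shape); the grid of x values is an arbitrary choice for illustration.

```python
import math

def weibull_pdf(x, shape):
    """Standard Weibull pdf: shape * x**(shape - 1) * exp(-x**shape), for x > 0."""
    return shape * x ** (shape - 1.0) * math.exp(-(x ** shape))

shapes = (0.5, 1.0, 2.0, 5.0)
xs = (0.25, 0.5, 1.0, 1.5, 2.0)  # illustrative evaluation points

print("x      " + "  ".join(f"shape={g:<4}" for g in shapes))
for x in xs:
    row = "  ".join(f"{weibull_pdf(x, g):10.4f}" for g in shapes)
    print(f"{x:<5}  {row}")
```

With shape 1.0 the column reduces to exp(-x), the exponential pdf; shape 0.5 is steeply decreasing, while shape 5.0 concentrates its mass near x = 1.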

The sections on parameter estimation are restricted to the method of moments and maximum likelihood. This is because the least squares, PPCC, and probability plot estimation procedures are generic. The maximum likelihood equations are not listed if they involve solving simultaneous equations, because such equations generally have to be solved numerically with statistical software. Except where the maximum likelihood estimates are trivial, you should depend on a statistical software program to compute them. References are given for those who are interested.

Be aware that different sources may give formulas that are different from those shown here. In some cases, these are simply mathematically equivalent formulations. In other cases, a different parameterization may be used.

The PPCC plot can be used to estimate the shape parameter of a distribution with a single shape parameter. After finding the best value of the shape parameter, the probability plot can be used to estimate the location and scale parameters of a probability distribution.

The advantages of this method are:

It is based on two well-understood concepts.

1. The linearity (i.e., straightness) of the probability plot is a good measure of the adequacy of the distributional fit.

2. The correlation coefficient between the points on the probability plot is a good measure of the linearity of the probability plot.

It is an easy technique to implement for a wide variety of distributions with a single shape parameter.

The basic requirement is to be able to compute the percent point function, which is needed in the computation of both the probability plot and the PPCC plot.

The PPCC plot provides insight into the sensitivity of the shape parameter. That is, if the PPCC plot is relatively flat in the neighborhood of the optimal value of the shape parameter, this is a strong indication that the fitted model will not be sensitive to small deviations, or even large deviations in some cases, in the value of the shape parameter.

The maximum correlation value provides a method for comparing across distributions as well as identifying the best value of the shape parameter for a given distribution. For example, we could use the PPCC and probability fits for the Weibull, lognormal, and possibly several other distributions. Comparing the maximum correlation coefficient achieved for each distribution can help in selecting which is the best distribution to use.

The disadvantages of this method are:

It is limited to distributions with a single shape parameter.

PPCC plots are not widely available in statistical software packages other than Dataplot (Dataplot provides PPCC plots for 40+ distributions). Probability plots are generally available. However, many statistical software packages only provide them for a limited number of distributions.

Significance levels for the correlation coefficient (i.e., if the maximum correlation value is above a given value, then the distribution provides an adequate fit for the data with a given confidence level) have only been worked out for a limited number of distributions.
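The PPCC idea can be sketched without specialized software. The Python example below is a minimal illustration, not Dataplot's implementation: it simulates Weibull data, then for each candidate shape value computes the correlation between the ordered data and the theoretical quantiles from the percent point function, and keeps the shape with the largest correlation. The plotting positions (i - 0.5)/n, the grid of shape values, the sample size, and the seed are all assumptions made for this sketch.

```python
import math
import random

def weibull_ppf(p, shape):
    """Standard Weibull percent point function."""
    return (-math.log(1.0 - p)) ** (1.0 / shape)

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ppcc(data, shape):
    """Correlation between ordered data and theoretical quantiles for one shape."""
    ordered = sorted(data)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n are an assumption; other conventions exist.
    quantiles = [weibull_ppf((i - 0.5) / n, shape) for i in range(1, n + 1)]
    return correlation(ordered, quantiles)

rng = random.Random(12345)
true_shape = 2.0
# random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
data = [rng.weibullvariate(1.0, true_shape) for _ in range(500)]

grid = [0.5 + 0.1 * k for k in range(46)]  # candidate shapes 0.5 .. 5.0
scores = [(ppcc(data, g), g) for g in grid]
best_corr, best_shape = max(scores)

print(f"best shape = {best_shape:.1f}, correlation = {best_corr:.4f}")
```

Because the correlation is unaffected by location and scale, only the shape needs to be searched; location and scale can then be read off a probability plot at the chosen shape.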

Continuous Distributions

Normal Distribution

Probability Density Function

The general formula for the probability density function of the normal distribution is

$$f(x) = \frac{e^{-(x - \mu)^2 / (2\sigma^2)}}{\sigma\sqrt{2\pi}}$$

where μ is the location parameter and σ is the scale parameter. The case where μ = 0 and σ = 1 is called the standard normal distribution. The equation for the standard normal distribution is

$$f(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$$

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the standard normal probability density function.

Cumulative Distribution Function

The cumulative distribution function of the normal distribution does not exist in a simple closed form. It is computed numerically.

The following is the plot of the normal cumulative distribution function.

Percent Point Function

The percent point function of the normal distribution does not exist in a simple closed form. It is computed numerically.

The following is the plot of the normal percent point function.

Hazard Function

The formula for the hazard function of the normal distribution is

$$h(x) = \frac{\phi(x)}{\Phi(-x)}$$

where Φ is the cumulative distribution function of the standard normal distribution and φ is the probability density function of the standard normal distribution.

The following is the plot of the normal hazard function.

Cumulative Hazard Function

The normal cumulative hazard function can be computed from the normal cumulative distribution function.

The following is the plot of the normal cumulative hazard function.

Survival Function

The normal survival function can be computed from the normal cumulative distribution function.

The following is the plot of the normal survival function.

Inverse Survival Function

The normal inverse survival function can be computed from the normal percent point function.

The following is the plot of the normal inverse survival function.

Common Statistics

Mean: the location parameter μ
Median: the location parameter μ
Mode: the location parameter μ
Range: infinity in both directions
Standard Deviation: the scale parameter σ
Coefficient of Variation: σ/μ
Skewness: 0
Kurtosis: 3

Uniform Distribution

Probability Density Function

The general formula for the probability density function of the uniform distribution is

$$f(x) = \frac{1}{B - A} \quad \text{for } A \le x \le B$$

where A is the location parameter and (B - A) is the scale parameter. The case where A = 0 and B = 1 is called the standard uniform distribution. The equation for the standard uniform distribution is

$$f(x) = 1 \quad \text{for } 0 \le x \le 1$$

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the uniform probability density function.

Cumulative Distribution Function

The formula for the cumulative distribution function of the uniform distribution is

$$F(x) = x \quad \text{for } 0 \le x \le 1$$

The following is the plot of the uniform cumulative distribution function.

Percent Point Function

The formula for the percent point function of the uniform distribution is

$$G(p) = p \quad \text{for } 0 \le p \le 1$$

The following is the plot of the uniform percent point function.

Hazard Function

The formula for the hazard function of the uniform distribution is

$$h(x) = \frac{1}{1 - x} \quad \text{for } 0 \le x < 1$$

The following is the plot of the uniform hazard function.

Cumulative Hazard Function

The formula for the cumulative hazard function of the uniform distribution is

$$H(x) = -\ln(1 - x) \quad \text{for } 0 \le x < 1$$

The following is the plot of the uniform cumulative hazard function.

Survival Function

The uniform survival function can be computed from the uniform cumulative distribution function.

The following is the plot of the uniform survival function.

Inverse Survival Function

The uniform inverse survival function can be computed from the uniform percent point function.

The following is the plot of the uniform inverse survival function.

Common Statistics

Mean: (A + B)/2
Median: (A + B)/2
Range: B - A
Standard Deviation: $\sqrt{(B - A)^2 / 12}$
Coefficient of Variation: $(B - A) / (\sqrt{3}\,(B + A))$
Skewness: 0
Kurtosis: 9/5

Parameter Estimation

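The method of moments estimators for A and B are

$$\hat{A} = \bar{x} - \sqrt{3}\,s, \qquad \hat{B} = \bar{x} + \sqrt{3}\,s$$

where $\bar{x}$ and $s$ are the sample mean and sample standard deviation.

The maximum likelihood estimators are usually given in terms of the parameters a and h where

$$A = a - h, \qquad B = a + h$$

The maximum likelihood estimators for a and h are

$$\hat{a} = \text{midrange}(Y_1, \ldots, Y_n), \qquad \hat{h} = 0.5\,[\text{range}(Y_1, \ldots, Y_n)]$$

This gives the following maximum likelihood estimators for A and B

$$\hat{A} = \text{midrange}(Y_1, \ldots, Y_n) - 0.5\,[\text{range}(Y_1, \ldots, Y_n)] = Y_{(1)}$$

$$\hat{B} = \text{midrange}(Y_1, \ldots, Y_n) + 0.5\,[\text{range}(Y_1, \ldots, Y_n)] = Y_{(n)}$$

where $Y_{(1)}$ and $Y_{(n)}$ are the sample minimum and maximum.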
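A minimal Python sketch of both sets of estimators, using their usual forms: sample mean plus or minus the square root of 3 times the sample standard deviation for the method of moments, and midrange plus or minus half the range for maximum likelihood. The true endpoints, sample size, and seed are illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(2024)
true_a, true_b = 2.0, 5.0  # illustrative values
sample = [rng.uniform(true_a, true_b) for _ in range(1000)]

# Method of moments: mean -/+ sqrt(3) * standard deviation.
x_bar, s = mean(sample), stdev(sample)
mom_a = x_bar - math.sqrt(3.0) * s
mom_b = x_bar + math.sqrt(3.0) * s

# Maximum likelihood via midrange a and half-range h, with A = a - h and B = a + h.
a_hat = (min(sample) + max(sample)) / 2.0
h_hat = (max(sample) - min(sample)) / 2.0
mle_a, mle_b = a_hat - h_hat, a_hat + h_hat

print(f"method of moments:  A = {mom_a:.4f}, B = {mom_b:.4f}")
print(f"maximum likelihood: A = {mle_a:.4f}, B = {mle_b:.4f}")
```

The maximum likelihood estimates are just the sample minimum and maximum, so they always lie inside the true interval, whereas the method of moments estimates can fall on either side of the true endpoints.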

Comments

The uniform distribution defines equal probability over a given range for a continuous distribution. For this reason, it is important as a reference distribution.

One of the most important applications of the uniform distribution is in the generation of random numbers. That is, almost all random number generators generate random numbers on the (0,1) interval. For other distributions, some transformation is applied to the uniform random numbers.
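One common transformation of this kind is to pass a uniform draw through the percent point function of the target distribution. The sketch below does this for an exponential distribution in Python; the scale value, sample size, and seed are illustrative assumptions, and this is only one of several transformation techniques in use.

```python
import math
import random

def exponential_from_uniform(u, scale):
    """Apply the exponential percent point function, -scale * ln(1 - u), to a uniform draw."""
    return -scale * math.log(1.0 - u)

rng = random.Random(7)
scale = 2.0  # illustrative scale parameter
uniform_draws = [rng.random() for _ in range(20_000)]  # values in [0, 1)
exponential_draws = [exponential_from_uniform(u, scale) for u in uniform_draws]

sample_mean = sum(exponential_draws) / len(exponential_draws)
print(f"sample mean = {sample_mean:.4f} (scale parameter = {scale})")
```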

Exponential Distribution

Probability Density Function

The general formula for the probability density function of the exponential distribution is

$$f(x) = \frac{1}{\beta}\, e^{-(x - \mu)/\beta} \quad x \ge \mu;\ \beta > 0$$

where μ is the location parameter and β is the scale parameter (the distribution is often written in terms of the rate λ, which equals 1/β).

The case where μ = 0 and β = 1 is called the standard exponential distribution. The equation for the standard exponential distribution is

$$f(x) = e^{-x} \quad \text{for } x \ge 0$$

The general form of probability functions can be expressed in terms of the standard distribution. Subsequent formulas in this section are given for the 1-parameter (i.e., with scale parameter) form of the function.

The following is the plot of the exponential probability density function.

Cumulative Distribution Function

The formula for the cumulative distribution function of the exponential distribution is

$$F(x) = 1 - e^{-x/\beta} \quad x \ge 0;\ \beta > 0$$

The following is the plot of the exponential cumulative distribution function.

Percent Point Function

The formula for the percent point function of the exponential distribution is

$$G(p) = -\beta \ln(1 - p) \quad 0 \le p < 1;\ \beta > 0$$

The following is the plot of the exponential percent point function.

Hazard Function

The formula for the hazard function of the exponential distribution is

$$h(x) = \frac{1}{\beta} \quad x \ge 0;\ \beta > 0$$

The following is the plot of the exponential hazard function.

Cumulative Hazard Function

The formula for the cumulative hazard function of the exponential distribution is

$$H(x) = \frac{x}{\beta} \quad x \ge 0;\ \beta > 0$$

The following is the plot of the exponential cumulative hazard function.

Survival Function

The formula for the survival function of the exponential distribution is

$$S(x) = e^{-x/\beta} \quad x \ge 0;\ \beta > 0$$

The following is the plot of the exponential survival function.

Inverse Survival Function

The formula for the inverse survival function of the exponential distribution is

$$Z(p) = -\beta \ln(p) \quad 0 < p \le 1;\ \beta > 0$$

The following is the plot of the exponential inverse survival function.

Common Statistics

Mean: β
Median: β ln 2
Mode: zero
Range: zero to plus infinity
Standard Deviation: β
Coefficient of Variation: 1
Skewness: 2
Kurtosis: 9

Parameter Estimation

For the full sample case, the maximum likelihood estimator of the scale parameter is the sample mean. Maximum likelihood estimation for the exponential distribution is discussed in the chapter on reliability (Chapter 8). It is also discussed in chapter 19 of Johnson, Kotz, and Balakrishnan.

Comments

The exponential distribution is primarily used in reliability applications. It is used to model data with a constant failure rate (indicated by the hazard plot, which is simply equal to a constant).
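Both points in this section, the constant failure rate and the sample mean as the full-sample maximum likelihood estimate of the scale, can be checked with a few lines of Python. The scale value, evaluation points, sample size, and seed below are illustrative assumptions.

```python
import math
import random

def exp_pdf(x, scale):
    """Exponential pdf with location 0."""
    return math.exp(-x / scale) / scale

def exp_survival(x, scale):
    """Exponential survival function with location 0."""
    return math.exp(-x / scale)

scale = 3.0  # illustrative scale parameter

# The hazard f(x) / S(x) should equal 1 / scale at every x.
hazards = [exp_pdf(x, scale) / exp_survival(x, scale) for x in (0.5, 1.0, 5.0, 20.0)]
print("hazard values:", [round(h, 6) for h in hazards])

rng = random.Random(99)
# random.expovariate takes the rate (1 / scale), not the scale.
sample = [rng.expovariate(1.0 / scale) for _ in range(10_000)]

# Full-sample maximum likelihood estimate of the scale: the sample mean.
scale_mle = sum(sample) / len(sample)
print(f"estimated scale = {scale_mle:.4f}")
```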

t Distribution

Probability Density Function

The formula for the probability density function of the t distribution is

$$f(x) = \frac{(1 + x^2/\nu)^{-(\nu + 1)/2}}{B(0.5,\, 0.5\nu)\,\sqrt{\nu}}$$

where B is the beta function and ν is a positive integer shape parameter. The formula for the beta function is

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\,dt$$

In a testing context, the t distribution is treated as a “standardized distribution” (i.e., no location or scale parameters). However, in a distributional modeling context (as with other probability distributions), the t distribution itself can be transformed with a location parameter, μ, and a scale parameter, σ.

The following is the plot of the t probability density function for 4 different values of the shape parameter.

These plots all have a similar shape. The difference is in the heaviness of the tails. In fact, the t distribution with ν equal to 1 is a Cauchy distribution. The t distribution approaches a normal distribution as ν becomes large. The approximation is quite good for values of ν > 30.
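The two limiting cases can be checked numerically. The Python sketch below evaluates the standard t density through the beta function (computed from `math.gamma`) and compares it with the standard Cauchy density for 1 degree of freedom and with the standard normal density for 100 degrees of freedom. The function names and evaluation points are illustrative assumptions.

```python
import math
from statistics import NormalDist

def beta_function(a, b):
    """B(a, b) computed from gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def t_pdf(x, nu):
    """Standard t density with nu degrees of freedom."""
    numerator = (1.0 + x * x / nu) ** (-(nu + 1) / 2.0)
    return numerator / (beta_function(0.5, 0.5 * nu) * math.sqrt(nu))

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

normal_pdf = NormalDist().pdf

for x in (0.0, 1.0, 3.0):
    print(
        f"x = {x:.1f}  t(1) = {t_pdf(x, 1):.5f}  Cauchy = {cauchy_pdf(x):.5f}  "
        f"t(100) = {t_pdf(x, 100):.5f}  normal = {normal_pdf(x):.5f}"
    )
```

Degrees of freedom much above roughly 300 would overflow `math.gamma` in this simple form; a production implementation would work with `math.lgamma` instead.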

Cumulative Distribution Function

The formula for the cumulative distribution function of the t distribution is complicated and is not included here. It is given in the Evans, Hastings, and Peacock book.

The following are the plots of the t cumulative distribution function with the same values of ν as the pdf plots above.

Percent Point Function

The formula for the percent point function of the t distribution does not exist in a simple closed form. It is computed numerically.

The following are the plots of the t percent point function with the same values of ν as the pdf plots above.

Other Probability Functions

Since the t distribution is typically used to develop hypothesis tests and confidence intervals and rarely for modeling applications, we omit the formulas and plots for the hazard, cumulative hazard, survival, and inverse survival probability functions.

Common Statistics

Mean: 0 (undefined for ν equal to 1)
Median: 0
Mode: 0
Range: infinity in both directions
Standard Deviation: $\sqrt{\nu / (\nu - 2)}$ (undefined for ν equal to 1 or 2)
Coefficient of Variation: undefined
Skewness: 0 (undefined for ν less than or equal to 3; however, the t distribution is symmetric in all cases)
Kurtosis: $3(\nu - 2)/(\nu - 4)$ (undefined for ν less than or equal to 4)

Parameter Estimation

Since the t distribution is typically used to develop hypothesis tests and confidence intervals and rarely for modeling applications, we omit any discussion of parameter estimation.

Comments

The t distribution is used in many cases for the critical regions for hypothesis tests and in determining confidence intervals. The most common example is testing if data are consistent with the assumed process mean.

F Distribution

Probability Density Function

The F distribution is the ratio of two chi-square distributions with degrees of freedom ν₁ and ν₂, respectively, where each chi-square has first been divided by its degrees of freedom. The formula for the probability density function of the F distribution is

$$f(x) = \frac{\Gamma\!\left(\frac{\nu_1 + \nu_2}{2}\right) \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1}}{\Gamma\!\left(\frac{\nu_1}{2}\right) \Gamma\!\left(\frac{\nu_2}{2}\right) \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}$$

where ν₁ and ν₂ are the shape parameters and Γ is the gamma function. The formula for the gamma function is

$$\Gamma(a) = \int_0^\infty t^{a - 1} e^{-t}\,dt$$

In a testing context, the F distribution is treated as a “standardized distribution” (i.e., no location or scale parameters). However, in a distributional modeling context (as with other probability distributions), the F distribution itself can be transformed with a location parameter, μ, and a scale parameter, σ.

The following is the plot of the F probability density function for 4 different values of the shape parameters.

Cumulative Distribution Function

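The formula for the cumulative distribution function of the F distribution is

$$F(x) = 1 - I_k\!\left(\frac{\nu_2}{2}, \frac{\nu_1}{2}\right)$$

where k = ν₂ / (ν₂ + ν₁·x) and I_k is the incomplete beta function. The formula for the incomplete beta function is

$$I_k(\alpha, \beta) = \frac{\int_0^k t^{\alpha - 1} (1 - t)^{\beta - 1}\,dt}{B(\alpha, \beta)}$$

where B is the beta function

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\,dt$$

The following is the plot of the F cumulative distribution function with the same values of ν₁ and ν₂ as the pdf plots above.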

Percent Point Function

The formula for the percent point function of the F distribution does not exist in a simple closed form. It is computed numerically.

The following is the plot of the F percent point function with the same values of ν₁ and ν₂ as the pdf plots above.

Other Probability Functions

Since the F distribution is typically used to develop hypothesis tests and confidence intervals and rarely for modeling applications, we omit the formulas and plots for the hazard, cumulative hazard, survival, and inverse survival probability functions.

Common Statistics

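The formulas below are for the case where the location parameter is zero and the scale parameter is one.

Mean: $\nu_2 / (\nu_2 - 2)$ for ν₂ > 2
Mode: $\nu_2 (\nu_1 - 2) / (\nu_1 (\nu_2 + 2))$ for ν₁ > 2
Range: 0 to positive infinity
Standard Deviation: $\sqrt{\dfrac{2\nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)}}$ for ν₂ > 4
Coefficient of Variation: $\sqrt{\dfrac{2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 4)}}$ for ν₂ > 4
Skewness: $\dfrac{(2\nu_1 + \nu_2 - 2) \sqrt{8 (\nu_2 - 4)}}{(\nu_2 - 6) \sqrt{\nu_1 (\nu_1 + \nu_2 - 2)}}$ for ν₂ > 6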

Parameter Estimation

Since the F distribution is typically used to develop hypothesis tests and confidence intervals and rarely for modeling applications, we omit any discussion of parameter estimation.

Comments

The F distribution is used in many cases for the critical regions for hypothesis tests and in determining confidence intervals. Two common examples are the analysis of variance and the F test to determine if the variances of two populations are equal.

Chi-Square Distribution

Probability Density Function

The chi-square distribution results when ν independent variables with standard normal distributions are squared and summed. The formula for the probability density function of the chi-square distribution is

$$f(x) = \frac{e^{-x/2}\, x^{\nu/2 - 1}}{2^{\nu/2}\, \Gamma(\nu/2)} \quad \text{for } x \ge 0$$

where ν is the shape parameter and Γ is the gamma function. The formula for the gamma function is

$$\Gamma(a) = \int_0^\infty t^{a - 1} e^{-t}\,dt$$

In a testing context, the chi-square distribution is treated as a “standardized distribution” (i.e., no location or scale parameters). However, in a distributional modeling context (as with other probability distributions), the chi-square distribution itself can be transformed with a location parameter, μ, and a scale parameter, σ.

The following is the plot of the chi-square probability density function for 4 different values of the shape parameter.

Cumulative Distribution Function

The formula for the cumulative distribution function of the chi-square distribution is

$$F(x) = \frac{\gamma(\nu/2,\, x/2)}{\Gamma(\nu/2)} \quad \text{for } x \ge 0$$

where Γ is the gamma function defined above and γ is the incomplete gamma function. The formula for the incomplete gamma function is

$$\gamma(a, x) = \int_0^x t^{a - 1} e^{-t}\,dt$$

The following is the plot of the chi-square cumulative distribution function with the same values of ν as the pdf plots above.

Percent Point Function

The formula for the percent point function of the chi-square distribution does not exist in a simple closed form. It is computed numerically.

The following is the plot of the chi-square percent point function with the same values of ν as the pdf plots above.

Other Probability Functions

Since the chi-square distribution is typically used to develop hypothesis tests and confidence intervals and rarely for modeling applications, we omit the formulas and plots for the hazard, cumulative hazard, survival, and inverse survival probability functions.

Common Statistics

Mean: ν
Median: approximately ν - 2/3 for large ν
Mode: ν - 2 for ν > 2
Range: 0 to positive infinity
Standard Deviation: $\sqrt{2\nu}$
Coefficient of Variation: $\sqrt{2/\nu}$
Skewness: $2^{1.5}/\sqrt{\nu}$
Kurtosis: $3 + 12/\nu$

Parameter Estimation

Since the chi-square distribution is typically used to develop hypothesis tests and confidence intervals and rarely for modeling applications, we omit any discussion of parameter estimation.

Comments

The chi-square distribution is used in many cases for the critical regions for hypothesis tests and in determining confidence intervals. Two common examples are the chi-square test for independence in an RxC contingency table and the chi-square test to determine if the standard deviation of a population is equal to a pre-specified value.
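The construction of the chi-square distribution as a sum of squared standard normal variables lends itself to a quick simulation check. The Python sketch below draws such sums and compares the sample mean and variance with the theoretical values (the degrees of freedom and twice the degrees of freedom). The degrees of freedom, sample size, and seed are illustrative assumptions.

```python
import random

def chi_square_draw(rng, nu):
    """Sum of nu squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(314)
nu = 5  # illustrative degrees of freedom
draws = [chi_square_draw(rng, nu) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"sample mean     = {sample_mean:.3f} (theory: {nu})")
print(f"sample variance = {sample_var:.3f} (theory: {2 * nu})")
```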

Cauchy Distribution

Probability Density Function

The general formula for the probability density function of the Cauchy distribution is

$$f(x) = \frac{1}{s\pi \left(1 + ((x - t)/s)^2\right)}$$

where t is the location parameter and s is the scale parameter. The case where t = 0 and s = 1 is called the standard Cauchy distribution. The equation for the standard Cauchy distribution reduces to

$$f(x) = \frac{1}{\pi (1 + x^2)}$$

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the standard Cauchy probability density function.

Cumulative Distribution Function

The formula for the cumulative distribution function for the Cauchy distribution is

$$F(x) = 0.5 + \frac{\arctan(x)}{\pi}$$

The following is the plot of the Cauchy cumulative distribution function.

Percent Point Function

The formula for the percent point function of the Cauchy distribution is

$$G(p) = -\cot(\pi p)$$

The following is the plot of the Cauchy percent point function.

Hazard Function

The Cauchy hazard function can be computed from the Cauchy probability density and cumulative distribution functions.

The following is the plot of the Cauchy hazard function.

Cumulative Hazard Function

The Cauchy cumulative hazard function can be computed from the Cauchy cumulative distribution function.

The following is the plot of the Cauchy cumulative hazard function.

Survival Function

The Cauchy survival function can be computed from the Cauchy cumulative distribution function.

The following is the plot of the Cauchy survival function.

Inverse Survival Function

The Cauchy inverse survival function can be computed from the Cauchy percent point function.

The following is the plot of the Cauchy inverse survival function.

Common Statistics

Mean: undefined
Median: the location parameter t
Mode: the location parameter t
Range: infinity in both directions
Standard Deviation: undefined
Coefficient of Variation: undefined
Skewness: undefined
Kurtosis: undefined

Parameter Estimation

The likelihood functions for the Cauchy maximum likelihood estimates are given in chapter 16 of Johnson, Kotz, and Balakrishnan. These equations typically must be solved numerically on a computer.

Comments

The Cauchy distribution is important as an example of a pathological case. Cauchy distributions look similar to a normal distribution. However, they have much heavier tails. When studying hypothesis tests that assume normality, seeing how the tests perform on data from a Cauchy distribution is a good indicator of how sensitive the tests are to heavy-tail departures from normality. Likewise, it is a good check for robust techniques that are designed to work well under a wide variety of distributional assumptions.

The mean and standard deviation of the Cauchy distribution are undefined. The practical meaning of this is that collecting 1,000 data points gives no more accurate an estimate of the mean and standard deviation than does a single point.
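A small simulation makes this behavior visible. The Python sketch below draws standard Cauchy samples of increasing size by applying the percent point function, written here in the equivalent form tan(π(u - 0.5)), to uniform draws, then prints the sample mean and sample median. The sample sizes and seed are illustrative assumptions, and no particular printed values are implied: the point is that the median settles near the location parameter while the mean need not.

```python
import math
import random
from statistics import mean, median

def cauchy_draw(rng):
    """Standard Cauchy draw via the percent point function, tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(2718)
for n in (10, 1_000, 100_000):
    sample = [cauchy_draw(rng) for _ in range(n)]
    print(f"n = {n:>7}  mean = {mean(sample):+9.3f}  median = {median(sample):+7.3f}")
```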

Double Exponential Distribution

Probability Density Function

The general formula for the probability density function of the double exponential distribution is

$$f(x) = \frac{e^{-|x - \mu|/\beta}}{2\beta}$$

where μ is the location parameter and β is the scale parameter.

The case where μ = 0 and β = 1 is called the standard double exponential distribution. The equation for the standard double exponential distribution is

$$f(x) = \frac{e^{-|x|}}{2}$$

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the double exponential probability density function.

Cumulative Distribution Function

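The formula for the cumulative distribution function of the double exponential distribution is

$$F(x) = \begin{cases} \dfrac{e^{x}}{2} & \text{for } x < 0 \\ 1 - \dfrac{e^{-x}}{2} & \text{for } x \ge 0 \end{cases}$$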

The following is the plot of the double exponential cumulative distribution function.

Percent Point Function

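The formula for the percent point function of the double exponential distribution is

$$G(p) = \begin{cases} \ln(2p) & \text{for } p \le 0.5 \\ -\ln(2(1 - p)) & \text{for } p > 0.5 \end{cases}$$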

The following is the plot of the double exponential percent point function.

Hazard Function

The formula for the hazard function of the double exponential distribution is

$$h(x) = \begin{cases} \dfrac{e^{x}}{2 - e^{x}} & \text{for } x < 0 \\ 1 & \text{for } x \ge 0 \end{cases}$$

The following is the plot of the double exponential hazard function.

Cumulative Hazard Function

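The formula for the cumulative hazard function of the double exponential distribution is

$$H(x) = \begin{cases} -\ln\!\left(1 - \dfrac{e^{x}}{2}\right) & \text{for } x < 0 \\ x + \ln 2 & \text{for } x \ge 0 \end{cases}$$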

The following is the plot of the double exponential cumulative hazard function.

Survival Function

The double exponential survival function can be computed from the cumulative distribution function of the double exponential distribution.

The following is the plot of the double exponential survival function.

Inverse Survival Function

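The formula for the inverse survival function of the double exponential distribution is

$$Z(p) = \begin{cases} -\ln(2p) & \text{for } p < 0.5 \\ \ln(2(1 - p)) & \text{for } p \ge 0.5 \end{cases}$$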

The following is the plot of the double exponential inverse survival function.

Common Statistics

Mean: μ
Median: μ
Mode: μ
Range: negative infinity to positive infinity
Standard Deviation: $\sqrt{2}\,\beta$
Skewness: 0
Kurtosis: 6
Coefficient of Variation: $\sqrt{2}\,\beta / \mu$

Parameter Estimation

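The maximum likelihood estimators of the location and scale parameters of the double exponential distribution are

$$\hat{\mu} = \tilde{X}, \qquad \hat{\beta} = \frac{\sum_{i=1}^{N} |X_i - \tilde{X}|}{N}$$

where $\tilde{X}$ is the sample median.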
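These estimators are simple enough to compute directly. The Python sketch below takes the sample median as the location estimate and the mean absolute deviation about the median as the scale estimate, then applies them to simulated data. The helper name `laplace_mle`, the true parameter values, and the seed are illustrative assumptions; the data are generated as the difference of two independent exponential draws, which follows a double exponential distribution.

```python
import random
from statistics import median

def laplace_mle(data):
    """Return (location, scale): the sample median and the mean absolute deviation about it."""
    loc = median(data)
    scale = sum(abs(x - loc) for x in data) / len(data)
    return loc, scale

rng = random.Random(5)
true_loc, true_scale = 1.0, 2.0  # illustrative values

# The difference of two independent exponentials with mean true_scale
# is double exponential with scale true_scale, centered at zero.
sample = [
    true_loc + rng.expovariate(1.0 / true_scale) - rng.expovariate(1.0 / true_scale)
    for _ in range(10_000)
]

loc_hat, scale_hat = laplace_mle(sample)
print(f"location estimate = {loc_hat:.4f}, scale estimate = {scale_hat:.4f}")
```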

Weibull Distribution

Probability Density Function

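The formula for the probability density function of the general Weibull distribution is

$$f(x) = \frac{\gamma}{\alpha} \left(\frac{x-\mu}{\alpha}\right)^{\gamma-1} \exp\left(-\left(\frac{x-\mu}{\alpha}\right)^{\gamma}\right) \qquad x \ge \mu;\ \gamma, \alpha > 0$$

where $\gamma$ is the shape parameter, $\mu$ is the location parameter and $\alpha$ is the scale parameter. The case where $\mu = 0$ and $\alpha = 1$ is called the standard Weibull distribution. The case where $\mu = 0$ is called the 2-parameter Weibull distribution. The equation for the standard Weibull distribution reduces to

$$f(x) = \gamma x^{\gamma-1} \exp\left(-x^{\gamma}\right) \qquad x \ge 0;\ \gamma > 0$$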

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the Weibull probability density function.

Cumulative Distribution Function

The formula for the cumulative distribution function of the Weibull distribution is

$$F(x) = 1 - e^{-x^{\gamma}} \qquad x \ge 0;\ \gamma > 0$$

The following is the plot of the Weibull cumulative distribution function with the same values of $\gamma$ as the pdf plots above.

Percent Point Function

The formula for the percent point function of the Weibull distribution is

$$G(p) = \left(-\ln(1-p)\right)^{1/\gamma} \qquad 0 \le p < 1;\ \gamma > 0$$

The following is the plot of the Weibull percent point function with the same values of $\gamma$ as the pdf plots above.
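Since the Weibull percent point function has a closed form, it can be used to generate Weibull random numbers by inverse transform sampling. The sketch below assumes the standard Weibull (location 0, scale 1). The function names and the seed are illustrative choices, not part of the source.

```python
import math
import random


def weibull_cdf(x, gamma):
    """CDF of the standard Weibull distribution with shape gamma."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-(x ** gamma))


def weibull_ppf(p, gamma):
    """Percent point function (inverse CDF) for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    # -log1p(-p) equals -ln(1 - p) and is accurate for small p.
    return (-math.log1p(-p)) ** (1.0 / gamma)


def weibull_sample(n, gamma, rng):
    """Draw n values by passing uniform [0, 1) variates through the ppf."""
    return [weibull_ppf(rng.random(), gamma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    draws = weibull_sample(10_000, 1.5, rng)
    median = weibull_ppf(0.5, 1.5)
    below = sum(d <= median for d in draws) / len(draws)
    print(f"theoretical median = {median:.4f}")
    print(f"fraction of draws at or below it = {below:.3f}")
```

About half of the simulated values should fall at or below the theoretical median, which gives a simple check that the sampler matches the distribution.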

Hazard Function

The formula for the hazard function of the Weibull distribution is

$$h(x) = \gamma x^{\gamma-1} \qquad x \ge 0;\ \gamma > 0$$

The following is the plot of the Weibull hazard function with the same values of $\gamma$ as the pdf plots above.

Cumulative Hazard Function

The formula for the cumulative hazard function of the Weibull distribution is

$$H(x) = x^{\gamma} \qquad x \ge 0;\ \gamma > 0$$

The following is the plot of the Weibull cumulative hazard function with the same values of $\gamma$ as the pdf plots above.

Survival Function

The formula for the survival function of the Weibull distribution is

$$S(x) = \exp\left(-x^{\gamma}\right) \qquad x \ge 0;\ \gamma > 0$$

The following is the plot of the Weibull survival function with the same values of $\gamma$ as the pdf plots above.

Inverse Survival Function

The formula for the inverse survival function of the Weibull distribution is

$$Z(p) = \left(-\ln p\right)^{1/\gamma} \qquad 0 < p \le 1;\ \gamma > 0$$

The following is the plot of the Weibull inverse survival function with the same values of $\gamma$ as the pdf plots above.

Common Statistics

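The formulas below are for the case where the location parameter is zero and the scale parameter is one.

Mean: $\Gamma\left(\frac{\gamma+1}{\gamma}\right)$, where $\Gamma$ is the gamma function, $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\,dt$

Median: $\left(\ln 2\right)^{1/\gamma}$

Mode: $\left(1 - \frac{1}{\gamma}\right)^{1/\gamma}$ for $\gamma > 1$, and 0 for $\gamma \le 1$

Range: zero to positive infinity

Standard Deviation: $\sqrt{\Gamma\left(\frac{\gamma+2}{\gamma}\right) - \left(\Gamma\left(\frac{\gamma+1}{\gamma}\right)\right)^2}$

Coefficient of Variation: $\sqrt{\dfrac{\Gamma\left(\frac{\gamma+2}{\gamma}\right)}{\left(\Gamma\left(\frac{\gamma+1}{\gamma}\right)\right)^2} - 1}$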
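The Weibull summary statistics depend on the gamma function, which is available in Python's standard `math` module. The sketch below evaluates them for the standard Weibull (location 0, scale 1), assuming the usual gamma-function expressions for the mean and standard deviation. The function name and the dictionary keys are illustrative.

```python
import math


def weibull_summary(gamma):
    """Summary statistics of the standard Weibull distribution with shape gamma."""
    if gamma <= 0:
        raise ValueError("shape parameter must be positive")
    g1 = math.gamma((gamma + 1.0) / gamma)  # Gamma((gamma + 1) / gamma)
    g2 = math.gamma((gamma + 2.0) / gamma)  # Gamma((gamma + 2) / gamma)
    mean = g1
    sd = math.sqrt(g2 - g1 ** 2)
    return {
        "mean": mean,
        "median": math.log(2.0) ** (1.0 / gamma),
        # The density has an interior mode only when the shape exceeds 1.
        "mode": (1.0 - 1.0 / gamma) ** (1.0 / gamma) if gamma > 1.0 else 0.0,
        "sd": sd,
        "cv": sd / mean,
    }


if __name__ == "__main__":
    for shape in (0.5, 1.0, 2.0, 5.0):
        stats = weibull_summary(shape)
        print(shape, {k: round(v, 4) for k, v in stats.items()})
```

With a shape parameter of 1 the Weibull reduces to the exponential distribution, so the mean and standard deviation both come out as 1, which is a convenient check.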

Parameter Estimation

Maximum likelihood estimation for the Weibull distribution is discussed in the Reliability chapter (Chapter 8). It is also discussed in Chapter 21 of Johnson, Kotz, and Balakrishnan.

Comments

The Weibull distribution is used extensively in reliability applications to model failure times.

Lognormal Distribution

Probability Density Function

A variable X is lognormally distributed if Y = ln(X) is normally distributed, with ln denoting the natural logarithm. The general formula for the probability density function of the lognormal distribution is

$$f(x) = \frac{\exp\left(-\dfrac{\left(\ln\left((x-\theta)/m\right)\right)^2}{2\sigma^2}\right)}{(x-\theta)\,\sigma\sqrt{2\pi}} \qquad x > \theta;\ m, \sigma > 0$$

where $\sigma$ is the shape parameter, $\theta$ is the location parameter and m is the scale parameter. The case where $\theta = 0$ and m = 1 is called the standard lognormal distribution. The case where $\theta$ equals zero is called the 2-parameter lognormal distribution.

The equation for the standard lognormal distribution is

$$f(x) = \frac{\exp\left(-\dfrac{(\ln x)^2}{2\sigma^2}\right)}{x\,\sigma\sqrt{2\pi}} \qquad x > 0;\ \sigma > 0$$

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the lognormal probability density function for four values of $\sigma$.

There are several common parameterizations of the lognormal distribution. The form given here is from Evans, Hastings, and Peacock.

Cumulative Distribution Function

The formula for the cumulative distribution function of the lognormal distribution is

$$F(x) = \Phi\left(\frac{\ln x}{\sigma}\right) \qquad x \ge 0;\ \sigma > 0$$

where $\Phi$ is the cumulative distribution function of the normal distribution.

The following is the plot of the lognormal cumulative distribution function with the same values of $\sigma$ as the pdf plots above.

Percent Point Function

The formula for the percent point function of the lognormal distribution is

$$G(p) = \exp\left(\sigma\,\Phi^{-1}(p)\right) \qquad 0 \le p < 1;\ \sigma > 0$$

where $\Phi^{-1}$ is the percent point function of the normal distribution.

The following is the plot of the lognormal percent point function with the same values of $\sigma$ as the pdf plots above.
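Because the lognormal functions are defined through the normal distribution, they can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The example assumes the standard lognormal (location 0, scale 1) with shape parameter `sigma`. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def lognormal_cdf(x, sigma):
    """CDF of the standard lognormal distribution with shape sigma."""
    if x <= 0:
        return 0.0
    return _STD_NORMAL.cdf(math.log(x) / sigma)


def lognormal_ppf(p, sigma):
    """Percent point function; NormalDist.inv_cdf requires 0 < p < 1."""
    return math.exp(sigma * _STD_NORMAL.inv_cdf(p))


if __name__ == "__main__":
    sigma = 0.5
    for p in (0.1, 0.5, 0.9):
        x = lognormal_ppf(p, sigma)
        print(f"p={p:.1f}  x={x:.4f}  cdf(x)={lognormal_cdf(x, sigma):.4f}")
```

The percent point at p = 0.5 is 1 for every value of `sigma`, which matches the median of the standard lognormal distribution being equal to the scale parameter.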

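Hazard Function

The formula for the hazard function of the lognormal distribution is

$$h(x, \sigma) = \frac{\dfrac{1}{x\sigma}\,\phi\left(\dfrac{\ln x}{\sigma}\right)}{\Phi\left(\dfrac{-\ln x}{\sigma}\right)} \qquad x > 0;\ \sigma > 0$$

where $\phi$ is the probability density function of the normal distribution and $\Phi$ is the cumulative distribution function of the normal distribution.

The following is the plot of the lognormal hazard function with the same values of $\sigma$ as the pdf plots above.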

Cumulative Hazard Function

The formula for the cumulative hazard function of the lognormal distribution is

$$H(x) = -\ln\left(1 - \Phi\left(\frac{\ln x}{\sigma}\right)\right) \qquad x \ge 0;\ \sigma > 0$$

where $\Phi$ is the cumulative distribution function of the normal distribution.

The following is the plot of the lognormal cumulative hazard function with the same values of $\sigma$ as the pdf plots above.

Survival Function

The formula for the survival function of the lognormal distribution is

$$S(x) = 1 - \Phi\left(\frac{\ln x}{\sigma}\right) \qquad x \ge 0;\ \sigma > 0$$

where $\Phi$ is the cumulative distribution function of the normal distribution.

The following is the plot of the lognormal survival function with the same values of $\sigma$ as the pdf plots above.

Inverse Survival Function

The formula for the inverse survival function of the lognormal distribution is

$$Z(p) = \exp\left(\sigma\,\Phi^{-1}(1-p)\right) \qquad 0 \le p < 1;\ \sigma > 0$$

where $\Phi^{-1}$ is the percent point function of the normal distribution.

The following is the plot of the lognormal inverse survival function with the same values of $\sigma$ as the pdf plots above.

Common Statistics

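The formulas below are for the case where the location parameter is zero and the scale parameter is one.

Mean: $e^{0.5\sigma^2}$

Median: the scale parameter m (= 1 if the scale parameter is not specified)

Mode: $\dfrac{1}{e^{\sigma^2}}$

Range: zero to positive infinity

Standard Deviation: $\sqrt{e^{\sigma^2}\left(e^{\sigma^2} - 1\right)}$

Skewness: $\left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}$

Kurtosis: $\left(e^{\sigma^2}\right)^4 + 2\left(e^{\sigma^2}\right)^3 + 3\left(e^{\sigma^2}\right)^2 - 3$

Coefficient of Variation: $\sqrt{e^{\sigma^2} - 1}$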

Parameter Estimation

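The maximum likelihood estimates for the scale parameter, m, and the shape parameter, $\sigma$, are

$$\hat{m} = \exp\left(\hat{\mu}\right)$$

and

$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{N} \left(\ln X_i - \hat{\mu}\right)^2}{N}}$$

where

$$\hat{\mu} = \frac{\sum_{i=1}^{N} \ln X_i}{N}$$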

If the location parameter is known, it can be subtracted from the original data points before computing the maximum likelihood estimates of the shape and scale parameters.
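A minimal sketch of this estimation procedure in Python, assuming the estimates are the exponentiated mean of the log data (scale) and the divide-by-N standard deviation of the log data (shape). The function name, the `location` argument, and the sample values are illustrative and not from the source.

```python
import math


def fit_lognormal(data, location=0.0):
    """Return (m_hat, sigma_hat) for a lognormal model.

    A known location parameter is subtracted from every observation first.
    math.log raises ValueError if any shifted observation is not positive.
    """
    logs = [math.log(x - location) for x in data]
    n = len(logs)
    if n == 0:
        raise ValueError("data must be non-empty")
    mu_hat = sum(logs) / n
    # Maximum likelihood divides by n, not n - 1.
    sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / n)
    return math.exp(mu_hat), sigma_hat


if __name__ == "__main__":
    failure_times = [12.4, 7.9, 30.1, 18.6, 9.3, 22.0]  # hypothetical data
    m_hat, sigma_hat = fit_lognormal(failure_times)
    print(f"scale estimate m = {m_hat:.4f}, shape estimate sigma = {sigma_hat:.4f}")
```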

Comments

The lognormal distribution is used extensively in reliability applications to model failure times. The lognormal and Weibull distributions are probably the most commonly used distributions in reliability applications.

Beta Distribution

Probability Density Function

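The general formula for the probability density function of the beta distribution is

$$f(x) = \frac{(x-a)^{p-1}(b-x)^{q-1}}{B(p,q)\,(b-a)^{p+q-1}} \qquad a \le x \le b;\ p, q > 0$$

where p and q are the shape parameters, a and b are the lower and upper bounds, respectively, of the distribution, and B(p,q) is the beta function. The beta function has the formula

$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt$$

The case where a = 0 and b = 1 is called the standard beta distribution. The equation for the standard beta distribution is

$$f(x) = \frac{x^{p-1}(1-x)^{q-1}}{B(p,q)} \qquad 0 \le x \le 1;\ p, q > 0$$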

Typically we define the general form of a distribution in terms of location and scale parameters. The beta is different in that we define the general distribution in terms of the lower and upper bounds. However, the location and scale parameters can be defined in terms of the lower and upper limits as follows:

location = a

scale = b - a

Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The following is the plot of the beta probability density function for four different values of the shape parameters.

Cumulative Distribution Function

The formula for the cumulative distribution function of the beta distribution is also called the incomplete beta function ratio (commonly denoted by $I_x$) and is defined as

$$F(x) = I_x(p,q) = \frac{\int_0^x t^{p-1}(1-t)^{q-1}\,dt}{B(p,q)} \qquad 0 \le x \le 1;\ p, q > 0$$

where B is the beta function defined above.

The following is the plot of the beta cumulative distribution function with the same values of the shape parameters as the pdf plots above.

Percent Point Function

The formula for the percent point function of the beta distribution does not exist in a simple closed form. It is computed numerically.

The following is the plot of the beta percent point function with the same values of the shape parameters as the pdf plots above.

Other Probability Functions

Since the beta distribution is not typically used for reliability applications, we omit the formulas and plots for the hazard, cumulative hazard, survival, and inverse survival probability functions.

Common Statistics

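The formulas below are for the case where the lower limit is zero and the upper limit is one.

Mean: $\dfrac{p}{p+q}$

Mode: $\dfrac{p-1}{p+q-2}$ for $p, q > 1$

Range: 0 to 1

Standard Deviation: $\sqrt{\dfrac{pq}{(p+q)^2(p+q+1)}}$

Coefficient of Variation: $\sqrt{\dfrac{q}{p(p+q+1)}}$

Skewness: $\dfrac{2(q-p)\sqrt{p+q+1}}{(p+q+2)\sqrt{pq}}$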

Parameter Estimation

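First consider the case where a and b are assumed to be known. For this case, the method of moments estimates are

$$\hat{p} = \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{s^2} - 1\right)$$

$$\hat{q} = (1-\bar{x})\left(\frac{\bar{x}(1-\bar{x})}{s^2} - 1\right)$$

where $\bar{x}$ is the sample mean and $s^2$ is the sample variance. If a and b are not 0 and 1, respectively, then replace $\bar{x}$ with $\dfrac{\bar{x}-a}{b-a}$ and $s^2$ with $\dfrac{s^2}{(b-a)^2}$ in the above equations.

For the case when a and b are known, the maximum likelihood estimates can be obtained by solving the following set of equations

$$\psi(\hat{p}) - \psi(\hat{p}+\hat{q}) = \frac{1}{n}\sum_{i=1}^{n} \ln\left(\frac{Y_i - a}{b-a}\right)$$

$$\psi(\hat{q}) - \psi(\hat{p}+\hat{q}) = \frac{1}{n}\sum_{i=1}^{n} \ln\left(\frac{b - Y_i}{b-a}\right)$$

where $\psi$ is the digamma function and $Y_1, \ldots, Y_n$ are the observations.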
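The method of moments estimates for known bounds are simple enough to sketch directly. The Python example below assumes the usual moment-matching expressions for the two shape parameters and rescales the sample mean and variance when the bounds are not 0 and 1. The function name and the data are hypothetical.

```python
import statistics


def beta_method_of_moments(data, a=0.0, b=1.0):
    """Method of moments estimates (p_hat, q_hat) for a beta model on [a, b]."""
    if b <= a:
        raise ValueError("upper bound b must exceed lower bound a")
    xbar = statistics.fmean(data)
    s2 = statistics.variance(data)  # sample variance (n - 1 denominator)
    # Rescale to the unit interval when the bounds are not 0 and 1.
    xbar = (xbar - a) / (b - a)
    s2 = s2 / (b - a) ** 2
    common = xbar * (1.0 - xbar) / s2 - 1.0
    if common <= 0:
        raise ValueError("sample variance is too large for a beta model")
    return xbar * common, (1.0 - xbar) * common


if __name__ == "__main__":
    proportions = [0.21, 0.35, 0.48, 0.52, 0.60, 0.74]  # hypothetical data
    p_hat, q_hat = beta_method_of_moments(proportions)
    print(f"p estimate = {p_hat:.4f}, q estimate = {q_hat:.4f}")
```

These estimates are often used as starting values when the maximum likelihood equations are solved numerically.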

DISCRETE DISTRIBUTIONS

Binomial Distribution

Probability Mass Function

The binomial distribution is used when there are exactly two mutually exclusive outcomes of a trial. These outcomes are conventionally labeled "success" and "failure". The binomial distribution is used to obtain the probability of observing x successes in n trials, with the probability of success on a single trial denoted by p. The binomial distribution assumes that p is fixed for all trials.

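The formula for the binomial probability mass function is

$$P(x; p, n) = \binom{n}{x} p^{x} (1-p)^{n-x} \qquad \text{for } x = 0, 1, 2, \ldots, n$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

The following is the plot of the binomial probability mass function for four values of p and n = 100.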

Cumulative Distribution Function

The formula for the binomial cumulative probability function is

$$F(x; p, n) = \sum_{i=0}^{x} \binom{n}{i} p^{i} (1-p)^{n-i}$$

The following is the plot of the binomial cumulative distribution function with the same values of p as the pdf plots above.

Percent Point Function

The binomial percent point function does not exist in simple closed form. It is computed numerically. Note that because this is a discrete distribution that is only defined for integer values of x, the percent point function is not smooth in the way the percent point function typically is for a continuous distribution.

The following is the plot of the binomial percent point function with the same values of p as the pdf plots above.
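One way to compute the binomial percent point function numerically is to accumulate the probability mass function until the running total reaches the requested probability. The sketch below assumes the common convention that the percent point is the smallest x whose cumulative probability is at least q. The function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success probability p."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Probability of at most x successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(x, n) + 1))


def binom_ppf(q, n, p):
    """Smallest x with cdf(x) >= q (assumed convention), for 0 <= q <= 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    total = 0.0
    for x in range(n + 1):
        total += binom_pmf(x, n, p)
        if total >= q:
            return x
    return n  # guards against rounding when q is extremely close to 1


if __name__ == "__main__":
    n, p = 100, 0.25
    for q in (0.05, 0.50, 0.95):
        x = binom_ppf(q, n, p)
        print(f"q={q:.2f}  x={x}  cdf(x)={binom_cdf(x, n, p):.4f}")
```

The result is always an integer, so the function is a step function of q rather than a smooth curve.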

Common Statistics

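Mean: $np$

Mode: $p(n+1) - 1 \le x \le p(n+1)$

Range: 0 to n

Standard Deviation: $\sqrt{np(1-p)}$

Coefficient of Variation: $\sqrt{\dfrac{1-p}{np}}$

Skewness: $\dfrac{1-2p}{\sqrt{np(1-p)}}$

Kurtosis: $3 - \dfrac{6}{n} + \dfrac{1}{np(1-p)}$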

Comments

The binomial distribution is probably the most commonly used discrete distribution.

Parameter Estimation

The maximum likelihood estimator of p (n is fixed) is

$$\hat{p} = \frac{x}{n}$$

where x is the observed number of successes.

Poisson Distribution

Probability Mass Function

The Poisson distribution is used to model the number of events occurring within a given time interval.

The formula for the Poisson probability mass function is

$$p(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!} \qquad \text{for } x = 0, 1, 2, \ldots$$

where $\lambda$ is the shape parameter, which indicates the average number of events in the given time interval.

The following is the plot of the Poisson probability mass function for four values of $\lambda$.

Cumulative Distribution Function

The formula for the Poisson cumulative probability function is

$$F(x; \lambda) = \sum_{i=0}^{x} \frac{e^{-\lambda}\lambda^{i}}{i!}$$

The following is the plot of the Poisson cumulative distribution function with the same values of $\lambda$ as the pdf plots above.

Percent Point Function

The Poisson percent point function does not exist in simple closed form. It is computed numerically. Note that because this is a discrete distribution that is only defined for integer values of x, the percent point function is not smooth in the way the percent point function typically is for a continuous distribution.

The following is the plot of the Poisson percent point function with the same values of $\lambda$ as the pdf plots above.
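The Poisson percent point function can be computed numerically in the same spirit: build the cumulative sum term by term and stop when it reaches the requested probability. The sketch below assumes the convention that the percent point is the smallest x whose cumulative probability is at least q, and it uses the recurrence that each term equals the previous term times lambda / x. The function names are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean count is lam."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x, lam):
    """Probability of at most x events."""
    if x < 0:
        return 0.0
    term = math.exp(-lam)  # pmf at 0
    total = term
    for i in range(1, x + 1):
        term *= lam / i  # pmf(i) = pmf(i - 1) * lam / i
        total += term
    return total


def poisson_ppf(q, lam):
    """Smallest x with cdf(x) >= q (assumed convention), for 0 <= q < 1."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    x = 0
    term = math.exp(-lam)
    total = term
    # Stop as well if the terms underflow to zero, so the loop always ends.
    while total < q and term > 0.0:
        x += 1
        term *= lam / x
        total += term
    return x


if __name__ == "__main__":
    lam = 5.0
    for q in (0.05, 0.50, 0.95):
        x = poisson_ppf(q, lam)
        print(f"q={q:.2f}  x={x}  cdf(x)={poisson_cdf(x, lam):.4f}")
```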

Common Statistics

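Mean: $\lambda$

Mode: for non-integer $\lambda$, it is the largest integer less than $\lambda$. For integer $\lambda$, $x = \lambda$ and $x = \lambda - 1$ are both the mode.

Range: 0 to positive infinity

Standard Deviation: $\sqrt{\lambda}$

Coefficient of Variation: $\dfrac{1}{\sqrt{\lambda}}$

Skewness: $\dfrac{1}{\sqrt{\lambda}}$

Kurtosis: $3 + \dfrac{1}{\lambda}$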

Parameter Estimation

The maximum likelihood estimator of $\lambda$ is

$$\hat{\lambda} = \bar{X}$$

where $\bar{X}$ is the sample mean.
