
DRAFT

The Maxwell Distribution: Properties, Parameter Estimators and Some Applications

Rev. 3.1

Doug Hollingshead [email protected]

November 2009

Abstract

This note is intended to provide some statistical concepts and results related to the Maxwell (or Maxwell-Boltzmann) distribution. This distribution can be derived from the three degree-of-freedom Chi-squared distribution. The Maxwell distribution is the three-dimensional counterpart of the Rayleigh distribution when interpreted as sums of physical quantities. Many standard probability texts discuss statistical properties of the Rayleigh distribution. To date, the author has not found an expanded discussion of the Maxwell distribution properties or formulas pertaining to parameter estimation.

This note is primarily aimed at deriving relevant statistical properties of the one-parameter Maxwell distribution and presenting formulae for various point estimators for this parameter. Maximum likelihood, method of moments and quantile point estimation procedures are discussed. A Cramer-Rao lower bound relative efficiency comparison of the three estimators is included. The discussion of methods proceeds from general concepts to specifics related to the Maxwell distribution. A summary of important formulae is provided. Appendix A provides discussion of Bayesian estimation for the Maxwell distribution parameter. Two-parameter Maxwell distributions are discussed in Appendix B. Additional information pertaining to Maxwell distribution statistics is provided in the remaining appendices.

An attempt is made to identify some situations where knowledge of the Maxwell distribution and its estimators is helpful. These include three-dimensional miss distance, and detection of mismatch between the assumed and actual covariance matrix associated with an alpha-beta filter. The Mahalanobis distance for tri-variate normally distributed variables is introduced as the primary analytical tool for this application.


One Parameter Maxwell Distribution Formula Summary

Density Function: \( f(y) = \sqrt{2/\pi}\,(y^2/a^3)\,e^{-y^2/(2a^2)}, \quad y \ge 0 \)

CDF: \( F(y) = \mathrm{erf}\!\big(y/(\sqrt{2}\,a)\big) - \sqrt{2/\pi}\,(y/a)\,e^{-y^2/(2a^2)} \)

Mean, Mode and Variance: \( E[y] = 2a\sqrt{2/\pi}; \quad y_{mode} = \sqrt{2}\,a; \quad \mathrm{Var}(y) = a^2(3\pi-8)/\pi \)

One Parameter Maxwell Statistics for n samples

Minimum Variance Bound: \( \mathrm{Var}(\hat a) \ge a^2/(6n) \)

MLE estimator: \( \hat a_{MLE} = \sqrt{\textstyle\sum_i y_i^2/(3n)} \)

MOM estimator: \( \hat a_{MOM} = \sqrt{\pi/8}\;\bar y \)

Quantile Estimator: \( \hat a_Q = y_{(r)}/2.0266, \quad r \approx 0.75(n+1) \) (upper quartile)

Two Parameter Maxwell Distribution Formula Summary (Appendix B)

\( f(y) = \sqrt{2/\pi}\,\big((y-b)^2/a^3\big)\,e^{-(y-b)^2/(2a^2)}, \ y \ge b; \quad E[y] = b + 2a\sqrt{2/\pi}; \quad \mathrm{Var}(y) = a^2(3\pi-8)/\pi \)

Two Parameter Maxwell Statistics for n samples

\( \hat a = c_1\sqrt{m_2' - (m_1')^2}, \quad \hat b = m_1' - c_2\,\hat a; \qquad c_1 = \sqrt{\pi/(3\pi-8)}, \quad c_2 = 2\sqrt{2/\pi} \)

List of Appendices


Appendix A: Bayes’ Estimators for the Maxwell Distribution

Appendix B: The Two-parameter Maxwell Distribution

Appendix C: Sample Moment Standard Error and the Delta Method

Appendix D: Some Results for Order Statistics of a Sample

Appendix E: Sums of Maxwell IID Random Variables

Appendix F: Goodness of Fit and Outlier Tests for Maxwell Distribution

Appendix G: Probability Plots and MATLAB Code for Plot Generation

Appendix H: Chi Distributions for k > 3 and Nakagami Distributions

Appendix I: Inverse Maxwell Distribution

References


The Maxwell Distribution:

Properties, Parameter Estimators and Some Applications

Purpose and Overview

The Maxwell (or, more correctly, the Maxwell-Boltzmann) distribution was originally derived using statistical mechanics to explain the most probable energy distribution of molecules in a system. The kinetic theory of gases rests on this distribution. Later developments in probability theory showed that this distribution can be derived from the more general Chi-squared distribution, which in turn is a special case of the Gamma distribution.

The Maxwell distribution is related to the Chi-squared distribution with three degrees of freedom. This distribution is essentially the "3-dimensional" equivalent of the Rayleigh distribution. The Chi-squared distribution expresses the probability distribution of the sum of the squares of standard normal random variables. The Rayleigh and Maxwell distributions define a random variable in terms of the square root of these sums. It is important to note that the underlying variables are normally distributed with zero mean and a common variance.

This paper is intended to provide some technical background on the properties of the Maxwell distribution, including the distribution function, moments and other properties. A second goal is to discuss estimation methods for the single parameter of this distribution, as well as the mean square error of the various statistical estimators. The statistical efficiency concept for estimators is introduced. Calculation of efficiency requires general expressions for the variance of the sample moments.

In this regard, three methods of parameter estimation will be discussed: method of moments (MOM), method of quantiles, and maximum likelihood (MLE). The Cramer-Rao Lower Bound (CRLB) for this distribution will be derived. The MLE estimator is shown to achieve this lower bound.

Necessary concepts of order statistics are introduced to aid in development of quantile estimators. Quantile estimators are generally very easy to calculate and are quite robust when outliers are present in the data set. They are however less efficient than either MOM or MLE estimators. A simple expression for the Maxwell minimum variance quantile estimator is provided.

Bayesian estimators may be useful when working with the Maxwell distribution but are often limited by lack of knowledge of a suitable "a priori" distribution. Bayesian estimation is discussed in Appendix A. Because of the structure of the likelihood function for this distribution, a closed form solution exists only for special prior distributions. The concept of sufficient statistics is introduced here. One such prior is discussed and the Bayes' estimate (i.e., mean) of the posterior distribution is provided. Comparison with MLE estimators is made. An alternative "maximum a posteriori" or MAP (i.e., mode) estimator is also mentioned.


The last goal of this paper is to indicate two potential applications of the Maxwell distribution as applied to models or experiments encountered in missile design and analysis. The application motivating this paper was the need to provide a simple metric for evaluation of the output from a specific “alpha-beta” filter. The “capture probability” is defined as the proportion of “weighted error magnitudes” that lie inside the volume contained within the “2.5-sigma” covariance ellipsoid.

The capture probability involves calculation of the Mahalanobis distance for each of the data points. This distance will be introduced, and it will be shown that even when the various error components are correlated, the Mahalanobis distance still has a Maxwell distribution.

In some cases, the Maxwell distribution may not have its lower bound at zero. The distribution then has a location parameter as well as the scale parameter. This two-parameter Maxwell distribution is discussed in Appendix B, along with derivation of the properties of method-of-moments parameter estimators.

The variances of both the MLE and MOM estimates depend on the variance of the raw moment estimators. Appendix C contains additional derivations pertaining to the variance and covariance of moment statistics. The approximate variance of functions of the raw moments is also discussed; the Delta Method is introduced for this purpose.

Appendix D develops some necessary results for order statistics of a sample, including the expression for the expected value of the quantiles. The asymptotic mean and variance of the order statistics themselves are also derived, and an application for the Maxwell distribution is provided.

Appendix E derives the distribution for the sum of two Maxwell random variables. The sum of N independent Maxwell random variables and the asymptotic normal distribution are also discussed.

Appendix F provides an introductory discussion of hypothesis testing, goodness-of-fit tests and tests for outliers. The appendix contains Monte Carlo derived tables for testing goodness of fit via the Anderson-Darling statistic and using correlation statistics for the Maxwell distribution. Tables are also provided for testing for k outliers in samples assumed to be Maxwell distributed. The author is unaware of tables being published elsewhere for either of these statistics from hypothesized Maxwell distributions. Appendix G discusses and provides MATLAB code for development of Maxwell distribution “probability paper.” These plots are useful for initial evaluation of data.

Appendix H provides a brief overview of Chi distributions for k > 3. Here, k is the number of squared normal variables included in the root-mean-square sum. A generalized form of this distribution is the Nakagami. Parameter estimation methods for these distributions are also discussed.


Distribution Derivation

The sum of squares of "n" independent random variables, each with a standard normal distribution, results in a random variable that has a Chi-squared distribution with "n" degrees-of-freedom. References [1] or [2] contain details. We are currently interested in the sum of three independent variables which are normally distributed with zero mean. The three component variables have a common variance \( \sigma^2 \), and each variable is normalized by dividing its square by this variance. The resulting Chi-squared random variable, W, with three degrees-of-freedom is distributed as:

\[ f_W(w) = \frac{w^{1/2}\,e^{-w/2}}{2^{3/2}\,\Gamma(3/2)}, \qquad w \ge 0 \]

The gamma function is expressed as an integral, but is constant for a fixed number of degrees of freedom. For three degrees of freedom, \( \Gamma(3/2) = \sqrt{\pi}/2 \).

The variable of interest is the square root of W. Consider the transformation \( y = a\sqrt{W} \), where "a" is a scaling factor. It will be shown subsequently that this factor is the standard deviation of the underlying normal random variables. The distribution of variable y is required.

The inverse of the transformation is \( W = y^2/a^2 \). The support of W is the positive real axis and as such, the support of y is also the positive real axis. Further, a one-to-one transformation exists between y and W, so the distribution of y can be found from the standard procedure (Reference [4]):

\[ f_Y(y) = f_W\!\big(w(y)\big)\left|\frac{dw}{dy}\right| = f_W\!\left(\frac{y^2}{a^2}\right)\frac{2y}{a^2} \]

This distribution is in the general family of "Chi distributions." The three degree-of-freedom Chi distribution is the Maxwell density function. Substituting the transformed variable, and the value for \( \Gamma(3/2) \), provides the Maxwell probability density function:

\[ f(y) = \sqrt{\frac{2}{\pi}}\;\frac{y^2}{a^3}\;e^{-y^2/(2a^2)}, \qquad y \ge 0 \qquad \text{Equation (1)} \]

The cumulative distribution function (CDF) is derived by direct integration of Equation 1 over the interval \( 0 \le u \le y \):

\[ F(y) = \mathrm{erf}\!\left(\frac{y}{\sqrt{2}\,a}\right) - \sqrt{\frac{2}{\pi}}\;\frac{y}{a}\;e^{-y^2/(2a^2)} \qquad \text{Equation (2)} \]

The error function is a tabulated integral function defined as:

\[ \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt; \qquad \mathrm{erf}(0) = 0, \quad \mathrm{erf}(\infty) = 1 \]

It is noted that if we define \( u = y/a \) and \( g(u) = \sqrt{2/\pi}\,u^2 e^{-u^2/2} \), the parameter "a" is a "scale" parameter for the Maxwell distribution. This is seen by writing Equation 1 as:

\[ f(y) = \frac{1}{a}\,g\!\left(\frac{y}{a}\right) \]


Figure 1 shows the Maxwell distribution for various values of the scale parameter:

[Figure 1 – Maxwell Probability Distribution: density f(y) versus y for scale parameter values a = 1, 2, 3 and 5.]

It is interesting to note that the distribution has an approximate “bell shape” which implies this distribution is close in some sense to the normal distribution.


Maxwell Distribution Properties

Calculation of the distribution moments (about the origin) can in theory be done using the moment generating function (MGF). The MGF does exist for this distribution but is quite complicated. Moments about the origin are more easily evaluated by direct integration. For the mean (first moment):

\[ \mu_1' = E[y] = \int_0^\infty y\,f(y)\,dy = 2a\sqrt{\frac{2}{\pi}} \approx 1.5958\,a \]

The second moment about the origin is:

\[ \mu_2' = E[y^2] = 3a^2 \]

The third and fourth moments about the origin can be calculated similarly:

\[ \mu_3' = 8a^3\sqrt{\frac{2}{\pi}}, \qquad \mu_4' = 15a^4 \]

The variance follows from the first two moments:

\[ \sigma^2 = \mu_2' - (\mu_1')^2 = a^2\,\frac{3\pi-8}{\pi} \approx 0.4535\,a^2 \]

Skewness of the distribution involves the third moment about the mean. It should be positive as indicated in Figure 1. Skewness and (excess) kurtosis can be shown to be constant:

\[ \gamma_1 = \frac{2\sqrt{2}\,(16-5\pi)}{(3\pi-8)^{3/2}} \approx 0.486 \qquad \text{and} \qquad \gamma_2 = \frac{4(-96+40\pi-3\pi^2)}{(3\pi-8)^2} \approx 0.108 \]

This is not a highly skewed distribution, as Figure 1 confirms. The mode of the Maxwell distribution can be found by differentiating Equation 1 and setting the result to zero (note: a > 0):

\[ \frac{df}{dy} \propto \left(2y - \frac{y^3}{a^2}\right)e^{-y^2/(2a^2)} = 0 \quad \Rightarrow \quad y_{mode} = \sqrt{2}\,a \]


Parameter Estimation and CRLB

CRLB for Maxwell Distribution

Various methods exist to estimate parameter "a" when samples from the Maxwell distribution are available. Since the estimate depends on this data, it is a random variable and has an associated sampling distribution, \( f(\hat a) \). The "hat" notation indicates the estimator \( \hat a \) of the parameter a. The distribution of this random variable depends on the true parameter value "a". The sampling distribution generally is unknown, but the variance of the estimator can often be calculated, at least approximately. Thus, we desire to find the variance of this estimator, denoted \( \mathrm{Var}(\hat a) \). For an unbiased estimator, this variance is the Mean Square Error (MSE) of the estimator. The square root of the MSE is referred to as the Standard Error (SE) of the estimator.

The first question regarding estimators is: does a theoretical lower bound exist for the MSE, and if so is there an estimator that achieves this bound? The Cramer-Rao Lower Bound (CRLB), if it exists, provides an expression for the minimum MSE. It is a function of the particular distribution and sample size. An estimator that attains this lower bound may not exist. If the CRLB exists, it is derived from the “likelihood function” of the sample.

The likelihood function of a sample is the product of the probabilities, or joint distribution, of the sample values. The Maxwell distribution, with sample size n, has the following likelihood function:

\[ L(a) = \prod_{i=1}^{n} f(y_i) = \left(\frac{2}{\pi}\right)^{n/2} \frac{\prod_i y_i^2}{a^{3n}}\, \exp\!\left(-\frac{\sum_i y_i^2}{2a^2}\right) \qquad \text{Equation (3)} \]

The likelihood function is generally more useful in logarithmic form:

\[ \ln L(a) = \frac{n}{2}\ln\frac{2}{\pi} + 2\sum_i \ln y_i - 3n\ln a - \frac{1}{2a^2}\sum_i y_i^2 \qquad \text{Equation (4)} \]

Note that the logarithmic function is monotonic increasing, so operations on the likelihood function can be replaced with operations on the "log likelihood" function in most cases:

\[ \max_a L(a) \iff \max_a \ln L(a) \]

The CRLB derivation can be found in References [1] or [2]. In order for the CRLB to exist, the support of the density function cannot depend on the parameters, and the first two derivatives of the log likelihood function, with respect to the parameters, must exist. These "regularity conditions" are both met for the Maxwell distribution. The CRLB is:

\[ \mathrm{Var}(\hat a) \ge \frac{1}{I(a)}, \qquad I(a) = E\!\left[\left(\frac{\partial \ln L}{\partial a}\right)^{\!2}\right] = -E\!\left[\frac{\partial^2 \ln L}{\partial a^2}\right] \]

In the previous expression, E denotes expectation and I(a) is called the “Fisher information.”


For the Maxwell distribution, evaluation of the Fisher information is straightforward. Expanding the expectation of the squared score produces three integrals: one term involves the error function, and the remaining terms involve polynomials in y multiplied by the exponential factor. Carrying out the integration, the Fisher information for the Maxwell distribution is:

\[ I(a) = \frac{6n}{a^2} \]

The CRLB is:

\[ \mathrm{Var}(\hat a) \ge \frac{a^2}{6n} \qquad \text{Equation (5)} \]
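As a cross-check, the same result follows more quickly from the second-derivative form of the Fisher information; a sketch of that route:

\[ \frac{\partial \ln L}{\partial a} = -\frac{3n}{a} + \frac{\sum_i y_i^2}{a^3}, \qquad \frac{\partial^2 \ln L}{\partial a^2} = \frac{3n}{a^2} - \frac{3\sum_i y_i^2}{a^4} \]

Since \( E[y_i^2] = 3a^2 \), taking the negative expectation gives \( I(a) = -3n/a^2 + 9n/a^2 = 6n/a^2 \), in agreement with Equation 5.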

Moment Estimators

Given that the CRLB exists, the second question is: how close does the MSE of a chosen estimator function get to the minimum variance provided by this lower bound? The ratio of the CRLB to the MSE for a particular estimator is referred to as the estimator efficiency.

In order to evaluate efficiency of a selected estimator, the variance of that estimator is required. We will address three kinds of estimators; Maximum Likelihood (MLE), Method of Moments (MOM) and method of quantiles. All of these methods allow for relatively direct calculation of the MSE of the estimators. The last one requires some results from order statistics of a sample. The MSE of quantile estimators is addressed in the next section.

MLE and MOM estimators depend on the sample moments. As such, an expression for the variance of the sample moments is required. The "r-th" sample moment about the origin is given by the following computational formula, for random variable Y:

\[ m_r' = \frac{1}{n}\sum_{i=1}^{n} y_i^r \]

Note that sample moments are random variables, because they depend on the random sample values. It is shown in Appendix C that the sample moments are unbiased.


Appendix C contains a general derivation of the variance and covariance of these sample moments. Because the one-parameter Maxwell distribution contains only one parameter, only one sample moment is required and no covariance is involved. The expression of interest is:

\[ \mathrm{Var}(m_r') = \frac{\mu_{2r}' - (\mu_r')^2}{n} \qquad \text{Equation (6)} \]

MLE and MOM estimators are functions of sample moments as will be shown. An expression for the variance of a function of these sample moments is therefore required. The exact expression for the variance of a function of these random variables is generally not known. In such cases, an approximation can be developed via the “Delta Method,” which is discussed in Appendix C.

In the particular case of the single-parameter Maxwell distribution:

\[ \hat a = g(m_r') \]

where \( m_r' \) is the selected sample moment. The variance of this estimator is expressed in terms of the derivative of the function (see Appendix C):

\[ \mathrm{Var}(\hat a) \approx \left(\frac{\partial g}{\partial m_r'}\right)^{\!2} \mathrm{Var}(m_r') \qquad \text{Equation (7)} \]

The partial derivative is evaluated at the respective mean value, \( m_r' = \mu_r' \). The MSE of the parameter estimate can then be calculated using Equations 6 and 7.

Maximum Likelihood Estimate of “a”

The maximum likelihood principle is based on the notion that parameter values occurring in the likelihood (or log likelihood) function should maximize that function. In this application, the likelihood function is no longer considered as a joint probability function as it was for the CRLB development. In this case, the data values are fixed, and the likelihood is viewed as a function of the parameter value(s). The parameter is chosen such that the probability of observing the actual data values is maximized.

For the Maxwell distribution, the derivative of Equation 4 is taken and equated to zero for maximization:

\[ \frac{\partial \ln L}{\partial a} = -\frac{3n}{a} + \frac{\sum_i y_i^2}{a^3} = 0 \]

Solving this last equation provides the MLE for the parameter "a":

\[ \hat a_{MLE} = \sqrt{\frac{\sum_i y_i^2}{3n}} \qquad \text{Equation (8)} \]

To calculate the MSE of this estimator, note that the statistic considered is:

\[ \hat a = g(m_2') = \sqrt{\frac{m_2'}{3}}, \qquad m_2' = \frac{1}{n}\sum_i y_i^2 \]

and

\[ \frac{\partial g}{\partial m_2'}\bigg|_{m_2' = 3a^2} = \frac{1}{2\sqrt{3\,m_2'}}\bigg|_{m_2' = 3a^2} = \frac{1}{6a} \qquad \text{Equation (9)} \]

The variance of the second moment is calculated by inserting values for the second and fourth moments into Equation 6:

\[ \mathrm{Var}(m_2') = \frac{\mu_4' - (\mu_2')^2}{n} = \frac{15a^4 - 9a^4}{n} = \frac{6a^4}{n} \]

Squaring Equation 9, using the previous result, and plugging into Equation 7 provides the MLE estimator variance:

\[ \mathrm{Var}(\hat a_{MLE}) \approx \left(\frac{1}{6a}\right)^{\!2}\frac{6a^4}{n} = \frac{a^2}{6n} \qquad \text{Equation (10)} \]

The estimator "efficiency" defined previously for this estimator is:

\[ \mathrm{eff}(\hat a_{MLE}) = \frac{\mathrm{CRLB}}{\mathrm{Var}(\hat a_{MLE})} = \frac{a^2/(6n)}{a^2/(6n)} = 1.0 \]

When this ratio is 1.0, the estimator is called "efficient." The MLE estimator of "a" is therefore efficient, since Equations 5 and 10 are identical.
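A minimal MATLAB sketch of this result (variable names hypothetical): repeated Maxwell samples are drawn, the MLE of Equation 8 is formed for each, and the empirical variance is compared with the CRLB of Equation 5. Maxwell variates are generated as the norm of three normal deviates, anticipating the generation method discussed later.

    % Monte Carlo check: variance of the MLE versus the CRLB a^2/(6n)
    a = 1.5; n = 100; reps = 5000;
    aMLE = zeros(reps,1);
    for k = 1:reps
        y = sqrt(sum((a*randn(3,n)).^2, 1));   % n Maxwell(a) variates
        aMLE(k) = sqrt(sum(y.^2)/(3*n));       % Equation (8)
    end
    fprintf('empirical variance: %.3e   CRLB: %.3e\n', var(aMLE), a^2/(6*n));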

Method of Moments Estimate of "a"

The method of moments is based on the idea that if two distributions have the same moments, they should be (approximately) the same. The method therefore consists of setting the first “k” sample moments equal to the corresponding moments of the assumed distribution. The number of equations then equals the number of parameters being estimated.

For the Maxwell distribution, only one parameter is available; thus, we set the sample mean equal to the distribution mean:

\[ m_1' = 2a\sqrt{\frac{2}{\pi}} \quad \Rightarrow \quad \hat a_{MOM} = \sqrt{\frac{\pi}{8}}\; m_1' \]

Using the previous procedure to derive the MSE of this estimator:

\[ \mathrm{Var}(\hat a_{MOM}) \approx \frac{\pi}{8}\,\mathrm{Var}(m_1') = \frac{\pi}{8}\cdot\frac{a^2(3\pi-8)}{\pi n} = \frac{(3\pi-8)\,a^2}{8n} \qquad \text{Equation (11)} \]

The efficiency of the MOM estimator is thus:

\[ \mathrm{eff}(\hat a_{MOM}) = \frac{a^2/(6n)}{(3\pi-8)\,a^2/(8n)} = \frac{4}{3(3\pi-8)} \approx 0.936 \]


The MOM estimator is seen to have a variance approximately 7% greater than that of the MLE estimator of "a"; consequently it is less efficient. It is of interest to note that if we had instead equated the second sample and distribution moments, the resulting MOM estimator would have been identical to the MLE. This shows that in some cases, judicious selection of the sample moments to use in MOM might have some advantage.

The above examples indicate that MLE estimators are more efficient, and this is generally true. MLE estimators have several other advantages. The first is that for “large” sample sizes, the MLE estimator approaches a normal distribution with mean equal to the true parameter, and the variance equal to the CRLB.

The second advantage of an MLE estimator is that it has an "invariance" property. This means that if an estimate of some function of a parameter is required, we can insert the MLE estimate of the parameter into the function to obtain the MLE of that function. This is expressed mathematically as:

\[ \widehat{g(a)}_{MLE} = g(\hat a_{MLE}) \]

It is noted that MLE estimators are often biased in small samples. For the Maxwell distribution the bias vanishes asymptotically, consistent with the estimator attaining the CRLB.


Quantile Estimates and Order Statistics

Quantile estimators are similar to MOM estimators in that they are not based on a maximization procedure. Quantile estimators are generally easy to compute, but often the MSE of such estimators is difficult to ascertain. The efficiency of these estimators is generally less than that of MOM estimators. One application of such estimators is when the data has been censored. Censored data implies that not all samples have been evaluated (e.g., in life testing). The method depends on some results for order statistics of a sample. The pertinent results are discussed in Appendix D.

In some cases, unexpectedly large (or small) values may be present in a sample. These "outliers" can have a disproportionate influence on the parameter estimates. If an outlier needs to be retained in a sample, but not unduly affect the parameter estimate, then quantile estimates are quite "robust" in this sense. The relation between quantile and MLE (or MOM) estimators is analogous, in this sense, to the sample median and mean as estimates of the population central tendency. The order statistics are just the sample values arranged in ascending order:

\[ y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)} \]

The r-th order statistic is denoted as \( Y_{(r)} \). Each order statistic of the sample has an associated probability value, referred to as the quantile of the order statistic:

\[ p_r = \frac{r}{n+1} \]

The idea behind quantile estimators is to pick a number of quantiles equal to the number of parameters to be estimated, and the associated sample order statistics for those quantiles. Inserting these values into the distribution function provides a set of equations equal to the number of unknown parameters. For the Maxwell distribution, there is only one parameter, hence, only one quantile is needed to estimate the unknown parameter.

Several questions arise. First, does the mean square error of the parameter estimate depend on which quantile is selected? Secondly, if a particular quantile gives the minimum MSE, how is it identified? A third question is: does the MSE depend on the parameter value itself? The following development shows that for the Maxwell distribution, the MSE does depend on the selected quantile, but the minimum MSE quantile does not depend on the parameter value.

Quantile Estimator for Maxwell Distribution

Quantile estimators in general are functions of the sample order statistics (just as moment estimators are functions of the sample moments). The delta method procedure can again be applied, and for a single-parameter case the MSE of the estimator \( \hat a = g(Y_{(r)}) \) is:

\[ \mathrm{Var}(\hat a) \approx \left(\frac{\partial g}{\partial y_{(r)}}\right)^{\!2}\,\mathrm{Var}\big(Y_{(r)}\big) \qquad \text{Equation (12)} \]

The "large sample" formula for the variance of any order statistic is derived in Appendix D:

14

Page 15: Maxwell Distribution Rev3

DRAFT

\[ \mathrm{Var}\big(Y_{(r)}\big) \approx \frac{p(1-p)}{n\,[f(y_p)]^2} \qquad \text{Equation (13)} \]

In this equation, \( y_p \) is the value of the "r-th" order statistic corresponding to the sample quantile \( p = r/(n+1) \). The capital-letter notation \( Y_{(r)} \) implies this quantity is a random variable. The function \( f(\cdot) \) is the underlying density function of the random variable y.

The minimum variance quantile is determined using the following procedure. We first transform y into a more computationally convenient variable:

\[ t = \frac{y}{\sqrt{2}\,a} \qquad \text{Equation (14a)} \]

Equation 2 is then expressed in terms of this variable, which shows that the quantile p depends only on t:

\[ p = F(t) = \mathrm{erf}(t) - \frac{2}{\sqrt{\pi}}\,t\,e^{-t^2} \qquad \text{Equation (14b)} \]

The derivative term required in Equation 12 is obtained directly from Equation 14a, since \( a = y/(\sqrt{2}\,t) \):

\[ \frac{\partial \hat a}{\partial y_{(r)}} = \frac{1}{\sqrt{2}\,t_p} \qquad \text{Equation (15a)} \]

A plot of the relationship between t and p from Equation 14b is shown in the left side of Figure 2. The Maxwell density function can also be expressed in terms of t:

\[ f(y) = 2\sqrt{\frac{2}{\pi}}\;\frac{t^2}{a}\;e^{-t^2} \qquad \text{Equation (15b)} \]

Note that this expression involves parameter "a". For a selected value of "a", Equation 12 can be evaluated over the range of quantiles 0 < p < 1, by calculating \( t_p \) via Equation 14b and then using Equations 15a and 15b:

\[ \mathrm{Var}(\hat a_Q) \approx \frac{\pi\,p(1-p)\,e^{2t_p^2}}{16\,n\,t_p^6}\;a^2 \qquad \text{Equation (15c)} \]

The right hand side of Figure 2 shows the relative variation of Equation 15c as a function of the quantile value for various choices of parameter "a". The sample size n is a scaling variable that does not affect the shape of the curves.

For parameter "a" close to unity, the relative variance appears rather insensitive to the quantile value, but as the parameter increases, a minimum close to p = 0.75 (the upper quartile) is evident. In fact, the minimum of Equation 15c occurs at p = 0.7676, and is independent of the parameter, as can be verified by direct calculation.


[Figure 2 – Order Statistics Estimator Variance for Maxwell Distribution. Left: t value versus quantile value (Equation 14b). Right: estimate variance (n = 1) versus quantile value for a = 1, 2, 3 and 5.]

For ease of calculation, the upper quartile value p = 0.75 is recommended for the Maxwell distribution estimator (a more exact analysis shows that the minimum-variance quantile is 0.7676). At the upper quartile, variable "t" equals 1.4330, and the quantile parameter estimate is:

\[ \hat a_Q = \frac{Y_{(r)}}{\sqrt{2}\,(1.4330)} = \frac{Y_{(r)}}{2.0266}, \qquad r \approx 0.75(n+1) \qquad \text{Equation (16)} \]

The MSE of this quantile estimator is calculated by inserting these values into Equation 15c:

\[ \mathrm{Var}(\hat a_Q) \approx \frac{0.258\,a^2}{n} \]

The quantile estimator efficiency can then be calculated using the CRLB:

\[ \mathrm{eff}(\hat a_Q) = \frac{a^2/(6n)}{0.258\,a^2/n} \approx 0.645 \qquad \text{Equation (17)} \]
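The location of the minimum-variance quantile can be confirmed numerically; a base-MATLAB sketch, assuming Equations 14b and 15c as written above (fzero and fminbnd are standard root-finding and golden-section-based minimization routines):

    % Locate the minimum-variance quantile for the Maxwell distribution (a = n = 1)
    F    = @(t) erf(t) - (2/sqrt(pi))*t.*exp(-t.^2);           % Equation (14b)
    tofp = @(p) fzero(@(t) F(t) - p, [1e-6, 10]);              % invert p = F(t)
    relv = @(p) pi*p*(1-p)*exp(2*tofp(p)^2)/(16*tofp(p)^6);    % Equation (15c)
    pmin = fminbnd(relv, 0.3, 0.95);
    fprintf('minimum-variance quantile: %.4f   t: %.4f\n', pmin, tofp(pmin));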


Estimation Example for Hypothetical Sample

As an example of the use of the parameter estimators, assume a sample of size 100 from a Maxwell distribution is available. In this example, the parameter value is a = 1.5. Computation was done using MATLAB functions. The sample is:

data = [ 3.7743 2.6739 1.1872 0.7470 2.7742 1.7389 2.1289 0.3800 1.7851 2.9057 2.2847 2.1543 1.1863 3.3807 1.0166 2.4348 3.0516 2.4945 0.7600 4.1672 4.3509 3.3549 2.2589 0.9332 1.5142 1.4671 2.2576 4.3885 1.7236 1.9492 0.7190 1.6172 1.6940 1.1957 1.9209 2.6369 1.4716 2.4191 3.0460 2.4330 3.5561 1.7868 2.2678 2.2190 2.7345 0.6198 2.3071 3.0629 4.0377 1.8466 4.9367 2.3712 2.6957 1.1412 2.3771 0.6530 1.9543 0.2878 2.0164 3.0145 4.4819 5.7881 1.2017 1.8738 2.7212 1.5086 3.1833 1.7552 0.8323 2.6708 3.3421 4.6603 2.9550 2.0458 1.5950 3.9521 2.4211 1.7068 1.4631 2.7573 4.0250 3.0648 3.5964 1.7052 3.4843 4.1598 2.4427 1.6287 4.2384 1.9250 4.1387 0.9586 3.7827 1.2290 2.8417 1.4812 0.6498 3.3351 3.3007 2.9664 ]

A check on the calculated standard errors was made by running 200 replicates (of sample size 100 each, with a = 1.5). The standard deviation of the 200 MLE, MOM and quantile estimates was calculated and compared to the theoretical values above.

In order to check the quantile estimator, estimates and standard errors were made at the lower quartile and median also. Results showed the upper quartile estimator does in fact produce minimum variance among these.
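For reference, the three point estimates for a sample like the one above can be computed in a few lines of base MATLAB (a sketch; data is the vector listed above):

    % Point estimates of "a" from the sample
    n    = numel(data);
    aMLE = sqrt(sum(data.^2)/(3*n));        % Equation (8)
    aMOM = sqrt(pi/8)*mean(data);           % method of moments
    s    = sort(data);                      % order statistics
    aQ   = s(round(0.75*(n+1)))/2.0266;     % upper-quartile estimate, Equation (16)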


Mahalanobis Distance and Maxwell Distribution

The Mahalanobis distance is a "variance weighted" or statistical distance between two points in r-dimensional space. We confine discussion to Euclidean three-dimensional space. The Mahalanobis distance is employed when the distance between r-dimensional random variables is required, but the distance is weighted inversely to the "spread" or variance of each variable. Components of the random vector y may be correlated.

The definition of the Mahalanobis distance is:

\[ D = \sqrt{(\mathbf{y}-\boldsymbol{\mu})^{T}\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{y}-\boldsymbol{\mu})} \qquad \text{Equation (18)} \]

where \( \boldsymbol{\mu} \) is the mean vector and \( \boldsymbol{\Sigma} \) is the covariance matrix of the random vector \( \mathbf{y} \).

Suppose the three dimensional vector random components are from uncorrelated normal distributions. The variance of each component is in general different. For simplicity assume all of the component variables are zero mean. The covariance matrix inverse is just the inverse of the diagonal terms in this case, and the square of the Mahalanobis distance is:

\[ D^2 = \frac{y_1^2}{\sigma_1^2} + \frac{y_2^2}{\sigma_2^2} + \frac{y_3^2}{\sigma_3^2} \]

This squared value is seen to be the sum of squares of three standard normal random variables. As such, the distribution of \( D^2 \) is Chi-squared with 3 degrees-of-freedom. The square root of this random variable is therefore Maxwell distributed as shown previously, where the distribution parameter "a" is equal to unity because \( D^2 \) is the sum of three squared standard normal variables.

The next issue is what happens to the distribution of D when the components are correlated. In the following discussion, we consider tri-variate jointly normal random variables. Assume that all components have different standard deviations and different means. The vector for the three components is \( \mathbf{x} = (x_1, x_2, x_3)^T \).

Examine the general covariance matrix for these variables, where non-zero correlations are present:

\[ \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \rho_{13}\sigma_1\sigma_3 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \rho_{23}\sigma_2\sigma_3 \\ \rho_{13}\sigma_1\sigma_3 & \rho_{23}\sigma_2\sigma_3 & \sigma_3^2 \end{pmatrix} \]


This matrix is positive definite and therefore has positive eigenvalues. The eigenvector corresponding to each eigenvalue allows for spectral decomposition of the covariance matrix, which can be expressed as:

\[ \boldsymbol{\Sigma} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^{T} \]

Matrix \( \mathbf{P} \) contains the eigenvectors, \( \boldsymbol{\Lambda} \) is a diagonal matrix of the eigenvalues, and the factoring shown can be accomplished since the eigenvector matrix is orthonormal. Define the "square root" of the covariance matrix as:

\[ \boldsymbol{\Sigma}^{1/2} = \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{P}^{T} \]

Because the eigenvalues are all positive, the inverse of the above matrix is:

\[ \boldsymbol{\Sigma}^{-1/2} = \mathbf{P}\boldsymbol{\Lambda}^{-1/2}\mathbf{P}^{T} \]

Now, let Z be a random vector that has a multivariate normal distribution with zero mean and independent components with unity variance, or \( \mathbf{Z} \sim N_3(\mathbf{0}, \mathbf{I}) \). The subscript 3 implies Z is tri-variate. Define a random vector X such that:

\[ \mathbf{X} = \boldsymbol{\Sigma}^{1/2}\mathbf{Z} + \boldsymbol{\mu} \]

This linear relation can be inverted such that:

\[ \mathbf{Z} = \boldsymbol{\Sigma}^{-1/2}(\mathbf{X} - \boldsymbol{\mu}) \]

Using the previous equation results in the following:

\[ D^2 = (\mathbf{X}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) = \mathbf{Z}^{T}\mathbf{Z} = \sum_{i=1}^{3} z_i^2 \]

The previous equation states that the Mahalanobis distance squared is still Chi-squared with three degrees of freedom, as a result of the \( z_i \) being independent standard normal variables, even though the underlying variables (x) are correlated. It is again noted that the value of parameter "a" is unity.

Application of the Mahalanobis distance measure is provided in the second example below. The fact that the parameter value is unity provides a metric for evaluation of consistency of a covariance matrix and a set of random variables.
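A sketch illustrating this result in base MATLAB (the covariance matrix and mean below are hypothetical): correlated trivariate normal errors are drawn, the Mahalanobis distances formed, and the MLE of "a" checked against unity.

    % Mahalanobis distances of correlated normal errors are Maxwell with a = 1
    Sigma = [4.0 1.2 0.5; 1.2 2.0 0.3; 0.5 0.3 1.0];    % hypothetical covariance
    mu    = [1; -2; 0.5];  n = 10000;
    X     = chol(Sigma,'lower')*randn(3,n) + mu;        % X ~ N3(mu, Sigma)
    d     = sqrt(sum((X - mu).*(Sigma\(X - mu)), 1));   % Equation (18) at each point
    aHat  = sqrt(sum(d.^2)/(3*n));                      % MLE; should be close to 1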


Maxwell Distribution Applications

This section provides two examples of missile systems analysis where the Maxwell distribution applies.

Three Dimensional Miss Distance

One measure of tactical missile performance is the distribution of miss distance, measured as the Euclidean distance from the missile to the target center at closest approach of the weapon. For air-to-ground missiles, the two-dimensional impact point is described in terms of orthogonal "x-y" coordinates, centered at the target. The usual assumption is that the coordinate positions are independent, identically distributed normal variables with zero mean. These assumptions lead to the well known result that the distribution of the miss distance in the two-dimensional plane follows a Rayleigh distribution.

For air-to-air or surface-to-air missiles, the three-dimensional Euclidean miss distance distribution is sometimes used to describe performance. If the same assumptions are made regarding each component (iid, zero-mean normal variables), then the resulting joint probability distribution is:

\[ f(x,y,z) = \frac{1}{(2\pi\sigma^2)^{3/2}}\,\exp\!\left(-\frac{x^2+y^2+z^2}{2\sigma^2}\right) \]

where \( \sigma \) is the common standard deviation of the three components.

To obtain the Euclidean miss distance r, transform the orthogonal Cartesian coordinates to spherical coordinates (azimuth \( \alpha \), elevation \( \varepsilon \)):

\[ x = r\cos\varepsilon\cos\alpha, \qquad y = r\cos\varepsilon\sin\alpha, \qquad z = r\sin\varepsilon \]

The joint distribution of the transformed variables is:

\[ f(r,\alpha,\varepsilon) = f\big(x(r,\alpha,\varepsilon),\,y(r,\alpha,\varepsilon),\,z(r,\alpha,\varepsilon)\big)\,|J| \]

The Jacobian for the transformation follows from the coordinate transform equations:

\[ |J| = r^2\cos\varepsilon \]


\[ f(r,\alpha,\varepsilon) = \frac{r^2\cos\varepsilon}{(2\pi)^{3/2}\,\sigma^3}\;e^{-r^2/(2\sigma^2)} \qquad \text{Equation (19)} \]

The marginal distribution of each variable is obtained by integrating out the other two:

\[ f(r) = \sqrt{\frac{2}{\pi}}\;\frac{r^2}{\sigma^3}\;e^{-r^2/(2\sigma^2)}, \qquad r \ge 0 \qquad \text{Equation (20a)} \]

Equation 20a shows that the 3-dimensional miss distance is distributed as a Maxwell random variable, with the parameter equal to the common standard deviation of the underlying normal distribution.

For reference, the other two marginal distributions can be easily calculated, noting that:

\[ f(\alpha) = \frac{1}{2\pi}, \qquad 0 \le \alpha < 2\pi \qquad \text{Equation (20b)} \]

\[ f(\varepsilon) = \frac{\cos\varepsilon}{2}, \qquad -\frac{\pi}{2} \le \varepsilon \le \frac{\pi}{2} \qquad \text{Equation (20c)} \]

Thus, the distribution of the azimuth angle is uniform on the interval \( (0, 2\pi) \), while the elevation angle has a cosine distribution. It is of interest to note that the product of the three marginal densities recovers the joint distribution of Equation 19. This result shows these variables are independent of each other.

If a set of miss distance data is available, the standard deviation of the underlying normal random variables can be estimated using one of the methods discussed previously.

Filter Covariance Consistency Check

In some ground tracking systems, prediction and smoothing of the target track is implemented via an alpha-beta filter. Reference [6] contains an excellent description of these trackers and their implementation. Trackers of this type generally filter target position in Cartesian coordinates, although measurements are made in a different "sensor" coordinate system. The errors associated with the sensor coordinates (usually range, azimuth and elevation) are generally independent, but become correlated when they are transformed to the Cartesian system.

The covariance matrix associated with the target track is used for a variety of purposes, including uplinks to a missile seeker for search and acquisition by the onboard system. A potential problem can occur because the alpha-beta filter does not directly produce a covariance matrix estimate, although one can be derived from the particular structure of the


filter. In applications, the covariance matrix is supplied externally to the filter and should be "matched" to the actual sensor error measurement characteristics.

One consistent way of accomplishing this is to develop the respective variance terms from the implemented filter, in the sensor coordinate system, and transform them to the Cartesian system. This requires that statistics of the error components (assumed normal) be known ahead of time. It also assumes that the "lag error" of the filter is negligible compared to the noise error. Another approach is to develop the covariance matrix from some other algorithm, such as a related sensor or filter, and make the necessary adjustments to be compatible with the alpha-beta filter.

In either case, consistency of the implemented filter with the supplied covariance matrix should be assessed. One way of doing this is by comparing the theoretical capture ratio of the measurements with the actual (or simulated) data error values. Capture ratio is defined as the proportion of “weighted error magnitudes” that lie inside the volume contained within the “2.5-sigma” error (position) ellipsoid. For equal component variance, the error ellipsoid is a sphere.

Development of the statistical distribution for the Mahalanobis distance showed it is Maxwell distributed with parameter a = 1. This fact can be used to check consistency of the measurement error and covariance matrix. The theoretical value is found from Equation 2:

\[ \text{Capture Ratio} = F(2.5)\big|_{a=1} = \mathrm{erf}\!\left(\frac{2.5}{\sqrt{2}}\right) - \sqrt{\frac{2}{\pi}}\,(2.5)\,e^{-2.5^2/2} \approx 0.90 \]

The "data" value of the capture ratio is computed by evaluating the Mahalanobis distance according to Equation 18 at each update point while the filter is tracking. The number of calculated points no larger than 2.5, divided by the total number of data points, provides the data capture ratio. This should be close to the theoretical value if the filter, measurements, and covariance data are consistent.
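A sketch of this check in base MATLAB, where d is a vector of Mahalanobis distances computed via Equation 18 (as in the earlier sketch):

    % Capture-ratio consistency check at the 2.5-sigma surface
    Fmax      = @(y) erf(y/sqrt(2)) - sqrt(2/pi)*y.*exp(-y.^2/2);  % Equation (2), a = 1
    capTheory = Fmax(2.5);               % approximately 0.90
    capData   = mean(d <= 2.5);          % proportion inside the ellipsoid
    fprintf('theoretical: %.3f   data: %.3f\n', capTheory, capData);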

A second, equivalent consistency check can be done by calculating the MLE of parameter "a", and its associated MSE, from the data values. If the computed estimate differs from unity by more than twice the square root of the MSE, this is a strong indication that a mismatch exists.

Potential reasons for a mismatch are:

The supplied covariance matrix is in fact based on a different model than the alpha-beta filter.

The actual noise error components have different variances than those assumed in development of the covariance matrix.

The actual noise error components are not from normal distributions, or are not zero mean.


Large target maneuvers during tracking have introduced large lag error in the filter output which is not included in the theoretical development.

Miscellaneous Topics

This section covers some additional topics related to the Maxwell distribution and its estimator properties.

Generation of Maxwell Random Variables

Simulation analyses may require generation of Maxwell distributed random variables. This can be accomplished in either of two ways. The first is by direct use of the cumulative distribution function. As noted previously, we can write this in the form:

\[ u = F(y) = \mathrm{erf}(t) - \frac{2}{\sqrt{\pi}}\,t\,e^{-t^2}, \qquad t = \frac{y}{\sqrt{2}\,a} \]

where u has a uniform distribution on the unit interval. Given a particular random value of u on this interval, the corresponding value of y can be found by a numerical root-finding procedure using the above function. For a particular analysis, the parameter value "a" is known, so values of the random variable y are obtained from the u values.

Various root finding procedures are available, as discussed in Reference [7]. When applicable, the Newton-Raphson method is desirable since it has quadratic convergence. This method does require the derivatives to be "well behaved" in that the derivative value does not vanish anywhere over the range of function values. Unfortunately, the Maxwell CDF has a zero derivative at zero and the derivative approaches zero as the variable becomes large. As such, the Newton-Raphson method does not converge in general, and some other method, not involving derivatives, should be used. Bisection is one such method, but it has slow convergence. Brent's method (Reference [7]) has "super-linear" convergence and is considered one of the best "non-derivative" root finding algorithms.

The second way to generate Maxwell random variables is to use the results derived for the miss distance distribution. In this method, three separate normal random deviates are generated, each with a standard deviation corresponding to parameter “a”. These three deviates are squared and summed. The square root of the sum then provides the desired Maxwell random deviate.

This implementation is generally much faster, depending on the software implementation, because iterative root finding is not required (although a normal random number algorithm is needed). The results derived in Appendix F use this generation method and MATLAB random variate generators. This implementation executes an order of magnitude faster than the root-finding implementation.
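Both approaches can be sketched in a few lines of base MATLAB; MATLAB's fzero is a bracketing, Brent-type routine, consistent with the recommendation above (a sketch, not production code):

    % Method 1: inverse-CDF generation by root finding
    a = 2; n = 1000;
    F = @(y) erf(y./(sqrt(2)*a)) - sqrt(2/pi)*(y/a).*exp(-y.^2/(2*a^2));
    u = rand(n,1);  y1 = zeros(n,1);
    for k = 1:n
        y1(k) = fzero(@(y) F(y) - u(k), [0, 20*a]);  % bracketing sidesteps the flat derivative
    end

    % Method 2: root-sum-square of three normal deviates (much faster)
    y2 = sqrt(sum((a*randn(3,n)).^2, 1)).';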


Non-Linear Least Squares (NLS) Parameter Estimation

The minimum variance quantile estimate of parameter “a” was derived previously. A question is: “if all of the quantile data points were used to estimate the parameter, by minimizing the square of the error between estimated and predicted values, would this produce a better estimate?” This proposed estimation procedure is a least squares problem.

The parameter is related to the quantile values and the data points through the CDF. As such, the least squares problem can be expressed as:

\[ E(a) = \sum_{i=1}^{n}\big[F(y_{(i)};a) - p_i\big]^2, \qquad p_i = \frac{i}{n+1} \]

The non-linear least squares problem is to minimize this sum of squared errors by setting the derivative with respect to the parameter equal to zero:

\[ \frac{dE}{da} = 2\sum_{i=1}^{n}\big[F(y_{(i)};a) - p_i\big]\,\frac{\partial F(y_{(i)};a)}{\partial a} = 0 \]

This derivative still involves the error function and requires numerical solution. A computationally more direct way is to use the fact that a single parameter is involved: the function E(a) can be minimized directly using a one-dimensional optimization procedure (such as golden section, Reference [7]).
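A sketch of this direct minimization in base MATLAB (fminbnd is a golden-section based routine; data and the plotting positions p_i are assumptions consistent with the example below):

    % Nonlinear least squares fit of "a" to the sample quantiles
    s    = sort(data);  n = numel(s);
    p    = (1:n)/(n+1);                             % plotting positions
    F    = @(y,a) erf(y./(sqrt(2)*a)) - sqrt(2/pi)*(y/a).*exp(-y.^2/(2*a^2));
    sse  = @(a) sum((F(s,a) - p).^2);               % sum of squared CDF errors
    aNLS = fminbnd(sse, 0.1, 10);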

A difficulty in determining the efficiency of this non-linear estimator is that neither its distribution nor its mean square error can be calculated analytically. This non-linear least squares (NLS) estimator will require numerical evaluation for any particular case of interest.

As an example, take a Maxwell distribution with a = 1.5 and sample size n = 100. The approach is to take a large number of Monte Carlo samples of size n and compute the MLE and the NLS estimator for each sample. The mean and standard deviation of these estimates are calculated and compared. The MLE estimators should have variance very close to the CRLB. In this particular example, 5000 samples of size 100 were generated, and the NLS and MLE estimates were made for each. The quantiles were estimated as \( p_i = i/(n+1) \).

Figure 3 shows the results for the first 500 samples. The results apply only to the case considered, but they are indicative of the relative performance of the NLS and MLE estimators. The figure shows that the two estimates are highly correlated, as would be expected.

Calculated results for the means and standard deviations are:


[Figure 3 – NLS & MLE Estimators: Empirical Results. Scatter plot of the nonlinear least squares estimates versus the MLE estimates.]

The efficiency of the NLS estimator (for this example) is the ratio of the CRLB to the empirical estimator variance. The efficiency of the sample-derived MLE estimator can be computed in the same way.

This example indicates that using all of the quantiles in a sample does provide a higher relative efficiency than the single quantile estimator. However, both the MLE and MOM estimators have smaller MSE and are thus superior to this estimator.

The nonlinear least squares estimator is related to the ideas of correlation-based goodness-of-fit tests. Appendix F provides discussion of tests of this type.


Appendix A

Bayes’ Estimators for the Maxwell Distribution

Development

Bayesian estimation is based on a different point of view than the other point estimation methods discussed. Whereas MOM, MLE and quantile estimates assume the parameter is fixed, the Bayesian approach is to treat this quantity as a random variable. The method has the advantage of incorporating probabilistic information that is known prior to taking a sample. The method is based on conditional probabilities and is an expression of Bayes' rule as applied to density functions. It is expressed as:

\[ f(a \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid a)\,\pi(a)}{f(\mathbf{y})} \qquad \text{Equation (A1)} \]

In this equation, \( \pi(a) \) represents statistical knowledge about variable "a" prior to obtaining a data sample. It is a probability density and is called the "prior density." The conditional density \( f(\mathbf{y} \mid a) \) is the joint density of the sample (y), given a value of the parameter. Since the data values are independent, this density is the product of the probabilities of each data point, which is just the likelihood function of Equation 3.

The function \( f(\mathbf{y}) \) is the probability density of the data. This density can be calculated in terms of the likelihood and the prior density using the Theorem of Total Probability. Integration is over the support of the prior density (i.e., the range of values that parameter "a" can take on):

\[ f(\mathbf{y}) = \int f(\mathbf{y} \mid a)\,\pi(a)\,da \qquad \text{Equation (A2)} \]

This density is essentially a normalization factor, so that Equation A1 satisfies the requirements of a probability function.

The conditional density \( f(a \mid \mathbf{y}) \) is referred to as the posterior (or "a posteriori") distribution and represents the statistical knowledge about variable "a" after the current data sample is incorporated. This posterior density, if it can be evaluated, provides a complete description of the estimator (for the given data), since the mean, variance and other parameters can in theory be calculated.

Actual selection of the estimate can be made in several ways. The Bayes' estimator most commonly used is the expectation of the posterior distribution:

\[ \hat a_{B} = E[a \mid \mathbf{y}] = \int a\,f(a \mid \mathbf{y})\,da \qquad \text{Equation (A3)} \]

26

Page 27: Maxwell Distribution Rev3

DRAFT

The Bayesian procedure can be thought of as a "weighted average" of the prior distribution mean and the data sample mean. Most textbooks, such as References [1-4], show simple examples of this for normal and binomial distribution parameters. Generally, as the data sample size n increases, more weight is placed on the sample and less on the prior distribution. This will be demonstrated in an example for the Maxwell distribution.

There are two issues involved with Bayesian estimation that complicate the procedure. The first is selection of the prior distribution for the parameter, which is subjective in nature. This distribution can be based on prior data samples if they are available, or it may be assumed by the analyst, based simply on intuition. The second issue is the evaluation of the data density (normalization factor) and the integrals resulting in Equations A2 and A3. For single-parameter distributions, the integrals can usually be evaluated numerically.

The likelihood function for the Maxwell distribution data sample is (Equation 3):

\[ f(\mathbf{y} \mid a) = \left(\frac{2}{\pi}\right)^{n/2} \frac{\prod_i y_i^2}{a^{3n}}\, \exp\!\left(-\frac{\sum_i y_i^2}{2a^2}\right) \]

It is noted that all terms in the likelihood function that are not functions of the parameter can be brought outside the integrals, and thus cancel. This includes terms involving only the data values \( y_i \), since these are fixed for a particular sample.

Evaluation of integrals involving this likelihood function presents particular problems because of its structure. If the sample size n is even moderately large, the product of the negative exponential and the \( a^{-3n} \) terms becomes extremely small. In numerical evaluation procedures, the ratio in Equation A1 then evaluates to a "0/0" situation (to computational accuracy on most computers).

The likelihood function is the joint distribution of the data sample. At any fixed value of the parameter, the likelihood is a function of the data only, and as such it is a statistic. The question arises as to whether a simpler statistic exists that contains all of the statistical information of the sample. The answer is affirmative if, and only if, a sufficient statistic exists for the data sample.

A sufficient statistic is defined in terms of conditional densities. Heuristically, a statistic that is sufficient summarizes all of the information contained in the sample about the value of the parameters to be estimated. The direct way of determining whether a sufficient statistic exists is via the Factorization Theorem (Reference [8]). This theorem says that if the likelihood function can be separated into two factors, one involving only the data and constants and the other involving only the parameter and a statistic, then that statistic is sufficient.

The Maxwell likelihood function can be factored as follows:

\[ L(a) = \underbrace{\left(\frac{2}{\pi}\right)^{n/2} \prod_i y_i^2}_{h(\mathbf{y})}\;\cdot\;\underbrace{a^{-3n}\,\exp\!\left(-\frac{n\,m_2'}{2a^2}\right)}_{g(m_2';\,a)} \]

The function \( h(\mathbf{y}) \) depends only on the data, while the function \( g(m_2'; a) \) depends only on the parameter and the (sufficient) statistic \( m_2' \), where:

\[ m_2' = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \]

In other words, the second moment of the data about the origin is a sufficient statistic for the Maxwell distribution. It should be noted that any function of a sufficient statistic is also a sufficient statistic. The fact that the second raw moment is sufficient should be no surprise because this statistic is what determines the MLE. Sufficient statistics that arise from applying the Factorization Theorem often produce estimators on which an MLE is based.

Equation A1 can be rewritten in terms of the sufficient statistic as:

\[ f(a \mid m_2') = \frac{f(m_2' \mid a)\,\pi(a)}{\int f(m_2' \mid a)\,\pi(a)\,da} \qquad \text{Equation (A4)} \]

The distribution \( f(m_2' \mid a) \) is the sampling distribution of the sufficient statistic \( m_2' \), for a fixed parameter value. To determine this distribution, recall the expressions for the mean and variance of the sample moments for the Maxwell distribution:

\[ E[m_2'] = 3a^2, \qquad \mathrm{Var}(m_2') = \frac{\mu_4' - (\mu_2')^2}{n} = \frac{6a^4}{n} \]

Since the sample moment is the sum of n independent random variables, we can invoke the Central Limit Theorem or CLT (Reference [9]). This theorem states that the distribution of the sum of a "large" number of random variables approaches a normal distribution. In order to proceed further, we will assume that the second moment is distributed normally, with the preceding mean and variance. If the constants and terms involving only the data values are collected into a proportionality factor ("k"), the Bayes estimate is the expectation of Equation A4 over the support of "a". The sample data, as contained in \( m_2' \), is fixed for the Bayes' estimate:

\[ \hat a_{B} = \frac{\displaystyle\int a\,\pi(a)\,\frac{1}{a^2}\exp\!\left[-\frac{n\,(m_2' - 3a^2)^2}{12\,a^4}\right] da}{\displaystyle\int \pi(a)\,\frac{1}{a^2}\exp\!\left[-\frac{n\,(m_2' - 3a^2)^2}{12\,a^4}\right] da} \qquad \text{Equation (A5)} \]

Equation A5 is as far as the theoretical development can go, other than canceling out common terms in the exponent. The next step is to arrive at some prior distribution and then perform the integration as shown. This will be done via a hypothetical example.

Hypothetical Example of Maxwell Parameter Bayesian Estimation


Suppose we have an electronic device that has inputs from three sources. Each signal is first passed through a squaring circuit, and the squared signals are fed into a summing circuit. The circuits are fast enough that dynamics of the system do not need to be considered. The output signal is the sum of the three squared signals. The mean value of each input is known. The problem is to estimate the RMS power of the input noise signals for a given device.

From previous experience with these devices, the RMS noise is known to vary from 3V to 5V, with the data being approximately bell shaped around 4V. The beta distribution is a quite flexible statistical model when the data is known to have finite support (i.e., the values are within a finite interval of the real axis). The symmetrical beta distribution, for random variable a, has the following density function over the finite interval [c, d]:

\[ \pi(a) = \frac{\big[(a-c)(d-a)\big]^{\gamma-1}}{B(\gamma,\gamma)\,(d-c)^{2\gamma-1}}, \qquad c \le a \le d \]

where \( B(\gamma,\gamma) \) is the beta function. If the parameter \( \gamma \) is unity, the beta distribution becomes the uniform distribution. A graph of the symmetrical beta density on [3, 5] is shown in Figure A1 for various values of \( \gamma \).

[Figure A1 – Symmetrical Beta (prior) Distribution: densities on [3, 5] for γ = 2, 3, 4 and 5, plotted against parameter "a".]

For this example, assume the beta parameter has been established at γ = 3. Once a sample is taken and the second moment of the sample is calculated, the numerical integration of Equation A5 can be carried out. Note again that constant factors in the prior distribution cancel out.
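A base-MATLAB sketch of this integration (grid quadrature via trapz; the value of the sample second moment below is hypothetical):

    % Bayes estimate of "a": beta prior on [3, 5] with gamma = 3, Equation (A5)
    n   = 25;  m2 = 48;                         % hypothetical sample size and second moment
    ag  = linspace(3, 5, 2001);                 % grid over the prior support
    prior = ((ag - 3).*(5 - ag)).^2;            % beta kernel (normalizing constants cancel)
    like  = exp(-n*(m2 - 3*ag.^2).^2 ./ (12*ag.^4)) ./ ag.^2;   % normal kernel for m2'
    post  = prior .* like;
    aBayes = trapz(ag, ag.*post) / trapz(ag, post);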

We might be interested in how the Bayes estimate varies with different values of the sample second moment, and with different sample sizes n. We can carry out the integration of Equation A5 for a range of values of \( m_2' \) and n, and display the results graphically. In order to see how the Bayes' estimate compares with the MLE estimate, write the sufficient statistic as:

\[ m_2' = 3\,\hat a_{MLE}^2 \]

Figure A2 shows the relationship between the MLE estimate and the Bayesian estimate, assuming a symmetrical beta prior with parameter \( \gamma \) equal to three. Note that n occurs elsewhere in the integrals of Equation A5, independent of \( m_2' \), and therefore is not a simple scale factor. Several points are noted regarding these results.

[Figure A2 – Parameter Estimate Method Comparison: Bayesian estimate "aBAY" versus MLE estimate "aMLE" for n = 10, 25 and 100, with the prior distribution support [3, 5] indicated.]


First, if the data indicate the parameter is not included in the support of the assumed prior, the resulting Bayes estimate should not be used. In the case shown, if the data indicate that the parameter is either greater than 5 or less than 3, then either this is a different device than the one the historical data are based on, or the selection of the prior distribution was not adequate.

Second, the MLE and Bayes estimates are equal only at the mean of the prior. If the prior density were not symmetrical, then in general the two estimates would never be the same. Also, the variation of the Bayes estimate is approximately linear with the MLE estimate, over the prior distribution support.

The third observation is that as sample size increases, the slope of the linear relationship between the MLE and Bayes becomes greater. As mentioned earlier, the Bayes estimate is a weighted mean of the prior and sample data. The prior tends to “pull” the estimate toward the mean value. The data tends to “pull” the estimate toward the MLE, which is the best estimate in the absence of knowledge other than the data itself.

For sample sizes less than about 10, application of the CLT may be questionable. In this case, use of the full likelihood function in Equation A3 generally allows for sufficiently accurate numerical computation of the required integrals. Substitution of the likelihood function rather than the sampling distribution of the statistic does not affect the choice of the prior distribution.

Another "Bayesian" estimate that may be chosen in some cases is the "maximum a posteriori" (MAP) estimate. This corresponds to the maximum value (mode) of the posterior density. The MAP has some appeal since \( f(\mathbf{y}) \) is a fixed constant for the data sample. In this case, all that is required is maximizing the numerator of Equation A1, and no integration is involved. In fact, if the prior density is chosen as uniform over an interval (c, d), the MAP is just the MLE solution. As with the Bayes estimate, the MAP estimate must be within the support of the prior.


Appendix B

Two Parameter Maxwell Distribution

Definition and Properties

The left side of Figure B1 shows the projection of a (hypothetical) sample of random data values which represent three-dimensional miss distances from a target. The third dimension is perpendicular to the page. The miss distance magnitude is shown on the right hand side. These graphs indicate a bias in miss distance. Assuming a lower bound of zero would not provide the best estimate of this density function. The miss distance bias can be accounted for by considering a two-parameter Maxwell distribution as the applicable statistical model.

[Figure B1 – Miss Distance Projection and Magnitude. Left: Axis 1 versus Axis 2 miss distance with the target at the origin. Right: empirical CDF, P(d < D), of the 3D miss distance d for 200 samples.]

The general two-parameter density function is now a "location-scale" type distribution:

\[ f(y) = \sqrt{\frac{2}{\pi}}\;\frac{(y-b)^2}{a^3}\;e^{-(y-b)^2/(2a^2)}, \qquad y \ge b \qquad \text{Equation (B1)} \]

The corresponding CDF is again found by direct integration on the interval \( b \le u \le y \):

\[ F(y) = \mathrm{erf}\!\left(\frac{y-b}{\sqrt{2}\,a}\right) - \sqrt{\frac{2}{\pi}}\;\frac{y-b}{a}\;e^{-(y-b)^2/(2a^2)} \qquad \text{Equation (B2)} \]

These functional forms allow for a negative location parameter; however, based on physical considerations, it is expected that \( b \ge 0 \) for almost all real-world problems.

If parameter b is known, then a simple change of variable shows that all of the previous results apply for the estimators. If b is considered as an unknown parameter, a quite different situation occurs. In this latter case, the support of the random variable y depends on the parameter itself. This situation violates one of the "regularity conditions" for determining the CRLB, and in fact a lower bound for the estimators of "a" and "b" cannot be found. In Appendix A, the concept of a "sufficient statistic" was introduced. The existence of sufficient statistics is generally the starting point for determining MLE estimators. In general, estimation of two parameters requires two jointly sufficient statistics. Recall that the Factorization Theorem provides a convenient way to derive sufficient statistics. The likelihood function for the two-parameter Maxwell distribution is:

\[ L(a,b) = \left(\frac{2}{\pi}\right)^{n/2} \frac{\prod_i (y_i-b)^2}{a^{3n}}\, \exp\!\left(-\frac{\sum_i (y_i-b)^2}{2a^2}\right) \]

Because of the product terms, this function cannot be factored into one term that includes only statistics and parameters, and a second factor including only data values. What this implies is that the entire data set is required to define the parameters. This agrees with intuition, since we know that \( b \le y_{(1)} \), where \( y_{(1)} \) is the smallest order statistic. This smallest value is not known unless all of the data values are known. Hence, the data cannot be summarized into a function of the data that contains the same information about the parameters.

Method of Moment Estimators

A straightforward procedure for estimation of the parameters of this distribution is the method of moments. This method involves relatively simple computations and also allows for computation of the standard error of estimate for the parameters.

In order to calculate moments about the origin for Equation B1, the simple substitution x = y − b is made. The first and second moments are:

\mu_1 = b + 2a\sqrt{2/\pi}, \qquad \mu_2 = b^2 + 4ab\sqrt{2/\pi} + 3a^2

The following notation will be used in the discussion of MOM estimators for the two parameter Maxwell distribution: m_r denotes the r-th sample moment about the origin and \mu_r the corresponding population moment (see Appendix C). Note that these moments reduce to the moments of the one parameter distribution when b = 0.


Variance of the random variable y is unaffected by the location change: Var(y) = a^2(3 - 8/\pi) \approx 0.4535\,a^2.

The MOM estimates for “a” and “b” are determined by equating the first and second sample moments, m_1 and m_2, to the theoretical expressions and solving the simultaneous equations. After some algebra, and substituting for the moments:

\hat{a} = 1.4849\sqrt{m_2 - m_1^2}        Equation (B3a)

\hat{b} = m_1 - 2.3695\sqrt{m_2 - m_1^2}        Equation (B3b)

The constants in these equations are 1.4849 = (3 - 8/\pi)^{-1/2} and 2.3695 = 2\sqrt{2/\pi}\,(3 - 8/\pi)^{-1/2}.

Both of the estimated parameters depend on m_1 and m_2. The mean square error for each of these MOM estimates can be derived using the formulas developed in Appendix C for the mean square error of a function of moment statistics. The relevant equations for the two parameter Maxwell distribution are:

\mathrm{Var}(\hat{a}) \approx \left(\frac{\partial \hat{a}}{\partial m_1}\right)^{\!2} \mathrm{Var}(m_1) + \left(\frac{\partial \hat{a}}{\partial m_2}\right)^{\!2} \mathrm{Var}(m_2) + 2\,\frac{\partial \hat{a}}{\partial m_1}\frac{\partial \hat{a}}{\partial m_2}\,\mathrm{Cov}(m_1, m_2)        Equation (B4a)

\mathrm{Var}(\hat{b}) \approx \left(\frac{\partial \hat{b}}{\partial m_1}\right)^{\!2} \mathrm{Var}(m_1) + \left(\frac{\partial \hat{b}}{\partial m_2}\right)^{\!2} \mathrm{Var}(m_2) + 2\,\frac{\partial \hat{b}}{\partial m_1}\frac{\partial \hat{b}}{\partial m_2}\,\mathrm{Cov}(m_1, m_2)        Equation (B4b)

The partial derivatives in these equations are evaluated at the mean values of the moments. In order to evaluate the variance of the moment estimators, the third and fourth moments of the density function are required:

\mu_3 = b^3 + 6ab^2\sqrt{2/\pi} + 9a^2 b + 8a^3\sqrt{2/\pi}        Equation (B5a)

\mu_4 = b^4 + 8ab^3\sqrt{2/\pi} + 18a^2 b^2 + 32a^3 b\sqrt{2/\pi} + 15a^4        Equation (B5b)

The variance and covariance of the sample moments are derived in Appendix C. These quantities are required in terms of the population moments.


Plugging the respective population moments into the above expressions, and then substituting the parameters “a” and “b” into the expressions for the moments, provides (after some algebra) the following:

\mathrm{Var}(m_1) = \frac{(3 - 8/\pi)\,a^2}{n}        Equation (B6a)

\mathrm{Var}(m_2) = \frac{6a^4 + 8\sqrt{2/\pi}\,a^3 b + (12 - 32/\pi)\,a^2 b^2}{n}        Equation (B6b)

\mathrm{Cov}(m_1, m_2) = \frac{2\sqrt{2/\pi}\,a^3 + (6 - 16/\pi)\,a^2 b}{n}        Equation (B6c)

The partial derivative terms are evaluated by differentiation of Equations B3a and B3b; the resulting expressions are then evaluated at the moment mean values:

\frac{\partial \hat{a}}{\partial m_1} = \frac{-1.4849\,m_1}{\sqrt{m_2 - m_1^2}}        Equation (B7a)

\frac{\partial \hat{a}}{\partial m_2} = \frac{1.4849}{2\sqrt{m_2 - m_1^2}}        Equation (B7b)

\frac{\partial \hat{b}}{\partial m_1} = 1 + \frac{2.3695\,m_1}{\sqrt{m_2 - m_1^2}}        Equation (B7c)

\frac{\partial \hat{b}}{\partial m_2} = \frac{-2.3695}{2\sqrt{m_2 - m_1^2}}        Equation (B7d)

The remaining two terms required for evaluation of Equations B4a and B4b are the products of the respective partial derivatives:

\frac{\partial \hat{a}}{\partial m_1}\frac{\partial \hat{a}}{\partial m_2} = \frac{-(1.4849)^2\,m_1}{2\,(m_2 - m_1^2)}        Equation (B7e)

\frac{\partial \hat{b}}{\partial m_1}\frac{\partial \hat{b}}{\partial m_2} = \frac{-2.3695}{2\sqrt{m_2 - m_1^2}}\left(1 + \frac{2.3695\,m_1}{\sqrt{m_2 - m_1^2}}\right)        Equation (B7f)

Combining the expressions in Equations B6 and B7 (after substantial algebra) produces a remarkably simple result for each of the two standard errors. Inserting the constant values results in:

\mathrm{Var}(\hat{a}) \approx \frac{0.527\,a^2}{n}        Equation (B8a)

\mathrm{Var}(\hat{b}) \approx \frac{1.274\,a^2}{n}        Equation (B8b)

Neither of the standard errors depends on the parameter “b”. Figure B2 shows plots of each of the mean square errors; note that the sample size is just a scale factor (taken as 1.0 in the figure). The result agrees with our intuition that the “location” of the distribution should not affect the standard errors, but the scale parameter does have a significant effect.

[Figure: two panels plotting estimator variance against parameter a for n = 1; left panel, variance of â; right panel, variance of b̂]

Figure B2 – Variance of MOM Estimators “a” and “b” (n = 1)

In practical inference applications, we do not know the actual values of the parameters, as these are what we are trying to estimate. Usual statistical practice is to first compute the estimates â and b̂, and then use these estimates in the standard error formulas. One should note that this introduces some additional error into these error estimates.
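The following minimal MATLAB sketch (the data vector is hypothetical) applies Equations B3 and B8, replacing the unknown parameter by its estimate as just described:

% Sketch: two-parameter Maxwell MOM estimates and approximate standard errors
y = [3.1 4.2 2.8 3.7 5.0 3.3 4.4 2.9];   % hypothetical data
n  = length(y);
m1 = mean(y);
m2 = mean(y.^2);
s  = sqrt(m2 - m1^2);
ahat = 1.4849*s;               % Equation (B3a)
bhat = m1 - 2.3695*s;          % Equation (B3b)
se_a = sqrt(0.527*ahat^2/n);   % Equation (B8a), with a replaced by its estimate
se_b = sqrt(1.274*ahat^2/n);   % Equation (B8b)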


A second issue that can arise in applying Equations B3a and B3b to actual data is that, in some cases, the estimate b̂ > y(1), where y(1) is the smallest sample value. If this occurs, the following procedure is recommended (a MATLAB sketch follows the list):

1. Calculate the standard error of b̂, σb = sqrt(Var(b̂)), from Equation B8b.

2. Determine if the inequality b̂ − y(1) ≤ 2σb is true.

3. If this inequality is true, assume the minimum value is b̂ = y(1), as the “two-sigma” error of estimate includes this value.

4. If the inequality is false, a likely reason is that the sample is not from a Maxwell distribution.

5. Check the same condition for the second smallest order statistic: b̂ − y(2) ≤ 2σb.

6. If the inequality is false for the second smallest order statistic, one can assume the sample is not from a Maxwell distribution.

7. If only the smallest order statistic violates the “two-sigma” error, but does so by a large amount, the analyst may choose to treat this value as an outlier.

8. In any case, it is suggested that the minimum of b̂ and y(1) be used for b in any calculations involving the accepted distribution.
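A minimal MATLAB sketch of this procedure follows (it assumes the estimates ahat, bhat and the standard error se_b from the MOM sketch above, and a sorted data vector y):

% Sketch: recommended handling when bhat exceeds the smallest sample value
y = sort(y);
if bhat <= y(1)
    b_use = bhat;                             % no conflict with the data
elseif bhat - y(1) <= 2*se_b                  % steps 2-3: two-sigma check
    b_use = y(1);
elseif bhat - y(2) > 2*se_b                   % steps 5-6: second order statistic
    disp('Sample is likely not from a Maxwell distribution')
    b_use = min(bhat, y(1));                  % step 8
else
    disp('Smallest value may be an outlier')  % step 7
    b_use = min(bhat, y(1));                  % step 8
end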


Maximum Likelihood Estimators

As noted previously, the two parameter Maxwell distribution does not have a sufficient statistic, and the CRLB cannot be calculated. The ideas behind MLE estimation are still applicable. Recall that the likelihood is a function of the parameters, with the data values fixed; the MLE estimates are the parameter values that maximize this function. The two parameter Maxwell log-likelihood is:

\ln L(a, b) = \frac{n}{2}\ln\frac{2}{\pi} - 3n\ln a + 2\sum_{i=1}^{n}\ln(y_i - b) - \frac{1}{2a^2}\sum_{i=1}^{n}(y_i - b)^2

The respective MLE equations are:

\hat{a}^2 = \frac{1}{3n}\sum_{i=1}^{n}(y_i - \hat{b})^2        Equation (B9a)

\frac{1}{\hat{a}^2}\sum_{i=1}^{n}(y_i - \hat{b}) - 2\sum_{i=1}^{n}\frac{1}{y_i - \hat{b}} = 0        Equation (B9b)

These two equations can be combined:

3n\sum_{i=1}^{n}(y_i - \hat{b}) = 2\left[\sum_{i=1}^{n}(y_i - \hat{b})^2\right]\left[\sum_{i=1}^{n}\frac{1}{y_i - \hat{b}}\right]        Equation (B9c)

Equation (B9c) provides the estimate b̂, and Equation (B9a) then provides â. Note that the value of b̂ is again constrained such that b̂ ≤ y(1), where y(1) is the smallest sample value. The same rules suggested for the MOM estimators when b̂ > y(1) should generally be applicable. A numerical sketch of the solution follows.
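The sketch below solves the reconstructed form of Equation (B9c) by root finding; the bracketing interval is an assumption chosen so that the residual changes sign:

% Sketch: two-parameter Maxwell MLE via root-finding on Equation (B9c)
y = sort([3.1 4.2 2.8 3.7 5.0 3.3 4.4 2.9]);   % hypothetical data
n = length(y);
g = @(b) 3*n*sum(y - b) - 2*sum((y - b).^2)*sum(1./(y - b));   % B9c residual
blo = y(1) - 10*std(y);                  % assumed bracket; g > 0 far below y(1)
bhi = y(1) - 1e-6;                       % g diverges to -inf as b approaches y(1)
bhat = fzero(g, [blo bhi]);
bhat = min(bhat, y(1));                  % enforce the constraint b <= y(1)
ahat = sqrt(sum((y - bhat).^2)/(3*n));   % Equation (B9a)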

Numerical Test Results

Monte Carlo experiments have been used to test both the MOM and the MLE estimators. The basic test set consisted of 10,000 replications of samples of various sizes. Some conclusions drawn from these experiments are:

1. Both the MOM and MLE estimators can produce estimates of b that are greater than the minimum sample value y(1).

2. The MOM estimate of b is virtually always within two standard deviations (the square root of the variance given by Equation B8b) of the population value (1 exception in 10,000).


3. The MOM estimates show very little sensitivity to not replacing b̂ with y(1). It is nevertheless recommended that the constraint be enforced so that no negative values occur for the random variable (y − b̂). The MSE of these estimates is essentially equal to the values given by Equations B8.

4. The MLE estimates show large sensitivity to not replacing b̂ with y(1). If this replacement is not made, both a larger variance and a bias are introduced into the estimates. For MLE estimates it is imperative that the constraint be enforced.

5. The variance of the MLE parameter estimates is less than that of the corresponding MOM estimates, if bias is not considered. The MLE estimates do appear to be biased, however, even when the minimum constraint is enforced. The experiments show that â is positively biased (i.e., larger than the population value) while b̂ is negatively biased.

6. The reason for the large sensitivity of the MLE estimates when b̂ > y(1) is that the assumption of positive values of (y_i − b̂) in the derivation of Equation B9c is violated; the resulting solution does not maximize the likelihood in this case. Enforcing the constraint implies that the log-likelihood of Equation B9a is valid, but the function is not necessarily maximized.


Appendix C

Sample Moment Standard Error and the Delta Method

Variance and Covariance of Sample Moments

Calculation of standard error formulae for MLE and MOM estimators requires knowing the variance (and, for the two parameter distribution, the covariance) of the sample moments. The r-th sample moment is calculated as:

m_r = \frac{1}{n}\sum_{i=1}^{n} y_i^r

These moments are unbiased estimators of the population moments \mu_r, as seen by taking the expectation of this statistic:

E[m_r] = \frac{1}{n}\sum_{i=1}^{n} E[y_i^r] = \mu_r

Before calculating the variance and covariance, two mathematical facts are reviewed. The first is that the expectation of the product of functions of two independent variables is the product of the expectations. Assume x and y are independent variables, and g(x) and h(y) are functions of these variables, respectively:

E[g(x)\,h(y)] = E[g(x)]\,E[h(y)]

The second useful fact is algebraic. Consider the product of two expressions, each composed of a sum of n values:

\left(\sum_{i=1}^{n} a_i\right)\left(\sum_{j=1}^{n} b_j\right)

Upon expansion, this expression consists of n terms of the form a_i b_i, and there are n(n − 1) remaining cross terms of the form a_i b_j, i ≠ j.

The variance of any sample moment is:

\mathrm{Var}(m_r) = E[m_r^2] - \mu_r^2


The term involving the expectation can be expanded:

E[m_r^2] = \frac{1}{n^2}\left[\sum_{i=1}^{n} E[y_i^{2r}] + \sum_{i \ne j} E[y_i^r y_j^r]\right]

Because each sample value is independent, the following holds for i ≠ j:

E[y_i^r y_j^r] = E[y_i^r]\,E[y_j^r] = \mu_r^2

Combining the previous three equations results in:

\mathrm{Var}(m_r) = \frac{\mu_{2r} - \mu_r^2}{n}        Equation (C1)

Derivation of the expression for the covariance is similar. The covariance for any two sample moments is:

\mathrm{Cov}(m_r, m_s) = E[m_r m_s] - \mu_r \mu_s

Expanding the expectation, again using the independence of the sample values, and combining the above expressions results in the formula for the covariance:

\mathrm{Cov}(m_r, m_s) = \frac{\mu_{r+s} - \mu_r \mu_s}{n}        Equation (C2)

In actual calculation, the population parameters (and thus the population moments) are generally not known, as these are the quantities being estimated. The sample moments themselves are substituted into Equation C1 and Equation C2 in these cases. As such, the variance and covariance are only approximations.
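Equations C1 and C2 are easy to verify by simulation. The sketch below (sample size, replication count and the Maxwell generation method are illustrative assumptions) compares the simulated variance and covariance of m1 and m2 with the formulas, using the raw Maxwell moments:

% Sketch: Monte Carlo check of Equations (C1) and (C2) for Maxwell samples (a = 1)
a = 1; n = 20; nrep = 20000;
m1 = zeros(nrep,1); m2 = zeros(nrep,1);
for j = 1:nrep
    x = a*sqrt(sum(randn(3,n).^2,1));     % n Maxwell variates (root-sum-square)
    m1(j) = mean(x); m2(j) = mean(x.^2);
end
mu = @(p) a^p*2^(p/2)*gamma((3+p)/2)/gamma(3/2);   % raw moments of the Maxwell
fprintf('Var(m1): %.4f  theory %.4f\n', var(m1), (mu(2)-mu(1)^2)/n)
fprintf('Var(m2): %.4f  theory %.4f\n', var(m2), (mu(4)-mu(2)^2)/n)
c = cov(m1,m2);
fprintf('Cov(m1,m2): %.4f  theory %.4f\n', c(1,2), (mu(3)-mu(1)*mu(2))/n)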


The Delta Method

The delta method is a general procedure for finding estimates of the mean and variance of a function of random variables, when the mean and variance of the random variables are known. The method rests on Taylor’s Theorem for representing functions that are sufficiently differentiable. We will consider a function of two variables only, but the method can be extended to a function of any number of random variables.

Taylor’s Theorem (Reference 2) states that any sufficiently differentiable function can be expanded around a fixed value. For a function of two variables, f(x, y), we can expand around the point (x_0, y_0), retaining only linear terms:

f(x, y) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}\Big|_{(x_0, y_0)}(x - x_0) + \frac{\partial f}{\partial y}\Big|_{(x_0, y_0)}(y - y_0) + R

The term R is the remainder, and the key point of Taylor’s Theorem is that it approaches zero as the number of retained terms approaches infinity. In many applications only the linear terms are retained, and this will be the procedure here.

In our statistical applications, we will choose the means of the random variables and expand around that point, (x_0, y_0) = (\mu_x, \mu_y). If we take the expectation of this expansion, noting that the derivative values are constants:

E[f(x, y)] \approx f(\mu_x, \mu_y)        Equation (C3)

Hence the expected value of a function of random variables is approximately the function evaluated at the expected values of the variables. Figure C1 provides an illustration of the assumption involved, for a function of one variable.

[Figure: a function f(y), its linear approximation f(y0) + (df/dy)|y=y0 (y − y0) about y0 = E[Y], and the underlying density pdf(y) concentrated near y0]

Figure C1 – Linear Function Approximation and Underlying Density

The approximation will be very good if the probability density of the underlying variable is concentrated enough that most of the “probability” lies within an interval where the linear approximation does not deviate significantly from the actual function.

The variance of the function is also required:

\mathrm{Var}[f(x, y)] = E\left[\left(f(x, y) - E[f(x, y)]\right)^2\right]

Note that x and y are random variables and in general are correlated. Using the linear expansion, this last expression becomes:

\mathrm{Var}[f(x, y)] \approx \left(\frac{\partial f}{\partial x}\right)^{\!2} \mathrm{Var}(x) + \left(\frac{\partial f}{\partial y}\right)^{\!2} \mathrm{Var}(y) + 2\,\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\,\mathrm{Cov}(x, y)

with the derivatives evaluated at the means. This basic formula is used often in deriving the MSE of various types of statistics (estimators) that are functions of other statistics with known variances. Examples are estimators that are functions of the sample moments. The variance and covariance of the sample moments are calculated, or approximated, as discussed above.
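As a worked example (a sketch under illustrative assumptions), the formula can be applied to f(m1, m2) = sqrt(m2 − m1^2), the statistic underlying the MOM scale estimate of Appendix B, and checked against direct simulation:

% Sketch: delta-method variance of f(m1,m2) = sqrt(m2 - m1^2) vs. simulation
a = 1; n = 50; nrep = 20000;
f = zeros(nrep,1);
for j = 1:nrep
    x = a*sqrt(sum(randn(3,n).^2,1));         % Maxwell sample (a = 1)
    f(j) = sqrt(mean(x.^2) - mean(x)^2);
end
mu1 = 2*a*sqrt(2/pi); mu2 = 3*a^2;            % Maxwell raw moments
mu3 = 8*a^3*sqrt(2/pi); mu4 = 15*a^4;
V1 = (mu2 - mu1^2)/n;  V2 = (mu4 - mu2^2)/n;  % Equation (C1)
C12 = (mu3 - mu1*mu2)/n;                      % Equation (C2)
g = sqrt(mu2 - mu1^2);
d1 = -mu1/g; d2 = 1/(2*g);                    % partial derivatives at the means
vdelta = d1^2*V1 + d2^2*V2 + 2*d1*d2*C12;
fprintf('delta %.5f vs simulation %.5f\n', vdelta, var(f))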


Appendix D

Some Results for Order Statistics of a Sample

Distribution of Order Statistics

Start by assuming a random sample of size n from some distribution specified by F(y; \theta). The sample values are arranged in ascending order:

y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}

Each observation y_{(r)} is now the r-th order statistic, where r refers to the rank of the observed value: y_{(1)} is the smallest, and y_{(n)} is the largest value. The process of ordering the observations has introduced dependence between the order statistics.

Reference [5] provides a discussion of order statistics which includes a derivation of their distribution and other properties of these statistics.

If the underlying density function is f(y; \theta) and the corresponding CDF is F(y; \theta), the density function f_{(r)}(y) for the r-th order statistic is:

f_{(r)}(y) = \frac{n!}{(r-1)!\,(n-r)!}\,F(y)^{r-1}\,[1 - F(y)]^{n-r}\,f(y)        Equation (D1)

The parameter \theta can be either a scalar or a vector quantity. Subsequent discussion assumes the support of the underlying distribution is (0, \infty). The area to the left of any order statistic is F(y_{(r)}). The expected value of this area (or, equivalently, of the probability) is:

E[F(y_{(r)})] = \int_0^{\infty} F(y)\,f_{(r)}(y)\,dy

Invoking the change of variable u = F(y), noting du = f(y)\,dy, and using the density function of the order statistic transforms the integral expression:

E[F(y_{(r)})] = \frac{n!}{(r-1)!\,(n-r)!}\int_0^1 u^r (1-u)^{n-r}\,du

This is a “beta-like” integral, and carrying out the integration results in:

E[F(y_{(r)})] = \frac{n!\,\Gamma(r+1)\,\Gamma(n-r+1)}{(r-1)!\,(n-r)!\,\Gamma(n+2)}

Invoking the gamma function property \Gamma(k+1) = k! for integer arguments provides a simple expression for the expected value of the probability of any order statistic (with a continuous underlying distribution):

E[F(y_{(r)})] = \frac{r}{n+1}        Equation (D2)

Variance of Order Statistics


We now seek an expression for the variance of the r-th order statistic. The result will only be an approximation; the derivation utilizes the “Delta Method” discussed in Appendix C.

Begin by considering a uniform distribution on the unit interval [0, 1]. Equation D1 becomes:

f_{(r)}(u) = \frac{n!}{(r-1)!\,(n-r)!}\,u^{r-1}(1-u)^{n-r}

The expectation of this function is seen again to involve “beta-like” integrals and, as expected, direct integration provides the same answer as Equation D2 (which applies for any continuous distribution).

The variance of the uniformly distributed order statistic can also be found by direct integration. We obtain:

\mathrm{Var}(u_{(r)}) = \frac{r(n-r+1)}{(n+1)^2(n+2)}        Equation (D3)

For any continuous density function f(y) and CDF F(y), define the random variable u = F(y). This random variable is uniform on the interval [0, 1]. The order statistic u_{(r)} = F(y_{(r)}) therefore has the Beta distribution given above.

Recall that if variable y is a function of variable x, i.e., y = g(x), and the mean and variance of x are known, the delta method provides a way to approximate the mean and variance of y. In the current case, y_{(r)} = F^{-1}(u_{(r)}), and the mean and variance of u_{(r)} are known. The required derivative is:

\frac{dy}{du} = \frac{1}{f(y)}

It is useful to note that this holds because y is a monotonic function of u: from u = F(y), du/dy = f(y), so that dy/du = 1/f(y).

Since we know both the mean and variance of u_{(r)}, the variance of the order statistic follows from the above relations:

\mathrm{Var}(y_{(r)}) \approx \frac{\mathrm{Var}(u_{(r)})}{f(y_p)^2}, \qquad y_p = F^{-1}\!\left(\frac{r}{n+1}\right)


The last equation for the variance of the order statistic implies that, for moderately large n,

\mathrm{Var}(y_{(r)}) \approx \frac{p\,(1-p)}{n\,f(y_p)^2}, \qquad p = \frac{r}{n+1}        Equation (D4)

As n becomes large, the order statistic y_{(r)} has an asymptotically normal distribution, with variance and mean given by the preceding equations. Reference [5] provides a detailed derivation of this theorem.
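Equation D4 is easy to check by simulation; the sketch below (sample size, rank and replication count are illustrative) compares the simulated variance of a Maxwell order statistic with the approximation:

% Sketch: Monte Carlo check of Equation (D4) for a Maxwell order statistic (a = 1)
a = 1; n = 25; r = 20; p = r/(n+1); nrep = 20000;
yr = zeros(nrep,1);
for j = 1:nrep
    x = sort(a*sqrt(sum(randn(3,n).^2,1)));
    yr(j) = x(r);
end
F  = @(y) erf(y/(a*sqrt(2))) - sqrt(2/pi)*(y/a).*exp(-y.^2/(2*a^2));  % Maxwell CDF
yp = fzero(@(y) F(y) - p, 1.5);                       % the quantile F^{-1}(p)
fp = sqrt(2/pi)*yp^2/a^3*exp(-yp^2/(2*a^2));          % density at the quantile
fprintf('Var: %.5f  Equation D4: %.5f\n', var(yr), p*(1-p)/(n*fp^2))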

Relative Efficiency of Maxwell Mean and Median Estimators

An application of Equation D4 is to find the relative efficiency of the sample mean and the sample median for the one parameter Maxwell distribution.

A Maxwell distributed variable has sample mean and sample-mean variance:

E[\bar{y}] = 1.5958\,a, \qquad \mathrm{Var}(\bar{y}) = \frac{0.4535\,a^2}{n}

The median value of the Maxwell distribution can be evaluated from Figure 2 in the main report, where F(y) = 0.5. At this point y_{med} \approx 1.538\,a and f(y_{med}) \approx 0.578/a, so Equation D4 (with p = 0.5) gives a sample-median variance of approximately 0.748\,a^2/n.

This shows that the sample mean has about 61% of the variability of the sample median. (For comparison, the relative efficiency for the normal distribution sample mean to sample median is about 0.64.) The sample mean is therefore preferable for estimating the parameter of the Maxwell distribution if one of these simple statistics is used rather than the MLE, MOM or quantile estimators.


Appendix E

Sums of Maxwell IID Random Variables

It may be of interest in some applications to determine the distribution of the sum of two or more independent identically distributed (IID) random variables drawn from a population that follows the Maxwell distribution. Moment generating functions or characteristic functions are often employed when closed form expressions for these functions exist and when their products can be inverted. This approach does not yield invertible expressions for the Maxwell distribution, so a direct integration approach will be used.

The joint distribution of two IID random variables, X and Y, is provided by the product of their density functions. Define the sum of these as the random variable Z:

Z = X + Y, \quad \text{or} \quad Y = Z - X

The distribution of Z can be found by substituting Y = Z − X into the joint density and integrating out X. Because both X and Y are confined to non-negative values, the integration is over the interval [0, z]:

f_Z(z) = \int_0^z f(x)\,f(z-x)\,dx

For the Maxwell distribution, the integral is:

f_Z(z) = \frac{2}{\pi a^6}\int_0^z x^2 (z-x)^2 \exp\!\left[-\frac{x^2 + (z-x)^2}{2a^2}\right] dx

Evaluation is carried out by expanding the terms in the exponent and the polynomial, integrating term by term, and finally collecting terms. The result is:

Equation (E1)

As before, erf(·) denotes the error function. The density function of Equation E1 cannot be integrated in closed form to obtain the CDF or the moments of this distribution. The density can, however, be integrated using numerical techniques to find the desired quantities for a given value of the parameter “a”.

The mean and variance for a sum of Maxwell variables follow from the basic theorem for the addition of independent variables. This theorem can be stated mathematically for the sum of N variables, S = y_1 + y_2 + \dots + y_N:


E[S] = N\,E[y] = 1.5958\,N a        Equation (E2)

\mathrm{Var}(S) = N\,\mathrm{Var}(y) = 0.4535\,N a^2        Equation (E3)

Performing numerical integration for various values of the parameter confirms this result for N = 2. Numerical integration also shows that this distribution has constant skewness of approximately 0.343 (= 0.4857/\sqrt{2}).

Recall that the skewness of the Maxwell distribution is 0.4857, so the sum of Maxwell variables produces a more symmetrical distribution. The Central Limit Theorem implies that as N becomes “large,” the distribution of the sum should approach the normal distribution with mean and variance as indicated in Equations E2 and E3, respectively. The skewness calculation confirms this trend for the sum of two variables.

The size required by the CLT for a “large number” of terms will depend on the underlying distributions involved. This requirement can be investigated for the Maxwell distribution using the sum of only two variables.

Figure E1 is a plot of Equation E1, the exact distribution of the sum of two Maxwell variables. Also shown is the normal approximation for this distribution, using Equations E2 and E3 with N = 2. The parameter “a” is 1.0 in both cases. The underlying Maxwell distribution for each variable is shown for comparison. The figure implies that the normal distribution is a reasonable approximation of the sum, except in the “tails” of the distribution, even for two variables.


[Figure: probability density vs. the random variables Z and Y; curves for the Maxwell distribution (a = 1), the exact distribution of the sum (N = 2), and the normal approximation (N = 2)]

Figure E1 – Maxwell Density and Sum of Two Random Variables

If sums of Maxwell random variables are required, it appears that the normal distribution is adequate in most cases, with mean and variance given in Equations E2 and E3. Evaluation of the exact distributions is possible only through numerical procedures. If N = 2, Equation E1 is recommended, along with numerical evaluation of the CDF or other quantities of interest.
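The comparison in Figure E1 can be reproduced pointwise with a short numerical-integration sketch (the evaluation point z is an arbitrary illustrative choice):

% Sketch: exact density of the sum of two Maxwell variables (a = 1) by numerical
% convolution, versus the normal approximation of Equations (E2) and (E3)
a = 1;
fm = @(x) sqrt(2/pi)*x.^2/a^3.*exp(-x.^2/(2*a^2));   % Maxwell density
z  = 3.0;                                            % illustrative point
fz = integral(@(x) fm(x).*fm(z - x), 0, z);          % convolution integral
mu = 2*(2*a*sqrt(2/pi));                             % Equation (E2), N = 2
v  = 2*a^2*(3 - 8/pi);                               % Equation (E3), N = 2
fn = exp(-(z - mu)^2/(2*v))/sqrt(2*pi*v);            % normal approximation
fprintf('exact %.4f  normal %.4f at z = %.2f\n', fz, fn, z)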


Appendix F

Goodness of Fit and Outlier Tests for Maxwell Distribution

Hypothesis Tests

Discussion of point estimation methods has been based on the assumption that a data sample is in fact from a Maxwell distribution. Given a set of data, an analyst would probably want to get some indication of the veracity of this assumption prior to drawing conclusions about the statistical model. The procedures used to determine if a set of data is from a particular distribution, or family of distributions, fall under the category of statistical hypothesis tests.

Hypothesis tests are essentially a tool for making decisions under uncertainty. These tests consist of specifying a dichotomous decision between a null hypothesis, H0, and an alternative hypothesis, H1. Often, the null hypothesis is specified with the intent to reject it. If the null hypothesis is rejected, this implies the alternative hypothesis is accepted. The converse is not true: failure to reject the null hypothesis does not mean that it is true. Rather, failure to reject the null hypothesis means that insufficient evidence is available to reject it. Reference [11] covers the theory of hypothesis tests in more detail.

In the decision framework, four possible outcomes can occur, depending on which decision is made and what is the true “state of nature.” If the null hypothesis is true (i.e., the actual state of nature) and the analyst does not reject it, or if the alternative hypothesis is true and the analyst rejects the null hypothesis, then the correct decision is made. However, if the state of nature is such that the null hypothesis is true, but the analyst rejects it, a Type I error has occurred. If the alternative hypothesis is true, but the analyst fails to reject the null hypothesis, then a Type II error has occurred. Useful statistical tests generally provide the analyst with a quantitative way to assess the Type I error probability, which is referred to as α. The Type II error probability, which is referred to as β, is generally much harder to assess for a given statistical test. In general, if the size of a sample is fixed, the Type I and Type II errors have an inverse relationship. Decreasing both α and β requires increasing the sample size. Most hypothesis tests specify the required probability α and do not explicitly specify β. This approach assumes that making a Type I error is more serious than making a Type II error. The Type I error probability is called the “significance level” of the test.

Obviously, hypothesis tests require that a relevant statistic be available. Since a statistic depends on the sample, it is a random variable and has an associated distribution. Different hypothesis tests use different statistics, each with a distribution that can be derived exactly or asymptotically (i.e., by large sample approximations). The percentage points for a significance test are the critical values of the statistic, at specified tail probabilities, computed under the assumption that the null hypothesis is in fact true.

A simple hypothesis is one where the distribution is completely specified, which includes not only the type of distribution but also any parameters of that distribution. An example of a simple hypothesis is a Maxwell type distribution with a specified parameter value a = a0. A composite hypothesis specifies the type of distribution (e.g., Maxwell) but allows for one or more free parameters that are not specified. Most goodness-of-fit tests require a simple null hypothesis, but some do admit composite hypotheses. It is necessary to be aware of the assumptions behind any goodness-of-fit test that is being used to provide support for the analysis.

Before proceeding further, it is noted that some controversy arises regarding hypothesis tests as applied to simulation experiments (Reference [14]). The essence of the argument involves the “power” of a test. Power is defined as the probability of rejecting the null hypothesis. In practical terms, the power is:

\mathrm{Power}(\theta) = P(\text{reject } H_0 \mid \theta), \qquad \theta \in \text{allowable parameters for } H_1

As the size of the sample increases, the Type II error decreases and, eventually, enough samples will be taken that the null hypothesis will be rejected. In simulation experiments, this means that unless we absolutely know the data come from the null hypothesis distribution, any distributional hypothesis will be rejected if enough Monte Carlo samples are obtained. The analyst will need to decide whether results based on goodness-of-fit tests are of value in defending results based on the assumed distribution.

Power of a particular test is hard to evaluate quantitatively since it requires knowing the probabilities of the test statistic under all alternatives to the null hypothesis. In other words, the power of a test is a function of the alternative hypothesis, as well as of the test statistic itself. Statistical theory for hypothesis tests deals with deriving test statistics and “most powerful” tests for given situations. References [1], [2], [5] and [11] discuss this topic in more detail.

In practical application, the power function generally is not known. The approach is to specify the null hypothesis and then determine the critical region for a particular statistic, i.e., the region in which the null hypothesis is rejected (with probability α) when it is in fact true. This approach is used below to derive statistics relevant to the specification of the null hypothesis as a Maxwell distribution. The critical regions for the specified statistics are derived through Monte Carlo simulation.

Hypothesis tests for distributions, referred to as goodness-of-fit tests, essentially try to show that the data sample does not come from the distribution of the null hypothesis. In this sense it is never proved that the data comes from the test distribution. Either the hypothesis is rejected, which “proves” the data does not come from that distribution, or the hypothesis is not rejected. To emphasize: failure to reject the null hypothesis does not mean the data is from the specified distribution; rather, it merely means that we cannot reject that hypothesis.

Tests for outliers in distributions are similar in approach. The null hypothesis is that data comes from a given distribution. A test statistic must be chosen in some way, and the critical region for that statistic is again calculated using Monte Carlo simulation.


Exploratory Data Analysis – Probability Plots

Prior to performing any type of hypothesis test regarding the null distribution, one may want to perform an Exploratory Data Analysis (EDA) of the sample data. EDA is a set of techniques used to obtain a qualitative understanding of the data and as an aid in formulating hypothesis tests. EDA is a data-driven procedure intended to “let the data speak for itself” rather than to fit the data to pre-specified models. The only technique addressed in this paper is the use of probability plots, which are directly related to the formulation of hypotheses about probability distributions.

Probability plots consist of taking the sampled values, placing them in ascending order, and calculating the empirical probability associated with each of these order statistics. The empirical probability is sometimes referred to as the plotting position. The suggested plotting position formula for the r-th ordered value of a data sample of size n is:

p_r = \frac{r}{n+1}

The empirical probability is plotted against the corresponding ordered sample value on one or more specially constructed “probability papers.” These papers are most easily constructed when the particular distribution can be put into a standardized “location-scale” form. Normal probability paper is the most common example.

A plot of the sample data against the empirical probability should produce an approximate straight line if the data is in fact from the assumed distribution. For the Maxwell distribution, there is no generally available probability paper. As such, a MATLAB function was written to produce Maxwell probability paper. Appendix G provides more discussion, as well as a listing of the code for this development.

In EDA, the probability plot is often used to obtain a preliminary indication of whether the data is in fact from the selected distribution. If the plot is markedly different from a straight line, then proceeding with a formal goodness-of-fit test is not required; rather, a different distribution should be considered.

In many cases, a generally good fit can be obtained, but a few of the data points deviate considerably at either the lower or upper end of the distribution. These values are potential outliers. The probability plot is quite helpful in indicating whether outliers may be present in the sample. When possible outliers are identified, an appropriate statistical test for them is warranted. Figure F0 shows a Maxwell probability plot where goodness-of-fit is questionable (panel a), and a plot where there appear to be outliers (panel b).

The following sections discuss relevant statistics and hypothesis tests for goodness-of-fit (GoF) testing. This is followed by a discussion of selecting a relevant statistic for testing outliers for the Maxwell distribution, along with tables of percentage points. Unless noted, a specific test applies to the one parameter Maxwell distribution.


[Figure: two Maxwell probability plots, probability % vs. data values]

(a) Possible Poor Fit (b) Possible Outliers

Figure F0 – EDA Probability plots for Maxwell Distribution

Goodness-of-Fit Tests - General

A large number of goodness-of-fit tests have been devised over the last century. A good survey is provided in Reference [12]. This paper discusses only two of these: methods based on the Empirical Distribution Function (EDF) and those based on correlation.

One of the oldest tests for goodness-of-fit is Pearson’s Chi-squared test. These tests do allow for a composite null hypothesis, where only the distribution type is required. The Chi-squared test is accurate only when the sample size is large enough that the test statistic is in fact Chi-square distributed. Small samples or too few cells (or “bins”) can invalidate this test. The process of “binning” the observations leads to a loss of information, and these tests are generally less powerful than those based on order statistics or regression. A discussion and procedures for implementation of the Chi-Squared test are provided in Reference [10].

Empirical Distribution Function (EDF) Tests

EDF statistical tests are divided into two classes: those using supremum statistics and those using quadratic statistics. The supremum statistic D is the supremum (or maximum value) of the absolute difference between the empirical CDF of size n, F_n(y), and the hypothesized CDF, F_0(y):

D = \sup_y \left|F_n(y) - F_0(y)\right|


The empirical distribution function of the sample consists of the probability values assigned to the sample order statistics; for the r-th order statistic:

F_n(y_{(r)}) = \frac{r}{n}

Supremum statistics are very useful because they are distribution free. This implies that the test can be applied to any continuous distribution without having to derive particular values of the statistic for every distribution. The non-parametric nature of the test rests on the fact that random variables from an arbitrary continuous distribution are related, through the probability transform, to uniform random variables on the unit interval [0, 1].

The most common application of a supremum test is the Kolmogorov-Smirnov (K-S) procedure. A potential difficulty with the K-S test is that it requires a simple null hypothesis. As discussed, this requires that not only the distribution form but also the parameters be specified. Estimating parameters from the data and then applying the K-S test is not correct and can lead to serious errors, particularly for small samples. The reason is that when the parameters are estimated from the data, the resulting order statistics may no longer be uniformly distributed as assumed by the K-S test. References [10] and [12] contain additional details on the use of this test.

The distribution of the K-S statistic is derived in Reference [5]. Tables for percentage points at selected levels of significance as a function of sample size (n) are provided for this statistic. The MATLAB statistics toolbox provides an implementation of this test, with any choice of sample size and level of significance.

The K-S test is very good at assessing a shift between the hypothesized distribution and the data, because the supremum statistic is a measure of distance between the two distributions. On the other hand, scale differences between distributions often are most evident in the tails of the distribution, where the K-S statistic is least sensitive. Recall that the variance of an order statistic is proportional to p(1 − p), where p is the quantile probability. This has a maximum at p = 0.5 and decreases to zero in the tails of the sample distribution. As such, the power of the test is highest when differences occur between the central portions of the distributions. To alleviate this situation, the supremum statistic could be derived using a weighted average. This concept is discussed in the next section as applied to quadratic statistics.

Quadratic Tests

Quadratic statistics are formulated on the basis of a weighted average of the squares of the deviations between the empirical distribution F_n(y) and the hypothesized distribution F_0(y). The general form is:

T = n\int_{-\infty}^{\infty} \left[F_n(y) - F_0(y)\right]^2 \psi(y)\,dF_0(y)

The function F_0 is the hypothesized distribution and \psi is a weighting function. This approach also requires a simple null hypothesis specification in order to derive an analytic expression for the distribution of T. If \psi = 1, the resulting statistic is called the Cramer-von Mises statistic.

The following weight function leads to the Anderson-Darling statistic, Reference [13]:

\psi(y) = \frac{1}{F_0(y)\,[1 - F_0(y)]}        Equation (F1)

This weight function tends to place a higher “value” on the tails of the distribution, and as such yields a more powerful test for detecting changes in scale between two distributions. The computational formula for this statistic, denoted A^2, is:

A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln z_i + \ln(1 - z_{n+1-i})\right]        Equation (F2)

The z_i = F_0(y_{(i)}) are the EDF values computed at the ordered sample data values y_{(i)}.

Monte Carlo Simulation of Statistics

The idea behind Monte Carlo development for goodness-of-fit tests rests on being able to generate random samples from the distribution that is being assumed (i.e., the null hypothesis distribution). Samples drawn from the actual distribution are random variables. If a very large number of samples are drawn, and the desired statistic is evaluated for each sample, percentage points can be calculated based on these results. The assumption is that the number of samples is large enough to adequately approximate the actual population.

This approach applies for either K-S or quadratic type tests, as well as for correlation and regression goodness-of-fit statistics. Reference [10] discusses this issue and states that when the distribution is of “location-scale” type, the distribution of the test statistic does not depend on the actual parameter values. Heuristically, this seems reasonable since the location-scale random variable can be standardized. As noted in the main report, both the one and two parameter Maxwell distributions are of this type.

Quadratic statistics have the property that they converge very quickly to their asymptotic limits. For all but small sample sizes, the percentage points therefore depend only on the level of significance chosen. These statistics have also been found to have higher power than the Chi-squared or K-S tests, as a result of the distribution tails receiving more weight.

One point necessary to keep in mind when using asymptotic results is that the estimator used for the unknown parameters must be asymptotically efficient. For the one parameter Maxwell distribution, this implies using the maximum likelihood estimator, â. The MOM estimates for the two parameter distribution are not asymptotically efficient. If those moment estimates were used, the limiting distribution of the quadratic statistic would be a function of the sample size (as well as a function of the estimation method). The following results apply only to the single parameter Maxwell distribution.

Anderson-Darling Statistic for Maxwell One-Parameter Distribution

The methodology provided in Reference [12] was used to derive the Anderson-Darling asymptotic percentage points for the Maxwell distribution. These results, along with some very small sample results, are given in Table F1 for selected levels of significance. The results are derived from 50,000 Monte Carlo samples, each of the size specified. The methodology is applied as follows (a MATLAB sketch of one replication follows Table F1):

1. Generate a Monte Carlo sample (size n) of standardized values “t” by obtaining the root-mean-square of three standard normal random variables. Note: the sample values are then drawn from a population with parameter a = 1. Arrange the values in ascending order.

2. The MLE estimate of “a” is next calculated using the sample data values. This is accomplished as:

\hat{a} = \sqrt{\frac{1}{3n}\sum_{i=1}^{n} t_i^2}

The factor of 3 is required because the Maxwell variable is composed of three squared normal terms.

3. The n sample values must reflect that an estimated parameter, rather than the actual population parameter, is being used. The standardized sample values are then:

t_i' = \frac{t_{(i)}}{\hat{a}\sqrt{2}}

4. The empirical distribution function for the sample is calculated from the CDF:

z_i = \mathrm{erf}(t_i') - \frac{2}{\sqrt{\pi}}\,t_i'\,e^{-t_i'^2}

5. Equation F2 is then calculated for each sample. This is done 50,000 times for the current results. The values of the statistic are sorted, and quantile values are selected corresponding to the chosen significance levels.

Sample     Significance Level
Size       0.50     0.25     0.10     0.05     0.02     0.01
5          .4589    .6646    .9358    1.1493   1.4458   1.70761
10         .4723    .6868    .9728    1.2041   1.5304   1.7822
25         .4744    .6913    .9929    1.2239   1.5488   1.7797
>= 30      .4805    .7070    1.0094   1.2390   1.5643   1.8162

Table F1 – Maxwell Distribution Percentage Points for the A2 Statistic
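The sketch below implements one replication of steps 1 through 5 (the sample size is an illustrative assumption; repeating the computation 50,000 times and sorting the A2 values yields percentage points like those in Table F1):

% Sketch: one Monte Carlo replication of the A-squared computation
n = 30;
t = sort(sqrt(sum(randn(3,n).^2,1)));            % step 1: Maxwell sample, a = 1
ahat = sqrt(sum(t.^2)/(3*n));                    % step 2: MLE of "a"
ts = t/(ahat*sqrt(2));                           % step 3: standardize with the estimate
z  = erf(ts) - (2/sqrt(pi))*ts.*exp(-ts.^2);     % step 4: CDF values
i  = 1:n;                                        % step 5: Equation (F2)
A2 = -n - sum((2*i - 1).*(log(z) + log(1 - z(n+1-i))))/n;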

It is emphasized that when using the percentage points in Table F1, the statistic must be calculated using the MLE estimate of the parameter. If the value of the parameter “a” is known (or assumed) independently, a table with different values at each significance level would result.

Correlation Statistics for Two-Parameter Maxwell Distribution

A somewhat different approach can be taken for testing goodness-of-fit when the distribution is of “location-scale” type. The idea uses the fact that the quantiles of order statistics from a sample should be distributed uniformly on the unit interval. The development follows Reference [12].

The procedure uses the form of the standard definition of the correlation coefficient, defined between a random variable y and some set of constants m_i:

R = \frac{\sum_{i=1}^{n}(y_{(i)} - \bar{y})(m_i - \bar{m})}{\sqrt{\sum_{i=1}^{n}(y_{(i)} - \bar{y})^2\,\sum_{i=1}^{n}(m_i - \bar{m})^2}}

The usual correlation coefficient is defined between two random variables. The above definition extends this definition to the case where one set of variables is not random.

The distribution of R is based on a large number of Monte Carlo simulations of random samples drawn from the selected distribution. In general, the distribution of R will depend on the underlying distribution form, as well as on the specific parameters. In the case of “location-scale” distributions, the random variables can be standardized, and the resulting correlation results depend only on the distribution form.

The values from each sample are arranged in ascending order (i.e., as order statistics). The most representative constant associated with each order statistic is the expected value of that order statistic. Although the exact distribution of the order statistics is known (Appendix D), the moments of these distributions cannot be calculated in closed form for most probability densities, including the Maxwell distribution.

An alternative to the mean value of the order statistic is the mean value of the quantile associated with that order statistic. This value is known in a simple form (Equation D2):

E[F(y_{(i)})] = \frac{i}{n+1}

As an approximation, select the constants such that:

m_i = F^{-1}\!\left(\frac{i}{n+1}\right)

Using the Delta Method, it can be shown that this is an approximation to the mean of the actual order statistic, for a one term Taylor expansion of the inverse function. As sample size increases, the approximation becomes more exact (Reference [5]).

Hence, for sample size n and a specified location-scale distribution, a set m_i, i = 1, …, n, can be calculated using the distribution inverse function. A large number of Monte Carlo samples is drawn from the specified distribution, and the correlation with the fixed m_i is computed for each sample. The resulting sample statistics are ordered, and percentage points determined.

As the statistic R decreases, it is less likely that the sample belongs to the specified distribution (null hypothesis). In accordance with standard practice, the following related statistic is generally used:

Z_n = n\,(1 - R^2)

This statistic has the property that it increases as the indication of fit decreases.

A MATLAB implementation of the procedure was developed. As a check, the Type I Extreme Value distribution correlation results of Reference [12] were reproduced, using 50,000 rather than 10,000 samples. The method was then applied to the Maxwell distribution. Note that these results can be used for either the one or two parameter distribution. The distribution of Z_n is quite different from a normal distribution.
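A minimal sketch of the Zn computation for a single Maxwell sample follows (the sample size is illustrative; the percentage points of Table F2 come from repeating this over many Monte Carlo samples):

% Sketch: the correlation statistic Zn = n(1 - R^2) for one Maxwell sample (a = 1)
n = 100;
Finv = @(p) fzero(@(t) erf(t) - (2/sqrt(pi))*t*exp(-t^2) - p, 0.5);
m = arrayfun(Finv, (1:n)/(n+1));        % constants m_i = F^{-1}(i/(n+1))
y = sort(sqrt(sum(randn(3,n).^2,1)));   % ordered Maxwell sample
Rm = corrcoef(y, m);
Zn = n*(1 - Rm(1,2)^2);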

Figure F1 shows the distribution of the Z_n statistic for 50,000 Monte Carlo runs and a sample size of 100, plotted on normal probability paper.

Percentage points for goodness-of-fit correlation testing of a Maxwell distribution are provided in Table F2, for the indicated levels of significance and sample sizes.


[Figure: normal probability plot of the Zn statistic values, probability vs. data]

Figure F1 – Distribution of Zn for n = 100 (50,000 MC runs)

Sample     Significance Level
Size       0.50     0.25     0.10     0.05     0.02     0.01
25         .7601    1.1414   1.6714   2.1169   2.7207   3.2321
50         .8960    1.3301   1.9313   2.4214   3.1496   3.7543
75         .9699    1.4302   2.0813   2.6271   3.4239   4.0950
100        1.0213   1.4964   2.1690   2.7294   3.5346   4.2037
150        1.0894   1.5831   2.2870   2.8972   3.7203   4.4158
200        1.1373   1.6498   2.3622   2.9590   3.8189   4.5451
400        1.2345   1.7753   2.5359   3.1263   4.0064   4.6605
750        1.3251   1.8966   2.6751   3.3299   4.2250   4.8762
1000       1.3611   1.9446   2.7354   3.3592   4.2418   4.9649

Table F2 – Percentage points for the Maxwell Distribution: Zn = n [1 – R2]

Figure F2 is a plot of the percentage points of Table F2. The plot shows that, for a given level of significance, an asymptote is approached when the sample is sufficiently large. For small sample sizes the relationship is quite non-linear. This is characteristic of all goodness-of-fit statistics, although the definition of “large sample” differs depending on the test.


[Figure: percentage points Z = n (1 − R2) vs. sample size, one curve per significance level]

Figure F2 – Percentage Points for Maxwell Correlation Statistic

Regression Statistics

Regression statistics are related to correlation statistics and can be derived in the same way. For location-scale distributions, the regression uses the ordered sample values y_{(i)} and the constants m_i to perform an ordinary least-squares linear fit:

y_{(i)} \approx b + a\,m_i

The test statistic is a function of the residuals of the least-squares fit, so these tests are closely related to correlation tests. Because the power functions have not been determined, the choice between a regression and a correlation test is not evident.

For the single parameter Maxwell distribution, the Anderson-Darling statistic discussed above is believed to be more powerful than either regression or correlation statistics. The interested reader can consult Reference [12] for further discussion.


Outlier Tests

Panel (b) of Figure F0 provides an example of a data set that appears to follow a Maxwell distribution quite closely, except for the three largest values. Applying one of the previous goodness-of-fit tests may or may not result in rejecting the null hypothesis of the distribution type. In either case, it may be of interest to determine whether these extreme values can be identified as outliers.

Outliers have a long history, and a certain degree of empiricism is involved in selecting a relevant statistic for identifying and rejecting outliers, as discussed in Reference [15]. Much of the work done on outliers has assumed that the samples are drawn from normal distributions. When distributions other than normal are involved, one should approach the historical criteria for outlier detection with caution.

Detection of outliers is a hypothesis testing problem, as noted previously. The null hypothesis is that the data are from a specified distribution. The alternative hypothesis is that “k” outliers are present, where often k equals 1. In any case, the alternative hypothesis must state if the outliers are at the upper tail, the lower tail or possibly at either tail of the distribution. Any useful test for outliers will depend on this alternative hypothesis. Different decision statistics are usually required to test different alternatives. The present paper deals only with the null hypothesis of a Maxwell distribution and an alternative that includes outliers in the upper tail.

As noted in the general discussion, an outlier test involves a composite hypothesis. For a decision statistic T, the test takes the form:

Null hypothesis H0: All data values come from the same Maxwell distribution

Alternative H1: The data contains k outliers from a different distribution

Test: reject H0 if T > T_α, where α is the probability of a Type I error (significance level)

The first question is: how can we come up with a useful statistic for testing for upper outliers? To that end, we consider basing the choice on the likelihood ratio of the null hypothesis and a selected alternative. Likelihood ratio tests are a tool used to handle composite hypotheses. This type of test compares the ratio of the likelihood of the composite hypothesis to the likelihood of the null hypothesis, given the particular sample data y:

\lambda = \frac{L(H_1 \mid y)}{L(H_0 \mid y)}

As the likelihood of the alternative increases, the probability that the null hypothesis is true decreases. As such, the likelihood ratio increases as the probability of rejecting the null hypothesis increases. The idea is to find a suitable statistic T that reflects this behavior. The approach is again to use Monte Carlo simulation to determine percentage points, at specified significance levels, for the statistic under the assumption that the null hypothesis is true.


The specific type of alternative considered for the Maxwell outlier test statistic is the “slippage” alternative. This means that we consider one or more of the data values to have come from the same type of distribution (i.e., Maxwell), but with parameters different from those of the null distribution.

Although the slippage alternative assumes the same distribution type for outliers, it should remain reasonable if the outlier data are from a different distribution family (Reference [15]). In particular, alternative distributions that are of the “exponential family,” such as the Normal or Gamma distributions, should lead to essentially the same test statistic. A definition and discussion of exponential families of distributions is provided in References [1], [2] and [3].

In the development that follows, consideration is restricted to the hypothesis of a single outlier at the upper tail. Development of a test statistic for k outliers is very similar, and results will be stated for multiple outliers. The approach is to calculate the likelihood ratio as discussed above and determine a statistic that increases as the likelihood ratio increases. As discussed in the main text, we will actually use the log-likelihood for convenience of manipulation.

The log-likelihood function for the null hypothesis, with n data values, is:

\ln L_0(a) = \frac{n}{2}\ln\frac{2}{\pi} - 3n\ln a + 2\sum_{i=1}^{n}\ln y_i - \frac{1}{2a^2}\sum_{i=1}^{n} y_i^2

The alternative hypothesis assumes that n − 1 data values are from the same Maxwell distribution, but one observation is from a different distribution. This different distribution is characterized by a scale parameter a/\beta, where \beta is the “slippage parameter.” Note that the slippage parameter should be less than 1.0 in order for the tail of this “outlier” distribution to be extended. The data point y_{(n)} represents the largest sample value. The log-likelihood for the alternative hypothesis is:

\ln L_1(a, \beta) = \frac{n}{2}\ln\frac{2}{\pi} - 3n\ln a + 3\ln\beta + 2\sum_{i=1}^{n}\ln y_i - \frac{1}{2a^2}\left[\sum_{i=1}^{n-1} y_{(i)}^2 + \beta^2 y_{(n)}^2\right]

The parameter values that maximize each of these likelihood functions can be found in the usual way by zeroing the derivatives and solving for the various terms. For the null hypothesis, the MLE solution is the same as derived in previous sections of this paper:

\hat{a}_0^2 = \frac{1}{3n}\sum_{i=1}^{n} y_i^2


An important notational convention is noted:

S^2 = \sum_{i=1}^{n} y_i^2 \qquad \text{and} \qquad S'^2 = \sum_{i=1}^{n-1} y_{(i)}^2

where S'^2 excludes the largest sample value. Substituting \hat{a}_0 into the log-likelihood provides the maximum value of the null hypothesis function for the given sample data.

The alternative likelihood function has two parameters. Note that parameter “a” in the alternative likelihood has a different value than it does for the null likelihood. Taking derivatives with respect to \beta and a, equating to zero, and solving provides:

\hat{a}_1^2 = \frac{S'^2}{3(n-1)}, \qquad \hat{\beta}^2 = \frac{3\hat{a}_1^2}{y_{(n)}^2}

Substituting these expressions into the alternative likelihood function, and combining terms, provides the maximum value of the alternative function. The maximized log-likelihood ratio is the difference of the two maximized functions. Now define the statistic:

T = \frac{(n-1)\,y_{(n)}^2}{S'^2}

and the log-likelihood ratio reduces to:

\ln\lambda = \frac{3n}{2}\ln\!\left(\frac{T + n - 1}{n}\right) - \frac{3}{2}\ln T        Equation (F3)

The relevant statistic is the ratio of the square of the largest data value to the average of the squares of the remaining data values. The following comments apply to this formulation:


1. The statistic T is always greater than or equal to 1.0. As such, it can be seen from Equation (F3) that as T increases, the log-likelihood ratio increases. This implies that as T becomes larger, the likelihood of rejecting the null hypothesis increases.

2. The statistic T is called an “exclusive” statistic since the average does not include the extreme value. An alternative “inclusive” statistic could have been chosen as T' = n\,y_{(n)}^2 / S^2. T and T' are functionally related, as can easily be shown; as such, the power of the test for either statistic is equal. In the following tables, the “exclusive” form of the statistic has been used.

3. The test statistic for k outliers, based on the slippage alternative, is:

T_k = \frac{(1/k)\sum_{j=n-k+1}^{n} y_{(j)}^2}{\left(1/(n-k)\right)\sum_{i=1}^{n-k} y_{(i)}^2}

where the numerator is the average of the squares of the k largest values and the denominator is the average of the squares of the remaining values.

4. Strictly speaking, the likelihood ratio as used in the derivation of these statistics is not correct, because selecting the k largest values violates the assumption of randomness in the likelihood function. Reference [15] addresses this issue. Essentially, what is being done is to recognize that there are n!/(k!(n − k)!) ways of selecting the k suspect values from the sample data. If a hypothesis test were performed for each selection, the one resulting in the maximum of the likelihood ratio would be chosen. Since this is the only one of interest, the above procedure can be used.
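A minimal Monte Carlo sketch for the single-outlier (k = 1) percentage points follows (sample size and replication count are illustrative assumptions):

% Sketch: Monte Carlo percentage points for the exclusive outlier statistic, k = 1
n = 25; nrep = 50000;
T = zeros(nrep,1);
for j = 1:nrep
    y2 = sort(sum(randn(3,n).^2,1));     % squared Maxwell values (a = 1)
    T(j) = y2(n)/mean(y2(1:n-1));        % largest square over the mean of the rest
end
T = sort(T);
pts = T(round([0.90 0.95 0.98 0.99]*nrep))   % points for alpha = .10 .05 .02 .01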

The following tables provide percentage points for the specified levels of significance; each table is for a different sample size. The exclusive form of the statistic is used. For outlier rejection, the significance level should be small, and the tables reflect this consideration. All tables reflect the results of 50,000 Monte Carlo simulation samples.


Maxwell Outlier percentage points n = 25

Outliers   Significance Level
           .10      .05      .02      .01      .005     .001
1          4.8145   5.4052   6.2223   6.8581   7.4441   ----
2          4.4494   4.8802   5.4402   5.8519   6.2644   ----

Maxwell Outlier percentage points n = 50

Outliers   Significance Level
           .10      .05      .02      .01      .005     .001
1          5.1436   5.7111   6.4765   6.9823   7.6112   8.9510
2          4.6623   5.0379   5.5728   5.9360   6.3128   7.0560
3          4.4140   4.7270   5.1355   5.4232   5.7602   6.3415
4          4.2591   4.5401   4.8972   5.1420   5.4380   5.9670

Maxwell Outlier percentage points n = 75

Outliers   Significance Level
           .10      .05      .02      .01      .005     .001
1          5.3799   5.9265   6.6109   7.1600   7.7189   9.0112
2          4.8484   5.2052   5.6900   6.0244   6.3659   7.1285
3          4.5690   4.8665   5.2383   5.5123   5.7930   6.4035
4          4.3980   4.6556   4.9654   5.2079   5.4433   5.9793

Maxwell Outlier percentage points n = 100

Outliers   Significance Level
           .10      .05      .02      .01      .005     .001
1          5.5275   6.0788   6.7843   7.3133   7.8463   9.1031
2          4.9945   5.3513   5.8099   6.1483   6.4609   7.2427
3          4.6902   4.9838   5.3575   5.6042   5.8680   6.5090
4          4.4938   4.7507   5.0644   5.2797   5.4996   5.9863
5          4.3535   4.5879   4.8676   5.0629   5.2502   5.6770


Maxwell Outlier percentage points n = 200

Outliers   Significance Level
           .10      .05      .02      .01      .005     .001
1          5.9591   6.4758   7.1402   7.6411   8.1419   9.4174
2          5.3773   5.7278   6.1531   6.4487   6.7565   7.5001
3          5.0492   5.3298   5.6506   5.8891   6.1371   6.6655
4          4.8270   5.0670   5.3452   5.5360   5.7442   6.1896
5          4.6628   4.8747   5.1289   5.2989   5.4698   5.8298

Maxwell Outlier percentage points n = 400

Outliers   Significance Level
           .10      .05      .02      .01      .005     .001
1          6.4135   6.9305   7.5920   8.0892   8.5975   9.7696
2          5.8003   6.1378   6.5539   6.8654   7.1519   7.8258
3          5.4567   5.7216   6.0462   6.2671   6.4962   7.0009
4          5.2221   5.4428   5.7136   5.9018   6.0728   6.4955
5          5.0431   5.2376   5.4770   5.6349   5.7883   6.1710


Appendix G

Probability Plots and MATLAB Code for Plot Generation

The first step in many Exploratory Data Analyses (EDA) of a data set is plotting the data on one or more types of probability paper. This allows for rapid qualitative assessment of whether the data appears to come from the selected distribution. The probability plot is also useful in identifying possible outliers, as discussed in Appendix F.

MATLAB provides Normal distribution plots directly. Maxwell probability plots can be generated using appropriate MATLAB code and plotting utilities. The resulting probability paper can be used for either the one or two parameter distributions because of the “location-scale” nature of this distribution.

The figure below is the general purpose Maxwell probability paper, which can be copied and used for hand plotting if required. Following this figure is code for a MATLAB function that takes a data vector as input, develops the probability plot axes, plots the input data vector, and provides an estimate of the “best fit” Maxwell distribution for the given data.

Essentially, the probability plot just plots standardized values on both the ordinate and abscissa, both on a linear scale. Various probability values, which correspond to fixed values of the standardized variable, are superimposed on the ordinate. This axis is always the same regardless of the data. The abscissa scale is linearly adjusted to fit the actual data input.

The best fit line is generated by obtaining estimates of the parameters a and b using the method of moments. The MOM estimators are used for simplicity since they can be calculated directly. The MOM and MLE estimators provide the same information for qualitative evaluation.


[Figure: blank Maxwell probability paper, probability % (0 to 99) vs. data values]

General Purpose Maxwell Probability Paper


MATLAB Code for Probability Plotting Function

function [ahat bhat] = maxwell_probplt(Y)
%
% This function takes in a data vector and develops a plot of the values on
% probability axes corresponding to a general two parameter Maxwell
% distribution. A fitted line, based on moment parameter estimates, is also
% drawn on the plot.
% The Maxwell CDF in terms of the standardized variable t is:
%
%   F(t) = erf(t) - (2/sqrt(pi))*t*exp(-t^2)
%
%   t = (Y - b)/(a*sqrt(2))
%
% First sort the data array for plotting purposes
%
Y = sort(Y);
%
% check for at least four data values
nsamp = length(Y);
%
if( nsamp < 4)
    disp('Error - Input requires at least 4 data values')
    return
end
%
% Prepare the basic probability plot axes. This amounts to plotting the
% standardized Maxwell variable on both the X and Y axes. The Y axis then
% has tick marks superimposed corresponding to probabilities.
qv = (0:0.01:.99);
for i = 1:length(qv)
    q = qv(i);
    T(i) = fzero(@(t) maxinv(t,q),0.5);
end
T(1) = 0;   % rounding error may cause a non-zero value, so set to zero
%
Q = [0 0.01 0.04 .1 .2 .3 .5 .7 .8 .9 .95 .98 .99];
for i = 1:length(Q)
    q = Q(i);
    L(i) = fzero(@(t) maxinv(t,q),0.5);
end
L(1) = 0;
%
plot(T,T,'Linestyle','none'),grid
set(gca,'YTick',L)
set(gca,'YTickLabel',{'0';'1';'4';'10';'20';'30';'50';'70';'80';'90' ...
    ;'95';'98';'99'})
title('Maxwell Probability Plot')
ylabel('Probability %')
xlabel('Data Values')
hold on
%


%
% Now compute the empirical probability and the standardized variables
% associated with the empirical values.
for i = 1:nsamp
    r = i/(1 + nsamp);
    et(i) = fzero(@(t) maxinv(t,r),0.5);
end
%
% Plot the standardized empirical values on the Y-axis (which are
% equivalent to the empirical probability) and the actual values on the
% X-axis.
plot(Y,et,'ro')
%
% Compute MOM estimates of a and b for the Maxwell distribution.
% See Appendix B for the derivation.
%
m2 = sum(Y.*Y)/nsamp;
m1 = sum(Y)/nsamp;
mdiff = m2 - m1^2;
%
ahat = 1.4849*sqrt(mdiff);
bhat = m1 - 2.3695*sqrt(mdiff);
% Constrain bhat: the location estimate cannot exceed the smallest
% sample value Y(1)
if(bhat > Y(1))
    bhat = Y(1);
end
TFit = L*sqrt(2)*ahat + bhat;
%
plot(TFit,L,'Linestyle','--')
hold off
%
return
%
% Solution for the standardized variates requires the root-finding
% residual contained in this function
function f = maxinv(t,q)
    f = erf(t) - (2/sqrt(pi))*t*exp(-t^2) - q;
return


Appendix H

Chi Distributions for k > 3 and Nakagami Distributions

It has been noted that the Rayleigh and Maxwell distributions can be derived from the Chi-squared distribution with 2 and 3 degrees of freedom, respectively. These are useful because certain physical phenomena give rise to such distributions; two- and three-dimensional miss distance distributions are examples. There may be physical situations where more than three terms are involved in the root-sum-square variable. It was noted in the main report that these give rise to Chi distributions.

The Chi distributions can be further generalized so that the shape parameter k is no longer restricted to integer values but can take on any continuous value greater than zero. These are called Nakagami distributions. The Nakagami distribution has been proposed in physical applications as a model for multipath fading of communication signals.

Basic Properties of Chi Distributions

Consider the following random variable, where the X_i are independent normal random variables with zero mean and a common variance a^2:

y = \sqrt{W}, \qquad W = \sum_{i=1}^{k} X_i^2

The random variable W/a^2 follows a chi-squared distribution with k degrees of freedom, so the density of W is:

f_W(w) = \frac{w^{k/2-1}}{(2a^2)^{k/2}\,\Gamma(k/2)}\, e^{-w/(2a^2)}, \qquad w \ge 0

The distribution of the random variable y follows from the standard Jacobian transformation method:

f(y) = \frac{y^{k-1}}{2^{k/2-1}\,a^k\,\Gamma(k/2)}\, \exp\!\left(-\frac{y^2}{2a^2}\right), \qquad y \ge 0 \qquad Equation (H1)

The cumulative distribution function is found by direct integration:

F(y) = \frac{\gamma\!\left(k/2,\; y^2/(2a^2)\right)}{\Gamma(k/2)} \qquad Equation (H2)


The last function is the (lower) incomplete gamma function, defined as:

\gamma(s,x) = \int_0^x u^{s-1} e^{-u}\,du

The incomplete gamma function has a recursive property that makes evaluation easier (Reference [16]):

\gamma(s+1,x) = s\,\gamma(s,x) - x^{s} e^{-x}

and

\gamma(1,x) = 1 - e^{-x}, \qquad \gamma(1/2,x) = \sqrt{\pi}\,\mathrm{erf}(\sqrt{x})

These distributions for a given k are collectively referred to as “Chi-k distributions.”

The above relations show that when k is even the distribution function involves only exponential terms, whereas when k is odd the CDF involves the error function.

As an example, the above procedure can be applied when k = 4 to obtain:

F(y) = 1 - \left(1 + \frac{y^2}{2a^2}\right) e^{-y^2/(2a^2)}
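As a numerical cross-check (not part of the original text), the general CDF of Equation (H2) can be evaluated with MATLAB's built-in regularized incomplete gamma function and compared against the k = 4 closed form above; the variable names are illustrative:

a = 1.0;                                  % scale parameter (illustrative)
y = 0:0.1:5;                              % evaluation points
k = 4;
Fgen = gammainc(y.^2/(2*a^2), k/2);       % Equation (H2): gammainc(x,s) is the
                                          % regularized lower incomplete gamma
Fk4  = 1 - (1 + y.^2/(2*a^2)).*exp(-y.^2/(2*a^2));   % k = 4 closed form
max(abs(Fgen - Fk4))                      % difference should be at rounding level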

It is noted that regardless of the value of k, the distribution can be written in terms of the standardized variable used for the Maxwell distribution:

t = \frac{y}{\sqrt{2}\,a}

The p-th raw moment of the Chi-k distribution is given as:

E[y^p] = (2a^2)^{p/2}\,\frac{\Gamma\!\left((k+p)/2\right)}{\Gamma(k/2)} \qquad Equation (H3)

A plot of several Chi distributions (a = 1) is given in Figure H1. As k increases, the mode of the distribution shifts to the right, as would be expected since the root-sum-square involves more terms. Differentiating Equation (H1) with respect to y and equating to zero gives the mode of the distribution as:

y_{mode} = a\sqrt{k-1}

The figure also appears to show that the variance and the peak density value change very slowly as k increases.


[Figure H1 – Chi-k Distributions for Parameter a = 1: probability density versus data value for k = 3, 4, 6 and 9.]

Table H1 provides values of the distribution mean, variance, skewness and kurtosis excess for several of the Chi-k distributions. The raw moments were computed using Equation (H3).

k    Mean       2nd Moment   Variance     Skewness   Kurtosis Excess
3    1.5958 a   3 a²         0.4535 a²    0.4857     0.1082
4    1.8800 a   4 a²         0.4657 a²    0.4057     0.0593
5    2.1277 a   5 a²         0.4729 a²    0.3542     0.0370
6    2.3500 a   6 a²         0.4777 a²    0.3179     0.0251
9    2.9180 a   9 a²         0.4854 a²    0.2519     0.0106

Table H1 – Moment Descriptors for Selected Chi-k Distributions

The table confirms that the variance of the distribution changes little, while the skewness and kurtosis excess indicate that, as k increases, the distribution approaches the normal. Approximating the Chi distribution as normal when k becomes "large" may be adequate for some applications.
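The Table H1 descriptors can be reproduced directly from Equation (H3). The following MATLAB sketch (with a = 1 and illustrative variable names) computes the mean, variance, skewness and kurtosis excess from the raw moments:

for k = [3 4 5 6 9]
    m = @(p) 2^(p/2)*gamma((k+p)/2)/gamma(k/2);    % Equation (H3) with a = 1
    mu   = m(1);                                   % mean
    v    = m(2) - mu^2;                            % variance
    skew = (m(3) - 3*mu*v - mu^3)/v^1.5;           % skewness
    kex  = (m(4) - 4*mu*m(3) + 6*mu^2*m(2) - 3*mu^4)/v^2 - 3;  % kurtosis excess
    fprintf('k = %d: mean %.4f  var %.4f  skew %.4f  kurt-ex %.4f\n', ...
            k, mu, v, skew, kex)
end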


MLE Estimator and CRLB

All of the procedures discussed for the Maxwell distribution can be applied to Chi-k distributions. As an example, the log-likelihood function for a sample of size n is:

\ln L(a) = -n \ln\!\left(2^{k/2-1}\,\Gamma(k/2)\right) - nk\,\ln a + (k-1)\sum_{i=1}^{n}\ln y_i - \frac{S}{2a^2}

where S = \sum_{i=1}^{n} y_i^2. Differentiating and setting to zero produces the MLE estimate for the parameter:

\hat a = \sqrt{\frac{S}{nk}} = \sqrt{\frac{1}{nk}\sum_{i=1}^{n} y_i^2}

The CRLB could be derived as was done for the Maxwell distribution using the appropriate density function. A simpler argument can be used, however. Applying the Factorization theorem discussed in Appendix A to the above likelihood function, it is clear that the sufficient statistic is the same for all k:

S = \sum_{i=1}^{n} y_i^2

The MLE estimate is a function of this sufficient statistic and, as was the case for k = 3, the MLE is an efficient estimator. Thus, the CRLB is the mean square error for this estimator, and the necessary relation is (see the main body of this paper):

\mathrm{MSE}(\hat a) = \mathrm{CRLB} = \left[ n\,E\!\left(\left(\frac{\partial \ln f(y;a)}{\partial a}\right)^{\!2}\right) \right]^{-1}

The relation between the relevant moments turns out to have a simple form:

E[y^2] = k\,a^2

and

E[y^4] = k(k+2)\,a^4

Substituting and simplifying results in:

\mathrm{MSE}(\hat a) = \frac{a^2}{2nk} \qquad Equation (H4)

The variance of the MLE estimate can be used to compare efficiencies with the MOM or quantile estimator MSE if desired. It is interesting that the MSE of the estimator decreases with increasing k. Noting that "a" is the standard deviation of the underlying normal variables, it would be expected that as more variables are included in the sum (i.e., as k increases), more information is available for a given sample size n, and the estimator MSE would decrease.

The MOM estimate for parameter "a" based on the second moment is identical to the MLE estimate. MOM estimates are discussed below in general terms for the Nakagami distribution.
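A brief Monte Carlo sketch of the Chi-k MLE and its CRLB, Equation (H4), follows. The parameter value, sample size and trial count are arbitrary illustrative choices:

a = 2.0;  k = 6;  n = 50;  ntrials = 5000;   % illustrative choices
ahat = zeros(1,ntrials);
for j = 1:ntrials
    y = a*sqrt(sum(randn(k,n).^2,1));        % n Chi-k variates with scale a
    ahat(j) = sqrt(sum(y.^2)/(n*k));         % MLE from the sufficient statistic
end
fprintf('MSE = %.5f  CRLB = %.5f\n', mean((ahat - a).^2), a^2/(2*n*k))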


Random Number Generation and Inverse CDF

The distribution function of Equation (H2) becomes more complex as k increases. Generation of random variables from Chi distributions is easily accomplished by taking the root-sum-square of k normal random variables. This is the recommended procedure for developing Monte Carlo samples if statistics for outliers or goodness of fit are needed.

Calculation of Chi-k random variates at specified quantile values will require inversion of the distribution function. Examples where this is needed are for construction of probability paper or Monte Carlo evaluation of the distribution of certain statistics. When k gets large, deriving the exact distribution function using Equation (H2) and the recursive relations becomes unwieldy. Depending on the problem requirements, the normal distribution might be an alternative, with mean and variance calculated from Equation (H3).
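The two operations just described (variate generation by root-sum-square and quantile inversion of Equation (H2)) can be sketched in MATLAB as follows, with illustrative parameter choices:

a = 1.5;  k = 5;  n = 1000;                       % illustrative choices
y = a*sqrt(sum(randn(k,n).^2,1));                 % recommended generator
%
% Invert F(y) = gammainc(y^2/(2a^2), k/2) at probability q with fzero:
q  = 0.95;
F  = @(t) gammainc(t.^2/(2*a^2), k/2);
yq = fzero(@(t) F(t) - q, a*sqrt(k));             % start near the RMS value
ys = sort(y);
fprintf('q = %.2f quantile: %.4f (empirical %.4f)\n', q, yq, ys(round(q*n)))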

It is noted that if the normal component distributions do not have the same underlying variance, and/or if correlations are present, then the Mahalanobis distance should be used. In this case the scale parameter a of the Chi-k distribution is unity.

Nakagami Distribution

As noted in the introduction, Chi-k distributions are related to the more general Nakagami distribution. The connection is established in the following discussion.

This distribution has the probability density function:

f(y;\,m,\Omega) = \frac{2\,m^m}{\Gamma(m)\,\Omega^m}\, y^{2m-1}\, \exp\!\left(-\frac{m\,y^2}{\Omega}\right), \qquad y \ge 0 \qquad Equation (H5)

Here m is a shape parameter and \Omega is a scale (spread) parameter. If these parameters are chosen as

m = \frac{k}{2} \qquad \text{and} \qquad \Omega = k\,a^2

the Nakagami distribution becomes a Chi-k distribution. The Nakagami distribution is thus a generalization of the Chi-k distribution, with the shape parameter no longer restricted to integer or half-integer values.

If one is dealing with a set of data suspected of coming from a Chi-k distribution, but the value of k is uncertain, this more general form may be useful for estimating the shape parameter. The likelihood function for the Nakagami distribution is:

L(m,\Omega) = \left[\frac{2\,m^m}{\Gamma(m)\,\Omega^m}\right]^{n} \left(\prod_{i=1}^{n} y_i\right)^{2m-1} \exp\!\left(-\frac{m}{\Omega}\sum_{i=1}^{n} y_i^2\right)

Since two parameters need to be estimated, we ask whether there are two sufficient statistics for this likelihood function. Applying the Factorization theorem for multiple parameters, note that the likelihood depends on the data only through the product of the y_i and the sum of their squares.


Thus, the data are represented by the two sufficient statistics:

S_1 = \sum_{i=1}^{n} y_i^2 \qquad \text{and} \qquad S_2 = \prod_{i=1}^{n} y_i

The first of these is the usual summation (moment) statistic for the squares of the data values. The second one is a product statistic.

In order to determine the mean square error of the estimator, the variances of these two statistics would be required. Appendix C and the main report discuss how the variance of summation statistics can be calculated. Unfortunately, product statistics are much more difficult to deal with.

The maximum likelihood estimator of the two parameters can be developed, however, by the usual method. Consider the log-likelihood function:

\ln L = n\left[\ln 2 + m\ln m - \ln\Gamma(m) - m\ln\Omega\right] + (2m-1)\sum_{i=1}^{n}\ln y_i - \frac{m}{\Omega}\sum_{i=1}^{n} y_i^2

Differentiating with respect to \Omega and equating to zero produces:

\frac{m}{\Omega^2}\sum_{i=1}^{n} y_i^2 - \frac{nm}{\Omega} = 0

or

\hat\Omega = \frac{1}{n}\sum_{i=1}^{n} y_i^2

Differentiating with respect to m and equating to zero results in:

n\left[\ln m + 1 - \psi(m) - \ln\Omega\right] + 2\sum_{i=1}^{n}\ln y_i - \frac{1}{\Omega}\sum_{i=1}^{n} y_i^2 = 0

Substituting the expression for \hat\Omega into the last term reduces that term to n. The derivative of the natural logarithm of the gamma function is called the "psi" function or the digamma function. This function does not have a simple closed form and is usually represented by the symbol \psi(m).

This last equation can be solved numerically using a MATLAB function which implements evaluation of the digamma function. The resulting equation for the solution of \hat m is:

\ln\hat m - \psi(\hat m) = \ln\hat\Omega - \frac{1}{n}\sum_{i=1}^{n}\ln y_i^2

As a practical note, the equation solver may extrapolate to negative values of m. The shape parameter must always be positive, so such iterates need to be reset to a small positive value if this occurs (or the equation solved in terms of ln m). Generally, we would expect m \ge 1/2.
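A minimal MATLAB sketch of this solution is given below. Working in u = ln(m) keeps the iterate positive, which addresses the practical note above; the function name and starting value are illustrative, not from the original:

function [mhat, omegahat] = nakagami_mle(y)
% Sketch: MLE of the Nakagami shape and spread from data vector y
n = length(y);
omegahat = sum(y.^2)/n;                   % MLE of the spread parameter
c = log(omegahat) - sum(log(y.^2))/n;     % right-hand side of the shape equation
g = @(u) u - psi(exp(u)) - c;             % shape equation with u = ln(m)
u = fzero(g, 0);                          % start at m = 1
mhat = exp(u);                            % positive by construction

For data drawn from a Chi-k distribution, mhat should fall near k/2.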

Solving the previous equation gives \hat m. The analyst could use this, along with the estimate \hat\Omega, and use the Nakagami distribution of Equation (H5). The mean square error for the estimate of \Omega is easily found since it involves only the data second moment. The mean square error of the estimate of m does not have a simple expression.

The method of moments may be applied to the Nakagami distribution to obtain parameter estimates in terms of summation statistics. The general expression for the p-th moment of this distribution, in terms of the gamma function, is:

E[y^p] = \frac{\Gamma(m + p/2)}{\Gamma(m)}\left(\frac{\Omega}{m}\right)^{p/2}

Using this equation with p = 2, and noting that \Gamma(m+1) = m\,\Gamma(m), provides:

E[y^2] = \Omega

The MOM estimator for \Omega is therefore the same as the MLE estimator. The first moment could be used for the remaining parameter; however, for simplicity the so-called "inverse normalized variance" estimator is generally used. This results from considering the fourth moment:

E[y^4] = \frac{\Gamma(m+2)}{\Gamma(m)}\left(\frac{\Omega}{m}\right)^{2} = \Omega^2\left(1 + \frac{1}{m}\right)

On rearrangement and substitution of the sample moments:

\hat m = \frac{\hat\Omega^2}{\frac{1}{n}\sum_{i=1}^{n} y_i^4 - \hat\Omega^2}

If required, the mean square error of these estimators can be derived using methods discussed in Appendix C for functions of moment estimators.

If the data are required to have a Chi-k distribution, it seems reasonable to choose \hat k as the integer closest to 2\hat m. The estimate of parameter a would then be updated using this rounded value in the estimator equation:

\hat a = \sqrt{\frac{1}{n\hat k}\sum_{i=1}^{n} y_i^2}


Appendix I

Inverse Maxwell Distribution

The Inverse Maxwell (IM) distribution is related to the Maxwell distribution in the following way: if the random variable y is Maxwell distributed, the random variable x = 1/y is inverse Maxwell distributed.

The distribution of x can be derived in the usual manner for a single-variable function transformation:

f_X(x) = f_Y(1/x)\left|\frac{dy}{dx}\right| = \frac{1}{x^2}\, f_Y(1/x)

Substituting the transformed variable into the Maxwell density results in the inverse Maxwell density function:

f(x) = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^3 x^4}\,\exp\!\left(-\frac{1}{2a^2 x^2}\right), \qquad x > 0 \qquad Equation (I-1)

The distribution function is obtained from direct integration of this equation:

F(x) = 1 - \mathrm{erf}\!\left(\frac{1}{\sqrt{2}\,a x}\right) + \sqrt{\frac{2}{\pi}}\,\frac{1}{a x}\,\exp\!\left(-\frac{1}{2a^2 x^2}\right) \qquad Equation (I-2)

The symbol “erf ” denotes the error function as before. The IM distribution involves only a “scale” parameter, which leads to some simplification, as noted previously when the Maxwell distribution was discussed.

The density function is of indeterminate form as x → 0⁺. This limit can be evaluated by rewriting the expression and using L'Hopital's rule twice. Let u = 1/x; then

\lim_{x \to 0^+} f(x) = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^3}\,\lim_{u \to \infty}\frac{u^4}{e^{u^2/(2a^2)}} = 0

A comparison of the IM and the Maxwell densities is shown in Figure I1 for a = 1. Applications for the IM distribution are not readily available, but the shape of the density function suggests that it may be useful when the data are highly skewed, as often occurs with extreme value events.


[Figure I1 – two panels: "Maxwell and Inverse Maxwell Densities, a = 1" (probability density versus y) and "Maxwell and Inverse Maxwell Distributions, a = 1" (probability distribution versus y).]

Figure I1: Maxwell and Inverse Maxwell Probability Functions (a = 1)

Basic Properties of the Inverse Maxwell Distribution

The mean and variance of the IM distribution are easily obtained by taking the required expectations:

E[x] = \sqrt{\frac{2}{\pi}}\,\frac{1}{a} \qquad \text{and} \qquad \mathrm{Var}(x) = \frac{1}{a^2}\left(1 - \frac{2}{\pi}\right)

The "inverse" effect of the parameter a can be seen from the respective means and variances. As "a" increases in the Maxwell distribution, the mean and variance increase, whereas the IM distribution mean and variance decrease.

Continuing with the third moment of the distribution:

E[x^3] = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^3}\int_0^{\infty}\frac{1}{x}\,e^{-1/(2a^2 x^2)}\,dx = \sqrt{\frac{2}{\pi}}\,\frac{1}{2a^3}\,E_1\!\left(\frac{1}{2a^2 x^2}\right)\bigg|_{x \to \infty}

where E_1(\cdot) is the exponential integral. At the upper bound (i.e., as x → ∞) the argument approaches zero, where E_1 diverges, and the integral is undefined. Thus, the third moment does not exist. Proceeding in the same fashion, it can be shown that only the first and second moments exist.

The mode of the inverse Maxwell distribution is found by setting the derivative of the density function to zero:


\frac{1}{a^2 x^2} - 4 = 0, \qquad \text{or} \qquad x_{mode} = \frac{1}{2a}

CRLB and Parameter Estimators

The CRLB for this distribution can be calculated in a similar manner to that for the Maxwell distribution. Recall:

\mathrm{CRLB} = \left[n\,E\!\left(\left(\frac{\partial \ln f(x;a)}{\partial a}\right)^{\!2}\right)\right]^{-1}

and

\frac{\partial \ln f}{\partial a} = \frac{1}{a^3 x^2} - \frac{3}{a}

The first integral is (invoking L'Hopital's rule for the boundary evaluation):

E\!\left[x^{-2}\right] = 3a^2

Likewise:

E\!\left[x^{-4}\right] = 15a^4

The CRLB for the inverse Maxwell distribution is:

\mathrm{CRLB} = \frac{a^2}{6n} \qquad Equation (I-3)

This value is the same as for the Maxwell distribution.

The maximum likelihood estimator for the IM density is found from the log-likelihood:

\ln L(a) = \frac{n}{2}\ln\frac{2}{\pi} - 3n\ln a - 4\sum_{i=1}^{n}\ln x_i - \frac{1}{2a^2}\sum_{i=1}^{n}\frac{1}{x_i^2}

Differentiating with respect to parameter "a" and equating to zero results in:

-\frac{3n}{a} + \frac{1}{a^3}\sum_{i=1}^{n}\frac{1}{x_i^2} = 0

or

\hat a = \sqrt{\frac{1}{3n}\sum_{i=1}^{n}\frac{1}{x_i^2}}

For this distribution, the statistic T = \sum_{i=1}^{n} x_i^{-2} is seen to be sufficient. It was proved that for the Maxwell distribution this estimator is efficient for any sample size n; the method there relied on the fact that the MLE estimator was a function of the second sample moment of the data. For the inverse Maxwell this is not the case: the statistic T cannot be expressed in terms of any sample moment of the x_i.

Although an exact expression for the MSE is not available, likelihood theory shows that MLEs are asymptotically efficient. Thus, for large enough sample sizes, the CRLB can be used as an estimate of the mean square error of this estimator.
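A Monte Carlo sketch of the IM maximum likelihood estimator against the CRLB of Equation (I-3) follows; the parameter value, sample size and trial count are illustrative:

a = 1.5;  n = 200;  ntrials = 2000;          % illustrative choices
ahat = zeros(1,ntrials);
for j = 1:ntrials
    x = 1./(a*sqrt(sum(randn(3,n).^2,1)));   % IM variates as 1/Maxwell
    ahat(j) = sqrt(sum(1./x.^2)/(3*n));      % MLE for parameter a
end
fprintf('MSE = %.3e  CRLB = %.3e\n', mean((ahat - a).^2), a^2/(6*n))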

Moment Estimator

It is tempting to use the method of moments to estimate the parameter "a". It would seem that the second data moment would provide the estimate:

E[x^2] = \frac{1}{a^2} \qquad \text{and} \qquad \hat a = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)^{-1/2}

The MSE of this estimator is infinite because the fourth moment of the density is not defined.

The first moment could be used to derive an estimator. In this case:

E[x] = \sqrt{\frac{2}{\pi}}\,\frac{1}{a} \qquad \text{and} \qquad \hat a = \sqrt{\frac{2}{\pi}}\,\frac{n}{\sum_{i=1}^{n} x_i}

The MSE does exist for this estimator. Carrying out the evaluation (via the delta method, using the variance of the sample mean) results in:

\mathrm{MSE}(\hat a) \approx \left(\frac{\pi}{2} - 1\right)\frac{a^2}{n} \approx 0.571\,\frac{a^2}{n}

The efficiency of the first moment estimator can now be found:

\mathrm{eff} = \frac{\mathrm{CRLB}}{\mathrm{MSE}} = \frac{a^2/(6n)}{0.571\,a^2/n} \approx 0.29

This is a very low efficiency estimator. The MLE should be used when estimating the parameter of an inverse Maxwell distribution.

Quantile estimators depend only on the distribution function and the relation between the quantiles and the parameter; these estimators could be derived if required. Bayesian parameter estimation could likewise be carried out if desired.

Inverse Maxwell Variate Generation

When a large number of IM-distributed random variates are required, it is fastest to use the inverse relationship

x = \frac{1}{y}, \qquad \text{where} \qquad y = a\sqrt{z_1^2 + z_2^2 + z_3^2}

and the z_i are independent standard normal variates.
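In MATLAB this is a two-line operation (a minimal sketch; the parameter value is illustrative):

a = 1.5;  n = 10000;                  % illustrative choices
y = a*sqrt(sum(randn(3,n).^2,1));     % Maxwell variates
x = 1./y;                             % inverse Maxwell variates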


In some applications, inversion of the IM distribution function is required. Because the distribution function involves only a scale parameter, it can be expressed in standard form by defining s = a x:

F(s) = 1 - \mathrm{erf}\!\left(\frac{1}{\sqrt{2}\,s}\right) + \sqrt{\frac{2}{\pi}}\,\frac{1}{s}\,\exp\!\left(-\frac{1}{2s^2}\right)

A general inversion algorithm for the scaled variates can be developed from this; random variates for a specific distribution are then calculated from the scaled values as x = s/a.
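A sketch of this inversion using fzero on the standard-form distribution function; the probability level, starting point and scale value are illustrative:

F  = @(s) 1 - erf(1./(sqrt(2)*s)) + sqrt(2/pi)*(1./s).*exp(-1./(2*s.^2));
q  = 0.90;
sq = fzero(@(s) F(s) - q, 1);         % scaled (a = 1) quantile
a  = 2.0;                             % illustrative scale parameter
xq = sq/a;                            % corresponding IM variate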

The ability to make the distribution function non-dimensional allows for development of goodness-of-fit tests such as the Anderson-Darling or correlation type if desired. Also, tests for outlier rejection can be developed using the same likelihood ratio simulation techniques as applied for the Maxwell distribution.

Probability plotting paper can be developed easily for this distribution, using the same procedure as applied for the Maxwell distribution.


References

1. Hogg, R. V., J. W. McKean and A. T. Craig, "Introduction to Mathematical Statistics," 6th Edition, Prentice-Hall, 2005.

2. Casella, G., and R. Berger, "Statistical Inference," 2nd Edition, Duxbury Publishing, 2002.

3. Bury, K. V., "Statistical Models in Applied Science," John Wiley & Sons, 1975.

4. Papoulis, A., and U. Pillai, "Probability, Random Variables and Stochastic Processes," 4th Edition, McGraw-Hill, 2002.

5. Gibbons, J., and S. Chakraborti, "Nonparametric Statistical Inference," 3rd Edition, Dekker Publishing, 1992.

6. Brookner, E., "Tracking and Kalman Filtering Made Easy," John Wiley and Sons, 1998.

7. Press, W., et al., "Numerical Recipes," Cambridge University Press, 1986.

8. Wackerly, D., W. Mendenhall and R. Scheaffer, "Mathematical Statistics with Applications," 5th Edition, Duxbury Press, 1996.

9. Parzen, E., "Modern Probability and Its Applications," John Wiley and Sons, 1960.

10. Conover, W., "Practical Nonparametric Statistics," 2nd Edition, John Wiley and Sons, 1980.

11. Freund, J. E., "Mathematical Statistics," 2nd Edition, Prentice-Hall, 1971.

12. D'Agostino, R., and M. A. Stephens, "Goodness-of-Fit Techniques," Dekker Publishing, 1986.

13. Anderson, T. W., and D. A. Darling, "A Test of Goodness of Fit," Journal of the American Statistical Association, Vol. 49, No. 268, December 1954.

14. Schmeiser, B. W., "Some Myths and Common Errors in Simulation Experiments," Proceedings of the 2001 Winter Simulation Conference.

15. Barnett, V., and T. Lewis, "Outliers in Statistical Data," 3rd Edition, John Wiley and Sons, 1994.

16. Abramowitz, M., and I. Stegun, "Handbook of Mathematical Functions," Applied Mathematics Series 55, National Bureau of Standards, U.S. Government Printing Office, 1964.
