
Chapter 4

Modeling and Analysis of Uncertainty

Statistical Inference

• Statistical inference is the use of the information contained in a sample to make decisions about the population. Statistical inference has two subareas:

1. Parameter estimation – the estimation of a parameter of a population probability distribution, using point and interval estimates.

2. Hypothesis testing – the testing of a hypothesis made regarding a parameter of a probability distribution.

Point Estimation

• If X is a random variable with a probability distribution f(x), the distribution has an unknown parameter θ, and X1, X2, …, Xn is a random sample of size n from f(x), then:

1. The statistic Θ̂ = h(X1, X2, …, Xn) is called a point estimator of θ. Here h is a function of the observations (all random variables) in the sample. Therefore Θ̂ is a random variable.

2. After the random sample has been taken, the random variables have numerical values and Θ̂ has a numerical value that is referred to as θ̂.

3. A point estimate of some population parameter θ is a single numerical value θ̂ of a statistic Θ̂.

Parameters and Point Estimates (θ, θ̂)

Population mean μ ------------ Sample mean X̄

Population variance σ² ------- Sample variance S²

Difference in two means μ1 – μ2 ---- Difference in two sample means X̄1 – X̄2

All of these point estimates are statistics.

A. Let x_p denote the p-quantile of X and z_p denote the p-quantile of the standard normal. Assume X ~ N(μ, σ²). Then

p = P(X ≤ x_p) = F(x_p) = Φ((x_p − μ)/σ),  so  (x_p − μ)/σ = z_p,  i.e.  x_p = μ + σ·z_p.     (3)

Thus, we would estimate x_p by

x̂_p = μ̂ + σ̂·z_p = x̄ + s·z_p.     (4)

Estimator for Percentiles

B. When sampling from X ~ exp(λ),

μ_X = 1/λ,  σ_X² = 1/λ²,  and  x_p = −(1/λ)·ln(1 − p) = −μ_X·ln(1 − p).

Thus,

μ̂_X = x̄,  σ̂_X = x̄,  and  x̂_p = −x̄·ln(1 − p).

What if we wanted to estimate σ_X²? We wouldn’t use s²; we would use σ̂_X² = x̄².

Estimators for Percentiles

Example 1: The data (ordered) are thicknesses of copper foil from a rolling operation (units in mils).

Data (n = 15): 4.72, 4.84, 4.87, 4.94, 4.98, 5.00, 5.01, 5.05, 5.06, 5.07, 5.11, 5.12, 5.21, 5.24, 5.38

[Normal probability plot of THICKNESS. Average: 5.04; StDev: 0.1657; N: 15. Anderson-Darling normality test: A-squared = 0.159, P-value = 0.936.]

Estimators for Percentiles

The normal distribution provides a good fit to the data. Thus, assume the data come from a normal distribution, with

μ̂ = x̄ = 5.04,  σ̂ = s = 0.1657.

The 90th percentile is estimated by

x̂_{0.90} = x̄ + s·z_{0.90} = 5.04 + 0.1657(1.2816) ≈ 5.25.

(To obtain standard normal quantiles, use normal tables, or find them directly from MINITAB – we’ll demonstrate.)

The sample 90th percentile (see Chapter 2) is 5.30.
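As a cross-check of the calculation above, here is a minimal Python sketch (not part of the original slides; scipy is assumed to be available) that reproduces the normal-theory estimate x̂_{0.90} = x̄ + s·z_{0.90} for the thickness data.

```python
import numpy as np
from scipy.stats import norm

# Ordered copper-foil thicknesses (mils) from Example 1
thickness = np.array([4.72, 4.84, 4.87, 4.94, 4.98, 5.00, 5.01, 5.05,
                      5.06, 5.07, 5.11, 5.12, 5.21, 5.24, 5.38])

xbar = thickness.mean()          # sample mean, approx. 5.04
s = thickness.std(ddof=1)        # sample standard deviation, approx. 0.1657

z90 = norm.ppf(0.90)             # standard normal 0.90 quantile, approx. 1.2816
x90_hat = xbar + s * z90         # normal-theory estimate of the 90th percentile

print(round(x90_hat, 2))         # approx. 5.25
```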

Estimators for Percentiles

Unbiased Estimators

• Bias measures the closeness of the estimator to the parameter.

• The point estimator Θ̂ is an unbiased estimator for the parameter θ if E(Θ̂) = θ.

• If the estimator is not unbiased, then the difference E(Θ̂) − θ is called the bias of the estimator Θ̂.

In words: “An estimator is unbiased when its expected value is equal to the parameter it estimates.”

[Figure: pdf of the unbiased estimator Θ̂1 (centered at θ) and pdf of the biased estimator Θ̂2 (centered at θ + B, where B is the bias). “On the average,” when Θ̂2 is used, an error of size B is made.]

Point Estimation

Example 2: Recall that for a random sample from a population with mean μ and variance σ²,

E(X̄) = μ  and  E(S²) = σ².

Thus, the sample mean (variance) is always unbiased for the population mean (variance).

But note – the sample standard deviation is never unbiased for σ:

E(S) = E(√S²) ≠ √E(S²) = σ.

However, when sampling from the normal distribution, for sample sizes n ≥ 10 the bias is very small.

Remark: In the expression for sample variance,

S² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)²,

why not use n in the denominator? Because by using n – 1, we get an unbiased estimator for the population variance.
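The effect of the n − 1 divisor can be checked by simulation. The following sketch (a Monte Carlo illustration, not from the slides; the population values are arbitrary choices) draws many samples from a known normal population and compares the average of the (n − 1)-divisor and n-divisor variance estimates with the true σ².

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 5, 200_000   # illustrative population and sample size

samples = rng.normal(mu, sigma, size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)    # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)      # divide by n

print(s2_unbiased.mean())   # close to sigma**2 = 4.0 (unbiased)
print(s2_biased.mean())     # close to (n-1)/n * sigma**2 = 3.2 (biased low)
```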

Whether or not an estimator is unbiased, the smaller the variance the better.

Point Estimation

Estimator Variance

• When there are two unbiased estimators for the same parameter, a decision as to which is better can be made by examining the variance of each.

• The estimator with the least variance is more likely to result in an estimate that is closer to the population parameter.

• The unbiased estimator with the least variance is referred to as the minimum variance unbiased estimator (MVUE).

Mean Square Error

• A measure that incorporates both the bias and the variance of an estimator is the mean square error.

• The mean square error of an estimator Θ̂ of the parameter θ is defined as

MSE(Θ̂) = E(Θ̂ − θ)².

• The mean square error can be rewritten as

MSE(Θ̂) = V(Θ̂) + (bias)².

Relative Efficiency

• The mean square error can be used to compare the ability of two estimators to estimate a parameter.

• For two estimators Θ̂1 and Θ̂2 of the parameter θ, the relative efficiency of Θ̂2 to Θ̂1 is

MSE(Θ̂1) / MSE(Θ̂2).

• A ratio less than 1 indicates that Θ̂2 is not as efficient as Θ̂1 in estimating the parameter θ.

1. The MSE of an unbiased estimator Θ̂ is simply Var(Θ̂).

2. An estimator with small variance and small bias may be more efficient than one with a larger variance and no bias, as depicted below.

[Figure: pdf of the unbiased estimator Θ̂1 (larger variance) and pdf of a biased estimator Θ̂2 (smaller variance, small bias). Which estimator is better? Why?]

Standard Error

• The standard error of a statistic is the standard deviation of its sampling distribution.

• If the expression for the standard error contains unknown parameters that themselves have estimators, substituting those estimators gives the estimated standard error of the statistic.


Point Estimation

Hypothesis Testing

We like to think of statistical hypothesis testing as the data analysis stage of a comparative experiment, in which the engineer is interested, for example, in comparing the mean of a population to a specified value (e.g. mean pull strength).

Statistical Hypotheses

Hypothesis Testing

• A statistical hypothesis is a statement about the parameters of one or more populations. Hypotheses are expressed as, for example,

H0: μ = 50 cm/s (the null hypothesis)

H1: μ ≠ 50 cm/s (the alternative hypothesis)

• Generally the interest is in whether or not a parameter has changed, is in accordance with theory, or is in conformance with a requirement.

Hypothesis Testing cont.

• Testing the hypothesis involves taking a random sample, computing a test statistic, and using the test statistic to make a decision about the null hypothesis.

• The value of the test statistic is compared to the critical region, and if it is in the critical region the null hypothesis is rejected.

Type I Error

• The limits of the critical region are referred to as the critical values.

• These critical values depend on the desired probability of rejecting the null hypothesis when in fact it is true.

• Rejecting the null hypothesis H0 when it is true is defined as a type I error.

• The probability of a type I error is denoted by α: α = P(reject H0 when H0 is true).

• The type I error probability is also referred to as the significance level or size of the test.

• Generally the α value is decided prior to conducting the test.

Type II Error

• Failing to reject the null hypothesis when it is false is referred to as a Type II error.

• The probability of making a type II error is denoted by β: β = P(fail to reject H0 when H0 is false).

• This probability is a function of 2 values. First, how close the hypothesized value of the parameter is to the actual value of the parameter, and second, the sample size.

• The closer the two values of the parameter and the smaller the sample size, the larger the value of β.

• To calculate β, a true population parameter value is required.

Power of a test

• The power of a statistical test is the probability of rejecting the null hypothesis when it is false.

• This probability is 1-β.

• If the power value is deemed too low, increase α or the sample size.

Hypothesis Testing

For example, suppose that we are interested in the burning rate of a solid propellant used to power aircrew escape systems.

• Now burning rate is a random variable that can be described by a probability distribution.

• Suppose that our interest focuses on the mean burning rate (a parameter of this distribution).

• Specifically, we are interested in deciding whether or not the mean burning rate is 50 centimeters per second.

Statistical Hypotheses

Hypothesis Testing

4-3.1 Statistical Hypotheses

Two-sided Alternative Hypothesis

One-sided Alternative Hypotheses

Hypothesis Testing

4-3.1 Statistical Hypotheses

Test of a Hypothesis

• A procedure leading to a decision about a particular hypothesis.

• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.

• If this information is consistent with the hypothesis, then we will conclude that the hypothesis is true; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.

Hypothesis Testing

4-3.2 Testing Statistical Hypotheses


Sometimes the type I error probability is called the significance level, or the α-error, or the size of the test.


• The power is computed as 1 − β, and power can be interpreted as the probability of correctly rejecting a false null hypothesis. We often compare statistical tests by comparing their power properties.

• For example, consider the propellant burning rate problem, where we are testing H0: μ = 50 centimeters per second against H1: μ ≠ 50 centimeters per second. Suppose that the true value of the mean is μ = 52. When n = 10, we found that β = 0.2643, so the power of this test is 1 − β = 1 − 0.2643 = 0.7357 when μ = 52.
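A sketch of how the quoted β could be computed. The slide does not restate σ or the acceptance region; the values σ = 2.5 cm/s and acceptance region 48.5 ≤ x̄ ≤ 51.5 used below are assumptions chosen to be consistent with the quoted β = 0.2643.

```python
from math import sqrt
from scipy.stats import norm

mu_true = 52.0        # true mean assumed in the power calculation
sigma, n = 2.5, 10    # assumed sigma (not restated on this slide) and sample size
lo, hi = 48.5, 51.5   # assumed acceptance region for the sample mean

se = sigma / sqrt(n)
# beta = P(fail to reject H0 | mu = mu_true) = P(lo <= X-bar <= hi | mu = 52)
beta = norm.cdf((hi - mu_true) / se) - norm.cdf((lo - mu_true) / se)
power = 1 - beta

# close to the quoted 0.2643 and 0.7357 (small difference from rounded z-values)
print(round(beta, 4), round(power, 4))
```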

Hypothesis Testing

4-3.3 One-Sided and Two-Sided Hypotheses

Two-Sided Test: H0: μ = μ0 versus H1: μ ≠ μ0

One-Sided Tests: H0: μ = μ0 versus H1: μ > μ0, or H0: μ = μ0 versus H1: μ < μ0

Hypothesis Testing

4-3.4 General Procedure for Hypothesis Testing

(Steps 4 and 5 determine step 6.)

Case 1: Inference on the Mean of a Population, Variance Known

Assumptions

4-4 Inference on the Mean of a Population, Variance Known

4-4.1 Hypothesis Testing on the Mean

We wish to test H0: μ = μ0 versus H1: μ ≠ μ0.

The test statistic is Z0 = (X̄ − μ0)/(σ/√n). This statistic follows the standard normal distribution when H0 is true.

4-4 Inference on the Mean of a Population, Variance Known

4-4.1 Hypothesis Testing on the Mean

Reject H0 if the observed value of the test statistic z0 is either

z0 > z_{α/2}  or  z0 < −z_{α/2}.

Fail to reject H0 if

−z_{α/2} ≤ z0 ≤ z_{α/2}.

Here ±z_{α/2} are the critical values that define the borders of the critical (rejection) region of the test statistic.


Inference on the Mean of a Population, Variance Known

4-4.2 P-Values in Hypothesis Testing

Alternatively, the P-value is the probability that the test statistic will take a value at least as extreme as the observed value z0 when the null hypothesis is true.


Inference on the Mean of a Population, Variance Known

4-4.3 Type II Error and Choice of Sample Size

Finding The Probability of Type II Error


Sample Size Formulas


Example 3: A. Let X1, X2, …, Xn be a random sample from N(μ, σ²). Since X̄ ~ N(μ, σ²/n), the standard error of X̄ as an estimator of μ is σ_X̄ = σ/√n. In most cases σ will be unknown. The estimated standard error is

σ̂_X̄ = s/√n.

B. For the data of Example 1 (thickness of copper foil),

μ̂ = x̄ = 5.04,  σ̂ = s = 0.1657.

Thus,

σ̂_X̄ = s/√n = 0.1657/√15 = 0.0428.

Recall that MINITAB displays this:

Descriptive Statistics
Variable     N    Mean  Median  TrMean   StDev  SE Mean
THICKNESS   15  5.0400  5.0500  5.0385  0.1657   0.0428

Variable   Minimum  Maximum      Q1      Q3
THICKNESS   4.7200   5.3800  4.9400  5.1200

(The SE Mean column is the estimated standard error σ̂_X̄.)

Example 4: The mean force exerted at a particular point in a newly designed mechanical assembly should be less than 4000 psi. Based on previous designs, the engineer “knows” that the force is normally distributed with standard deviation 65 psi. He will obtain data on 20 prototype assemblies to test

H0: μ = 4000

H1: μ < 4000

at the 0.01 level of significance.

A. He obtained x̄ = 3982.8.

Thus,

z0 = (x̄ − 4000)/(σ/√n) = (3982.8 − 4000)/(65/√20) = −1.18,

and the critical value is −z_{0.01} = −2.326.

Since z0 = −1.18 > −2.326, z0 is not in the critical region.

Do not reject H0.

The engineer cannot (as he “hoped” to) conclude that the mean force is less than 4000 psi.

B. Do the problem on MINITAB, but note that you must have raw, not summarized, data.

The engineer collected the data on the next slide.

Stat > Basic Statistics > 1-Sample Z

MINITAB Output:
Test of mu = 4000.0 vs mu < 4000.0
The assumed sigma = 65.0

Variable    N    Mean  StDev  SE Mean      Z     P
FORCE      20  3982.8   60.5     14.5  -1.18  0.12

P-value = 0.12 > α = 0.01. Do not reject H0.
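For readers without MINITAB, a minimal Python sketch of the same one-sided z-test from the summary statistics (the raw data are not reproduced in this section):

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar, alpha = 4000.0, 65.0, 20, 3982.8, 0.01

z0 = (xbar - mu0) / (sigma / sqrt(n))   # test statistic, approx. -1.18
p_value = norm.cdf(z0)                  # lower-tailed test: P(Z <= z0), approx. 0.12
reject = p_value < alpha                # False: do not reject H0

print(round(z0, 2), round(p_value, 2), reject)
```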

4-4 Inference on the Mean of a Population, Variance Known

4-4.4 Large-Sample Test

In general, if n ≥ 30, the sample variance s² will be close to σ² for most samples, and so s can be substituted for σ in the test procedures with little harmful effect. That is, for large sample sizes,

(X̄ − μ)/(S/√n) ~ N(0, 1)  approximately.

The results we have developed for the case of known variance are used for large-sample tests with unknown variance, with a few important differences.

1. Wherever you see σ, substitute the sample standard deviation s.

2. The confidence intervals and hypothesis tests become approximate rather than exact.

3. If the underlying distribution is not too non-normal, the approximation should be adequate if n ≥ 30.

We’ll do an example.

Example 5: A civil engineer working for NYSDOT tests 50 standard-sized specimens of concrete for moisture content after 5 hours of drying time. She collects the following data (appropriate units). She also constructs the histogram shown below.

11.7  15.3  29.1  13.1  39.1   9.4  14.9  11.2  11.2  25.0
26.4  31.0  20.6  12.4  19.6  16.9   5.2  15.8  17.8  21.1
19.6  41.7  39.3  22.7  33.1  23.4  23.0  19.3  19.3  17.2
 5.1  59.4   5.7  12.1  27.7   6.5  16.9  21.9  32.0  20.9
18.6  10.4   5.7  14.2  11.7  15.3  15.2  19.5  11.4  13.6

x̄ = 19.38,  s = 10.51

[Histogram of MOISTURE: frequency versus moisture content (0 to 60).]

The histogram indicates that the distribution is quite skewed. She also does a normality test; in MINITAB do

Stat > Basic Statistics > Normality Test

[Normal probability plot of MOISTURE. Average: 19.384; StDev: 10.51; N: 50. Anderson-Darling normality test: A-squared = 1.264, P-value = 0.002. Very low P-value.]

She just wants a 95% confidence interval on the mean moisture content. Since the sample size is fairly large, an approximate 95% confidence interval is:

x̄ ± z_{0.025}·s/√n = 19.38 ± 1.96(10.51/√50) = 19.38 ± 2.91 = (16.47, 22.29).

You can do this on MINITAB by “making believe” that σ is known and equal to s.
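Equivalently, a minimal Python sketch of the same approximate 95% interval, using the summary statistics above (the full data set could be used instead):

```python
from math import sqrt
from scipy.stats import norm

xbar, s, n = 19.38, 10.51, 50
z = norm.ppf(0.975)              # approx. 1.96

half_width = z * s / sqrt(n)     # approx. 2.91
ci = (xbar - half_width, xbar + half_width)
print(ci)                        # approx. (16.47, 22.29)
```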

4-4 Inference on the Mean of a Population, Variance Known

4-4.5 Some Practical Comments on Hypothesis Testing

Statistical versus Practical Significance


Interval Estimation

The standard error provides a measure of the precision of an estimate. In fact, very often in the engineering literature a point on a graph will have so-called “error bars” around it. The plotted point is actually the mean of a sample, and the error bar is x̄ ± σ̂_X̄. This is actually a special case of a confidence interval, formally defined on the next slide.

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean

Two-sided confidence interval: P(L ≤ θ ≤ U) = 1 − α     (9)

One-sided confidence intervals: P(L ≤ θ) = 1 − α  or  P(θ ≤ U) = 1 − α

Confidence coefficient: 1 − α

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean

α        z_α
0.005    2.576
0.01     2.326
0.025    1.960
0.05     1.645

Example 6: A. Let X1, X2, …, Xn be a random sample from N(μ, σ²), and assume that σ is known. Since Z = (X̄ − μ)/(σ/√n) ~ N(0, 1),

P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}) = 1 − α     [by (10)]

P(−z_{α/2}·σ/√n ≤ X̄ − μ ≤ z_{α/2}·σ/√n) = 1 − α

P(X̄ − z_{α/2}·σ/√n ≤ μ ≤ X̄ + z_{α/2}·σ/√n) = 1 − α,     (11)

where the endpoints are L and U, respectively.

Comparison of (9) and (11) implies that a 100(1 − α)% confidence interval for μ is

( x̄ − z_{α/2}·σ/√n,  x̄ + z_{α/2}·σ/√n ),  i.e.  x̄ ± z_{α/2}·σ/√n.     (12)

B. The output temperature of a gas turbine is N(625 °C, 22²). An engineer has “tweaked” the design in order to increase it by at least 5%. The standard deviation will not be affected. A test of 16 turbines gives a sample mean of 678 °C. By (12), a 95% confidence interval for the (population) mean is

678 ± 1.96(22/√16) = 678 ± 10.78 = (667.2, 688.8).

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean


Relationship between Tests of Hypotheses and Confidence Intervals

If [l, u] is a 100(1 − α) percent confidence interval for the parameter θ, then the test of significance level α of the hypothesis H0: θ = θ0 versus H1: θ ≠ θ0

will lead to rejection of H0 if and only if the hypothesized value θ0 is not in the 100(1 − α) percent confidence interval [l, u].

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean

Confidence Level and Precision of Estimation

The length of the two-sided 95% confidence interval is 2(1.96)σ/√n = 3.92 σ/√n,

whereas the length of the two-sided 99% confidence interval is 2(2.576)σ/√n = 5.15 σ/√n.

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean

Choice of Sample Size


Example 7: The tensile strength of a fiber used in the manufacture of insulation material is normally distributed with a standard deviation of 0.04 gm. After some process adjustments, however, the mean has not been established. In a test of 40 fibers, the data shown on the next slide were collected. Find a 99% confidence interval for the population mean.

Tensile Strength (gms)

 9.99  10.05  10.02   9.99  10.05   9.96  10.00   9.99  10.03  10.05
10.02   9.98   9.96  10.01  10.00   9.90  10.04   9.99   9.93   9.99
 9.97   9.96  10.00  10.07   9.95   9.96   9.97   9.97  10.01   9.97
 9.99  10.01  10.01  10.01   9.95   9.99   9.97   9.95   9.96  10.01

x̄ = 9.9908

The 99% confidence interval is:

x̄ ± z_{0.005}·σ/√n = 9.9908 ± 2.5758(0.04/√40) = 9.9908 ± 2.5758(0.00632) = 9.9908 ± 0.0163 = (9.974, 10.007).

Let’s see how to do it using MINITAB.

Do Basic Statistics > 1-Sample Z
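Equivalently, without MINITAB, a minimal Python sketch using the summary values x̄ = 9.9908, σ = 0.04, n = 40:

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n, conf = 9.9908, 0.04, 40, 0.99
z = norm.ppf(1 - (1 - conf) / 2)        # z_{0.005}, approx. 2.5758

half_width = z * sigma / sqrt(n)        # approx. 0.0163
print(xbar - half_width, xbar + half_width)   # approx. (9.974, 10.007)
```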

The result of Example 7 is: “With 99% confidence, the mean tensile strength is between 9.974 and 10.007 grams.” However, for a variable like tensile strength, an engineer would perhaps like to say: “With 99% confidence, the mean tensile strength is at least (whatever).”

This brings us to the notion of one-sided confidence intervals, defined next.

Definition: Suppose θ is an unknown parameter.

(i) Suppose L is a statistic based on a sample X1, X2, …, Xn. If

P(L ≤ θ) = 1 − α,     (23a)

then the observed value of L, l say, is called a 100(1 − α)% one-sided lower confidence limit or bound for θ.

(ii) Suppose U is a statistic based on a sample X1, X2, …, Xn. If

P(θ ≤ U) = 1 − α,     (23b)

then the observed value of U, u say, is called a 100(1 − α)% one-sided upper confidence limit for θ.

We said we would define one-sided confidence intervals, but in the definition, we refer to one-sided limits or bounds. Well – we really have defined intervals. Corresponding to the limits l and u in (23a,b) are the intervals (−∞, u) and (l, ∞).

For the present case of sampling from N(μ, σ²) with σ known, one-sided lower and upper 100(1 − α)% confidence limits are, respectively,

x̄ − z_α·σ/√n   and   x̄ + z_α·σ/√n.     (24)

Example 8: In Example 7, it would probably have been more appropriate to report a one-sided confidence limit.

Suppose there is a customer-mandated spec that specifies that the mean strength be at least 10. The lower 99% one-sided confidence limit for the mean tensile strength of this fiber is

x̄ − z_{0.01}·σ/√n = 9.9908 − 2.3263(0.00632) = 9.976.

Remark: MINITAB Student doesn’t do one-sided limits directly. Can you figure out how to do it indirectly?
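Outside MINITAB, the one-sided bound is a one-line computation; a minimal sketch with the Example 8 values:

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n, alpha = 9.9908, 0.04, 40, 0.01
lower = xbar - norm.ppf(1 - alpha) * sigma / sqrt(n)   # x-bar - z_{0.01} * sigma / sqrt(n)
print(round(lower, 3))                                 # approx. 9.976
```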

4-4 Inference on the Mean of a Population, Variance Known

4-4.6 Confidence Interval on the Mean

One-Sided Confidence Bounds

Problem 4-27

• In the production of airbag inflators for automotive safety systems, a company is interested in ensuring that the mean distance of the foil to the edge of the inflator is at least 2 cm. Measurements on 20 inflators yielded an average value of 2.02 cm. Assume a standard deviation of 0.05 cm on the distance measurements and a significance level of 0.01.

a) Test for conformance to the company’s requirement.

[Figure: airbag inflator and assembly.]

Problem 4-27 (continued)

b) What is the P-value of this test?

c) What is the β of this test if the true mean is 1.97 cm?

d) What sample size would be necessary to detect a true mean of 1.97 cm with probability of at least 0.90?

e) Find a 99% lower confidence bound on the true mean.

f) Use the CI found in part (e) to test the hypothesis in part (a).

g) What sample size is required to be 95% confident that the error in estimating the mean is less than 0.01 cm?

4.4 Case II: Inference on the Mean of a Population, Variance Unknown

Let X1, X2, …, Xn be a random sample from N(μ, σ²), with σ² unknown.

This case differs from Case I in that now we shall consider the more realistic situation in which the variance is unknown. The primary reason for considering Case I at all is that it is a convenient way of illustrating the basic concepts of confidence interval estimation and hypothesis testing.

When sampling from the normal distribution, the exact confidence interval for μ is based solely on the fact that

(X̄ − μ)/(σ/√n) ~ N(0, 1),     (27a)

and for the exact hypothesis test, under H0,

(X̄ − μ0)/(σ/√n) ~ N(0, 1).     (27b)

Although (27a,b) are true whether or not the variance is known, the problem is that if σ is not known, this knowledge does us no good with respect to confidence intervals and hypothesis tests for the mean.

Intuitively, what you don’t know, you estimate. In particular, suppose in (27a), we estimate the population standard deviation by the sample standard deviation. This leads to the question

T = (X̄ − μ)/(S/√n) ~ ?     (28)

The pdf of the random variable T defined in (28) has been derived [see eqn. (4-40), pg. 163, text]. The distribution is called (Student’s) t distribution. The parameter of a t distribution is its degrees of freedom, say k.

In particular, the random variable defined by (28) has n − 1 degrees of freedom (write T ~ t_{n−1}).

Thus, for each sample size, T has a different pdf.

Intuitively, it would seem that the pdfs of the t distributions should look like standard normal pdfs. They do, but there is a difference. Some representative t pdfs are shown on the next slide.

Pdfs of several t distributions, where k denotes the degrees of freedom. They look much like standard normal pdfs, but the tails are heavier (more variance); note that the t pdf approaches that of the standard normal as k → ∞.

Percentage points of t distributions are defined similarly to those of the standard normal as shown in the figure below.

Percentage points of the t distribution, where k denotes the degrees of freedom. To obtain percentage points, use MINITAB or Table II in Appendix A. We’ll demonstrate.

4-5 Inference on the Mean of a Population, Variance Unknown

4-5.1 Hypothesis Testing on the Mean


Example 4-7

• A sample of size 15 was taken on the coefficient of restitution of golf club drivers. The variance of the coefficient is unknown.

• Does the mean coefficient of restitution exceed 0.82?


4-5 Inference on the Mean of a Population, Variance Unknown

4-5.2 P-value for a t-Test

We need to know: what is the distribution of the test statistic

T0 = (X̄ − μ0)/(S/√n)

under H1?

The answer is: T0 has a so-called noncentral t distribution. The probability of making a type II error, and the corresponding power of the test, can be obtained for hypothesis tests with unknown variance in the same way as was done for tests with known variance. However, this is numerically a cumbersome computational procedure, and we will find β values and the power of a test using MINITAB only.

β = P(Type II error)

Example 4-8

If the mean differs from 0.82 by as much as 0.02, is the sample size of 15 adequate to ensure that the null hypothesis will be rejected with probability of at least 0.80?

• Stat > Power and Sample Size > 1-Sample t

Calculate power for each sample size

Specify a sample size, difference, sigma, click on options and insert the type of alternative hypothesis and level of significance.

Sample size

• The sample size required to detect a difference from the hypothesized mean with a specified power, when the variance of the characteristic of interest is unknown, will be determined using MINITAB.

Example 4-8

• Stat > Power and Sample Size > 1-Sample t

Calculate sample size for each power value

Specify a power, difference, sigma, click on options and insert the type of alternative hypothesis and level of significance.
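For readers without MINITAB, the same power and sample-size questions can be sketched in Python using the noncentral t distribution. The difference of 0.02 and n = 15 come from Example 4-8; the planning value σ = 0.025 and α = 0.05 below are illustrative assumptions (the slides supply such values to MINITAB but do not restate them here).

```python
from math import sqrt
from scipy.stats import t, nct

diff, sigma, alpha = 0.02, 0.025, 0.05   # difference from Example 4-8; sigma, alpha assumed

def power_one_sided_t(n):
    """Power of the one-sided (H1: mu > mu0) one-sample t test at mu = mu0 + diff."""
    df = n - 1
    ncp = diff / (sigma / sqrt(n))        # noncentrality parameter
    t_crit = t.ppf(1 - alpha, df)         # reject H0 if T0 > t_crit
    return 1 - nct.cdf(t_crit, df, ncp)   # P(T0 > t_crit under the alternative)

print(round(power_one_sided_t(15), 3))    # power at n = 15

# Smallest n achieving power >= 0.80
n = 2
while power_one_sided_t(n) < 0.80:
    n += 1
print(n)
```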

4-5 Inference on the Mean of a Population, Variance Unknown

4-5.3 Type II Error and Choice of Sample Size

Fortunately, this unpleasant task has already been done, and the results are summarized in a series of graphs in Appendix A, Charts Va, Vb, Vc, and Vd, that plot β for the t-test against a parameter d for various sample sizes n.

4-5 Inference on the Mean of a Population, Variance Unknown

4-5.3 Type II Error and Choice of Sample Size

These graphs are called operating characteristic (or OC) curves. Curves are provided for two-sided alternatives on Charts Va and Vb. The abscissa scale factor d on these charts is defined as d = |μ − μ0|/σ.

Confidence Intervals for the Mean

We can derive a confidence interval for μ when the statistic follows a t distribution in a manner similar to the case where the statistic follows the standard normal distribution. We can write:

P(−t_{α/2,n−1} ≤ (X̄ − μ)/(S/√n) ≤ t_{α/2,n−1}) = 1 − α

P(−t_{α/2,n−1}·S/√n ≤ X̄ − μ ≤ t_{α/2,n−1}·S/√n) = 1 − α

P(X̄ − t_{α/2,n−1}·S/√n ≤ μ ≤ X̄ + t_{α/2,n−1}·S/√n) = 1 − α,     (29)

where T = (X̄ − μ)/(S/√n) ~ t_{n−1}.

By the definition of confidence intervals (see (9)), we see from (29) that

A 100(1 − α)% confidence interval for the mean of a normal distribution, variance unknown, is

( x̄ − t_{α/2,n−1}·s/√n,  x̄ + t_{α/2,n−1}·s/√n ),  i.e.  x̄ ± t_{α/2,n−1}·s/√n.     (30)

To obtain one-sided lower (upper) 100(1 − α)% limits, in (30), change α/2 to α and take the lower (upper) limit.

Example 9: Suppose in Example 7 neither the mean nor the standard deviation is known.

A. Use the data given in Example 7 to find a 99% confidence interval for the mean.

In Example 7, x̄ = 9.9908 and s = 0.03540. The 99% confidence interval when the variance is unknown is thus

x̄ ± t_{0.005,39}·s/√n = 9.9908 ± 2.7079(0.03540/√40) = 9.9908 ± 2.7079(0.0055972) = 9.9908 ± 0.015156 = (9.976, 10.006).

Recall that when the variance was known in Example 7, the CI was

9.9908 ± 0.0163 = (9.974, 10.007).

B. Now use MINITAB.

Go to: Stat > Basic Statistics > 1-Sample t

The results appear on the next slide.
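Equivalently, without MINITAB, a minimal Python sketch of the t-based interval from the Example 9 summary statistics:

```python
import numpy as np
from scipy import stats

xbar, s, n, conf = 9.9908, 0.03540, 40, 0.99

t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)   # t_{0.005,39}, approx. 2.7079
half_width = t_crit * s / np.sqrt(n)                  # approx. 0.0152
print(xbar - half_width, xbar + half_width)           # approx. (9.976, 10.006)
```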

Recall that in the case where the variance is known, the sample size for a 100(1 − α)% confidence interval with error at most E is given by (26), repeated below:

n = (z_{α/2}·σ / E)².     (26)

Sample Size Determination for Confidence Intervals (Variance Unknown)

In the present case with the variance unknown, the best that can be done is to use (26) with your best guess of σ. You can also take a pilot sample and estimate σ.
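A small sketch of formula (26): for instance, with σ = 0.04 from Example 7, how many fibers give a 95% interval with half-width E = 0.01? (E = 0.01 and the 95% level are illustrative choices, not values from the slides.)

```python
from math import ceil
from scipy.stats import norm

sigma = 0.04          # planning value for sigma (from Example 7)
E = 0.01              # desired half-width of the interval (illustrative)
conf = 0.95

z = norm.ppf(1 - (1 - conf) / 2)
n = ceil((z * sigma / E) ** 2)    # formula (26), rounded up to the next integer
print(n)                          # 62
```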

4-5 Inference on the Mean of a Population, Variance Unknown

4-5.4 Confidence Interval on the Mean

