
Lecture 19: Hypothesis Tests

Devore, Ch. 8.1

Topics

I. Statistical Hypotheses (plural!)
– Null and Alternative Hypotheses
– Test statistics and rejection regions

II. Errors in Hypothesis Testing
– Type I and Type II errors

I. A Statistical Hypothesis...

• Is a claim about the value of a single population characteristic, or about a relationship among several population characteristics.

– Examples of claims:
• Mean diameter of the engine cylinder is 81 mm.
• Mean of batch 1 is no different than the mean of batch 2.
• Variance of batch 1 is different than the variance of batch 2.
• % Defective of batch 1 is less than 5%.

• In hypothesis testing, we take a sample of data and test a claim.

Null and Alternative Hypotheses

• To evaluate a claim, you identify a null and alternative hypothesis.

• Null Hypothesis, Ho
– The claim that is initially assumed to be true.

• Alternative Hypothesis, Ha
– The assertion that is contradictory to Ho.

• The null hypothesis is rejected if the sample evidence suggests that it is false. Otherwise, we fail to reject Ho.

• So, the possible outcomes of a test are:
– Reject Ho, or
– Fail to reject Ho
– NOTE: failing to reject Ho is different from saying that we have proven Ho is true.

“Favored Claim”

• In setting up a test, we typically have a favored claim, which is the Ho.
– In practice, we typically set Ho as the condition with the "=" sign, and Ha as the condition with "<", ">", or "≠".
• Familiar analogy: innocent until proven guilty.

• Practical examples:
– Suppose you want to determine whether a worker is performing the job adequately.
• Ho: the worker is meeting the minimum job requirements.
• Ha: the worker is not meeting the minimum job requirements.

– Suppose you want to know whether to rework a machine tool.
• Ho: the machine produces a part feature average on target.
• Ha: the machine does not produce a part feature average on target.

Sample Null Hypotheses

• Identify a null and alternative hypothesis for each of the prior examples (one possible formalization is sketched after this list).

– Mean diameter of the engine cylinder is 81 mm.

– Mean of batch 1 is not different than mean of batch 2.

– Variance of batch 1 is different than the variance of batch 2.

– % Defective of batch 1 is less than 5%.
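One possible formalization, following the "=" convention above, is sketched below. The 81 mm target, the batch comparisons, and the 5% defective figure come from the earlier examples; the directions chosen for Ha are one reasonable reading of those claims, not the only one.

```latex
% One possible formalization of the four claims above (illustrative, not from the slides).
\begin{align*}
H_0\colon \mu = 81 \text{ mm}      &\quad\text{vs.}\quad H_a\colon \mu \neq 81 \text{ mm} \\
H_0\colon \mu_1 = \mu_2            &\quad\text{vs.}\quad H_a\colon \mu_1 \neq \mu_2 \\
H_0\colon \sigma_1^2 = \sigma_2^2  &\quad\text{vs.}\quad H_a\colon \sigma_1^2 \neq \sigma_2^2 \\
H_0\colon p = 0.05                 &\quad\text{vs.}\quad H_a\colon p < 0.05
\end{align*}
```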

A Test of Hypotheses

A test of hypotheses is a method for using sample data to decide whether the null hypothesis should be rejected.

Statistical Hypothesis Tests

• Hypothesis tests often involve one of the following:
– Comparison of means
• A single mean to a standard value
• Two sample means
• More than two sample means
– Comparison of variances
• A single variance to a standard value
• Two sample variances
• More than two sample variances
– Comparison of proportions
• A single proportion to a standard value
• Two proportions

• Of course, hypothesis tests may also be applied to other statistics: e.g., a correlation, a median, or whether a distribution is normal.

Test Procedures

• To perform a hypothesis test, you need:
– A null and alternative hypothesis
– An assumed data pattern / distribution (e.g., normal, iid)
– A test statistic: a function of the sample data on which the decision is based
– A rejection region: the set of test statistic values for which Ho is rejected (based on an error threshold)

Example: Cylinder Bore

• Suppose you take a sample of 25 cylinders and consider the mean to be off target if the mean bore diameter is < 80.9 mm.
– If X ~ N(μ = 81, σ² = 0.30²), what is σx̄?

• Construct a 95% two-sided CI on the true mean, μ.

• With 95% confidence, would you conclude that a sample mean of 80.9 is different from a true mean of 81?
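A minimal sketch of this calculation, assuming σ = 0.30 and n = 25 as read from the slide (so σx̄ = 0.30/√25 = 0.06), using scipy for the normal quantile:

```python
# Sketch: 95% two-sided CI for the true mean, assuming sigma = 0.30 and n = 25.
from math import sqrt
from scipy.stats import norm

sigma, n = 0.30, 25
sigma_xbar = sigma / sqrt(n)              # standard error of the sample mean = 0.06
z = norm.ppf(0.975)                       # ~1.96 for a 95% two-sided interval

xbar = 80.9                               # observed sample mean from the slide
ci = (xbar - z * sigma_xbar, xbar + z * sigma_xbar)
print(ci)                                 # roughly (80.78, 81.02); 81 falls inside,
                                          # so 80.9 is not significantly different from 81
```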

II. Errors in Hypothesis Testing

• The rejection region of a hypothesis test is based on an accepted level of error when drawing a conclusion.

• When drawing a conclusion based on a test, four results are possible (hint: think of possible outcomes in a jury trial).

[2 × 2 table: rows = the TRUTH (innocent / guilty); columns = what the jury says (innocent / guilty)]

Outcomes of a Decision

[2 × 2 table: rows = what you conclude or say (not different / different); columns = the truth (not different / different)]

Definitions of Error Types

• Type I error [also known as alpha (α) error]
• FORMAL: Reject Ho when Ho is true.
• PRACTICAL: Conclude a difference exists when no difference exists.

• Type II error [also known as beta (β) error]
• FORMAL: Fail to reject Ho when Ho is false.
• PRACTICAL: Conclude no difference exists when a difference exists.

• Why might a decision from a statistical test be wrong?

• Can we eliminate both types of errors?

Rejection Region: α and β

Suppose an experiment and a sample size are fixed, and a test statistic is chosen.

Decreasing the size of the rejection region to obtain a smaller value of α results in a larger value of β for any particular parameter value consistent with Ha.

Significance Level

Specify the largest value of α that can be tolerated and find a rejection region having that value of α. This makes β as small as possible subject to the bound on α. The resulting value of α is referred to as the significance level.

Level Test

A test corresponding to significance level α is called a level α test. A test with significance level α is one for which the type I error probability is controlled at the specified level.

P(type I error) ~ α

• If we assume that Ho is true, we may calculate the probability of a wrong conclusion.

• Requirements: need an assumed distribution and estimates of distribution parameters (e.g., expected mean, variance).

• We do this by finding the probability of some value of X relative to its underlying distribution (pdf/cdf), e.g.:
– A value of X given X ~ Bin(15, 0.2)
– A value of X given X ~ Normal(81, 0.25²)
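As a sketch, such lookups might be computed as below. The cutoffs used here (6 failures, and 80.9 for the normal case) are illustrative choices, not values given on this slide:

```python
# Sketch: tail probabilities under an assumed null distribution.
from scipy.stats import binom, norm

# P(X >= 6) when X ~ Bin(15, 0.2); 6 is an arbitrary illustrative cutoff
print(binom.sf(5, 15, 0.2))               # = 1 - P(X <= 5)

# P(X <= 80.9) when X ~ Normal(81, 0.25**2); 80.9 echoes the cylinder example
print(norm.cdf(80.9, loc=81, scale=0.25))
```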

P(type I error) - Example

• Suppose 1% of copies of a particular textbook fail a binding test. You wish to test the hypothesis that p = 0.01, where the number of failures X ~ Bin(200, 0.01) under Ho.

– Test statistic: X, the number of binding failures in the sample.
– Rejection region: conclude the failure rate has increased (reject Ho) if you draw a sample with X >= 5; fail to reject Ho if X <= 4.

– Identify Ho and Ha for this situation.
– If Ho is true, what is the probability that you will observe 5 or more failures? [Hint: 1 - Pr(X <= 4)]
– When Ho is true, what percentage of the time will you reach a wrong conclusion?
– If you wanted to be 95% confident in your decision, would you conclude that Ho is true? What is alpha in this example?
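A sketch of the type I error calculation for this example (Ho: p = 0.01, reject when X >= 5, n = 200):

```python
# Sketch: P(type I error) = P(X >= 5) when X ~ Bin(200, 0.01).
from scipy.stats import binom

n, p0, cutoff = 200, 0.01, 5
alpha = binom.sf(cutoff - 1, n, p0)       # 1 - P(X <= 4), roughly 0.05
print(alpha)
```

So when Ho is true, roughly 5% of samples would still show 5 or more failures; that is the alpha asked about above.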

P(type II error) ~ β

• P(type II error) ~ the probability of concluding that no difference exists when a difference does exist. In other words, a failure to detect some difference, Δ.

– Example: suppose μ shifts by some Δ, so μnew = μ + Δ.
– A type II error would be a failure to detect the shift (i.e., concluding no difference exists even though a shift has occurred).

• Unlike the α error, the β error does not exist for a unique value of the test parameter (rejection limit). Rather, it varies with the amount of difference, Δ, you are trying to detect given some sample size.
– Ho: p = 0.01 (a single value); Ha: p > 0.01 (many possible values)

Binding Test Example: Find β

• Suppose the fraction defective, p, equals 0.01 and n = 200.
– Rejection region: X >= 5

• What is the probability that you will fail to detect a shift in p from 0.01 to 0.015 (n=200)?

• What is the probability that you will fail to detect a shift in p from 0.01 to 0.05 (n=200)?
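A sketch of the β calculations for the two shifts asked about: β is computed under the shifted p, keeping the same rejection region X >= 5.

```python
# Sketch: beta = P(fail to reject Ho) = P(X <= 4), computed under the shifted p.
from scipy.stats import binom

n = 200
for p_true in (0.015, 0.05):
    beta = binom.cdf(4, n, p_true)
    print(p_true, beta)
# beta is large for the small shift to 0.015 (hard to detect)
# and much smaller for the shift to 0.05
```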

Engine Cylinder: α & β Errors

• Suppose you take a sample of 25 cylinders.
– If X ~ N(μ = 81, σ² = 0.30²), what is σx̄?
– Suppose you reject the batch if the sample mean is <= 80.9.

• What is Pr(type I error)?
– That is, P(Ho is rejected when Ho is true)?

• What is Pr(type II error) if true mean shifts to 80.9 mm?

• What is Pr(type II error) if true mean shifts to 80.75 mm?
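A sketch of these three calculations, again assuming σ = 0.30 and n = 25 (so σx̄ = 0.06) and the rejection region x̄ <= 80.9 from the slide:

```python
# Sketch: alpha and beta for the cylinder test with rejection region xbar <= 80.9.
from math import sqrt
from scipy.stats import norm

sigma_xbar = 0.30 / sqrt(25)                                  # 0.06
cut = 80.9

alpha = norm.cdf(cut, loc=81, scale=sigma_xbar)               # P(reject | mu = 81), about 0.05
beta_80_90 = 1 - norm.cdf(cut, loc=80.90, scale=sigma_xbar)   # = 0.5
beta_80_75 = 1 - norm.cdf(cut, loc=80.75, scale=sigma_xbar)   # about 0.006
print(alpha, beta_80_90, beta_80_75)
```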

α/β and Rejection Regions

• Suppose you wish to reduce the type I error by moving the rejection limit farther from the target mean. What happens to the β error if you keep the same test and the same sample size n?

• In our prior example, what is Pr(type I error) if the rejection limit is moved to 80.75?

• What happens to Pr(type II error) if the rejection limit is moved to 80.75 and you try to detect a shift from 81 to 80.75?

• Identify a general statement about α, β, and rejection regions given the same experiment and a fixed n.
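For comparison, a sketch of the same calculation with the rejection limit moved to 80.75 (same σ = 0.30 and n = 25 assumptions as above):

```python
# Sketch: same test, but reject only when xbar <= 80.75.
from math import sqrt
from scipy.stats import norm

sigma_xbar = 0.30 / sqrt(25)
cut = 80.75

alpha = norm.cdf(cut, loc=81, scale=sigma_xbar)               # far smaller than with cut = 80.9
beta_80_75 = 1 - norm.cdf(cut, loc=80.75, scale=sigma_xbar)   # = 0.5, i.e. beta grew
print(alpha, beta_80_75)
```

This illustrates the general statement asked for: with the experiment and n fixed, pushing α down drives β up.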

α/β and Sample Size

• Suppose you wish to increase the sample size from 25 to 50.

• If X ~ N(μ = 81, σ² = 0.30²):

• What happens to Pr(type I error) if n increases to 50?

• What happens to Pr(type II error) if n increases to 50 and you try to detect a new mean = 80.75?

• Comment on the relationship among α, β, and n.
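A sketch of the same test with n = 50, keeping the σ = 0.30 assumption and the rejection limit at 80.9:

```python
# Sketch: same cutoff (80.9), but n = 50, so sigma_xbar = 0.30 / sqrt(50).
from math import sqrt
from scipy.stats import norm

sigma_xbar = 0.30 / sqrt(50)                                  # about 0.042
cut = 80.9

alpha = norm.cdf(cut, loc=81, scale=sigma_xbar)               # smaller than with n = 25
beta_80_75 = 1 - norm.cdf(cut, loc=80.75, scale=sigma_xbar)   # also smaller than with n = 25
print(alpha, beta_80_75)
```

Unlike moving the rejection limit, increasing n can reduce both error probabilities at once.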
