Agresti 9 Notes 2012

Agresti Ch. 9

Statistical Hypothesis: a conjecture about a population parameter, tested using sample data and an understanding of sampling error in sampling distributions.

Steps in significance tests of hypotheses:

1) Conditions/Assumptions: specify the number and type of variable(s) and what the tested parameter represents, and then make any necessary assumptions pertaining to data collection, sample size, and shape of the sampling or population distribution.

2) Hypotheses: specify a null hypothesis (H0), typically a single parameter value indicating ‘no effect’, and an alternative hypothesis (Ha), a set of alternative parameter values.

3) Test statistic: collect your sample data and measure the distance between the sample statistic and the hypothesized population parameter value, yielding a z or t as the test statistic; a more complicated computation could yield F or χ2.

4) P-value: the probability of getting the observed sample statistic (such as the mean), or a more extreme one in the direction of the alternative hypothesis, when the null hypothesis is true.

5) Conclusion: report and interpret the P-value in the context of the study. Make a decision about the null hypothesis based on the P-value: is the probability of the sample result so small that you should reject the null? Discuss the real-world implications of your decision.

Conditions/Assumptions:

Number and type of variable(s)
What the tested parameter represents
Random sampling is almost always required
Sample size considerations are often important for identifying the shape of the sampling distribution
Other assumptions may be necessary to identify the population or sampling distribution

1/31/2012

Hypotheses:

Examples based on variable type and nature of the claim:

Mean
Claim: Textbooks typically cost about $450/semester.
μ represents the average textbook cost
a single value is claimed, so this is two-sided
H0: μ = 450
Ha: μ ≠ 450

Proportion
Claim: Most Americans think continued fighting in Afghanistan is justified.
p is the proportion thinking continued fighting is justified
a range of possibilities is claimed, so this situation is one-sided
H0: p ≤ 0.50
Ha: p > 0.50

We specify two hypotheses, null and alternative:

Null hypothesis, H0, states that the parameter is a particular value [or in a range of values the opposite of that which is expected or desired].

Alternative hypothesis, Ha, states that the parameter differs from a particular value or is in a range of expected or desired values.

Hypotheses can be tested multiple ways, but the procedure will not change the appropriate set of hypotheses.

Three types of pairs of hypotheses, using means here, are possible (names focus on the alternative):

Two-tailed          H0: μ = μ0                 Ha: μ ≠ μ0
Right-tailed (GT)   H0: μ = μ0 [or μ ≤ μ0]     Ha: μ > μ0
Left-tailed (LT)    H0: μ = μ0 [or μ ≥ μ0]     Ha: μ < μ0

where μ0 is the numerical value of the population mean specified in the claim

The right-tailed and left-tailed tests are generically known as one-tailed. You will typically find the null specified as the = case, even when the alternative is one-sided.

Technically, we must allow for a range of values when the alternative is one-sided. In those cases, we test against the ‘borderline’ value specified in the null.

How to decide what hypothesis goes where:

1) If you are doing a two-tailed test, your null must be the equality, not so much because we prefer null hypotheses to include the equality relationship (although we often do), but because, when choosing between = and ≠, only the = hypothesis could potentially be rejected. It is impossible to reject the ≠ hypothesis. This may mean that the claim ends up in the null, even if we would prefer it be in the alternative.

2) If you are doing a one-tailed test, you should put your claim/hope in the alternative, because the strongest conclusions occur when you reject the null and can then say that the alternative must be true because you ruled out everything else. If you put your claim in the null, the best you can say is “this could be true” or “I couldn’t rule this out,” which are much weaker statements. This is also justified by realizing that the null is assumed true unless evidence to the contrary is found: you would rather not assume your claim to be true; you should prove it by ruling out the null instead.

3) If you have a one-tailed test and there is no clear preferred alternative, you may want to choose the hypotheses so that the most severe consequence of error is a Type I error.

Conclusion Based on                     Actual Situation
Hypothesis Test        H0 True                          H0 False

Reject H0              Type I Error (Probability α)     No Error
Not reject H0          No Error (Probability 1-α)       Type II Error
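
The α entry in the table can be checked by simulation. The following is a minimal sketch, assuming a population in which H0: p = 0.5 really is true, samples of n = 100, and a two-tailed Z test at α = 0.05. The function names, seed, and number of trials are illustrative choices and do not come from the notes.

```python
import math
import random
from statistics import NormalDist


def two_tailed_z_pvalue(successes, n, p0):
    """Two-tailed P-value of the Z test for a proportion."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se0
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(p0=0.5, n=100, alpha=0.05, trials=20000, seed=1):
    """Fraction of samples drawn with H0 true that still reject H0."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < p0 for _ in range(n))
        if two_tailed_z_pvalue(successes, n, p0) < alpha:
            rejections += 1              # H0 is true, so this is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate should land near α. It will not match 0.05 exactly, because the number of successes is discrete and the normal distribution is only an approximation here.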

Examples (from Hawkes and Marsh, p. 501): For the following situations, identify the appropriate H0 and Ha and state what the consequences would be for Type I and Type II errors.

a. A company which manufactures one-half inch bolts selects a random sample of bolts to determine if the diameter of the bolts differs significantly from the required one-half inch.

b. A company which manufactures safety flares randomly selects 100 flares to determine if the flares last at least three hours on average.

Test Statistic

Depends on the distribution of the sample statistic
In turn related to assumptions, sample size, etc.
Proportions: if the sample size is large enough, p̂ is approximately normal, and you can create Z as the test statistic
Means: if the sample is large enough, x̄ is approximately normal, and you can create t as the test statistic

P-value

Determine the likelihood of the test statistic or a more extreme one (or, equivalently, the underlying sample statistic that generated the test statistic) if the null hypothesis were true.

In one-tailed cases, we figure out the probability of getting values beyond the test statistic in the direction of the alternative. If we have a right-tailed alternative and are using the Z distribution, we would compute P(Z ≥ Zcalc) as the P-value. If we have a left-tailed alternative and are using the Z distribution, we would compute P(Z ≤ Zcalc) as the P-value. Zcalc refers to the calculated value of the test statistic. We can easily dismiss the possibility of rejecting the null when the sign is ‘wrong’, i.e., positive for an LT alternative, meaning p̂ exceeded p0, or negative for a GT alternative, meaning p̂ was less than p0.

For two-tailed cases, we would find the probability in the tail consistent with the result and then double that value.
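
The three tail rules above can be written as one small function. This is a sketch using Python's standard library; the function name `z_p_value` and the labels 'GT', 'LT', and 'NE' are borrowed from the abbreviations in these notes and are otherwise arbitrary.

```python
from statistics import NormalDist


def z_p_value(z_calc, alternative):
    """P-value for a Z statistic; alternative is 'GT', 'LT', or 'NE'."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    if alternative == "GT":              # right-tailed: P(Z >= z_calc)
        return 1 - std_normal.cdf(z_calc)
    if alternative == "LT":              # left-tailed: P(Z <= z_calc)
        return std_normal.cdf(z_calc)
    if alternative == "NE":              # two-tailed: double the tail beyond |z_calc|
        return 2 * (1 - std_normal.cdf(abs(z_calc)))
    raise ValueError("alternative must be 'GT', 'LT', or 'NE'")


for alt in ("GT", "LT", "NE"):
    print(alt, round(z_p_value(1.5, alt), 4))
```

With Zcalc = 1.5, the left-tailed P-value is above 0.5, which is the ‘wrong sign’ situation: a positive statistic can never lead to rejection under an LT alternative.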

Conclusion

Based on how small the P-value is, we either reject or fail to reject the null hypothesis. Some people explicitly use an α cut-off; others report the P-value and leave it to others to decide if the value is so rare as to justify rejecting the null. Remember that we never explicitly accept a null hypothesis. Once the decision is made, we relate it to the original question.

Agresti and Franklin, p. 411

Agresti and Franklin, p. 418

Z test for a Proportion

Assumptions: A single categorical variable, random sampling, np0 ≥ 15 and n(1-p0) ≥ 15, so normality applies

Types of hypotheses:

Two-tailed          H0: p = p0                 Ha: p ≠ p0
Right-tailed (GT)   H0: p = p0 (or p ≤ p0)     Ha: p > p0
Left-tailed (LT)    H0: p = p0 (or p ≥ p0)     Ha: p < p0

where p0 is the numerical value of the population proportion specified in the claim

Test-statistic:

\[ Z = \frac{\hat{p} - p_0}{se_0} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \]

P-value: Right-tail probability for GT alternative; left-tail for LT alternative; two-tail for NE alternative

Conclusion: Smaller P-values give stronger evidence against H0. If a decision is needed, compare the P-value to α: if P-value < α, reject H0.

Examples

Suppose you are arguing about how many older teens text while they drive, and you claim that it is more than one in five. A recent Pew Research survey of 800 older teens revealed that 26% of teens admit to having texted while driving. Is there sufficient evidence at the 0.05 level of significance to conclude that the proportion of older teens texting while driving differs from one in five?
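
One way to work the texting example is sketched below. It assumes the survey can be treated as a random sample and takes n = 800, p̂ = 0.26, and p0 = 0.20 from the problem statement. Because the claim is one-sided ("more than one in five") while the question is worded as "differs from", the sketch reports both the right-tailed and the two-tailed P-value. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 800        # sample size
p_hat = 0.26   # sample proportion who admit to texting while driving
p0 = 0.20      # null value: one in five
alpha = 0.05

# Sample-size condition from the assumptions: np0 >= 15 and n(1 - p0) >= 15
conditions_met = n * p0 >= 15 and n * (1 - p0) >= 15

se0 = math.sqrt(p0 * (1 - p0) / n)       # standard error computed under H0
z = (p_hat - p0) / se0                   # test statistic

std_normal = NormalDist()
p_right = 1 - std_normal.cdf(z)              # Ha: p > 0.20
p_two = 2 * (1 - std_normal.cdf(abs(z)))     # Ha: p != 0.20

print(f"conditions met: {conditions_met}")
print(f"Z = {z:.2f}")
print(f"right-tailed P-value = {p_right:.6f}, reject H0: {p_right < alpha}")
print(f"two-tailed P-value   = {p_two:.6f}, reject H0: {p_two < alpha}")
```

Under these assumptions Z is a little above 4, so both P-values are far below 0.05 and H0: p = 0.20 is rejected under either reading of the question.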

t test for the mean

Conditions/Assumptions: A single quantitative variable, random sampling, x̄ normal, usually gotten by assuming a normal population or having a sample larger than 30.

Hypotheses:

Two-tailed          H0: μ = μ0                 Ha: μ ≠ μ0
Right-tailed (GT)   H0: μ = μ0 [or μ ≤ μ0]     Ha: μ > μ0
Left-tailed (LT)    H0: μ = μ0 [or μ ≥ μ0]     Ha: μ < μ0

where μ0 is the numerical value of the population mean specified in the claim

Test-statistic:

\[ t = \frac{\bar{x} - \mu_0}{se} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

P-value: Right-tail probability for GT alternative; left-tail for LT alternative; two-tail for NE alternative

P-values for t-statistics are more difficult to find than for Z-statistics, where we could use tables for exact values. The best one can do using tables is to find the interval within which the P-value lies. Fortunately, P-values are commonly provided by computer software. You can also use the tdist function in Excel to get exact P-values.

Conclusion: Smaller P-values give stronger evidence against H0. If a decision is needed, compare the P-value to α: if P-value < α, reject H0.

Example (Bluman, p. 415): The average production of peanuts in the state of Virginia is 3000 pounds per acre. A new plant food has been developed and is tested on 60 individual plots of land. The mean yield with the new plant food is 3120 pounds of peanuts per acre with a standard deviation of 578 pounds. At α = 0.05, can one conclude that the average production has increased?

The well-known ‘normal’ temperature for humans is 98.6. A recent study decided to test this value using a sample of 130 adults. If the mean of the sample was 98.25, and the standard deviation was 0.73, is there sufficient evidence at the α = 0.05 level to conclude that average temperature differs from 98.6? (from Shoemaker, A. 1996. Journal of Statistics Education v.4, n.2.)
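
The two examples above can be worked without tables. The sketch below uses only Python's standard library, which has no t distribution, so the tail area is approximated by integrating the t density with Simpson's rule. The helper names `t_pdf`, `t_upper_tail`, and `t_statistic` are made up for this illustration; statistical software or Excel would normally supply the P-value directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule on [0, |t|] (steps even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_statistic(x_bar, s, n, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), returned with df = n - 1."""
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


# Peanut yield (right-tailed): H0: mu = 3000, Ha: mu > 3000
t_peanut, df_peanut = t_statistic(3120, 578, 60, 3000)
p_peanut = t_upper_tail(t_peanut, df_peanut)

# Body temperature (two-tailed): H0: mu = 98.6, Ha: mu != 98.6
t_temp, df_temp = t_statistic(98.25, 0.73, 130, 98.6)
p_temp = 2 * t_upper_tail(abs(t_temp), df_temp)

print(f"peanuts:     t = {t_peanut:.3f}, df = {df_peanut}, P-value = {p_peanut:.4f}")
print(f"temperature: t = {t_temp:.3f}, df = {df_temp}, P-value = {p_temp:.2e}")
```

Under these assumptions the peanut P-value comes out slightly above 0.05, so H0 is not rejected at α = 0.05 even though the sample mean is higher than 3000. The temperature P-value is far below 0.05, so H0: μ = 98.6 is rejected.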

Consider the following random sample of size eight from a normal population. Based on the sample, test the claim that the mean of the population is greater than 100 at α = 0.10.

100 150 120 90 95 110 100 80

XLSTATS:

Numerical Summaries for x

Number         8           Min      80
Mean           105.625     Q1       93.75
St Dev         21.61968    Median   100
Coeff of Var   0.204683    Q3       112.5
Skew           1.286236    Max      150

Sample Data

Sample Size          8
Mean                 105.625
Standard Deviation   21.61968
SE Mean              7.643712

Hypothesis Tests

H0: μ = 100
H1: μ > 100
Alternative choices: ≠ > <
T          0.7359
DF         7
p-value =  0.24286

Confidence Intervals for μ

Type (2,U,L)       2
Confidence Level   0.95
ME                 18.07451
Lower              87.55049
Upper              123.6995

SPSS: T-Test

One-Sample Statistics

            N   Mean       Std. Deviation   Std. Error Mean
VAR00001    8   105.6250   21.6197          7.6437

One-Sample Test (Test Value = 100)

                                                            95% Confidence Interval of the Difference
            t      df   Sig. (2-tailed)   Mean Difference   Lower       Upper
VAR0000     .736   7    .486              5.6250            -12.4495    23.6995
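
The software output above can be reproduced from the raw data. The sketch below uses Python's standard library and approximates the t tail area by numerical integration (Simpson's rule). The helper names are illustrative and are not part of XLSTATS or SPSS.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule on [0, |t|] (steps even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


data = [100, 150, 120, 90, 95, 110, 100, 80]
mu0 = 100

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)              # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)
t = (mean - mu0) / se
df = n - 1

p_right = t_upper_tail(t, df)            # Ha: mu > 100, the claim in the problem
p_two = 2 * t_upper_tail(abs(t), df)     # what SPSS labels Sig. (2-tailed)

print(f"mean = {mean:.3f}, sd = {sd:.4f}, se = {se:.4f}")
print(f"t = {t:.4f}, df = {df}")
print(f"right-tailed P-value = {p_right:.4f}, two-tailed P-value = {p_two:.4f}")
```

The right-tailed P-value is half of the two-tailed value reported by SPSS, because t is positive and so falls on the side named in the alternative. Since it is well above α = 0.10, the sample does not support the claim that the mean is greater than 100.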

Confidence Intervals and Hypothesis Testing

If a value is not included in the confidence interval, a two-tailed hypothesis test using the value will lead to rejection of the null hypothesis (at the matching level, e.g., a 95% interval and α = 0.05). If a value is in the confidence interval, the null will not be rejected. This does not apply to one-tailed tests, since they place all of α in a single tail rather than α/2 in each tail, leading to lower critical values than those used in two-tailed tests and confidence intervals.

Some statisticians advocate abandoning formal hypothesis tests altogether, with an emphasis on confidence intervals instead:

confidence intervals tell us all plausible values of the population parameter
one-tailed tests can be ‘easier,’ regarded as loosening standards

Misinterpretations of Results of Significance Tests

“Do not reject H0” does not mean “Accept H0”
Statistical significance does not mean practical significance
The P-value cannot be interpreted as the probability that H0 is true.
  It is P(test statistic takes observed value or beyond in tails | H0 true)
  Not P(H0 true | observed test statistic value)
It is misleading to report results only if they are ‘statistically significant’
Some tests may be statistically significant just by chance
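
The link between confidence intervals and two-tailed tests described above can be checked on the eight-value sample from the t test example. The sketch below assumes the standard t-table critical value 2.365 for df = 7 and a two-tailed α = 0.05. The candidate null values and function names are arbitrary illustrations.

```python
import math
import statistics

data = [100, 150, 120, 90, 95, 110, 100, 80]
T_CRIT = 2.365   # assumed: two-tailed 5% critical value for df = 7, from a t table

n = len(data)
mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)

# 95% confidence interval for the population mean
lower = mean - T_CRIT * se
upper = mean + T_CRIT * se


def rejects(mu0):
    """Two-tailed t test decision at alpha = 0.05 using the critical value."""
    return abs((mean - mu0) / se) > T_CRIT


def outside_interval(mu0):
    """True when mu0 is not one of the plausible values in the interval."""
    return mu0 < lower or mu0 > upper


print(f"95% CI: ({lower:.2f}, {upper:.2f})")
for mu0 in (80, 87, 100, 120, 125):
    print(mu0, "reject H0:", rejects(mu0), "outside CI:", outside_interval(mu0))
```

For every candidate value the two columns agree: the null values that the two-tailed test rejects are exactly those lying outside the interval.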