
4 Hypothesis testing - Newcastle University



4 Hypothesis testing

In Sections 2 and 3 we considered the problem of estimating a single parameter of interest, θ.

In this section we consider the related problem of testing whether or not θ equals a particular value of interest, or lies in a particular range of values of interest.

Estimation and hypothesis testing can be thought of as two related (dual) aspects of the inference problem, as we shall see later.

4.1 Types of hypothesis and types of error

Suppose X1, X2, . . . , Xn are an independent random sample from a probability density function fX(x|θ).

Instead of estimating θ, we now wish to use the sample to test hypotheses about θ.

Definition 4.1.1: Simple and composite hypotheses

We define a hypothesis to be an assertion or conjecture about θ. If the hypothesis completely specifies the distribution of X, it is called a simple hypothesis. Otherwise it is called a composite hypothesis.


Example 4.1.1

Suppose we take an independent random sample X1, X2, . . . , Xn from a random variable X ∼ N(µ, σ2).

Consider the following hypotheses. Which are simple and which are composite?

(i) H1 : µ = 100, σ = 15;

(ii) H2 : µ > 100, σ = 15;

(iii) H3 : µ > 100, σ = µ/10;

(iv) H4 : µ = 100;

(v) H5 : σ = 15;

(vi) H6 : µ < 100.

Solution:

Comparing two hypotheses

Usually in hypothesis testing we compare two hypotheses. The first, called the null hypothesis, is

H0 : θ ∈ ω

and the second, the alternative hypothesis, is

H1 : θ ∈ ω̄,

where ω ⊂ S, ω ∪ ω̄ = S, ω ∩ ω̄ = ∅, and S is the set of all possible values for the parameter θ of the distribution of the random variable X.


Example 4.1.2

We are interested in whether a new method of sealing light bulbs increases the average lifetime of the bulbs.

Here, if θ is the mean lifetime of the bulbs sealed by the new method, and we know the mean lifetime of standard bulbs is 140 hours, our hypothesis test will be a test of

H0 : θ = 140 versus H1 : θ > 140.

Now suppose we assume that the lifetime X of a new bulb follows an Exponential distribution, i.e. X ∼ Exp(1/θ).

Which of H0 and H1 is simple and which is composite?

What are the sets S, ω and ω̄ which define this hypothesis test?

Solution:

Definition 4.1.2: Acceptance region and rejection region

Let A be the sample space of X, i.e. the set of all possible values of a random sample of size n from X. A test procedure divides A into subsets A0 and A1 (with A0 ∪ A1 = A, A0 ∩ A1 = ∅) such that if

X ∈ A0, we accept H0

and if

X ∈ A1, we reject H0 and accept H1.

A0 is called the acceptance region and A1 the rejection region of the test.


Definition 4.1.3: Type I error and type II error

When performing a test we may make the correct decision, or one of two possible errors:

(i) Type I error: reject H0 when it is true;

(ii) Type II error: accept H0 when it is false.

The type I error is usually regarded as the more serious mistake. The probabilities of making type I and type II errors are usually denoted by α(θ) and β(θ) respectively.

Example 4.1.3

Now returning to the light bulbs sealed by the new method in Example 4.1.2, suppose that once again we wish to test:

H0 : θ = 140 versus H1 : θ > 140,

and we collect some data consisting of ten measurements of lifetimes x1, . . . , x10.

Suppose we choose to accept H0 if the sample mean x̄ satisfies x̄ < 150, and to reject H0 (and hence accept H1) if x̄ ≥ 150.

What are the sample space, the acceptance region and the rejection region for this test?

What are the Type I and Type II errors in this specific case?

Solution:


In Sections 4.2 to 4.6 we will develop the ideas of hypothesis testing by studying the most important cases.

4.2 Inference for a single Normal sample

For this section we will assume that X1, X2, . . . , Xn is an i.i.d. random sample from a N(µ, σ2) distribution.

For the time being, we assume σ2 is known, i.e. a constant.

Moreover, a particular value µ = µ0 for the population mean has been suggested by previous work or ideas. In this case the null hypothesis is denoted by

H0 : µ = µ0.

There are a variety of options for the alternative hypothesis. Commonly used alternative hypotheses are:

(A) H1 : µ = µ1 > µ0 (µ1 fixed constant)

(B) H1 : µ = µ1 < µ0 (µ1 fixed constant)

(C) H1 : µ > µ0

(D) H1 : µ < µ0

(E) H1 : µ ≠ µ0.

Example 4.2.1

Suppose the marks for a particular test are believed to follow a N(µ, 100) distribution, and the null hypothesis is H0 : µ = 50.

In which category (A) - (E) are each of the following alternative hypotheses:

1. H1 : µ < 50; 2. H1 : µ = 57; 3. H1 : µ ≠ 50?

Solution:


Alternative (E) is the most commonly used, and the easiest to justify in most real–life situations. All the others assume some knowledge which it is usually unrealistic to assume.

The null and alternative hypotheses are treated in the following way: we adopt the null hypothesis unless there is evidence against it.

The test statistic we choose to use for a single Normal sample is X̄, the sample mean.

It makes sense to test a hypothesis about the population mean µ using the sample mean X̄, but more than this, we know the distribution of X̄ under the null hypothesis, which is crucial.

If H0 is true, X1, . . . , Xn are i.i.d. N(µ0, σ2/n) random variables, and so

X̄ ∼ N(µ0, σ2/n)   ⇒   Z = (X̄ − µ0)/(σ/√n) ∼ N(0, 1).

We now need to decide for which values of the test statistic we will reject H0. These values will comprise the rejection region A1.

We reject H0 in cases

(A) or (C) : if Z is sufficiently far into the right-hand tail;

(B) or (D) : if Z is sufficiently far into the left-hand tail;

(E) : if Z is sufficiently far into either tail.

In case (E) the rejection region is split between the tails of the distribution, giving a two-tailed test. The other cases are one-tailed tests.

If P(Type I error) = α, the test is said to have significance level α. Commonly used significance levels are 0.05 (5%), 0.01 (1%) and 0.001 (0.1%). Once the significance level is chosen, the rejection region is precisely determined.


Example 4.2.2

For α = 0.05, calculate the rejection regions (in terms of z) for each category of alternativehypothesis (A) - (E).

Solution:
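The critical values that define these rejection regions can be checked numerically. A minimal sketch in Python (standard library only), using the inverse c.d.f. of the standard Normal distribution:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()                      # standard Normal, N(0, 1)

# One-tailed cases (A)/(C): reject H0 if Z > z_alpha
z_upper = z.inv_cdf(1 - alpha)        # about 1.645

# One-tailed cases (B)/(D): reject H0 if Z < -z_alpha
z_lower = z.inv_cdf(alpha)            # about -1.645

# Two-tailed case (E): reject H0 if |Z| > z_{alpha/2}
z_two = z.inv_cdf(1 - alpha / 2)      # about 1.960

print(z_upper, z_lower, z_two)
```

These agree with the values tabulated in standard Normal tables.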

Example 4.2.3

The widths (mm) of 64 beetles chosen from a particular locality were measured and the sample mean was found to be

x̄ = 24.8.

Previous extensive measurements of beetles of the same species had shown the widths to be Normally distributed with mean 23 mm and variance 16 mm².

Test at the 5% level whether or not the beetles from the chosen locality have a different mean width from the main population, assuming that they have the same variance.


Solution:
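As a check on the hand calculation, this two-tailed z-test can be sketched in Python (standard library only); the numbers are those given in the example (n = 64, x̄ = 24.8, µ0 = 23, σ2 = 16):

```python
from math import sqrt
from statistics import NormalDist

n, xbar, mu0, sigma2 = 64, 24.8, 23.0, 16.0

# Test statistic Z = (x̄ − µ0)/(σ/√n), which is N(0, 1) under H0
z = (xbar - mu0) / (sqrt(sigma2) / sqrt(n))   # = 1.8 / 0.5 = 3.6

# Two-tailed p-value: P(|Z| > |z|)
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p:.5f}")
```

Since |z| = 3.6 > 1.96, we reject H0 at the 5% level.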


4.2 cont. A single Normal sample with unknown variance σ2

Now we consider hypothesis tests about µ where X1, X2, . . . , Xn is an i.i.d. random sample from a N(µ, σ2) distribution, and σ2 is unknown.

This is usually more realistic than assuming we know σ2, but it is also a more complex problem. We have to estimate µ in the presence of the nuisance parameter σ2.

The solution is to replace σ2 with a suitable estimate; here we use the sample variance S2.

Example 4.2.4

Cola makers test new recipes for loss of sweetness during storage. For one particular recipe, ten trained tasters rate the sweetness before and after, enabling us to calculate the change (sweetness after storage minus sweetness before storage), as follows:

Before  8.0  7.6  8.1  8.2  6.8  7.9  8.0  9.2  7.1  7.0
After   6.0  7.2  7.4  6.2  7.2  5.7  9.3  7.9  6.0  4.7
Change -2.0 -0.4 -0.7 -2.0  0.4 -2.2  1.3 -1.3 -1.1 -2.3

Is there evidence that in general, the storage causes the cola to lose sweetness?

Solution:


Solution/cont.

When we knew σ2, we used the test statistic

Z = (X̄ − µ0)/(σ/√n),

which we know has a N(0, 1) distribution.

Now we are estimating σ2 using S2, so our test statistic becomes

T = (X̄ − µ0)/(S/√n),

and this has a slightly different distribution, called the Student t distribution, or just the t distribution . . .

Definition 4.2.1: The Student t distribution

If Z ∼ N(0, 1) and U ∼ χ²n are independent random variables then

Tn = Z/√(U/n)

has a Student t-distribution on n degrees of freedom. The distribution is denoted by tn.


Example 4.2.5

Sketch the t-distribution with

(a) 1; (b) 5; (c) 100

degrees of freedom.

Solution:

Figure 2: the t1, t5 and t100 distributions [p.d.f. curves plotted over the range −4 to 4]


The t-distribution with n degrees of freedom has a p.d.f. which is symmetric and bell-shaped, like the Normal, but with somewhat thicker tails.

Smaller values of n correspond to the thickest tails. Larger values of n cause the tn distribution to be more like the Normal distribution.

All we have to be able to do is to use statistical tables or R to look up the appropriate tail probability, since the distribution of our test statistic is given by:

Tn−1 = (X̄ − µ0)/(S/√n) ∼ tn−1.

Example 4.2.6

For the cola example in 4.2.4 the test statistic was t = −2.697, and the sample size was n = 10.

Carry out the test of

H0 : µ = 0 (no loss in sweetness)

against

H1 : µ < 0 (some loss in sweetness).

Solution:
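The statistic can be recomputed directly from the change data of Example 4.2.4 (Python, standard library only). Working to full precision this gives t ≈ −2.72; the quoted value −2.697 presumably reflects intermediate rounding in the notes:

```python
from math import sqrt
from statistics import mean, stdev

# Changes in sweetness (after − before) for the ten tasters, Example 4.2.4
change = [-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.3, -1.1, -2.3]

n = len(change)
xbar = mean(change)          # sample mean
s = stdev(change)            # sample standard deviation (divisor n − 1)

# Test statistic T = (x̄ − µ0)/(s/√n) with µ0 = 0; T ~ t_{n−1} under H0
t = xbar / (s / sqrt(n))

print(f"t = {t:.3f} on {n - 1} degrees of freedom")
```

The one-tailed 5% point of t9 is 1.833 (from standard tables), so since t < −1.833 we reject H0 at the 5% level: there is evidence of a loss in sweetness.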


4.3 Hypothesis test for two Normal means: two–sample t–test

Now suppose we have two samples (x1, x2, . . . , xn1) and (xn1+1, xn1+2, . . . , xn1+n2), i.e. samples of sizes n1 and n2 from two different populations. We are interested in whether the two population means are equal.

Assuming that the data are sampled from Normally distributed populations with equal variance, σ2, in each population, then if we want to test

H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2

where µ1 and µ2 are the means of each population, we can perform a t-test with test statistic given by . . .

t = (x̄1 − x̄2) / (s √(1/n1 + 1/n2)),   where   s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2),

and where x̄1, x̄2, s1 and s2 are the sample means and standard deviations from each population.

Here s = √s² is the pooled estimate of the common standard deviation σ.

If the null hypothesis is true, then the test statistic comes from a t–distribution on n1 + n2 − 2 degrees of freedom, so we use the tables for tn1+n2−2 to carry out the test.

This test is called the two–sample t–test.


Example 4.3.1

Consider the lifetime of two brands of light bulbs. For a random sample of n1 = 12 bulbs of one brand the mean bulb life is x̄1 = 3,400 hours with a sample standard deviation of s1 = 240 hours.

For the second brand of bulbs the mean bulb life for a sample of n2 = 8 bulbs is x̄2 = 2,800 hours with s2 = 210 hours.

We assume that the distribution of bulb life is approximately Normal, and the standard deviations of the two populations are assumed to be equal. Test

H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2

using a two–sample t–test at the 1% level.

Solution:
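The arithmetic of the pooled estimate and the two-sample t statistic can be sketched as follows (Python, standard library only), using the summary figures from the example:

```python
from math import sqrt

# Summary statistics from Example 4.3.1
n1, xbar1, s1 = 12, 3400.0, 240.0
n2, xbar2, s2 = 8, 2800.0, 210.0

# Pooled estimate of the common variance
s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
s = sqrt(s2_pooled)

# Two-sample t statistic, ~ t_{n1+n2−2} under H0
t = (xbar1 - xbar2) / (s * sqrt(1 / n1 + 1 / n2))

print(f"s = {s:.1f}, t = {t:.2f}, df = {n1 + n2 - 2}")
```

This gives t ≈ 5.75 on 18 degrees of freedom; the two-tailed 1% point of t18 is 2.878 (from standard tables), so we reject H0 at the 1% level.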


4.4 Two Normal populations: testing the assumption of equal variances

In Section 4.3 we had to make the assumption that our two Normal populations had equal variance σ2. Here we see how we can carry out a hypothesis test to check this assumption!

We denote the two population variances by σ1² and σ2². We wish to test

H0 : σ1² = σ2² versus H1 : σ1² ≠ σ2².

Notice that these hypotheses don’t make any assumptions about the values of µ1 and µ2.

If the null hypothesis is true, then the ratio of sample variances

S1²/S2²

will have a distribution called the F-distribution, on n1 − 1 and n2 − 1 degrees of freedom.

Definition 4.4.1: The F distribution

If U and V are independent chi-square random variables such that U ∼ χ²r and V ∼ χ²s, then

F = (U/r)/(V/s)

has an F distribution on r and s degrees of freedom. The distribution is denoted by Fr,s.

Note that the F distribution is characterized by two separate measures of degrees of freedom: r corresponds to the numerator and s corresponds to the denominator. Printed F tables are available, and of course we can always use R (except in an exam!).

Note that it follows immediately that the reciprocal ratio of sample variances

S2²/S1²

will have an F distribution on n2 − 1 and n1 − 1 degrees of freedom.

In practice, we carry out the hypothesis test for equal variances as follows. We will only consider the case of the two–sided alternative ("not equal"), giving rise to a two–tailed test. In this case it is sensible to reject H0 if either s1²/s2² or s2²/s1² is large. We form our test statistic as

F = max{s1²/s2², s2²/s1²},

and compare this with Fr,s tables, where if s1² > s2² we set r = n1 − 1 and s = n2 − 1, while if s2² > s1² we set r = n2 − 1 and s = n1 − 1. To account for the fact that under H0, these two outcomes could happen with equal probability, the significance level of the test is *double* the upper tail probability of the F distribution (obtained from tables or R).


Example 4.4.1

For the data in Example 4.3.1, test the assumption that the standard deviations of thetwo populations are equal.

Solution:
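The test statistic is quick to compute from the summary figures of Example 4.3.1 (Python, standard library only):

```python
# Sample standard deviations and sizes from Example 4.3.1
s1, s2 = 240.0, 210.0
n1, n2 = 12, 8

v1, v2 = s1**2, s2**2

# Two-tailed test statistic: the larger sample variance goes on top
F = max(v1 / v2, v2 / v1)

# Degrees of freedom: numerator from the sample with the larger variance
r, s_df = (n1 - 1, n2 - 1) if v1 > v2 else (n2 - 1, n1 - 1)

print(f"F = {F:.3f} on ({r}, {s_df}) degrees of freedom")
```

This gives F ≈ 1.306 on (11, 7) degrees of freedom, to be compared with the upper tail of F11,7 (remembering to double the tail probability); in R the tail probability could be found with pf().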


4.5 Inference for a single Binomial proportion (r not small!)

Here we consider the situation where we have a single observation x from a Binomial random variable X ∼ Bin(r, θ), and we are interested in testing hypotheses about θ.

Note that x can be viewed as the number of successes from r independent trials, each with success probability θ. In this section we consider the case where r is not small, i.e. r > 20.

We will test H0 : θ = θ0 against an alternative from one of the categories (A) to (E) above.

Example 4.5.1

UK survey of sexual behaviour: in 2004/05, 11% of UK residents aged 16–49 claimed to have had more than one sexual partner.

Suppose that in 2008–09, a random sample of 600 UK residents in the 16–49 age group shows that 83 had more than one sexual partner.

Is this evidence for an increase in the population proportion having more than one sexual partner?

Formulate this problem as a hypothesis test.

Solution:


We need to derive a test statistic whose distribution we can evaluate conditional on H0 being true.

We use the Normal approximation to the Binomial distribution.

I.e. if X ∼ Bin(r, θ) with r > 20, then to a reasonable approximation

X ∼ N[rθ, rθ(1 − θ)].

(Note that the approximation involves rounding the outcome of a Normal random variable to the nearest integer! See below.)

Now suppose the null hypothesis H0 is true, i.e. θ = θ0.

Then the Normal approximation implies

X ∼ N[rθ0, rθ0(1 − θ0)],

and hence the test statistic

Z = (X − rθ0)/√(rθ0(1 − θ0))

has a N(0, 1) distribution.

This means we can carry out a one–sample z–test exactly as we did in Section 4.2.

N.B. because of the rounding issue, it makes sense to replace x in the test statistic by x − 0.5 when x > rθ0, and by x + 0.5 when x < rθ0. This is called a continuity correction.

Example 4.5.2

For the sexual behaviour data in Example 4.5.1 we have r = 600, we have observed x = 83, and we want to test

H0 : θ = 0.11 against H1 : θ > 0.11.

Carry out the hypothesis test.


Solution:
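The calculation, including the continuity correction, can be sketched as follows (Python, standard library only); depending on rounding it gives p ≈ 0.016:

```python
from math import sqrt
from statistics import NormalDist

# Data from Examples 4.5.1/4.5.2
r, x, theta0 = 600, 83, 0.11

# Continuity correction: here x > r*theta0 = 66, so replace x by x − 0.5
xc = x - 0.5 if x > r * theta0 else x + 0.5

# Z = (x − rθ0)/√(rθ0(1 − θ0)), approximately N(0, 1) under H0
z = (xc - r * theta0) / sqrt(r * theta0 * (1 - theta0))

# One-tailed p-value for H1: θ > 0.11
p = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, p = {p:.4f}")
```

Since p < 0.05 but p > 0.01, the result is significant at the 5% level but not at the 1% level.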

Notes on significance levels and p–values

1. If you are not told what level of significance to use, a sensible procedure is to test at the 5% level. If not significant then stop, otherwise test at the 1% level. If not significant then stop, otherwise test at the 0.1% level.

2. If you have access to the p-value, e.g. from Normal tables, or from R (see Exercises 4B Questions 1 and 2) then you immediately have the result of a hypothesis test at any given significance level.

E.g. in Example 4.5.2 immediately above, we had p = 0.0158. It follows immediately that our test is significant at 5% but not at 1%, because 0.05 > p > 0.01.


4.6 Inference for two Binomial proportions (samples not small!)

Example 4.6.1

Consider a survey of employment carried out separately in Northern England and Scotland, among people who had left school six months earlier. Suppose we obtain the following data:

             Scotland   Northern England   Total
Unemployed
Employed

In general we have two independent samples of size n1 and n2, with each observation classified as success or failure:

          Sample 1   Sample 2   Total
Success   O11        O12        R1 = O11 + O12
Failure   O21        O22        R2 = O21 + O22
Total     n1         n2         n = n1 + n2

Assuming all observations are independent, and that the success probability is constant within each sample, we have two Binomial samples. Suppose that the true probabilities of success are θ1 and θ2. We wish to test

H0 : θ1 = θ2

versus

H1 : θ1 ≠ θ2.

As always with a hypothesis test, we need to find a test statistic whose distribution is known when H0 is true.

Now if H0 is true, then θ1 = θ2 = θ, say.

The combined samples give the number of successes in n1 + n2 trials, in each of which there is a probability θ of a success. So we may estimate θ by θ̂ = R1/n, where

R1 = O11 + O12 (total for first row),
n = n1 + n2 (grand total).


Hence, under H0, the expected number of successes (and failures) in each of the samples is

E11 = n1R1/n;   E21 = n1R2/n;   E12 = n2R1/n;   E22 = n2R2/n,

where R2 = O21 + O22 (total for second row).

To measure how closely the expected values match the observed values we calculate the test statistic

X² = Σ_{i=1}^{2} Σ_{j=1}^{2} (Oij − Eij)² / Eij.

Under H0, X² has an asymptotic χ²1 distribution (a "chi–square distribution with 1 degree of freedom").

Definition 4.6.1: The chi–square distribution χ²n

If Z1, . . . , Zn are independent N(0, 1) random variables, then

X² = Σ_{i=1}^{n} Zi²

has a chi–square distribution on n degrees of freedom. The distribution is denoted by χ²n.

If H0 is true, the observed values should be close to the expected values, and so X² will be small. Hence we reject H0 if X² is large enough, using Tables (or R).
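The whole calculation can be written as a short function. The counts used below are hypothetical, purely to illustrate the arithmetic (they are not the survey data of Example 4.6.1):

```python
def chi_square_2x2(o11, o12, o21, o22):
    """X² statistic for a 2×2 table; compare with the chi-square
    distribution on 1 degree of freedom."""
    n1, n2 = o11 + o21, o12 + o22          # column totals (sample sizes)
    r1, r2 = o11 + o12, o21 + o22          # row totals
    n = n1 + n2                            # grand total
    # Expected counts under H0: E_ij = (column total × row total)/n
    e = {(1, 1): n1 * r1 / n, (1, 2): n2 * r1 / n,
         (2, 1): n1 * r2 / n, (2, 2): n2 * r2 / n}
    o = {(1, 1): o11, (1, 2): o12, (2, 1): o21, (2, 2): o22}
    return sum((o[k] - e[k]) ** 2 / e[k] for k in o)

# Hypothetical counts: 30/100 successes in sample 1, 20/100 in sample 2
x2 = chi_square_2x2(30, 20, 70, 80)
print(f"X² = {x2:.3f}")
```

Here X² ≈ 2.67, below 3.841 (the upper 5% point of χ²1), so these hypothetical data would give no evidence against H0 at the 5% level.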


Example 4.6.2

Consider the data in Example 4.6.1:

TestH0 : the unemployment rates are equal

against

H1 : the unemployment rates are not equal.

Solution:


Notes

1. The method we just described for 2 × 2 tables also works for r × c tables, that is tables with r rows and c columns. The test statistic is given by

X² = Σ_{i=1}^{r} Σ_{j=1}^{c} (Oij − Eij)² / Eij,

and this is compared with a chi-square distribution with (r − 1) × (c − 1) degrees of freedom, i.e. χ²(r−1)(c−1).

2. Since deviation from what is expected under H0 always corresponds to higher values of X², chi–square tests for 2 proportions (and for r × c contingency tables) are *always* 1–tailed, and always use the upper tail of the chi–square distribution!


4.7 The relationship between hypothesis tests and confidence intervals

Every hypothesis test we carry out has a corresponding confidence interval associated with it!

Example 4.7.1

For the beetle widths given in Example 4.2.3, calculate a 95% confidence interval for the population mean µ.

Solution:

Looking back at that example, we can deduce immediately that 23 also lies outside the 99% confidence interval, and the 99.9% confidence interval. (Exercise: check this!)

The general rule is:

The 100(1 − α)% confidence interval consists precisely of all those values which would not be rejected at the 100α% significance level.
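This duality can be checked numerically for the beetle data of Example 4.2.3 (n = 64, x̄ = 24.8, σ = 4, µ0 = 23); a sketch in Python (standard library only):

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma, mu0 = 64, 24.8, 4.0, 23.0
se = sigma / sqrt(n)                            # standard error of x̄, = 0.5

for conf in (0.95, 0.99, 0.999):
    alpha = 1 - conf
    zc = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
    lo, hi = xbar - zc * se, xbar + zc * se
    inside = lo <= mu0 <= hi
    print(f"{conf:.1%} CI: ({lo:.3f}, {hi:.3f}); contains 23? {inside}")
```

The value 23 lies outside all three intervals, agreeing with the fact that z = 3.6 is significant even at the 0.1% level.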


4.8 Hypothesis tests: size and power function

Hypothesis tests can be described in terms of their size and power.

Definition 4.8.1: the size of a hypothesis test

Consider a particular hypothesis test on a single parameter θ. We define the size of the test to be

sup_{θ∈ω} Pr(reject H0).

Note that for a simple null hypothesis, this is just the probability that we reject H0 if it is true, i.e. the probability of a Type I error.

For a composite null hypothesis, it is the supremum of this rejection probability over all the values of θ for which the null hypothesis holds.

Definition 4.8.2: the power function for a hypothesis test

The power K(θ) is the probability of rejecting H0, considered as a function of θ.

A plot of the power function is helpful in determining how good our test is at rejecting the null hypothesis when it is false.

Informally, the power of a test is often used to refer to the probability that it will reject the null hypothesis when it is false. However from our different categories of alternative hypothesis (A) - (E), this only makes real sense for (A) and (B), i.e. when we are comparing two simple hypotheses.


Example 4.8.1

Suppose X1, . . . , X4 is a random sample from X ∼ N(µ, 36), and we wish to test

H0 : µ = 10 against H1 : µ > 10.

Note that this is a one–tailed alternative. Now suppose we base our rejection region on the value of X̄; specifically we construct it as A1 = {X : X̄ > 17}.

(a) Plot the power function for this test in the range 10 ≤ µ ≤ 20.

(b) What is the size of this test?

(c) What would be the power of the test if the alternative was, in fact, H1 : µ = 22?

Solution:
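The power function for this test can be evaluated numerically (Python, standard library only); printed values stand in for the plot:

```python
from math import sqrt
from statistics import NormalDist

# X̄ is the mean of n = 4 observations from N(µ, 36), so X̄ ~ N(µ, 36/4 = 9)
n, var = 4, 36.0
sd_xbar = sqrt(var / n)    # = 3

def power(mu, cutoff=17.0):
    """K(µ) = Pr(reject H0) = Pr(X̄ > cutoff) when the true mean is µ."""
    return 1 - NormalDist(mu, sd_xbar).cdf(cutoff)

size = power(10.0)         # simple H0, so size = K(10): about 0.0098, i.e. ~1%
power_22 = power(22.0)     # K(22): about 0.95

for mu in range(10, 21):   # values for plotting the power function
    print(f"K({mu}) = {power(mu):.4f}")
```

K(µ) increases from about 1% at µ = 10 towards 1 as µ grows, which is the shape the sketch should show.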


Solution: (cont.)

Choice of rejection region

In Example 4.8.1 we found that our rejection region gave a test with desirable properties: a 'standard' size of 1%, and a well-defined power function.

So how can we design such a test ourselves? Fortunately there is a very useful theorem which helps us to define an 'optimal' rejection region...


The Neyman-Pearson Lemma

Suppose we have a random sample x1, x2, . . . , xn from a random variable X with density fX(x|θ), and we wish to test H0 : θ = θ0 against the simple alternative H1 : θ = θ1.

Consider the Likelihood Ratio defined as

Λ(x) = L(θ0|x) / L(θ1|x).

Suppose we define a test by rejecting H0 in favour of H1 if Λ(x) is small enough. Specifically, suppose we choose a cut–off point η such that Pr(Λ(x) ≤ η | H0) = α.

Then the test based on the rejection region A1 = {x : Λ(x) ≤ η} is the most powerful test of size α.

Now suppose we have a composite alternative hypothesis H1 : θ ∈ Θ1. If the test is the most powerful for all θ1 ∈ Θ1, then it is said to be the uniformly most powerful (UMP) test for alternatives in the set Θ1.

Notes

1. Informally, the Neyman-Pearson Lemma says that if we base our test on the value of the likelihood ratio, then we get the best possible test (in the sense of being the most powerful).

2. Note that if we need to define a rejection region in terms of

Λ(x) = L(θ0|x) / L(θ1|x),

it is often easier to work with

log[Λ(x)] = log[L(θ0|x)] − log[L(θ1|x)].

Example 4.8.2

Suppose X1, X2, . . . , Xn is a random sample from a N(µ, σ2) distribution where µ is known, and we wish to test

H0 : σ2 = σ0² versus H1 : σ2 = σ1²,

where σ0² < σ1².

Find an appropriate test statistic on which to base a rejection region.


Solution:
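As a sketch of the likelihood-ratio calculation: with µ known, the log-likelihood is log L(σ2|x) = −(n/2) log(2πσ2) − Σ(xi − µ)²/(2σ2), so

```latex
\log \Lambda(\mathbf{x})
  = \log L(\sigma_0^2 \mid \mathbf{x}) - \log L(\sigma_1^2 \mid \mathbf{x})
  = \frac{n}{2}\log\frac{\sigma_1^2}{\sigma_0^2}
    + \frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right)
      \sum_{i=1}^{n}(x_i - \mu)^2 .
```

Since σ0² < σ1², the coefficient of Σ(xi − µ)² is negative, so Λ(x) is small precisely when Σ(xi − µ)² is large. This suggests rejecting H0 when Σ(xi − µ)² exceeds a cut-off, and under H0 the quantity Σ(xi − µ)²/σ0² has a χ²n distribution (by Definition 4.6.1), which determines the cut-off for a given size α.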


4.9 Small sample methods

In this section we consider statistical inference (estimation and hypothesis testing) in situations where the sample size is small.

The crucial change from large–sample methods is that we can no longer rely on the asymptotic distribution of either the maximum likelihood estimator, or the test statistic, in a hypothesis test.

In fact the cases for one and two Normal means have already been dealt with, because the adjustments made to deal with unknown variance (t–tests!) work for arbitrarily small samples.

The cases which need special treatment are the cases of (a) inference on one Binomial proportion, and (b) the comparison of two Binomial proportions . . .

4.9.1 Inference for a single Binomial proportion (r is small!)

Suppose we have a single observation x from a Binomial random variable X ∼ Bin(r, θ), and we want to test hypotheses about θ.

This is the same kind of problem as we considered in Section 4.5, but this time we assume that the number of trials r is small, i.e. r ≤ 20. The crucial difference is that the Normal approximation is now too poor to use, and we should use the Binomial distribution directly (using Tables or R).

The fact that we are now working with a genuinely discrete distribution leads to a complication: we cannot carry out a test precisely for any specified significance level; we have to use the nearest approximate significance level.

Example 4.9.1

A leading cat–food manufacturer has a slogan which could be interpreted as follows:

“80% of cats prefer our product.”

In an experiment to test this, 20 cats are each given the choice between the product in question, Brand W, and the leading market competitor, Brand X.

Result: 12 cats go for Brand W, and 8 cats go for Brand X.

Is this evidence against Brand W’s claim?


Solution:
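A sketch of the exact test in Python (standard library only); we take the one-sided alternative H1 : θ < 0.8, since the concern is that the slogan overstates the preference:

```python
from math import comb

# X ~ Bin(20, 0.8) under H0: θ = 0.8; observed x = 12
r, x, theta0 = 20, 12, 0.8

# One-tailed test of H1: θ < 0.8: p-value is P(X ≤ 12 | θ = 0.8),
# summing the exact Binomial probabilities directly
p = sum(comb(r, k) * theta0**k * (1 - theta0) ** (r - k) for k in range(x + 1))

print(f"p = P(X <= {x}) = {p:.4f}")
```

This gives p ≈ 0.032, so the observed result would be significant at the 5% level (but not at 1%): some evidence against Brand W's claim.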


4.9.2 Inference for two Binomial proportions (small samples!)

Here we consider the same kind of problem as in Section 4.6, i.e. two Binomial proportions, with the data arranged in a 2 × 2 contingency table.

Here we consider the case when one or both samples are small.

Example 4.9.2

A small study into the dieting habits of teenagers is undertaken, to investigate whether or not the proportions of males and females who diet are equal.

Suppose the population proportions of males and females who are dieting at any one time are denoted by θM and θF respectively.

We wish to test:

H0 : θM = θF against H1 : θM ≠ θF.

A random sample of 12 boys and 12 girls is selected, and we ascertain whether each individual is currently on a diet.

Data:

Table 4.1

             boys   girls   Total
dieting
not dieting
Total


It certainly appears that in the population, girls are more likely to be dieting, since in our sample:

9 out of 12 girls are dieting;
1 out of 12 boys is dieting.

The question is:

“How significant are these results?”

In other words, how much evidence do we have against H0 : θM = θF ?

The way we answer this is that we assume the row totals and the column totals are fixed at the observed values. We then assume that H0 is true (as ever!) and we ask, how unlikely is the result we have observed?

In other words:

If we were to choose 10 of the teenagers at random, what is the probability that 9 of them would be among the 12 girls, and only 1 from among the 12 boys?

The p–value for this test will be the probability of all outcomes which are as extreme as this one, or more so . . .
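The calculation just described uses the hypergeometric distribution, and can be sketched as follows (Python, standard library only), with the counts stated in the text (9 of 12 girls and 1 of 12 boys dieting, so 10 dieters in all):

```python
from math import comb

# 12 boys, 12 girls; 10 dieters in total; observed 9 girl dieters
n_boys, n_girls, dieters = 12, 12, 10
observed_girls = 9

total = comb(n_boys + n_girls, dieters)   # ways of choosing the 10 dieters

def prob(k):
    """P(exactly k of the 10 dieters are girls), with all margins fixed."""
    return comb(n_girls, k) * comb(n_boys, dieters - k) / total

# One-sided p-value: outcomes at least as extreme as the one observed
p_one_sided = sum(prob(k) for k in range(observed_girls, dieters + 1))

print(f"one-sided p = {p_one_sided:.5f}")
```

This gives a one-sided p ≈ 0.0014 (roughly 0.003 if doubled for the two-sided alternative), strong evidence against H0 : θM = θF; this conditional procedure is known as Fisher's exact test.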

Solution:

We introduce the notation:

             boys   girls   Total
dieting
not dieting
Total


Table 4.2

             boys   girls   Total
dieting
not dieting
Total


Review of Section 4

In this section we have:

1. Introduced the principles of hypothesis testing.

2. Seen how to carry out hypothesis tests in some specific cases when the sample size n is reasonably large:

(a) the mean of a single Normal population (variance known and unknown);

(b) the means of two Normal populations (variances assumed equal);

(c) the variances of two Normal populations;

(d) the success probability for a single Binomial proportion;

(e) the success probabilities for two Binomial proportions;

3. Introduced three new probability distributions needed to carry out the tests: the Student t distribution, the F distribution and the chi–square distribution.

4. Learned how to use Statistical Tables to carry out the tests at specific significance levels.

5. Learned how to use R to do some of these tests, and to interpret the precise p–value obtained.

6. Understood the relationship between hypothesis tests and confidence intervals.

7. Considered the properties of hypothesis tests, namely size and power.

8. Seen how we may construct the most powerful tests using the Neyman-Pearson Lemma.

9. Seen how to carry out hypothesis tests relating to the success probability of a single Binomial distribution when the number of trials is small (≤ 20);

10. Seen how to carry out hypothesis tests to compare two Binomial proportions when the sample sizes in a 2 × 2 contingency table are small. [Note that for the cases of one Normal mean and two Normal means, the methods we developed in Section 3 (z-tests and t-tests) already work for arbitrarily small samples.]