Hypothesis Testing Notes

College Of Science, Engineering & Technology

School of Life and Physical Sciences

Foundation Studies

General Mathematics B

Hypothesis Testing

1 Hypotheses

(i) What is a Hypothesis?

A hypothesis is an assumption, a statement made to explain a set of facts and to form a basis for further investigation. It is understood that the statement is subject to proof or checking.

(ii) Examples of hypotheses, or statements, made are:

- 25% of all males over the age of 50 are divorced,

- the average length of time spent combing one's hair is 6 minutes/day,

- trains run on time only 5% of the time,

- the average weekly income per family is $900 ,

- females spend more time watching television than males.

All these hypotheses have one thing in common. The populations of interest are so large that, for various reasons, it would not be feasible to study all the items or persons in the population.

(iii) What is Statistical Testing?

One of the major roles of statisticians in practice is to draw conclusions from a set of data. This process is known as statistical inference, but it must always be borne in mind that, whatever conclusion is reached, it can always be wrong. However, in many circumstances we can put a probability on whether our conclusion is correct, and so we can make a decision that we could say is 'beyond reasonable doubt'. This process is called statistical testing.

Statistical testing begins with a hypothesis: an assumption about the value of a population parameter (which is usually the mean). A sample is taken from the population, and the value of the sample mean is calculated. A decision then has to be made. If there is no significant difference between the sample mean and the hypothesised value, the hypothesis may be accepted. However, if there is a significant difference, the hypothesis may be rejected.

These decisions are made on the significance (or size) of the difference.


(iv) In hypothesis testing, the hypothesis is not accepted or rejected with absolute certainty, but with a definite level of confidence that the error in the decision is small.

Hypothesis testing starts with an assumed value of the population mean, and sampled data is collected to test the assumption made with a specified level of confidence.

(v) Null Hypothesis

The null hypothesis is the hypothesis that is to be tested. The null hypothesis is denoted by H0 .

H - stands for hypothesis
0 - implies nothing has changed.

H0 : μ = μ0 suggests that the population mean μ is as it is claimed to be; in other words, there is no difference between what is observed and what is claimed.

Generally speaking, the null hypothesis is set up for the purpose of either rejecting or accepting it. Alternatively, it is a statement that will be accepted if the sample data fail to provide us with convincing evidence that it is false.

(vi) Alternative Hypothesis

The alternative hypothesis describes what you would believe by rejecting the null hypothesis. It is denoted H1 , and read as 'H one'. The alternative hypothesis (H1) will be accepted if the sample data provide us with evidence that the null hypothesis (H0) should be rejected.

The alternative hypothesis (H1) is the statement that will be accepted if the data from the sample provide us with enough evidence that the null hypothesis should be rejected (ie H0 is false).

(vii) Classification of Hypotheses

The specific wording of a hypothesis for a question should always be expressed in terms of the data for that question. For example, consider the following question:

Is the population mean μ equal to a specified value?


Hypotheses:

H0 : The population mean μ is equal to the specified value.

H1 : The population mean μ is not equal to the specified value.

Another way of expressing the null and alternative hypotheses is in symbols. Where we have only a single sample, the null hypothesis states that the population mean μ takes on a specified value μ0 . That is:

H0 : μ = μ0

(viii) Two-tailed Test

In every case, the alternative hypothesis is the complement of the null hypothesis.

A two-tailed test assumes no preconceived notions about the true value of μ . That is, the true value of μ can be either above or below the hypothesized value μ0 .

The alternative hypothesis is then written as:

H1 : μ ≠ μ0

(ix) One-tailed Test

A one-tailed test assumes that there is a stronger conviction about the true value of μ . That is, either the true value of μ can be greater than μ0 , in which case:

H0 : μ = μ0

H1 : μ > μ0 ,

or the true value of μ can be less than μ0 , in which case:

H0 : μ = μ0

H1 : μ < μ0

(x) Significance Levels

The null hypothesis is rejected if the probability of obtaining a result as unlikely as the one which was obtained is small. How small? This is a question to which the answer is arbitrary and depends to some extent on the use that is to be made of the investigation. It is necessary to choose a probability and agree that probabilities below this are 'unlikely'. The value which is chosen is called the significance level, and it measures the probability of rejecting the null hypothesis when it is true.

The level of significance is the risk we assume of rejecting the null hypothesis (H0) when it is actually true.

The level of significance is designated by α (the Greek letter alpha). It is also known as the level of risk. This may be a more appropriate term because it is the risk taken of rejecting the null hypothesis when it is really true.

The most common significance level used is 0.05 (often called the 5% significance level), which is commonly written as α = 0.05 . Another widely used level is α = 0.01 (or the 1% significance level). Although in theory any significance level may be used, these two are by far the most popular. If we use, say, a 5% significance level, what we are saying in effect is that an event (or sample) that occurs less than 5% of the time is considered unusual. In this case, we will reject H0 as being false if the probability of obtaining a sample like ours is less than 0.05 and accept H0 as being true if this probability is more than 0.05 .

If we use, for example, a 1% significance level then we are saying that an event (or sample) that occurs less than 1% of the time is considered unusual. In this case, we will reject H0 as being false if the probability of obtaining a sample like ours is less than 0.01 and accept H0 as being true if this probability is more than 0.01 .


(xi) Errors

In performing a hypothesis test, a statistician must be aware of the consequences of drawing the wrong conclusion. These consequences assist in deciding which significance level to use. In effect there are two possible errors that can be made when drawing a conclusion about a null hypothesis. These are:

(xii) Type I Error

This error occurs when you reject H0 as being false when H0 is really true. The probability of making a type I error is the significance level of the test, and is designated by the Greek letter α .

A type I error occurs if the null hypothesis (H0) is rejected when it is actually true.

(xiii) Type II Error

This error occurs when you accept H0 as being true when H0 is really false. The probability of making a type II error is denoted by the Greek letter β (beta). Of course we would like to avoid both errors as much as possible. Unfortunately, for a fixed sample size, in trying to avoid one of them we increase the chance of making the other one.

A type II error occurs if we accept the null hypothesis (H0) when it is actually false.

(xiv) Summary

Table 1 summarises the relationship between rejecting/accepting H0 and whether or not H0 is true, in terms of type I and type II errors:

decision      H0 true          H0 false
reject H0     type I error     no error made
accept H0     no error made    type II error

table 1

Example 1

In a courtroom, in a murder case for example, we must test the hypotheses:

H0 : the defendant is innocent
H1 : the defendant is guilty

It is up to the prosecutor to present sufficient evidence to convict.

(a) A type I error occurs when we reject H0 when H0 is true; in other words, a jury convicts an innocent person.

(b) A type II error occurs when we accept H0 when H0 is false; in other words, the jury finds a guilty person innocent.

If a jury returns a verdict of not guilty, this means that there is not sufficient evidence to show the defendant's guilt.

Example 2

A company has developed a drug which it feels may be a cure for certain types of cancer. It has collected vast amounts of data as a result of clinical trials and has asked you whether the drug actually works. The null and alternative hypotheses (in words) are:

H0 : the drug does not work
H1 : the drug does work

(a) A type I error occurs when it is concluded that the drug works when in fact it doesn't.

If we want to avoid a type I error then a small value of α should be chosen, say α = 0.01 .

(b) A type II error occurs when it is concluded that the drug does not work when in fact it does work.

2 z-test Statistic

(i) What is a Test Statistic?

A test statistic is a value, determined from sample information, used to accept or reject the null hypothesis H0 : μ = μ0 .

We will deal with the case of a single sample being chosen from a population and the question of whether that particular sample might be consistent with the rest of the population. Exactly which test statistic is appropriate depends on the information available. However, it is very important that the correct one is used, since the use of an incorrect test statistic can lead to an incorrect conclusion.

In calculating the value of a test statistic, it will be assumed that the following information will always be available:

1 the size (n) of the sample,
2 the mean ( x̄ ) of the sample,
3 the standard deviation (s) of the sample.

(ii) z-test statistic

A z-test statistic is used when the size of the sample is more than 25 (n > 25).

(a) If the standard deviation of the population, σ , is known, then:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

(b) If the value of σ is unknown, the population standard deviation is approximated by the sample standard deviation s , then:

$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

(iii) Standard Error

The expression σ/√n is referred to as the standard error of the mean.
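As a rough illustration of the z-test statistic, the sketch below assumes the standard form z = (x̄ - μ0) / (σ/√n). The function names `standard_error` and `z_statistic` are made up for this note, and the figures are those of the steelworker example (Example 3) later in these notes.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / sqrt(n)

def z_statistic(sample_mean, mu0, sd, n):
    """z = (sample mean - hypothesised mean) / standard error.

    sd is the population standard deviation if it is known,
    otherwise the sample standard deviation (large samples only).
    """
    return (sample_mean - mu0) / standard_error(sd, n)

# Figures from Example 3: sample mean 152.7, mu0 = 150, sigma = 12, n = 100
z = z_statistic(152.7, 150, 12, 100)
print(round(z, 2))
```

The same function covers cases (a) and (b); only the standard deviation passed in changes.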

3 The Critical Value

(i) The critical value is the value of the test statistic at which the result becomes significant. By significant we mean a value that leads to the rejection of the null hypothesis.

A critical value for a z-test statistic is denoted by zc .

The critical value (zc) is the dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.

(ii) The particular critical value to use depends on two things:

1 whether we are using a one-tailed or two-tailed test, and
2 the significance level used (α = 0.01 or α = 0.05).

There are four cases:

case 1: two-tailed test with α = 0.05
case 2: two-tailed test with α = 0.01
case 3: one-tailed test with α = 0.05
case 4: one-tailed test with α = 0.01

(iii) Case 1: Two-tailed Test with α = 0.05

H0 : μ = μ0

H1 : μ ≠ μ0 , α = 0.05

The critical values zc = 1.96 and zc = -1.96 are obtained by considering the z-score when 95% of the region under a normal curve is acceptable (figure 1):

[figure 1: normal curve on the z scale; regions of rejection (0.025 each) below -1.96 and above 1.96; region of acceptance (0.95) between -1.96 and 1.96]
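The link between the 5% significance level and the type I error can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 100, standard deviation 15, samples of 25) are hypothetical, the seed is fixed so the run is repeatable, and `estimate_type_one_rate` is a name invented for this note. Every sample is drawn from a population in which H0 is true, so each rejection is a type I error.

```python
import random
from math import sqrt

def estimate_type_one_rate(mu0=100.0, sigma=15.0, n=25, z_crit=1.96,
                           trials=5000, seed=1):
    """Share of samples rejected by a two-tailed z-test when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a normal population whose mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:  # falls in a region of rejection
            rejections += 1
    return rejections / trials

rate = estimate_type_one_rate()
print(f"share of samples falling in the rejection regions: {rate:.3f}")
```

The printed share should come out close to 0.05, the combined area of the two rejection regions.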

(iv) Case 2: Two-tailed Test with α = 0.01

H0 : μ = μ0

H1 : μ ≠ μ0 , α = 0.01

The critical values zc = 2.58 and zc = -2.58 are obtained by considering the z-score when 99% of the region under a normal curve is acceptable (figure 2):

[figure 2: normal curve on the z scale; region of acceptance (0.99) between -2.58 and 2.58]

(v) Case 3: One-tailed Test with α = 0.05

(a) H0 : μ = μ0 , H1 : μ < μ0
or
(b) H0 : μ = μ0 , H1 : μ > μ0

with α = 0.05

The critical values are zc = -1.645 for (a) or zc = 1.645 for (b). These values are obtained by considering the z-score when 95% of the region under a normal curve is acceptable (figure 3):

[figure 3: (a) normal curve with the region of rejection (0.05) below -1.645 and the region of acceptance (0.95) above it; (b) normal curve with the region of acceptance (0.95) below 1.645 and the region of rejection (0.05) above it]

(vi) Case 4: One-tailed Test with α = 0.01

(a) H0 : μ = μ0 , H1 : μ < μ0
or
(b) H0 : μ = μ0 , H1 : μ > μ0

with α = 0.01

The critical values are zc = -2.33 for (a) or zc = 2.33 for (b). These values are obtained by considering the z-score when 99% of the region under a normal curve is acceptable (figure 4):

[figure 4: (a) normal curve with the region of rejection (0.01) below zc = -2.33; (b) normal curve with the region of rejection (0.01) above zc = 2.33]
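The four critical values can be recovered from the standard normal distribution rather than read from tables. This is a minimal sketch using Python's standard library; `z_critical` is a name chosen for this note, not part of the course material.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Positive critical value zc for a one- or two-tailed z-test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)

for tails in (2, 1):
    for alpha in (0.05, 0.01):
        print(f"{tails}-tailed, alpha = {alpha}: zc = {z_critical(alpha, tails):.3f}")
```

For a lower-tail test the critical value is the negative of the value returned. Rounded to two decimal places, the 1% values agree with the 2.58 and 2.33 quoted in cases 2 and 4.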

Example 3

The efficiency ratings of BHP steelworkers at the Newcastle plant have been studied over a period of many years and found to be normally distributed. The arithmetic mean of the ratings is 150 , and the standard deviation is 12 . Recently, however, young employees have been hired and new training and production methods have commenced. The latest sample of 100 workers revealed a sample mean of 152.7 . Test the hypothesis that the mean of 150 is still correct at:

(a) α = 0.05
(b) α = 0.01


Solution

(a) H0 : μ = 150
H1 : μ ≠ 150
α = 0.05

Note, this is a two-tailed test because the alternative hypothesis does not give a direction of the difference. That is, it does not state whether the mean is greater than or less than 150 .

sample mean x̄ = 152.7
sample size n = 100
population standard deviation σ = 12
population mean μ0 = 150

Because we know the population standard deviation we use the following z-test statistic formula:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

which gives:

$$z = \frac{152.7 - 150}{12 / \sqrt{100}} = \frac{2.7}{1.2} = 2.25$$

From the sample of 100 workers, the z-test statistic is z = 2.25 . Since 2.25 lies outside the region -1.96 < z < 1.96 (case 1), H0 is rejected.

[figure: normal curve on the z scale; regions of rejection (0.025 each) below zc = -1.96 and above zc = 1.96; the test statistic 2.25 lies in the upper region of rejection]


(b) H0 : μ = 150
H1 : μ ≠ 150
α = 0.01

The z-test statistic is z = 2.25 (as before).

Since 2.25 is within the region between -2.58 and +2.58 (case 2), which is the region of acceptance, H0 is not rejected. We cannot conclude that the population mean is different from 150 . The difference between 152.7 and 150 can be attributed to the variation due to sampling (chance). Based on the sample data, we therefore do not reject the null hypothesis and continue to assume that it is true.

We did not reject the null hypothesis that the population mean efficiency rating is 150 , based on sample evidence. However, we did not prove beyond doubt that H0 is true. The only way to prove beyond doubt that it is 150 is to check every efficiency rating in the population - that is, to take a 100 percent sample, which is really a census.

[figure: normal curve on the z scale; reject H0 below -2.58 and above 2.58, accept H0 between them; the test statistic z = 2.25 lies in the region of acceptance]

It should be noted that if the z-test statistic for our example had produced a value less than -2.58 or greater than +2.58 (the critical values), then the null hypothesis would have been rejected in favour of the alternative hypothesis. Note also that as the level of significance changes, so too may the outcome.

It is important to select the significance level before setting up the hypotheses and sampling the population. As seen in this example, the decision on the null hypothesis changed when the level of significance changed.
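One way to see why the decision flips is to compute the probability of obtaining a sample at least as extreme as ours when H0 is true, the quantity compared with α in the section on significance levels. The sketch below does this for z = 2.25 using the standard normal distribution; the helper name `two_tailed_probability` is an illustrative choice.

```python
from statistics import NormalDist

def two_tailed_probability(z):
    """Probability of a z value at least this far from 0, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_probability(2.25)  # z from Example 3
print(f"probability = {p:.4f}")

for alpha in (0.05, 0.01):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The probability is roughly 0.024, which is smaller than 0.05 but larger than 0.01, matching the two decisions reached in parts (a) and (b).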


Example 4

The Myer Department Store issues its own credit card (Myercard). The finance manager of credit services wants to find out if the mean monthly unpaid balance is still $1000 , as it was six months ago. A random check of 172 unpaid balances revealed the sample mean to be $1017.50 and the standard deviation of the sample to be $95 . Should the finance manager conclude that the mean unpaid balance on Myercards is greater than $1000 , or is it reasonable to assume that the difference of $17.50 ($1017.50 - $1000 = $17.50) is due to coincidence (or chance)?

Test the hypothesis that the mean unpaid balance is not different from the usual amount at:

(a) α = 0.05
(b) α = 0.01

Solution

(a) H0 : μ = 1000
H1 : μ > 1000
α = 0.05

This is a one-tailed test.

x̄ = 1017.50
s = 95 (Note, this is the sample standard deviation)
n = 172
μ0 = 1000

Because only the sample standard deviation (s) is known, we use the following z-test statistic formula:

$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

which gives

$$z = \frac{1017.50 - 1000}{95 / \sqrt{172}} = 2.416$$

A one-tailed test at the α = 0.05 level has a critical value zc = 1.645 (case 3(b)).

[figure: normal curve on the z scale; critical value 1.645; the test statistic 2.42 lies in the region of rejection]

As the test statistic (z) of 2.42 lies in the region of rejection for the null hypothesis (i.e. it is greater than the critical value (zc) of 1.645), the null hypothesis (H0) is rejected and the alternative hypothesis (H1) is accepted.

Therefore the decision is: The mean unpaid balance on Myercard is greater than the usual amount of $1000 .

(b) H0 : μ = 1000
H1 : μ > 1000
α = 0.01

The z-test statistic = 2.416 (as before)

A one-tailed test at the α = 0.01 level has a critical value zc = 2.33 (case 4(b)).

As before, this z-test statistic lies in the region of rejection for the null hypothesis, i.e. z > zc. The alternative hypothesis H1 is accepted.

Example 5

Cereal packets are meant to contain 500 gm of cereal. To check the accuracy of this statement, 100 packets were randomly selected and showed a mean of 497 gm with a standard deviation of 20 gm. Is the manufacturer underfilling the packets? Perform a hypothesis test at the 5% level.


Solution

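H0 : μ = 500
H1 : μ < 500
α = 0.05

z-test statistic

$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{497 - 500}{20 / \sqrt{100}} = -1.5$$

A one-tailed test at α = 0.05 has a critical value zc = -1.645 (case 3(a)). We accept the null hypothesis as the z-test statistic lies in the region of acceptance; the sample does not provide evidence that the manufacturer is underfilling the packets.

[figure: normal curve on the z scale; critical value zc = -1.645; the test statistic -1.5 lies in the region of acceptance]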

Example 6

The personnel department of a company has been surveying employees and asking them how long it takes for them to travel from home to work each morning. It found that the distribution of times was skewed to the right with a mean of 21.6 minutes and a standard deviation of 7.2 minutes.

A random sample of 25 employees in the accounts section took an average of 24.1 minutes to travel to work. Are these employees different from other employees in their travel time? Test at a significance level of α = 0.05 .


Solution

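H0 : μ = 21.6
H1 : μ ≠ 21.6
α = 0.05

n = 25

x̄ = 24.1

σ = 7.2

μ0 = 21.6

z-test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{24.1 - 21.6}{7.2 / \sqrt{25}} = 1.74$$

Since z = 1.74 lies within the region -1.96 < z < 1.96 (case 1), H0 is accepted.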

Example 7

A taxi driver claims to make an average of $12.00 on each fare, but the Taxation Office believes that the average is higher than that. To test the driver’s claim, the Taxation Office takes a random sample of 30 fares. The amounts that the taxi driver made on the fares in the sample had a mean of $13.30 with a standard deviation of $2.50 . Test the driver’s claim at α = 0.01 .

Solution

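H0 : μ = 12
H1 : μ > 12
α = 0.01

n = 30

x̄ = 13.30

s = 2.50

μ0 = 12

$$z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{13.30 - 12}{2.50 / \sqrt{30}} = 2.85$$

A one-tailed test at α = 0.01 has a critical value zc = 2.33 (case 4(b)). Since the z-test statistic z = 2.85 lies in the region of rejection, H0 is rejected at α = 0.01 . We therefore do not believe the taxi driver’s claim and conclude that there is evidence that the taxi driver makes an average of more than $12.00 on each fare.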

4 t-test Statistic

(i) A small sample is one of less than 25 observations. If the population standard deviation is unknown, then the z distribution is not the appropriate distribution for the test statistic. The Student t, or the t distribution as it is usually called, is used instead. The characteristics of Student’s t distribution were developed by William S. Gosset, a brewmaster for the Guinness Brewery in Ireland, who published his findings in 1908 using the pen name ‘Student’. Gosset was concerned with the behaviour of the z-statistic formula:

$$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

when s had to be used as an estimator of σ . He was especially worried about the discrepancy between s and σ when s was calculated from a very small sample. He showed that his t distribution (which is flatter, more ‘spread out’, than the normal z distribution) gave more correct results for small samples from a population which displayed a normal distribution.


Exercise 1(a)

It is important to remember that the critical value for a given level of significance is greater for small samples than for larger samples. This is because there is more variability in sample means computed from small samples; we therefore have less confidence in the resulting estimates and are less likely to reject the null hypothesis.

(ii) We can then use a t-test statistic defined as:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

(iii) Unlike the z-test statistic, the t-test statistic has associated with it a quantity called degrees of freedom. In this case the degrees of freedom are denoted by the Greek letter ν (nu) and are defined by:

ν = n - 1

(iv) Critical Value for t-test

The critical value in any t-distribution, tc , is found in the Student t distribution tables.

To use these tables, the following need to be ascertained:

1 the level of significance: α
2 the number of degrees of freedom: ν
3 the type of test in question: one-tailed or two-tailed

(v) To find tc , look down the left-hand side for the row with the appropriate degrees of freedom, and across the top for the appropriate test (either one-tailed or two-tailed) and the significance level used.


Example 8

The General Insurance Company has established over a period of years that it costs $70 on average to process the paperwork, pay the assessor and finalise a claim. This cost, when compared with that of other insurance firms, is said to be much higher. As a result, cost-cutting measures were instituted. In order to evaluate the impact of these new measures, a sample of 22 recent claims was chosen at random and costs were recorded. It was found that the sample mean, x̄ , and the standard deviation, s , of the sample were $66 and $10 , respectively. At the 0.01 level of significance, is there a reduction in the average cost, or can the difference of $4 ($66 - $70) be attributed to chance?

Solution

H0 : μ = 70
H1 : μ < 70
α = 0.01

The test is one-tailed because there is interest only in whether or not there has been a reduction in cost. The inequality in the alternative hypothesis points to the region of rejection in the left tail of the distribution.

t-test statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{66 - 70}{10 / \sqrt{22}} = -1.876$$

tc - critical value

One-tailed test, α = 0.01

ν = 21 (degrees of freedom)

Using the t-distribution tables:

tc = 2.518 ; however, as this is a one-tailed, “less than” situation, tc = -2.518

[figure: t distribution on the t scale; critical value tc = -2.52; the test statistic -1.876 lies in the region of acceptance]

As the t-test statistic lies in the region of acceptance, we accept the null hypothesis. Therefore, based on the sample results, there is not sufficient evidence that the cost-cutting measures have reduced the mean cost per claim to less than $70 .
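The arithmetic of Example 8 can be checked with a short sketch, assuming the one-sample form t = (x̄ - μ0) / (s/√n). The critical value is not computed here; it is the -2.518 read from the t tables above, and `t_statistic` is a name chosen for this note.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic using the sample standard deviation s."""
    return (sample_mean - mu0) / (s / sqrt(n))

n = 22
t = t_statistic(66, 70, 10, n)
degrees_of_freedom = n - 1
t_crit = -2.518  # lower-tail value taken from the t tables, not computed

print(f"t = {t:.3f} with {degrees_of_freedom} degrees of freedom")
# Lower-tail test: reject H0 only if t falls below the critical value.
print("reject H0" if t < t_crit else "accept H0")
```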

Example 9

Experience has shown that the number of matches in boxes follows a normal distribution. A manufacturer claims that the average number of matches in its boxes is 50 .

A customer purchases a random sample of 9 boxes and counts the contents of each box. They were:

49 50 51 46 48 45 52 47 48

Based on this sample, should the customer believe the manufacturer's claim? Use a two-sided test at α = 0.05 .

Solution


H0 : μ = 50
H1 : μ ≠ 50
α = 0.05

t-test statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = -2.036$$

tc - critical value

Using the t-distribution tables, with

ν = 8 (degrees of freedom)
two-tailed test
tc = 2.306

Since t = -2.036 lies in the acceptable region, i.e. -2.306 < t < 2.306 , we accept H0 at the 0.05 level of significance.

The claim made by the company that there is an average of 50 matches in its boxes may well be true.
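When the raw observations are available, the sample mean and sample standard deviation have to be worked out first. The sketch below does this for the nine box counts of Example 9 with Python's `statistics` module (`stdev` divides by n - 1). Computed without intermediate rounding, the statistic comes out near -2.03, slightly different from the -2.036 quoted above, presumably because of rounding in the hand calculation; the decision is unchanged. The critical value 2.306 is taken from the t tables.

```python
from math import sqrt
from statistics import mean, stdev

matches = [49, 50, 51, 46, 48, 45, 52, 47, 48]
mu0 = 50          # manufacturer's claimed average
t_crit = 2.306    # two-tailed, 8 degrees of freedom, from the t tables

n = len(matches)
x_bar = mean(matches)
s = stdev(matches)            # sample standard deviation (divisor n - 1)
t = (x_bar - mu0) / (s / sqrt(n))

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, t = {t:.3f}")
# Two-tailed test: reject H0 only if t lies outside (-t_crit, t_crit).
print("reject H0" if abs(t) > t_crit else "accept H0")
```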

Example 10

In a random sample of 20 components taken from a production line, the mean length of each component in this sample is 108.6 millimetres with a standard deviation of 6.3 millimetres. Given that each component should measure 105 millimetres long and that the population has proved to be normal, is there enough statistical evidence to show that the production line is producing components that are of an incorrect length? Test at the 5 percent level of significance.

Solution


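H0 : μ = 105
H1 : μ ≠ 105
α = 0.05

t-test statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{108.6 - 105}{6.3 / \sqrt{20}} = 2.556$$

This is a two-tailed test at a level of significance of α = 0.05 , with 19 degrees of freedom.

tc - critical value

Using the t-distribution tables:
tc = 2.09

[figure: t distribution; regions of rejection (0.025 each) below -2.09 and above 2.09; the test statistic 2.556 lies in the upper region of rejection]

As the t-test statistic lies in the region of rejection, we reject the null hypothesis. There is evidence that the components produced on the production line are not of the required length of 105 millimetres.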

(vi) Summary of Steps in One Sample Hypothesis Testing

(a) Write down the null hypothesis H0 , and choose an appropriate form for the alternative hypothesis H1 : either not equal to (a two-tailed test, H1 : μ ≠ μ0) or a one-tailed test, either upper tail (H1 : μ > μ0) or lower tail (H1 : μ < μ0).

(b) Use the appropriate test statistic to calculate the value of z or t .

(c) Set up a decision rule (at the α level of significance) by finding the critical value for the test statistic.

(d) Compare the calculated z or t value with the critical z or t value and decide, using the decision rule, to either accept or reject the null hypothesis.

So far we have only considered one-sample tests. However, the general principles apply to all hypothesis testing in statistics, for problems involving larger numbers of samples and other instances where a conclusion is to be drawn from data collected.

It should be emphasised that statistics is not an exact science – it doesn’t prove anything. What it does do is provide us with a guide for making reasonable conclusions based on the evidence before us, and even provide us with the probability that we have made an error. However, the chance always remains that our conclusions may be incorrect!
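The four steps above can be sketched as one function for the large-sample (z) case. This is an illustrative outline, not a prescribed method: the function name, the `alternative` labels and the returned tuple are choices made for this note, and the critical value is computed from the standard normal distribution instead of being read from cases 1 to 4.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sd, n, alpha, alternative):
    """Return (z, critical value, reject H0?) for H0: mu = mu0.

    alternative is 'two-sided' (H1: mu != mu0), 'greater' (H1: mu > mu0)
    or 'less' (H1: mu < mu0).
    """
    # Step (b): calculate the test statistic.
    z = (sample_mean - mu0) / (sd / sqrt(n))
    # Steps (c) and (d): find the critical value and apply the decision rule.
    std_normal = NormalDist()
    if alternative == "two-sided":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif alternative == "greater":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    elif alternative == "less":
        z_crit = std_normal.inv_cdf(alpha)
        reject = z < z_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, z_crit, reject

# Example 4(a): Myercard balances, upper-tail test at alpha = 0.05
z4, crit4, reject4 = one_sample_z_test(1017.50, 1000, 95, 172, 0.05, "greater")
# Example 5: cereal packets, lower-tail test at alpha = 0.05
z5, crit5, reject5 = one_sample_z_test(497, 500, 20, 100, 0.05, "less")
# Example 7: taxi fares, upper-tail test at alpha = 0.01
z7, crit7, reject7 = one_sample_z_test(13.30, 12, 2.50, 30, 0.01, "greater")

for label, z, crit, reject in [("Example 4(a)", z4, crit4, reject4),
                               ("Example 5", z5, crit5, reject5),
                               ("Example 7", z7, crit7, reject7)]:
    verdict = "reject H0" if reject else "accept H0"
    print(f"{label}: z = {z:.3f}, zc = {crit:.3f}, {verdict}")
```

Step (a), choosing the hypotheses, stays with the analyst: it determines the `alternative` argument.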

5 Two sample Hypothesis Testing

(i) Another important use of statistical testing is to see whether there is a significant difference between the means of samples from two populations. For example, a mathematics teacher may wish to know whether students taught with the aid of a computer have significantly higher grades than those taught with traditional methods.

(ii) The symbols used to describe aspects of each sample are shown in the table below (table 2). Note, the two samples are drawn independently from the population:

symbol               sample 1   sample 2
size                 n1         n2
mean                 x̄1         x̄2
standard deviation   s1         s2

table 2

(iii) We wish to examine the difference between the means of the two samples: x̄1 - x̄2


Exercise 1(b)

Generally speaking, when two sample means are different, we have two hypotheses to explore. First, there is the null hypothesis that the two populations from which the two samples originate have the same mean, μ1 = μ2 . If this is the case, then the observed difference between the two sample means is not significant and is attributed to chance or random sampling fluctuations. The alternative hypothesis to be explored is that the two samples are drawn from populations which have different means. If this hypothesis is true, the observed difference between the two sample means is deemed significant.

When two sample means are different, how can we decide whether or not the difference between the two means is significant? The standard procedure is to test the validity of the null hypothesis, which states that μ1 = μ2 , utilizing the information from the two samples. On the basis of the evidence produced by the two samples, we will either accept or reject the null hypothesis. If the null hypothesis is rejected, the observed difference between the two sample means is significant. However, the observed difference is not significant whenever the null hypothesis is accepted.

Symbolically we write:

Two Tailed Test          One Tailed Test

H0 : μ1 = μ2             H0 : μ1 = μ2
H1 : μ1 ≠ μ2             H1 : μ1 > μ2 or H1 : μ1 < μ2

Having established the appropriate null and alternative hypotheses, the appropriate test statistic needs to be used, depending on the sample size.

(iv) We will consider the situation when the sample size is large (n1, n2 > 25).

This requires the z-test statistic.

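(v) Standard Deviation: known

When two samples are large, and the population standard deviations, σ1 and σ2 , are known, the standard error, σ_d (where d indicates “difference”), of x̄1 - x̄2 is given by the expression:

$$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Note: the standard error for a single sample is given by:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

(vi) Standard Deviation: not known

When two samples are large, and the population standard deviations are not known, the standard error, s_d , of x̄1 - x̄2 is given by the expression:

$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

(vii) The z-statistic used for one sample hypothesis testing was given by:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

When calculating the z-statistic for two sample hypothesis testing, we replace:

x̄ with x̄1 - x̄2
μ0 with μ1 - μ2
σ/√n with σ_d (or s_d)

which gives:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_d}$$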


Example 11

To compare the average life of two brands of 9-volt batteries, a sample of 100 batteries from each brand is tested. The sample selected from the first brand shows an average life of 47 hours and a standard deviation of 4 hours. A mean life of 48 hours and a standard deviation of 3 hours are recorded for the sample from the second brand. Is the observed difference between the means of the two samples significant at the 0.01 level?

Solution

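There are two hypotheses:

H0 : μ1 = μ2 and H1 : μ1 ≠ μ2

Now

$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{4^2}{100} + \frac{3^2}{100}} = 0.5$$

z-test statistic:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_d} = \frac{(47 - 48) - 0}{0.5} = -2.0$$

Now zc = ±2.58 (case 2) at α = 0.01 . Since -2.58 < z < 2.58 ,

we accept the null hypothesis.

That is, the difference between the means of the two samples is not significant at the 0.01 level.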
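The battery comparison can be reproduced with a few lines of code. The sketch assumes the usual large-sample standard error of the difference, the square root of s1²/n1 + s2²/n2, and a null hypothesis of no difference (μ1 - μ2 = 0); `two_sample_z` is a name invented for this note, and 2.58 is the case 2 critical value.

```python
from math import sqrt

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for H0: mu1 = mu2 with two large independent samples."""
    se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se_diff

# Example 11: brand 1 (47 h, sd 4 h, n = 100) vs brand 2 (48 h, sd 3 h, n = 100)
z = two_sample_z(47, 4, 100, 48, 3, 100)
z_crit = 2.58  # two-tailed test at alpha = 0.01 (case 2)

print(f"z = {z:.2f}")
print("reject H0" if abs(z) > z_crit else "accept H0")
```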


Example 12

The efficiency of two training centres in a large company is to be evaluated. The test results of a group of students from each training centre are given below:

sample               centre I   centre II
size                 50         40
mean                 82.5       77
standard deviation   7.2        9.1

Determine whether there is a significant difference between the centres at the level of significance?

Solution

z-test statistic:

we reject the null hypothesis. There is a significant difference at .

Example 13

Two research laboratories have independently produced drugs that provide relief to arthritis sufferers. The first drug was tested on a group of 90 arthritis victims and produced an average of 8.5 hours of relief, with a standard deviation of 1.8 hours. The second drug was tested on 80 arthritis victims, producing an average of 7.9 hours of relief, with a standard deviation of 2.1 hours. At the 0.05 level of significance, does the second drug provide a significantly shorter period of relief?

Solution

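                     first drug   second drug
size                 n1 = 90      n2 = 80
mean                 x̄1 = 8.5     x̄2 = 7.9
standard deviation   s1 = 1.8     s2 = 2.1

This is a one-tailed test.

H0 : μ1 = μ2 and H1 : μ1 > μ2 , with μ1 - μ2 = 0

Now z-test statistic =

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_d} = \frac{8.5 - 7.9}{\sqrt{\frac{1.8^2}{90} + \frac{2.1^2}{80}}} = 1.99$$

Now zc = 1.645 (one-tailed test at α = 0.05) (case 3). Since z > zc , we therefore reject H0 .

The second drug does provide significantly shorter relief.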


Exercise 2(a)

6 t-test Statistic – two samples

(i) Standard Deviation: unknown

When two samples are small (n1, n2 < 25), the standard error, s_d , of x̄1 - x̄2 is given by the expression:

(ii) Degrees of Freedom

With a t-test, the degrees of freedom, ν , are given by:

$$\nu = n_1 + n_2 - 2$$

(iii) The t-statistic used for one sample hypothesis testing was given by:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

When calculating the t-statistic for two sample hypothesis testing, we replace:

x̄ with x̄1 - x̄2
μ0 with μ1 - μ2 = 0
s/√n with s_d

which gives:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_d}$$

Example 14

A building society wishes to determine if there is a significant difference between the activity in the cheque accounts of two of its branches.

The following data was obtained:

sample               branch I   branch II
size                 12         10
mean                 $1000      $900
standard deviation   $150       $120

Is there a significant difference between the two branches at the 5% level?

Solution

H0 : μ1 = μ2 and H1 : μ1 ≠ μ2 (two-tailed test)

x̄1 - x̄2 = 100

also

t–test statistic

gives:


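Now tc at α = 0.05 for a two-tailed test with ν = 12 + 10 - 2 = 20 degrees of freedom is tc = 2.086 . Since the t-test statistic lies between -2.086 and 2.086 , we accept the null hypothesis.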

There is no significant difference between the branches at the 5% level.

Example 15

A reading test is given to an elementary school class that consists of 12 Anglo-American children and 10 Mexican-American children. The results of the test are:

Anglo-American Mexican-American

Is the difference between the mean of the two groups significant at the 0.05 level?

Solution

Level of significance = 0.05

To test the null hypothesis, we compute the observed value of t as:

= 4


t-test statistic

With ν = 20 degrees of freedom (ν = 12 + 10 - 2) at the 0.05 level, the t-critical value tc is:

tc = 2.086 (two-tailed test), so we accept the null hypothesis.

The difference between the means is not significant at the 0.05 level.

Example 16

A consumer-research organization routinely selects several car models each year and evaluates their fuel efficiency. In this year’s study of two similar subcompact models from two different automakers, the average gas mileage for twelve cars of brand A was 27.2 miles per gallon, with a standard deviation of 3.8 mpg . The nine brand B cars that were tested averaged 32.1 mpg , with a standard deviation of 4.3 mpg . At α = 0.01 , should it conclude that brand B cars have higher average gas mileage than do brand A cars?

Solution

H0 : μA = μB and H1 : μA < μB (one-tailed)

t-test statistic

t-critical value

d.f. = 12 + 9 - 2 = 19

One-tailed test at α = 0.01

tc = -2.539

reject H0 :

Brand B does have a significantly higher average gas mileage than Brand A at the 1% level of significance.
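The standard error expression for small samples did not survive in these notes, so the sketch below assumes the common pooled-variance form, in which the two sample variances are combined with weights n1 - 1 and n2 - 1 and the degrees of freedom are n1 + n2 - 2. This is an assumption, not necessarily the formula the course uses, and `pooled_two_sample_t` is a hypothetical name. The critical values are the table values quoted in Examples 14 and 16.

```python
from math import sqrt

def pooled_two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom for H0: mu1 = mu2 (pooled variance)."""
    df = n1 + n2 - 2
    # Assumed pooled estimate of the common population variance.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df

# Example 14: two branches, two-tailed test at the 5% level (tc = 2.086)
t14, df14 = pooled_two_sample_t(1000, 150, 12, 900, 120, 10)
print(f"Example 14: t = {t14:.3f}, df = {df14}:",
      "reject H0" if abs(t14) > 2.086 else "accept H0")

# Example 16: brand A vs brand B, lower-tail test at the 1% level (tc = -2.539)
t16, df16 = pooled_two_sample_t(27.2, 3.8, 12, 32.1, 4.3, 9)
print(f"Example 16: t = {t16:.3f}, df = {df16}:",
      "reject H0" if t16 < -2.539 else "accept H0")
```

Under this assumption both decisions agree with the conclusions stated in the examples.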


Exercise 2(b)

7 Hypothesis Testing of Proportions

(i) So far we have discussed hypothesis testing involving the mean of a sample (one-sample test), or the means of two different samples (two-sample test).

In each case we have dealt with large samples (z statistic) and small samples (t statistic).

In this section we are going to discuss hypothesis testing of proportions, that is, proportion of occurrences in a population.

(ii) Normal Approximation to the Binomial Distribution

When dealing with proportions the binomial distribution is the theoretically correct distribution to use, since the data is discrete, not continuous. It can be shown that as a sample size increases, the binomial distribution approaches the normal in its characteristics.

We will use this normal approximation to the binomial when dealing with the hypothesis testing of proportions.

(iii) Sample proportions

The sample proportion, \(\bar{p}\), is the proportion of successes observed in a given sample. The sample proportion is the best estimate when the population proportion is not known.

(iv) Mean and Standard Deviation

The mean or expected proportion of a sample, \(\mu_{\bar{p}}\), equals the population proportion:

\[\mu_{\bar{p}} = p\]

The standard deviation of a sample proportion is also referred to as the standard error of the proportion and is given by:

\[\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}\]

Note: \(q = 1 - p\) and \(n\) = number of independent binomial trials.

(v) Hypotheses

When dealing with testing a single proportion, the null hypothesis is that the expected proportion equals the population proportion:

\(H_0: \mu_{\bar{p}} = p\)

alternatively,

\(H_1: \mu_{\bar{p}} \neq p\)

(vi) z-test statistic

To test whether the null hypothesis is accepted or rejected we determine the z statistic and test this value against the critical value (zc) at a given level of significance.

The z-statistic when dealing with proportions is given by:

\[z = \frac{\bar{p} - p}{\sigma_{\bar{p}}}\]

where \(\bar{p}\) = sample proportion, \(p\) = population proportion, and \(\sigma_{\bar{p}}\) = standard error.

Example 17

Consider a company that is evaluating the promotability of its employees; that is, determining the proportion of them whose ability, training and experience qualify them for promotion.

The company estimates that 80% of their employees are promotable. After interviewing a random sample of 150 employees, a committee finds that only 70% of the sample deserve promotion. The company wishes to test the hypothesis that 80% of their workforce are promotable at a 5% level of significance.

Solution

The null hypothesis \(H_0\) is that the original estimate of the proportion of promotable employees is correct:

\(H_0: p = 0.8\)

alternatively,

\(H_1: p \neq 0.8\) at \(\alpha = 0.05\)

Note also that \(\bar{p} = 0.7\), \(\bar{q} = 0.3\) and \(n = 150\).

We are to test the expected proportion of the sample against the actual sample proportion.

The standard error:

\[\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.8)(0.2)}{150}} = 0.0327\]

The z-test statistic:

\[z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.7 - 0.8}{0.0327} = -3.058\]

The critical value \(z_c\) at the 5% level (two-tailed test): \(z_c = \pm 1.96\)

[Figure: normal curve on the z scale with the acceptance region (0.95) between \(z_c = -1.96\) and \(z_c = 1.96\); the test statistic z = -3.058 lies in the left-hand rejection region.]

We reject the null hypothesis at \(\alpha = 0.05\).

The company should conclude that there is a significant difference between the expected (or hypothesized) proportion and the observed or actual proportion at the 5% level of significance. The true proportion of promotable employees is not 80%.
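The arithmetic of Example 17 can be checked with a short sketch. It assumes the standard error is built from the hypothesized proportion (p = 0.8, q = 0.2), which reproduces the quoted test statistic; the function name is illustrative only.

```python
import math

def proportion_z(sample_prop, hyp_prop, n):
    """Return (standard error, z) for a one-sample test of a proportion."""
    # Standard error uses the hypothesized proportion p and q = 1 - p.
    std_error = math.sqrt(hyp_prop * (1 - hyp_prop) / n)
    z = (sample_prop - hyp_prop) / std_error
    return std_error, z

std_error, z = proportion_z(sample_prop=0.70, hyp_prop=0.80, n=150)
reject_h0 = abs(z) > 1.96  # two-tailed test at the 5% level
print(f"standard error = {std_error:.4f}, z = {z:.3f}, reject H0: {reject_h0}")
```

Carrying full precision gives z of about -3.06. The small gap to the -3.058 quoted in the example is consistent with rounding the standard error to four decimal places before dividing.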

Example 18

A member of a public interest group concerned with industrial pollution estimates that less than 60% of all factories comply with pollution standards.

A sample of 60 factories is taken, with 33 complying with the pollution standards.

Test the null hypothesis that 60% are complying with pollution standards at the 1% level of significance.

Solution

\(H_0: p = 0.6\), \(H_1: p < 0.6\), \(\alpha = 0.01\)

\(\bar{p} = 33/60 = 0.55\)

standard error:

z-test statistic: z = -0.779

critical value \(z_c\) at 1% level: \(z_c = -2.33\) (one-tailed test).

[Figure: normal curve on the z scale with the region of rejection to the left of the critical value \(z_c = -2.33\) and the region of acceptance to its right; the z-test statistic z = -0.779 lies in the region of acceptance.]

We accept the null hypothesis. Even though the actual sample proportion is indeed below the expected proportion, it is not significantly below this figure at the 1% level of significance.

Example 19

The sponsor of a weekly television show would like the studio audience to consist of an equal number of men and women. Out of 400 persons attending the show on a given night, 220 are men. Using a level of significance of 0.01, can the sponsor conclude that the desired sex composition of the audience is not properly maintained?

Solution

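\(H_0: p = 0.5\), \(H_1: p \neq 0.5\), \(\alpha = 0.01\)

\(\bar{p} = 220/400 = 0.55\), n = 400

standard error:

\[\sigma_{\bar{p}} = \sqrt{\frac{(0.5)(0.5)}{400}} = 0.025\]

z-test statistic:

\[z = \frac{0.55 - 0.5}{0.025} = 2.0\]

critical value \(z_c\):

At the 1% level \(z_c = \pm 2.58\) (two-tailed test).

Since z = 2.0 lies between -2.58 and 2.58, we accept the null hypothesis at this level of significance.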

Example 20

The Department of Health, Education and Welfare reports that only 10% of all persons over 65 years old are covered by adequate private health insurance. What would the Australian Medical Association (AMA) conclude about the Department’s claim if, out of a random sample of 900 elderly persons, 99 possessed adequate private health insurance? Use a level of significance of 0.05.

Solution

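\(H_0: p = 0.10\), \(H_1: p > 0.10\), \(\alpha = 0.05\)

\(\bar{p} = 99/900 = 0.11\), n = 900

standard error:

\[\sigma_{\bar{p}} = \sqrt{\frac{(0.10)(0.90)}{900}} = 0.01\]

z-test statistic:

\[z = \frac{0.11 - 0.10}{0.01} = 1.0\]

critical value: \(z_c = 1.64\) (one-tailed test at \(\alpha = 0.05\))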

Since z is 1.0 , which is less than 1.64 , the null hypothesis cannot be rejected using the .05 level of significance. In other words, the AMA does not have enough evidence to reject the claim made by the Department of Health, Education, and Welfare.
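Examples 19 and 20 differ mainly in which tail of the normal curve leads to rejection. The sketch below separates the test statistic from the decision rule. The function names and the 'two'/'upper'/'lower' labels are hypothetical conveniences, and the standard error is assumed to use the hypothesized proportion.

```python
import math

def proportion_z(successes, n, hyp_prop):
    """z statistic for a single proportion, from a count of successes."""
    sample_prop = successes / n
    std_error = math.sqrt(hyp_prop * (1 - hyp_prop) / n)
    return (sample_prop - hyp_prop) / std_error

def reject_null(z, z_critical, tail):
    """tail is 'two', 'upper' or 'lower'; z_critical is given as a positive number."""
    if tail == "two":
        return abs(z) > z_critical
    if tail == "upper":
        return z > z_critical
    if tail == "lower":
        return z < -z_critical
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

z19 = proportion_z(220, 400, 0.5)    # Example 19: men in the studio audience
z20 = proportion_z(99, 900, 0.10)    # Example 20: private health insurance
print(round(z19, 2), reject_null(z19, 2.58, "two"))
print(round(z20, 2), reject_null(z20, 1.64, "upper"))
```

Neither statistic passes its critical value, so both null hypotheses are retained, as in the worked solutions.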


Exercise 3(a)

8 Hypothesis Testing Between the Proportions

(i) In this section we will discuss the difference between the proportions of two samples.

(ii) Sample Proportions

For two samples, containing \(n_1\) and \(n_2\) data values respectively:

\(\bar{p}_1\) is the sample proportion with \(n_1\) values

\(\bar{p}_2\) is the sample proportion with \(n_2\) values

(iii) Mean of Sample Proportions

The mean or expected proportion for each respective sample equals its population proportion:

\(\mu_{\bar{p}_1} = p_1\) and \(\mu_{\bar{p}_2} = p_2\)

(iv) Hypotheses

If \(p_1\) and \(p_2\) denote the population proportions, then the null hypothesis is that there is no significant difference in their proportions:

\(H_0: p_1 = p_2\)

The alternative hypothesis would be either:

\(H_1: p_1 \neq p_2\) (two-tailed), or

\(H_1: p_1 < p_2\) or \(H_1: p_1 > p_2\) (one-tailed)

(v) We wish to examine the difference between the two sample proportions: \(\bar{p}_1 - \bar{p}_2\)

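(vi) Standard Error

The standard error (standard deviation) of the difference between the two proportions \(\bar{p}_1\) and \(\bar{p}_2\) is given by:

\[\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\]

However, we do not know the population proportions, and thus we need to estimate them from the sample proportions. So in practice we calculate the estimated standard error using:

\[\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}}\]

(vii) Overall Proportion

If we hypothesize that there is no difference between the two proportions, then our best estimate of the overall proportion of successes is the combined proportion of successes in both samples.

If \(\hat{p}\) is the overall proportion of success for both samples, then:

\[\hat{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}\]

(viii) The standard error of the difference between the two proportions using the overall proportion, \(\hat{p}\), is given by:

\[\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}}, \qquad \hat{q} = 1 - \hat{p}\]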

(ix) z-test statistic

To test whether the null hypothesis is accepted or rejected we determine the z-score and then test this value against the critical value (zc) at a given level of significance.

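When testing one proportion, we used:

\[z = \frac{\bar{p} - p}{\sigma_{\bar{p}}}\]

When calculating the z-score for two-proportion hypothesis testing, we replace the sample proportion with the difference between the sample proportions, the population proportion with the hypothesized difference between the population proportions, and the standard error with the standard error of the difference:

\[z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}}\]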

Example 21

A drug company tests two compounds intended to reduce blood pressure levels. The compounds are given to different groups of animals.

Group 1 contained 100 animals, with 71 showing lower blood pressure levels with drug A .

Group 2 contained 90 animals, with 58 showing lower blood pressure levels with drug B .

Test to see if there is a difference between the effectiveness of the two drugs at a 0.05 level of significance.

Solution

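Group 1: \(n_1 = 100\), \(\bar{p}_1 = 71/100 = 0.71\)

Group 2: \(n_2 = 90\), \(\bar{p}_2 = 58/90 = 0.644\)

The null hypothesis is that there is no difference between their population proportions.

\(H_0: p_1 = p_2\)

with,

\(H_1: p_1 \neq p_2\) at \(\alpha = 0.05\)

Two-tailed test

(a) Overall Proportion Estimate

\[\hat{p} = \frac{71 + 58}{100 + 90} = \frac{129}{190} = 0.679\]

(b) Standard Error

\[\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{(0.679)(0.321)}{100} + \frac{(0.679)(0.321)}{90}} = 0.0678\]

(c) Critical Value

At a 5% level of significance for a two-tailed test the critical values are \(z_c = \pm 1.96\) (case 1).

(d) z-test statistic

Under the null hypothesis \(p_1 - p_2 = 0\), so:

\[z = \frac{(0.71 - 0.644) - 0}{0.0678} = 0.973\]

[Figure: normal curve on the z scale with the region of acceptance (0.95) between \(z_c = -1.96\) and \(z_c = 1.96\) and a region of rejection (0.025) in each tail; the z-statistic 0.973 lies in the region of acceptance.]

The difference between the two sample proportions lies within the acceptance limits. Thus, we accept the null hypothesis and conclude that these two drugs produce effects on blood pressure that are not significantly different (at \(\alpha = 0.05\)).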
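The steps (a) to (d) of Example 21 can be sketched as a single function working from the raw counts. It uses the overall (combined) proportion to estimate the standard error, as described in this section; the function and variable names are illustrative assumptions.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (overall proportion, standard error, z) under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    # (a) overall proportion of successes in both samples combined
    p_overall = (x1 + x2) / (n1 + n2)
    q_overall = 1 - p_overall
    # (b) standard error of the difference, using the overall proportion
    std_error = math.sqrt(p_overall * q_overall * (1 / n1 + 1 / n2))
    # (d) z statistic; the hypothesized difference p1 - p2 is zero
    return p_overall, std_error, (p1 - p2) / std_error

p_overall, std_error, z = two_proportion_z(71, 100, 58, 90)
accept_h0 = abs(z) <= 1.96  # (c) two-tailed critical value at the 5% level
print(f"p = {p_overall:.3f}, s.e. = {std_error:.4f}, z = {z:.2f}, accept H0: {accept_h0}")
```

Without intermediate rounding z is about 0.97, which matches the worked value to two decimal places and leads to the same decision.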


Example 22

A dental inspector found that, in area A, 20 out of a random sample of 200 had tooth decay, while in area B, 18 out of a random sample of 150 had tooth decay.

Does this indicate any difference in proportions at a 1% level of significance?

Solution

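Area A: \(n_1 = 200\), \(\bar{p}_1 = 20/200 = 0.10\)

Area B: \(n_2 = 150\), \(\bar{p}_2 = 18/150 = 0.12\)

The null hypothesis \(H_0\) is that there is no difference in the proportion of tooth decay in the two areas.

\(H_0: p_1 = p_2\)

\(H_1: p_1 \neq p_2\) (two-tailed), \(\alpha = 0.01\)

(a) Overall Proportion Estimate

\[\hat{p} = \frac{20 + 18}{200 + 150} = \frac{38}{350} = 0.109\]

(b) Standard Error

\[\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{(0.109)(0.891)}{200} + \frac{(0.109)(0.891)}{150}} = 0.0337\]

(c) Critical Value

At a 1% level of significance for a two-tailed test the critical values are \(z_c = \pm 2.58\).

(d) z-test statistic

The sample proportions differ by 0.02, which gives a z-test statistic of magnitude 0.59.

[Figure: normal curve on the z scale with the region of acceptance (0.99) between \(z_c = -2.58\) and \(z_c = 2.58\) and a region of rejection (0.005) in each tail; the z-test statistic lies in the region of acceptance.]

The difference between the two sample proportions is not significant at the 1% level.

Accept the null hypothesis.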


Example 23

A coal-fired power plant is considering two different systems for pollution abatement. The first system has reduced the emission of pollutants to acceptable levels 68 percent of the time, as determined from 200 air samples. The second, more expensive system has reduced the emission of pollutants to acceptable levels 76 percent of the time, as determined from 250 air samples. If the expensive system is significantly more effective than the inexpensive system in reducing pollutants to acceptable levels, then the management of the power plant will install the former system. Which system will be installed if management uses a significance level of 0.01 in making its decision?

Solution

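Let sample 1 be the first (inexpensive) system, with \(n_1 = 200\) and \(\bar{p}_1 = 0.68\), and sample 2 the second (expensive) system, with \(n_2 = 250\) and \(\bar{p}_2 = 0.76\).

\(H_0: p_1 = p_2\)

\(H_1: p_1 < p_2\) (one-tailed test at \(\alpha = 0.01\))

(a) Overall Proportion Estimate

The overall proportion of successes in both samples combined is:

\[\hat{p} = \frac{(200)(0.68) + (250)(0.76)}{200 + 250} = \frac{326}{450} = 0.724\]

(b) Standard Error

The standard error of the difference between the two proportions is:

\[\sqrt{\frac{(0.724)(0.276)}{200} + \frac{(0.724)(0.276)}{250}} = 0.0424\]

(c) Critical Value

The critical value for a one-tailed test at the 1% level has magnitude 2.33.

(d) z-test statistic

The sample proportions differ by 0.76 - 0.68 = 0.08, so the z-test statistic has magnitude 0.08 / 0.0424 = 1.89, which is less than 2.33.

Accept \(H_0\): the expensive system is not significantly more effective, so management will install the cheaper system.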
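This section describes two ways of estimating the standard error of the difference: from the two sample proportions separately, or from the overall proportion. The sketch below applies both to the data of Example 23 to show that the decision does not hinge on the choice here. Variable names are illustrative, and 2.33 is the one-tailed critical value at the 1% level.

```python
import math

n1, p1 = 200, 0.68   # first (inexpensive) system
n2, p2 = 250, 0.76   # second (expensive) system

# Estimate 1: standard error from the two sample proportions separately.
se_separate = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Estimate 2: standard error from the overall (combined) proportion.
p_overall = (n1 * p1 + n2 * p2) / (n1 + n2)
se_overall = math.sqrt(p_overall * (1 - p_overall) * (1 / n1 + 1 / n2))

z_separate = (p2 - p1) / se_separate
z_overall = (p2 - p1) / se_overall

z_c = 2.33  # one-tailed critical value at the 1% level
significant = z_overall > z_c
print(f"z (separate) = {z_separate:.2f}, z (overall) = {z_overall:.2f}, significant: {significant}")
```

Both versions give a z of roughly 1.88 to 1.89, short of 2.33, so the expensive system is not shown to be significantly better at the 1% level.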

9 Chi-Square Analysis

(i) We have investigated hypothesis tests from either one or two samples. We used one-sample tests to determine whether a mean or a proportion was significantly different from a hypothesized value. In the case of two-sample tests, we examined the difference between the two means or two proportions, to decide whether this difference was significant.

(ii) Chi–square Tests

Suppose we have more than two proportions to examine. If this is the case the current z-test would not be applicable. Instead we must use the Chi-square test. Chi-square tests enable us to test whether more than two population proportions can be considered equal.

(iii) Contingency Tables

Suppose that in four regions, the National Health Care Company samples its hospital employees’ attitudes toward job performance reviews. Respondents are given a choice between the present method and a proposed new method.


Exercise 3(b)

The table below (table 3), which shows the responses to this question from the sample polled, is called a contingency table. A table such as this is made up of rows and columns; rows run horizontally, columns vertically. Notice that the four columns in table 3 provide one basis of classification (geographical region) and that the two rows classify the information another way (preference for review method). Table 3 is called a “2×4 contingency table” because it consists of two rows and four columns. We describe the dimensions of a contingency table by first stating the number of rows and then the number of columns. The “total” column and the “total” row are not counted as part of the dimensions.

method     Northeast   Southeast   Central   Westcoast   total
present    68          75          57        79          279
new        32          45          33        31          141
total      100         120         90        110         420

table 3

(iv) Hypotheses

The null hypothesis (H0) in this case is that there is no relationship between the employees’ attitudes to job performance reviews and the region that they live in.

H0 : region and choice of method are independent

alternatively,

H1 : region and choice of method are dependent

(v) Observed and Expected Frequencies

The observed frequencies, f0 , are the actual values obtained, which are recorded on the original contingency table.

The expected frequencies, fe , are those which are theoretically expected by considering the overall proportions of each classification.

The expected frequencies in a contingency table are determined by using the following formula:

\[f_e = \frac{RT \times CT}{n}\]

where:

\(f_e\) = the expected frequency in a given cell
\(RT\) = the row total for the row containing that cell
\(CT\) = the column total for the column containing that cell
\(n\) = the total number of observations

For example, the value for someone who prefers the present method in the Northeast region is given by:

\[f_e = \frac{279 \times 100}{420} = 66.43\]

The table below (table 4) gives a summary of the observed and expected frequencies from table 3.

method                Northeast   Southeast   Central   Westcoast
present (observed)    68          75          57        79
present (expected)    66.43       79.72       59.79     73.07
new (observed)        32          45          33        31
new (expected)        33.57       40.28       30.21     36.93

table 4
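The expected frequencies can be generated for every cell at once from the row totals, column totals and grand total. The following is a small sketch of that rule (row total times column total, divided by the total number of observations), with a hypothetical function name.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Rows: present method, new method.
# Columns: Northeast, Southeast, Central, Westcoast.
observed = [
    [68, 75, 57, 79],
    [32, 45, 33, 31],
]
expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

The computed values agree with the tabulated expected frequencies to within 0.01; small differences in the second decimal place come only from rounding.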

(vi) Chi-square Statistics

The chi-square statistic is given by:

\[\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}\]

Using the information in table 4, we can establish the chi-square statistic (table 5):

f0     fe       f0 - fe    (f0 - fe)^2    (f0 - fe)^2 / fe
68     66.43     1.57       2.46          0.0370
75     79.72    -4.72      22.28          0.2795
57     59.79    -2.79       7.78          0.1301
79     73.07     5.93      35.16          0.4812
32     33.57    -1.57       2.46          0.0733
45     40.28     4.72      22.28          0.5531
33     30.21     2.79       7.78          0.2575
31     36.93    -5.93      35.16          0.9521
                                  total   2.7638

table 5
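The column-by-column working of table 5 amounts to summing the squared difference between observed and expected frequency, divided by the expected frequency, over all cells. A minimal sketch using the observed and expected values listed in the table (the function name is an illustrative choice):

```python
def chi_square(f_obs, f_exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))

# Observed and expected frequencies, listed cell by cell
# (present method for the four regions, then new method).
observed = [68, 75, 57, 79, 32, 45, 33, 31]
expected = [66.43, 79.72, 59.79, 73.07, 33.57, 40.28, 30.21, 36.93]

chi2 = chi_square(observed, expected)
print(round(chi2, 3))
```

Every term is a square divided by a positive expected count, so the statistic cannot be negative, and it is zero only when observed and expected frequencies coincide.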

(vii) Interpretation of Chi-square

The answer of 2.764 is the value for chi-square in our problem comparing preferences for review methods. If this value were as large as, say, 20, it would indicate a substantial difference between our observed values and our expected values. A chi-square of zero, on the other hand, indicates that the observed frequencies exactly match the expected frequencies. The value of chi-square can never be negative, since the differences between the observed and expected frequencies are always squared.

(viii) Chi-square Distribution

If the null hypothesis is true, then the sampling distribution of the chi-square statistic, \(\chi^2\), can be closely approximated by a continuous curve known as the chi-square distribution. As in the case of the t distribution, there is a different chi-square distribution for each different number of degrees of freedom. The chi-square distribution is a probability distribution. Therefore, the total area under the curve in each chi-square distribution is 1.0.

(ix) Degrees of Freedom

To use the chi-square test, we must calculate the number of degrees of freedom (v) in the contingency table:

\[v = (r - 1)(c - 1)\]

where r is the number of rows in the problem, and c is the number of columns in the problem.

(x) Chi-square Critical Value

Returning to our example of job-review preferences of National Health Care hospital employees, we use the chi-square test to determine whether attitude about reviews is independent of geographical region. If the company wants to test the null hypothesis at the 0.05 level of significance, our problem can be summarized:

H0 : region and choice are independent
H1 : region and choice are dependent
\(\alpha = 0.05\)

Since our contingency table for this problem (table 3) has two rows and four columns, the appropriate number of degrees of freedom is:

v = (r-1)(c-1)
v = (2-1)(4-1)
v = (1)(3)
v = 3

The chi-square tables reveal that the chi-square critical value, with \(\alpha = 0.05\) and v = 3 degrees of freedom, equals 7.81.

Thus the acceptance region for the null hypothesis in the figure below (figure 5) goes from the left tail of the curve to the chi-square value of 7.81.

[Figure 5: chi-square distribution for 3 degrees of freedom, with the acceptance region to the left of the critical value 7.81 and 0.05 of the area to its right; the sample chi-square value of 2.764 lies in the acceptance region.]

The chi-square value calculated earlier, 2.764, falls within the acceptance region. Therefore, we accept the null hypothesis that there is no difference between the attitudes about job performance reviews in the four geographical regions.
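The decision step can be sketched as a lookup followed by a comparison. The small dictionary below is a stand-in for a printed chi-square table and holds only a few standard upper-tail critical values; the names are hypothetical, and a fuller table would be needed for other cases.

```python
# Upper-tail chi-square critical values keyed by (alpha, degrees of freedom).
CRITICAL = {
    (0.05, 2): 5.99,
    (0.01, 2): 9.21,
    (0.05, 3): 7.81,
    (0.05, 5): 11.07,
    (0.05, 6): 12.59,
}

def degrees_of_freedom(rows, cols):
    """v = (r - 1)(c - 1); total rows and columns are not counted."""
    return (rows - 1) * (cols - 1)

def chi_square_decision(chi2, rows, cols, alpha):
    v = degrees_of_freedom(rows, cols)
    critical = CRITICAL[(alpha, v)]
    decision = "reject H0" if chi2 > critical else "accept H0"
    return v, critical, decision

# Job-review example: 2 rows, 4 columns, chi-square of 2.764 at alpha = 0.05.
v, critical, decision = chi_square_decision(2.764, rows=2, cols=4, alpha=0.05)
print(v, critical, decision)
```

Only the upper tail is used, because large values of chi-square are the ones that signal a departure from independence.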


Example 24

Random samples of 160 , 240 , and 200 persons were selected from Melbourne, Sydney and Brisbane respectively. The persons selected were asked “What type of television program do you like best: drama, western, documentary, or comedy?” The responses are summarized below:

Number of persons by type of program and city:

type of program    Melbourne   Sydney   Brisbane   total
drama              60          100      80         240
western            30          30       30         90
documentary        30          40       50         120
comedy             40          70       40         150
total              160         240      200        600

Test the hypothesis that there is a difference in television preferences among the residents in the three cities, at a level of significance of 0.05.

Solution

(a) Hypotheses

H0 : the type of program watched is independent of the city.
H1 : the type of program watched depends upon the city.

(b) Observed and Expected Frequencies

Using the formula:

\[f_e = \frac{RT \times CT}{n}\]

where RT is the row total, CT is the column total and n is the total number of observations, we can establish both the observed and expected frequencies in one table.

The expected frequencies are in brackets.

Program        Melbourne   Sydney     Brisbane
drama          60 (64)     100 (96)   80 (80)
western        30 (24)     30 (36)    30 (30)
documentary    30 (32)     40 (48)    50 (40)
comedy         40 (40)     70 (60)    40 (50)

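(c) Chi-square Statistic

f0     fe    f0 - fe    (f0 - fe)^2    (f0 - fe)^2 / fe
60     64     -4         16            0.25
100    96      4         16            0.167
80     80      0          0            0
30     24      6         36            1.5
30     36     -6         36            1
30     30      0          0            0
30     32     -2          4            0.125
40     48     -8         64            1.333
50     40     10        100            2.50
40     40      0          0            0
70     60     10        100            1.667
40     50    -10        100            2
                              total   10.542

(d) Degrees of Freedom

v = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6

(e) Critical Value

From tables, \(\alpha = 0.05\) with v = 6 gives a critical value of 12.6.

[Figure: chi-square distribution with the acceptance region to the left of the critical value 12.6 and 0.05 of the area to its right; the chi-square statistic 10.542 lies in the acceptance region.]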

The statistic falls in the acceptance region.

Accept H0 . There is no connection between the preference for a program and the city that it is watched in.
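Steps (b) to (e) of Example 24 can be combined into one routine that starts from the observed table alone. This is a sketch under the same formulas as above; the function name is illustrative, and 12.59 is the tabulated critical value for 6 degrees of freedom at the 5% level.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (f_obs - f_exp) ** 2 / f_exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: drama, western, documentary, comedy.
# Columns: Melbourne, Sydney, Brisbane.
observed = [
    [60, 100, 80],
    [30, 30, 30],
    [30, 40, 50],
    [40, 70, 40],
]
chi2, df = chi_square_independence(observed)
accept_h0 = chi2 < 12.59
print(f"chi-square = {chi2:.2f}, d.f. = {df}, accept H0: {accept_h0}")
```

The statistic comes out at about 10.54 on 6 degrees of freedom, close to the tabulated working and below the critical value, so the null hypothesis of independence is retained.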


Example 25

A teacher wished to determine whether the performance in a problem solving test is independent of the students’ year at school.

The teacher selected 120 students, 40 from each of Years 8, 9, and 10 and graded their performance in a test as A or B as shown in the table below:

year     grade A   grade B   total
8        22        18        40
9        26        14        40
10       27        13        40
total    75        45        120

Test the hypothesis that performance in the test is independent of the students’ year at school, using the 5% and 1% level of significance.

Solution

(a) Hypotheses

The hypotheses being tested are:

H0 : there is no relationship between grade awarded and year at school
H1 : there is a relationship

(b) Observed and Expected Frequencies

The table below sets out the observed and expected frequencies (in brackets):

year     A         B         total
8        22 (25)   18 (15)   40
9        26 (25)   14 (15)   40
10       27 (25)   13 (15)   40
total    75        45        120


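(c) Chi-square Statistic

f0    fe    f0 - fe    (f0 - fe)^2 / fe
22    25    -3         0.36
18    15     3         0.60
26    25     1         0.04
14    15    -1         0.07
27    25     2         0.16
13    15    -2         0.27
               total   1.50

(d) Degrees of Freedom

v = (r-1)(c-1) = (3-1)(2-1) = 2

(e) Critical Value

From tables, with v = 2, the critical value is 5.99 at \(\alpha = 0.05\),

also the critical value is 9.21 at \(\alpha = 0.01\).

The chi-square statistic of 1.50 is below both critical values. So, we can accept the hypothesis that performance is independent of the students’ year at school at both the 1% and 5% levels of significance.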

Example 26

For random samples of 200 people contacted in each of six states, the number who favoured Australia becoming a republic is recorded in the table below:

preference   State A   State B   State C   State D   State E   State F   total
yes          132       108       128       104       128       120       720
no           68        92        72        96        72        80        480
total        200       200       200       200       200       200       1200

Test the hypothesis that people in the six states are equally in favour at the 5% level of significance.

Solution

(a) Hypotheses

H0 : people of the states are equally in favour
H1 : people of the states are not equally in favour

(b) Observed and Expected Frequencies

The table below sets out the observed and expected frequencies (in brackets):

preference   State A     State B     State C     State D     State E     State F     total
yes          132 (120)   108 (120)   128 (120)   104 (120)   128 (120)   120 (120)   720
no           68 (80)     92 (80)     72 (80)     96 (80)     72 (80)     80 (80)     480
total        200         200         200         200         200         200         1200


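(c) Chi-square Statistic

f0     fe     f0 - fe    (f0 - fe)^2 / fe
132    120     12        1.2
108    120    -12        1.2
128    120      8        0.533
104    120    -16        2.133
128    120      8        0.533
120    120      0        0
68     80     -12        1.8
92     80      12        1.8
72     80      -8        0.8
96     80      16        3.2
72     80      -8        0.8
80     80       0        0
                 total   14

(d) Degrees of Freedom

v = (r-1)(c-1)
v = (2-1)(6-1)
v = 5

(e) Critical Value

From tables, with \(\alpha = 0.05\) and v = 5, the critical value is 11.07. The chi-square statistic of 14 exceeds this value.

Reject the null hypothesis. Not all states are equally in favour of a republic.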
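When the null hypothesis is rejected, it can help to see how much each state contributes to the chi-square total. The sketch below groups the terms of the statistic by state for Example 26. This breakdown is an illustrative extra and not a step in the worked solution; 11.07 is the tabulated critical value for 5 degrees of freedom at the 5% level.

```python
states = ["A", "B", "C", "D", "E", "F"]
yes = [132, 108, 128, 104, 128, 120]
no = [68, 92, 72, 96, 72, 80]

n = sum(yes) + sum(no)  # 1200 people in total
contributions = {}
for state, f_yes, f_no in zip(states, yes, no):
    sample_size = f_yes + f_no
    # Expected counts: row total * column total / grand total.
    exp_yes = sum(yes) * sample_size / n
    exp_no = sum(no) * sample_size / n
    contributions[state] = (
        (f_yes - exp_yes) ** 2 / exp_yes + (f_no - exp_no) ** 2 / exp_no
    )

chi2 = sum(contributions.values())
largest = max(contributions, key=contributions.get)
reject_h0 = chi2 > 11.07
print({s: round(c, 3) for s, c in contributions.items()})
print(f"chi-square = {chi2:.1f}, largest contribution: state {largest}, reject H0: {reject_h0}")
```

The contributions add up to 14, with state D supplying the largest share, so the departure from equal support is driven mainly by that state.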


Exercise 4
