Chapter 10 qbm

8/18/2019 Chapter 10 qbm


QBM101 Module 3: Chapters 10, 11 and 13

Lecture Notes for wk 9 and 10

Chapter 10 - Sampling Distribution and Central Limit Theorem

Statistics is a science of inference. It is the science of generalization from a part (the

randomly chosen sample) to the whole (the population).

A random sample of n elements is a sample selected from the population in such a way that every set of n elements is as likely to be selected as any other set of n elements. It is important that the sample be drawn randomly from the entire population under study.

10.1 Why Sample the Population?

The following are a few reasons for sampling.

o It is physically impossible to check all items in the population when the population is infinite.

o The cost of studying all the items in a population is very high.

o The sample results are usually adequate.

o Contacting the whole population would often be time-consuming.

10.2 Sampling Distribution of the Sample Mean, $\bar{x}$

If we were to take a great many samples of size n from the same population,

we would end up with a great many different values for the sample mean

and sample variance. The resulting collection of sample means, $\bar{x}$, could then be viewed as a new random variable with its own mean and standard

deviation. The probability distribution of these sample means is called the

distribution of sample means, or the sampling distribution of the sample

mean.

o The sampling distribution of the sample mean is a probability distribution of

all possible sample means of a given sample size selected from a population.

o The mean of the sampling distribution of the sample mean is denoted by $\mu_{\bar{x}}$, and $\mu_{\bar{x}} = \mu$.

o The standard deviation of the sampling distribution of the sample means is called the standard error of the mean: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$




Example 10.1 : Assume that the mean weight of all the 10,000 students in HELP is 50 kg with a standard deviation of 5 kg.

Samples of 30 students are selected. For each sample, the sample mean, $\bar{x}$, and the sample variance, $s^2$, are determined.

A frequency distribution of the sample means is drawn; it is known as the sampling distribution of the sample mean. This sampling distribution must have a mean and a standard deviation. The mean of the sampling distribution of the sample mean is written as $\mu_{\bar{x}}$ and it must be equal to the population mean, i.e. $\mu_{\bar{x}} = \mu = 50$ kg.

The standard deviation of the sampling distribution of the sample mean is written as $\sigma_{\bar{x}}$, which is also known as the standard error of the mean:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{30}} = 0.91$ kg

10.3 Properties of the Distribution of Sample Means

When all possible samples of a specific size are selected with replacement from a

population, the distribution of the sample means for a variable has two important

properties.

First property: The mean of the sample means is equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.

Second property: The standard deviation of the sample means is called the standard error of the mean, $\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
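The two properties can be checked by brute force on a tiny population. The sketch below is only an illustration: the population values and the sample size are hypothetical, chosen so that every possible sample drawn with replacement can be listed.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Hypothetical small population, used only for illustration.
population = [2, 4, 6, 8]
n = 2  # sample size

# Every possible ordered sample of size n, drawn with replacement,
# and the mean of each of those samples.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

mean_of_means = mean(sample_means)      # first property: equals mu
standard_error = pstdev(sample_means)   # second property: equals sigma / sqrt(n)

print("mean of sample means:", mean_of_means, "population mean:", mu)
print("std dev of sample means:", standard_error, "sigma/sqrt(n):", sigma / sqrt(n))
```

Enumerating all samples is feasible only for very small populations; for realistic ones the two formulas are used directly.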

10.4 The Central Limit Theorem

1. When the population distribution is normal, the sampling distribution of the

sample means will be normally distributed for any sample size n.

2. When the population distribution is non-normal, the sampling distribution of the sample means will be approximately normally distributed if the sample size is large; the usual rule of thumb is $n \ge 30$.




o The central limit theorem can be used to answer questions about sample

means in the same manner that the normal distribution can be used to answer

questions about individual values. The only difference is that a new formula

must be used for the z-value. It is:

$Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

The central limit theorem is basic to the concept of statistical inference

because it permits us to draw conclusions about the population based strictly

on sample data, and without having any knowledge about the distribution of

the underlying population.

o Eg 10.2 : The time required by workers to complete an assembly job has a mean of 50 minutes and a standard deviation of 8 minutes. To spot check the workers' progress on a particular day, their supervisor intends to record the time taken by 60 workers to complete one assembly job apiece.

i. What is the probability that a randomly selected worker will take more than 58 minutes to assemble the job?

ii. What is the probability that the sample mean will be more than 52 minutes?

iii. What is the probability that the sample mean will be between 49 and 52 minutes?

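i. [Figure: normal curve centred at $\mu = 50$ (z = 0); the required area lies to the right of x = 58 (z = 1).]

$P(x > 58) = P\left(Z > \dfrac{58 - 50}{8}\right) = P(Z > 1) = 0.5 - 0.3413 = 0.1587$

ii. Let $\bar{x}$ = sample mean time needed to complete the job, μ = 50, σ = 8, n = 60.

[Figure: normal curve centred at $\mu_{\bar{x}} = \mu = 50$ (z = 0); the required area lies to the right of $\bar{x} = 52$ (z = 1.94).]

$P(\bar{x} > 52) = P\left(Z > \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \dfrac{52 - 50}{8 / \sqrt{60}}\right) = P(Z > 1.94) = 0.5 - 0.4738 = 0.0262$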

i. $P(x > \ldots) = P(Z > \ldots) = 0.5 + 0.3554 = 0.8554$
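The three probabilities asked for in Eg 10.2 can also be computed directly rather than read from a table. The following is a minimal sketch using Python's standard library; it assumes, as the worked solution does implicitly for part (i), that individual assembly times are normally distributed. Because the code does not round z to two decimal places, its results may differ slightly from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 8, 60      # values given in Eg 10.2
std_normal = NormalDist()     # standard normal distribution, Z ~ N(0, 1)


def upper_tail(z):
    """Return P(Z > z) for the standard normal distribution."""
    return 1 - std_normal.cdf(z)


# (i) A single worker: standardise with sigma.
z_single = (58 - mu) / sigma
p_single = upper_tail(z_single)

# (ii) A sample mean: standardise with the standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
z_upper = (52 - mu) / standard_error
p_mean_above_52 = upper_tail(z_upper)

# (iii) Sample mean between 49 and 52 minutes.
z_lower = (49 - mu) / standard_error
p_mean_between = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(f"(i)   P(x > 58)          = {p_single:.4f}")
print(f"(ii)  P(x-bar > 52)      = {p_mean_above_52:.4f}")
print(f"(iii) P(49 < x-bar < 52) = {p_mean_between:.4f}")
```

The only difference between part (i) and parts (ii) and (iii) is the divisor in the z-value: σ for an individual value, σ/√n for a sample mean.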




Eg 10.4. Assume that the marks obtained by students taking Business Statistics are normally distributed with a mean of 60 and a standard deviation of 8.

i. What is the probability that the mark of a randomly selected student is more than 72?

$P(x > 72) = P\left(Z > \dfrac{72 - 60}{8}\right) = P(Z > 1.5) = 0.5 - 0.4332 = 0.0668$

ii. What is the probability that the mark of a randomly selected student is between 55 and 70?

$P(55 < x < 70) = P\left(\dfrac{55 - 60}{8} < Z < \dfrac{70 - 60}{8}\right) = P(-0.625 < Z < 1.25) \approx 0.628$




Exercise 1

Assume that the time taken by students to travel to the campus is normally distributed with a mean of 30 minutes and a standard deviation of 5 minutes.

i. What is the probability that the time taken to travel to the campus by a randomly selected student is between 25 minutes and 40 minutes?

ii. What is the probability that the time taken to travel to the campus by a randomly selected student is between 32 minutes and 38 minutes?

iii. If random samples of 25 students were selected, find the mean and standard error of the resultant sampling distribution of the sample mean.

iv. What is the probability that the sample mean of a random sample of 25 students is more than 31 minutes?

v. What is the probability that the sample mean of a random sample of 25 students is between 28 minutes and 31.5 minutes?

Exercise 2 : Assume that the waiting time to check in by airline customers has a mean of 12 minutes and a standard deviation of 4 minutes.

a. What is the probability that the waiting time of a randomly selected customer

i. is less than 16 minutes?

ii. is between 6 minutes and 14 minutes?

b. What is the probability that the mean waiting time of a random sample of 50 customers

i. is between 11 minutes and 13 minutes?

ii. is more than 11.5 minutes?

10.5 Sampling Distribution of the Sample Proportion

Consider samples of size n drawn from a population. For each sample, the proportion of successes, $\hat{p}$, known as the sample proportion, is determined. The frequency distribution and the histogram of the sample proportions are drawn. This distribution is known as the sampling distribution of the sample proportion, which can be approximated by the normal distribution if both np and nq are more than or equal to 5, where q = 1 - p.

This sampling distribution of the sample proportion must have a mean and a standard deviation (the standard error of the sample proportion).

The mean is written as $\mu_{\hat{p}} = p$

The standard error is written as $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$

and $\hat{p}$ is converted to z using

$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$

Eg 10.5 The Laurier brand has a market share of 30%. In a survey, 1000 consumers were asked which brand they prefer. What is the probability that more than 32% of the respondents say they prefer the Laurier brand?

Solution: Since np = 300 and nq = 700 are both more than 5, Z can be used.

$P(\hat{p} > 0.32) = P\left(Z > \dfrac{\hat{p} - p}{\sqrt{pq/n}}\right) = P\left(Z > \dfrac{0.32 - 0.3}{\sqrt{(0.3)(0.7)/1000}}\right) = P(Z > 1.38) = 1 - 0.9162 = 0.0838$
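The Laurier calculation can be reproduced with a short function. This is a sketch only; the function name `prob_sample_proportion_above` is a hypothetical helper, not something defined in these notes. It applies the normal approximation only when np and nq are both at least 5, as stated above.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_above(p, n, threshold):
    """Approximate P(p-hat > threshold) using the normal approximation.

    p is the population proportion and n the sample size.
    """
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("normal approximation needs np >= 5 and nq >= 5")
    standard_error = sqrt(p * q / n)      # standard error of the proportion
    z = (threshold - p) / standard_error  # convert p-hat to z
    return 1 - NormalDist().cdf(z)


# Eg 10.5: market share 30%, sample of 1000, more than 32% prefer the brand.
prob = prob_sample_proportion_above(p=0.30, n=1000, threshold=0.32)
print(f"P(p-hat > 0.32) = {prob:.4f}")
```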

Exercise 3:

Assume that 60% of HELP's students are female. Samples of 100 students are selected.

i. Find the mean and standard error of the resultant sampling distribution of the sample proportion.

ii. What is the probability that the sample proportion of female students in a random sample of 100 students is more than 65%?

iii. What is the probability that the sample proportion is between 0.27 and 0.31?

Exercise 4

Assume that 40% of a hypermarket's customers are not local residents. What is the probability that in a random sample of 50 customers,

i. more than 18 customers are not local residents?

ii. between 22 and 24 are not local residents?




Chapter Eleven: Estimation

The use of sample information to draw conclusion about the population is known as

inferential statistics.

One aspect of inferential statistics is estimation , which is the process of

estimating the value of a population parameter from information obtained from a

sample.

Example 11.1 : An inspector from the department of consumer affairs wanted to know whether the actual weight of tuna in a can was at least the weight shown on the label. Since she cannot weigh every can of tuna, she draws a random sample of cans and uses the sample data to estimate the mean weight of all cans of tuna.

There are two types of estimate used to estimate the unknown value of a parameter:

1. Point Estimate

2. Interval Estimate

11.1 Point Estimates

o A Point estimate is one value (a point) that is used to estimate a population

parameter.

The point estimates are the sample mean, the sample standard deviation, the

sample variance, the sample proportion etc …

o Example 11.2 : The number of defective items produced by a machine was recorded for five randomly selected hours during a 40-hour work week. The observed numbers of defectives were 12, 4, 7, 14 and 10. Find the point estimate of the average number of defectives.

o Solution:

o The sample mean is $\bar{x} = \dfrac{12 + 4 + 7 + 14 + 10}{5} = 9.4$

o Thus the point estimate for the hourly mean number of defectives, $\mu$, is 9.4. Use $\bar{x}$ as an estimate of $\mu$.

Note : sample standard deviation, s = 3.97
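As a quick check of Example 11.2, the point estimates can be computed with Python's standard library. This is a small sketch; `statistics.stdev` uses the n - 1 divisor, which matches the sample standard deviation quoted in the note.

```python
from statistics import mean, stdev

# Observed numbers of defectives in the five sampled hours (Example 11.2).
defectives = [12, 4, 7, 14, 10]

x_bar = mean(defectives)   # point estimate of the population mean
s = stdev(defectives)      # sample standard deviation (divisor n - 1)

print(f"sample mean = {x_bar:.1f}")
print(f"sample standard deviation = {s:.2f}")
```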


11.2 Interval Estimates

An interval estimator draws inferences about a population by estimating the value

of an unknown population parameter, using an interval that is likely to include the

value of the population parameter based on a sample.

Confidence Interval : It is an interval estimate for which there is a specified degree of

certainty that the actual value of the population parameter will fall within the interval.

Confidence Level : It expresses the degree of certainty that an interval will include the

actual value of the population parameter, but it is stated as a percentage. For example,

0.95 confidence coefficient is equivalent to a 95% confidence level.

Sampling Error : The difference between the observed statistic and the actual value of the population parameter being estimated. This may also be referred to as estimation error.

o The two confidence intervals that are used extensively are the 95% and the 99%.

o A 95% confidence interval means that about 95% of the similarly constructed

intervals will contain the parameter being estimated or 95% of the sample

means for a specified sample size will lie within 1.96 standard deviation of the

hypothesized population mean.

o For the 99% confidence interval, 99% of the sample means for a specified

sample size will lie within 2.58 standard deviations of the hypothesized

population mean.

11.3 Formula for the Confidence Interval Estimate of the Population Mean (Population Standard Deviation Known)

$CI(\mu) = \bar{X} \pm Z_{\alpha/2} \left(\dfrac{\sigma}{\sqrt{n}}\right)$

Example 11.3 A certain medication is known to increase the pulse rate of its users. The standard deviation of the pulse rate is known to be 5 beats per minute. A sample of 30 users had an average pulse rate of 104 beats per minute.

i. Find the 99% confidence interval of the true population mean.


Solution : $CI(\mu) = \bar{X} \pm Z_{\alpha/2} \left(\dfrac{\sigma}{\sqrt{n}}\right)$

At the 99% confidence level, $Z_{\alpha/2} = Z_{0.01/2} = Z_{0.005}$.

We can make use of the t table with d.f. = infinity:

$Z_{0.005} = t_{0.005,\,\infty} = 2.576 = 2.58$ (2 decimal places)

$\mu = 104 \pm 2.58 \left(\dfrac{5}{\sqrt{30}}\right) = 104 \pm 2.4$ = 101.6 to 106.4

Hence one can be 99% confident that the mean pulse rate of all users of this medication is between 101.6 and 106.4 beats per minute, based on a sample of 30 users.

ii. Estimate the 90% confidence interval for the true pulse rate for all the users.

At the 90% confidence level, $Z_{0.1/2} = 1.645$

$\mu = 104 \pm 1.645 \left(\dfrac{5}{\sqrt{30}}\right) = 104 \pm 1.50$ = 102.50 to 105.50

Therefore the 90% confidence interval for the true pulse rate for all the users is 102.50 to 105.50.

iii. Estimate the 95% confidence interval for the true pulse rate for all the users.

At the 95% confidence level, $Z_{0.05/2} = 1.96$

$\mu = 104 \pm 1.96 \left(\dfrac{5}{\sqrt{30}}\right) = 104 \pm 1.79$ = 102.21 to 105.79

Therefore the 95% confidence interval for the true pulse rate for all the users is 102.21 to 105.79.
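The three intervals of Example 11.3 follow the same recipe, so they can be generated by one function. The sketch below is illustrative; `z_confidence_interval` is a hypothetical helper name. It obtains the z-value from the standard normal distribution instead of a table, so the 99% limits may differ slightly from those obtained with the rounded value 2.58.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)              # sampling error
    return x_bar - margin, x_bar + margin


# Example 11.3: sample mean 104, sigma 5, n = 30.
for level in (0.90, 0.95, 0.99):
    low, high = z_confidence_interval(104, 5, 30, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```

Raising the confidence level increases the z-value and therefore widens the interval, which is visible when the three printed intervals are compared.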

Exercise 1

In order to estimate the mean distance travelled daily by students to the HELP campus, a random sample of 30 students was selected and the mean distance was found to be 20 km. Assume that the population standard deviation was 2 km.

i. Find the 90% confidence interval for the mean distance travelled daily by all the students to HELP campus.

ii. Find the 95% confidence interval for the mean distance travelled daily by all the students to HELP campus.

iii. Find the 99% confidence interval for the mean distance travelled daily by all the students to HELP campus.

iv. What must be the minimum sample size so that the sampling error at the 95% confidence level in part (ii) does not exceed 0.5 km?

11.4 Formula for the Confidence Interval Estimate of the Mean (Population Standard Deviation Unknown)

$CI(\mu) = \bar{X} \pm t_{\alpha/2,\,n-1} \left(\dfrac{s}{\sqrt{n}}\right)$

11.5 Characteristics of Student's t distribution

o The t-distribution has the following properties:

o It is continuous, bell shaped and symmetrical about the mean, 0, like the z-distribution.

o There is a family of t-distributions sharing a mean of zero but having different standard deviations.

o The t-distribution is more spread out and flatter at the center than the z-distribution, but it approaches the z-distribution as the sample size gets larger.

11.5.1 Degrees of Freedom

The number of degrees of freedom is equal to the total number of

measurements(these are not always raw data points), less the total number of

restrictions on the measurements. A restriction is a quantity computed from the

measurements.

Many statistical distributions use the concept of degrees of freedom, and the

formulae for finding the degrees of freedom vary for different statistical tests. The

degrees of freedom are the number of values that are free to vary after a sample

statistic has been computed, and they tell the researcher which specific curve to

use when a distribution consists of a family of curves.

The symbol d.f. will be used for degrees of freedom. The degrees of freedom for a confidence interval for the mean are found by subtracting one from the sample size. That is, d.f. = n - 1. For some statistical tests the degrees of freedom are not equal to n - 1.

Example 11.4 If the mean of 5 values is 10, then 4 of the 5 values are free to vary.

But once 4 values are selected, the fifth value must be a specific number to get a

sum of 50, since 50/5 = 10. Hence the degrees of freedom is 5-1=4 and this value

tells the researcher which curve to use.

Example 11.5 : Ten randomly selected automobiles were stopped, and the tread depth of the right front tire was measured. The mean was 0.32 inch, and the standard deviation was 0.08 inch. Find the 95% confidence interval of the mean depth. Assume that the variable is approximately normally distributed.

o Solution:

$\mu = \bar{x} \pm t_{\alpha/2,\,n-1} \left(\dfrac{s}{\sqrt{n}}\right) = 0.32 \pm t_{0.05/2,\,9} \left(\dfrac{0.08}{\sqrt{10}}\right)$

$= 0.32 \pm 2.262 \left(\dfrac{0.08}{\sqrt{10}}\right) = 0.32 \pm 0.06$

= 0.26 to 0.38

The 95% confidence interval estimate of the population mean depth is 0.26 to 0.38 inch.

Exercises.

Find $t_{0.1/2,\,14}$, $t_{0.1/2,\,50}$, $t_{0.5/2,\,\infty}$, $t_{0.1/2,\,\infty}$

Example 11.6 A random sample of 10 packets of chocolate bars was selected and the weights are shown below:

490 gm, 505 gm, 496 gm, 510 gm, 493 gm

508 gm, 500 gm, 508 gm, 488 gm, 498 gm

Estimate the 90% and 95% confidence intervals for the true mean weight of all the chocolate bars. State any assumption made.

$\sum x = 4996$ and $\sum x^2 = 2496566$

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{4996}{10} = 499.6$ gm

$s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\dfrac{2496566 - \frac{4996^2}{10}}{9}} = 7.92$ gm

$\bar{x} = 499.6$ gm and s = 7.92 gm

At the 90% confidence level, $t_{0.1/2,\,9} = 1.833$

$\mu = \bar{x} \pm t_{\alpha/2,\,n-1} \left(\dfrac{s}{\sqrt{n}}\right) = 499.6 \pm t_{0.1/2,\,9} \left(\dfrac{7.92}{\sqrt{10}}\right)$

$= 499.6 \pm 1.833 \left(\dfrac{7.92}{\sqrt{10}}\right) = 499.6 \pm 4.6$

= 495 gm to 504.2 gm

Therefore we are 90% confident that the mean weight of all chocolate bars is between 495 gm and 504.2 gm.

At the 95% confidence level, $t_{0.05/2,\,9} = 2.262$

$\mu = \bar{x} \pm t_{\alpha/2,\,n-1} \left(\dfrac{s}{\sqrt{n}}\right) = 499.6 \pm t_{0.05/2,\,9} \left(\dfrac{7.92}{\sqrt{10}}\right)$

$= 499.6 \pm 2.262 \left(\dfrac{7.92}{\sqrt{10}}\right) = 499.6 \pm 5.67$

= 493.93 gm to 505.27 gm

Therefore we are 95% confident that the mean weight of all chocolate bars is between 493.93 gm and 505.27 gm.

Assumption: We have to assume that the population of weights of chocolate bars from which the sample was drawn is normally distributed.
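The chocolate-bar intervals can be recomputed from the raw data. The sketch below is illustrative only: the function name `t_confidence_interval` is hypothetical, and because Python's standard library has no t-distribution, the critical values for 9 degrees of freedom are typed in from the t table used in the example. The code keeps s unrounded, so the limits may differ in the second decimal place from a hand calculation that uses s = 7.92.

```python
from math import sqrt
from statistics import mean, stdev

# Weights in grams of the 10 sampled packets (Example 11.6).
weights = [490, 505, 496, 510, 493, 508, 500, 508, 488, 498]

# t critical values for n - 1 = 9 degrees of freedom, taken from the t table.
T_CRITICAL_9DF = {0.90: 1.833, 0.95: 2.262}


def t_confidence_interval(data, t_critical):
    """Confidence interval for the mean when sigma is unknown (uses s)."""
    n = len(data)
    x_bar = mean(data)
    margin = t_critical * stdev(data) / sqrt(n)
    return x_bar - margin, x_bar + margin


intervals = {
    level: t_confidence_interval(weights, t)
    for level, t in T_CRITICAL_9DF.items()
}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: {low:.2f} gm to {high:.2f} gm")
```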

Exercise 2

In order to estimate the mean distance travelled daily by students to the HELP campus, a random sample of 30 students was selected; the mean distance was found to be 20 km and the standard deviation was 2 km.

i. Find the 90% confidence interval for the mean distance travelled daily by all the students to HELP campus. Ans : (19.38 km, 20.62 km)

ii. Find the 95% confidence interval for the mean distance travelled daily by all the students to HELP campus. Ans : (19.25 km, 20.75 km)

iii. Find the 99% confidence interval for the mean distance travelled daily by all the students to HELP campus. Ans : (18.99 km, 21.01 km)

11.6 Confidence Interval for a Population Proportion

The confidence interval for a population proportion is estimated by:

$CI(p) = \hat{p} \pm Z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

Condition: It is appropriate to use Z when $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$.

o Example 11.7 : A sample of 500 nursing applications included 60 from men. Find the 90% confidence interval of the true proportion of men who applied to the nursing program.

· Here α = 1 - 0.90 = 0.10, and

· Sample proportion $\hat{p}$ = 60 / 500 = 0.12.

· $n\hat{p} = 60 > 5$ and $n\hat{q} = 440 > 5$, so it is appropriate to use Z.

· $Z_{0.1/2} = t_{0.1/2,\,\infty} = 1.645$ (1.65)

$CI(p) = \hat{p} \pm Z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 0.12 \pm 1.65 \sqrt{\dfrac{(0.12)(0.88)}{500}} = 0.12 \pm 0.024$ = 0.096 to 0.144

The 90% C.I. estimate of the true proportion of men is 0.096 to 0.144 (9.6% to 14.4%).

11.7 Factors of Confidence Interval

The factors that determine the width of a confidence interval are:

1. The sample size, n

2. The variability in the population, σ, usually estimated by s

3. The desired level of confidence.

If all other quantities remain unchanged, an increase in the level of confidence will lead to an increase in the table value, Z, and hence to a wider interval. If all other quantities remain unchanged, an increase in sample size will lead to a narrower interval.
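The effect of each factor on the width can be seen numerically. The following sketch assumes the known-σ interval for the mean, whose full width is $2 Z_{\alpha/2}\,\sigma/\sqrt{n}$; the function name `interval_width` and the baseline values (σ = 5, n = 30, 95%) are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Full width of the z confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    return 2 * z * sigma / sqrt(n)


base = interval_width(sigma=5, n=30, confidence=0.95)

# Change one factor at a time, keeping the other two fixed.
higher_confidence = interval_width(sigma=5, n=30, confidence=0.99)
larger_sample = interval_width(sigma=5, n=120, confidence=0.95)
more_variable = interval_width(sigma=10, n=30, confidence=0.95)

print(f"baseline width:           {base:.2f}")
print(f"99% instead of 95%:       {higher_confidence:.2f}")
print(f"n = 120 instead of 30:    {larger_sample:.2f}")
print(f"sigma = 10 instead of 5:  {more_variable:.2f}")
```

Because n enters through its square root, the sample size has to be quadrupled to halve the width, whereas doubling σ doubles the width.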

11.8 Sample Size needed for an Interval Estimate of the Population Mean

In the formula for the CI of the population mean, $CI(\mu) = \bar{X} \pm Z_{\alpha/2} \left(\dfrac{\sigma}{\sqrt{n}}\right)$, the term $Z_{\alpha/2} \left(\dfrac{\sigma}{\sqrt{n}}\right)$ gives the limits of the estimate and is known as the sampling error, denoted by E.

As n becomes larger and larger, the sampling error E becomes smaller and smaller.

$E = Z_{\alpha/2} \left(\dfrac{\sigma}{\sqrt{n}}\right)$, so $\sqrt{n} = \dfrac{Z_{\alpha/2}\,\sigma}{E}$. Therefore

$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2$

where E or B is the maximum error of estimate.

If necessary, round the answer up to obtain a whole number.

· Example 11.8 : The college president asks the statistics teacher to estimate the average age of the students at their college. How large a sample is necessary? The teacher decides the estimate should be accurate within 1 year with 99% confidence. From a previous study, the standard deviation of the ages is known to be 3 years.

· Solution:

Since α = 0.01 (or 1 - 0.99), $Z_{0.01/2} = t_{0.01/2,\,\infty} = 2.576$ (2.58), and E = 1,

$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\dfrac{(2.58)(3)}{1}\right)^2 = 59.9$

Therefore the minimum sample size is 60.

Eg 11.9. We would like to estimate the time by which students are late for lecture classes. A random sample of 20 students has a mean of 10 minutes late for lecture classes. Assume that the population standard deviation is 0.5 minutes.

i. Find the 95% confidence interval for the true mean time that students were late for lecture classes.

Ans : $\mu = \bar{X} \pm Z_{\alpha/2} \left(\dfrac{\sigma}{\sqrt{n}}\right) = 10 \pm 1.96 \left(\dfrac{0.5}{\sqrt{20}}\right)$ = 10 minutes ± 0.22 minutes = 9.78 minutes to 10.22 minutes

The 95% C.I. is 9.78 minutes to 10.22 minutes.

ii. What must be the minimum sample size so that the sampling error in part (i) at the 95% confidence level does not exceed 0.1 minutes?

Ans : $n = \left(\dfrac{Z\,\sigma}{E}\right)^2 = \left(\dfrac{(1.96)(0.5)}{0.1}\right)^2 = 96.04$

Therefore the minimum sample size is 97 students.
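The sample-size formula for the mean, together with the rounding-up rule, can be written as a short function. This is a sketch; `sample_size_for_mean` is a hypothetical helper name, and the z-values passed in are the table values used in Example 11.8 and Eg 11.9.

```python
from math import ceil


def sample_size_for_mean(z, sigma, max_error):
    """Minimum n so that the sampling error z * sigma / sqrt(n) <= max_error."""
    n_exact = (z * sigma / max_error) ** 2
    return ceil(n_exact)   # always round up to a whole number


# Example 11.8: 99% confidence (z = 2.58), sigma = 3 years, E = 1 year.
n_ages = sample_size_for_mean(z=2.58, sigma=3, max_error=1)

# Eg 11.9 (ii): 95% confidence (z = 1.96), sigma = 0.5 minutes, E = 0.1 minutes.
n_late = sample_size_for_mean(z=1.96, sigma=0.5, max_error=0.1)

print(n_ages, n_late)
```

Rounding up rather than to the nearest whole number matters: rounding 96.04 down to 96 would give a sampling error slightly above the permitted 0.1 minutes.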

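11.9 Sample Size needed for Interval Estimate of a Population Proportion

In the formula for the CI for a proportion,

$CI(p) = \hat{p} \pm Z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

the term $Z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ gives the limits of the estimate, or the sampling error, represented by E or B (Selvanathan).

$E = Z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, so $\dfrac{E}{Z_{\alpha/2}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ and $\left(\dfrac{E}{Z_{\alpha/2}}\right)^2 = \dfrac{\hat{p}\hat{q}}{n}$. Therefore

$n = \hat{p}\hat{q} \left(\dfrac{Z_{\alpha/2}}{E}\right)^2$

where E or B is the maximum error of estimate.

If necessary, round the answer up to obtain a whole number.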

· Example 11.10 : A researcher wishes to estimate, with 95% confidence, the proportion of people who own a home computer. A previous study shows that 40% of those interviewed had a computer at home. The researcher wishes to be accurate within 2% of the true proportion. Find the minimum sample size necessary.

Solution

Since α = 0.05, $Z_{0.05/2} = t_{0.05/2,\,\infty} = 1.96$, E = 0.02, $\hat{p} = 0.4$ and $\hat{q} = 0.6$,

$n = \hat{p}\hat{q} \left(\dfrac{Z_{0.05/2}}{E}\right)^2 = (0.4)(0.6) \left(\dfrac{1.96}{0.02}\right)^2 = 2304.96$

Minimum sample size necessary = 2305 people
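The same calculation for a proportion can be sketched as follows. The helper name `sample_size_for_proportion` is hypothetical; it takes a prior estimate of the proportion, such as the 40% from the previous study in Example 11.10.

```python
from math import ceil


def sample_size_for_proportion(z, p_hat, max_error):
    """Minimum n so that z * sqrt(p_hat * q_hat / n) <= max_error."""
    q_hat = 1 - p_hat
    n_exact = p_hat * q_hat * (z / max_error) ** 2
    return ceil(n_exact)   # always round up to a whole number


# Example 11.10: 95% confidence (z = 1.96), p-hat = 0.4, E = 0.02.
n_computers = sample_size_for_proportion(z=1.96, p_hat=0.4, max_error=0.02)
print(n_computers)
```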




Exercise 3

A sample of 30 youths indicated that the mean number of times that they hang out in a month is 20. Assume that the population standard deviation of the number of times of hanging out in a month is 3.

i. Estimate the mean number of times of hanging out per month for all the youths at the 90% confidence level.

$\mu = \bar{x} \pm z_{0.1/2} \left(\dfrac{\sigma}{\sqrt{n}}\right) = 20 \pm 1.645 \left(\dfrac{3}{\sqrt{30}}\right) = 20 \pm 0.9$

90% CI for the mean is (19.1, 20.9)

ii. Estimate the mean number of times of hanging out per month for all the youths at the 95% confidence level.

$\mu = \bar{x} \pm z_{0.05/2} \left(\dfrac{\sigma}{\sqrt{n}}\right) = 20 \pm 1.96 \left(\dfrac{3}{\sqrt{30}}\right) = 20 \pm 1.07$

95% C.I. for the population mean is (18.93, 21.07)

iii. What must be the minimum sample size so that the sampling error in part (ii) at the 95% confidence level does not exceed 0.5 times?

$n \ge \left(\dfrac{Z\,\sigma}{E}\right)^2 = \left(\dfrac{(1.96)(3)}{0.5}\right)^2 = 138.30$

Therefore minimum sample size = 139 youths

Exercise 4. In order to estimate the mean number of hours per day that college students spend surfing the internet, a random sample of 50 college students was selected. The mean number of hours per day reported by the sample of college students was 3 hours. Assume that the population standard deviation was 15 minutes (0.25 hours).

i. Find the 95% confidence interval for the true mean number of hours that college students spend surfing the internet per day.

$\mu = \bar{x} \pm z_{0.05/2} \left(\dfrac{\sigma}{\sqrt{n}}\right) = 3 \pm 1.96 \left(\dfrac{0.25}{\sqrt{50}}\right) = 3 \pm 0.07$

Therefore the 95% CI for the population mean is (2.93 hrs, 3.07 hrs)

ii. What must be the minimum sample size so that the sampling error in part (i) does not exceed 3 minutes (0.05 hours) at the 95% confidence level?

$n \ge \left(\dfrac{Z\,\sigma}{E}\right)^2 = \left(\dfrac{(1.96)(0.25)}{0.05}\right)^2 = 96.04$

Therefore the minimum sample size is 97 students

Exercise 5 A nutritionist would like to estimate the mean number of times that Malaysians eat out for dinner in a month. A random sample of 10 Malaysians reported a mean of 20 times of eating out with a standard deviation of 5.

i. Estimate the 90% confidence interval for the true mean number of times of eating out per month for all Malaysians.

ii. Estimate the 95% confidence interval for the true mean number of times of eating out per month for all Malaysians.

Ex. 6 In order to estimate the mean expenditure per car service, a random sample of cars was selected and their service expenditures are shown below.

RM250, RM220, RM180, RM200, RM210

RM280, RM230, RM220, RM190, RM220

Also given : $\bar{x} = 220$ and s = 29.06

i. Estimate the 90% confidence interval for the true mean expenditure per car service for all the cars.

ii. Estimate the 95% confidence interval for the true mean expenditure per car service for all the cars.

iii. Estimate the 99% confidence interval for the true mean expenditure per car service for all the cars.


Ex. 7 A sports scientist would like to estimate the proportion of school children spending their time on sports in the evening. In a random sample of 100 school children, 40 of them spent their time on sports in the evening.

i. Estimate the 95% confidence interval for the true proportion of all school children spending their time on sports in the evening.

ii. What must be the sample size so that the sampling error in part (i) does not exceed 0.05 at the 95% confidence level? (Use the above sample information.)

Ex. 8 A lecturer would like to estimate the proportion of students who do not continue lecture classes after a break. In a random sample of 50 students in a lecture class, 15 students did not come back after the break.

i. Estimate the 90% confidence interval for the true proportion of students who do not come back to the lecture class after the break.

ii. Using the above sample information, what must be the minimum sample size so that the sampling error in part (i) does not exceed 7%?

Chapter Thirteen: Testing of Hypothesis (One Sample Tests)

Statistical inference is concerned with how we draw conclusions from sample data about

the larger population from which the sample is selected.

In the previous chapter we discussed one branch of inference, namely estimation theory.

Another branch of inference is hypotheses testing theory.

Researchers are interested in answering many types of questions. For example, an

educator might wish to see whether a new teaching technique is better than a traditional

one. Automobile manufacturers are interested in determining whether seat belts will

reduce the severity of injuries caused by accidents.

These types of questions can be addressed

through statistical hypothesis testing , which is a decision-making process for evaluating

claims about a population.

In hypothesis testing, the researcher must define the population under study, state the particular hypotheses that will be investigated, give the significance level, select a sample from the population, collect the data, perform the calculations required for the statistical test, and reach a conclusion.

13.1 Meaning of Hypothesis

o A Hypothesis is a statement about the value of a population parameter

developed for the purpose of testing.

13.2 Types of Hypotheses

A statistical hypothesis is a conjecture about the population parameter. This

conjecture may or may not be true. There are two types of hypotheses for each

situation: the null hypothesis and the alternate hypothesis.

· Null Hypothesis, H0: A statement about the value of a population parameter. The null hypothesis is either rejected or not rejected.

· Alternative Hypothesis, H1: A statement about a population parameter that is alternative to the null hypothesis. It is accepted if the sample data provide evidence that the null hypothesis is false.

Example 13.1 : Test the statement that the mean income for systems analysts is $42000 per annum against the claim that the mean income of systems analysts is

i) not equal to $42000,

ii) more than $42000, and

iii) less than $42000.

H0 is the null hypothesis and H1 is the alternate hypothesis.

i. H0: μ = 42000 against H1: μ ≠ 42000

ii. H0: μ = 42000 OR H0: μ ≤ 42000 against H1: μ > 42000

iii. H0: μ = 42000 OR H0: μ ≥ 42000 against H1: μ < 42000

Null Hypothesis and Alternate Hypothesis: wording that indicates each form

1. Null hypothesis H0: μ = $42000. The sign = indicates: equal to; is exactly the same as; has not changed from; is the same as.
   Alternate hypothesis H1: μ ≠ $42000. The sign ≠ indicates: not equal to; is different from; has changed from; is not the same as.
   Statement: μ is $42000 against μ is not equal to $42000.

2. Null hypothesis H0: μ ≤ $42000. The sign ≤ indicates: less than or equal to; at most or not more than.
   Alternate hypothesis H1: μ > $42000. The sign > indicates: more than; above or has increased.
   Statement: μ is no more than $42000 against μ is more than $42,000.

3. Null hypothesis H0: μ ≥ $42,000. The sign ≥ indicates: more than or equal to; at least.
   Alternate hypothesis H1: μ < $42,000 …

… accept the claim that the mean weight will increase to more than 60 kg.

Another example, on testing a population proportion:

Example 13.3 Last year 30% of university students were smokers. The university introduced an anti-smoking program.

i. Can you accept the claim that the proportion of smokers has changed from 30%?

Ans : H0: p = 0.3, reject the claim that p ≠ 0.3

H1: p ≠ 0.3, accept the claim that p ≠ 0.3

ii. Can you accept the claim that the proportion of smokers has been reduced?

Ans : H0: p = 0.3, reject the claim that p < 0.3

H1: p < 0.3, accept the claim that p < 0.3

13.3 Hypothesis testing and the nature of the test.

When formulating the null and alternate hypotheses, the nature, or purpose, of the test must be taken into account.

The purpose of the test can guide us towards the appropriate testing procedure.

13.3.1 One-Tail Tests: Level of Significance

A one tail test indicates that the null hypothesis should be rejected when the test value is in the

critical region on one side of the mean. A one tail test is either right tailed or left tailed,

depending on the direction of the alternate hypothesis.

A one-tail test in which the rejection region is at the right is known as the right tail test.

A one-tail test in which the rejection region is at the left is known as left tail test.

Right-tail Test : A test is one-tailed to the right when the alternate hypothesis, H1, states a right direction, such as

H0: The mean income of professors is $42000, μ = $42000

H1: The mean income of professors is more than $42000, μ > $42000

The test is right-tailed since the alternate hypothesis indicates that the mean is more than $42000. So we use the more-than sign in the alternate hypothesis.

Draw a graph of a right tail test at 0.05 level of significance showing the acceptance and

rejection or critical region and the critical value.




Left-tail Test : A test is one-tailed to the left when the alternate hypothesis, H1, states a left direction, such as

H0: The mean income of professors is $42000, μ = $42000

H1: The mean income of professors is less than $42000, μ < $42000

The test is left-tailed since the alternate hypothesis indicates that the mean is less than $42000. So we use the less-than sign in the alternate hypothesis.

Draw a graph of a left tail test at the 0.05 level of significance showing the acceptance and rejection (critical) regions and the critical value.


13.3.2 Two-tail Tests: Level of significance

In a two tail test, the null hypothesis should be rejected when the test value (statistic) is in either of the two critical regions.

A two tail test is one in which the rejection areas are at both ends of the distribution.

A test is two-tailed when no direction is specified in the alternate hypothesis, H1, such as

H0: The mean income of professors is $42000, μ = $42000

H1: The mean income of professors is not equal to $42000, μ ≠ $42000

Draw a graph of a two tail test at the 0.05 level of significance showing the acceptance and rejection (critical) regions and the critical values.
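The critical values that separate the acceptance and rejection regions in the three kinds of test can be computed from the standard normal distribution. The sketch below is only an illustration and assumes a z test; the function names `critical_values` and `reject_null` and the tail labels are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal distribution, Z ~ N(0, 1)


def critical_values(alpha, tail):
    """Critical z value(s) for a right-, left- or two-tail test."""
    if tail == "right":
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if tail == "left":
        return (STD_NORMAL.inv_cdf(alpha),)
    if tail == "two":
        upper = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left' or 'two'")


def reject_null(test_value, alpha, tail):
    """Return True when the test value falls in the rejection region."""
    values = critical_values(alpha, tail)
    if tail == "right":
        return test_value > values[0]
    if tail == "left":
        return test_value < values[0]
    lower, upper = values
    return test_value < lower or test_value > upper


for tail in ("right", "left", "two"):
    rounded = [round(v, 3) for v in critical_values(0.05, tail)]
    print(f"{tail}-tail test at the 0.05 level: critical value(s) {rounded}")
```

In a two-tail test the significance level is split equally between the two ends of the distribution, which is why its critical values lie further from zero than the one-tail value at the same level.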


13.3.3 Errors in Hypothesis testing

In the judicial system, guilty persons sometimes go free, and innocent persons are

sometimes convicted.

· The following extract is reproduced from:

Hamburg, M. (1977), Statistical Analysis for Decision Making (2nd Edition), pages 258-59, Harcourt Brace Jovanovich, Inc., New York.

· Consider the process by which an accused individual is judged in a court of law under our legal system. Under Anglo-Saxon law, the person before the bar is assumed innocent; the burden of proof of guilt rests on the prosecution. Using the language of hypothesis testing, let us say that we want to test a hypothesis, which we denote as H0, that the person before the bar is innocent. This means that there exists an alternative hypothesis, H1, that the defendant is guilty.

· The jury examines the evidence to determine whether the prosecution has demonstrated that this evidence is inconsistent with the basic hypothesis, H0, of innocence. If the jurors decide the evidence is inconsistent with H0, they reject that hypothesis and therefore accept its alternative, H1, that the defendant is guilty.

· If we analyse the situation that results when the jury makes its decision, we find that four possibilities exist. The first two possibilities pertain to the case in which the basic hypothesis H0 is true, and the second two to the case in which the basic hypothesis H0 is false.

1. The defendant is innocent (H0 is true), and the jury finds that he is innocent (accept H0); hence the correct decision has been made.

2. The defendant is innocent (H0 is true), and the jury finds him guilty (reject H0); hence an error has been made.

3. The defendant is guilty (H0 is false), and the jury finds that he is guilty (reject H0); hence the correct decision has been made.

4. The defendant is guilty (H0 is false), and the jury finds him innocent (accept H0); hence an error has been made.

· In cases (1) and (3), the jury reaches the correct decision; in cases (2) and (4), it makes an error. Let us consider these errors in terms of conventional statistical terminology.

· Type I Error: When we reject a null hypothesis, there is a chance that we have made a mistake, i.e. we have rejected a true statement. Rejecting a true null hypothesis is referred to as a Type I error, and the maximum probability of making such an error is "the level of significance", represented by α.

· Type II Error: On the other hand, we can also make the mistake of failing to reject a false null hypothesis, which is referred to as a Type II error. The probability of making a Type II error is represented by β.




· It may be noted that in our legal system a Type I error is considered to be more serious than a Type II error, as we feel that it is worse to convict an innocent person than to let a guilty one go free.

· Had we made H0 the hypothesis that the defendant is guilty, the meaning of Type I and Type II error would have been reversed.
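The link between the level of significance and the Type I error can be seen in a small simulation. The sketch below is only an illustration under assumed values: a normal population with mean 42000 and standard deviation 5000 (the standard deviation, sample size and number of trials are hypothetical, not taken from these notes). Samples are drawn while H0 is true, and we count how often a two-tail Z test at α = 0.05 wrongly rejects H0.

```python
import random
from statistics import NormalDist


def type_i_error_rate(mu0=42000, sigma=5000, n=30, alpha=0.05,
                      trials=4000, seed=1):
    """Estimate how often a true H0 is rejected by a two-tail Z test."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample really comes from a population with mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_cal = (sum(sample) / n - mu0) / std_error
        if abs(z_cal) > z_crit:                  # test value falls in a critical region
            rejections += 1                      # a Type I error has been made
    return rejections / trials


print(type_i_error_rate())  # a proportion close to alpha = 0.05
```

The proportion of wrong rejections should settle near α, which is why α is described as the probability of a Type I error.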

13.4 Meaning of Hypothesis Testing

o Hypothesis testing is a procedure, based on sample evidence and probability

theory, used to determine whether the hypothesis is a reasonable statement

and should not be rejected, or is unreasonable and should be rejected.

13.5 Steps in Testing a Hypothesis

In hypothesis testing, the researcher must define the population under study, state the

particular hypotheses that will be investigated, give the significance level, select a sample

from the population, collect the data, perform the calculations required for the statistical

test, and reach a conclusion.

Below is the summary of the six steps of hypothesis testing.

Step 1: State null and alternate hypotheses.

Step 2: Identify the test statistic

Step 3: Select a level of significance, given in question

Step 4: Formulate a decision rule

Step 5: Take a sample, calculate the test statistic




Step 6 : Decision

13.5.1 Formulating the null and alternate hypothesis

The null hypothesis asserts that a population parameter is equal to, no more than, or no less than some exact value, and it is evaluated in the face of numerical evidence. An appropriate alternate hypothesis covers the other possible values of the parameter.

13.5.2 : Selecting the test statistic

The test statistic will be either Z or t, corresponding to the normal and the t distribution

respectively.

An important consideration in tests involving a sample mean is whether the population standard deviation is known.

· The Z test will be used for hypothesis tests involving a sample mean if the population standard deviation is known.

· The t test will be used for hypothesis tests involving a sample mean if the population standard deviation is unknown.

· The Z test will be used for hypothesis tests involving a sample proportion, provided that both np and nq are greater than or equal to 5.
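The three rules above can be written as a small decision function. This is a hedged sketch only; the function name `choose_test_statistic` and its arguments are hypothetical and simply encode the rules as stated in this section.

```python
def choose_test_statistic(parameter, sigma_known=False, n=None, p0=None):
    """Return 'Z' or 't' following the selection rules of section 13.5.2."""
    if parameter == "mean":
        # Known population standard deviation -> Z, otherwise -> t
        return "Z" if sigma_known else "t"
    if parameter == "proportion":
        if n is None or p0 is None:
            raise ValueError("n and p0 are required for a proportion test")
        # Z is appropriate only when both np and nq are at least 5
        if n * p0 >= 5 and n * (1 - p0) >= 5:
            return "Z"
        raise ValueError("np and nq must both be at least 5 to use the Z test")
    raise ValueError("parameter must be 'mean' or 'proportion'")


print(choose_test_statistic("mean", sigma_known=True))      # Z
print(choose_test_statistic("mean", sigma_known=False))     # t
print(choose_test_statistic("proportion", n=100, p0=0.4))   # Z
```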

13.5.3 : Select the significance level (given in the question)

If we end up rejecting the null hypothesis, there is a chance that we are wrong in doing so, i.e. that we made a Type I error. The significance level is the maximum probability that we will make such a mistake.

13.5.4 : Formulating the decision rule

The critical value(s) will bound rejection and non-rejection regions for the null

hypothesis. The rejection and non-rejection regions can be stated as a decision rule

specifying the conclusion to be reached for a given outcome of the test.

[Figure: regions labelled "Do not reject H0" and "Reject H0 and accept H1"]




Critical value : The dividing point between the region where the null hypothesis is

rejected and the region where it is not rejected.

The critical or rejection region: It is the range of values of the test value that indicates

that there is a significant difference and the null hypothesis should be rejected.

The non-critical or non-rejection region: It is the range of values of the test value that

indicates that the difference was probably due to chance and that the null hypothesis

should not be rejected.

13.5.5 Calculating the test statistic value

The purpose of the test is to determine whether it is appropriate to reject or not to reject

the null hypothesis.

So the test statistic value is a value determined from the sample information, used to decide whether or not to reject the null hypothesis.

13.5.6 : Making the decision

If the calculated value is in the rejection region, the null hypothesis will be rejected. Otherwise the null hypothesis cannot be rejected. Failure to reject the null hypothesis does not constitute proof that it is true, but rather that we are unable to reject it at the level of significance being used for the test.

13.6 : Methods of Hypothesis Testing

The three methods used to test hypotheses are:

1. The traditional method

2. The p-value method

3. The confidence interval method

· The traditional method will be explained first. It has been used since the hypothesis

testing method was formulated.

· A new method, called the p-value method , has become popular with the advent of

modern computers and high-powered statistical calculators.

· The confidence interval method illustrates the relationship between hypothesis testing and confidence interval estimates.

13.6.1 Traditional method of Hypothesis Testing for the Population Mean:

Population Standard Deviation Known




o When testing for the population mean and the population standard deviation is known, the test statistic is given by:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

o Example 13.4 : It is claimed that the average college student reads less than the

general public. The national average is 29.4 hours per week, with a standard

deviation of 2 hours. A sample of 30 college students has a mean of 27 hours.

Is there enough evidence to support the claim at α = 0.01?

o Solution:

Step 1: H0: μ = 29.4 hrs, Reject the claim that college students read less than the national average

H1: μ < 29.4 hrs, Accept the claim that college students read less than the national average

Step 2: Test statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

since σ is known

Step 3: Level of Significance: α = 0.01

Step 4: Decision Rule: Reject H0 if Z < -2.33

Step 5: Value of the test statistic:

$$Z = \frac{27 - 29.4}{2 / \sqrt{30}} = -6.57$$

Step 6: Conclusion: -6.57 < -2.33; Reject H0

There is enough evidence to support the claim that college students read less than the

general public at the 1%(0.01) level of significance.
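The arithmetic of Example 13.4 can be checked with a few lines of Python. This is a minimal sketch, assuming `statistics.NormalDist` from the standard library is used instead of the Z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 13.4
x_bar, mu0, sigma, n, alpha = 27, 29.4, 2, 30, 0.01

z_cal = (x_bar - mu0) / (sigma / sqrt(n))   # Step 5: value of the test statistic
z_crit = NormalDist().inv_cdf(alpha)        # Step 4: left-tail critical value
reject_h0 = z_cal < z_crit                  # Step 6: decision

print(round(z_cal, 2), round(z_crit, 2), reject_h0)  # -6.57 -2.33 True
```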

13.6.2 p-value in Hypothesis Testing

o A p-value is the probability, assuming that the null hypothesis is true, of finding a value of the test statistic at least as extreme as the computed value for the test.

o If the p-value is smaller than the significance level α, H0 is rejected.

o If the p-value is larger than the significance level α, H0 is not rejected.

The p-value of a test of hypothesis is the smallest value of α that would lead to rejection of the null hypothesis.

13.6.3 Computation of the p-value

· One-Tailed Test: p-value =

P {z ≥ absolute value of the computed test statistic value}

· Two-Tailed Test: p-value =

2 times P {z ≥ absolute value of the computed test statistic value}
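The two rules above translate directly into code. The sketch below assumes a Z test and that, for a one-tailed test, the computed statistic lies in the direction stated by H1; the function name `p_value` and the value 1.5 are illustrative only.

```python
from statistics import NormalDist


def p_value(z_cal, two_tailed=False):
    """p-value of a Z test from the computed test statistic value."""
    # P(Z >= |z_cal|): area in the tail beyond the computed value
    upper_tail = 1 - NormalDist().cdf(abs(z_cal))
    return 2 * upper_tail if two_tailed else upper_tail


print(round(p_value(1.5), 4))                    # 0.0668 (one-tailed)
print(round(p_value(-1.5, two_tailed=True), 4))  # 0.1336 (two-tailed)
```

The two-tailed p-value is always double the one-tailed p-value for the same test value, because the "at least as extreme" region then covers both tails.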

o Example 13.5 : The mean marks obtained by the students used to be 45 with a standard deviation of 4.36 marks. A new system was introduced and it was claimed that the new system will produce a different grade of students. A few months after the new system was introduced, a random sample of 50 students showed that the mean marks obtained was 46.54. Can we accept the claim at the 5% level of significance? Use the p-value method.

Solution:

Step 1: H0: μ = 45 (the claim is not accepted)

H1: μ ≠ 45 (the claim is accepted)

Step 2: Test statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Step 3: Level of Significance: α = 0.05

Step 4: Decision Rule: Reject H0 if p-value < α

Or reject H0 if Z > 1.96 or Z < -1.96

Step 5: Value of the test statistic:

$$Z = \frac{46.54 - 45}{4.36 / \sqrt{50}} = 2.498 \approx 2.50$$

Area between 0 and 2.50 = 0.4938

p-value = 2 (1 - 0.9938) = 0.0124

Step 6: Conclusion: 0.0124 < 0.05; Reject H0

Or reject H0 because 2.50 > 1.96




There is evidence to accept the claim that the new system will produce a different grade of students at the 5% level of significance.
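Example 13.5 can be reproduced as follows. This is a sketch under one assumption: the test value is rounded to two decimal places before the p-value is computed, to mimic a Z-table lookup (without that rounding the p-value comes out marginally higher, at about 0.0125).

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 13.5 (two-tailed test)
x_bar, mu0, sigma, n, alpha = 46.54, 45, 4.36, 50, 0.05

z_cal = round((x_bar - mu0) / (sigma / sqrt(n)), 2)  # rounded as for a Z table
p_val = 2 * (1 - NormalDist().cdf(abs(z_cal)))       # both tails
reject_h0 = p_val < alpha

print(z_cal, round(p_val, 4), reject_h0)  # 2.5 0.0124 True
```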

Example 13.6 The mean marks obtained by the students used to be 45. A new system

was introduced and it was claimed that the new system will produce better students.

After a few months of introducing the new system a random sample of 50 students

selected showed that the mean marks obtained was 46.54. The population std deviation

was known to be 4.36 marks. Can we accept the claim at 5% level of significance? Use

p-value method.

Solution:

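Step 1: H0: μ = 45 (the claim is not accepted)

H1: μ > 45 (the claim is accepted)

Step 2: Test statistic:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

since the population SD is known

Step 3: Level of Significance: α = 0.05

Step 4: Decision Rule: Reject H0 if p-value < 0.05, or if Z cal > 1.645

Step 5: Value of the test statistic:

$$Z = \frac{46.54 - 45}{4.36 / \sqrt{50}} = 2.498 \approx 2.50$$

p-value = 1 - 0.9938 = 0.0062

Step 6: Conclusion: 0.0062 < 0.05; Reject H0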




iii. Can we accept the claim that the mean age of all customers is less than 26.5 years at

the 5% significance level.

Exercise 2: rework question 1 using p-value method.

13.7 Testing for a Population Mean: Population standard deviation is unknown

o When the population standard deviation is unknown, the test statistic for the one-sample case is given by:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

o Example 13.7 According to the norms established for a reading comprehension test, standard six students should average 84.3. If 25 randomly selected standard six students from a certain city averaged 87.8, with a standard deviation of 8.6, test the null hypothesis μ = 84.3 for that city against the alternative hypothesis μ > 84.3, using α = 0.05.

o Solution:

Step 1: H0: μ = 84.3, mean has not increased

H1: μ > 84.3, mean has increased

Step 2: Test statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Step 3: Level of Significance: α = 0.05

Step 4: Decision Rule: Reject H0 if t > t(α, v), i.e. t > t(0.05, 24) = 1.711

Step 5: Value of the test statistic:

$$t = \frac{87.8 - 84.3}{8.6 / \sqrt{25}} = 2.03$$

Step 6: Conclusion: 2.03 > 1.711; Reject H0

There is sufficient evidence that μ > 84.3 at the 5% level of significance.
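The computation in Example 13.7 can be sketched in Python. The standard library has no t distribution, so the critical value 1.711 is simply copied from the t table, as in the example; the variable names are illustrative. Unrounded, the test value is about 2.035.

```python
from math import sqrt

# Figures from Example 13.7 (right-tail t test, 24 degrees of freedom)
x_bar, mu0, s, n = 87.8, 84.3, 8.6, 25
t_crit = 1.711                           # t(0.05, 24), read from the t table

t_cal = (x_bar - mu0) / (s / sqrt(n))    # value of the test statistic
reject_h0 = t_cal > t_crit               # decision rule for a right-tail test

print(round(t_cal, 2), reject_h0)  # 2.03 True
```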

13.8 Confidence Interval Method

Example 13.8 A high school counselor believes the mean age of dropouts at her college is 21 years. She reviews a sample of 17 dropouts and records the ages as shown below:

12 18 24 16 21 20 18 19 19

22 25 16 18 19 19 20 23




a) At the 0.01 level of significance, is the counselor's hypothesis refutable?

b) Find the 99% confidence interval estimate of the mean age of all dropouts.

c) Is the hypothesized value of μ contained in the interval of (b)? Does this confirm the conclusion in part (a)? Explain.

Solution: x̄ = 19.35, s = 3.16

By calculation:

$$\bar{x} = \frac{\sum x}{n} = \frac{329}{17} = 19.35$$

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{6527 - \frac{329^2}{17}}{16}} = 3.16$$

a) H0: μ = 21, Do not refute the counselor's hypothesis

H1: μ ≠ 21, Refute the counselor's hypothesis

Test statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Level of significance = 0.01

Decision rule: Reject H0 if |t cal| > 2.921, where 2.921 = t(0.01/2, 16), i.e. if t cal > 2.921 or t cal < -2.921

Value of test statistic:

$$t_{cal} = \frac{19.35 - 21}{3.16 / \sqrt{17}} = -2.15$$

Conclusion: Since -2.15 > -2.921, do not reject H0

Therefore there is insufficient evidence at the 1% level of significance to refute the counselor's belief that the mean age of dropouts is 21.

b)

$$CI(\mu) = \bar{x} \pm t_{0.01/2,\,16} \left( \frac{s}{\sqrt{n}} \right) = 19.35 \pm 2.92 \left( \frac{3.16}{\sqrt{17}} \right)$$

= 19.35 ± 2.238 = 17.11 to 21.59

Therefore we are 99% confident that the mean age of dropouts is between 17 and 22.




c) Yes, the hypothesized value μ = 21 is contained in the interval (17 to 22). Since 21 is within the 99% CI of 17 to 22, do not reject H0; this confirms our decision in part (a).
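The agreement between the two-tail t test and the confidence interval in Example 13.8 can be verified from the raw data. This is a minimal sketch; the critical value 2.921 is taken from the t table as in the example, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Ages of the 17 dropouts from Example 13.8
ages = [12, 18, 24, 16, 21, 20, 18, 19, 19,
        22, 25, 16, 18, 19, 19, 20, 23]
mu0 = 21          # hypothesized mean
t_crit = 2.921    # t(0.01/2, 16), read from the t table

n = len(ages)
x_bar = mean(ages)
s = stdev(ages)                       # sample standard deviation (divisor n - 1)
std_error = s / sqrt(n)

# Part (a): two-tail t test
t_cal = (x_bar - mu0) / std_error
reject_by_test = abs(t_cal) > t_crit

# Part (b): 99% confidence interval for the mean
lower = x_bar - t_crit * std_error
upper = x_bar + t_crit * std_error

# Part (c): H0 is rejected by the interval only if mu0 lies outside it
reject_by_ci = not (lower <= mu0 <= upper)

print(round(x_bar, 2), round(s, 2), round(t_cal, 2))  # 19.35 3.16 -2.15
print(round(lower, 2), round(upper, 2))               # 17.11 21.59
print(reject_by_test, reject_by_ci)                   # False False
```

Both methods lead to the same decision because the interval contains exactly those hypothesized means for which |t cal| does not exceed the critical value.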

Exercise 2.

A random sample of 30 customers patronizing a cyber cafe was found to have a mean age of 25 years and a standard deviation of 4 years.

i. Can we accept the claim that the mean age of all customers is not 26.5 years at the 5% significance level?

ii. Can we accept the claim that the mean age of all customers is more than 23.5 years at the 5% significance level?

iii. Can we accept the claim that the mean age of all customers is less than 26.5 years at the 5% significance level?

Exercise 3

An educationist claims that students' performance will be better with private tuition. A random sample of 10 students with private tuition were given a test and the marks scored are shown below:

40 45 58 59 60 63 70 80 85 90

Also given: Σx = 650 and Σx² = 44664

i. Calculate the sample mean and standard deviation.

ii. Can we accept the educationist's claim that the mean mark of students will be more than 55 at the 0.05 level of significance?

13.9 Tests Concerning Population Proportion

o A proportion is the fraction or percentage that indicates the part of the

population or sample having a particular trait of interest.

Eg. If 2000 out of 10,000 students in HELP are female, the population proportion of female students is represented by p = 2000/10000 = 0.2

Eg. If 10 out of 25 students in a sample are female, then the sample proportion of female students is represented by p̂ = 10/25 = 0.4

o The sample proportion is denoted by p̂ and is found by:

$$\hat{p} = \frac{\text{Number of successes in the sample}}{\text{Number sampled}} = \frac{x}{n}$$

13.9.1 Test Statistic for Testing a Single Population Proportion.

$$Z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$

where q = 1 - p

The sample proportion is p̂ and the population proportion is p.

Condition: Both np and nq must be at least 5

o Example 13.9 It is claimed that less than 40% of HELP's students are hard working. To test the claim, a sample of 100 students selected at random showed that 35 are hard working. Should the claim be accepted at the 0.05 level of significance?

o Solution: np = 0.4 × 100 = 40 and nq = 0.6 × 100 = 60 are both greater than 5, so it is appropriate to use the Z test.

Sample proportion, p̂ = 35/100 = 0.35

Step 1:

H0: p = 0.4, Reject the claim that less than 40% of students are hardworking

H1: p < 0.4, Accept the claim that less than 40% of students are hardworking

Step 2: Test Statistic:

$$Z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$

(Pg 2 of formula sheet)

Step 3: Level of Significance: α = 0.05 (given in question)

Step 4: Decision Rule: Reject H0 if Z < -1.645

Or using the p-value method, reject H0 if p-value < 0.05

Step 5: Value of the test statistic:

$$Z = \frac{0.35 - 0.4}{\sqrt{(0.4)(0.6) / 100}} = -1.02$$

p-value = 0.1539

Step 6: Conclusion: -1.02 > -1.645; Do not reject H0

Using the p-value method, since the p-value 0.1539 > 0.05, do not reject H0

There is insufficient evidence to accept the claim that less than 40% of HELP's students are hard working at the 5% level of significance.
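The proportion test of Example 13.9 can be sketched as a small function. The name `proportion_z` is hypothetical, and `statistics.NormalDist` stands in for the Z table, so the p-value differs very slightly from the table value 0.1539 because the test value is not rounded to -1.02 first.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Z test statistic for a single population proportion."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("np and nq must both be at least 5")
    p_hat = successes / n                       # sample proportion
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Figures from Example 13.9 (left-tail test at alpha = 0.05)
z_cal = proportion_z(35, 100, 0.4)
p_val = NormalDist().cdf(z_cal)                 # area to the left of z_cal
reject_h0 = p_val < 0.05

print(round(z_cal, 2), reject_h0)  # -1.02 False
```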

Example 13.10

In a random sample of 50 students, 20 students were late for tutorial classes. Can we accept the claim that less than 50% of students were late for tutorial classes at the 5% significance level?

Ans: p̂ = 20/50 = 0.4, q̂ = 1 - 0.4 = 0.6

Step 1:

H0: p = 0.5, Reject the claim that less than 50% of students were late for tutorial classes

H1: p < 0.5, Accept the claim that less than 50% of students were late for tutorial classes

Step 2: Test Statistic:

$$Z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$

(Pg 6 of formula sheet)

Step 3: Level of Significance: α = 0.05

Step 4: Decision Rule: Reject H0 if Z < -1.645

Or using the p-value method, reject H0 if p-value < 0.05

Step 5: Value of the test statistic:

$$Z = \frac{0.4 - 0.5}{\sqrt{(0.5)(0.5) / 50}} = -1.41$$

p-value = 0.0793

Step 6: Conclusion: -1.41 > -1.645, or p-value (0.0793) > 0.05; Do not reject H0

There is insufficient evidence to accept the claim that less than 50% of students

were late for tutorial classes at the 5% level of significance.

ii. Will your decision in part (i) change at the 10% level of significance?

Answer: p-value (0.0793) < 0.1, Reject H0

Therefore the decision has changed to accepting the claim that less than 50% of students were late for tutorial classes.
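Example 13.10 shows that the same p-value can lead to different decisions at different levels of significance. A short Python sketch of this comparison follows, again assuming `statistics.NormalDist` in place of the Z table (the unrounded p-value is about 0.079 rather than the table value 0.0793).

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 13.10 (left-tail test of H0: p = 0.5)
p_hat, p0, n = 20 / 50, 0.5, 50

z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_val = NormalDist().cdf(z_cal)                  # left-tail p-value

# Compare the one p-value against each level of significance
decisions = {alpha: p_val < alpha for alpha in (0.05, 0.10)}

for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}:", "Reject H0" if reject else "Do not reject H0")
# alpha = 0.05: Do not reject H0
# alpha = 0.10: Reject H0
```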

Example 13.11

It is claimed that more than 40% of HELP students do not do homework.

A random sample of 300 HELP students selected showed that 135 do not do homework.

i. Can we accept the claim at the 5% (0.05) level of significance?

ii. Find the p-value of the test.

iii. Will your decision in part (i) change at the 2.5% (0.025) level of significance?

i. Step 1: H0: p = 0.4, Reject the claim that more than 40% of students do not do homework

H1: p > 0.4, Accept the claim that ……………… homework

Step 2: Test Statistic:

$$Z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$

Step 3 & 4: At the 5% significance level, reject H0 if Z cal > 1.645

Step 5: Value of test statistic

$$Z = \frac{\hat{p} - p}{\sqrt{pq / n}} = \frac{0.45 - 0.4}{\sqrt{(0.4)(0.6) / 300}} = 1.77$$
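The three parts of Example 13.11 can be explored with the sketch below. It is not the worked solution from these notes: it is a minimal check that applies the right-tail proportion test stated above, with `statistics.NormalDist` standing in for the Z table, so the p-value may differ in the fourth decimal place from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 13.11 (right-tail test of H0: p = 0.4)
successes, n, p0 = 135, 300, 0.4
p_hat = successes / n                            # sample proportion, 0.45

z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # value of the test statistic
p_val = 1 - NormalDist().cdf(z_cal)              # right-tail p-value

reject_at_5 = p_val < 0.05      # part (i): decision at the 5% level
reject_at_2_5 = p_val < 0.025   # part (iii): decision at the 2.5% level

print(round(z_cal, 2), reject_at_5, reject_at_2_5)  # 1.77 True False
```

Because the p-value lies between 0.025 and 0.05, H0 is rejected at the 5% level but not at the 2.5% level.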

