Session 09: Hypothesis Testing for Proportions and Categorical Data

Course: A0392 – Statistik Ekonomi (Economic Statistics)

Year: 2006


Outline:

• Hypothesis tests for a proportion

• Hypothesis tests for the difference between two proportions

• Tests of independence for categorical data

Summary of Test Statistics to be Used in a Hypothesis Test about a Population Mean

[Flowchart]

• n ≥ 30?

– Yes, σ known: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

– Yes, σ unknown: use s to estimate σ; $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

• n < 30: is the population approximately normal?

– Yes, σ known: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

– Yes, σ unknown: use s to estimate σ; $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

– No: increase n to ≥ 30

A Summary of Forms for Null and Alternative Hypotheses about a Population Proportion

• The equality part of the hypotheses always appears in the null hypothesis.

• In general, a hypothesis test about the value of a population proportion p must take one of the following three forms (where p0 is the hypothesized value of the population proportion).

H0: p ≥ p0      H0: p ≤ p0      H0: p = p0

Ha: p < p0      Ha: p > p0      Ha: p ≠ p0

Tests about a Population Proportion: Large-Sample Case (np ≥ 5 and n(1 - p) ≥ 5)

• Test Statistic

$z = \dfrac{\bar{p} - p_0}{\sigma_{\bar{p}}}$

where:

$\sigma_{\bar{p}} = \sqrt{\dfrac{p_0(1 - p_0)}{n}}$

• Rejection Rule

One-Tailed

H0: p ≤ p0      Reject H0 if z > z_α

H0: p ≥ p0      Reject H0 if z < -z_α

Two-Tailed

H0: p = p0      Reject H0 if |z| > z_{α/2}

Example: NSC

• Two-Tailed Test about a Population Proportion: Large n

For a Christmas and New Year’s week, the National Safety Council estimated that 500 people would be killed and 25,000 injured on the nation’s roads. The NSC claimed that 50% of the accidents would be caused by drunk driving.

A sample of 120 accidents showed that 67 were caused by drunk driving. Use these data to test the NSC’s claim with α = 0.05.

Example: NSC

• Two-Tailed Test about a Population Proportion: Large n

– Hypotheses

H0: p = .5

Ha: p ≠ .5

– Test Statistic

$\sigma_{\bar{p}} = \sqrt{\dfrac{p_0(1 - p_0)}{n}} = \sqrt{\dfrac{.5(1 - .5)}{120}} = .045644$

$z = \dfrac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \dfrac{(67/120) - .5}{.045644} = 1.278$

Example: NSC

• Two-Tailed Test about a Population Proportion: Large n

– Rejection Rule

Reject H0 if z < -1.96 or z > 1.96

– Conclusion

Do not reject H0.

For z = 1.278, the p-value is .201. If we reject H0, we exceed the maximum allowed risk of committing a Type I error (p-value > .050).
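The NSC calculation can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `one_proportion_z_test` and its parameters are illustrative assumptions, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed large-sample z test for a population proportion.

    Hypothetical helper written for this example only.
    """
    p_bar = successes / n                    # sample proportion
    sigma_p_bar = sqrt(p0 * (1 - p0) / n)    # standard error under H0
    z = (p_bar - p0) / sigma_p_bar
    # Two-tailed p-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# NSC data from the slides: 67 of 120 accidents, H0: p = .5, alpha = .05
z, p_value = one_proportion_z_test(successes=67, n=120, p0=0.5)
alpha = 0.05
reject_h0 = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Comparing the p-value with α gives the same decision as comparing z with ±1.96, since both rules describe the same rejection region.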

Hypothesis Testing and Decision Making

• In many decision-making situations the decision maker may want, and in some cases may be forced, to take action with both the conclusion do not reject H0 and the conclusion reject H0.

• In such situations, it is recommended that the hypothesis-testing procedure be extended to include consideration of making a Type II error.

Calculating the Probability of a Type II Error in Hypothesis Tests about a Population Mean

1. Formulate the null and alternative hypotheses.

2. Use the level of significance to establish a rejection rule based on the test statistic.

3. Using the rejection rule, solve for the value of the sample mean that identifies the rejection region.

4. Use the results from step 3 to state the values of the sample mean that lead to the acceptance of H0; this defines the acceptance region.

5. Using the sampling distribution of $\bar{x}$ for any value of μ from the alternative hypothesis, and the acceptance region from step 4, compute the probability that the sample mean will be in the acceptance region.

Example: Metro EMS (revisited)

• Calculating the Probability of a Type II Error

1. Hypotheses are: H0: μ ≤ 12 and Ha: μ > 12

2. Rejection rule is: Reject H0 if z > 1.645

3. Value of the sample mean that identifies the rejection region:

$z = \dfrac{\bar{x} - 12}{3.2/\sqrt{40}} = 1.645$

$\bar{x} = 12 + 1.645\left(\dfrac{3.2}{\sqrt{40}}\right) = 12.8323$

4. We will accept H0 when $\bar{x}$ ≤ 12.8323

5. Probabilities that the sample mean will be in the acceptance region:

$z = \dfrac{12.8323 - \mu}{3.2/\sqrt{40}}$

Values of μ      z         β        1 - β
14.0            -2.31     .0104    .9896
13.6            -1.52     .0643    .9357
13.2            -0.73     .2327    .7673
12.83            0.00     .5000    .5000
12.8             0.06     .5239    .4761
12.4             0.85     .8023    .1977
12.0001          1.645    .9500    .0500

Observations about the preceding table:

– When the true population mean is close to the null hypothesis value of 12, there is a high probability that we will make a Type II error.

– When the true population mean is far above the null hypothesis value of 12, there is a low probability that we will make a Type II error.

Power of the Test

• The probability of correctly rejecting H0 when it is false is called the power of the test.

• For any particular value of μ, the power is 1 – β.

• We can show graphically the power associated with each value of μ; such a graph is called a power curve.
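The Type II error probabilities and the corresponding power values for the Metro EMS example can be recomputed as follows. This is a sketch that assumes the values shown in the slides (hypothesized mean 12, standard deviation 3.2, n = 40, critical value 1.645); the constant and function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU_0, SIGMA, N = 12, 3.2, 40   # Metro EMS values from the slides
Z_ALPHA = 1.645                # upper-tail critical value for alpha = .05

std_error = SIGMA / sqrt(N)
# Largest sample mean for which H0 is still accepted
x_bar_critical = MU_0 + Z_ALPHA * std_error


def type_ii_error(mu_true):
    """P(sample mean falls in the acceptance region | true mean = mu_true)."""
    z = (x_bar_critical - mu_true) / std_error
    return NormalDist().cdf(z)


# Each (mu, power) pair is one point on the power curve
for mu in (14.0, 13.6, 13.2, 12.8, 12.4):
    beta = type_ii_error(mu)
    print(f"mu = {mu:5.1f}  beta = {beta:.4f}  power = {1 - beta:.4f}")
```

The results can differ from the slide table in the third or fourth decimal place, because the table rounds z to two decimals before looking up the probability.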

Determining the Sample Size for a Hypothesis Test About a Population Mean

$n = \dfrac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$

where

z_α = z value providing an area of α in the tail

z_β = z value providing an area of β in the tail

σ = population standard deviation

μ0 = value of the population mean in H0

μa = value of the population mean used for the Type II error

Note: In a two-tailed hypothesis test, use z_{α/2} not z_α
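The sample-size formula translates directly into code. The sketch below is illustrative: the function name is an assumption, and the inputs (α = .05, β = .10, a true mean of 13) are hypothetical values chosen for demonstration, combined with the standard deviation 3.2 and hypothesized mean 12 from the Metro EMS example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean_test(alpha, beta, sigma, mu_0, mu_a, two_tailed=False):
    """Sample size giving the requested alpha and beta (hypothetical helper)."""
    # In a two-tailed test the slide's note applies: use z_{alpha/2}
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu_0 - mu_a) ** 2
    return ceil(n)   # round up so both error limits are met


# Hypothetical requirement: alpha = .05, beta = .10 when the true mean is 13
n_one_tailed = sample_size_for_mean_test(0.05, 0.10, sigma=3.2, mu_0=12, mu_a=13)
n_two_tailed = sample_size_for_mean_test(0.05, 0.10, sigma=3.2, mu_0=12, mu_a=13,
                                         two_tailed=True)
print(n_one_tailed, n_two_tailed)
```

Rounding up rather than to the nearest integer is a design choice: a smaller n would leave β slightly above the requested value.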

Relationship among α, β, and n

• Once two of the three values are known, the other can be computed.

• For a given level of significance α, increasing the sample size n will reduce β.

• For a given sample size n, decreasing α will increase β, whereas increasing α will decrease β.
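Both trade-offs can be checked numerically. The sketch below reuses the upper-tail Metro EMS setup (hypothesized mean 12, standard deviation 3.2) and assumes a hypothetical true mean of 13; the function name `beta_for` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def beta_for(n, alpha, sigma=3.2, mu_0=12, mu_a=13):
    """Type II error probability for the upper-tail test H0: mu <= mu_0.

    mu_a = 13 is a hypothetical true mean chosen for illustration.
    """
    std_error = sigma / sqrt(n)
    x_bar_critical = mu_0 + NormalDist().inv_cdf(1 - alpha) * std_error
    return NormalDist().cdf((x_bar_critical - mu_a) / std_error)


# Fixed significance level: a larger sample reduces the Type II error probability
for n in (20, 40, 80):
    print(f"alpha = .05, n = {n:3d}: beta = {beta_for(n, 0.05):.4f}")

# Fixed sample size: a smaller significance level raises it
for alpha in (0.01, 0.05, 0.10):
    print(f"n = 40, alpha = {alpha:.2f}: beta = {beta_for(40, alpha):.4f}")
```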

Inferences About the Difference Between the Proportions of Two Populations

• Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

• Interval Estimation of p1 - p2

• Hypothesis Tests about p1 - p2

Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

• Expected Value

$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$

• Standard Deviation

$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$

• Distribution Form

If the sample sizes are large (n1p1, n1(1 - p1), n2p2, and n2(1 - p2) are all greater than or equal to 5), the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution.

Interval Estimation of p1 - p2

• Interval Estimate

$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\, \sigma_{\bar{p}_1 - \bar{p}_2}$

• Point Estimator of $\sigma_{\bar{p}_1 - \bar{p}_2}$

$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\dfrac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \dfrac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$

Example: MRA

MRA (Market Research Associates) is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product. The new campaign has been initiated with TV and newspaper advertisements running for three weeks. A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product.

Does the data support the position that the advertising campaign has provided an increased awareness of the client’s product?

Example: MRA

• Point Estimator of the Difference Between the Proportions of Two Populations

p1 = proportion of the population of households “aware” of the product after the new campaign

p2 = proportion of the population of households “aware” of the product before the new campaign

$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign

$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign

$\bar{p}_1 - \bar{p}_2 = \dfrac{120}{250} - \dfrac{60}{150} = .48 - .40 = .08$

Example: MRA

• Interval Estimate of p1 - p2: Large-Sample Case

For α = .05, z.025 = 1.96:

$.48 - .40 \pm 1.96\sqrt{\dfrac{.48(.52)}{250} + \dfrac{.40(.60)}{150}}$

.08 ± 1.96(.0510)

.08 ± .10

or -.02 to +.18

– Conclusion

At a 95% confidence level, the interval estimate of the difference between the proportions of households aware of the client’s product before and after the new advertising campaign is -.02 to +.18.
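The MRA interval estimate can be verified with a few lines of Python. This is a minimal sketch; the function name `diff_proportion_interval` and its parameters are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample interval estimate of p1 - p2 (hypothetical helper)."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    # Unpooled standard error: each sample keeps its own proportion
    std_error = sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error
    diff = p1_bar - p2_bar
    return diff - margin, diff + margin


# After the campaign: 120 of 250 aware; before: 60 of 150 aware
lower, upper = diff_proportion_interval(120, 250, 60, 150)
print(f"95% interval for p1 - p2: {lower:.2f} to {upper:.2f}")
```

Because the interval contains 0, it is consistent with no change in awareness at the 95% confidence level.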

Hypothesis Tests about p1 - p2

• Hypotheses

H0: p1 - p2 ≤ 0

Ha: p1 - p2 > 0

• Test statistic

$z = \dfrac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$

• Point Estimator of $\sigma_{\bar{p}_1 - \bar{p}_2}$ where p1 = p2

$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

where:

$\bar{p} = \dfrac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$

Example: MRA

Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?

p1 = proportion of the population of households “aware” of the product after the new campaign

p2 = proportion of the population of households “aware” of the product before the new campaign

– Hypotheses

H0: p1 - p2 ≤ 0

Ha: p1 - p2 > 0

Example: MRA

– Rejection Rule

Reject H0 if z > 1.645

– Test Statistic

$\bar{p} = \dfrac{250(.48) + 150(.40)}{250 + 150} = \dfrac{180}{400} = .45$

$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{.45(.55)\left(\dfrac{1}{250} + \dfrac{1}{150}\right)} = .0514$

$z = \dfrac{(.48 - .40) - 0}{.0514} = \dfrac{.08}{.0514} = 1.56$

– Conclusion

Do not reject H0.
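The pooled test for the MRA data can be sketched as follows. The function name `two_proportion_z_test` is an assumption, and the upper-tail p-value is an addition for illustration; the slides decide by comparing z with 1.645.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Upper-tail z test of H0: p1 - p2 <= 0 (hypothetical helper)."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    # Pooled estimate; x1 + x2 equals n1 * p1_bar + n2 * p2_bar
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / std_error
    p_value = 1 - NormalDist().cdf(z)   # area in the upper tail
    return z, p_value


# After the campaign: 120 of 250 aware; before: 60 of 150 aware
z, p_value = two_proportion_z_test(120, 250, 60, 150)
reject_h0 = z > 1.645
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The pooled proportion is used here because the standard error is evaluated under the assumption p1 = p2, unlike the interval estimate, which keeps the two sample proportions separate.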

Test of Independence: Contingency Tables

1. Set up the null and alternative hypotheses.

2. Select a random sample and record the observed frequency, f_ij, for each cell of the contingency table.

3. Compute the expected frequency, e_ij, for each cell.

$e_{ij} = \dfrac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$

Test of Independence: Contingency Tables

4. Compute the test statistic.

$\chi^2 = \sum_i \sum_j \dfrac{(f_{ij} - e_{ij})^2}{e_{ij}}$

5. Reject H0 if $\chi^2 > \chi^2_\alpha$ (where α is the significance level and with n rows and m columns there are (n - 1)(m - 1) degrees of freedom).

Example: Finger Lakes Homes (B)

• Contingency Table (Independence) Test

Each home sold can be classified according to price and to style. Finger Lakes Homes’ manager would like to determine if the price of the home and the style of the home are independent variables.

The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $65,000 or less or more than $65,000.

Price        Colonial   Ranch   Split-Level   A-Frame
≤ $65,000       18         6        19           12
> $65,000       12        14        16            3

Example: Finger Lakes Homes (B)

Contingency Table (Independence) Test

• Hypotheses

H0: Price of the home is independent of the style of the home that is purchased

Ha: Price of the home is not independent of the style of the home that is purchased

• Expected Frequencies

The observed frequencies with their row and column totals, from which each expected frequency e_ij is computed:

Price        Colonial   Ranch   Split-Level   A-Frame   Total
≤ $65,000       18         6        19           12       55
> $65,000       12        14        16            3       45
Total           30        20        35           15      100

Example: Finger Lakes Homes (B)

• Contingency Table (Independence) Test

– Test Statistic

$\chi^2 = \dfrac{(18 - 16.5)^2}{16.5} + \dfrac{(6 - 11)^2}{11} + \cdots + \dfrac{(3 - 6.75)^2}{6.75}$

= .1364 + 2.2727 + . . . + 2.0833 = 9.1486

– Rejection Rule

With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.81$

Reject H0 if χ² > 7.81

– Conclusion

We reject H0, the assumption that the price of the home is independent of the style of the home that is purchased.
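The full Finger Lakes Homes calculation, including the expected frequencies that the slides only partly show, can be reproduced in plain Python. This is a sketch; the variable names are illustrative, and the critical value 7.81 is taken from the slides rather than computed.

```python
# Observed frequencies: rows are the two price classes,
# columns are Colonial, Ranch, Split-Level, A-Frame
observed = [
    [18, 6, 19, 12],
    [12, 14, 16, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
sample_size = sum(row_totals)

# e_ij = (row i total)(column j total) / sample size
expected = [[r * c / sample_size for c in col_totals] for r in row_totals]

# Sum of (f_ij - e_ij)^2 / e_ij over every cell
chi_square = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE = 7.81   # chi-square value for alpha = .05 and 3 d.f., from the slides
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.4f}, d.f. = {degrees_of_freedom}, "
      f"reject H0: {reject_h0}")
```

Every expected frequency here is at least 5 (the smallest is 6.75), which is the usual condition for relying on the chi-square approximation.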
