Economics 173 Business Statistics Lecture 7 Fall, 2001 Professor J. Petry http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/


Economics 173 Business Statistics

Lecture 7

Fall, 2001

Professor J. Petry

http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/


Organization of Techniques

• Keeping track of the different tests we are conducting is best done with the “Decision Tree” and “Summary” provided in Chapter 22 of your book.

• As we go through the chapters you should be utilizing the decision tree and Summary to do your problems.
– You will be given copies of both for the exams.
– We will use the version at the end of the book (Chapter 22) so you have the same one to use during the mid-term and the final.
– The versions we are handing out today include statistical tables which, as we announced last class, will no longer be used in this course.

• Develop a process to work each problem. My process is . . .
– Read the question at least twice.
– Ask myself: what type of question does this feel like? Parameter? H1?
– Go down the decision tree formally.


Organization of Techniques

Example 1:

In a recent municipal election the high cost of housing became an important issue. A candidate seeking to unseat an incumbent claimed that the average family spends more than 30% of its annual income on housing. A housing expert was asked to investigate the claim. A random sample of 125 households was asked to report the percentage of household income spent on housing costs. Assuming you were given the data, what technique would you use to determine if the candidate was correct at the 5% significance level?

Example 2:

The number of Internet users is rapidly increasing. A recent survey reveals that there are about 30 million Internet users in North America. Suppose 200 of these people were surveyed and asked to report how many hours they spent on the Internet last week. Assuming you were given the data, what technique would you use to estimate with 95% confidence the average amount of time spent by all North Americans on the Internet?


Organization of Techniques

Example 3:

A rock promoter is in the process of deciding whether to book a new band for a rock concert. He knows that this band appeals almost exclusively to teenagers. According to the latest census, there are 400,000 teenagers in the area. Assuming you were provided the data, what technique would you use to estimate the proportion of teenagers who will attend the concert?

Example 4:

Some traffic experts believe that the major cause of highway collisions is the differing speeds of cars. That is, when some cars are driven slowly while others are driven fast, cars tend to congregate in bunches, increasing the probability of accidents. Thus the greater the variation in speeds, the greater the number of collisions that occur. Suppose that one expert believes that when the variance exceeds 18 (mph)², the number of accidents will be unacceptably high. Assuming you are provided the data, what technique would you use to test whether the variance in speeds exceeds 18 (mph)²?


Chapter 12

Inference about the Comparison of Two Populations


12.1 Introduction

• A variety of techniques are presented whose objective is to compare two populations.

• We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.


12.2 Inference about the Difference between Two Means: Independent Samples

• Two random samples are drawn from the two populations of interest.

• Because we are interested in the difference between the two means, we build the statistic $\bar{x}$ for each sample, giving $\bar{x}_1$ and $\bar{x}_2$.


The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.

• $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large.

• The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.

• The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
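As a brief sketch of where the expected value and variance of the difference between the sample means come from (assuming, as the section title states, that the two samples are independent):

$$E(\bar{x}_1 - \bar{x}_2) = E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2$$

$$V(\bar{x}_1 - \bar{x}_2) = V(\bar{x}_1) + V(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

The variances add, rather than subtract, because the variance of a difference of independent random variables is the sum of their variances.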


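• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal we can write:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.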


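• In practice, the “Z” statistic is rarely used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad (\sigma_1^2 = ?,\ \sigma_2^2 = ?)$$

• Instead, we construct a “t” statistic using the sample variances ($s_1^2$ and $s_2^2$) in place of $\sigma_1^2$ and $\sigma_2^2$.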


• Two cases are considered when producing the t-statistic.

– The two unknown population variances are equal.

– The two unknown population variances are not equal.


Case I: The two variances are equal

• Calculate the pooled variance estimate $s_p^2$ by:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,

$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$
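The pooled variance calculation for Case I can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_variance` is hypothetical, and the inputs are the values from the slide's example.

```python
def pooled_variance(s1_sq: float, s2_sq: float, n1: int, n2: int) -> float:
    """Average the two sample variances, weighting each by its degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Values from the slide: s1^2 = 25, s2^2 = 30, n1 = 10, n2 = 15
sp_sq = pooled_variance(25, 30, 10, 15)
print(round(sp_sq, 4))  # 28.0435
```

Because the larger sample carries more weight, the pooled estimate (about 28.04) sits closer to 30 than to 25.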


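• Construct the t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

• Perform a hypothesis test:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) > 0$; or $< 0$; or $\neq 0$

• Build an interval estimate:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $1 - \alpha$ is the confidence level.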


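Case II: The two variances are unequal

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$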


Run a hypothesis test as needed, or build an interval estimate.

Estimator:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $1 - \alpha$ is the confidence level.


• Example 12.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person the number of calories consumed at lunch was recorded.


Calories consumed at lunch

Consumers    Non-consumers
568          705
498          819
589          706
681          509
540          613
646          582
636          601
739          608
539          787
596          573
607          428
529          754
637          741
617          628
633          537
555          748
.            .
.            .

Solution:
• The data are quantitative.
• The parameter to be tested is the difference between two means.
• The claim to be tested is that the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).


• Identifying the technique

– The hypotheses are:

H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) < 0$ (that is, $\mu_1 < \mu_2$)

– To check the relationship between the variances, we use computer output to find the sample standard deviations. We have $s_1 = 64.05$ and $s_2 = 103.29$. It appears that the variances are unequal.

– We run the t-test for unequal variances.


Calories consumed at lunch

t-Test: Two-Sample Assuming Unequal Variances

                                Consumers    Nonconsumers
Mean                            604.023      633.234
Variance                        4102.98      10669.8
Observations                    43           107
Hypothesized Mean Difference    0
df                              123
t Stat                          -2.09107
P(T<=t) one-tail                0.01929
t Critical one-tail             1.65734
P(T<=t) two-tail                0.03858
t Critical two-tail             1.97944

• At the 5% significance level there is sufficient evidence to reject the null hypothesis (t Stat = -2.09 is below -1.657, and the one-tail p-value of 0.019 is below 0.05).
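The printout's test statistic and degrees of freedom can be reproduced from its summary statistics with the unequal-variances formulas. The following is a minimal sketch, not the course's own software: the function name `welch_t` is hypothetical, and the critical value is copied from the printout instead of being computed.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t statistic, approximate degrees of freedom) for unequal variances."""
    v1, v2 = var1 / n1, var2 / n2  # estimated variance of each sample mean
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from the Excel printout for Example 12.1
t_stat, df = welch_t(604.023, 4102.98, 43, 633.234, 10669.8, 107)

t_crit_one_tail = 1.65734            # "t Critical one-tail" from the printout
reject_h0 = t_stat < -t_crit_one_tail  # left-tail test, H1: mu1 - mu2 < 0

print(round(t_stat, 3), round(df), reject_h0)  # -2.091 123 True
```

The degrees-of-freedom formula gives a non-integer value (about 122.6), which the printout reports rounded to 123.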


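• Solving by hand
– The interval estimator for the difference between two means is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{64.05^2}{43} + \frac{103.29^2}{107}} = -29.21 \pm 27.65$$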


• Example 12.2

– Does job design (referring to worker movements) affect workers' productivity?

– Two job designs are being considered for the production of a new computer desk.

– Two samples are randomly and independently selected.
• A sample of 25 workers assembled a desk using design A.
• A sample of 25 workers assembled the desk using design B.
• The assembly times were recorded.

– Do the assembly times of the two designs differ?


Assembly times in minutes

Design-A    Design-B
6.8         5.2
5.0         6.7
7.9         5.7
5.2         6.6
7.6         8.5
5.0         6.5
5.9         5.9
5.2         6.7
6.5         6.6
.           .
.           .

Solution

• The data are quantitative.

• The parameter of interest is the difference between two population means.

• The claim to be tested is whether a difference between the two designs exists.


The Excel printout

t-Test: Two-Sample Assuming Equal Variances

                                Design-A       Design-B
Mean                            6.288          6.016
Variance                        0.847766667    1.3030667      (s1² and s2²)
Observations                    25             25
Pooled Variance                 1.075416667                   (sp²)
Hypothesized Mean Difference    0
df                              48                            (degrees of freedom)
t Stat                          0.927332603                   (t-statistic)
P(T<=t) one-tail                0.179196744                   (p-value of the one-tail test)
t Critical one-tail             1.677224191
P(T<=t) two-tail                0.358393488                   (p-value of the two-tail test)
t Critical two-tail             2.01063358


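A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$

Thus, at the 95% confidence level,

$-0.3176 < \mu_1 - \mu_2 < 0.8616$

Notice: “Zero” is included in the interval.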
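The equal-variances interval for Example 12.2 can be checked with a short Python sketch. The function name `pooled_interval` is hypothetical; the pooled variance and the two-tail critical value are taken from the Excel printout instead of being computed from the raw data.

```python
import math


def pooled_interval(mean1, mean2, sp_sq, n1, n2, t_crit):
    """Interval estimate for mu1 - mu2 when the variances are assumed equal."""
    center = mean1 - mean2
    half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return center - half_width, center + half_width


# Means, pooled variance and "t Critical two-tail" from the printout
low, high = pooled_interval(6.288, 6.016, 1.075416667, 25, 25, 2.01063358)

print(round(low, 3), round(high, 3))  # -0.318 0.862
print(low < 0 < high)                 # True: zero lies inside the interval
```

The small differences from the slide's [-0.3176, 0.8616] come from the slide rounding the pooled variance and the critical value before multiplying.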


Checking the required Conditions for the equal variances case (example 12.2)

The distributions are not bell shaped, but they seem to be approximately normal. Since the technique is robust, we can be confident about the results.

[Figure: histograms of assembly times for Design A (bins 5, 5.8, 6.6, 7.4, 8.2, More) and Design B (bins 4.2, 5, 5.8, 6.6, 7.4, More)]


Example

• Exercise 12.20 from the book

• Random samples were drawn from each of two populations. The data are stored in columns 1 and 2, respectively, in file XR12-20.

• Is there sufficient evidence at the 5% significance level to infer that the mean of population 1 is greater than the mean of population 2?


Descriptive statistics

                           X1          X2
Mean                       246.80      239.66
Standard Error             2.88        0.94
Median                     247.00      240.00
Mode                       280.00      240.00
Standard Deviation         28.81       11.57
Sample Variance            829.90      133.81
Kurtosis                   0.34        0.02
Skewness                   -0.02       0.02
Range                      162.00      61.00
Minimum                    158.00      213.00
Maximum                    320.00      274.00
Sum                        24680.00    35949.00
Count                      100.00      150.00
Confidence Level(95.0%)    5.72        1.87

t-Test: Two-Sample Assuming Unequal Variances

                                X1           X2
Mean                            246.8        239.66
Variance                        829.89899    133.8097987
Observations                    100          150
Hypothesized Mean Difference    0
df                              121
t Stat                          2.3551335
P(T<=t) one-tail                0.0100626
t Critical one-tail             1.657545
P(T<=t) two-tail                0.0201252
t Critical two-tail             1.9797653
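As a rough cross-check, the t Stat in the unequal-variances printout can be approximated directly from the descriptive statistics, because each "Standard Error" entry is the sample standard deviation divided by the square root of the sample size, so its square equals s²/n. The sketch below assumes only the rounded values shown in the two tables; the variable names are illustrative.

```python
import math

# Rounded values from the descriptive statistics table
mean1, se1 = 246.80, 2.88  # X1: mean and standard error (s1 / sqrt(n1))
mean2, se2 = 239.66, 0.94  # X2: mean and standard error (s2 / sqrt(n2))

# se^2 equals s^2 / n, so the unequal-variances denominator is sqrt(se1^2 + se2^2)
t_from_se = (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)

t_crit_one_tail = 1.657545  # "t Critical one-tail" from the printout
exceeds_critical = t_from_se > t_crit_one_tail  # right-tail test, H1: mu1 - mu2 > 0

print(round(t_from_se, 2), exceeds_critical)  # 2.36 True
```

The result agrees with the printout's t Stat of 2.3551 up to the rounding of the standard errors.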