1 Review #2 Chapter 9 Chapter 10 Chapter 11 Chapter 12


Chapter 9

• A statistic is a random variable describing a characteristic of a random sample.
– Sample mean
– Sample variance

• We use statistic values in inferential statistics (making inferences about population characteristics from sample characteristics).

• Statistics have distributions of their own.


The Central Limit Theorem

– The distribution of the sample mean is normal if the parent distribution is normal.

– The distribution of the sample mean approaches the normal distribution for sufficiently large samples (n ≥ 30), even if the parent distribution is not normal.

– The parameters of the sampling distribution of the mean are:

• Mean: $\mu_{\bar{x}} = \mu_x$

• Standard deviation: $\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$
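The two parameters of the sampling distribution can be checked empirically. The following is a minimal simulation sketch, not part of the original slides: the exponential parent distribution, the sample size of 30, the number of repetitions, and all variable names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30            # sample size (assumed for illustration)
reps = 20_000     # number of simulated samples (assumed)
mu = sigma = 1.0  # an exponential parent with rate 1 has mean 1 and std. dev. 1

# Draw many samples from a clearly non-normal parent and keep each sample mean.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = fmean(sample_means)
std_of_means = pstdev(sample_means)

print(f"mean of sample means:      {mean_of_means:.3f} (theory: {mu:.3f})")
print(f"std. dev. of sample means: {std_of_means:.3f} (theory: {sigma / sqrt(n):.3f})")
```

The simulated mean and standard deviation of the sample means should land close to μ and σ/√n, even though the parent distribution is strongly skewed.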


Problem 1

• Given a normal population whose mean is 50 and whose standard deviation is 5,
– Find the probability that a random sample of 4 has a mean between 49 and 52.
– Answer:

$P(49 < \bar{x} < 52) = P\left(\frac{49 - 50}{5/\sqrt{4}} < Z < \frac{52 - 50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) = .7881 - .3446 = .4435$


Problem 2

– Find the probability that a random sample of 16 has a mean between 49 and 52.

– Answer:

$P(49 < \bar{x} < 52) = P\left(\frac{49 - 50}{5/\sqrt{16}} < Z < \frac{52 - 50}{5/\sqrt{16}}\right) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$
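The sample-mean probabilities for n = 4 and n = 16 can be evaluated directly instead of through a Z-table. This is a minimal sketch using Python's standard library; the function name `prob_mean_between` is hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi) when the parent population is normal."""
    # The sample mean is normal with mean mu and std. dev. sigma / sqrt(n).
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)


p_n4 = prob_mean_between(49, 52, mu=50, sigma=5, n=4)
p_n16 = prob_mean_between(49, 52, mu=50, sigma=5, n=16)

print(f"n = 4:  {p_n4:.4f}")
print(f"n = 16: {p_n16:.4f}")
```

Because the standard error shrinks as n grows, the same interval around the population mean captures more of the sampling distribution for the larger sample.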


Problem 2

• The amount of time per day spent by adults watching TV is normally distributed with μ = 6 and σ = 1.5 hours.
– What is the probability that a randomly selected adult watches TV for more than 7 hours a day?

– Answer:

$P(X > 7) = P\left(Z > \frac{7 - 6}{1.5}\right) = P(Z > .67) = 1 - .7486 = .2514$

– What is the probability that 5 adults watch TV on average 7 or more hours a day?

– Answer:

$P(\bar{X} \ge 7) = P\left(Z \ge \frac{7 - 6}{1.5/\sqrt{5}}\right) = P(Z \ge 1.49) = 1 - .9319 = .0681$


Problem 2

• Additional question
– What is the probability that the total TV watching time of the five adults sampled will exceed 28 hours?

– Answer:

$P\left(\sum X > 28\right) = P(\bar{X} > 28/5) = P\left(Z > \frac{5.6 - 6}{1.5/\sqrt{5}}\right) = P(Z > -.60) = .7257$
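The three TV-watching questions differ only in which distribution is used: a single observation, the mean of five, or the total of five. A minimal sketch with the standard library follows; the variable names are illustrative, and the results are exact normal probabilities, so they can differ from table lookups in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5

single = NormalDist(mu, sigma)                     # one adult
mean_of_5 = NormalDist(mu, sigma / sqrt(n))        # average of five adults
total_of_5 = NormalDist(n * mu, sigma * sqrt(n))   # total of five adults

p_single = 1 - single.cdf(7)     # P(X > 7)
p_mean = 1 - mean_of_5.cdf(7)    # P(sample mean >= 7)
p_total = 1 - total_of_5.cdf(28) # P(total > 28), same as P(sample mean > 5.6)

print(f"P(X > 7)      = {p_single:.4f}")
print(f"P(mean >= 7)  = {p_mean:.4f}")
print(f"P(total > 28) = {p_total:.4f}")
```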


Sampling distribution of the sample proportion

• In a sample of size n, if np > 5 and n(1-p) > 5, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters:

$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$; therefore, $Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$


Problem 3

• A commercial for a household appliance manufacturer claims that fewer than 5% of its products require a service call in the first year.

• A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.


Problem 3

– Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year?

$P(\hat{p} > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1 - .05)/400}}\right) = P(Z > 4.59) \approx 0$

If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
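The Problem 3 calculation can be sketched as follows, assuming (as the problem does) that the claimed 5% is the true proportion. Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

p_claimed = 0.05   # manufacturer's claim, treated as the true proportion
n = 400            # surveyed households
threshold = 0.10   # sample proportion we ask about

# Rule of thumb from the slides for using the normal approximation.
approx_ok = n * p_claimed > 5 and n * (1 - p_claimed) > 5

std_error = sqrt(p_claimed * (1 - p_claimed) / n)
z = (threshold - p_claimed) / std_error
p_exceed = 1 - NormalDist().cdf(z)   # P(p-hat > .10)

print(f"normal approximation applicable: {approx_ok}")
print(f"z = {z:.2f}, P(p-hat > {threshold}) = {p_exceed:.2e}")
```

A probability this small means that observing 10% service calls would be very hard to reconcile with the claim.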


Chapter 10

• A population’s parameter can be estimated by a point estimator and by an interval estimator.

• A confidence interval with confidence level 1 - α is an interval estimator that covers the estimated parameter 100(1 - α)% of the time.

• Confidence intervals are constructed using sampling distributions.


Confidence interval of the mean

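• We use the central limit theorem to build the following confidence interval:

$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$

[Figure: standard normal curve with area 1 - α between -z_{α/2} and z_{α/2}, and area α/2 in each tail]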


Problem 4

• How many classes do university students miss each semester? A survey of 100 students was conducted (see Missed Classes).

• Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student.

• Use 99% confidence level.


Problem 4

– Solution:

$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm 2.575\frac{2.2}{\sqrt{100}} = 10.21 \pm .57$

1 - α = .99, α = .01, α/2 = .005, $z_{\alpha/2} = z_{.005} = 2.575$

LCL = 9.64, UCL = 10.78

Missed classes

Mean                 10.21
Standard Error       0.21755993
Median               10
Mode                 10
Standard Deviation   2.1755993
Sample Variance      4.73323232
Kurtosis             0.91111511
Skewness             -0.107237
Range                14
Minimum              3
Maximum              17
Sum                  1021
Count                100
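The interval for Problem 4 can be reproduced with a few lines. This is a sketch under the problem's assumptions (known σ = 2.2, sample mean 10.21, n = 100); it uses the exact normal quantile rather than the table value 2.575, so the bounds may differ slightly in later decimals.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 10.21, 2.2, 100
confidence = 0.99

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
half_width = z * sigma / sqrt(n)

lcl = x_bar - half_width
ucl = x_bar + half_width

print(f"z = {z:.3f}, half-width = {half_width:.2f}")
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```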


Selecting the sample size

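• The shorter the confidence interval, the more precise the estimate.

• We can, therefore, limit the half-width of the interval to W, and get

$\bar{x} \pm W = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, or $W = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

• From here we have

$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$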


Problem 5

• An operations manager wants to estimate the average amount of time needed by a worker to assemble a new electronic component.

• Sigma is known to be 6 minutes.
• The required estimate accuracy is within 20 seconds.
• Consider two confidence levels: 90% and 95%.
• Find the sample size.


Problem 5

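– Solution: σ = 6 min; W = 20 sec = 1/3 min

• 1 - α = .90, $z_{\alpha/2} = z_{.05} = 1.645$

$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75$. Take n = 877.

• 1 - α = .95, $z_{\alpha/2} = z_{.025} = 1.96$

$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.68$. Take n = 1245.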
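The sample-size formula can be wrapped in a small helper. This is a sketch, not part of the slides; the function name `required_sample_size` is hypothetical, and the result is always rounded up because a fractional observation cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, half_width, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / half_width) ** 2)


# Problem 5: sigma = 6 minutes, W = 20 seconds = 1/3 minute.
n_90 = required_sample_size(sigma=6, half_width=1 / 3, confidence=0.90)
n_95 = required_sample_size(sigma=6, half_width=1 / 3, confidence=0.95)

print(f"90% confidence: n = {n_90}")
print(f"95% confidence: n = {n_95}")
```

Note that sigma and W must be expressed in the same unit (here minutes) before they are plugged into the formula.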


Chapter 11

• Hypothesis tests
– In hypothesis tests we hypothesize on a value of a population parameter, and test to see if there is sufficient evidence to support our belief.

– The structure of a hypothesis test
• Formulate two hypotheses.

– H0: The null hypothesis, the one we try to reject in favor of …

– H1: The alternative hypothesis, the one we try to prove.

• Define a significance level α.


Hypotheses tests

– The significance level α is the probability of erroneously rejecting the null hypothesis.

α = P(reject H0 when H0 is true)

– Sample from the population and calculate a statistic that indicates whether the parameter value defined under H1 is more probable.

– We shall test the population mean assuming the standard deviation is known.


Problem 6

• A machine is set so that the average diameter of the ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?


Problem 6

• The population studied is the ball-bearing diameters.

• We hypothesize on the population mean.
• A good point estimator for the population mean is the sample mean.
• We use the distribution of the sample mean to build a sample statistic to test whether μ = .50 inch.


Problem 6

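• Solution
– Define the hypotheses:

• H0: μ = .50

• H1: μ ≠ .50

Define a rejection region. Note that this is a two tail test because of the inequality.

$P(\bar{X} > \bar{X}_{L1} \text{ or } \bar{X} < \bar{X}_{L2} \text{ given that } \mu = .50) = .05$ (the probability of a type I error)

$P(Z > Z_{L1} \text{ or } Z < Z_{L2} \text{ given that } \mu = .50) = .05$

Let us take a symmetrical rejection area: $Z_{L2} = -Z_{L1}$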


Problem 6

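$P(Z > Z_{.025} \text{ or } Z < -Z_{.025} \text{ given that } \mu = .50) = .05$

Calculate the value of the sample Z statistic and compare it to the critical value.

Critical Z: $Z_{.025} = 1.96$ (obtained from the Z-table)

Build a rejection region: $Z_{sample} > Z_{\alpha/2}$, or $Z_{sample} < -Z_{\alpha/2}$

$Z_{sample} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{.51 - .50}{.05/\sqrt{100}} = 2$

Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at the 5% significance level.

[Figure: standard normal curve with rejection regions below -1.96 and above 1.96]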


Problem 6

• We can perform the test in terms of the mean value.

• Let us find the critical mean values for rejection

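$\bar{X}_{L1} = \mu_0 + Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 + 1.96\frac{.05}{\sqrt{100}} = .5098$

$\bar{X}_{L2} = \mu_0 - Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 - 1.96\frac{.05}{\sqrt{100}} = .4902$

Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.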
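Both versions of the Problem 6 test (standardized statistic and critical mean values) can be sketched together, which makes their equivalence easy to see. Variable names are illustrative; the critical value comes from the exact normal quantile rather than the table.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 0.50, 0.05, 100   # hypothesized mean, known std. dev., sample size
x_bar, alpha = 0.51, 0.05         # observed sample mean, significance level

std_error = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two tail test: alpha/2 in each tail

# Standardized method: compare the sample Z with +/- z_crit.
z_sample = (x_bar - mu0) / std_error
reject_by_z = abs(z_sample) > z_crit

# Equivalent method: compare the sample mean with the critical means.
lower_crit = mu0 - z_crit * std_error
upper_crit = mu0 + z_crit * std_error
reject_by_mean = x_bar < lower_crit or x_bar > upper_crit

print(f"z_sample = {z_sample:.2f}, z_crit = {z_crit:.2f}, reject H0: {reject_by_z}")
print(f"critical means = ({lower_crit:.4f}, {upper_crit:.4f}), reject H0: {reject_by_mean}")
```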


Problem 7

• The average annual return on investment for American banks was found to be 10.2% with standard deviation of 0.8%.

• It is believed that banks that exercise comprehensive planning do better.

• A sample of 26 banks that conducted comprehensive planning provided the following result: Mean return = 10.5%.

• Can we infer that the belief about bank performance is supported at 10% significance level by this sample result?


Problem 7

– The population tested is the “annual rate of return.”

H0: μ = 10.2

H1: μ > 10.2

– Let us perform the test with the p-value method:
• $P(\bar{X} > 10.5 \text{ given that } \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/\sqrt{26}}\right) = P(Z > 1.91) = 1 - .9719 = .0281$

– Since .0281 < .10 we reject the null hypothesis at the 10% significance level.


Problem 7

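• Note the equivalence between the standardized method (the rejection region method) and the p-value method.

• $P(Z > Z_{.10}) = .10$; $Z_{.10} = 1.28$

• Run the test with Data Analysis Plus. See data in Return.

[Figure: standard normal curve marking the critical value 1.28 and the sample statistic 1.91, with the area .0281 to the right of 1.91]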
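The equivalence of the two methods in Problem 7 can be checked numerically. The sketch below computes the exact p-value (without rounding Z to two decimals, so it differs slightly from the table-based figure) and the critical value for a right hand tail test; names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 10.2, 0.8, 26
x_bar, alpha = 10.5, 0.10

std_normal = NormalDist()
z_sample = (x_bar - mu0) / (sigma / sqrt(n))

p_value = 1 - std_normal.cdf(z_sample)   # right hand tail area beyond z_sample
z_crit = std_normal.inv_cdf(1 - alpha)   # Z_{.10}

reject_by_p_value = p_value < alpha
reject_by_region = z_sample > z_crit

print(f"z_sample = {z_sample:.2f}, p-value = {p_value:.4f}")
print(f"z_crit = {z_crit:.2f}")
print(f"same decision from both methods: {reject_by_p_value == reject_by_region}")
```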


Type II Error

• A type II error occurs when H0 is erroneously not rejected.

• The probability of a type II error is called β: β = P(do not reject H0 when H1 is true)

• To calculate β:
– H1 specifies an actual parameter value (not a range of values). Example: H0: μ = 100; H1: μ = 110

– The critical value is expressed in original terms (not in standard terms).


Problem 7a

• What is the probability you’ll believe the mean return in problem 7 is 10.2% while actually it’s 10.6%, if the sample provided a mean return of 10.5%?


Problem 7a

• Solution
– The two hypotheses are:

H0: μ = 10.2

H1: μ = 10.6

– H0 is not rejected (we believe μ = 10.2) if the sample mean is less than a critical value.

– Therefore, the probability required is: $\beta = P(\bar{X} < \bar{X}_{cr} \mid \mu = 10.6)$.


Problem 7a

• The critical value is (recall, this problem was a case of a right hand tail test, with 10% significance level):

$\bar{X}_L = \mu_0 + Z_{.10}\frac{\sigma}{\sqrt{n}} = 10.2 + 1.28\frac{.8}{\sqrt{26}} = 10.40$

$\beta = P(\bar{X} < 10.4 \text{ when } \mu = 10.6) = P\left(Z < \frac{10.4 - 10.6}{.8/\sqrt{26}}\right) = P(Z < -1.27) = .102$
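The two steps of the type II error calculation (find the critical mean under H0, then evaluate it under H1) can be sketched as below. Names are illustrative, and exact quantiles are used, so the result agrees with the table-based answer only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 10.2, 10.6          # mean under H0 and the specific mean under H1
sigma, n, alpha = 0.8, 26, 0.10

std_error = sigma / sqrt(n)

# Step 1: critical sample mean for a right hand tail test, in original units.
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error

# Step 2: probability of NOT rejecting H0 when the true mean is mu1.
beta = NormalDist(mu1, std_error).cdf(x_crit)

print(f"critical mean = {x_crit:.2f}")
print(f"beta = {beta:.3f}")
```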


Chapter 12

• Generally, the population standard deviation is unknown, just as the mean may be unknown.

• When the standard deviation is unknown, we need to change the test statistic from “Z” to “t”.

• We shall test three population parameters:
– Mean

– Variance

– Proportion


Testing the mean (unknown variance)

• Replace the statistic Z with “t”:

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

The original distribution must be normal (or at least mound shaped).


Problem 8

• A federal agency inspects packages to determine whether the contents are at least as great as advertised.

• A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight is 8.04 ounces was drawn (see Content).

• From the sample results…
– Can we conclude that the average weight does not meet the weight stated? (Use α = .05.)
– Estimate the mean weight of all containers with 99% confidence.
– What assumption must be met?


Problem 8

• Solution
– We hypothesize on the mean weight.

• H0: μ = 8.04

• H1: μ < 8.04

• (i) n = 5. For small samples let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94

– The rejection region: $t < -t_{\alpha, n-1} = -t_{.05, 5-1} = -2.132$. The $t_{sample}$ = ?

– Mean = (8.07 + … + 7.94)/5 = 7.996; Std. Dev. = {[(8.07 - 7.996)² + … + (7.94 - 7.996)²]/4}^{1/2} = 0.0546


Problem 8

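• The t sample is calculated as follows:

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{7.996 - 8.04}{.0546/\sqrt{5}} = -1.80$

• Since -1.80 > -2.132 the sample statistic does not fall into the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04, at the 5% significance level.

[Figure: t distribution with the rejection region to the left of -2.132; the sample statistic -1.80 lies outside it]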
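The manual small-sample calculation can be reproduced from the five assumed observations. This sketch uses only the standard library, which has no t distribution, so the critical value 2.132 is hard-coded from the slide's t-table lookup rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]   # the sample assumed on the slide
mu0 = 8.04                                 # weight stated on the package
t_crit = 2.132                             # t_{.05, 4}, taken from the slide

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)   # sample standard deviation (divides by n - 1)

t_sample = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = t_sample < -t_crit   # left hand tail test

print(f"mean = {x_bar:.3f}, s = {s:.4f}")
print(f"t_sample = {t_sample:.2f}, reject H0: {reject_h0}")
```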


Problem 8

• (ii) n = 50. To calculate the sample statistics we use Excel, “Descriptive statistics” from the Tools > Data analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04

• The confidence interval is calculated by

$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 8.02 \pm 2.678\frac{.04}{\sqrt{50}} = 8.02 \pm .015$

or LCL = 8.005, UCL = 8.035

1 - α = .99, α = .01, α/2 = .005

$t_{.005, 50-1}$ = about 2.678 from the t-table
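The t-based interval for the 50-container sample follows the same pattern as the z-based interval, with s in place of σ. In this sketch the t value 2.678 is the approximate table value quoted on the slide, hard-coded because the standard library provides no t quantile function.

```python
from math import sqrt

x_bar, s, n = 8.02, 0.04, 50   # summary statistics reported on the slide
t_crit = 2.678                 # approximate t_{.005, 49} from the slide's t-table

half_width = t_crit * s / sqrt(n)
lcl = x_bar - half_width
ucl = x_bar + half_width

print(f"half-width = {half_width:.3f}")
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```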


Problem 8

• Comments
– Check whether the distribution appears to be normal.

[Figure: histogram of the sample weights; vertical axis: Frequency (0 to 20); bins: 7.93, 7.97, 8.01, 8.05, 8.09, More]


Using Excel

– To obtain an exact value for ‘t’ use the TINV function:

=TINV(0.01,49)

where .01 is the two tail probability and 49 is the degrees of freedom.

The exact value: 2.6799535


Problem 8

– In our example recall:
• H0: μ = 8.04

• H1: μ < 8.04

• The p-value = .000187 < .05

– There is sufficient evidence to reject H0 in favor of H1.

t-Test: Two-Sample Assuming Unequal Variances

                               Weights     V 2
Mean                           8.0182      8.04
Variance                       0.001627    0
Observations                   50          50
Hypothesized Mean Difference   0
df                             49
t Stat                         -3.82126
P(T<=t) one-tail               0.000187
t Critical one-tail            1.676551
P(T<=t) two-tail               0.000375
t Critical two-tail            2.009574

Note: $t = \frac{8.018 - 8.04}{.0403/\sqrt{50}} = -3.82 < -t_{.05, 49} = -1.676$


Inference about the population Variance

• The following statistic is χ² (chi-squared) distributed with n - 1 degrees of freedom:

$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$

• We use this relationship to test and estimate the variance.


Inference about the population Variance

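• The hypotheses tested are:

$H_0: \sigma^2 = \sigma_0^2$

$H_1: \sigma^2 > \sigma_0^2$ or $\sigma^2 < \sigma_0^2$ or $\sigma^2 \ne \sigma_0^2$

• The rejection region is:

$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha, n-1}$ or $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha, n-1}$

For the two tail test replace α with α/2.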


Problem 9

• A random sample of 100 observations was taken from a normal population. The sample variance was 29.76.

• Can we infer at the 2.5% significance level that the population variance is smaller than 30?

• Estimate the population variance with 90% confidence.


Problem 9

• Solution:
• H0: σ² = 30
• H1: σ² < 30

Rejection region: $\chi^2 < \chi^2_{1-\alpha, n-1}$

$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(100 - 1)(29.76)}{30} = 98.21$

$\chi^2_{1-\alpha, n-1} = \chi^2_{.975, 100-1}$ = about 74.222 (table value for 100 d.f.)

– Since 98.21 > 74.222, the sample statistic does not fall into the rejection region. There is insufficient evidence at the 2.5% significance level to conclude that the variance is smaller than 30.

For the confidence interval look at page 370.

Using Excel

– We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2) = \alpha$ for a given χ² and known d.f. This makes it possible to determine the p-value.

– Use the CHIDIST function: =CHIDIST(χ², d.f.)

For example: =CHIDIST(98.21,99) ≈ .50

That is: $P(\chi^2_{99} > 98.21) \approx .50$

– In our example we had a left hand tail rejection region. The p-value is calculated based on the χ² value (98.21):

$P(\chi^2_{99} < 98.21) \approx 1 - .50 = .50$
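The Problem 9 test statistic is simple to recompute from the problem data (n = 100, s² = 29.76, hypothesized variance 30). The sketch below compares it with lower-tail and upper-tail critical values. Both critical values are hard-coded table entries for 100 d.f., used as approximations for 99 d.f.: 129.561 is quoted on the slide, while 74.222 is an assumed lookup from a standard χ² table.

```python
n = 100
s_squared = 29.76        # sample variance from the problem statement
sigma0_squared = 30      # hypothesized population variance

chi2_sample = (n - 1) * s_squared / sigma0_squared

# Table values for 100 d.f. (approximations for 99 d.f.); see the note above.
chi2_upper_025 = 129.561   # area .025 to the right
chi2_lower_025 = 74.222    # area .025 to the left (assumed table lookup)

reject_left_tail = chi2_sample < chi2_lower_025    # H1: variance < 30
reject_right_tail = chi2_sample > chi2_upper_025   # H1: variance > 30

print(f"chi-squared statistic = {chi2_sample:.2f}")
print(f"reject in left tail:  {reject_left_tail}")
print(f"reject in right tail: {reject_right_tail}")
```

The statistic sits between the two critical values, so neither one-tail test rejects H0 at the 2.5% level.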


Using Excel

– We can get the exact χ² value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$ for any given probability α and known d.f.

– Use the CHIINV function: =CHIINV(α, d.f.)

For example: =CHIINV(.025,99) = 128.4219

That is: $P(\chi^2_{99} > ?) = .025$; χ² = 128.4219


Inference about a population proportion

• The test and the confidence interval are based on the approximately normal distribution of the sample proportion, if np > 5 and n(1-p) > 5.

• For the confidence interval of p we have:

$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = x/n$

• For the hypothesis test, we run a Z test.


Problem 10

• A consumer protection group ran a survey of 400 dentists to check a claim that 4 out of 5 dentists recommend ingredients included in a certain toothpaste.

• The survey results are as follows: 71 – No; 329 – Yes

• At 5% significance level, can the consumer group infer that the claim is true?


Problem 10

• Solution
– The two hypotheses are:

• H0: p = .8
• H1: p > .8

The rejection region: $Z > Z_{\alpha}$, where $Z_{.05} = 1.645$

$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{.8225 - .8}{\sqrt{.8225(1 - .8225)/400}} = 1.18$

– Since 1.18 < 1.645 the consumer group cannot confirm the claim at the 5% significance level.
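The Problem 10 test can be sketched as follows. The first statistic uses the standard error built from the sample proportion, as in the solution above. The second, which builds the standard error from the hypothesized p = .8, is a common textbook variant and is included here only as an assumption for comparison; it is not part of the slide. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 329, 400        # survey results
p0, alpha = 0.8, 0.05    # claimed proportion, significance level

p_hat = yes / n
z_crit = NormalDist().inv_cdf(1 - alpha)   # right hand tail test

# Standard error from the sample proportion (as in the solution above).
z_from_p_hat = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

# Variant: standard error from the hypothesized proportion (assumption, for comparison).
z_from_p0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

print(f"p_hat = {p_hat:.4f}, z_crit = {z_crit:.3f}")
print(f"Z (s.e. from p_hat) = {z_from_p_hat:.2f}, reject H0: {z_from_p_hat > z_crit}")
print(f"Z (s.e. from p0)    = {z_from_p0:.2f}, reject H0: {z_from_p0 > z_crit}")
```

Either way the statistic stays below the critical value, so the decision is the same.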