
JM Statr session 13, Jan 11


Page 1: JM Statr session 13, Jan 11

Learning Objectives

• Estimate a population mean from a sample mean when σ is known.

• Estimate a population mean from a sample mean when σ is unknown.

• Estimate a population proportion using the z statistic.

• Use the chi-square distribution to estimate the population variance given the sample variance.

• Determine the sample size needed in order to estimate the population mean and population proportion.

Page 2: JM Statr session 13, Jan 11

Estimating the Population Parameter

• A point estimate is a statistic calculated from a sample that is used to estimate a population parameter.

• Interval estimate - a range of values within which the analyst can declare, with some confidence, the population parameter lies.

Page 3: JM Statr session 13, Jan 11

Point Estimate of μ

• Point estimate of μ: the sample mean, $\bar{x} = \dfrac{\sum x}{n}$

• A point estimate is also called an estimator

• It varies from sample to sample

Page 4: JM Statr session 13, Jan 11

Interval Estimate of μ

• Because of variation in sample statistics, a population parameter is estimated using an Interval Estimate

• An interval estimate (confidence interval) is a range of values within which the researcher feels, with some confidence, that the population mean lies

Page 5: JM Statr session 13, Jan 11

Estimating the Population Mean

using Interval Estimate

Since the sample mean can be greater or less than the population mean, z can be positive or negative, and the interval estimate takes the form

$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$

Page 6: JM Statr session 13, Jan 11

What is in Confidence Interval?

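The confidence interval to estimate μ is:

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where α = the total area under the normal curve outside the confidence interval area (expressed as a decimal fraction), and α/2 = the area in one end (tail) under the normal curve outside the confidence interval.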

Page 7: JM Statr session 13, Jan 11

Finding the z Value for a 95% Confidence Interval

• α is used to locate the z value in constructing the confidence interval

• For a 95% confidence interval, α = 0.05 and α/2 = 0.025

• To find the value of z for α/2, or z.025, look in the standard normal distribution table under the area .5000 - .0250 = .4750

• From Table A5, look up 0.4750 and read 1.96 as the z value from the row and column
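The table lookup above can also be checked numerically. The sketch below is only an illustration, not part of the original slides: it uses `NormalDist` from Python's standard library, and the helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return z_(alpha/2) for a two-sided confidence interval."""
    alpha = 1.0 - confidence              # total area outside the interval
    # inv_cdf takes the area to the LEFT of z, i.e. 1 - alpha/2
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

print(round(z_critical(0.95), 2))  # 1.96
```

Note that `inv_cdf` works with the cumulative area to the left of z (.9750), whereas Table A5 is entered with the area between 0 and z (.4750); both lead to the same z value.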

Page 8: JM Statr session 13, Jan 11

A 95% Confidence Interval

for Population Parameter

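[Figure: standard normal curve. The central 95% of the area (.4750 on each side of the mean) lies between z = -1.96 and z = 1.96, with .025 of the area in each tail.]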

Page 9: JM Statr session 13, Jan 11

Significance of Level of Confidence

• What does a 95% level of confidence mean?

• It means that if the research analyst were to randomly select 100 samples of size n and use each calculated sample mean to construct a 95% confidence interval, approximately 95 of the 100 confidence intervals would contain the population mean.

• You will try out a practical example using R.
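The interpretation above can be illustrated with a small simulation. This is a rough sketch in Python rather than the R lab mentioned on the slide, and every number in it (population mean 50, σ = 10, n = 30, 1000 repetitions, the seed) is a made-up assumption chosen only for the demonstration.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=30, trials=1000, confidence=0.95, seed=1):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5                  # sigma is treated as known
    hits = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:   # does this interval cover mu?
            hits += 1
    return hits / trials

rate = coverage()
print(rate)  # expected to be close to 0.95
```

Each repetition draws a new sample, so each interval is centered somewhere different; the level of confidence describes how often the procedure captures μ, not the chance that any single interval does.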

Page 10: JM Statr session 13, Jan 11

95% Confidence Intervals for μ

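[Figure: 95% confidence intervals constructed around several different sample means $\bar{x}$, shown against the central 95% of the sampling distribution.]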

Page 11: JM Statr session 13, Jan 11

Values of z for common Levels of Confidence

Confidence Level    z Value
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.575

Think: What happens to the length of Confidence Interval as the Confidence Level increases?
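One way to explore this question is to compute the interval length for each confidence level while holding σ and n fixed. The sketch below assumes illustrative values σ = 160 and n = 85, and the helper name `interval_length` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interval_length(sigma: float, n: int, confidence: float) -> float:
    """Full length (upper minus lower) of a z confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

# Assumed values, chosen only for illustration
sigma, n = 160.0, 85
lengths = [interval_length(sigma, n, c) for c in (0.90, 0.95, 0.98, 0.99)]
for c, length in zip((0.90, 0.95, 0.98, 0.99), lengths):
    print(f"{c:.0%}: {length:.2f}")
```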

Page 12: JM Statr session 13, Jan 11

95% Confidence Intervals for μ

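$$\bar{x} = 1300,\quad \sigma = 160,\quad n = 85,\quad z_{\alpha/2} = 1.96$$

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$1300 - 1.96\frac{160}{\sqrt{85}} \le \mu \le 1300 + 1.96\frac{160}{\sqrt{85}}$$

$$1300 - 34.01 \le \mu \le 1300 + 34.01$$

$$1265.99 \le \mu \le 1334.01$$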
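The same arithmetic can be written as a short function. This is a minimal sketch, assuming σ is known and using the slide's values $\bar{x}$ = 1300, σ = 160, n = 85; the function name `z_interval` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_(alpha/2)
    margin = z * sigma / sqrt(n)                          # margin of error
    return xbar - margin, xbar + margin

low, high = z_interval(1300, 160, 85, 0.95)
print(f"{low:.2f} <= mu <= {high:.2f}")  # 1265.99 <= mu <= 1334.01
```

The function computes z to more decimal places than the table value 1.96, so results for other inputs may differ from hand calculations in the last digit.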

Page 13: JM Statr session 13, Jan 11

Demonstration Problem 8.1

• A survey was taken of U.S. companies that do business with firms in India. One of the questions on the survey was: Approximately how many years has your company been trading with firms in India? A random sample of 44 responses to this question yielded a mean of 10.455 years. Suppose the population standard deviation for this question is 7.7 years. Using this information, construct a 90% confidence interval for the mean number of years that a company has been trading in India for the population of U.S. companies trading with firms in India.

Page 14: JM Statr session 13, Jan 11

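Demonstration Problem 8.1: Solution

$$\bar{x} = 10.455,\quad \sigma = 7.7,\quad n = 44,\quad 90\%\ \text{confidence} \Rightarrow z = 1.645$$

$$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$

$$10.455 - 1.645\frac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645\frac{7.7}{\sqrt{44}}$$

$$10.455 - 1.91 \le \mu \le 10.455 + 1.91$$

$$8.545 \le \mu \le 12.365$$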

Page 15: JM Statr session 13, Jan 11

Demonstration Problem 8.2

• A study is conducted in a company that employs 800 engineers. A random sample of 50 engineers reveals that the average sample age is 34.3 years. Historically, the population standard deviation of the age of the company’s engineers is approximately 8 years. Construct a 98% confidence interval to estimate the average age of all the engineers in this company.

Page 16: JM Statr session 13, Jan 11

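Demonstration Problem 8.2: Solution

$$\bar{x} = 34.3,\quad \sigma = 8,\quad N = 800,\quad n = 50,\quad 98\%\ \text{confidence} \Rightarrow z = 2.33$$

$$\bar{x} - z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

$$34.3 - 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} \le \mu \le 34.3 + 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}}$$

$$34.3 - 2.554 \le \mu \le 34.3 + 2.554$$

$$31.75 \le \mu \le 36.85$$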
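Because the sample of 50 is drawn from a finite population of 800, the solution multiplies the standard error by the finite population correction factor. The sketch below reproduces that step with the slide's values and the table value z = 2.33; the function name `z_interval_finite` is hypothetical.

```python
from math import sqrt

def z_interval_finite(xbar: float, sigma: float, n: int, N: int, z: float):
    """z interval for the mean with the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))            # finite population correction factor
    margin = z * (sigma / sqrt(n)) * fpc
    return xbar - margin, xbar + margin

low, high = z_interval_finite(xbar=34.3, sigma=8, n=50, N=800, z=2.33)
print(f"{low:.2f} <= mu <= {high:.2f}")  # 31.75 <= mu <= 36.85
```

The correction factor is always at most 1, so it narrows the interval; it approaches 1 as N grows relative to n.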

Page 17: JM Statr session 13, Jan 11

What is t distribution?

• A family of distributions -- a unique distribution for each value of its parameter, degrees of freedom (d.f.)

• t distribution is used instead of the z distribution for doing inferential statistics on the population mean when the population Standard Deviation is unknown and the population is normally distributed

• With the t distribution, you use the Sample Standard Deviation, s

Page 18: JM Statr session 13, Jan 11

t Distribution

A family of distributions: there is a unique distribution for each value of its parameter, degrees of freedom (d.f.), so every sample size has a different distribution.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Page 19: JM Statr session 13, Jan 11

t Distribution Characteristics

• t distributions are symmetric and unimodal with mean = 0; they are flatter in the middle and have more area in their tails than the normal distribution

• t distribution approaches the normal curve as n becomes larger

• t distribution is to be used when the Population Variance or Population Standard Deviation is unknown, regardless of the size of the sample

Page 20: JM Statr session 13, Jan 11

Robustness of t Distribution

• Most statistical techniques have one or more underlying assumptions

• If a technique is relatively insensitive to minor violations in one or more assumptions, the technique is said to be robust to that assumption.

• t statistic for estimating a population mean is relatively robust to the assumption that the population is normally distributed

Page 21: JM Statr session 13, Jan 11

Reading the t Distribution table

• t table uses the area in the tail of the distribution

• Emphasis in the t table is on α, and each tail of the distribution contains α/2 of the area under the curve when confidence intervals are constructed

• t values are located at the intersection of the df value and the selected α/2 value

Page 22: JM Statr session 13, Jan 11

t statistic: Degrees of Freedom (df)

• For the t statistic, df is n - 1

• Degrees of freedom refers to the number of independent observations for a source of variation minus the number of independent parameters estimated in computing the variation

• Number of independent observations = n

• One independent parameter, the population mean μ, is being estimated

Page 23: JM Statr session 13, Jan 11

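Confidence Intervals for μ of a Normal Population: Unknown σ

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

or

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

$$df = n - 1$$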

Page 24: JM Statr session 13, Jan 11

Table of Critical Values of t

df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
...
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
...
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
...
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
...
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576

With df = 24 and α = 0.05, t = 1.711.

Page 25: JM Statr session 13, Jan 11

Demonstration Problem 8.3

• The owner of a large equipment rental company wants to make a rather quick estimate of the average number of days a piece of ditch digging equipment is rented out per person per time. The company has records of all rentals, but the amount of time required to conduct an audit of all accounts would be prohibitive. The owner decides to take a random sample of rental invoices. Fourteen different rentals of ditch diggers are selected randomly from the files, yielding the following data. She uses these data to construct a 99% confidence interval to estimate the average number of days that a ditch digger is rented and assumes that the number of days per rental is normally distributed in the population.

• Data: 3 1 3 2 5 1 2 1 4 2 1 3 1 1

Page 26: JM Statr session 13, Jan 11

Solution to Demonstration Problem 8.3

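$$\bar{x} = 2.14,\quad s = 1.29,\quad n = 14,\quad df = n - 1 = 13$$

$$\frac{\alpha}{2} = \frac{1 - .99}{2} = 0.005,\qquad t_{.005,\,13} = 3.012$$

$$\bar{x} - t\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t\frac{s}{\sqrt{n}}$$

$$2.14 - 3.012\frac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012\frac{1.29}{\sqrt{14}}$$

$$2.14 - 1.04 \le \mu \le 2.14 + 1.04$$

$$1.10 \le \mu \le 3.18$$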
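The solution can be reproduced directly from the fourteen data values. This is a sketch only: Python's standard library has no t-distribution quantile function, so the critical value 3.012 is taken from the t table (df = 13, α/2 = 0.005) rather than computed, and the variable names are made up.

```python
from math import sqrt
from statistics import mean, stdev

# Rental days from Demonstration Problem 8.3
days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]

n = len(days)
xbar = mean(days)            # sample mean
s = stdev(days)              # sample standard deviation (divides by n - 1)
t_crit = 3.012               # t table value for df = 13, alpha/2 = 0.005

margin = t_crit * s / sqrt(n)
low, high = xbar - margin, xbar + margin
print(f"xbar = {xbar:.2f}, s = {s:.2f}")          # xbar = 2.14, s = 1.29
print(f"{low:.2f} <= mu <= {high:.2f}")           # 1.10 <= mu <= 3.18
```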

Page 27: JM Statr session 13, Jan 11

Confidence Interval to Estimate the Population Proportion

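$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where:
$\hat{p}$ = sample proportion
$\hat{q} = 1 - \hat{p}$
$p$ = population proportion
$n$ = sample size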

Page 28: JM Statr session 13, Jan 11

Demonstration Problem 8.5

A clothing company produces men’s jeans. The jeans are made and sold with either a regular cut or a boot cut. In an effort to estimate the proportion of their men’s jeans market in Oklahoma City that prefers boot-cut jeans, the analyst takes a random sample of 423 jeans sales from the company’s two Oklahoma City retail outlets. Only 72 of the sales were for boot-cut jeans. Construct a 90% confidence interval to estimate the proportion of the population in Oklahoma City who prefer boot-cut jeans.

Page 29: JM Statr session 13, Jan 11

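Solution for Demonstration Problem 8.5

$$n = 423,\quad x = 72,\quad \hat{p} = \frac{x}{n} = \frac{72}{423} = 0.17$$

$$\hat{q} = 1 - \hat{p} = 1 - 0.17 = 0.83$$

$$90\%\ \text{confidence} \Rightarrow z = 1.645$$

$$\hat{p} - z\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

$$0.17 - 1.645\sqrt{\frac{(0.17)(0.83)}{423}} \le p \le 0.17 + 1.645\sqrt{\frac{(0.17)(0.83)}{423}}$$

$$0.17 - 0.03 \le p \le 0.17 + 0.03$$

$$0.14 \le p \le 0.20$$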
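As a cross-check, the interval can be computed without rounding the sample proportion first. This is a minimal sketch using the problem's counts (72 out of 423); the function name `proportion_interval` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x: int, n: int, confidence: float):
    """z confidence interval for a population proportion."""
    p_hat = x / n                       # sample proportion
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(72, 423, 0.90)
print(f"{low:.2f} <= p <= {high:.2f}")  # 0.14 <= p <= 0.20
```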

Page 30: JM Statr session 13, Jan 11

Estimating Population Variance

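Population parameter: $\sigma^2$

Estimator of $\sigma^2$:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$\chi^2$ formula for a single variance:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2},\qquad \text{degrees of freedom} = n - 1$$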

Page 31: JM Statr session 13, Jan 11

Chi-square statistic to estimate Population Variance

• Extremely sensitive to violations of the assumption that the population is normally distributed

• This technique lacks robustness

• Take extreme caution while constructing the confidence interval

Page 32: JM Statr session 13, Jan 11

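Confidence Interval for σ²

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

$$df = n - 1$$

$$\alpha = 1 - \text{level of confidence}$$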
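The formula can be sketched in code as follows. The sample here is hypothetical (n = 8 with an assumed sample variance of 4.0, values invented purely for illustration), and the two chi-square critical values for df = 7 and a 90% interval (14.0671 and 2.16735) are standard table values entered by hand, since Python's standard library has no chi-square quantile function.

```python
def variance_interval(s2: float, n: int, chi2_upper_tail: float, chi2_lower_tail: float):
    """Confidence interval for the population variance.

    chi2_upper_tail is the table value with alpha/2 area to its right;
    chi2_lower_tail is the table value with 1 - alpha/2 area to its right.
    """
    df = n - 1
    low = df * s2 / chi2_upper_tail     # larger chi-square gives the lower bound
    high = df * s2 / chi2_lower_tail    # smaller chi-square gives the upper bound
    return low, high

# Hypothetical sample: n = 8, s^2 = 4.0; 90% interval, df = 7
low, high = variance_interval(s2=4.0, n=8,
                              chi2_upper_tail=14.0671,
                              chi2_lower_tail=2.16735)
print(f"{low:.2f} <= sigma^2 <= {high:.2f}")  # 1.99 <= sigma^2 <= 12.92
```

Unlike the z and t intervals, this interval is not symmetric around the point estimate, because the chi-square distribution is skewed.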

Page 33: JM Statr session 13, Jan 11

Two Table Values of χ²

df     χ².950         χ².050
1      3.93219E-03    3.84146
2      0.102586       5.99148
3      0.351846       7.81472
4      0.710724       9.48773
5      1.145477       11.07048
6      1.63538        12.5916
7      2.16735        14.0671
8      2.73263        15.5073
9      3.32512        16.9190
10     3.94030        18.3070
...
20     10.8508        31.4104
21     11.5913        32.6706
22     12.3380        33.9245
23     13.0905        35.1725
24     13.8484        36.4150
25     14.6114        37.6525

[Figure: chi-square distribution with df = 7. An area of .05 lies in each tail: to the left of 2.16735 (the .95 table value) and to the right of 14.0671 (the .05 table value).]

Page 34: JM Statr session 13, Jan 11

Exercise in R: Confidence Intervals

Open URL: www.openintro.org

Go to Labs in R and select 4B - Confidence Levels