STA 291 Winter 09/10 Lecture 7 Dustin Lueker


Two Types of Estimators

Point Estimate
◦ A single number that is the best guess for the parameter
◦ The sample mean is usually a good guess for the population mean

Interval Estimate
◦ Point estimator with error bound
◦ A range of numbers around the point estimate
◦ Gives an idea about the precision of the estimator
◦ Example: The proportion of people voting for A is between 67% and 73%


Confidence Interval

An inferential statement about a parameter should always provide the accuracy of the estimate
◦ How close is the estimate likely to fall to the true parameter value? Within 1 unit? 2 units? 10 units?
◦ This can be determined using the sampling distribution of the estimator/sample statistic
◦ In particular, we need the standard error to make a statement about the accuracy of the estimator


Confidence Interval

Range of numbers that is likely to cover (or capture) the true parameter
The probability that the confidence interval captures the true parameter is called the confidence coefficient or, more commonly, the confidence level
◦ Confidence level is a chosen number close to 1, usually 0.90, 0.95 or 0.99
◦ Level of significance = α = 1 – confidence level


Confidence Interval

To calculate the confidence interval, we use the Central Limit Theorem
◦ Substituting the sample standard deviation for the population standard deviation
Also, we need a $z_{\alpha/2}$ that is determined by the confidence level
Formula for 100(1-α)% confidence interval for μ

$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$


Common Confidence Levels

90% confidence interval
◦ Confidence level of 0.90
◦ α = .10, $z_{\alpha/2}$ = 1.645
95% confidence interval
◦ Confidence level of 0.95
◦ α = .05, $z_{\alpha/2}$ = 1.96
99% confidence interval
◦ Confidence level of 0.99
◦ α = .01, $z_{\alpha/2}$ = 2.576
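The critical values for the common confidence levels can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `z_critical` is an illustrative choice and not part of the lecture.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level              # level of significance
    # z_{alpha/2} leaves probability alpha/2 in the upper tail of N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"confidence level {level:.2f}: z = {z_critical(level):.3f}")
```

The printed values should agree with the tabulated 1.645, 1.96 and 2.576 up to rounding.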


Confidence Intervals

$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$

This interval will contain μ with 100(1-α)% confidence
◦ If we are estimating µ, then why is it unreasonable for us to know σ?
◦ Thus we replace σ by s (sample standard deviation)
◦ This formula is used for large sample sizes (n≥30)
If we have a sample size less than 30, a different distribution is used, the t-distribution; we will get to this later


Example

Compute a 95% confidence interval for μ if we know that s=12 and the sample of size 36 yielded a mean of 7
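One way to check this example numerically is sketched below. It applies the large-sample formula (sample mean plus or minus the critical value times s/√n) to the numbers given in the example; the helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, s, n, confidence_level=0.95):
    """Large-sample confidence interval for mu: mean +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * s / sqrt(n)                  # z times the estimated standard error
    return mean - margin, mean + margin

# Values from the example: s = 12, n = 36, sample mean = 7
lower, upper = z_interval(mean=7, s=12, n=36, confidence_level=0.95)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

Here the standard error is 12/√36 = 2, so the interval is roughly 7 ± 1.96 · 2, that is, about 3.08 to 10.92.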


Interpreting Confidence Intervals

“Probability” means that in the long run 100(1-α)% of the intervals will contain the parameter
◦ If repeated samples were taken and confidence intervals calculated, then 100(1-α)% of the intervals would contain the parameter
For one sample, we do not know whether the confidence interval contains the parameter
The 100(1-α)% probability only refers to the method that is being used
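The long-run interpretation can be illustrated with a small simulation. This is a sketch under assumed settings: the population (normal with mean 10 and standard deviation 2), the sample size, the number of repetitions and the seed are all arbitrary choices made for illustration, not values from the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage(mu=10.0, sigma=2.0, n=36, confidence_level=0.95,
             repetitions=2000, seed=291):
    """Fraction of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        margin = z * stdev(sample) / sqrt(n)  # s replaces sigma, as in the formula
        if abs(mean(sample) - mu) <= margin:  # does this interval capture mu?
            hits += 1
    return hits / repetitions

observed = coverage()
print(f"Share of intervals containing mu: {observed:.3f}")
```

Each individual interval either contains μ or it does not; only the share across many repetitions is expected to be close to the chosen confidence level.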


Interpreting Confidence Intervals

Incorrect statement
◦ With 95% probability, the population mean will fall in the interval from 3.5 to 5.2
To avoid the misleading word “probability” we say that we are “confident”
◦ We are 95% confident that the true population mean will fall between 3.5 and 5.2


Confidence Intervals

Changing our confidence level will change our confidence interval
◦ Increasing our confidence level will increase the length of the confidence interval
◦ A confidence level of 100% would require a confidence interval of infinite length, which is not informative
There is a tradeoff between length and accuracy
◦ Ideally we would like a short interval with high accuracy (high confidence level)


Facts about Confidence Intervals

The width of a confidence interval increases
◦ as the confidence level increases
◦ as the error probability decreases
◦ as the standard error increases
◦ as the sample size n decreases
Why?
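A short numerical sketch may help with the "Why?". The width of the large-sample interval is twice the margin of error, 2 · z · s/√n, so it depends on the confidence level only through z and on the data only through the standard error s/√n. The function name `ci_width` and the input values below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(s, n, confidence_level):
    """Width of the large-sample interval: 2 * z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * s / sqrt(n)

# Higher confidence level (smaller error probability) gives a larger z
widths_by_level = [ci_width(12, 36, level) for level in (0.90, 0.95, 0.99)]
# Larger sample size shrinks the standard error s / sqrt(n)
widths_by_n = [ci_width(12, n, 0.95) for n in (36, 144, 576)]

print("widths for 90%, 95%, 99%:", [round(w, 2) for w in widths_by_level])
print("widths for n = 36, 144, 576:", [round(w, 2) for w in widths_by_n])
```

Quadrupling the sample size halves the width, since n enters through its square root.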


Choice of Sample Size

Start with the confidence interval formula assuming that the population standard deviation is known

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = \bar{x} \pm ME$

Mathematically we need to solve the above equation for n

$n = \sigma^2 \frac{z_{\alpha/2}^2}{ME^2}$


Example

About how large a sample would have been adequate if we merely needed to estimate the mean to within 0.5, with 95% confidence? Assume $\sigma = 5$
Note: We will always round the sample size up to ensure that we get within the desired error bound.
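The sample-size calculation can be sketched as follows, taking n as σ² · z² / ME² and rounding up. The value σ = 5 is read from the residue of the slide's "Assume" statement and should be treated as an assumption; the function name `sample_size_for_mean` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin_of_error, confidence_level=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n_exact = (sigma * z / margin_of_error) ** 2
    return ceil(n_exact)                      # always round up, as the note says

# Assumed sigma = 5, desired margin of error 0.5, 95% confidence
n_needed = sample_size_for_mean(sigma=5, margin_of_error=0.5)
print("required sample size:", n_needed)
```

With z ≈ 1.96 the unrounded value is about 384.1, so rounding up gives 385.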


Confidence Interval for n<30

To account for the extra variability of using a sample size of less than 30, the Student’s t-distribution is used instead of the normal distribution

$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$


t-distribution

t-distributions are bell-shaped and symmetric around zero
The smaller the degrees of freedom, the more spread out the distribution is
t-distributions look much like normal distributions
In fact, the limit of the t-distribution is a normal distribution as n gets larger


Finding $t_{\alpha/2}$

Need to know α and degrees of freedom (df)
◦ df = n-1
α=.05, n=23
◦ $t_{\alpha/2}$ =
α=.01, n=17
◦ $t_{\alpha/2}$ =
α=.1, n=20
◦ $t_{\alpha/2}$ =
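In class these values are read from a t table. As a cross-check, the sketch below computes them numerically with the standard library only: it integrates the t density with Simpson's rule and finds the critical value by bisection. This is a swapped-in numerical technique, not the lecture's method, and the function names are hypothetical. It is intended for the small degrees of freedom used here.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, n):
    """t_{alpha/2} with df = n - 1, located by bisection on the CDF."""
    df = n - 1
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for alpha, n in ((0.05, 23), (0.01, 17), (0.1, 20)):
    print(f"alpha={alpha}, n={n}, df={n - 1}: t = {t_critical(alpha, n):.3f}")
```

The printed values can be compared against the corresponding rows of a t table (df = 22, 16 and 19).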


Example

A sample of 12 individuals yields a mean of 5.4 and a variance of 16. Estimate the population mean with 98% confidence.
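A sketch of this calculation follows. Since n = 12 is below 30, the t-based interval is used with df = 11. The critical value 2.718 is taken from a standard t table (upper-tail area 0.01, 11 degrees of freedom) and is hard-coded here rather than computed; the variable names are illustrative.

```python
from math import sqrt

n = 12
sample_mean = 5.4
sample_variance = 16
s = sqrt(sample_variance)         # sample standard deviation = 4

# 98% confidence -> alpha = 0.02, alpha/2 = 0.01, df = n - 1 = 11
t_crit = 2.718                    # table value for t_{0.01} with 11 df

margin = t_crit * s / sqrt(n)     # t times the estimated standard error
lower, upper = sample_mean - margin, sample_mean + margin
print(f"98% CI for mu: ({lower:.2f}, {upper:.2f})")
```

The margin is about 3.14, giving an interval of roughly 2.26 to 8.54. Note that the example reports a variance, so the square root is needed before applying the formula.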


Confidence Interval for a Proportion

The sample proportion is an unbiased and efficient estimator of the population proportion
◦ The proportion is a special case of the mean

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$


Sample Size

As with a confidence interval for the mean, a desired sample size for a given margin of error (ME) and confidence level can be computed for a confidence interval for the proportion

$n = \hat{p}(1-\hat{p}) \left(\frac{z_{\alpha/2}}{ME}\right)^2$

◦ This formula requires guessing $\hat{p}$ before taking the sample, or taking the safe but conservative approach of letting $\hat{p}$ = .5
◦ Why is this the worst case scenario? (conservative approach)


Example

ABC/Washington Post poll (December 2006)
◦ Sample size of 1005
◦ Question: Do you approve or disapprove of the way George W. Bush is handling his job as president?
◦ 362 people approved
Construct a 95% confidence interval for p
What is the margin of error?
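The poll example can be worked through with the proportion formula, as in the sketch below. The function name `proportion_interval` is a hypothetical helper; the counts are those given in the example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence_level=0.95):
    """Return (p_hat, margin of error, (lower, upper)) for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z times the standard error
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# 362 of 1005 respondents approved
p_hat, margin, (lower, upper) = proportion_interval(362, 1005)
print(f"p_hat = {p_hat:.4f}, margin of error = {margin:.4f}")
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

The sample proportion is about 0.36 and the margin of error is close to 3 percentage points.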


Example

Suppose we want an error bound of B = 2% (that is, ME = 0.02), using the sample proportion from the Washington Post poll; recall that the sample proportion was .36

$n = 0.36\,(1 - 0.36) \left(\frac{1.96}{0.02}\right)^2$

◦ n ≈ 2212.8, so we need a sample of 2213
What do we get if we use the conservative approach?
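The sample-size formula for a proportion can be sketched in the same way. The function name `sample_size_for_proportion` and its default guess of 0.5 (the conservative approach) are illustrative choices.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, p_guess=0.5,
                               confidence_level=0.95):
    """Smallest n giving the desired margin of error for a guessed proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n_exact = p_guess * (1 - p_guess) * (z / margin_of_error) ** 2
    return ceil(n_exact)                      # round up to stay within the bound

# Using the poll's sample proportion as the guess
n_with_guess = sample_size_for_proportion(0.02, p_guess=0.36)
# Conservative approach: p_guess = 0.5 maximizes p * (1 - p)
n_conservative = sample_size_for_proportion(0.02)
print("n with p = .36:", n_with_guess)
print("n with p = .5: ", n_conservative)
```

Because p(1 − p) is largest at p = 0.5, the conservative choice never gives a smaller sample size than any other guess.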
