366_8. Estimation: Chapter 8 Suppose we observe something in a random sample how confident are we in saying our observation is an accurate reflection

366_8

Estimation: Chapter 8

• Suppose we observe something in a random sample

• how confident are we in saying our observation is an accurate reflection of the population?

Estimation

• Confidence intervals

– the range in which the population parameter is estimated to be

– ‘margin of error’

– accounting for sampling error

Estimation

• Confidence intervals
– We need:
• standard error (we calculate this)
• the estimated mean
• a choice of confidence level
– 68%
– 90%
– 95%
– 99%

Estimation

• Confidence intervals
– We need:
• mean (SPSS gets this)
• Z value for confidence level (we pick this)
• standard error of the mean (we calculate this)
– Calculated with standard deviation and sample size
– Larger sample, less error
– Smaller standard deviation, less error

Estimation

• Confidence intervals
– We need:
• standard error of the mean
– s.e. = standard deviation / sqrt of N
– Example: 500 students have a mean of 7.5 hrs/week of commute time, std. deviation = 1.5 hrs

s.e. = 1.5 / sqrt of 500 = .067

Estimation

• Confidence intervals
– We need:
• standard error of the mean
– s.e. = standard deviation / sqrt of N
– Changed example: 300 students have a mean of 7.5 hrs/week of commute time, std. deviation = 3.5 hrs

s.e. = 3.5 / sqrt of 300 = .20
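
A minimal Python sketch of the standard error arithmetic in the two commute-time examples. The function name `standard_error` is only illustrative; it does not come from the course materials or from SPSS.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the mean: standard deviation / sqrt of N."""
    return std_dev / math.sqrt(n)

# First example: 500 students, std. deviation = 1.5 hrs
se_first = standard_error(1.5, 500)
# Changed example: 300 students, std. deviation = 3.5 hrs
se_changed = standard_error(3.5, 300)

print(round(se_first, 3))    # 0.067
print(round(se_changed, 2))  # 0.2
```

The second example has both a smaller sample and a larger standard deviation, so its standard error is about three times as large.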

Estimation

• Confidence intervals
– We need:
• Select confidence level (68%, 90%, 95%, 99%)
• 68% CI = mean +/- 1 (s.e.)
• 90% CI = mean +/- 1.64 (s.e.)
• 95% CI = mean +/- 1.96 (s.e.)
• 99% CI = mean +/- 2.58 (s.e.)
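
The multipliers 1, 1.64, 1.96 and 2.58 are Z values from the standard normal distribution. As a rough sketch of where they come from, the Python standard library can look them up; the helper name `z_for_confidence` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided Z value: (1 - level) is split evenly between the two tails."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.68, 0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
```

The 68% value comes out just under 1, which is why it is rounded to 1 in the list of multipliers.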

Estimation

• Confidence intervals
– 95% confidence:
• CI = 7.5 hrs +/- 1.96 (.07) = 7.5 hrs +/- 0.14 hrs = 7.36 to 7.64 hrs

We are 95% confident that the population mean is between 7.36 and 7.64 hrs

Estimation

• Confidence intervals
– 68% confidence:
• CI = 7.5 hrs +/- 1 (.07) = 7.5 hrs +/- 0.07 hrs = 7.43 to 7.57 hrs


Estimation

• Confidence intervals
– 99% confidence:
• CI = 7.5 hrs +/- 2.58 (.07) = 7.5 hrs +/- 0.18 hrs = 7.32 to 7.68 hrs
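
The three intervals for the commute-time example can be recomputed with a short Python sketch. It assumes the rounded standard error of .07 used on the slides; the function name `confidence_interval` is illustrative.

```python
def confidence_interval(mean, z, se):
    """Return (lower, upper) for mean +/- Z * s.e."""
    margin = z * se
    return mean - margin, mean + margin

mean_hours = 7.5
se = 0.07  # rounded standard error, as used on the slides

for label, z in (("68%", 1.0), ("95%", 1.96), ("99%", 2.58)):
    low, high = confidence_interval(mean_hours, z, se)
    print(f"{label}: {low:.2f} to {high:.2f} hrs")
```

A higher confidence level uses a larger Z value, so the interval gets wider even though the data are the same.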


Estimation

• At any level of confidence

– The width of the interval is determined by sample size and the standard deviation around the estimated mean

– More variation around mean, less confident

– Fewer observations, less confident
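
A small sketch of how the margin of error responds to sample size and variation. The sample sizes and standard deviations below are made-up illustration values, not data from the course, and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(std_dev, n, z=1.96):
    """Half-width of the interval: Z * (standard deviation / sqrt of N)."""
    return z * std_dev / math.sqrt(n)

# Same standard deviation, more observations: narrower interval
for n in (100, 400, 1600):
    print("n =", n, "margin =", round(margin_of_error(1.5, n), 3))

# Same sample size, more variation: wider interval
for sd in (1.5, 3.0):
    print("sd =", sd, "margin =", round(margin_of_error(sd, 500), 3))
```

Because N sits under a square root, the sample has to be four times as large to cut the margin in half.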

Estimation

• So far, we had interval data (Hours of commute)

• Works differently if nominal
– Approve or disapprove of Obama
– A proportion (percent), not a mean
– Different formula

Estimation

• Confidence interval for proportion
– We need
• Standard error (we calculate, again)
• observed proportion
• select our confidence level

Estimation

• Confidence interval for proportion
– We need
• Standard error

s.e.p. = sqrt of [(p)*(1-p) / n]

CI = p +/- Z (s.e.p.)

Obama estimated at .46 approval in Pew Values Survey

CI = .46 +/- 1.96 (s.e.p.)

Estimation

• Confidence interval for proportion
– We need
• Standard error

s.e.p. = sqrt of [(.46)*(1-.46) / n]

= sqrt of ((.46)*(1-.46) / 1515) = sqrt of (.2484 / 1515) = .013

Obama estimated at .46 approval in Pew Values Survey

CI = .46 +/- 1.96 (.013)

Estimation

• Confidence interval for proportion
– Obama estimated at .46 approval in Pew Values Survey

CI = .46 +/- 1.96 (.013) = .46 +/- .025

We are 95% confident that population approval of Obama is between .435 and .485, or between 43.5% and 48.5%
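
The proportion example can be checked with a short Python sketch. It assumes n = 1515, the sample size shown in the standard error calculation, and the function name `proportion_ci` is illustrative.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (s.e.p., lower, upper) for p +/- Z * sqrt(p * (1 - p) / n)."""
    sep = math.sqrt(p * (1 - p) / n)
    return sep, p - z * sep, p + z * sep

# .46 approval from the slides; n = 1515 is assumed from the s.e.p. calculation
sep, low, high = proportion_ci(0.46, 1515)
print(round(sep, 3), round(low, 3), round(high, 3))  # 0.013 0.435 0.485
```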

Hypothesis Testing: Chapter 9

• Statistics test a Null Hypothesis

• The mean age for Tea Party supporters and non-supporters is the same

• There is no difference between Tea Party supporters and non-supporters


• Statistics test a Null Hypothesis

• Support for the Tea Party is independent of gender

• Gender does not affect support for the Tea Party


• Statistical significance

– Probability of seeing a result this extreme if the NULL is true

– That is, if nothing is going on

– How likely it is that an observed relationship is just a sampling fluke


• Statistical significance

– We need to decide what is ‘significantly improbable’

– The level at which we reject the null hypothesis
• happens just 5% of the time? (.05 alpha)
• just 1% of the time? (.01 alpha)

Statistical significance

• Type I vs Type II Errors

Decision       Null is true        Null is false
Reject null    Type I error        Correct decision
Retain null    Correct decision    Type II error

Significance (alpha) is chance of a Type I error

We want to avoid Type I errors; Type II errors are less dangerous.

Examples: drug trials, criminal justice
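
A rough simulation of what "alpha is the chance of a Type I error" means. Both groups are drawn from the same made-up population (mean 50, std. deviation 10), so every rejection is a false alarm. The population values, seed, number of trials, and the name `rejects_null` are all assumptions for this sketch, and the test uses the large-sample cutoff of 1.96.

```python
import math
import random
import statistics

def rejects_null(sample_a, sample_b, critical=1.96):
    """True if the difference in means is beyond the critical value (large-sample test)."""
    se_diff = math.sqrt(statistics.variance(sample_a) / len(sample_a)
                        + statistics.variance(sample_b) / len(sample_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_diff
    return abs(t) > critical

random.seed(366)  # fixed seed so the run is repeatable
trials = 2000
false_alarms = 0
for _ in range(trials):
    # Both groups come from the SAME population, so the null is true by construction
    group_a = [random.gauss(50, 10) for _ in range(100)]
    group_b = [random.gauss(50, 10) for _ in range(100)]
    if rejects_null(group_a, group_b):
        false_alarms += 1

type_i_rate = false_alarms / trials
print(type_i_rate)  # expected to be close to .05, not exactly .05
```

Lowering alpha to .01 (a cutoff of about 2.58) would make false alarms rarer, at the cost of more Type II errors when a real difference exists.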

Hypothesis Testing with t-test

Research Hypothesis (H1):

Something is going on.

There is a difference between groups; men have a higher score.

H1: Xm > Xf

Null Hypothesis (H0):

There is no difference. Mean for group 1 = the mean for group 2

H0: X1 = X2

Hypothesis Testing with t

Observe difference between means:

Magnitude of difference

Variance in measure of X1 and X2

Number of observations

What is the likelihood that such a difference would occur by chance?

T-test

• Assume
– Random samples, independent of each other
– Variable being compared is interval or ratio
– Distributions are normal
– Roughly equal variance of each group

T-test

• Decide criteria, or critical t
– Alpha to reject (chance of a Type I error)
• t = 1.65 for alpha = .10
• t = 1.96 for alpha = .05
– Directional test?
• do you think value is higher/lower for a specific group?

t test

• Calculate t

t = (mean 1 - mean 2) / s(x1-x2)

where s(x1-x2) is the std. error of the difference between 2 means

this part is messy, but includes info about sample sizes and variances of each mean

t-test

• result is one value (a t-statistic) we can use to check if difference between groups is significant

• Example:
– Corruption, south vs. non south
• What hypothesis?

Projects

• Identify testable hypotheses
– x causes y
– x explains differences in y
– differences in x explain y
– x and y go together in some interesting way

• State null hypothesis

Is the mean of group 1 significantly different from the mean of group 2?

Non south, x = .34; s.e. .03

South, x = .43; s.e. .06

t-test

• Southern states, zoomed in....

Results

ttest percap_convic, by(var82)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      39    .3358051    .0313071    .1955129    .2724272    .3991831
       1 |      11    .4324917    .0577252    .1914528     .303872    .5611115
---------+--------------------------------------------------------------------
combined |      50    .3570762    .0278429    .1968793    .3011237    .4130287
---------+--------------------------------------------------------------------
    diff |           -.0966866    .0664606               -.2303146    .0369414
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -1.4548
Ho: diff = 0                                     degrees of freedom =       48

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0761         Pr(|T| > |t|) = 0.1522          Pr(T > t) = 0.9239
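
The "messy" standard error of the difference can be worked out from the group summaries in the Stata output. This Python sketch assumes the equal-variance (pooled) formula, which matches the "Two-sample t test with equal variances" heading; the function name `pooled_t` is illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t; returns (difference, std. error, t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se_diff, diff / se_diff, df

# Group summaries copied from the Stata output (group 0 first, then group 1)
diff, se_diff, t, df = pooled_t(0.3358051, 0.1955129, 39,
                                0.4324917, 0.1914528, 11)
print(round(diff, 4), round(se_diff, 4), round(t, 2), df)
```

The absolute value of t here, about 1.45, is smaller than both cutoffs listed earlier (1.65 and 1.96), which agrees with the two-tailed probability of 0.1522 in the output.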

SPSS Results: Age * gender

SPSS results

Results

• Note each mean is given
• variation around mean is given
• confidence intervals
• difference between means is given (-.097)
• std. error of difference btwn means given

• AND the t value

Results

• Note different probabilities are given for the same t value

• Each is for a specific hypothesis
– Difference is greater than 0, positive (one tail)
– Difference is less than 0, negative (one tail)
– “Absolute difference”, not equal to 0 (two tail)

t test results

• Do we accept or reject the null hypothesis?
