brendan-gibson
Estimation: Chapter 8
• Suppose we observe something in a random sample
• How confident are we that our observation accurately reflects the population?
Estimation
• Confidence intervals
– the range in which the population parameter is estimated to be
– ‘margin of error’
– accounting for sampling error
Estimation
• Confidence intervals
– We need:
  • standard error (we calculate this)
  • the estimated mean
  • a choice of confidence level
    – 68%
    – 90%
    – 95%
    – 99%
Estimation
• Confidence intervals
– We need:
  • mean (SPSS gets this)
  • Z value for confidence level (we pick this)
  • standard error of the mean (we calculate this)
    – Calculated with standard deviation and sample size
    – Larger sample, less error
    – Smaller standard deviation, less error
Estimation
• Confidence intervals
– We need:
  • standard error of the mean
    – s.e. = standard deviation / sqrt of N
    – Example: 500 students have a mean of 7.5 hrs/week of commute time, std. deviation = 1.5 hrs
      s.e. = 1.5 / sqrt of 500 = .067
Estimation
• Confidence intervals
– We need:
  • standard error of the mean
    – s.e. = standard deviation / sqrt of N
    – Changed example: 300 students have a mean of 7.5 hrs/week of commute time, std. deviation = 3.5 hrs
      s.e. = 3.5 / sqrt of 300 = .20
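The two standard-error examples above can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is made up for illustration and is not from the slides.

```python
import math

def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the mean: standard deviation / sqrt of N."""
    return std_dev / math.sqrt(n)

# First example: 500 students, std. deviation = 1.5 hrs
se_500 = standard_error(1.5, 500)
# Changed example: 300 students, std. deviation = 3.5 hrs
se_300 = standard_error(3.5, 300)

print(round(se_500, 3))  # 0.067
print(round(se_300, 2))  # 0.2
```

The smaller sample with the larger standard deviation gives roughly three times the error.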
Estimation
• Confidence intervals
– We need:
  • Select confidence level (68%, 90%, 95%, 99%)
  • 68% CI = mean +/- 1 (s.e.)
  • 90% CI = mean +/- 1.64 (s.e.)
  • 95% CI = mean +/- 1.96 (s.e.)
  • 99% CI = mean +/- 2.58 (s.e.)
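The multipliers 1, 1.64, 1.96 and 2.58 are Z values from the standard normal curve. As a rough sketch (assuming a normal sampling distribution, and using a helper name `z_for_confidence` invented here), they can be recovered with Python's standard library:

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided critical Z for a confidence level such as 0.95."""
    # Leave (1 - level) / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: Z = {z_for_confidence(level):.2f}")
```

The 68% value comes out slightly under 1 (about 0.99); the slides round it to 1.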
Estimation
• Confidence intervals
– 95% confidence:
  • CI = 7.5 hrs +/- 1.96 (.07) = 7.5 hrs +/- 0.14 hrs = 7.36 to 7.64 hrs
We are 95% confident that the population mean is between 7.36 and 7.64
Estimation
• Confidence intervals
– 68% confidence:
  • CI = 7.5 hrs +/- 1 (.07) = 7.5 hrs +/- 0.07 hrs = 7.43 to 7.57 hrs
We are 68% confident that the population mean is between 7.43 and 7.57
Estimation
• Confidence intervals
– 99% confidence:
  • CI = 7.5 hrs +/- 2.58 (.07) = 7.5 hrs +/- 0.18 hrs = 7.32 to 7.68 hrs
We are 99% confident that the population mean is between 7.32 and 7.68
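The commute-time intervals can be reproduced in one loop. This sketch uses the rounded s.e. of .07 and the Z values from the slides; `mean_ci` is a hypothetical helper name.

```python
def mean_ci(mean: float, se: float, z: float) -> tuple[float, float]:
    """Confidence interval for a mean: mean +/- Z * (s.e.)."""
    margin = z * se
    return mean - margin, mean + margin

MEAN_HRS = 7.5
SE = 0.07  # rounded standard error used on the slides
Z_VALUES = {"68%": 1.0, "90%": 1.64, "95%": 1.96, "99%": 2.58}

for label, z in Z_VALUES.items():
    low, high = mean_ci(MEAN_HRS, SE, z)
    print(f"{label}: {low:.2f} to {high:.2f} hrs")
# 68%: 7.43 to 7.57 hrs
# 90%: 7.39 to 7.61 hrs
# 95%: 7.36 to 7.64 hrs
# 99%: 7.32 to 7.68 hrs
```

Higher confidence costs precision: the 99% interval is the widest of the four.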
Estimation
• At any level of confidence
– The width of the interval is determined by the sample size and the standard deviation
– More variation around the mean: wider interval, less precise estimate
– Fewer observations: wider interval, less precise estimate
Estimation
• So far, we had interval data (hours of commute)
• Works differently if the variable is nominal
– Approve or disapprove of Obama
– A proportion (percent), not a mean
– Different formula
Estimation
• Confidence interval for proportion
– We need
  • Standard error (we calculate, again)
  • observed proportion
  • select our confidence level
Estimation
• Confidence interval for proportion
– We need
  • Standard error
s.e.p. = sqrt of [(p)*(1-p) / n]
CI = p +/- Z (s.e.p.)
Obama estimated at .46 approval in Pew Values Survey
CI = .46 +/- 1.96 (s.e.p.)
Estimation
• Confidence interval for proportion
– We need
  • Standard error
s.e.p. = sqrt of [(.46)*(1-.46) / n]
= sqrt of ((.46)*(1-.46) / 1515) = sqrt of (.2484 / 1515) = .013
Obama estimated at .46 approval in Pew Values Survey
CI = .46 +/- 1.96 (.013)
Estimation
• Confidence interval for proportion
– Obama estimated at .46 approval in Pew Values Survey
CI = .46 +/- 1.96 (.013) = .46 +/- .025
95% confident population approval of Obama is between .435 and .485, or between 43.5% and 48.5%
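The proportion interval can be worked through the same way. This is a sketch only: it takes n = 1515 from the slide's calculation, and `proportion_ci` is an illustrative name rather than anything from SPSS or the course.

```python
import math

def proportion_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (s.e.p., lower bound, upper bound) for an observed proportion."""
    se_p = math.sqrt(p * (1 - p) / n)  # s.e.p. = sqrt of [p * (1 - p) / n]
    margin = z * se_p
    return se_p, p - margin, p + margin

# Obama approval of .46; n = 1515 is assumed from the slide's arithmetic
se_p, low, high = proportion_ci(0.46, 1515)
print(f"s.e.p. = {se_p:.3f}")              # s.e.p. = 0.013
print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.435 to 0.485
```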
Hypothesis Testing: Chpt 9
• Statistics test a Null Hypothesis
• The mean age for tea party supporters and non supporters is the same
• There is no difference between tea party supporters and non supporters
Hypothesis Testing: Chpt 9
• Statistics test a Null Hypothesis
• Support for the Tea Party is independent of gender
• Gender does not affect support for the Tea Party
Hypothesis Testing: Chpt 9
• Statistical significance
– Probability of seeing a result like ours if the NULL is true
– Probability that nothing is going on
– Probability that an observed relationship is a sampling fluke
Hypothesis Testing: Chpt 9
• Statistical significance
– We need to decide what is ‘significantly improbable’
– The level at which we reject the null hypothesis
  • a result that would happen by chance just 5% of the time? (.05 alpha)
  • just 1% of the time? (.01 alpha)
Statistical significance
• Type I vs Type II Errors
Decision       Null is true        Null is false
Reject null    Type I error        Correct decision
Retain null    Correct decision    Type II error
Significance (alpha) is the chance of a Type I error
We most want to avoid Type I errors; Type II errors are treated as less dangerous
Examples: drug trials, criminal justice
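A small simulation can make "alpha is the chance of a Type I error" concrete. This is an illustrative sketch, not course material: it draws two groups from the same made-up population (so the null is true by construction), tests the difference against the 1.96 cutoff, and counts how often the null is wrongly rejected. The sample size, trial count and seed are arbitrary choices.

```python
import math
import random
import statistics

def type_one_error_rate(trials: int = 2000, n: int = 50,
                        critical: float = 1.96, seed: int = 0) -> float:
    """Share of trials that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population, so any difference is a fluke.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        se_diff = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        t = (statistics.mean(a) - statistics.mean(b)) / se_diff
        if abs(t) > critical:
            rejections += 1  # a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

The printed share should land near .05, the alpha implied by the 1.96 cutoff.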
Hypothesis Testing with t-test
Research Hypothesis (H1):
Something is going on.
There is a difference between groups; men have a higher score.
H1: Xm > Xf
Null Hypothesis (H0):
There is no difference. Mean for group 1 = the mean for group 2
H0: X1 = X2
Hypothesis Testing with t
Observe difference between means:
– Magnitude of difference
– Variance in measure of X1 and X2
– Number of observations
What is the likelihood that such a difference would occur by chance?
T-test
• Assume
– Random samples, independent of each other
– Variable being compared is interval or ratio
– Distributions are normal
– Roughly equal variance of each group
T-test
• Decide criteria, or critical t
– Alpha to reject (chance of a Type I error)
  • t = 1.65 for alpha = .10
  • t = 1.96 for alpha = .05
– Directional test?
  • do you think the value is higher/lower for a specific group?
t test
• Calculate t
t = (mean1 - mean2) / s(x1-x2)
where s(x1-x2) is the std. error of the difference between the 2 means
this part is messy, but includes info about sample sizes and variances of each mean
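For reference, one common way to write out the "messy" denominator is the pooled (equal-variance) form below. This is an assumption that matches the "roughly equal variance" condition listed for the t-test, not necessarily the exact formula used in the course; the symbols n_1, n_2 (group sizes) and s_1, s_2 (group standard deviations) are introduced here.

```latex
t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}},
\qquad
s_{\bar{X}_1 - \bar{X}_2}
  = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
    \,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
df = n_1 + n_2 - 2
```

Larger samples and smaller variances both shrink the denominator, which makes a given difference between means produce a larger t.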
t-test
• result is one value (a t-statistic) we can use to check if difference between groups is significant
• Example:
– Corruption, south vs. non south
  • What hypothesis?
Projects
• Identify testable hypotheses
– x causes y
– x explains differences in y
– differences in x explain y
– x and y go together in some interesting way
• State null hypothesis
Is the mean of group 1 significantly different from the mean of group 2?
Non south, x = .33; s.e. .03
South, x = .43; s.e. .05
t-test
• Southern states, zoomed in....
Results
ttest percap_convic, by(var82)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      39    .3358051    .0313071    .1955129    .2724272    .3991831
       1 |      11    .4324917    .0577252    .1914528     .303872    .5611115
---------+--------------------------------------------------------------------
combined |      50    .3570762    .0278429    .1968793    .3011237    .4130287
---------+--------------------------------------------------------------------
    diff |           -.0966866    .0664606               -.2303146    .0369414
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -1.4548
Ho: diff = 0                                     degrees of freedom =       48

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0761         Pr(|T| > |t|) = 0.1522          Pr(T > t) = 0.9239
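The t value in this output can be rebuilt by hand from the group sizes, means and standard deviations. The sketch below assumes the pooled, equal-variance formula named in the output's title line; `pooled_t` is an illustrative helper, not a Stata or SPSS function.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t with equal variances, computed from summary statistics."""
    df = n1 + n2 - 2
    # Pool the two group variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, se_diff, df

# Group 0 (non south) and group 1 (south), values copied from the output
t, se_diff, df = pooled_t(39, 0.3358051, 0.1955129, 11, 0.4324917, 0.1914528)
print(f"t = {t:.4f}, s.e. of difference = {se_diff:.4f}, df = {df}")
# t = -1.4548, s.e. of difference = 0.0665, df = 48
```

The result agrees with the reported t of -1.4548 on 48 degrees of freedom.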
SPSS Results: Age * gender
SPSS results
Results
• Note each mean is given
• variation around mean is given
• confidence intervals
• difference between means is given (-.097)
• std. error of difference between means is given
• AND t values
Results
• Note different probabilities are given for the t value
• Each is for a specific hypothesis
– Difference is greater than 0, positive (one tail)
– Difference is less than 0, negative (one tail)
– “Absolute difference” (two tail)
t test results
• Do we accept or reject the null hypothesis?
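One way to frame the decision is to compare a reported probability with the chosen alpha. The sketch below uses the probabilities from the south vs. non south output (0.1522 two tail, 0.0761 for a difference below 0); the `decide` helper and the choice of alpha levels are illustrative assumptions.

```python
ALPHA = 0.05
P_TWO_TAIL = 0.1522    # Pr(|T| > |t|), Ha: diff != 0
P_LOWER_TAIL = 0.0761  # Pr(T < t),     Ha: diff < 0

def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Reject the null only when the result is rarer than alpha."""
    return "reject null" if p_value < alpha else "retain null"

print("Two tail, alpha .05:", decide(P_TWO_TAIL))          # retain null
print("One tail, alpha .05:", decide(P_LOWER_TAIL))        # retain null
print("One tail, alpha .10:", decide(P_LOWER_TAIL, 0.10))  # reject null
```

The same data can lead to different decisions depending on the alpha and on whether the hypothesis was directional, which is why both are chosen before looking at the results.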