kosey
1
Two-Sample Problems – Means
1. Comparing two (unpaired) populations
2. Assume: 2 SRSs, independent samples, Normal populations
Make an inference for their difference: $\mu_1 - \mu_2$
Sample from population 1: $n_1,\ \bar{x}_1,\ s_1$
Sample from population 2: $n_2,\ \bar{x}_2,\ s_2$
2
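S.E. – standard error in the two-sample process:
$$\text{S.E.} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Confidence Interval: Estimate ± margin of error
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \cdot \text{S.E.}$$
Significance Test:
$H_0: \mu_1 - \mu_2 = 0$ (or $\mu_1 = \mu_2$), where $\mu_1, \mu_2$ are the population means being tested
Test stat:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\text{S.E.}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$df = \min(n_1 - 1,\ n_2 - 1)$$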
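The formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are made up for the example, the input numbers are the camp stove summary statistics from Ex 1, and the critical value t* still has to be looked up in a t table (or supplied by software) for the chosen confidence level.

```python
import math

def two_sample_t_summary(n1, xbar1, s1, n2, xbar2, s2):
    """Return (S.E., t statistic, conservative df) for H0: mu1 - mu2 = 0.

    Hypothetical helper: takes summary statistics for two independent samples.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # two-sample standard error
    t_stat = ((xbar1 - xbar2) - 0) / se        # (estimate - hypothesized value) / S.E.
    df = min(n1 - 1, n2 - 1)                   # conservative degrees of freedom
    return se, t_stat, df

def margin_of_error(t_star, se):
    """Margin of error t* x S.E.; t_star must come from a t table for the chosen df."""
    return t_star * se

# Summary statistics from Ex 1 (camp stoves)
se, t_stat, df = two_sample_t_summary(10, 11.4, 2.5, 12, 9.9, 3.0)
print(f"S.E. = {se:.4f}, t = {t_stat:.4f}, df = {df}")
```

A calculator may report a somewhat different df and p-value than the conservative rule above, since it can use a different approximation for the degrees of freedom.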
3
Using the Calculator
Confidence Interval:
On calculator: STAT, TESTS, 0:2-SampTInt…
Given data, need to enter: List locations, C-Level
Given stats, need to enter, for each sample: $\bar{x}$, s, n, and then C-Level
Select input (Data or Stats), enter appropriate info, then Calculate
4
Using the Calculator
Significance Test:
On calculator: STAT, TESTS, 4:2-SampTTest…
Given data, need to enter: List locations, Ha
Given stats, need to enter, for each sample: $\bar{x}$, s, n, and then Ha
Select input (Data or Stats), enter appropriate info, then Calculate or Draw
Output: Test stat, p-value
5
Ex 1. Is one model of camp stove any different at boiling water than another at the 5% significance level?
Model 1: $n_1 = 10,\ \bar{x}_1 = 11.4,\ s_1 = 2.5$
Model 2: $n_2 = 12,\ \bar{x}_2 = 9.9,\ s_2 = 3.0$
$H_0$:
$H_a$:
Test stat: t =
p-value:
6
Ex 2. Is there evidence that children get more REM sleep than adults at the 1% significance level?
Children: $n_1 = 11,\ \bar{x}_1 = 2.8,\ s_1 = 0.5$
Adults: $n_2 = 13,\ \bar{x}_2 = 2.1,\ s_2 = 0.7$
$H_0$:
$H_a$:
Test stat: t =
p-value:
7
Ex 3. Create a 98% C.I. for estimating the mean difference in petal lengths (in cm) for two species of iris.
Iris virginica: $n_1 = 35,\ \bar{x}_1 = 5.48,\ s_1 = 0.55$
Iris setosa: $n_2 = 38,\ \bar{x}_2 = 1.49,\ s_2 = 0.21$
Margin of error:
t-Interval:
8
Ex 4. Is one species of iris any different at petal length than another at the 2% significance level?
Iris virginica: $n_1 = 35,\ \bar{x}_1 = 5.48,\ s_1 = 0.55$
Iris setosa: $n_2 = 38,\ \bar{x}_2 = 1.49,\ s_2 = 0.21$
$H_0$:
$H_a$:
Test stat: t =
[Figure: horizontal axis from -4 to 4]
p-value:
9
Two-Sample Problems – Proportions
Make an inference for their difference: $p_1 - p_2$
Sample from population 1: $n_1,\ x_1,\ \hat{p}_1$
Sample from population 2: $n_2,\ x_2,\ \hat{p}_2$
10
Using the Calculator
Confidence Interval: Estimate ± margin of error
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \cdot \text{S.E.}$$
On calculator: STAT, TESTS, B:2-PropZInt…
Need to enter: $n_1, x_1$, $n_2, x_2$, C-Level
Enter appropriate info, then Calculate.
11
Using the Calculator
Significance Test:
$H_0: p_1 - p_2 = 0$ (or $p_1 = p_2$), where $p_1, p_2$ are the population proportions being tested
On calculator: STAT, TESTS, 6:2-PropZTest…
Need to enter: $n_1, x_1$, $n_2, x_2$, and then Ha
Enter appropriate info, then Calculate or Draw
Output: Test stat, p-value
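The two-proportion interval and test can be sketched in Python as follows. The slides do not spell out the standard error for proportions, so this sketch assumes the usual textbook choices: an unpooled S.E. for the confidence interval and a pooled S.E. for the test of $p_1 = p_2$. The function names and the counts are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(n1, x1, n2, x2, c_level):
    """Confidence interval for p1 - p2 (assumes the unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2)  # critical value z*
    margin = z_star * se
    estimate = p1 - p2
    return estimate - margin, estimate + margin

def two_prop_z_test(n1, x1, n2, x2):
    """z statistic and right-tailed p-value (Ha: p1 > p2), assuming a pooled S.E."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_right = 1 - NormalDist().cdf(z)
    return z, p_right

# Hypothetical counts: 120 successes out of 200 vs. 100 out of 200
z, p_value = two_prop_z_test(200, 120, 200, 100)
low, high = two_prop_z_interval(200, 120, 200, 100, 0.95)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print(f"95% C.I.: ({low:.4f}, {high:.4f})")
```

For a left-tailed alternative use `NormalDist().cdf(z)`, and for a two-sided alternative double the smaller tail.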
12
Ex 5. Create a 95% C.I. for the difference in proportions of eggs hatched.
Nesting boxes apart/hidden: $n_1 = 478,\ x_1 = 270$ (hatched)
Nesting boxes close/visible: $n_2 = 805,\ x_2 = 270$ (hatched)
Margin of error:
z-Interval:
13
Ex 6. Split 1100 potential voters into two groups, those who get a reminder to register and those who do not.
Of the 600 who got reminders, 332 registered.
Of the 500 who got no reminders, 248 registered.
Is there evidence at the 1% significance level that the proportion of potential voters who registered was greater in the group that received reminders?
Group 1: $n_1 = 600,\ x_1 = 332$
Group 2: $n_2 = 500,\ x_2 = 248$
14
Ex 6. (continued)
$H_0$:
$H_a$:
Test stat: z =
p-value:
15
Ex 7. “Can people be trusted?”
Among 250 18-25 year olds, 45 said “yes”.
Among 280 35-45 year olds, 72 said “yes”.
Does this indicate that the proportion of trusting people is higher in the older population? Use a significance level of α = .05.
Group 1: $n_1 = 250,\ x_1 = 45$
Group 2: $n_2 = 280,\ x_2 = 72$
16
Ex 7. (continued)
$H_0$:
$H_a$:
Test stat: z =
p-value:
17
Scatterplots & Correlation
Each individual in the population/sample will have two characteristics looked at, instead of one.
Goal: able to make accurate predictions for one variable in terms of another variable based on a data set of paired values.
18
Variables
Explanatory (independent) variable, x, is used to predict a response.
Response (dependent) variable, y, will be the outcome from a study or experiment.
height vs. weight, age vs. memory, temperature vs. sales
19
Scatterplots
Plot of paired values helps to determine if a relationship exists.
Ex: variables – height (in), weight (lb)
Height Weight
72 171
65 150
68 180
70 180
72 185
66 165
[Figure: scatterplot of weight (150 to 190 lb) against height (65 to 72 in)]
20
Scatterplots - Features
Direction: negative, positive
Form: line, parabola, wave (sine)
Strength: how close to following a pattern
[Figure: scatterplot of weight against height]
Direction:
Form:
Strength:
21
Scatterplots – Temp vs Oil used
[Figure: scatterplot of oil used (25 to 45) against temperature (20 to 90)]
Direction:
Form:
Strength:
22
Correlation
Correlation, r, measures the strength of the linear relationship between two variables.
r > 0: positive direction
r < 0: negative direction
Close to +1:
Close to -1:
Close to 0:
23
.85, -.02, .13, -.79
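The slides do not give a formula for r, so the sketch below assumes the standard Pearson definition (sum of cross-deviations divided by the square root of the product of the sums of squared deviations). It is applied to the height and weight pairs from the scatterplot example; the function name is hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r for paired values (standard definition, assumed here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Height (in) and weight (lb) pairs from the scatterplot example
heights = [72, 65, 68, 70, 72, 66]
weights = [171, 150, 180, 180, 185, 165]

r = correlation(heights, weights)
print(f"r = {r:.3f}")  # sign gives the direction, size gives the strength
```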
24
Lines - Review
y = a + bx
[Figure: graph of an example line]
a:
b:
xy232
25
Regression
Looking at a scatterplot, if the form seems linear, then use a linear model or regression line to describe how a response variable y changes as an explanatory variable x changes.
Regression models are often used to predict the value of a response variable for a given explanatory variable.
26
Least-Squares Regression Line
The line that best fits the data:
$$\hat{y} = a + bx$$
where:
$$b = r\,\frac{s_y}{s_x}$$
$$a = \bar{y} - b\,\bar{x}$$
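As a quick check of these two formulas, here is a minimal Python sketch that computes the slope and intercept from summary statistics. The function names are hypothetical; the input numbers are the fat and calorie summary statistics from the chicken sandwich example that follows.

```python
def least_squares_line(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def predict(a, b, x):
    """Predicted response y-hat at explanatory value x."""
    return a + b * x

# Chicken sandwich example: fat (x) and calories (y)
a, b = least_squares_line(x_bar=20.6, s_x=9.8, y_bar=472.7, s_y=144.2, r=0.947)
print(f"y-hat = {a:.2f} + {b:.2f} x")
```

By construction the line passes through the point of means, so `predict(a, b, 20.6)` returns the mean calorie value.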
27
Example
Fat and calories for 11 fast food chicken sandwiches
Fat: $\bar{x} = 20.6,\ s_x = 9.8$
Calories: $\bar{y} = 472.7,\ s_y = 144.2$
$r = .947$
28
[Figure: scatterplot of calories against fat for the 11 sandwiches]
29
Example (continued)
$$\hat{y} = 185.65 + 13.93x$$
What is the slope and what does it mean?
What is the intercept and what does it mean?
How many calories would you predict a sandwich with 40 grams of fat has?
30
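Why “Least-squares”?
The least-squares line is the line that minimizes the sum of the squared residuals.
Residual: difference between actual and predicted, $y - \hat{y}$
[Figure: data points and fitted line, with residuals shown as vertical distances]
x y $\hat{y}$ $y - \hat{y}$
1 10 14 -4
3 25 24 1
… … … …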
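The residual column can be reproduced with a short Python sketch. The line $\hat{y} = 9 + 5x$ used here is an assumption inferred from the two predicted values shown in the table (14 at x = 1 and 24 at x = 3); only the two visible rows are used, and the function names are hypothetical.

```python
def residuals(points, a, b):
    """Residual = actual y minus predicted y-hat, for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in points]

def sum_squared_residuals(points, a, b):
    """The quantity that the least-squares line makes as small as possible."""
    return sum(e ** 2 for e in residuals(points, a, b))

# The two rows visible in the table; the remaining rows are elided in the slide
points = [(1, 10), (3, 25)]
a, b = 9, 5  # assumed line, inferred from the predicted values 14 and 24

res = residuals(points, a, b)
sse = sum_squared_residuals(points, a, b)
print("residuals:", res)
print("sum of squared residuals (these rows only):", sse)
```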
31
Scatterplots – Residuals
To double-check the appropriateness of using a linear regression model, plot residuals against the explanatory variable.
No unusual pattern in the residual plot suggests that a linear model is appropriate.
32
Other things to look for
Squared correlation, $r^2$, gives the percent of variation explained by the regression line.
Chicken data: $r = .947$
33
Other things to look for
Influential observations:
Prediction vs. Causation: x and y are linked (associated) somehow, but we don’t say “x causes y to occur”. Other forces may be causing the relationship (lurking variables).
34
Extrapolation: using the regression for a prediction outside of the range of values for the explanatory variable.
age weight
20 180
25 190
32 190
36 200
36 225
40 215
47 220
[Figure: scatterplot of weight against age with fitted line f(x) = 1.6126x + 148.4887, R² = 0.7157]
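To make the warning concrete, the sketch below refits the least-squares line to the age and weight table and flags predictions that fall outside the observed age range. The function names are hypothetical, and the age of 80 is just an illustrative value far outside the data.

```python
ages = [20, 25, 32, 36, 36, 40, 47]
weights = [180, 190, 190, 200, 225, 215, 220]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def is_extrapolation(x, xs):
    """True when x lies outside the range of observed explanatory values."""
    return x < min(xs) or x > max(xs)

a, b = fit_line(ages, weights)
for age in (30, 80):
    note = "extrapolation, not trustworthy" if is_extrapolation(age, ages) else "within data range"
    print(f"age {age}: predicted weight {a + b * age:.1f} ({note})")
```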
35
On calculator
Set up: 2nd 0 (Catalog), x-1 (D), scroll down to “DiagnosticOn”, Enter, Enter
Scatterplots: 2nd Y= (Stat Plot), 1, On, select Type and list locations for x values and y values. Then ZOOM, 9 (ZoomStat)
Regression: STAT, CALC, 8:LinReg(a+bx), Enter, list location for x, list location for y, Enter
Graph: Y=, enter line into Y1
36
Examples:
Cat Chick Dog Duck Goat Lion Bird Pig Bunny Squirrel
x 63 22 63 28 151 108 18 115 31 44 Incubation, days
y 11 7.5 11 10 12 10 8 10 7 9 Lifespan, years

x 2 5 2 5 4 5 1 1 4 2 6 1 age, years
y 16 11 17 10 12 11 20 19 10 16 11 20 resale, thousands $
37
Contingency Tables
• Contingency tables summarize all outcomes
– Row variable: one row for each possible value
– Column variable: one column for each possible value
– Each cell (i, j) describes number of individuals with those values for the respective variables.
Making comparisons between two categorical variables
Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12
Total 11 14 15 40
38
• Info from the table
– # who are over 25 and make under $15,000:
– % who are over 25 and make under $15,000:
– % who are over 25:
– % of the over 25 who make under $15,000:
Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12
Total 11 14 15 40
39
Marginal Distributions
– Look to margins of tables for individual variable’s distribution
– Marginal distribution for age:
– Marginal distribution for income:
Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12
Total 11 14 15 40

Age Freq. Rel. Freq.
<21 9
21-25 19
>25 12
Total 40

Income <15 15-30 >30 Total
Freq. 11 14 15 40
Rel. Freq.
40
Conditional Distributions
– Look at one variable’s distribution given another
– How does income vary over the different age groups?
– Consider each age group as a separate population and compute relative frequencies:
Age\Income <15 15-30 >30 Total
<21 5 3 1 9
21-25 4 9 6 19
>25 2 2 8 12

Age\Income <15 15-30 >30 Total
<21
21-25
>25
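Marginal and conditional distributions are both just relative frequencies with different denominators, which a short Python sketch can make explicit. The dictionary layout and variable names are choices made for this illustration; the counts are the age and income table above.

```python
# Observed counts: rows are age groups, columns are income groups
counts = {
    "<21":   {"<15": 5, "15-30": 3, ">30": 1},
    "21-25": {"<15": 4, "15-30": 9, ">30": 6},
    ">25":   {"<15": 2, "15-30": 2, ">30": 8},
}
income_levels = ["<15", "15-30", ">30"]
total = sum(sum(row.values()) for row in counts.values())

# Marginal distributions: row (or column) total divided by the grand total
age_marginal = {age: sum(row.values()) / total for age, row in counts.items()}
income_marginal = {
    inc: sum(row[inc] for row in counts.values()) / total for inc in income_levels
}

# Conditional distribution of income given age: each cell divided by its row total
conditional = {
    age: {inc: n / sum(row.values()) for inc, n in row.items()}
    for age, row in counts.items()
}

for age, dist in conditional.items():
    shares = ", ".join(f"{inc}: {p:.1%}" for inc, p in dist.items())
    print(f"income given age {age}: {shares}")
```

Comparing the printed rows shows directly whether the income distribution changes from one age group to the next.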
41
Independence Revisited
Two variables are independent if knowledge of one does not affect the chances of the other.
In terms of contingency tables, this means that the conditional distribution of one variable is (almost) the same for all values of the other variable.
In the age/income example, the conditionals are not even close. These variables are not independent. There is some association between age and income.
42
Test for Independence
Is there an association between two variables?
– H0: The variables are ____ (The two variables are ____)
– Ha: The variables ____ (The two variables are ____)
Assuming independence:
– Expected number in each cell (i, j): (% of value i for variable 1) × (% of value j for variable 2) × (sample size) =
43
Example of Computing Expected Values
Rh\Blood A B AB O Total
+ 176 28 22 198 424
- 30 12 4 30 76
Total 206 40 26 228 500

Expected number in cell (A, +):

Rh\Blood A B AB O Total
+ ____ ____ 22.048 193.344 424
- ____ ____ 3.952 34.656 76
Total 206 40 26 228 500
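The expected-count rule (row share × column share × sample size, which simplifies to row total × column total ÷ sample size) can be sketched in Python. The function name is hypothetical; the observed counts are the Rh and blood type table above.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Observed counts: rows are Rh + and Rh -, columns are blood types A, B, AB, O
observed = [
    [176, 28, 22, 198],
    [30, 12, 4, 30],
]

expected = expected_counts(observed)
for label, row in zip(["+", "-"], expected):
    print(label, [round(e, 3) for e in row])
```

The AB and O columns of the result agree with the values already filled in on the slide.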
44
Chi-square statistic
To measure the difference between the observed table and the expected table, we use the chi-square test statistic:
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
where the summation occurs for each cell in the table.
1. Skewed right
2. df = (r – 1)(c – 1)
3. Right-tailed test
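A minimal Python sketch of the statistic and its degrees of freedom follows. It uses the Rh and blood type counts from the expected-value example; the function name is hypothetical, and the p-value is left to a chi-square table or the calculator, since the standard library has no chi-square distribution function.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp           # one term of the sum per cell
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Observed counts: rows are Rh + and Rh -, columns are blood types A, B, AB, O
observed = [
    [176, 28, 22, 198],
    [30, 12, 4, 30],
]

chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```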
45
Test for Independence – Steps
State variables being tested
State hypotheses: H0, the null hypothesis, vars independent; Ha, the alternative, vars not independent
Compute test statistic: if the null hypothesis is true, where does the sample fall? Test stat = $\chi^2$-score
Compute p-value: what is the probability of seeing a test stat as extreme as (or more extreme than) that?
Conclusion: small p-values give strong evidence against H0.
46
ST – on the calculator
On calculator: STAT, TESTS, C:χ²-Test…
Observed: [A]
Expected: [B]
Enter observed info into matrix A, then perform test with Calculate or Draw.
Output: Test stat, p-value, df
To enter observed info into matrix A: 2nd, x-1 (Matrix), EDIT, 1: A, change dimensions, enter info in each cell.
47
Ex. Test whether blood type and Rh factor are independent at a 5% significance level.
$H_0$:
$H_a$:
Test stat: $\chi^2$ =
p-value:
Conclusion:
48
Ex. Test whether age and stance on marijuana legalization are associated.
$H_0$:
$H_a$:
Test stat: $\chi^2$ =
p-value:
Conclusion:
stance\age 18-29 30-49 50- Total
for 172 313 258 743
against 52 103 119 274
Total 224 416 377 1017
49
Additional Examples
personality\college Health Science Lib Arts Educator
extrovert 68 56 62 47
introvert 94 81 45 66

Job grade\marital status Single Married Divorced
1 58 874 15
2 222 3450 60
3 74 1204 93

City size\practice status Government Judicial Private Salaried
<250,000 30 44 258 36
250-500,000 79 102 651 90
>500,000 22 34 127 23