Two-Sample Problems – Means


1. Comparing two (unpaired) populations
2. Assume: two SRSs, independent samples, Normal populations

Make an inference for their difference: $\mu_1 - \mu_2$

Sample from population 1: $n_1, \bar{x}_1, s_1$

Sample from population 2: $n_2, \bar{x}_2, s_2$


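Confidence Interval: Estimate ± margin of error

$(\bar{x}_1 - \bar{x}_2) \pm t^* \cdot SE$

Significance Test:

$H_0: \mu_1 - \mu_2 = 0$ (or $\mu_1 = \mu_2$), where $\mu_1, \mu_2$ are the population means being tested

Test stat: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{SE}$

S.E. – standard error in the two-sample process:

$SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, with $df = \min(n_1 - 1, n_2 - 1)$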
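A minimal Python sketch of the two-sample t procedures above, using only the standard library. Python has no built-in t distribution, so `t_cdf` here is an assumed helper that integrates the t density numerically with Simpson's rule, and `t_star` inverts it by bisection. All function names are hypothetical, and df follows the conservative $\min(n_1 - 1, n_2 - 1)$ rule, so results may differ slightly from the calculator.

```python
import math


def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (SE, t, df) for H0: mu1 - mu2 = 0 with the conservative df."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((x1 - x2) - 0) / se
    df = min(n1 - 1, n2 - 1)
    return se, t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even), then symmetry."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t, df, alternative="two-sided"):
    """alternative: 'greater' (Ha: mu1 > mu2), 'less' (Ha: mu1 < mu2) or 'two-sided'."""
    if alternative == "greater":
        p = 1 - t_cdf(t, df)
    elif alternative == "less":
        p = t_cdf(t, df)
    else:
        p = 2 * (1 - t_cdf(abs(t), df))
    return min(1.0, max(0.0, p))  # clamp tiny numerical overshoot


def t_star(conf, df):
    """Critical value t* with P(-t* < T < t*) = conf, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_t_interval(x1, s1, n1, x2, s2, n2, conf):
    """Estimate ± margin of error: (x1 - x2) ± t* · SE."""
    se, _, df = two_sample_t(x1, s1, n1, x2, s2, n2)
    margin = t_star(conf, df) * se
    return (x1 - x2) - margin, (x1 - x2) + margin
```

For instance, `two_sample_t(11.4, 2.5, 10, 9.9, 3.0, 12)` runs on the Ex 1 summary statistics, and `p_value(t, df, "two-sided")` then gives the matching two-sided p-value.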


Using the Calculator – Confidence Interval:

On calculator: STAT, TESTS, 0:2-SampTInt…

Given data, enter: list locations, C-Level

Given stats, enter for each sample: $\bar{x}$, $s$, $n$, and then C-Level

Select input (Data or Stats), enter appropriate info, then Calculate


Using the Calculator – Significance Test:

On calculator: STAT, TESTS, 4:2-SampTTest…

Given data, enter: list locations, $H_a$

Given stats, enter for each sample: $\bar{x}$, $s$, $n$, and then $H_a$

Select input (Data or Stats), enter appropriate info, then Calculate or Draw

Output: Test stat, p-value
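The calculator's 2-SampTInt and 2-SampTTest (with Pooled: No) usually report a df that is not a whole number. That df comes from the Welch–Satterthwaite approximation rather than the conservative $\min(n_1 - 1, n_2 - 1)$ rule. Welch's df is never smaller than the conservative one, so the calculator's p-value is never larger. A small sketch comparing the two, using the Ex 1 summary statistics as sample input. The function names are illustrative.

```python
def welch_df(s1, n1, s2, n2):
    """Welch–Satterthwaite degrees of freedom for unpooled two-sample t procedures."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def conservative_df(n1, n2):
    """The by-hand rule used on these slides."""
    return min(n1 - 1, n2 - 1)


# Ex 1 camp-stove summaries: s1 = 2.5, n1 = 10; s2 = 3.0, n2 = 12
print(round(welch_df(2.5, 10, 3.0, 12), 2), conservative_df(10, 12))
```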


Ex 1. Is one model of camp stove any different from another at boiling water, at the 5% significance level?

Model 1: $n_1 = 10$, $\bar{x}_1 = 11.4$, $s_1 = 2.5$

Model 2: $n_2 = 12$, $\bar{x}_2 = 9.9$, $s_2 = 3.0$

$H_0$:

$H_a$:

Test stat $t$:

p-value:


Ex 2. Is there evidence that children get more REM sleep than adults, at the 1% significance level?

Children: $n_1 = 11$, $\bar{x}_1 = 2.8$, $s_1 = 0.5$

Adults: $n_2 = 13$, $\bar{x}_2 = 2.1$, $s_2 = 0.7$

$H_0$:

$H_a$:

Test stat $t$:

p-value:
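One way to check Ex 2 by hand, sketched in Python. It computes $t$ with the conservative df and compares it to the upper-tail critical value. Here children are sample 1 and $H_a: \mu_1 > \mu_2$. The cutoff 2.764 is the one-sided $\alpha = 0.01$ entry for df = 10 in a standard t table. It is hard-coded as an assumed lookup, not computed.

```python
import math

# Ex 2 summary statistics: children (sample 1), adults (sample 2)
n1, x1, s1 = 11, 2.8, 0.5
n2, x2, s2 = 13, 2.1, 0.7

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # two-sample standard error
t = (x1 - x2) / se                        # test statistic under H0: mu1 - mu2 = 0
df = min(n1 - 1, n2 - 1)                  # conservative df = 10

t_crit = 2.764          # t table, df = 10, upper-tail area 0.01 (assumed lookup)
reject_h0 = t > t_crit  # right-tailed test because Ha: mu1 > mu2
print(f"SE = {se:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```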


Ex 3. Create a 98% C.I. for estimating the mean difference in petal lengths (in cm) for two species of iris.

Iris virginica: $n_1 = 35$, $\bar{x}_1 = 5.48$, $s_1 = 0.55$

Iris setosa: $n_2 = 38$, $\bar{x}_2 = 1.49$, $s_2 = 0.21$

Margin of error:

t-Interval:
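A sketch of the Ex 3 interval. With the conservative rule, df = min(34, 37) = 34. A table without a df = 34 row is usually read at the next lower row, df = 30, whose 98% critical value is $t^* = 2.457$. That value is hard-coded here as an assumption. The calculator's 2-SampTInt uses a larger df and so gives a slightly narrower interval.

```python
import math

# Ex 3 summary statistics (petal length, cm)
n1, x1, s1 = 35, 5.48, 0.55   # Iris virginica
n2, x2, s2 = 38, 1.49, 0.21   # Iris setosa

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1 - 1, n2 - 1)      # 34; a table may only have the df = 30 row
t_star = 2.457                # 98% critical value, df = 30 row (assumed lookup)

estimate = x1 - x2
margin = t_star * se          # margin of error
interval = (estimate - margin, estimate + margin)
print(f"df = {df}, margin = {margin:.3f}, interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```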


Ex 4. Is one species of iris any different from the other in petal length, at the 2% significance level?

Iris virginica: $n_1 = 35$, $\bar{x}_1 = 5.48$, $s_1 = 0.55$

Iris setosa: $n_2 = 38$, $\bar{x}_2 = 1.49$, $s_2 = 0.21$

$H_0$:

$H_a$:

Test stat $t$:

[Figure: t curve, horizontal axis from -4 to 4]

p-value:


Two-Sample Problems – Proportions

Make an inference for their difference: $p_1 - p_2$

Sample from population 1: $n_1, x_1, \hat{p}_1$

Sample from population 2: $n_2, x_2, \hat{p}_2$


Using the Calculator – Confidence Interval:

Estimate ± margin of error: $(\hat{p}_1 - \hat{p}_2) \pm z^* \cdot SE$

On calculator: STAT, TESTS, B:2-PropZInt…

Need to enter: $x_1, n_1, x_2, n_2$, and then C-Level

Enter appropriate info, then Calculate.
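A Python sketch of the two-proportion z interval. It assumes the usual unpooled standard error $SE = \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$, which is the form 2-PropZInt uses. `statistics.NormalDist` from the standard library supplies $z^*$, and the function name is hypothetical. The Ex 5 counts are used as sample input.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """(p1_hat - p2_hat) ± z* · SE, with the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    margin = z_star * se
    return (p1 - p2) - margin, (p1 - p2) + margin


# Ex 5 counts: 270 of 478 hatched (apart/hidden), 270 of 805 hatched (close/visible)
low, high = two_prop_z_interval(270, 478, 270, 805, conf=0.95)
print(f"({low:.4f}, {high:.4f})")
```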


Using the Calculator – Significance Test:

$H_0: p_1 - p_2 = 0$ (or $p_1 = p_2$), where $p_1, p_2$ are the population proportions being tested

On calculator: STAT, TESTS, 6:2-PropZTest…

Need to enter: $x_1, n_1, x_2, n_2$, and then $H_a$

Enter appropriate info, then Calculate or Draw

Output: Test stat, p-value
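A sketch of the two-proportion z test. Under $H_0: p_1 = p_2$ the samples are pooled, $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$, and $SE = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$. This matches the pooled $\hat{p}$ that 2-PropZTest reports. The function name and the `alternative` labels are illustrative choices.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, p-value) for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf
    if alternative == "greater":      # Ha: p1 > p2
        p = 1 - cdf(z)
    elif alternative == "less":       # Ha: p1 < p2
        p = cdf(z)
    else:                             # Ha: p1 != p2
        p = 2 * (1 - cdf(abs(z)))
    return z, p


# Ex 6: reminders (group 1) vs. no reminders (group 2), Ha: p1 > p2
print(two_prop_z_test(332, 600, 248, 500, alternative="greater"))
```

For a left-tailed alternative such as Ex 7, pass `alternative="less"`.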


Ex 5. Create a 95% C.I. for the difference in proportions of eggs hatched.

Nesting boxes apart/hidden: $n_1 = 478$, $x_1 = 270$ (hatched)

Nesting boxes close/visible: $n_2 = 805$, $x_2 = 270$ (hatched)

Margin of error:

z-Interval:


Ex 6. Split 1100 potential voters into two groups: those who get a reminder to register and those who do not.

Of the 600 who got reminders, 332 registered. Of the 500 who got no reminders, 248 registered.

Is there evidence at the 1% significance level that the proportion of potential voters who registered was greater in the group that received reminders?

Group 1 (reminders): $n_1 = 600$, $x_1 = 332$

Group 2 (no reminders): $n_2 = 500$, $x_2 = 248$


Ex 6. (continued)

$H_0$:

$H_a$:

Test stat $z$:

p-value:


Ex 7. “Can people be trusted?” Among 250 people aged 18-25, 45 said “yes”. Among 280 people aged 35-45, 72 said “yes”.

Does this indicate that the proportion of trusting people is higher in the older population? Use a significance level of α = .05.

Group 1 (ages 18-25): $n_1 = 250$, $x_1 = 45$

Group 2 (ages 35-45): $n_2 = 280$, $x_2 = 72$


Ex 7. (continued)

$H_0$:

$H_a$:

Test stat $z$:

p-value:


Scatterplots & Correlation

Each individual in the population/sample has two characteristics measured, instead of one.

Goal: to make accurate predictions of one variable in terms of another, based on a data set of paired values.

Variables

Explanatory (independent) variable, x, is used to predict a response.

Response (dependent) variable, y, is the outcome of a study or experiment.

Examples: height vs. weight, age vs. memory, temperature vs. sales

Scatterplots

A plot of paired values helps to determine whether a relationship exists.

Ex: variables – height (in), weight (lb)

Height  Weight
72      171
65      150
68      180
70      180
72      185
66      165

[Figure: scatterplot of weight (lb) vs. height (in)]

Scatterplots – Features

Direction: negative, positive

Form: line, parabola, wave (sine)

Strength: how closely the points follow a pattern

[Figure: height vs. weight scatterplot]

Direction:

Form:

Strength:

Scatterplots – Temp vs Oil used

[Figure: scatterplot of temperature vs. oil used]

Direction:

Form:

Strength:

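Correlation

Correlation, r, measures the strength of the linear relationship between two variables.

r > 0: positive direction
r < 0: negative direction

Close to +1: strong positive linear relationship
Close to -1: strong negative linear relationship
Close to 0: weak or no linear relationship

[Figure: four example scatterplots with r = .85, -.02, .13, -.79]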
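A sketch of how $r$ is computed from paired data, $r = \frac{1}{n-1}\sum \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$, applied to the height/weight data from the scatterplot slide. The function name is hypothetical.

```python
from statistics import mean, stdev


def correlation_r(xs, ys):
    """Average product of standardized values (dividing by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


heights = [72, 65, 68, 70, 72, 66]        # inches
weights = [171, 150, 180, 180, 185, 165]  # pounds
print(round(correlation_r(heights, weights), 3))
```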


Lines – Review

y = a + bx

a: y-intercept

b: slope

[Figure: graph of an example line]

Regression

Looking at a scatterplot, if the form seems linear, then use a linear model (regression line) to describe how a response variable y changes as an explanatory variable x changes.

Regression models are often used to predict the value of the response variable for a given value of the explanatory variable.

Least-Squares Regression Line

The line that best fits the data:

$\hat{y} = a + bx$

where:

$b = r\dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$
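A sketch of the least-squares formulas above, $b = r\,s_y/s_x$ and $a = \bar{y} - b\bar{x}$, computed from raw paired data. As a check, the age/weight data from the extrapolation slide should reproduce the spreadsheet fit shown there (slope ≈ 1.6126, intercept ≈ 148.49). `least_squares_line` is a hypothetical name.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


ages = [20, 25, 32, 36, 36, 40, 47]
weights = [180, 190, 190, 200, 225, 215, 220]
a, b = least_squares_line(ages, weights)
print(f"y_hat = {a:.2f} + {b:.4f}x")
```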


Example: Fat and calories for 11 fast food chicken sandwiches

Fat: $\bar{x} = 20.6$, $s_x = 9.8$

Calories: $\bar{y} = 472.7$, $s_y = 144.2$

$r = .947$

[Figure: scatterplot of calories vs. fat]

Example – continued: $\hat{y} = 185.65 + 13.93x$

What is the slope and what does it mean?

What is the intercept and what does it mean?

How many calories would you predict a sandwich with 40 grams of fat has?
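The chicken-sandwich line can be checked from the summary statistics alone. This short sketch recomputes $b$ and $a$, evaluates the line at 40 g of fat, and squares $r$. Small differences from 185.65 and 13.93 come only from rounding.

```python
# Summary statistics for the 11 chicken sandwiches (fat in grams, calories)
x_bar, s_x = 20.6, 9.8      # fat
y_bar, s_y = 472.7, 144.2   # calories
r = 0.947

b = r * s_y / s_x           # slope: predicted change in calories per extra gram of fat
a = y_bar - b * x_bar       # intercept: predicted calories at 0 g of fat
prediction_40 = a + b * 40  # predicted calories for a sandwich with 40 g of fat
r_squared = r ** 2          # fraction of variation in calories explained by the line

print(f"y_hat = {a:.2f} + {b:.2f}x; y_hat(40) = {prediction_40:.1f}; r^2 = {r_squared:.3f}")
```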


Why “Least-squares”?

The least-squares line is the line that minimizes the sum of the squared residuals.

Residual: difference between the actual and predicted values, $y - \hat{y}$

[Figure: scatterplot with the least-squares line]

x    y    ŷ    y - ŷ
1    10   14   -4
3    25   24   1
…    …    …    …
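A sketch that makes “least squares” concrete. It computes the residuals $y - \hat{y}$ for the least-squares line, then compares that line's sum of squared residuals with the sums for a few nearby lines. The height/weight data from the scatterplot slide are reused, and `fit`, `residuals` and `sse` are hypothetical helper names.

```python
from statistics import mean


def fit(xs, ys):
    """Least-squares intercept a and slope b (b = Sxy / Sxx, equivalent to r*s_y/s_x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y_hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y_hat = a + b*x."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


heights = [72, 65, 68, 70, 72, 66]
weights = [171, 150, 180, 180, 185, 165]
a, b = fit(heights, weights)
print("least squares:", round(sse(heights, weights, a, b), 2))
for da, db in [(2, 0), (0, 0.1), (-2, -0.1)]:   # nudge the intercept and/or slope
    print("nudged line:  ", round(sse(heights, weights, a + da, b + db), 2))
```

Every nudged line has a larger sum of squared residuals, and the least-squares residuals add up to (numerically) zero.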


Scatterplots – Residuals

To double-check that a linear regression model is appropriate, plot the residuals against the explanatory variable.

No unusual pattern in the residual plot suggests a good linear relationship.

Other things to look for

Squared correlation, r², gives the percent of variation in y explained by the regression line.

Chicken data: r = .947

Other things to look for

Influential observations:

Prediction vs. Causation: x and y are linked (associated) somehow, but we don’t say “x causes y to occur”. Other forces may be causing the relationship (lurking variables).

Extrapolation: using the regression line to make a prediction outside the range of values of the explanatory variable.

age  weight
20   180
25   190
32   190
36   200
36   225
40   215
47   220

[Figure: scatterplot of weight vs. age with fitted line f(x) = 1.6126x + 148.4887, R² = 0.7157]
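A sketch of guarding against extrapolation. Predictions come from the fitted age/weight line ($\hat{y} \approx 148.49 + 1.613x$, rounded from the chart), and any age outside the observed range of 20 to 47 is flagged. The range check is an illustrative convention, not something the calculator does.

```python
A, B = 148.49, 1.613        # rounded intercept and slope from the age/weight fit
AGE_MIN, AGE_MAX = 20, 47   # smallest and largest ages in the data


def predict_weight(age):
    """Return (predicted weight, True if the prediction is an extrapolation)."""
    y_hat = A + B * age
    extrapolated = not (AGE_MIN <= age <= AGE_MAX)
    return y_hat, extrapolated


for age in (30, 80):
    y_hat, extrapolated = predict_weight(age)
    note = "extrapolation, treat with caution" if extrapolated else "within data range"
    print(f"age {age}: predicted weight {y_hat:.1f} ({note})")
```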


On calculator

Set up: 2nd 0 (CATALOG), x⁻¹ (D), scroll down to “DiagnosticOn”, Enter, Enter

Scatterplots: 2nd Y= (STAT PLOT), 1, On, select Type, and list locations for the x values and y values. Then ZOOM, 9 (ZoomStat)

Regression: STAT, CALC, 8: LinReg(a+bx), Enter, list location for x, list location for y, Enter

Graph: Y=, enter the line into Y1

Examples:

Animal:              Cat  Chick  Dog  Duck  Goat  Lion  Bird  Pig  Bunny  Squirrel
x: Incubation, days  63   22     63   28    151   108   18    115  31     44
y: Lifespan, years   11   7.5    11   10    12    10    8     10   7      9

x: age, years             2   5   2   5   4   5   1   1   4   2   6   1
y: resale, thousands $    16  11  17  10  12  11  20  19  10  16  11  20


Contingency Tables

Making comparisons between two categorical variables

• Contingency tables summarize all outcomes
– Row variable: one row for each possible value
– Column variable: one column for each possible value
– Each cell (i, j) gives the number of individuals with those values for the respective variables.

Age\Income ($1,000s)  <15  15-30  >30  Total
<21                   5    3      1    9
21-25                 4    9      6    19
>25                   2    2      8    12
Total                 11   14     15   40

• Info from the table
– # who are over 25 and make under $15,000:
– % who are over 25 and make under $15,000:
– % who are over 25:
– % of those over 25 who make under $15,000:

Marginal Distributions
– Look to the margins of the table for each individual variable’s distribution
– Marginal distribution for age:

Age     Freq.  Rel. Freq.
<21     9
21-25   19
>25     12
Total   40

– Marginal distribution for income:

Income      <15  15-30  >30  Total
Freq.       11   14     15   40
Rel. Freq.
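A sketch of the marginal relative frequencies in Python: each row or column total is divided by the table total of 40. The dictionary layout is an illustrative choice.

```python
# Age (rows) by income (columns) counts from the contingency table
counts = {
    "<21":   [5, 3, 1],
    "21-25": [4, 9, 6],
    ">25":   [2, 2, 8],
}
incomes = ["<15", "15-30", ">30"]
total = sum(sum(row) for row in counts.values())  # 40

# Marginal distribution of age: row totals divided by the table total
age_marginal = {age: sum(row) / total for age, row in counts.items()}

# Marginal distribution of income: column totals divided by the table total
income_marginal = {
    inc: sum(row[j] for row in counts.values()) / total
    for j, inc in enumerate(incomes)
}
print(age_marginal)
print(income_marginal)
```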


Conditional Distributions
– Look at one variable’s distribution given another
– How does income vary over the different age groups?
– Consider each age group as a separate population and compute relative frequencies:

Age\Income  <15  15-30  >30  Total
<21
21-25
>25
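A sketch of the conditional distributions of income given age. Each age group is treated as its own population, so every cell is divided by its row total.

```python
# Age (rows) by income (columns) counts from the contingency table
counts = {
    "<21":   [5, 3, 1],
    "21-25": [4, 9, 6],
    ">25":   [2, 2, 8],
}
incomes = ["<15", "15-30", ">30"]

# Conditional distribution of income given age: divide each cell by its row total
conditional = {
    age: [cell / sum(row) for cell in row]
    for age, row in counts.items()
}
for age, dist in conditional.items():
    print(age, dict(zip(incomes, (f"{p:.1%}" for p in dist))))
```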


Independence Revisited

Two variables are independent if knowledge of one does not affect the chances of the other.

In terms of contingency tables, this means that the conditional distribution of one variable is (almost) the same for all values of the other variable.

In the age/income example, the conditional distributions are not even close, so these variables are not independent: there is some association between age and income.

Test for Independence

Is there an association between two variables?
– H0: The variables are independent. (The two variables are not associated.)
– Ha: The variables are not independent. (The two variables are associated.)

Assuming independence:
– Expected number in each cell (i, j): (% of value i for variable 1) × (% of value j for variable 2) × (sample size) = (row i total)(column j total) / (table total)

Example of Computing Expected Values

Observed:
Rh\Blood  A    B    AB   O    Total
+         176  28   22   198  424
-         30   12   4    30   76
Total     206  40   26   228  500

Expected number in cell (A, +):

Expected:
Rh\Blood  A    B    AB      O        Total
+                   22.048  193.344  424
-                   3.952   34.656   76
Total     206  40   26      228      500
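A sketch of the expected-count rule, (row total)(column total)/(table total), applied to the blood-type table. The nested-dictionary layout is an illustrative choice.

```python
observed = {
    "+": {"A": 176, "B": 28, "AB": 22, "O": 198},
    "-": {"A": 30, "B": 12, "AB": 4, "O": 30},
}
blood_types = ["A", "B", "AB", "O"]

row_totals = {rh: sum(row.values()) for rh, row in observed.items()}
col_totals = {bt: sum(observed[rh][bt] for rh in observed) for bt in blood_types}
grand_total = sum(row_totals.values())

# Expected count under independence: (row total)(column total) / (grand total)
expected = {
    rh: {bt: row_totals[rh] * col_totals[bt] / grand_total for bt in blood_types}
    for rh in observed
}
for rh, row in expected.items():
    print(rh, {bt: round(e, 3) for bt, e in row.items()})
```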


Chi-square statistic

To measure the difference between the observed table and the expected table, we use the chi-square test statistic:

$\chi^2 = \sum \dfrac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$

where the summation runs over every cell in the table.

1. Skewed right
2. df = (r - 1)(c - 1), where r and c are the numbers of rows and columns
3. Right-tailed test
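A sketch of the full chi-square test for independence in Python. It computes the expected counts, sums the statistic over every cell, sets $df = (r-1)(c-1)$, and finds the right-tail p-value. The standard library has no chi-square distribution, so `chi2_sf` is an assumed helper built on the series expansion of the regularized lower incomplete gamma function. It is adequate for moderate statistics like these but is not a production-grade routine. The blood-type/Rh table serves as sample input.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square distribution (series for the lower gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1 - lower)


def chi_square_test(table):
    """Return (chi-square statistic, df, p-value) for a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


blood = [[176, 28, 22, 198],   # Rh+ : A, B, AB, O
         [30, 12, 4, 30]]      # Rh- : A, B, AB, O
print(chi_square_test(blood))
```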


Test for Independence – Steps

State the variables being tested.

State hypotheses: H0, the null hypothesis: the variables are independent; Ha, the alternative: the variables are not independent.

Compute test statistic: if the null hypothesis is true, where does the sample fall? Test stat = χ²-score

Compute p-value: what is the probability of seeing a test stat as extreme as (or more extreme than) that?

Conclusion: small p-values provide strong evidence against H0.

Significance Test – on the calculator

On calculator: STAT, TESTS, C:χ²-Test… Observed: [A], Expected: [B]

Enter observed info into matrix A, then perform the test with Calculate or Draw.

Output: Test stat, p-value, df

To enter observed info into matrix A: 2nd, x⁻¹ (MATRIX), EDIT, 1: [A], change the dimensions, and enter the info in each cell.

Ex. Test whether blood type and Rh factor are independent at a 5% significance level.

$H_0$:

$H_a$:

Test stat $\chi^2$:

p-value:

Conclusion:

Ex. Test whether age and stance on marijuana legalization are associated.

stance\age  18-29  30-49  50+  Total
for         172    313    258  743
against     52     103    119  274
Total       224    416    377  1017

$H_0$:

$H_a$:

Test stat $\chi^2$:

p-value:

Conclusion:

Additional Examples

personality\college  Health  Science  Lib Arts  Educator
extrovert            68      56       62        47
introvert            94      81       45        66

Job grade\marital status  Single  Married  Divorced
1                         58      874      15
2                         222     3450     60
3                         74      1204     93

City size\practice status  Government  Judicial  Private  Salaried
<250,000                   30          44        258      36
250-500,000                79          102       651      90
>500,000                   22          34        127      23
