Correlations and T-tests Matching level of measurement to statistical procedures


Matching level of measurement to statistical procedures

We can match statistical methods to the level of measurement of the two variables that we want to assess:

Level of Measurement

            Nominal         Ordinal      Interval                  Ratio
Nominal     Chi-square      Chi-square   T-test, ANOVA             T-test, ANOVA
Ordinal     Chi-square      Chi-square   ANOVA                     ANOVA
Interval    T-test, ANOVA   ANOVA        Correlation, Regression   Correlation, Regression
Ratio       T-test, ANOVA   ANOVA        Correlation, Regression   Correlation, Regression
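The table works like a lookup: find the two levels of measurement and read off the procedure. A minimal sketch of that lookup in Python follows; the dictionary and the function name `suggest_tests` are illustrative inventions, not part of SPSS.

```python
# Lookup table mirroring the level-of-measurement table above.
# The table is symmetric, so each pair is stored only once.
LEVELS = ("nominal", "ordinal", "interval", "ratio")

TESTS = {
    ("nominal", "nominal"): ["Chi-square"],
    ("nominal", "ordinal"): ["Chi-square"],
    ("nominal", "interval"): ["T-test", "ANOVA"],
    ("nominal", "ratio"): ["T-test", "ANOVA"],
    ("ordinal", "ordinal"): ["Chi-square"],
    ("ordinal", "interval"): ["ANOVA"],
    ("ordinal", "ratio"): ["ANOVA"],
    ("interval", "interval"): ["Correlation", "Regression"],
    ("interval", "ratio"): ["Correlation", "Regression"],
    ("ratio", "ratio"): ["Correlation", "Regression"],
}


def suggest_tests(level_a: str, level_b: str) -> list[str]:
    """Return the procedures the table lists for two variables' levels."""
    a, b = level_a.lower(), level_b.lower()
    if a not in LEVELS or b not in LEVELS:
        raise ValueError(f"levels must be one of {LEVELS}")
    # Put the pair in table order so ("ratio", "nominal") finds ("nominal", "ratio").
    key = tuple(sorted((a, b), key=LEVELS.index))
    return TESTS[key]


print(suggest_tests("Ratio", "Nominal"))  # ['T-test', 'ANOVA']
```

The lookup only encodes the table; it does not check the assumptions (normality, random sampling) that the next slide lists.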

However, we should only use these tests when:

We have a normal distribution for the interval or ratio level variable.

The dependent variable (for correlation, t-test, ANOVA, and regression) is interval or ratio.

Our sample has been randomly selected, or our data cover the whole population.

Interpreting a Correlation from an SPSS Printout

Correlations

                                                  Educational Level (years)   Beginning Salary
Educational Level (years)   Pearson Correlation   1                           .633**
                            Sig. (2-tailed)       .                           .000
                            N                     474                         474
Beginning Salary            Pearson Correlation   .633**                      1
                            Sig. (2-tailed)       .000                        .
                            N                     474                         474

**. Correlation is significant at the 0.01 level (2-tailed).

A correlation:

Describes an association between two interval or ratio variables.

Can be positive or negative.

Measures the strength of the association between the two variables; its significance test tells us whether the association is large enough to be statistically significant.

Ranges from -1.00 to +1.00, where 0.00 means no (linear) association.
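For reference, the Pearson correlation coefficient r can be computed directly from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes plain Python lists as input; `pearson_r` is a hypothetical helper name, not an SPSS function.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means (co-variation).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Perfectly linear data give the two extremes of the range.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Because the numerator can be negative but never larger in size than the denominator, r always stays between -1.00 and +1.00.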

Example: Types of Relationships

              Education (yrs)
Income ($)    Positive   Negative   No Relationship
20,000        10         18         14
30,000        12         16         18
40,000        14         14         10
50,000        16         12         12
75,000        18         10         16

The stronger the correlation, the closer it will be to 1.00 or -1.00. Weak correlations will be close to 0.00 (either positive or negative).

You can see the degree of correlation (association) in a scatterplot.

[Scatterplot: Current Salary (0 to 140,000) and Educational Level (years, 6 to 22)]

A scatterplot of current and beginning salary from the same data set shows a stronger correlation:

[Scatterplot: Current Salary (0 to 140,000) and Beginning Salary (0 to 100,000)]

If we run the correlation between these two variables in SPSS, we find:

Correlations

                                         Beginning Salary   Current Salary
Beginning Salary   Pearson Correlation   1                  .880**
                   Sig. (2-tailed)       .                  .000
                   N                     474                474
Current Salary     Pearson Correlation   .880**             1
                   Sig. (2-tailed)       .000               .
                   N                     474                474

**. Correlation is significant at the 0.01 level (2-tailed).

For these two variables, suppose we test a hypothesis at the .01 significance level:

Alternative Hypothesis: There is a positive association between beginning and current salary.

Null Hypothesis: There is no association between beginning and current salary.

Decision: r (correlation) = .88, p = .000 (that is, p < .001). Because p is less than .01, we reject the null hypothesis and accept the alternative hypothesis!
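Behind the "Sig." value, SPSS tests r with a t statistic, t = r√(n − 2) / √(1 − r²), on n − 2 degrees of freedom. The sketch below applies it to r = .88 and N = 474 from the printout. Because the Python standard library has no t distribution, it approximates the two-tailed p-value with the standard normal distribution; that is an assumption that is only reasonable for large samples like this one (df = 472).

```python
import math


def r_to_t(r: float, n: int) -> float:
    """t statistic for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def approx_two_tailed_p(t: float) -> float:
    """Two-tailed p-value from the standard normal (large-sample approximation).

    An exact p-value would use the t distribution from a statistics package.
    """
    return math.erfc(abs(t) / math.sqrt(2))


alpha = 0.01
t = r_to_t(0.88, 474)  # values from the SPSS printout above
p = approx_two_tailed_p(t)
print(f"t = {t:.1f}, p < .001: {p < 0.001}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

Here t is about 40, so p is far below .01, which is why SPSS rounds it to .000.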

(Bonus Question): Why would we expect the previous correlation to be statistically significant below the p = .01 level?

Answer: This is a large data set (N = 474), which makes it likely that if there is a correlation, it will be statistically significant at a low significance (p) level.

Larger data sets are less likely to be affected by sampling or random error!
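The role of sample size can be seen in the same t formula for a correlation, t = r√(n − 2) / √(1 − r²): for a fixed r, t grows with √(n − 2). The sketch below uses r = .20, a value chosen only for illustration, at N = 20 and at N = 474.

```python
import math


def r_to_t(r: float, n: int) -> float:
    """t statistic for testing H0: no correlation (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# The same modest correlation at two sample sizes (illustrative r = .20).
for n in (20, 474):
    print(f"N = {n:3d}: t = {r_to_t(0.20, n):.2f}")
```

This gives t of about 0.87 at N = 20 and about 4.43 at N = 474. From a standard t table, the two-tailed .01 critical values are roughly 2.88 (df = 18) and 2.59 (df = 472), so the same r is significant only in the larger sample.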

Other important information on correlation

Correlation does not tell us whether one variable “causes” the other, so there really isn’t an independent or dependent variable.

Correlation measures a linear association: you should be able to draw a straight “best fit” line through the points in the scatterplot. Points that fall off the “best fit” line indicate that the correlation is less than perfect (not exactly -1.00 or +1.00).

Regression is the statistical method that allows us to determine whether the value of one interval/ratio level variable can be used to predict or determine the value of another.
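As a sketch of what simple (one-predictor) regression computes, the ordinary least-squares line has slope Sxy / Sxx and passes through the two means. The example below borrows the "Positive" columns from the Types of Relationships example (education predicting income); `fit_line` is a hypothetical helper name.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


# "Positive" example: education (yrs) predicting income ($).
education = [10, 12, 14, 16, 18]
income = [20_000, 30_000, 40_000, 50_000, 75_000]
slope, intercept = fit_line(education, income)
print(f"income = {intercept:,.0f} + {slope:,.0f} * years")  # income = -48,000 + 6,500 * years
print(f"predicted income at 15 years: {intercept + slope * 15:,.0f}")  # 49,500
```

The negative intercept has no real meaning (nobody has 0 years of education in these data); predictions are only sensible within the range of the observed values.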

Another measure of association is the t-test. T-tests measure the association between a nominal level variable and an interval or ratio level variable.

It looks at whether the nominal level variable “causes” a change in the interval/ratio variable.

Therefore the nominal level variable is always the independent variable and the interval/ratio variable is always the dependent variable.

Example of a t-test: Self-Esteem Scores

        Men      Women
        32       34
        44       18
        56       52
        18       16
        21       33
        39       26
        25       35
        28       20
Mean    32.875   29.25
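A difference of about 3.6 points between the means does not tell us by itself whether the difference is significant. The sketch below runs an independent samples t-test (equal variances assumed, pooled variance) on the self-esteem scores above; `pooled_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 (sample) formula

men = [32, 44, 56, 18, 21, 39, 25, 28]
women = [34, 18, 52, 16, 33, 26, 35, 20]


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent samples t statistic assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean(a) - mean(b)) / se, df


t, df = pooled_t(men, women)
print(f"means: {mean(men)} vs {mean(women)}")  # means: 32.875 vs 29.25
print(f"t = {t:.2f}, df = {df}")
```

The result is t ≈ 0.59 with 14 degrees of freedom. A standard t table gives a two-tailed .05 critical value of about 2.145 for df = 14, so with only eight people per group this difference is not statistically significant.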

Important things to know about an independent samples t-test

It can only be used when the nominal variable has only two categories.

Most often the nominal variable pertains to membership in a specific demographic group or a sample.

The association examined by the independent samples t-test is whether the mean of the interval/ratio variable differs significantly between the two groups. If it does, that means that group membership “causes” the change or difference in the mean score.

Looking at the difference in means between the two groups, can we tell if the difference is large enough to be statistically significant?

Group Statistics

Beginning Salary
Gender   N     Mean       Std. Deviation   Std. Error Mean
Male     258   $20301.4   *********        $567.275
Female   216   $13092.0   *********        $199.742

T-test results

Independent Samples Test (Beginning Salary)

Levene's Test for Equality of Variances: F = 105.969, Sig. = .000

t-test for Equality of Means (CI = 95% Confidence Interval of the Difference)
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   CI Lower   CI Upper
Equal variances assumed       11.152   472       .000              $7,209.43         $646.447                $5939.16   $8479.70
Equal variances not assumed   11.987   318.818   .000              $7,209.43         $601.413                $6026.19   $8392.67
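Both rows of this printout can be reproduced, to rounding, from the Group Statistics table alone. The sketch below recovers each group's variance as SE² × N, then computes the pooled ("equal variances assumed") t and the Welch ("equal variances not assumed") t and degrees of freedom. The small discrepancies from SPSS come from the rounded means in the printout.

```python
import math

# Summary statistics from the Group Statistics printout (beginning salary).
n1, mean1, se1 = 258, 20301.4, 567.275  # male
n2, mean2, se2 = 216, 13092.0, 199.742  # female

diff = mean1 - mean2
v1, v2 = se1 ** 2, se2 ** 2  # squared standard errors, i.e. s^2 / n for each group

# "Equal variances not assumed" (Welch) row.
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# "Equal variances assumed" (pooled) row: recover each s^2 as SE^2 * n.
sp2 = ((n1 - 1) * v1 * n1 + (n2 - 1) * v2 * n2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

print(f"pooled: t = {t_pooled:.3f}, df = {n1 + n2 - 2}, SE = {se_pooled:.3f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.3f}")
```

Levene's test is significant here (Sig. = .000), which suggests the variances differ, so the "not assumed" row is usually the one to read; in this example both rows lead to the same conclusion.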

Positive and Negative t-tests

Your t-test will be positive when the lowest value category, (1, 2) or (0, 1), is entered into the grouping menu first and the mean of that first group is higher than the mean of the second group.

Your t-test will be negative when the lowest value category is entered into the grouping menu first and the mean of the second group is higher than the mean of the first group.
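SPSS computes the mean difference as Group 1 minus Group 2, so the sign of t depends only on which group is entered first. The sketch below shows this with invented scores (labeled hypothetical); swapping the groups flips the sign without changing the size of t.

```python
import math
from statistics import mean, variance


def welch_t(group1: list[float], group2: list[float]) -> float:
    """Welch t statistic for mean(group1) - mean(group2)."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


# Hypothetical scores, for illustration only.
group_a = [12, 15, 14, 16, 13]  # higher mean (14.0)
group_b = [10, 11, 9, 12, 10]   # lower mean (10.4)

print(welch_t(group_a, group_b) > 0)  # True: the first group has the higher mean
print(welch_t(group_b, group_a) < 0)  # True: same data, groups entered in reverse order
```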

Paired Samples T-Test

Used when respondents have taken both a pre-test and a post-test using the same measurement tool (usually a standardized test).

It supplements the result obtained by subtracting the respondents' mean post-test score from their mean pre-test score. If the scores change from the pre-test to the post-test, this may indicate that the intervention is effective.

A statistically significant paired samples t-test usually means that the change from pre-test to post-test score is large enough that it cannot simply be due to random or sampling error.

An important caveat: the change from pre-test to post-test score must also be in the direction (positive or negative) specified in the hypothesis.

Paired samples t-test (continued)

For example, suppose our hypothesis states that participation in the welfare reform experiment is associated with a positive change in welfare recipients' wages from work. If participation in the experiment actually decreased wages, then our hypothesis would not be confirmed. We would accept the null hypothesis and reject the alternative hypothesis.

Pre-test wages: mean = $400 per month per participant.

Post-test wages: mean = $350 per month per participant.

However, we need to know the t-test value to know whether the difference in means is large enough to be statistically significant.
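A paired samples t-test works on each person's difference score: t is the mean difference divided by its standard error, with n − 1 degrees of freedom. The sketch below invents wages for five participants (hypothetical values chosen only so the group means match the $400 and $350 above); real data would give a different t.

```python
import math
from statistics import mean, stdev

# Hypothetical monthly wages for five participants (illustration only).
pre = [420, 380, 410, 390, 400]   # mean $400
post = [360, 340, 370, 330, 350]  # mean $350

diffs = [b - a for a, b in zip(pre, post)]  # post - pre for each person
n = len(diffs)
mean_diff = mean(diffs)
se_diff = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se_diff                # df = n - 1

print(f"mean change = {mean_diff}, t = {t:.2f}, df = {n - 1}")  # mean change = -50, t = -11.18, df = 4
```

The negative t shows that wages fell. Because the hypothesis predicted an increase, the result goes in the wrong direction, so the hypothesis is not confirmed however small p is.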

What are the alternative and null hypotheses for this study?

Let’s test a hypothesis for an independent t-test

We want to know whether women have higher scores than men on a test of exam-related anxiety.

The researcher has set the significance level for this study at .05.

On the SPSS printout, t = 2.6, p = .03.

What are the alternative and null hypotheses?

Can we accept or reject the null hypothesis?

Answer

Alternative hypothesis: Women have higher levels of exam-related anxiety than men, as measured by a standardized test.

Null hypothesis: There will be no difference between men and women on the standardized test of exam-related anxiety.

Reject the null hypothesis (p = .03 is less than the significance level of .05) and accept the alternative hypothesis. There is a relationship.
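The decision rule used throughout these notes has two parts: p must be below the chosen significance level, and the result must go in the direction the hypothesis predicts. A minimal sketch follows; `decide` and its parameters are hypothetical names, and the example assumes women were entered as the first group, so a positive t means women scored higher.

```python
def decide(p: float, alpha: float, t: float, predicted_sign: int) -> str:
    """Decision rule for a directional hypothesis.

    predicted_sign is +1 if the hypothesis predicts that the first group's mean
    is higher, or -1 if it predicts that it is lower.
    """
    in_predicted_direction = t * predicted_sign > 0
    if p < alpha and in_predicted_direction:
        return "Reject the null hypothesis; accept the alternative hypothesis"
    return "Accept (fail to reject) the null hypothesis"


# Exam-anxiety example: t = 2.6, p = .03, alpha = .05 (women assumed to be group 1).
print(decide(p=0.03, alpha=0.05, t=2.6, predicted_sign=+1))
```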

Computing a Correlation

Select Analyze.
Select Correlate > Bivariate.
Select two or more variables and click add.
Click OK.

Computing an independent t-test

Select Analyze.
Select Compare Means > Independent-Samples T Test.
Select the Test Variable (the dependent variable; must be interval or ratio).
Select the Grouping Variable (must be nominal, with only two categories).
Define the numerical category for each group (usually Group 1 = 1, Group 2 = 2).
Click OK.

Computing a paired sample t-test

Select Analyze.
Select Compare Means > Paired-Samples T Test.
Highlight two interval/ratio variables (these should be the pre-test and post-test scores).
Click on the arrow.
Click OK.

Data from Paired Sample T-test

Paired Samples Statistics

Pair 1             Mean       N     Std. Deviation   Std. Error Mean
Current Salary     $34419.6   474   *********        $784.311
Beginning Salary   $17016.1   474   *********        $361.510

More data from paired samples t-test

Paired Samples Test

Pair 1: Current Salary - Beginning Salary

Paired Differences
  Mean: $17403.5
  Std. Deviation: *********
  Std. Error Mean: $496.732
  95% Confidence Interval of the Difference: $16427.4 (lower) to $18379.6 (upper)

t = 35.036    df = 473    Sig. (2-tailed) = .000
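The t value and the confidence interval in this printout follow directly from the mean difference and its standard error: t = mean / SE, and the interval is mean ± t_crit × SE. The sketch below reproduces them. The critical value 1.965 (two-tailed, 95%, df = 473) is taken from a t table and hard-coded, an assumption made because the Python standard library has no t distribution.

```python
# Paired Differences row from the printout (Current Salary - Beginning Salary).
mean_diff = 17403.5
se_diff = 496.732
n = 474
df = n - 1

t = mean_diff / se_diff

# Two-tailed critical t for 95% confidence with df = 473, from a t table.
t_crit = 1.965
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff

print(f"t = {t:.3f}, df = {df}")                   # t = 35.036, df = 473
print(f"95% CI: {lower:,.1f} to {upper:,.1f}")   # 95% CI: 16,427.4 to 18,379.6
```

Because the whole interval lies far above zero, the salary increase cannot plausibly be due to random or sampling error.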

Analysis of Variance (ANOVA)

Is used when you want to compare means for three or more groups.

Requires a normal distribution (a random sample or a population).

Can be used to determine causation.

Contains an independent variable that is nominal and a dependent variable that is interval/ratio.
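A one-way ANOVA compares the variation between group means with the variation of scores within groups; the F ratio is the between-groups mean square divided by the within-groups mean square. The sketch below uses three small hypothetical groups (illustration only); `one_way_anova_f` is a hypothetical helper name.

```python
from statistics import mean


def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """F ratio and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    # Between groups: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: spread of scores around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three groups (illustration only).
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

A large F means the group means differ by much more than the scores vary within each group; SPSS then converts F and its two degrees of freedom into a Sig. value.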
