naoko
Chi Square & Correlation
Nonparametric Test of Chi2
Used when too many assumptions are violated in t-tests:
  Sample size is too small to reflect the population
  Data are not continuous and thus not appropriate for parametric tests based on normal distributions
χ2 is another way of showing that some pattern in the data is not created randomly by chance.
X2 can be one- or two-dimensional.
X2 deals with the question of whether what we observed is different from what is expected.
Calculating X2
What would a contingency table look like if no relationship exists between gender and voting for Bush? (i.e., statistical independence)

                   Male   Female   Total
Voted for Bush      25      25       50
Voted for Kerry     25      25       50
Total               50      50      100

NOTE: INDEPENDENT VARIABLES ON COLUMNS AND DEPENDENT ON ROWS
Calculating X2
What would a contingency table look like if a perfect relationship exists between gender and voting for Bush?

                   Male   Female
Voted for Bush      50       0
Voted for Kerry      0      50
Calculating the expected value
\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}

\hat{f}_{ij} = The expected frequency of the cell in the ith row and jth column
f_i = The total in the ith row marginal
f_j = The total in the jth column marginal
N = The grand total, or sample size for the entire table

Expected Voted for Bush = 50 x 50 / 100 = 25
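The expected-value rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name expected_counts is an assumption, and the observed table is the perfect-relationship table from the earlier slide (rows: Bush, Kerry; columns: Male, Female).

```python
def expected_counts(observed):
    """Return the expected frequency of every cell under independence.

    Each expected cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Perfect-relationship table from the slides.
observed = [[50, 0],
            [0, 50]]
expected = expected_counts(observed)
print(expected)  # every cell is 50 * 50 / 100
```

Because every marginal in this table is 50, every expected cell comes out the same, which matches the independence table shown earlier.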
Nonparametric Test of Chi2
Again, the basic question is: was what you are observing in the data created by chance or through some systematic process?
\chi^2 = \sum \frac{(O - E)^2}{E}

O = Observed frequency
E = Expected frequency
Nonparametric Test of Chi2
The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other (Ho: B = K). Our research hypothesis is that they are not equal (Ha: B ≠ K).
Given the sample size, how many cases could we expect in each category (n / number of categories)? Comparing the obtained value with the critical value gives a coefficient and a probability (Pr.) that the results are random.
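For the one-dimensional case, where the expected count in each category is n divided by the number of categories, a rough sketch might look like the following. The function name chi_square_one_way is an assumption, and the 60/40 split is a hypothetical example that does not appear in the slides; the 50/50 split uses the row totals from the contingency tables.

```python
def chi_square_one_way(counts):
    """One-dimensional chi-square: every category is expected to hold n / k cases."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


# Bush = 50, Kerry = 50: the row totals used in the slides.
equal_split = chi_square_one_way([50, 50])

# Hypothetical split (not from the slides): Bush = 60, Kerry = 40.
hypothetical_split = chi_square_one_way([60, 40])

print(equal_split, hypothetical_split)
```

An equal split gives no departure from the null hypothesis B = K, while the hypothetical uneven split produces a positive statistic.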
Let's do a X2:
(50 - 25)^2 / 25 = 25
(0 - 25)^2 / 25 = 25
(0 - 25)^2 / 25 = 25
(50 - 25)^2 / 25 = 25
X2 = 100

                   Male   Female
Voted for Bush      50       0
Voted for Kerry      0      50
What would X2 be when there is statistical independence?
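One way to check both cases by hand is a short script that applies the sum of (O - E)^2 / E to each table. This is a sketch under the assumption that expected counts come from the row and column marginals, as defined earlier; the function name chi_square is illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a two-dimensional contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency of cell (i, j)
            total += (o - e) ** 2 / e
    return total


perfect = chi_square([[50, 0], [0, 50]])        # perfect relationship
independent = chi_square([[25, 25], [25, 25]])  # statistical independence
print(perfect, independent)
```

When observed and expected counts coincide in every cell, each term of the sum is zero, so the statistic is zero.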
Let’s corroborate with SPSS
Chi-Square Tests
                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             .000(b)   1    1.000
Continuity Correction(a)       .000      1    1.000
Likelihood Ratio               .000      1    1.000
Fisher's Exact Test                                         1.000        .579
Linear-by-Linear Association   .000      1    1.000
N of Valid Cases               100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.
Chi-Square Tests
                               Value        df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             100.000(b)   1    .000
Continuity Correction(a)       96.040       1    .000
Likelihood Ratio               138.629      1    .000
Fisher's Exact Test                                            .000         .000
Linear-by-Linear Association   99.000       1    .000
N of Valid Cases               100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.
Testing for significance
How do we know if the relationship is statistically significant?
We need to know the df: df = (R-1)(C-1) = (2-1)(2-1) = 1
We go to the X2 distribution to look for the critical value (CV = 3.84 at the .05 level)
If the obtained X2 is larger than the CV, we conclude that the relationship between gender and voting is statistically significant.

                   Male   Female
Voted for Bush      20      30
Voted for Kerry     30      20

X2 = 4
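The decision for this table can be sketched as follows. The critical value 3.84 and the observed counts come from the slides; the helper names are assumptions. The p-value uses the fact that a chi-square variable with 1 df is a squared standard normal, so its upper tail equals erfc(sqrt(x / 2)); this shortcut holds only for df = 1.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


CRITICAL_VALUE = 3.84  # df = 1, from the slides

observed = [[20, 30],
            [30, 20]]
expected = 25  # every marginal is 50, so each cell expects 50 * 50 / 100

chi2 = sum((o - expected) ** 2 / expected for row in observed for o in row)
df = degrees_of_freedom(len(observed), len(observed[0]))
p = p_value_df1(chi2)
significant = chi2 > CRITICAL_VALUE

print(chi2, df, round(p, 4), significant)
```

The obtained value of 4 sits just above the critical value, so the null hypothesis of independence is rejected, though not by a wide margin.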
When is X2 appropriate to use?
X2 is perhaps the most widely used statistical technique to analyze nominal and ordinal data
Nominal X nominal (gender and voting preferences)
Nominal and ordinal (gender and opinion for W)
X2 can also be used with larger tables
Opinion of Bush

               MALE          FEMALE        Total
Favorable      40 (19.4)      5 (15.8)       45
Indifferent    10 (.88)      20 (.72)        30
Unfavorable    15 (8.6)      55 (6.9)        70
Total          65            80             145

Values in parentheses are each cell's (O - E)^2 / E.

X2 = 52.3. Do we reject the null hypothesis?
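A sketch of the same calculation for the 3 x 2 table is given below. The counts are taken from the slide; the critical value 5.99 (df = 2, .05 level) is the standard table value and is not listed in the slides. The p-value line relies on the closed form exp(-x / 2), which is valid only for df = 2. Computing without intermediate rounding gives a total slightly above the slide's 52.3.

```python
import math

# Rows: Favorable, Indifferent, Unfavorable; columns: Male, Female.
observed = [[40, 5],
            [10, 20],
            [15, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Per-cell contribution (O - E)^2 / E, with E from the marginals.
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

chi2 = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_VALUE_DF2 = 5.99      # assumption: standard .05 table value for df = 2
p = math.exp(-chi2 / 2)        # exact upper tail for df = 2 only
reject_null = chi2 > CRITICAL_VALUE_DF2

print(round(chi2, 1), df, reject_null)
```

Since the obtained value is far above the critical value, the null hypothesis that gender and opinion are independent is rejected.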
Correlation (does not mean causation)
We want to know how two variables are related to each other
  Does eating doughnuts affect weight?
  Does spending more hours studying increase test scores?
Correlation means how much two variables overlap with each other
Types of Correlations
X (cause)                 Y (effect)        Correlation   Values
Increases                 Increases         Positive      0 to 1
Decreases                 Decreases         Positive      0 to 1
Increases                 Decreases         Negative      -1 to 0
Decreases                 Increases         Negative      -1 to 0
Increases or decreases    Does not change   Independent   0
Conceptualizing Correlation
Measuring Development
[Figure: GDP and POP WEIGHT, weak correlation; GDP and EDUCATION, strong correlation]
Correlation will be associated with what type of validity?
Correlation Coefficient
r_{xy} = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}
Home Value & Square footage

        Log value   Log sqft   value^2    sqft^2     Val * sqft
        5.13        4.02       26.3169    16.1604    20.6226
        5.2         4.54       27.04      20.6116    23.608
        4.53        3.53       20.5209    12.4609    15.9909
        4.79        3.8        22.9441    14.44      18.202
        4.78        3.86       22.8484    14.8996    18.4508
        4.72        4.17       22.2784    17.3889    19.6824
Sum     29.15       23.92      141.95     95.96      116.56
Correlation Coefficient
r_{xy} = \frac{(6 \times 116.56) - (29.15)(23.92)}{\sqrt{[(141.95 \times 6) - (29.15)^2][(95.96 \times 6) - (23.92)^2]}} = \frac{2.09}{2.66} = .78
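The computational formula can be checked against the six observations in the home value table with a short script. This is a sketch only: the function name pearson_r and the variable names are assumptions, with X taken as log value and Y as log square footage, in the same order as the worked formula. Working from the unrounded sums gives a result that agrees with the slide's .78 to two decimals.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation via the computational (raw-sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# The six observations from the "Home Value & Square footage" table.
log_value = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]
log_sqft = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]

r = pearson_r(log_value, log_sqft)
print(round(r, 3))
```

Swapping the two lists leaves the result unchanged, since the formula is symmetric in X and Y.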
Correlations
                                VALUE    SQFT
VALUE   Pearson Correlation     1        .778
        Sig. (2-tailed)         .        .068
        N                       6        6
SQFT    Pearson Correlation     .778     1
        Sig. (2-tailed)         .068     .
        N                       6        6
Rules of Thumb

Size of correlation coefficient   General Interpretation
.8 - 1.0                          Very Strong
.6 - .8                           Strong
.4 - .6                           Moderate
.2 - .4                           Weak
.0 - .2                           Very Weak or no relationship
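The rules of thumb can be written as a small lookup. This is a sketch with two assumptions that the slides do not state: the label is applied to the absolute value of r, so negative correlations are graded by size, and a value that falls exactly on a boundary (for example .8) is assigned to the stronger category. The function name interpret_r is illustrative.

```python
def interpret_r(r):
    """Map a correlation coefficient to the rule-of-thumb label."""
    size = abs(r)  # assumption: sign is ignored when grading strength
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size >= 0.8:  # assumption: boundary values go to the stronger label
        return "Very Strong"
    if size >= 0.6:
        return "Strong"
    if size >= 0.4:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Very Weak or no relationship"


print(interpret_r(0.778))  # the home value and square footage correlation
```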
Multiple Correlation Coefficients
Correlations
                                VALUE     SQFT      BTH       BDR
VALUE   Pearson Correlation     1         .784**    .775**    .708**
        Sig. (2-tailed)         .         .000      .000      .000
        N                       46        46        46        46
SQFT    Pearson Correlation     .784**    1         .669**    .654**
        Sig. (2-tailed)         .000      .         .000      .000
        N                       46        46        46        46
BTH     Pearson Correlation     .775**    .669**    1         .895**
        Sig. (2-tailed)         .000      .000      .         .000
        N                       46        46        46        46
BDR     Pearson Correlation     .708**    .654**    .895**    1
        Sig. (2-tailed)         .000      .000      .000      .
        N                       46        46        46        46
**. Correlation is significant at the 0.01 level (2-tailed).
Limitation of correlation coefficients
They tell us how strongly two variables are related
However, r coefficients are limited because they cannot tell anything about:
1. Causation between X and Y
2. Marginal impact of X on Y
3. What percentage of the variation of Y is explained by X
4. Forecasting
Because of the above, Ordinary Least Squares (OLS) is most useful
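As a rough illustration of what an OLS fit adds for items 2 to 4 (it does not settle causation), the sketch below fits a straight line to the six home value observations from the earlier table, treating log square footage as X and log value as Y. The function name ols_fit and the forecast input of 4.0 are assumptions for illustration, not values from the slides.

```python
def ols_fit(xs, ys):
    """Simple (one-predictor) ordinary least squares fit.

    Returns the slope (marginal impact of X on Y), the intercept,
    and r-squared (share of the variation in Y explained by X).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


log_sqft = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]    # X
log_value = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]   # Y

slope, intercept, r_squared = ols_fit(log_sqft, log_value)

# Forecast for a hypothetical home with log sqft = 4.0 (not in the slides).
forecast = intercept + slope * 4.0

print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

With only six observations the fitted line is very uncertain, so the numbers serve to show the mechanics rather than to support conclusions about home prices.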
Do you have the BLUES?
B for Best (minimum error)
L for Linear (the form of the relationship)
U for Unbiased (does the parameter truly reflect the effect?)
E for Estimator
Home value and sq. feet
[Figure: scatterplot of VALUE (y-axis, 4.5 to 5.3) against SQFT (x-axis, 3.4 to 4.6) with a fitted line]
Does the above line meet the BLUE criteria?