Chi-Square, Dr Mahmoud Alhussami

Page 1: Chi square mahmoud

Chi-Square

Dr Mahmoud Alhussami

Page 2: Chi square mahmoud

Types of Statistical Tests

When running a t test or ANOVA, we compare:
• Mean differences between groups

We assume:
• random sampling
• the groups are homogeneous
• the distribution is normal
• samples are large enough to represent the population (>30)
• DV data are represented on an interval or ratio scale

These are parametric tests!

Page 3: Chi square mahmoud

Types of Tests

When the assumptions are violated:
• Subjects were not randomly sampled
• DV data are:
  • Ordinal (ranked)
  • Nominal (categorized: types of car, levels of education, learning styles, Likert scale)
• The scores are greatly skewed or we have no knowledge of the distribution

We use tests that are equivalent to the t test and ANOVA: non-parametric tests!

Page 4: Chi square mahmoud

Requirements for Chi-Square Test

• Must be a random sample from the population
• Data must be in raw frequencies
• Variables must be independent
• A sufficiently large sample size is required (at least 20)
• Actual count data (not percentages)
• Observations must be independent
• Does not prove causality

Page 5: Chi square mahmoud

Different Scales, Different Measures of Association

Scale of Both Variables      Measure of Association
Nominal scale                Pearson chi-square (χ²)
Ordinal scale                Spearman's rho
Interval or ratio scale      Pearson r

Page 6: Chi square mahmoud

Chi Square

• Used when data are nominal (both IV and DV)
• Compares frequencies of distributions occurring in different categories or groups
• Tests whether group distributions are different
  • Example: shoppers' preference for the taste of 3 brands of candy
• Determines the association between IV and DV by counting the frequencies of distribution
  • Example: gender relative to study preference (alone or in group)

Page 7: Chi square mahmoud

Important

The chi square test can only be used on data that have the following characteristics:
• The data must be in the form of frequencies.
• The frequency data must have a precise numerical value and must be organised into categories or groups.
• The total number of observations must be greater than 20.
• The expected frequency in any one cell of the table must be greater than 5.

Page 8: Chi square mahmoud

Formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• χ² = the value of chi square
• O = the observed value
• E = the expected value
• ∑ (O − E)² / E = for every category, square (O − E), divide by E, then add all the results together
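The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `chi_square` and the counts are hypothetical, chosen only to show the arithmetic.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 observations in 3 categories, 20 expected in each.
observed = [25, 15, 20]
expected = [20, 20, 20]
statistic = chi_square(observed, expected)
print(statistic)
```

Each category contributes its own squared deviation scaled by its expected count, so a deviation of 5 matters more in a small expected cell than in a large one.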

Page 9: Chi square mahmoud

What is it?

• Test of proportions
• Non-parametric test
• Dichotomous variables are used
• Tests the association between two factors, e.g. treatment and disease, gender and mortality

Page 10: Chi square mahmoud

Types of Chi-Square Analysis Techniques

• Test of independence: a chi-square technique used to determine whether two characteristics (such as food spoilage and refrigeration temperature) are related or independent.
• Goodness-of-fit test: a chi-square technique used to study similarities between proportions or frequencies across groupings (or classifications) of categorical data (comparing a distribution of data with another distribution of data where the expected frequencies are known).

Page 11: Chi square mahmoud

Chi Square Test of Independence

Purpose
• To determine whether two variables of interest are independent (not related) or related (dependent).
• When the variables are independent, we are saying that knowledge of one gives us no information about the other variable. When they are dependent, we are saying that knowledge of one variable is helpful in predicting the value of the other variable.
• The chi-square test of independence is a test of the influence or impact that a subject's value on one variable has on the same subject's value for a second variable.
• Some examples where one might use the chi-squared test of independence are:
  • Is level of education related to level of income?
  • Is the level of price related to the level of quality in production?

Hypotheses
• The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts.
  • H0: X and Y are independent
  • H1: X and Y are dependent

Page 12: Chi square mahmoud

Chi Square Test of Independence

Wording of research questions
• Are X and Y independent?
• Are X and Y related?
• The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.

Level of measurement
• Both X and Y are categorical

Page 13: Chi square mahmoud

Assumptions: Chi Square Test of Independence

• Each subject contributes data to only one cell
• Finite values
  • Observations must be grouped in categories. No assumption is made about level of data. Nominal, ordinal, or interval data may be used with chi-square tests.
• A sufficiently large sample size
  • In general N > 20.
  • There is no one accepted cutoff; the general rules are:
    • No cells with observed frequency = 0
    • No cells with expected frequency < 5
  • Applying chi-square to small samples exposes the researcher to an unacceptable rate of Type II errors.

Note: chi-square must be calculated on actual count data, not substituting percentages, which would have the effect of pretending the sample size is 100.

Page 14: Chi square mahmoud

Chi Square Test of Goodness of Fit

Purpose
• To determine whether an observed frequency distribution departs significantly from a hypothesized frequency distribution.
• This test is sometimes called a one-sample chi square test.

Hypotheses
• The null hypothesis is that the variable follows the hypothesized distribution. The data are consistent with it when the observed counts in the sample are similar to the expected counts.
  • H0: X follows the hypothesized distribution
  • H1: X deviates from the hypothesized distribution
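A goodness-of-fit test can be sketched as follows. The drink counts are hypothetical (they are not in the slides), the null hypothesis assumed here is that all three drinks are bought equally often, and the degrees of freedom are taken as the number of categories minus one. The critical value 5.991 is the tabled value the slides quote for α = .05 and df = 2.

```python
# Hypothetical purchase counts for three drinks (illustration only).
observed = {"Coke": 40, "Gatorade": 30, "Coffee": 20}

n = sum(observed.values())
k = len(observed)
expected_each = n / k  # equal expected frequencies under the assumed null

chi2 = sum((o - expected_each) ** 2 / expected_each for o in observed.values())
df = k - 1

CRITICAL_05_DF2 = 5.991  # tabled value for alpha = .05, df = 2
reject_null = chi2 > CRITICAL_05_DF2
print(round(chi2, 3), df, reject_null)
```

With unequal hypothesized proportions, each category would get its own expected count (proportion times n) instead of `expected_each`.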

Page 15: Chi square mahmoud

Chi Square Test of Goodness of Fit

Sample research questions
• Do students buy more Coke, Gatorade or Coffee?
• Does my sample contain a disproportionate number of Hispanics as compared to the population of the county from which they were sampled?
• Has the ethnic composition of the city of Amman changed since 1990?

Level of measurement
• X is categorical

Page 16: Chi square mahmoud

Assumptions: Chi Square Test of Goodness of Fit

• The research question involves the comparison of the observed frequency of one categorical variable within a sample to the expected frequency of that variable.
• The observed and theoretical distributions must contain the same divisions (i.e. 'levels' or 'classes').
• The expected frequency in each division must be >5.
• There must be a sufficient sample (in general N > 20).

Page 17: Chi square mahmoud

Steps in Test of Hypothesis

1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a tabled/critical value

Page 18: Chi square mahmoud

1. Determine Appropriate Test

• Chi square is used when both variables are measured on a nominal scale.
• It can be applied to interval or ratio data that have been categorized into a small number of groups.
• It assumes that the observations are randomly sampled from the population.
• All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
• It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.

Page 19: Chi square mahmoud

2. Establish Level of Significance

• α is a predetermined value
• The convention:
  • α = .05
  • α = .01
  • α = .001

Page 20: Chi square mahmoud

3. Determine the Hypothesis: Whether There Is an Association or Not

• Ho: The two variables are independent
• Ha: The two variables are associated

Page 21: Chi square mahmoud

4. Calculating Test Statistics

• Contrasts observed frequencies in each cell of a contingency table with expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated).
• The expected frequency of two unrelated events is the product of the row and column frequencies divided by the number of cases:

$$F_e = \frac{F_r \, F_c}{N}$$

Expected frequency = (row total × column total) / grand total
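The expected-frequency rule can be sketched in Python. The helper name `expected_counts` is hypothetical; the counts are the party by gun-control-opinion table used in the worked example in these slides.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Democrat, Republican. Columns: Favor, Neutral, Oppose.
observed = [[10, 10, 30],
            [15, 15, 10]]
expected = expected_counts(observed)
for row in expected:
    print([round(cell, 1) for cell in row])
```

The expected counts always reproduce the same row and column totals as the observed table; only the distribution within the table changes.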

Page 22: Chi square mahmoud

4. Calculating Test Statistics

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

Page 23: Chi square mahmoud

4. Calculating Test Statistics

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

where F_o = observed frequencies and F_e = expected frequency (in both the numerator and the denominator).

Page 24: Chi square mahmoud

5. Determine Degrees of Freedom

$$df = (R - 1)(C - 1)$$

where R = number of levels in the row variable and C = number of levels in the column variable.

Page 25: Chi square mahmoud

6. Compare computed test statistic against a tabled/critical value

• The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable.
• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
• If the calculated χ² is greater than the tabled χ² value, reject Ho.

Page 26: Chi square mahmoud

Decision and Interpretation

• If the probability of the test statistic is less than or equal to the probability of the alpha error rate, we reject the null hypothesis and conclude that our data support the research hypothesis. We conclude that there is a relationship between the variables.
• If the probability of the test statistic is greater than the probability of the alpha error rate, we fail to reject the null hypothesis. We conclude that there is not enough evidence of a relationship between the variables, i.e. the data are consistent with independence.

Page 27: Chi square mahmoud

Example

• Suppose a researcher is interested in voting preferences on gun control issues.
• A questionnaire was developed and sent to a random sample of 90 voters.
• The researcher also collects information about the political party membership of the sample of 90 respondents.

Page 28: Chi square mahmoud

Bivariate Frequency Table or Contingency Table

              Favor   Neutral   Oppose   f row
Democrat        10       10        30       50
Republican      15       15        10       40
f column        25       25        40   n = 90

Page 29: Chi square mahmoud

Bivariate Frequency Table or Contingency Table

Observed frequencies: the six cell counts (Democrat 10, 10, 30; Republican 15, 15, 10).

Page 30: Chi square mahmoud

Bivariate Frequency Table or Contingency Table

Row frequency: the f row totals (Democrat 50, Republican 40).

Page 31: Chi square mahmoud

Bivariate Frequency Table or Contingency Table

Column frequency: the f column totals (Favor 25, Neutral 25, Oppose 40; n = 90).

Page 32: Chi square mahmoud

1. Determine Appropriate Test

1. Party membership (2 levels), nominal
2. Voting preference (3 levels), nominal

Page 33: Chi square mahmoud

2. Establish Level of Significance

Alpha of .05

Page 34: Chi square mahmoud

3. Determine the Hypothesis

• Ho: There is no difference between Democrats and Republicans in their opinion on the gun control issue.
• Ha: There is an association between responses to the gun control survey and party membership in the population.

Page 35: Chi square mahmoud

4. Calculating Test Statistics

              Favor        Neutral      Oppose       f row
Democrat      fo = 10      fo = 10      fo = 30        50
              fe = 13.9    fe = 13.9    fe = 22.2
Republican    fo = 15      fo = 15      fo = 10        40
              fe = 11.1    fe = 11.1    fe = 17.8
f column        25           25           40       n = 90

Page 36: Chi square mahmoud

4. Calculating Test Statistics

Expected frequency for the Democrat / Favor cell: fe = 50 × 25 / 90 = 13.9

Page 37: Chi square mahmoud

4. Calculating Test Statistics

Expected frequency for the Republican / Favor cell: fe = 40 × 25 / 90 = 11.1

Page 38: Chi square mahmoud

4. Calculating Test Statistics

$$\chi^2 = \frac{(10-13.89)^2}{13.89} + \frac{(10-13.89)^2}{13.89} + \frac{(30-22.2)^2}{22.2} + \frac{(15-11.11)^2}{11.11} + \frac{(15-11.11)^2}{11.11} + \frac{(10-17.8)^2}{17.8} = 11.03$$
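The hand calculation for the gun control example can be checked with a short Python sketch. The function name `pearson_chi_square` is hypothetical; the counts, the degrees-of-freedom rule, and the critical value 5.991 are taken from the slides.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: Democrat, Republican. Columns: Favor, Neutral, Oppose.
gun_control = [[10, 10, 30],
               [15, 15, 10]]
chi2, df = pearson_chi_square(gun_control)

CRITICAL_05_DF2 = 5.991  # tabled value for alpha = .05, df = 2
reject_null = chi2 > CRITICAL_05_DF2
print(round(chi2, 2), df, reject_null)
```

Working with unrounded expected counts gives about 11.025, which agrees with the 11.03 obtained by hand from rounded values.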

Page 39: Chi square mahmoud

5. Determine Degrees of Freedom

df = (R-1)(C-1) = (2-1)(3-1) = 2

Page 40: Chi square mahmoud

6. Compare computed test statistic against a tabled/critical value

• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Democrats & Republicans differ significantly in their opinions on gun control issues

Page 41: Chi square mahmoud

SPSS Output for Gun Control Example

Chi-Square Tests

                                 Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square               11.025(a)   2   .004
Likelihood Ratio                 11.365      2   .003
Linear-by-Linear Association      8.722      1   .003
N of Valid Cases                 90

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
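Two of the rows in this output can be reproduced with the standard library alone. The sketch below relies on two assumptions that are not stated in the slides: that the likelihood ratio statistic is G = 2 Σ O ln(O / E), and that for exactly 2 degrees of freedom the upper-tail probability of a chi-square value x is exp(−x / 2). The shortcut does not hold for other degrees of freedom.

```python
import math

# Rows: Democrat, Republican. Columns: Favor, Neutral, Oppose.
table = [[10, 10, 30],
         [15, 15, 10]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(table):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        pearson += (o - e) ** 2 / e
        likelihood_ratio += 2 * o * math.log(o / e)

# Upper-tail probability of a chi-square variable with df = 2 (closed form).
p_pearson = math.exp(-pearson / 2)
p_likelihood_ratio = math.exp(-likelihood_ratio / 2)
print(round(pearson, 3), round(p_pearson, 3))
print(round(likelihood_ratio, 3), round(p_likelihood_ratio, 3))
```

Both p-values fall below .05, in line with the decision reached from the critical value.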

Page 42: Chi square mahmoud

Additional Information in SPSS Output

• Exceptions that might distort χ²:
  • Assumptions
  • Associations in some but not all categories
  • Low expected frequency per cell
• Extent of association is not the same as statistical significance

Demonstrated through an example

Page 43: Chi square mahmoud

Another Example: Heparin Lock Placement

Complication Incidence * Heparin Lock Placement Time Group Crosstabulation

                                               Heparin Lock Placement Time Group
Complication Incidence                             1          2        Total
Had Compilca      Count                            9         11          20
                  Expected Count                10.0       10.0        20.0
                  % within Time Group          18.0%      22.0%       20.0%
Had NO Compilca   Count                           41         39          80
                  Expected Count                40.0       40.0        80.0
                  % within Time Group          82.0%      78.0%       80.0%
Total             Count                           50         50         100
                  Expected Count                50.0       50.0       100.0
                  % within Time Group         100.0%     100.0%      100.0%

Time: 1 = 72 hrs, 2 = 96 hrs

From Polit text: Table 8-1

Page 44: Chi square mahmoud

Hypotheses in Heparin Lock Placement

• Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent.)
• Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related.)

Page 45: Chi square mahmoud

More of SPSS Output

Page 46: Chi square mahmoud

Pearson Chi-Square

• Pearson chi-square = .250, p = .617
• Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
• Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
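The heparin lock result can be reproduced from the crosstabulation counts. This sketch computes the uncorrected Pearson statistic only. It assumes that for 1 degree of freedom the upper-tail probability of a chi-square value x equals erfc(√(x / 2)), which follows from the statistic being a squared standard normal variable.

```python
import math

# Rows: had complications, had no complications. Columns: 72 hrs, 96 hrs.
observed = [[9, 11],
            [41, 39]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

# Upper-tail probability of a chi-square variable with df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), round(p_value, 3))
```

Every cell deviates from its expected count by exactly 1, so the four contributions are 0.1, 0.1, 0.025 and 0.025.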

Page 47: Chi square mahmoud

More SPSS Output

Symmetric Measures

                                             Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Nominal by Nominal     Phi                   -.050                                         .617
                       Cramer's V             .050                                         .617
Interval by Interval   Pearson's R           -.050   .100                   -.496          .621(c)
Ordinal by Ordinal     Spearman Correlation  -.050   .100                   -.496          .621(c)
N of Valid Cases                              100

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

Page 48: Chi square mahmoud

Phi Coefficient

• Pearson chi-square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
• The phi coefficient is the measure of the strength of the association.

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

Symmetric Measures (values only): Phi = -.050, Cramer's V = .050, Pearson's R = -.050, Spearman Correlation = -.050, N of Valid Cases = 100.
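As a quick check of the phi formula, the sketch below plugs in the heparin lock values reported in the slides (χ² = .250, N = 100). The square root returns only the magnitude; the negative sign in the SPSS table reflects the direction of the association in the 2 by 2 table, which this formula does not capture.

```python
import math

chi2 = 0.250  # Pearson chi-square from the heparin lock example
n = 100       # number of valid cases

phi_magnitude = math.sqrt(chi2 / n)
print(round(phi_magnitude, 3))
```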

Page 49: Chi square mahmoud

Cramer's V

• When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer's V.
• If Cramer's V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

Symmetric Measures (values only): Phi = -.050, Cramer's V = .050, Pearson's R = -.050 (Asymp. Std. Error .100), Spearman Correlation = -.050 (Asymp. Std. Error .100), N of Valid Cases = 100.

Page 50: Chi square mahmoud

Cramer's V

$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$

where N = number of cases and k = smallest of number of rows or columns.
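Cramer's V can be illustrated on the 2 by 3 gun control table from the earlier example, since that table is larger than 2 by 2. The function name `cramers_v` is hypothetical, and the slides do not report a V for that table, so the result here is only an illustration of the formula.

```python
import math


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (N * (k - 1))), k = min(rows, columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Rows: Democrat, Republican. Columns: Favor, Neutral, Oppose.
gun_control = [[10, 10, 30],
               [15, 15, 10]]
v = cramers_v(gun_control)
print(round(v, 2))
```

For a 2 by 2 table k − 1 equals 1, so V reduces to the magnitude of phi.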

Page 51: Chi square mahmoud

Interpreting Cell Differences in a Chi-square Test - 1

A chi-square test of independence of the relationship between sex and marital status finds a statistically significant relationship between the variables.

Page 52: Chi square mahmoud

Interpreting Cell Differences in a Chi-square Test - 2

Researchers often try to identify which cell or cells are the major contributors to the significant chi-square test by examining the pattern of column percentages.

Based on the column percentages, we would identify cells on the married row and the widowed row as the ones producing the significant result because they show the largest differences: 8.2% on the married row (50.9%-42.7%) and 9.0% on the widowed row (13.1%-4.1%).

Page 53: Chi square mahmoud

Interpreting Cell Differences in a Chi-square Test - 3

Using a level of significance of 0.05, the critical value for a standardized residual would be -1.96 and +1.96. Using standardized residuals, we would find that only the cells on the widowed row are the significant contributors to the chi-square relationship between sex and marital status.

If we interpreted the contribution of the married marital status, we would be mistaken. Basing the interpretation on column percentages can be misleading.
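A standardized-residual check can be sketched as follows. The slides do not give the sex by marital status counts, so this illustration reuses the gun control table from the earlier example. It assumes the standardized residual of a cell is (O − E) / √E and uses the ±1.96 cutoff quoted above.

```python
import math

# Rows: Democrat, Republican. Columns: Favor, Neutral, Oppose.
table = [[10, 10, 30],
         [15, 15, 10]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

CUTOFF = 1.96  # critical value for a standardized residual at alpha = .05

residuals = []
for i, row in enumerate(table):
    residual_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        residual_row.append((o - e) / math.sqrt(e))
    residuals.append(residual_row)

# Cells whose standardized residual exceeds the cutoff in absolute value.
flagged = [(i, j) for i, row in enumerate(residuals)
           for j, r in enumerate(row) if abs(r) > CUTOFF]

for row in residuals:
    print([round(r, 2) for r in row])
print(flagged)
```

The squared residuals add up to the Pearson chi-square statistic, which is why large residuals point to the cells that drive a significant result.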

Page 54: Chi square mahmoud

Chi-Square Test of Independence: post hoc test in SPSS (1)

You can conduct a chi-square test of independence in the crosstabulation procedure of SPSS by selecting:

Analyze > Descriptive Statistics > Crosstabs…

Page 55: Chi square mahmoud

Chi-Square Test of Independence: post hoc test in SPSS (2)

First, select and move the variables for the question to the “Row(s):” and “Column(s):” list boxes.

The variable mentioned first in the problem, sex, is used as the independent variable and is moved to the “Column(s):” list box.

The variable mentioned second in the problem, [fund], is used as the dependent variable and is moved to the “Row(s):” list box.

Second, click on the “Statistics…” button to request the test statistic.

Page 56: Chi square mahmoud

Chi-Square Test of Independence: post hoc test in SPSS (3)

First, click on “Chi-square” to request the chi-square test of independence.

Second, click on the “Continue” button to close the Statistics dialog box.

Page 57: Chi square mahmoud

Chi-Square Test of Independence: post hoc test in SPSS (6)

In the table Chi-Square Tests result, SPSS also tells us that “0 cells have expected count less than 5 and the minimum expected count is 70.63”.

The sample size requirement for the chi-square test of independence is satisfied.

Page 58: Chi square mahmoud

Chi-Square Test of Independence: post hoc test in SPSS (7)

The probability of the chi-square test statistic (chi-square=2.821) was p=0.244, greater than the alpha level of significance of 0.05. The null hypothesis that differences in "degree of religious fundamentalism" are independent of differences in "sex" is not rejected.

The research hypothesis that differences in "degree of religious fundamentalism" are related to differences in "sex" is not supported by this analysis.

Thus, the answer for this question is False. We do not interpret cell differences unless the chi-square test statistic supports the research hypothesis.
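As a quick check of the reported probability, the sketch below recomputes p from the chi-square statistic of 2.821. It assumes 2 degrees of freedom, which the slides do not state (it would correspond, for example, to a 3 x 2 table). For 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x / 2), so no statistics library is needed.

```python
from math import exp


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square statistic with df = 2.

    Valid only for 2 degrees of freedom (an assumption here).
    """
    return exp(-x / 2.0)


chi_square = 2.821  # statistic reported on the slide
alpha = 0.05

p = chi2_upper_tail_df2(chi_square)
reject_null = p < alpha

print(f"p = {p:.3f}, reject null: {reject_null}")
```

Under that assumption the result rounds to 0.244, matching the SPSS output quoted above, and the null hypothesis is not rejected at alpha = 0.05.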

Page 59: Chi square mahmoud

SPSS Analysis: Chi Square Test of Goodness of Fit

Analyze > Nonparametric Tests > Chi Square

X variable goes here

If all expected frequencies are the same, click this box

If all expected frequencies are not the same, enter the expected value for each division here
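The two cases in this dialog (equal versus user-supplied expected frequencies) can be illustrated with a small Python sketch. The observed counts and the unequal expected counts are hypothetical, and `chi_square_gof` is an illustrative helper, not an SPSS function. The statistic is the usual sum of (observed - expected)^2 / expected, compared here with 5.991, the standard critical value for 2 degrees of freedom at alpha = 0.05.

```python
def chi_square_gof(observed, expected=None):
    """Chi-square goodness-of-fit statistic.

    If `expected` is omitted, all categories get the same expected
    frequency (total / number of categories).
    """
    total = sum(observed)
    if expected is None:
        expected = [total / len(observed)] * len(observed)
    if len(expected) != len(observed):
        raise ValueError("observed and expected must have the same length")
    if abs(sum(expected) - total) > 1e-9:
        raise ValueError("expected counts must sum to the observed total")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for a variable with three categories.
observed = [30, 50, 40]
CRITICAL_05_DF2 = 5.991  # chi-square critical value, df = 2, alpha = 0.05

# Case 1: all expected frequencies are the same (40 each).
stat_equal = chi_square_gof(observed)

# Case 2: expected frequencies differ (hypothetical values, sum = 120).
stat_unequal = chi_square_gof(observed, expected=[24, 60, 36])

for label, stat in [("equal", stat_equal), ("unequal", stat_unequal)]:
    print(label, round(stat, 3), "significant:", stat > CRITICAL_05_DF2)
```

The degrees of freedom are the number of categories minus one, so a variable with a different number of categories needs a different critical value.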

Page 60: Chi square mahmoud

Examples (using the Montana.sav data)

All expected frequencies are the same

All expected frequencies are not the same

Page 61: Chi square mahmoud

Types of Statistical Inference

Parameter estimation: It is used to estimate a population value, such as a mean, relative risk index or a mean difference between two groups.

Estimation can take two forms:

• Point estimation: involves calculating a single statistic to estimate the parameter. E.g. mean and median. Disadvantages: they offer no context for interpreting their accuracy and a point estimate gives no information regarding the probability that it is correct or close to the population value.

• Interval estimation: is to estimate a range of values that has a high probability of containing the population value.

Page 62: Chi square mahmoud

Interval Estimation

For example, it is more likely that the population mean height lies between 165 and 175 cm.

Interval estimation involves constructing a confidence interval (CI) around the point estimate.

The upper and lower limits of the CI are called confidence limits.

A CI around a sample mean communicates a range of values for the population value, and the probability of being right. That is, the estimate is made with a certain degree of confidence of capturing the parameter.

Page 63: Chi square mahmoud

Confidence Intervals around a Mean

95% CI = mean ± (1.96 x SEM)

This statement indicates that we can be 95% confident that the population mean lies between the confidence limits, and that these limits lie 1.96 times the standard error above and below the sample mean.

E.g. if the mean = 61 inches, and SEM = 1, what is the 95% CI?

Solution:

95% CI = 61 ± (1.96 x 1)

95% CI = 61 ± 1.96

95% CI = 59.04 ≤ μ ≤ 62.96

E.g. if the mean = 61 inches, and SEM = 1, what is the 99% CI?

Solution:

99% CI = 61 ± (2.58 x 1)

99% CI = 61 ± 2.58

99% CI = 58.42 ≤ μ ≤ 63.58
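The two worked examples can be reproduced with a short Python sketch. The function name `confidence_interval` is illustrative. The multipliers 1.96 and 2.58 are the ones used on the slide, and the calculation is simply the mean plus or minus the multiplier times the SEM.

```python
def confidence_interval(mean, sem, z):
    """Return (lower, upper) confidence limits: mean -/+ z * SEM."""
    margin = z * sem
    return mean - margin, mean + margin


mean_height = 61  # inches, from the slide
sem = 1           # standard error of the mean, from the slide

ci95 = confidence_interval(mean_height, sem, z=1.96)
ci99 = confidence_interval(mean_height, sem, z=2.58)

print("95%% CI: %.2f to %.2f" % ci95)
print("99%% CI: %.2f to %.2f" % ci99)
```

The 99% interval is wider than the 95% interval: more confidence of capturing the population mean comes at the cost of a less precise range.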

Page 64: Chi square mahmoud

Data Interpretation

Consideration:

1. Accuracy

   1. critical view of the data

   2. investigating evidence of the results

   3. consider other studies’ results

   4. peripheral data analysis

   5. conduct power analysis: type I & type II

          True      False
True      Correct   Type-II
False     Type-I    Correct

Page 65: Chi square mahmoud

Types of Errors

If You… | When the Null Hypothesis is… | Then You Have…

Reject the null hypothesis | True (there really is no difference) | Made a Type I Error

Reject the null hypothesis | False (there really is a difference) | ☻

Accept the null hypothesis | False (there really is a difference) | Made a Type II Error

Accept the null hypothesis | True (there really is no difference) | ☻

Page 66: Chi square mahmoud

alpha: the level of significance used for establishing type-I error

β: the probability of type-II error

1 – β: the probability of obtaining significant results (power)

Effect size: how much we can say that the intervention made a significant difference

Page 67: Chi square mahmoud

2. Meaning of the results

- translating the results to make them understandable

3. Importance:

- translation of the significant findings into practical findings

4. Generalizability:

- how can we make the findings useful for all the population

5. Implication:

- what have we learned related to what has been used during study

Page 68: Chi square mahmoud

Needed Parameters

Alpha--chance of a Type I error

Beta--chance of a Type II error

Power = 1 - beta

Effect size--difference between groups or amount of variance explained or how much relationship there is between the DV and the IVs

Page 69: Chi square mahmoud

Remember this in English?

Type I error is when you say there is a difference or relationship and there is not

Type II error is when you say there is no difference or relationship and there really is

Page 70: Chi square mahmoud

Which is more important?

Type I error more important if possibility of harm or lethal effect

Type II error more important in relatively unexplored areas of research

In some studies, Type I and Type II errors may be equally important

Page 71: Chi square mahmoud

How to Increase Power

1. Increase the n

2. Decrease the unexplained variance--control by design or statistics (e.g. ANCOVA)

3. Increase alpha (controversial)

4. Use a one tailed test (directional hypothesis)--puts the zone of rejection all in one tail; same effect as increasing alpha

5. Use parametric statistics as long as you meet the assumptions. If not, parametric statistics are LESS powerful

6. Decrease measurement error (decrease unexplained variance)--use more reliable instruments, standardize measurement protocol, frequent calibration of physiologic instruments, improve inter-rater reliability

Page 72: Chi square mahmoud

What is good power?

By tradition, “good” power is 80%

The correct answer is it depends on the nature of the phenomenon and which kind of error is most important in your study. This is a theoretical argument that you have to make.

Using convention (alpha = .05 and power = .80, beta = .20) you are saying that Type I error is _________ as serious as a Type II error

Page 73: Chi square mahmoud

Effect Size

How large an effect do I expect exists in the population if the null is false?

OR

How much of a difference do I want to be able to detect?

The larger the effect, the fewer the cases needed to see it. (The difference is so big you can trip on it.)

Page 74: Chi square mahmoud

The World According to Power (Kraemer & Thiemann)

The more stringent the significance level, the greater the necessary sample size. More subjects are needed for a 1% level than a 5% level

Two tailed tests require larger sample sizes than one tailed tests. Assessing two directions at the same time requires a greater investment.

The smaller the effect size, the larger the necessary sample size. Subtle effects require greater efforts.

The larger the power required, the larger the necessary sample size. Greater protection from failure requires greater effort.

The smaller the sample size, the smaller the power, i.e. the greater the chance of failure
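These relationships can be illustrated with a rough Python sketch. It is not Kraemer & Thiemann's method. It assumes a comparison of two group means with effect size d expressed as a standardized mean difference, and uses the common normal-approximation formula n per group = 2 x ((z_alpha + z_power) / d)^2. The function name `n_per_group` and the baseline values are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate subjects needed per group to detect effect size d.

    Normal approximation for comparing two means; d is assumed to be a
    standardized mean difference.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_power = z(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


baseline = n_per_group(d=0.5)                       # alpha .05, power .80
stricter_alpha = n_per_group(d=0.5, alpha=0.01)     # 1% instead of 5%
one_tailed = n_per_group(d=0.5, two_tailed=False)   # directional test
smaller_effect = n_per_group(d=0.2)                 # subtler effect
higher_power = n_per_group(d=0.5, power=0.99)       # more protection

print("baseline:", baseline)
print("alpha = .01:", stricter_alpha)
print("one-tailed:", one_tailed)
print("d = 0.2:", smaller_effect)
print("power = .99:", higher_power)
```

Under these assumptions, every change in the direction the slide calls more demanding (stricter alpha, two tails, smaller effect, higher power) raises the required sample size.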

Page 75: Chi square mahmoud

The World According to Power (Kraemer & Thiemann)

If you propose to go with a sample size of 20 or fewer, you have to be willing to have a high risk of failure or a huge effect size

To achieve 99% power for an effect size of .01, you need > 150,000 subjects

Page 76: Chi square mahmoud

Power for each test

You do a power analysis for each statistic you are going to use.

Choose the sample size based on the highest number of subjects from the power analysis.

Use the most conservative power analysis--guarantees you the most subjects