Parametric versus Nonparametric Statistics: When to Use Them and Which Is More Powerful?
Dr Mahmoud Alhussami
Parametric Assumptions
• The observations must be independent.
• The dependent variable should be continuous (interval or ratio level).
• The observations must be drawn from normally distributed populations.
• These populations must have the same variances: equal variance (homogeneity of variance).
• The groups should be randomly drawn from normally distributed and independent populations, e.g. Male vs. Female, Nurse vs. Physician, Manager vs. Staff, with NO OVERLAP.
Parametric Assumptions
• The independent variable is categorical with two or more levels.
• The distribution within each of the two or more groups is normal.
• Large variation makes a significant t or F test less likely, so we are more likely to fail to reject a null hypothesis that is actually false. That is a Type II error, and a threat to power.
• In the courtroom analogy, a Type II error is letting a guilty person go free; sending an innocent person to jail for no significant reason is a Type I error.
Advantages of Parametric Techniques
• They are more powerful and more flexible than nonparametric techniques.
• They not only allow the researcher to study the effect of many independent variables on the dependent variable, but they also make possible the study of their interaction.
Nonparametric Methods
• Most of the statistical methods referred to as parametric require the use of interval- or ratio-scaled data.
• Nonparametric methods are often the only way to analyze nominal or ordinal data and draw statistical conclusions.
• Nonparametric methods require no assumptions about the population probability distributions.
• Nonparametric methods are often called distribution-free methods.
• Nonparametric methods can be used with small samples.
Nonparametric Methods
• In general, for a statistical method to be classified as nonparametric, it must satisfy at least one of the following conditions:
  • The method can be used with nominal data.
  • The method can be used with ordinal data.
  • The method can be used with interval or ratio data when no assumption can be made about the population probability distribution.
Nonparametric Tests
• Do not make as many assumptions about the distribution of the data as the t test.
  • Do not require data to be Normal.
  • Good for data with outliers.
• Nonparametric tests are based on ranks of the data.
  • Work well for ordinal data (data that have a defined order, but for which averages may not make sense).
Nonparametric Methods
• For most parametric tests there is at least one equivalent nonparametric test.
• These tests fall into several categories:
  1. Tests of differences between groups (independent samples)
  2. Tests of differences between variables (dependent samples)
  3. Tests of relationships between variables
Advantages of Nonparametric Techniques
• Sometimes there is no parametric alternative to the use of nonparametric statistics.
• Certain nonparametric tests can be used to analyze nominal data.
• Certain nonparametric tests can be used to analyze ordinal data.
• The computations for nonparametric statistics are usually less complicated than those for parametric statistics, particularly for small samples.
• Probability statements obtained from most nonparametric tests are exact probabilities.
Advantages of Nonparametric Tests
• Can treat samples made up of observations from several different populations.
• Can treat data that are inherently in ranks, as well as data whose seemingly numerical scores have only the strength of ranks.
• Are available to treat data that are classificatory (nominal).
• Are easier to learn and apply than parametric tests.
Disadvantages of Nonparametric Statistics
• Nonparametric tests can be wasteful of data if parametric tests are available for use with the data.
• Nonparametric tests are usually not as widely available and well known as parametric tests.
• For large samples, the calculations for many nonparametric statistics can be tedious.
Criticisms of Nonparametric Procedures
• Losing precision / wasteful of data
• Low power
• False sense of security
• Lack of software
• Testing distributions only
• Higher-order interactions not dealt with
Parametric vs. Nonparametric Statistics
• Parametric statistics are statistical techniques based on assumptions about the population from which the sample data are collected.
  • They assume that the data being analyzed are randomly selected from a normally distributed population.
  • They require quantitative measurement that yields interval- or ratio-level data.
• Nonparametric statistics are based on fewer assumptions about the population and the parameters.
  • They are sometimes called "distribution-free" statistics.
  • A variety of nonparametric statistics are available for use with nominal or ordinal data.
Summary Table of Statistical Tests
Level of measurement           | 1 sample         | 2 samples, independent | 2 samples, dependent                | K samples (>2), independent | K samples (>2), dependent                 | Correlation
Categorical or nominal         | χ²               | χ²                     | McNemar's χ²                        | χ²                          | Cochran's Q                               | -
Rank or ordinal                | -                | Mann-Whitney U         | Wilcoxon matched-pairs signed ranks | Kruskal-Wallis H            | Friedman's ANOVA                          | Spearman's rho
Parametric (interval & ratio)  | z test or t test | t test between groups  | t test within groups                | 1-way ANOVA between groups  | 1-way ANOVA (within or repeated measures) | Pearson's r
Also listed for parametric (interval & ratio) data: factorial (2-way) ANOVA.
Chi-Square
Types of Statistical Tests
• When running a t test or ANOVA, we compare:
  • Mean differences between groups
• We assume:
  • random sampling
  • the groups are homogeneous
  • the distribution is normal
  • samples are large enough to represent the population (>30)
  • DV data are represented on an interval or ratio scale
• These are parametric tests!
Types of Tests
• When the assumptions are violated:
  • Subjects were not randomly sampled
  • DV data are:
    • Ordinal (ranked)
    • Nominal (categorized: types of car, levels of education, learning styles, Likert scale)
  • The scores are greatly skewed or we have no knowledge of the distribution
• We use tests that are equivalent to the t test and ANOVA:
• Nonparametric tests!
Requirements for Chi-Square Test
• Must be a random sample from the population.
• Data must be in raw frequencies.
• Variables must be independent.
• A sufficiently large sample size is required (at least 20).
• Actual count data (not percentages).
• Observations must be independent.
• Does not prove causality.
Different Scales, Different Measures of Association
Scale of both variables  | Measure of association
Nominal scale            | Pearson chi-square: χ²
Ordinal scale            | Spearman's rho
Interval or ratio scale  | Pearson r
Chi Square
• Used when data are nominal (both IV and DV).
  • Compares the frequencies of distributions occurring in different categories or groups.
  • Tests whether group distributions are different.
    • Example: shoppers' preference for the taste of 3 brands of candy.
  • Determines the association between IV and DV by counting the frequencies of distribution.
    • Example: gender relative to study preference (alone or in a group).
Important
• The chi square test can only be used on data that have the following characteristics:
  • The data must be in the form of frequencies.
  • The frequency data must have a precise numerical value and must be organised into categories or groups.
  • The total number of observations must be greater than 20.
  • The expected frequency in any one cell of the table must be greater than 5.
Formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
• χ² = the value of chi square
• O = the observed value
• E = the expected value
• Σ (O − E)²/E = each value of (O − E) squared and divided by its E, then all of these added together
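The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `chi_square_statistic` and the example counts are hypothetical and chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (not from the slides).
example_observed = [18, 22, 20]
example_expected = [20, 20, 20]

# (18-20)^2/20 + (22-20)^2/20 + (20-20)^2/20 = 0.2 + 0.2 + 0.0
example_chi2 = chi_square_statistic(example_observed, example_expected)
print(round(example_chi2, 4))
```

Each cell contributes its own squared deviation scaled by its expected count, so cells with small expected counts can dominate the total.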
What is it?
• Test of proportions
• Nonparametric test
• Dichotomous variables are used
• Tests the association between two factors, e.g. treatment and disease, gender and mortality
Types of Chi-Square Analysis Techniques
• The test of independence is a chi-square technique used to determine whether two characteristics (such as food spoilage and refrigeration temperature) are related or independent.
• The goodness-of-fit test is a chi-square technique used to study similarities between proportions or frequencies across groupings (or classifications) of categorical data. It compares a distribution of data with another distribution for which the expected frequencies are known.
Chi Square Test of Independence
• Purpose
  • To determine whether two variables of interest are independent (not related) or related (dependent).
  • When the variables are independent, we are saying that knowledge of one gives us no information about the other variable. When they are dependent, we are saying that knowledge of one variable is helpful in predicting the value of the other variable.
  • The chi-square test of independence is a test of the influence or impact that a subject's value on one variable has on the same subject's value for a second variable.
  • Some examples where one might use the chi-square test of independence are:
    • Is level of education related to level of income?
    • Is the level of price related to the level of quality in production?
• Hypotheses
  • The null hypothesis is that the two variables are independent. The data are consistent with it if the observed counts in the sample are similar to the expected counts.
    • H0: X and Y are independent
    • H1: X and Y are dependent
Chi Square Test of Independence
• Wording of research questions
  • Are X and Y independent?
  • Are X and Y related?
  • The research hypothesis states that the two variables are dependent or related. The data support it if the observed counts for the categories of the variables in the sample are different from the expected counts.
• Level of measurement
  • Both X and Y are categorical
Assumptions: Chi Square Test of Independence
• Each subject contributes data to only one cell.
• Finite values
  • Observations must be grouped in categories. No assumption is made about level of data. Nominal, ordinal, or interval data may be used with chi-square tests.
• A sufficiently large sample size
  • In general N > 20.
  • There is no one accepted cutoff. The general rules are:
    • No cells with observed frequency = 0
    • No cells with expected frequency < 5
    • Applying chi-square to small samples exposes the researcher to an unacceptable rate of Type II errors.
• Note: chi-square must be calculated on actual count data, not on substituted percentages, which would have the effect of pretending the sample size is 100.
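The sample-size rules of thumb listed above can be checked mechanically before running the test. The sketch below is only an illustration: the helper name `check_chi_square_rules` and the small 2 x 2 table are hypothetical, while the larger table reuses the gun control counts and rounded expected frequencies from the worked example in this deck. The thresholds (N > 20, no observed 0, no expected < 5) are the ones stated on the slide.

```python
def check_chi_square_rules(observed, expected, min_total=20, min_expected=5):
    """Return a list of rule-of-thumb violations; an empty list means none were found.

    observed and expected are lists of rows holding cell counts. The caller is
    responsible for passing actual counts rather than percentages; this helper
    cannot tell the difference.
    """
    problems = []
    flat_observed = [count for row in observed for count in row]
    flat_expected = [value for row in expected for value in row]
    if sum(flat_observed) <= min_total:
        problems.append("total N is not greater than %d" % min_total)
    if any(count == 0 for count in flat_observed):
        problems.append("at least one cell has an observed frequency of 0")
    if any(value < min_expected for value in flat_expected):
        problems.append("at least one cell has an expected frequency below %d" % min_expected)
    return problems


# Gun control example from this deck (observed counts, rounded expected counts).
gun_observed = [[10, 10, 30], [15, 15, 10]]
gun_expected = [[13.89, 13.89, 22.22], [11.11, 11.11, 17.78]]
gun_problems = check_chi_square_rules(gun_observed, gun_expected)

# Hypothetical small table (N = 12) that breaks all three rules.
small_observed = [[3, 0], [4, 5]]
small_expected = [[1.75, 1.25], [5.25, 3.75]]
small_problems = check_chi_square_rules(small_observed, small_expected)

print(gun_problems)
print(small_problems)
```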
Chi Square Test of Goodness of Fit
• Purpose
  • To determine whether an observed frequency distribution departs significantly from a hypothesized frequency distribution.
  • This test is sometimes called a one-sample chi square test.
• Hypotheses
  • The null hypothesis is that the variable follows the hypothesized distribution. The data are consistent with it if the observed counts in the sample are similar to the expected counts.
    • H0: X follows the hypothesized distribution
    • H1: X deviates from the hypothesized distribution
Chi Square Test of Goodness of Fit
• Sample research questions
  • Do students buy more Coke, Gatorade or coffee?
  • Does my sample contain a disproportionate number of Hispanics compared to the population of the county from which they were sampled?
  • Has the ethnic composition of the city of Amman changed since 1990?
• Level of measurement
  • X is categorical
Assumptions: Chi Square Test of Goodness of Fit
• The research question involves the comparison of the observed frequency of one categorical variable within a sample to the expected frequency of that variable.
• The observed and theoretical distributions must contain the same divisions (i.e. "levels" or "classes").
• The expected frequency in each division must be >5.
• There must be a sufficient sample (in general N > 20).
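A goodness-of-fit test can be sketched as follows. The slides give no data for the Coke/Gatorade/coffee question, so the purchase counts below are hypothetical, as is the assumption that the hypothesized distribution is equal preference. The degrees of freedom for a goodness-of-fit test with k categories are k − 1, and for 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−χ²/2), which keeps this sketch within the Python standard library.

```python
import math

# Hypothetical purchase counts (not from the slides).
drink_counts = {"Coke": 30, "Gatorade": 40, "Coffee": 20}

total = sum(drink_counts.values())
categories = len(drink_counts)

# Assumed null hypothesis: all three drinks are bought equally often.
expected_count = total / categories

gof_chi2 = sum((count - expected_count) ** 2 / expected_count
               for count in drink_counts.values())
gof_df = categories - 1

# Closed-form upper-tail probability, valid only because gof_df == 2.
gof_p = math.exp(-gof_chi2 / 2)

print(round(gof_chi2, 3), gof_df, round(gof_p, 4))
```

With these made-up counts every expected frequency is 30 and N is 90, so the rules of thumb on the slide (expected > 5, N > 20) are satisfied.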
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value
1. Determine Appropriate Test
• Chi square is used when both variables are measured on a nominal scale.
• It can be applied to interval or ratio data that have been categorized into a small number of groups.
• It assumes that the observations are randomly sampled from the population.
• All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
• It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
2. Establish Level of Significance
• α is a predetermined value
• The convention:
  • α = .05
  • α = .01
  • α = .001
3. Determine the Hypothesis: Whether There Is an Association or Not
• H0: The two variables are independent
• Ha: The two variables are associated
4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a contingency table with expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated).
• The expected frequency of two unrelated events is the product of the row and column frequencies divided by the number of cases:
$$F_e = \frac{F_r \, F_c}{N}$$
• Expected frequency = (row total × column total) / grand total
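The expected-frequency rule above can be sketched for a whole contingency table at once. The function name `expected_frequencies` and the 2 x 2 counts are hypothetical and serve only to illustrate row total × column total / grand total.

```python
def expected_frequencies(observed):
    """Return expected counts for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[row_total * column_total / grand_total
             for column_total in column_totals]
            for row_total in row_totals]


# Hypothetical 2 x 2 table (not from the slides).
toy_observed = [[10, 20],
                [30, 40]]
toy_expected = expected_frequencies(toy_observed)

# Row totals are 30 and 70, column totals are 40 and 60, grand total is 100.
print(toy_expected)
```

The expected counts always reproduce the observed row and column totals, which is a quick sanity check on hand calculations.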
4. Calculating Test Statistics
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
• F_o = observed frequencies
• F_e = expected frequency
5. Determine Degrees of Freedom
$$df = (R - 1)(C - 1)$$
• R = number of levels in the row variable
• C = number of levels in the column variable
6. Compare Computed Test Statistic Against a Tabled/Critical Value
• The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable.
• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
• If the calculated χ² is greater than the tabled χ² value, reject H0.
Decision and Interpretation
• If the probability of the test statistic is less than or equal to the probability of the alpha error rate, we reject the null hypothesis and conclude that our data support the research hypothesis. We conclude that there is a relationship between the variables.
• If the probability of the test statistic is greater than the probability of the alpha error rate, we fail to reject the null hypothesis. We conclude that the data do not provide sufficient evidence of a relationship between the variables, i.e. they are treated as independent.
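The decision rule above can be sketched in code. The Python standard library has no chi-square distribution, so this illustration uses closed-form upper-tail probabilities that hold only for df = 1 and for even df; other cases raise an error rather than guess. The names `chi_square_p_value` and `decide` are hypothetical.

```python
import math


def chi_square_p_value(chi2, df):
    """Upper-tail probability of a chi-square statistic (df = 1 or even df only)."""
    if chi2 < 0 or df < 1:
        raise ValueError("chi2 must be non-negative and df must be at least 1")
    if df == 1:
        # With 1 degree of freedom the statistic is a squared standard normal.
        return math.erfc(math.sqrt(chi2 / 2))
    if df % 2 == 0:
        # For even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = chi2 / 2
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df greater than 1 is not covered by this sketch")


def decide(chi2, df, alpha=0.05):
    """Apply the rule: reject H0 when p <= alpha, otherwise fail to reject."""
    p_value = chi_square_p_value(chi2, df)
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return decision, p_value


print(decide(11.03, 2))
print(decide(0.25, 1))
```

Comparing the p-value with α is equivalent to comparing the computed statistic with the tabled critical value for the same α and df.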
Example
• Suppose a researcher is interested in voting preferences on gun control issues.
• A questionnaire was developed and sent to a random sample of 90 voters.
• The researcher also collects information about the political party membership of the sample of 90 respondents.
Bivariate Frequency Table or Contingency Table
              Favor   Neutral   Oppose   f row
Democrat       10       10        30       50
Republican     15       15        10       40
f column       25       25        40     n = 90
• The cell entries are the observed frequencies.
• f row is the row frequency (row total).
• f column is the column frequency (column total).
1. Determine Appropriate Test
1. Party membership (2 levels) is nominal
2. Voting preference (3 levels) is nominal
2. Establish Level of Significance
• Alpha of .05
3. Determine the Hypothesis
• H0: There is no difference between Democrats and Republicans in their opinions on the gun control issue.
• Ha: There is an association between responses to the gun control survey and party membership in the population.
4. Calculating Test Statistics
              Favor        Neutral      Oppose       f row
Democrat      fo = 10      fo = 10      fo = 30        50
              fe = 13.9    fe = 13.9    fe = 22.2
Republican    fo = 15      fo = 15      fo = 10        40
              fe = 11.1    fe = 11.1    fe = 17.8
f column        25           25           40         n = 90
• Democrat/Favor: fe = 50 × 25 / 90 = 13.9
• Republican/Favor: fe = 40 × 25 / 90 = 11.1
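4. Calculating Test Statistics
$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} \approx 11.03$$
(The expected frequencies are shown rounded; the result 11.03 comes from the unrounded values.)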
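The calculation above can be reproduced directly from the observed contingency table. This is a sketch rather than the author's procedure; the variable names are hypothetical, and the counts are the gun control data from the slides.

```python
# Observed counts from the gun control example: rows are Democrat, Republican;
# columns are Favor, Neutral, Oppose.
vote_observed = [[10, 10, 30],
                 [15, 15, 10]]

vote_row_totals = [sum(row) for row in vote_observed]
vote_column_totals = [sum(column) for column in zip(*vote_observed)]
vote_n = sum(vote_row_totals)

# Expected frequency for each cell: row total * column total / grand total.
vote_expected = [[r * c / vote_n for c in vote_column_totals]
                 for r in vote_row_totals]

# Sum of (fo - fe)^2 / fe over all six cells, using unrounded expected values.
vote_chi2 = sum((fo - fe) ** 2 / fe
                for obs_row, exp_row in zip(vote_observed, vote_expected)
                for fo, fe in zip(obs_row, exp_row))

print([[round(fe, 2) for fe in row] for row in vote_expected])
print(round(vote_chi2, 3))
```

Using unrounded expected frequencies gives 11.025, which matches the Pearson chi-square value in the SPSS output for this example and rounds to the 11.03 shown on the slide.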
5. Determine Degrees of Freedom
$$df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2$$
6. Compare Computed Test Statistic Against a Tabled/Critical Value
• α = 0.05
• df = 2
• Critical tabled value = 5.991
• The test statistic, 11.03, exceeds the critical value
• The null hypothesis is rejected
• Democrats and Republicans differ significantly in their opinions on gun control issues
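The tabled value can be checked without a table in this particular case. For df = 2 the chi-square upper-tail probability is exp(−χ²/2), so the critical value for a given α is −2 ln(α). This shortcut applies only to df = 2; the variable names below are hypothetical.

```python
import math

alpha = 0.05
test_statistic = 11.025  # Pearson chi-square for the gun control example

# Valid for df = 2 only: P(X > x) = exp(-x / 2), so x_critical = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-test_statistic / 2)

reject_null = test_statistic > critical_value

print(round(critical_value, 3), round(p_value, 3), reject_null)
```

The critical value agrees with the tabled 5.991, and the p-value rounds to the .004 reported by SPSS for this example.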
SPSS Output for Gun Control Example
Chi-Square Tests
                                Value        df    Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025(a)    2     .004
Likelihood Ratio                11.365       2     .003
Linear-by-Linear Association     8.722       1     .003
N of Valid Cases                90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
Additional Information in SPSS Output
• Exceptions that might distort χ² assumptions:
  • Associations in some but not all categories
  • Low expected frequency per cell
• Extent of association is not the same as statistical significance (demonstrated through an example)
Another Example: Heparin Lock Placement
Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
                                                                     Heparin Lock Placement Time Group
Complication Incidence                                               1         2         Total
Had complications      Count                                         9         11        20
                       Expected Count                                10.0      10.0      20.0
                       % within Heparin Lock Placement Time Group    18.0%     22.0%     20.0%
Had NO complications   Count                                         41        39        80
                       Expected Count                                40.0      40.0      80.0
                       % within Heparin Lock Placement Time Group    82.0%     78.0%     80.0%
Total                  Count                                         50        50        100
                       Expected Count                                50.0      50.0      100.0
                       % within Heparin Lock Placement Time Group    100.0%    100.0%    100.0%
Time: 1 = 72 hrs, 2 = 96 hrs
From Polit text: Table 8-1
Hypotheses in Heparin Lock Placement
H0: There is no association between complication incidence and length of heparin lock placement. (The variables are independent).
Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related).
More of SPSS Output
Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
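The reported statistic and p-value can be recomputed from the crosstabulation counts (9, 11, 41, 39). The sketch below also shows the usual Yates form of the continuity correction, which subtracts 0.5 from each |O − E| before squaring. That this is the correction SPSS reports is an assumption here, as are the function names. For df = 1 the upper-tail probability equals erfc(√(x/2)), so only the standard library is needed.

```python
import math

# Counts from the heparin lock crosstabulation (columns: 72 hrs, 96 hrs).
observed = [[9, 11], [41, 39]]

def chi_square_2x2(table, continuity_correction=False):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e)
            if continuity_correction:
                # Yates correction: shrink each deviation by 0.5 (not below 0).
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / e
    return statistic

def p_value_df1(statistic):
    # Valid only for df = 1: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))

pearson = chi_square_2x2(observed)
corrected = chi_square_2x2(observed, continuity_correction=True)

print(round(pearson, 3), round(p_value_df1(pearson), 3))  # 0.25 0.617
print(round(corrected, 4))                                # 0.0625
```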
More SPSS Output
Symmetric Measures

| | | Value | Asymp. Std. Error (a) | Approx. T (b) | Approx. Sig. |
|---|---|---|---|---|---|
| Nominal by Nominal | Phi | -.050 | | | .617 |
| | Cramer's V | .050 | | | .617 |
| Interval by Interval | Pearson's R | -.050 | .100 | -.496 | .621 (c) |
| Ordinal by Ordinal | Spearman Correlation | -.050 | .100 | -.496 | .621 (c) |
| N of Valid Cases | | 100 | | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Phi Coefficient
Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
Phi coefficient is the measure of the strength of the association.
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
In the Symmetric Measures output for the heparin lock example, Phi = -.050 (N of Valid Cases = 100).
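As a rough check on the phi value, the sketch below computes it two ways for the heparin lock table: from the chi-square statistic (which gives only the magnitude) and from the 2 × 2 cell counts using the standard signed form (ad − bc)/√(r₁r₂c₁c₂), which recovers the negative sign shown in the output. The function names are illustrative.

```python
import math

def phi_from_chi_square(chi_square, n):
    """Magnitude of phi: sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)

def signed_phi(table):
    """Signed phi for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

phi_magnitude = phi_from_chi_square(0.250, 100)
phi_signed = signed_phi([[9, 11], [41, 39]])

print(round(phi_magnitude, 3), round(phi_signed, 3))  # 0.05 -0.05
```

A value this close to zero indicates a very weak association, consistent with the non-significant chi-square test.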
Cramer’s V
When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V.
If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.
$$V = \sqrt{\frac{\chi^2}{N(k-1)}}$$
In the Symmetric Measures output for the heparin lock example, Cramer’s V = .050.
In the formula for Cramer’s V, N is the number of cases and k is the smallest of the number of rows or columns.
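A minimal sketch of the Cramer’s V formula, applied to the two examples from this presentation. The function name is hypothetical, and the value for the gun control table is computed here for illustration rather than taken from the SPSS output.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """V = sqrt(chi-square / (N * (k - 1))), k = smaller of rows and columns."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))

# Gun control example: chi-square = 11.025, N = 90, 2 x 3 table.
v_gun_control = cramers_v(11.025, 90, 2, 3)

# Heparin lock example: chi-square = 0.250, N = 100, 2 x 2 table.
v_heparin = cramers_v(0.250, 100, 2, 2)

print(round(v_gun_control, 3), round(v_heparin, 3))  # 0.35 0.05
```

For a 2 by 2 table k − 1 = 1, so Cramer’s V reduces to the magnitude of phi, which is why both appear as .050 in the heparin lock output.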
Interpreting Cell Differences in a Chi-square Test - 1
A chi-square test of independence of the relationship between sex and marital status finds a statistically significant relationship between the variables.
Interpreting Cell Differences in a Chi-square Test - 2
Researchers often try to identify which cell or cells are the major contributors to the significant chi-square test by examining the pattern of column percentages.
Based on the column percentages, we would identify cells on the married row and the widowed row as the ones producing the significant result because they show the largest differences: 8.2% on the married row (50.9%-42.7%) and 9.0% on the widowed row (13.1%-4.1%).
Interpreting Cell Differences in a Chi-square Test - 3
Using a level of significance of 0.05, the critical values for a standardized residual are -1.96 and +1.96. Using standardized residuals, we would find that only the cells on the widowed row are significant contributors to the chi-square relationship between sex and marital status.
If we interpreted the contribution of the married marital status, we would be mistaken. Basing the interpretation on column percentages can be misleading.
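The standardized residual for a cell is (observed − expected)/√expected. The counts for the sex by marital status table are not given in the text, so the sketch below reuses the gun control counts from the earlier example (their 2 × 3 layout is an assumption) purely to show the mechanics; the function name is illustrative.

```python
import math

# Assumed layout of the gun control counts, reused only for illustration.
observed = [[10, 15, 15], [30, 10, 10]]

def standardized_residuals(table):
    """(observed - expected) / sqrt(expected) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            residual_row.append((o - e) / math.sqrt(e))
        residuals.append(residual_row)
    return residuals

residuals = standardized_residuals(observed)

# Flag cells whose residual falls outside the +/-1.96 critical values.
flagged = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) > 1.96
]

for row in residuals:
    print([round(r, 2) for r in row])
print(flagged)
```

The squared standardized residuals add up to the Pearson chi-square statistic, so each one shows how much its cell contributes. In this particular table no single cell passes ±1.96 even though the overall test is significant.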
Chi-Square Test of Independence: post hoc test in SPSS (1)
You can conduct a chi-square test of independence in SPSS crosstabulation by selecting:
Analyze > Descriptive Statistics > Crosstabs…
Chi-Square Test of Independence: post hoc test in SPSS (2)
First, select and move the variables for the question to “Row(s):” and “Column(s):” list boxes.
The variable mentioned first in the problem, sex, is used as the independent variable and is moved to the “Column(s):” list box.
The variable mentioned second in the problem, [fund], is used as the dependent variable and is moved to the “Row(s):” list box.
Second, click on “Statistics…” button to request the test statistic.
Chi-Square Test of Independence: post hoc test in SPSS (3)
First, click on “Chi-square” to request the chi-square test of independence.
Second, click on “Continue” button to close the Statistics dialog box.
Chi-Square Test of Independence: post hoc test in SPSS (6)
In the table Chi-Square Tests result, SPSS also tells us that “0 cells have expected count less than 5 and the minimum expected count is 70.63”.
The sample size requirement for the chi-square test of independence is satisfied.
Chi-Square Test of Independence: post hoc test in SPSS (7)
The probability of the chi-square test statistic (chi-square=2.821) was p=0.244, greater than the alpha level of significance of 0.05. The null hypothesis that differences in "degree of religious fundamentalism" are independent of differences in "sex" is not rejected.
The research hypothesis that differences in "degree of religious fundamentalism" are related to differences in "sex" is not supported by this analysis.
Thus, the answer for this question is False. We do not interpret cell differences unless the chi-square test statistic supports the research hypothesis.
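The decision rule on this slide can be sketched as a simple gate: cell differences are examined only when the overall test is significant. The degrees of freedom are not stated in the text. The code assumes df = 2 (for example, sex with 2 levels and fundamentalism with 3), for which the p-value has the closed form exp(-x/2), and that assumption reproduces the reported p = 0.244. The function name is hypothetical.

```python
import math

def may_interpret_cells(p_value, alpha=0.05):
    """Post hoc cell interpretation is warranted only if the test is significant."""
    return p_value < alpha

chi_square = 2.821
p_value = math.exp(-chi_square / 2)   # assumes df = 2

print(round(p_value, 3))              # 0.244
print(may_interpret_cells(p_value))   # False
```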
SPSS Analysis: Chi Square Test of Goodness of Fit
Analyze > Nonparametric Tests > Chi Square
X variable goes here
If all expected frequencies are the same, click this box
If all expected frequencies are not the same, enter the expected value for each division here
Examples (using the Montana.sav data)
All expected frequencies are the same
All expected frequencies are not the same
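The two goodness-of-fit cases can be sketched as one function that either spreads the total evenly across categories or scales user-supplied expected values to the sample size. The Montana.sav data are not reproduced in the text, so the counts and proportions below are hypothetical and serve only to show the calculation.

```python
def goodness_of_fit(observed, expected_values=None):
    """Return (chi-square statistic, df) for a one-variable goodness-of-fit test.

    expected_values=None means all categories are expected to be equally likely;
    otherwise the supplied values are rescaled so they sum to the sample size.
    """
    n = sum(observed)
    k = len(observed)
    if expected_values is None:
        expected = [n / k] * k
    else:
        total = sum(expected_values)
        expected = [n * value / total for value in expected_values]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1

# Hypothetical counts for a three-category variable (not from Montana.sav).
counts = [30, 50, 20]

stat_equal, df_equal = goodness_of_fit(counts)                        # equal expected
stat_unequal, df_unequal = goodness_of_fit(counts, [0.25, 0.5, 0.25])  # unequal expected

print(round(stat_equal, 3), df_equal)      # 14.0 2
print(round(stat_unequal, 3), df_unequal)  # 2.0 2
```

The degrees of freedom are the number of categories minus one in both cases.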
Thank You for Listening
Questions?