Cross Tabulation
Andrew Martin
PS 372
University of Kentucky
What is a cross-tabulation?
A cross-tabulation displays the joint frequencies and relative frequencies of two categorical (nominal or ordinal) variables.
The distribution is listed for each combination of categories that exists between two variables.
Each case is then placed in the cell of the table that represents the combination of values that corresponds to its score on the variables.
What is a cross-tabulation?
Ex: Party identification and gender (NES data)
Assuming party identification has three categories (Democrat, Independent, Republican) and gender has two (male and female), the table would have 6 cells.
Party ID   Male     Female
Dem.       Cell 1   Cell 2
Ind.       Cell 3   Cell 4
Rep.       Cell 5   Cell 6
What is a cross-tabulation?
If we use the seven-point party identification scale, the cross-tabulation gets bigger.
Party ID            Male      Female
Strong Democrat     Cell 1    Cell 2
Weak Democrat       Cell 3    Cell 4
Ind.-Democrat       Cell 5    Cell 6
Independent         Cell 7    Cell 8
Ind.-Republican     Cell 9    Cell 10
Weak Republican     Cell 11   Cell 12
Strong Republican   Cell 13   Cell 14
How to construct a cross-tab
(1) Separate the cases into groups based on their values for the independent variable.
(2) For each grouping on the independent variable, compute the frequencies or percentages falling in each level of the dependent variable.
(3) Decide whether the frequency or percentage distributions differ from group to group, and if so, by how much.
Remember, with cross-tabs
1. When the independent variable defines the columns, the column percentages are more important than the row percentages.
2. The percentages within each column should add to 100.
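The construction steps and the column-percentage check can be sketched in a few lines of Python. This is only an illustration: the ten cases below are invented (they are not NES data), and the helper name `column_percentages` is hypothetical. It assumes gender is the independent variable and therefore defines the columns.

```python
from collections import Counter

# Hypothetical (gender, party) cases, invented purely for illustration.
cases = [
    ("Male", "Dem."), ("Male", "Dem."), ("Male", "Ind."),
    ("Male", "Rep."), ("Male", "Rep."),
    ("Female", "Dem."), ("Female", "Dem."), ("Female", "Dem."),
    ("Female", "Ind."), ("Female", "Rep."),
]


def column_percentages(cases):
    """Return {independent value: {dependent value: percent of that group}}."""
    # Step 1: separate the cases into groups on the independent variable.
    groups = {}
    for independent, dependent in cases:
        groups.setdefault(independent, Counter())[dependent] += 1
    # Step 2: turn the frequencies within each group into percentages.
    table = {}
    for independent, counts in groups.items():
        total = sum(counts.values())
        table[independent] = {k: 100 * v / total for k, v in counts.items()}
    return table


table = column_percentages(cases)

# Step 3: print the groups side by side so the distributions can be compared.
for party in ("Dem.", "Ind.", "Rep."):
    cells = "  ".join(
        f"{g}: {table[g].get(party, 0.0):5.1f}%" for g in ("Male", "Female")
    )
    print(f"{party:5} {cells}")

# Each column of percentages should add to 100.
for gender, column in table.items():
    print(gender, "total:", round(sum(column.values()), 1))
```

Percentaging within each category of the independent variable is what makes groups of different sizes comparable.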
Research Questions
We can use cross-tabs to investigate the following research questions:
Is there a relationship between gender and partisanship?
Are women more likely to be Democrats than men?
If so, does this mean women are more liberal than men?
Cross-tabs and two-variable relationships
We assume the relationship is such that:
Gender --> Partisanship
In other words, gender is the independent variable that explains variation in partisanship, which is the dependent variable.
Strength of Relationship
Refers to how different the observed values of the dependent variable are across the categories of the independent variable.
If all the cases in each category of the independent variable fell into a single, distinct category of the dependent variable, there would be a perfect relationship. This almost never occurs.
If the dependent variable is equally distributed for different categories of the independent variable, there is no relationship.
Insert 12-4 and 12-5
Direction of the relationship
The direction of the relationship shows which values of the independent variable are associated with values of the dependent variable.
If higher values of the independent variable are associated with higher values of the dependent variable, the relationship is positive.
If lower values of the independent variable are associated with higher values of the dependent variable, the relationship is negative.
Another research question
Suppose you were asked to predict how Americans would respond to a question about making gun control laws more stringent.
In the absence of any information about their attitudes toward gun control, what would you use to predict attitudes about pending gun control legislation?
Potential answers: Ideology, partisanship, related survey questions about gun rights and restrictions.
Table 12-6 and Figure 12-3
Cross-tab limitations
Sometimes it is practicable to examine the relationships of two variables by just looking at the tables.
In some instances analysis involves many tables or tables with many cells.
In those instances, it may be more useful to summarize the information using coefficients for ordinal data.
Calculating Coefficients for Ordinal Data
We're not going to learn how to calculate each statistic, but there are some basic concepts we should review.
In particular, we need to know how to identify concordant pairs, discordant pairs and tied pairs.
Concordant, Discordant and Tied Pairs
In a concordant pair, one case is higher than another case for both variables.
In a discordant pair, one case is lower on one of the variables but higher on the other.
In a tied pair both cases have the same value on one or both variables.
Insert Table 12-7
Values by Name and Variable
Name      Variable X   Variable Y
Alex      3            3
Carl      3            1
Dawn      2            3
Ernesto   2            2
Fay       2            1
Gus       1            3
Hera      1            2
Ike       1            1
Jasmine   1            1
Determining Pair Type
Alex and Ike
Variable X = 3 (Alex) – 1 (Ike) = 2
Variable Y = 3 (Alex) – 1 (Ike) = 2
Both differences are positive; Alex and Ike are concordant.
Determining Pair Type
Carl and Ernesto
Variable X = 3 (Carl) – 2 (Ernesto) = 1
Variable Y = 1 (Carl) – 2 (Ernesto) = -1
One difference is positive and the other negative; Carl and Ernesto are discordant.
Determining Pair Type
Ike and Jasmine
Variable X = 1 (Ike) – 1 (Jasmine) = 0
Variable Y = 1 (Ike) – 1 (Jasmine) = 0
Both numbers are 0; this constitutes a tied pair.
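The same sign test can be applied to every pair of cases. The sketch below uses the X and Y values from the "Values by Name and Variable" table; the function name `pair_type` is illustrative only. It treats a pair as tied when either difference is zero, which matches the definition of a tied pair given earlier.

```python
from itertools import combinations

# Values from the "Values by Name and Variable" table: name -> (X, Y).
values = {
    "Alex": (3, 3), "Carl": (3, 1), "Dawn": (2, 3),
    "Ernesto": (2, 2), "Fay": (2, 1), "Gus": (1, 3),
    "Hera": (1, 2), "Ike": (1, 1), "Jasmine": (1, 1),
}


def pair_type(a, b):
    """Classify a pair of cases as concordant, discordant, or tied."""
    dx = values[a][0] - values[b][0]
    dy = values[a][1] - values[b][1]
    if dx == 0 or dy == 0:
        return "tied"  # same value on one or both variables
    # Same sign on both differences: concordant; opposite signs: discordant.
    return "concordant" if dx * dy > 0 else "discordant"


print(pair_type("Alex", "Ike"))      # concordant
print(pair_type("Carl", "Ernesto"))  # discordant
print(pair_type("Ike", "Jasmine"))   # tied

# Tally all 36 possible pairs among the nine cases.
counts = {"concordant": 0, "discordant": 0, "tied": 0}
for a, b in combinations(values, 2):
    counts[pair_type(a, b)] += 1
print(counts)  # {'concordant': 10, 'discordant': 7, 'tied': 19}
```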
Ordinal Coefficients
There are four commonly used coefficients of association for ordinal data:
Kendall's tau-b
Kendall's tau-c
Somers' d
Goodman and Kruskal's gamma
Ordinal Coefficients
Each is calculated somewhat differently (see JRM p. 442), but the intuition is that they measure the probability of concordant pairs minus the probability of discordant pairs.
Measure = p_concordance – p_discordance, where p = probability
The measures treat tied pairs somewhat differently.
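Goodman and Kruskal's gamma follows this intuition most directly, because it sets tied pairs aside and takes both probabilities over the untied pairs only. The sketch below uses the standard textbook definition of gamma rather than anything quoted from JRM, and the function name `gamma` is illustrative. The counts in the usage line (10 concordant, 7 discordant) come from enumerating all 36 pairs among the nine cases in the values table.

```python
def gamma(concordant, discordant):
    """Goodman and Kruskal's gamma from pair counts; tied pairs are ignored."""
    untied = concordant + discordant
    if untied == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    p_concordance = concordant / untied
    p_discordance = discordant / untied
    return p_concordance - p_discordance


# 10 concordant and 7 discordant pairs among the nine cases in the table.
g = gamma(10, 7)
print(round(g, 3))  # 0.176
```

The other three coefficients keep the same numerator (concordant minus discordant pairs) but bring some of the tied pairs, or the table dimensions, into the denominator.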
Ordinal Coefficient Properties
1. Theoretically, all vary between -1 and 1.
2. In practice, a -1 or 1 is unlikely. In fact, a measure with an absolute value of .4 or greater indicates an association strong enough to investigate further.
3. Since 0 means no correlation, values between -.1 and .1 suggest a weak relationship.
4. All ordinal measures of correlation will have the same sign in a given table.
Ordinal Coefficient Properties
5. The absolute value of gamma (γ) will always be greater than or equal to the absolute value of any of the other measures.
6. The relationships among tau-b, tau-c, and Somers' d are harder to generalize because they are affected differently by the structure of the table (that is, the number of rows and columns).
7. Somers' d is an asymmetric measure because its value depends on which variable is considered dependent.
Ordinal Coefficient Properties
8. A single measure by itself cannot assess how strongly one variable is related to another. After the statistical software calculates the measures, you should scrutinize the tables. Do not be lazy with analysis and interpretation.
9. These coefficients measure a particular type of association, namely correlation, whether linear or monotonic.
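Properties 4, 5, and 7 can be checked numerically on the nine cases from the values table. The sketch below uses the standard textbook formulas for the four coefficients, written from general knowledge rather than copied from JRM p. 442, so treat them as an assumption to verify against the text. All variable and function names are illustrative.

```python
from itertools import combinations
from math import sqrt

# (X, Y) for the nine cases in the values table, Alex through Jasmine.
cases = [(3, 3), (3, 1), (2, 3), (2, 2), (2, 1), (1, 3), (1, 2), (1, 1), (1, 1)]


def pair_counts(cases):
    """Count concordant, discordant, X-only tied, and Y-only tied pairs."""
    c = d = tx = ty = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            c += 1
        elif dx * dy < 0:
            d += 1
        elif dx == 0 and dy != 0:
            tx += 1  # tied on X only
        elif dy == 0 and dx != 0:
            ty += 1  # tied on Y only
        # Pairs tied on both variables enter none of the formulas below.
    return c, d, tx, ty


c, d, tx, ty = pair_counts(cases)
n = len(cases)
# m is the smaller of the number of distinct X values and distinct Y values.
m = min(len({x for x, _ in cases}), len({y for _, y in cases}))

gamma = (c - d) / (c + d)
tau_b = (c - d) / sqrt((c + d + tx) * (c + d + ty))
tau_c = 2 * m * (c - d) / (n ** 2 * (m - 1))
d_y_on_x = (c - d) / (c + d + ty)  # Somers' d with Y treated as dependent
d_x_on_y = (c - d) / (c + d + tx)  # Somers' d with X treated as dependent

for name, value in [("gamma", gamma), ("tau-b", tau_b), ("tau-c", tau_c),
                    ("d (Y dependent)", d_y_on_x), ("d (X dependent)", d_x_on_y)]:
    print(f"{name:16} {value:.3f}")
```

In this small example all five values are positive and gamma is the largest. The two versions of Somers' d happen to coincide because the data contain the same number of X-only and Y-only ties (nine each); in tables where those counts differ, the two values differ as well.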