Measures of Association for Contingency Tables


Page 1: Measures of Association  for Contingency Tables

Measures of Association for Contingency Tables

Page 2: Measures of Association  for Contingency Tables

Measures of Association

• General measures of association that can be used with any variable types.

• Measures of association when both X and Y are nominal.

• Measures of association when both X and Y are ordinal.

• Measures of association when X and Y are both ordinal or dichotomous nominal.

Page 3: Measures of Association  for Contingency Tables

Measures of Association

• There are two main classes of measures of association: symmetric and asymmetric.

• Symmetric measures will be the same if the roles of X and Y are reversed. In other words, it does not matter which variable is viewed as the independent variable (X) and which is viewed as the dependent variable (Y).

Page 4: Measures of Association  for Contingency Tables

Measures of Association

• Asymmetric measures will be different if the roles of X and Y are reversed. In other words, which variable is viewed as the independent variable (X) and which is viewed as the dependent variable (Y) matters.

Page 5: Measures of Association  for Contingency Tables

Measures of Association

Asymmetric measure? | Ordinal variables | Nominal variables
Yes | Somers’ D | Lambda (λ), asymmetric; Uncertainty Coefficient, asymmetric
No | Gamma (γ); Kendall’s Tau-b; Stuart’s Tau-c | Phi (φ); Yule’s Q (2 x 2 tables); Cramer’s V (r x c tables); Pearson’s Contingency Coefficient (C); Uncertainty Coefficient, symmetric; Lambda (λ), symmetric


Page 6: Measures of Association  for Contingency Tables

Measures of Association

A rule of thumb for interpreting the magnitude (i.e. ignoring the sign/direction) of the various measures of association we will be examining is as follows:

.00 to <.10 “no relationship”

.10 to <.30 “weak relationship”

.30 to <.50 “moderate relationship”

.50 to 1.00 “strong relationship”

You can find several other adjective scales; these are NOT set in stone!
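The adjective scale above can be sketched as a small lookup. This is only an illustration of the rule of thumb; the function name `describe_strength` is an assumption made for this example, not part of any package.

```python
def describe_strength(value):
    """Map a measure of association to the adjective scale on this slide."""
    magnitude = abs(value)  # ignore the sign/direction
    if magnitude < 0.10:
        return "no relationship"
    if magnitude < 0.30:
        return "weak relationship"
    if magnitude < 0.50:
        return "moderate relationship"
    return "strong relationship"


for measure in (0.05, 0.157, -0.35, 0.64):
    print(measure, "->", describe_strength(measure))
```

Because the cut-offs are not set in stone, they are best treated as adjustable constants rather than fixed truths.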

Page 7: Measures of Association  for Contingency Tables


SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Page 8: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Phi statistic (𝜙)

SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Page 9: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Phi statistic

This can be applied to the cervical cancer case-control study.

$\phi = \sqrt{\dfrac{\chi^2}{n}} = \sqrt{\dfrac{9.011}{366}} = .157$

Using this measure, there is a weak association between risk factor and disease status.
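As a quick check of the arithmetic, Phi can be computed directly from the chi-square statistic and the sample size. This is a minimal sketch; the function name `phi_coefficient` is chosen for this example, and the inputs 9.011 and 366 are the values shown on the slide.

```python
from math import sqrt


def phi_coefficient(chi_square, n):
    """Phi = sqrt(chi-square / n)."""
    return sqrt(chi_square / n)


phi = phi_coefficient(9.011, 366)
print(round(phi, 3))  # 0.157
```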

Page 10: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Yule’s Q

Symmetric measure for 2 X 2 tables only!

Disease | Risk Present | Risk Absent
Yes | a | b
No | c | d
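The formulas on this slide did not survive extraction. The sketch below uses the standard textbook definition of Yule's Q for a 2 X 2 table laid out as above, Q = (ad - bc)/(ad + bc), which can equivalently be written in terms of the odds ratio OR = ad/bc as (OR - 1)/(OR + 1). The cell counts are hypothetical and only illustrate the calculation.

```python
def yules_q(a, b, c, d):
    """Yule's Q from the four cells of a 2 X 2 table."""
    return (a * d - b * c) / (a * d + b * c)


def yules_q_from_odds_ratio(odds_ratio):
    """Equivalent form of Yule's Q written in terms of the odds ratio."""
    return (odds_ratio - 1) / (odds_ratio + 1)


# Hypothetical counts (not from the cervical cancer study).
a, b, c, d = 30, 10, 20, 40
print(yules_q(a, b, c, d))
print(yules_q_from_odds_ratio((a * d) / (b * c)))
```

Both calls give the same value, which is why the measure can be read as a rescaling of the odds ratio onto the range -1 to +1.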

Page 11: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Yule’s Q

There is a strong association between risk factor (Preg. Age < 25) and case-control status (Cervical Cancer) using this measure.


Page 13: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Cramer’s V

SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Page 14: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Cramer’s V

Is there a relationship between histological type of Hodgkin’s disease and response to treatment?

For the Hodgkin’s study:

$V = \sqrt{\dfrac{75.89}{538(3 - 1)}} = .266$

This suggests a weak relationship between histological type and response to treatment.
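The calculation above can be reproduced with a short function. This sketch assumes the usual definition of Cramer's V, sqrt(chi-square / (n(k - 1))), where k is the smaller of the number of rows and columns; the name `cramers_v` and its argument names are chosen for this example.

```python
from math import sqrt


def cramers_v(chi_square, n, smaller_dim):
    """Cramer's V, with smaller_dim = min(number of rows, number of columns)."""
    return sqrt(chi_square / (n * (smaller_dim - 1)))


# Values from the Hodgkin's example: chi-square = 75.89, n = 538, k = 3.
print(round(cramers_v(75.89, 538, 3), 3))  # 0.266
```

For a 2 X 2 table k - 1 = 1, so V reduces to Phi.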

Page 15: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Pearson’s C

This can be used for general r x c tables regardless of the data types involved.

SYMMETRIC AND CAN BE USED WITH ANY DATA TYPES

Page 16: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Pearson’s C

This can be used for the Hodgkin’s example:

$C = \sqrt{\dfrac{75.89}{538 + 75.89}} = .3516$

This suggests a moderate relationship between type and response to treatment.
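A minimal sketch of the same calculation, assuming the usual definition of Pearson's Contingency Coefficient, C = sqrt(chi-square / (n + chi-square)). The function name `pearsons_c` is chosen for this example.

```python
from math import sqrt


def pearsons_c(chi_square, n):
    """Pearson's Contingency Coefficient."""
    return sqrt(chi_square / (n + chi_square))


# Values from the Hodgkin's example: chi-square = 75.89, n = 538.
print(round(pearsons_c(75.89, 538), 4))  # 0.3516
```

Note that the same data give V = .266 ("weak") and C = .3516 ("moderate"), a reminder that the adjective scale depends on which measure is used.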

Page 17: Measures of Association  for Contingency Tables


ASYMMETRIC AND CAN BE USED WITH NOMINAL X & Y

Page 18: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Lambda

• Lambda (λ) is an asymmetric measure of association suitable for use with nominal variables that looks at predictive ability, i.e. how well one variable predicts the level of the other. It provides us with an indication of the strength of an association between the independent (X) and dependent (Y) variables.

• It may range from 0.0 (meaning the extra information provided by the independent variable does not help prediction) to 1.0 (meaning use of the independent variable results in no prediction errors).

• It is asymmetric, i.e. which variable is viewed as X and which as Y matters!

Page 19: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables – Lambda

Lambda (λ) is based on the concept of Proportional Reduction of Error (PRE):

$\lambda = \dfrac{E_1 - E_2}{E_1}$

where $E_1$ = errors of prediction made when the independent variable is ignored, and $E_2$ = errors of prediction made when the prediction is based on the independent variable.

Page 20: Measures of Association  for Contingency Tables

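Measures of Association Between Two Categorical Variables – Lambda

For calculating Lambda (λ), count the prediction errors made when every subject is assigned to the modal category:

$E_1 = n - (\text{largest marginal total of } Y)$, the errors made when the independent variable is ignored

$E_2 = \sum_{\text{levels of } X} (\text{total for that level} - \text{largest cell count in that level})$, the errors made when the prediction is based on the independent variable

The best way to see how these formulae work and the rationale behind them is to consider an example.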

Page 21: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

• These data come from a study conducted by three master’s nursing students who recently graduated (Kelsey, Woods, & Langhans).

• One of the questions examined was whether there was a relationship between high physical pain at admission and high psychological pain. The high classification for psych pain meant 5 on a five-point ordinal scale, and high physical pain meant 5+ on the ten-point pain scale.

Page 22: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

Below is a 2 X 2 table of the results with Physical Pain as Row (Y) and Psych Pain as Column (X).

Physical Pain | High Psych Pain | No | Row Totals
High Phys. Pain | 11 | 7 | 18
No | 10 | 29 | 39
Column Totals | 21 | 36 | n = 57

Page 23: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

In the absence of any information about psychological pain, we predict that subjects will not be suffering from high physical pain, as that is the modal level on the physical pain scale (39 of the 57 subjects).

This gives 18 prediction errors using this approach.

Page 24: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

Using Psych Pain to predict Physical Pain status, we see that if the subject has high Psych Pain the modal response is High Physical Pain (11 of 21), and if the subject does not have high Psych Pain the modal response is not having high Physical Pain (29 of 36).

The remaining 10 + 7 subjects are misclassified, i.e. we have 17 prediction errors using this approach.

Page 25: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

Thus Lambda (λ) = $\dfrac{18 - 17}{18} = .0556$. We have roughly a 5.56% improvement in predicting physical pain using knowledge about psychological pain.

Page 26: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

Psych Pain | High Physical Pain | No | Row Totals
High Psych Pain | 11 | 10 | 21
No | 7 | 29 | 36
Column Totals | 18 | 39 | n = 57

Using Physical Pain to predict Psychological Pain status, we see that if the subject has high Physical Pain the modal response is High Psych Pain, and if the subject does not have high Physical Pain the modal response is not having High Psych Pain.

Ignoring physical pain gives 21 prediction errors; using it gives 10 + 7 = 17 prediction errors. Thus Lambda (λ) = $\dfrac{21 - 17}{21} = .1905$, a 19.05% improvement in prediction error (PRE). Notice the asymmetry of the association!!
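The two hand calculations can be checked with a short sketch. It assumes the table is given as a list of rows with the dependent variable (Y) on the rows and the independent variable (X) on the columns; the function name `asymmetric_lambda` is chosen for this example.

```python
def asymmetric_lambda(table):
    """Lambda for predicting the row variable (Y) from the column variable (X)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    columns = list(zip(*table))
    # Errors when X is ignored: everyone is assigned to the modal row.
    errors_ignoring_x = n - max(row_totals)
    # Errors when X is used: within each column, assign to the modal cell.
    errors_using_x = sum(sum(col) - max(col) for col in columns)
    return (errors_ignoring_x - errors_using_x) / errors_ignoring_x


# Rows: physical pain (high, no); columns: psych pain (high, no).
pain = [[11, 7], [10, 29]]
transposed = [list(col) for col in zip(*pain)]

print(round(asymmetric_lambda(pain), 4))        # physical pain predicted from psych pain
print(round(asymmetric_lambda(transposed), 4))  # psych pain predicted from physical pain
```

Transposing the table swaps the roles of X and Y, which is all that is needed to see the asymmetry (1/18 versus 4/21).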

Page 27: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

The symmetric Lambda (λ) is simply the average of the two asymmetric measures, i.e. Lambda (λ) – symmetric = $\dfrac{.0556 + .1905}{2} \approx .1231$, or a 12.31% improvement in prediction error.

Page 28: Measures of Association  for Contingency Tables

Example: Physical and Psychological Pain of DBM Admits

The Lambda association measures are highlighted. You can see they match those we calculated by hand on the previous slides. The Uncertainty Coefficient is calculated differently but, like Lambda, measures the PRE; thus it can be interpreted in a similar fashion.

Page 29: Measures of Association  for Contingency Tables


SYMMETRIC AND ASYMMETRIC MEASURES USED TO MEASURE THE ASSOCIATION BETWEEN ORDINAL VARIABLES.

Page 30: Measures of Association  for Contingency Tables

Measures of Association Between Two Ordinal Variables

Some of the previously discussed measures can be used. However, for cases where both variables are ordinal, better measures include Gamma, Kendall’s Tau-b, Stuart’s Tau-c, and Somers’ D. We will discuss these in a bit.

First, though, in some cases we wish to measure the degree of exact agreement between two nominal or ordinal variables measured using the same levels or scales, in which case we generally use Cohen’s Kappa (κ).

Page 31: Measures of Association  for Contingency Tables

Medicare Health Outcomes Survey

Website for Medicare Health Outcomes Survey:

http://www.hosonline.org/Content/Default.aspx

Page 32: Measures of Association  for Contingency Tables

Medicare Health Outcomes Survey (HOS)

FROM THE MEDICARE HOS SURVEY WEBSITE: The Medicare HOS is the first patient-reported outcomes measure used in Medicare managed care. The goal of the Medicare HOS program is to gather valid and reliable clinically meaningful data that have many uses, such as for targeting quality improvement activities and resources; monitoring health plan performance and rewarding top-performing health plans; helping beneficiaries make informed health care choices; and advancing the science of functional health outcomes measurement. Managed care plans with Medicare Advantage (MA) contracts must participate.

Each spring a random sample of Medicare beneficiaries is drawn from each participating Medicare Advantage Organization (MAO), that has a minimum of 500 enrollees and is surveyed (i.e., a survey is administered to a different baseline cohort, or group, each year). Two years later, these same respondents are surveyed again (i.e., follow up measurement). Cohort 1 was surveyed in 1998 and was resurveyed in 2000. Cohort 2 was surveyed in 1999 and was resurveyed in 2001, and so on. During the current HOS administration (2013 Round 16), Cohort 16 is surveyed and Cohort 14 is resurveyed using HOS 2.5. For data collection years 1998-2006, the MAO sample size was one thousand. Effective 2007, the MAO sample size was increased to twelve hundred.

Page 33: Measures of Association  for Contingency Tables

Measures of Association Between Two Categorical Variables

Cohen’s Kappa (κ) – measures the degree of agreement between two variables on the same scales.

HOS Study – General health measured ordinally at baseline and 2-yr. follow-up; how well do they agree?

κ > .75 excellent agreement

.40 < κ < .75 good agreement

0 < κ < .40 marginal agreement

There is fairly good agreement between the general assessment of overall health at baseline and at follow-up. However, there appears to be some general trend for improvement as well.
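The HOS table itself is not reproduced here, so the sketch below uses a small hypothetical agreement table. It assumes the standard definition of Cohen's Kappa, (observed agreement - chance agreement)/(1 - chance agreement), computed from a square table whose rows and columns use the same levels; the function name `cohens_kappa` is chosen for this example.

```python
def cohens_kappa(table):
    """Cohen's Kappa for a square table with identical row and column levels."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Proportion of exact agreement: the diagonal of the table.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2 X 2 agreement table (not the HOS data).
ratings = [[20, 5], [10, 15]]
print(round(cohens_kappa(ratings), 2))  # 0.4
```

Here 70% of the ratings agree exactly, but 50% agreement would be expected by chance alone, so Kappa is only .40.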

Page 34: Measures of Association  for Contingency Tables

Bowker’s Test of Symmetry

Symmetry of Disagreement

Bowker’s test suggests the differences are asymmetric (p < .0001).

Examining the percentages suggests a majority of patients either stayed the same or improved in each group based on baseline score.

Therefore it is reasonable to state that we have evidence that, in general, subjects’ health stayed the same or, if it did change, it was generally for the better (p < .0001).
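The slide reports only the p-value. As a rough sketch of what the test compares, the standard Bowker statistic sums (n_ij - n_ji)^2 / (n_ij + n_ji) over every pair of off-diagonal cells that mirror each other, and is referred to a chi-square distribution with r(r - 1)/2 degrees of freedom. The table below is hypothetical, and the function name `bowker_statistic` is chosen for this example; the p-value lookup is left to statistical software.

```python
def bowker_statistic(table):
    """Bowker's symmetry statistic and its degrees of freedom for an r X r table."""
    r = len(table)
    statistic = 0.0
    for i in range(r):
        for j in range(i + 1, r):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # skip empty pairs to avoid dividing by zero
                statistic += (table[i][j] - table[j][i]) ** 2 / pair_total
    degrees_of_freedom = r * (r - 1) // 2
    return statistic, degrees_of_freedom


# Hypothetical baseline (rows) by follow-up (columns) table.
health = [[10, 5, 2], [3, 12, 6], [1, 2, 9]]
stat, df = bowker_statistic(health)
print(round(stat, 3), df)
```

For a 2 X 2 table there is a single off-diagonal pair, and the statistic reduces to the one used in McNemar's Test.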

Page 35: Measures of Association  for Contingency Tables

Kruskal’s Gamma (γ)

• Before computing Gamma we need to introduce the concept of discordant and concordant paired observations.

• Paired observations – Observations compared in terms of their relative rankings on the independent (X) and dependent variable (Y).

Page 36: Measures of Association  for Contingency Tables

Kruskal’s Gamma (γ)

• Same order pair ($N_s$) – Paired observations that show a positive association; the member of the pair ranked higher on the independent variable is also ranked higher on the dependent variable.

• Inverse order pair ($N_d$) – Paired observations that show a negative association; the member of the pair ranked higher on the independent variable is ranked lower on the dependent variable.

Page 37: Measures of Association  for Contingency Tables

Kruskal’s Gamma (γ)

• Gamma is a symmetric measure of association suitable for use with ordinal variables or with dichotomous nominal variables.

• For dichotomous nominal variables it is the same as Yule’s Q for 2 X 2 tables.

• Its magnitude can vary from 0.0 (meaning the extra information provided by the independent variable does not help prediction) to 1.0 (meaning use of the independent variable results in no prediction errors), so gamma itself ranges from −1.0 to +1.0 and provides us with an indication of the strength and direction of the association between the variables.

• When there are more $N_s$ pairs, gamma will be positive; when there are more $N_d$ pairs, gamma will be negative.

Page 38: Measures of Association  for Contingency Tables

Example 1: Job Security & Satisfaction

Rows: Job Security; columns: Job Satisfaction.

Job Security | High | Medium | Low
High | 16 | 8 | 14
Medium | 19 | 17 | 60
Low | 9 | 11 | 56

Page 39: Measures of Association  for Contingency Tables

Example 1: Job Security & Satisfaction


Same order pair (Ns) – Paired observations that show a positive association; the member of the pair ranked higher on the independent variable is also ranked higher on the dependent variable.

Page 40: Measures of Association  for Contingency Tables

Example 1: Job Security & Satisfaction


Inverse order pairs (Nd) – Paired observations that show a negative association; the member of the pair ranked higher on the independent variable is ranked lower on the dependent variable.

Page 41: Measures of Association  for Contingency Tables

Example 1: Job Security & Satisfaction

Gamma (γ) = $\dfrac{N_s - N_d}{N_s + N_d}$

The other measures also use $N_s$ and $N_d$ but make adjustments for ties. Somers’ D, as you can see, is an asymmetric measure.
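The pair counts for the job security table can be sketched as follows. The code assumes Gamma = (Ns - Nd)/(Ns + Nd) and that the rows (Job Security) and columns (Job Satisfaction) are both ordered High, Medium, Low, so a pair is in the same order when one observation lies below and to the right of the other. The function names are chosen for this example.

```python
def count_pairs(table):
    """Return (same-order pairs Ns, inverse-order pairs Nd) for an ordered table."""
    n_rows, n_cols = len(table), len(table[0])
    same_order = inverse_order = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows below row i
                for m in range(n_cols):
                    if m > j:    # below and to the right: same order
                        same_order += table[i][j] * table[k][m]
                    elif m < j:  # below and to the left: inverse order
                        inverse_order += table[i][j] * table[k][m]
    return same_order, inverse_order


def gamma(table):
    ns, nd = count_pairs(table)
    return (ns - nd) / (ns + nd)


job = [[16, 8, 14], [19, 17, 60], [9, 11, 56]]
print(count_pairs(job))      # (5457, 2361)
print(round(gamma(job), 3))  # 0.396
```

Pairs tied on either variable (same row or same column) are ignored, which is the adjustment the other ordinal measures handle differently.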

Page 42: Measures of Association  for Contingency Tables

Example 2: Medicare Survey – General Health: Baseline vs. Follow-up

Each highlighted measure suggests a strong relationship between general health at baseline and general health at follow-up, as all measures exceed 0.50. The association is also positive, indicating that if health was good at baseline it also tends to be good at follow-up.

Page 43: Measures of Association  for Contingency Tables

Summary

We have considered the following measures of association for contingency tables. Depending on the variable types and the goals of our analysis, we generally choose from among these measures.

Page 44: Measures of Association  for Contingency Tables

Other Measures for Ordinal Variables

• There are other measures that can be used when both X and Y are ordinal in nature. These are more akin to the traditional correlation measure for continuous X and Y, which is Pearson’s Product Moment Correlation (r).

• Spearman’s Rank Correlation (a.k.a. Spearman’s Rho), Kendall’s Tau (τ), and Hoeffding’s D are all available in JMP; they are obtained via Analyze > Multivariate Methods and are found under the Nonparametric Correlations option.

Page 45: Measures of Association  for Contingency Tables

Example: NHANES Survey

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations.

Page 46: Measures of Association  for Contingency Tables

Example: NHANES Dermatology Survey

This link will take you to a description of the NHANES dermatology survey module conducted in 2005-2006: http://www.cdc.gov/nchs/nhanes/nhanes2005-2006/DEQ_D.htm

Page 47: Measures of Association  for Contingency Tables

Example: NHANES Dermatology Survey

Here we are examining ordinal measures on several variables pertaining to sun protective measures. The higher the score, the more frequently the respondent said they used the preventative measure.

As these are ALL ordinal variables, the use of Pearson’s Product Moment Correlation is NOT appropriate!

Page 48: Measures of Association  for Contingency Tables

Example: NHANES Dermatology Survey

The nonparametric correlations we might consider using are found in the Nonparametric Correlations pull-out menu.

Spearman’s Rho is a good choice when X and Y are continuous but neither variable is normally distributed or if there are noticeable outliers. It can also be used with ordinal variables like we have here.

Kendall’s Tau is also a valid choice for ordinal variables.

Hoeffding’s D is good when the relationship between X and Y is nonlinear which would rarely, if ever, be the case for ordinal X and Y.
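To show what a rank-based correlation does, here is a minimal sketch of Spearman's Rho computed as the Pearson correlation of the ranks, with tied values given their average rank. The scores are hypothetical (not NHANES data) and the function names are chosen for this example; in practice JMP or another package would be used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for position in range(i, j + 1):
            ranks[order[position]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cross / (spread_u * spread_v)


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ordinal scores for two sun-protection items.
shade = [1, 2, 3, 4, 5]
sunscreen = [2, 1, 4, 3, 5]
print(round(spearman_rho(shade, sunscreen), 2))  # 0.8
```

Because only the ranks enter the calculation, the result does not depend on the spacing between scale points, which is why it is acceptable for ordinal data.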

Page 49: Measures of Association  for Contingency Tables

Example: NHANES Dermatology Survey

Summary: As one would expect, all correlations are positive, as someone who is cautious in one aspect of sun protection probably tends to be cautious in others as well.

Spearman’s ρ and Kendall’s τ yield similar results.

Hoeffding’s D should not be used for these data!

Page 50: Measures of Association  for Contingency Tables

Summary

• If X and Y are ordinal but not on the same scale, or agreement when they are is not of primary interest, then there are several choices: Gamma, Kendall’s Tau-b, Stuart’s Tau-c, and Somers’ D. Try them all and pick the one you think is “best”.

• For non-ordinal associations you again have several choices: Phi, Cramer’s V (Yule’s Q), Lambda, Uncertainty Coefficient, etc. Again, try them all, think about what you are trying to show, and choose the one you think is “best”.

Page 51: Measures of Association  for Contingency Tables

Summary

• If X and Y are ordinal and on exactly the same scale we can examine Cohen’s Kappa (κ) to measure the degree of exact agreement.

• To test for any asymmetries (i.e. a tendency for X > Y or X < Y) we can use Bowker’s Test for r x r tables or McNemar’s Test for 2 X 2 tables.

Page 52: Measures of Association  for Contingency Tables

Summary

• If you clearly have an independent variable (X) and a dependent variable (Y) then you might consider the asymmetric options.

• If X and Y are interchangeable and you simply want to measure or quantify the degree of association then a symmetric measure would be preferable.