
Chapter 6: Loglinear Models for Contingency Tables (source: ...users.stat.ufl.edu/.../Courses/sta4504-2000sp/Handouts/slides-ch6.pdf)


Ch. 6: Loglinear Models for Contingency Tables

Loglinear models for contingency tables are GLMs that treat cell counts as Poisson-distributed and use the log link.

Motivation: Two-Way Tables

In a two-way table, $X$ and $Y$ are independent if

$$P(X = i, Y = j) = P(X = i)\,P(Y = j) \quad \text{for all } i \text{ and } j,$$

i.e.,

$$\pi_{ij} = \pi_{i+}\,\pi_{+j}, \qquad i = 1, \ldots, I, \quad j = 1, \ldots, J.$$

For expected frequencies $\mu_{ij} = n\,\pi_{ij}$, this is equivalent to

$$\mu_{ij} = n\,\pi_{i+}\,\pi_{+j}$$

or, taking logs,

$$\log(\mu_{ij}) = \log(n) + \log(\pi_{i+}) + \log(\pi_{+j}) = \lambda + \lambda_i^X + \lambda_j^Y,$$

where

$\lambda_i^X$ = effect of $X$ falling in row $i$
$\lambda_j^Y$ = effect of $Y$ falling in column $j$

Note that this loglinear model of independence resembles two-way ANOVA with no interaction.

Example (Income and Job Satisfaction (1991 GSS)).

                         Job Satisfaction
              Very      A little   Moderately   Very
Income        Dissat.   Dissat.    Satis.       Satis.
< 5000           2         4          13           3
5–15,000         2         6          22           4
15–25,000        0         1          15           8
> 25,000         0         3          13           8

Total: 104

Using dummy variables, the model

$$\log(\mu_{ij}) = \lambda + \lambda_i^I + \lambda_j^S$$

can be expressed as

$$\log(\mu_{ij}) = \lambda + \lambda_1^I z_1 + \lambda_2^I z_2 + \lambda_3^I z_3 + \lambda_1^S w_1 + \lambda_2^S w_2 + \lambda_3^S w_3$$

where

$z_1 = 1$ if income < 5000, $0$ otherwise

$z_2 = 1$ if 5000 ≤ income < 15,000, $0$ otherwise

...

$w_3 = 1$ if moderately sat., $0$ otherwise

data jobsatis;
  input income satis count @@;
  cards;
1 1 2   1 2 4   1 3 13   1 4 3
2 1 2   2 2 6   2 3 22   2 4 4
3 1 0   3 2 1   3 3 15   3 4 8
4 1 0   4 2 3   4 3 13   4 4 8
;
/* Pearson and likelihood-ratio tests of independence */
proc freq;
  weight count;
  tables income*satis / chisq;

/* Independence loglinear model */
proc genmod;
  class income satis;
  model count = income satis
        / dist=poi link=log residuals obstats;

/* Saturated loglinear model */
proc genmod;
  class income satis;
  model count = income satis income*satis
        / dist=poi link=log residuals obstats;
run;

Independence Model

STATISTICS FOR TABLE OF INCOME BY SATIS

Statistic                      DF     Value     Prob
------------------------------------------------------
Chi-Square                      9    11.524    0.241
Likelihood Ratio Chi-Square     9    13.467    0.143

The GENMOD Procedure
Criteria For Assessing Goodness Of Fit

Criterion             DF      Value     Value/DF

Deviance               9    13.4673      1.4964
Pearson Chi-Square     9    11.5243      1.2805
Log Likelihood         .   129.0550       .

Analysis Of Parameter Estimates

Parameter      DF   Estimate   Std Err   ChiSquare   Pr>Chi

INTERCEPT       1     1.6692    0.2748     36.8874   0.0001
INCOME    1     1    -0.0870    0.2952      0.0869   0.7682
INCOME    2     1     0.3483    0.2666      1.7068   0.1914
INCOME    3     1     0.0000    0.2887      0.0000   1.0000
INCOME    4     0     0.0000    0.0000       .        .
SATIS     1     1    -1.7492    0.5417     10.4256   0.0012
SATIS     2     1    -0.4964    0.3390      2.1448   0.1431
SATIS     3     1     1.0076    0.2436     17.1073   0.0001
SATIS     4     0     0.0000    0.0000       .        .

Observation Statistics

COUNT      Pred    Reschi   StReschi

    2    0.8462    1.2544     1.4406
    4    2.9615    0.6034     0.7305
   13   13.3269   -0.0896    -0.1606
    3    4.8654   -0.8457    -1.0792
    2    1.3077    0.6054     0.7525
    6    4.5769    0.6652     0.8716
   22   20.5962    0.3093     0.6005
    4    7.5192   -1.2834    -1.7726
    0    0.9231   -0.9608    -1.1171
    1    3.2308   -1.2411    -1.5211
   15   14.5385    0.1210     0.2198
    8    5.3077    1.1686     1.5098
    0    0.9231   -0.9608    -1.1171
    3    3.2308   -0.1284    -0.1574
   13   14.5385   -0.4035    -0.7327
    8    5.3077    1.1686     1.5098
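As a cross-check on the GENMOD output, the independence fit can be reproduced by hand, because its fitted values have the closed form $\hat\mu_{ij} = n_{i+} n_{+j} / n$. The Python sketch below is not part of the original handout; the function name `independence_fit` is illustrative, and a zero count is taken to contribute 0 to $G^2$.

```python
import math

# Observed counts: rows = income (<5000, 5-15k, 15-25k, >25k),
# columns = job satisfaction (very dissat., a little dissat.,
# moderately sat., very sat.).
counts = [
    [2, 4, 13, 3],
    [2, 6, 22, 4],
    [0, 1, 15, 8],
    [0, 3, 13, 8],
]


def independence_fit(table):
    """Return (fitted values, Pearson X^2, deviance G^2, df) under independence."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    fitted = [[r * c / n for c in col_tot] for r in row_tot]
    cells = [(o, e)
             for obs_row, fit_row in zip(table, fitted)
             for o, e in zip(obs_row, fit_row)]
    x2 = sum((o - e) ** 2 / e for o, e in cells)
    # A zero count contributes 0 to G^2 (limit of n*log(n/mu) as n -> 0).
    g2 = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return fitted, x2, g2, df


fitted, x2, g2, df = independence_fit(counts)
print(round(fitted[0][0], 4), round(x2, 4), round(g2, 4), df)
```

The first fitted value, $X^2$, and $G^2$ should agree with the `Pred`, `Pearson Chi-Square`, and `Deviance` entries in the listing, up to rounding.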

Saturated Model

The GENMOD Procedure

Criteria For Assessing Goodness Of Fit

Criterion             DF      Value     Value/DF

Deviance               0     0.0000       .
Pearson Chi-Square     0     0.0000       .
Log Likelihood         .   135.7886       .

Analysis Of Parameter Estimates

Parameter            DF   Estimate    Std Err   ChiSquare   Pr>Chi

INTERCEPT             1     2.0794     0.3536     34.5926   0.0001
INCOME        1       1    -0.9808     0.6770      2.0990   0.1474
INCOME        2       1    -0.6931     0.6124      1.2812   0.2577
INCOME        3       1    -0.0000     0.5000      0.0000   1.0000
INCOME        4       0     0.0000     0.0000       .        .
SATIS         1       1   -24.6931     0.8660    813.0021   0.0001
SATIS         2       1    -0.9808     0.6770      2.0990   0.1474
SATIS         3       1     0.4855     0.4494      1.1674   0.2799
SATIS         4       0     0.0000     0.0000       .        .
INCOME*SATIS  1  1    1    24.2877     1.2583    372.5631   0.0001
INCOME*SATIS  1  2    1     1.2685     1.0206      1.5448   0.2139
INCOME*SATIS  1  3    1     0.9808     0.7824      1.5715   0.2100
INCOME*SATIS  1  4    0     0.0000     0.0000       .        .
INCOME*SATIS  2  1    0    24.0000     0.0000       .        .
INCOME*SATIS  2  2    1     1.3863     0.9354      2.1964   0.1383
INCOME*SATIS  2  3    1     1.2192     0.7053      2.9888   0.0838
INCOME*SATIS  2  4    0     0.0000     0.0000       .        .
INCOME*SATIS  3  1    1     0.0000    81377.4      0.0000   1.0000
INCOME*SATIS  3  2    1    -1.0986     1.2583      0.7623   0.3826
INCOME*SATIS  3  3    1     0.1431     0.6274      0.0520   0.8196
INCOME*SATIS  3  4    0     0.0000     0.0000       .        .
INCOME*SATIS  4  1    0     0.0000     0.0000       .        .
INCOME*SATIS  4  2    0     0.0000     0.0000       .        .
INCOME*SATIS  4  3    0     0.0000     0.0000       .        .
INCOME*SATIS  4  4    0     0.0000     0.0000       .        .

Observation Statistics

COUNT       Pred       Reschi     StReschi

    2     2.0000    -2.51E-15    -8.429E-8
    4     4.0000    -4.44E-16     .
   13    13.0000    -1.48E-15    -3.622E-8
    3     3.0000    -5.13E-16     .
    2     2.0000    -2.51E-15    -2.89E-10
    6     6.0000    -1.09E-15    -5.162E-8
   22    22.0000    -7.57E-16     .
    4     4.0000    -4.44E-16    -7.23E-11
    0   1.51E-10    -0.000012     .
    1     1.0000    -4.44E-16     .
   15    15.0000     0            0
    8     8.0000    -6.28E-16     .
    0   1.51E-10    -0.000012    -0.000012
    3     3.0000    -5.13E-16    -2.176E-8
   13    13.0000     0            0
    8     8.0000    -6.28E-16    -1.45E-10

Like indep. models, models for expected cell counts in contingency tables are multiplicative, so log link produces additivity and a “linear predictor”.

A model allowing association between $X$ and $Y$ has form

$$\log(\mu_{ij}) = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}.$$

For example, for a $2 \times 2$ table, odds-ratio $\theta$ satisfies

$$\log(\theta) = \log\left(\frac{\mu_{11}\mu_{22}}{\mu_{12}\mu_{21}}\right) = \log(\mu_{11}) + \log(\mu_{22}) - \log(\mu_{12}) - \log(\mu_{21})$$

$$= (\lambda + \lambda_1^X + \lambda_1^Y + \lambda_{11}^{XY}) + (\lambda + \lambda_2^X + \lambda_2^Y + \lambda_{22}^{XY}) - (\lambda + \lambda_1^X + \lambda_2^Y + \lambda_{12}^{XY}) - (\lambda + \lambda_2^X + \lambda_1^Y + \lambda_{21}^{XY})$$

$$= \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}.$$

$\{\lambda_{ij}^{XY}\}$ are association parameters:

$\lambda_{ij}^{XY} = 0$ for all $i, j$ $\Rightarrow$ $\log(\theta) = 0$ $\Rightarrow$ $X$ and $Y$ indep.

Parameter              Number Nonredundant
$\lambda$              $1$
$\lambda_i^X$          $I - 1$          (e.g., can set $\lambda_I^X = 0$)
$\lambda_j^Y$          $J - 1$          ($\lambda_J^Y = 0$)
$\lambda_{ij}^{XY}$    $(I-1)(J-1)$     (no. of products of dummy vars)

Total                  $IJ$

Note. For a Poisson loglinear model, the “residual” degrees of freedom are

df = number of cells (= no. of Poisson observations) − number nonredundant parameters

The test of independence using $X^2$ or $G^2$ is simply a goodness-of-fit test of the indep. loglinear model.

The model allowing association,

$$\log(\mu_{ij}) = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY},$$

has

$$\text{df} = IJ - [1 + (I - 1) + (J - 1) + (I - 1)(J - 1)] = 0.$$

It is saturated, giving a perfect fit to any data set.
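The parameter count and residual df can be tabulated mechanically. The following Python sketch is illustrative only (the helper names `nonredundant_params` and `residual_df` are not from the handout); it counts one intercept, $I-1$ row effects, $J-1$ column effects and, optionally, $(I-1)(J-1)$ association terms.

```python
def nonredundant_params(n_rows, n_cols, association=False):
    """Number of nonredundant parameters in a two-way loglinear model."""
    count = 1 + (n_rows - 1) + (n_cols - 1)       # lambda, row effects, column effects
    if association:
        count += (n_rows - 1) * (n_cols - 1)      # lambda^{XY} terms
    return count


def residual_df(n_rows, n_cols, association=False):
    """Residual df = number of cells minus number of nonredundant parameters."""
    return n_rows * n_cols - nonredundant_params(n_rows, n_cols, association)


# 4 x 4 income by job satisfaction table
print(residual_df(4, 4))                    # independence model
print(residual_df(4, 4, association=True))  # saturated model
```

For the $4 \times 4$ table this gives 9 df for the independence model and 0 df for the saturated model, matching the GENMOD listings.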

Example (Income and Job Satisfaction ($4 \times 4$)).

❏ Independence model,

$$\log(\mu_{ij}) = \lambda + \lambda_i^I + \lambda_j^S$$

has goodness of fit statistics $X^2 = 11.5$, $G^2 = 13.5$, df $= 9$.

❏ Saturated model has $G^2 = X^2 = 0$ (df $= 0$).

Estimated odds ratio using highest and lowest categories for each variable is

$$\frac{\hat\mu_{11}\hat\mu_{44}}{\hat\mu_{14}\hat\mu_{41}} = \exp[\log(\hat\mu_{11}) + \log(\hat\mu_{44}) - \log(\hat\mu_{14}) - \log(\hat\mu_{41})]$$

$$= \exp(\hat\lambda_{11}^{IS} + \hat\lambda_{44}^{IS} - \hat\lambda_{14}^{IS} - \hat\lambda_{41}^{IS}) = \exp(24.2877) \approx 3.5 \times 10^{10}.$$

Note that the theoretical value should be $\infty$, since this is a saturated model and

$$\frac{n_{11}\,n_{44}}{n_{14}\,n_{41}} = \frac{2 \times 8}{3 \times 0} = \infty.$$

Loglinear Models for Three-Way Tables

Suppose $X$, $Y$, and $Z$ are categorical variables with $I$, $J$, and $K$ levels, respectively, so that observations can be recorded in an $I \times J \times K$ contingency table.

Two-factor terms in a loglinear model represent conditional log odds ratios at a fixed level of the third variable.

Example. Consider the loglinear model

$$\log(\mu_{ijk}) = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}.$$

For a $2 \times 2 \times K$ table, this model satisfies

❏ $X$ and $Y$ are conditionally independent, given $Z$:

$$\log(\theta_{XY(k)}) = 0 \;\Rightarrow\; \theta_{XY(k)} = 1$$

❏ the $X$-$Z$ ($Y$-$Z$) odds ratio is the same at all levels of $Y$ ($X$), i.e., there is no three-factor interaction. E.g.,

$$\log(\theta_{X(j)Z}) = \lambda_{11}^{XZ} + \lambda_{22}^{XZ} - \lambda_{12}^{XZ} - \lambda_{21}^{XZ}$$

does not depend on $j$.

Denote this model by $(XZ, YZ)$, called the model of $X$-$Y$ conditional independence.

Example. Consider the loglinear model

$$\log(\mu_{ijk}) = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}.$$

This is called the model of homogeneous association, or no three-factor interaction, denoted by $(XY, XZ, YZ)$.

Each pair of variables is conditionally dependent, but association (as measured by odds ratios) is the same at all levels of the third variable.

Example. The model

$$\log(\mu_{ijk}) = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}$$

❏ permits three-factor interactions, and

❏ is saturated ($\hat\mu_{ijk} = n_{ijk}$).

Denote just by $(XYZ)$.

Berkeley Graduate Admissions Data

                      Gender
              Male              Female
            Admitted           Admitted
Dept.     Yes      No        Yes      No
  1       512     313         89      19
  2       353     207         17       8
  3       120     205        202     391
  4       138     279        131     244
  5        53     138         94     299
  6        22     351         24     317

Total    1198    1493        557    1278
       (44.5%) (55.5%)    (30.4%) (69.6%)

Let
$A$ = admission (yes, no)
$G$ = gender (M, F)
$D$ = dept. (1, 2, 3, 4, 5, 6)

(We consider loglinear models, and then logit models with response $A$.)

For $A$-$G$ marginal table,

        Admitted
       Yes      No
M     1198    1493
F      557    1278

$$\hat\theta = \frac{1198 \times 1278}{1493 \times 557} = 1.84$$

(Estimated odds of admission for males is nearly double that for females).

For $H_0$: $A$ and $G$ independent in $2 \times 2$ marginal table, $G^2 = 93.4$, df $= 1$. Very strong evidence of a higher admission rate for men.
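The marginal odds ratio and the likelihood-ratio statistic for the collapsed table can be recomputed directly from the department-level counts. The Python sketch below is not part of the handout; it collapses the six departments and evaluates $G^2 = 2\sum n \log(n/\hat\mu)$ with $\hat\mu$ taken from the independence fit.

```python
import math

# Rows are depts 1-6; columns are (male yes, male no, female yes, female no).
DEPTS = [
    (512, 313, 89, 19),
    (353, 207, 17, 8),
    (120, 205, 202, 391),
    (138, 279, 131, 244),
    (53, 138, 94, 299),
    (22, 351, 24, 317),
]

# Collapse over department to get the A-G marginal table.
male_yes, male_no, female_yes, female_no = (sum(col) for col in zip(*DEPTS))
odds_ratio = (male_yes * female_no) / (male_no * female_yes)

observed = [[male_yes, male_no], [female_yes, female_no]]
n = sum(sum(row) for row in observed)
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]

# Likelihood-ratio statistic for independence in the 2 x 2 marginal table.
g2 = 2 * sum(
    observed[i][j] * math.log(observed[i][j] * n / (row_tot[i] * col_tot[j]))
    for i in range(2)
    for j in range(2)
)

print(round(odds_ratio, 2), round(g2, 1))
```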

1. Loglinear model $(AD, DG)$ assumes $A$ and $G$ conditionally independent given $D$. E.g., for dept. 1,

$$\hat\theta_{AG(1)} = \frac{531.43 \times 38.43}{293.57 \times 69.57} = 1.0 = \hat\theta_{AG(2)} = \cdots = \hat\theta_{AG(6)}$$

2. Model $(AG, AD, DG)$ allows a (conditional) $A$-$G$ association, with the same odds ratio for each dept. E.g., for dept. 1,

$$\hat\theta_{AG(1)} = \frac{529.27 \times 36.27}{295.73 \times 71.73} = 0.90 = \hat\theta_{AG(2)} = \cdots = \hat\theta_{AG(6)}$$

$$= \exp(\hat\lambda_{11}^{AG} + \hat\lambda_{22}^{AG} - \hat\lambda_{12}^{AG} - \hat\lambda_{21}^{AG}) = \exp(-0.0999) = 0.90$$

Controlling for department, the estimated odds of admission for males equals 0.90 times the estimated odds for females.

Recall that the sample marginal $A$-$G$ odds ratio was 1.84. This ignores dept., rather than controlling for it (Simpson’s paradox).
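The reversal can also be seen without fitting any model, by comparing the raw sample odds ratio within each department to the marginal one. This Python sketch is an illustration added here, not part of the handout; note that it uses the observed per-department odds ratios, which differ from the single common estimate 0.90 implied by $(AG, AD, DG)$.

```python
# Rows are depts 1-6; columns are (male yes, male no, female yes, female no).
DEPTS = [
    (512, 313, 89, 19),
    (353, 207, 17, 8),
    (120, 205, 202, 391),
    (138, 279, 131, 244),
    (53, 138, 94, 299),
    (22, 351, 24, 317),
]


def odds_ratio(male_yes, male_no, female_yes, female_no):
    """Sample odds of admission for males relative to females."""
    return (male_yes * female_no) / (male_no * female_yes)


# Conditional (within-department) sample odds ratios.
conditional = [odds_ratio(*row) for row in DEPTS]
# Marginal odds ratio, ignoring department.
marginal = odds_ratio(*(sum(col) for col in zip(*DEPTS)))

for dept, theta in enumerate(conditional, start=1):
    print(f"dept {dept}: {theta:.2f}")
print(f"marginal: {marginal:.2f}")
```

Four of the six within-department odds ratios are below 1, while the marginal odds ratio is well above 1.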

data berkeley;
  input dept gender admit count @@;
  cards;
1 1 1 512   1 1 2 313   1 2 1  89   1 2 2  19
2 1 1 353   2 1 2 207   2 2 1  17   2 2 2   8
3 1 1 120   3 1 2 205   3 2 1 202   3 2 2 391
4 1 1 138   4 1 2 279   4 2 1 131   4 2 2 244
5 1 1  53   5 1 2 138   5 2 1  94   5 2 2 299
6 1 1  22   6 1 2 351   6 2 1  24   6 2 2 317
;
/* Loglinear model (AD, DG) */
proc genmod;
  class admit dept gender;
  model count = admit dept gender admit*dept dept*gender
        / dist=poi link=log obstats residuals;

/* Loglinear model (AG, AD, DG) */
proc genmod;
  class admit dept gender;
  model count = admit dept gender
                admit*gender admit*dept dept*gender
        / dist=poi link=log obstats residuals;

data logistic;
  input dep gen adm_yes total @@;
  cards;
1 1 512 825   1 2  89 108
2 1 353 560   2 2  17  25
3 1 120 325   3 2 202 593
4 1 138 417   4 2 131 375
5 1  53 191   5 2  94 393
6 1  22 373   6 2  24 341
;
/* Logit model with dept as the only predictor */
proc genmod data=logistic;
  class dep;
  model adm_yes/total = dep
        / dist=bin link=logit obstats residuals;

/* Logit model with dept and gender main effects */
proc genmod data=logistic;
  class dep gen;
  model adm_yes/total = dep gen / dist=bin link=logit;
run;

Loglinear Model (AD, DG)

Criteria For Assessing Goodness Of Fit

Criterion             DF        Value     Value/DF

Deviance               6      21.7355      3.6226
Pearson Chi-Square     6      19.9383      3.3231
Log Likelihood         .   20502.3379       .

Observation Statistics

COUNT       Pred    Reschi   StReschi

  512   531.4308   -0.8429    -4.1531
  313   293.5692    1.1341     4.1531
   89    69.5695    2.3296     4.1530
   19    38.4311   -3.1344    -4.1531
  353   354.1880   -0.0631    -0.5037
  207   205.8120    0.0828     0.5037
   17    15.8120    0.2988     0.5037
    8     9.1880   -0.3919    -0.5037
  120   113.9978    0.5622     0.8681
  205   211.0022   -0.4132    -0.8681
  202   208.0022   -0.4162    -0.8681
  391   384.9978    0.3059     0.8681
  138   141.6326   -0.3052    -0.5459
  279   275.3674    0.2189     0.5459
  131   127.3674    0.3219     0.5459
  244   247.6326   -0.2308    -0.5459
   53    48.0771    0.7100     1.0005
  138   142.9229   -0.4118    -1.0005
   94    98.9229   -0.4950    -1.0005
  299   294.0771    0.2871     1.0005
   22    24.0308   -0.4143    -0.6198
  351   348.9692    0.1087     0.6198
   24    21.9692    0.4333     0.6198
  317   319.0308   -0.1137    -0.6198

Loglinear Model (AG, AD, DG)

Criteria For Assessing Goodness Of Fit

Criterion             DF        Value     Value/DF

Deviance               5      20.2043      4.0409
Pearson Chi-Square     5      18.8242      3.7648
Log Likelihood         .   20503.1035       .

Analysis Of Parameter Estimates

Parameter            DF   Estimate   Std Err    ChiSquare   Pr>Chi

INTERCEPT             1     5.7619    0.0552   10898.8902   0.0001
ADMIT         1       1    -2.6246    0.1577     276.8824   0.0001
ADMIT         2       0     0.0000    0.0000        .        .
DEPT          1       1    -2.1709    0.1280     287.7591   0.0001
DEPT          2       1    -3.6056    0.2202     268.1010   0.0001
DEPT          3       1     0.1789    0.0734       5.9326   0.0149
DEPT          4       1    -0.2680    0.0809      10.9740   0.0009
DEPT          5       1    -0.0863    0.0788       1.1995   0.2734
DEPT          6       0     0.0000    0.0000        .        .
GENDER        1       1     0.0961    0.0751       1.6383   0.2006
GENDER        2       0     0.0000    0.0000        .        .
ADMIT*GENDER  1  1    1    -0.0999    0.0808       1.5260   0.2167
ADMIT*GENDER  1  2    0     0.0000    0.0000        .        .
ADMIT*GENDER  2  1    0     0.0000    0.0000        .        .
ADMIT*GENDER  2  2    0     0.0000    0.0000        .        .
ADMIT*DEPT    1  1    1     3.3065    0.1700     378.3789   0.0001
ADMIT*DEPT    1  2    1     3.2631    0.1788     333.1189   0.0001
ADMIT*DEPT    1  3    1     2.0439    0.1679     148.2433   0.0001
ADMIT*DEPT    1  4    1     2.0119    0.1699     140.1808   0.0001
ADMIT*DEPT    1  5    1     1.5672    0.1804      75.4378   0.0001
ADMIT*DEPT    1  6    0     0.0000    0.0000        .        .
ADMIT*DEPT    2  1    0     0.0000    0.0000        .        .
ADMIT*DEPT    2  2    0     0.0000    0.0000        .        .
ADMIT*DEPT    2  3    0     0.0000    0.0000        .        .
ADMIT*DEPT    2  4    0     0.0000    0.0000        .        .
ADMIT*DEPT    2  5    0     0.0000    0.0000        .        .
ADMIT*DEPT    2  6    0     0.0000    0.0000        .        .
DEPT*GENDER   1  1    1     2.0023    0.1357     217.6813   0.0001
DEPT*GENDER   1  2    0     0.0000    0.0000        .        .
DEPT*GENDER   2  1    1     3.0771    0.2229     190.6312   0.0001
DEPT*GENDER   2  2    0     0.0000    0.0000        .        .
DEPT*GENDER   3  1    1    -0.6628    0.1044      40.3402   0.0001
DEPT*GENDER   3  2    0     0.0000    0.0000        .        .
DEPT*GENDER   4  1    1     0.0440    0.1057       0.1731   0.6774
DEPT*GENDER   4  2    0     0.0000    0.0000        .        .
DEPT*GENDER   5  1    1    -0.7929    0.1167      46.1874   0.0001
DEPT*GENDER   5  2    0     0.0000    0.0000        .        .
DEPT*GENDER   6  1    0     0.0000    0.0000        .        .
DEPT*GENDER   6  2    0     0.0000    0.0000        .        .

Observation Statistics

COUNT       Pred    Reschi   StReschi

  512   529.2699   -0.7507    -4.0273
  313   295.7301    1.0042     4.0273
   89    71.7303    2.0391     4.0272
   19    36.2701   -2.8676    -4.0273
  353   353.6395   -0.0340    -0.2797
  207   206.3605    0.0445     0.2797
   17    16.3605    0.1581     0.2797
    8     8.6395   -0.2176    -0.2797
  120   109.2453    1.0290     1.8808
  205   215.7547   -0.7322    -1.8808
  202   212.7547   -0.7373    -1.8808
  391   380.2453    0.5515     1.8808
  138   137.2074    0.0677     0.1413
  279   279.7926   -0.0474    -0.1413
  131   131.7926   -0.0690    -0.1413
  244   243.2074    0.0508     0.1413
   53    45.6808    1.0829     1.6335
  138   145.3192   -0.6072    -1.6335
   94   101.3192   -0.7271    -1.6335
  299   291.6808    0.4286     1.6335
   22    22.9571   -0.1998    -0.3026
  351   350.0429    0.0512     0.3026
   24    23.0429    0.1994     0.3026
  317   317.9571   -0.0537    -0.3026

Logistic Regression: A resp., D pred.

Criteria For Assessing Goodness Of Fit

Criterion             DF        Value     Value/DF

Deviance               6      21.7355      3.6226
Pearson Chi-Square     6      19.9384      3.3231
Log Likelihood         .   -2594.5099       .

Analysis Of Parameter Estimates

Parameter     DF   Estimate   Std Err   ChiSquare   Pr>Chi

INTERCEPT      1    -2.6756    0.1524    308.1014   0.0001
DEP       1    1     3.2691    0.1671    382.8830   0.0001
DEP       2    1     3.2185    0.1749    338.6341   0.0001
DEP       3    1     2.0600    0.1674    151.4450   0.0001
DEP       4    1     2.0108    0.1699    140.0704   0.0001
DEP       5    1     1.5861    0.1798     77.8249   0.0001
DEP       6    0     0.0000    0.0000       .        .

Observation Statistics

ADM_YES   TOTAL     Pred    Lower    Upper    Reschi   StReschi

    512     825   0.6442   0.6129   0.6743   -1.4130    -4.1531
     89     108   0.6442   0.6129   0.6743    3.9053     4.1531
    353     560   0.6325   0.5926   0.6706   -0.1041    -0.5037
     17      25   0.6325   0.5926   0.6706    0.4928     0.5037
    120     325   0.3508   0.3206   0.3822    0.6977     0.8681
    202     593   0.3508   0.3206   0.3822   -0.5165    -0.8681
    138     417   0.3396   0.3075   0.3734   -0.3756    -0.5459
    131     375   0.3396   0.3075   0.3734    0.3961     0.5459
     53     191   0.2517   0.2182   0.2885    0.8208     1.0005
     94     393   0.2517   0.2182   0.2885   -0.5722    -1.0005
     22     373   0.0644   0.0486   0.0850   -0.4283    -0.6198
     24     341   0.0644   0.0486   0.0850    0.4479     0.6198

Logistic Regression: A resp., D, G pred.

Criteria For Assessing Goodness Of Fit

Criterion             DF        Value     Value/DF

Deviance               5      20.2043      4.0409
Pearson Chi-Square     5      18.8243      3.7649
Log Likelihood         .   -2593.7442       .

Analysis Of Parameter Estimates

Parameter     DF   Estimate   Std Err   ChiSquare   Pr>Chi

INTERCEPT      1    -2.6246    0.1577    276.8823   0.0001
DEP       1    1     3.3065    0.1700    378.3789   0.0001
DEP       2    1     3.2631    0.1788    333.1190   0.0001
DEP       3    1     2.0439    0.1679    148.2433   0.0001
DEP       4    1     2.0119    0.1699    140.1808   0.0001
DEP       5    1     1.5672    0.1804     75.4378   0.0001
DEP       6    0     0.0000    0.0000       .        .
GEN       1    1    -0.0999    0.0808      1.5260   0.2167
GEN       2    0     0.0000    0.0000       .        .

Goodness of Fit

To test $H_0$: “model holds”, compare $\{n_{ijk}\}$ to $\{\hat\mu_{ijk}\}$ using $G^2$, $X^2$.

df = no. counts − no. nonredundant parameters

Example 1 (Berkeley Admissions Data).

Model (AD,DG)

$$\log(\mu_{ijk}) = \lambda + \lambda_i^A + \lambda_j^G + \lambda_k^D + \lambda_{ik}^{AD} + \lambda_{jk}^{GD}$$

$G^2 = 21.7$, $X^2 = 19.9$, df $= 24 - (1 + 1 + 1 + 5 + 5 + 5) = 6$

P-value $= 0.0014$ ($G^2$), P-value $= 0.0028$ ($X^2$)

Model gives a poor fit, i.e., $A$ and $G$ are not conditionally independent given $D$.

Model (AG,AD,GD)

$G^2 = 20.2$, $X^2 = 18.8$, df $= 24 - (1 + 1 + 1 + 5 + 1 + 5 + 5) = 5$

P-value $= 0.0011$ ($G^2$), P-value $= 0.0021$ ($X^2$)

So even the most complicated unsaturated loglinear model fits these data poorly.
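The fit of (AD,DG) needs no iterative software, because conditional independence within each department gives closed-form fitted values $\hat\mu_{ijk} = n_{i+k}\,n_{+jk}/n_{++k}$. The Python sketch below is an added illustration, not the handout's method (the handout uses GENMOD). The helper names are hypothetical, and the tail probability uses the closed-form chi-square upper tail that holds only for even df.

```python
import math

# Rows are depts 1-6; columns are (male yes, male no, female yes, female no).
DEPTS = [
    (512, 313, 89, 19),
    (353, 207, 17, 8),
    (120, 205, 202, 391),
    (138, 279, 131, 244),
    (53, 138, 94, 299),
    (22, 351, 24, 317),
]


def fit_conditional_independence(dept_rows):
    """Fitted counts under (AD, DG): A and G independent within each dept."""
    fitted = []
    for my, mn, fy, fn in dept_rows:
        total = my + mn + fy + fn
        yes, no = my + fy, mn + fn
        male, female = my + mn, fy + fn
        fitted.append((male * yes / total, male * no / total,
                       female * yes / total, female * no / total))
    return fitted


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x); this closed form is valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))


fitted = fit_conditional_independence(DEPTS)
pairs = [(o, e) for obs, fit in zip(DEPTS, fitted) for o, e in zip(obs, fit)]
g2 = 2 * sum(o * math.log(o / e) for o, e in pairs)
x2 = sum((o - e) ** 2 / e for o, e in pairs)
df = len(pairs) - (1 + 1 + 1 + 5 + 5 + 5)   # 24 cells minus 18 parameters
p_g2 = chi2_upper_tail_even_df(g2, df)
p_x2 = chi2_upper_tail_even_df(x2, df)
print(round(g2, 4), round(x2, 4), df)
```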

Residual Analysis

Recall for Poisson counts, the Pearson residual and the adjusted Pearson residual are

$$e_{ijk} = \frac{n_{ijk} - \hat\mu_{ijk}}{\sqrt{\hat\mu_{ijk}}}$$

and

$$r_{ijk} = \frac{n_{ijk} - \hat\mu_{ijk}}{\widehat{\text{SE}}(n_{ijk} - \hat\mu_{ijk})}.$$

If model holds, dist. of $r_{ijk}$ is approx. std normal ($N(0, 1)$).

Example (Model (AD,DG)). ($A$ and $G$ cond. indep.).

Consider, e.g., the count for females not admitted in dept. 1: $n_{ijk} = 19$, $\hat\mu_{ijk} = 38.43$,

$$e_{ijk} = \frac{19 - 38.43}{\sqrt{38.43}} = -3.13, \qquad r_{ijk} = -4.15.$$

The no. of females not admitted is much lower than expected if $A$ indep. of $G$, controlling for $D$ (analogous results hold for other cells for this dept.).

Residuals show lack of fit only for dept. 1 (see SAS output).

If we re-fit the model (AD,DG) to the $2 \times 2 \times 5$ table for dept.’s 2–6, get $G^2 = 2.68$, df $= 5$, indicating a very good fit.

So except for dept. 1, admission does not appear to depend on gender. In dept. 1, the probability of acceptance is higher for females than males.
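The two residuals for the dept. 1 cell can be reproduced from the dept. 1 counts alone. The Python sketch below is an added illustration. It assumes that, for (AD,DG), the adjusted residual reduces to the usual standardized residual for independence within the department, $(n - \hat\mu)/\sqrt{\hat\mu\,(1 - p_{\text{row}})(1 - p_{\text{col}})}$; that assumption reproduces the `StReschi` value in the SAS listing for this cell.

```python
import math

# Dept. 1 counts from the Berkeley table.
male_yes, male_no, female_yes, female_no = 512, 313, 89, 19

dept_total = male_yes + male_no + female_yes + female_no
not_admitted = male_no + female_no      # margin for admit = no
females = female_yes + female_no        # margin for gender = female

# Fitted count for (not admitted, female) under independence within dept. 1.
mu_hat = not_admitted * females / dept_total

pearson = (female_no - mu_hat) / math.sqrt(mu_hat)

# Standardized (adjusted) residual for independence in a two-way table,
# applied within dept. 1 (assumption stated in the lead-in).
adjusted = (female_no - mu_hat) / math.sqrt(
    mu_hat * (1 - not_admitted / dept_total) * (1 - females / dept_total)
)

print(round(mu_hat, 2), round(pearson, 2), round(adjusted, 2))
```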

Inference About Partial Associations

LRT’s about parameters in loglinear models compare deviances for models with and w/out those parameters.

E.g., in model (XY,XZ,YZ), testing $H_0$: “all $\lambda_{ij}^{XY} = 0$” corresponds to comparing this model to (XZ,YZ). The LRT test statistic is

$$G^2[(XZ,YZ) \mid (XY,XZ,YZ)] = G^2(XZ,YZ) - G^2(XY,XZ,YZ).$$

Example. Test $H_0$: “all $\lambda_{ij}^{AG} = 0$” in (AG,AD,DG):

$$\log(\mu_{ijk}) = \lambda + \lambda_i^A + \lambda_j^G + \lambda_k^D + \lambda_{ij}^{AG} + \lambda_{ik}^{AD} + \lambda_{jk}^{GD}$$

$$G^2[(AD,DG) \mid (AG,AD,DG)] = G^2(AD,DG) - G^2(AG,AD,DG) = 21.74 - 20.20 = 1.53$$

df $= 6 - 5 = 1$, P-value $= 0.22$

According to this test, $H_0$ is plausible.

Note however that this test assumes that the model (AG,AD,DG) is correct, while in fact we have strong evidence to the contrary. Thus this example is only meant to be illustrative of the method and should not otherwise be taken too seriously.
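The deviance comparison can be reproduced without SAS. The Python sketch below is an added illustration, not the handout's method: it fits (AD,DG) in closed form and fits (AG,AD,DG) by iterative proportional fitting (IPF), a standard algorithm for loglinear models that has no closed-form solution here. Function names and the fixed number of sweeps are assumptions chosen for simplicity.

```python
import math

# Rows are depts 1-6; columns are (male yes, male no, female yes, female no).
DEPTS = [
    (512, 313, 89, 19), (353, 207, 17, 8), (120, 205, 202, 391),
    (138, 279, 131, 244), (53, 138, 94, 299), (22, 351, 24, 317),
]
I, J, K = 2, 2, 6   # admit (yes, no), gender (male, female), dept
# n[i][j][k] with i = admit, j = gender, k = dept
n = [[[DEPTS[k][2 * j + i] for k in range(K)] for j in range(J)] for i in range(I)]


def deviance(obs, fit):
    """G^2 = 2 * sum n*log(n/mu); fine here since every count is positive."""
    return 2 * sum(obs[i][j][k] * math.log(obs[i][j][k] / fit[i][j][k])
                   for i in range(I) for j in range(J) for k in range(K))


def fit_ad_dg(obs):
    """Closed-form fit of (AD, DG): mu_ijk = n_{i+k} * n_{+jk} / n_{++k}."""
    fit = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for k in range(K):
        total = sum(obs[a][g][k] for a in range(I) for g in range(J))
        for i in range(I):
            n_ik = sum(obs[i][g][k] for g in range(J))
            for j in range(J):
                n_jk = sum(obs[a][j][k] for a in range(I))
                fit[i][j][k] = n_ik * n_jk / total
    return fit


def fit_ag_ad_dg(obs, sweeps=500):
    """IPF for (AG, AD, DG): rescale to match each two-way margin in turn."""
    mu = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(sweeps):
        for i in range(I):                      # match the A-G margin
            for j in range(J):
                scale = sum(obs[i][j]) / sum(mu[i][j])
                mu[i][j] = [m * scale for m in mu[i][j]]
        for i in range(I):                      # match the A-D margin
            for k in range(K):
                scale = (sum(obs[i][g][k] for g in range(J))
                         / sum(mu[i][g][k] for g in range(J)))
                for g in range(J):
                    mu[i][g][k] *= scale
        for j in range(J):                      # match the D-G margin
            for k in range(K):
                scale = (sum(obs[a][j][k] for a in range(I))
                         / sum(mu[a][j][k] for a in range(I)))
                for a in range(I):
                    mu[a][j][k] *= scale
    return mu


g2_reduced = deviance(n, fit_ad_dg(n))
g2_full = deviance(n, fit_ag_ad_dg(n))
lrt = g2_reduced - g2_full
# Upper tail of chi-square with 1 df: P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lrt / 2))
print(round(g2_reduced, 4), round(g2_full, 4), round(lrt, 2), round(p_value, 2))
```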

Recall for model (AG,AD,DG),

$$\hat\theta_{AG(k)} = \exp(\hat\lambda_{11}^{AG} + \hat\lambda_{22}^{AG} - \hat\lambda_{12}^{AG} - \hat\lambda_{21}^{AG}) = \exp(\hat\lambda_{11}^{AG}) \quad \text{(for GENMOD coding)}$$

$$= \exp(-0.0999) = 0.90.$$

A 95% CI for $\lambda_{11}^{AG}$ is

$$-0.0999 \pm 1.96\,(0.0808) = (-0.258,\ 0.058)$$

and a 95% CI for $\theta_{AG(k)} = \exp(\lambda_{11}^{AG})$ is

$$(e^{-0.258},\ e^{0.058}) = (0.77,\ 1.06).$$

Thus it is plausible that $\theta_{AG(k)} = 1$.

Again, this is only meant for illustration and should not otherwise be taken too seriously, since the model (AG,AD,DG) does not fit these data well.
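The Wald interval arithmetic is short enough to check directly. This Python sketch is an added illustration that takes the estimate and standard error for `ADMIT*GENDER 1 1` from the GENMOD listing and uses 1.96 as the normal quantile.

```python
import math

estimate = -0.0999   # lambda-hat_11^{AG} from the GENMOD output
std_err = 0.0808     # its reported standard error
z = 1.96             # approximate 97.5th percentile of N(0, 1)

# Wald interval on the log odds ratio scale.
lower = estimate - z * std_err
upper = estimate + z * std_err

# Exponentiate the endpoints to get an interval for the odds ratio.
or_lower, or_upper = math.exp(lower), math.exp(upper)

print(round(lower, 3), round(upper, 3))
print(round(or_lower, 2), round(or_upper, 2))
```

Because the interval for the odds ratio contains 1, it agrees with the likelihood-ratio test in not rejecting conditional independence of admission and gender.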

Note. ❏ Loglinear models extend to any no. of dimensions (see Sec. 6.4), e.g., for a four-way table with vars. $W$, $X$, $Y$, $Z$, the model

$$(WX, WY, WZ, XY, XZ, YZ)$$

permits assoc. for each pair of variables, but allows no three-factor interactions.

❏ Loglinear models treat all variables as responses (natural for data such as book’s example on use of alcohol, tobacco, and marijuana). Logistic regression models treat a binary var. $Y$ as a response and other variables as explanatory vars., which is more natural when there really is a single response of interest, as in our current example (admission).

The Loglinear-Logit Connection

The loglinear model $(XY, XZ, YZ)$, i.e.,

$$\log(\mu_{ijk}) = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$$

❏ treats variables symmetrically

❏ permits association for each pair of vars.

❏ allows no three-factor association (i.e., implies homogeneous association)

Suppose $Y$ is binary, and

$$\pi_{ik} = P(Y = 1 \mid X = i, Z = k).$$

If model $(XY, XZ, YZ)$ holds, then

$$\operatorname{logit}(\pi_{ik}) = \log\left(\frac{\pi_{ik}}{1 - \pi_{ik}}\right) = \log\left(\frac{P(Y = 1 \mid X = i, Z = k)}{P(Y = 2 \mid X = i, Z = k)}\right) = \log(\mu_{i1k}) - \log(\mu_{i2k})$$

$$= (\lambda + \lambda_i^X + \lambda_1^Y + \lambda_k^Z + \lambda_{i1}^{XY} + \lambda_{ik}^{XZ} + \lambda_{1k}^{YZ}) - (\lambda + \lambda_i^X + \lambda_2^Y + \lambda_k^Z + \lambda_{i2}^{XY} + \lambda_{ik}^{XZ} + \lambda_{2k}^{YZ})$$

$$= (\lambda_1^Y - \lambda_2^Y) + (\lambda_{i1}^{XY} - \lambda_{i2}^{XY}) + (\lambda_{1k}^{YZ} - \lambda_{2k}^{YZ})$$

$$= \alpha + \beta_i^X + \beta_k^Z,$$

i.e., logit model has additive main effects and no interaction.

Example (Berkeley Graduate Admissions). Let $A$ = admission (yes/no) be response var. Logit model

$$\operatorname{logit}(\pi_{jk}) = \alpha + \beta_j^G + \beta_k^D$$

has goodness-of-fit $G^2 = 20.2$ (df $= 5$), identical to loglinear model $(AG, AD, DG)$.

Est.’d odds ratio for effect of $G$ on $A$, controlling for $D$, is

$$\exp(\hat\beta_1^G - \hat\beta_2^G) = \exp(-0.0999) = 0.90$$

(identical to $\exp(\hat\lambda_{11}^{AG} + \hat\lambda_{22}^{AG} - \hat\lambda_{12}^{AG} - \hat\lambda_{21}^{AG})$).

Loglinear model treats table as 24 indep. Poisson variates.

Logit model treats table as 12 indep. binomial variates on response $A$ at 12 combinations of levels of $D$ and $G$.

Note. The df for testing fit are the same for each model:

Logit model

no. obs. $= 12$

no. param. $= 1 + 1 + 5 = 7$

(residual) df $= 12 - 7 = 5$

Loglinear model

no. obs. $= 24$

no. param. $= 1 + 1 + 1 + 5 + 1 + 5 + 5 = 19$

(residual) df $= 24 - 19 = 5$
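The correspondence can be checked numerically from the two SAS listings: the logit-model estimates coincide with the loglinear estimates that involve `ADMIT`, and the implied logit equals the log ratio of the loglinear fitted counts. The Python sketch below is an added illustration; the dictionary and function names are hypothetical, and the numbers are copied from the GENMOD output for $(AG, AD, DG)$.

```python
import math

# Loglinear (AG, AD, DG) estimates involving ADMIT = 1 (yes), GENMOD coding.
ADMIT = -2.6246
ADMIT_GENDER = {1: -0.0999, 2: 0.0}
ADMIT_DEPT = {1: 3.3065, 2: 3.2631, 3: 2.0439, 4: 2.0119, 5: 1.5672, 6: 0.0}


def logit_from_loglinear(gender, dept):
    """log(mu_yes / mu_no): every term not involving A cancels."""
    return ADMIT + ADMIT_GENDER[gender] + ADMIT_DEPT[dept]


# Fitted counts (admitted, not admitted) from the loglinear listing.
fitted = {
    (1, 1): (529.2699, 295.7301),   # males, dept. 1
    (2, 6): (23.0429, 317.9571),    # females, dept. 6
}

for (gender, dept), (mu_yes, mu_no) in fitted.items():
    from_params = logit_from_loglinear(gender, dept)
    from_counts = math.log(mu_yes / mu_no)
    print(gender, dept, round(from_params, 3), round(from_counts, 3))
```

The same three sets of numbers (`INTERCEPT`, `DEP`, `GEN`) appear in the logistic regression listing with predictors D and G, which is the sense in which the two models are equivalent.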

Note. ❏ For a given logit model, the loglinear model that is equivalent (same goodness of fit, df, fitted values, etc.) has the associations of $Y$ with explanatory var.’s implied by the logit model, and has the fullest interaction term among explanatory var.’s.

Example. $Y$ = response, predictors $W$, $X$, $Z$ (4-way table).

$$\operatorname{logit}(\pi) = \alpha + \beta_h^W + \beta_i^X + \beta_k^Z$$

corresponds to loglinear model $(WY, XY, ZY, WXZ)$.

❏ When there is a single binary response, simplest to approach data directly using logit models.

❏ Loglinear models have advantage of generality: they can handle multiple responses, some of which may have more than two outcome categories.