Biom. J. 34 (1992) 7, 781-788 Akademie Verlag

On Estimation of Relative Risk in Case Control Studies

PADAM SINGH & ABHA RANI AGGARWAL

Institute for Research in Medical Statistics (ICMR)

Summary

An estimator of relative risk in a case control study is proposed in terms of the observed cell frequencies and the probability of disease. The bias of the usual estimator, i.e. the odds ratio, as compared to the new estimator has been worked out. The expression for the Mean Square Error of the proposed estimator has been derived both when the probability of disease is exactly known and when it is estimated through an independent survey. It has been observed that there is a serious error in using the odds ratio as an estimate of relative risk when the probability of disease is not negligible. In such situations the proposed estimator can be used with advantage.

Key words: Bias; Case control study; Mean square error; Relative risk.

1. Introduction

The case control study may be defined as the method of epidemiological investigation in which the frequency of an attribute, or of exposure to an environmental factor, among cases (individuals with the disease) is compared with that among non-diseased individuals serving as controls. If a higher frequency of individuals with the characteristic is found among cases than among controls, an association between the disease and the characteristic may be inferred. CORNFIELD (1951) showed that it is possible to use the relative frequency data of this type of case control study to estimate the relative risk. He proposed the odds ratio as an estimator of relative risk. He, however, indicated that the odds ratio provides an estimate of relative risk only when the disease under study has a low prevalence/incidence in the population. But there are diseases for which the incidence/prevalence rate is high. NEUTRA and DROLETTE (1978) dealt with unbiased estimation of the exposure specific rates without making any rare disease assumption in three types of case control studies, which differ in the methods by which cases and non cases are selected. KUPPER et al. (1975) proposed an estimate of relative risk without any rare disease assumption, utilizing the probability of exposure. However, they derived expressions for the confidence interval and not for the variance of the proposed estimator of relative risk. FLANDERS et al. (1986) considered the exposure odds ratio in nested case control studies with competing risks. HOGUE et al. (1983) suggested modifying the odds ratio using any one of the following pieces of information:

(1) Overall probability of disease.
(2) Probability of disease in the unexposed population.
(3) Probability of disease in the exposed population.
(4) Overall probability of exposure.

They suggested three estimators corresponding to situations (2) to (4) above. However, HOGUE et al. (1983) did not derive the expression for the variance/mean square error of their estimates explicitly in terms of cell frequencies, in the absence of which it is difficult to study the extent of error in the estimation of relative risk as well as the implications for tests of significance. In the present paper an estimator of relative risk is proposed in terms of the observed cell frequencies and the overall probability of disease, on the lines of HOGUE et al. (1983). The standard error associated with the estimator of the overall probability of disease has also been considered while working out the standard error of the proposed estimator. The bias of the odds ratio as compared to the proposed estimator has been worked out. Also, the extent of underestimation of the Mean Square Error (MSE) by using the odds ratio vis-a-vis the variance/MSE of the proposed estimator has been investigated. The bias in the estimation of relative risk and the MSE/variance have implications not only for the magnitude of the relative risk but also for its significance.

2. Estimation Procedure

Consider a population consisting of N units of which $N_1$ are exposed to an environment and the rest ($N_2 = N - N_1$) are not exposed. After a certain time, suppose A out of the $N_1$ exposed and C out of the $N_2$ unexposed develop the disease. This can be represented as under:

             With disease    Without disease      Total
Exposed      A               B (= N_1 - A)        N_1
Unexposed    C               D (= N_2 - C)        N_2
Total        A + C           B + D                A + B + C + D = N

The parameter of interest is the relative risk

$$\theta = \frac{A/(A+B)}{C/(C+D)} \tag{1}$$

For this, ideally, prospective studies should be used, in which samples of exposed and unexposed individuals are taken and followed up. At the end of the follow up, the numbers of exposed and unexposed individuals developing the disease are observed. But since prospective studies are costly and time consuming, retrospective or case control studies are generally preferred. In these, samples of cases (i.e., diseased persons) and controls (i.e., undiseased persons) are taken independently and the number of exposed in each is recorded retrospectively. Thus, essentially, the samples of n cases and n' controls are samples from populations of sizes (A + C) and (B + D) respectively. Suppose that in a sample of n cases a are observed to be exposed to the environment, and in the sample of n' controls b are observed to be exposed; then we have the following:

             Disease         No disease
Exposed      a               b
Unexposed    c (= n - a)     d (= n' - b)
Total        n = a + c       n' = b + d

Here a/(a + c) and b/(b + d) are unbiased estimates of A/(A + C) and B/(B + D) respectively. The estimator proposed by CORNFIELD (1951), known as the odds ratio, is given by

$$OR = \frac{ad}{bc} \tag{2}$$

The estimate of the relative S.E. of the estimator OR is given by

$$RSE = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \tag{3}$$

As already mentioned (2) estimates (1) only in situations where the disease under investigation is rare.
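As a small numerical illustration, the odds ratio and its relative standard error can be computed directly from the four cell counts. The sketch below assumes the relative S.E. is the square root of the sum of the reciprocal cell counts; the function names are illustrative only, and the counts are those of Population 1 in Table 1.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio ad/(bc).

    a, c: exposed and unexposed cases; b, d: exposed and unexposed controls.
    """
    return (a * d) / (b * c)


def relative_se_odds_ratio(a, b, c, d):
    """Relative standard error of the odds ratio (assumed square-root form)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Cell counts of Population 1 in Table 1.
or_hat = odds_ratio(60, 60, 65, 80)
rse = relative_se_odds_ratio(60, 60, 65, 80)
var_or = (or_hat * rse) ** 2  # variance = (estimate x relative S.E.)^2

print(round(or_hat, 2), round(var_or, 2))  # 1.23 0.09
```

The two printed values agree with the OR and V(OR) columns reported for Population 1.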

3. Proposed Estimator

Consider the new estimator of $\theta$ as under:

$$\hat{\theta} = \frac{a\,(Pbc + Qad + cd)}{c\,(Pad + Qbc + ab)} \tag{4}$$

where P = incidence of disease in the population and Q = 1 - P. Here P = (A + C)/N and Q = (B + D)/N.
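The proposed estimator is straightforward to evaluate once P is supplied. The following is a minimal sketch, assuming the estimator has the form aX/(cW) with W = Pad + ab + Qbc and X = Pbc + Qad + cd (the quantities W and X that appear later in the MSE expressions); the function name `proposed_relative_risk` is hypothetical.

```python
def proposed_relative_risk(a, b, c, d, P):
    """Relative risk estimate from case-control counts and disease probability P.

    a, c: exposed and unexposed cases; b, d: exposed and unexposed controls.
    """
    Q = 1.0 - P
    W = P * a * d + a * b + Q * b * c
    X = P * b * c + Q * a * d + c * d
    return (a * X) / (c * W)


# Population 1 of Table 1 with P = 0.2.
theta_hat = proposed_relative_risk(60, 60, 65, 80, 0.2)
print(round(theta_hat, 2))  # 1.18

# With P = 0 the estimate coincides with the odds ratio ad/(bc).
theta_rare = proposed_relative_risk(60, 60, 65, 80, 0.0)
print(round(theta_rare, 2))  # 1.23
```

As P shrinks towards zero the estimate moves towards the odds ratio, which is the rare disease case.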

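Dividing numerator and denominator by $(a + c)^2 (b + d)$ we get

$$\hat{\theta} = \frac{\dfrac{a}{a+c}\left\{P\,\dfrac{b}{b+d}\cdot\dfrac{c}{a+c} + Q\,\dfrac{a}{a+c}\cdot\dfrac{d}{b+d} + \dfrac{c}{a+c}\cdot\dfrac{d}{b+d}\right\}}{\dfrac{c}{a+c}\left\{P\,\dfrac{a}{a+c}\cdot\dfrac{d}{b+d} + Q\,\dfrac{b}{b+d}\cdot\dfrac{c}{a+c} + \dfrac{a}{a+c}\cdot\dfrac{b}{b+d}\right\}} \tag{5}$$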


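Let

$$\tau = \frac{A}{A+C}, \qquad \tau' = \frac{B}{B+D}, \qquad \rho = \frac{C}{A+C}, \qquad \rho' = \frac{D}{B+D}.$$

Obviously $\rho = 1 - \tau$ and $\rho' = 1 - \tau'$.

Putting

$$\frac{a}{a+c} = \tau + \varepsilon, \qquad \frac{c}{a+c} = \rho - \varepsilon, \qquad \text{where } E(\varepsilon) = 0 \text{ and } E(\varepsilon^2) = \frac{\tau(1-\tau)}{n},$$

$$\frac{b}{b+d} = \tau' + \varepsilon', \qquad \frac{d}{b+d} = \rho' - \varepsilon', \qquad \text{where } E(\varepsilon') = 0 \text{ and } E(\varepsilon'^2) = \frac{\tau'(1-\tau')}{n'}.$$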

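By putting these values in equation (5) we get

$$\hat{\theta} = \frac{\tau}{\rho}\cdot\frac{U}{V}\left\{1 + \frac{UV - P\tau\rho}{\tau\rho\,UV}\,\varepsilon - \frac{Q}{UV}\,\varepsilon' + \text{terms of higher order in } \varepsilon \text{ and } \varepsilon'\right\}$$

where

$$U = P\tau'\rho + Q\tau\rho' + \rho\rho', \qquad V = P\tau\rho' + Q\tau'\rho + \tau\tau'.$$

Therefore

$$E(\hat{\theta}) = \frac{\tau}{\rho}\cdot\frac{U}{V}$$

up to the first order of approximation. Putting the values of U and V we get

$$E(\hat{\theta}) = \frac{\tau\,(P\tau'\rho + Q\tau\rho' + \rho\rho')}{\rho\,(P\tau\rho' + Q\tau'\rho + \tau\tau')} \tag{6}$$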

Now putting back the values of $\tau$, $\tau'$, $\rho$, $\rho'$, P and Q in terms of A, B, C, D in equation (6) we get

$$E(\hat{\theta}) = \frac{A/(A+B)}{C/(C+D)} \tag{7}$$
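The reduction of the first-order expectation to the relative risk, and the linear terms of the expansion, can be checked with a short identity. The lines below are a sketch that uses only $\tau + \rho = 1$, $\tau' + \rho' = 1$ and $P + Q = 1$; the symbols $U(\varepsilon, \varepsilon')$ and $V(\varepsilon, \varepsilon')$ for the sample analogues of U and V are notation introduced here for illustration.

```latex
\begin{align*}
U &= P\tau'\rho + Q\tau\rho' + \rho\rho'
   = P\rho(\tau' + \rho') + Q\rho'(\tau + \rho)
   = P\rho + Q\rho', \\
V &= P\tau\rho' + Q\tau'\rho + \tau\tau'
   = P\tau(\tau' + \rho') + Q\tau'(\tau + \rho)
   = P\tau + Q\tau', \\
U + V &= P(\tau + \rho) + Q(\tau' + \rho') = 1 .
\end{align*}
% With P = (A+C)/N and Q = (B+D)/N:
\begin{align*}
U = \frac{C + D}{N}, \qquad V = \frac{A + B}{N}, \qquad
\frac{\tau}{\rho}\cdot\frac{U}{V}
  = \frac{A}{C}\cdot\frac{C + D}{A + B}
  = \frac{A/(A+B)}{C/(C+D)} .
\end{align*}
% Replacing (tau, rho, tau', rho') by their sample values gives exactly
\begin{align*}
U(\varepsilon, \varepsilon') &= U - P\varepsilon - Q\varepsilon', &
V(\varepsilon, \varepsilon') &= V + P\varepsilon + Q\varepsilon',
\end{align*}
% so that, to first order,
\begin{align*}
\log\hat{\theta} - \log\!\left(\frac{\tau U}{\rho V}\right)
  \approx \left(\frac{1}{\tau\rho} - \frac{P}{UV}\right)\varepsilon
        - \frac{Q}{UV}\,\varepsilon' .
\end{align*}
```

The coefficient of $\varepsilon$ equals $(UV - P\tau\rho)/(\tau\rho\,UV)$, which is the combination that enters the mean square error.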

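On similar lines, to the first order of approximation,

$$\mathrm{MSE}(\hat{\theta}) = \frac{\tau^2}{\rho^2 V^4}\left[\frac{(UV - P\tau\rho)^2}{\tau^2\rho^2}\,E(\varepsilon^2) + Q^2\,E(\varepsilon'^2)\right]$$

By putting the values $E(\varepsilon^2) = \tau\rho/n$ and $E(\varepsilon'^2) = \tau'\rho'/n'$ we get $\mathrm{MSE}(\hat{\theta})$ as

$$\mathrm{MSE}(\hat{\theta}) = \frac{\tau^2}{\rho^2 V^4}\left[(UV - P\tau\rho)^2\left(\frac{1}{n\tau} + \frac{1}{n\rho}\right) + Q^2\tau'^2\rho'^2\left(\frac{1}{n'\tau'} + \frac{1}{n'\rho'}\right)\right] \tag{8}$$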

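Substituting the values of $\tau$, $\tau'$, $\rho$, $\rho'$, P, Q, U and V in terms of A, B, C and D in equation (8) we get

$$\mathrm{MSE}(\hat{\theta}) = \frac{A^2}{C^2 (A+B)^4}\left[\frac{\{(A+B)(C+D)(A+C) - NAC\}^2}{(A+C)^2}\left(\frac{A+C}{nA} + \frac{A+C}{nC}\right) + \frac{N^2 B^2 D^2}{(B+D)^2}\left(\frac{B+D}{n'B} + \frac{B+D}{n'D}\right)\right]$$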

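Substituting the estimates of the various terms in equation (8) in terms of a, b, c and d, the estimate of $\mathrm{MSE}(\hat{\theta})$ is given by

$$\widehat{\mathrm{MSE}}(\hat{\theta}) = \frac{a^2}{c^2 W^4}\left\{\left(\frac{1}{a} + \frac{1}{c}\right)(WX - Pn'^2 ac)^2 + \left(\frac{1}{b} + \frac{1}{d}\right) Q^2 n^4 b^2 d^2\right\} \tag{9}$$

where

$$W = Pad + ab + Qbc \quad \text{and} \quad X = Pbc + Qad + cd. \tag{10}$$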

It is easy to see that for P = 0, Q = 1 the estimator in equation (4) reduces to the odds ratio ad/(bc), and the relative S.E. obtained from equation (9) equals that in equation (3).
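The estimated MSE with P known can be sketched in a few lines. The code assumes the form a²/(c²W⁴) · {(1/a + 1/c)(WX − Pn'²ac)² + (1/b + 1/d)Q²n⁴b²d²}, with W = Pad + ab + Qbc and X = Pbc + Qad + cd; the function name is illustrative, and the counts are those of Population 1 in Table 1.

```python
import math


def mse_proposed(a, b, c, d, P):
    """Estimated MSE of the proposed estimator when P is known exactly."""
    Q = 1.0 - P
    n = a + c        # number of cases
    n_ctrl = b + d   # number of controls (n' in the text)
    W = P * a * d + a * b + Q * b * c
    X = P * b * c + Q * a * d + c * d
    cases_term = (1 / a + 1 / c) * (W * X - P * n_ctrl**2 * a * c) ** 2
    controls_term = (1 / b + 1 / d) * Q**2 * n**4 * b**2 * d**2
    return a**2 / (c**2 * W**4) * (cases_term + controls_term)


a, b, c, d = 60, 60, 65, 80

mse_p02 = mse_proposed(a, b, c, d, 0.2)
print(round(mse_p02, 2))  # 0.05

# Rare disease limit: P = 0 gives OR^2 * (1/a + 1/b + 1/c + 1/d).
mse_p0 = mse_proposed(a, b, c, d, 0.0)
var_or = ((a * d) / (b * c)) ** 2 * (1 / a + 1 / b + 1 / c + 1 / d)
print(math.isclose(mse_p0, var_or))  # True
```

The second check is the numerical counterpart of the statement that the relative S.E. reduces to that of the odds ratio when P = 0.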

4. Situation When the Standard Error in Estimation of P is Considered

In the derivation of the MSE of the estimate it has been assumed that P is known. In fact, estimates of P are available from independent surveys along with their standard errors. If the sample size for the estimate of P is known, say m, then the variance associated with the estimate of P is given by PQ/m. This information can be utilised in deriving the expression for the MSE by taking into account the S.E. of the estimate of P also.


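In this case it is easy to verify that $\mathrm{MSE}(\hat{\theta})$ is given by

$$\mathrm{MSE}(\hat{\theta}) = \frac{A^2}{C^2 (A+B)^4}\left[\frac{\{(A+B)(C+D)(A+C) - NAC\}^2}{(A+C)^2}\left(\frac{A+C}{nA} + \frac{A+C}{nC}\right) + \frac{N^2 B^2 D^2}{(B+D)^2}\left(\frac{B+D}{n'B} + \frac{B+D}{n'D}\right) + \frac{N^2 (C+D)^2 (BC - AD)^2}{(A+C)^2 (B+D)^2}\cdot\frac{PQ}{m}\right]$$

and its estimate is given by

$$\widehat{\mathrm{MSE}}(\hat{\theta}) = \frac{a^2}{c^2 W^4}\left\{\left(\frac{1}{a} + \frac{1}{c}\right)(WX - Pn'^2 ac)^2 + \left(\frac{1}{b} + \frac{1}{d}\right) Q^2 n^4 b^2 d^2 + X^2 (bc - ad)^2\,\frac{PQ}{m}\right\}$$

where W and X are defined by equation (10).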

5. Empirical Investigation

With a view to illustrating the extent of bias in using the odds ratio as compared to the proposed estimator, an empirical investigation has been undertaken. In this investigation populations have been constructed providing relative risks ranging from 1.23 to 18.0. Further, in each population the prevalence of the disease is taken from 0.005 to 0.2. These ranges for relative risk and incidence rate have been considered so as to study the bias of the odds ratio as compared to the proposed estimator in different situations. The extent of underestimation of the variance/mean square error has also been worked out, as this has relevance to the significance of the relative risk.

The results are presented in Table 1.

Table 1

Comparison of odds ratio and the proposed estimator (θ̂)

Columns: sample cell counts a, b, c, d; incidence rate P; estimates OR and θ̂; bias (%) of OR; variances V(θ̂) and V(OR); (V(OR)/V(θ̂) - 1) x 100.

                 a     b     c     d     P       OR      θ̂       Bias (%)   V(θ̂)    V(OR)    (V(OR)/V(θ̂) - 1) x 100
Population 1     60    60    65    80    0.2     1.23    1.18      4.3      0.05     0.09       71
                 60    60    65    80    0.1     1.23    1.20      2.1      0.07     0.09       29
                 60    60    65    80    0.05    1.23    1.22      1.1      0.08     0.09       13
                 60    60    65    80    0.005   1.23    1.23      0.1      0.09     0.09        1
Population 2    112   176    88   224    0.2     1.62    1.47     10.2      0.04     0.08       90
                112   176    88   224    0.1     1.62    1.54      5.0      0.06     0.08       36
                112   176    88   224    0.05    1.62    1.58      2.5      0.07     0.08       17
                112   176    88   224    0.005   1.62    1.61      0.2      0.08     0.08        2
Population 3     60    30    33    80    0.2     4.8     3.42     41.8      0.61     2.18      256
                 60    30    33    80    0.1     4.8     4.05     19.8      1.14     2.18       91
                 60    30    33    80    0.05    4.8     4.42      9.6      1.57     2.18       39
                 60    30    33    80    0.005   4.8     4.80      0.9      2.11     2.18        3
Population 4    150   100    30   160    0.2     8.0     5.54     44.39     1.26     3.60      186
                150   100    30   160    0.1     8.0     6.64     20.44     2.11     3.60       70
                150   100    30   160    0.05    8.0     7.28      9.83     2.75     3.60       31
                150   100    30   160    0.005   8.0     7.92      0.95     3.50     3.60        3
Population 5    150   100    30   260    0.2    13.0     7.86     65.4      2.32     9.1       293
                150   100    30   260    0.1    13.0    10.00     30.0      4.43     9.1       105
                150   100    30   260    0.05   13.0    11.36     14.4      6.27     9.1        45
                150   100    30   260    0.005  13.0    12.82      1.4      8.76     9.1         4
Population 6    150   100    50   500    0.2    15.0     7.59     97.7      1.23     8.7       609
                150   100    50   500    0.1    15.0    10.33     45.2      2.93     8.7       197
                150   100    50   500    0.05   15.0    12.31     21.8      4.86     8.7        79
                150   100    50   500    0.005  15.0    14.69      2.1      8.17     8.7         6
Population 7    200    30   100   270    0.2    18.0     7.4     144.0      0.81    16.86     1977
                200    30   100   270    0.1    18.0    10.8      67.2      2.81    16.86      498
                200    30   100   270    0.05   18.0    13.6      32.5      6.24    16.86      170
                200    30   100   270    0.005  18.0    17.4       3.2     15.08    16.86       12

It would appear from this table that for lower values of the incidence rate the odds ratio and the proposed estimator provide almost the same results. But for higher values of the incidence rate the extent of bias becomes sizeable: it is as high as over 140% when the incidence rate is 0.2 and about 70% when the incidence rate is 0.1.
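The pattern described above can be reproduced for any row of Table 1. The sketch below assumes the percentage bias of the odds ratio is measured relative to the proposed estimator, i.e. 100 × (OR − θ̂)/θ̂, and uses the cell counts of Population 7; the function names are illustrative.

```python
def proposed_relative_risk(a, b, c, d, P):
    """Proposed estimate aX/(cW), W = Pad + ab + Qbc, X = Pbc + Qad + cd."""
    Q = 1.0 - P
    W = P * a * d + a * b + Q * b * c
    X = P * b * c + Q * a * d + c * d
    return (a * X) / (c * W)


def odds_ratio_bias_percent(a, b, c, d, P):
    """Percentage by which the odds ratio exceeds the proposed estimate."""
    theta_hat = proposed_relative_risk(a, b, c, d, P)
    odds_ratio = (a * d) / (b * c)
    return 100.0 * (odds_ratio - theta_hat) / theta_hat


# Population 7 of Table 1 across the four incidence rates considered.
cells = (200, 30, 100, 270)
biases = {P: odds_ratio_bias_percent(*cells, P) for P in (0.2, 0.1, 0.05, 0.005)}

for P, bias in biases.items():
    print(f"P = {P}: bias of OR = {bias:.1f}%")
```

For these counts the bias comes out at roughly 144% for P = 0.2 and roughly 67% for P = 0.1, falling to a few percent at P = 0.005.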

References

CORNFIELD, J., 1951: A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. JNCI 11, 1269-1275.

HOGUE, CAROL J. R., GAYLOR, DAVID W. and SCHULZ, K. F., 1983: Estimators of relative risk for case control studies. Amer. J. Epi. 118, 396-397.

HOGUE, CAROL J. R., GAYLOR, DAVID W. and SCHULZ, K. F., 1991: The case exposure study. A further explication and response to a critique. Amer. J. Epi. 124, 877-883.

FLEISS, J. L., 1981: Statistical methods for rates and proportions. Second edition. New York: John Wiley and Sons.

FLANDERS, W. DANA and LOUV, WILLIAM C., 1986: The exposure odds ratio in nested case control studies with competing risk. Amer. J. Epi. 124, 684-692.

GREENLAND, S., THOMAS, D. C. and MORGENSTERN, H., 1986: The rare disease assumption revisited: a review of case control design and a critique of "Estimators of relative risk for case control studies". Amer. J. Epi. 124, 869-876.

GREENLAND, S. and THOMAS, D. C., 1982: On the need for the rare disease assumption in case control studies. Amer. J. Epi. 116, 547-553.

KUPPER, L. L., MCMICHAEL, A. J. and SPIRTAS, R., 1975: A hybrid epidemiologic study design useful in estimating relative risk. JASA 70, 524-528.

NEUTRA, R. and DROLETTE, MARGARET E., 1978: Estimating exposure specific disease rates from case control studies using Bayes' Theorem. Amer. J. Epi. 108, 214-222.

Received June 1991

Dr. PADAM SINGH
Institute for Research in Medical Statistics
Medical Enclave, Ansari Nagar
New Delhi-110029
India