An objective Bayesian estimation of parameters in a log-binomial model


Journal of Statistical Planning and Inference 146 (2014) 113–121


Rong Zhou a, Siva Sivaganesan b,*, Martial Longla c

a Medpace, Cincinnati, OH, USA
b Department of Mathematical Sciences, University of Cincinnati, P.O. Box 210025, Cincinnati, OH 452421, USA
c University of Mississippi, University, Mississippi, USA

Article info

Article history:
Received 22 January 2012
Received in revised form 30 July 2013
Accepted 11 September 2013
Available online 25 September 2013

Keywords:
Log-binomial model
Bayesian methods
Objective priors

0378-3758/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jspi.2013.09.006

* Corresponding author. Tel.: +1 513 556 4097.
E-mail address: [email protected] (S. Sivaganesan).

Abstract

The log-binomial model is commonly recommended for modeling the prevalence ratio, just as logistic regression is used to model the log odds-ratio. However, for the log-binomial model, the parameter space turns out to be restricted, causing difficulties for maximum likelihood estimation in terms of convergence of numerical algorithms and calculation of standard errors. The Bayesian approach is a natural choice for the log-binomial model as it involves neither maximization nor large sample approximation. We consider two objective or non-informative priors for the parameters in a log-binomial model: an improper flat prior and a proper prior. We give sufficient conditions for the posterior from the improper flat prior to be proper, and compare the two priors in terms of the resulting posterior summaries. We use Markov Chain Monte Carlo via slice sampling to simulate from the posterior distributions.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

When modeling binary outcomes, the log-binomial model is an alternative to the more common logistic regression model. When the probability of an outcome is not low, for instance, as in modeling the prevalence of a non-rare disease, the log-binomial model has been recommended; see Wacholder (1986), Zocchetti et al. (1995), and Skov et al. (1998).

However, unlike in the logistic regression model, the parameters in the log-binomial model have to satisfy certain restrictions to ensure that the probability of disease lies between 0 and 1. This results in a parameter space that is a subset of the Euclidean space with certain hyper-planes as boundaries. When the maximum likelihood approach is used to fit a log-binomial model, this restriction causes difficulties when the maximum likelihood estimate (MLE) is close to the boundary of the parameter space. In such cases, standard software such as SAS can have convergence problems, and even when the MLE is found through careful numerical methods, the frequentist standard error based on Fisher information can be a poor approximation.

The Bayesian approach offers an attractive alternative for estimation of parameters in a log-binomial model, as it does not rely on restricted maximization or large sample normal approximation. Chu and Cole (2010) used a Bayesian approach to estimate log-binomial parameters. They used a flat or constant prior for the model parameters, and implemented the model fitting using WinBUGS (Spiegelhalter et al., 2003). They used a parameter space based on a rectangular region for the range of covariate values, assuming that all combinations of covariates set at their extreme values are plausible. Use of WinBUGS with a large number of covariates, or when the values of covariates in the data or their plausible values fall in an irregular or non-rectangular region, can be problematic. In this paper, we extend the work of Chu and Cole by evaluating the conditions under which the posterior for the flat prior is proper, proposing an alternative proper objective prior, and comparing the results. The slice sampling method of Neal (2003) is used to carry out the Markov Chain Monte Carlo (MCMC) simulation from the posterior distribution, in general settings.

The paper is organized as follows. In Section 2, we describe the log-binomial model and briefly review the frequentist approach. In Section 3, we focus on the Bayesian approach to estimation of parameters of the log-binomial model with one covariate, give a necessary and sufficient condition for the propriety of the posterior using the flat prior, propose an objective proper prior, and compare the results using simulation. In Section 4, we focus on multiple covariates, give a sufficient condition for the posterior with respect to the flat prior to be proper, extend the objective proper prior, compare the results, and give a real data example. We end with concluding remarks.

2. Log-binomial models

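Let $Y$ ($= 0$ or $1$) denote the absence or presence of an event for a subject, for whom the values of $K$ covariates $x = (x_1, \ldots, x_K)$ are available. The log-binomial model assumes that the probability of the event, $p$, is given by

$$\log(p) = \log P(Y = 1 \mid x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K. \tag{1}$$

Since the probability $p$ is between 0 and 1, the parameters $\beta_0, \beta_1, \ldots, \beta_K$ must satisfy the condition

$$\beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K \le 0 \quad \text{for all } x \text{ in the data}. \tag{2}$$

For subject $i$ with covariate value $x_i = (x_{i1}, \ldots, x_{iK})$, the condition (2) leads to the restriction

$$\beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} \le 0.$$

Therefore the parameter space for a log-binomial model with covariate values $x_i = (x_{i1}, \ldots, x_{iK})$ for subjects $i = 1, \ldots, n$ is

$$\Omega = \Big\{ \beta = (\beta_0, \ldots, \beta_K) : \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} \le 0 \ \text{ for } i = 1, \ldots, n \Big\}. \tag{3}$$

Thus the likelihood based on sample data $y = \{(x_i, y_i) : i = 1, \ldots, n\}$ is

$$L(\beta \mid y) = \prod_{i:\, y_i = 1} e^{\beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}} \times \prod_{i:\, y_i = 0} \Big( 1 - e^{\beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}} \Big)\, I_{\Omega}(\beta), \tag{4}$$

where $I_A(\cdot)$ is the indicator function for set $A$.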
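As an illustration of how the likelihood (4) and the restriction (3) interact in computation, the following is a minimal Python sketch, not the implementation used in this paper. The function name `log_likelihood` and the list-based data layout are assumptions made for the example. The function returns minus infinity whenever a linear predictor is positive, which plays the role of the indicator $I_\Omega(\beta)$.

```python
import math

def log_likelihood(beta, X, y):
    """Log of the likelihood (4); returns -inf outside the parameter space (3).

    beta : [beta_0, beta_1, ..., beta_K]
    X    : list of covariate rows [x_i1, ..., x_iK]
    y    : list of 0/1 outcomes
    (Names and data layout are illustrative assumptions.)
    """
    total = 0.0
    for row, yi in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        if eta > 0:                 # violates constraint (2): p would exceed 1
            return -math.inf
        if yi == 1:
            total += eta            # log p_i
        else:
            if eta == 0:            # p_i = 1 but y_i = 0: likelihood is zero
                return -math.inf
            total += math.log1p(-math.exp(eta))   # log(1 - p_i)
    return total
```

Working on the log scale avoids underflow of the product in (4) for moderate or large $n$.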

2.1. Frequentist methodology

Calculation of the MLE of the log-binomial model parameters using standard software such as SAS encounters difficulties when the value of the MLE lies near the boundary of the parameter space. In order to overcome this difficulty, Deddens et al. (2003) proposed a method called the COPY method; also see Deddens and Peterson (2008). This method creates an expanded data set that contains $(c - 1)$ copies of the original data, and one copy of the original data with the $Y_i$ values switched (1's changed to 0's and 0's changed to 1's), where $c$ is a positive integer. For the expanded data, the MLE can be found easily using software like PROC GENMOD in SAS, and with a suitably large number of copies, $c$, this estimate would be sufficiently close to the MLE for the original data.
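The construction of the expanded data set described above can be sketched in a few lines of Python. This is only an illustration of the data expansion step, under the assumption that the data are held as plain lists; the function name `copy_expand` is hypothetical, and in practice one would typically pass frequency weights to the fitting software rather than physically replicate rows.

```python
def copy_expand(X, y, c):
    """Build the COPY-method expanded data: (c - 1) copies of the original
    data plus one copy with every outcome flipped (1 -> 0, 0 -> 1).
    (Hypothetical helper for illustration only.)"""
    if c < 1:
        raise ValueError("c must be a positive integer")
    X_exp, y_exp = [], []
    for _ in range(c - 1):                      # the unchanged copies
        X_exp.extend(list(row) for row in X)
        y_exp.extend(y)
    X_exp.extend(list(row) for row in X)        # the single flipped copy
    y_exp.extend(1 - yi for yi in y)
    return X_exp, y_exp
```

For example, with three subjects and `c = 4` the expanded data contain 12 rows, of which only three carry switched outcomes.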

The COPY method can be thought of as similar to the method of simulated annealing for maximizing an objective function, and it gives a good approximation to the MLE of the parameters in the log-binomial model. However, it does not provide a means to get a good estimate of the standard error of the MLE. The standard error obtained by adjusting the standard error from the expanded data is not a reliable estimate. For a simulated data set with one covariate $x$ and 109 subjects, given in Table 1, the algorithm used by SAS/GENMOD to calculate the MLE converges, with a standard error of 0.1167 for the regression coefficient. The Bayesian approach also gave a similar posterior standard deviation for the parameter. But the COPY method gives a much smaller adjusted standard error of 0.0422. The COPY method does not give the correct standard error since the estimated Fisher information matrix based on the original data may not be (even approximately) proportional to that based on the expanded data. It also has difficulties when there are a large number of covariates.

Table 1. Simulated data (frequency of $y$ at each value of $X$).

| | X = 0 | X = 1 | X = 2 | X = 3 | X = 5 |
|---|---|---|---|---|---|
| y = 0 | 68 | 5 | 4 | 1 | 0 |
| y = 1 | 24 | 1 | 4 | 1 | 1 |

When the MLE is on or near the boundary of the parameter space, which suggests that the true values of the parameters are near the boundary, inference based on the large sample normal approximation of the MLE may not be reasonably accurate. There is also an issue with prediction when using the MLE approach. If the covariate values for prediction are outside their range in the data used to fit the model, the estimated probability may be greater than 1, preventing prediction of the response variable.
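The prediction issue can be seen with a small numerical sketch. The coefficient values below are hypothetical point estimates (they are borrowed from the Simulation 2 design of Section 3.3 purely for illustration), and the assumption is that the model was fitted to covariate values in the range 0 to 10.

```python
import math

def predicted_probability(beta0, beta1, x):
    """exp(beta0 + beta1 * x); not guaranteed to be <= 1 outside the data range."""
    return math.exp(beta0 + beta1 * x)

# Hypothetical fitted values, assumed to come from data with x in [0, 10].
beta0_hat, beta1_hat = -0.6567, 0.06
for x in (0, 5, 10, 12):
    p = predicted_probability(beta0_hat, beta1_hat, x)
    flag = "" if p <= 1 else "  <-- not a valid probability"
    print(f"x = {x:>2}: {p:.4f}{flag}")
```

Within the assumed data range the fitted values stay below 1, whereas at x = 12 the linear predictor becomes positive and the fitted "probability" exceeds 1.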

2.2. Bayesian inference using non-informative priors

Often, there is little or no prior information about the parameters to carry out a subjective Bayesian analysis, and in such instances, non-informative or objective priors are commonly used; see Kass and Wasserman (1996) for a review of the choices of non-informative priors. There is a rich literature on Bayesian analysis of generalized linear models using non-informative priors. Among them are, with further references therein, Ibrahim and Laud (1991), Sun et al. (2001), and Gelfand and Sahu (1999). For parameters in a generalized linear model, the "standard" non-informative priors such as the Jeffreys prior, the reference prior (Berger and Bernardo, 1992) or the probability matching prior (Datta and Mukerjee, 2004), while attractive, pose challenges in their derivation. While the log-binomial model does not belong to the standard generalized linear model family, it shares the same difficulties with regard to these non-informative priors. Simple non-informative priors are often useful in this case, since they are easy to implement and to compute with. However, when such a prior is improper, there is still a need to verify that the posterior is proper (cf. Sun et al., 2001). Chu and Cole (2010) used a flat (or constant) prior for deriving Bayesian estimates for the log-binomial parameters. In the following sections, we prove that under standard conditions, the posterior corresponding to the flat prior is indeed proper, propose an alternative proper non-informative prior, and compare the results with those of the flat prior.

3. Bayesian inference with single covariate

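We consider a model for a binary response variable $Y$ given by

$$\log(p_i) = \log P(Y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_i \quad \text{for } i = 1, \ldots, n, \tag{5}$$

where the $x_i$'s are assumed known. The parameter space for $(\beta_0, \beta_1)$ is given by (see (3))

$$\Theta = \{ (\beta_0, \beta_1) : \beta_0 + \beta_1 x_i \le 0 \ \text{ for } i = 1, \ldots, n \}. \tag{6}$$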

3.1. Flat prior

The flat prior or the constant prior for $(\beta_0, \beta_1)$ is

$$p(\beta_0, \beta_1) = 1 \quad \text{for } (\beta_0, \beta_1) \in \Theta. \tag{7}$$

For the log-binomial model above, this prior is a simple choice as a non-informative prior, and the corresponding posterior distribution is given by

$$p(\beta_0, \beta_1 \mid y) \propto \prod_{i:\, y_i = 1} e^{\beta_0 + \beta_1 x_i} \prod_{i:\, y_i = 0} \big( 1 - e^{\beta_0 + \beta_1 x_i} \big) \times I_{\Theta}(\beta_0, \beta_1), \tag{8}$$

where $I_A(\cdot)$ is the indicator function for set $A$. However, the prior in (7) is improper since the parameter space is unbounded, and hence there is a need to ensure that the joint posterior distribution of $(\beta_0, \beta_1)$ is proper.

The following theorem gives a necessary and sufficient condition for the posterior distribution resulting from the flat prior to be proper. The proof is given in the Appendix.

Theorem 1. The posterior distribution of $(\beta_0, \beta_1)$ with respect to the flat prior (7) is proper if and only if the following conditions are satisfied:

1. The $x_i$'s are not all equal.
2. There is a success $(y_i = 1)$ corresponding to an $x_i \ne \max_j x_j$, a non-maximal value of the covariate.
3. There is a success $(y_i = 1)$ corresponding to an $x_i \ne \min_j x_j$, a non-minimal value of the covariate.

From the above result, the posterior is proper if there are two or more distinct $x_i$'s, with a success at each of the smallest and the largest values of the $x_i$'s, or if there is a success at a value of $x$ which is at neither extreme.
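The three conditions of Theorem 1 are easy to check for a given data set before running any sampler. The following Python sketch does so; the function name is hypothetical and the data are assumed to be given as two parallel lists.

```python
def flat_prior_posterior_is_proper(x, y):
    """Check the three conditions of Theorem 1 for a single covariate.
    (Hypothetical helper; x and y are parallel lists.)"""
    x_min, x_max = min(x), max(x)
    not_all_equal = x_min != x_max                                   # condition 1
    success_below_max = any(yi == 1 and xi != x_max
                            for xi, yi in zip(x, y))                 # condition 2
    success_above_min = any(yi == 1 and xi != x_min
                            for xi, yi in zip(x, y))                 # condition 3
    return not_all_equal and success_below_max and success_above_min
```

For instance, a data set whose only successes occur at the smallest covariate value fails condition 3, so the flat-prior posterior would be improper.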

3.2. A proper non-informative prior

We now propose another non-informative prior, specific to the log-binomial model. We first assume without loss of generality that the values of the covariate $x$ are centered to have mean 0, with the minimum of the covariate values $\min(x_i) < 0$ and the maximum $\max(x_i) > 0$. We also assume that $x = 0$, after centering, is a plausible value. With this assumption the parameter space can be written as

$$\Theta_s = \{ (\beta_0, \beta_1) : \beta_0 < 0,\ -\beta_0/\min(x_i) < \beta_1 < -\beta_0/\max(x_i) \}. \tag{9}$$

We specify the prior for the parameters $\beta_0$ and $\beta_1$ in two stages. First, we assign a Uniform$(0, 1)$ prior for the probability $P(Y = 1 \mid x = 0) = e^{\beta_0}$, which translates to a marginal prior for $\beta_0$ given by

$$p(\beta_0) = e^{\beta_0} \quad \text{for } \beta_0 < 0.$$

From (9), given $\beta_0$, $\beta_1$ lies in the interval $(-\beta_0/\min(x_i),\ -\beta_0/\max(x_i))$, and hence a reasonable objective prior for $\beta_1$, conditional on $\beta_0$, is given by the uniform distribution

$$p(\beta_1 \mid \beta_0) = \frac{1}{|\beta_0| \{ 1/\max(x_i) - 1/\min(x_i) \}} \quad \text{for } -\beta_0/\min(x_i) < \beta_1 < -\beta_0/\max(x_i).$$

Now, combining the above two, an objective prior for $(\beta_0, \beta_1)$ is

$$\pi(\beta_0, \beta_1) = \frac{e^{\beta_0}}{|\beta_0| \{ 1/\max(x_i) - 1/\min(x_i) \}} \quad \text{for } (\beta_0, \beta_1) \in \Theta_s. \tag{10}$$
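The two-stage construction of the prior (10) can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the results in this paper; the function names are hypothetical, and the covariate is assumed to be centered so that `x_min < 0 < x_max`. Drawing $e^{\beta_0}$ from Uniform$(0,1)$ and taking logs gives the density $e^{\beta_0}$ on $\beta_0 < 0$ by a change of variables.

```python
import math
import random

def log_prior(beta0, beta1, x_min, x_max):
    """Log density of prior (10); -inf outside Theta_s.
    Assumes a centered covariate with x_min < 0 < x_max."""
    if beta0 >= 0:
        return -math.inf
    lower, upper = -beta0 / x_min, -beta0 / x_max
    if not (lower < beta1 < upper):
        return -math.inf
    return beta0 - math.log(abs(beta0) * (1.0 / x_max - 1.0 / x_min))

def sample_prior(x_min, x_max, rng):
    """Draw (beta0, beta1): exp(beta0) ~ Uniform(0, 1), then beta1 | beta0 uniform."""
    u = 1.0 - rng.random()              # lies in (0, 1], so log(u) is finite
    beta0 = math.log(u)
    lower, upper = -beta0 / x_min, -beta0 / x_max
    beta1 = rng.uniform(lower, upper)
    return beta0, beta1
```

Every draw from `sample_prior` satisfies the constraint $\beta_0 + \beta_1 x \le 0$ at both extremes of the covariate, and hence at every value in between.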

3.2.1. Simulating from the posterior distribution

Clearly, the posterior distributions corresponding to the priors (7) and (10) are not conducive to calculation of posterior summaries through direct evaluation or through direct Monte Carlo simulation, and we use MCMC methods to calculate them. We used the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970), the Slice Sampler (Neal, 2003), and the Adaptive Rejection sampler (Gilks and Wild, 1992), and compared their performance with a few data sets. While all three gave similar values for the various posterior summaries, the slice sampler had smaller autocorrelation for the MCMC draws than the other two, along with smaller Monte Carlo error for the same simulation length. We therefore used the slice sampler for all the computation reported here.
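For readers unfamiliar with slice sampling, the following Python sketch shows one common variant: the univariate update of Neal (2003) with stepping-out and shrinkage, applied to each coordinate in turn. It is not the SAS implementation used in this paper; the function names, the interval width `w`, and the step limit are assumptions for illustration. The log density may return minus infinity outside the parameter space, which is how a restriction such as (6) would be respected, provided the chain starts inside the space.

```python
import math
import random

def slice_sample_1d(log_f, x0, w, rng, max_steps=50):
    """One univariate slice-sampling update (stepping-out and shrinkage).
    log_f may return -inf outside the support; x0 must lie inside it."""
    log_y = log_f(x0) + math.log(1.0 - rng.random())   # log of the slice level
    left = x0 - w * rng.random()                       # randomly placed interval
    right = left + w
    j = int(max_steps * rng.random())                  # step budget to the left
    k = (max_steps - 1) - j                            # step budget to the right
    while j > 0 and log_f(left) > log_y:
        left -= w
        j -= 1
    while k > 0 and log_f(right) > log_y:
        right += w
        k -= 1
    while True:                                        # shrink until acceptance
        x1 = rng.uniform(left, right)
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def slice_sample_coordinates(log_post, beta, widths, rng):
    """One sweep: update each coordinate of beta in turn, others held fixed."""
    beta = list(beta)
    for j in range(len(beta)):
        def log_f(v, j=j):
            return log_post(beta[:j] + [v] + beta[j + 1:])
        beta[j] = slice_sample_1d(log_f, beta[j], widths[j], rng)
    return beta
```

A typical use would call `slice_sample_coordinates` repeatedly with `log_post` set to the log of the unnormalized posterior, discarding an initial burn-in portion of the draws.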

Chu and Cole (2010) used WinBUGS to do the MCMC simulation from the posterior distribution. They performed extensive simulation evaluating the frequentist characteristics of the estimates and confidence intervals using the flat prior, and compared them with their frequentist counterparts. They found that the Bayesian estimate had a slight bias but gave a smaller MSE, and Bayesian confidence intervals had good frequentist coverage probability.

3.3. Simulation study comparing the two priors

Here, we use a simulation study to compare the two priors (7) and (10). Two simulations with different prevalence ratios are performed.

Simulation 1. In this simulation, 100 data sets were simulated, with each data set consisting of 100 subjects. The covariate $x$ was assigned values drawn from the uniform distribution $U(0, 10)$. The true values of $\beta_0$ and $\beta_1$ were set to $-1.204$ and $0$, respectively, which corresponds to a moderate prevalence ($p = 0.3$) when $x = 5$.

Simulation 2. Here, the simulation design was the same as in Simulation 1, except for the true values of $\beta_0, \beta_1$. They were set as $-0.6567$ and $0.06$, respectively, which corresponds to a high prevalence of $p = 0.7$ at $x = 5$.
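The two simulation designs above can be reproduced in outline with the following Python sketch. The seed and the function name are assumptions, so the generated data will not match the data sets used for the reported results.

```python
import math
import random

def simulate_log_binomial(n, beta0, beta1, rng, x_low=0.0, x_high=10.0):
    """Simulate n subjects: x ~ Uniform(x_low, x_high),
    y ~ Bernoulli(exp(beta0 + beta1 * x))."""
    data = []
    for _ in range(n):
        x = rng.uniform(x_low, x_high)
        p = math.exp(beta0 + beta1 * x)
        if p > 1:
            raise ValueError("parameters violate the constraint p <= 1")
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data

rng = random.Random(2014)   # arbitrary seed, chosen here only for reproducibility
sim1 = [simulate_log_binomial(100, -1.204, 0.0, rng) for _ in range(100)]
sim2 = [simulate_log_binomial(100, -0.6567, 0.06, rng) for _ in range(100)]
```

Each element of `sim1` and `sim2` is one data set of 100 `(x, y)` pairs, to which the posterior simulation would then be applied under each prior.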

The posterior mean and standard deviation of the parameters were obtained using the two priors and the slice sampler, for each data set. The averages of the posterior means and standard deviations of $\beta_1$ over the different data sets and the MSE for the posterior mean are displayed in Table 2, for each prior. The table shows that the results from the two priors are very similar. Similar results were also obtained for different settings of the parameters.

Table 2. Average of posterior means and standard deviations (s.d.) and MSE for $\beta_1$ using the two priors.

| Simulation | Prior | Posterior mean (s.d.) | MSE($\beta_1$) |
|---|---|---|---|
| Simulation 1 | Flat prior | −0.0046 (0.0566) | 0.002037 |
| Simulation 1 | Proper prior | −0.0061 (0.0623) | 0.001812 |
| Simulation 2 | Flat prior | 0.0545 (0.01960) | 0.000290 |
| Simulation 2 | Proper prior | 0.0528 (0.02786) | 0.000339 |


4. Bayesian inference with multiple covariates

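Suppose, as in (1), there are $K > 1$ covariates $x_1, \ldots, x_K$, and we use the following log-binomial model to relate the probability $p_i = P(Y_i = 1)$ of a Bernoulli random variable $Y_i$ with the covariate values $x_{ik}$:

$$\log(p_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} \quad \text{for } i = 1, \ldots, n \text{ and } k = 1, \ldots, K. \tag{11}$$

Here, the parameter space is given by (see (3))

$$\Omega = \{ \beta = (\beta_0, \beta_1, \ldots, \beta_K) \in \mathbb{R}^{K+1} : \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} \le 0,\ i = 1, \ldots, n \}$$

and the likelihood function is given by

$$L(\beta) = \prod_{i:\, y_i = 1} e^{\beta_0 + \sum_{j=1}^{K} \beta_j x_{ij}} \prod_{i:\, y_i = 0} \Big( 1 - e^{\beta_0 + \sum_{j=1}^{K} \beta_j x_{ij}} \Big)\, I_{\Omega}(\beta).$$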

4.1. Prior distributions

As before, we consider the use of the flat prior and a proper non-informative prior for the parameter $\beta$.

4.1.1. Flat prior

The flat or the constant prior for $\beta$ is

$$\pi(\beta_0, \beta_1, \ldots, \beta_K) = I_{\Theta}(\beta), \quad 1 \le i \le n. \tag{12}$$

The posterior distribution using the flat prior exists under mild conditions, as stated in the theorem below. For this, we find it convenient to let $S$ be the set of all vectors $(1, x_{i1}, \ldots, x_{iK})$ for which $Y_i = 1$, i.e.,

$$S = \{ (1, x_{i1}, \ldots, x_{iK}) : Y_i = 1,\ 1 \le i \le n \} \subset \mathbb{R}^{K+1}.$$

Theorem 2. The posterior distribution of $(\beta_0, \beta_1, \ldots, \beta_K)$ with respect to the flat prior exists if the vector space spanned by $S$ is of rank at least $(K + 1)$.

The proof is given in the Appendix.
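The rank condition of Theorem 2 can be checked numerically. The sketch below does this in plain Python using Gaussian elimination with partial pivoting; the function names and the fixed tolerance are assumptions for the example, and a library routine for matrix rank would serve equally well.

```python
def rank(rows, tol=1e-10):
    """Rank of a list of equal-length rows via Gaussian elimination
    with partial pivoting (the tolerance is an illustrative choice)."""
    m = [list(map(float, r)) for r in rows]
    n_cols = len(m[0]) if m else 0
    rk = 0
    for col in range(n_cols):
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue                                  # no usable pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rk][c]
        rk += 1
    return rk

def theorem2_condition_holds(X, y):
    """True if the success rows (1, x_i1, ..., x_iK) span R^(K+1)."""
    S = [[1.0] + list(row) for row, yi in zip(X, y) if yi == 1]
    if not S:
        return False
    return rank(S) >= len(S[0])
```

With $K = 2$, for example, the condition requires at least three successes whose covariate vectors, augmented by a leading 1, are linearly independent.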

4.1.2. A non-informative proper prior

Here, we propose a non-informative proper prior for $\beta$, as in Section 3.2. As before, we assume that the values of each covariate are centered so that, for each covariate, the mean is 0, the maximum is positive, and the minimum is negative. We also assume that the values of all (centered) covariates being 0 represents a plausible scenario, allowing meaningful interpretation for the corresponding prevalence $p$. Hence, as in Section 3.2, $\beta_0 \le 0$, and

$$\pi_0(\beta_0) = \exp(\beta_0) \quad \text{for } \beta_0 < 0$$

is a reasonable (marginal) prior for $\beta_0$.

With $\beta_0$ fixed, the space of $(\beta_1, \ldots, \beta_K)$ is given by (see (3))

$$\Omega(\beta_0) = \{ (\beta_1, \ldots, \beta_K) : \beta_1 x_{i1} + \cdots + \beta_K x_{iK} \le -\beta_0,\ i = 1, \ldots, n \}. \tag{13}$$

Often this is a bounded set. In the Appendix (Lemma A2), we give one sufficient condition for $\Omega(\beta_0)$ to be bounded, and there are many others. Also, in the Appendix, we suggest a numerical method to determine the boundedness of $\Omega(\beta_0)$.

When $\Omega(\beta_0)$ is bounded, as in Section 3.2, we can use the uniform distribution over $\Omega(\beta_0)$,

$$\pi_u(\beta_1, \ldots, \beta_K \mid \beta_0) = 1/v(\beta_0) \propto 1/|\beta_0|^K \quad \text{for } (\beta_1, \ldots, \beta_K) \in \Omega(\beta_0),$$

as the conditional prior for $(\beta_1, \ldots, \beta_K)$. Here, the normalizing constant $v(\beta_0)$ is the size (Lebesgue measure) of $\Omega(\beta_0)$. It is easy to verify, as shown in the Appendix (Lemma A1), that $v(\beta_0) \propto |\beta_0|^K$, which helps with the computation, as it obviates the need to do integration to calculate $v(\beta_0)$ for each $\beta_0$. The resulting non-informative proper (NIP) prior is

$$\pi(\beta) = \pi_u(\beta_1, \ldots, \beta_K \mid \beta_0)\, \pi_0(\beta_0) \propto \frac{\exp(\beta_0)}{|\beta_0|^K}, \quad \beta \in \Omega. \tag{14}$$

The prior above is defined in two stages, with the choice of the prior at each stage made using commonly accepted non-informative proper priors. It is also an extension of the proper non-informative prior (10) to higher dimensions, when $\Omega(\beta_0)$ is bounded.
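Because (14) is known only up to a constant, an MCMC sampler needs just its unnormalized log density. A minimal Python sketch, assuming centered covariates and a bounded $\Omega(\beta_0)$, is given below; the function name and data layout are hypothetical.

```python
import math

def log_nip_prior(beta, X):
    """Unnormalized log of prior (14): beta_0 - K * log|beta_0| on Omega,
    and -inf outside it. Assumes Omega(beta_0) is bounded."""
    beta0, slopes = beta[0], beta[1:]
    if beta0 >= 0:
        return -math.inf
    for row in X:
        if beta0 + sum(b * x for b, x in zip(slopes, row)) > 0:
            return -math.inf            # outside the parameter space
    return beta0 - len(slopes) * math.log(-beta0)
```

Adding this quantity to the log-likelihood gives the log of the unnormalized posterior under the NIP prior.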

When $\Omega(\beta_0)$ is not bounded, one may use independent Cauchy priors centered around zero (Berger, 1985; Gelman et al., 2008). Taking the scale parameters proportional to $\beta_0$ would be reasonable, and also helpful with the computation since there would be no need to do numerical integration over $\Omega(\beta_0)$ to calculate the normalizing constant.


4.2. Example

We use a simulated data set with a sample of size 50 as an example to illustrate the Bayesian approach for a two-covariate log-binomial model using the two priors. The first covariate $x_1$ was varied from $-1.47$ to $1.47$ in steps of 0.06, and the second covariate $x_2$ was randomly simulated from the uniform distribution $U[-1, 1]$. With true values of the parameters set as $\beta_0 = -0.7$, $\beta_1 = 0.3$, $\beta_2 = 0.15$, the binary response variable $Y$ was simulated from a Bernoulli distribution with probability $p$ from (11). Overall, there were 26 cases, i.e., $y_i = 1$.
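The design of this example can be sketched as follows. The seed and function name are assumptions, so the simulated outcomes (and in particular the number of cases) will generally differ from the data set analyzed here.

```python
import math
import random

def simulate_example(rng, beta=(-0.7, 0.3, 0.15), n=50):
    """x1 on a grid from -1.47 in steps of 0.06, x2 ~ Uniform[-1, 1],
    y ~ Bernoulli(exp(beta0 + beta1 * x1 + beta2 * x2))."""
    rows = []
    for i in range(n):
        x1 = -1.47 + 0.06 * i
        x2 = rng.uniform(-1.0, 1.0)
        p = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2)
        y = 1 if rng.random() < p else 0
        rows.append((x1, x2, y))
    return rows

rows = simulate_example(random.Random(0))   # arbitrary seed
```

With these parameter values the largest possible linear predictor is $-0.7 + 0.3(1.47) + 0.15(1) = -0.109$, so every simulated probability is below 1.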

We used both the flat prior and the non-informative proper (NIP) prior (14). We verified that the set $\Omega(\beta_0)$ is bounded by plotting some of the inequalities defining $\Omega(\beta_0)$ in the two-dimensional space of $(\beta_1, \beta_2)$. As noted in the Appendix, it suffices to check this for a single value of $\beta_0$. The posterior means and standard deviations of the three parameters, and the maximum likelihood estimates calculated using PROC GENMOD in SAS, are given in Table 3. The Bayes estimates using the two priors and the maximum likelihood estimates are very close.
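As an alternative to plotting, boundedness in the two-covariate case can be checked by a simple angular argument, sketched below in Python. This is a different technique from the graphical check used here and from the numerical optimization described in the Appendix: the set $\{(\beta_1, \beta_2) : \beta_1 x_{i1} + \beta_2 x_{i2} \le 1\}$ is bounded exactly when no nonzero direction $d$ has $x_i \cdot d \le 0$ for every $i$, which in two dimensions means that consecutive row directions, sorted by angle, are always less than $\pi$ apart. The function name is hypothetical.

```python
import math

def omega_is_bounded_2d(X):
    """Check whether {(b1, b2): b1*x_i1 + b2*x_i2 <= 1 for all i} is bounded,
    using the largest angular gap between the row directions of X."""
    angles = sorted(math.atan2(x2, x1) for x1, x2 in X if (x1, x2) != (0.0, 0.0))
    if len(angles) < 3:
        return False          # fewer than three directions cannot enclose the origin
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])   # wrap-around gap
    return max(gaps) < math.pi
```

By the scaling argument of the Appendix, checking the set with right-hand side 1 (that is, $\beta_0 = -1$) settles boundedness for every $\beta_0 < 0$.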

4.3. Simulation study

In this study, we simulated 100 data sets, each using the same configuration as in the previous example. Posterior means and standard deviations of each parameter were obtained using the flat prior and the NIP prior (14), for each data set. The averages of the posterior means (denoted by $\hat{\beta}_i$, $i = 0, 1, 2$) and standard deviations (s.d.) for the parameters, as well as the associated MSE, were calculated for each prior, and are given in Table 4.

The results from this simulation indicate that the two priors lead to very similar answers. We also found similar results from other simulations that we have carried out.

4.4. Example: backache in pregnancy

According to a study of 180 pregnant women in the London Hospital, 48% experienced backache during pregnancy (Mantle et al., 1977). Many covariates were recorded, including age, number of previous pregnancies, height, and weight gain during pregnancy.

Since the prevalence rate is relatively high, logistic regression resulted in an odds ratio which is very discrepant from the prevalence ratio. Log-binomial regression can directly measure the effects of covariates on the prevalence ratio and is a more suitable approach for this study. Four covariates were included in the model:

$$\log(p_i) = \beta_0 + \beta_1 A_i + \beta_2 H_i + \beta_3 P_i + \beta_4 W_i, \tag{15}$$

where $A$ is the age, $H$ is the height, $P$ is the number of previous pregnancies, and $W$ is the weight gain during pregnancy. The corresponding variable $Y$ is a binary variable indicating whether the patient suffered backache during the pregnancy. $Y_i$ follows a Bernoulli distribution with probability $p_i$.
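The discrepancy between the odds ratio and the prevalence ratio at high prevalence can be illustrated with a short calculation. The prevalences used below are hypothetical and are not taken from the backache data.

```python
def prevalence_ratio(p1, p0):
    """Ratio of two prevalences."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of the corresponding odds."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Hypothetical prevalences in two groups: one rare outcome, one common outcome.
for p0, p1 in [(0.01, 0.02), (0.40, 0.60)]:
    print(f"p0={p0:.2f}, p1={p1:.2f}: "
          f"PR={prevalence_ratio(p1, p0):.2f}, OR={odds_ratio(p1, p0):.2f}")
```

For the rare outcome the two measures nearly coincide (2.00 versus about 2.02), while for the common outcome the prevalence ratio is 1.5 and the odds ratio is 2.25.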

The frequentist approach using SAS/GENMOD had convergence problems for this data set. We numerically verified that the set $\Omega(\beta_0)$ is bounded using the constrOptim function in R, as described in the Appendix. Bayesian model estimation was carried out using the proper non-informative prior (14) and the slice sampling algorithm. The results are given in Table 5. For comparison with the logistic regression model, we also provide the MLE using logistic regression.

The parameter estimates from the two models are different, particularly for the intercept parameter, indicating the difference between the logistic and log-binomial regression models when the prevalence is high.

Table 3. Bayesian estimates using the two priors and the MLE using a simulated sample of size 50.

| Methods | $\beta_0$ estimate (std) | $\beta_1$ estimate (std) | $\beta_2$ estimate (std) |
|---|---|---|---|
| MLE | −0.667 (0.1394) | 0.172 (0.1498) | 0.034 (0.2417) |
| Flat prior | −0.729 (0.1314) | 0.168 (0.1300) | 0.044 (0.2077) |
| NIP prior | −0.692 (0.1123) | 0.193 (0.1293) | 0.078 (0.1792) |

Table 4. Average values of the posterior means and standard deviations, and the MSE.

| Prior | $\hat{\beta}_0$ (s.d.) | MSE($\beta_0$) | $\hat{\beta}_1$ (s.d.) | MSE($\beta_1$) | $\hat{\beta}_2$ (s.d.) | MSE($\beta_2$) |
|---|---|---|---|---|---|---|
| Flat prior | −0.740 (0.1502) | 0.0239 | 0.255 (0.1250) | 0.0175 | 0.124 (0.1844) | 0.0343 |
| NIP prior | −0.724 (0.1354) | 0.0195 | 0.268 (0.1219) | 0.0133 | 0.126 (0.1633) | 0.0286 |

Table 5. Bayesian estimates of log-binomial model parameters for backache in pregnancy data.

| Parameter | Log-binomial model, Bayesian estimate | Log-binomial model, std dev | Logistic regression, MLE estimate | Logistic regression, std error |
|---|---|---|---|---|
| $\beta_0$ | −0.6231 | 0.0821 | −0.0592 | 0.1539 |
| $\beta_1$ | −0.0321 | 0.0210 | 0.0086 | 0.0327 |
| $\beta_2$ | −0.5211 | 0.7732 | −2.7558 | 2.3776 |
| $\beta_3$ | 0.1433 | 0.0841 | 0.3069 | 0.1627 |
| $\beta_4$ | −0.0144 | 0.0243 | 0.0635 | 0.0300 |


5. Conclusion

The Bayesian approach to estimation of parameters in a log-binomial model has advantages over the maximum likelihood approach, as the latter often has difficulties because it requires numerical maximization over a multidimensional parameter space restricted by a number of hyper-plane boundaries. The large sample normal approximation often used to make inference with the MLE approach may also be poor, depending on the sample size and how close the true values are to the boundary of the parameter space. The Bayesian approach provides an attractive alternative as it obviates both of these difficulties. When prior information is not available, non-informative priors are often used to carry out Bayesian inference. The flat or constant prior is a commonly used non-informative prior for regression coefficients, but when it is an improper prior, as it is for the log-binomial model, there is a need to verify that the posterior is proper. Here, we have given conditions under which the posterior from the flat prior is proper. We also proposed a non-informative proper prior for the regression coefficients, when a boundedness condition is satisfied. Based on the simulation study, both priors led to similar inference. We also suggested an alternative proper prior when the boundedness condition is not satisfied. The Jeffreys prior is also commonly used as a non-informative prior in Bayesian estimation. We used the Jeffreys prior based on a minimal sample size, compared the results with those of the other two priors, and found them to be fairly similar. We therefore reported the results using the flat prior and the non-informative proper prior. Based on our findings, we recommend the flat prior for Bayes estimation with log-binomial models, for common use in the absence of prior information. The proper non-informative prior may be suitable when one is interested in objective Bayesian testing or model selection with log-binomial models. We hope to report on this elsewhere.

Chu and Cole (2010) used WinBUGS to fit the log-binomial model and illustrated that the resulting Bayesian estimates have good frequentist properties. But the implementation using WinBUGS, as stated by Chu and Cole, has certain limitations, as they used a parameter space based on a rectangular region for covariates defined by the extreme values of each covariate. This amounts to assuming that such values are plausible while they may not be part of the data. Here, we implemented Markov Chain Monte Carlo methods without such restrictions, evaluated three commonly used approaches, and found slice sampling to be preferable in terms of the accuracy of Monte Carlo estimates. The code implementing the slice sampling, written in SAS, will be made available on the second author's website, http://math.uc.edu/~siva.

Acknowledgments

The authors would like to express their deep gratitude to two reviewers for their very valuable comments, one of which led to the correction of an error in an earlier manuscript. These comments have resulted in a much improved paper. The first two authors would also like to thank Jim Deddens for introducing us to this topic.

Appendix A

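Proof of Theorem 1. The log-binomial model with one covariate is (see (5))

$$\log(p_i) = \log P(Y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_i \quad \text{for } i = 1, \ldots, n. \tag{16}$$

Without loss of generality, we will assume $x_i \le 0$, since we can re-define the covariate as follows to achieve this. Letting $x_{(n)} = \max_{1 \le i \le n} x_i$, the model above can be written as $\log(p_i) = \beta_0 + \beta_1 x_{(n)} + \beta_1 (x_i - x_{(n)})$, for $i = 1, \ldots, n$. Now, letting $\beta'_0 = \beta_0 + \beta_1 x_{(n)}$, $\beta'_1 = \beta_1$ and $x'_i = (x_i - x_{(n)})$, the model above is equivalent to

$$\log(p_i) = \beta'_0 + \beta'_1 x'_i \quad \text{for } i = 1, \ldots, n.$$

This model is of the same form as in (16), with $x'_i \le 0$ for $i = 1, \ldots, n$. We will assume a model of the form (16) with $x_i \le 0$ for all $i = 1, \ldots, n$ for the rest of this proof. Letting $b = -\min_{1 \le i \le n} x_i$, the likelihood is

$$L(\beta_0, \beta_1) = e^{s_y \beta_0 + s_{xy} \beta_1} \prod_{i=1}^{n} \big( 1 - e^{\beta_0 + \beta_1 x_i} \big)^{1 - y_i},$$

where $s_y = \sum y_i$ and $s_{xy} = \sum x_i y_i$. The marginal likelihood with respect to the flat prior (7) is

$$m(y) = \int_{\Theta} L(\beta_0, \beta_1)\, d\beta_0\, d\beta_1 \overset{\text{def.}}{=} A + B,$$

where

$$A = \int_{0}^{\infty} \int_{-\infty}^{0} L(\beta_0, \beta_1)\, d\beta_0\, d\beta_1 \le \int_{0}^{\infty} \int_{-\infty}^{0} e^{s_y \beta_0 + s_{xy} \beta_1}\, d\beta_0\, d\beta_1, \tag{17}$$

and

$$B = \int_{-\infty}^{0} \int_{-\infty}^{b\beta_1} L(\beta_0, \beta_1)\, d\beta_0\, d\beta_1 \le \int_{-\infty}^{0} \int_{-\infty}^{b\beta_1} e^{s_y \beta_0 + s_{xy} \beta_1}\, d\beta_0\, d\beta_1. \tag{18}$$

Recall that the posterior density is proper if $m(y)$ is finite, which is equivalent to $A$ and $B$ being finite. We observe that $A$ is finite if $s_y > 0$ and $s_{xy} < 0$, both of which are satisfied if $y_i = 1$ for some $x_i \ne \max x_\ell$. Similarly, $B$ is finite if $s_y > 0$ and $b s_y + s_{xy} > 0$, both of which hold true if $y_i = 1$ for some $x_i \ne \min x_\ell$. These two sets of conditions are equivalent to having $y_i = 1$ for some $x_i \ne \max x_\ell$ and for some $x_i \ne \min x_\ell$, proving the sufficiency part of the theorem.

To prove the necessity, we first note that, clearly, $m(y) = \infty$ if all $x_i$'s are equal, regardless of the values of $y_i$. Now suppose that $y_i = 1$ only for $x_i = \min x_\ell = -b < 0$ and there are $k_2 \ge 1$ such observations. Then, from (18),

$$m(y) \ge B = \int_{-\infty}^{0} \int_{-\infty}^{b\beta_1} e^{k_2 (\beta_0 - b\beta_1)} \prod_{\{i:\, y_i = 0\}} \big( 1 - e^{\beta_0 + \beta_1 x_i} \big)\, d\beta_0\, d\beta_1$$

$$> \int_{-\infty}^{0} \int_{-\infty}^{b\beta_1} e^{k_2 (\beta_0 - b\beta_1)} \big( 1 - e^{\beta_0 - \beta_1 b} \big)^{n - k_2}\, d\beta_0\, d\beta_1$$

$$= \int_{-\infty}^{0} \int_{0}^{1} u^{k_2 - 1} (1 - u)^{n - k_2}\, du\, d\beta_1 = \infty,$$

which proves the necessity of $y_i = 1$ for $x_i > \min x_\ell$. Next, suppose $y_i = 1$ only for $x_i = \max x_\ell = 0$ and there are $k_3 \ge 1$ such observations. Then, from (17),

$$m(y) \ge A = \int_{0}^{\infty} \int_{-\infty}^{0} e^{k_3 \beta_0} \prod_{\{i:\, y_i = 0\}} \big( 1 - e^{\beta_0 + \beta_1 x_i} \big)\, d\beta_0\, d\beta_1$$

$$> \int_{0}^{\infty} \int_{-\infty}^{0} e^{k_3 \beta_0} \prod_{\{i:\, y_i = 0\}} \big( 1 - e^{\beta_0} \big)\, d\beta_0\, d\beta_1 = \infty,$$

proving the necessity of $y_i = 1$ for $x_i < \max x_\ell$, which concludes the proof. □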

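Proof of Theorem 2. Assume that the vector space spanned by $S$ is of rank at least $K + 1$. Then, there are $K + 1$ linearly independent vectors in $S$, which we label as $v_i = (1, x_{i1}, \ldots, x_{iK})$, $i = 1, \ldots, K + 1$. Then, $\gamma_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij}$, $i = 1, \ldots, K + 1$, is a one-to-one transformation from $\beta = (\beta_0, \ldots, \beta_K)$ to $\gamma = (\gamma_1, \ldots, \gamma_{K+1})$. Now, the marginal likelihood with respect to the flat prior (see (12)) is

$$m(y) = \int_{\Theta} L(\beta)\, d\beta.$$

Using the observations $y_i = 1$ corresponding to $v_i$, $i = 1, \ldots, K + 1$,

$$m(y) < \int_{\beta \in \Theta} \Bigg( \prod_{i=1}^{K+1} e^{\beta_0 + \sum_{j=1}^{K} \beta_j x_{ij}} \Bigg) d\beta.$$

Making the transformation of variables from $\beta$ to $\gamma$, we get

$$m(y) < \int_{\gamma \in \Gamma} \Bigg( \prod_{i=1}^{K+1} e^{\gamma_i} \Bigg) |J|\, d\gamma < \infty,$$

where $\Gamma = (-\infty, 0)^{K+1}$ and $|J|$ is the Jacobian, concluding the proof. □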

A.1. Boundedness of Ωðβ0Þ

We first show that the size of $\Omega(\beta_0)$ is a function of $\beta_0$.

Lemma A1. The size (or, the Lebesgue measure) of $\Omega(\beta_0)$, as in (13), is

$$v(\beta_0) = (-\beta_0)^K\, v(-1) \quad \text{for } \beta_0 < 0.$$

Proof. For $\beta_0 < 0$,

$$\Omega(\beta_0) = \{ (\beta_1, \ldots, \beta_K) : \beta_1 x_{i1} + \cdots + \beta_K x_{iK} \le -\beta_0,\ i = 1, \ldots, n \}$$

$$= \Big\{ (\beta_1, \ldots, \beta_K) : \frac{\beta_1}{-\beta_0} x_{i1} + \cdots + \frac{\beta_K}{-\beta_0} x_{iK} \le 1,\ i = 1, \ldots, n \Big\}. \tag{19}$$

Letting $\beta'_j = \beta_j / (-\beta_0)$ and making a change of variables,

$$v(\beta_0) = \int_{\Omega(\beta_0)} \prod_{j=1}^{K} d\beta_j = (-\beta_0)^K \int_{\Omega(-1)} \prod_{j=1}^{K} d\beta'_j = (-\beta_0)^K\, v(-1). \qquad \square$$

From the above lemma, it is clear that if $\Omega(\beta_0)$ is bounded for any specific $\beta_0 < 0$, it is bounded for all $\beta_0 < 0$. There are several sufficient conditions for $\Omega(\beta_0)$ to be bounded. Below, we give one such minimal condition. Later, we comment on a way to check for boundedness. Let $X = (x_{ij})$ be the $n \times K$ design matrix consisting of the values $x_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, K$, of the $K$ covariates. We assume that the design matrix $X$ is of full rank $K$. For simplicity, without loss of generality, we can assume that the $K \times K$ submatrix, $X_1$, of $X$ formed by the first $K$ rows of $X$ is the identity matrix $I_K$. (This can be achieved by a linear transformation $X^* = XA$ and $\beta^* = A^{-1}\beta$ where $A = X_1^{-1}$, so that the resulting matrices $X^*$ and $\beta^*$ satisfy the same inequality constraints.)

Lemma A2 (A sufficient condition for boundedness of $\Omega(\beta_0)$). For $X$ as described above, suppose there is a row, say, the $(K+1)$st row, with all of its elements negative. Then $\Omega(\beta_0)$ is bounded.

Proof. To verify this, we fix $\beta_0$. Using the first $K$ of the inequalities defining $\Omega(\beta_0)$ (as in (19)), and from the assumption $X_1 = I_K$, we have $\beta_j \le -\beta_0$ for $j = 1, \ldots, K$. This shows that the $\beta_j$'s are all bounded from above when $\beta_0$ is fixed. Now, applying the inequality for $i = K + 1$, we get

$$\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} \le -\beta_0 \quad \text{for } i = K + 1, \tag{20}$$

with $x_{ij} < 0$ for $i = K + 1$; $j = 1, \ldots, K$. Now, suppose that $\beta_1$ is not bounded from below, and let $\beta_1 \to -\infty$ in (20). Then, the first term in the summation on the left-hand side of (20) goes to $\infty$, and hence the sum of the remaining terms must go to $-\infty$. For this to be valid, at least one of $\beta_2, \ldots, \beta_K$ must go to $\infty$, leading to a contradiction. This verifies that $\beta_1$ is bounded from below. Similarly, all $\beta_i$'s are bounded from below, and hence $\Omega(\beta_0)$ is bounded. □

On a practical level, checking boundedness in more than two dimensions $(K > 2)$ using conditions such as the above can be difficult. Instead, we suggest using a numerical approach. One approach is to use a numerical algorithm to maximize the function $f(\beta_1, \ldots, \beta_K) = \sum_{j=1}^{K} \beta_j^2$ (or minimize its inverse), subject to the linear constraints in (19), and check whether the maximum is finite. For the examples in this paper, we used the constrOptim function in R.

References

Berger, J., 1985. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.
Berger, J., Bernardo, J., 1992. On the development of the reference prior method. In: Bernardo, J.M., et al. (Eds.), Bayesian Statistics 4. Oxford University Press, Oxford, pp. 35–60.
Chu, H., Cole, S.R., 2010. Estimation of risk ratios in cohort studies with common outcomes: a Bayesian approach. Epidemiology 21 (6), 855–862.
Datta, G.S., Mukerjee, R., 2004. Probability Matching Priors: Higher Order Asymptotics. Lecture Notes in Statistics, Springer, New York.
Deddens, J., Petersen, M., Lei, X., 2003. Estimation of prevalence ratio when PROC GENMOD does not converge. In: Proceedings of the 28th Annual SAS Users Group International Conference, 30 March–2 April 2003. ⟨http://www2.sas.com/proceedings/sugi28/270-28.pdf⟩.
Deddens, J.A., Peterson, M., 2008. Approaches for estimating prevalence ratios. Occupational and Environmental Medicine 65, 501–506.
Gelfand, A.E., Sahu, S.K., 1999. Identifiability, improper priors, and Gibbs sampling for generalized linear models. Journal of the American Statistical Association 94, 247–253.
Gelman, A., Jakulin, A., Pittau, M.G., Su, Y., 2008. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics 2 (4), 1360–1383.
Gilks, W.R., Wild, P., 1992. Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41, 337–348.
Hastings, W., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Ibrahim, J.G., Laud, P.W., 1991. On Bayesian analysis of generalized linear models using Jeffreys's prior. Journal of the American Statistical Association 86, 981–986.
Kass, R.E., Wasserman, L., 1996. The selection of prior distributions by formal rules. Journal of the American Statistical Association 90, 1343–1370.
Mantle, M.J., Greenwood, R.M., Currey, H.L., 1977. Backache in pregnancy. Rheumatology and Rehabilitation 16 (2), 95–101.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1091.
Neal, R.M., 2003. Slice sampling (with discussion). Annals of Statistics 31, 705–767.
Skov, T., Deddens, J., Petersen, M., Endahl, L., 1998. Prevalence proportion ratios: estimation and hypothesis testing. International Journal of Epidemiology 27, 91–95.
Spiegelhalter, D.J., Thomas, A., Best, N.G., 2003. WinBUGS User Manual, Version 1.4. Medical Research Council Biostatistics Unit, Cambridge, United Kingdom.
Sun, D., Tsutakawa, R.K., He, Z., 2001. Propriety of posteriors with improper priors in hierarchical linear mixed models. Statistica Sinica 11, 77–95.
Wacholder, S., 1986. Binomial regression in GLIM, estimating risk ratios and risk differences. American Journal of Epidemiology 123, 174–184.
Zocchetti, C., Consonni, D., Bertazzi, P.A., 1995. Estimation of prevalence rate ratio from cross-sectional data. International Journal of Epidemiology 24, 1064–1065.
