
Bayesian Use of

Likelihood Ratios

in Biostatistics

David Draper

Department of Applied Mathematics and Statistics

University of California, Santa Cruz, USA


www.ams.ucsc.edu/∼draper

JSM 2010

Vancouver, Canada

4 Aug 2010

Bayesian use of likelihood ratios in biostatistics 1

Case Study: Diagnosing Sepsis in Newborns

(Newman TB, Puopolo KM, Wi S, Draper D, Escobar GE (2010). Interpreting

complete blood counts soon after birth in newborns at risk for sepsis.

Pediatrics, forthcoming.)

Sepsis is a serious medical condition in which the entire body exhibits an

inflammatory response to infection, usually bacterial (e.g., Group B

streptococcus (GBS)).

It’s particularly dangerous in newborns, where early-onset sepsis (EOS)

usually presents within the first 24 hours after birth.

However, the evaluation of EOS is difficult: risk factors for infection are

common, and early signs and symptoms are nonspecific.

When newborns are symptomatic or have significant risk factors, a

complete blood count (CBC) is usually ordered; for example, CDC

guidelines recommend a CBC for high-risk infants (e.g., those with

GBS-positive mothers not adequately treated for infection).

Unfortunately the CDC recommendations are silent on how to use CBC

results to estimate the risk of infection.


Use of CBC Components to Diagnose Sepsis

Published reference ranges for components of the CBC — including the

absolute neutrophil count (ANC) and the proportion of total

neutrophils that are immature (I/T) — vary widely, and these variables

may be affected by many factors besides infection, including infant age (in

hours), the method of delivery, maternal hypertension, and

the infant’s sex.

Many different values for the sensitivity — P (test positive|sepsis) — and

specificity — P (test negative|not sepsis) — of CBC components have been

published, depending on the population studied and what levels of these

tests were considered abnormal.

Moreover, most previous studies have dichotomized each of the CBC

components rather than treating them continuously — which wastes

information by failing to quantify the difference between borderline and

profoundly abnormal results — and no one previous to our study had tried

to evaluate the effects of factors such as infant age and delivery method

on diagnostic performance.


Study Methods

As part of a larger project based on a $1.35 million NIH grant, we took

advantage of the electronic medical record systems at Northern California

Kaiser Permanente Medical Care Program (KPMCP) and Brigham and

Women’s Hospital (BWH, Boston) to improve on previous practice.

Methods. Retrospective cross-sectional study involving KPMCP, BWH

demographic, laboratory, hospitalization data bases; we queried

microbiology data bases to identify all infants for whom blood culture was

obtained at < 72 hours of age; we kept first positive blood culture for

infants with positive cultures (septic), and first blood culture for other

infants, then matched all blood cultures by date, time to (single) CBC

obtained closest in time to blood culture for each infant.

Study subjects. Newborn infants were eligible for the study if (a) they

were born from 1 Jan 1995 through 30 Sep 2007 at a KPMCP hospital that

had at least 100 total births in that time period, or at the BWH from 1 Jan

1993 through 31 Dec 2007; (b) their estimated gestational age was ≥ 34

weeks; and (c) they had a CBC and blood culture drawn within 1 hour of

one another at < 72 hours of age.


The Promise of Electronic Medical Records

Sepsis is rare but deadly: of the 550,367 infants eligible for the study

based on their hospital, year of birth, and gestational age, we identified 311

(0.57/1000 live births) with positive blood cultures; we included in this

study the subset of 67,623 infants (12.3% of the 550,367 eligible newborns)

who had a CBC done within 1 hour of a blood culture, including 245 of the

311 whose blood culture was positive (3.6/1000 infants receiving CBCs):

thus 245 sepsis-positive and 67,378 sepsis-negative babies.

Goal of analysis. With sepsis and other diseases, we’re working toward a

clinical goal — in the nascent era of electronic medical records (EMRs)

— in which current posterior probabilities of disease status and adverse

outcomes (e.g., unplanned transfer to the intensive care unit) become prior

probabilities for real-time sequential updating as new information

(vital signs, laboratory results, signs and symptoms) arrives.

As a stepping-stone toward that eventual goal, we’re now putting in place at

Kaiser a Bayesian system in which


Likelihood Ratios

(1) an initial probability of sepsis is estimated based on maternal risk

factors up until birth;

(2) the probability in (1) is updated at newborn age 12 hours via

Bayes’s Theorem based on new infant data in the first 12 hours of life;

(3) the probability in (2) is updated at 24 hours via Bayes’s Theorem

based on new infant data in hours 12–24; and so on.

A convenient way to do this Bayesian updating is with Bayes’s Theorem

in odds form: with diagnostic data y and true sepsis = S,

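$$\frac{P(S \mid y)}{P(\text{not } S \mid y)} = \left[ \frac{P(S)}{P(\text{not } S)} \right] \cdot \left[ \frac{P(y \mid S)}{P(y \mid \text{not } S)} \right] \qquad (1)$$

(posterior odds) = (prior odds) · (Bayes factor = likelihood ratio)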
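A minimal R sketch of this updating scheme may help fix ideas. The function names, the starting probability and the three likelihood ratios are all hypothetical, and chaining the ratios this way assumes each one is conditional on the data already used.

```r
# Convert between probabilities and odds.
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

# One Bayesian update in odds form: posterior odds = prior odds * likelihood ratio.
update_prob <- function(prior_prob, lr) {
  odds_to_prob(prob_to_odds(prior_prob) * lr)
}

# Sequential updating: each posterior becomes the prior for the next update.
# Reduce(..., accumulate = TRUE) returns the starting value followed by the
# probability after each successive likelihood ratio.
sequential_update <- function(prior_prob, lrs) {
  Reduce(update_prob, lrs, accumulate = TRUE, init = prior_prob)
}

# Hypothetical numbers, for illustration only.
p0   <- 0.001          # assumed probability of sepsis at birth
lrs  <- c(5, 0.5, 10)  # assumed likelihood ratios for three successive data batches
path <- sequential_update(p0, lrs)
print(round(path, 4))
```

Because the update is multiplicative on the odds scale, the final probability depends only on the product of the likelihood ratios, not on the order in which they arrive.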

So how should likelihood ratios be estimated from data?


Estimating Likelihood Ratios

Consider gathering data on a screening test T for a disease to estimate the

test’s sensitivity and specificity.

For this purpose you would take a random sample, of size (say) nD > 0, of

blood samples that were known (on the basis of a gold-standard test) to

contain the disease agent D, of which (say) rD would register as positive

(+) by T , and a parallel and independent random sample, of size (say)

nD̄ > 0, of blood samples that were known not to contain the disease

agent (using D̄ to denote absence of the disease), of which (say) rD̄ would

register as not positive (−) by T .

The sampling model would be

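$$\begin{aligned} (r_D \mid \pi_D) &\sim \text{Binomial}(n_D, \pi_D) \\ (r_{\bar{D}} \mid \pi_{\bar{D}}) &\sim \text{Binomial}(n_{\bar{D}}, \pi_{\bar{D}}), \end{aligned} \qquad (2)$$

in which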

• 0 < πD < 1 is the underlying probability P (+|D) of test-positives in the

population of all true-positive blood samples,

• similarly 0 < πD̄ < 1 is the underlying probability P (−|D̄) of

test-negatives in the population of all true-negative blood samples, and


Interval Estimation of a Likelihood Ratio

• rD and rD̄ are independent (given πD and πD̄).

With a given sample of blood of unknown disease status that came out

positive (say) on T , in this notation Bayes’s Theorem on the odds scale is

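$$\frac{P(D \mid +)}{P(\bar{D} \mid +)} = \left[ \frac{P(D)}{P(\bar{D})} \right] \cdot \left[ \frac{P(+ \mid D)}{P(+ \mid \bar{D})} \right], \qquad (3)$$

in which the second multiplicative factor $P(+ \mid D) / P(+ \mid \bar{D})$ on the right side of (3) is the likelihood ratio based on the screening test T; the population quantity that the likelihood ratio estimates is

$$\theta = \frac{\pi_D}{1 - \pi_{\bar{D}}}, \qquad (4)$$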

and the goal of the inference is an interval estimate for θ.

As usual the frequentist (repeated-sampling) and Bayesian approaches

may both be examined as methods for creating such an interval; with little

information about θ external to the data set (rD, nD, rD̄, nD̄) and large

values of (nD, nD̄), the expectation would be that the two approaches would

yield similar findings,


Likelihood-Based Inference

but for small (nD, nD̄) the Bayesian approach might well be better

calibrated (because it involves integrating over a skewed likelihood

function instead of maximizing over it).

Approximate likelihood (repeated-sampling) inference. From

standard Binomial-sampling results the maximum-likelihood estimates

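(MLEs) of πD and πD̄ are $\hat{\pi}_D = r_D / n_D$ and $\hat{\pi}_{\bar{D}} = r_{\bar{D}} / n_{\bar{D}}$, respectively, and by the functional-invariance property of maximum-likelihood estimation the MLE of θ is then

$$\hat{\theta} = \frac{\hat{\pi}_D}{1 - \hat{\pi}_{\bar{D}}} = \frac{r_D \, n_{\bar{D}}}{n_D \, (n_{\bar{D}} - r_{\bar{D}})}, \qquad (5)$$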

in which for sensible behavior (given that 0 < θ < ∞ by assumption) it’s

evidently necessary to assume that rD̄ < nD̄ and rD > 0.

Standard (Fisherian) maximum-likelihood inference is based on the hope

that in repeated sampling θ̂ will be approximately Gaussian, and indeed this

will be true for large enough sample sizes, but for moderate values of

(nD, nD̄) — since 0 < θ < ∞ — the repeated-sampling distribution of θ̂

will be positively skewed.


Transform the Scale

One approach to solving this problem is the bootstrap, which would be

straightforward but computationally intensive; another is to do

maximum-likelihood inference on a transformed scale (on which the

repeated-sampling distribution of the MLE is closer to Gaussian) and

back-transform; here I give details on the transformation approach.

The obvious transformation for positive θ is to work with

η = log(θ) = log(πD) − log(1 − πD̄), (6)

for which the MLE is

η̂ = log(θ̂) = log(π̂D) − log(1 − π̂D̄). (7)

In repeated sampling the distribution of η̂ should be approximately

Gaussian with mean fairly close to η and variance

V (η̂) = V [log(π̂D)] + V [log(1 − π̂D̄)] . (8)

The variances in (8) can each be approximated by a standard Taylor-series

(∆-method) calculation: if in repeated sampling Y has mean E(Y ) and


∆ Method

variance V (Y ) and f is a function whose first derivative exists at E(Y ), then

$$V[f(Y)] \doteq \left\{ f'[E(Y)] \right\}^2 V(Y). \qquad (9)$$

With f(y) = log(y) and Y = π̂D, so that E(Y) = πD and $V(Y) = \pi_D (1 - \pi_D) / n_D$, this yields

$$V[\log(\hat{\pi}_D)] \doteq \left( \frac{1}{\pi_D} \right)^2 \frac{\pi_D (1 - \pi_D)}{n_D} = \frac{1 - \pi_D}{n_D \, \pi_D}, \qquad (10)$$

and a similar calculation with f(y) = log(1 − y) and Y = π̂D̄ gives

$$V[\log(1 - \hat{\pi}_{\bar{D}})] \doteq \frac{\pi_{\bar{D}}}{n_{\bar{D}} \, (1 - \pi_{\bar{D}})}, \qquad (11)$$

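so that the repeated-sampling variance of η̂ may be approximately estimated by

$$\hat{V}(\hat{\eta}) \doteq \frac{1 - \hat{\pi}_D}{n_D \, \hat{\pi}_D} + \frac{\hat{\pi}_{\bar{D}}}{n_{\bar{D}} \, (1 - \hat{\pi}_{\bar{D}})} = \frac{n_D - r_D}{n_D \, r_D} + \frac{r_{\bar{D}}}{n_{\bar{D}} \, (n_{\bar{D}} - r_{\bar{D}})}. \qquad (12)$$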

To ensure both sensible estimates of θ in (5) and non-zero variance

estimates in (12) it’s necessary to assume that 0 < rD < nD and

0 < rD̄ < nD̄.


Bayesian Solution

Based on the above assumption of approximate Gaussian sampling distribution for η̂, an approximate 100(1 − α)% confidence interval for η would then be of the form

$$\hat{\eta} \pm \Phi^{-1}\left( 1 - \frac{\alpha}{2} \right) \sqrt{\hat{V}(\hat{\eta})}, \qquad (13)$$

where Φ is the standard normal CDF; denoting the left and right endpoints

of (13) by η̂L and η̂R, respectively, the corresponding approximate 100(1 − α)%

confidence interval for θ would then be

[exp(η̂L), exp(η̂R)] . (14)
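The likelihood-based recipe in equations (5), (7) and (12)–(14) can be sketched in a few lines of R. The function and argument names are illustrative rather than taken from the talk: `r_d` and `n_d` stand for rD and nD, and `r_nd` and `n_nd` for rD̄ and nD̄.

```r
# Likelihood-ratio MLE with a delta-method interval built on the log scale.
# r_d:  test-positives among the n_d diseased samples;
# r_nd: test-negatives among the n_nd non-diseased samples.
lr_mle_interval <- function(r_d, n_d, r_nd, n_nd, alpha = 0.05) {
  # Equations (5) and (12) need 0 < r < n in both groups.
  stopifnot(r_d > 0, r_d < n_d, r_nd > 0, r_nd < n_nd)
  theta_hat <- (r_d * n_nd) / (n_d * (n_nd - r_nd))      # equation (5)
  eta_hat   <- log(theta_hat)                            # equation (7)
  v_hat     <- (n_d - r_d) / (n_d * r_d) +
               r_nd / (n_nd * (n_nd - r_nd))             # equation (12)
  half      <- qnorm(1 - alpha / 2) * sqrt(v_hat)        # half-width in (13)
  # Back-transform the endpoints as in equation (14).
  c(mle = theta_hat, lower = exp(eta_hat - half), upper = exp(eta_hat + half))
}

est <- lr_mle_interval(r_d = 48, n_d = 50, r_nd = 97, n_nd = 100)
print(round(est, 1))
```

With the counts 48/50 and 97/100 used in the example table later in the talk, this gives an MLE of 32 and a 95% interval of roughly (10.5, 97.7).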

Bayesian solution. This is simpler and does not require an appeal to

large-sample approximations.

If you have little information about the probabilities πD and πD̄ external to

the data set y = (rD, nD, rD̄, nD̄), as will often be the case, this can readily be

conveyed by augmenting model (2) above with conjugate Beta prior

distributions with small values of the hyper-parameters;


Bayesian Solution

the prior model is then

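$$\begin{aligned} \pi_D &\sim \text{Beta}(\alpha_D, \beta_D) \\ \pi_{\bar{D}} &\sim \text{Beta}(\alpha_{\bar{D}}, \beta_{\bar{D}}) \end{aligned} \qquad (15)$$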

with (e.g.) αD = βD = αD̄ = βD̄ = ε for some small ε > 0.

By standard conjugate updating the posterior distributions for πD and πD̄

are then (independently) also Beta:

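$$\begin{aligned} (\pi_D \mid y) &\sim \text{Beta}(\alpha_D + r_D, \; \beta_D + n_D - r_D) \\ (\pi_{\bar{D}} \mid y) &\sim \text{Beta}(\alpha_{\bar{D}} + r_{\bar{D}}, \; \beta_{\bar{D}} + n_{\bar{D}} - r_{\bar{D}}). \end{aligned} \qquad (16)$$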

The posterior distribution p(θ|y) for θ given the data has no closed-form

expression but may easily be approximated to any desired accuracy by

simulation: you simply

• generate m IID draws from the Beta posterior distribution p(πD|y) in the

first line of (16), for some large value of m, and store the generated draws in

a column called π∗D;


Bayesian Solution

• independently generate m IID draws from the Beta posterior

distribution p(πD̄|y) in the second line of (16) and store the generated

draws in another column called π∗D̄; and

• create a third column $\theta^* = \pi^*_D / (1 - \pi^*_{\bar{D}})$ and summarize it in all relevant ways (e.g., a density trace provides a visual summary of p(θ|y), the mean or median of the θ∗ values may be used as a point estimate, and the α/2 and (1 − α/2) quantiles of the θ∗ distribution provide the left and right endpoints of a 100(1 − α)% interval estimate for θ).
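The three simulation steps above translate almost line for line into R. This is a sketch under the stated Beta(ε, ε) priors; the function name, argument names and random seed are arbitrary choices made for illustration.

```r
# Posterior simulation for theta = pi_D / (1 - pi_Dbar) under independent
# Beta(eps, eps) priors, following the three steps in the text.
lr_posterior <- function(r_d, n_d, r_nd, n_nd, eps = 0.01, m = 100000, alpha = 0.05) {
  pi_d_star  <- rbeta(m, eps + r_d,  eps + n_d  - r_d)    # draws from p(pi_D | y)
  pi_nd_star <- rbeta(m, eps + r_nd, eps + n_nd - r_nd)   # draws from p(pi_Dbar | y)
  theta_star <- pi_d_star / (1 - pi_nd_star)              # third column
  list(
    draws    = theta_star,
    mean     = mean(theta_star),
    median   = median(theta_star),
    interval = quantile(theta_star, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  )
}

set.seed(2010)  # arbitrary seed, used only for reproducibility
post <- lr_posterior(r_d = 48, n_d = 50, r_nd = 97, n_nd = 100)
print(round(c(median = post$median, mean = post$mean), 1))
print(round(post$interval, 1))

# Fourth column: eta* = log(theta*); its shape can be compared with a Gaussian,
# for example with qqnorm(eta_star).
eta_star <- log(post$draws)
print(round(c(mean_eta = mean(eta_star), sd_eta = sd(eta_star)), 3))
```

With ε = 0.01 and m = 100,000 the summaries should land near the first row of the example table (posterior median about 35.5, posterior mean about 47.4), up to Monte Carlo noise.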

It’s also interesting to simulate from the posterior distribution for η given

y (by creating a fourth column η∗ = log(θ∗)) to see how close this distribution

is to a Gaussian form, to examine (by the Bernstein-von Mises Theorem)

whether the assumption on which the likelihood approach was based (that in repeated sampling $\hat{\eta} \overset{\text{IID}}{\sim} \text{Gaussian}[\eta, V(\hat{\eta})]$) is reasonable for a given data set.

An example. Consider a test with sensitivity 96% and specificity 97%,

and sample sizes ranging from 50 to 2,000.


An Example

Maximum-likelihood and Bayesian likelihood ratio point and interval estimates for a moderately accurate screening test; the Bayesian results use ε = 0.01 and m = 100,000.

| rD | nD | rD̄ | nD̄ | MLE | Posterior median | Posterior mean | Likelihood 95% L | Likelihood 95% U | Posterior 95% L | Posterior 95% U |
|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 50 | 97 | 100 | 32.0 | 35.5 | 47.4 | 10.5 | 97.7 | 13.5 | 151.8 |
| 96 | 100 | 194 | 200 | 32.0 | 33.7 | 38.2 | 14.5 | 70.4 | 16.6 | 86.2 |
| 960 | 1000 | 1940 | 2000 | 32.0 | 32.2 | 32.5 | 24.9 | 41.1 | 25.3 | 41.8 |

With small (nD, nD̄), the MLE of the likelihood ratio, which corresponds approximately (with little information external to the sample data) to the posterior mode, is substantially smaller than either the posterior median or the posterior mean (see the skewness in the posterior distributions for θ in the figures on the next page).

The Bayesian intervals are substantially wider than their likelihood

counterparts for small and moderate sample sizes, but by the time

(nD, nD̄) has reached (1000, 2000) the two methods have yielded

similar findings.


An Example

[Figure: four density plots; x-axes theta (top row) and eta (bottom row), y-axes Density.]

Top and bottom panels are posterior distributions for θ and η,

respectively (with Gaussian approximation for η); left and right columns

correspond to (nD, nD̄) = (50, 100) and (100, 200), respectively.

The Gaussian approximation for η on which the likelihood method is

based is poor with (nD, nD̄) = (50, 100), better (but still not good) with

(nD, nD̄) = (100, 200), and excellent with (nD, nD̄) = (1000, 2000) (next page).


Simulation Study

[Figure: density plots for theta and eta with (nD, nD̄) = (1000, 2000); y-axes Density.]

Simulation study (joint work with JC LaGuardia). We performed a

simulation study to examine repeated-sampling bias of point estimates

of likelihood ratios, and (repeated-sampling) actual coverage of

interval estimates.
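To make "repeated-sampling bias" and "actual coverage" concrete, here is a minimal Monte Carlo sketch for a single hypothetical design cell, using the unmodified MLE (5) and the log-scale interval (13)–(14). It is not the simulation code behind the study, and every setting in it is an assumption chosen for illustration.

```r
# Monte Carlo check of bias and coverage in one hypothetical design cell.
set.seed(1)
pi_d <- 0.9; pi_nd <- 0.9      # assumed sensitivity and specificity
n_d  <- 100; n_nd  <- 200      # assumed sample sizes
reps <- 2000; alpha <- 0.10    # nominal 90% intervals
theta_true <- pi_d / (1 - pi_nd)

r_d  <- rbinom(reps, n_d,  pi_d)    # test-positives among the diseased
r_nd <- rbinom(reps, n_nd, pi_nd)   # test-negatives among the non-diseased

# Equations (5) and (12) are undefined on the boundary, so drop such replicates.
ok   <- r_d > 0 & r_d < n_d & r_nd > 0 & r_nd < n_nd
r_d  <- r_d[ok]
r_nd <- r_nd[ok]

theta_hat <- (r_d * n_nd) / (n_d * (n_nd - r_nd))
v_hat     <- (n_d - r_d) / (n_d * r_d) + r_nd / (n_nd * (n_nd - r_nd))
half      <- qnorm(1 - alpha / 2) * sqrt(v_hat)
lower     <- exp(log(theta_hat) - half)
upper     <- exp(log(theta_hat) + half)

rel_bias <- mean(theta_hat) / theta_true - 1                  # relative bias of the MLE
coverage <- mean(lower <= theta_true & theta_true <= upper)   # actual coverage
print(c(kept = sum(ok), rel_bias = rel_bias, coverage = coverage))
```

Dropping boundary replicates is only a convenience here; the refinement discussed next addresses those cases directly.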


ML and Bayesian Tuning Constants

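A refinement. Whenever you use a screening test in a situation in which the specificity is close to 1,

$$\hat{\pi}_{\bar{D}} = \frac{r_{\bar{D}}}{n_{\bar{D}}} \to 1 \quad \Rightarrow \quad \frac{\hat{\pi}_D}{1 - \hat{\pi}_{\bar{D}}} = \hat{\theta} \to \infty. \qquad (17)$$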

In this case you’ll end up with a frequentist likelihood ratio estimate

that’s unstable, because its denominator is too close to 0.

In the Bayesian approach, in this same situation if the hyper-parameter

values are too close to 0, the posterior estimate of πD̄ will again be close

to 1 and the Bayesian point estimate θ∗ can be similarly unstable.

This can easily happen if the underlying specificity of the screening process

is high and/or if the sample sizes are small.

The obvious remedies are as follows:

• (Bayesian approach) Use hyper-parameter values

αD = αD̄ = βD = βD̄ = CB that are not too close to 0.

• (MLE) Mimic what happens in the Beta-Binomial Bayesian approach

by adding a constant CL to all of the values (rD, rD̄, nD, nD̄).
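The effect of the Bayesian tuning constant can be seen in a small sketch with hypothetical counts in which every non-diseased sample tests negative, so that the unmodified MLE in (5) is infinite. The helper name and the counts are assumptions made for illustration; only the Bayesian remedy is shown.

```r
# Posterior median of the likelihood ratio under Beta(c_b, c_b) priors when
# r_nd = n_nd. The counts are hypothetical.
set.seed(42)
r_d <- 19; n_d <- 20; r_nd <- 40; n_nd <- 40
m <- 100000

posterior_median_lr <- function(c_b) {
  pi_d_star <- rbeta(m, c_b + r_d, c_b + n_d - r_d)
  # Draw 1 - pi_Dbar directly (a Beta with the two shape parameters swapped),
  # which avoids rounding pi_Dbar to exactly 1 in floating point.
  one_minus_pi_nd_star <- rbeta(m, c_b + n_nd - r_nd, c_b + r_nd)
  median(pi_d_star / one_minus_pi_nd_star)
}

med_eps  <- posterior_median_lr(0.01)  # hyper-parameters very close to 0
med_unif <- posterior_median_lr(1.0)   # Laplace (Uniform) prior
print(c(eps_0.01 = med_eps, uniform = med_unif))
```

With hyper-parameters near 0 the posterior for 1 − πD̄ piles up next to 0 and the posterior median of θ becomes enormous, whereas the Uniform prior keeps it finite and of a plausible size.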


Simulation Study Results

Factorial design of the simulation study.

| Variable | Values |
|---|---|
| πD | 0.1, 0.9, 0.95, 0.98 |
| πD̄ | 0.8, 0.9, 0.95, 0.98 |
| nD | 20, 50, 75, 100, 150 |
| nD̄ | 40, 100, 150, 200, 350 |
| CL = CB | 0.3, 0.5, 0.7, 0.9, 1.0, 1.15 |

We used the full-factorial simulation design summarized in this table, with

2,400 Monte Carlo repetitions in each cell of the factorial.

By way of outcomes we monitored the relative bias of each of the point

estimates (modified MLE, Bayesian posterior mean, Bayesian

posterior mode) and the actual coverage of nominal 90% modified ML

and Bayesian intervals.

Simulation conclusions were as follows.


Simulation Study Results

• Both approaches can be calibrated to obtain approximately unbiased

point estimates in almost all scenarios examined, but Bayesian interval

estimates had better actual coverage behavior than modified-ML

interval estimates for small and moderate sample sizes: actual coverage for the Bayesian intervals, when using a CB value that gave a good point estimate, was higher than actual coverage for the modified 90% likelihood confidence intervals, when using a CL value that gave a good point estimate.

• Within the Beta family of prior distributions for a Binomial parameter

π, three popular choices to specify diffuseness, when not much is known about

π external to the data, are

(a) the Jeffreys prior, with (α, β) = (0.5, 0.5);

(b) the Laplace (Uniform) prior, with (α, β) = (1.0, 1.0); and

(c) (α, β) = (ε, ε) for a value of ε near 0 (such as 0.1 or 0.01).

Of these three choices, the Uniform prior performs substantially better

than the other two conventional diffuse-prior choices when estimating a

likelihood ratio.


Results For Sepsis Screening

• Broadening the Laplace-Uniform idea, (α, β) values ranging from 0.7 to

1.15 are worth considering; if your sample size in the non-diseased group is

small, lean toward using lower values from that interval, and if your sample

size in the non-diseased group is large, go for higher values.

Results for sepsis screening were as follows.

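| ANC | Likelihood ratio, age at time of CBC < 1 hour | Likelihood ratio, age 1–4 hours | Likelihood ratio, age > 4 hours | Number with infection | % of those with infection with result | % of those without infection with result |
|---|---|---|---|---|---|---|
| 0–0.99 | 7.5 | 33.5 | 115 | 35 | 14 | 0.4 |
| 1–1.99 | 2.3 | 9.3 | 51.7 | 30 | 12 | 1.1 |
| 2–4.99 | 1.0 | 1.1 | 6.9 | 44 | 18 | 9.6 |
| 5–9.99 | 0.89 | 0.92 | 0.64 | 70 | 29 | 33.7 |
| ≥ 10 | 0.93 | 0.55 | 0.31 | 65 | 27 | 55.3 |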

Low ANC values are highly predictive of sepsis, especially if they occur

more than 4 hours after birth.
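As a worked illustration of how one entry of this table enters Bayes's Theorem in odds form, suppose (purely as an assumption for the example) that the prior probability is the overall rate among infants receiving CBCs, 245/67,623, rather than a probability tailored to maternal risk factors. For an infant whose ANC falls in the range 0–0.99 more than 4 hours after birth (likelihood ratio 115):

$$\text{posterior odds} = \frac{245}{67{,}378} \times 115 \approx 0.0036 \times 115 \approx 0.42, \qquad P(S \mid y) = \frac{0.42}{1 + 0.42} \approx 0.29.$$

Under that assumed prior, a single very low ANC result moves the probability of sepsis from about 0.4% to about 29%.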


Results For Sepsis Screening; The Next Step

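| I/T | Likelihood ratio, age at time of CBC < 1 hour | Likelihood ratio, age 1–4 hours | Likelihood ratio, age > 4 hours | Number with infection | % of those with infection with result | % of those without infection with result |
|---|---|---|---|---|---|---|
| 0–0.1499 | 0.45 | 0.46 | 0.25 | 61 | 25 | 66 |
| 0.15–0.299 | 1.3 | 1.2 | 1.2 | 69 | 28 | 23 |
| 0.3–0.4499 | 1.4 | 2.9 | 3.1 | 44 | 18 | 7 |
| 0.45–0.599 | 4.8 | 3.3 | 8.8 | 37 | 15 | 3 |
| ≥ 0.6 | 6.1 | 8.4 | 10.7 | 33 | 15 | 2 |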

High values of the I to T ratio are moderately predictive of sepsis,

especially if they occur more than 4 hours after birth.

The next step. How would you use both the ANC and I/T values to

modify a baseline probability of sepsis from the maternal information?

You can only multiply the likelihood ratios if ANC and I/T are

independent for both the sepsis and non-sepsis infants (not likely to be

true); we need to estimate their joint likelihood ratio.


Bayes’s Theorem Backwards

If an accurate method can be found to estimate P (sepsis|ANC, I/T ), this

can be done by running Bayes’s Theorem in odds form backwards: with

S = 1 for sepsis and 0 otherwise,

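$$\frac{P(\text{ANC}, \text{I/T} \mid S = 1)}{P(\text{ANC}, \text{I/T} \mid S = 0)} = \left[ \frac{P(S = 0)}{P(S = 1)} \right] \cdot \left[ \frac{P(S = 1 \mid \text{ANC}, \text{I/T})}{1 - P(S = 1 \mid \text{ANC}, \text{I/T})} \right]. \qquad (18)$$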
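Equation (18) is a one-line computation once an estimate of P(S = 1 | ANC, I/T) is available. The sketch below uses a hypothetical helper name and hypothetical estimated probabilities; the prior is the overall sepsis rate in the study data.

```r
# Equation (18): recover the joint likelihood ratio from an estimated
# posterior probability P(S = 1 | ANC, I/T) and the prior probability P(S = 1).
lr_from_posterior <- function(post_prob, prior_prob) {
  stopifnot(all(post_prob > 0 & post_prob < 1), prior_prob > 0, prior_prob < 1)
  prior_odds <- prior_prob / (1 - prior_prob)
  post_odds  <- post_prob / (1 - post_prob)
  post_odds / prior_odds
}

prior <- 245 / 67623                   # overall sepsis rate in the study data
p_hat <- c(0.001, prior, 0.02, 0.10)   # hypothetical estimated probabilities
lr    <- lr_from_posterior(p_hat, prior)
print(round(lr, 2))
```

An estimated probability equal to the prior maps to a likelihood ratio of 1. The strict bounds check is deliberate: an estimate at or below 0 has no valid odds, so any negative estimates have to be dealt with before this step.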

The first thing that comes to mind in estimating P (S = 1|ANC, I/T ) is

logistic regression, but it’s important to bring the predictors ANC and I/T

into the model in the correct form; what does the surface

P (S = 1|ANC, I/T ) look like with our data?

Exploratory tools for generalized linear models are not as abundant as

with linear models; I used local regression, via the loess command

(followed by predict) in R, to explore this surface; recall that my data set has

245 sepsis-positive and 67,378 sepsis-negative babies.

Actually I really want to look at P (S = 1|ANC, I/T, age), but this will be

difficult to visualize, and my clinician colleagues prefer the ANC and I/T

answer to be stratified by age group, so I found age cutpoints that


Local Regression

captured approximately equal numbers of sepsis-positive infants:

| Age (hours) | Number of sepsis-positives | Number of sepsis-negatives |
|---|---|---|
| ≤ 1 | 64 | 10150 |
| (1, 2] | 60 | 19650 |
| (2, 6] | 60 | 24115 |
| > 6 | 61 | 13523 |

A bit of advice: with up to 25,000 observations in each data set, run the

loess command like this:

# approximate statistics and trace computations keep loess feasible at this sample size
case.anc.i2t.age1.loess <- loess( case1 ~ anc1 * i2t1,
  data = case.anc.i2t.age1, statistics = "approximate",
  trace.hat = "approximate" )

For several of the age groups the results were remarkable; perspective and

contour plots follow; note that predictions sometimes go negative, because

loess doesn’t know anything about bounds on the outcome.
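A self-contained sketch of this workflow on simulated data follows: a loess fit to a 0/1 outcome, then predict on a grid of the kind used for perspective and contour plots. The data-generating model, sample size and variable names are all hypothetical and bear no relation to the study data.

```r
# Local regression of a 0/1 outcome on two predictors, using simulated data.
set.seed(123)
n    <- 5000
anc  <- runif(n, 0, 30)
i2t  <- runif(n, 0, 1)
case <- rbinom(n, 1, plogis(-4 - 0.1 * anc + 3 * i2t))
toy  <- data.frame(case = case, anc = anc, i2t = i2t)

fit <- loess(case ~ anc * i2t, data = toy,
             statistics = "approximate", trace.hat = "approximate")

# Predict on a grid that stays inside the range of the data, as one would
# before calling persp() or contour().
grid  <- expand.grid(anc = seq(1, 29, length.out = 15),
                     i2t = seq(0.05, 0.95, length.out = 10))
p_hat <- predict(fit, newdata = grid)

# Values below 0 are possible here: loess ignores the [0, 1] bounds.
print(range(p_hat))
```

Keeping the grid inside the observed range matters, because with the default interpolating surface predict returns NA outside it.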


Response Surface Exploration

[Figures: perspective and contour plots of the loess estimate of P(case) as a function of ANC and I2T, one pair of plots for each age group: Age <= 1 hour; 1 < Age <= 2 hours; 2 < Age <= 6 hours; Age > 6 hours.]


Fixing the Negative “Probabilities”

The estimated “probabilities” from loess are highly suggestive, but

sometimes go negative; I see three ways forward (in progress):

• Try to come up with a parametric surface in ANC and I/T for use in, e.g.,

a logistic regression model (challenging for several of the age groups).

• Figure out how to scale the estimated “probabilities” from loess so that

they retain fidelity to the correct response surface while

not going negative.

A variety of reasonable ways of doing this have all led to similar results;

one such set is plotted on the following pages (using the overall rate of

sepsis (0.003623) as P (S = 1)).

• Fit a Bayesian nonparametric model to the data, via (e.g.) Gaussian

processes (joint work with B Gramacy):

The generative, hierarchical, GP classification model we use may be described as follows: let C(x) ∈ {0, 1} be the classification label at input $x \in \mathbb{R}^m$; let $Z \equiv Z(X) \in \mathbb{R}^N$ be a vector of N latent variables, one for each row in the N × m design matrix X; each row is $x_i$ for i = 1, . . . , N with


Likelihood Ratio Estimation

[Figures: perspective and contour plots of the estimated likelihood ratio (LR) as a function of ANC and I2T, one pair of plots for each age group: Age <= 1 hour; 1 < Age <= 2 hours; 2 < Age <= 6 hours; Age > 6 hours.]


Gaussian Process Classification

corresponding latent Zi; we assume that X has been pre-scaled to the unit

cube; our generative model is

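$$\begin{aligned}
C(x_i) &\overset{\text{indep}}{\sim} \text{Bernoulli}[p(x_i)] \\
p(x_i) &= \frac{\exp\{-Z_i\}}{1 + \exp\{-Z_i\}} \\
Z \mid \sigma^2, K &\sim \text{GP}(0, \sigma^2, K) \equiv N_N(0, \sigma^2 K), \quad \text{where } K_{i,j} = K(x_i, x_j) \\
K(x_i, x_j) \mid d, g &= \exp\left\{ -\sum_{k=1}^{m} \frac{|x_{ik} - x_{jk}|^2}{d_k} \right\} + \delta_{i,j} \, g \\
\sigma^2 &\sim \text{IG}(5/2, 10/2) \\
d_i &\overset{\text{iid}}{\sim} G(1, 20) \\
g &\sim \text{Exp}(1)
\end{aligned}$$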

The priors chosen for the free parameters d = (d1, . . . , dm), g, σ2 are the

defaults in the tgp package (Gramacy and Taddy, 2010) for R.



The “correlation” function K is from the separable Gaussian family, and

d and g are the range and nugget parameters, respectively; we use the

shorthand K ≡ K|d, g.

A logit link is implied by the second line of the model, when g = 0; freeing g ≥ 0 generalizes the logit to a family of “effective links” parameterizing a continuum between probit and logit links (Neal, 1998); thus by inferring g (in the posterior) we infer the link.

Conditional on the parameters and settings of the latent Z variables, a

sample from the predictive distribution of C(x) at a new input x is obtained

via standard kriging equations and an application of the inverse logit

transformation: we have that $Z(x) \mid \sigma^2, K$ is normally distributed with mean $k(x)^T K^{-1} Z$ and variance $\sigma^2 [1 + g - k(x)^T K^{-1} k(x)]$, where $k(x) = (K(x, x_1), \ldots, K(x, x_N))^T$.

Samples from the posterior predictive distribution are obtained by

conditioning on samples from the posterior of Z, σ2 and (the parameters of)

K; these are then mapped to the probabilities of class labels.
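The prediction step can be sketched in base R for fixed values of the latents and parameters. This is an illustration only, not the tgp-based software: the function names, inputs and latent values are hypothetical, and the variance uses the standard kriging form σ²[1 + g − k(x)ᵀK⁻¹k(x)].

```r
# Separable Gaussian correlation between two inputs (nugget not included).
sep_gauss <- function(a, b, d) exp(-sum((a - b)^2 / d))

# Kriging mean and variance of the latent Z(x) at a new input x_new,
# given latents Z at the rows of X.
gp_predict <- function(x_new, X, Z, sigma2, d, g) {
  N <- nrow(X)
  K <- matrix(0, N, N)
  for (i in seq_len(N)) for (j in seq_len(N)) K[i, j] <- sep_gauss(X[i, ], X[j, ], d)
  K <- K + diag(g, N)                                   # nugget on the diagonal
  k_new <- sapply(seq_len(N), function(i) sep_gauss(x_new, X[i, ], d))
  w <- solve(K, k_new)                                  # K^{-1} k(x)
  z_mean <- sum(w * Z)                                  # k(x)^T K^{-1} Z
  z_var  <- sigma2 * (1 + g - sum(k_new * w))
  list(mean = z_mean, var = z_var)
}

# Hypothetical inputs on the unit square and hypothetical latent values.
X <- rbind(c(0.1, 0.1), c(0.9, 0.1), c(0.1, 0.9), c(0.9, 0.9), c(0.5, 0.5))
Z <- c(2.0, 1.0, 0.5, -1.0, 0.3)
pred <- gp_predict(c(0.4, 0.6), X, Z, sigma2 = 1, d = c(0.1, 0.1), g = 0.01)

set.seed(7)
z_draw <- rnorm(1000, pred$mean, sqrt(pred$var))   # draws of Z(x)
p_draw <- exp(-z_draw) / (1 + exp(-z_draw))        # mapped to p(x) as in the model
print(c(z_mean = pred$mean, z_var = pred$var, mean_p = mean(p_draw)))
```

In a full analysis this calculation would be repeated for each posterior sample of Z, σ² and the parameters of K, which is where the matrix solves become expensive.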



Posterior inference for the parameters of the GP classification model is

obtained by MCMC using Metropolis-within-Gibbs sampling.

Conditional on the latent Z variables, samples for (σ2, d, g) may be obtained by

following any one of several approaches for inference in regression GPs, by

treating the latents as real-valued observations at the predictors X; you

get an IG conditional for (σ2|d, g) for a Gibbs update, and (blocked) MH

or slice sampling of full conditionals can be used for (d, g|σ2); see Gramacy

and Lee (2008) for details.

Conditional on the parameters (σ2, d, g), there are two common ways to

update the latents Z: Neal (1998) proposes an adaptive rejection sampling

approach; we follow Broderick and Gramacy (2010), who proposed a 10-fold

randomly blocked Metropolis-within-Gibbs approach which exploits a convenient factorization of the label, P(C(X) = c(X) | Z(X)), and latent, Z(X), parts of the prior, and the fact that the kriging equations are easily

generalized to the multivariate conditional distribution of one group of the

latents given the others; the result is a trivial Metropolis-Hastings

acceptance calculation and good mixing properties.



Software, which is an extension of the tgp package, is available from Bobby

Gramacy upon request; see Gramacy (2007) for specific computational

details and help with the R interface.

The main computational problem is having to invert matrices on each

MCMC iteration that unfortunately grow in size with the number of

observations; getting even 10,000 posterior samples with data on

10,000–24,000 infants would take an appallingly long time.

Some idea of what to expect can be found by retaining all of the 245

sepsis-positive babies and sampling (say) 755 sepsis-negative babies in

a space-filling way in ANC–I/T space, to yield a data set with 1,000

observations; this permits results to be obtained overnight, but biases

estimates of P (S = 1|ANC, I/T ) upward by oversampling on the

positives; it may be possible to overcome this bias (work in progress).


loess (Full Data) Versus GP (Subsample)

[Figures: perspective and contour plots of P(case) as a function of ANC and I2T, showing the loess fit to the full data alongside the GP fit to the subsample, for the age groups Age <= 1 hour, 1 < Age <= 2 hours, and Age > 6 hours.]