New interval estimating procedures for the disease transmission probability in multiple-vector transfer designs

Joshua M. Tebbs and Christopher R. Bilder


Department of Statistics

Oklahoma State University


Introduction

Plant disease is responsible for major losses in agriculture throughout the world.

Diseases are often spread by insect vectors (e.g., aphids, leafhoppers, planthoppers).

Example: the brown planthopper and the whitebacked planthopper (www.knowledgebank.irri.org/ricedoctor_mx/Fact_Sheets/Pests/Planthopper.htm)


Example

Ornaghi et al. (1999) study the effects of the “Mal Rio Cuarto” (MRC) virus and its spread by the Delphacodes kuscheli planthopper.

The MRC virus is the most damaging maize virus in Argentina.

It was desired to estimate p, the probability of disease transmission for a single vector.

Vector transfers are often used by plant pathologists wanting to estimate p.

In such experiments, insects are moved from an infected source to the test plants.


Single-vector transfers

The most straightforward way to estimate p is by using a single-vector transfer.

Each test plant contains one vector, and test plants must be individually caged.

Under the binomial model, the proportion of infected test plants gives the maximum likelihood estimate of p.

Disadvantages of a single-vector transfer:

  Requires a large amount of space (since insects must be individually isolated)

  Is costly, since one needs a large number of test plants and individual cages


Multiple-vector transfers

A group of s > 1 insect vectors is allocated to each test plant.

Even though test plants are occupied by multiple insects, the goal is still to estimate p, the probability of disease transmission for a single vector.

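[Figure: a greenhouse of enclosed test plants, each holding several planthoppers. The legend marks a planthopper as "Transmits virus" or "Does not transmit virus", and each test plant is labelled Y=1 or Y=0.]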


Multiple-vector transfers

Advantages of a multiple-vector versus single-vector transfer:

  Potential savings in time, cost, and space

  Statistical properties of estimators are much better (for a fixed number of test plants)

A multiple-vector transfer is an application of the group-testing experimental design.

Other applications of group testing:

  Infectious disease seroprevalence estimation in human populations

  Disease transmission in animal studies

  Drug discovery applications


Notation and assumptions

Define:

  n = number of test plants

  s = number of insects per plant (“group size”)

  Y=1: “infected test plant”, a plant for which at least one vector (out of s) infects

  Y=0: “uninfected test plant”, a plant for which no vectors (out of s) infect

Assumptions:

  Common group size s

  The statuses of individual vectors are iid Bernoulli random variables with mean p

  The statuses of test plants are independent

  Test plants are not misclassified
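The assumptions above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_infected_plants`, the seed, and the value p = 0.02 are hypothetical choices made for illustration, not quantities from the slides.

```python
import random

def simulate_infected_plants(n, s, p, rng):
    """Simulate the number of infected test plants under the stated assumptions.

    Each of the n plants receives s vectors; every vector transmits
    independently with probability p (iid Bernoulli), and a plant is
    recorded as infected (Y=1) when at least one of its vectors transmits.
    """
    infected = 0
    for _ in range(n):
        transmits = [rng.random() < p for _ in range(s)]
        y = 1 if any(transmits) else 0
        infected += y
    return infected

rng = random.Random(2003)  # arbitrary seed, for reproducibility only
t_sim = simulate_infected_plants(n=24, s=7, p=0.02, rng=rng)  # p is hypothetical
print(t_sim)
```

Only the plant-level outcomes Y are recorded, which mirrors the fact that the individual vector statuses are never observed in the experiment.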


Maximum likelihood estimator for p

Let $T = \sum_{i=1}^{n} Y_i$ denote the number of infected test plants. Under our design assumptions, T has a binomial distribution with parameters n and

$$\theta = 1 - (1-p)^s.$$

The maximum likelihood estimator of p is given by

$$\hat{p} = 1 - (1-\hat{\theta})^{1/s},$$

where $\hat{\theta} = T/n$ (the proportion of infected test plants).

Estimates of p are computed by only examining the test plants (and not the individual vectors themselves).

The binomial model is only appropriate if test plants do not differ materially in their resistance to pathogen transmission.
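As a quick numerical illustration of the estimator above, the following sketch evaluates it for the counts reported later in the Ornaghi et al. (1999) example (s = 7, n = 24, t = 3). The function name is a hypothetical choice.

```python
def mle_transmission_probability(t, n, s):
    """MLE of p: back-transform the proportion of infected plants, theta_hat = t / n."""
    theta_hat = t / n
    return 1.0 - (1.0 - theta_hat) ** (1.0 / s)

p_hat = mle_transmission_probability(t=3, n=24, s=7)
print(round(p_hat, 4))
```

With s = 1 the function reduces to the usual sample proportion t / n, which is the single-vector transfer estimate.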


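Properties of the MLE and the Wald CI

The statistic $\hat{p}$ has the following properties:

  Consistent as n gets large

  Approximately normally distributed; more precisely,

  $$\hat{p} \sim AN[p,\ v(p)/n],$$

  where

  $$v(p) = \frac{1-(1-p)^s}{s^2 (1-p)^{s-2}}$$

A $100(1-\alpha)$ percent Wald confidence interval is given by

$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{v}(\hat{p})/n},$$

where

$$\hat{v}(\hat{p}) = \frac{1-(1-\hat{p})^s}{s^2 (1-\hat{p})^{s-2}}$$

and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.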
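A minimal sketch of the Wald interval, assuming the variance function $v(p) = [1-(1-p)^s] / [s^2 (1-p)^{s-2}]$ and using Python's standard-library normal quantile. The function name is hypothetical, and the inputs are the counts from the Ornaghi et al. (1999) example.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(t, n, s, alpha=0.05):
    """Wald interval p_hat +/- z * sqrt(v_hat / n); intended for 0 < t < n."""
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    # Plug-in estimate of the asymptotic variance function v(p).
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s ** 2 * (1.0 - p_hat) ** (s - 2))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 normal quantile
    half_width = z * sqrt(v_hat / n)
    return p_hat - half_width, p_hat + half_width

wald_lower, wald_upper = wald_interval(t=3, n=24, s=7)
print(round(wald_lower, 4), round(wald_upper, 4))
```

For these counts the lower limit falls below zero, which is the behaviour the slides criticise when p is small.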


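Variance stabilizing interval (VSI)

Goal: Find $g(\hat{p})$ whose variance is free of the parameter p.

Solve the following differential equation:

$$g'(p) = \frac{c_0\, s\, (1-p)^{(s-2)/2}}{\sqrt{1-(1-p)^s}}$$

With $c_0 = 1$, a solution is given by

$$g(p) = 2\arctan\sqrt{(1-p)^{-s} - 1}.$$

It follows that

$$\left( 1 - \left[\frac{1+\cos(a)}{2}\right]^{1/s},\ \ 1 - \left[\frac{1+\cos(b)}{2}\right]^{1/s} \right)$$

is a $100(1-\alpha)$ percent confidence interval for p. Here,

$$a = 2\arctan\sqrt{(1-\hat{p})^{-s} - 1} - z_{1-\alpha/2}/\sqrt{n}$$

$$b = 2\arctan\sqrt{(1-\hat{p})^{-s} - 1} + z_{1-\alpha/2}/\sqrt{n}$$

where $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution.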
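The variance stabilizing interval can be sketched as follows, assuming $g(p) = 2\arctan\sqrt{(1-p)^{-s}-1}$ with approximate variance 1/n. The function name is hypothetical, and clamping a and b to [0, π] is an assumption of this sketch (it keeps the back-transformation monotone), not something stated in the slides.

```python
from math import atan, cos, pi, sqrt
from statistics import NormalDist

def vsi_interval(t, n, s, alpha=0.05):
    """Variance stabilizing interval for p; intended for 0 < t < n."""
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    g_hat = 2.0 * atan(sqrt((1.0 - p_hat) ** (-s) - 1.0))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Interval on the transformed scale, clamped to [0, pi] (sketch assumption).
    a = max(g_hat - z / sqrt(n), 0.0)
    b = min(g_hat + z / sqrt(n), pi)
    # Back-transform each endpoint to the p scale.
    lower = 1.0 - ((1.0 + cos(a)) / 2.0) ** (1.0 / s)
    upper = 1.0 - ((1.0 + cos(b)) / 2.0) ** (1.0 / s)
    return lower, upper

vsi_lower, vsi_upper = vsi_interval(t=3, n=24, s=7)
print(round(vsi_lower, 4), round(vsi_upper, 4))
```

Because the endpoints are back-transformed rather than formed symmetrically around the estimate, the interval is asymmetric and cannot leave the parameter space.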


Modified Clopper-Pearson (CP) interval

The number of infected test plants, T, has a binomial distribution with parameters n and $\theta = 1 - (1-p)^s$.

One can obtain an exact Clopper-Pearson interval for $\theta$ and then transform back to the p scale (Chiang and Reeves, 1962).

Exact $100(1-\alpha)$ percent confidence limits for p are given by

$$1 - \left\{ 1 - \left[ 1 + \frac{n-t+1}{t}\, F_{1-\alpha/2,\,2(n-t+1),\,2t} \right]^{-1} \right\}^{1/s}$$

and

$$1 - \left\{ 1 - \frac{\dfrac{t+1}{n-t}\, F_{1-\alpha/2,\,2(t+1),\,2(n-t)}}{1 + \dfrac{t+1}{n-t}\, F_{1-\alpha/2,\,2(t+1),\,2(n-t)}} \right\}^{1/s},$$

where $F_{1-\alpha,a,b}$ denotes the $1-\alpha$ quantile of the central F distribution with a (numerator) and b (denominator) degrees of freedom.
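A sketch of the modified Clopper-Pearson limits. Python's standard library has no F quantile function, so this example swaps in the equivalent definition of the Clopper-Pearson limits: the lower limit for θ solves P(T ≥ t) = α/2 and the upper limit solves P(T ≤ t) = α/2, each found by bisection. The function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, theta):
    """P(T <= k) for T ~ Binomial(n, theta)."""
    return sum(comb(n, j) * theta ** j * (1.0 - theta) ** (n - j) for j in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iterations=100):
    """Root of f on [lo, hi] by bisection, assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def modified_cp_interval(t, n, s, alpha=0.05):
    """Clopper-Pearson limits for theta, transformed to the p scale; needs 0 < t < n."""
    theta_lower = bisect_root(lambda th: (1.0 - binom_cdf(t - 1, n, th)) - alpha / 2.0)
    theta_upper = bisect_root(lambda th: binom_cdf(t, n, th) - alpha / 2.0)
    return 1.0 - (1.0 - theta_lower) ** (1.0 / s), 1.0 - (1.0 - theta_upper) ** (1.0 / s)

cp_lower, cp_upper = modified_cp_interval(t=3, n=24, s=7)
print(round(cp_lower, 4), round(cp_upper, 4))
```

The tail-probability form and the F-quantile form describe the same limits; the former is used here only because it needs nothing beyond `math.comb`.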


Comparing the Wald, VSI, and CP

The Wald interval is simple and easy to compute. However, it has main drawbacks:

  Provides symmetric confidence intervals even though the distribution of $\hat{p}$ may be very skewed

  Often produces negative lower limits when p is small!

The VSI handles each of these drawbacks:

  Not symmetric

  Always produces lower limits within the parameter space (i.e., strictly larger than zero)

The CP interval’s main advantage is that its coverage probability is always greater than or equal to $1-\alpha$. However, such intervals can be wastefully wide, especially if n is small.


Bayesian estimation

Prior distribution for p

  One-parameter Beta distribution

  $$f_P(p \mid \beta) = \beta (1-p)^{\beta-1} I(0 < p < 1)$$

  for a known value of $\beta$

  Takes into account that p is small

  Example when $\beta$ = 52.4:

[Figure: prior density f(p) plotted against p for p between 0.00 and 0.08; vertical axis from 0 to 50.]


Bayesian estimation

Prior distribution for p

Why use a one-parameter instead of a two-parameter Beta?

  Sensible model acknowledging that p is small

  Bayes and empirical Bayes estimators are simpler

    The resulting estimator using squared error loss with a two-parameter Beta is a ratio of complicated alternating sums

    See Chaubey and Li (Journal of Official Statistics, 1995) for Bayes estimators


Bayesian estimation

Posterior distribution for 0 < p < 1

$$f_{P|T}(p \mid t, \beta) = \frac{f_{T,P}(t, p \mid \beta)}{f_T(t \mid \beta)} = \frac{s\,\Gamma(n + \beta/s + 1)}{\Gamma(n - t + \beta/s)\,\Gamma(t+1)}\,(1-p)^{s(n-t)+\beta-1}\,[1-(1-p)^s]^t$$

Note: $U = 1 - (1-P)^s \sim \text{beta}(t + 1,\ n - t + \beta/s)$


Empirical Bayesian estimation

Use the marginal distribution for T to derive an estimate for $\beta$.

Why?

  Avoid a possible poor choice for $\beta$

  n is often small in multiple-vector transfer experiments

  Posterior may be adversely affected by the prior

Marginal distribution of T for t = 0, 1, …, n:

$$f_T(t \mid \beta) = \frac{\beta\,\Gamma(n+1)\,\Gamma(n-t+\beta/s)}{s\,\Gamma(n-t+1)\,\Gamma(n+\beta/s+1)}$$

Maximize $f_T(t \mid \beta)$ as a function of $\beta$ to obtain the marginal maximum likelihood estimate, $\hat{\beta}$.

Iteratively solve for $\beta$ in

$$\frac{\partial}{\partial\beta} \log f_T(t \mid \beta) = \beta^{-1} + s^{-1}\left[\psi(n-t+\beta/s) - \psi(n+\beta/s+1)\right] = 0,$$

where $\psi(\cdot)$ is the digamma function.
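The marginal likelihood equation can be solved numerically. The sketch below assumes the marginal $f_T(t \mid \beta) = \beta\,\Gamma(n+1)\,\Gamma(n-t+\beta/s) / [s\,\Gamma(n-t+1)\,\Gamma(n+\beta/s+1)]$. Because Python's `math` module has no digamma function, it uses the recurrence ψ(x+1) = ψ(x) + 1/x to rewrite the digamma difference as a finite sum (valid since t is an integer) and then applies bisection. The function names and the search bracket are hypothetical choices.

```python
from math import lgamma, log, sqrt

def log_marginal(beta, t, n, s):
    """log f_T(t | beta) for the one-parameter Beta prior."""
    return (log(beta) + lgamma(n + 1) + lgamma(n - t + beta / s)
            - log(s) - lgamma(n - t + 1) - lgamma(n + beta / s + 1))

def score(beta, t, n, s):
    """Derivative of log f_T(t | beta) with respect to beta.

    psi(n + beta/s + 1) - psi(n - t + beta/s) equals the finite sum of
    1 / (n - t + k + beta/s) over k = 0, ..., t.
    """
    return 1.0 / beta - sum(1.0 / (n - t + k + beta / s) for k in range(t + 1)) / s

def marginal_mle_beta(t, n, s, lo=1e-8, hi=1e8, iterations=200):
    """Bisection on a log scale; the score changes sign once when 0 < t < n."""
    for _ in range(iterations):
        mid = sqrt(lo * hi)
        if score(mid, t, n, s) > 0:
            lo = mid
        else:
            hi = mid
    return sqrt(lo * hi)

beta_hat = marginal_mle_beta(t=3, n=24, s=7)
print(round(beta_hat, 1))
```

For the Ornaghi et al. (1999) counts this reproduces the value of $\hat{\beta}$ shown on the example slide to one decimal place.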


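Credible intervals

$(1-\alpha)100\%$ equal-tail

$[p_L, p_U]$ satisfy

$$\int_0^{p_L} f_{P|T}(p \mid t, \hat{\beta})\, dp = \alpha/2$$

and

$$\int_{p_U}^{1} f_{P|T}(p \mid t, \hat{\beta})\, dp = \alpha/2$$

Use the relationship with the Beta distribution, $U = 1 - (1-p)^s \sim \text{beta}(t + 1,\ n - t + \hat{\beta}/s)$

Interval:

$$\left( 1 - \left(1 - B_{\alpha/2;\, t+1,\, n-t+\hat{\beta}/s}\right)^{1/s},\ \ 1 - \left(1 - B_{1-\alpha/2;\, t+1,\, n-t+\hat{\beta}/s}\right)^{1/s} \right),$$

where $B_{\alpha,a,b}$ is the $\alpha$ quantile of a Beta(a,b) distribution

Remember that $\theta = 1 - (1-p)^s$ implies $p = 1 - (1-\theta)^{1/s}$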
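A sketch of the equal-tail credible interval, assuming the posterior relationship U ~ beta(t + 1, n − t + β/s). The standard library has no Beta quantile function, so the example relies on a finite-sum form of the Beta CDF that holds because the first shape parameter t + 1 is an integer, and inverts it by bisection. This device and the function names are choices of the sketch, not part of the slides; β = 52.4 is the estimate shown on the example slide.

```python
from math import exp, lgamma, log

def beta_cdf_integer_a(x, a, b):
    """Beta(a, b) CDF at x for integer a >= 1 and real b > 0.

    Uses I_x(a, b) = 1 - (1 - x)^b * sum_{j=0}^{a-1} Gamma(b + j) / (Gamma(b) j!) * x^j.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total = 0.0
    for j in range(a):
        log_term = lgamma(b + j) - lgamma(b) - lgamma(j + 1) + j * log(x) + b * log(1.0 - x)
        total += exp(log_term)
    return 1.0 - total

def beta_quantile(q, a, b, iterations=100):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if beta_cdf_integer_a(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equal_tail_interval(t, n, s, beta, alpha=0.05):
    """Equal-tail credible interval for p from Beta quantiles of U."""
    a, b = t + 1, n - t + beta / s
    u_lower = beta_quantile(alpha / 2.0, a, b)
    u_upper = beta_quantile(1.0 - alpha / 2.0, a, b)
    return 1.0 - (1.0 - u_lower) ** (1.0 / s), 1.0 - (1.0 - u_upper) ** (1.0 / s)

et_lower, et_upper = equal_tail_interval(t=3, n=24, s=7, beta=52.4)
print(round(et_lower, 4), round(et_upper, 4))
```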


Credible intervals

$(1-\alpha)100\%$ highest posterior density (HPD) regions

Posterior is unimodal and right skewed

Find $[p_L, p_U]$ such that $(1-\alpha)100\%$ of the area of the posterior density is included and $p_U - p_L$ is as small as possible

See Tanner (1996, p. 103-4)

Key is to sample from the posterior distribution

Use the $U = 1 - (1-p)^s \sim \text{beta}(t + 1,\ n - t + \hat{\beta}/s)$ relationship
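One common Monte Carlo approach to an HPD interval is to draw from the posterior and take the shortest interval containing a fraction 1 − α of the sorted draws. The sketch below does this through the Beta relationship for U. It is not necessarily the algorithm described by Tanner (1996), and the function name, number of draws, and seed are hypothetical.

```python
import random
from math import ceil

def hpd_interval_by_sampling(t, n, s, beta, alpha=0.05, draws=50000, seed=1):
    """Approximate HPD interval for p as the shortest interval holding (1 - alpha) of the draws."""
    rng = random.Random(seed)
    a, b = t + 1, n - t + beta / s
    # Draw U ~ beta(a, b) and transform each draw to the p scale.
    p_draws = sorted(1.0 - (1.0 - rng.betavariate(a, b)) ** (1.0 / s) for _ in range(draws))
    k = ceil((1.0 - alpha) * draws)  # number of draws each candidate interval must contain
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(draws - k + 1), key=lambda i: p_draws[i + k - 1] - p_draws[i])
    return p_draws[start], p_draws[start + k - 1]

hpd_lower, hpd_upper = hpd_interval_by_sampling(t=3, n=24, s=7, beta=52.4)
print(round(hpd_lower, 4), round(hpd_upper, 4))
```

The shortest-window rule is appropriate here because the posterior is unimodal; the result carries Monte Carlo error, so the endpoints vary slightly with the seed and the number of draws.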


Example - Ornaghi et al. (1999)

Data

  s = 7 planthoppers per plant

  n = 24 plants

  t = 3 infected plants observed

95% interval estimates for p (empirical Bayes estimate $\hat{\beta} = 52.4$)

Interval                   Lower limit   Upper limit   Length
Wald                           -0.0023        0.0401   0.0424
VSI                             0.0037        0.0465   0.0428
Modified Clopper-Pearson        0.0038        0.0543   0.0505
Equal-tail                      0.0052        0.0410   0.0358
HPD                             0.0034        0.0373   0.0339


Interval comparisons

Coverage

$$C(p, n, s) = \frac{\sum_{t=1}^{n-1} I(n,t,s) \binom{n}{t} \left[1-(1-p)^s\right]^t (1-p)^{s(n-t)}}{1 - (1-p)^{ns} - \left[1-(1-p)^s\right]^n},$$

where I(n,t,s) = 1 if the interval contains p and I(n,t,s) = 0 otherwise.

Do not consider the t = 0 and t = n cases

  These outcomes indicate a poor multiple-vector transfer experimental design

  See Swallow (1985, Phytopathology) for guidance in choosing s

Brown, Cai, and DasGupta (2001, Statistical Science)

Frequentist evaluation similar to how Carlin and Louis (2000) approach evaluating confidence and credible intervals
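The coverage calculation can be sketched as below, assuming coverage is the binomial-weighted sum of the indicator over t = 1, …, n − 1, renormalised to exclude t = 0 and t = n. The Wald interval is used as the example procedure; any function with the same signature could be passed in. The function names are hypothetical, n = 40 and s = 10 match the comparison plots, and p = 0.02 is an arbitrary illustrative value.

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_interval(t, n, s, alpha=0.05):
    """Wald interval for p, used here only as an example procedure."""
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s ** 2 * (1.0 - p_hat) ** (s - 2))
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sqrt(v_hat / n)
    return p_hat - half_width, p_hat + half_width

def coverage(p, n, s, interval, alpha=0.05):
    """Exact coverage of an interval procedure, conditional on 0 < t < n."""
    theta = 1.0 - (1.0 - p) ** s
    numerator = 0.0
    for t in range(1, n):
        lower, upper = interval(t, n, s, alpha)
        if lower <= p <= upper:  # indicator I(n, t, s)
            numerator += comb(n, t) * theta ** t * (1.0 - theta) ** (n - t)
    denominator = 1.0 - (1.0 - theta) ** n - theta ** n
    return numerator / denominator

wald_coverage = coverage(0.02, 40, 10, wald_interval)
trivial_coverage = coverage(0.02, 40, 10, lambda t, n, s, alpha: (0.0, 1.0))
print(round(wald_coverage, 3))
```

Evaluating `coverage` over a grid of p values gives the kind of coverage curve plotted in the interval comparisons; the trivial interval (0, 1) is included only as a sanity check that the weights sum to one.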


Interval comparisons

$\alpha$ = 0.05, n = 40, and s = 10

The black line denotes the Wald interval and the bold line denotes the interval named in the plot title.

[Figure: four panels (VSI, Clopper-Pearson, Equal-tail, HPD), each plotting coverage (vertical axis, 0.85 to 1.00) against p (horizontal axis, 0.00 to 0.10).]


Summary

Best interval: VSI or modified Clopper-Pearson

Credible intervals may be improved by taking into account variability of the estimators

Bootstrap intervals are mentioned in the abstract; the VSI and Clopper-Pearson intervals perform better

Many other intervals could be investigated!

Website: www.chrisbilder.com/bilder_tebbs

  Contains R programs for examining the interval estimation properties

    Different values of p, n, and s can be used

    Also calculates empirical Bayes estimators

  Program for Ornaghi et al. (1999) data example


Contact address starting Fall 2003:

Joshua M. Tebbs, Department of Statistics, Kansas State University

Christopher R. Bilder, Department of Statistics, University of [email protected]
