New interval estimating procedures for the disease transmission probability in multiple-vector transfer designs
Joshua M. Tebbs and Christopher R. Bilder
Department of Statistics
Oklahoma State University
Introduction
- Plant disease is responsible for major losses in agriculture throughout the world.
- Diseases are often spread by insect vectors (e.g., aphids, leafhoppers, planthoppers).
- Example (images from www.knowledgebank.irri.org/ricedoctor_mx/Fact_Sheets/Pests/Planthopper.htm): brown planthopper and whitebacked planthopper.
Example
- Ornaghi et al. (1999) study the effects of the "Mal Rio Cuarto" (MRC) virus and its spread by the Delphacodes kuscheli planthopper.
- The MRC virus is the most damaging maize virus in Argentina.
- It was desired to estimate p, the probability of disease transmission for a single vector.
- Vector transfers are often used by plant pathologists wanting to estimate p.
- In such experiments, insects are moved from an infected source to the test plants.
Single-vector transfers
- The most straightforward way to estimate p is by using a single-vector transfer.
- Each test plant contains one vector, and test plants must be individually caged.
- Under the binomial model, the proportion of infected test plants gives the maximum likelihood estimate of p.
- Disadvantages with a single-vector transfer:
  - Requires a large amount of space (since insects must be individually isolated)
  - Is a costly design, since one needs a large number of test plants and individual cages
Multiple-vector transfers
- A group of s > 1 insect vectors is allocated to each test plant.
- Even though test plants are occupied by multiple insects, the goal is still to estimate p, the probability of disease transmission for a single vector.
[Figure: greenhouse with enclosed test plants, each holding several planthoppers. The legend distinguishes planthoppers that transmit the virus from those that do not, and each test plant is labeled Y = 1 or Y = 0.]
Multiple-vector transfers
- Advantages of a multiple-vector versus single-vector transfer:
  - Potential savings in time, cost, and space
  - Statistical properties of estimators are much better (for a fixed number of test plants)
- A multiple-vector transfer is an application of the group-testing experimental design.
- Other applications of group testing:
  - Infectious disease seroprevalence estimation in human populations
  - Disease transmission in animal studies
  - Drug discovery applications
Notation and assumptions
- Define:
  - n = number of test plants
  - s = number of insects per plant ("group size")
  - Y = 1: "infected test plant", a plant for which at least one vector (out of s) infects
  - Y = 0: "uninfected test plant", a plant for which no vectors (out of s) infect
- Assumptions:
  - Common group size s
  - The statuses of individual vectors are iid Bernoulli random variables with mean p
  - The statuses of test plants are independent
  - Test plants are not misclassified
Maximum likelihood estimator for p
- Let $T = \sum_{i=1}^{n} Y_i$ denote the number of infected test plants. Under our design assumptions, T has a binomial distribution with parameters n and
  $$\theta = 1 - (1-p)^s.$$
- The maximum likelihood estimator of p is given by
  $$\hat{p} = 1 - (1 - \hat{\theta})^{1/s},$$
  where $\hat{\theta} = T/n$ (the proportion of infected test plants).
- Estimates of p are computed by only examining the test plants (and not the individual vectors themselves).
- The binomial model is only appropriate if test plants do not differ materially in their resistance to pathogen transmission.
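As a minimal sketch of the estimator above, the following Python snippet evaluates $\hat{p} = 1 - (1 - T/n)^{1/s}$ for the Ornaghi et al. (1999) counts given later in these slides (s = 7, n = 24, t = 3). The function name is hypothetical and chosen only for illustration.

```python
def mle_transmission_probability(t: int, n: int, s: int) -> float:
    """MLE of the single-vector transmission probability p.

    t: number of infected test plants, n: number of test plants,
    s: number of vectors per plant (common group size).
    """
    if n <= 0 or s <= 0 or not 0 <= t <= n:
        raise ValueError("need n > 0, s > 0 and 0 <= t <= n")
    theta_hat = t / n                       # proportion of infected plants
    return 1.0 - (1.0 - theta_hat) ** (1.0 / s)


p_hat = mle_transmission_probability(t=3, n=24, s=7)
print(round(p_hat, 4))  # approximately 0.0189
```

With s = 1 the function reduces to the ordinary sample proportion, which matches the single-vector transfer case.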
Properties of the MLE and the Wald CI
- The statistic $\hat{p}$ has the following properties:
  - Consistent as n gets large
  - Approximately normally distributed; more precisely,
    $$\hat{p} \sim AN[p,\, v(p)/n],$$
    where
    $$v(p) = \frac{1 - (1-p)^s}{s^2 (1-p)^{s-2}}.$$
- A $100(1-\alpha)$ percent Wald confidence interval is given by
  $$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{v}(\hat{p})/n},$$
  where
  $$\hat{v}(\hat{p}) = \frac{1 - (1-\hat{p})^s}{s^2 (1-\hat{p})^{s-2}}$$
  and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
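The Wald interval can be sketched in a few lines of Python. This is an illustration under the slide's binomial assumptions, not the authors' program; the function name is hypothetical, and the case 0 < t < n is required so that the variance estimate is finite and positive.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(t: int, n: int, s: int, alpha: float = 0.05):
    """Wald interval for p: p_hat +/- z * sqrt(v_hat(p_hat) / n)."""
    if not 0 < t < n:
        raise ValueError("this sketch requires 0 < t < n")
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    # Plug-in estimate of v(p) = [1 - (1-p)^s] / [s^2 (1-p)^(s-2)]
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s ** 2 * (1.0 - p_hat) ** (s - 2))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile
    half_width = z * sqrt(v_hat / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = wald_interval(t=3, n=24, s=7)
print(round(lower, 4), round(upper, 4))
```

For the Ornaghi et al. (1999) counts (s = 7, n = 24, t = 3) the lower limit comes out negative, close to the Wald row of the example table in these slides.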
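Variance stabilizing interval (VSI)
- Goal: find a transformation $g(\hat{p})$ whose variance is free of the parameter p.
- Solve the following differential equation:
  $$g'(p) = \sqrt{\frac{c_0^2\, s^2 (1-p)^{s-2}}{1 - (1-p)^s}}.$$
- With $c_0 = 1$, a solution is given by
  $$g(p) = 2\arctan\left\{\sqrt{(1-p)^{-s} - 1}\right\}.$$
- It follows that
  $$\left(1 - \left[\frac{1+\cos(a)}{2}\right]^{1/s},\; 1 - \left[\frac{1+\cos(b)}{2}\right]^{1/s}\right)$$
  is a $100(1-\alpha)$ percent confidence interval for p. Here,
  $$a = 2\arctan\left\{\sqrt{(1-\hat{p})^{-s} - 1}\right\} - z_{1-\alpha/2}/\sqrt{n},$$
  $$b = 2\arctan\left\{\sqrt{(1-\hat{p})^{-s} - 1}\right\} + z_{1-\alpha/2}/\sqrt{n},$$
  and $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution.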
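The VSI limits follow directly from a, b, and the back-transformation. The sketch below is illustrative only: the function name is hypothetical, and it assumes 0 < t < n with 0 <= a and b <= pi (true for the example data); no clamping is attempted outside that range.

```python
from math import atan, cos, sqrt
from statistics import NormalDist


def vsi_interval(t: int, n: int, s: int, alpha: float = 0.05):
    """Variance stabilizing interval for p based on g(p) = 2*atan(sqrt((1-p)^(-s) - 1))."""
    if not 0 < t < n:
        raise ValueError("this sketch requires 0 < t < n")
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g_hat = 2.0 * atan(sqrt((1.0 - p_hat) ** (-s) - 1.0))
    a = g_hat - z / sqrt(n)   # lower limit on the transformed scale
    b = g_hat + z / sqrt(n)   # upper limit on the transformed scale
    # Invert g: p = 1 - [(1 + cos(x)) / 2]^(1/s)
    lower = 1.0 - ((1.0 + cos(a)) / 2.0) ** (1.0 / s)
    upper = 1.0 - ((1.0 + cos(b)) / 2.0) ** (1.0 / s)
    return lower, upper


lower, upper = vsi_interval(t=3, n=24, s=7)
print(round(lower, 4), round(upper, 4))
```

For s = 7, n = 24, t = 3 the limits should be close to the VSI row of the example table in these slides, with a strictly positive lower limit.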
Modified Clopper-Pearson (CP) interval
- The number of infected test plants, T, has a binomial distribution with parameters n and
  $$\theta = 1 - (1-p)^s.$$
- One can obtain an exact Clopper-Pearson interval for $\theta$ and then transform back to the p scale (Chiang and Reeves, 1962).
- Exact $100(1-\alpha)$ percent confidence limits for p are given by
  $$p_L = 1 - \left[1 - \left(1 + \frac{n-t+1}{t}\, F_{1-\alpha/2,\,2(n-t+1),\,2t}\right)^{-1}\right]^{1/s}$$
  and
  $$p_U = 1 - \left[1 - \frac{\dfrac{t+1}{n-t}\, F_{1-\alpha/2,\,2(t+1),\,2(n-t)}}{1 + \dfrac{t+1}{n-t}\, F_{1-\alpha/2,\,2(t+1),\,2(n-t)}}\right]^{1/s},$$
  where $F_{1-\alpha,a,b}$ denotes the $1-\alpha$ quantile of the central F distribution with a (numerator) and b (denominator) degrees of freedom.
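The sketch below computes the same limits without F quantiles. It swaps in the equivalent binomial tail equations that define the Clopper-Pearson limits for $\theta$ (solved here by bisection, since the Python standard library has no F quantile function) and then maps them to the p scale. All function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, theta: float) -> float:
    """P(T <= k) for T ~ binomial(n, theta)."""
    return sum(comb(n, j) * theta ** j * (1.0 - theta) ** (n - j)
               for j in range(k + 1))


def bisect_root(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of f on [lo, hi]; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def modified_cp_interval(t: int, n: int, s: int, alpha: float = 0.05):
    """Clopper-Pearson limits for theta, transformed to p = 1 - (1 - theta)^(1/s)."""
    if not 0 < t < n:
        raise ValueError("this sketch requires 0 < t < n")
    # Lower limit: P(T >= t | theta) = alpha/2
    theta_lo = bisect_root(
        lambda th: (1.0 - binom_cdf(t - 1, n, th)) - alpha / 2.0, 0.0, 1.0)
    # Upper limit: P(T <= t | theta) = alpha/2
    theta_hi = bisect_root(
        lambda th: binom_cdf(t, n, th) - alpha / 2.0, 0.0, 1.0)
    to_p = lambda theta: 1.0 - (1.0 - theta) ** (1.0 / s)
    return to_p(theta_lo), to_p(theta_hi)


lower, upper = modified_cp_interval(t=3, n=24, s=7)
print(round(lower, 4), round(upper, 4))
```

For s = 7, n = 24, t = 3 the result should agree with the modified Clopper-Pearson row of the example table in these slides to about four decimals.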
Comparing the Wald, VSI, and CP intervals
- The Wald interval is simple and easy to compute. However, it has three main drawbacks:
  - Provides symmetric confidence intervals even though the distribution of $\hat{p}$ may be very skewed
  - Often produces negative lower limits when p is small
- The VSI handles each of these drawbacks:
  - Not symmetric
  - Always produces lower limits within the parameter space (i.e., strictly larger than zero)
- The CP interval's main advantage is that its coverage probability is always greater than or equal to $1-\alpha$. However, such intervals can be wastefully wide, especially if n is small.
Bayesian estimation
- Prior distribution for p
  - One-parameter beta distribution
    $$f_P(p \mid \beta) = \beta (1-p)^{\beta-1}\, I(0 < p < 1)$$
    for a known value of $\beta$
  - Takes into account that p is small
  - Example when $\beta = 52.4$:
[Figure: prior density f(p) plotted against p for $\beta = 52.4$, with p ranging from 0.00 to 0.08.]
Bayesian estimation
- Prior distribution for p
  - Why use a one-parameter instead of a two-parameter beta?
    - Sensible model acknowledging that p is small
    - Bayes and empirical Bayes estimators are simpler
    - The resulting estimator under squared error loss with a two-parameter beta is a ratio of complicated alternating sums
    - See Chaubey and Li (Journal of Official Statistics, 1995) for Bayes estimators
Bayesian estimation
- Posterior distribution for 0 < p < 1:
  $$f_{P|T}(p \mid t, \beta) = \frac{f_{P,T}(p, t \mid \beta)}{f_T(t \mid \beta)} = \frac{s\,\Gamma(n + \beta/s + 1)}{\Gamma(n - t + \beta/s)\,\Gamma(t + 1)}\, (1-p)^{s(n-t)+\beta-1} \left[1 - (1-p)^s\right]^t$$
- Note: $U = 1 - (1-P)^s \sim \text{beta}(t + 1,\, n - t + \beta/s)$
Empirical Bayesian estimation
- Use the marginal distribution for T to derive an estimate for $\beta$. Why?
  - Avoid a possible poor choice for $\beta$
  - n is often small in multiple-vector transfer experiments
  - Posterior may be adversely affected by the prior
- Marginal distribution of T for t = 0, 1, …, n:
  $$f_T(t \mid \beta) = \frac{\beta\,\Gamma(n+1)\,\Gamma(n - t + \beta/s)}{s\,\Gamma(n-t+1)\,\Gamma(n + \beta/s + 1)}$$
- Maximize $f_T(t \mid \beta)$ as a function of $\beta$ to obtain the marginal maximum likelihood estimate, $\hat{\beta}$.
- Iteratively solve for $\beta$ in
  $$\frac{\partial}{\partial\beta} \log f_T(t \mid \beta) = \beta^{-1} + s^{-1}\left[\psi(n - t + \beta/s) - \psi(n + \beta/s + 1)\right] = 0,$$
  where $\psi(\cdot)$ is the digamma function.
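One way to solve the score equation numerically is sketched below. Because t is an integer, the digamma difference can be written as a finite sum, $\psi(x + t + 1) - \psi(x) = \sum_{k=0}^{t} 1/(x+k)$ with $x = n - t + \beta/s$, so no special-function library is needed. The function names, the bisection bracket, and the use of bisection (rather than whatever iteration the authors' programs use) are assumptions made for illustration.

```python
def marginal_score(beta: float, t: int, n: int, s: int) -> float:
    """Derivative of log f_T(t | beta) with respect to beta."""
    x = n - t + beta / s
    # psi(x) - psi(x + t + 1) = -sum_{k=0}^{t} 1 / (x + k) for integer t
    return 1.0 / beta - sum(1.0 / (x + k) for k in range(t + 1)) / s


def empirical_bayes_beta(t: int, n: int, s: int,
                         lo: float = 1e-8, hi: float = 1e8) -> float:
    """Marginal maximum likelihood estimate of beta, found by bisection."""
    if not 0 < t < n:
        raise ValueError("this sketch requires 0 < t < n")
    if marginal_score(lo, t, n, s) <= 0 or marginal_score(hi, t, n, s) >= 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if marginal_score(mid, t, n, s) > 0:
            lo = mid      # score still positive: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


beta_hat = empirical_bayes_beta(t=3, n=24, s=7)
print(round(beta_hat, 1))
```

For the Ornaghi et al. (1999) counts (s = 7, n = 24, t = 3) the root should be close to the value $\hat{\beta} = 52.4$ quoted in the example slide.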
Credible intervals
- $(1-\alpha)100\%$ equal-tail
  - $[p_L, p_U]$ satisfy
    $$\int_0^{p_L} f_{P|T}(p \mid t, \hat{\beta})\, dp = \alpha/2 \quad\text{and}\quad \int_{p_U}^{1} f_{P|T}(p \mid t, \hat{\beta})\, dp = \alpha/2$$
  - Use the relationship with the beta distribution, $U = 1 - (1-p)^s \sim \text{beta}(t + 1,\, n - t + \hat{\beta}/s)$
  - Interval:
    $$\left[1 - \left(1 - B_{\alpha/2;\,t+1,\,n-t+\hat{\beta}/s}\right)^{1/s},\; 1 - \left(1 - B_{1-\alpha/2;\,t+1,\,n-t+\hat{\beta}/s}\right)^{1/s}\right],$$
    where $B_{\alpha;a,b}$ is the $\alpha$ quantile of a Beta(a, b) distribution
  - Remember that $\theta = 1 - (1-p)^s$ implies $p = 1 - (1-\theta)^{1/s}$
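The equal-tail limits can be sketched with the standard library alone. Since the first beta parameter t + 1 is a positive integer, the Beta(a, b) CDF has the finite-sum form $1 - (1-u)^b \sum_{j=0}^{a-1} \frac{(b)_j}{j!} u^j$, and the quantiles can be found by bisection. Function names are hypothetical, and $\hat{\beta}$ is passed in as a known number (52.4 for the example data).

```python
def beta_cdf_integer_a(u: float, a: int, b: float) -> float:
    """CDF of Beta(a, b) at u, valid when a is a positive integer."""
    term = 1.0    # running value of (b)_j * u^j / j!
    total = 1.0
    for j in range(1, a):
        term *= (b + j - 1) * u / j
        total += term
    return 1.0 - (1.0 - u) ** b * total


def beta_quantile_integer_a(q: float, a: int, b: float) -> float:
    """q quantile of Beta(a, b) by bisection on the increasing CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if beta_cdf_integer_a(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equal_tail_interval(t: int, n: int, s: int, beta_hat: float,
                        alpha: float = 0.05):
    """Equal-tail credible interval for p via U = 1 - (1-p)^s ~ Beta(t+1, n-t+beta_hat/s)."""
    a, b = t + 1, n - t + beta_hat / s
    to_p = lambda u: 1.0 - (1.0 - u) ** (1.0 / s)
    return (to_p(beta_quantile_integer_a(alpha / 2.0, a, b)),
            to_p(beta_quantile_integer_a(1.0 - alpha / 2.0, a, b)))


lower, upper = equal_tail_interval(t=3, n=24, s=7, beta_hat=52.4)
print(round(lower, 4), round(upper, 4))
```

With s = 7, n = 24, t = 3 and $\hat{\beta} = 52.4$ the limits should be close to the equal-tail row of the example table in these slides.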
Credible intervals
- $(1-\alpha)100\%$ highest posterior density (HPD) regions
  - Posterior is unimodal and right skewed
  - Find $[p_L, p_U]$ such that $(1-\alpha)100\%$ of the area under the posterior density is included and $p_U - p_L$ is as small as possible
  - See Tanner (1996, p. 103-4)
  - Key is to sample from the posterior distribution
  - Use the $U = 1 - (1-p)^s \sim \text{beta}(t + 1,\, n - t + \beta/s)$ relationship
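A Monte Carlo sketch of the sampling idea follows: draw U from the beta distribution, transform to p, sort the draws, and keep the shortest window containing a fraction $1-\alpha$ of them. This is one common way to approximate an HPD interval for a unimodal posterior and may differ in detail from the algorithm in Tanner (1996); the function name, number of draws, and seed are arbitrary assumptions.

```python
import random


def hpd_interval(t: int, n: int, s: int, beta_hat: float,
                 alpha: float = 0.05, draws: int = 50_000, seed: int = 1):
    """Approximate HPD interval for p from posterior draws."""
    rng = random.Random(seed)
    a, b = t + 1, n - t + beta_hat / s
    # Sample U ~ Beta(a, b) and map to p = 1 - (1 - U)^(1/s)
    p = sorted(1.0 - (1.0 - rng.betavariate(a, b)) ** (1.0 / s)
               for _ in range(draws))
    k = round((1.0 - alpha) * draws)          # draws spanned by each window
    # Shortest window of sorted draws covering a (1 - alpha) fraction
    start = min(range(draws - k), key=lambda i: p[i + k] - p[i])
    return p[start], p[start + k]


lower, upper = hpd_interval(t=3, n=24, s=7, beta_hat=52.4)
print(round(lower, 4), round(upper, 4))
```

Because the result depends on random draws, it only approximates the HPD limits; increasing `draws` reduces the Monte Carlo error.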
Example - Ornaghi et al. (1999)
- Data:
  - s = 7 planthoppers per plant
  - n = 24 plants
  - t = 3 infected plants observed
- 95% interval estimates for p (the equal-tail and HPD credible intervals use $\hat{\beta} = 52.4$):

Interval                  Lower limit  Upper limit  Length
Wald                          -0.0023       0.0401  0.0424
VSI                            0.0037       0.0465  0.0428
Modified Clopper-Pearson       0.0038       0.0543  0.0505
Equal-tail                     0.0052       0.0410  0.0358
HPD                            0.0034       0.0373  0.0339
Interval comparisons
- Coverage:
  $$C(p, n, s) = \frac{\sum_{t=1}^{n-1} I(n,t,s) \binom{n}{t} \left[1 - (1-p)^s\right]^t (1-p)^{s(n-t)}}{1 - (1-p)^{ns} - \left[1 - (1-p)^s\right]^n},$$
  where I(n,t,s) = 1 if the interval contains p and I(n,t,s) = 0 otherwise.
- Do not consider the t = 0 and t = n cases:
  - Poor multiple-vector transfer experimental design
  - See Swallow (1985, Phytopathology) for guidance in choosing s
- Brown, Cai, and DasGupta (2001, Statistical Science)
- Frequentist evaluation similar to how Carlin and Louis (2000) approach evaluating confidence and credible intervals
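The coverage formula is a finite sum over t = 1, …, n - 1, so it can be evaluated exactly for any interval procedure. The sketch below does this for a generic interval function and uses the Wald interval as the example; the function names are hypothetical, and this is not the authors' R code.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(t: int, n: int, s: int, alpha: float = 0.05):
    """Wald interval for p (valid for 0 < t < n)."""
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s ** 2 * (1.0 - p_hat) ** (s - 2))
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sqrt(v_hat / n)
    return p_hat - half, p_hat + half


def conditional_coverage(p: float, n: int, s: int, interval) -> float:
    """C(p, n, s): coverage of `interval`, excluding the t = 0 and t = n cases."""
    theta = 1.0 - (1.0 - p) ** s
    covered = 0.0
    for t in range(1, n):
        lower, upper = interval(t, n, s)
        if lower <= p <= upper:               # indicator I(n, t, s)
            covered += comb(n, t) * theta ** t * (1.0 - theta) ** (n - t)
    # Probability that 0 < T < n
    denom = 1.0 - (1.0 - p) ** (n * s) - theta ** n
    return covered / denom


# Wald coverage at one illustrative value of p, with n = 40 and s = 10
print(conditional_coverage(p=0.02, n=40, s=10, interval=wald_interval))
```

Evaluating `conditional_coverage` over a grid of p values gives a coverage curve for whichever interval function is supplied.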
Interval comparisons
- $\alpha = 0.05$, n = 40, and s = 10
- Black line denotes Wald and bold line denotes the interval named in the plot title
[Figure: four panels titled VSI, Clopper-Pearson, Equal-tail, and HPD, each plotting coverage (axis from 0.85 to 1.00) against p (axis from 0.00 to 0.10).]
Summary
- Best interval: VSI or modified Clopper-Pearson
- Credible intervals may be improved by taking into account variability of the estimators
- Bootstrap intervals mentioned in abstract: VSI and Clopper-Pearson perform better
- Many other intervals could be investigated
- Website: www.chrisbilder.com/bilder_tebbs
  - Contains R programs for examining the interval estimation properties
  - Different values of p, n, and s can be used
  - Also calculates empirical Bayes estimators
  - Program for Ornaghi et al. (1999) data example
Contact address starting Fall 2003:
- Joshua M. Tebbs, Department of Statistics, Kansas State University
- Christopher R. Bilder, Department of Statistics, University of [email protected]