biosecurity built on science
Efficient elicitation of a CPT in Bayesian Networks
Australian Bayesian Network Modelling Society Workshop 22 Nov 2011
Samantha Low Choy (CRCNPB/QUT Maths)
Course Outline
Understanding Probabilities
Credible probabilities
Eliciting one probability
Eliciting a CPT
•Defining probabilities
•Logical fallacies
•Ambiguity
•Cognitive biases
•Calibration
•Structured design of elicitation
•Validation
•4-step elicitation method
•Outside-in (Elicitator)
•Smorgasbord
•CPT calculator
•HUGIN Table Generator
•Elicitator
Eliciting a CPT
CPT calculator
Table Generator in HUGIN
Indirect elicitation (Elicitator)
Incorporating models
•Scenario based encoding
•Elicit extremes
•Then One-At-a-Time
•Calculate a CPT using a model, such as a function, or parametric distribution
•Elicit scenarios by outside-in
•Combine scenarios via regression
•Choose scenarios by design
•Choose scenarios by design
Large CPTs Problematic
Problem: Too many (e.g. 4 x 3 = 12) CPT entries.
Solution: Elicit the expert’s underlying conceptual model (e.g. trend in likely presence over slope within geology).
Geology (rows) by Slope (columns): Flat, Moderate, Steep, Very steep
Igneous
Volcanic
Other
Sub-total
Infer CPTs
- Choose some scenarios, using a pre-specified design.
- Elicit the CPT entry for each scenario.
- Based on elicited (and potentially extra) information, infer the remaining cells.
- Use a software tool!
CPT Calculator
Specification of MPTs and CPTs by extrapolating from the best, nearly best and worst cases.
Success hinges on framing (negative vs positive), on whether any probabilities are close to 0 or 1, and on the appropriateness of simple regression.
CPT calculator Method
Geology (rows) by Slope (columns): Flat, Moderate, Steep, Very steep
Igneous OAT
Volcanic OAT Best
Other Worst
Sub-total
Advantages
- Relatively quick: #elicitations = 2 + #parents
- Works well when probabilities are in the range 0.15-0.85.
- Software online.
- Design straightforward.
Elicit Pr(outcome) for best & worst scenarios.
Alter best scenario by one parent at a time; elicit.
Linearly interpolate to infer remaining cells.
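The three steps above can be sketched in code. This is a minimal illustration only, not the CPT Calculator's own algorithm: it assumes a main-effects model on the probability scale, rescales each parent's elicited drop so that the all-worst scenario reproduces the elicited worst case, and uses hypothetical state orderings and elicited values.

```python
from itertools import product

def interpolate_cpt(levels, p_best, p_worst, p_oat):
    """Fill a CPT by linear interpolation from best, worst and OAT elicitations.

    levels: dict mapping parent name -> states ordered from best to worst
            (each parent needs at least two states).
    p_oat:  dict mapping parent name -> Pr(outcome) elicited when only that
            parent is moved from its best to its worst state.
    """
    # Drop in probability attributable to each parent on its own.
    raw_drop = {parent: p_best - p_oat[parent] for parent in levels}
    total_drop = sum(raw_drop.values())
    # Assumption: rescale the drops so the all-worst scenario hits p_worst.
    scale = (p_best - p_worst) / total_drop if total_drop > 0 else 0.0

    cpt = {}
    for combo in product(*levels.values()):
        p = p_best
        for parent, state in zip(levels, combo):
            states = levels[parent]
            # Position of this state between best (0) and worst (1).
            score = states.index(state) / (len(states) - 1)
            p -= scale * raw_drop[parent] * score
        cpt[combo] = p
    return cpt

# Hypothetical orderings (best first) and hypothetical elicited values.
levels = {
    "geology": ["Volcanic", "Igneous", "Other"],
    "slope": ["Very steep", "Steep", "Moderate", "Flat"],
}
cpt = interpolate_cpt(levels, p_best=0.80, p_worst=0.20,
                      p_oat={"geology": 0.50, "slope": 0.50})
for scenario, p in cpt.items():
    print(scenario, round(p, 3))
```

Only 2 + #parents = 4 numbers are elicited here, yet all 12 cells are filled. The rescaling means an OAT value is reproduced exactly only when the elicited drops already add up to the best-minus-worst gap.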
CPT Calculator Issues
Linear interpolation
- Probabilities do not behave linearly near (i.e. within 0.15 of) 0 and 1.
- “Ordinary” regression assumes Normal responses, so it does not handle extreme probabilities near the boundaries.
- Some response distributions do allow for extreme probabilities, e.g. Beta regression (Elicitator).
Uncertainty
- Expert certainty may vary by scenario.
- Elicit the probability of the outcome for each scenario with uncertainty (Elicitator).
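A small numerical sketch may help show why linear interpolation struggles near 0 and 1. It assumes, purely for illustration, that an effect is constant on the logit scale; the step size and starting probabilities are hypothetical.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map a real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

STEP = 1.0  # hypothetical effect size on the logit scale

changes = {}
for p in (0.50, 0.85, 0.99):
    shifted = inv_logit(logit(p) + STEP)
    changes[p] = shifted - p
    print(f"{p:.2f} -> {shifted:.3f} (change {changes[p]:+.3f})")
```

The same step moves a mid-range probability much more than one close to 1, and the result never leaves (0, 1). Adding the mid-range change to 0.99 on the raw probability scale would instead overshoot 1.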
Incorporating Uncertainty
Donald+ (2011) developed a BN as a conceptual model using BBN software, then assessed sensitivity to uncertainty in the inputs using WinBUGS.
Framing Bias Positive-Negative
Of BN & questions
Outcome
Pr(presence) vs Pr(absence)
Answers don’t always add up ... due to
Uncertainty on discrimination between cases
Different perspective, so may recall different information
Hidden conditions (unspecified nodes)
See Kynn (2008) for review
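One simple way to surface this framing effect is to elicit both Pr(presence) and Pr(absence) for the same scenario and flag pairs that do not add up. The sketch below assumes hypothetical scenario names, elicited values and tolerance.

```python
def coherence_gap(p_presence, p_absence):
    """How far the two framings are from summing to 1."""
    return p_presence + p_absence - 1.0

TOLERANCE = 0.05  # hypothetical threshold for asking a follow-up question

# Hypothetical elicited pairs: (Pr(presence), Pr(absence)).
elicited = {
    "scenario A": (0.70, 0.40),
    "scenario B": (0.20, 0.80),
}

flagged = [name for name, (pp, pa) in elicited.items()
           if abs(coherence_gap(pp, pa)) > TOLERANCE]
print("Revisit with the expert:", flagged)
```

A flagged scenario is a prompt for discussion (hidden conditions, different recalled information) rather than something to renormalise automatically.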
Donald+(2011), Borsuk+:
- Focus BN on
Successful operation of RWTP (recycled water treatment plant)
contamination of RWTP
- Differing focus will
Condition expert’s thinking differently
May reveal different triggers for success vs contamination
Greatly affects choices of threshold in discrete BN
Affect OAT scenarios wrt best or worst case
HUGIN Table Generator
Specification of MPTs and CPTs via discretizations of parametric distributions
Successful use hinges on choice of thresholds (Another day!), on suitability of parametric model and on encoded parameters.
Table Generator Method
Geology (rows) by Slope (columns): Flat, Moderate, Steep, Very steep
Igneous
Build a formula to model all scenarios
Volcanic
Other
Advantages
- Broader picture of what expert knows about (like curve-elicitation of Kynn 2005)
- Consider all scenarios simultaneously
- Range of probabilities ~ parametric model.
- Inbuilt to HUGIN.
- No design required.
Apply a parametric model for Pr(outcome|parents).
Elicit parameters for the model, either directly or indirectly [refer to notes on Eliciting 1 number].
HUGIN Parametric encoding
Consider two dice: a fake die and an equally-weighted one.
The #sixes rolled depends on the #rolls and whether the die is fake or not. [#6s | #rolls, fake die, need toilet]
Eliciting all entries in the CPT needs to address all combinations of [#rolls, fake die, need toilet].
Would need a table for values of response (0, 1, 2, ..., n) for each set of covariates (fake = T, F) and (#rolls = 1, 2, 3, ..., n).
That amounts to n × 2 × n = 2n² entries!
Consider using entries so far to extrapolate to the whole CPT.
HUGIN Parametric encoding
Help>Table Generator
Functions>Expressions>Build Expression
Pr(#6s | n_rolls, fake_die) = if (fake_die, Binomial (n_rolls, 1/5), Binomial(n_rolls, 1/6))
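Outside HUGIN, the same parametric encoding can be sketched in a few lines. The code below is an illustration rather than HUGIN's implementation: it builds the whole CPT from just two encoded probabilities (1/5 and 1/6), with a hypothetical maximum of six rolls. Each row has max_rolls + 1 response states because zero sixes is included.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sixes_cpt(max_rolls, p_fake=1/5, p_fair=1/6):
    """CPT rows Pr(#6s = k | n_rolls, fake_die) for k = 0..max_rolls."""
    cpt = {}
    for n_rolls in range(1, max_rolls + 1):
        for fake_die in (True, False):
            p = p_fake if fake_die else p_fair
            # More sixes than rolls is impossible, so those states get 0.
            cpt[(n_rolls, fake_die)] = [
                binomial_pmf(k, n_rolls, p) if k <= n_rolls else 0.0
                for k in range(max_rolls + 1)
            ]
    return cpt

MAX_ROLLS = 6  # hypothetical upper limit on the number of rolls
cpt = sixes_cpt(MAX_ROLLS)
n_entries = sum(len(row) for row in cpt.values())
print("rows:", len(cpt), "entries:", n_entries)
print("Pr(#6s | 3 rolls, fake):", [round(p, 3) for p in cpt[(3, True)]])
```

Every entry is derived from the model, so the elicitation burden is the two success probabilities rather than the full table.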
HUGIN Parametric encoding
Encodes #sixes as a Binomial distribution
- A parent node defines the #Binomial trials
- The probability of a six (if die is fake) is pre-estimated
Q: How is the probability of a six set to 1/5?
A: Elicit a single probability as outlined before...
HUGIN Parametric encoding
Suppose the distribution of C1 can be approximated by a Normal with mean C2 (discretized) and SD 1.
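A sketch of how such a Normal encoding turns into CPT rows is given below. The cut points for C1 and the candidate means for C2 are hypothetical, and the code illustrates discretization in general rather than HUGIN's internal routine.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def discretize_normal(mean, sd, cut_points):
    """Probability of each interval defined by the sorted cut points."""
    edges = [-math.inf] + list(cut_points) + [math.inf]
    cdf = [normal_cdf(e, mean, sd) for e in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

CUT_POINTS = [-1.0, 1.0]     # hypothetical thresholds for the states of C1
C2_MEANS = [-2.0, 0.0, 2.0]  # hypothetical discretized values of the mean C2

# One CPT row per state of C2, each row giving Pr(C1 state | C2).
cpt = {m: discretize_normal(m, 1.0, CUT_POINTS) for m in C2_MEANS}
for m, row in cpt.items():
    print(m, [round(p, 3) for p in row])
```

The choice of thresholds drives how much of the Normal's shape survives discretization, which is why the slides single it out as a separate topic.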
HUGIN Parametric encoding
The probability of each state j is specified in parent node C3.
The likely values of the mean are specified in C2.
This is similar to a (finite) mixture model, where the marginal distribution is written
$$\Pr(C_1) = \sum_j \Pr(C_3 = j)\,\Pr(C_1 \mid C_3 = j, C_2)$$
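The mixture form of the marginal can be checked numerically. The sketch below assumes, for illustration only, that each state j of C3 selects a Normal component with its own mean and SD 1, and that C1 is discretized at hypothetical cut points; the weights stand in for Pr(C3 = j).

```python
import math

def normal_cdf(x, mean, sd=1.0):
    """Cumulative distribution function of a Normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def bin_probs(mean, cut_points, sd=1.0):
    """Pr(C1 falls in each interval) for one mixture component."""
    edges = [-math.inf] + list(cut_points) + [math.inf]
    cdf = [normal_cdf(e, mean, sd) for e in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

def mixture_marginal(weights, means, cut_points):
    """Marginal Pr(C1) as a weighted sum over the states j of C3."""
    marginal = [0.0] * (len(cut_points) + 1)
    for w, m in zip(weights, means):
        for i, p in enumerate(bin_probs(m, cut_points)):
            marginal[i] += w * p
    return marginal

WEIGHTS = [0.3, 0.7]      # hypothetical Pr(C3 = j)
MEANS = [-1.0, 2.0]       # hypothetical component means (values of C2)
CUT_POINTS = [0.0, 1.0]   # hypothetical thresholds for C1

marginal = mixture_marginal(WEIGHTS, MEANS, CUT_POINTS)
print([round(p, 3) for p in marginal])
```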
*Special case of another distribution
Distribution | Parameters that need to be encoded | Direct | Elicit-N | Elicitator | SHELF
Normal M, Variance V1 V2 V2
LogNormal M, V, [Offset] V1 V2 V2
Beta Shape a, b, [L, U] V1 V1 V2
Gamma or Weibull Shape, Scale, [Offset] V1 V2 V2
Exponential M, [Offset] * *
Uniform L, U
Triangular M, Min, Max
PERT M, Min, Max, [Shape=4] * *
Binomial N (# trials), P
Continuous V2
Poisson M *
Negative Binomial N (# successes), P V2
Geometric P
Histogram P for each bin
Noisy OR Boolean parents, Inhibitors
Elicitator
Specification of MPTs and CPTs by extrapolating from well-chosen scenarios.
Success hinges on choice of scenarios (need good design) and on complexity of model.
Elicitator Method
Geology (rows) by Slope (columns): Flat, Moderate, Steep, Very steep
Igneous X X X
Volcanic X X Best
Other Worst X
Advantages
- Broader picture of what expert knows about
- Explore more scenarios than 1-2 × #parents
- Assumes probabilities not equal to 0 or 1.
- Software developed.
- Design intuitively or formally.
Design scenarios to cover the range of parents, within limited elicitation time.
Elicit Pr(outcome) for scenarios & interpolate via Beta regression.
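The idea of interpolating from a few designed scenarios can be sketched without special software. Note the swap: the code below is not Beta regression and not Elicitator's method. It fits an ordinary least-squares line on the logit scale as a simplified stand-in, using one hypothetical covariate (a slope score from 0 = Flat to 3 = Very steep) and hypothetical elicited probabilities.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit_line(xs, ps):
    """Least-squares fit of logit(p) = intercept + slope * x."""
    zs = [logit(p) for p in ps]
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope

# Hypothetical design: three of the four slope classes are elicited.
elicited_x = [0, 1, 3]
elicited_p = [0.10, 0.30, 0.90]

b0, b1 = fit_logit_line(elicited_x, elicited_p)
# Predict every slope class, including the one that was not elicited.
predicted = {x: inv_logit(b0 + b1 * x) for x in range(4)}
for x, p in predicted.items():
    print(x, round(p, 3))
```

Working on the logit scale keeps every inferred cell strictly between 0 and 1. A Beta regression would additionally model how the expert's uncertainty varies across scenarios.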
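The Beta Distribution
Probability of presence $y_i$ follows a Beta distribution with shape parameters $\alpha_i$ and $\beta_i$.
Can interpret $\alpha_i$ as #presences, $\beta_i$ as #absences, and $\alpha_i + \beta_i$ as effective sample size.
$$y_i \mid \alpha_i, \beta_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$$
$$\mathrm{E}[y_i] = \frac{\alpha_i}{\alpha_i + \beta_i}$$
$$\mathrm{Var}(y_i) = \frac{\alpha_i \beta_i}{(\alpha_i + \beta_i)^2 (\alpha_i + \beta_i + 1)}$$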
[Figure: nine panels of Beta densities p(y_i | ·, ·) plotted against y_i on [0, 1], for parameter pairs (0.5, 0.5), (1, 1), (2, 2), (1, 3), (1, 10), (1, 100), (2, 3), (2, 10) and (2, 100).]
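The interpretation of the two Beta parameters as pseudo-counts can be explored numerically. In this sketch `a` and `b` stand for the two parameters (#presences and #absences), and the parameter pairs are a subset of those in the density panels.

```python
import math

def beta_summary(a, b):
    """Mean, variance and effective sample size of a Beta(a, b)."""
    n = a + b
    mean = a / n
    var = a * b / (n ** 2 * (n + 1))
    return mean, var, n

def beta_pdf(y, a, b):
    """Beta(a, b) density at y in (0, 1), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

for a, b in [(1, 1), (2, 2), (1, 3), (2, 10), (2, 100)]:
    mean, var, n = beta_summary(a, b)
    print(f"Beta({a}, {b}): mean={mean:.3f}, sd={math.sqrt(var):.3f}, "
          f"effective sample size={n}")
```

Larger a + b gives a tighter density around the mean, which matches the reading of a + b as the amount of information behind the expert's assessment.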
Expert data One expert, many sites
Indirect elicitation of response to get weights
Elicit outcome (Probability of Presence given habitat) for each scenario.
Expert’s underlying conceptual model for the plausible range of weights of each parent in determining the outcome probability (with scaling).
Feedback on implications
Multi-platform tool
Low Choy+ (2011) in Expert Knowledge & Its Applications in Landscape Ecology, eds Perera, Drew, Johnson
James+ (2010) Environmental Modelling & Software
Are misfits related to size of predicted probability?
Does the encoded expert model fit their assessments overall?
Does the encoded expert model fit the assessments wrt one parent?
Does the encoded expert model fit the assessments across values of a parent?
Testing scientific hypotheses Many experts, across sites
Low Choy+ (2010) in Oxford Handbook of Applied Bayesian Analysis, eds West & O’Hagan;
Murray+ (2009) Journal of Applied Ecology
Predicting spatial distribution
[Map panels: Experts’ probability of presence; Posterior update using data, probability of presence; Uncertainty (standard error)]
Low Choy+ (2011) in Expert Knowledge & Its Applications in Landscape Ecology, eds Perera, Drew, Johnson
WHAT LIES BENEATH
Integrates open source packages (Java, R, MySQL)
Workflow, dynamic object-oriented design
Relational database
Process Flowchart
James+ (2010) Environmental Modelling & Software;
Low Choy+ (2009) in MODSIM 2009 Proceedings
Conceptual Data Model
SOME HINTS ON DESIGN
Many successful experts already have good intuition about design. Statistical design can be seen as simply a mathematical formalisation of good scientific practice.
Specific designs are in textbooks on Experimental Design.
My favourite is Box, Hunter & Hunter (2005) Statistics for Experimenters: Design, Innovation and Discovery.
See list of introductory texts in Low-Choy et al (2011) Chp3 in EKALE.
This tour helps you get started.
Design Performance criteria
Is explanation and/or prediction a primary goal?
- Explanation: Is it more/less important to obtain:
  Adequate coverage of main sources of variation
  More bang for your buck (information ~ variability)
  Precise information on at least some aspect of the problem
- Prediction: Is it more/less important to avoid:
  Type I error: deduce a covariate is important when it isn’t
  Type II error: decide a covariate is unimportant when it is
BTRW Example
- Explanation was considered more important, although prediction was an intended use for the model. Explanatory ability used in design & results, predictive ability assessed in results.
- Design had to be simple, and aimed for adequate coverage, accounting for regional, landscape and site scale variation.
Design Focus
What are the scientific questions (XY relationship)
- What is the (ecological) response? On what? Over what time period?
- What are the covariates (factors) affecting this response? Are they direct or indirect indicators? At what scale?
- Are there any catalysts, modifiers or confounders that affect this relationship?
BTRW Examples
- What are habitat requirements of this species, at landscape (mappable scale) over the last decade?
- How useful are geology, vegetation, landuse, elevation, slope and aspect in predicting species occurrence?
- Do these depend on region?
Design Risk assessment
Identify important potential sources of bias/imprecision
- Elicitation: What are strengths/weaknesses of these experts that will affect their assessments?
- Expert: What characteristics of experts are important?
- Scenarios: Fit-for-purpose, affect elicitation, match to suitable experts
Design Risk management
Manage potential sources of variation by:
1. Setting: hold factor constant
   Elicitation: Same order of scenarios, interview script, interview
   Experts: All must have at least 5 years experience as an XYZ
   Scenarios: Only consider inhabitable habitats in one region
2. Randomizing: essential for avoiding selection bias; should be applied at some scale.
   Experts: Select randomly (via random number generator, e.g. Excel, R) from a list of eligible experts
3. Controlling
   Scenarios: Stratify by landscape factors (geology, forest type), then by topographic factors (elevation, slope, aspect)
4. Ignoring
   Elicitation: Ignore effect of age/gender of expert and elicitor.
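The randomizing and controlling steps can be sketched with a seeded random number generator. The slides mention Excel or R; Python is used here only for illustration, and the expert list, seed, strata and sample sizes are all hypothetical.

```python
import random

rng = random.Random(2011)  # fixed seed so the selection is repeatable

# Randomizing: draw experts at random from the list of eligible experts.
eligible_experts = [f"expert_{i:02d}" for i in range(1, 21)]  # hypothetical
selected_experts = rng.sample(eligible_experts, k=5)

# Controlling: stratify scenarios by a landscape factor (geology), then
# sample a topographic factor (slope) at random within each stratum.
geology = ["Igneous", "Volcanic", "Other"]
slope = ["Flat", "Moderate", "Steep", "Very steep"]

scenarios = []
for g in geology:
    for s in rng.sample(slope, k=2):  # two scenarios per geology stratum
        scenarios.append((g, s))

print(selected_experts)
print(scenarios)
```

Recording the seed alongside the design makes the selection transparent and repeatable.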
Design Logic
Scientific thinking
1. Controls for interpreting size of effects
   Elicitation: Same order of scenarios, interview script, interview
   Experts: All must have at least 5 years experience as an XYZ
   Scenarios: Only consider inhabitable habitats in one region
2. Randomizing: essential for avoiding selection bias; should be applied at some scale.
   Experts: Select randomly (via random number generator, e.g. Excel, R) from a list of eligible experts
3. Units
   Elicitation: Ignore effect of age/gender of expert and elicitor.
4. Replication
   Elicitation: Ignore effect of age/gender of expert and elicitor.
Design Extrapolation
A poor design
- restricts the researcher to merely describing the sample, and cannot be extrapolated to a broader population.
Describe the sample wrt the target population
- What are key sub-groups in the population?
- How representative is the sample of each sub-group?
- Calculate weights for each sub-group.
- If necessary apply weights to elicitation outputs.
Examples
- Experts: Is private/public/university sector important? How many of each in sample and overall? Calculate %.
- Scenarios: Fit-for-purpose, affect elicitation, match to suitable experts
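The weighting steps above can be illustrated with a short calculation. This sketch assumes simple post-stratification weights (population share divided by sample share); the sector counts are hypothetical.

```python
def subgroup_weights(population_counts, sample_counts):
    """Weight per sub-group = population share / sample share."""
    pop_total = sum(population_counts.values())
    samp_total = sum(sample_counts.values())
    return {
        group: (population_counts[group] / pop_total)
               / (sample_counts[group] / samp_total)
        for group in sample_counts
    }

# Hypothetical numbers of experts by sector, overall and in the sample.
population = {"private": 20, "public": 50, "university": 30}
sample = {"private": 2, "public": 3, "university": 5}

weights = subgroup_weights(population, sample)
for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
```

A weight above 1 marks a sector that is under-represented in the sample relative to the population, so its experts' outputs count for more when the weights are applied.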
Probabilistic Sampling
Basic algorithms
Haphazard
Systematic
Completely randomized
Stratified
Sudoku (Graeco-Latin Square)
Judgmental* Clustered*
Increasing statistical know-how required to:
• generate design, and
• adjust analysis for representativeness of sample
*Needs multilevel or reweighted analysis
Probabilistic Sampling
Intermediate algorithms
Before-After-Control-Impact
Ranked set sampling
Taguchi (sacrifice most interactions)*
Fractional factorial (sacrifice some high order interactions)*
Incomplete blocks (sacrifice some interactions)*
Rotating Panel*
CURRENT DIRECTIONS
In the pipeline...
- Experts explicitly inform design (elicit sampling frame to address sampling bias), with Daglish, Ridley, Burrell
- Better target expert knowledge: indirect elicitation of scenarios not scores, with James
- Eliciting probabilities for surveillance, with Taylor, Hammond, Penrose, Anderson
- Treat expert knowledge as data with elicitation error, with Albert, Donnet, Guihenneuc, Mengersen, Rousseau
- Calibrate and combine expert opinions, with Mittinty, MacLeod, Mengersen
- Estimating species richness on coral reefs, with uncertainty, with Fisher, O’Leary, Caley, Mengersen
Expert knowledge is valuable in biosecurity & biodiversity, in BNs & other models.
Final thoughts ... on eliciting CPTs
Four elicitation methods. DIRECT: One-by-one; HUGIN Table Generator
INDIRECT: Elicitator (but there are others)
Can construct CPTs via models or elicitation.
- Key elements
Choosing scenarios or thresholds – from conceptual to numeric values (only for elicitation in this course!)
Consider separating the scientific knowledge from the political context influencing thresholds & benchmarks (see Low Choy+ 2009, Ecology)
- Underlying skill
Designing experiments: from intuition to statistics
Final thoughts ... on eliciting into BNs
Protocol
- Just like any other data collection!
- Basic script ensures prepared, piloted and repeatable, transparent, peer review.
- Allows flexibility in follow-up questions.
Why should anyone believe the BBN?
- Designed approach
- Key is managing uncertainty & variability.
Beyond one-off BBNs
- Adopting modelling best practice (sensitivity, testing)
- Implement within Bayesian learning cycle, with priors on table entries (and parameters), so EK can be updated (convolved not averaged) with data
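To illustrate what updating an elicited table entry with data can look like, the sketch below uses the conjugate Beta-Binomial case, which is only one simple instance of the Bayesian learning cycle. The prior parameters and survey counts are hypothetical.

```python
def update_beta(prior_a, prior_b, presences, absences):
    """Conjugate update: add observed counts to the prior pseudo-counts."""
    return prior_a + presences, prior_b + absences

def beta_mean(a, b):
    return a / (a + b)

# Hypothetical elicited prior for one CPT entry: mean 0.3, worth 10 sites.
prior_a, prior_b = 3.0, 7.0
# Hypothetical field data for the same scenario: 18 presences in 30 sites.
presences, absences = 18, 12

post_a, post_b = update_beta(prior_a, prior_b, presences, absences)
prior_mean = beta_mean(prior_a, prior_b)
data_mean = presences / (presences + absences)
posterior_mean = beta_mean(post_a, post_b)

print(f"prior mean      {prior_mean:.3f}")
print(f"data proportion {data_mean:.3f}")
print(f"posterior mean  {posterior_mean:.3f}")
print(f"simple average  {(prior_mean + data_mean) / 2:.3f}")
```

The posterior mean sits closer to the data than a simple average would, because the 30 surveyed sites outweigh the expert's effective sample size of 10.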
Thank you
Allan James, Kerrie Mengersen
Justine Murray, Anne Goldizen, Clive McAlpine, Hugh Possingham
Nichole Hammond, Lindsay Penrose, Sharyn Taylor, Paul Pheloung
Rebecca O’Leary, Rebecca Fisher, Julian Caley
Robert Denham, Mary Kynn, Tara Martin, Petra Kuhnert
Arnon Accad, Wayne Rochester, Kristen Williams, David Pullar
Sandra Johnson
Greg Daglish, Andrew Ridley, Phil Burrell, Pat Collins
For more information, please email [email protected]
Download these slides from QUT e-prints
Further reading
Introductions & Overviews
• Spetzler & von Holstein 1975 Management Science
• O’Hagan+ 2006 Uncertain Probabilities, Appendix C
• Low Choy+ 2009 Ecology
• Aspinall 2010 Nature
• Knol+ 2010 Environmental Health
• Kuhnert+ 2010 Ecology Letters
• Martin+ (accepted) Biological Conservation
Especially for Bayesian Networks
• Renooij 2011 Knowledge Engin Review
• Marcot+ 2006 Canad J Forest Research
• James+ 2010 Environ Modelling & Software
• Speirs-Bridge+ 2010 Risk Analysis
• Johnson, Low-Choy & Mengersen 2011 Integ Environ Asst Mgmt
• Low-Choy+ 2011, Chp 3, Expert Knowledge & Its Applic in Ecol
Cognitive & Logical traps
• Cooke 1991 Experts in Uncertainty
• Hoffrage 2000 Science
• O’Hagan+ 2006 Uncertain Probabilities, Chapters 1-3
• Kynn 2008 J Roy Stat Soc A
• Low Choy & Wilson 2009 IASE Proceedings
• Burgman+ 2011 PLoS ONE