Bayesian statistics in the assessment of the benefit-risk balance of medicines using Multi Criteria Decision Analysis
Submitted for the degree of PhD
by
Edward Waddingham
Imperial Clinical Trials Unit, School of Public Health, Faculty of Medicine, Imperial College London.
Declaration of Originality
I hereby declare that the work in this thesis is my own contribution. Any published and unpublished
work of others has been acknowledged in the text and a list of references is given.
Copyright Declaration
The copyright of this thesis rests with the author. Unless otherwise indicated, its contents are
licensed under a Creative Commons Attribution-NonCommercial 4.0 International Licence (CC BY-NC).
Under this licence, you may copy and redistribute the material in any medium or format. You may
also create and distribute modified versions of the work. This is on the condition that: you credit the
author and do not use it, or any derivative works, for a commercial purpose.
When reusing or sharing this work, ensure you make the licence terms clear to others by naming the
licence and linking to the licence text. Where a work has been adapted, you should indicate that the
work has been changed and describe those changes.
Please seek permission from the copyright holder for uses of this work that are not included in this
licence or permitted under UK Copyright Law.
Acknowledgements
I am deeply grateful to my supervisors Professor Deborah Ashby and Professor Paul Matthews.
Their help, advice, inspiration, reassurance, feedback, and patience have been immensely valuable in
producing this work.
Many thanks also to Dr Marc Chadeau-Hyam and Professor Nicky Welton, who examined this thesis
and whose feedback has greatly improved the manuscript.
I am indebted to my colleagues from PROTECT (Pharmacoepidemiological Research on Outcomes of
Therapeutics by a European ConsorTium, funded by the Innovative Medicines Initiative – www.imi-protect.eu)
for my introduction to benefit-risk assessment while I was still an MSc student, and for the boost
it gave to my fledgling career in biostatistical research. PROTECT has also influenced this thesis
more directly, providing a great deal of the data I have used. In particular I am grateful to the
late Richard Nixon for leading the natalizumab case study team, to Kimberley Hockley and her Patient
& Public Involvement team for their work on patient preference elicitation (which provided much of
the data used herein), and to Shahrul Mt-Isa for his technical expertise and support.
Additional thanks to Kimberley Hockley for her comments on a draft of the manuscript.
This work was funded by an Imperial College PhD Scholarship.
Abstract
Medical decisions such as benefit-risk assessments of treatments should be based on the best
clinical evidence but also require subjective value judgements regarding the impact of disease and
treatment outcomes. This thesis argues for a Bayesian implementation of Multi-Criteria Decision
Analysis (MCDA) for such problems. It seeks to establish whether suitable Bayesian models can be
constructed given the variety of data formats and the interdependencies between the many
variables involved.
A modelling framework is developed for joint multivariate Bayesian inference of treatment effects
and preference values based on data from clinical trials and stated preference studies. This method
allows the sampling uncertainty of the parameters to be reflected in the analysis, overcoming a
recognised shortcoming of MCDA. Markov Chain Monte Carlo simulation is used to derive the
posterior distributions. The models are illustrated using a case study involving treatments for
relapsing remitting multiple sclerosis.
The clinical evidence synthesis has several advantages over existing multivariate evidence synthesis
models, including a comprehensive flexible allowance for correlations, compatibility with any
number of treatments and outcomes, and the ability to estimate unreported treatment-outcome
combinations.
The preference models can analyse data from a variety of elicitation methods such as discrete
choice, Analytic Hierarchy Process and swing weighting. In the case of swing weighting, no Bayesian
analysis has previously been presented, and the results suggest a possible flaw in the standard
deterministic analysis that may bias the preference estimates when judgements are subject to
random variability. A novel meta-analysis model for preference elicitation studies is also presented.
The framework has the unique ability to analyse data from multiple methods jointly to yield a
common set of preference parameters.
These results demonstrate the flexibility of the Bayesian approach, and the depth of insight it can
provide into the impact of uncertainty and heterogeneity in multi-criteria medical decisions.
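The core idea summarised above, that posterior samples of treatment effects and preference weights can be propagated jointly into MCDA benefit-risk scores, can be sketched in a few lines. This is purely an illustration, not the thesis model: the two treatments, two criteria, Beta-distributed criterion values and Dirichlet-distributed weights are all invented stand-ins for posterior draws that would in practice come from MCMC.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 4000  # stand-in for the number of MCMC posterior draws

# Hypothetical posterior samples (one row per draw) for two treatments
# on two criteria, a benefit and a risk, both already mapped onto a
# common 0-1 value scale via partial value functions.
values = {
    "A": np.column_stack([rng.beta(40, 10, n_draws),   # benefit value
                          rng.beta(30, 10, n_draws)]), # risk value
    "B": np.column_stack([rng.beta(30, 10, n_draws),
                          rng.beta(40, 10, n_draws)]),
}

# Hypothetical posterior samples of preference weights (each draw sums
# to 1), e.g. from a Bayesian model of elicited preference data.
weights = rng.dirichlet([6, 4], n_draws)

# MCDA benefit-risk score per draw: weighted sum of criterion values.
# Doing this per draw carries parameter uncertainty through to the score.
scores = {t: (v * weights).sum(axis=1) for t, v in values.items()}

# The output is a posterior distribution over scores, so decision
# quantities such as P(A scores higher than B) fall out directly.
p_a_best = float(np.mean(scores["A"] > scores["B"]))
print(round(float(np.mean(scores["A"] - scores["B"])), 3), round(p_a_best, 3))
```

Deterministic MCDA would collapse `values` and `weights` to point estimates before scoring; keeping the draws instead is what lets sampling uncertainty be reflected in the final benefit-risk assessment.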
Table of Contents
Declaration of Originality ................................................................................................................... 2
Copyright Declaration ........................................................................................................................ 2
Acknowledgements ........................................................................................................................... 3
Abstract ............................................................................................................................................ 4
List of abbreviations ........................................................................................................................ 10
List of figures ................................................................................................................................... 12
List of tables .................................................................................................................................... 17
I. Introduction ............................................................................................................................ 19
I.1 Background to the thesis .................................................................................................. 19
I.1.1 Evidence-based medical decisions ............................................................................ 19
I.1.2 Benefit-risk balance .................................................................................................. 19
I.1.3 Benefit-risk assessment in practice ........................................................................... 20
I.1.4 Structured/quantitative benefit-risk assessment ...................................................... 22
I.1.5 Multi-Criteria Decision Analysis (MCDA) ................................................................... 23
I.1.6 Uncertainty, decision making and Bayesian statistics ................................................ 25
I.2 Purpose of the research ................................................................................................... 28
I.2.1 Motivations .............................................................................................................. 28
I.2.2 Research question .................................................................................................... 29
I.3 Methods/strategy ............................................................................................................ 31
I.3.1 Work plan and thesis structure ................................................................................. 31
I.3.2 Case study ................................................................................................................ 31
I.3.3 Software ................................................................................................................... 31
I.4 Literature search .............................................................................................................. 32
I.4.1 Search strategy ......................................................................................................... 32
I.4.2 Literature search flowchart....................................................................................... 34
I.4.3 Literature on quantitative benefit-risk assessment ................................................... 34
II. Bayesian synthesis of clinical evidence for benefit-risk assessment .......................................... 36
II.1 Background, aims & objectives ......................................................................................... 36
II.1.1 Introduction ............................................................................................................. 36
II.1.2 Aim, objectives, scope .............................................................................................. 43
II.1.3 Synopsis of literature ................................................................................................ 44
II.2 High level model structure ............................................................................................... 49
II.3 Data ................................................................................................................................. 50
II.3.1 Data structure .......................................................................................................... 50
II.3.2 Dataset: Relapsing-remitting multiple sclerosis ......................................................... 51
II.4 Treatment effects module ................................................................................................ 58
II.4.1 Initial (naïve) model: all outcomes independent (Model 0) ....................................... 58
II.4.2 Correlated non-zero outcomes (Model 1) ................................................................. 60
II.4.3 Contrast-level data (Model 1*) ................................................................................. 64
II.4.4 BUGS coding via variance decomposition ................................................................. 65
II.4.5 Fixed baseline (Model 2) ........................................................................................... 73
II.4.6 Mappings (Model 3) ................................................................................................. 76
II.4.7 Outcomes with zeroes (Models 4a and 4b) ............................................................... 80
II.4.8 Priors ........................................................................................................................ 83
II.4.9 Assessing model fit and complexity .......................................................................... 84
II.5 Population calibration module ......................................................................................... 87
II.5.1 Statistical model ....................................................................................................... 87
II.5.2 Priors ........................................................................................................................ 89
II.5.3 Outputs .................................................................................................................... 90
II.5.4 Rankings ................................................................................................................... 92
II.6 Results ............................................................................................................................. 93
II.6.1 Treatment effects module ........................................................................................ 93
II.6.2 Population calibration module ................................................................................ 122
II.6.3 Final synthesised outcomes on absolute scale ........................................................ 123
II.6.4 Rankings ................................................................................................................. 130
II.6.5 Conclusions regarding RRMS treatments ................................................................ 134
II.6.6 Sensitivity analyses ................................................................................................. 135
II.7 Discussion ...................................................................................................................... 135
III. Bayesian multi-criteria utility modelling ............................................................................. 142
III.1 Background, aims & objectives ....................................................................................... 142
III.1.1 Introduction ........................................................................................................... 142
III.1.2 Preference elicitation methods ............................................................................... 147
III.1.3 Data types .............................................................................................................. 154
III.1.4 Allowing for uncertainty in preferences .................................................................. 167
III.1.5 Aim and objectives ................................................................................................. 168
III.2 High level model structure ............................................................................................. 170
III.2.1 Notes on preference parameters ............................................................................ 170
III.3 Bayesian analysis of elicited ratings ................................................................................ 178
III.3.2 Datasets ................................................................................................................. 181
III.3.3 Statistical model ..................................................................................................... 185
III.3.4 Results.................................................................................................................... 191
III.3.5 Discussion .............................................................................................................. 202
III.4 Bayesian analysis of choice data ..................................................................................... 204
III.4.1 Data structure ........................................................................................................ 204
III.4.2 Dataset - PROTECT patient choice data ................................................................... 204
III.4.3 Choice model ......................................................................................................... 205
III.4.4 Results.................................................................................................................... 207
III.4.5 Discussion .............................................................................................................. 208
III.5 Bayesian meta-analysis of preferences ........................................................................... 210
III.5.1 Data structure ........................................................................................................ 211
III.5.2 Dataset: RRMS ........................................................................................................ 211
III.5.3 Data extraction ....................................................................................................... 213
III.5.4 Data rebasing ......................................................................................................... 219
III.5.5 Statistical model ..................................................................................................... 223
III.5.6 Results.................................................................................................................... 226
III.5.7 Discussion .............................................................................................................. 229
III.6 Combining preferences from different methods ............................................................. 232
III.6.1 Datasets ................................................................................................................. 233
III.6.2 Statistical model ..................................................................................................... 237
III.6.3 Results.................................................................................................................... 240
III.6.4 Discussion .............................................................................................................. 247
IV. Assessing the overall benefit-risk balance .......................................................................... 253
IV.1 Methods ........................................................................................................................ 254
IV.1.1 High level model structure ...................................................................................... 254
IV.1.2 Selection of outcomes and model versions ............................................................. 254
IV.2 Results ........................................................................................................................... 257
IV.2.1 Benefit-risk scores .................................................................................................. 257
IV.2.2 Rankings ................................................................................................................. 260
IV.2.3 Sensitivity analyses ................................................................................................. 262
IV.3 Discussion ...................................................................................................................... 268
IV.3.1 Bayesian MCDA ...................................................................................................... 268
IV.3.2 Benefit-risk assessment of RRMS treatments .......................................................... 269
V. Conclusions ........................................................................................................................... 271
V.1 Summary of results ........................................................................................................ 271
V.1.1 Bayesian synthesis of clinical evidence for benefit-risk assessment (Chapter II) ...... 271
V.1.2 Bayesian multi-criteria utility modelling (Chapter III) .............................................. 272
V.1.3 Assessing the overall benefit-risk balance (Chapter IV) ........................................... 272
V.2 Strengths ....................................................................................................................... 273
V.3 Limitations ..................................................................................................................... 273
V.4 Reflections on generalisability & applicability ................................................................. 275
V.5 Contribution to the field ................................................................................................. 277
V.6 Future research priorities ............................................................................................... 278
V.7 Concluding summary ...................................................................................................... 280
References .................................................................................................................................... 281
Appendices.................................................................................................................................... 293
List of abbreviations
Abbreviation Full name
AHP Analytic Hierarchy Process
ALT Alanine Aminotransferase
ARR Annualised Relapse Rate
BR Benefit-Risk
BRA Benefit-Risk Assessment
BUGS Bayesian Inference Using Gibbs Sampling
CC Continuity Correction
CI Confidence Interval or Credibility Interval
DCE Discrete Choice Experiment
DF Dimethyl Fumarate
DP Disability Progression
EDSS Expanded Disability Status Scale
EMA European Medicines Agency
EU European Union
FDA Food & Drug Administration
FM Fingolimod
GA Glatiramer Acetate
GI Gastrointestinal
IA (IM) Interferon beta-1a (intramuscular)
IA (SC) Interferon beta-1a (subcutaneous)
IB Interferon beta-1b
IM Intramuscular
IV Intravenous
JAGS Just Another Gibbs Sampler
LQ Laquinimod
MA Meta-Analysis
MACBETH Measuring Attractiveness by a Categorical-Based Evaluation Technique
MAUT Multi-Attribute Utility Theory
MCDA Multi-Criteria Decision Analysis
MCMC Markov Chain Monte Carlo
MED Macular Edema
MHRA Medicines and Healthcare products Regulatory Agency
MNL Multinomial logit
MS Multiple Sclerosis
NICE National Institute for Health and Care Excellence
NMA Network Meta-Analysis
PL Placebo
PML Progressive Multifocal Leukoencephalopathy
PROTECT Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium
PVF Partial Value Function
RFP Relapse-Free Proportion
RRMS Relapsing Remitting Multiple Sclerosis
SBC Serious Bradycardia
SC Subcutaneous
SD Standard Deviation
SE Standard Error
SGI Serious Gastrointestinal disorders
SUCRA Surface Under the Cumulative Ranking Curve
TF Teriflunomide
UK United Kingdom
ULN Upper Limit of Normal range
USA United States of America
List of figures
Figure 1 – Literature search flowchart ............................................................................................. 34
Figure 2 – Network diagram: pairwise meta-analysis ....................................................................... 38
Figure 3 – Network diagram: simple network meta-analysis (i).. ...................................................... 38
Figure 4 – Network diagram: simple network meta-analysis (ii). ...................................................... 39
Figure 5 – A disconnected network involving treatments A, B, C, D and E (top) is made connected by
the addition of treatment F (bottom). ............................................................................................. 41
Figure 6 – Venn diagram illustrating the relationships between various types of meta-analysis model.
........................................................................................................................................................ 45
Figure 7 – Pictorial representation of the types of meta-analysis model discussed in this section.. .. 47
Figure 8 - High-level model structure, focusing on clinical evidence synthesis. ................................. 49
Figure 9 – Network diagram for the RRMS case study (all outcomes combined). .............................. 53
Figure 10 - Outcomes for the RRMS case study ................................................................................ 55
Figure 11 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 0, fixed effects. ..................................................................................................................... 94
Figure 12 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 0, random effects (except serious GI disorders, serious bradycardia and macular edema). ... 95
Figure 13 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 1 (random effects), with all correlations between outcomes set to zero ............................... 98
Figure 14 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 1 (random effects), with all correlations between outcomes set to 0.6. ................................ 99
Figure 15 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 2 (random effects), with all correlations between outcomes set to zero. ............................ 102
Figure 16 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 2 (random effects), with all correlations between outcomes set to zero. ............................ 103
Figure 17 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 2, correlations of 0.6). ........................................................................................... 105
Figure 18 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 3 (random effects, fixed mappings, one mapping group, all correlation coefficients between
outcomes = 0.6).. ........................................................................................................................... 106
Figure 19 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 3 (random effects, random mappings, one mapping group, all correlation coefficients
between outcomes = 0.6). ............................................................................................................. 107
Figure 20 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 3, correlations of 0.6, fixed mappings in one group). ............................................. 108
Figure 21 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 3, correlations of 0.6, random mappings in one group). ........................................ 109
Figure 22 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 4b (random effects, random mappings, one mapping group, all correlation coefficients between
outcomes = 0.6, sample variances estimated as (0.025 + p)(0.975 − p) × 100/N for the
“zeroes” outcomes). ...................................................................................... 116
Figure 23 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 4b, correlations of 0.6, random mappings in one group). ...................................... 117
Figure 24 - Posterior distributions of relative treatment effects (population averages) on Normal
scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 119
Figure 25 - Posterior distributions of relative treatment effects (population averages) on Normal
scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 120
Figure 26 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Final model, fixed mappings in three groups). .................................................................. 121
Figure 27 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Final model, random mappings in three groups). ............................................................. 122
Figure 28 - Posterior distributions of absolute treatment outcomes (population averages) on Normal
scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 125
Figure 29 - Posterior distributions of absolute treatment outcomes (population averages) on their
original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects,
three mapping groups, all correlation coefficients between outcomes = 0.6). ................................ 126
Figure 30 - Posterior distributions of absolute treatment outcomes (study-level averages) on their
original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects,
three mapping groups, all correlation coefficients between outcomes = 0.6). ................................ 128
Figure 31 - Posterior distributions of absolute treatment outcomes (individual-level) on their original
scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 129
Figure 32 - SUCRA based on population averages; fixed mapping model ........................................ 130
Figure 33 - SUCRA based on population averages; random mapping model .................................. 131
Figure 34 - SUCRA based on population averages: serious gastrointestinal events ....................... 132
Figure 35 - SUCRA for the efficacy and liver outcomes in the three-group fixed-mapping model: the
impact of predictive variability. ..................................................................................................... 133
Figure 36 - Probabilistic rankings for the population average relapse rate, one-group random
mappings model ............................................................................................................................ 134
Figure 37 – A “star”-shaped evidence network with six active treatments (A, B, C, D, E, F) and placebo
(P). ................................................................................................................................................ 140
Figure 38 – Example of an AHP judgement matrix. ......................................................................... 148
Figure 39 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (i) ........................................................................................................................................... 150
Figure 40 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (ii) .......................................................................................................................................... 151
Figure 41 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (iii) ......................................................................................................................................... 152
Figure 42 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (iv).. ....................................................................................................................................... 153
Figure 43 – Example of a binary choice set..................................................................................... 154
Figure 44 - Simple example of a network of outcome preferences (i). ............................................ 158
Figure 45 - Simple example of a network of outcome preferences (ii). ........................................... 159
Figure 46 – Example of a “web” network with six outcomes/criteria .............................................. 160
Figure 47 – Example of a “fan” network with six outcomes/criteria ............................................... 160
Figure 48 – Hierarchical elicitation network for 10 criteria arranged in two groups of three and one
group of four, using the agglomeration rule and webs at both levels of the hierarchy. .................. 162
Figure 49 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one
group of four, using the substitution rule and webs at both levels of the hierarchy. ...................... 163
Figure 50 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one
group of four, using the substitution rule and fans at both levels of the hierarchy – that is, a tree.
...................................................................................................................................................... 164
Figure 51 - Hierarchical elicitation network in Figure 50, shown before identification of criteria for
promotion, i.e. in value tree format. .............................................................................................. 165
Figure 52 - High-level model structure, focusing on preference modelling. .................................... 170
Figure 53 – Two ways to display preferences for categorical variables – an example using criteria
from the RRMS case study (but fictional data). .............................................................................. 174
Figure 54 – Value tree for the RRMS investigator ratings dataset before elicitation. ...................... 183
Figure 55 – Value tree for the RRMS investigator ratings dataset after the elicitation process is
complete. ...................................................................................................................................... 184
Figure 56 – Elicitation network diagram for administration modes in the PROTECT RRMS patient
ratings data. .................................................................................................................................. 185
Figure 57 - Network diagram of preference elicitation studies concerning relapsing remitting multiple
sclerosis treatment outcomes........................................................................................................ 224
Figure 58 – Example of combined preference network .................................................................. 234
Figure 59 – Hierarchical structure of the preference data, indicating the levels where random
preference distributions can be used. ............................................................................................ 238
Figure 60 – Forest plot showing the posterior predictive distributions of preference weights in the
full RRMS preference model, at various levels of predictive variability. .......................................... 246
Figure 61 – Preference weights (posterior means) for the key benefit-risk criteria, for three different
combinations of the source datasets. ............................................................................................ 252
Figure 62 – High level structure of the entire benefit-risk assessment model. ................................ 254
Figure 63 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability.
...................................................................................................................................................... 258
Figure 64 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability,
with a maximum of 3 relapses per year. ........................................................................................ 259
Figure 65 - SUCRA statistic for the overall benefit-risk score of the RRMS treatments at three levels of
predictive variability. ..................................................................................................................... 261
Figure 66 – SUCRA statistic by treatment based on population-average benefit risk score; efficacy
outcomes only ............................................................................................................................... 262
Figure 67 – SUCRA statistic by treatment based on population-average benefit risk score; liver safety
only. ........................................................................................... 263
Figure 68 – SUCRA statistic by treatment based on population-average benefit risk score; efficacy
and liver safety outcomes (but not administration). ...................................................................... 263
Figure 69 - SUCRA statistic by treatment based on population-average benefit risk score; disability
progression weight relates to disability progression events confirmed 6 months later (rather than 3
months later in the main results). .................................................................................................. 264
Figure 70 - SUCRA statistic by treatment based on population-average benefit risk score; liver
enzyme elevation weight relates to alanine aminotransferase above 3x upper limit of normal range
(rather than simply above upper limit of normal range as in the main results). ............................. 265
Figure 71 - SUCRA statistic by treatment based on population-average benefit risk score; liver
enzyme elevation weight relates to alanine aminotransferase above 5x upper limit of normal range
(rather than simply above upper limit of normal range as in the main results). ............................. 266
Figure 72 - SUCRA statistic by treatment based on population-average benefit risk score; preferences
from published studies excluded. .................................................................................................. 267
Figure 73 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT
patient choice dataset excluded. ................................................................................................... 267
Figure 74 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT
ratings datasets excluded. ............................................................................................................. 268
List of tables
Table 1 - Proportion of patients experiencing effects of treatment for a fictional chronic disease. ... 23
Table 2 - Proportion of patients experiencing effects of treatment for a fictional chronic disease. ... 24
Table 3 - Proportion of patients experiencing effects of treatment for a fictional chronic disease –
with illustrative weights .................................................................................................................. 24
Table 4 – Treatments in the RRMS case study. ................................................................................. 52
Table 5 – Published trial reports providing data to the RRMS case study. ......................................... 56
Table 6 – Distributions commonly used for modelling clinical outcomes at group level. .................. 58
Table 7 – Posterior mean effect estimates from Model 2. ................................................................ 76
Table 8 – Priors for treatment effect module parameters ................................................................ 84
Table 9 – Priors for the population calibration module. ................................................................... 90
Table 10 – Posterior distributions from Model 4: effect of varying mapping groups (random effects,
all correlation coefficients between outcomes = 0.6). .................................................................... 110
Table 11 – Posterior distributions from Models 4a and 4b for the “zeroes” outcomes (fixed effects,
no correlations between outcomes), and empirical treatment effect estimates. .......................... 112
Table 12 - Posterior distributions from Models 4a and 4b (with 100x inflated sample variances) for
the “zeroes” outcomes (fixed effects, no correlations between outcomes), and empirical treatment
effect estimates. .......................................................................... 114
Table 13 - Posterior distributions of untreated population outcomes on Normal scale from
population calibration module....................................................................................................... 123
Table 14 – Data structure for a “fan” network with six outcomes A, B, C, D, E and F. ..................... 179
Table 15 – Data structure for a “web” network with six criteria A, B, C, D, E and F. ........................ 179
Table 16 – Data structure for the network in Figure 49 with ten outcomes A, B, C, D, E, F, G, H, I and
J. ................................................................................................................................................... 180
Table 17 – Data structure for the tree in Figure 50 with ten outcomes A, B, C, D, E, F, G, H, I and J.
...................................................................................................................................................... 181
Table 18 – Priors for ratings model parameters ............................................................................. 190
Table 19 – Mean preference weights for individual participants in the investigator ratings dataset;
deterministic analysis and Bayesian analysis with sensitivity to assumed ratings standard deviation
...................................................................................................................................................... 193
Table 20 – Posterior distribution of preferences for simultaneous analysis of all participants in the
investigator ratings dataset ........................................................................................................... 194
Table 21 – Median preferences for a single participant in the patient ratings dataset; deterministic
analysis and Bayesian analysis with sensitivity to assumed ratings standard deviation ................... 196
Table 22 – Posterior distribution of preferences for simultaneous analysis of all participants in the
patient ratings dataset .................................................................................................................. 198
Table 23 – Posterior distribution of preferences for simultaneous analysis of all participants in the
investigator ratings and patient ratings datasets ........................................................................... 200
Table 24 - Posterior distribution of preferences in the patient choice dataset ................................ 208
Table 25 – Comparison of criteria weights in the investigator ratings and patient choices datasets.
...................................................................................................................................................... 209
Table 26 – RRMS case study outcomes for the preference synthesis module. PVF = partial value
function......................................................................................................................................... 212
Table 27 – Source studies for the RRMS dataset for the preference synthesis module. .................. 213
Table 28 – Example of a 4-category variable and its dummy-coded indicator variables .................. 214
Table 29 – Example of a 4-category variable and its effects-coded indicator variables ................... 216
Table 30 - Posterior distribution of preferences in published RRMS preference elicitaton studies;
fixed preference model ................................................................................................................. 226
Table 31 - Posterior distribution of preferences in published RRMS preference elicitation studies;
random preference model ............................................................................................................. 227
Table 32 - Posterior distribution of preferences in published RRMS choice studies and summary data
from PROTECT patient choice study; random preference model .................................................... 228
Table 33 – Treatment outcomes and administration modes for the RRMS treatments, and the
availability of corresponding preference data. ............................................................................... 233
Table 34 – Overall RRMS preference model: Criteria from each dataset for inclusion/exclusion .... 236
Table 35 - Posterior distribution of preferences based on published RRMS choice studies and full
data from PROTECT patient choice study; fixed preference model ................................................. 241
Table 36 - Posterior distribution of preferences based on published RRMS choice studies and full
data from PROTECT patient choice study; random (by study) preference model ............................ 242
Table 37 - Posterior distribution of preferences based on all preference datasets; fixed preference
model ............................................................................................................................................ 244
Table 38 - Posterior distribution of preferences based on all preference datasets; random (by study)
preference model .......................................................................................................................... 245
Table 39 – Identification of outcomes to which preferences relate for the criteria in the RRMS case
study. ............................................................................................................................................ 255
Table 40 – Benefit-risk score by treatment, with breakdown by criterion. Figures are population
average posterior means and (standard deviations). ..................................................................... 258
Chapter I.1
I. Introduction
I.1 Background to the thesis
I.1.1 Evidence-based medical decisions
Modern healthcare provision is guided by the concept of “evidence-based medicine”, which has
been defined as “the conscientious, explicit, and judicious use of current best evidence in making
decisions about the care of individual patients” 1.
Ultimately most medical decisions are a choice between treatments (or the option of no treatment)
for a particular indication. For example, a patient may need to choose which (if any) drug to take; a
clinician may need to choose which treatment to prescribe; a healthcare provider, which drugs to
stock; a regulator, which treatments to license; or a pharmaceutical company, which potential new
drugs to develop. All of these decisions ultimately influence the patient’s choice of treatment.
Medical statisticians aim to provide evidence to inform decisions such as these. However, they often
do not attempt to directly answer the overall question of which treatment(s) is/are optimal. Instead
they break the problem down into smaller, more focused questions that can be answered more
easily (which treatment is best with respect to efficacy outcome A, or safety outcome B, for
example). This is a necessary part of the evidence gathering process, but unless it is made clear how
the answers to those smaller questions can be put together to answer the overall question, then
arguably the job is only half done.
As the examples above illustrate, within the healthcare field there are a number of different decision
makers and types of decision. This thesis focuses in particular on regulatory benefit-risk
assessments, as explained in the next section, but many of the principles apply to medical decision
making in the more general context.
I.1.2 Benefit-risk balance
Ensuring the safety of medicines is a key priority for drug developers and regulators, but in practice
there is no such thing as a drug that is 100% safe. All pharmaceutical treatments are associated
with a risk of adverse events of one kind or another, even if the events are mild or the risk is low (or
confined to certain subgroups of the patient population).
The key question then, when deciding whether a treatment is appropriate, is: do the benefits
outweigh the risks?
In the context of regulatory licensing this is known as the benefit-risk balance. The concept of
balance recognises that there is an implicit trade-off between benefit and risk – in other words,
there is (perhaps within limits) no level of benefit, or of risk, that on its own should tip the scales and
lead to a treatment being approved (or not approved) for the market; rather, treatments with a high
level of risk should balance this with a high level of benefit, and treatments with modest benefits
should only be associated with modest risks. This reflects the fact that there is a range of needs and
expectations in the patient population. Not all patients require or demand the same level of
effectiveness, but those who opt for more aggressive therapy may do so in the knowledge that the
risks may also increase.
This trade-off or balance is the key consideration in getting drugs through development, clinical
trials, and ultimately onto the market. This process culminates in a benefit-risk assessment, when
the drug’s clinical benefits and risks (in other words, its efficacy and safety profile) are weighed
against one another to determine whether it is fit for general use 2. Such an assessment is typically
carried out by a jurisdiction’s medicine regulator when it considers whether to issue a marketing
licence for the treatment in question, and periodically thereafter as new data emerges. The
pharmaceutical company that manufactures the drug may also carry out similar assessments to
support its licence application/renewal, or earlier during the development process.
I.1.3 Benefit-risk assessment in practice
A real-world benefit-risk assessment carried out for regulatory purposes should reflect the latest
clinical evidence, which may be drawn from several studies. Various data sources may be available,
but those most likely to be suitable for benefit-risk assessment are:
• Randomised controlled trials (RCTs) are generally seen as the gold standard for assessing the
relative efficacy and safety of two or more treatments; the randomised design eliminates
the selection bias that can occur in observational studies, ensuring the treatment groups are
comparable and avoiding confounding between treatment allocation and outcome. The
main limitations of RCTs for benefit-risk assessment are that they are of limited duration and
are carried out in relatively small numbers of patients, and thus cannot always establish
outcomes that occur rarely or take a long time to manifest.
• Post-marketing studies are carried out after a drug has been through the RCT process and
released onto the market3. These studies typically use observational designs which leave
their relative effect estimates more open to bias, and so are not often used to supplant RCT
evidence, especially for efficacy outcomes. However, since post-marketing studies can
follow much larger groups of patients for much longer periods than RCTs, they can be a
valuable source of data on rare and/or long-term adverse events which could not be
measured in a trial and may be important in a benefit-risk assessment.
• Registries typically collect information on a routine basis from a large number of patients
distributed across multiple sites in a healthcare system, and so may be well placed to
provide data on a set of patients that is highly representative of the target population for a
given decision. As such, registry data can be useful for estimating the baseline distribution
of outcomes experienced among untreated patients (or patients on the current standard of
care). However, as a form of observational data, and with enrolment and data entry
practices that may vary between sites and among personnel, registry data is usually less
suitable for deriving effect estimates4.
Evidence synthesis methods such as meta-analyses may help to combine the study results into a
coherent overall picture. However, with or without such techniques, there are a number of
complicating factors that can cause difficulties in gathering and combining the evidence, and
establishing the appropriate balance of efficacy and safety, including:
• Comparators: If alternative treatments already exist for the same indication, it may
sometimes be appropriate to assess the new drug’s benefit-risk profile relative to these
comparators, rather than in isolation 2. Hereinafter the term “decision set” will be used to
refer to the group of treatments that are included in a benefit-risk assessment – in other
words, the drug in question and any relevant comparators. The more treatments there are in a
decision set, the more complex the assessment process becomes.
• Multiple benefits and risks: within each of the categories “benefit” and “risk” there may be
several clinical outcomes to consider; and the set of outcomes may vary between a
treatment and its comparators. It can take a significant amount of time and effort to
examine what can be a large volume of data and pick out a coherent set of outcomes on
which to base the assessment 5. Furthermore, if evidence syntheses are performed, these
should reflect the possibility of correlations between outcomes.
• Few source studies: At or soon after the point of licensing, the number of studies providing
data on a drug is likely to be small; a good deal of uncertainty can therefore remain in the
evidence.
• Limited, sparse or heterogeneously defined data: It may not always be possible to find
clinical evidence for all of the relevant clinical outcomes for each treatment in the decision
set. Additionally, studies may adopt different definitions or measurement scales for a given
outcome, leading to compatibility problems.
• Establishing appropriate trade-offs: Although it may sometimes be clear, the appropriate
level of trade-off between benefits and risks is, in general, a question of subjective value
judgements.
Traditionally, regulatory benefit-risk assessments have been carried out by committees who
consider the clinical evidence (typically presented as a written summary of individual study findings)
and come to a judgement regarding the overall benefit-risk balance. Often no attempt is made to
present source data side-by-side in a form suitable for direct comparison. In the early 21st century
there arose concerns that this approach lacks rigour and transparency 6. The factors listed above
contribute to a highly complex evidence base that will often simply be too difficult to weigh up
reliably unless one can perform additional analyses to elucidate the key differences in outcomes
between treatments and/or work through the implicit value trade-offs. Attempting to disentangle
all of the strands of such a problem “in one’s head” can lead to poor decisions because there is a
limit on the number of factors people can weigh up simultaneously, meaning that some aspects of
the problem may be misjudged or overlooked 7,8.
Structured benefit-risk assessment methods have been gaining momentum as a way of addressing
these concerns.
I.1.4 Structured/quantitative benefit-risk assessment
Recently, regulators including the European Medicines Agency (EMA) in the EU, the MHRA (Medicines
and Healthcare products Regulatory Agency) in the UK and the Food & Drug Administration (FDA) in the USA
began to show interest in the use of more formal decision-making techniques for benefit-risk
assessment.
In 2009 the EMA embarked on a three-year project looking into the feasibility of adopting methods
from decision theory for this purpose9. Other similar initiatives were launched, on both sides of the
Atlantic, in collaboration between regulators, pharmaceutical companies and academics 10-12. These
projects identified a number of methods that may be suitable and explored these via applications to
a number of topical problems in the field of drug regulation.
These methods range from simple stepwise frameworks that encourage structured thinking and
documentation of the decision process 11,13, through to fully quantitative decision analysis
techniques, of which a leading example is multi-criteria decision analysis (MCDA) 14. Quantitative in
this sense means that preferences for specific benefits and risks are explicitly incorporated in the
assessment and used to weigh the effects of each treatment 5. Preferences can be elicited from
patients or other stakeholders. Modelling preferences in this way is somewhat novel in the health
sciences and is not always straightforward, but can provide evidence fundamental to understanding
and making decisions. It may be worth noting that in their definition of “evidence-based” medicine,
Sackett et al refer to “thoughtful identification and compassionate use of individual patients'
predicaments, rights, and preferences in making clinical decisions”1.
Such methods have begun to make an impact on industry and regulatory practice. The EMA has
issued guidance on benefit-risk assessments stating that “the assumptions, considerations, and
judgement or weighting that support the conclusions of the benefit-risk evaluation should be clear”
and acknowledging that quantitative methods may sometimes be used2. Regulators in the USA have
begun carrying out their own elicitation studies in order to inform real-world decisions 15.
I.1.5 Multi-Criteria Decision Analysis (MCDA)
MCDA is a formal framework for breaking down complex decisions into a series of simpler
judgements that logically lead to an overall solution. The key value trade-offs are identified and
addressed (e.g. how many occurrences of a particular adverse event can be tolerated for a given
level of benefit), facilitating critical thinking about the problem and transparent communication of
the final decision. MCDA in the broad sense refers to a family of related methods, with histories of
use in various fields and minor differences in their formulations and terminology. Most
implementations of MCDA require decision makers to explicitly specify their value judgements
(preferences) in quantitative terms, as shown in the example below.
Table 1 shows the proportion of patients experiencing the key benefit and side effects of two
treatments for a fictional chronic disease. A patient faced with choosing between these two
treatment options must decide whether the additional chance of benefit on Drug B (15%) outweighs
the elevated risk of cardiovascular events (4%), a potentially serious side effect.
Table 1 - Proportion of patients experiencing effects of treatment for a fictional chronic disease.
Treatment    >50% reduction in disease symptoms    Cardiovascular events
Drug A       30%                                   0%
Drug B       45%                                   4%
In this simple example, with only one trade-off to consider, it is probably not so difficult to come to a
decision without any further analysis: the decision maker can simply weigh up a 4% increase in
cardiovascular events against a 15% increase in the chance of benefit. However, consider Table 2,
which includes evidence on two additional risks, liver damage and seizures.
Table 2 - Proportion of patients experiencing effects of treatment for a fictional chronic disease.
Treatment    Benefit: >50% reduction    Cardiovascular    Liver     Seizures
             in disease symptoms        events            damage
Drug A       30%                        0%                3%        2%
Drug B       45%                        4%                1%        0%
This time there are more trade-offs to consider and the problem starts to become too complex to be
handled in the decision maker’s head – particularly if the decision maker is a regulator with
professional responsibility for public safety. Simply forming an opinion without explicit
consideration of the underlying trade-offs is not likely to be a satisfactory approach; regardless of
the decision maker’s confidence in his or her judgement, the decision that is ultimately made should
be defensible as being transparently based on sound evidence and reasoning. MCDA involves
breaking the problem down into a set of simpler trade-offs and clearly setting these out. Suppose,
for example, the decision maker forms the following opinions regarding the various effects of
treatment: (i) a cardiovascular event is the most serious outcome to avoid; (ii) reducing disease
symptoms by 50% is about half as important as avoiding a cardiovascular event; (iii) a seizure is also
about half as important as a cardiovascular event; (iv) liver damage is only one quarter as important
as a cardiovascular event. These judgements can be expressed as a vector of weights, shown as an
additional row in Table 3.
Table 3 - Proportion of patients experiencing effects of treatment for a fictional chronic disease – with illustrative weights
Treatment    Benefit: >50% reduction    Cardiovascular    Liver     Seizures
             in disease symptoms        events            damage
Drug A       30%                        0%                3%        2%
Drug B       45%                        4%                1%        0%
Weight       22%                        45%               11%       22%
The overall weighted effect in favour of Drug A is then calculated as the weighted average of the
individual effects (paying careful attention to the signs so that a positive sign indicates that the
effect favours Drug A and a negative sign favours Drug B), i.e.
Net benefit on drug A = 22% x (30% - 45%) + 45% x (4% - 0%) + 11% x (1% - 3%) + 22% x (0% - 2%)
= -2.2%
In other words, the overall weighted effect shows that on the basis of the specified preference
weights, Drug B is the favoured choice. This is an idealised illustration of the MCDA approach using
a simplified version of the method (sometimes referred to as net clinical benefit 16). By explicitly
valuing the trade-offs underlying the decision, and then putting those values together with the data in a
principled fashion, the logical course of action is revealed. Furthermore the decision has been made
on a transparent basis, facilitating critical appraisal or future reviews, and helping to ensure
consistency with other decisions. Another important benefit of the method is the ability to
sensitivity-test decisions by repeating the analysis with different assumptions, clinical data values or
preference trade-offs.
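The arithmetic above can be reproduced in a few lines. The following is a minimal sketch in Python, using the effects and weights from Table 3 and the same sign convention as the text (a positive contribution favours Drug A):

```python
# Net clinical benefit of Drug A relative to Drug B, using the illustrative
# effects and weights from Table 3. Effects are proportions of patients
# experiencing each outcome; a positive contribution favours Drug A.
weights = {"benefit": 0.22, "cardio": 0.45, "liver": 0.11, "seizure": 0.22}
drug_a  = {"benefit": 0.30, "cardio": 0.00, "liver": 0.03, "seizure": 0.02}
drug_b  = {"benefit": 0.45, "cardio": 0.04, "liver": 0.01, "seizure": 0.00}

# "benefit" is a favourable outcome (more is better); the other criteria are
# unfavourable (fewer events is better), so their differences are reversed.
favourable = {"benefit"}

net_benefit = sum(
    w * ((drug_a[c] - drug_b[c]) if c in favourable else (drug_b[c] - drug_a[c]))
    for c, w in weights.items()
)
print(round(net_benefit, 3))  # -0.022, i.e. -2.2%: Drug B is favoured overall
```

Re-running this calculation with perturbed weights or effect values is precisely the kind of sensitivity testing just described.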
The tables above are examples of effects tables, displays used in benefit-risk assessment which
show data for all the key favourable and unfavourable effects for the treatments in the decision set.
I.1.6 Uncertainty, decision making and Bayesian statistics
The deterministic nature of MCDA has often been recognised as problematic. The problem is not
insurmountable, but exactly how uncertainty should be incorporated is an open question; this issue
has been acknowledged in healthcare 17-19 and other fields 20-23.
Various methods have been proposed to handle uncertainty in preferences, including:
o Altering the elicitation tasks to include direct elicitation of uncertainty levels 24-26 or
preference intervals 27,28. These techniques require participants to state not only point
estimates of their preferences in the usual way but also to provide some indication of the
certainty of those estimates. A number of variations on this technique exist – to give just
two examples, participants might be asked to suggest a plausible range for the estimate, or
to estimate probabilities from the cumulative distribution function at various points in the
distribution. These responses can be used to derive estimates of the underlying
distributions – for example, if the preference distributions are assumed to follow specific
parametric forms then their parameters can be estimated from elicited ranges by treating
them as confidence intervals 27. There are some problems with this approach, however.
Firstly, it may increase the cognitive burden on participants. Secondly, translating the stated
ranges or certainty measures into estimates of the actual preference parameters may rest
on some rather strong assumptions regarding both the shape of the underlying preference
distributions and the participants’ ability to characterise those distributions accurately.
o Conducting one-way sensitivity analyses 29. This is a simplistic form of analysing the
impact of uncertainty, performed by repeating the decision analysis with different values for
the preference parameters. The key problem with this approach is that it provides no
information on the probability of those values being observed, and hence gives no sense of
the distribution of the results.
o Stochastic Multicriteria Acceptability Analysis (SMAA) 30, which assumes complete
uncertainty over preferences (or minimal information such as criteria rankings only), and
performs an analysis treating all possible weight combinations as equally likely. Where no
elicited preference data exists, this may be a reasonable approach, but where there is some
data available, SMAA cannot make best use of it to narrow down the estimates31,32.
o External estimation of probability distributions: This approach involves estimating the
distribution of preferences outside the main analysis model (for example by bootstrapping a
sample of preferences33). The resulting distributions can then be fed into the main MCDA
model. This approach will tend to require tailoring to the specific study and sample on
which it is based, and is therefore difficult to generalise to arbitrary datasets; it also requires
the distributions to be estimated separately to the main model, which may be laborious,
approximate, and lacking in elegance compared to a holistic one-step analysis.
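To make the SMAA idea mentioned above concrete, the following sketch samples weights uniformly from the simplex and computes rank-acceptability indices. The performance scores are purely illustrative values invented for this example, not data from any real assessment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical performance matrix: rows = treatments, columns = criteria
# (values are illustrative only, scaled to 0-1 partial value scores)
scores = np.array([
    [0.8, 0.3, 0.6],   # treatment A
    [0.5, 0.7, 0.4],   # treatment B
])

n_samples = 100_000
# With no preference information, SMAA treats all weight vectors on the
# simplex as equally likely; Dirichlet(1, 1, 1) gives uniform sampling.
weights = rng.dirichlet(np.ones(scores.shape[1]), size=n_samples)

overall = weights @ scores.T            # overall MCDA score per weight sample
winners = overall.argmax(axis=1)
# Rank-acceptability index: fraction of weight space in which each option wins
acceptability = np.bincount(winners, minlength=2) / n_samples
print(acceptability)
```

With elicited preference data available, the uniform Dirichlet draw would ideally be replaced by samples from an informed distribution of weights — which is precisely what the standard SMAA approach does not provide.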
I would argue however that the best approach may be a long-established paradigm that allows the
parameters of a system (here, the preferences and treatment effects) to themselves be regarded as
random variables subject to uncertainty. I refer of course to Bayesian statistics, which provides a
principled means for constructing credibility distributions of statistical parameters given observed
data. In the Bayesian paradigm there is no need to augment the data with additional questions
relating to uncertainty, as the uncertainty level is inferred from the data and model structure.
Furthermore the parameter distributions are informed and constrained by the evidence, not
completely uncertain as in SMAA. Unlike in one-way sensitivity analyses, a full characterisation of
the parameter distributions is obtained.
Bayes’ theorem, and the principles of Bayesian inference, have been known about for hundreds of
years, and their potential applicability to decision making has long been recognised34. Owing in part
to computational difficulties, however, applications in medicine were few and far between
throughout much of the 20th century. Finally, in the 1990s, advances in computing made it feasible
to sample from arbitrary posterior distributions using Markov Chain Monte Carlo (MCMC)
techniques, leading to an increase in applied Bayesian work 35.
The use of statistical methods to assist decision making is not novel in healthcare. Meta-analysis, for
example, has long been associated with systematic reviews of evidence, functioning as an important
tool for aggregation and efficient communication of the results. Nevertheless it has been argued
that statisticians should do more to bridge the gap between their analyses and decision-making, and
that Bayesian thinking and utilitarianism are natural tools for this task36-38. Others have noted the
promise of Bayesian methods for benefit-risk assessment 39. There are a number of reasons why
Bayesian thinking is well suited to the problem of incorporating uncertainty in MCDA and benefit-
risk assessment, such as:
- The long alliance between Bayes and decision making under uncertainty 34,38;
- Treating parameters as random variables translates well to inference on functions of
multiple parameters;
- The ability to supplement evidence with priors if data is lacking;
- MCMC allows construction of (almost) arbitrarily complex models; provided the likelihood
and priors can be specified, there is no need for closed-form posteriors. Similarly, the
distribution of any derived variables can be obtained.
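The last point can be illustrated with a deliberately simple conjugate example. The event counts below are hypothetical, and independent Beta-Binomial draws stand in for MCMC output: once posterior samples exist, the distribution of any derived quantity, such as a risk difference, follows automatically.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trial data: (events, patients) on two treatments
events = {"A": (12, 100), "B": (25, 100)}

# Beta(1, 1) priors give Beta posteriors; these draws play the role of
# MCMC samples from the joint posterior
post = {t: rng.beta(1 + e, 1 + n - e, size=50_000) for t, (e, n) in events.items()}

# Any derived quantity inherits its uncertainty automatically
risk_diff = post["B"] - post["A"]
print(np.mean(risk_diff), np.mean(risk_diff > 0))
```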
I.2 Purpose of the research
I.2.1 Motivations
Researchers, pharmaceutical companies and regulators have begun to show an interest in
quantitative benefit-risk assessment using MCDA-style approaches. It is recognised that such
methods can be valuable in helping decision makers to make sense of the data and clarify the
decision process. MCDA is a relatively new and unfamiliar method in the health sciences, however,
and - even in its standard, simple deterministic form - is not always implemented well, in terms of
both the technical details of its application and the suitability of its underlying assumptions. One
mistake frequently seen in health science applications is the confounding of outcome importance with
incidence (see III.1.3.3). Even with a good understanding of the method, significant practical
difficulties remain that must be overcome before the use of MCDA for benefit-risk assessment can
become more widely adopted. A summary of some of the pitfalls to watch out for – and mistakes
made by existing studies – has been provided by Garcia-Hernandez 40.
However, if done well, the use of methods such as MCDA could bring benefits. Incorporating patient
preferences in health decision making is recognised as advantageous in theory 41 and alignment of
treatment prescriptions with patient preferences has been shown to result in greater treatment
satisfaction 42.
Many of the challenges in using MCDA to support benefit-risk assessments are statistical in nature.
As noted above, summaries of multiple clinical outcomes of multiple treatments derived from
multiple studies may need to be combined, pushing the limits of current meta-analytical techniques.
Benefit-risk is by nature a multidimensional problem, but typically meta-analyses are restricted to a
single outcome and a single treatment contrast, and extending these dimensions in a rigorous
manner is difficult – one cannot simply carry out separate analyses of the various contrasts
and outcomes, as they are linked by correlations and consistency relations. Extensions to either
multiple contrasts or multiple outcomes exist 43-47, but models that can handle both situations are
few and none are entirely satisfactory, for reasons that will be discussed at greater length in the next
chapter. Further problems arise when the evidence base is sparse or outcome definitions are not
consistent.
One notable challenge with using MCDA for healthcare decision-making is the method’s deterministic
nature. The standard version of MCDA makes no allowance for sampling error or other uncertainties
in the data, or the preference parameters, and therefore no indication of the robustness of the
conclusions. A Bayesian implementation of MCDA would go some way towards addressing this, and
demonstrating that such an analysis is possible is the main focus of this thesis. Bayesian (or other
probabilistic) modelling of outcome preferences is again not straightforward because of correlations
and consistency relations (see below) among them. Allowing for correlations among parameters is
particularly important in MCDA, because the method computes an overall score as a linear function
of the individual parameters, and this means that the presence of correlations (or equivalently,
covariances) contributes additional terms to the variance (since var(aX + bY) = a²var(X) +
b²var(Y) + 2ab·cov(X, Y), where X, Y are random variables and a, b are linear coefficients).
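As a quick numerical check of this identity, the sketch below simulates correlated normal variables (the covariance values and coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated draws to check var(aX + bY) = a^2 var(X) + b^2 var(Y) + 2ab cov(X, Y)
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
x, y = rng.multivariate_normal([0, 0], cov, size=100_000).T
a, b = 0.3, 0.7

lhs = np.var(a * x + b * y)
# bias=True matches np.var's default normalisation, so the identity is exact
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # the two quantities agree
```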
“Consistency relations” refers to relationships among the model parameters that must hold in order
for the estimates to have a coherent logical interpretation when considered as a whole. Briefly, this
means that (for example) when a parameter representing the difference between quantity A and
quantity B is added to a parameter representing the difference between quantity B and quantity C,
the result always corresponds to the difference between A and C. Consistency is a key concept
underlying the models in this thesis; subsequent chapters discuss in more detail the relevance of
consistency relations to estimating treatment effects (see II.4.2), mappings between outcomes
(II.4.6), and preferences (III.2.1.4).

The use of elicited preference values is particularly novel in the
field of benefit-risk assessment, and raises interesting questions relating to the uncertainty and
homogeneity of preferences among patient populations and other stakeholders, how to make
decisions in the face of preference heterogeneity, and whether the elicitation process introduces
further uncertainties that may impact the elicited results. In principle at least, Bayesian statistics
(with its perspective of parameters as random variables) can provide a convenient framework for
propagating preference uncertainty and examining/accounting for heterogeneity (for example using
random effects or hierarchical models). There has to date been little research into Bayesian
modelling of elicited preferences in healthcare, however.
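As a very rough sketch of what such treatment of preference heterogeneity might look like, the example below applies partial pooling to hypothetical elicited weights with an assumed common elicitation error, using a method-of-moments shortcut rather than a full Bayesian fit:

```python
import numpy as np

# Hypothetical elicited weights for one criterion from five respondents,
# each with an elicitation (within-respondent) standard error
w = np.array([0.55, 0.40, 0.70, 0.48, 0.62])
se = np.full(5, 0.08)

# Method-of-moments estimate of between-respondent variance (heterogeneity);
# this shortcut assumes a common elicitation error across respondents
tau2 = max(np.var(w, ddof=1) - se[0] ** 2, 0.0)

# Partial pooling: each respondent's weight shrinks toward the mean by an
# amount that depends on how elicitation noise compares to heterogeneity
shrink = tau2 / (tau2 + se ** 2)
pooled = shrink * w + (1 - shrink) * w.mean()
print(pooled)
```

In a full Bayesian hierarchical model the heterogeneity variance would itself receive a prior and be estimated jointly with the individual weights, but the shrinkage behaviour is the same in spirit.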
For its utility as a benefit-risk assessment tool to be established, the Bayesian MCDA approach must
be shown to be able to address the issues discussed above for real-world benefit-risk decision
problems.
I.2.2 Research question
The overall question this research addresses is:
“Can a modelling framework be developed that facilitates a fully Bayesian implementation of MCDA
for benefit-risk decision making; with parameters for clinical outcomes and associated preferences
directly informed by real-world data, and reflecting the uncertainties inherent in such data, while
respecting all relevant correlations and consistency relations?”
Here “real-world data” refers to evidence obtained from actual studies carried out in the target
patient population (or other substantially similar population), which would be considered
appropriate to inform a real regulatory benefit-risk assessment. It excludes data consisting of
assumptions or estimates made by the decision maker, or fictional or idealised data fabricated
purely to illustrate the methodology.
I.3 Methods/strategy
I.3.1 Work plan and thesis structure
The modelling work involved in this project divides into three parts, dealt with in Chapters II, III and
IV in turn. Chapters II and III deal with methods for Bayesian inference of the two types of data
needed for quantitative benefit-risk assessment using MCDA: in Chapter II, clinical data concerning
the effects of treatments; and in Chapter III, preference data concerning the relative importance of
those effects.
Specific aims and objectives are set out in each of Chapters II and III. In each case the same broad
work strategy will apply:
• Identify key modelling issues to be addressed
• Review the literature for existing methods
• Try to develop new approaches where existing methods fall short
• Test approaches on case study data
• Evaluate the results and draw conclusions
Having developed the necessary methodological tools in Chapters II and III, Chapter IV then brings
the parameter distributions together and performs the MCDA calculations using probabilistic
simulation, addressing the overall goal of creating a full Bayesian benefit-risk assessment model.
Finally, chapter V reflects on the overall findings and puts the results in context.
I.3.2 Case study
A case study based on relapsing multiple sclerosis treatments will be used throughout the thesis to
work through the methodological issues.
I.3.3 Software
The project will require complex modelling of clinical evidence and preferences, possibly from a
diversity of data sources. The uncertainty from the various data sources will be propagated through
to the final benefit-risk decision model. This means that the modelling approach will need to be
modular, multivariate and customisable.
For this reason no attempt will be made to derive closed-form posteriors at any stage of the
modelling, as this would impose tight restrictions on the model structure. Instead the approach will
be to construct complex models that simultaneously define the priors, the likelihood of the various
data inputs and the derivation of the variables that determine the overall benefit-risk balance.
MCMC techniques can then be used to sample from the joint posterior, with the result that the
uncertainty of the inputs is automatically propagated to the outputs.
Models will be specified in the BUGS language 48 and run using either WinBUGS (version 1.4.3) or
OpenBUGS (version 3.2.2). Any frequentist analyses required for comparison purposes will be run
using R (version 3.2.1).
I.4 Literature search
This section sets out the literature search strategy and results, together with an overview of the
literature in the field of quantitative benefit-risk assessment. Chapters II and III each include a
synopsis of the relevant technical literature drawn from a wider field.
I.4.1 Search strategy
The literature search strategy comprised three searches in parallel, as detailed in the following
subsections and carried out on the PubMed, Scopus and Web of Science databases. After the
searches, a screening process was carried out in which duplicate records found by more than one
database or search were discarded, and references were examined for relevance, first based on title,
then abstract, and finally the full text. Forward and backward citation tracking was also carried out
on key references in order to pick up any additional publications of interest.
I.4.1.1 Search 1: Quantitative benefit risk assessment
Purpose: to establish the current state of the art regarding applications of quantitative benefit-risk
methods that either (i) are Bayesian or otherwise probabilistic; or (ii) focus on preference elicitation
methods
Scope: Journal articles, reviews and books, no time limit.
Keywords:
ANY OF: "benefit risk", "risk benefit", "benefit and risk", "risk and benefit", "benefit harm", "harm benefit", "benefit and harm", "harm and benefit", "net benefit", "net clinical benefit"
AND ANY OF: "Bayesian", "probabilistic", "stochastic", "AHP", "Analytic Hierarchy Process", "Swing weighting", "MACBETH", "Measuring Attractiveness by a Categorical Based Evaluation Technique", "DCE", "discrete choice"
AND ANY OF: "MCDA", "MCDM", "MAUT", "multi criteria", "multiple criteria", "multi attribute", "multiple attributes", "multi outcome", "multiple outcomes", "multi endpoint", "multiple endpoint", "multivariate", "quantitative", "weighted", "utility"
I.4.1.2 Search 2: Bayesian meta-analysis
Purpose: to avoid duplicating work developing the meta-analytical techniques required for the
project.
Scope: Reviews and books, no time limit.
Keywords:
ANY OF: "meta analysis", "evidence synthesis", "indirect treatment comparison", "mixed treatment comparison"
AND ANY OF: "Bayesian", "probabilistic", "stochastic"
I.4.1.3 Search 3: Preference elicitation
Purpose: to identify any existing methods for analysing stated preference data with explicit
allowance for sampling variability, potentially from any field of study
Scope: Journal articles, reviews and books, no time limit.
Keywords:
ANY OF: "preference", "value", "utility", "weights", "judgements", "choice model", "stated preference", "conjoint analysis"
AND ANY OF: "Bayesian", "probabilistic", "regression", "stochastic"
AND ANY OF: "MCDA", "MCDM", "MAUT", "multi criteria", "multi attribute", "multiple attributes", "multi outcome", "multiple outcomes", "multi endpoint", "multiple endpoint", "AHP", "Analytic Hierarchy Process", "Swing weighting", "MACBETH", "Measuring Attractiveness by a Categorical Based Evaluation Technique", "DCE", "discrete choice"
I.4.2 Literature search flowchart
Figure 1 summarises the steps of the literature review. Searches were run on 2 June 2016.
Figure 1 – Literature search flowchart
I.4.3 Literature on quantitative benefit-risk assessment
MCDA has been used in various fields dating back at least to the 1970s; however, its use in benefit-
risk assessment of medicines is a more recent development. CIOMS (Council for International
Organisations of Medical Sciences) Working Group IV (1998) noted that it would be desirable if
regulatory decisions could be made on a firmer, more quantitative basis 49. The idea gained
momentum throughout the following decade and in 2009 two major European initiatives were
launched with the aim of evaluating the usefulness of quantitative benefit-risk assessment
methods9,10. A number of authors noted that MCDA could in principle be applied to benefit-risk
problems and/or demonstrated simple deterministic examples14,29,50.
Figure 1 (content, reconstructed as text):
• Initial search hits – Search 1: PubMed 66, Scopus 183, Web of Science 93; Search 2: PubMed 480, Scopus 404, Web of Science 366; Search 3: PubMed 201, Scopus 1029, Web of Science 1197; total before duplicates removed: 4019
• Removal of duplicates – 1389 duplicates removed; 2630 hits remaining
• Title and abstract screening – 1967 rejected; 663 hits remaining
• Full text screening – 504 rejected; 159 hits remaining
• Citation tracking – 61 added; 220 citations in the final set

Quantitative benefit-risk assessments allowing for uncertainty in the input parameters can also be
found in the literature, although these frequently apply only to specific examples with relatively
simple models or problem structures, rather than presenting a generalizable framework for
uncertainty in MCDA.
In 2005 Sutton et al used Bayesian modelling to derive distributions of the benefit-risk balance of
warfarin, an anticoagulant 16. The decision model used was known as Net Clinical Benefit (NCB) and
essentially corresponds to a special case of MCDA with binary outcomes and linear utility functions,
simplifying the statistics required. Furthermore, the case study was relatively simple with only one
benefit, one risk and two treatment options. Although this paper illustrates the value of a Bayesian
quantitative benefit-risk assessment, the net clinical benefit framework is somewhat restrictive,
limiting the applicability of this approach to other problems.
Hughes et al. used probabilistic methods to allow for uncertainty of treatment effects in benefit-risk
assessment using a “decision tree” framework that is very similar to MCDA 6. However, their
method relies on the existence of evidence from studies directly comparing each treatment of
interest, which is not always available. Caster et al. extended this approach by considering the
uncertainty of utilities given only qualitative preference data51. In both cases the approach used was
effective but not fully generalizable.
Stochastic multi-criteria acceptability analysis (SMAA) is a variation of MCDA, designed for situations
where preference weights are unknown or only partially known 30. SMAA ranks alternatives by
exploring all possible combinations of weights using MCDA and calculating how often each
alternative is chosen. SMAA has been applied to benefit-risk assessment 31, using a specialised
software package 52 to carry out a probabilistic benefit-risk assessment using Monte Carlo
simulations to allow for uncertainty in both clinical parameters and preferences.
SMAA provides a useful computational approach for obtaining results from a multi-criteria decision
model in the absence of clear preference data, but it does not provide any guidance on how the
underlying parameter distributions can be derived starting from clinical data or from elicited
preferences. In a world of evidence-based medicine, sound methods are needed for making
inferences from real data and these are absent from the standard SMAA approach.
Waddingham et al carried out a Bayesian quantitative benefit-risk assessment of natalizumab for
relapsing-remitting multiple sclerosis, using MCDA53. The Bayesian model was applied successfully
but the modelling approach was overly simplistic, lacking correlations, and required a number of ad
hoc alterations to fit the data; a more rigorous and generalisable approach is needed.
MCDA in healthcare has generated enough interest that a number of “good practice” guides have
appeared in recent years 5,54,55.
II. Bayesian synthesis of clinical evidence for benefit-risk assessment
II.1 Background, aims & objectives
II.1.1 Introduction
In medical science, there is a close alliance between decision-making, systematic reviews and meta-
analyses. These are highly related disciplines, each with its own particular emphasis but all relating
to the process of gathering, summarising, and interpreting existing evidence.
Meta-analysis, or evidence synthesis, focuses on the quantitative, statistical aspects of this process.
It is essentially a technique for combining treatment effect estimates from multiple clinical studies to
give an overall “average” estimate. Much of the modern discipline of meta-analysis was pioneered
by Gene Glass in the late twentieth century 56, but its roots in clinical research go back at least as far
as 1904 57.
The simplest and most familiar form of meta-analysis is known as pairwise meta-analysis because it
focuses on a single pair of treatments. A set of head-to-head studies involving both treatments is
identified, and some relative outcome measure (such as a difference in a continuous outcome, an
odds ratio, or a hazard ratio) is extracted from each study; the overall combined estimate can be
derived in a number of ways but essentially represents an average of the individual study estimates,
weighted by the inverse of their variances.
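The inverse-variance computation is simple enough to state in a few lines. The study-level estimates below are illustrative numbers, and the sketch is a fixed-effect model that ignores between-study heterogeneity:

```python
import numpy as np

# Hypothetical per-study effect estimates (e.g. log odds ratios) and variances
effects = np.array([0.42, 0.18, 0.55, 0.31])
variances = np.array([0.04, 0.09, 0.12, 0.06])

# Fixed-effect pooled estimate: inverse-variance weighted average
w = 1.0 / variances
pooled = np.sum(w * effects) / np.sum(w)
pooled_var = 1.0 / np.sum(w)     # variance of the pooled estimate
print(pooled, pooled_var)
```

Precise studies (small variances) dominate the average, and the pooled variance is always smaller than that of any single study.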
Researchers have found it necessary to extend or adapt this approach in various ways in order to
estimate two or more related parameters that inform a decision. This family of approaches is
sometimes referred to by the umbrella term evidence synthesis. For example, one may wish to
jointly estimate a set of probabilities that inform a decision model, or to combine parameters
estimated from distinct sets of studies 58. Such methods are often employed in health economics -
where clinical outcomes and costs, and their relationships with one another, are to be considered
jointly - and for similar reasons, would appear to be well suited to benefit-risk assessments.
It is often tempting, but dangerous, to perform “evidence synthesis” in a naïve/informal manner.
One common example of poor practice is to pool randomised clinical trials with respect to a binary
event by simply adding up, for each treatment, the patient and event numbers in each trial arm
featuring that treatment (“naïve pooling”). This method, although holding great appeal due to its
simplicity, is not to be recommended. Within-study relative effects are generally more
homogeneous than the absolute baseline level of outcome, which often varies widely between
studies in different groups of patients. Randomisation is used within trials to eliminate the risk of
confounding the effects of treatment with between-study variations in baseline outcomes 59, but
naïve pooling derives the relative effect at the between-study level, comparing
groups of patients from different trials on a pooled basis. This defeats the object of randomisation,
which is to ensure that relative effects are only derived from comparisons between groups that
share the same baseline characteristics. Ultimately the naïve pooling approach results in evidence
that is of a similar grade to data from observational studies, in that it may be biased due to
confounding between the effects of treatment and any baseline differences in patient
characteristics59. The magnitude of this bias will depend on the extent of heterogeneity in baseline
outcomes between the source studies, and may be acceptably small if the study populations are very
similar; it can be avoided altogether, however, by using a more principled method to derive the
combined effects.
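The bias is easy to demonstrate with two fabricated trials in which the true within-trial risk difference is identical (-0.10) but the baseline risks and arm sizes differ:

```python
# Two hypothetical trials with very different baseline risks but the same
# within-trial risk difference of -0.10 for the active treatment
trials = [
    # (control events, control n, active events, active n)
    (40, 100, 30, 100),    # high-risk population, balanced arms
    (30, 200, 5, 100),     # low-risk population, unbalanced arm sizes
]

# Naïve pooling: add up events and patients across trials per arm
ce = sum(t[0] for t in trials); cn = sum(t[1] for t in trials)
ae = sum(t[2] for t in trials); an = sum(t[3] for t in trials)
naive_rd = ae / an - ce / cn

# Within-study effects, which preserve the benefit of randomisation
per_trial_rd = [(t[2] / t[3]) - (t[0] / t[1]) for t in trials]
print(naive_rd, per_trial_rd)
```

Both trials agree that the treatment reduces risk by 0.10, yet naïve pooling returns roughly -0.06, because the pooled arms contain different mixes of high- and low-risk patients.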
Formal evidence synthesis methods should be designed to eliminate any such confounding
influences and obtain estimates on a scale that is suitable for comparison 60. This is arguably of
particular importance when the treatment effect estimates are to be fed into quantitative decision
models such as MCDA, as any model is only as reliable as the data it is given. However, the
complicated nature of some benefit-risk assessments can present significant obstacles to evidence
synthesis. In particular, the following factors (many of which were also discussed in the previous
chapter) can be challenging to deal with:
Comparators
There may be a need to compare two or more treatments where no direct head-to-head studies
have been carried out. A simple example of such an indirect comparison is when one wishes to
compare two drugs that have each only been clinically evaluated alongside placebo (or some other
standard treatment), not directly against each other. A naïve comparison of the absolute level of an
outcome in the active arms of both studies will confound the difference between the treatments’
effects with the difference in characteristics between the two study populations. In this simple case,
it is straightforward to adjust the estimate to eliminate this confounding by subtracting the
difference in outcome between the untreated populations, i.e. the two placebo arms. This is
equivalent to comparing the relative effect of treatment (active vs placebo) in the two studies.
Generalising this technique to more complex datasets, and combining direct and indirect evidence to
obtain a coherent set of estimated effects for three or more treatments while avoiding confounding,
requires a more technical level of analysis.
Network meta-analysis (NMA) has established itself as a powerful evidence synthesis technique for
such situations 61-63. The method is a generalisation of standard meta-analysis that accommodates
both direct and indirect comparisons. This can be represented by means of a network structure,
where each treatment is represented by a node and lines between nodes represent head-to-head
studies, as shown in the examples below. The use of network diagrams such as these is widespread
among applications of network meta-analysis 64.
Figure 2 – Network diagram: pairwise meta-analysis
Pairwise meta-analysis is concerned with a single contrast AB (that is, a comparison between two
treatments A and B). All the evidence is drawn from studies directly comparing the two.
Figure 3 – Network diagram: simple network meta-analysis (i). The numbers given are (estimate, variance) for the treatment contrasts in the direction indicated by the arrows.
When more treatments are involved, there may not be direct evidence on all the treatment
contrasts. Network meta-analysis provides a solution by following the chain of evidence around the
network. In the simple case shown in Figure 3, separate trials provide information on the contrasts
AB and BC; the missing contrast AC can be estimated as the sum AB + BC 60 under the assumption
that the contrasts are expressed on an additive scale, perhaps after a transformation (for example,
taking the logarithm of a contrast expressed on a multiplicative scale). Since AB and BC are
estimated from different trials, and hence independent, var(AC) = var(AB + BC) = var (AB) + var(BC),
and so the indirect comparison has greater uncertainty than the direct evidence – and the greater
the number of steps in the chain, the greater the additional uncertainty. Here, the indirect estimate
of AC has expectation 1+2=3 and variance 1+2=3. Note that even if treatment B is not of direct
interest, its inclusion in the dataset has allowed an estimate of the contrast AC. Furthermore, since
AB and BC are obtained within randomised studies, the estimate AC = AB + BC is itself free of most of
the confounding and bias that may occur in non-randomised comparisons 59. There is an
assumption, however, that a treatment contrast measured in one study is representative of the same
treatment effect in the other study population(s). In other words, the populations are homogeneous
with regard to the relative treatment effects; there are no differences between the populations that
act as effect modifiers.
Figure 4 – Network diagram: simple network meta-analysis (ii). The numbers given are (estimate, standard deviation) for the treatment contrasts in the direction indicated by the arrows.
Sometimes both direct and indirect evidence contributes to a contrast estimate. In the example
shown in Figure 4 there is now information on the previously missing contrast AC (from a head-to-
head trial of A versus C). Attempting to combine the evidence to get a clear picture of the relative
performance of the three treatments is not straightforward since the evidence in some networks
(specifically, networks with closed loops such as that formed by the three treatments in Figure 4)
may show inconsistency. Network meta-analysis aims to resolve any inconsistency that may occur
due to random chance and produce a set of estimates that are entirely consistent with one another.
More systematic inconsistency between the treatment contrasts indicates uneven distribution of
effect modifiers between studies and invalidates the model’s assumptions 65.
Estimating each treatment contrast from the direct evidence alone (in this case, simply using the
numbers shown in Figure 4) is not satisfactory when there is any inconsistency between those
contrasts. It is axiomatic that relative treatment effects should be transitive (XY + YZ = XZ for any X,
Y, Z) and that the (additive) effect of a treatment relative to itself is zero. Consequently it must be
the case that AB + BC + CA = AA = 0; in this example however, using the direct estimates gives AB +
BC + CA = 1 + 2 - 2 = 1. In a similar vein, we can see that for any given contrast there is inconsistency
between the direct and indirect estimates; for AC, say, the direct estimate is 2 and the indirect
estimate is obtained as AB + BC, giving 3.
One might attempt to perform a meta-analysis on each treatment comparison independently,
deriving for each contrast an overall estimate that corresponds to an inverse-variance-weighted
average of the direct and indirect evidence. For example, the combined estimate for AC in the
example in Figure 4 would be 3/4 × (1 × 2 + 1/3 × 3) = 2.25, where the weights 1 and 1/3 are the
inverse variances of the direct and indirect estimates and 3/4 is the normalising factor.
Unfortunately, however, following this
approach for all treatment contrasts in the network will also in general produce treatment effect
estimates that are not consistent with one another (verifying this for the example in Figure 4 is left
as an exercise). The need for consistency means there is mutual dependence among the treatment
effects that is not respected if we estimate them independently of one another.
Network meta-analysis solves this problem by building consistency into the model structure via one
or more consistency equations. The consistency equation for the example in Figure 4 is AC = AB +
BC.
The key insight is that the model does not need an independent parameter for every treatment
contrast, but only for a subset of them (the basic parameters). The consistency equations
provide a means of calculating the remaining contrasts (the functional parameters) from the basic
parameters 66. Using the Figure 4 example, it is only necessary to introduce a model parameter for
two of the three contrasts AB, BC and AC; these two are the basic parameters, and the final
(functional) treatment effect parameter is evaluated via the consistency equation. This creates the
necessary dependence among the full set of treatment contrasts. The statistical model is
constructed based on only the basic parameters, which are independent.
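A miniature fixed-effect version of this construction can be written down directly for the Figure 4 network, treating d_AB and d_AC as the basic parameters. The variances below are those implied by the worked example in the text, and the whole thing is a sketch rather than a full network meta-analysis, which would normally also model heterogeneity:

```python
import numpy as np

# Direct evidence from the Figure 4 network: contrasts AB = 1 (variance 1),
# BC = 2 (variance 2), AC = 2 (variance 1), as implied by the worked example
y = np.array([1.0, 2.0, 2.0])
var = np.array([1.0, 2.0, 1.0])

# Basic parameters: d_AB and d_AC (effects relative to reference treatment A).
# Each row maps an observed contrast onto the basic parameters; BC is the
# functional parameter d_AC - d_AB given by the consistency equation.
X = np.array([
    [1.0, 0.0],    # AB
    [-1.0, 1.0],   # BC = AC - AB
    [0.0, 1.0],    # AC
])

# Weighted least squares: a fixed-effect network meta-analysis in miniature
W = np.diag(1.0 / var)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
d_AB, d_AC = beta
d_BC = d_AC - d_AB
print(d_AB, d_BC, d_AC)
```

This yields d_AB = 0.75, d_BC = 1.5 and d_AC = 2.25 — a set that, unlike the independent pairwise estimates, satisfies AB + BC = AC exactly.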
Any network structure can be used so long as it is connected 67 (that is, any two nodes in the
network diagram are connected by a chain of one or more studies). Studies involving additional
treatments (beyond those directly relevant to the decision) can sometimes be introduced in order to
connect a disconnected network. For example, a network consisting of the direct comparisons AB,
BC, AC and DE is disconnected, but introducing a new treatment F allows studies of BF and DF to be
included, resulting in a connected network (Figure 5).
Figure 5 – A disconnected network involving treatments A, B, C, D and E (top) is made connected by the addition of treatment F (bottom).
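Checking connectedness is a standard graph traversal. The sketch below reproduces the Figure 5 example, with treatment labels as in the figure; the function is illustrative, not taken from any NMA package:

```python
# Minimal connectivity check for an evidence network (BFS over treatments)
from collections import deque

def connected(treatments, studies):
    """True if every treatment is reachable from every other via studies."""
    adj = {t: set() for t in treatments}
    for a, b in studies:
        adj[a].add(b)
        adj[b].add(a)
    seen, queue = {treatments[0]}, deque([treatments[0]])
    while queue:
        for nxt in adj[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(treatments)

nodes = ["A", "B", "C", "D", "E"]
print(connected(nodes, [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]))  # False
print(connected(nodes + ["F"], [("A", "B"), ("B", "C"), ("A", "C"),
                                ("D", "E"), ("B", "F"), ("D", "F")]))      # True
```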
The set of basic parameters is not uniquely determined – in principle the modeller has some choice
over which parameters are to be treated as basic and which as functional, subject to certain
constraints 66. In practice the basic parameters are usually chosen to be the set of treatment effects
relative to some chosen reference treatment (often no treatment, placebo or standard of care). This
results in the consistency equations for an arbitrary evidence network taking the form XY = AY – AX
where A is the reference treatment and X, Y are any other treatments.
Constructing the model this way ensures that a consistent set of treatment effects is obtained. The
data themselves, however, may be inconsistent. Any inconsistency in the evidence will ultimately be
reflected in the variance of the treatment effect estimates. Excessive inconsistency in the evidence
network (exceeding that which may occur due to random sampling error) indicates that the assumed
consistency equations do not hold, suggesting that studies are heterogeneous with regard to effect
modifiers and casting doubt on the validity of the analysis. Network meta-analysis thus provides a
principled framework for comparing a set of several treatments based on summary data from clinical
trials (no data at the individual participant level is required), and therefore provides an ideal starting
point for the models to be explored in this chapter; however, models will be needed that go beyond
standard network meta-analyses.
Few source studies
Unlike typical meta-analyses, many benefit-risk assessments (particularly those carried out on fairly
new drugs) must rely on a small number of studies for each treatment. Chance imbalances in the
distribution of any effect modifiers are more likely when the number of studies is low, potentially
increasing heterogeneity and inconsistency. It is all the more important, therefore, to ensure the
uncertainty of the treatment effect estimates is allowed for in the decision process rather than
relying on deterministic methods.
Multiple outcomes
Benefit-risk assessment is by definition a multivariate problem, with at least one benefit and one risk
to consider. Extending network meta-analysis into the multivariate domain has rarely been
attempted, however. Most published NMAs include only one key outcome of treatment, or perform
separate analyses for several outcomes independently, ignoring any correlations between the
outcomes. It has been noted that this approach is not particularly satisfactory, as correlations are
almost certain to exist and ignoring them will tend to understate the uncertainty in model outputs 68.
Correlations between outcomes can occur at both the within-study and between-study levels. A
within-study correlation means that the outcomes exhibit some mutual dependency as their values
vary from patient to patient in a study; a between-study correlation indicates mutual dependency in
the average values of the outcomes from study to study.
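The distinction between the two levels can be illustrated by simulation (all numbers hypothetical): study-level true means for two outcomes are drawn with a between-study correlation, and the observed arm means additionally inherit a diluted share of the within-study correlation.

```python
import numpy as np

rng = np.random.default_rng(1)

rho_between, rho_within = 0.8, 0.5
n_studies, n_patients = 200, 50

# Between-study level: the pair of true study means is correlated.
cov_b = [[1.0, rho_between], [rho_between, 1.0]]
study_means = rng.multivariate_normal([0.0, 0.0], cov_b, size=n_studies)

# Within-study level: patients' outcome pairs vary around the study means
# with their own correlation; observed study means inherit a small share.
cov_w = [[1.0, rho_within], [rho_within, 1.0]]
observed = np.array([
    m + rng.multivariate_normal([0.0, 0.0], cov_w, size=n_patients).mean(axis=0)
    for m in study_means
])
# Correlation of observed study means: close to rho_between here, since
# the within-study contribution shrinks with the number of patients.
print(np.corrcoef(observed.T)[0, 1])
```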
Summary data
Although the availability of individual patient data (IPD) from clinical studies is improving, it cannot
be relied upon, especially for older studies. Furthermore, analyses based on IPD from multiple
studies may be more complex and time-consuming than a regulatory benefit-risk assessment would
allow. This thesis therefore concentrates on clinical data that is summarised by treatment. Clinical
IPD is considered beyond scope; however, where it is available it may provide an alternative
framework for dealing with some of the same issues (e.g. within-study correlations, heterogeneity).
Limited, sparse or heterogeneously defined data
One practical issue with performing a multivariate evidence synthesis is that not all of the source
studies may have data on every outcome of interest. Furthermore, for some outcomes there may
be several alternative definitions adopted by different studies; this can be extremely frustrating for
reviewers who may end up with a piecemeal collection of outcomes, clearly clinically related to one
another yet too different to be reliably pooled for a meta-analysis.
Datasets with missing outcomes for some treatments, and/or with outcome definitions that are not
consistent from one study to the next, are hereinafter referred to as patchy.
It is not difficult to see that the ability to model a direct relationship between outcomes within a
meta-analysis could be invaluable in terms of maximising the useful information that can be gleaned
from patchy data, and even imputing missing outcomes for certain treatments. Developing such a
model is therefore the focus of this chapter.
II.1.2 Aim, objectives, scope
The overall aim of this part of the project is to establish a generalised method for Bayesian
multivariate evidence synthesis that is designed with the ability to exploit relationships between
outcomes in patchy datasets, so that the number of treatment effects that can be estimated is
maximised and all treatments can be compared in respect of all outcomes, such as may be required
in a benefit-risk assessment.
The methodology will be based on network meta-analysis. It may sometimes be the case that a
benefit-risk decision only relies on a single study, or a pairwise univariate or multivariate meta-
analysis, but there is no need to dwell on these methods here as they are already well established
(and besides, they are special restricted cases of the more general approach set out here). Instead
the chapter works within the framework of network meta-analysis, concentrating specifically on
issues relating to multiple outcomes and data sparseness which may be present in real-world
treatment decisions. Insofar as these issues are relevant to pairwise meta-analysis, or other
restricted special cases of network meta-analysis, the discussion and results herein will also apply.
It is assumed that the data available consists of arm-level aggregate summaries of randomised trials
unless otherwise stated. In principle observational data could be used, provided that one recognises
and accepts the elevated risk of biases in such studies, including selection bias and attrition bias,
and bears in mind that a model is only as good as the data supplied to it.
Specific objectives to be addressed are:
• to find or create a working Bayesian multivariate network meta-analysis model;
• to investigate (and if necessary, develop) the model’s ability to provide the types of
parameter estimates required in benefit-risk assessment such as missing treatment-
outcome combinations and outcomes on absolute scales;
• to develop a fully Bayesian interface between the evidence synthesis model and MCDA
models.
Methods for evaluating inconsistency in evidence networks will not be directly addressed in this
thesis. Nevertheless it remains important to verify that the evidence is consistent when carrying
out network meta-analyses. Methods for assessing inconsistency in univariate networks have
been discussed at some length in the literature 65,66,69,70; future research should aim to extend
these approaches to the multivariate models in this chapter.
II.1.3 Synopsis of literature
II.1.3.1 Evidence synthesis for benefit-risk assessment
The few existing attempts at evidence synthesis for quantitative benefit-risk assessments have
tended to use either very simple datasets and/or pragmatic approaches that do not strictly follow
the statistical principles underlying evidence synthesis.
A quantitative benefit-risk assessment of treatments for depression 31 used probabilistic simulations
of multiple clinical outcomes but this was limited to data from a single trial, and with no allowance
for outcome correlations, which are likely to exist and may be influential upon the results 40,68. Later
benefit-risk assessments of statins 54 and antidepressants 32 by some of the same authors used
network meta-analysis on multiple outcomes but did not allow for correlations or have to deal with
any gaps in the data.
Caster et al carried out a benefit-risk assessment of methylprednisolone in multiple sclerosis 71, but
obtained their data by analysing individual treatment arms from multiple studies, rather than
focusing on the contrasts between arms within studies. In general such an approach risks
confounding the effects of treatment with characteristics of the study populations, as it sidesteps
randomisation, and thus exposes trial data to many of the same problems as observational data.
Contrast-based NMA models, which calculate treatment effects within studies, respect
randomisation and are therefore preferred.
Among the reports and publications on benefit-risk by the PROTECT initiative 10 were a methodology
review that identified NMA as a potentially useful tool for benefit-risk assessment 14, and two
applications using contrast-based NMAs to synthesise multiple outcomes for benefit-risk assessment
purposes 53,72. The models, however, included various ad hoc approximations and modifications in
order to patch together the available data at the expense of generalisability and rigour. Again, there
was no allowance for correlations between outcomes.
Beyond the field of benefit-risk assessment, however, a number of principled Bayesian evidence
synthesis methods have been developed, and recently extensions into the multivariate domain have
appeared.
II.1.3.2 Network meta-analysis
Bayesian meta-analysis methods gathered momentum around the turn of the millennium as Gibbs
sampling made such models more practical. Smith et al developed a generalised Bayesian random
effects model for pairwise meta-analysis of binary outcomes based on the log odds ratio 73. Warn et
al later extended this work 74, developing analogous models based on the relative risk and/or risk
difference.
The concept of network meta-analysis, making use of indirect comparisons between treatments as
well as direct randomised trial evidence, also emerged during this period 59,75,76, and its potential was
quickly recognised. An early version of NMA was proposed by Lumley 77 but arguably the most
successful framework is that of Lu and Ades 60 who showed how the Bayesian meta-analysis models
of Smith et al 73 could be extended to indirect comparisons. This was later developed into a highly
influential canonical framework 78.
Interest in the technique has since accelerated, with numerous applications 61 and adaptations 79,80.
It has quickly gained acceptance as an important tool for researchers and modellers, features in
high-profile systematic reviews 81 and is covered in regulatory guidance 82.
II.1.3.3 Multivariate meta-analysis and network meta-analysis
Bayesian bivariate or multivariate (pairwise) meta-analysis has been proposed on a number of
occasions 83-85, and recently some multivariate network meta-analysis models have appeared 43-45.
Univariate network meta-analysis and (univariate or multivariate) pairwise meta-analysis can be
seen as special cases of multivariate network meta-analysis (with one outcome and two treatments
respectively), as shown in the Venn diagram in Figure 6.
Figure 6 – Venn diagram illustrating the relationships between various types of meta-analysis model. Multivariate network meta-analysis is the most generalised model, with the other types of model corresponding to special cases.
Many of these models allow for correlations among outcomes, but only a few models in the
literature define any mappings or other structural relationships between outcomes (“Models with
linked outcomes”, Figure 6). The models that do not link outcomes in this way can “borrow
strength” in the sense of reducing the posterior variance of the treatment effects, but they cannot
impute unreported treatment-outcome combinations in patchy datasets. To do this one must also
introduce structural relationships between outcomes into the model – in other words, to provide
equations that define explicit mathematical connections between the outcomes.
One approach that can be used is to specify structural relationships that can logically be seen to
follow between specific outcomes in a model, perhaps given some simple assumptions. For
instance, a one-year survival probability p₁ and a two-year survival probability p₂ are clearly related
and can be explicitly linked by the equation p₂ = p₁² assuming a constant hazard. Similar context-
specific relationships have previously been used in multivariate NMAs 86-88.
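A quick numerical check of this constant-hazard link (values hypothetical):

```python
import math

# If survival is S(t) = exp(-lambda * t), then p2 = S(2) = S(1)**2 = p1**2.
p1 = 0.9                 # hypothetical one-year survival probability
lam = -math.log(p1)      # implied constant hazard
p2 = math.exp(-2 * lam)  # two-year survival under the same hazard
print(abs(p2 - p1 ** 2) < 1e-12)  # True
```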
One particular model for pairwise meta-analysis with structural links between outcomes has
appeared in two published applications 46,47. In each of these datasets the outcomes all represent
the same underlying clinical concept, but are measured using different test instruments which
express the results on different scales. As such the outcomes are assumed to be in strict linear
correspondence with one another at the between-study level (the relationship may be less perfect
at the within-study level due to measurement error) 46,47. In other words, the model links outcomes
with linear mappings between the study-specific treatment effects.
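A small simulation (all values hypothetical) illustrates such a linear mapping between the study-specific effects of two outcomes, blurred by within-study measurement error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: outcome 2 measures the same clinical concept as
# outcome 1 on a different scale, so the study-specific treatment effects
# are assumed to satisfy delta2 = a + b * delta1 at the between-study level.
a, b = 0.2, 1.5
delta1 = rng.normal(0.5, 0.1, size=30)  # simulated study-specific effects
delta2 = a + b * delta1                 # strict linear correspondence

# Within studies, measurement error blurs the relationship.
obs2 = delta2 + rng.normal(0.0, 0.05, size=30)
slope, intercept = np.polyfit(delta1, obs2, 1)
print(round(slope, 1), round(intercept, 1))  # roughly recovers b and a
```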
Linking outcomes to one another via a known mathematical relationship may certainly be useful in
multivariate evidence synthesis, but this approach relies on the form of that relationship being
straightforward to identify a priori. On many occasions, however, it may seem likely that outcomes
are related to one another but the precise nature of the relationship may not be clear a priori.
One possible approach in such situations might be to adapt the linear mapping model 46,47 so that
the mappings apply to the population average treatment effect parameters. Although the
assumption of a proportional linear relationship between outcomes is a strong one, applying the
mappings at the population-average level allows the outcomes to be less than perfectly correlated at
the between-study level in a random effects model, thereby potentially allowing the mappings to be
used for outcomes that are more loosely related. Allowing the mappings to vary between
treatments may provide some additional “wiggle room” and permit such a model to be used where
outcomes do not always occur in exactly the same proportions. A similar use of mappings has been
proposed before and applied to Bayesian meta-analysis of HIV with a view to establishing surrogacy
relationships between outcomes89. The model was limited to two treatments and fixed effects,
however, and would need to be extended beyond these limitations to be useful as a general tool for
benefit-risk evidence synthesis.
Figure 7 – Pictorial representation of the types of meta-analysis model discussed in this section. Top row: pairwise models for comparing two treatments only. Bottom row: models for use in connected evidence networks (an example network is shown). Each node in the evidence network represents a treatment and lines connecting nodes indicate the existence of head-to-head trial evidence. Different coloured circles represent different outcomes (the multivariate models are here illustrated with three outcomes) and white lines represent structural links between the outcomes for each treatment.
In all current multivariate NMA models the treatment effect parameters in each trial are expressed
relative to a study-specific control treatment – in other words, the baseline for the relative effect
parameters may vary from study to study depending on which treatments are present. This results
in the variance of the treatment outcome in the baseline arm (when the baseline is an active
treatment) being lower than in the other arms. Arguably this may not be a substantial problem in
most circumstances since it concerns only the prior variance, which will only have a significant
impact on the results when actual data is scarce; and also since the main target of inference is the
relative effects, while the baseline outcome is merely a nuisance parameter. Nevertheless it may
still be possible to avoid the issue with an alternative parameterisation. Others have previously
identified these problems with the usual parameterisation 45 but as a solution proposed a model
with a baseline that is not always identifiable from the data, with no allowance for within-study
correlations, and which appears not to converge when applied to the RRMS dataset. The same
authors also develop an alternative “arm-based” parameterisation where the absolute treatment
outcomes in each study arm are modelled directly; however, this type of model is not favoured here
as it risks confounding the treatment effects with differences in trial sample characteristics 59.
Existing multivariate NMA models also have many important practical limitations, with every model
identified in the literature having at least one of the following restrictions:
• limits on the number/types of treatments and outcomes that can be incorporated;
• model code that is tailored to the dimensions of a particular dataset;
• requirement for the user to specify unwieldy covariance arrays in the data; and/or
• failure to allow for correlations that must or may exist between variables.
Chapter II.2
49
II.2 High level model structure
The evidence synthesis strategy will use the overall model structure depicted within the blue area of
Figure 8, which also shows how this fits together with the other modelling components to be
described in the next chapter (shown in faded tones).
The two key Bayesian models to be covered in this chapter are the treatment effects module, which
extracts and aggregates the relative treatment contrasts from a set of randomised controlled trials,
and the population calibration module, which applies those contrasts to the overall distribution of
outcomes observed across the population in a selected set of studies (perhaps some or all of the
same trials used to estimate the treatment effects, and/or alternative data sources). This strategy
allows outcomes to be estimated on the absolute scale while also avoiding confounding between
treatment effects and population characteristics, and is often used in health economic models
where the absolute level of outcomes is of key importance 90,91.
Figure 8 - High-level model structure, focusing on clinical evidence synthesis.
Chapter II.3
50
II.3 Data
II.3.1 Data structure
The models are designed to work with any dataset consisting of a set of randomised controlled trials
with any number of treatment arms and outcome measures.
II.3.1.1 Network-level constants
Let {1, …, NT} be a set of treatments, where t = 1 is the reference treatment relative to which all
the other treatments’ effects are expressed (usually placebo). Let {1, …, NO} be a set of outcomes,
and {1, …, NS} a set of studies.
II.3.1.2 Study-level constants
For each study i ∈ {1, …, NS} the following constants are taken to be known:
NAi ∈ {1, 2, …}, the number of treatment groups/arms
NOi ∈ {1, …, NO}, the number of outcomes reported within study i
The treatment arms k ∈ {1, …, NAi} and outcomes j ∈ {1, …, NOi} within study i are ordered such
that tik ∈ {1, …, NT} refers to the treatment in the kth arm and ωij ∈ {1, …, NO} refers to the jth
outcome.
For each k ∈ {1, …, NAi}, nik ∈ {1, 2, …} refers to the number of patients in the kth treatment arm.
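The study-level constants above can be mirrored in a small, hypothetical data container (an illustrative sketch only; the actual model inputs are the BUGS data files in Appendix B):

```python
from dataclasses import dataclass

# Illustrative container mirroring the notation above (names hypothetical).
@dataclass
class Study:
    treatments: list  # t_ik: treatment index for each arm k
    outcomes: list    # omega_ij: outcome index for each reported outcome j
    n: list           # n_ik: number of patients in each arm

    @property
    def NA(self):     # number of treatment arms
        return len(self.treatments)

    @property
    def NO(self):     # number of outcomes reported in this study
        return len(self.outcomes)

study = Study(treatments=[1, 2], outcomes=[1, 3], n=[120, 118])
print(study.NA, study.NO)  # 2 2
```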
II.3.1.3 Arm/outcome-level data
In all of the models that follow, each outcome 𝑗 within each arm 𝑘 of each study 𝑖 is assumed to
have one of the following likelihoods based on well-known distributions:
• Normal likelihood with the observed within-arm mean and sample variance supplied as
given data and denoted by 𝑦𝑖𝑘𝑗 and 𝑣𝑖𝑘𝑗 respectively. (In practice, the variances may
instead be supplied as standard errors or standard deviations provided that appropriate
transformations are applied to the formulae/code given herein. It is straightforward to
convert between these measures given the patient numbers.) Multivariate normal
distributions will be used to allow for correlations between outcomes, with the correlation
coefficients either estimated in the model or supplied as additional data.
• Binomial likelihood with the observed within-arm number of events (out of 𝑛𝑖𝑘) supplied as
given data and denoted by 𝑦𝑖𝑘𝑗
• Poisson likelihood with the observed within-arm number of events and person-years of
exposure supplied as given data and denoted by 𝑦𝑖𝑘𝑗 and 𝑐𝑖𝑘𝑗 respectively
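As a sketch, the three likelihood choices can be written as arm-level log-likelihood contributions (univariate case; function names are illustrative, not thesis code):

```python
import math

def loglik_normal(y, v, mu):
    """Observed arm mean y with sampling variance v around true mean mu."""
    return -0.5 * (math.log(2 * math.pi * v) + (y - mu) ** 2 / v)

def loglik_binomial(y, n, p):
    """y events out of n patients, with event probability p."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log(1 - p))

def loglik_poisson(y, c, rate):
    """y events over c person-years of exposure at the given event rate."""
    mu = rate * c
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# The likelihood favours parameter values near the observed data:
print(loglik_binomial(30, 100, 0.3) > loglik_binomial(30, 100, 0.5))  # True
```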
Whichever likelihood is used, the mean of the distribution (and/or any higher-level model
parameters on which the mean depends) is the principal unknown quantity regarding which
inferences are to be made. These parameters relate to the underlying population mean for each
outcome and can be estimated via Bayes’ Theorem.
In formal terms, if 𝝋 refers to the set of treatment effect parameters that are the target for
inference, and 𝒚 is the vector of observed values 𝑦𝑖𝑘𝑗, then the likelihood of the data conditional on
𝝋 is 𝑃(𝒚|𝝋) and, by Bayes’ Theorem, the joint posterior distribution of 𝝋 is
𝑃(𝝋|𝒚) ∝ 𝑃(𝒚|𝝋)𝑝(𝝋)
where 𝑝(𝝋) is the joint prior distribution.
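For a single scalar effect this posterior can be sketched numerically on a grid; the Normal prior, observed effect and standard error below are all hypothetical:

```python
import numpy as np

# Grid sketch of P(phi | y) proportional to P(y | phi) p(phi) for a single
# normal-mean treatment effect phi.
phi = np.linspace(-3, 3, 601)
prior = np.exp(-0.5 * phi ** 2)             # Normal(0, 1) prior, unnormalised
y, se = 1.2, 0.5                            # hypothetical observed effect and SE
lik = np.exp(-0.5 * ((y - phi) / se) ** 2)  # Normal likelihood
post = prior * lik
post /= post.sum() * (phi[1] - phi[0])      # normalise on the grid
print(phi[np.argmax(post)])                 # posterior mode, shrunk toward the prior mean
```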
II.3.2 Dataset: Relapsing-remitting multiple sclerosis
Multiple sclerosis (MS) is a disease of the immune system, characterised by aggressive immune
action against the myelin insulation of the body’s own neurons. The resulting nerve damage
can cause a variety of physical, sensory and cognitive symptoms 92.
The most common type of MS is relapsing-remitting MS (RRMS). RRMS patients suffer periodic
symptomatic attacks (relapses) together with a more general trend of worsening disability
over time. No cure for RRMS exists, but there are several disease-modifying therapies
that vary in their effectiveness in reducing the frequency of relapses and delaying clinical disease
progression, and in the nature and frequency of their side effects. For many years the standard first-
line treatments were injectable drugs, but recently a number of oral drugs have appeared on the
market that show potential as first-line therapies, as they can be easily self-administered and are
reasonably well tolerated: dimethyl fumarate, fingolimod, teriflunomide and laquinimod. Many
patients on the older injectable therapies are expected to switch to one of these new drugs 93.
However, coming to firm conclusions as to the relative merits of these treatments is hindered by a
lack of direct trial evidence.
A recent Cochrane review of RRMS treatments 81 presented a network meta-analysis of dimethyl
fumarate, fingolimod, teriflunomide and laquinimod together with a number of other RRMS
treatments (generally these were either older treatments, or drugs with more safety concerns that
are reserved for more aggressive disease). The review was very thorough in its identification and
evaluation of trials for inclusion but was limited to two efficacy outcomes (the
patients avoiding relapse and disability progression) and one safety/acceptability outcome (the
proportion of patients adhering to treatment), and no attempt was made to allow for any
correlations between these outcomes (although the existence of correlations seems likely).
Furthermore, there were different definitions of the efficacy outcomes among the source studies;
where studies provided two definitions, either only one version was extracted or both were analysed
in separate NMAs. The dataset used in this chapter is based on the studies in the Cochrane review,
but uses an expanded set of outcomes. The treatment options are also restricted here to the
established first-line treatments and second-generation oral drugs specified below, and only at the
usual prescribed dosages. In total, 16 studies provided data 94-109, covering 8 active treatments and
placebo.
II.3.2.1 Treatments
Treatment regimens are defined in accordance with the substances, dosages, administration routes
and frequencies shown in Table 4; studies or study arms not meeting these definitions are excluded
from the dataset (see II.3.2.3 for details of the exclusions). Dosages were selected in accordance
with prescribing guidelines.
Table 4 – Treatments in the RRMS case study.

| Abbreviation | Substance | Route of administration | Dose & frequency |
| PL | Placebo | As appropriate to each study | As appropriate to each study |
| DF | Dimethyl fumarate | Oral | 240 mg 2x daily |
| FM | Fingolimod | Oral | 500 µg 1x daily |
| GA | Glatiramer acetate | Subcutaneous injection | 20 mg 1x daily |
| IA (IM) | Interferon beta-1a | Intramuscular injection | 30 µg 1x weekly |
| IA (SC) | Interferon beta-1a | Subcutaneous injection | 44 µg 3x weekly |
| IBB | Interferon beta-1b | Subcutaneous injection | 250 µg 1x every 2 days |
| LQ | Laquinimod | Oral | 600 µg 1x daily |
| TF | Teriflunomide | Oral | 14 mg 1x daily |
Figure 9 is a network diagram showing where the studies in the dataset provide direct evidence
comparing the treatments above. A line between two treatments indicates the existence of a head-
to-head trial comparing them; the line’s thickness is proportional to the number of such trials (in this
case the maximum is three head-to-head trials, between placebo and glatiramer acetate). Not all
studies report all outcomes, however, so for any given outcome there may be fewer links in the
network than are shown here. A network diagram for each outcome is provided in Appendix A.1.
Figure 9 – Network diagram for the RRMS case study (all outcomes combined). The thickness of the links is proportional to the number of studies directly comparing the linked treatments.
II.3.2.2 Outcomes
Definitions of all outcomes used are given below; study data must meet these definitions for
inclusion. The time horizon for all outcomes is 24 months. Only a small number of studies provide
outcomes at other time periods, e.g. 12 or 36 months, and rather than include these it was considered
appropriate to focus on a single universal time horizon in this instance.
1. Annualised relapse rate (ARR): The mean number of relapses per subject per year, where a
relapse is defined as a new episode of significantly worsened neurological symptoms not
attributable to any disease other than multiple sclerosis and separated from previous
relapses by at least 30 days. Some studies required relapses to persist for at least 24 hours,
and other studies 48 hours; data based on either definition was accepted.
2. Relapse-free proportion (RFP): The proportion of subjects without relapses, where a relapse
is defined as above.
3. Proportion experiencing disability progression, confirmed 3 months later (DP3): The proportion of
subjects avoiding disability progression, defined as a 1-point increase in the Expanded
Disability Status Scale (EDSS), confirmed on two occasions three months apart. Slight
variations on this definition used by some studies were accepted, whereby the required
EDSS increase is 0.5 if the starting value is above 5 or 5.5 and/or is 1.5 if the starting EDSS is
0.
4. Proportion experiencing disability progression, confirmed 6 months later (DP6): The
proportion of subjects avoiding disability progression, defined as above but confirmed on
two occasions six months apart.
5. Proportion with ALT above upper limit (ALT1): the proportion of subjects with alanine
aminotransferase levels above the upper limit of the normal range, as revealed by a blood
test at any point within the follow-up period.
6. Proportion with ALT above 3x upper limit (ALT3): the proportion of subjects with alanine
aminotransferase levels above 3x the upper limit of the normal range, as revealed by a blood
test at any point within the follow-up period.
7. Proportion with ALT above 5x upper limit (ALT5): the proportion of subjects with alanine
aminotransferase levels above 5x the upper limit of the normal range, as revealed by a blood
test at any point within the follow-up period.
8. Proportion with serious gastrointestinal disorders (SGI): the proportion of subjects
experiencing, at least once during the follow-up period, any serious adverse event classed as
gastrointestinal or one of the following listed serious adverse events: diarrhoea, nausea,
upper abdominal pain, abdominal pain, gastritis, gastroenteritis, vomiting, abdominal
discomfort, appendicitis.
9. Proportion with serious bradycardia (SBC): the proportion of subjects experiencing at least
one serious adverse event classed as bradycardia during the follow-up period.
10. Proportion with macular edema (MED): the proportion of subjects experiencing at least one
serious adverse event classed as macular edema at any point within the follow-up period.
Outcomes 1-4 are the most commonly encountered measures of relapse and disability progression,
the key indicators of efficacy that are assessed in all RRMS clinical trials. It is not practical however
to adopt a comprehensive set of safety outcomes here for all of the treatments in the dataset, as
this would make the chapter rather unwieldy and difficult to digest. The safety outcomes (5-10)
have therefore been selected with a view to illustrating the methodology rather than making an
exhaustive assessment of the safety profile. The selection aims to include some adverse events
(outcomes 5-8) that occur on multiple treatments, some that have only been observed on one
treatment (outcomes 9-10), and some that are clearly closely related to one another (outcomes 5-7).
From a clinical perspective the selection may appear somewhat arbitrary, with some treatments’
safety profiles better represented than others, and comparing the safety of the featured treatments
purely on the basis of these illustrative results is not recommended.
Figure 10 is a hierarchical diagram showing the outcomes and the criteria they represent.
Figure 10 - Outcomes for the RRMS case study. Blue cells are the outcomes in the dataset; green cells are decision criteria (i.e. specific benefits and risks) that can be measured by the outcomes below them, and yellow cells represent the broad grouping into benefits and risks. This hierarchical structure will be exploited in some of the models in this chapter.

Treatment effects
• Benefits (efficacy)
  o Reduction in relapses: relapse rate; relapse-free proportion
  o Slowing of disability progression: proportion progressing, confirmed 3 months later; proportion progressing, confirmed 6 months later
• Risks (safety)
  o ALT elevation: ALT > ULN; ALT > 3 x ULN; ALT > 5 x ULN
  o Gastrointestinal disorders: proportion with serious gastrointestinal events
  o Cardiac disorders: proportion with serious bradycardia
  o Eye disorders: proportion with macular edema
II.3.2.3 Source studies
Table 5 – Published trial reports providing data to the RRMS case study. The outcomes reported by each study (among ARR, RFP, DP3, DP6, ALT1, ALT3, ALT5, SGI, SBC and MED) are tabulated in Appendix A.

| Name & publication year | Number of subjects | Treatments |
| 1. BRAVO 2014 109 | 1331 | PL, IA (IM), LQ |
| 2. CONFIRM 2012 100 | 1072 | PL, DF, GA |
| 3. ALLEGRO 2012 97 | 1106 | PL, LQ |
| 4. BECOME 2009 95 | 75 | GA, IB |
| 5. BEYOND 2009 106 | 1345 | GA, IB |
| 6. DEFINE 2012 101 | 817 | PL, DF |
| 7. FREEDOMS 2010 104 | 843 | PL, FM |
| 8. FREEDOMS II 2014 96 | 713 | PL, FM |
| 9. INCOMIN 2002 98 | 182 | IA (IM), IB |
| 10. JOHNSON 1995 103 | 251 | PL, GA |
| 11. MSCRG 1996 102 | 301 | PL, IA (IM) |
| 12. PRISMS 1998 99 | 376 | PL, IA (SC) |
| 13. REGARD 2008 105 | 756 | GA, IA (SC) |
| 14. TEMSO 2011 107 | 721 | PL, TF |
| 15. BORNSTEIN 1987 94 | 48 | PL, GA |
| 16. IFNB 1993 108 | 227 | PL, IB |
The following three- or four-arm studies had one treatment arm excluded due to the use of non-
standard dosages:
• CONFIRM 2012 100: study arm receiving 720 mg dimethyl fumarate daily
• DEFINE 2012 101: study arm receiving 720 mg dimethyl fumarate daily
• FREEDOMS 2010 104: study arm receiving 1.25mg fingolimod daily
• FREEDOMS II 2014 96: study arm receiving 1.25mg fingolimod daily
• PRISMS 1998 99: study arm receiving 22µg subcutaneous interferon beta-1a three times per
week
• BEYOND 2009 106: study arm receiving 500µg interferon beta-1b every 2 days
• IFNB 1993 108: study arm receiving 500µg interferon beta-1b every 2 days
• TEMSO 2011 107: study arm receiving 7mg teriflunomide daily
Additionally, one two-arm study was excluded altogether because it included a treatment arm
receiving 22µg subcutaneous interferon beta-1a three times per week 110.
II.3.2.4 Extraction
Figures were extracted from the published study reports. Where figures were not quoted, results
were approximated visually from graphs where possible.
Annualised relapse rate standard errors were frequently not quoted; the missing values were
imputed based on the mean rate and number of patients (assuming Poisson-distributed relapses,
the variance of the sample mean is equal to the mean divided by the number of patients; later the
sensitivity to the imputation was checked by systematically altering the variances for this outcome,
with negligible impact on the model results).
The source data is tabulated in Appendix A and the BUGS data files are set out in Appendix B.
Chapter II.4
58
II.4 Treatment effects module
II.4.1 Initial (naïve) model: all outcomes independent (Model 0)
A first step towards constructing a true multivariate NMA model is to perform NMA simultaneously
but separately for each outcome in the network. The validity of this approach relies on an implicit
assumption that all outcomes occur independently of one another (at both the within- and between-
study levels). This is unlikely to hold in practice, but provides a simple starting point for getting to
grips with the high-level model structure.
The outcomes to be modelled may be represented by a number of types of variable. Table 6 shows
the most common outcome types encountered in the health sciences, the probability distributions
that are typically used to model the likelihood of observed data, their corresponding Normal
approximations, and the corresponding linear treatment contrasts that are used to compare pairs of
treatments. It is not however intended to be an exhaustive list and other modelling approaches are
available.
Table 6 – Distributions commonly used for modelling clinical outcomes at group level. Domain refers to the range of values taken by the approximate Normal statistic. *after possible transformation to account for skew/kurtosis; may in practice include integer-valued or fractional variables.

| Outcome type | Sampling distribution at arm level | Approximate Normal sampling distribution | Domain | Between-arm contrast |
| Continuous measurement* | group mean Y ~ Normal(μ, SE²) | Y ~ Normal(μ, SE²) | ℝ | Difference |
| (Potentially recurrent) event counts | group total count Y ~ Poisson(μ) | log(Y) ~ Normal(log(μ), 1/μ) | ℝ | Log rate ratio |
| Binary outcomes | group total count Y ~ Binomial(n, p) | logit(Y/n) ~ Normal(logit(p), 1/(np) + 1/(n(1 − p))) | ℝ | Log odds ratio |
| | | Y/n ~ Normal(p, p(1 − p)/n) | [0,1] | Risk difference |
| | | log(Y/n) ~ Normal(log(p), 1/(np) − 1/n) | (−∞,0] | Log relative risk |
For modelling continuous outcomes at the treatment-arm level it will be assumed that the Normal
distribution (using the sample standard error) is suitable. Whilst this is not the only continuous
distribution that could be employed, its mathematical tractability and well-understood properties
make it an obvious choice. Variables with skew or kurtosis can be handled via data transformations,
meaning that the Normal distribution is flexible enough for most purposes. The natural contrast
between treatment arms is the linear difference, which itself is Normally distributed given a
Normally distributed outcome in each arm.
For count outcomes taking integer values, typical distributions are the Poisson (or sometimes the
Negative Binomial). Alternatively, it is common practice in biostatistics (in Poisson regression, for
example) to model such outcomes by assigning a Normal sampling distribution to the log incidence
rate, with standard error estimated as 1/√(# 𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑒𝑣𝑒𝑛𝑡𝑠). The conventional contrast for
count data is the incidence rate ratio, or its logarithm, which is equal to the difference in the log
event rates and can thus also be modelled using a Normal distribution.
For binary outcomes, the Binomial distribution is the natural likelihood for outcomes at the
treatment-arm level. Using this likelihood, various contrasts between treatment arms are possible
and have previously been applied in evidence synthesis models 74,111. The (log) odds ratio tends to
be favoured in most applications due to its mathematical properties; however it is undefined for
proportions of 0 or 100%. A risk difference model may be more appropriate in such circumstances.
The relative risk is an alternative that has been used elsewhere; it is not used here as it offers little
advantage over the odds ratio and risk difference approaches, but in principle a model such as that
used by Warn et al 74 could be employed. For the odds-ratio based model, instead of a binomial likelihood, a Normal distribution can be assigned to the log odds in each treatment arm, with standard error estimated by √(1/successes + 1/failures).
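For illustration (a hedged sketch; the function name and guard are mine, not the thesis's), the log-odds transform and its approximate standard error can be computed as:

```python
import math

def log_odds_normal_approx(successes, failures):
    """Approximate Normal statistic for a binary outcome on the log-odds scale.

    Returns the empirical log odds and the usual standard-error estimate
    sqrt(1/successes + 1/failures); undefined when either count is zero.
    """
    if successes == 0 or failures == 0:
        raise ValueError("log odds undefined for proportions of 0 or 100%")
    log_odds = math.log(successes / failures)
    se = math.sqrt(1.0 / successes + 1.0 / failures)
    return log_odds, se

# Example: 30 events among 100 patients
lo, se = log_odds_normal_approx(30, 70)
```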
Outcome types other than those in Table 6 may sometimes be encountered. Categorical outcomes
(with more than two categories) can be modelled using the multinomial distribution, or as a
combination of Binomial variables. Survival/time-to-event variables will require careful
consideration, as they are normally reported using a proportional hazards model that does not
require specification of an arm-level likelihood. Approaches that may be used for evidence synthesis
of such variables have been described elsewhere 112,113 but are considered beyond the scope of this
thesis.
Whichever likelihood and treatment contrast is adopted, the parameterisation of the treatment effects follows the same principles, in line with conventions established in the literature 78:
• Study-specific "baseline" parameters: the mean of outcome j in arm 1 of study i is denoted by μ_ij.
• Basic population-average treatment effect parameters: the mean effect on outcome ω of treatment t relative to treatment 1 is denoted by d_ωt.
• Functional population-average treatment effects: for any two treatments t1 and t2, the mean effect on outcome ω of t2 relative to t1 is calculated as d_ωt2 − d_ωt1 to ensure consistency (see II.1.1).
• Study-specific random effects: δ_ikj ~ N(d_ω_ij,t_ik − d_ω_ij,t_i1, σ²) is the (marginal) distribution of the treatment effect on outcome j in arm k of study i, relative to arm 1.
• Treatment effect correlations within multi-arm trials: for a given outcome j, estimates of δ_ik1j and δ_ik2j are linked by their common baseline, where k1, k2 > 1 are distinct arms in study i. A correlation between them therefore needs to be allowed for; this is usually taken to be 0.5, under the assumption of equal between-study variance across treatment contrasts 76 (see the formal specification of the random effects distribution in II.4.2 for an explanation).
The BUGS code for this model, as applied to the RRMS dataset, is given in Appendix B, illustrating
how the model is put together for continuous, count and binary outcomes. A random effects model
is applied to outcomes ARR, RFP, DP3, DP6, ALT1, ALT3 and ALT5 while a fixed effects model is used
for SGI, SBC and MED (due to the low number of studies contributing data for these outcomes).
II.4.2 Correlated non-zero outcomes (Model 1)
Model 0 includes all of the outcomes simultaneously, but not jointly. The outcomes are statistically independent and equivalent results could be achieved by running a series of separate NMAs. This independent model is convenient but lacks rigour; ignoring correlations in multivariate analyses has been shown to affect the treatment effect estimates 40,114,115. The impact will be potentially
even greater on variables defined as functions of several treatment effect parameters (which
essentially describes the MCDA scores that will be constructed later) because the correlations
between parameters then add additional terms to the overall variance.
In terms of specifying the likelihood, it is important to allow for correlations in outcomes at two
levels of the model:
• Between-study correlations: these describe correlations among the random effects 𝛿𝑖𝑘𝑗 , i.e.
they relate to the random variability of the average treatment effect from study to study
(random effects models only)
• Within-study correlations: these describe correlations among the observed outcomes 𝑦𝑖𝑘𝑗
conditional on 𝛿𝑖𝑘𝑗 , i.e. they relate to random variability at the level of individual subjects
Implementing the within-study correlations in the distribution of Y for binary or count data is
somewhat problematic. Correlated versions of the Binomial, Poisson, and Negative Binomial
distributions are not straightforward to describe mathematically and in any event are not (currently)
supported by BUGS. However, the multivariate Normal distribution does not suffer from these
problems. The Poisson and Binomial distributions can both be approximated by Normal distributions
as set out in Table 6, unless the number of patients is small or the underlying event rate is close to 0 (or 100%). To induce the correlations, these Normal distributions can be combined into a multivariate Normal.
This does however present an issue with some kinds of outcome when values of zero (or 100%) are observed. Odds, the log rate, and the Normal approximation to the variance of the risk are all undefined for such values.
definable for such values. This issue will be revisited later (see II.4.7) but for now these outcomes
are excluded from the model, which in the context of the RRMS case study means dropping serious
gastrointestinal events, serious bradycardia and macular edema. The remaining outcomes are
transformed to the Normal scale indicated in Table 6.
The NOi-length vector 𝒚𝑖𝑘 (now referring to the transformed outcomes on the Normal scale) in arm
k of study i is thus given a multivariate Normal likelihood:
𝒚𝑖𝑘~ 𝑀𝑉𝑁(𝝁𝑖 + 𝜹𝑖𝑘 , 𝐂𝐕𝑖𝑘 ) (1)
where 𝜹𝑖𝑘 is a vector of length NOi whose elements are the study-specific treatment effects for arm
k relative to arm 1 in respect of outcomes 1 to NOi , 𝝁𝑖 is the study-specific baseline vector also of
length NOi , representing the outcomes in arm 1 of study i, and 𝐂𝐕𝑖𝑘 is the within-study covariance
matrix. We can rewrite (1) to more explicitly show the elements of the mean vector and covariance
matrix:
y_ik ~ MVN( ( μ_i1 + δ_ik1,  μ_i2 + δ_ik2,  …,  μ_iNOi + δ_ikNOi )ᵀ ,

            ( var(y_ik1)           cov(y_ik1, y_ik2)    ⋯  cov(y_ik1, y_ikNOi)
              cov(y_ik1, y_ik2)    var(y_ik2)           ⋯  cov(y_ik2, y_ikNOi)
              ⋮                    ⋮                    ⋱  ⋮
              cov(y_ik1, y_ikNOi)  cov(y_ik2, y_ikNOi)  ⋯  var(y_ikNOi) ) )
The diagonal terms of 𝐂𝐕𝑖𝑘 are the empirical within-study outcome variances 𝑣𝑎𝑟(𝑦𝑖𝑘𝑗) which are
provided in the arm-level data from the source studies. Note that this is the variance of the sample
mean, i.e. the squared standard error; if studies instead report sample variances, then the sample
mean variance can be easily obtained by dividing by the number of subjects. (It may be worth
examining the raw sample variances as a cursory check on the compatibility of the source studies,
however. Sample mean variances may differ greatly in magnitude between studies due to
differences in sample size, but one would normally expect the sample variances to be of a similar
magnitude in homogeneous populations).
The off-diagonal terms, however, cannot be estimated from typical arm-level summary data unless
the studies specifically report them; the approach taken here is to derive the covariances within the
model by using assumed values (or prior distributions) for the within-study correlations. Initially it
will be assumed that these correlations are equal to a fixed value ρw so that 𝑐𝑜𝑣(𝑦𝑖𝑘𝑗1 , 𝑦𝑖𝑘𝑗2) =
𝜌𝑤√𝑣𝑎𝑟(𝑦𝑖𝑘𝑗1)𝑣𝑎𝑟(𝑦𝑖𝑘𝑗2) for all pairs of outcomes 𝑗1, 𝑗2; later this assumption will be relaxed.
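As an illustrative sketch (not the thesis's BUGS code; names and values are mine), the within-study covariance matrix under a common correlation ρw can be assembled from the reported variances:

```python
import numpy as np

def within_study_cov(variances, rho_w):
    """Build CV_ik from arm-level outcome variances (squared standard errors),
    assuming a common within-study correlation rho_w between all pairs of
    outcomes: cov(y_j1, y_j2) = rho_w * sqrt(var_j1 * var_j2)."""
    se = np.sqrt(np.asarray(variances, dtype=float))
    cov = rho_w * np.outer(se, se)   # rho_w * sqrt(v_j1) * sqrt(v_j2)
    np.fill_diagonal(cov, se ** 2)   # diagonal holds the variances themselves
    return cov

# Example: three outcomes with variances 0.04, 0.09, 0.25 and rho_w = 0.3
cv = within_study_cov([0.04, 0.09, 0.25], rho_w=0.3)
```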
Other methods for handling the within-study correlations have also been proposed43,44,116.
By definition δ_ijk = 0 for k = 1. For k > 1 in the random effects model the δ_ijk are jointly described by a multivariate normal distribution, which allows for outcomes to be correlated at the between-study level. Here, however, in addition to the correlations between outcomes, there are also correlations linking the same outcome in different treatment arms to consider. For a given outcome j, estimates of δ_ik1j and δ_ik2j are linked by their common baseline, where k1, k2 > 1 are distinct arms in study i. A correlation between them therefore needs to be allowed for; this is usually taken to be 0.5, under the assumption of equal between-study variance across treatment contrasts 76. This follows when one considers the variance of δ_ik2j − δ_ik1j, which by assumption is σ² (since it is itself a relative treatment contrast within study i) but is also equal to σ² + σ² − 2ρσ² (by the formula for the variance of a difference), implying the correlation coefficient ρ = 0.5 43,46,47,78.
If the between-study correlation coefficient for different outcomes in the same trial arm is taken to be ρ_b, then the correlation coefficient for different outcomes in different trial arms is the product 0.5ρ_b 44. This can be seen by considering the covariance between δ_ik2j1 − δ_ik1j1 and δ_ik2j2 − δ_ik1j2 for distinct outcomes j1, j2 in distinct arms k1, k2 of study i. Each of these two expressions is, by consistency, a treatment contrast within study i. Indeed they are the same treatment contrast but for outcomes j1, j2 respectively. Therefore the covariance cov(δ_ik2j1 − δ_ik1j1, δ_ik2j2 − δ_ik1j2) = ρ_b σ². But, using well-known properties of the covariance,

cov(δ_ik2j1 − δ_ik1j1, δ_ik2j2 − δ_ik1j2)
= cov(δ_ik2j1, δ_ik2j2) + cov(δ_ik1j1, δ_ik1j2) − cov(δ_ik2j1, δ_ik1j2) − cov(δ_ik1j1, δ_ik2j2)
= 2ρ_b σ² − cov(δ_ik2j1, δ_ik1j2) − cov(δ_ik1j1, δ_ik2j2)

Therefore cov(δ_ik2j1, δ_ik1j2) + cov(δ_ik1j1, δ_ik2j2) = ρ_b σ².
Under the assumption that the correlation does not depend upon the ordering of the treatments, cov(δ_ik2j1, δ_ik1j2) = cov(δ_ik1j1, δ_ik2j2) = 0.5ρ_b σ², and therefore the correlation coefficient for different outcomes in different trial arms is 0.5ρ_b.
Using the notation 𝑀𝑉𝑁𝑗𝑘 to indicate that the components of the multivariate normal distribution
are indexed over values of both 𝑗 and 𝑘, we have:
δ_i ~ MVN_jk(d_i^R, Σ_i)   (2)

where d_i^R is a vector of length NO_i × (NA_i − 1) whose elements are d_ω_ij,t_ik − d_ω_ij,t_i1 (indexed by (j, k) ∈ {1, …, NO_i} × {2, …, NA_i}) and Σ_i is a (NO_i × (NA_i − 1)) × (NO_i × (NA_i − 1)) between-study treatment effects covariance matrix. The diagonal elements of Σ_i are equal to the random-effects variance σ² and the off-diagonal elements are equal to either 0.5σ² (same j, different k), ρ_b σ² (different j, same k), or 0.5ρ_b σ² (different j, different k). If we order the elements of δ_i and d_i^R lexicographically (advancing through values of j first, then values of k starting at k = 2), then (2) can be written:
( δ_i21, …, δ_i2NOi, δ_i31, …, δ_i3NOi, …, δ_iNAi1, …, δ_iNAiNOi )ᵀ
  ~ MVN( ( d_ω_i1,t_i2 − d_ω_i1,t_i1, …, d_ω_iNOi,t_i2 − d_ω_iNOi,t_i1,
           d_ω_i1,t_i3 − d_ω_i1,t_i1, …, d_ω_iNOi,t_i3 − d_ω_iNOi,t_i1,
           …,
           d_ω_i1,t_iNAi − d_ω_i1,t_i1, …, d_ω_iNOi,t_iNAi − d_ω_iNOi,t_i1 )ᵀ , Σ_i )
where Σ_i takes the block form

Σ_i = σ² ( A  B  ⋯  B
           B  A  ⋯  B
           ⋮  ⋮  ⋱  ⋮
           B  B  ⋯  A )

with

A = ( 1    ρ_b  ⋯  ρ_b          B = ( 0.5     0.5ρ_b  ⋯  0.5ρ_b
      ρ_b  1    ⋯  ρ_b                0.5ρ_b  0.5     ⋯  0.5ρ_b
      ⋮    ⋮    ⋱  ⋮                  ⋮       ⋮       ⋱  ⋮
      ρ_b  ρ_b  ⋯  1 )                0.5ρ_b  0.5ρ_b  ⋯  0.5 )

consisting of (NA_i − 1) × (NA_i − 1) sub-matrices with compound symmetry, each of size NO_i × NO_i. (To pick out an element Σ_i[(j1, k1), (j2, k2)], k1 − 1 and k2 − 1 give the row and column coordinates of the relevant sub-matrix and j1 and j2 give the row and column coordinates of the relevant element within the sub-matrix.) The corresponding fixed effect model is obtained by replacing δ_ijk with its mean according to the distribution above.
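This block structure can be assembled in a few lines (an illustrative reconstruction, not the thesis's code; σ², ρ_b and the dimensions are free parameters):

```python
import numpy as np

def between_study_cov(n_outcomes, n_arms, sigma2, rho_b):
    """Build the between-study covariance matrix Sigma_i for a study with
    n_arms arms (arms 2..NA relative to arm 1) and n_outcomes outcomes.

    Entry [(j1,k1),(j2,k2)] equals sigma2 times 1, rho_b, 0.5 or 0.5*rho_b
    according to whether the j and k indices match, ordering (j,k) j-fastest."""
    m = n_arms - 1                                 # number of non-baseline arms
    A = np.full((n_outcomes, n_outcomes), rho_b)   # same-arm block
    np.fill_diagonal(A, 1.0)
    B = 0.5 * A                                    # different-arm block
    blocks = [[A if k1 == k2 else B for k2 in range(m)] for k1 in range(m)]
    return sigma2 * np.block(blocks)

# Example: 3 outcomes, a 3-arm study, sigma^2 = 2, rho_b = 0.4
S = between_study_cov(n_outcomes=3, n_arms=3, sigma2=2.0, rho_b=0.4)
```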
II.4.3 Contrast-level data (Model 1*)
Instead of using the full arm-level data, the model can also be expressed in terms of contrast-level
data relative to the first (baseline) trial arm. In this alternative formulation, the data supplied to the
model consists of the contrasts 𝑦𝑖𝑘𝑗𝑐 = 𝑦𝑖𝑘𝑗 − 𝑦𝑖1𝑗 (𝑘 ∈ 2,… , NA𝑖) together with their estimated
variances, standard deviations or standard errors. This version eliminates the need for the
parameters 𝜇𝑖𝑗 as the mean of 𝑦𝑖𝑘𝑗𝑐 is simply equal to 𝛿𝑖𝑘𝑗, the difference between the means of 𝑦𝑖𝑘𝑗
and 𝑦𝑖1𝑗 in the arm-level model.
It is necessary however to allow for additional correlations between the contrasts in trials with more than two treatment arms: y^c_ik1j and y^c_ik2j will be correlated (for k1, k2 ∈ 2, …, NA_i) since they both depend on y_i1j, the outcome in the baseline arm. It has been shown 114 that the covariance is equal to the sampling variance of y_i1j, given by var(y_i1j)/n_i1. If arms k1, k2 have roughly the same number of patients (i.e. n_ik1 ≈ n_ik2) and the outcome variance is assumed to be equal across trial arms (i.e. var(y_i1j) ≈ var(y_ik1j) ≈ var(y_ik2j)), then the correlation coefficient between the sampling distributions of y^c_ik1j and y^c_ik2j is given by

[var(y_i1j)/n_i1] / [SE(y^c_ik1j) SE(y^c_ik2j)]
≈ [var(y_i1j)/n_i1] / SE(y^c_ikj)²
≈ [var(y_i1j)/n_i1] / [var(y_i1j)/n_i1 + var(y_ikj)/n_ik]
≈ n_ik / (n_i1 + n_ik)   where k = k1 or k2.
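A quick simulation spot-check of this correlation (a sketch under the stated equal-variance assumption; sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = n3 = 200        # equal allocation across three arms
reps = 20_000
var_y = 1.0               # common underlying outcome variance

# Simulate arm-level sample means, then form contrasts against arm 1
y1 = rng.normal(0.0, np.sqrt(var_y / n1), size=reps)
y2 = rng.normal(0.0, np.sqrt(var_y / n2), size=reps)
y3 = rng.normal(0.0, np.sqrt(var_y / n3), size=reps)
c2, c3 = y2 - y1, y3 - y1          # both contrasts share the baseline y1

corr = np.corrcoef(c2, c3)[0, 1]   # theory: n_k / (n_1 + n_k) = 0.5 here
```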
In all other respects the model for contrast-level data is defined identically to the arm-level version,
and the two models give equivalent results. It will not be used in this chapter, but code and data are
available in Appendix B.
II.4.4 BUGS coding via variance decomposition
In many existing Bayesian MCMC software packages (such as WinBUGS and OpenBUGS as used in
this thesis, or JAGS), implementing an indexed multivariate normal distribution with arbitrary
dimensions poses difficulties, and hence all multivariate NMAs to date have to some extent "hard-coded" the model to be specific to the dimensions of the dataset. It is possible, however, to replace the multivariate Normal with a combination of univariate Normals that can be coded in arbitrary dimensions by exploiting the structure of the covariance matrix Σ_i. This technique may not be required if alternative software with more flexible provision for multivariate distributions is used, in which case this section can be skipped and the multivariate distributions specified as already described.
The key insight here is that the correlation coefficient 𝜌 between two Normally distributed variables
A and B (both with variance σ2) can be interpreted as the proportion of A’s variability that is shared
with B (i.e. their covariance), and vice versa. This follows since 𝑐𝑜𝑣(𝐴, 𝐵) = 𝜌𝜎2. The remaining
part of A’s variance is (1 − 𝜌)𝜎2, which it experiences independently of B (and B experiences the
same amount of variability independently of A). If we think of A as a mean plus a Normal random
term,
𝐴 = 𝑚𝑒𝑎𝑛𝐴 + 휀𝐴 휀𝐴 ~ 𝑁(0, 𝜎2)
Then we can rewrite this to partition the random term into two separate independent Normal
variables
𝐴 = 𝑚𝑒𝑎𝑛𝐴 + 휀𝐴∗ + 휀𝐴𝐵 휀𝐴∗ ~ 𝑁(0, (1 − 𝜌)𝜎2), 휀𝐴𝐵 ~ 𝑁(0, 𝜌𝜎
2)
(it should be easy to see that 휀𝐴∗ + 휀𝐴𝐵 has the same distribution as 휀𝐴 due to standard properties
of Normal distributions). B can be written similarly as
𝐵 = 𝑚𝑒𝑎𝑛𝐵 + 휀𝐵∗ + 휀𝐴𝐵 휀𝐵∗ ~ 𝑁(0, (1 − 𝜌)𝜎2)
Chapter II.4
66
Thus 휀𝐴𝐵 explicitly represents the portion of variability that is shared between A and B, and
𝑐𝑜𝑣(𝐴, 𝐵) = 𝑣𝑎𝑟(휀𝐴𝐵) = 𝜌𝜎2.
This approach partitions the variance of A and B into shared and independent components. This
allows the multivariate distribution of A and B to be constructed by specifying the mutually
independent variables 휀𝐴∗, 휀𝐵∗ and 휀𝐴𝐵 together with the correlation coefficient 𝜌.
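A small simulation illustrates the decomposition (a sketch with arbitrary parameter values and means of my choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, rho, reps = 4.0, 0.6, 200_000

# Shared component eps_ab and independent components eps_a, eps_b
eps_ab = rng.normal(0.0, np.sqrt(rho * sigma2), size=reps)
eps_a = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=reps)
eps_b = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=reps)

A = 1.0 + eps_a + eps_ab   # mean_A = 1.0
B = 2.0 + eps_b + eps_ab   # mean_B = 2.0

# Empirically, var(A) ~ sigma2 and corr(A, B) ~ rho
var_a = A.var()
corr_ab = np.corrcoef(A, B)[0, 1]
```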
The multivariate distributions of 𝒚𝑖𝑘 and 𝛅i shown in (1) and (2) have a slightly more complex
correlation structure than the variables A and B in this simple example, but the same technique of
partitioning the variance into shared and independent components can be used to construct the
required multivariate distributions as combinations of mutually independent variables, as shown in
the following subsections.
II.4.4.1.1 Constant non-negative correlation
Assuming for now that ρ_b is constant for all pairs of outcomes and 0 ≤ ρ_b ≤ 1, the definition of Σ_i implies that the variance σ² of δ_ijk can be partitioned into the following components:

(i) 0.5ρ_b σ² is shared as covariance with δ_ij`k` for all j`, k`;
(ii) an additional (ρ_b − 0.5ρ_b)σ² = 0.5ρ_b σ² is shared as covariance with δ_ij`k for all j`;
(iii) an additional (0.5 − 0.5ρ_b)σ² is shared as covariance with δ_ijk` for all k`; and
(iv) a remaining σ² − 0.5ρ_b σ² − (0.5 − 0.5ρ_b)σ² − 0.5ρ_b σ² = 0.5σ² − 0.5ρ_b σ² is unique to δ_ijk.

This allows the multivariate normal distribution to be expressed as a combination of independent univariate normal distributions. (2) is equivalent to

δ_ijk ~ N( d^R_ω_ij,t_ik + E_i + F_ik + G_ij , (0.5 − 0.5ρ_b)σ² )

where E_i ~ N(0, 0.5ρ_b σ²) corresponds to covariance (i), F_ik ~ N(0, (ρ_b − 0.5ρ_b)σ²) corresponds to covariance (ii), G_ij ~ N(0, (0.5 − 0.5ρ_b)σ²) corresponds to covariance (iii), and the remaining variance (0.5 − 0.5ρ_b)σ² is the final component (iv).
Equivalently, one can let 𝐸𝑖, 𝐹𝑖𝑘, 𝐺𝑖𝑗 ~ 𝑁(0,1) and rescale to the appropriate standard deviation (i.e.
multiply by √(0.5ρbσ2), etc) within the definition of 𝛿𝑖𝑗𝑘. This allows the variance σ2 to vary by arm
or by outcome, if desired.
This is a particular example of the result in Theorem 1. The i subscript does not contribute to the
theorem and can be ignored if preferred, but has been left in so that the notation matches the rest
of this chapter.
Theorem 1: If θ_i is a given vector of length NO_i × NA_i (indexed by (j, k) ∈ {1, …, NO_i} × {1, …, NA_i}) and Σ_i is a covariance matrix of size (NO_i × NA_i) × (NO_i × NA_i) where each element Σ_i[(j1, k1), (j2, k2)] is defined as follows:

Σ_i[(j1, k1), (j2, k2)] =
  σ_ijk²                j1 = j2 = j, k1 = k2 = k
  r σ_ijk1 σ_ijk2       j1 = j2 = j, k1 ≠ k2
  ρ σ_ij1k σ_ij2k       j1 ≠ j2, k1 = k2 = k
  rρ σ_ij1k1 σ_ij2k2    j1 ≠ j2, k1 ≠ k2

for j1, j2 ∈ 1, …, NO_i and k1, k2 ∈ 1, …, NA_i, then δ_i ~ MVN(θ_i, Σ_i) is equivalent to

δ_ijk ~ N( θ_ijk + √(rρ) σ_ijk E_i + √(ρ − rρ) σ_ijk F_ik + √(r − rρ) σ_ijk G_ij , (1 − r − ρ + rρ) σ_ijk² )

where E_i, F_ik, G_ij ~ N(0, 1) i.i.d.

(Note that the variance (1 − r − ρ + rρ)σ_ijk² is guaranteed to be nonnegative for r, ρ ∈ [0, 1].)

Proof:

Given the latter specification, it is clear that the marginal distribution of each δ_ijk is Normal with a mean of θ_ijk as required.

Any linear combination of the δ_ijk is also clearly Normal and so the δ_ijk are jointly Normally distributed.

It is therefore necessary simply to verify that the variances and covariances among the δ_ijk are equal to the corresponding elements of Σ_i.

var(δ_ijk) = rρ σ_ijk² + (ρ − rρ) σ_ijk² + (r − rρ) σ_ijk² + (1 − r − ρ + rρ) σ_ijk² = σ_ijk² as required.

cov(δ_ij1k, δ_ij2k) = rρ σ_ij1k σ_ij2k cov(E_i, E_i) + (ρ − rρ) σ_ij1k σ_ij2k cov(F_ik, F_ik) = ρ σ_ij1k σ_ij2k as required.

cov(δ_ijk1, δ_ijk2) = rρ σ_ijk1 σ_ijk2 cov(E_i, E_i) + (r − rρ) σ_ijk1 σ_ijk2 cov(G_ij, G_ij) = r σ_ijk1 σ_ijk2 as required.

cov(δ_ij1k1, δ_ij2k2) = rρ σ_ij1k1 σ_ij2k2 cov(E_i, E_i) = rρ σ_ij1k1 σ_ij2k2 as required.
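The equivalence in Theorem 1 can be spot-checked by simulation (an illustrative sketch with arbitrary r, ρ and σ; two outcomes and two arms for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
r, rho, sigma = 0.5, 0.4, 1.5
n_out, n_arm, reps = 2, 2, 300_000

# Shared standard-Normal components E_i, F_ik, G_ij plus an independent residual
E = rng.normal(size=reps)
F = rng.normal(size=(n_arm, reps))
G = rng.normal(size=(n_out, reps))
resid = rng.normal(size=(n_out, n_arm, reps))

delta = (np.sqrt(r * rho) * sigma * E
         + np.sqrt(rho - r * rho) * sigma * F[None, :, :]
         + np.sqrt(r - r * rho) * sigma * G[:, None, :]
         + np.sqrt(1 - r - rho + r * rho) * sigma * resid)

# Empirical covariances should match the stated structure
same_k = np.cov(delta[0, 0], delta[1, 0])[0, 1]     # theory: rho * sigma^2
same_j = np.cov(delta[0, 0], delta[0, 1])[0, 1]     # theory: r * sigma^2
diff_both = np.cov(delta[0, 0], delta[1, 1])[0, 1]  # theory: r * rho * sigma^2
```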
II.4.4.1.2 Examples of this decomposition in the evidence synthesis model
The theorem above makes it unnecessary to explicitly specify a multivariate normal distribution in
BUGS when writing a multivariate NMA model, provided that the assumed correlation structure can
be taken to hold.
Substituting θ_i = d_i^R, σ_ijk = σ_i, ρ = ρ_b, r = 0.5 gives the distribution of the study-specific treatment effects δ_ijk.

Substituting θ_i = μ_i + δ_ik, σ_ijk = √v_ikj, ρ = ρ_w, r = 0 gives the distribution of the observed outcomes y_ikj in the model for arm-level data.

Substituting θ_i = μ_i + δ_ik, σ_ijk = √v_ikj, ρ = ρ_w, r = n_ik/(n_i1 + n_ik) gives the distribution of the observed outcomes y^c_ikj in the model for contrast-level data, under the assumption that n_ik is broadly equal for all k ∈ 2, …, NA_i and that the underlying outcome variance is broadly equal in all treatment arms.
II.4.4.1.3 A more general correlation structure
The construction of the random effects multivariate normal distribution used above assumes a
universal non-negative correlation coefficient between all outcome pairs. By adapting the
construction slightly, it is possible to incorporate a broader class of covariance structures, with
correlation coefficients that can vary (both in sign and magnitude) between outcome pairs. The
universal correlation coefficient ρ (at the within- or between-study level) is replaced by a vector ρ = (ρ_1, …, ρ_NO) across the set of outcomes, with a somewhat altered interpretation, as detailed in
Theorem 2 below. Again, the i subscript does not contribute to the theorem and can be ignored if
preferred, but has been left in so that the notation matches the rest of this chapter.
Theorem 2:
Let ρ = (ρ_1, …, ρ_NO) be a vector whose elements lie in the interval [−1, 1], and let

δ_ijk ~ N( θ_ijk + sign(ρ_j)√(r|ρ_j|) σ_ijk E_i + sign(ρ_j)√(|ρ_j| − r|ρ_j|) σ_ijk F_ik + sign(ρ_j)√(r − r|ρ_j|) σ_ijk G_ij , (1 − r − |ρ_j| + r|ρ_j|) σ_ijk² )

where E_i, F_ik, G_ij ~ N(0, 1) i.i.d. and sign(x) = x/|x|,

noting that the variance (1 − r − |ρ_j| + r|ρ_j|)σ_ijk² is still guaranteed to be nonnegative for ρ_j ∈ [−1, 1], r ∈ [0, 1].
This is equivalent to the multivariate normal distribution δ_i ~ MVN(θ_i, Σ_i) where the covariance matrix Σ_i is now defined as follows:

Σ_i[(j1, k1), (j2, k2)] =
  σ_ijk²                                                    j1 = j2 = j, k1 = k2 = k
  r σ_ijk1 σ_ijk2                                           j1 = j2 = j, k1 ≠ k2
  sign(ρ_j1) sign(ρ_j2) √(|ρ_j1 ρ_j2|) σ_ij1k σ_ij2k        j1 ≠ j2, k1 = k2 = k
  sign(ρ_j1) sign(ρ_j2) r √(|ρ_j1 ρ_j2|) σ_ij1k1 σ_ij2k2    j1 ≠ j2, k1 ≠ k2
Proof: again it is sufficient to verify the elements of the covariance matrix.
var(δ_ijk) = r|ρ_j| σ_ijk² + (|ρ_j| − r|ρ_j|) σ_ijk² + (r − r|ρ_j|) σ_ijk² + (1 − r − |ρ_j| + r|ρ_j|) σ_ijk² = σ_ijk² as required.

cov(δ_ij1k, δ_ij2k) = sign(ρ_j1) sign(ρ_j2) r √(|ρ_j1 ρ_j2|) σ_ij1k σ_ij2k + sign(ρ_j1) sign(ρ_j2) √((|ρ_j1| − r|ρ_j1|)(|ρ_j2| − r|ρ_j2|)) σ_ij1k σ_ij2k
= sign(ρ_j1) sign(ρ_j2) σ_ij1k σ_ij2k ( r√(|ρ_j1 ρ_j2|) + (1 − r)√(|ρ_j1 ρ_j2|) )
= sign(ρ_j1) sign(ρ_j2) √(|ρ_j1 ρ_j2|) σ_ij1k σ_ij2k as required.

cov(δ_ijk1, δ_ijk2) = r|ρ_j| σ_ijk1 σ_ijk2 + (r − r|ρ_j|) σ_ijk1 σ_ijk2 = r σ_ijk1 σ_ijk2 as required.

cov(δ_ij1k1, δ_ij2k2) = sign(ρ_j1) sign(ρ_j2) r √(|ρ_j1 ρ_j2|) σ_ij1k1 σ_ij2k2 as required.
In this version of the model, the correlation between outcomes j1 and j2 in the same trial arm is equal to sign(ρ_j1) sign(ρ_j2) √(|ρ_j1 ρ_j2|). The parameter ρ_j is no longer strictly a correlation coefficient,
but can be thought of as the propensity of outcome j to correlate with other outcomes. In terms of
magnitude the correlation between elements 𝛿𝑖𝑗1𝑘 and 𝛿𝑖𝑗2𝑘 is the geometric mean of the
correlation propensities ρ𝑗1 and ρ𝑗2 , with positive sign if the signs of ρ𝑗1 and ρ𝑗2match, and negative
if they do not. This results in a class of correlation structures that have a particular kind of symmetry
in the sense that each outcome blindly shares its “correlation propensity” with all other outcomes
equally, with any difference in the correlation coefficients being due to their own respective
propensities; an outcome cannot selectively favour any particular others for correlation. In
particular, if outcomes j1 and j2 are uncorrelated, then either ρ𝑗1 = 0 or ρ𝑗2 = 0, and so at least one
of them must be uncorrelated with every outcome in the model.
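A sketch of this signed structure (illustrative; the propensity values are made up, and the single-arm case r = 0 is used so only the within-arm component is shared):

```python
import numpy as np

rng = np.random.default_rng(3)
rho = np.array([0.8, -0.5, 0.3])   # per-outcome correlation propensities
reps = 300_000

# Same-arm case: one shared component F plus independent residuals, unit variances
F = rng.normal(size=reps)
resid = rng.normal(size=(3, reps))

sign = np.sign(rho)
delta = ((sign * np.sqrt(np.abs(rho)))[:, None] * F
         + np.sqrt(1 - np.abs(rho))[:, None] * resid)

# corr(delta_j1, delta_j2) = sign(rho_j1) sign(rho_j2) sqrt(|rho_j1 rho_j2|)
c01 = np.corrcoef(delta[0], delta[1])[0, 1]   # theory: -sqrt(0.40)
c02 = np.corrcoef(delta[0], delta[2])[0, 1]   # theory: +sqrt(0.24)
```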
This structure permits the use of negative correlations between certain outcomes if desired, but
does not allow all outcomes to be negatively correlated with one another (except in the case with
only two outcomes); rather, the outcomes are partitioned according to the sign of ρ𝑗 into two sets
with positive intra-set correlations and negative inter-set correlations. This is expected to be flexible
enough for most purposes: the scenario where all pairs of outcomes are negatively correlated seems
somewhat improbable, and in any case it is always possible to reverse the sign of an outcome
variable and hence also reverse its correlations, if desired.
Although this structure places restrictions on the space of possible covariance matrices, one
advantage of this is that it is a sufficient (but not necessary) condition for the covariance matrix to be
positive-definite, which is a fundamental requirement of a multivariate Normal distribution. This
result is formalised in the theorem below. Once again, the i subscript plays no part in the theorem
and can be ignored if preferred, but has been left in so that the notation matches the rest of this
chapter.
Theorem 3: A matrix that can be written in the form 𝒊 as defined in Theorem 2 is always positive-
definite, but the converse does not hold.
Proof: First, suppose that we have a matrix Σ_i as defined.

For clarity, drop the subscript i and define R_j = sign(ρ_j)√(|ρ_j|). Note that R_j² = |ρ_j|.

The elements of the multivariate normal distribution are indexed by pairs (j, k) ∈ {1, …, NO} × {1, …, NA}. Ordering these lexicographically (advancing through values of j first, then values of k) gives the following form for Σ:
Σ consists of NA × NA sub-matrices, each of size NO × NO. The diagonal (k, k) sub-matrix has diagonal elements σ_jk² and off-diagonal (j1, j2) elements R_j1 R_j2 σ_j1k σ_j2k; the off-diagonal (k1, k2) sub-matrix (k1 ≠ k2) has diagonal elements r σ_jk1 σ_jk2 and off-diagonal (j1, j2) elements r R_j1 R_j2 σ_j1k1 σ_j2k2. For example, with NO = 2 and NA = 2,

Σ = ( σ_11²              R_1R_2 σ_11σ_21    r σ_11σ_12         r R_1R_2 σ_11σ_22
      R_1R_2 σ_11σ_21    σ_21²              r R_1R_2 σ_21σ_12  r σ_21σ_22
      r σ_11σ_12         r R_1R_2 σ_21σ_12  σ_12²              R_1R_2 σ_12σ_22
      r R_1R_2 σ_11σ_22  r σ_21σ_22         R_1R_2 σ_12σ_22    σ_22² )

(To pick out an element Σ[(j1, k1), (j2, k2)], k1 and k2 give the row and column coordinates of the relevant sub-matrix and j1 and j2 give the row and column coordinates of the relevant element within the sub-matrix.)
To prove that Σ is positive-definite, it is necessary to show that xᵀΣx > 0 for any non-zero column vector x of length NO × NA. Again this is written as x_jk, (j, k) ∈ {1, …, NO} × {1, …, NA}.

It is possible to directly evaluate xᵀΣx:

xᵀΣx = ∑_{k=1}^{NA} ∑_{j=1}^{NO} x_jk ( σ_jk² x_jk + ∑_{s≠j} R_j R_s σ_jk σ_sk x_sk + r ∑_{t≠k} σ_jk σ_jt x_jt + r ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_st )

then rearrange and complete squares:

= ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} σ_jk² x_jk² + ∑_{j=1}^{NO} ∑_{s≠j} R_j R_s σ_jk σ_sk x_jk x_sk ) + ∑_{k=1}^{NA} ∑_{j=1}^{NO} ( r ∑_{t≠k} σ_jk σ_jt x_jk x_jt + r ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st )

= ∑_{k=1}^{NA} [ ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{j=1}^{NO} (1 − |ρ_j|) σ_jk² x_jk² ] + r ∑_{j=1}^{NO} ∑_{k=1}^{NA} ( ∑_{t≠k} σ_jk σ_jt x_jk x_jt + ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st )

= ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{k=1}^{NA} ∑_{j=1}^{NO} (1 − r − |ρ_j|) σ_jk² x_jk² + r ∑_{j=1}^{NO} ( ∑_{k=1}^{NA} σ_jk x_jk )² + r ∑_{j=1}^{NO} ∑_{k=1}^{NA} ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st   (3)

Now observe that ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² expands as follows:

( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² = ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² + ∑_{j=1}^{NO} |ρ_j| ∑_{k=1}^{NA} ∑_{t≠k} σ_jk σ_jt x_jk x_jt + ∑_{k=1}^{NA} ∑_{j=1}^{NO} ∑_{s≠j} R_j R_s σ_jk σ_sk x_jk x_sk + ∑_{j=1}^{NO} ∑_{k=1}^{NA} ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st

Rearranging and completing squares again gives

∑_{j=1}^{NO} ∑_{k=1}^{NA} ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st
= ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² − ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² − ∑_{j=1}^{NO} |ρ_j| [ ( ∑_{k=1}^{NA} σ_jk x_jk )² − ∑_{k=1}^{NA} σ_jk² x_jk² ] − ∑_{k=1}^{NA} [ ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² − ∑_{j=1}^{NO} |ρ_j| σ_jk² x_jk² ]
= ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² − ∑_{j=1}^{NO} |ρ_j| ( ∑_{k=1}^{NA} σ_jk x_jk )² + ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² − ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )²

Substituting this into (3) gives

xᵀΣx = ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{k=1}^{NA} ∑_{j=1}^{NO} (1 − r − |ρ_j|) σ_jk² x_jk² + r ∑_{j=1}^{NO} ( ∑_{k=1}^{NA} σ_jk x_jk )² + r ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² − r ∑_{j=1}^{NO} |ρ_j| ( ∑_{k=1}^{NA} σ_jk x_jk )² + r ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² − r ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )²

= r ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² + r ∑_{j=1}^{NO} (1 − |ρ_j|) ( ∑_{k=1}^{NA} σ_jk x_jk )² + (1 − r) ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{k=1}^{NA} ∑_{j=1}^{NO} (1 − r − |ρ_j| + r|ρ_j|) σ_jk² x_jk²

For r ∈ [0, 1] every term in the final expression is guaranteed to be nonnegative, and at least one term must be strictly positive. Therefore xᵀΣx > 0 and Σ is positive-definite.
To show that the converse does not hold, it is sufficient to provide a counterexample. One possible positive-definite correlation matrix that does not conform to the correlation structure described above is

A = (  1    −0.5   0
      −0.5   1    −0.5
       0    −0.5   1  )

with NO = 3, NA = 1.
If 𝑨 could be expressed with the parameters described above we would have 𝑅1𝑅3 = 0 and 𝑅1𝑅2 =
𝑅2𝑅3 = −0.5 which cannot simultaneously hold.
A is positive-definite since

(x1, x2, x3) A (x1, x2, x3)ᵀ = x1² + x2² + x3² − x1x2 − x2x3 = (x1 − x2/2)² + (x3 − x2/2)² + x2²/2 > 0 unless x1 = x2 = x3 = 0.
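The counterexample is easy to confirm numerically (a quick check of my own, not part of the thesis):

```python
import numpy as np

# The tridiagonal correlation matrix from the counterexample
A = np.array([
    [ 1.0, -0.5,  0.0],
    [-0.5,  1.0, -0.5],
    [ 0.0, -0.5,  1.0],
])

# All eigenvalues strictly positive => A is positive-definite
eigvals = np.linalg.eigvalsh(A)
```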
II.4.5 Fixed baseline (Model 2)
Although the model above allows correlations between outcomes to be incorporated, the existence
of active-active trials within the dataset means there is potentially a problem with the model,
stemming from the definitions of 𝜇 and 𝛿.
Within study i, the mean value of outcome j in the first (or “baseline”) arm (𝑘 = 1) is given by 𝜇𝑖𝑗
and in all other arms (𝑘 > 1) is given by 𝜇𝑖𝑗 + 𝛿𝑖𝑗𝑘 . In other words, the parameterisation is
asymmetrical across trial arms, with the mean outcome having higher prior variability for 𝑘 > 1 than
for 𝑘 = 1 (since 𝜇 and 𝛿 are assumed independent). Usually 𝑘 = 1 represents placebo or no
treatment, and the asymmetry perhaps makes intuitive sense, but in trials without a placebo arm
𝑘 = 1 represents the outcome on an arbitrarily chosen active treatment and the asymmetry is
undesirable.
This issue has been noted elsewhere45 but the solutions put forward by the authors do not appear to result in satisfactory models, as explained below.
Since it is only the prior variance of the arm-level outcomes that is affected by this issue with the
model structure, and the main target of inference is the relative effects, this issue may not usually be
of great concern. Still, it may be possible to avoid it altogether by a simple change in the
parameterisation so that the treatment effects are expressed relative to a fixed baseline of placebo /
no treatment in every trial. In other words, redefine 𝛿𝑖𝑗𝑘 = 0 for 𝑡𝑖𝑘 = 1. For 𝑘 > 1, in the random
effects model, the 𝛿𝑖𝑗𝑘 are jointly described by the following distribution:
\[
\boldsymbol{\delta}_i \sim MVN(\boldsymbol{d}_i,\, \boldsymbol{\Sigma}_i)
\]

where \(\boldsymbol{d}_i\) is a vector of length \(N_{O_i} \times N_{A_i}\) whose elements are \(d_{\omega_{ij} t_{ik}}\) (indexed by \((j, k) \in \{1, \dots, N_{O_i}\} \times \{1, \dots, N_{A_i}\}\)) and \(\boldsymbol{\Sigma}_i\) is of dimension \((N_{O_i} \times N_{A_i}) \times (N_{O_i} \times N_{A_i})\) but otherwise defined as before. The between-arm correlations now apply in any trial with at least two active treatments (rather than only in multi-arm trials as before). The diagonal elements of \(\boldsymbol{\Sigma}_i\) are equal to the random-effects variance \(\sigma^2\) and the off-diagonal elements are equal to either \(0.5\sigma^2\) (same \(j\), different \(k\)), \(\rho_b \sigma^2\) (different \(j\), same \(k\)), or \(0.5\rho_b \sigma^2\) (different \(j\), different \(k\)). If we order the elements of \(\boldsymbol{\delta}_i\) and \(\boldsymbol{d}_i\) lexicographically (advancing through values of \(j\) first, then values of \(k\)), then the distribution can be written
\[
\begin{pmatrix}
\delta_{i11} \\ \vdots \\ \delta_{i1N_{O_i}} \\ \delta_{i21} \\ \vdots \\ \delta_{i2N_{O_i}} \\ \vdots \\ \delta_{iN_{A_i}1} \\ \vdots \\ \delta_{iN_{A_i}N_{O_i}}
\end{pmatrix}
\sim MVN\!\left(
\begin{pmatrix}
d_{\omega_{i1}t_{i1}} \\ \vdots \\ d_{\omega_{iN_{O_i}}t_{i1}} \\ d_{\omega_{i1}t_{i2}} \\ \vdots \\ d_{\omega_{iN_{O_i}}t_{i2}} \\ \vdots \\ d_{\omega_{i1}t_{iN_{A_i}}} \\ \vdots \\ d_{\omega_{iN_{O_i}}t_{iN_{A_i}}}
\end{pmatrix}
,\; \boldsymbol{\Sigma}_i \right)
\]
where \(\boldsymbol{\Sigma}_i\) takes the form

\[
\boldsymbol{\Sigma}_i = \sigma^2
\begin{pmatrix}
\mathbf{P} & \tfrac{1}{2}\mathbf{P} & \cdots & \tfrac{1}{2}\mathbf{P} \\
\tfrac{1}{2}\mathbf{P} & \mathbf{P} & \cdots & \tfrac{1}{2}\mathbf{P} \\
\vdots & \vdots & \ddots & \vdots \\
\tfrac{1}{2}\mathbf{P} & \tfrac{1}{2}\mathbf{P} & \cdots & \mathbf{P}
\end{pmatrix},
\qquad
\mathbf{P} =
\begin{pmatrix}
1 & \rho_b & \cdots & \rho_b \\
\rho_b & 1 & \cdots & \rho_b \\
\vdots & \vdots & \ddots & \vdots \\
\rho_b & \rho_b & \cdots & 1
\end{pmatrix}
\]

consisting of \(N_{A_i} \times N_{A_i}\) sub-matrices, each of size \(N_{O_i} \times N_{O_i}\): the diagonal sub-matrices are \(\mathbf{P}\) and the off-diagonal sub-matrices are \(\tfrac{1}{2}\mathbf{P}\). To pick out an element \(\boldsymbol{\Sigma}_i[(j_1,k_1),(j_2,k_2)]\), \(k_1\) and \(k_2\) give the row and column coordinates of the relevant sub-matrix and \(j_1\) and \(j_2\) give the row and column coordinates of the relevant element within the sub-matrix.
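The same covariance structure can be assembled programmatically from the four element-wise rules (an illustrative sketch with example values; variable names here are my own, not thesis code):

```python
# Illustrative construction of the Model 2 between-study covariance matrix,
# following the block structure described in the text.
NO, NA = 3, 2      # outcomes per study, arms per study (example values)
sigma2 = 0.25      # random-effects variance sigma^2 (chosen for exact arithmetic)
rho_b = 0.5        # between-outcome correlation

def cov_element(j1, k1, j2, k2):
    """Covariance between delta_{i,j1,k1} and delta_{i,j2,k2} per the stated rules."""
    arm = 1.0 if k1 == k2 else 0.5        # halved across different arms
    out = 1.0 if j1 == j2 else rho_b      # scaled by rho_b across different outcomes
    return sigma2 * arm * out

# Assemble the full (NO*NA) x (NO*NA) matrix, advancing through j within each k
Sigma = [[cov_element(j1, k1, j2, k2)
          for k2 in range(NA) for j2 in range(NO)]
         for k1 in range(NA) for j1 in range(NO)]

assert Sigma[0][0] == sigma2                      # same j, same k
assert Sigma[0][1] == rho_b * sigma2              # different j, same k
assert Sigma[0][NO] == 0.5 * sigma2               # same j, different k
assert Sigma[0][NO + 1] == 0.5 * rho_b * sigma2   # different j, different k
```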
A first attempt at accommodating this revised definition of \(\boldsymbol{\delta}_i\) might simply be to redefine \(\mu_{ij}\) to be the mean value of outcome \(j\) in study \(i\) on placebo (and indeed this model is suggested by Hong et al45). However, \(\mu_{ij}\) thus defined cannot be estimated from the data for trials without a placebo arm, which results in a model that does not converge.

Instead, redefine \(\mu_{ij}\) to be the average value of outcome \(j\) across all arms of study \(i\). This quantity is readily identifiable from the arm-level data. Then replace (1) with

\[
\boldsymbol{y}_{ik} \sim MVN\!\left(\boldsymbol{\mu}_i + \boldsymbol{\delta}_{ik} - \frac{1}{N_{A_i}}\sum_{m=1}^{N_{A_i}} \boldsymbol{\delta}_{im},\; \mathbf{CV}_{ik}\right)
\]

It is important to note, however, that this parameterisation can only be used for arm-level, not contrast-level, data.
One can interpret this model in two ways:

• \(\boldsymbol{\delta}_{ik} - \frac{1}{N_{A_i}}\sum_{m=1}^{N_{A_i}} \boldsymbol{\delta}_{im}\) is the treatment effect vector for arm \(k\) relative to the outcome vector in the “average” arm, whose value is given by \(\boldsymbol{\mu}_i\); or

• \(\boldsymbol{\delta}_{ik}\) is the treatment effect vector for arm \(k\) relative to the outcome on placebo, whose value is given by \(\boldsymbol{\mu}_i - \frac{1}{N_{A_i}}\sum_{m=1}^{N_{A_i}} \boldsymbol{\delta}_{im}\)
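A small numerical sketch (my own illustration, with made-up numbers) shows why this parameterisation is identifiable from arm-level data: the arm means constructed this way always average back to \(\mu\), so \(\mu\) is pinned down even when no placebo arm is observed:

```python
# Hypothetical study with three active arms and one outcome (illustrative values)
mu = 1.8                    # study-level average outcome across arms
delta = [0.4, -0.1, 0.6]    # treatment effects for the three arms
NA = len(delta)

dbar = sum(delta) / NA
arm_means = [mu + d - dbar for d in delta]  # mean outcome in each arm under Model 2

# The arm means average back to mu, so mu is identified from arm-level data alone
assert abs(sum(arm_means) / NA - mu) < 1e-12

# Equivalently, delta_k is the effect relative to an implied placebo level mu - dbar
implied_placebo = mu - dbar
assert all(abs(am - (implied_placebo + d)) < 1e-12
           for am, d in zip(arm_means, delta))
```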
This parameterisation for 𝜇 and 𝛿 has previously been used in univariate NMAs60 but not in the multivariate setting. Under this model the prior outcome variance is better behaved, in that it is always the placebo arms that have the lower variance, while active treatments are treated symmetrically. An active treatment, not just placebo, can be used as the baseline if desired, but because of these properties a placebo baseline is to be preferred whenever possible.
Again it is straightforward to obtain the corresponding fixed effect model by replacing 𝛿𝑖𝑗𝑘 with its
mean according to the distribution above.
II.4.6 Mappings (Model 3)
The models described so far do not address the issues of patchy data. In patchy networks, outcomes
may be completely missing for some treatments, or studies may adopt different clinical thresholds
or definitions. The way I am proposing to address these issues is to provide structural links between
the mean treatment effects 𝑑𝜔𝑡 via proportional mappings between different outcomes. Specifically,
the treatment effect parameters for pairs of outcomes 1, 2 will be linked by equations of the form
𝑑𝜔1𝑡 = 𝑏𝑑𝜔2𝑡 for some constant 𝑏; in other words, linear mappings with no intercept. The mapping
parameters 𝑏 are to be estimated based on the ratios 𝑑𝜔1𝑡/𝑑𝜔2𝑡 of treatment effects estimated
from the trial data.
The concept can be illustrated using the RRMS case study. Table 7 shows the posterior mean effect
size (relative to placebo) for every treatment and outcome that could be estimated by Model 2.
Table 7 – Posterior mean effect estimates from Model 2.
Outcome
Treatment ARR RFP DP3 DP6 ALT ALT3 ALT5
PL 0 0 0 0 0 0 0
DF -0.70 -0.71 -0.53 -0.47 0.25 0.23 -0.53
FM -0.73 -0.92 -0.30 -0.41 1.37 1.30 0.64
GA -0.38 -0.63 -0.18 -0.08 -0.12 0.32 -0.11
IA (IM) -0.20 -0.38 -0.30 -0.44 0.63 0.21 -0.33
IA (SC) 0.03 -0.80 -0.65 0.48 1.61
IBB -0.41 -0.77 -0.11 -1.51 1.11
LQ -0.20 -0.33 -0.39 -0.48 0.74 0.84 -0.34
TF -0.38 -0.43 -0.40 0.88 0.00
The missing entries in the table are the treatment-outcome combinations that were not reported in
the data. The mapping-based model aims to fill in these gaps by, essentially, observing the ratios
between the column entries within other rows and applying the average ratios to the rows with
missing entries to impute the values. At the same time it will smooth the observed values somewhat
to better fit the average mapping ratios.
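The gap-filling idea can be sketched numerically (a toy example with invented effect estimates; the thesis model estimates the mapping jointly within the Bayesian NMA rather than by averaging ratios):

```python
# Toy data: treatment effects on two related outcomes; treatment D lacks outcome 2
effects_outcome1 = {"A": -0.70, "B": -0.50, "C": -0.40, "D": -0.60}
effects_outcome2 = {"A": -0.35, "B": -0.26, "C": -0.19}  # D unobserved

# Estimate the proportional mapping b from treatments observed on both outcomes
ratios = [effects_outcome2[t] / effects_outcome1[t] for t in effects_outcome2]
b = sum(ratios) / len(ratios)

# Impute the missing entry via d_{2,D} = b * d_{1,D}
imputed_D = b * effects_outcome1["D"]
assert imputed_D < 0  # the imputed effect inherits the sign of d_{1,D}
```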
Introducing mappings into the model has appealing potential benefits in patchy networks:
• Estimation of treatment effect parameters for outcomes that are not reported in the data
by mapping from other outcomes that are reported. This facilitates the estimation of a
standard set of outcomes for each treatment to take forward for decision making, when the
outcomes as reported are not standardised across treatments.
• Increasing the extent of “borrowing strength” between closely related outcomes. The mappings will tend to smooth the results between outcomes that are mapped to one another, which may be helpful in some situations, such as when there is any uncertainty over which version of an outcome to take forward in a decision analysis. Choosing any one outcome risks discarding valuable information if mappings are not used, but with mappings in place, the results from any one outcome will automatically be influenced by trends in those with which it is mapped, so no data is truly discarded no matter which outcome is chosen, and in a sense the choice becomes less critical.
The mappings could take a number of forms but, in the absence of any specific hypotheses on the relationships between outcomes, proportional mappings appear an obvious and straightforward starting point. It is logical not to include an additive/intercept term, as a null treatment such as placebo would have no effect on any outcome (as per the first row of Table 7).
The approach is similar to one employed by Lu et al and Ades et al 46,47, but their mappings were
applied to the study-specific random effects 𝜹. Moreover, their mappings were only used for
outcomes that measured the same underlying clinical concept, and hence could safely be assumed
to occur in linear proportion to one another. Here I am proposing to use the mappings to link
different but related clinical concepts, and as such the assumption that the treatment effects on
different outcomes occur in consistent proportions regardless of treatment is somewhat stronger.
In a sense this strong assumption is the price that must be paid for the additional inferences the
mappings make available. In order to verify that the proportionality assumption holds for the RRMS
dataset, Appendix A contains two-way plots of the posterior mean treatment effect estimates 𝑑𝜔𝑡
from a univariate NMA model for each outcome. Overall the plots appear to correspond reasonably
well to straight lines through the origin, suggesting that the estimates do indeed occur in fairly
constant proportions, although there is some spread around the apparent trend lines and a few
outliers.
Recall from II.1.1 that, since treatment contrasts are transitive (i.e. AC = AB + BC for treatments A, B
and C, where AB is the contrast comparing B to A, etc.), it is necessary to parameterise the model in
a way that gives estimates which are consistent with regard to transitivity. For this reason only the
basic treatment effect parameters 𝑑𝜔𝑡 (comparing each treatment t > 1 to the reference treatment
1) were independently defined in the model; the remaining treatment effects are found via the
consistency equations (for example, the effect for t2 relative to t1 is calculated as 𝑑𝜔𝑡2 − 𝑑𝜔𝑡1 ). An
analogous situation arises with the mapping ratios, which are also transitive (on a multiplicative
scale) in the sense that the ratio 𝑑𝜔3𝑡 / 𝑑𝜔1𝑡 is the product of the ratios 𝑑𝜔3𝑡 / 𝑑𝜔2𝑡 and
𝑑𝜔2𝑡 / 𝑑𝜔1𝑡 . Accordingly the model should also exhibit consistency of mappings, and so rather than
defining a mapping parameter for every pair of outcomes, only the basic mapping parameters 𝑏𝜔𝑡
(the ratio 𝑑𝜔𝑡 / 𝑑1𝑡 between the effects on outcome ω > 1 and outcome 1) are independently
defined in the model, leaving the remaining mapping ratios between outcomes ω2 and ω1 to be
calculated as 𝑏𝜔2𝑡/𝑏𝜔1𝑡 if required.
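Both kinds of consistency can be illustrated with a few lines of arithmetic (my own sketch with arbitrary numbers):

```python
# Basic parameters relative to reference treatment 1 and reference outcome w1
d = {("w1", "t2"): -0.8, ("w1", "t3"): -0.5}  # basic treatment effects on outcome w1
b = {"w2": 0.6, "w3": 1.5}                     # basic mapping parameters vs outcome w1

# Transitivity of contrasts: effect of t3 relative to t2 on outcome w1
d_t3_vs_t2 = d[("w1", "t3")] - d[("w1", "t2")]
assert abs(d_t3_vs_t2 - 0.3) < 1e-9

# Transitivity of mappings: the ratio between outcomes w3 and w2 is b_w3 / b_w2
b_w3_vs_w2 = b["w3"] / b["w2"]
assert abs(b_w3_vs_w2 - 2.5) < 1e-9
```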
Thus, using fixed mappings for all treatments: for ω > 1, the mapping equation for each treatment t is specified in the model as 𝑑𝜔𝑡 = 𝑏𝜔𝑑1𝑡, where 𝑏𝜔 maps the treatment’s effect on outcome 1 to its effect on outcome ω.
In the same way as Lu et al and Ades et al46,47, only the absolute value of the mappings is allowed to be random: the sign of each treatment effect 𝑑𝜔𝑡 is assumed to be known a priori for each outcome and is taken to be the same for all treatments. The reference treatment is usually a placebo/no treatment option and therefore specifying the signs of the treatment effects in advance should not be too controversial, at least for treatments that have already cleared early phase or pivotal trials, since as a rule these will have a nonnegative effect on efficacy and a negative effect on safety.
For the “fixed-mapping” model as described, the mappings are identical for all treatments, and thus the average treatment effects 𝑑𝜔𝑡 are kept in strict proportion. The strength of the proportionality assumption can be relaxed somewhat by use of a “random-mapping” model, where mappings are allowed to vary between treatments (but they remain similar in the sense that they are drawn from the same distribution, and always respect the known signs of the treatment effects). For ω > 1, the mapping equation for treatment t is 𝑑𝜔𝑡 = 𝛽𝜔𝑡𝑑1𝑡, where 𝛽𝜔𝑡 maps the treatment’s effect on outcome 1 to its effect on outcome ω.
It is convenient to define the distribution of the random mappings on the logarithmic scale – i.e. by assigning a distribution to log(𝛽𝜔𝑡) – for two reasons. The first is that constant variability of mappings on the log scale corresponds to linear mappings with variability proportional to their magnitude, which fits well with the multiplicative nature of the mappings and prevents the lower tail of the distribution from straying into negative territory.
The second reason for the log transformation relates to correlations between mappings. For outcomes ω1, ω2 > 1, the estimated mappings 𝛽𝜔1𝑡 and 𝛽𝜔2𝑡 will be correlated across treatments since, for a given treatment, they are estimated by the absolute ratios |𝑑𝜔1𝑡/𝑑1𝑡| and |𝑑𝜔2𝑡/𝑑1𝑡| respectively, sharing a common denominator 𝑑1𝑡 (which is essentially a weighted average of random-effects estimates 𝛿, in turn estimated as linear differences between observed data points). These correlations are much more easily expressed on the logarithmic scale, which replaces the ratios with linear differences, i.e. log(𝛽𝜔𝑡) = log(|𝑑𝜔𝑡|) − log(|𝑑1𝑡|). Even so, the correlation coefficients cannot be specified in advance with any accuracy, as they derive from the relative variances of the estimates log(|𝑑𝜔𝑡|), and hence depend not only on the network structure but also on the magnitude of 𝑑𝜔𝑡. However, under the assumption that log(|𝑑𝜔𝑡|) is of equal variance for different values of ω, the correlation between log(𝛽𝜔1𝑡) and log(𝛽𝜔2𝑡) will on average be 0.5, and this seems a reasonable starting assumption.
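The 0.5 figure arises because the two log mappings share the term −log|𝑑1𝑡|; a quick simulation under the equal-variance assumption confirms it (an illustrative sketch, not thesis code):

```python
import random
random.seed(42)

# Independent, equal-variance "estimates" of log|d_w| for outcomes 1, w1 and w2
n = 100_000
log_d1 = [random.gauss(0, 1) for _ in range(n)]
log_dw1 = [random.gauss(0, 1) for _ in range(n)]
log_dw2 = [random.gauss(0, 1) for _ in range(n)]

# log(beta_w) = log|d_w| - log|d_1|, sharing the common denominator term
log_b1 = [a - c for a, c in zip(log_dw1, log_d1)]
log_b2 = [a - c for a, c in zip(log_dw2, log_d1)]

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    vu = sum((a - mu) ** 2 for a in u)
    vv = sum((b - mv) ** 2 for b in v)
    return cov / (vu * vv) ** 0.5

r = corr(log_b1, log_b2)
assert 0.48 < r < 0.52  # theoretical value is 0.5
```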
The random mapping distribution is therefore defined as log(𝜷𝒕) ~ 𝑀𝑉𝑁(log(𝒃), 𝑸), where 𝒃 = (𝑏2, 𝑏3, …, 𝑏𝑁𝑂) is the vector of average mappings and 𝑸 is a covariance matrix with diagonal terms equal to the between-treatment mapping variance 𝜎𝑚𝑎𝑝² and off-diagonal terms equal to 0.5𝜎𝑚𝑎𝑝²:

\[
\begin{pmatrix} \log(\beta_{2t}) \\ \log(\beta_{3t}) \\ \vdots \\ \log(\beta_{N_O t}) \end{pmatrix}
\sim MVN\!\left(
\begin{pmatrix} \log(b_2) \\ \log(b_3) \\ \vdots \\ \log(b_{N_O}) \end{pmatrix},\;
\sigma_{map}^2
\begin{pmatrix}
1 & 0.5 & \cdots & 0.5 \\
0.5 & 1 & \cdots & 0.5 \\
\vdots & \vdots & \ddots & \vdots \\
0.5 & 0.5 & \cdots & 1
\end{pmatrix}
\right)
\]
As the assumption of proportionality may be considered too strong to apply universally across a
given set of outcomes, the mappings can be applied only within certain subsets of outcomes that are
especially closely related, rather than between all outcomes simultaneously. A number of mapping
schemes have been evaluated within the RRMS case study, as follows:
• One-group model: all outcomes grouped together.
• Two-group model: all efficacy outcomes grouped together, all liver-safety outcomes grouped
together.
• Three-group model: both relapse outcomes grouped together, both disability progression
outcomes grouped together, all liver-safety outcomes grouped together (groups correspond
to the green cells in Figure 10).
• No mappings (alternatively, this can be thought of as a model with a group for each
outcome).
The groupings apply only to the mappings and do not impose any restrictions on the within- or
between-study correlations.
II.4.7 Outcomes with zeroes (Models 4a and 4b)
When correlations were introduced to the models above, it was convenient to drop three binary
outcomes from the decision set (serious gastrointestinal events, serious bradycardia and macular
edema) as, due to the presence of zero rates in the data, they could not be expressed on a scale
suitable for modelling with a multivariate normal distribution. This section revisits these outcomes
and considers how they can be included in the multivariate model.
Many of the measurement scales typically used for binary outcomes cannot cope well with proportions of zero (or, equivalently, 100%). Odds (and therefore odds ratios) cannot be defined for a study arm with zero observed events for a given outcome, and the log relative risk comparing study arms is likewise undefined in one direction or the other.
For a binary outcome in a benefit-risk assessment, any observed zeroes will usually occur alongside
non-zero observations (if all the observations were zeroes, the outcome would not differentiate
between treatments and could be excluded from the analysis). When considering such outcomes, it
may be helpful to consider whether the zeroes are likely to be chance observations from a
distribution with nonzero expectation or, alternatively, if any of the underlying average rates may
actually be zero (for example, adverse events that only occur on certain treatments). In the former
case, it is probably most straightforward to handle any isolated zero data points by adding an equal
continuity correction to all study arms, as is common practice117, and using the log odds scale for
binary outcomes as in the models above, as this is the most convenient scale for nonzero binary
rates. In the latter case, however, with zero average rates expected, this approach seems
unsatisfactory: not only would this require extensive modification to the data, but the log odds scale
simply seems inappropriate when it is mathematically unable to express the true expected rate.
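The continuity-correction route for isolated zeroes can be sketched as follows (illustrative only; 0.5 is the conventional correction, applied equally to all arms as the text describes):

```python
import math

def log_odds(events, n, cc=0.5):
    """Log odds with a continuity correction added to both cells of the arm."""
    p = (events + cc) / (n + 2 * cc)
    return math.log(p / (1 - p))

# Without correction, the log odds is undefined for a zero count ...
try:
    p0 = 0 / 30
    undefined = math.log(p0 / (1 - p0))
except ValueError:
    undefined = None
assert undefined is None  # log(0) raises

# ... but the corrected version is finite for every arm in the comparison
lo_treated = log_odds(0, 30)  # arm with zero events
lo_control = log_odds(4, 30)
log_odds_ratio = lo_treated - lo_control
assert math.isfinite(log_odds_ratio)
```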
Arguably modelling the proportion in each study arm, with the risk difference as contrast, is the
natural solution for such outcomes. However, using this scale together with the multivariate normal
distributions in the model presents a few further issues.
With a between-study random effects distribution on the risk difference scale, values of 𝛿 may
exceed the theoretical range [-1,1] of the risk difference (and an even tighter range may apply
depending on the corresponding arm-level probabilities). This should not present many issues
provided that any parameters or variables dependent on 𝛿 are subject to an appropriate
ceiling/floor, which is straightforward to apply in BUGS. The arm-level probability 𝑝𝑖𝑘𝑗 of the jth outcome occurring in the kth arm of study i is specified as follows:

\[
p_{ikj} = \min(\max(\mu_{ij} + \delta_{ikj},\, 0),\, 1)
\]

What this means is that the portions of the tails of the random effects distribution that extend beyond the range are replaced by probability masses at the limits. For the RRMS dataset, there are few studies providing data for the outcomes that have zero rates, so it will typically be assumed in this section that fixed effect models (and therefore, no between-study correlations) are to be fitted for these outcomes; in other words 𝛿𝑖𝑘𝑗 = 𝑑𝜔𝑖𝑗𝑡𝑖𝑘. The study-level likelihood is more problematic.
One approach is to use a Binomial likelihood for the outcome in question (as per Model 0) – call this Model 4a:

\[
y_{ikj} \sim \mathrm{Binomial}(n_{ik},\, p_{ikj})
\]
The main drawback of this approach is that it renders it unfeasible to incorporate within-study
correlations with other outcomes. One might argue that, if the outcomes with zeroes are adverse
events caused by particular treatments, rather than part of the usual disease course, then it may be
reasonable to assume that these outcomes exhibit a low degree of correlation with other outcomes
in the model. However, some correlations may remain since, for example, the incidence of an
adverse event and the efficacy of treatment may both be influenced by a patient’s particular
pharmacodynamics and pharmacokinetics.
A model that allows for within-study correlations would therefore be desirable. This can be
achieved using a multivariate Normal likelihood, but two problems present themselves:
• the conventional approximate Normal variance of the risk, p(1−p)/n, estimated using the observed proportion, gives a variance of zero when no events are observed;
• the Normal distribution gives a non-zero likelihood of proportions below zero.
The second point is arguably of little practical concern: parameters can be constrained to avoid
impossible values, for example via priors or by applying a ceiling/floor within the model. However,
the first point, estimating the variance of the data points with zero events, must be dealt with before
a Normal model can be fitted. One approach may be to use a continuity correction.
Adding a constant continuity correction to the observed outcome proportion in all treatment arms
allows the Normal approximation to be used – and hence, permits within-study correlations – while preserving the observed risk differences. This is Model 4b. The estimated variances may be spurious; however, they serve as a starting point. By comparing the model’s results to Model 4a under the assumption of zero within-study correlations, it may be possible to tune the sample variances to give the “right” results, and perhaps derive a rule of thumb to estimate the sample variance for observed zero proportions.
Assuming a constant within-study correlation coefficient 𝜌𝑤 between all pairs of outcomes, the
Model 4b likelihood is:
\[
\tilde{\boldsymbol{y}}_{ik}/n_{ik} \sim MVN\!\left(
\begin{pmatrix} p_{ik1} \\ p_{ik2} \\ \vdots \\ p_{ikN_{O_i}} \end{pmatrix},\;
\begin{pmatrix}
S_{ik1} & \rho_w\sqrt{S_{ik1}S_{ik2}} & \cdots & \rho_w\sqrt{S_{ik1}S_{ikN_{O_i}}} \\
\rho_w\sqrt{S_{ik1}S_{ik2}} & S_{ik2} & \cdots & \rho_w\sqrt{S_{ik2}S_{ikN_{O_i}}} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_w\sqrt{S_{ik1}S_{ikN_{O_i}}} & \rho_w\sqrt{S_{ik2}S_{ikN_{O_i}}} & \cdots & S_{ikN_{O_i}}
\end{pmatrix}
\right)
\]

where \(\tilde{\boldsymbol{y}}_{ik}/n_{ik} = \boldsymbol{y}_{ik}/n_{ik} + cc\), i.e. the observed proportions are adjusted by adding a continuity correction; \(S_{ikj} = \alpha\, p_{ikj}(1 - p_{ikj})/n_{ik}\), based on the approximate sampling distribution of a proportion; and \(\alpha\) is a pre-specified constant used to scale the variances.
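An illustrative construction of this covariance matrix (names and numbers are my own; the thesis fits the corresponding model in BUGS):

```python
# Model 4b within-arm covariance on the risk-difference scale (illustrative sketch)
rho_w = 0.2   # assumed common within-study correlation
alpha = 1.0   # variance scaling constant
n_ik = 120    # arm sample size
p = [0.02, 0.10, 0.30]  # modelled probabilities for three outcomes

# Approximate sampling variance of each proportion, scaled by alpha
S = [alpha * pj * (1 - pj) / n_ik for pj in p]

cov = [[S[j] if j == l else rho_w * (S[j] * S[l]) ** 0.5
        for l in range(len(p))]
       for j in range(len(p))]

# Symmetric, with the stated diagonal and off-diagonal structure
assert all(cov[j][l] == cov[l][j] for j in range(3) for l in range(3))
assert abs(cov[0][0] - 0.02 * 0.98 / 120) < 1e-15
```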
A similar method based on rate differences should be possible for count outcomes, which also
present difficulties when zero rates are encountered. The RRMS dataset contains no such examples,
however, so this will not be explored here.
It is worth bearing in mind that outcomes with zeroes will often be adverse events specific to
particular treatments – and as such it will normally be more appropriate to assume a zero treatment
effect for treatments with no data, rather than using mappings to fill in the gaps based on the rates
observed on other treatments. “If it’s not reported, it doesn’t happen” may often be a fair
assumption when it comes to adverse events. However, this is not always the case. Reporting
practices vary118, and adverse events can go unmentioned in published trial reports for a number of
reasons, for example if they occur below a certain threshold rate, were not specified in the study
protocol, are judged to be clinically insignificant or unrelated to treatment, or simply at the
discretion of the investigators. In a trial of an active drug against placebo, or a pairwise meta-
analysis, allowing adverse events to go unreported will usually bias the benefit-risk assessment in
favour of the active drug. Where a more complex evidence network is used, the bias can go in
various directions depending on the structure of the evidence network and the (unknown) true rates
of unreported events; scenario-based sensitivity analyses could help to unpick the impact.
For the purpose of this thesis it will be assumed that serious gastrointestinal events, serious
bradycardia and macular edema did not occur unless reported, and the models shown here
therefore do not apply mappings to these outcomes, instead assuming a zero effect wherever there
is no data. This is implemented by multiplying each treatment effect by a constant
specific to each treatment/outcome combination that takes the value 0 as needed (or 1 otherwise).
The details can be found with the code in Appendix B. This same method can be used in any
applications where it is believed a priori that some outcomes never occur on any treatment.
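A minimal sketch of the indicator mechanism (my own illustration with hypothetical assignments; the actual constants are supplied with the BUGS code in Appendix B):

```python
# Indicator z[outcome][treatment]: 0 forces a zero effect where an adverse event
# is assumed never to occur on that treatment; 1 leaves the effect unchanged.
z = {"bradycardia": {"FM": 1, "GA": 0}}  # hypothetical assignments
d_raw = {"FM": 0.9, "GA": 0.4}           # unconstrained effect parameters

d = {t: z["bradycardia"][t] * d_raw[t] for t in d_raw}
assert d == {"FM": 0.9, "GA": 0.0}  # GA's effect is forced to zero
```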
II.4.8 Priors
The general principle adopted here is to use “minimally informative” priors wherever possible. This
means they establish the appropriate scale/sign but go no further. Table 8 lists each relevant
parameter and the prior(s) it is given.
Outcomes 1-7 are expressed on scales where the outcomes and treatment effects can take any value
on the real line, whereas outcomes 8-10 (the “zeroes” outcomes) are expressed on the risk
difference scale where outcomes are restricted to the interval [0,1]. Different priors are employed
accordingly.
The random effects standard deviation is assigned a vague Uniform prior as has previously been
recommended 78; in evidence networks with very few studies where the posterior distribution of the
standard deviation is dominated by the prior, it might be appropriate to be more informative. The
random mappings precision uses a prior that has been suggested in a similar model 46,47.
The signs of the treatment effects are taken to be known, and it is only necessary to assign priors to the magnitudes of the effects 𝑑𝜔𝑡 and mappings 𝑏𝜔. The upper half of a Normal distribution centred on 0 is sometimes used for this purpose (this is sometimes known as a folded Normal), and is denoted below by N+(0, variance). Certain priors cause error messages in BUGS in some models; it is therefore sometimes necessary to adopt different priors in different models, as indicated. The sensitivity of the results to the choice of priors is explored later.
Table 8 – Priors for treatment effect module parameters

• 𝑑𝜔𝑡 – population-average treatment effect (compared to placebo) of treatment t on outcome ω:
  – for 𝑑𝜔𝑡 ∈ ℝ: |𝑑𝜔𝑡| ~ N+(0, 1000)
  – for 𝑑𝜔𝑡 ∈ [-1, 1]: |𝑑𝜔𝑡| ~ Beta(0.5, 0.5)

• 𝜇𝑖𝑗 – “baseline” value of outcome j in study i (refers either to arm 1 or the average of arms, depending on model version):
  – for 𝜇𝑖𝑗 ∈ ℝ: 𝜇𝑖𝑗 ~ N(0, 1000) or N(0, 100); 𝜇𝑖𝑗 ~ N(0, 1) for some outcomes in Model 0 with fixed effects
  – for 𝜇𝑖𝑗 ∈ [0, 1]: 𝜇𝑖𝑗 ~ Gamma(0.5, 0.5) (although this distribution can take values above 1, these values are effectively censored by the model)

• σ – standard deviation of the random effects distribution: σ ~ Uniform(0, 10)

• 𝑏𝜔 – average mapping coefficient for outcome ω: |𝑏𝜔| ~ N+(0, 1000)

• 𝜏𝑚𝑎𝑝 = 1/𝜎𝑚𝑎𝑝² – precision of the random mappings distribution: 𝜏𝑚𝑎𝑝 ~ Gamma(0.05, 0.05), censored on the interval (1, ∞)
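The folded-Normal prior N+(0, 1000) can be sketched by taking absolute values of draws from N(0, 1000) (illustrative only; in BUGS this is typically achieved by truncation or by modelling the absolute value directly):

```python
import math
import random
random.seed(7)

# Draws from the folded Normal N+(0, 1000): |X| where X ~ N(0, variance 1000)
sd = math.sqrt(1000)
draws = [abs(random.gauss(0, sd)) for _ in range(100_000)]

assert min(draws) >= 0  # support is the nonnegative half-line

# The folded-Normal mean is sd * sqrt(2/pi), approx 25.2 here
mean_theory = sd * math.sqrt(2 / math.pi)
sample_mean = sum(draws) / len(draws)
assert abs(sample_mean - mean_theory) < 0.5
```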
II.4.9 Assessing model fit and complexity
The suitability of a statistical model for a particular dataset is often described in terms of its fit and
complexity. In broad terms, fit measures how closely the model describes the data (the closer the
fitted/predicted values are to the observed values, the better the fit) and complexity measures the
level of detail in the model structure (typically taken to mean the number of parameters). Models
with better fit and lower complexity are favoured, although these are often conflicting objectives –
for example, a model with a parameter for every observation will achieve perfect fit but is overly
complex, while at the other extreme, a model where all the fitted values are identical is of minimal
complexity but will exhibit poor fit. For this reason it is good practice to select models based on
both fit and complexity rather than focusing exclusively on one or the other.
Various measures of model fit and complexity have been developed but, in the Bayesian context, it
has been argued that the residual deviance is a natural measure of model fit, and the Deviance
Information Criterion a reasonable model selection measure that reflects both fit and complexity 119.
In a univariate Normal model, the residual deviance can be simply understood as the sum of the
squared standardised residuals for all observations in the data, where the standardised residual for
an observation is the number of standard deviations by which the observed value differs from the
fitted value. Extending this to the case of a multivariate Normal likelihood, the residual deviance
𝐷(𝝋), where 𝝋 refers to the set of parameters that are the target for inference, is calculated as
\[
D(\boldsymbol{\varphi}) = \sum_i \left(\boldsymbol{y}_i - E[\boldsymbol{Y}_i \mid \boldsymbol{\varphi}]\right)^T \mathbf{CV}_i^{-1} \left(\boldsymbol{y}_i - E[\boldsymbol{Y}_i \mid \boldsymbol{\varphi}]\right)
\]

where \(\mathbf{CV}_i^{-1}\) is the within-study coprecision matrix, i.e. the inverse of \(\mathbf{CV}_i\). If the model fits well then the mean residual deviance \(\bar{D}(\boldsymbol{\varphi})\) should be similar to or less than the number of independent observations. One can also calculate each study’s contribution to the residual deviance (i.e. the summand on the right-hand side of the definition above, for each study i) to reveal if any issues of poor fit can be traced back to individual studies.
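For a single study with two outcomes, the residual-deviance contribution can be computed directly (an illustrative sketch with invented numbers; the 2×2 matrix inverse is written out by hand):

```python
# Study-level contribution to the residual deviance (illustrative, two outcomes)
y = [0.30, -0.10]       # observed outcome vector for the study
fitted = [0.25, -0.05]  # E[Y | parameters]
CV = [[0.010, 0.003],   # within-study covariance matrix
      [0.003, 0.020]]

# Invert the 2x2 covariance matrix analytically
det = CV[0][0] * CV[1][1] - CV[0][1] * CV[1][0]
CVinv = [[CV[1][1] / det, -CV[0][1] / det],
         [-CV[1][0] / det, CV[0][0] / det]]

r = [y[0] - fitted[0], y[1] - fitted[1]]  # residual vector
# Quadratic form r^T CV^{-1} r = this study's deviance contribution
dev = sum(r[a] * CVinv[a][b] * r[b] for a in range(2) for b in range(2))
assert dev > 0
```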
The complexity of the model is described by the effective number of parameters 𝑝𝐷. Counting the number of parameters is not always as straightforward as it might sound, particularly in hierarchical random effects models, but 𝑝𝐷 can be calculated as the mean residual deviance less the residual deviance evaluated at the posterior mean, i.e. \(p_D = \bar{D}(\boldsymbol{\varphi}) - D(\bar{\boldsymbol{\varphi}})\). The deviance information criterion \(DIC = \bar{D}(\boldsymbol{\varphi}) + p_D\) is used to compare models in terms of both their fit and complexity; models with lower DIC are favoured. The contribution of individual studies to 𝑝𝐷 (known as leverages) can also be calculated119; the leverage is a measure of the influence a study has on the estimated parameters.
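These quantities can be illustrated with a toy conjugate-Normal model, where the effective number of parameters should come out close to the single mean parameter being estimated (my own sketch, not thesis code):

```python
import random
random.seed(3)

# Toy model: y_i ~ N(theta, s^2) with s known; posterior is theta ~ N(ybar, s^2/n)
s, n = 2.0, 25
y = [random.gauss(1.0, s) for _ in range(n)]
ybar = sum(y) / n

def deviance(theta):
    """Residual deviance: sum of squared standardised residuals."""
    return sum((yi - theta) ** 2 for yi in y) / s ** 2

# Posterior draws of theta
draws = [random.gauss(ybar, s / n ** 0.5) for _ in range(50_000)]

Dbar = sum(deviance(t) for t in draws) / len(draws)  # mean residual deviance
D_at_mean = deviance(sum(draws) / len(draws))        # deviance at the posterior mean
pD = Dbar - D_at_mean                                # effective number of parameters
DIC = Dbar + pD

assert 0.9 < pD < 1.1  # close to the one parameter (theta) being estimated
```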
It is worth bearing in mind that these measures evaluate the fit and complexity of the multivariate
Normal approximation to the likelihood; if the data/parameters are such that the approximation is
poor then they may not accurately reflect the fit and complexity of the “exact” underlying model.
Additionally, although inconsistency in the evidence network may be one factor contributing to poor
model fit, the residual deviance does not directly evaluate inconsistency, which should ideally be
assessed by other methods when performing a network meta-analysis (see II.7).
Chapter II.5
II.5 Population calibration module
For multi-outcome decision-making purposes such as benefit-risk assessments or health economic
evaluations it is sometimes essential to translate the relative treatment effects from a meta-analysis
into real-world outcome estimates on an absolute scale, e.g. proportions and incidences rather than
relative risks or rate ratios. For this to happen, the relative effects must be combined with and
calibrated by a typical baseline value (or distribution) 90,91. This is akin to the situation in simple
linear regression where the slope coefficient (relative-effect) expresses the fundamental relationship
between variables but a constant intercept term is required in order to estimate predicted values
(absolute-effects). In the context of the RRMS case study the baseline or intercept term
corresponds to the outcome level in an untreated population of RRMS patients. I have chosen to
estimate its posterior distribution by a random-effects multivariate meta-analysis of the absolute
outcome levels across the set of all trial arms in the RRMS dataset, adjusted by the corresponding
treatment contrast from the treatment effects model. This assumes that the aim of the case study is
to assess the benefits and risks of the RRMS treatment in a generalised trial-eligible Western
population of RRMS patients, and is a convenient approach for illustration purposes as it makes use
of all the source data already identified for the case study. However it should be recognised that
outcomes on absolute scales tend to be much more heterogeneous than relative effects, and the
resulting wide distributions will contribute to the uncertainty of the overall results. When models
are used to inform real decisions, a better approach may be to carefully select the source data to
provide a more homogeneous sample that is highly relevant to the target population. Possible
approaches that could be used include:
• selecting a subset of the studies used in the treatment effects model, eg based on
demographic similarity to the target population. This approach would require minimal
changes to the models described here, being simply a matter of indexing;
• using a different set of studies altogether; note however that if there are any treated study
arms in the population calibration dataset that are not in the treatment effects dataset, and
a random effects model is used for the treatment effects, then the corresponding study-
specific treatment effects (needed for adjustment) will have a high degree of uncertainty.
• constructing explicit posteriors, perhaps based on external data (eg national/local statistics
or patient registries).
II.5.1 Statistical model
The model is superficially similar to that used for the main multivariate NMA in the treatment
contrast module, but is used rather differently. No inference is made regarding the underlying
treatment contrasts; these are assumed to be “known” (what this really means is that the posterior
distributions from the treatment effects module are used, but with no inferential feedback from the
population calibration model – this is achieved by using the “cut” function in BUGS). The aim of this
module is to model the baseline distribution of outcomes on the absolute scale in an untreated
population. The population-average value of outcome in an untreated population is denoted by
𝑎𝜔 , 𝜔 ∈ 1, … , 𝑁𝑂.
In the untreated population model a multivariate normal distribution is again assigned to 𝒚𝑖𝑘, the
NOi-length vector of observed outcomes in arm k of study i:
𝒚𝑖𝑘~ 𝑀𝑉𝑁(𝜶𝑖 + 𝜹𝑖𝑘 , 𝐂𝐕𝑖𝑘 ) (4)
where 𝜶𝑖 = (𝛼𝑖1, … , 𝛼𝑖𝑁𝑂𝑖) is the vector of untreated study-specific population means relating to the
NOi outcomes in study i, 𝜹𝑖𝑘 is the vector of “known” treatment effects in arm k of study i, and 𝐂𝐕𝑖𝑘
is the within-study covariance matrix, here taken to be the same as in the treatment effects module.
𝜶𝑖 is given a multivariate distribution, 𝜶𝑖 ~ 𝑀𝑉𝑁(𝒂𝑖 , 𝒁𝒊), where 𝒂𝑖 is a vector of length 𝑁𝑂𝑖 whose
elements are 𝑎𝜔𝑖𝑗 and 𝒁𝒊 is a between-study covariance matrix of size 𝑁𝑂𝑖 × 𝑁𝑂𝑖. For simplicity and
coherence it is assumed here that the correlation between untreated outcomes 𝛼𝑖𝑗1 and 𝛼𝑖𝑗2 is the
same as the correlation between treatment effects 𝛿𝑖𝑗1𝑘 and 𝛿𝑖𝑗2𝑘 in the treatment effects module.
In other words, the diagonal elements of 𝒁𝒊 are equal to ζ² (the between-study variance) and the
off-diagonal elements are equal to 𝜌𝑏ζ² (or, if outcome-specific correlation propensities are used,
𝑠𝑖𝑔𝑛(𝜌𝑏𝑗1)𝑠𝑖𝑔𝑛(𝜌𝑏𝑗2)√(|𝜌𝑏𝑗1𝜌𝑏𝑗2|)ζ²). It is however theoretically possible for correlations
between the untreated outcomes to differ from those between the treatment effects, and the
model can accommodate this if desired.
This is analogous to a “random effects” model; the corresponding “fixed effects” model is obtained
by replacing 𝛼𝑖𝑗 with its mean 𝑎𝜔𝑖𝑗 in (4) and can be used when there is little between-study
heterogeneity in the untreated outcomes.
For outcomes on a restricted scale, such as the binary risk difference, it is worth remembering to
apply the appropriate floor/ceiling to 𝛼𝑖𝑗 by using the min/max functions in the BUGS language
before passing the values on to any dependent nodes.
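The same floor/ceiling logic can be sketched outside BUGS; the following Python fragment is an illustrative analogue (the function name and bounds are chosen here for a risk difference on [−1, 1]; it is not the model code from Appendix B):

```python
def clamp_risk_difference(alpha, lo=-1.0, hi=1.0):
    """Apply a floor/ceiling to a sampled baseline value, mirroring the
    min/max construction used for restricted-scale outcomes in BUGS."""
    return max(lo, min(hi, alpha))
```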
The variance decomposition described in II.4.4 is used again to express the multivariate Normal
distributions above as combinations of univariate Normals.
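The mechanics of such a decomposition can be illustrated for the bivariate case: a bivariate Normal is factorised into a marginal univariate draw followed by a conditional univariate draw. This is a generic sketch of the standard factorisation (function names and parameterisation are illustrative, not taken from the thesis code):

```python
import math
import random

def conditional_params(m1, m2, s1, s2, rho, y1):
    """Mean and sd of y2 | y1 for a bivariate Normal with means (m1, m2),
    standard deviations (s1, s2) and correlation rho."""
    cond_mean = m2 + rho * (s2 / s1) * (y1 - m1)
    cond_sd = s2 * math.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_sd

def mvn2_draw(m1, m2, s1, s2, rho, rng):
    """Sample (y1, y2) from the bivariate Normal as two chained univariate
    draws: y1 from its marginal, then y2 conditionally on y1."""
    y1 = rng.gauss(m1, s1)
    cm, cs = conditional_params(m1, m2, s1, s2, rho, y1)
    return y1, rng.gauss(cm, cs)
```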
Since the parameters of interest (i.e. the mean and variance of the untreated outcome distributions)
can be estimated directly from every study (in combination with the
assumed treatment contrasts from the treatment effects module) there is no need for the
parameterisation described in II.4.5, which was only necessary in the treatment contrasts model
because the untreated “baseline” outcomes were inestimable in studies with no untreated/placebo
group.
In some circumstances where no events are observed in untreated study arms, it may be reasonable
to assume an untreated event rate of zero rather than attempting to infer the rate. For example, the
occurrence of many treatment-related adverse events in untreated control arms may be zero – or
close enough to make no practical difference. The approach taken here, however, will be to model
the underlying rates in such instances.
It is possible to calculate the residual deviance for the population calibration module, just as in the
treatment effects module, but if (as here) the same data is used for both modules then the residual
deviance will also be the same in both. This is because a non-zero deviance only occurs when there
is inconsistency in the relative treatment contrasts (due either to mismatched treatment effects
within a “loop” in the network diagram or inconsistent mapping ratios) - and these contrasts are the
same in both modules. The study-specific “baseline” is defined differently in the two modules, but
in either case it is not subject to any consistency constraints that prevent it from fitting perfectly,
and therefore does not contribute to the residual deviance.
It is worth noting that performing such an analysis of outcomes on the absolute scale in order to
make differential inferences about the effect of treatments (or other trial characteristics) would not
be advisable as differences in studies or populations could confound any true effects; here, however,
the intention is simply to describe the extent of variability in untreated populations rather than to
seek to explain or classify it. The treatment effects model only makes inferences regarding
treatment contrasts, which are assumed to be homogeneous due to randomisation.
II.5.2 Priors
The priors used for the population calibration model are shown in Table 9 and mirror those used in
the treatment effects model.
Table 9 – Priors for the population calibration module.
Parameter name | Parameter description | Prior(s)
𝑎𝜔 | Population-average value of untreated outcome | 𝑎𝜔 ∈ ℝ: 𝑎𝜔 ~ N(0, 1000); 𝑎𝜔 ∈ [−1, 1]: 𝑎𝜔 ~ Beta(0.5, 0.5)
ζ | Standard deviation of random untreated outcomes distribution | ζ ~ Uniform(0, 10)
II.5.3 Outputs
The untreated population-average parameters 𝑎𝜔 are combined with the population-average
contrasts 𝑑𝜔𝑡 to give absolute-scale treatment effects 𝑥𝜔𝑡 :
𝑥𝜔𝑡 = 𝑎𝜔 + 𝑑𝜔𝑡 , 𝜔 ∈ 1, … , 𝑁𝑂, 𝑡 ∈ 1, … , 𝑁𝑇
These parameters are population averages and, by the law of large numbers, their posteriors exhibit
little uncertainty provided that there are sufficient numbers of patients and/or studies in the trial data.
For decision-making purposes, however, it may not be the uncertainty on the average that is most
relevant but the variability around that average within a typical population. It is therefore often
more relevant to consider the predictive distribution of outcomes which incorporates this additional
variability around the average. Owing to the structure of the model, there are two levels of
additional uncertainty to consider. In the first place, one can allow for the between-study variability
in untreated outcomes and treatment effects by simulating new values (or rather, vectors) from the
posterior distributions of 𝜶𝑖 and 𝜹𝑖𝑘, with i in this case referring to a hypothetical unobserved study
that includes all treatments and outcomes. The posterior of 𝜽𝑖𝑘 = 𝜶𝑖 + 𝜹𝑖𝑘 for this hypothetical
study then corresponds to the predictive distribution of the study-level average outcome vector, the
elements of which will have the same expectation as 𝑥𝜔𝑡 but a larger variance, reflecting the
observed between-study heterogeneity. This predictive distribution can be sampled within the
Bayesian MCMC environment by calculating 𝜽𝑖𝑘 within the model code and sampling from its
posterior.
This allows one to assess the distribution of treatment effects – and thus, ultimately, the benefit-risk
balance – in terms of “study-level” averages. This need not necessarily be literally interpreted as
referring to different “studies” but perhaps as regions, towns, clinics etc depending on how the
study populations are recruited. An average at this level might suffice for many decision-making
purposes, such as determining the costs of an intervention, where it is the aggregate sum to be
entered in the accounts that is key. It is important to recognise, however, that the patient-level
variability within these “study”-sized units has still not been accounted for.
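The variance inflation at the study level can be sketched as follows. This is a simplified illustration that assumes, for clarity, independent between-study deviations for the baseline and the treatment effect, with ζ and τ denoting their between-study standard deviations (τ is a label introduced for this sketch only; in the full model these draws would also be correlated across outcomes):

```python
import math
import random

def predictive_sd(zeta, tau):
    """SD of theta_new = alpha_new + delta_new for a hypothetical new study,
    assuming (for this sketch) independent between-study deviations for the
    untreated baseline (sd zeta) and the treatment effect (sd tau)."""
    return math.sqrt(zeta ** 2 + tau ** 2)

def draw_study_level(a, d, zeta, tau, rng):
    """One predictive draw of the study-level average outcome: the same
    expectation a + d as the population average, but with between-study
    variability added on top."""
    alpha_new = rng.gauss(a, zeta)  # new-study untreated baseline
    delta_new = rng.gauss(d, tau)   # new-study treatment effect
    return alpha_new + delta_new
```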
For this, one must construct the predictive distribution of outcomes allowing for both between-
study and within-study (patient-level) variability. This is achieved by simulating new values (vectors)
from the posterior distribution of 𝒚𝑖𝑘 , with i again referring to the hypothetical unobserved study
with all treatments and outcomes. The mean vector of 𝒚𝑖𝑘 is given by 𝜽𝑖𝑘 = 𝜶𝑖 + 𝜹𝑖𝑘 , and the correlations
between components are the same as elsewhere in the model. The variance is however unknown.
For observed values of 𝒚𝑖𝑘 in the data, the sample variance was used as an estimator of the true
variance, as is conventional; for the hypothetical unobserved study there is no sample variance to
use and another approach is needed. The strategy used here is to estimate the marginal variance of
𝑦𝑖𝑗𝑘 for each outcome j within the model by assuming that the sample variances in the data are
observations from a common Normal distribution, the mean of which can be used as the variance
estimate for the new hypothetical study. To obtain the full patient-level variability in the predictive
distribution (rather than the sampling variability of a group average), the number of patients in each
treatment group should be set to 1.
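A minimal Python sketch of this strategy (helper names are hypothetical; the thesis implements the equivalent logic in BUGS, and the pooled estimate here is simplified to the plain mean of the sample variances):

```python
import random

def pooled_within_study_variance(sample_variances):
    """Estimate the within-study (patient-level) variance for a hypothetical
    new study as the mean of the observed studies' sample variances, a
    simplification of treating them as draws from a common distribution."""
    return sum(sample_variances) / len(sample_variances)

def draw_patient_level(theta, sample_variances, rng):
    """Predictive draw for a single patient: with the group size set to 1,
    the full within-study variance applies rather than a standard error."""
    var = pooled_within_study_variance(sample_variances)
    return rng.gauss(theta, var ** 0.5)
```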
(As an aside, it may be worth noting that one could also use this variance estimate in place of the
sample variances when specifying the distribution of 𝒚𝑖𝑘 for observed studies i. This will not
however be pursued here as it goes rather beyond the scope of this thesis, being more a matter of
general statistical modelling than one specific to multivariate evidence synthesis or benefit-risk.
Initial attempts suggest that it does not make much difference in this case.)
Strictly speaking the estimated variance from this method will reflect both between-patient
heterogeneity and measurement error of outcomes; it should therefore be an estimated upper
bound on the between-patient heterogeneity rather than a straightforward estimate.
Again, allowing for the additional patient-level uncertainty means that 𝒚𝑖𝑘 has greater variance than
the study-level mean 𝜽𝑖𝑘 = 𝜶𝑖 + 𝜹𝑖𝑘 but the same mean.
As a final step, the output parameters can be transformed back to their original scale (i.e. converting
log odds back to proportions, etc) if desired.
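For example, a log odds can be mapped back to a proportion with the standard inverse-logit transform, shown here in Python for illustration:

```python
import math

def logodds_to_proportion(log_odds):
    """Inverse-logit: map an absolute-scale log odds back to a proportion
    on [0, 1]."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```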
II.5.4 Rankings
It is straightforward to create treatment rankings for each outcome based on the population-level
average outcomes or the study- or patient-level predictive distributions. This does not require any
particular assumptions about a decision maker’s relative preferences for different outcomes - the
only value judgements that need to be specified are the (hopefully self-evident) impact signs for
each outcome: the impact is either positive (higher values are better) or negative (lower values are
better) depending on the outcome definition. This additional information can be supplied in the
data. Within the RRMS case study, only the proportion avoiding relapse has positive impact; for the
other outcomes, a lower value is better.
The rankings can be calculated within each MCMC iteration of the model so that a posterior
distribution of rankings is obtained. One way to present these distributions is via the Surface Under
the Cumulative Ranking Curve (SUCRA) statistic120, which summarises the posterior rankings for each
treatment as an overall rating between 0 and 1, where 0 is a treatment that is always outranked by
all others and 1 is a treatment that always outranks all others. One way to interpret the SUCRA for a
given treatment is that it represents the expected (or average) proportion of its competitors that it
outranks.
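As an illustration of how rankings and SUCRAs might be computed from MCMC output (function names are illustrative; the SUCRA uses the standard identity SUCRA = (NT − mean rank)/(NT − 1), which is equivalent to the expected proportion of competitors outranked):

```python
def ranks_one_iteration(values, higher_is_better):
    """Rank treatments (1 = best) for one MCMC draw of one outcome;
    higher_is_better is the outcome's impact sign supplied in the data."""
    order = sorted(range(len(values)), key=lambda t: values[t],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, t in enumerate(order):
        ranks[t] = position + 1
    return ranks

def sucra(rank_samples, n_treatments):
    """SUCRA for one treatment from its posterior rank samples: 1 if it
    always outranks all competitors, 0 if it is always outranked."""
    mean_rank = sum(rank_samples) / len(rank_samples)
    return (n_treatments - mean_rank) / (n_treatments - 1)
```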
The rankings should be the same whether calculated in the treatment effects module or the
population calibration module (since the difference between the two is the untreated baseline,
which is equal for all treatments), and will also be the same regardless of whether the outcomes
have been transformed to another scale (since scale transformations are monotonic and have no
effect on ordering relations). The distribution of rankings will however depend on whether they are
based on the population-level average outcomes or the study- or patient-level predictive
distributions.
Rankings and SUCRAs are ordinal summaries that give no information about the magnitude of the
difference between treatments. Just because a treatment outranks another does not mean that the
difference is clinically or statistically significant, and the difference between adjacent ranks among a
set of treatments can vary greatly, potentially making the SUCRA somewhat misleading120. For this
reason SUCRAs should always be presented and interpreted alongside (rather than instead of)
summaries of the posterior distribution such as credibility intervals.
Chapter II.6
II.6 Results
Simulations were performed using the Markov Chain Monte Carlo technique in either WinBUGS
(version 1.4.3) 48 or OpenBUGS (version 3.2.2 rev 1063 - www.openbugs.net). Initial values were
generated within BUGS for the majority of models. 100,000 iterations were discarded to allow for
“burn-in”; the posterior statistics were then derived from a further 100,000 iterations. Convergence
was assessed by inspection of the sample histories. Model fit was assessed by calculation of the
mean residual deviance119, which in a well-fitting model should be similar to (or less than) the
number of independent observations.
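For a Normal likelihood with known variance, each observation's residual deviance contribution is its squared standardised residual, which has expectation 1 under a well-fitting model. The following sketch (illustrative, evaluated at point estimates rather than averaged over the posterior as in the thesis) shows why the total is then close to the number of observations:

```python
def residual_deviance_contribution(y, mu, sigma):
    """Residual deviance of one Normal observation with known sd: the
    squared standardised residual, with expectation 1 under the model."""
    return ((y - mu) / sigma) ** 2

def total_residual_deviance(observations):
    """Sum the contributions over (y, mu, sigma) triples; in a well-fitting
    model this total is close to the number of observations."""
    return sum(residual_deviance_contribution(y, mu, sigma)
               for y, mu, sigma in observations)
```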
Appendix B contains the BUGS code and RRMS data files used to generate the results.
II.6.1 Treatment effects module
II.6.1.1 Model 0: all outcomes independent
Figure 11 and Figure 12 show posterior summary statistics for the key parameters of the
naïve/univariate Model 0, with fixed and random effects respectively on the relapse, disability and
liver safety outcomes. The number of observations for the remaining outcomes (serious
gastrointestinal disorders, serious bradycardia and macular edema) is not sufficient to justify a
random effects model and so fixed effects have been used for these outcomes in both models. The
risk difference for these outcomes has also been magnified by a factor of 10 for clarity. Only
treatment-outcome combinations with data are shown.
Figure 11 - Posterior credibility intervals of relative treatment effects (population averages) from Model 0, fixed effects. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 12 - Posterior credibility intervals of relative treatment effects (population averages) from Model 0, random effects (except serious GI disorders, serious bradycardia and macular edema). Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
The treatment effect estimates are largely similar from both models, albeit with slightly wider
distributions in the random effects model.
The fixed effect model does not fit well, with a mean residual deviance of 199.5, well in excess of
169 (the number of observations in the dataset). The residual deviance in the random effects model
is somewhat better at 174.5 but still exceeds the number of observations. For the remainder of the
chapter, all models will use random effects on relapse, disability progression and liver safety
outcomes, and fixed effects on serious GI disorders, serious bradycardia and macular edema, unless
otherwise indicated.
II.6.1.2 Model 1: Correlated non-zero outcomes
Figure 13 summarises the posterior distribution of the key parameters of Model 1, with all
correlation coefficients set to zero. Again only parameters relating to non-missing data are shown.
Figure 14 shows the results from Model 1 with assumed correlation coefficients of 0.6 between all
pairs of outcomes (at both the between- and within-study levels). This is probably not a realistic
correlation structure but serves to illustrate the model’s capabilities (see II.6.1.6 for more discussion
of this point).
Figure 13 - Posterior credibility intervals of relative treatment effects (population averages) from Model 1 (random effects), with all correlations between outcomes set to zero. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Figure 14 - Posterior credibility intervals of relative treatment effects (population averages) from Model 1 (random effects), with all correlations between outcomes set to 0.6. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
As expected, the treatment effect posteriors in Figure 13 appear much the same as in Model 0
(see Figure 12), apparently confirming that the Normal approximation to the true likelihood is
reasonable for the purpose of estimating the treatment effects. The mean residual deviance of 155.5
is in fairly close agreement with Model 0 (where a value of 156.9 is obtained for this restricted set of
outcomes) and is only slightly higher than the number of observations (152), indicating reasonable
fit.
Any slight differences between the two sets of results (from Model 0 and Model 1 with no
correlations) are due to a combination of three factors: (i) the difference between the original
Binomial/Poisson likelihood and the (approximate) Normal likelihood, (ii) the use of different priors
due to the different parameter scales associated with each version of the likelihood, and (iii) random
artefacts of the MCMC process (since the sampling algorithm only approximates the true posterior
distribution). Inspection of the results reveals no differences of particular significance or concern in
this case.
Comparing Figure 13 and Figure 14 reveals that including the correlations has had the following
impacts:
- The posterior standard deviation of the treatment effects has increased and the residual
deviance has increased slightly from 155.5 to 161.5, indicating worsening heterogeneity and
fit. This is likely to be because the assumed correlation structure (correlation of 0.6 between
all pairs of outcomes) is not realistic for this dataset, since some outcomes are likely to be
negatively correlated (see II.6.1.6).
- There have been various minor changes to the estimated treatment effect means, some
increasing and some decreasing, but with no overall systematic trend.
By this point it should be clear from the results that there is no straightforward way to rank the
treatments in terms of their overall benefit-risk balance as the results vary by outcome. Indeed, the
best- and worst-performing treatments are different for almost every outcome (not counting
placebo).
The treatment effect estimates mostly seem reasonable, but there are some slight surprises:
• The performance of subcutaneous interferon beta-1a and interferon beta-1b with regard to
disability progression is rather confusing, with each of these treatments performing very
well on one outcome measure but very poorly on the other.
• It is unexpected that so many treatments should perform better than placebo with regard to
ALT elevation above 5x the upper limit of the normal range.
While these could be chance findings, it is also possible that they are the result of bias due to
inconsistency in the network. In the case of ALT elevation above 5x the upper limit of the normal
range, the network is especially sparse, making the estimates particularly vulnerable to any chance
findings, biased studies or between-study heterogeneity.
II.6.1.3 Model 2: fixed baseline
Figure 15 and Figure 16 summarise the posterior distributions of the key parameters of Model 2,
with all correlation coefficients again set either to 0 or 0.6. Again only parameters relating to non-
missing data are shown.
Figure 15 - Posterior credibility intervals of relative treatment effects (population averages) from Model 2 (random effects), with all correlations between outcomes set to zero. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Figure 16 - Posterior credibility intervals of relative treatment effects (population averages) from Model 2 (random effects), with all correlations between outcomes set to 0.6. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Again, with correlations set to zero there are some minor changes in the treatment effect estimates
between this model and Models 0 and 1, but these appear insubstantial. For the most part, there
also appears to be little difference between the results of Model 1 and 2, even with correlations
present, suggesting that there may be little practical difference between the variable- and fixed-
baseline parameterisations; however, it is the fixed-baseline version that will be taken forward here
owing to its theoretical properties.
One can examine the study-level contributions to the fit and complexity statistics in order to identify
any outlying studies. However, since each study contributes a different number of observations
(treatment arms multiplied by outcomes), the studies with more observations will tend to contribute
more. To adjust for this, it may be helpful simply to divide each study’s contribution by the number
of observations in order to obtain the average contribution per observation. This is done in Figure
17, which plots each study’s average contribution to the residual deviance per observation
(horizontal axis) against its average contribution to the complexity (leverage) per observation
(vertical axis). If a study fits well then the deviance per observation should not be much greater than
1, resulting in an overall model deviance roughly equal to the number of observations. The leverage
per observation should generally be less than 1, as otherwise the model will have more parameters
than observations (since the effective number of parameters is the sum of the study-level
leverages). Any study with both a high deviance contribution and a high leverage can be regarded as
an outlier that is adversely affecting the overall model fit and complexity.
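The per-observation adjustment and outlier screen described above might be sketched as follows (the threshold values are illustrative, not values used in the thesis):

```python
def per_observation(total_contribution, n_arms, n_outcomes):
    """Average a study's total deviance (or leverage) contribution over its
    number of observations (treatment arms x outcomes reported)."""
    return total_contribution / (n_arms * n_outcomes)

def flag_outliers(studies, dev_threshold=1.5, lev_threshold=1.0):
    """Return names of studies whose per-observation deviance AND leverage
    both exceed the thresholds: poor fit combined with high influence."""
    return [name for name, dev, lev in studies
            if dev > dev_threshold and lev > lev_threshold]
```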
[Scatter plot: leverage per observation (study-level average, vertical axis, 0 to 1.2) against deviance contribution per observation (study-level average, horizontal axis, 0 to 3) for Model 2 with correlations of 0.6; the outlying point is labelled BORNSTEIN 1987.]
Figure 17 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 2, correlations of 0.6).
In this case the majority of studies show reasonable deviance contributions (clustered close to 1) and
leverages (all lying below 1) except for BORNSTEIN 198794, which has a high deviance contribution.
As the earliest trial in the dataset by several years, there may be differences in population
characteristics, aspects of clinical care or study conduct that result in its treatment effects being
heterogeneous with the other studies. It is also the smallest study in the dataset and therefore the
most prone to chance sampling error. Since the leverage for this study is low, however, its influence
on the overall results should be modest and it seems reasonable to retain it in the dataset.
II.6.1.4 Model 3: mappings
Figure 18 and Figure 19 summarise the posterior distribution of the key parameters of Model 3, for
fixed mappings and random-by-treatment mappings applied respectively, with all outcomes in one
mapping group, and all correlation coefficients set to 0.6. The treatment effect parameters with
missing data have now been included as they can be estimated via the mappings. The mapping
ratios themselves are not shown here but details can be found in Appendix C, along with results for
alternative correlation structures.
Figure 18 - Posterior credibility intervals of relative treatment effects (population averages) from Model 3 (random effects, fixed mappings, one mapping group, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Figure 19 - Posterior credibility intervals of relative treatment effects (population averages) from Model 3 (random effects, random mappings, one mapping group, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
The results from both models are for the most part very similar, but some of the more extreme
effect sizes appear to be moderated in the fixed-mapping model. The random-mapping model
shows marginally better fit. Curiously, the complexity of the random-mapping model (as measured
by the effective number of parameters pd) is less than that of the fixed-mapping model, which goes
contrary to expectations. The reason for this is not immediately clear, but it may be that pd is a
misleading measure for this structure of model and should not be too heavily relied upon for model
selection.
Plots of the average deviance contribution per observation against the average leverage per
observation are shown in Figure 20 and Figure 21 below. For the fixed mapping model there is little
change from Model 2, but in the random-mapping model the leverage of the outlying study
(BORNSTEIN 1987) is reduced almost to zero. This indicates that this study is having very little
impact on the results, presumably because (i) the mappings allow strength to be borrowed from
elsewhere in the network, and (ii) the random-mappings formulation means that mappings derived
from this study have little impact on the rest of the model, since only one other study uses the same
active treatment.
Figure 20 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 3, correlations of 0.6, fixed mappings in one group).
[Scatter plot: leverage per observation (study-level average, 0 to 1.2) against deviance contribution per observation (study-level average, 0 to 3) for Model 3 with fixed mappings; the outlying point is labelled BORNSTEIN 1987.]
Figure 21 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 3, correlations of 0.6, random mappings in one group).
The mappings have allowed the missing treatment-outcome combinations to be estimated, at the
cost of some distortion in the estimates of the other effects (compared to the estimates obtained
from Model 2, without mappings). Effectively the estimates are all brought more in line with the
proportionality assumption.
The extent of this distorting effect may depend on how the outcomes are grouped for mapping
purposes. To examine this, Table 10 explores the impact of altering the mapping groups within the
random-mapping Model 3. Again all correlation coefficients have been set to 0.6. Rows
corresponding to missing data are highlighted.
[Scatter plot for Figure 21: leverage per observation (study-level average, 0 to 1.2) against deviance contribution per observation (study-level average, 0 to 2.5) for Model 3 with random mappings; the point labelled BORNSTEIN 1987 now shows near-zero leverage.]
Table 10 – Posterior distributions from Model 3: effect of varying mapping groups (random effects, all correlation coefficients between outcomes = 0.6). DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LM = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
RANDOM EFFECTS, RANDOM MAPPINGS MODEL
1 group 2 groups 3 groups No mappings
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.443 0.142 -0.620 0.138 -0.658 0.140 -0.699 0.181
FM -0.730 0.163 -0.714 0.158 -0.759 0.153 -0.726 0.184
GA -0.274 0.101 -0.460 0.109 -0.467 0.108 -0.379 0.142
IA (IM) -0.216 0.087 -0.266 0.089 -0.270 0.094 -0.201 0.160
IA (SC) -0.272 0.106 -0.338 0.111 -0.327 0.117 0.026 0.192
IB -0.421 0.129 -0.521 0.128 -0.505 0.126 -0.407 0.171
LM -0.253 0.091 -0.259 0.090 -0.236 0.096 -0.205 0.179
TF -0.296 0.120 -0.380 0.135 -0.392 0.147 -0.377 0.257
Log odds ratio of avoiding relapse (vs placebo)
DF 0.832 0.158 0.755 0.143 0.739 0.149 0.715 0.184
FM 0.856 0.158 0.816 0.149 0.875 0.159 0.921 0.194
GA 0.647 0.134 0.559 0.120 0.515 0.124 0.632 0.160
IA (IM) 0.388 0.118 0.331 0.106 0.304 0.110 0.384 0.182
IA (SC) 0.573 0.161 0.496 0.149 0.377 0.151 0.804 0.227
IB 0.761 0.170 0.704 0.158 0.644 0.158 0.766 0.214
LM 0.401 0.115 0.358 0.105 0.286 0.111 0.326 0.184
TF 0.512 0.168 0.449 0.156 0.421 0.163 0.429 0.275
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.314 0.129 -0.472 0.137 -0.507 0.173 -0.528 0.201
FM -0.398 0.138 -0.481 0.134 -0.391 0.170 -0.296 0.201
GA -0.193 0.094 -0.355 0.114 -0.464 0.158 -0.179 0.194
IA (IM) -0.195 0.102 -0.208 0.088 -0.294 0.169 -0.301 0.250
IA (SC) -0.489 0.241 -0.356 0.170 -0.656 0.259 -0.650 0.277
IB -0.350 0.151 -0.425 0.144 -0.648 0.231 -0.110 0.288
LM -0.283 0.124 -0.239 0.098 -0.409 0.154 -0.385 0.214
TF -0.259 0.127 -0.283 0.122 -0.348 0.245 -0.400 0.288
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.356 0.155 -0.530 0.179 -0.494 0.198 -0.473 0.286
FM -0.508 0.157 -0.566 0.167 -0.412 0.183 -0.412 0.213
GA -0.241 0.116 -0.411 0.147 -0.458 0.178 -0.082 0.268
IA (IM) -0.245 0.118 -0.241 0.103 -0.281 0.158 -0.438 0.219
IA (SC) -0.319 0.147 -0.330 0.138 -0.530 0.215 0.477 0.390
IB -0.836 0.391 -0.623 0.289 -0.833 0.358 -1.513 0.408
LM -0.352 0.141 -0.281 0.120 -0.431 0.163 -0.483 0.221
TF -0.357 0.252 -0.336 0.177 -0.341 0.302 -0.046 31.600
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.669 0.181 0.322 0.202 0.301 0.209 0.254 0.229
FM 1.233 0.257 1.300 0.269 1.380 0.282 1.368 0.313
GA 0.310 0.127 -0.120 0.204 -0.183 0.215 -0.124 0.220
IA (IM) 0.583 0.169 0.617 0.198 0.590 0.206 0.628 0.221
IA (SC) 1.133 0.366 1.278 0.406 1.075 0.427 1.614 0.463
IB 1.304 0.297 1.063 0.355 0.958 0.369 1.109 0.377
LM 0.721 0.144 0.830 0.150 0.761 0.157 0.735 0.191
TF 0.875 0.212 0.795 0.234 0.781 0.244 0.875 0.275
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.539 0.179 0.242 0.164 0.215 0.166 0.225 0.277
FM 1.079 0.248 1.007 0.263 1.072 0.284 1.300 0.309
GA 0.402 0.171 -0.079 0.154 -0.113 0.152 0.316 0.301
IA (IM) 0.366 0.169 0.454 0.191 0.397 0.185 0.213 0.392
IA (SC) 0.732 0.484 1.203 0.744 0.774 0.496 -0.060 31.650
IB 0.998 0.641 0.958 0.555 0.828 0.558 -0.069 31.640
LM 0.753 0.241 0.801 0.237 0.657 0.241 0.835 0.326
TF 0.403 0.168 0.509 0.187 0.466 0.191 -0.003 0.376
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.175 0.139 0.075 0.083 0.072 0.087 -0.529 0.454
FM 0.303 0.237 0.306 0.250 0.371 0.310 0.643 0.460
GA 0.124 0.108 -0.029 0.071 -0.047 0.080 -0.113 0.410
IA (IM) 0.120 0.108 0.150 0.133 0.145 0.137 -0.327 0.540
IA (SC) 0.216 0.221 0.376 0.381 0.264 0.275 -0.183 31.560
IB 0.295 0.303 0.302 0.319 0.287 0.322 -0.088 31.700
LM 0.157 0.132 0.226 0.183 0.201 0.169 -0.338 0.581
TF 0.170 0.175 0.188 0.177 0.187 0.181 0.095 31.670
Average mapping (reference outcomes have constant mapping of 1)
log ARR 1 1 1 1 1 1 N/A N/A
logit avoid relapse 2.270 0.933 1.396 0.482 1.235 0.363 N/A N/A
logit 3M DP -1.046 0.415 -0.843 0.236 1 1 N/A N/A
logit 6M DP -1.316 0.558 -0.981 0.322 -1.061 0.327 N/A N/A
logit ALT>ULN 2.933 1.130 1 1 1 1 N/A N/A
logit ALT>3xULN 2.216 1.007 0.818 0.215 0.740 0.210 N/A N/A
logit ALT>5xULN 0.665 0.596 0.259 0.206 0.258 0.205 N/A N/A
Between-study treatment effects sd 0.250 0.052 0.256 0.049 0.262 0.053 0.221 0.061
Between-treatment mapping sd 0.566 0.161 0.263 0.158 0.264 0.170 N/A N/A
Residual deviance 161.0 17.5 161.5 17.5 162.1 17.8 161.4 17.6
Comparing the columns of Table 10, the general trend appears to be that the higher the number of
mapping groups, the closer the estimates are to those from the model without mappings (for effects
that the latter model is able to estimate). In other words, the distortion induced by the mappings is
smaller when the mappings are only applied between more closely related outcomes, as one would
expect. This confirms that one should consider carefully how to group the outcomes when using
mappings, and try to ensure that outcomes remain as similar as possible within each group.
II.6.1.5 Models 4a/4b: outcomes with zeroes
Table 11 shows posterior summary statistics from Models 4a and 4b, for only the three outcomes
that were excluded from Models 1-3 due to the presence of zeroes (serious gastrointestinal
disorders, serious bradycardia and macular edema). Fixed effects have been used for these
outcomes owing to the small number of observations. All correlations involving these three
outcomes have been set to zero, i.e. they are modelled in a univariate fashion, and they have not
been subjected to mappings. This allows for a fair comparison between the estimates produced by
the two models and the empirical estimates of the treatment effects (also shown in the table;
obtained as the difference in observed proportions between study arms, taking an average weighted
by patient numbers where more than one study contributes). Both models used identical
beta(0.5,0.5) priors on the treatment effects d and on the “baseline” probabilities. A range of constant
continuity corrections (added to the observed proportion in all study arms) has been used in
conjunction with Model 4b.
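The empirical estimate described above is simply a patient-weighted average of per-study risk differences. A minimal sketch (with hypothetical event counts, not the actual RRMS data):

```python
# Empirical treatment effect for a "zeroes" outcome: per-study risk
# differences (treatment minus control proportion), averaged across
# studies with weights proportional to patient numbers. The event counts
# below are hypothetical, purely for illustration.

def empirical_risk_difference(studies):
    """studies: iterable of (events_trt, n_trt, events_ctl, n_ctl)."""
    weighted_sum, total_weight = 0.0, 0
    for r_t, n_t, r_c, n_c in studies:
        rd = r_t / n_t - r_c / n_c   # risk difference in this study
        weight = n_t + n_c           # weight by patients contributed
        weighted_sum += weight * rd
        total_weight += weight
    return weighted_sum / total_weight

# Two hypothetical studies of the same treatment-outcome pair
print(round(empirical_risk_difference([(12, 600, 5, 590), (8, 410, 3, 400)]), 5))
```

Weighting by total patients per study is the convention described above; inverse-variance weighting would be an alternative.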
Table 11 – Posterior distributions from Models 4a and 4b for the “zeroes” outcomes (fixed effects, no correlations between outcomes), and empirical treatment effect estimates. Only treatment-outcome combinations with data are shown; all other treatment effect parameters for these outcomes are set to zero. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, LQ = laquinimod, TF = teriflunomide, cc = continuity correction.
Mean sd 2.50% 97.50% Empirical estimate Mean – empirical difference
Model 4a (binomial likelihood)
Serious GI, DF 0.01012 0.00381 0.00381 0.01860 0.01041 -0.00029
Serious GI, GA 0.00142 0.00201 0.00000 0.00714 0 0.00142
Serious GI, LQ 0.01215 0.00584 0.00137 0.02463 0.01275 -0.00060
Serious GI, TF 0.01860 0.00892 0.00214 0.03776 0.01957 -0.00097
Serious bradycardia, FM 0.00209 0.00232 0.00000 0.00825 0.00251 -0.00042
Macular edema, FM 0.00124 0.00140 0.00000 0.00501 0.00128 -0.00004
Residual deviance 161.2 17.39 128.7 197.1 n/a n/a
Model 4b (normal likelihood, cc=0.01)
Serious GI, DF 0.01043 0.00031 0.00982 0.01104 0.01041 0.00002
Serious GI, GA 0.00013 0.00015 0.00000 0.00055 0 0.00013
Serious GI, LQ 0.01274 0.00034 0.01207 0.01341 0.01275 -0.00001
Serious GI, TF 0.01956 0.00059 0.01842 0.02070 0.01957 -0.00001
Serious bradycardia, FM 0.00211 0.00030 0.00152 0.00270 0.00251 -0.00040
Macular edema, FM 0.00104 0.00027 0.00051 0.00157 0.00128 -0.00024
Residual deviance 481.1 18.3 446.9 518.8 n/a n/a
Model 4b (normal likelihood, cc=0.025)
Serious GI, DF 0.01047 0.00043 0.00964 0.01130 0.01041 0.00006
Serious GI, GA 0.00021 0.00024 0.00000 0.00088 0 0.00021
Serious GI, LQ 0.01274 0.00046 0.01185 0.01364 0.01275 -0.00001
Serious GI, TF 0.01955 0.00075 0.01809 0.02101 0.01957 -0.00002
Serious bradycardia, FM 0.00252 0.00043 0.00166 0.00337 0.00251 0.00000
Macular edema, FM 0.00103 0.00042 0.00017 0.00185 0.00128 -0.00025
Residual deviance 319.3 18.4 285.0 357.3 n/a n/a
Model 4b (normal likelihood, cc=0.05)
Serious GI, DF 0.01050 0.00057 0.00940 0.01162 0.01041 0.00009
Serious GI, GA 0.00031 0.00036 0.00000 0.00127 0 0.00031
Serious GI, LQ 0.01273 0.00060 0.01155 0.01391 0.01275 -0.00002
Serious GI, TF 0.01954 0.00095 0.01768 0.02142 0.01957 -0.00003
Serious bradycardia, FM 0.00266 0.00059 0.00150 0.00381 0.00251 0.00015
Macular edema, FM 0.00098 0.00057 0.00002 0.00213 0.00128 -0.00030
Residual deviance 253.4 18.3 219.3 291.1 n/a n/a
Model 4b (normal likelihood, cc=0.10)
Serious GI, DF 0.01053 0.00076 0.00905 0.01202 0.01041 0.00012
Serious GI, GA 0.00044 0.00050 0.00000 0.00179 0 0.00044
Serious GI, LQ 0.01272 0.00080 0.01116 0.01429 0.01275 -0.00003
Serious GI, TF 0.01953 0.00125 0.01709 0.02198 0.01957 -0.00004
Serious bradycardia, FM 0.00270 0.00081 0.00110 0.00428 0.00251 0.00019
Macular edema, FM 0.00096 0.00070 0.00001 0.00248 0.00128 -0.00032
Residual deviance 217.2 18.4 183.1 255.0 n/a n/a
Model 4b (normal likelihood, cc=0.25)
Serious GI, DF 0.01056 0.00107 0.00848 0.01267 0.01041 0.00015
Serious GI, GA 0.00067 0.00075 0.00000 0.00267 0 0.00067
Serious GI, LQ 0.01270 0.00113 0.01050 0.01491 0.01275 -0.00005
Serious GI, TF 0.01949 0.00172 0.01612 0.02287 0.01957 -0.00008
Serious bradycardia, FM 0.00260 0.00117 0.00028 0.00489 0.00251 0.00009
Macular edema, FM 0.00102 0.00087 0.00000 0.00306 0.00128 -0.00026
Residual deviance 195.6 18.5 161.2 233.6 n/a n/a
Both Models 4a and 4b come reasonably close to the empirical treatment effect means. Model 4b,
using a Normal approximation, in fact appears to perform slightly better in this regard than Model
4a, which is perhaps surprising, but in either case the discrepancy is too small to be of any concern.
In Model 4b the treatment effect means are insensitive to the continuity correction, as expected.
The treatment effect standard deviations increase with the continuity correction but always fall
short of those from the exact binomial likelihood by an order of magnitude or two. The same is true
of the upper 97.5% points, which may be a more sensible measure of spread since the distribution
is skewed by the floor at zero. Conversely, the residual deviance decreases as the continuity
correction increases, because the higher variance produces smaller standardised residuals for any
points that do not fit perfectly, but it always remains above that from the exact binomial likelihood.
By artificially inflating the sample variances supplied to model 4b, the posterior treatment effect
distributions can be calibrated to broadly match those in the exact binomial model. Table 12 shows
the impact of inflating the sample variances by a factor of 100.
Table 12 - Posterior distributions from Models 4a and 4b (with 100x inflated sample variances) for the “zeroes” outcomes (fixed effects, no correlations between outcomes), and empirical treatment effect estimates. Only treatment-outcome combinations with data are shown; all other treatment effect parameters for these outcomes are set to zero. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, LQ = laquinimod, TF = teriflunomide, cc = continuity correction.
Mean sd 2.50% 97.50% Empirical estimate Mean – empirical difference
Model 4a (binomial likelihood)
Serious GI, DF 0.01012 0.00381 0.00381 0.01860 0.01041 -0.00029
Serious GI, GA 0.00142 0.00201 0.00000 0.00714 0 0.00142
Serious GI, LQ 0.01215 0.00584 0.00137 0.02463 0.01275 -0.00060
Serious GI, TF 0.01860 0.00892 0.00214 0.03776 0.01957 -0.00097
Serious bradycardia, FM 0.00209 0.00232 0.00000 0.00825 0.00251 -0.00042
Macular edema, FM 0.00124 0.00140 0.00000 0.00501 0.00128 -0.00004
Residual deviance 161.2 17.39 128.7 197.1 n/a n/a
Model 4b (normal likelihood, cc=0.01)
Serious GI, DF 0.01016 0.00319 0.00377 0.01638 0.01041 -0.00025
Serious GI, GA 0.00170 0.00189 0.00000 0.00669 0 0.00170
Serious GI, LQ 0.01215 0.00351 0.00518 0.01898 0.01275 -0.00060
Serious GI, TF 0.01835 0.00622 0.00608 0.03016 0.01957 -0.00122
Serious bradycardia, FM 0.00221 0.00206 0.00000 0.00722 0.00251 -0.00031
Macular edema, FM 0.00159 0.00161 0.00000 0.00568 0.00128 0.00031
Residual deviance 177.1 18.2 143.2 214.5 n/a n/a
Model 4b (normal likelihood, cc=0.025)
Serious GI, DF 0.00992 0.00449 0.00102 0.01874 0.01041 -0.00049
Serious GI, GA 0.00257 0.00289 0.00000 0.01027 0 0.00257
Serious GI, LQ 0.01164 0.00482 0.00188 0.02090 0.01275 -0.00111
Serious GI, TF 0.01758 0.00792 0.00171 0.03308 0.01957 -0.00199
Serious bradycardia, FM 0.00294 0.00283 0.00001 0.00997 0.00251 0.00043
Macular edema, FM 0.00227 0.00235 0.00000 0.00830 0.00128 0.00099
Residual deviance 175.4 18.1 141.9 212.5 n/a n/a
Model 4b (normal likelihood, cc=0.05)
Serious GI, DF 0.00964 0.00567 0.00021 0.02129 0.01041 -0.00076
Serious GI, GA 0.00357 0.00399 0.00001 0.01416 0 0.00357
Serious GI, LQ 0.01101 0.00610 0.00030 0.02328 0.01275 -0.00174
Serious GI, TF 0.01680 0.00961 0.00040 0.03629 0.01957 -0.00277
Serious bradycardia, FM 0.00367 0.00364 0.00001 0.01280 0.00251 0.00116
Macular edema, FM 0.00304 0.00320 0.00001 0.01128 0.00128 0.00176
Residual deviance 174.3 18.1 140.6 211.4 n/a n/a
Model 4b (normal likelihood, cc=0.10)
Serious GI, DF 0.00971 0.00697 0.00006 0.02480 0.01041 -0.00069
Serious GI, GA 0.00496 0.00550 0.00001 0.01965 0 0.00496
Serious GI, LQ 0.01059 0.00737 0.00007 0.02640 0.01275 -0.00216
Serious GI, TF 0.01637 0.01147 0.00013 0.04110 0.01957 -0.00320
Serious bradycardia, FM 0.00465 0.00474 0.00001 0.01674 0.00251 0.00214
Macular edema, FM 0.00403 0.00428 0.00001 0.01515 0.00128 0.00275
Residual deviance 173.5 18.0 139.9 210.4 n/a n/a
Model 4b (normal likelihood, cc=0.25)
Serious GI, DF 0.01062 0.00878 0.00004 0.03076 0.01041 0.00021
Serious GI, GA 0.00731 0.00807 0.00001 0.02872 0 0.00731
Serious GI, LQ 0.01105 0.00906 0.00004 0.03183 0.01275 -0.00170
Serious GI, TF 0.01698 0.01404 0.00005 0.04935 0.01957 -0.00259
Serious bradycardia, FM 0.00615 0.00641 0.00001 0.02262 0.00251 0.00364
Macular edema, FM 0.00568 0.00610 0.00001 0.02165 0.00128 0.00440
Residual deviance 172.6 18.0 139.3 209.6 n/a n/a
Although not perfect, a continuity correction of 0.025 with sample variances inflated by 100 provides
a reasonable fit to the treatment effect posteriors in the binomial model. The residual deviance has
also moved in the right direction, albeit with a little way to go to match the fit of the exact binomial
model. The inflation factor and continuity correction could be refined, or alternative approximations
used, to replicate the posteriors more closely if desired (perhaps even using a different formula for
different outcomes, studies or arms). In this instance a broad-brush approach is sufficient to
illustrate the principle, and further refinement will not be sought; instead the continuity correction
of 0.025 and inflation factor of 100 will be adopted. This implies an estimated variance of
(0.025 + p)(0.975 − p) × 100 / N for the mean proportion of patients experiencing a given
outcome in a study arm with observed proportion p and N patients. Whether this approximation is
generalisable to other datasets is not clear at this stage and may be worthy of further investigation.
Having derived the inflated sample variances, it is a good idea to remove the continuity correction
from the estimated proportions themselves (i.e. the data items y). As the same continuity correction
is added to the observed proportion in all study arms, it should make no difference to the risk
difference d. However, the same dataset will be used for the population calibration module, where
preserving the true proportions at arm level will be more important.
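The adopted data preparation can be sketched in a few lines (hypothetical counts; the function name is illustrative): the variance uses the continuity-corrected proportion with the inflation factor of 100, while the data value y retains the uncorrected proportion, as just described.

```python
# Sketch of the sample-variance formula adopted above for the "zeroes"
# outcomes: continuity correction cc = 0.025 applied inside the variance
# only, with an inflation factor of 100. The data value y keeps the
# uncorrected observed proportion, as required by the population
# calibration module. Counts are hypothetical.

CC, INFLATION = 0.025, 100

def approx_normal_data(events, n):
    p = events / n                                 # observed proportion
    var = (CC + p) * (1 - CC - p) * INFLATION / n  # (0.025+p)(0.975-p)*100/N
    return p, var                                  # y retains the true proportion

print(approx_normal_data(0, 250))   # a zero-count arm still gets a positive variance
```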
When the variances have been adjusted appropriately, Model 4b can then be re-run with within-
study correlations if desired, as shown in Figure 22 where within-study correlations of 0.6 between
all pairs of outcomes have been assumed.
Figure 22 - Posterior credibility intervals of relative treatment effects (population averages) from Model 4b (random effects, random mappings, one mapping group, all correlation coefficients between outcomes = 0.6, sample variances estimated as (0.025 + p)(0.975 − p) × 100 / N for the “zeroes” outcomes). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have a zero effect. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
To recap, the following process has been used to derive these results:
1. Obtain posterior statistics from the model using the exact likelihood for the “zeroes”
outcomes (in this case, Binomial) without allowing for within-study correlations
2. Use a continuity correction to obtain a Normal approximation to the likelihood, and (still
assuming no within-study correlations) adjust the sample variances in the data to obtain
posterior distributions that match those in step 1.
3. Obtain final posterior estimates from the adjusted-variance Normal model, this time with an
allowance for within-study correlations.
Although it requires several model runs and two versions of the code, this procedure appears to be a
reasonable way to incorporate within-study correlations, which is prohibitively difficult using a
Binomial likelihood.
A plot of the average deviance contribution per observation against the average leverage per observation for this model is shown in Figure 23, and looks very similar to those seen earlier.
Figure 23 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 4b, correlations of 0.6, random mappings in one group).
[Scatter plot for Figure 23: leverage per observation (study-level average) on the y-axis against deviance contribution per observation (study-level average) on the x-axis, for Model 4b with random mappings in one group. BORNSTEIN 1987 is the labelled outlying study.]
Either Model 4a or 4b can in principle incorporate random effects, between-study correlations or
mappings to the new outcomes, but as these do not seem appropriate for the RRMS case study, no
such results are used here.
II.6.1.6 Final models
To allow estimation of all the treatment effects but only map between outcome measures that
relate to the same benefit or risk, the three-group version of model 4b (with the amended sample
variance formula for the “zeroes” outcomes, as set out in the previous section) will be used for the
remainder of the thesis. As before, fixed effects will be used for the “zeroes” outcomes and random
effects for all others, and fixed correlations of 0.6 between all pairs of outcomes at the between-
and within-study levels will be assumed. The value of 0.6 is somewhat arbitrary, having been chosen
simply as the moderate “middling” option of the three positive constants that were considered when
building the models (results for the alternative values of 0.3 and 0.9 are available in Appendix C). In
reality a uniformly positive correlation between all of the RRMS outcomes seems implausible, since
one would expect some pairs of outcomes (such as the annualised relapse rate and the relapse-free
proportion) to be
negatively correlated. Indeed, the results corroborate this, with the random effects standard
deviation and residual deviance both increasing for higher (positive) assumed correlations, indicating
worsening model fit. Negative correlations can be allowed for using the extended correlation
structure described in II.4.4.1.3 and results on this basis are presented in Appendix C; this probably
gives a more realistic model but also increases run-time, so for practical reasons correlations of 0.6
have been assumed instead.
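The constraint underlying this choice can be illustrated numerically: an exchangeable correlation matrix (a common coefficient ρ between every pair of K outcomes) is positive-definite only when ρ lies between −1/(K−1) and 1, so any common value of 0.3, 0.6 or 0.9 is admissible, whereas a common negative value across ten outcomes would be tightly restricted. A short check, assuming NumPy is available:

```python
import numpy as np

# An exchangeable correlation matrix (common coefficient rho between all
# pairs of K outcomes) has eigenvalues 1 + (K-1)*rho and 1 - rho, so it
# is positive-definite only when -1/(K-1) < rho < 1. This verifies the
# constraint numerically for the ten RRMS outcomes.

def equicorrelation_is_valid(K, rho):
    R = np.full((K, K), float(rho))
    np.fill_diagonal(R, 1.0)
    return bool(np.all(np.linalg.eigvalsh(R) > 0))

print(equicorrelation_is_valid(10, 0.6))    # True: any 0 < rho < 1 is fine
print(equicorrelation_is_valid(10, -0.2))   # False: -0.2 < -1/9
```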
Figure 24 and Figure 25 summarise the posterior distribution of the key parameters from the fixed-
and random-mapping versions of this final model respectively.
Figure 24 - Posterior distributions of relative treatment effects (population averages) on Normal scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, fixed mappings in three groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have a zero effect. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 25 - Posterior distributions of relative treatment effects (population averages) on Normal scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, random mappings in three groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have a zero effect. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
The fixed- and random-mapping models appear very similar both in terms of the estimated
treatment effects and the overall fit. The most obvious difference is that the treatment effects
which are imputed by mappings have wider distributions in the random-mapping model. Either
model could be suitable as the basis for a benefit-risk assessment, and ultimately the choice
between them may come down to one’s view of the proportionality assumption, which is stronger in
the fixed-mapping version. The choice between these models will be discussed further in IV.1.2.
Plots of the average deviance contribution per observation against the average leverage per
observation for these models are shown in Figure 26 and Figure 27 below, and again the pattern is
very similar to those encountered earlier, with BORNSTEIN 1987 the only outlying study but not of
major concern due to its low leverage.
Figure 26 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Final model, fixed mappings in three groups).
[Scatter plot for Figure 26: leverage per observation (study-level average) on the y-axis against deviance contribution per observation (study-level average) on the x-axis, for the final model with fixed mappings in three groups. BORNSTEIN 1987 is the labelled outlying study.]
Figure 27 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Final model, random mappings in three groups).
II.6.2 Population calibration module
Table 13 shows the posterior distribution of the key untreated population parameters from the
population calibration module.
Table 13 - Posterior distributions of untreated population outcomes on Normal scale from population calibration module.
Mean sd 2.5% 97.5%
Untreated population averages on Normal scale
Log annualised relapse rate -0.473 0.273 -1.012 0.066
Log odds of avoiding relapse -0.292 0.282 -0.849 0.264
Log odds of disability progression, confirmed 3 months later -1.036 0.308 -1.643 -0.430
Log odds of disability progression, confirmed 6 months later -1.424 0.341 -2.099 -0.754
Log odds of ALT > ULN -2.128 0.353 -2.833 -1.447
Log odds of ALT > 3x ULN -3.279 0.363 -3.998 -2.568
Log odds of ALT > 5x ULN -4.009 0.447 -4.892 -3.140
Proportion with serious gastrointestinal events 0.0002309 0.0004090 0.0000001 0.0015180
Proportion with serious bradycardia 0.0029820 0.0015840 0.0001516 0.0061160
Proportion with macular edema 0.0007984 0.0008261 0.0000013 0.0029080
Untreated population averages on transformed scale
Annualised relapse rate 0.647 0.180 0.364 1.068
Proportion avoiding relapse 0.429 0.068 0.300 0.566
Proportion with disability progression, confirmed 3 months later 0.266 0.059 0.162 0.394
Proportion with disability progression, confirmed 6 months later 0.200 0.054 0.109 0.320
Proportion with ALT > ULN 0.111 0.035 0.056 0.191
Proportion with ALT > 3x ULN 0.038 0.014 0.018 0.071
Proportion with ALT > 5x ULN 0.020 0.009 0.007 0.041
Proportion with serious gastrointestinal events 0.0002309 0.0004090 0.0000001 0.0015180
Proportion with serious bradycardia 0.0029820 0.0015840 0.0001516 0.0061160
Proportion with macular edema 0.0007984 0.0008261 0.0000013 0.0029080
Between-study heterogeneity sd 1.076 0.112 0.881 1.317
The distributions appear perfectly plausible and reflect the data well. As expected, there is
considerably more between-study heterogeneity in the untreated outcomes than in the relative
treatment effects, with the standard deviation here being roughly 4 times larger than in the
treatment effects module.
The proportions for the final three outcomes are so low that one could probably quite reasonably
set them equal to zero for most practical purposes, but they will be retained in the model here.
II.6.3 Final synthesised outcomes on absolute scale
II.6.3.1 Population-average outcomes
Figure 28 summarises the posterior distribution of the population-average outcomes on the Normal
scale used for modelling. The relapse rate (which is almost always less than one) and the odds of
relapse, disability progression and liver enzyme elevation (which are almost always below one)
appear negative on this scale because of the logarithmic transformation.
Figure 29 shows the same distributions, back-transformed to their original scales. On this scale the
absolute level of outcome is always positive.
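The back-transformations involved are the exponential (for the log annualised relapse rate) and the inverse logit (for the log odds outcomes). A minimal sketch, using the untreated posterior means from Table 13 purely as example inputs:

```python
import math

# Back-transformation from the Normal modelling scale to the original
# scales: exp() for the log annualised relapse rate and the inverse logit
# (expit) for outcomes modelled as log odds. The inputs are the untreated
# posterior means from Table 13, used here only as example values.

def expit(x):
    """Inverse logit: maps a log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(math.exp(-0.473), 3))   # annualised relapse rate, approx. 0.623
print(round(expit(-1.036), 3))      # P(3-month confirmed progression), approx. 0.262
```

Note that these point transformations do not exactly reproduce the original-scale means in Table 13 (0.647 and 0.266), presumably because those summaries are computed from the back-transformed posterior samples rather than by transforming the posterior means.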
Figure 28 - Posterior distributions of absolute treatment outcomes (population averages) on Normal scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 29 - Posterior distributions of absolute treatment outcomes (population averages) on their original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
II.6.3.2 Predictive distributions
Figure 30 summarises the posterior predictive distributions of the study-level average outcomes.
Note that for the final three outcomes, which employ fixed effects (and fixed untreated values),
there is no change from Figure 29. The annual relapse rate has been capped at 3 relapses per year
as otherwise the tail of the distribution contains arbitrarily (and unrealistically) high values.
Figure 31 summarises the posterior predictive distributions of the patient-level outcomes. In this
case the probabilities of relapse and disability progression are frequently capped at both 0 and 1 (i.e.
the minimum and maximum values for a probability), to the extent that the 95% credibility intervals
cover practically the entire interval [0,1].
Figure 30 - Posterior distributions of absolute treatment outcomes (study-level averages) on their original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 31 - Posterior distributions of absolute treatment outcomes (individual-level) on their original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
II.6.4 Rankings
Figure 32 and Figure 33 show the SUCRA statistics (see II.5.4) based on population average outcomes
in the fixed mapping and random mapping models respectively, for the three different grouping
structures. The changing formats within Figure 32 reflect that the rankings are equivalent for any
outcomes that are grouped together, an inherent property of the fixed-mapping model. In
particular, in the one-group model (top left), the treatment effects for every outcome occur in the
same proportions (relative to one another) for all treatments. This means that the treatment
rankings are essentially equivalent for all outcomes, but the rankings for efficacy outcomes (where
the treatment effects have a positive impact on the patient) are reversed for the liver safety
outcomes (where the impact of treatment is negative); this results in the SUCRAs for efficacy and
safety always summing to 1. This is not the case with more than one mapping group, where the
rankings can differ between the groups.
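The SUCRA calculation itself reduces to a simple function of the posterior mean rank: with T treatments and rank 1 best, SUCRA = (T − mean rank)/(T − 1). A minimal sketch on simulated effects (not the fitted RRMS model):

```python
import numpy as np

# SUCRA from posterior rank samples. With T treatments and rank 1 = best,
# SUCRA = (T - mean rank) / (T - 1): 1 for a treatment always ranked
# first, 0 for one always ranked last. The treatment effects here are
# simulated, purely for illustration.

rng = np.random.default_rng(1)
T, S = 4, 10_000                    # treatments, posterior samples
means = [0.0, -0.3, -0.5, -0.2]     # hypothetical effects (lower = better)
effects = rng.normal(means, 0.15, size=(S, T))

# Rank treatments within each posterior sample (1 = smallest effect)
ranks = effects.argsort(axis=1).argsort(axis=1) + 1
sucra = (T - ranks.mean(axis=0)) / (T - 1)
print(np.round(sucra, 2))           # treatment 2 (best mean effect) scores highest
```

Because the ranks within each sample always sum to T(T+1)/2, the SUCRAs across all treatments sum to T/2, which is why shrinking all scores towards 0.5 (as with the predictive distributions discussed below) leaves that total unchanged.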
Figure 32 - SUCRA based on population averages; fixed mapping model
Figure 33 - SUCRA based on population averages; random mapping model
The rankings for serious gastrointestinal events are largely unaffected by the mappings and are
shown in Figure 34. The rankings for serious bradycardia and macular edema do not merit a graph:
fingolimod is always ranked lowest and all other treatments rank equal first.
Figure 34 - SUCRA based on population averages: serious gastrointestinal events
Figure 35 shows how the SUCRA figures change depending on the level of predictive variability that
is accounted for. The figures are based on the efficacy and liver safety outcomes in the three-group
fixed-mapping model but the impact is similar for other outcomes/models. If one considers
predictive distributions at either the study or patient level, instead of population averages, the
increased variability of the treatment effects feeds through to the rankings, which become more
random and less systematic. The ultimate effect on the SUCRA statistics, as revealed in Figure 35, is
that all treatments’ scores are shrunk towards 0.5, the value that indicates a neutral distribution of
rankings. This effect is particularly pronounced in the patient-level predictive distributions for the
RRMS dataset, a stark result which suggests that treatment is a very poor predictor of the outcomes
an average RRMS patient will experience, with patient-to-patient heterogeneity and random chance
playing a far greater role.
One can also observe that the equivalence of rankings for outcomes in the same group is slightly
disrupted by the additional variability in the predictive distributions.
Figure 35 - SUCRA for the efficacy and liver outcomes in the three-group fixed-mapping model: the impact of predictive variability.
If the underlying rankings themselves are of interest, rather than the high-level summary provided
by SUCRA, Figure 36 provides one possible visualisation, showing the proportion of posterior
samples in which each treatment was at each ranking level for a particular outcome statistic (in this
case the population average annualised relapse rate in the one-group random mappings model –
similar graphs for the other outcomes/models are available in Appendix C). These particular
rankings underlie the SUCRAs shown by the darkest green bars in the top-left graph in Figure 33.
Graphs such as this may help to compare performance when the SUCRAs do not distinguish clearly
between treatments (such as glatiramer acetate, subcutaneous interferon beta-1a and laquinimod in
this example), but being based on rankings they still do not convey any sense of whether differences
in ranks correspond to clinically meaningful differences in outcome. For this one must examine the
posterior distributions of the treatment effects.
Figure 36 - Probabilistic rankings for the population average relapse rate, one-group random mappings model
II.6.5 Conclusions regarding RRMS treatments
Given the results on all ten outcomes, a decision maker must somehow combine all ten sets of
results into an overall score or rule that ranks the treatments. The next chapter will address this
problem by incorporating additional parameters in the model relating to outcome importance. At
this stage, however, one can still make a few general observations about the treatments’
performance:
• Fingolimod ranks highest in terms of relapse prevention but is not quite so outstanding with
regard to disability progression, and is one of the worst treatments for most safety
outcomes.
• Interferon beta-1b performs well on all efficacy measures but poorly on liver safety.
• Dimethyl fumarate and glatiramer acetate both perform well on efficacy (where dimethyl
fumarate has a slight lead) and liver safety (where glatiramer acetate is second only to
placebo), but both are associated with serious gastrointestinal adverse events.
• Intramuscular interferon beta-1a does well on liver safety but not so well on efficacy.
The efficacy findings are broadly in line with the results of the Cochrane review 81, but there are some
differences. Most notably, glatiramer acetate appears less effective in our analysis and dimethyl
fumarate appears more effective. This difference persists even when the analysis is restricted to the
Cochrane efficacy outcomes and performed on a univariate basis. The difference appears to relate to
our differing approaches to trials using non-standard dosages: the Cochrane review pooled all
dosages for each treatment, whereas here, study arms that did not use the normal dosage were
excluded. Dimethyl fumarate and glatiramer acetate were among the drugs with more than one
dosage used in trials. Surprisingly, the Cochrane review’s safety results (based on discontinuation
due to any adverse events) are largely in line with the liver safety rankings here (based on biomarker
tests); this similarity may be a chance finding or could perhaps indicate that elevated liver enzymes
can act as a proxy for lack of tolerability in a more general sense. It is worth bearing in mind, however, that lack of efficacy can be intolerable too: a recent observational post-marketing review of RRMS 121 found that compliance with treatment was highest for fingolimod and other highly effective drugs, suggesting that most MS patients may value efficacy more highly than safety.
II.6.6 Sensitivity analyses
Results of the sensitivity analysis on the assumed priors and correlation parameters are shown in
Appendix C. Alternative non-informative priors were found to have little impact on the results.
Assuming extreme values for the correlation coefficients (0 or 0.9) had some impact on the
individual treatment effect estimates but not so much on the rankings; vague priors on the
correlation propensities did affect the rankings somewhat but not sufficiently to have much impact
on the overall conclusions.
II.7 Discussion
Multivariate network meta-analysis models are relatively novel, but this is an active research area
and various models have recently been proposed, as described in the literature synopsis. Few
applications exist, however, beyond the examples used to introduce the models, and clearly more
experience is needed to evaluate the various approaches’ reliability, practicality and generalizability.
For reasons already discussed, Bayesian multivariate NMA lends itself well to the demands of
quantitative benefit-risk assessment, and the growth of this field may lead to more applications in
future. The computations themselves do not take especially long on modern computers; all of the
models in this chapter that assumed constant correlations between outcomes took less than 5
minutes to run 200,000 iterations in OpenBUGS (version 3.2.2 rev 1063) on a Microsoft Surface Book
2 (i5-8350U 1.70 GHz quad core) running Windows 10; where random correlation propensities were
used, the run time did not exceed 30 minutes. Bayesian modelling using MCMC remains a highly
specialised discipline, however, and the expertise required to set up and interpret such analyses is
likely to be the most significant barrier to more widespread adoption.
The family of models presented here share a combination of features that stands out compared to
other published multivariate NMA models, particularly with a view to benefit-risk assessment:
estimation of missing treatment-outcome combinations, estimation of outcomes on the absolute
scale, flexibility in the assumed correlation structure, code that needs minimal adaptation for each
dataset, no need to specify covariance arrays in the data. Among the factors that may discourage
researchers and analysts from employing multivariate NMA models are complexity of
implementation and patchiness of data, and these models address both issues.
There are however a number of limitations. The model developed here relies on multivariate
Normal distributions to allow for within-study correlations since they are mathematically tractable
and can be used to approximate most other common distributions. Even implementing multivariate
Normal distributions is not always easy in BUGS, however, and the novel construction developed in
II.4.4 was used in order to facilitate the coding. Other software packages (for example Stan 122) may
be able to overcome this limitation.
Avoiding the Normal approximations altogether may be difficult as extending other distributions into
the multivariate domain is not straightforward. In the case of several binomial outcomes, one
possible approach might be to characterise the joint likelihood by specifying all of the conditional
probabilities, but this may be rather cumbersome for large numbers of outcomes and it would
presumably be very difficult to parameterise and code such a model for an arbitrary dataset.
Combining different outcome types will increase the difficulty. It seems likely that using
transformations to achieve an approximation of multivariate Normality will remain the favoured
approach.
Within the multivariate Normal framework, binary events with zero rates can be handled on a risk
difference (or rate difference) scale, but if within-study correlations are to be included, this must be
implemented by post-hoc tuning of the variance data to obtain the correct posteriors. This is not an
ideal solution – not only does it lack rigour, it also makes the process somewhat longwinded as
several model runs are needed – once to establish the results using the “exact” likelihood in a
univariate context; several more runs to tune a Normal likelihood to achieve equivalent results; and
then a final run using the Normal likelihood with within-study correlations. A formula can be derived
for the sample variances (as a function of the sample proportion) that were fed to the final model in
the RRMS case study, but whether this will be generalisable to other datasets is unclear and until
this is established it would be prudent to repeat the same iterative tuning procedure.
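The idea behind the tuning can be illustrated outside BUGS by moment matching. The sketch below is a simplified stand-in for the iterative multi-run procedure described above: it chooses the mean and variance of a Normal summary so that a flat-prior Normal analysis reproduces the exact conjugate posterior, here a Beta posterior with a Beta(1,1) prior (an assumption made purely for illustration):

```python
def tuned_normal_moments(x, n, a=1.0, b=1.0):
    """Moment-match a Normal summary for a binomial proportion so that
    a flat-prior Normal analysis reproduces the mean and variance of
    the exact Beta(a + x, b + n - x) posterior. Unlike the naive
    variance estimate p*(1 - p)/n with p = x/n, this stays positive
    when x = 0."""
    a_post, b_post = a + x, b + n - x
    mean = a_post / (a_post + b_post)
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, var

m, v = tuned_normal_moments(0, 50)          # zero events out of 50 patients
naive_var = (0 / 50) * (1 - 0 / 50) / 50    # collapses to zero
print(m, v, naive_var)
```

The point of the sketch is that a principled positive variance can be attached to a zero-event arm, whereas the usual plug-in variance estimate degenerates to zero.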
It may also be possible to use the inverse relative risk scale for binary outcomes with zero rates, but
this has not been pursued here as it has no clear advantage over the risk difference, presents similar
issues with estimating the sample variance, and suffers from a slightly awkward interpretation.
Any zeroes in the data for outcomes modelled on the logit scale or log rate scale could be handled
via a continuity correction, although this was not necessary with any of the RRMS outcomes we
considered.
The decomposition of the multivariate Normal distribution into an array of univariate Normals is
restricted to a particular class of correlation structures, albeit one that is arguably sufficiently broad for
most contexts. It has the advantage of using a relatively small number of parameters to encode the
covariance matrix, thus leaving sufficient degrees of freedom to allow the model to be fitted in the
small patchy datasets for which it is designed.
It appears that allowing for correlations in the RRMS case study may have only had a substantial
impact on the results when the unrealistic assumption of universal positive correlations was
imposed, and not for more realistic correlation structures. It remains to be seen if this is always the
case for other datasets, however. Simulation studies may help to clarify the model’s performance, and the impact of allowing for correlations, in a range of different scenarios.
Not all possible types of outcomes have been included in the model, most notably survival/time-to-
event outcomes. Given data on the survival probabilities in each treatment arm at the time point(s)
of interest, it should in principle be straightforward to incorporate such outcomes if one assumes
proportional hazards as per the Cox model. For example, if outcome k in a trial is the survival
probability at a fixed time point, the relative treatment effect 𝑑𝑡𝑘 can be defined as the log hazard
ratio between treatment t and the reference treatment 1; then the log hazard ratio of treatment 𝑡2
compared to 𝑡1 is 𝑑𝑡2𝑘 − 𝑑𝑡1𝑘 and the ratio of the survival probabilities (i.e. the inverse relative risk
of failure) is the exponential of the hazard ratio, i.e. exp(𝑒𝑑𝑡2𝑘−𝑑𝑡1𝑘). This should make it straightforward
to specify (a Normal approximation of) the binomial likelihood of the survival data using the inverse
relative risk to compare treatments.
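As a hedged numerical aside (illustrative, not thesis code), the proportional-hazards link between arm-level survival probabilities and the log hazard ratio can be checked directly, since under proportional hazards S_t = S_ref raised to the power exp(d), i.e. a complementary log-log contrast:

```python
import math

def log_hazard_ratio(s_t, s_ref):
    """Log hazard ratio implied by survival probabilities at a common
    time point, assuming proportional hazards: S_t = S_ref ** exp(d),
    so d = log(-log S_t) - log(-log S_ref) (a complementary log-log
    contrast). Probabilities must lie strictly between 0 and 1."""
    return math.log(-math.log(s_t)) - math.log(-math.log(s_ref))

# hypothetical survival probabilities at the time point of interest
d = log_hazard_ratio(s_t=0.9, s_ref=0.8)
# round-trip check: raising S_ref to the hazard ratio recovers S_t
print(d, 0.8 ** math.exp(d))
```

This is one convenient route from reported survival probabilities to the log hazard ratio scale on which the treatment effects 𝑑𝑡𝑘 are defined.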
Categorical outcomes with more than two categories (i.e. multinomial) have not yet been
incorporated; allowing for correlations involving these outcomes could be difficult, but it may be
possible to find a way, perhaps by using the “zeroes trick” in BUGS to specify a custom likelihood 123.
Throughout the RRMS case study I have assumed that the same between-study standard deviation
sigma applies to the treatment effects on all outcomes, although they are expressed on different
scales and represent different clinical effects. It may be more realistic to explore using a different
standard deviation parameter for each outcome, but for reasons of parsimony this was not done
here. Expressing the multivariate Normal distribution as a combination of univariate Normals is still
possible in such a model, as shown by Theorems 1 and 2.
The assumption of proportionality between outcomes, underlying the mappings, is a strong one.
This can be mitigated to some extent by grouping the outcomes so that mappings are only applied
between outcomes that are particularly closely related, and by using random mappings so that the
proportions do not have to be exactly equal on all treatments. Potentially one could go further and
build more flexibility into the mapping relationships by adding more parameters – for example,
introducing a power parameter q so that the mapping equation for treatment t becomes 𝑑𝜔𝑡𝑞 =
𝛽𝜔𝑡𝑑1𝑡𝑞. This power parameter could be assigned different constant values as a form of sensitivity
analysis, or given a vague prior.
The mappings, as currently defined, should only be used where the treatment effects on the
outcome(s) used as the baseline for mappings are non-missing for all treatments. Should any of
these treatment effects be missing, the baseline outcome will itself be estimated via the mappings
and the model will not behave as intended. This issue may be avoidable if one reparameterises the
model such that the baseline outcome varies by treatment according to data availability, and this
could facilitate applications to datasets with a higher degree of patchiness. There are other
restrictions in the model that could perhaps be overcome, such as the assumptions of equal variance
across treatment contrasts and across log treatment effects that give rise to fixed correlations of 0.5
between random effects and between mappings respectively.
The mappings cannot be used for outcomes where the effect size comparing a treatment to baseline
t=1 is zero for some outcomes and non-zero for others in the same mapping group, as this is not
consistent with a linear scaling between the effects on different outcomes. In particular, adverse
effects that are only associated with one (or a subset) of the treatments may produce spurious non-
zero results for any other treatments in the model if they are subjected to mappings. In practice this
should not present problems if one selects the mapping groups appropriately.
Another method to allow for correlations or mappings, not pursued here, is to define explicit
structural relationships between outcomes. For example, using the RRMS case study, one could link
the four relapse and disability progression outcomes by assuming that within each study there is a
constant annualised relapse rate r; that a proportion φ of relapses lead to a disability progression
that persists for 3 months; and a proportion θ of such disability progressions persist for a further 3
months. The mathematics of rates and proportions then implies that
𝑃(𝑎𝑣𝑜𝑖𝑑𝑖𝑛𝑔 𝑟𝑒𝑙𝑎𝑝𝑠𝑒) = 𝑒−𝐴𝑅𝑅∗𝑡,
𝑃(𝑑𝑖𝑠𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑝𝑟𝑜𝑔𝑟𝑒𝑠𝑠𝑖𝑜𝑛 𝑐𝑜𝑛𝑓𝑖𝑟𝑚𝑒𝑑 3 𝑚𝑜𝑛𝑡ℎ𝑠 𝑙𝑎𝑡𝑒𝑟) = 𝜑(1 − 𝑃(𝑎𝑣𝑜𝑖𝑑𝑖𝑛𝑔 𝑟𝑒𝑙𝑎𝑝𝑠𝑒)), and
𝑃(𝑑𝑖𝑠𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑝𝑟𝑜𝑔𝑟𝑒𝑠𝑠𝑖𝑜𝑛 𝑐𝑜𝑛𝑓𝑖𝑟𝑚𝑒𝑑 6 𝑚𝑜𝑛𝑡ℎ𝑠 𝑙𝑎𝑡𝑒𝑟) = 𝜃𝜑(1 − 𝑃(𝑎𝑣𝑜𝑖𝑑𝑖𝑛𝑔 𝑟𝑒𝑙𝑎𝑝𝑠𝑒))
where t is the number of years of follow-up. Or one could model the distribution of ALT within each
study as a continuous variable and derive the binary outcomes used in the models by using
thresholds. This kind of approach has not been pursued here for two main reasons:
(i) It is difficult to give any generic recipes for such models as they will be highly dependent
on context. The mapping-based models on the other hand take a very generalised form.
(ii) The assumptions underlying a specific structural model may be somewhat stronger than
the relatively loose proportionality condition used for the models in this chapter.
Placing vague priors on the mappings and correlation coefficients allows one to say very
little at all about the extent and form of any relationships while still allowing for the
possibility that they exist.
However, there are situations when specifying explicit structural relationships may have advantages,
such as when the nature of the relationships is known with certainty a priori, or when there is readily
available data on the structural parameters (such as φ and θ as defined above).
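A small numerical sketch of the structural relationships above (illustrative parameter values only; here 𝜑 is taken to apply to the probability of experiencing at least one relapse, i.e. 1 − P(avoiding relapse)):

```python
import math

def structural_outcomes(arr, phi, theta, t):
    """Outcome probabilities implied by the structural relationships
    above. `arr` is the annualised relapse rate, `phi` the proportion
    of relapsing patients with a progression confirmed at 3 months,
    `theta` the proportion of those confirmed again 3 months later,
    and `t` the follow-up in years. phi is applied to the probability
    of at least one relapse, i.e. 1 - P(avoiding relapse)."""
    p_avoid = math.exp(-arr * t)
    p_cdp3 = phi * (1 - p_avoid)
    p_cdp6 = theta * p_cdp3
    return p_avoid, p_cdp3, p_cdp6

# illustrative parameter values only
p_avoid, p_cdp3, p_cdp6 = structural_outcomes(arr=0.35, phi=0.4, theta=0.8, t=2)
print(round(p_avoid, 3), round(p_cdp3, 3), round(p_cdp6, 3))
```

The sketch shows how two structural parameters (𝜑 and 𝜃) tie the four relapse and disability progression outcomes together, in contrast to the looser proportionality assumptions of the mapping-based models.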
Here mappings have been used merely as a tool to synthesise missing outcome measures, but the
model could also be used to investigate the mappings themselves in order to establish surrogacy
relationships between outcomes, as has previously been done with a similar model 89.
It is good practice to only rely on network meta-analyses when the data are consistent; that is, there
should be little conflict between effects estimates that form closed loops in a network. Where
mappings are used, one should also pay attention to the consistency of the mapping ratios.
Techniques have been developed to assess treatment effect consistency in univariate NMAs 43,44 and
in principle a multivariate extension to handle mapping consistency should be possible; this has not
been addressed directly in this thesis but is a target for future work.
The fixed-baseline parameterisation of the treatment effects module (as opposed to the more
common variable-baseline approach) needs summary data for each trial arm, rather than contrast-
level data comparing each treatment to baseline within a trial. Arm-level data is generally published,
but should it not be available for all studies then the corresponding variable-baseline model can be
used instead (with additional within-study correlations reflecting that contrasts observed within a
trial are expressed relative to the same baseline arm and hence correlated). The population
calibration module does however fundamentally require arm-level data, and so any contrast-only
studies would need to be omitted from the data for this part of the model.
Many biostatistical methods assume that relative treatment effects comparing pairs of arms within
RCTs tend to be homogeneous throughout the population, with most study- and patient-level
variables affecting only the baseline outcome levels. Where variables do impact on the relative
treatment effects, these are known as effect modifiers, and their influence can cause problems in
evidence synthesis 65,124,125. When combining evidence from multiple studies, or extrapolating to
different populations, any differences in the distribution of effect modifiers can lead to confounding
and bias in the treatment effect estimates.
The key assumption in (multivariate) NMA (and hence a key limitation on when it can be applied) is
that the distribution of effect modifiers is the same in all studies (and in the population that is the
target of the decision). A special case where this assumption may not hold is in the presence of
publication bias: when some outcomes for some treatments are not reported due to poor
performance, then outcome missingness may be a (proxy for one or more) effect modifier(s), and
there may be heterogeneity in effects between the studies and the target population. In practice
there may be little way to avoid the data being coloured by publication bias but methods exist to
detect it 126 and one could make allowance for its impact, say by being conservative in other ways
such as in the setting of priors or utility trade-offs. This relies, however, on being able to anticipate
the likely direction of the impact that publication bias has on the evidence synthesis. For trials with
a placebo (or established active treatment) control arm, publication bias will usually favour the
active/experimental treatment arm; where trials compare two new treatments (such as many multi-
arm trials), the direction of bias may be less obvious. In network meta-analyses, the network
structure becomes significant in determining the likely impact of any bias: in simple “star”-shaped
networks of placebo-controlled studies, for example (see Figure 37), the bias will usually act in
favour of active drugs whose outcomes go unreported and against those where reporting is more
complete; in more complex networks the impact may be less straightforward, although sensitivity
analyses could be used to test various scenarios.
Figure 37 – A “star”-shaped evidence network with six active treatments (A, B, C, D, E, F) and placebo (P).
It is not immediately clear what is driving the between-study heterogeneity in the RRMS case study.
The populations of the source studies mostly appear very similar in terms of the distribution of age,
gender and disability level, but show considerable variation in geographical location, time since MS
diagnosis and treatment history, among other factors. It may be that one or more of these factors is
acting as an effect modifier; specific hypotheses could be tested by including these factors as
covariates in the model (in other words, turning the network meta-analysis into a network meta-
regression).
As previously noted, the modelling in this chapter is not intended to reflect the full safety profile of
the RRMS treatments involved, as to do so would have resulted in an impractically large example
dataset. The safety outcomes adopted have been chosen to illustrate the model’s features and
capabilities. It would therefore be wrong to interpret the results as a reflection of the overall
benefit-risk balance for these drugs. However, a fairly comprehensive set of efficacy outcomes has
been included, and to the extent that the data are reliable and consistent, the results for the
outcomes presented here should be meaningful.
In summary, the modelling approaches detailed in this chapter make many unique and useful
contributions to the field of multivariate evidence synthesis, including:
• efficient coding of arbitrary multivariate NMA models with flexible correlation structures
• mappings that exploit related outcome definitions to fill in gaps in data and borrow strength
• ability to incorporate outcomes with zero rates
• a population calibration model accompanying the main evidence synthesis, going beyond
treatment contrasts to estimate the distribution of real-world outcomes.
Ultimately the aims of the chapter have been achieved. A model (or rather family of models) has
been constructed that can perform a principled Bayesian evidence-based multivariate synthesis of a
variety of outcomes, for a number of different treatments, with variables on a scale suitable for use
in a multi-criteria decision model. Implementing such a decision model, and deriving Bayesian
estimates of its preference-related parameters, is the focus of the next chapter.
III. Bayesian multi-criteria utility modelling
III.1 Background, aims & objectives
III.1.1 Introduction
A benefit-risk assessment is an example of what is known in decision theory as a multicriteria
decision. The idea is to select between a group of alternatives (here, treatments) on the basis of
several criteria (here, the benefits and risks contributing to the overall balance). The previous
chapter discussed methods for obtaining estimates of each treatment’s performance in relation to
each criterion, resulting in an effects table such as the simple examples in I.1.5 or the forest plots
from the RRMS case study in II.6.3.
In this chapter the existence of a fully populated effects table is taken as given; now the focus shifts
to using these estimates to formulate an overall decision. Multicriteria decisions can be trivial if one
treatment is optimal with regard to all criteria. In general, however, the treatments may be ranked
differently depending which criterion is used as the basis for judgement. Multicriteria decision
analysis (MCDA) is a discipline that breaks down decisions in order to resolve conflicts between
criteria and determine which treatment is favoured overall.
III.1.1.1 MCDA
An informal example of the MCDA approach, applied to the context of benefit-risk, was given in
I.1.5. The discussion here will focus more on the theoretical and technical underpinnings.
MCDA is not so much a single method as a field of study; the term can be applied to a diverse family
of approaches to decision problems 127. They all however have the following general characteristics
(the interpretation in the benefit-risk context is given in parentheses):
• The decision amounts to a choice between a number of alternatives (i.e. treatments)
• Several criteria are relevant to the decision (i.e. efficacy and safety outcomes)
• Each alternative is assessed with regard to each criterion (i.e. the effects of each treatment
on each outcome are evaluated)
• (Optionally) a method is provided to aggregate the individual criterion assessments into an
overall assessment that accounts for all criteria (i.e. the overall benefit-risk balance is
evaluated).
The final step (aggregation) is present in some formulations of MCDA but not others 12. The version
of MCDA regarded here as canonical is that set out by Keeney and Raiffa 128, which has its roots in
multi-attribute utility theory and uses explicit elicitation of the trade-offs between criteria to
construct an aggregated score.
This formulation of MCDA is an example of a compensatory decision-making method, which means
that changing the value of one criterion can be compensated for (in terms of overall value) by a
change in one or more of the others 129. For example, an increase in unwanted side effects might be
mitigated by an increase in efficacy, leaving the overall perceived value of treatment unchanged.
This is what is meant by a trade-off 128. In non-compensatory decision-making this does not hold and
a variety of other decision rules may apply. For example, any alternative where one particular
criterion exceeds (or falls below) a certain threshold might be automatically accepted (or rejected);
such an approach has been proposed in the benefit-risk context 130. It is not too difficult to imagine
that an individual patient may have a non-compensatory attitude to benefits and risks of treatment.
From the perspective of a regulator or drug developer making decisions based on outcomes at the
population level, non-compensatory attitudes are possible for extremely poor outcome values
(efficacy less than placebo, or risk at unacceptably high levels) but, crucially, such “non-starter”
drugs should usually be easy to identify and can be screened out prior to applying MCDA. Indeed, in
real life it seems highly unlikely that any such treatments would ever get through early phase trials
and reach the stage where a population benefit-risk assessment is carried out. Any risks that have
not emerged prior to large Phase III trials are unlikely to occur with high enough frequencies in the
treated population to justify a non-compensatory approach. It does not seem unreasonable to
suppose that a population-level decision-maker will exhibit compensatory attitudes over the range
of outcomes represented by viable real-world alternatives, and this assumption will underlie the
methods in this chapter. If the decision maker takes a non-compensatory perspective, then the
MCDA model proposed by Saint-Hilary et al 130 may be a viable approach that still has much in
common with the models described here.
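The two-stage logic described above (screen out non-starters on non-compensatory grounds, then apply a compensatory weighted score to the survivors) can be sketched as follows; the treatments, criterion values, weights and safety floor are all hypothetical and are assumed to be on a 0–1 value scale where higher is better:

```python
def screen_then_score(effects, weights, floors):
    """Two-stage rule: drop any 'non-starter' whose value on a floored
    criterion falls below the non-compensatory threshold, then rank the
    survivors by a compensatory weighted sum. Values are on a 0-1
    value scale where higher is better; all names are hypothetical."""
    viable = {t: crit for t, crit in effects.items()
              if all(crit[c] >= floors[c] for c in floors)}
    scores = {t: sum(weights[c] * crit[c] for c in weights)
              for t, crit in viable.items()}
    best = max(scores, key=scores.get)
    return best, scores

effects = {
    "drug_A": {"efficacy": 0.8, "safety": 0.4},
    "drug_B": {"efficacy": 0.6, "safety": 0.7},
    "drug_C": {"efficacy": 0.9, "safety": 0.1},   # breaches the safety floor
}
best, scores = screen_then_score(effects,
                                 weights={"efficacy": 0.7, "safety": 0.3},
                                 floors={"safety": 0.2})
print(best, scores)
```

Note how drug_C is excluded outright despite the best efficacy score, while the remaining treatments are compared compensatorily.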
There is plenty of common ground between MCDA and health economic decision models 131. MCDA
aggregates multiple outcomes by placing them on a utility scale, while economic models typically
express outcomes on a monetary or QALY scale (based on implicit or explicit valuations of the
underlying trade-offs). Economic models can also be used to analyse decisions with uncertain
consequences (using decision trees, for example) 132. Much of this work will therefore translate
readily to the realm of economic models; however, it seems more appropriate to use MCDA as the
main basis for the methodology since (i) economic costs are usually not considered in benefit-risk
assessments, and (ii) the explicit focus on preferences and trade-offs in MCDA provides helpful
clarity on the underlying decision principles.
III.1.1.2 Multi-attribute utility theory
Multi-attribute utility theory (MAUT) is an extension of utility theory (which was originally
formulated to deal with only a single criterion) into the multi-attribute (i.e. multi-criteria) domain. It
defines preferences using the concept of utility, which is a cardinal measure of value or
satisfaction 133. In MAUT an individual’s utility is taken to be a real-valued function (the utility
function) of a number of underlying criteria. The defining characteristic of a utility function U(X) is
that 𝐸𝐴[𝑈(𝑿)] ≥ 𝐸𝐵[𝑈(𝑿)] if and only if A is at least as desirable as B, where A and B are probability
distributions over the space of multi-attribute consequences X and 𝐸𝐴, 𝐸𝐵 are the corresponding
expectations 128. A theorem by von Neumann and Morgenstern guarantees the existence of such a
function given some basic axioms regarding the nature of preferences, and shows that it is unique
up to a positive linear transformation 133. The latter point makes intuitive sense since utility thus
defined has no absolute meaning; all that matters is whether the expected utility of A exceeds that
of B, and this does not depend on any particular linear scale. Various versions of MAUT have been
developed (depending for example on whether the criteria can be evaluated with or without
certainty) 134, but it is a more closely knit family than MCDA as a whole, as some MCDA methods do not attempt
to quantify preferences in a cardinal fashion.
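The defining property 𝐸𝐴[𝑈(𝑿)] ≥ 𝐸𝐵[𝑈(𝑿)] can be illustrated by Monte Carlo with an invented concave utility function (an assumption for illustration, reflecting risk aversion): two hypothetical lotteries with equal expected consequences need not have equal expected utility.

```python
import numpy as np

rng = np.random.default_rng(0)

def utility(x):
    # a hypothetical concave (risk-averse) utility over one consequence
    return np.sqrt(x)

# two invented lotteries over consequences with the same expected value:
# A is a sure thing, B is an even-odds gamble
a = np.full(100_000, 0.50)
b = rng.choice([0.10, 0.90], size=100_000)

eu_a, eu_b = utility(a).mean(), utility(b).mean()
print(round(eu_a, 3), round(eu_b, 3))   # A is preferred: higher expected utility
```

Under a concave utility the certain option A is preferred to the gamble B even though their expected consequences coincide, which is exactly the distinction the expected-utility criterion is designed to capture.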
It is of course impossible to account for every single criterion influencing an individual’s overall utility
– potentially this could include any aspect of life from the weather to what they had for breakfast –
but this is not necessary in order to apply MAUT; one can simplify the utility model by focusing on a
set of criteria that are relevant to the decision at hand 128,135.
MAUT provides a rigorous axiomatic mathematical framework for multi-criteria decisions through
the eyes of a rational decision maker (i.e. one who seeks to maximise expected utility 133). While it
can be argued that individual decision makers may often not live up to expectations of rationality,
those such as regulators who make official decisions on behalf of the general public should generally
aim for their decisions to be justifiable, and hence rational 134.
Certain classes of multi-attribute utility functions (MAUFs) are often favoured due to their
tractability:
• Additive and linear in partial values: Such a MAUF can be expressed as a weighted sum of
partial values for the various criteria, where a partial value (or partial utility) function is itself
a utility function restricted to one criterion. In this class the partial value functions (PVFs)
need not be linear. The general form is:
𝑈 = ∑𝜔 𝑤𝑒𝑖𝑔ℎ𝑡𝜔𝑃𝑉𝐹𝜔(𝑥𝜔), summing over all criteria 𝜔
If the utility function is of this form then the criteria are said to be mutually utility
independent 128,134. In simple terms this means the criteria weights, 𝑤𝑒𝑖𝑔ℎ𝑡𝜔 , are
independent of the criteria partial utilities, 𝑃𝑉𝐹𝜔(𝑥𝜔). Such models have been favoured in
part due to their high statistical tractability, which facilitates elicitation of preferences 34.
Applications date back at least as far as the 1960s 136.
As the scale of the utility function is essentially arbitrary, as per the von Neumann-
Morgenstern theorem 133, the weights and partial values are unique only up to overall scaling
constants. It is conventional however to normalise the weights to sum to 1, and to scale the
partial values such that the maximum partial value for any alternative is 1 and the minimum
partial value is 0. These constraints allow a unique solution to be identified and limit overall
utility to the interval [0,1].
• Additive and linear in criteria: this is a subset of the above class where each PVF is also a
linear function of the underlying criterion measure 𝑥𝜔. Equivalently, the MAUF is itself an
additive linear function of the criteria and can be written as
𝑈 = 𝛼 + ∑𝜔 𝑈𝐶𝜔𝑥𝜔
where 𝑈𝐶𝜔 is a utility coefficient reflecting both 𝑤𝑒𝑖𝑔ℎ𝑡𝜔 and the linear coefficient of 𝑥𝜔
within its partial value function, and 𝛼 is an overall intercept term.
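A minimal numerical sketch of the additive form with the conventional normalisation (weights summing to 1, partial values scaled to [0, 1] across the alternatives); the criterion measurements are invented and assumed higher-is-better:

```python
import numpy as np

def additive_mauf(x, weights):
    """Additive utility with linear partial value functions: each
    criterion is rescaled so the best alternative scores 1 and the
    worst 0 (assumes every criterion varies across alternatives and
    higher raw values are better), and weights are normalised to sum
    to 1, so overall utility lies in [0, 1]."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    lo, hi = x.min(axis=0), x.max(axis=0)
    partial = (x - lo) / (hi - lo)        # 0-1 partial values
    return partial @ w

# invented criterion measurements: rows = alternatives, cols = criteria
x = [[0.30, 0.90],
     [0.50, 0.60],
     [0.70, 0.20]]
u = additive_mauf(x, weights=[2, 1])      # first criterion weighted twice the second
print(np.round(u, 3))
```

With this normalisation an alternative attains utility 1 only if it is best on every criterion, which makes the resulting scores directly comparable across alternatives.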
Although the Bayesian paradigm has been associated with decision theory for a long time, it has only
recently begun to be taken seriously with regard to multi-attribute problems. This is due in part to
the advent of MCMC sampling, which has helped to overcome serious difficulties in computation.
Now that this issue has been overcome there are good arguments supporting the use of Bayesian
approaches in this field 34 (see also I.1.5).
III.1.1.3 MCDA and MAUT in health
Multi-criteria utility elicitation techniques have been used for some time to construct health-related
quality of life measures 137,138. These composite measures are essentially utility values calculated
based on a number of underlying outcome measures; the parameters for the utility function are
estimated using the principles of MAUT (typically elicited using absolute scenario ratings). Such
measures tend not to be suitable for benefit-risk assessment as they do not capture the specific risks
associated with each treatment.
Beginning in the 1990s, various attempts have been made to use MCDA to perform holistic
assessments of treatments and other health interventions. Such analyses have been used for
purposes including health technology assessments, economic evaluations and clinical prescribing
recommendations 139.
Around the turn of the millennium it was recognised that multi-criteria decision making techniques
could have the potential to put benefit-risk assessment on a more formal and reliable footing 6,49.
This led to several exploratory initiatives aiming to demonstrate proof of the concept by government
bodies, institutions, organisations and consortia such as the European Medicines Agency 9,
Innovative Medicines Initiative 10, International Society for Pharmacoeconomics and Outcomes Research 12, and the
Pharmaceutical Research and Manufacturers of America. Many projects have now issued guidance
on best practice for MCDA in benefit-risk 5,11,55. There have also been a number of reviews and
recommendations specifically concerning utility elicitation methods for use in health-related
fields 140,141,142,143.
This activity is reflected in an increased use of quantitative benefit-risk methods by pharmaceutical
companies and other industry stakeholders 15.
III.1.1.4 Notation and nomenclature
A great deal of disparity and ambiguity of terminology exists in MCDA, owing in part to the
independent parallel development of methods that would later be grouped under the MCDA
umbrella. The following conventions will be adopted here:
• The MCDA decision variables are known as criteria.
• Clinical variables are known as outcomes. There may be more than one outcome that can be
used to measure each criterion (for example, see Figure 10).
• Preferences is a general umbrella term for utility function parameters.
• Utility coefficient is the linear coefficient for a criterion in the utility function.
• (Preference) weights are utility coefficients normalised to sum to 1 across all criteria.
• Preference strength for a criterion is the log of the absolute value of its utility coefficient.
• The utility ratio of two outcomes is the absolute value of the ratio between their utility
coefficients.
• Relative preference strength for two criteria is the difference between their preference
strengths (or equivalently, the log utility ratio).
The reasoning behind using these particular numerical measures will become clearer when the
model parameterisation is set out in section III.2 and those that follow. In the meantime, defining these terms
facilitates discussion of existing preference elicitation methods using consistent terminology.
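To make the relationships between these quantities concrete, here is a minimal Python sketch using invented utility coefficients; note that normalising by absolute values is an assumption made here for illustration, since risk criteria carry negative coefficients.

```python
import math

# Invented utility coefficients for three criteria (one benefit, two risks).
utility_coeffs = {"benefit": 0.6, "risk_A": -0.3, "risk_B": -0.1}

# Preference weights: coefficients normalised to sum to 1 across all criteria
# (absolute values are used here, since the risk coefficients are negative).
total = sum(abs(c) for c in utility_coeffs.values())
weights = {k: abs(c) / total for k, c in utility_coeffs.items()}

# Preference strength: log of the absolute value of the utility coefficient.
strengths = {k: math.log(abs(c)) for k, c in utility_coeffs.items()}

# Utility ratio of two outcomes, and the equivalent relative preference
# strength (the difference of their preference strengths).
ratio = abs(utility_coeffs["benefit"] / utility_coeffs["risk_A"])  # 2.0
rel_strength = strengths["benefit"] - strengths["risk_A"]          # log(2)
assert math.isclose(rel_strength, math.log(ratio))
```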
III.1.2 Preference elicitation methods
A wide variety of methods have been developed for eliciting the parameters of the utility function
(often simply referred to here as preferences) 141,143,144. Elicitation can be carried out in groups, in one-to-one sessions, or individually via paper, internet, or telephone. Participants are required to make
judgements regarding either:
• The value of the decision criteria themselves, in isolation or via pairwise comparisons (such
methods are known as compositional); or
• The overall value of scenarios involving several criteria at once (such methods are known as
decompositional) 145.
Elicitation methods also vary in terms of the format of judgements that participants are asked to
express, and thus the types of data that need to be analysed. This will be covered in more detail in
III.1.3. First, the paragraphs below briefly introduce some of the well-known elicitation methods
that have been employed to elicit preferences for health outcomes. This is by no means intended as
an exhaustive list. Later in the chapter it will be shown how data from most of these methods can
be analysed in a generalised Bayesian framework.
III.1.2.1 Analytic Hierarchy Process
The Analytic Hierarchy Process (AHP) is a framework for multi-criteria decision making that was
originally developed by Saaty 146 and has spawned a substantial literature, with applications in many
fields and various technical adaptations/extensions of the methodology having been developed 147.
AHP features heavily in the literature on decision-making in other disciplines and there have also
been applications in health-related fields 148.
AHP is a compositional method that includes a technique for eliciting priorities (the AHP terminology
for preference weights or partial values) based on exhaustive pairwise comparisons of criteria (or
criterion levels). A judgement matrix is created with a row and a column for each criterion (or level),
such as that in Figure 38; participants fill in the matrix with estimates of the utility ratios between
the corresponding row and column.
Figure 38 – Example of an AHP judgement matrix. C1-C4 are the criteria (or criterion levels) to be compared; a symbol in each white cell indicates where a judgement is to be entered estimating the utility ratio between the row and column criteria. The grey cells do not need to be filled in.
The numerical judgements in the matrix are usually expressed using values from 1 to 9 (or reciprocals thereof) 146, where 1 represents equal importance between the criteria and 9 (or 1/9)
represents an extreme difference in importance. Alternative numerical scales have also been
proposed 149-151. Typically the questions are administered in a paper or electronic questionnaire for
participants to complete individually.
A number of methods exist for deriving the weights or partial values from the judgement matrix.
Saaty originally proposed deriving the weight vector as the eigenvector of the matrix and this
remains the standard method, but it has been criticised for (among other reasons) its deterministic
nature and apparent lack of sound principles 152,153. An alternative regression-based method for
analysing AHP results has been proposed at various times in order to address these issues 149,151,154-156; this tends to give weight estimates equivalent to those of the eigenvector method, but with the advantages that it is based on well-founded axioms and provides insight into the statistical properties of the preference estimates 152,153. The regression-based AHP analysis is particularly important for this
thesis because it is conducive to a Bayesian implementation 157.
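As a concrete sketch (with an invented, perfectly consistent 3×3 judgement matrix), the eigenvector derivation can be computed as follows, alongside the normalised row geometric means, to which the regression (log least squares) solution reduces for a complete matrix:

```python
import numpy as np

# Invented reciprocal judgement matrix: entry (i, j) is the judged utility
# ratio of criterion i to criterion j. This example is perfectly consistent.
A = np.array([
    [1.0,  2.0, 4.0],
    [0.5,  1.0, 2.0],
    [0.25, 0.5, 1.0],
])

# Saaty's method: weights are the principal eigenvector, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

# For a complete matrix, the log least squares solution reduces to the
# normalised row geometric means; for a consistent matrix the two agree.
gm = np.exp(np.log(A).mean(axis=1))
gm_weights = gm / gm.sum()

print(weights)     # ≈ [0.571, 0.286, 0.143]
print(gm_weights)  # identical here, since A is consistent
```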
III.1.2.2 Measuring Attractiveness by a Categorical BasEd TecHnique (MACBETH)
MACBETH is a compositional MCDA tool that, like AHP, requires participants to fill out judgement
matrices to provide judgements of the utility ratio between criteria on a pairwise basis. However, a
strictly verbal scale is used to express judgements. If one criterion is judged more important than
another, the difference is qualified as very weak, weak, moderate, strong, very strong, or extreme 158.
Unlike the AHP judgement scale, these descriptions do not correspond to fixed numerical
differences. Instead, a specialised MACBETH software package derives numerical weights from the
judgement matrices that satisfy the verbal comparative judgements (and the implicit ordering
between them) using an internal algorithm, and also calculates a measure of consistency. The
specifics of the process are too complex to go into here but have been detailed elsewhere 159.
III.1.2.3 Swing weighting
Swing weighting is another compositional method based on pairwise criteria ratings. Rather than
evaluating all possible pairs of criteria, however, a hierarchical tree structure is used 160. The method
emphasises the need to be clear about the amount of “swing” in each criterion when evaluating the
pairwise ratings, indicating its roots in Keeney-Raiffa’s canonical work on MCDA, which uses a similar
elicitation procedure 128.
Swing weighting is facilitated by use of a tree diagram. The series of figures below illustrates the process using a simplified tree diagram based on RRMS treatment outcomes and administration modes (see III.3.2.1 for the original version); the data have been invented for this example. The
details of the swing weighting process vary but the following is typical:
(i) A tree diagram is constructed with the criteria arranged in hierarchical groups (Figure
39). Here benefit criteria are shown in green, risk criteria in red, and administration
modes in blue. Note that the level of the swing has been emphasised in bold here for
each outcome; without quantifying the swings this way it is impossible to give
meaningful weights. For categorical variables such as the administration modes, it is
first necessary to rank the levels in terms of desirability and to use the lowest-ranked (in
this case daily subcutaneous) as a reference to define the swings.
Figure 39 – Swing weighting example using RRMS treatment outcomes and administration modes: step (i)
(ii) At each yellow cell in the middle level of the hierarchy, its subordinate cells (i.e. those
criteria on the right which branch off from it) are weighted numerically, with a notional
value of 100% for the most important branch and values between 0 and 100% for the
other branches, reflecting their relative value (Figure 40).
Figure 40 – Swing weighting example using RRMS treatment outcomes and administration modes: step (ii)
(iii) The top-ranked criterion in each group is “promoted” to its parent cell and the same
weighting process then takes place at the next level up the hierarchy (Figure 41). In
trees with more hierarchical levels, this process continues all the way up the hierarchy.
Figure 41 – Swing weighting example using RRMS treatment outcomes and administration modes: step (iii)
(iv) Each criterion’s overall weight is determined by multiplying its weight by those of its
“parent” cells at higher levels of the hierarchy (Figure 42). These weights are on an
arbitrary scale; it is conventional to normalise them to sum to 1 at the end of the
process.
Figure 42 – Swing weighting example using RRMS treatment outcomes and administration modes: step (iv). Final weights are shown in bold (right).
Note that this analysis is entirely deterministic. A fuller description of the process is given by
Mussen et al 50. The use of hierarchies and other network structures for elicitation will be discussed
further in III.1.3.3.1.
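The arithmetic of steps (ii)-(iv) can be sketched in a few lines of Python; the group and criterion names echo the RRMS example, but the weights are invented purely for illustration.

```python
# Invented upper-level weights (step iii) and within-group weights (step ii).
group_weights = {"benefits": 1.00, "risks": 0.60, "administration": 0.30}
local_weights = {
    "benefits": {"relapse_rate": 1.00, "disability_progression": 0.80},
    "risks": {"PML": 1.00, "flu_like_reactions": 0.40},
    "administration": {"oral": 1.00, "infusion": 0.50},
}

# Step (iv): multiply each criterion's weight by its parent's weight, then
# normalise the results to sum to 1.
raw = {
    crit: group_weights[group] * w
    for group, crits in local_weights.items()
    for crit, w in crits.items()
}
total = sum(raw.values())
final_weights = {crit: w / total for crit, w in raw.items()}
assert abs(sum(final_weights.values()) - 1.0) < 1e-9
```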
III.1.2.4 Choice experiments
Choice experiments (or discrete choice experiments, DCEs) are fundamentally different from the
methods discussed above as they are decompositional in nature and the responses are ordinal, not
cardinal. In other words, rather than asking participants to directly quantify or qualify their relative
preferences for individual criteria, DCEs consist of several choice tasks, each of which requires
participants to choose the most appealing option from a choice set consisting of a number of
different multi-criteria scenarios. The responses are then analysed (using regression-based
methods) to infer the strength of preference for each individual criterion. Figure 43 is an example of
a choice task used to elicit preferences for benefit-risk assessment.
Figure 43 – Example of a binary choice set. This is from the PROTECT RRMS patient choice experiment, which will be described in more detail in III.4.2. PML = progressive multifocal leukoencephalopathy.
The statistical properties of the estimates are dependent on the precise design of the DCE (i.e. which
scenarios are shown to which participants), which should be tailored to the problem under
investigation. DCEs were developed outside the Keeney-Raiffa 128 linear MCDA framework used in
this thesis and are not necessarily constrained by the same assumptions regarding values/utilities,
but can be designed to elicit preferences for use in the linear MCDA setting.
Of all the well-known methods for preference elicitation, it is probably discrete choice experiments
that have the most extensive literature 161; they have also been used for some time in health-related fields 162-166. DCEs have been employed in benefit-risk modelling using MCDA, for example in the PROTECT initiative 72.
III.1.3 Data types
This section sets out a formal system for classification of the data formats and structures commonly
used for preference elicitation.
III.1.3.1 Rankings and choices
Rankings express relative differences in value (giving no information as to the absolute scale) and are
expressed as ordinals. Choice data, in which the most highly valued element of a set is chosen, is a
type of partial ranking data in which only the top ranking is supplied. The majority of this section
deals with choice data; it will be explained below how choice methods can be extended to deal with
full rankings.
Choice task responses are analysed using choice models. The multinomial logit model (an extension
of binary logistic regression) is the most popular model for various reasons, both practical 167 and theoretical 168. The analysis of choice data typically works according to the following principles:
• The utility V_Xi of a scenario X to an individual i is assumed to consist of (i) a deterministic component U_X defined as a specific function of the criteria, with parameters to be estimated, and (ii) an individual-specific random error term ε_i. That is, V_Xi = U_X + ε_i and U_X = f(x_1, …, x_m; β), where x_1, …, x_m are the criteria values in scenario X and β is the set of preference parameters to be estimated. If a linear utility model is assumed, then U_X = β_1·x_1 + … + β_m·x_m, but the method is not restricted to this particular form.
• An individual i selects option A if V_Ai > V_Xi for all alternative options X.
• For the multinomial logit model, it is assumed that the error terms follow a Gumbel (extreme value type I) distribution, with the result that the probability of selecting option A in a choice task is given by P_A = e^(U_A) / Σ_X e^(U_X), where the summation is over all possible options X 169.
• Given data on which options were selected, the coefficients β and their standard errors can be estimated by regression.
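A minimal sketch of the multinomial logit choice probability, assuming a linear utility with invented coefficients β and two invented scenarios:

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P_A = exp(U_A) / sum over all options X of exp(U_X)."""
    exps = [math.exp(u) for u in utilities]
    s = sum(exps)
    return [e / s for e in exps]

beta = [0.5, -2.0]                    # invented preference parameters
scenarios = [[1.0, 0.1], [2.0, 0.4]]  # criteria values for options A and B

# Linear utility model: U_X = beta_1 * x_1 + ... + beta_m * x_m.
U = [sum(b * x for b, x in zip(beta, s)) for s in scenarios]
probs = choice_probabilities(U)
print(probs)  # option A is chosen with probability ≈ 0.525
```

In practice β would be estimated by maximising the likelihood of the observed choices, for example with a standard multinomial logit fitting routine.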
Extensions of the multinomial logit model, such as the exploded (or rank ordered) logit and
sequential best worst logit 170,171 have been developed to allow the analysis of full rankings data (as
opposed to the partial rankings provided by choice data). This is done by noting that a ranking of
several options can be broken down into a series of statistically independent choices: in the exploded
logit model, for example, it is assumed that participants first choose the best option, then choose
the best of those that remain, and so on. Any ranking data can thus be re-expressed as choice data
and analysed accordingly.
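The "exploding" of a full ranking into a series of choices can be sketched as:

```python
def explode_ranking(ranking):
    """Re-express a full ranking (best first) as a sequence of choice tasks:
    at each step the top remaining option is chosen from those still left."""
    tasks = []
    remaining = list(ranking)
    while len(remaining) > 1:
        tasks.append((remaining[0], list(remaining)))  # (chosen, choice set)
        remaining = remaining[1:]
    return tasks

print(explode_ranking(["A", "B", "C"]))
# [('A', ['A', 'B', 'C']), ('B', ['B', 'C'])]
```

Each resulting (choice, choice set) pair can then be analysed as an independent multinomial logit observation.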
A popular alternative to the logit model is the probit model, where the errors follow a Normal (as
opposed to Gumbel) distribution. In practice there is usually little difference between the estimates
of either model, as the two distributions are very similar except in the tails 172.
Choice (and by extension, ranking) models are founded on sound statistical principles and utilise a
probabilistic, regression-based analysis. This all hints that a Bayesian implementation should be
natural and reasonably straightforward. Indeed, Bayesian applications and adaptations/extensions
of choice models already exist, and have been used in the health sciences 165.
One issue that has been identified with choice models is that their reliability decreases (and the
cognitive burden on participants increases) as the number of criteria increases 7,8. Benefit-risk
assessments may often contain more criteria than is recommended. One solution to this may be to
split the criteria between two or more separate choice experiments (with some overlap) and use a
model that can combine the results. Alternatively, some other elicitation method could be used to
augment the number of criteria. Such approaches require a model that can estimate preferences
based on two or more datasets jointly. This has been attempted before on a limited and non-Bayesian basis 173; a Bayesian approach that can accommodate other data formats such as relative ratings
would be particularly useful for benefit-risk assessments.
III.1.3.2 Absolute ratings
Absolute ratings, in the context of healthcare, can only be used to express the participant’s
judgement of his or her overall health state. This is because there is a natural universal absolute
scale for overall health, ranging from 0 (“dead”) to 1 (“perfect health”). There is however no
universal absolute scale for ratings that express the importance of an individual criterion or
outcome; such judgements can only be evaluated relative to one another.
The only way to use absolute ratings to evaluate outcome preferences, therefore, is to ask
participants to rate multi-criteria scenarios on the natural absolute scale described above, and to use
these ratings as the dependent variable in a regression in order to estimate the parameters of the
utility function.
The use of such a model is common in marketing-based preference elicitation studies (which tend to
be known as conjoint analyses) 174, and there is a substantial literature on its theory and applications,
including Bayesian versions 175. In principle such models could be included in the scope of this
chapter. However, due to the lack of applications in the benefit-risk field and a lack of relevant data,
approaches based on absolute scenario ratings will not be pursued here.
III.1.3.3 Relative ratings
Relative ratings express the ratio of preference intensity between scenarios by comparing their
utilities (or incremental utilities). Most commonly, relative ratings are used to express the relative
importance of individual criteria/outcomes.
For example, a rating task might require participants – RRMS patients, say – to express their relative preference for “avoiding a disability progression” versus “reducing the relapse rate by 1 relapse/year”. The response could be any positive number, with (for example) a value of 0.5 indicating that avoiding a disability progression is half as important as the relapse rate reduction, or a value of 100 indicating that it is 100 times more important. Negative ratings are not encountered, as it is usually clear whether a criterion has a positive or negative impact; this is made particularly transparent in this example by the use of “avoiding” and “reducing” in the criteria descriptions.
Sometimes ratings are elicited on what may at first glance appear to be an absolute scale, with one
elicited value per criterion (rather than per pair of criteria). For example, an elicitation task involving
m criteria might ask participants to “place the most important criterion at value 100 and the others
at appropriate values x_i between 0 and 100”. In such instances the data should be analysed as m-1 relative ratings of value x_i/100 rather than as absolute ratings. The value of 100 has no meaning and
is simply an arbitrary fixed anchor point used to establish scale. It is recommended (and assumed in
this chapter) that any such data are transformed onto the relative scale before use.
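This transformation can be sketched as follows (the criterion names and ratings are invented):

```python
# Anchored ratings: the most important criterion is placed at 100 and the
# others at values between 0 and 100 (invented example data).
anchored = {"relapse_rate": 100, "PML": 80, "flu_like_reactions": 25}

# Re-express as m-1 relative ratings against the anchor criterion.
anchor = max(anchored, key=anchored.get)
relative = {k: v / anchored[anchor] for k, v in anchored.items() if k != anchor}
print(relative)  # {'PML': 0.8, 'flu_like_reactions': 0.25}
```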
Asking subjects to rate the relative importance of outcomes can however be somewhat woolly,
depending on how the questions are phrased – for what exactly are they being asked to compare? If
an RRMS patient rates the benefit of “relapse prevention” to be five times as important as the risk of
“serious gastrointestinal events”, are they comparing a single relapse to a single serious
gastrointestinal event? Or are they implicitly also weighting them by their incidences, judging
relapses as more important merely because they occur more frequently? Or, given an unclear
question, might they alight on an answer that lies between these two extremes?
The burden of disease outcomes is a function both of their frequency* and their seriousness, and a
failure to disentangle the two (and/or clearly communicate the required task to participants) will
render any elicited data practically meaningless. Unfortunately, this is a common pitfall in
preference elicitation, sometimes known as range insensitivity bias 176. To account correctly for
frequency and seriousness there are two possible approaches:
(i) Elicit judgements that reflect both seriousness and frequency and evaluate the decision with
regard only to this elicited data; or
(ii) Elicit judgements that relate only to seriousness and combine these with other evidence on
the frequencies.
* For clarity, the discussion here is phrased in terms of frequency, i.e. the typical objective clinical measure for either binary or count outcomes. For other outcomes other clinical measures may be used (eg continuous variables representing clinical severity, time to event, etc) and can be substituted into the argument accordingly. The key point is that these quantities are measurable at the clinical level and do not have to be elicited from individual subjects.
Option (i) is employed in many formulations of MCDA (such as classical AHP 146) and may well be
appropriate if objective data on frequency is unavailable; in the benefit-risk context, however, an
evidence-based analysis should (presumably) reflect the best available clinical evidence on
frequency (or other objectively estimable clinical measures) and restrict the use of subjective
judgements (which may be subject to cognitive biases) to seriousness alone, i.e. follow option (ii).
This means ensuring that elicitation tasks comparing outcomes should always clearly refer to fixed
intervals of the outcomes involved, as is emphasised in swing weighting. If (as will usually be
assumed here) the partial value functions for the outcomes in question are linear, then the relative
importance depends only on the interval width (not its location on the overall scale), and the
elicitation tasks can be phrased accordingly, eg “compare a reduction of 1 in the annual relapse rate
against a reduction of 5% in the serious gastrointestinal event risk.”
One common practice is to set the interval width for each criterion equal to the difference between
the best and worst alternative with respect to that criterion 160,177. In a fully Bayesian MCDA, this
approach is not feasible as these intervals are not fixed but random quantities. Instead predefined
fixed intervals must be used, and arguably a single unit (of whatever outcome measure is used) is
the simplest to communicate to participants.
III.1.3.3.1 Structure of relative rating tasks
Given a set of criteria, there is more than one way to break down the preference elicitation problem
into a series of pairwise comparisons.
Relative preference intensities are of course transitive, so that given preference ratios for A over B,
and for B over C, one can derive the ratio for A over C as their product, as illustrated in Figure 44.
Figure 44 - Simple example of a network of outcome preferences (i). The preference ratio for A over B is 2, and for B over C is 3, so one can deduce that the preference ratio for A over C is 6.
But what if one has also directly elicited the ratio for A over C (Figure 45)? A direct and indirect
estimate will be available, and these may not be consistent.
Figure 45 - Simple example of a network of outcome preferences (ii). In this case there is inconsistency between the direct estimate for the preference ratio of A over C (4) and its indirect estimate (6).
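Using the numbers from Figures 44 and 45, the direct and indirect estimates, and their discrepancy (which is additive on the log scale of relative preference strength), can be computed as:

```python
import math

# Elicited preference ratios from the example network.
direct = {("A", "B"): 2.0, ("B", "C"): 3.0, ("A", "C"): 4.0}

indirect_AC = direct[("A", "B")] * direct[("B", "C")]  # 6.0, via B
direct_AC = direct[("A", "C")]                         # 4.0

# On the log scale the inconsistency is an additive discrepancy.
discrepancy = math.log(direct_AC) - math.log(indirect_AC)
print(round(discrepancy, 3))  # log(4/6) ≈ -0.405
```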
There is a clear parallel with the network meta-analysis models of the preceding chapter, where
indirect and direct evidence are combined to inform pooled estimates of the treatment contrasts.
Indeed, the situation is perfectly analogous, as shown by the similarity between Figure 4 and Figure
45. This parallel will be drawn upon throughout this chapter, particularly in section III.4 where a
“network meta-analysis” model for preferences is proposed.
Fans and webs
It is in situations where the network of preference comparisons contains loops that pooling of direct
and indirect estimates takes place, and thus the possibility of inconsistency emerges. When
designing an elicitation experiment, one can in theory elicit any combination of comparisons that
forms a connected network of outcomes. It is worth highlighting two particular approaches,
representing opposite extremes:
• Compare all pairs of outcomes on an exhaustive basis, resulting in a fully-linked network
(here called a “web”) such as the example shown in Figure 46. This provides the maximum
amount of data but requires participants to make the largest possible number of
comparisons (which is n(n-1)/2 for a network with n outcomes), and possibly generates
many inconsistencies. This approach is employed by methods including AHP and MACBETH,
some of which also provide means of calculating the inconsistency in the network.
Figure 46 – Example of a “web” network with six outcomes/criteria (left). Often the comparisons are entered into a triangular “matrix” such as that shown on the right.
• Choose one outcome relative to which the other outcomes are all compared, resulting in a
network of comparisons as in Figure 47, with no loops (and hence no inconsistencies). For n
outcomes this results in n-1 comparisons, the smallest possible number in a connected
network.
Figure 47 – Example of a “fan” network with six outcomes/criteria (left). The number of comparisons required (right) is much lower than for a “web”.
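The comparison counts for the two extremes follow directly:

```python
def web_comparisons(n):
    """Exhaustive pairwise comparisons in a fully linked network ('web')."""
    return n * (n - 1) // 2

def fan_comparisons(n):
    """Comparisons against a single reference outcome ('fan')."""
    return n - 1

# For the six-outcome networks of Figures 46 and 47:
print(web_comparisons(6), fan_comparisons(6))  # 15 5
```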
Hierarchical elicitation structures
Many elicitation methods use networks with a hierarchical structure. In a simple two-level hierarchy
this means that the outcomes/criteria are divided into several groups; comparisons are performed (i)
between criteria within each group at the lower level of the hierarchy and (ii) between the groups at
the upper level; these are then combined to give ratings for the full set of outcomes (see III.1.2.3 for
an example). For example, the upper level of the hierarchy might consist of the two groups
“Benefits” and “Risks”, containing the benefit and risk outcomes respectively. The overall
importance of an individual benefit (say) is a combination of its importance relative to the other
benefits (lower level), and the importance of benefits relative to risks (upper level). Hierarchies can
be extended to any number of levels (given sufficient criteria).
Reasons for using a hierarchical elicitation structure may include:
• To mitigate the impact of cognitive biases in the elicitation process 176. For example,
equalisation bias can result in comparisons being biased towards equality. The more
unequal in importance the criteria being compared, the bigger a problem equalisation bias
will be. To mitigate this, hierarchies allow highly unequal criteria to be compared indirectly via criteria of intermediate importance, or allow several less important criteria to be bundled together for comparison with a single more important criterion.
• Reducing the number of comparisons – if one is using an elicitation method that uses webs
(eg AHP, MACBETH) then introducing hierarchies will reduce the number of judgements that
need to be elicited from each participant.
• Simply to structure the problem and as a guide to the thought process, or to aid in
communicating the results 160.
Most methods use one of two rules for transferring preferences up and down the hierarchy:
1. Agglomeration – this means a set of criteria at the lower level are represented at the upper
level as an agglomerated group, and thus any judgements at the upper level reflect the total
importance of the set. In the benefit-risk example, the upper-level comparison would be
between all benefits and all risks. The agglomeration rule is commonly employed in AHP.
Figure 48 contains an example of a network diagram for a two-level hierarchy using the
agglomeration rule with web structures at every level. This structure is typical of AHP.
Figure 48 – Hierarchical elicitation network for 10 criteria arranged in two groups of three and one group of four, using the agglomeration rule and webs at both levels of the hierarchy. The table on the right shows the comparisons that need to be performed by participants.
2. Substitution – this means that a set of criteria at the lower level are represented at the
upper level by a single member, usually taken to be the most important in the set (but more
on this below). In the benefit-risk context, the upper-level comparison would be between
the most important benefit and the most important risk. The substitution rule is commonly
employed in swing weighting (an example was shown in III.1.2.3).
Figure 49 is an example of a network diagram for a two-level hierarchy using the substitution
rule with web structures at every level.
Figure 49 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one group of four, using the substitution rule and webs at both levels of the hierarchy. The table on the right shows the comparisons that need to be performed by participants.
Figure 50 is an example of a network diagram for a two-level hierarchy using the substitution
rule with fan structures at every level. This type of network can be called a tree and is
typical of swing weighting (as in the example in III.1.2.3).
In a tree with a given number of criteria, the number of levels has no effect on the number
of comparisons (this is straightforward to prove by mathematical induction on the number
of criteria).
Figure 50 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one group of four, using the substitution rule and fans at both levels of the hierarchy – that is, a tree. The table on the right shows the comparisons that need to be performed by participants.
The first stage in the elicitation process is usually to decide which criteria are to be
promoted/substituted; before this is done the network must be drawn differently. For example,
prior to elicitation the tree in Figure 50 would typically be drawn as in Figure 51, known in the swing-weighting context as a tree diagram or value tree (see also the example in III.1.2.3).
Figure 51 - Hierarchical elicitation network in Figure 50, shown before identification of criteria for promotion, i.e. in value tree format.
While the agglomeration rule may be appropriate for some decision problems, its use in rating
outcomes for healthcare decisions poses difficulties due to a lack of clarity of scale, a pitfall
already encountered in the preceding section. If asked to value “all benefits” against “all risks”,
what exactly will a participant be expected to mentally compare? Is he/she expected to weigh the
outcomes according to the frequencies with which they are expected to occur? To answer in the
affirmative is arguably more the more natural interpretation for the participant, but the negative is
the only way to obtain preferences that can be combined with externally estimated frequencies. In
any case one cannot be sure of what basis the participant has used to mentally agglomerate the
outcomes, and so any data are as good as meaningless. One could in principle come up with an
explicit agglomeration formula, such as giving even weight to a fixed interval on each criterion, but
this would have to be made clear in the elicitation questions and would be rather cognitively taxing
on the participants.
The substitution rule is therefore the favoured approach here. A common convention is to follow a
bottom-up procedure whereby the lower levels are evaluated first, and the most important criterion
at each level is promoted to the next level up 29,50. The selection of the most important criterion for
promotion may reflect an intuition that importance should increase as one advances through the
levels. However, there is no obvious reason why this approach should be universally recommended;
where differences in weight between criteria may be significant, a better strategy would appear to
be to group criteria with others of similar importance, and promote the most important outcome
from groups of overall low importance, and the least important outcome from groups of overall high
importance (i.e. to aim to equalise the importance as one moves up the levels in order to minimise
the impact of equalisation bias).
The outcomes to be promoted can be selected in advance, using a priori (or pilot study) estimates of
the likely ratings to fix the network structure for all participants. Alternatively, some elicitation
methods may facilitate a dynamic process (such as the bottom-up procedure described above) that
allows the promoted outcomes to vary between participants.
Note that the lower down the hierarchy a given criterion is, the more comparisons are involved in
determining its weight – and therefore the more uncertainty there is on the weight, giving more
scope for random error. This uneven uncertainty structure has not been accounted for in some previous attempts to incorporate uncertainty into MCDA, which treat all weights symmetrically with regard to uncertainty 178. This may also be an argument against using too many hierarchical levels in
an elicitation network: too many levels may result in too much uncertainty on the weights.
Due to the use of an unclear agglomeration rule, I would be wary of using the traditional hierarchical
AHP method for benefit-risk preference elicitation, but single-level AHP matrices are not affected by
this and can be analysed with the models in this chapter.
III.1.3.3.2 Rating scales
It is acknowledged that different numerical scales can be used to discretise or verbalise the ratio
judgements, and it has been argued on psychological grounds that the choice of scale affects the results 149,179.
Such considerations are beyond the scope of this thesis and any numerical ratings will be analysed as
continuous variables on the scale on which they are originally expressed.
III.1.3.4 Summary data from published elicitation studies
Preference elicitation studies are now becoming more widespread in the medical literature, and for
any given disease there is a reasonable chance that researchers will be able to find several published
studies with elicited preferences for several outcomes pertinent to that disease and/or its
treatments. To the best of my knowledge, however, there have been no attempts to perform meta-
analysis upon the results of elicitation studies. This may simply be because no appropriate method
has yet been proposed. As with all meta-analyses, the benefits of such a method would be twofold:
Chapter III.1
167
the ability to aggregate the results of multiple studies and obtain an overall result; and the ability to
assess the level of preference heterogeneity between studies. I would argue that both of these are
urgent needs in the field of quantitative benefit-risk assessment for the following reasons:
• If the value of the preference-weighted approach to benefit-risk assessment at the
population level is to be established, then it must be shown that preferences are (at least
some of the time) reasonably homogeneous.
• If quantitative benefit-risk assessments are to be used to inform real-world regulatory
decisions, they should be able to incorporate all available relevant data on preferences.
It strikes me therefore that developing a meta-analysis model for preferences would be particularly
helpful to the field at the present time and this will be one objective of this chapter.
One possibility is to employ an approach analogous to network meta-analysis. A standard NMA
compares a set of treatments with respect to some outcome measure of interest, and each source
study provides information on that measure for a subset of the treatments. Here, by analogy, we
would like to compare a set of outcomes with respect to the strength of preference for those
outcomes – but apart from this change in context, the method requires little adjustment. Just as a
relative treatment effect measure comparing two treatments is assumed to be homogeneous
between studies in standard meta-analysis, the key assumption here is that the ratio of preference
strength between any two outcomes is homogeneous. This is consistent with an additive linear-in-
criteria utility function and the assumptions of the elicitation methods described in this chapter.
The method could be used to obtain preference estimates without carrying out a new elicitation
exercise. It could also prove useful if employed alongside new preference elicitation studies, both as
a check on the external validity of the results and also for planning purposes (eg obtaining prior
estimates of the results in order to establish sample size). Despite its potential usefulness, meta-
analysis of preferences does not feature in the MCDA literature.
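To make the proposal concrete, note that under the homogeneity assumption, pooling an elicited preference ratio for a single pair of outcomes across studies reduces, in the simplest fixed-effect case, to standard inverse-variance pooling of log-ratios; a full network version would generalise this across many outcome pairs. The sketch below is purely illustrative — the function and all numbers are hypothetical, not taken from any study.

```python
import math

def pool_log_ratios(log_ratios, variances):
    """Fixed-effect inverse-variance pooling of log preference ratios.

    Each study contributes an estimated log-ratio of preference strength
    between the same pair of outcomes, with a sampling variance.
    Returns the pooled log-ratio and its variance.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * lr for w, lr in zip(weights, log_ratios)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Hypothetical log preference ratios for one outcome pair from 3 studies
log_ratios = [0.8, 1.1, 0.9]
variances = [0.04, 0.09, 0.05]
pooled, var = pool_log_ratios(log_ratios, variances)
pooled_ratio = math.exp(pooled)  # back on the preference-ratio scale
```

The pooled variance is smaller than any single study's variance, which is the first of the two benefits of meta-analysis noted above; heterogeneity assessment would require extending this to a random-effects formulation.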
III.1.4 Allowing for uncertainty in preferences
As we have seen in III.1.2, the most popular ratings models for benefit-risk preference elicitation are
deterministic, providing no natural way to allow for uncertainty. Indeed, within healthcare fields,
allowance for uncertainty in preference modelling is more often than not missing, or carried out
using one-way sensitivity or scenario analyses 139. The latter approaches are by nature approximate
and cannot fully characterise the uncertainty in model outputs 17. Given the level of unfamiliarity
with preference modelling in the field and the associated potential for bias or imprecision 40,50, not to
mention heterogeneity within the population, these are discomfiting findings; a more sophisticated
approach to preference uncertainty is surely warranted in order to address any concerns that the
preference estimates are imprecise or that preferences themselves are highly variable.
In a few cases, more advanced probabilistic simulation methods have been used, allowing full simulation
of the uncertain model outputs139. There may still be room for improvement, however.
Probabilistic approaches to preference uncertainty in benefit-risk have tended to either assume no
knowledge of the distribution of preferences (using Stochastic Multicriteria Acceptability Analysis,
which simply explores all possible combinations of weights), or to estimate the distribution based on
external data (which adds another layer to the analysis and relies on an appropriate data source
being available). See I.1.6 for more discussion of these methods.
A further problem is presented by the diversity of elicitation methods and data types; this
has led to a segmentation of the field, with most ratings methods being tied to a specific data
structure - for example, AHP uses webs and swing weighting uses trees. This makes it hard to use all
of the available data or to compare the findings of heterogeneously designed studies.
A parametric generalised Bayesian approach that is designed to incorporate data from a range of
common preference elicitation methods would directly address these issues, allowing uncertainty to
be estimated directly (within the model) from an arbitrary dataset and propagated to the outputs.
III.1.5 Aim and objectives
In light of the above, the overall aim of this part of the project is to develop just such a Bayesian
framework for analysis and meta-analysis of preference data for benefit-risk assessment using
MCDA. Ideally the framework should:
• be able to accommodate original (raw) preference elicitation data in the form of either
choices or relative ratings of criteria, and originating from as many of the following methods
as possible: AHP, swing weighting, MACBETH, discrete choice experiments; since these are
the types of preference data most often used for quantitative benefit-risk assessment 139;
• be able to accommodate summary preference data from previous elicitation studies where
no raw data is available;
• be fully Bayesian, with new Bayesian models to be developed where no suitable models
exist for a given data type; and
• use a common preference parameterisation that allows inferences to be made on multiple
data types simultaneously.
This should improve the ability of analysts using MCDA for benefit-risk assessments to make the
most of all available data while accounting for uncertainty in a Bayesian manner. Ultimately this will
help to ensure that the perspectives of stakeholders – as revealed by their stated preferences - can
be fairly reflected in the decision making process.
It will be assumed that the overall utility function is an additive linear combination of the partial
values on all of the criteria, and that partial values for continuous criteria are themselves linear in
the underlying outcome measures. Categorical criteria will also be accommodated.
As the focus here is on analysing and interpreting preference elicitation studies, I will not delve too
deeply into issues relating to the design and execution of such studies, except insofar as these
impact directly on the methods to be used for analysis. The RRMS case study will be used as a
motivating example throughout the chapter, with a view to estimating preference weights for the
outcomes synthesised in Chapter II.
Chapter III.2
170
III.2 High level model structure
The evidence synthesis strategy will use the overall model structure depicted within the blue area of
Figure 52, which also shows how this fits together with the other modelling components already
described in Chapter II (shown in faded tones).
Figure 52 - High-level model structure, focusing on preference modelling.
III.2.1 Notes on preference parameters
The details of the various models will follow in their respective sections but there are some general
points worth making at this stage.
III.2.1.1 Assumed form of the utility function
Throughout this chapter, it will be assumed that utility is linear in criteria (see III.1.1.2). This should
not be unduly restrictive, as the underlying criteria measures can be transformed to another scale if
it appears this will improve linearity, provided an appropriate monotonic transformation exists. This
means that the assumption of linearity in criteria is not in practice much stronger than that of
linearity in partial values. It is however stronger than merely assuming that preferences are
compensatory. It has been argued that linearity in criteria is usually a reasonable assumption
provided that every criterion always has a monotonically increasing or decreasing partial value
function irrespective of the value of other criteria 180, and it is hard to imagine any clinical benefits or
risks that would not fulfil this condition.
A further simplification will be made to the utility function for the purposes of this chapter. The
focus will be on comparing treatments in terms of the (additive) utility difference between them,
and therefore the intercept term will be omitted, with the caveat that utility will no longer always lie
within the interval [0,1].
This reduces the problem of identifying the utility function to estimating the utility coefficients UC_ω,
where

U = Σ_ω UC_ω x_ω

Again, the utility coefficients are in general identifiable and interpretable only up to an overall
scaling constant; to standardise the scale, normalised “preference weights” w_ω will be defined for
each criterion ω as follows:

w_ω = UC_ω / Σ_ω′ UC_ω′
It is important to note however that these are not the same as the traditional MCDA weights
(𝑤𝑒𝑖𝑔ℎ𝑡𝜔). While 𝑤𝑒𝑖𝑔ℎ𝑡𝜔 corresponds to the entire domain of 𝑃𝑉𝐹𝜔, 𝑤𝜔 corresponds to a unit
change in 𝑥𝜔.
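In code, the reduced utility function and the normalised preference weights are straightforward; the sketch below (Python, with arbitrary illustrative coefficients, not case-study values) mirrors the two formulas above.

```python
def utility(uc, x):
    """Additive linear utility: U = sum over criteria of UC_w * x_w."""
    return sum(c * v for c, v in zip(uc, x))

def preference_weights(uc):
    """Normalised preference weights w_w = UC_w / sum(UC)."""
    total = sum(uc)
    return [c / total for c in uc]

# Illustrative utility coefficients for three criteria
uc = [2.0, 1.0, 1.0]
w = preference_weights(uc)           # [0.5, 0.25, 0.25]
# Utility difference implied by a change in the criteria measures x
du = utility(uc, [0.1, -0.2, 0.0])
```

Because the intercept has been omitted, `du` is interpreted as a utility difference between treatments rather than an absolute utility on [0,1], exactly as described above.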
III.2.1.2 Shared preference parameters
As shown in Figure 52 and discussed in III.1.3, the preference module will be used to analyse various
types of elicited and published preference data. Each of these data types will have different
statistical properties and will therefore require different statistical models. However, inferences will
be made on the same set of preference strength parameters which will be common to all models.
By running the models simultaneously, combined inferences based on multiple data sources can be
obtained.
Specifically, any model featuring criterion ω will use the same preference strength parameter g_ω
and utility coefficient UC_ω for that criterion. In terms of magnitude, the utility coefficients UC_ω are
equal to the exponentiated preference strengths e^(g_ω).
III.2.1.3 Known signs
It will be assumed that the sign of each utility coefficient is known a priori; these signs will be passed
to the model as data. A similar assumption was made regarding the sign of the treatment effects in
Chapter II.
This assumption should usually be easily satisfied, at least for continuous criteria, since the sign of
the utility coefficient should be straightforward to deduce from the definition of each criterion’s outcome
measure(s). For categorical criteria, it may sometimes be necessary to carry out a preliminary
analysis (eg using standard deterministic elicitation methods, or simply by inspecting the data) to
determine the signs.
III.2.1.4 Parameter scales
As has been noted (see III.1.1.2), in the general MAUT framework utility is expressed on an arbitrary
scale: in other words, the utility coefficients are only unique up to an overall scaling constant. This is
not always true of the coefficients obtained by different elicitation methods, however.
In the case of preferences obtained from choice models, the utility scale is fixed, since the
coefficients are related to the participants’ observed choice behaviour. For example, in the binomial
logit choice model, a coefficient of 1 represents an increase of 1 in the log odds of choosing a
particular alternative.
Preferences arising from relative ratings, however, are not calibrated to any absolute scale and so
there are infinitely many sets of coefficients that will fit the data. When relative ratings are analysed
alongside choice data with shared preference parameters, the choice analysis will fix the scale and
ensure a unique solution is found. Where only relative ratings are analysed, however, the analysis
will have an excessive degree of freedom which will be reflected as additional uncertainty in the
posterior distributions of the preference strengths or utility coefficients. There are two possible
strategies for dealing with this:
• Eliminate the extra degree of freedom from the analysis altogether by arbitrarily fixing the
value of 𝑔𝜔0 (typically at zero) for a particular outcome 𝜔0. This has the advantage of
ensuring a unique solution at the parameter inference level (which may help with model
convergence in an MCMC context); the disadvantage however is that the symmetry of the
model is compromised (in the Bayesian context this situation corresponds to a highly
lopsided prior, where the preference strength on one outcome is known with certainty and
the others are random).
• Allow the extra degree of freedom when making inferences on the preference strengths 𝑔𝜔,
but fix the scale when carrying the preferences forward for further analysis (i.e. in the actual
benefit-risk assessment). For example, defining
weight_ω = e^(g_ω) / Σ_i e^(g_i)
will give a set of normalised utility coefficients for use in benefit-risk assessment or any
other post-hoc analysis, without the additional posterior uncertainty associated with an
arbitrary absolute scale. This approach preserves the model’s symmetry but could
potentially give rise to MCMC convergence issues.
Here the latter approach will be adopted on symmetry grounds; convergence of the preference
parameters will be monitored to ensure this approach remains appropriate.
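The two identification strategies can be illustrated numerically. In the sketch below (Python; the preference strengths are arbitrary illustrative values), the normalisation in the second function corresponds to the weight_ω definition above, and the final two lines confirm that adding a constant to all preference strengths — the free degree of freedom — leaves the normalised weights unchanged.

```python
import math

def fix_reference(g, ref=0):
    """Strategy 1: eliminate the free scale by anchoring one preference
    strength (here g[ref]) at zero."""
    return [gi - g[ref] for gi in g]

def normalised_weights(g):
    """Strategy 2: keep the free scale during inference, but normalise
    the exponentiated strengths when carrying preferences forward."""
    expg = [math.exp(gi) for gi in g]
    total = sum(expg)
    return [e / total for e in expg]

g = [0.0, 1.0, 2.0]
# Adding any constant to all strengths leaves the weights unchanged:
w1 = normalised_weights(g)
w2 = normalised_weights([gi + 5.0 for gi in g])
```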
Preference ratios, like treatment contrasts and mapping ratios, are transitive (see II.1.1 and II.4.6)
and whichever parameterisation is used should ensure that consistency is maintained. Both of the
approaches above can be seen to respect consistency since it is the preference strength (or its
exponentiated equivalent, the utility coefficient) for each criterion that is assigned a parameter
value; any preference ratios are calculated from these parameters and hence are guaranteed to be
consistent. This is not to say, however, that the observed preference ratios in the data are
guaranteed to be consistent when they are drawn from multiple source studies. Consistency in the
data is an assumption that the model relies on when combining disparate sources of evidence, and
one that should be verified in any applications. One way to check this assumption is simply to
inspect the observed preference ratios; developing more formal methods for evaluating preference
inconsistency is a priority for future work in this area.
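As a minimal illustration of such an inspection, observed ratios can be checked for "closure" around each triangle of contrasts on the logarithmic scale (log r_AB + log r_BC should approximately equal log r_AC). The ratios below are invented for the example; a zero residual indicates a consistent triad.

```python
import math

def triangle_residual(r_ab, r_bc, r_ac):
    """Log-scale closure residual for a triad of observed preference
    ratios; zero indicates perfect consistency."""
    return math.log(r_ab) + math.log(r_bc) - math.log(r_ac)

# A perfectly consistent triad of ratios ...
res_ok = triangle_residual(2.0, 3.0, 6.0)
# ... and an inconsistent one (the direct A-C ratio is too small)
res_bad = triangle_residual(2.0, 3.0, 4.0)
```

Formalising what counts as a "large" residual (relative to elicitation noise) is exactly the open question flagged above.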
III.2.1.5 Categorical criteria
The conventional way to describe preferences for a categorical criterion in MCDA is (i) to assign an
overall weight that represents the impact on utility of moving from the least favoured to the most
favoured level, and (ii) to express the relative utility of any intervening levels on a scale from 0 to 1
(i.e. as a partial value function).
The 0-1 scale will generally not be used here, as it is more straightforward to report all of the utility
coefficients for intermediate levels on the same scale as the overall weight. Note however that only
the overall weight (i.e. the difference between the most and least favoured levels) is included in the
100% total; to also add the weights for the other levels would exaggerate the criterion’s importance (as
they simply represent smaller portions of the criterion’s overall weight). This is essentially a matter of
reporting and does not affect the underlying preference parameters. An illustration is provided in
Figure 53 using criteria that will be explored later in the chapter as part of the RRMS case study.
Figure 53 – Two ways to display preferences for categorical variables – an example using criteria from the RRMS case study (but fictional data). Mode of administration is a 4-category variable, coded using 3 indicator variables with “daily subcutaneous” as the reference category. LEFT: The convention adopted in this thesis. The height of the bars shows the magnitude of the utility coefficient for each of the 3 admin indicator variables alongside the other RRMS criteria. These weights have been normalised so that the dark blue bars (i.e. excluding all admin contrasts but the weightiest) sum to 100%. RIGHT: The same information presented in more conventional MCDA fashion. Only the weightiest admin contrast is shown on the overall weight scale, and all weights are normalised to sum to 100%; the relative utility of the administration levels is shown separately as a partial value function.
Dummy coding will be used for categorical variables (see III.5.3.2) so that the utility coefficient for
each categorical level represents the difference in utility between that level and the reference level.
It is convenient to choose either the least or most favoured category as the reference so that one of
the coefficients represents the overall criterion weight. It turns out that using the least favoured
category as the reference makes for the most convenient model specification, as will be explained in
III.3.3.2.
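The dummy coding described here can be sketched as follows (Python; the observed values are invented, but the administration levels and the "daily subcutaneous" reference category match the mode-of-administration example in Figure 53):

```python
def dummy_code(levels, values, reference):
    """Code a categorical criterion as indicator variables against a
    reference level; returns one 0/1 column per non-reference level."""
    others = [lev for lev in levels if lev != reference]
    return {lev: [1 if v == lev else 0 for v in values] for lev in others}

# Mode-of-administration levels, with the reference category first
levels = ["daily subcutaneous", "daily oral", "weekly intramuscular",
          "monthly infusion"]
# Invented observations for three alternatives
obs = ["daily oral", "daily subcutaneous", "monthly infusion"]
codes = dummy_code(levels, obs, reference="daily subcutaneous")
# The reference level is coded as all zeros across the indicator columns
```

Each utility coefficient then multiplies one indicator column, so it directly measures the utility difference between that level and the reference level, as stated above.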
III.2.1.6 Random preferences
Some models will use “random preference” formulations to allow for variation in preferences
(between studies and/or individuals), analogous to the random effects models widely used in
statistics. Typically random effects statistical models use a Normal distribution centred on the
population-average parameter to characterise the distribution of the random effects and I do not
propose to break with this convention. However, there is more than one scale such a distribution
could be fitted on as the parameters can be expressed either as preference strengths 𝑔𝜔 or utility
coefficients 𝑒𝑔𝜔 .
There are good arguments for fitting the random preferences distribution on the preference
strength scale, i.e. with mean 𝑔𝜔, and a fixed standard deviation. Specifically, preference strengths
can take any value on the real line, so under this method there is no need to censor or truncate the
distribution. Furthermore, a fixed standard deviation on the (logarithmic) preference strength scale
corresponds to a standard deviation which is a fixed proportion of the mean on the (exponentiated)
utility coefficient scale, and this seems intuitively appropriate: to use a monetary analogy, one would
expect to see greater absolute variation in the value individuals assign to a £500,000 house
compared to a 50p apple.
Based on preliminary model runs, however, it appears that models using random preference
strengths formulated on the logarithmic preference strength scale tend to exhibit somewhat poor
convergence. Better convergence is observed when the random effects are assigned a Normal
distribution on the utility coefficient scale. On this scale it is necessary to additionally ensure that
the random coefficients are strictly positive; this is easily achieved in BUGS by specifying a Normal
distribution left-censored at zero. It is also straightforward in BUGS to specify the random effects
standard deviation as a fixed proportion of the mean, thus approximating the desirable property of a
random effects distribution referred to in the previous paragraph.
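A simulation sketch of this random-preference formulation is given below (Python; the mean and coefficient of variation are arbitrary illustrations, and simple rejection sampling stands in for BUGS's left-censored Normal — strictly speaking this yields a truncated rather than censored distribution, but it serves to show the strictly positive draws with standard deviation proportional to the mean):

```python
import random

def random_utility_coefficient(mean_uc, cv=0.3, rng=random):
    """Draw a strictly positive random utility coefficient from a Normal
    with sd a fixed proportion (cv) of the mean, rejecting non-positive
    draws to keep the coefficient above zero."""
    while True:
        draw = rng.gauss(mean_uc, cv * mean_uc)
        if draw > 0:
            return draw

random.seed(42)
# Per-individual (or per-study) coefficients around a population mean of 10
draws = [random_utility_coefficient(10.0) for _ in range(5000)]
```

With the truncation point several standard deviations below the mean, the rejection step fires rarely and the sample mean stays close to the population-average coefficient.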
When using random preference distributions, the implicit assumption is that it is valid to derive
population-level utility coefficients as a (weighted) average of the utility coefficients for each study
or individual. The question of how to aggregate preferences at the group level has long been
recognised as a key issue in utility-based economics: see for example 181-184. Support for the
approach used here is provided by a theorem by Keeney 185 (building on Harsanyi’s utilitarianian
theorem 186,187), which states that the only population utility function satisfying certain basic axioms
is a weighted average of the individual utility functions. Determining the weightings for the average
requires consideration of how important one individual’s utility is compared to another. Under the
general definition of utility, doing this in a fair and equal manner is not straightforward as a person’s
utility may also reflect the wellbeing of those around them (i.e. altruism) 186. In the benefit-risk
context we are restricting the utility function to measures of one’s personal health, and assuming a
regulatory perspective that assigns equal importance to all individuals, so a straightforward average
seems appropriate.
III.2.1.7 Priors
The preference priors can also (in principle) be expressed on either the preference strength or utility
coefficient scale, and there may be arguments for various distributions on one scale or the other.
Here identical Gamma distributions with shape parameter 1 and rate parameter 0.01 will be
assigned to the utility coefficients – i.e. e^(g_ω) ~ Gamma(1, 0.01). There are theoretical justifications
for the Gamma prior. Firstly, the distribution has a floor at zero but no upper bound, as required for
the utility coefficients. Furthermore, a Gamma prior with shape parameter 1 on the utility
coefficients is equivalent to a Dirichlet(𝟏) distribution on the normalised weights (where 𝟏 is a
vector with one component per criterion, each equal to 1), also known as a flat Dirichlet distribution,
which is uniform over the weight space and therefore a natural choice for an uninformative prior in
this context. This distribution’s suitability for modelling weights in benefit-risk assessment has been
noted elsewhere 178.
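This Gamma–Dirichlet equivalence is a standard distributional fact, and it is easy to check by simulation; the sketch below (Python standard library only; the number of criteria is an arbitrary illustration) normalises independent Gamma(1, 0.01) draws — shape 1 makes these Exponential(0.01) draws — and recovers the flat-Dirichlet prior mean of 1/(number of criteria) for each weight.

```python
import random

random.seed(1)
n_criteria, n_draws = 4, 20000
mean_weights = [0.0] * n_criteria

for _ in range(n_draws):
    # Gamma(shape=1, rate) is Exponential(rate); the rate cancels on
    # normalisation, so it only sets the absolute coefficient scale
    draws = [random.expovariate(0.01) for _ in range(n_criteria)]
    total = sum(draws)
    for j in range(n_criteria):
        mean_weights[j] += draws[j] / total / n_draws

# Under a flat Dirichlet, each normalised weight has prior mean 1/n_criteria
```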
On more practical grounds, preliminary model runs indicate that this choice of prior gives good
convergence compared to other possible priors that were investigated.
The rate parameter of the Gamma distribution has no impact on the distribution of the normalised
(relative) weights, but determines the absolute scale of the utility coefficients. This parameter
should therefore be set so that the distribution covers an a priori feasible range for the utility
coefficients.
In principle the priors in a Bayesian model should be set by referring to external evidence or expert
intuition rather than using the main model data. In the absence of any external or expert reference
points for the feasible range of utility coefficients, however, preliminary (non-Bayesian) analyses of
the datasets used in this chapter were carried out and the range of utility coefficients examined to
inform the prior (essentially an example of the “empirical Bayes” approach 188). When the rate
parameter of the Gamma distribution is set to 0.01, the 2.5% to 97.5% interval of the prior
corresponds well to the observed range of utility coefficients, with the maximum observed utility
coefficient of 388 (from a deterministic analysis of a discrete choice dataset; see III.4.4) lying just
above the 97.5% point. The Gamma(1, 0.01) distribution was therefore adopted as the prior.
Other reasonable noninformative priors for the utility coefficients are possible. Normal priors with
wide variance have been used elsewhere 189, and a sensitivity analysis using such a Normal prior is
included (although due to the fixing of signs in this model, only the upper half of a Normal distribution
centred on zero will be used for this purpose).
Priors can also be defined on the utility ratio scale relative to a selected reference criterion, which
may seem a natural approach when cost is included as the reference 190 but in the benefit-risk
assessment context the inherent asymmetry in treating one criterion differently from the others is
less appealing.
The remaining sections in this chapter set out the particular preference models, datasets and results
that will be used to make inferences on these parameters.
Chapter III.3
178
III.3 Bayesian analysis of elicited ratings
There is little literature on Bayesian relative ratings models. Searching the literature reveals no
Bayesian implementations of swing weighting. A Bayesian implementation of AHP has been shown
to be possible and to have advantages over traditional methods 157; but so far it has been limited to
the AHP format only; a model that generalises to other ratings data (and/or forms part of a larger
Bayesian benefit-risk assessment analysis) has not been demonstrated.
The underlying data structure is a network of criteria of the general form described in III.3.1.3; the
data themselves are the relative ratings elicited for the pairs of criteria linked in the network.
III.3.1.1 Network-level constants
Let there be a set of criteria (indexed below by ω), and a set of participants 1, …, n (typically it will
be assumed that these are individuals, but a single “participant” can also be a group of people who
give ratings on a collective consensus basis). Let 1, …, K be the set of contrasts, i.e. the pairs of
criteria participants are asked to compare, corresponding to the links on the network diagram.
III.3.1.2 Contrast-level ratings data
The ratings data (if complete) consists of a set of K relative ratings 𝑧𝑖𝑘 (k ∈ 1,… , 𝐾) for each
participant i ∈ 1, … , 𝑛, where each k represents a specific contrast between a pair of outcomes. To
show which outcomes are involved in each contrast, an indicator variable 𝜒𝜔𝑘 is created for each
criterion ω and contrast k, taking the value 1 when ω is the “headline” element of contrast k, -1
when ω is the “baseline” element of contrast k, and 0 otherwise. In a slight abuse of notation, ω1 −
ω2 will sometimes be used to refer to the contrast between “headline” criterion ω1 and “baseline”
criterion ω2, although ratings are not straightforward subtractions but rather ratios of utility
coefficients (much of the modelling will however be carried out on the logarithmic “preference
strength” scale, where ratings do indeed correspond to subtractive differences).
III.3.1.3 Examples of contrast structures
The set of K criteria contrasts for which ratings are available is determined by the design of the
elicitation tasks. This section shows how the contrasts are coded for some example network
structures.
III.3.1.3.1 Fans
A particularly simple network structure for elicitation is the fan, as illustrated in Figure 47. In this
case a suitable set of contrasts is 𝐵 − 𝐴, 𝐶 − 𝐴 , 𝐷 − 𝐴, 𝐸 − 𝐴, 𝐹 − 𝐴, with 5 elements.
(Inverting any of these contrasts would also result in a suitable set.) The data would look like Table
14 (although the rows of data may be ordered differently).
Table 14 – Data structure for a “fan” network with six outcomes A, B, C, D, E and F. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0
1             | 3          | ·           | -1   | 0    | 0    | 1    | 0    | 0
1             | 4          | ·           | -1   | 0    | 0    | 0    | 1    | 0
1             | 5          | ·           | -1   | 0    | 0    | 0    | 0    | 1
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
2             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0
2             | 3          | ·           | -1   | 0    | 0    | 1    | 0    | 0
2             | 4          | ·           | -1   | 0    | 0    | 0    | 1    | 0
2             | 5          | ·           | -1   | 0    | 0    | 0    | 0    | 1
etc...
III.3.1.3.2 Webs
The web structure (as in Figure 46) contains a contrast between all possible distinct pairs of
outcomes, and hence for m outcomes results in m(m−1)/2 contrasts, the maximum possible. In the case
of Figure 46 this gives 15 contrasts, namely 𝐵 − 𝐴, 𝐶 − 𝐴, 𝐷 − 𝐴, 𝐸 − 𝐴, 𝐹 − 𝐴, 𝐶 − 𝐵, 𝐷 − 𝐵, 𝐸 −
𝐵, 𝐹 − 𝐵, 𝐷 − 𝐶, 𝐸 − 𝐶, 𝐹 − 𝐶, 𝐸 − 𝐷, 𝐹 − 𝐷, 𝐹 − 𝐸 and data resembling Table 15 (again, the rows
may be permuted).
Table 15 – Data structure for a “web” network with six criteria A, B, C, D, E and F. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0
1             | 3          | ·           | -1   | 0    | 0    | 1    | 0    | 0
1             | 4          | ·           | -1   | 0    | 0    | 0    | 1    | 0
1             | 5          | ·           | -1   | 0    | 0    | 0    | 0    | 1
1             | 6          | ·           | 0    | -1   | 1    | 0    | 0    | 0
1             | 7          | ·           | 0    | -1   | 0    | 1    | 0    | 0
1             | 8          | ·           | 0    | -1   | 0    | 0    | 1    | 0
1             | 9          | ·           | 0    | -1   | 0    | 0    | 0    | 1
1             | 10         | ·           | 0    | 0    | -1   | 1    | 0    | 0
1             | 11         | ·           | 0    | 0    | -1   | 0    | 1    | 0
1             | 12         | ·           | 0    | 0    | -1   | 0    | 0    | 1
1             | 13         | ·           | 0    | 0    | 0    | -1   | 1    | 0
1             | 14         | ·           | 0    | 0    | 0    | -1   | 0    | 1
1             | 15         | ·           | 0    | 0    | 0    | 0    | -1   | 1
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
etc...
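Contrast sets and indicator rows for these designs can be generated programmatically; the sketch below (Python; function names are my own, not from any package) reproduces the 5 fan contrasts and 15 web contrasts for outcomes A–F, together with the ±1/0 indicator coding used in the tables.

```python
from itertools import combinations

def fan_contrasts(outcomes):
    """Fan: every other outcome contrasted against the first (baseline)."""
    base = outcomes[0]
    return [(head, base) for head in outcomes[1:]]

def web_contrasts(outcomes):
    """Web: all distinct pairs of outcomes, as (headline, baseline)."""
    return [(head, base) for base, head in combinations(outcomes, 2)]

def indicator_row(contrast, outcomes):
    """Indicator coding: +1 for the headline, -1 for the baseline."""
    head, base = contrast
    return [1 if o == head else -1 if o == base else 0 for o in outcomes]

outcomes = list("ABCDEF")
fan = fan_contrasts(outcomes)              # 5 contrasts: B-A, ..., F-A
web = web_contrasts(outcomes)              # 15 contrasts, as in Table 15
row = indicator_row(("B", "A"), outcomes)  # [-1, 1, 0, 0, 0, 0]
```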
III.3.1.3.3 Trees and other hierarchical network structures
The same principles can be used to prepare elicitation data for trees and other hierarchical network
structures.
The network in Figure 49 includes 15 contrasts, namely 𝐵 − 𝐴, 𝐶 − 𝐴 , 𝐶 − 𝐵, 𝐸 − 𝐷, 𝐹 − 𝐷, 𝐹 −
𝐸, 𝐻 − 𝐺, 𝐼 − 𝐺, 𝐽 − 𝐺, 𝐼 − 𝐻, 𝐽 − 𝐻, 𝐽 − 𝐼, 𝐷 − 𝐴, 𝐺 − 𝐴, 𝐺 − 𝐷 and data resembling Table
16 (again, the rows may be permuted).
Table 16 – Data structure for the network in Figure 49 with ten outcomes A, B, C, D, E, F, G, H, I and J. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk | χ_Gk | χ_Hk | χ_Ik | χ_Jk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 3          | ·           | 0    | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 4          | ·           | 0    | 0    | 0    | -1   | 1    | 0    | 0    | 0    | 0    | 0
1             | 5          | ·           | 0    | 0    | 0    | -1   | 0    | 1    | 0    | 0    | 0    | 0
1             | 6          | ·           | 0    | 0    | 0    | 0    | -1   | 1    | 0    | 0    | 0    | 0
1             | 7          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1    | 0    | 0
1             | 8          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 1    | 0
1             | 9          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 0    | 1
1             | 10         | ·           | 0    | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1    | 0
1             | 11         | ·           | 0    | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 1
1             | 12         | ·           | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1
1             | 13         | ·           | -1   | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0
1             | 14         | ·           | -1   | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0    | 0
1             | 15         | ·           | 0    | 0    | 0    | -1   | 0    | 0    | 1    | 0    | 0    | 0
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
etc...
The tree in Figure 50 includes 9 contrasts, namely 𝐵 − 𝐴, 𝐶 − 𝐴 , 𝐸 − 𝐷, 𝐹 − 𝐷,𝐻 − 𝐺, 𝐼 − 𝐺, 𝐽 −
𝐺, 𝐷 − 𝐴, 𝐺 − 𝐴 and data resembling Table 17 (again, the rows may be permuted).
Table 17 – Data structure for the tree in Figure 50 with ten outcomes A, B, C, D, E, F, G, H, I and J. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk | χ_Gk | χ_Hk | χ_Ik | χ_Jk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 3          | ·           | 0    | 0    | 0    | -1   | 1    | 0    | 0    | 0    | 0    | 0
1             | 4          | ·           | 0    | 0    | 0    | -1   | 0    | 1    | 0    | 0    | 0    | 0
1             | 5          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1    | 0    | 0
1             | 6          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 1    | 0
1             | 7          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 0    | 1
1             | 8          | ·           | -1   | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0
1             | 9          | ·           | -1   | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0    | 0
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
etc...
These examples used the substitution rule to structure the hierarchy. An agglomeration rule, if well-
defined a priori (and not dependent upon any vague/uncertain frequencies), could also be
incorporated by including more indicator variables (with appropriate coefficients) in the relevant
contrasts.
III.3.2 Datasets
To illustrate the application of the ratings-based elicitation model, two datasets from a recent
benefit-risk methodology project will be used, as detailed in the sections below.
III.3.2.1 PROTECT investigator ratings for RRMS treatment outcomes
This dataset consists of relative ratings for treatment outcomes, derived using swing weighting and a
value tree structure, and used in an early example of benefit-risk assessment using MCDA 29,53.
Ratings were determined by consensus within the case study team; prior to this two individuals had
given their ratings. Here these will be regarded as 3 independent participants (although it is recognised
that in reality this is probably not the case, as the two individuals were part of the team).
This elicitation exercise was part of a proof-of-concept MCDA-based benefit-risk assessment into the
RRMS drugs glatiramer acetate, intramuscular interferon beta-1a, and natalizumab. The first two
drugs have already been encountered in the comparison of first-line RRMS therapies in Chapter II,
whereas natalizumab was excluded as it is usually reserved as second-line treatment for more
aggressive cases.
A full listing of this dataset is shown in Appendix A.2.
The “swing weighting” methodology was used to elicit ratings. This uses a tree structure sometimes
known as a value tree (Figure 54). The benefits and one of the risks (liver enzyme elevation) are
essentially the same as in Chapter II but a number of other risks are also included:
• Herpes reactivation – the reactivation of dormant herpes infections, a risk of many RRMS
treatments due to their immunosuppressive nature.
• Progressive Multifocal Leukoencephalopathy (PML) - a brain infection by the John
Cunningham Virus, causing severe disability and death if untreated, this is a rare but very
serious risk associated with natalizumab.
• Congenital abnormalities – the risk of teratogenic disorders due to treatment.
• Seizures – a known risk of interferon beta-1a.
• Infusion/injection reactions – localised reactions to administration by injection or infusion
• Allergic/hypersensitivity reactions – systemic allergic reactions to treatment
• Flu-like reactions – transient systemic flu-like malaise as a result of treatment, usually
resolving in a few days
The route of administration for a treatment can have a significant impact on patient preferences 191,192, and the following routes of administration were included in the ratings exercise:
• Daily oral (self-administered)
• Daily subcutaneous injection (self-administered)
• Weekly intramuscular injection (self-administered)
• Monthly intravenous infusion (in clinic)
Figure 54 – Value tree for the RRMS investigator ratings dataset before elicitation. Administration modes (blue), clinical benefits (green) and risks (red) of treatment for the PROTECT RRMS investigator ratings data. The yellow cells are the comparison points at which the swing weighting methodology is to be applied.
Swing weighting takes place at each yellow cell, resulting in relative ratings for its “children”. Here
the weighting was carried out “bottom-up” (i.e. right to left in the orientation shown) with the
substitution rule (as per the example in III.1.2.3). Promoting the criteria changes the shape of the
network; after this process the tree appears rather different (Figure 55).
[Figure 54 depicts the following hierarchy:
Treatment effects
  Benefits: Reduction in relapses; Slowdown in disability progression
  Risks: Infection (Herpes reactivation; PML); Congenital abnormalities; Liver enzyme elevation; Seizures; Other (Infusion/injection reactions; Allergic/hypersensitivity reactions; Flu-like reactions)
  Administration: daily oral vs daily subcutaneous; monthly infusion vs daily subcutaneous; weekly intramuscular vs daily subcutaneous]
Figure 55 – Value tree for the RRMS investigator ratings dataset after the elicitation process is complete. Administration modes (blue), clinical benefits (green) and risks (red) of treatment for the PROTECT RRMS investigator ratings data.
III.3.2.2 PROTECT patient ratings for RRMS patient outcomes
This dataset consists of criteria ratings originally elicited within PROTECT’s workstream on patient
and public involvement 193. As with the investigator ratings, the project concerned preferences for
outcomes of treatment with the RRMS drugs glatiramer acetate, intramuscular interferon beta-1a,
and natalizumab. The data were elicited using the (classical) AHP method 146, in a paper-based
survey issued to RRMS patients at a London clinic. The study design, consent processes and ethical
approval have been described elsewhere 193. The initial analysis of these data revealed a problem
with the way the survey questions had been worded, however, and as a result there is substantial
doubt over the validity of the elicited preferences. The problems were twofold and both were
[Figure 55 shows the flattened value tree, with all leaf criteria — Relapse, Disability progression, PML, Herpes reactivation, Congenital abnormalities, Liver enzyme elevation, Seizures, Infusion/injection reactions, Allergic/hypersensitivity reactions, Flu-like reactions, and the three administration contrasts (daily oral, monthly infusion and weekly intramuscular, each vs daily subcutaneous) — promoted to a single level.]
common pitfalls that have already been discussed: firstly, the “amount” of each outcome used for
the comparisons was not specified, meaning that the interpretation of the ratings is not clear, an
example of range insensitivity bias (see III.1.3.3). Secondly, no rule was specified for moving
between the hierarchies, causing further problems (see III.3.1.3.3). Not all of the ratings in the
dataset are affected by these issues, however. Among the criteria being compared were the route
and frequency of administration for each drug, which should give valid ratings as they are
unambiguously defined and located at the same level of the elicitation hierarchy.
Using just the administration categories within the AHP framework results in the elicitation network
– a web – shown in Figure 56. Note that the administration categories feature alone rather than as
pairwise contrasts; this is a feature of the AHP methodology that will be discussed further in III.3.3.2.
Figure 56 – Elicitation network diagram for administration modes in the PROTECT RRMS patient ratings data.
A full listing of this dataset is provided in Appendix A.2.
III.3.3 Statistical model
III.3.3.1 Key model features and parameters
The model has its roots in the regression-based AHP analysis that has been proposed before on
various occasions 149,151,154-156, including in a Bayesian context 157. Here, however, the model will be generalised beyond AHP to apply more widely to any cardinal relative ratings data, including data obtained from other elicitation methods such as swing weighting.
Recall from III.2.1 that the population-average utility function is assumed to be linear and additive in all criteria, taking the form

$$U = \sum_{\omega=1}^{\Omega} e^{g_\omega} x_\omega$$

where $x_\omega$ is the independent variable representing criterion $\omega$ and $g_\omega$ is its associated preference strength.
If preferences are assumed to vary between units (such as studies or individuals), a random preference model can be used, so that the utility function for unit $i$ is

$$U_i = \sum_{\omega=1}^{\Omega} e^{\gamma_{i\omega}} x_\omega$$

where $\gamma_{i\omega}$ is unit $i$'s preference strength for criterion $\omega$. (The distribution of $\gamma_{i\omega}$, or rather $e^{\gamma_{i\omega}}$, is set out below.)
As previously discussed, the absolute scale of the utility function is arbitrary and irrelevant to multicriteria decision making. Decision options are always evaluated relative to one another rather than by reference to some external benchmark, so it is the relative magnitudes of the utility coefficients $e^{\gamma_{i\omega}}$ for the set of criteria $\omega \in 1,\dots,\Omega$ that influence decision-making behaviour. In other words, if one assumes homogeneous preferences among a population, then for any pair of criteria $\omega_1, \omega_2 \in 1,\dots,\Omega$ it is the preference ratios $e^{\gamma_{i\omega_2}}/e^{\gamma_{i\omega_1}}$ that must be homogeneous, rather than the absolute values $e^{\gamma_{i\omega_1}}$ and $e^{\gamma_{i\omega_2}}$.
It is axiomatic that preferences should be transitive: if one perceives A as twice as attractive as B, and B as twice as attractive as C, then A should appear four times as attractive as C. This is guaranteed by the model, since the preference ratios are naturally transitive under multiplication, i.e.

$$\frac{e^{\gamma_{i\omega_3}}}{e^{\gamma_{i\omega_1}}} = \frac{e^{\gamma_{i\omega_3}}}{e^{\gamma_{i\omega_2}}} \times \frac{e^{\gamma_{i\omega_2}}}{e^{\gamma_{i\omega_1}}}.$$
A rating by individual $i$ comparing outcome $\omega_2$ to $\omega_1$ is an estimate or "measurement" (subject to measurement error) of the ratio of utility coefficients $e^{\gamma_{i\omega_2}}/e^{\gamma_{i\omega_1}}$.
To construct the likelihood for the ratings model, these ratio estimates are transformed to the logarithmic scale and assumed to consist of (i) a deterministic component $\gamma_{i\omega_2} - \gamma_{i\omega_1}$ and (ii) a Normal error term $\varepsilon_{ik}$ with variance $\sigma_{rat}^2$. The error terms are assumed to be independent, meaning that any "mistake" made by a participant in a rating task (i.e. a deviation from their true underlying preferences) is independent of any "mistakes" they make on other ratings.

In other words, if $z_{i\omega_1\omega_2}$ is individual $i$'s rating of $\omega_2$ compared to $\omega_1$, then $\log(z_{i\omega_1\omega_2})$ follows a Normal distribution with mean $\gamma_{i\omega_2} - \gamma_{i\omega_1}$ and variance $\sigma_{rat}^2$:

$$\log(z_{i\omega_1\omega_2}) \sim N(\gamma_{i\omega_2} - \gamma_{i\omega_1},\ \sigma_{rat}^2)$$

Equivalently, using the notation of III.3.1.2,

$$\log(z_{ik}) \sim N\!\left(\sum_{\omega=1}^{\Omega} \gamma_{i\omega}\chi_{\omega k},\ \sigma_{rat}^2\right) \quad \text{for every pairwise contrast } k \in 1,\dots,K.$$
The use of an additive error term on the log ratio scale is equivalent to assuming multiplicative
errors on the ratio scale.
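As a quick check on this formulation, the measurement model is easy to simulate. The sketch below (illustrative Python, not the BUGS code in Appendix B; the parameter values are arbitrary) draws ratings for one pairwise contrast and confirms that the average log rating recovers the preference-strength difference $\gamma_{i\omega_2} - \gamma_{i\omega_1}$:

```python
import math
import random

random.seed(1)

gamma_1, gamma_2 = 0.2, 1.3  # preference strengths for criteria w1, w2 (arbitrary)
sigma_rat = 0.5              # standard deviation of log ratings (arbitrary)

# A stated rating z is the true coefficient ratio e^(gamma_2 - gamma_1)
# perturbed by multiplicative error: log(z) ~ N(gamma_2 - gamma_1, sigma_rat^2).
ratings = [math.exp(random.gauss(gamma_2 - gamma_1, sigma_rat))
           for _ in range(20000)]

# The mean log rating estimates gamma_2 - gamma_1 = 1.1, so the geometric
# mean of the ratings estimates the coefficient ratio e^1.1.
mean_log = sum(math.log(z) for z in ratings) / len(ratings)
geometric_mean = math.exp(mean_log)
```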
The variance $\sigma_{rat}^2$ is here assumed to be constant for all participants for the sake of parsimony, but in principle it would be straightforward to allow for heterogeneous variances. The value of $\sigma_{rat}^2$ reflects the deviation of the stated ratings from the underlying preference strength ratios. It is this variability that allows the model to incorporate inconsistencies in the ratings. Thus, if a vague prior is assigned to $\sigma_{rat}^2$, its posterior will reflect the overall level of inconsistency in the dataset. In some datasets (such as trees evaluated by only one individual), there is no scope for inconsistency in the data and hence no data-based estimate of $\sigma_{rat}^2$ will be possible. In such cases the posterior distribution will reflect only the prior.
The model can be constructed with either "fixed" or "random" preferences, somewhat analogous to fixed- and random-effects meta-analysis but at the level of individual participants rather than studies. In the random preference version of the model, $\gamma_{i\omega}$ is allowed to vary between individuals to accommodate the presence of preference heterogeneity in the population. For the reasons discussed in III.2.1.6, a Normal distribution on the exponentiated scale is used, with mean $e^{g_\omega}$ (the population-average utility coefficient for criterion $\omega$) and standard deviation proportional to the mean (with the coefficient of proportionality denoted $\sigma_{pref}$), i.e.

$$e^{\gamma_{i\omega}} \sim N\!\left(e^{g_\omega},\ (e^{g_\omega}\sigma_{pref})^2\right) \quad \text{for an individual participant } i.$$
No allowance is made for any correlations among $\gamma_{i\omega}$ or $e^{\gamma_{i\omega}}$ for distinct values of $\omega$. I am not aware of any compelling reason to believe that statistical correlations must exist between the preferences for different criteria, although their presence is not implausible. The model could in principle be extended to incorporate such correlations using an approach similar to that employed for the
extended to incorporate such correlations using an approach similar to that employed for the
between-study outcome correlations in Chapter II. For this initial proof of concept it was felt that
this was an unnecessary layer of complexity.
In the fixed preferences version of the model, preferences are assumed to be perfectly
homogeneous among the participants, i.e. 𝛾𝑖𝜔 = 𝑔𝜔.
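The proportional-standard-deviation form of the random preference distribution can be sketched as follows (illustrative Python with arbitrary values, not the thesis model code); because the spread scales with the mean, heterogeneity is the same in relative terms regardless of a criterion's weight:

```python
import math
import random

random.seed(2)

g = math.log(4.0)   # population-average preference strength (arbitrary)
sigma_pref = 0.2    # proportional between-participant standard deviation (arbitrary)

# Draw individual utility coefficients e^gamma_i ~ N(e^g, (e^g * sigma_pref)^2).
# At this sigma_pref, negative draws are a negligible (5-sigma) possibility.
coeffs = [random.gauss(math.exp(g), math.exp(g) * sigma_pref)
          for _ in range(20000)]

mean_coeff = sum(coeffs) / len(coeffs)
# The relative (coefficient-of-variation) spread recovers sigma_pref.
rel_sd = (sum((c - mean_coeff) ** 2 for c in coeffs) / len(coeffs)) ** 0.5 / mean_coeff
```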
It is also possible to combine data from more than one elicitation study in the model; in such instances it may be desirable to add another hierarchy of random effects so that preferences can vary between studies in addition to (or instead of) between individuals.
III.3.3.2 Categorical variables
Ratings-based elicitation methodologies vary in their parameterisation of the levels of a categorical
variable. Some methodologies such as AHP estimate a utility coefficient for every level, whereas
others such as swing weighting fix one level as a reference and estimate coefficients for all but the
reference level. The utility associated with the reference level is analogous to the intercept term in
linear regression – i.e. it is a nuisance parameter that is not important when comparing treatments, where only the differences in utility between levels matter. Furthermore, the parameterisation used
in AHP has been criticised because the intercept term can interfere with judgements expressed on
the ratio scale179. The model developed here therefore uses a parameterisation similar to swing-
weighting that is based on utility differences and does not estimate the reference level. Instead, an
indicator variable is constructed for each level apart from the reference and these indicators are
treated as separate criteria, with their estimated utility coefficients representing the difference in
utility from the reference level (i.e. the convention known as dummy coding – see III.5.3.2 for more
discussion of coding schemes).
To give an example, if Q is a categorical variable with $n$ levels $0, 1, \dots, n-1$, then it will be represented in the utility function by $n-1$ criteria $\omega_1, \dots, \omega_{n-1}$. The population-average utility function (focusing here solely on these $n-1$ criteria for clarity, although there may be others) is

$$U = \sum_{q=1}^{n-1} e^{g_{\omega_q}} x_{\omega_q}$$

where $x_{\omega_q}$ is an indicator variable taking the value 1 when $Q=q$ and 0 otherwise.
This way, each coefficient $e^{g_{\omega_q}}$ represents the change in utility associated with moving from level 0 of Q (the reference level) to level $q$. When $Q=0$ (the reference level), $U=0$. In other words, there is no intercept term in the utility function. In principle this lack of intercept should not concern us since (as we have already seen) the absolute level of utility is arbitrary and irrelevant; and indeed this
parameterisation works fine with ratings methodologies such as swing weighting that ask
participants to compare changes in utility rather than absolute levels.
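The dummy-coding convention can be made concrete with a small sketch (illustrative Python with made-up coefficients; the level labels are hypothetical, not the thesis data):

```python
import math

# Hypothetical 4-level categorical variable Q (e.g. administration mode), with
# level 0 as the reference. One utility coefficient e^(g_q) per non-reference
# level, each representing the utility gain over the reference.
g = {1: math.log(0.5), 2: math.log(0.3), 3: math.log(0.9)}  # arbitrary values

def utility(q):
    """Partial utility of Q under dummy coding: 0 at the reference level,
    e^(g_q) otherwise (the indicator x_q equals 1 only when Q == q)."""
    return 0.0 if q == 0 else math.exp(g[q])
```

There is no intercept: the reference level contributes exactly zero utility, and each coefficient is directly the difference from the reference.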
In the AHP elicitation method, however, participants are required to compare the absolute levels of utility associated with the levels of a categorical variable. In order to analyse ratings elicited using this method, an additional nuisance intercept parameter $\alpha_{AHP}$ is included in the utility function to represent the reference level; that is, the utility function is

$$U = \alpha_{AHP} + \sum_{q=1}^{n-1} e^{g_{\omega_q}} x_{\omega_q}$$

The mean utility coefficient for any other level $q$ on the absolute AHP utility scale is then equal to $\alpha_{AHP} + e^{g_{\omega_q}}$.
For coding purposes it is convenient to choose the least favoured level (i.e. the level with the lowest utility) as the reference. If the most favoured level were used as the reference, then the coefficient on the AHP scale would be $\alpha_{AHP} - e^{g_{\omega_q}}$ and additional constraints would need to be imposed to ensure that this quantity remained strictly positive (since its logarithm informs the likelihood). If any intermediate level were used as the reference, then some coefficients would be equal to $\alpha_{AHP} + e^{g_{\omega_q}}$ and some equal to $\alpha_{AHP} - e^{g_{\omega_q}}$, overcomplicating the model code (and additionally, no single $e^{g_{\omega_q}}$ would correspond to the entire criterion preference weight).
As discussed in III.2.1.5, it is conventional in MCDA to present the preference parameters for the
administration modes as a partial value function taking values from 0 (least preferred) to 1 (most
preferred) for each category, and a weight that corresponds to the entire range. This can be done
based on the utility coefficient parameterisation used here; the weight for the range is simply the largest utility coefficient $e^{g_{\omega_q}}$ among the categorical levels, and the partial value for each level is its utility coefficient expressed as a proportion of the weight for the range. If the maximum coefficient
is not identifiable in advance, presenting the results this way (with fully simulated posterior
distributions) will tend to require an additional post hoc model run so that the appropriate
calculations can be specified.
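This conversion can be sketched as follows (illustrative Python with arbitrary coefficient values, not the thesis posterior samples):

```python
# Utility coefficients e^(g) for the non-reference administration levels
# (arbitrary illustrative values; the reference level has utility 0 and is
# assumed here to be the least preferred).
coeffs = {"daily oral": 0.9, "monthly infusion": 0.3, "weekly intramuscular": 0.6}

# The weight for the whole criterion range is the largest coefficient...
weight = max(coeffs.values())

# ...and each level's partial value is its coefficient as a proportion of that
# weight, giving the conventional 0 (least preferred) to 1 (most preferred) scale.
partial_values = {level: c / weight for level, c in coeffs.items()}
partial_values["reference level"] = 0.0
```

In a full Bayesian analysis this calculation would be applied draw by draw to the posterior samples, which is why an additional post hoc model run may be needed when the maximum coefficient is not identifiable in advance.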
III.3.3.3 Priors
The main prior distributions used are set out in Table 18 and will be used throughout this chapter
unless otherwise stated. From time to time alternative priors may be used to investigate specific
points; this will be made clear at the time. The priors for the standard deviation parameters may
appear rather wide, but it was felt to be sensible for the distribution to cover multiple orders of
magnitude so as not to obscure any extreme heterogeneity caused by combining different
methodologies/scales in different populations, and the possibility of framing biases, etc.
Table 18 – Priors for ratings model parameters

$g_\omega$ — population-average preference strength for criterion $\omega$.
Prior: $e^{g_\omega} \sim \mathrm{Gamma}(1, 0.01)$ (see III.2.1.7 for more details).

$\alpha_{AHP}$ — nuisance parameter representing the utility of the "reference" level of the categorical administration variable in AHP-style ratings.
Prior: $\alpha_{AHP} \sim \mathrm{Gamma}(1, 0.1)$ (similar to the prior for $e^{g_\omega}$ but over a smaller scale, as the reference level is expected to have a relatively low utility).

$\sigma_{rat}$ — standard deviation of log ratings.
Prior: $\sigma_{rat} \sim \mathrm{Uniform}(0, 10)$. This vague prior is expected to be more than wide enough to cover all plausible values, since a deviation of 10 on the logarithmic scale (on which the random distribution of ratings is defined) corresponds to multiplication or division by $e^{10} \approx 22{,}000$, i.e. a change of several orders of magnitude.

$\sigma_{pref}$ — proportional standard deviation of the random preference distribution.
Prior: $\sigma_{pref} \sim \mathrm{Uniform}(0, 10)$. This was the widest uniform prior that allowed the models to run without errors. The random distribution of preferences is defined such that a deviation of 10 corresponds to a 10-fold increase (or decrease) in any given utility coefficient; the variability in ratings (i.e. ratios of utility coefficients) will be somewhat higher still, and this should be sufficient to cover any plausible between-participant variation in preferences.
III.3.3.4 Initial values for MCMC simulation
Preliminary runs reveal that, unlike most models in this thesis where BUGS is able to generate
suitable initial values, in the ratings model initial values for the utility coefficients must be supplied
by the user in order for the model to converge properly. This is likely to be because the utility
coefficients at intermediate levels of the administration variable must be lower than the coefficient
at the highest level and BUGS does not recognise this restriction when generating initial values. The
initial values used here are $e^{g_\omega} = 1$ for any criterion used to derive a preference weight, and $e^{g_\omega} = 0.5$ for the intermediate levels of the administration variable. If the model converges
properly then the results should not be sensitive to the choice of initial values; to confirm this, a
sensitivity analysis of the overall MCDA results to alternative sets of initial values is presented in
Appendix C.
III.3.4 Results
Appendix B contains the BUGS code and data files used to generate the results.
III.3.4.1 PROTECT investigator ratings
Simulations were performed using the Markov Chain Monte Carlo technique in either WinBUGS
(version 1.4.3) 48 or OpenBUGS (version 3.2.2 rev 1063 - www.openbugs.net). Initial values were
generated within BUGS for the majority of models. 100,000 iterations were discarded to allow for
“burn-in”; the posterior statistics were then derived from a further 100,000 iterations. Convergence
was assessed by inspection of the sample histories. Model fit was assessed by calculation of the
mean residual deviance119, which in a well-fitting model should be similar to (or less than) the
number of independent observations.
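For the Normal ratings likelihood with a given $\sigma_{rat}$, the residual deviance reduces to a sum of squared standardised residuals. The following sketch (illustrative Python with toy numbers, not the thesis calculations) shows the quantity being compared with the number of observations:

```python
def residual_deviance(log_ratings, means, sigma):
    """Residual deviance for a Normal likelihood with known variance:
    the sum of squared standardised residuals. In a well-fitting model
    this should be comparable to the number of observations."""
    return sum(((y - m) / sigma) ** 2 for y, m in zip(log_ratings, means))

# Toy example: every observed log rating sits exactly one sigma from its
# fitted mean, so the deviance equals the number of observations (4).
obs = [0.5, 1.5, -0.5, 2.5]
fitted = [0.0, 1.0, 0.0, 2.0]
deviance = residual_deviance(obs, fitted, sigma=0.5)
```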
Table 19 shows each participant’s mean preference weights calculated using the standard
deterministic method (see III.1.2.3) and also in the fixed effects Bayesian model described above, for
three different uniform priors on the ratings standard deviation.
Note that (as will be the case throughout this chapter) the utility coefficients and/or weights for
continuous criteria are expressed with regard to fixed unit intervals in the underlying outcome
measures. These units are provided in the results tables. It is important to recognise that the
outcome values used for elicitation, and/or experienced by patients on real-world treatments, may
be very different. The given “weight” for a criterion should therefore not by itself be interpreted as
a measure of that criterion’s influence on the benefit-risk balance without also considering the likely
real-world levels of the associated outcome measure.
Table 19 – Mean preference weights for individual participants in the investigator ratings dataset; deterministic analysis and Bayesian analysis with sensitivity to the assumed ratings standard deviation

Participant 1 — normalised preference weights. Columns: Unit; Deterministic; Posterior mean with σrat prior Uniform(0,0.01); Posterior mean with σrat prior Uniform(0,1); Posterior mean with σrat prior Uniform(0,10).
Clinical outcomes
Relapse 1 event 3.9% 3.9% 5.8% 7.0%
Disability progression 1 event 5.6% 5.6% 6.5% 7.1%
PML 1 event 55.9% 55.9% 45.0% 34.9%
Herpes reactivation 1 event 6.7% 6.7% 6.8% 7.1%
Liver enzyme elevation 1 event 11.2% 11.2% 9.5% 8.8%
Seizures 1 event 5.6% 5.6% 6.0% 6.6%
Congenital abnormalities 1 event 5.6% 5.6% 6.0% 6.7%
Infusion/injection reactions 1 event 2.8% 2.8% 5.4% 6.9%
Allergic/hypersensitivity reactions 1 event 1.1% 1.1% 3.5% 5.5%
Flu-like reactions 1 event 1.1% 1.1% 3.5% 5.5%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 0.6% 0.6% 1.9% 3.9%
Monthly infusion N/A 0.4% 0.4% 2.5% 5.6%
Weekly intramuscular N/A 0.3% 0.3% 1.9% 4.9%
Participant 2 — normalised preference weights. Columns: Unit; Deterministic; Posterior mean with σrat prior Uniform(0,0.01); Posterior mean with σrat prior Uniform(0,1); Posterior mean with σrat prior Uniform(0,10).
Clinical outcomes
Relapse 1 event 19.0% 19.0% 11.8% 10.6%
Disability progression 1 event 27.1% 27.1% 17.7% 15.3%
PML 1 event 30.1% 30.1% 31.4% 28.6%
Herpes reactivation 1 event 6.0% 6.0% 7.5% 7.6%
Liver enzyme elevation 1 event 6.0% 6.0% 7.4% 7.6%
Seizures 1 event 3.0% 3.0% 4.5% 5.2%
Congenital abnormalities 1 event 3.0% 3.0% 4.5% 5.2%
Infusion/injection reactions 1 event 1.5% 1.5% 4.1% 5.2%
Allergic/hypersensitivity reactions 1 event 0.6% 0.6% 2.7% 4.0%
Flu-like reactions 1 event 0.6% 0.6% 2.7% 4.0%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 3.0% 3.0% 5.8% 6.6%
Monthly infusion N/A 2.1% 2.1% 5.4% 6.6%
Weekly intramuscular N/A 1.5% 1.5% 4.2% 5.5%
Participant 3 — normalised preference weights. Columns: Unit; Deterministic; Posterior mean with σrat prior Uniform(0,0.01); Posterior mean with σrat prior Uniform(0,1); Posterior mean with σrat prior Uniform(0,10).
Clinical outcomes
Relapse 1 event 15.9% 15.9% 11.0% 10.4%
Disability progression 1 event 26.5% 26.6% 18.5% 16.9%
PML 1 event 29.5% 29.5% 29.7% 28.1%
Herpes reactivation 1 event 8.8% 8.9% 9.4% 9.2%
Liver enzyme elevation 1 event 5.9% 5.9% 7.0% 7.2%
Seizures 1 event 2.9% 3.0% 4.1% 4.5%
Congenital abnormalities 1 event 2.9% 3.0% 4.1% 4.5%
Infusion/injection reactions 1 event 1.5% 1.5% 3.0% 3.5%
Allergic/hypersensitivity reactions 1 event 1.3% 1.3% 3.6% 4.5%
Flu-like reactions 1 event 1.6% 1.6% 4.3% 5.2%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 2.9% 3.0% 5.2% 5.8%
Monthly infusion N/A 2.1% 2.1% 4.8% 5.7%
Weekly intramuscular N/A 1.5% 1.5% 3.7% 4.6%
This shows that the methods give the same central estimate of the weights when the standard
deviation is effectively zero, but when a nonzero standard deviation is allowed for, the estimates diverge considerably, with the weights being dragged towards their joint prior (which weights all 11
criteria equally at 9%). In other words the value of the standard deviation parameter can have a
strong impact on the results - but analysing individual trees like this provides no actual evidence on
what the standard deviation should be; its posterior essentially just reflects its prior. In the Bayesian
model, however, we can analyse all three participants together and thus obtain a data-based
posterior of the ratings standard deviation that reflects the inconsistency between individuals’
ratings. Results from this model are shown in Table 20 together with the preference weights and the
calculated residual deviance. Because of the small number of participants in this rating exercise,
only the fixed effects results are provided. The figures in Table 20 are based upon a uniform
standard deviation prior on the interval (0,10) which is intended to be wide enough to cover the full
range of plausible values (see III.3.3.3).
Table 20 – Posterior distribution of preferences for simultaneous analysis of all participants in the investigator ratings dataset
FIXED PREFERENCES — 3 participants, 36 ratings in total. Normalised preference weights; columns: Unit; Mean; sd; 2.5%; Median; 97.5%.
Clinical outcomes
Relapse 1 event 10.1% 3.8% 4.2% 9.7% 18.9%
Disability progression 1 event 15.2% 3.8% 8.5% 15.0% 23.4%
PML 1 event 40.1% 4.2% 31.6% 40.1% 48.0%
Herpes reactivation 1 event 8.1% 2.7% 3.9% 7.7% 14.4%
Liver enzyme elevation 1 event 8.3% 2.8% 4.0% 8.0% 14.8%
Seizures 1 event 4.5% 1.6% 2.1% 4.2% 8.4%
Congenital abnormalities 1 event 4.5% 1.6% 2.1% 4.2% 8.4%
Infusion/injection reactions 1 event 2.8% 1.0% 1.4% 2.7% 5.4%
Allergic/hypersensitivity reactions 1 event 1.8% 1.0% 0.6% 1.5% 4.2%
Flu-like reactions 1 event 1.9% 1.0% 0.6% 1.7% 4.5%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 2.7% 1.0% 1.2% 2.5% 5.2%
Monthly infusion N/A 2.2% 1.2% 0.7% 1.9% 5.4%
Weekly intramuscular N/A 1.6% 0.9% 0.5% 1.4% 4.0%
Ratings standard deviation N/A 0.63 0.10 0.47 0.62 0.87
Residual deviance N/A 35.0 8.4 20.5 34.3 53.2
The number of observations in the dataset is 36 (3 individuals providing 12 ratings each); since the
mean residual deviance is slightly lower, at 35, this indicates a good model fit. The posterior
distribution of the ratings standard deviation lies well below the upper bound of the prior,
suggesting that the prior was indeed suitably vague (this is also confirmed by a sensitivity analysis in
Appendix C).
III.3.4.2 PROTECT patient ratings
Table 21 shows the median preferences for administration modes for a single (arbitrarily chosen)
participant in the PROTECT patient ratings dataset, derived using both the Bayesian model (for two
different priors on the ratings standard deviation) and the standard deterministic “eigenvalue”
method 146.
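For reference, the deterministic "eigenvalue" method extracts priorities as the normalised principal eigenvector of the pairwise comparison matrix. A minimal power-iteration sketch (illustrative Python using a toy 3x3 perfectly consistent matrix, not the real patient data):

```python
def ahp_priorities(matrix, iterations=50):
    """Approximate the principal eigenvector of a positive reciprocal
    comparison matrix by power iteration, normalised to sum to 1
    (the AHP priorities)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
    return v

# Perfectly consistent comparisons derived from weights (4, 2, 1), so that
# a_ij = w_i / w_j; the priorities should come out as 4/7, 2/7, 1/7.
A = [[1, 2, 4],
     [0.5, 1, 2],
     [0.25, 0.5, 1]]
priorities = ahp_priorities(A)
```

For an inconsistent matrix (as in real elicited data) the principal eigenvector no longer reproduces the stated ratios exactly, which is precisely the inconsistency that the Bayesian model absorbs through σrat.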
In this instance, as only the administration categories feature in this dataset and the ratings were
elicited using the AHP methodology, the results are presented in two ways:
• as AHP-style partial “priorities” – that is, the utility coefficients 𝛼𝐴𝐻𝑃 + 𝑒𝑔𝜔 for every
category but the reference and simply 𝛼𝐴𝐻𝑃 for the reference, normalised to sum to 1 across
all categories; and
• as a partial value function on a scale from 0 to 1, representing each category’s utility
coefficient as a proportion of the maximum.
Medians have been shown instead of means because the partial value distributions are highly
skewed.
Because this ratings dataset uses a web of pairwise comparisons, rather than the tree used in the
investigator ratings dataset, there is scope for inconsistency in a single individual’s ratings and an
estimate of the ratings standard deviation can therefore be obtained.
Table 21 – Median preferences for a single participant in the patient ratings dataset; deterministic analysis and Bayesian analysis with sensitivity to assumed ratings standard deviation
Participant 1. Columns: Deterministic; Posterior median with σrat prior Uniform(0,0.01); Uniform(0,0.1); Uniform(0,1); Uniform(0,10); Uniform(0,100).
Partial priorities for modes of administration (on strictly positive absolute scale from AHP)
Daily subcutaneous 0.05 0.06 0.06 0.06 0.03 0.03
Daily oral 0.42 0.42 0.42 0.40 0.35 0.35
Monthly infusion 0.13 0.13 0.13 0.14 0.19 0.19
Weekly intramuscular 0.40 0.39 0.39 0.37 0.33 0.33
Partial values for modes of administration (relative to reference = daily subcutaneous)
Daily oral 1.00 1.00 1.00 1.00 1.00 1.00
Monthly infusion 0.21 0.19 0.20 0.23 0.48 0.47
Weekly intramuscular 0.94 0.93 0.93 0.93 0.96 0.97
Ratings standard deviation N/A 0.01 0.10 0.90 2.35 2.34
Residual deviance N/A 43,080.0 435.1 10.7 4.3 4.3
As was found with the investigator ratings dataset, the standard deterministic analysis and the
Bayesian analysis are in close agreement for a single participant if the ratings standard deviation is
constrained to be small enough, but can diverge considerably otherwise due to shrinkage towards
the prior weights. In this instance the true value of the standard deviation can be estimated from
the data. The estimated standard deviation (approximately 2.34) is essentially identical in the last two columns, indicating that it is a true data-based estimate rather than an artefact of the prior; this in turn strongly suggests that the
narrower priors in the previous two columns are unduly restrictive. Examining the residual deviance
(bearing in mind there are 6 observations) backs up this conclusion, with the model fit clearly
superior when the ratings standard deviation is not tightly constrained. This means that the best-
fitting model is somewhat sensitive to the prior weights, suggesting a need for particular care when
setting these priors and perhaps some kind of sensitivity analysis.
Table 22 summarises the posterior distribution of preferences for administration modes (and other key variables) based on a simultaneous analysis of all participants in the PROTECT patient ratings dataset. Results from a fixed preference model and a random (by participant) preference model are shown. As before, the ratings standard deviation is assigned a uniform prior on the interval (0,10) on the grounds that this includes all plausible values.
Table 22 – Posterior distribution of preferences for simultaneous analysis of all participants in the patient ratings dataset
FIXED PREFERENCES — 36 participants, 207 ratings in total. Columns: Mean; sd; 2.5%; Median; 97.5%.
Partial priorities for modes of administration (on strictly positive absolute scale from AHP)
Daily subcutaneous 0.09 0.05 0.00 0.09 0.18
Daily oral 0.42 0.04 0.34 0.41 0.49
Monthly infusion 0.29 0.03 0.24 0.29 0.36
Weekly intramuscular 0.20 0.02 0.16 0.20 0.24
Partial values for modes of administration (relative to reference = daily subcutaneous)
Daily oral 1.00 N/A (constant) 1.00 1.00 1.00
Monthly infusion 0.61 0.15 0.30 0.62 0.92
Weekly intramuscular 0.30 0.15 0.02 0.31 0.55
Ratings standard deviation 1.21 0.06 1.10 1.21 1.34
Residual deviance 205.9 20.3 168.0 205.3 247.4

RANDOM PREFERENCES BY PARTICIPANT — 36 participants, 207 ratings in total. Columns: Mean; sd; 2.5%; Median; 97.5%.
Partial priorities for modes of administration (on strictly positive absolute scale from AHP)
Daily subcutaneous 0.09 0.05 0.01 0.10 0.18
Daily oral 0.42 0.04 0.34 0.42 0.50
Monthly infusion 0.29 0.03 0.23 0.29 0.36
Weekly intramuscular 0.20 0.02 0.16 0.19 0.24
Partial values for modes of administration (relative to reference = daily subcutaneous)
Daily oral 1.00 N/A (constant) 1.00 1.00 1.00
Monthly infusion 0.60 0.16 0.29 0.60 0.92
Weekly intramuscular 0.29 0.15 0.02 0.30 0.55
Ratings standard deviation 1.06 0.07 0.93 1.05 1.20
Proportional between-participant preference standard deviation 0.33 0.04 0.25 0.34 0.41
Residual deviance 206.1 20.3 168.4 205.5 247.6
The mean residual deviance is almost identical in both the fixed and random preference models, and
lies just below 207, the number of observations, indicating that both models fit the data well, indeed
equally well. In such instances it is good practice 119 to favour the model with fewer parameters i.e.
the fixed preference model.
The preferences for administration modes seem reasonably consistent with those in the investigator
ratings dataset in that the categories are ranked equivalently, although the estimated partial values
differ somewhat.
There is not much that can be inferred about the homogeneity (or otherwise) of the study
populations by comparing the standard deviation estimates in each dataset. The estimated ratings
standard deviation in the fixed preference model here (mean=1.21) is higher than the estimate
obtained from the investigator ratings dataset (mean=0.635), but it would be presumptuous to
interpret this as indicating a true difference in populations, since the latter estimate is based on only
3 participants and comes from a tree structure that provides minimal information on such a
parameter.
III.3.4.3 PROTECT investigator and patient ratings
Table 23 shows the results for a ratings model combining both the PROTECT patient and investigator
ratings. Results are shown for two versions of the model: fixed preferences, and random (by
participant) preferences. A version allowing for preference variation at the between-study level has
also been developed but has not been fitted here, partly to avoid generating too many results and
partly because there are only two studies providing data. This model will however be used later in
the chapter when more studies (using different elicitation methods) are added. From here on, the preferences for administration modes are presented only as partial values, not as AHP-style partial priorities.
Table 23 – Posterior distribution of preferences for simultaneous analysis of all participants in the investigator ratings and patient ratings datasets
FIXED PREFERENCES — 2 studies, 39 participants and 243 ratings in total. Columns: Unit; Mean; sd; 2.5%; Median; 97.5%.
Preference weights
Relapse 1 event 11.5% 6.5% 2.7% 10.1% 27.5%
Disability progression 1 event 15.0% 6.2% 5.6% 14.2% 29.3%
PML 1 event 32.8% 6.4% 20.8% 32.6% 45.9%
Herpes reactivation 1 event 8.6% 4.9% 2.2% 7.5% 20.7%
Liver enzyme elevation 1 event 8.8% 5.0% 2.3% 7.7% 21.2%
Seizures 1 event 5.1% 3.2% 1.2% 4.3% 13.5%
Congenital abnormalities 1 event 5.1% 3.3% 1.2% 4.3% 13.6%
Infusion/injection reactions 1 event 4.7% 2.5% 1.3% 4.2% 11.0%
Allergic/hypersensitivity reactions 1 event 3.9% 3.1% 0.6% 3.0% 12.2%
Flu-like reactions 1 event 4.1% 3.3% 0.6% 3.2% 12.8%
Administration (daily oral vs daily subcutaneous) N/A 0.5% 0.2% 0.2% 0.5% 1.0%
Administration (monthly infusion vs daily subcutaneous) N/A 0.3% 0.1% 0.1% 0.3% 0.6%
Administration (weekly intramuscular vs daily subcutaneous) N/A 0.2% 0.1% 0.1% 0.2% 0.4%
Ratings standard deviation N/A 1.17 0.05 1.06 1.16 1.28
Residual deviance N/A 242.0 22.1 200.9 241.3 286.9

RANDOM PREFERENCES BY PARTICIPANT — 2 studies, 39 participants and 243 ratings in total. Columns: Unit; Mean; sd; 2.5%; Median; 97.5%.
Preference weights
Relapse 1 event 12.0% 6.7% 2.6% 10.7% 28.6%
Disability progression 1 event 15.9% 6.4% 5.7% 15.0% 31.0%
PML 1 event 32.6% 7.0% 20.1% 32.3% 47.5%
Herpes reactivation 1 event 8.4% 4.6% 2.3% 7.4% 19.9%
Liver enzyme elevation 1 event 9.0% 5.1% 2.4% 7.9% 21.7%
Seizures 1 event 5.2% 3.1% 1.3% 4.5% 13.0%
Congenital abnormalities 1 event 4.9% 2.8% 1.3% 4.2% 12.2%
Infusion/injection reactions 1 event 4.4% 2.4% 1.3% 3.9% 10.7%
Allergic/hypersensitivity reactions 1 event 3.3% 2.5% 0.6% 2.6% 9.9%
Flu-like reactions 1 event 3.8% 3.0% 0.6% 2.9% 11.9%
Administration (daily oral vs daily subcutaneous) N/A 0.6% 0.2% 0.3% 0.6% 1.1%
Administration (monthly infusion vs daily subcutaneous) N/A 0.4% 0.1% 0.2% 0.4% 0.7%
Administration (weekly intramuscular vs daily subcutaneous) N/A 0.2% 0.1% 0.1% 0.2% 0.5%
Ratings standard deviation N/A 1.01 0.06 0.90 1.01 1.14
Proportional between-participant preference standard deviation N/A 0.33 0.04 0.25 0.33 0.40
Residual deviance N/A 241.9 21.9 200.7 241.2 286.5
There is very little difference between the results for the two versions of the model and the
estimated between-participant preference standard deviation in the second model is low –
suggesting the simple fixed preference model is most appropriate. The residual deviance is very
similar in both models (again indicating the fixed preference model should be favoured), and falls
slightly below the number of observations (243), indicating a good fit.
The model appears to be combining the two datasets in an appropriate manner. As expected, the
partial values for administration modes fall between the estimates from each individual dataset, and
the preference weights agree with those in Table 20, as they are drawn only from the investigator
ratings dataset.
In this instance, the main benefit of adding in the patient ratings data is only a modest refinement in
the estimates of preferences for administration modes (since these are the only criteria in the
patient ratings dataset). However, these results provide an important proof of concept, showing
that ratings from separate studies using different sets of outcomes can be combined in a single
analysis. As well as obtaining combined results, this provides a means to examine the between-study
heterogeneity of the elicited preferences. In this instance, another benefit of including the
patient ratings data is that it allows one to obtain a data-based estimate of the ratings standard
deviation to feed back into the investigator ratings analysis, which otherwise had to rely only on the
prior to estimate this parameter.
The ratings standard deviation and between-participant standard deviation cannot be directly
compared as they are defined on different scales. To convert the estimated mean ratings standard
deviation of 1.01 to the proportional absolute scale used for the between-participant preference
standard deviation requires the following steps:
• Start on log ratings scale: 1.01
• Divide by √2 to obtain standard deviation associated with a single preference strength
rather than a rating which compares two preference strengths (assuming independence of
preferences), still on log ratings scale: 0.71
• Exponentiate to obtain standard multiplicative factor on absolute scale: 2.04
• Subtract 1 to obtain proportional standard deviation on absolute scale: 1.04
(As it happens, this set of transformations is approximated by the identity when the starting figure is
close to 1, but this is a mere coincidence.)
The estimated between-participant preference standard deviation of 0.33 is therefore considerably
lower than the within-participant random error, indicating a good degree of preference homogeneity
within and between the study populations, even if individual participants apparently found it hard to
always express their preferences consistently or accurately.
III.3.5 Discussion
The Bayesian model appears viable and brings with it the advantages inherent to all Bayesian MCMC
models: the ability to allow for uncertainty in the data/parameters and propagate the full
uncertainty distribution through the model outputs and any subsequent calculations together with
the ability to allow for prior information if desired. Furthermore it has been shown here that the
model can combine more than one source of ratings despite differences in the original elicitation
methods.
The results in Table 19 show that this model’s Bayesian method gives weights that are noticeably
influenced by the prior. Here, this resulted in shrinkage towards equal weighting on all criteria.
Given that some outcomes are a priori more “important” than others, such even weighting in the
prior may be undesirable; it may be more appropriate to use weights from a pilot study, or an
empirical Bayes approach where the prior is based on the data. One way to specify such a prior is to
set $e^{g_\omega} \sim \mathrm{Gamma}(w_\omega, 0.01)$, where $w_\omega$ is an empirical deterministic estimate of the weight on
outcome $\omega$.
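A sketch of what such an empirically centred prior implies (illustrative only; the weight estimates `w` are hypothetical, and the Gamma is parameterised by shape and rate as elsewhere in the model, so numpy's `scale` is 1/rate). Because the coefficients share a common rate, the implied prior on the normalised weights is a Dirichlet distribution whose mean equals `w` when the weights sum to 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical deterministic weight estimates w_omega for three outcomes (sum to 1)
w = np.array([0.5, 0.3, 0.2])

# Prior draws of the utility coefficients e^{g_omega} ~ Gamma(shape = w_omega, rate = 0.01);
# numpy's Gamma takes shape and scale, so scale = 1 / rate = 100
samples = rng.gamma(shape=w, scale=100.0, size=(10_000, len(w)))

# Normalising each draw shows the implied prior on the preference weights
weights = samples / samples.sum(axis=1, keepdims=True)
print(weights.mean(axis=0))  # prior mean weights, centred at w
```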
Moreover, the standard deviations estimated from the data (Tables 21 to 23) were sufficient to
cause substantial disagreement with the deterministic analysis. Indeed, this was the case despite
the fact that the standard deterministic analysis was not the same in both cases (swing weighting for
the investigator ratings, eigenvalue AHP for the patient ratings). This is an unexpected result that
raises questions over whether a deterministic analysis of ratings data can be relied upon to provide a
robust central estimate. It would be of value to confirm these findings in other datasets, and
perhaps using alternative models for the distribution of ratings, but on the face of these results
there is a strong argument for favouring the Bayesian analysis over standard deterministic
techniques for ratings even if only point estimates of preferences are required.
Another advantage of the Bayesian analysis is that it provides a principled framework for combining
ratings from multiple participants. It is of course possible to use the standard deterministic analyses
and calculate averages across participants – but the averages can be taken at the level of either the
input ratings or the output parameters, and there has been substantial discussion in the literature as
to the appropriateness and underlying assumptions of these two approaches 157,194,195. The use of a
Bayesian hierarchical model means this debate can be avoided, and provides a principled solution
based on a clear underlying model, with the added bonus that MCMC allows one to derive full
posterior distributions rather than just summary statistics.
Insofar as the datasets could be compared (i.e. the preferences for administration modes), they
appeared reasonably homogeneous; there will be more scope to investigate the homogeneity of
preferences in the sections that follow.
It is somewhat surprising that residual deviance was virtually the same for the fixed and random
preference versions of the model, as this is generally not the case in fixed- and random-effects meta-
analysis. The individual deviances for each observation do vary between the models, but after
summation the same total is obtained. It is not yet clear if this is an indicator of the homogeneity in
the datasets used here or an inherent property of the model.
Although the model was designed to be generalisable to a variety of datasets, there are some
limitations. Non-standard variants of criteria ratings data (such as ordinal ratings) are not currently
supported. The model also cannot directly analyse the verbal judgements made in MACBETH unless
these are first translated to a numerical scale of some kind.
The estimated preferences appear quite reasonable; certainly there are no figures that stand out as
implausible, and the criterion with the greatest weight is PML, which is fairly unarguably the most
debilitating clinical outcome. It is worth recalling that the participants in the investigator ratings
dataset were not truly three independent individuals (see III.3.2.1), but nevertheless they do still
represent real-world preference estimates. Therefore, with regard to the RRMS case study, the
ratings model has now provided some real-world evidence on relative preference weights for:
• ten clinical outcomes, three of which were included in the evidence synthesis in Chapter II
and are therefore of primary interest (relapses, disability progression, liver enzyme
elevation)
• four modes of administration, covering all the RRMS treatments that were featured in
Chapter II.
This is a good start, but there is more usable evidence on RRMS preferences beyond just these
relative ratings, as the remainder of this chapter will demonstrate.
Chapter III.4
III.4 Bayesian analysis of choice data
As discussed in III.1.2.4, Bayesian models for analysing discrete choice experiments have already
been established. The aim of this section is therefore not to describe a new methodology, but rather
to use standard methods to carry out a Bayesian analysis of a choice dataset that is relevant to the
RRMS study. Later, the results will be compared and combined with those from the methods
developed elsewhere in this chapter, and used to inform a preference-weighted benefit-risk
assessment based on the evidence synthesis in Chapter II.
III.4.1 Data structure
The data are drawn from a single discrete choice experiment involving several individual
participants. All choice tasks in this dataset are binary; examples of analyses involving more
alternatives per choice set can be found in the literature 167.
III.4.1.1 Study-level constants
Let $\{1, \dots, \Omega\}$ be a set of outcomes, and $\{1, \dots, n\}$ a set of participants (again, it will typically be
assumed that these are individuals, but a single “participant” can also be a group of people making
choices on a collective consensus basis). Let $\{1, \dots, N_{CS}\}$ index the completed choice sets in the
dataset, where $N_{CS}$ is the number of observations.
III.4.1.2 Choice-set-level data
For each observation $k \in \{1, \dots, N_{CS}\}$ the following variables are supplied:
• $\mathit{subject}_k \in \{1, \dots, n\}$, an identifier for the subject or participant who completed the choice
task
• $x_{\omega j}$ ($\omega \in \{1, \dots, \Omega\}$, $j \in \{A_k, B_k\}$), the value of criterion $\omega$ in scenario $j$ within the
choice set
• $y_k \in \{0, 1\}$, a binary variable indicating whether the participant chose scenario $A_k$ ($y_k = 1$)
or scenario $B_k$ ($y_k = 0$)
III.4.2 Dataset - PROTECT patient choice data
This dataset consists of choice data originally elicited within PROTECT’s workstream on patient and
public involvement 193, like the patient ratings in III.3.2.2. The choice tasks were performed in a
paper-based survey issued to RRMS patients at a London clinic and the study design, ethical
approval and consent processes have been described elsewhere193. The choice experiment was
designed with 64 choice tasks; these were divided between 4 versions of the survey with 16 choice
tasks each so as to ease the burden on participants. The criteria used were: relapse rate, risk of
disability progression, risk of PML, risk of allergic/hypersensitivity reactions, risk of serious allergic
reactions, and risk of depression.
The first four of these are by now familiar, but the latter two are new and represent additional
possible risks of treatment. These are listed as possible side effects in the Summary of Product
Characteristics for at least one of the investigated treatments.
• Serious allergic reactions – systemic anaphylactic reactions to treatment requiring
hospitalisation
• Depression – thoughts of hopelessness, lack of self-worth, suicidal ideation.
A listing of the dataset (in abridged format) is shown in Appendix A.2.
III.4.3 Choice model
III.4.3.1 Binomial logit
Recall from III.1.3.1 the basic principles of choice models:
• The utility $V_{Xi}$ of a scenario $X$ to an individual $i$ is assumed to consist of (i) a deterministic
component $U_X$, defined as a specific function of the criteria with parameters to be
estimated, and (ii) an individual-specific random error term $\varepsilon_i$. That is, $V_{Xi} = U_X + \varepsilon_i$ and
$U_X = f(x_1, \dots, x_m; \boldsymbol{\beta})$, where $x_1, \dots, x_m$ are the criteria values in scenario $X$ and $\boldsymbol{\beta}$ is the set
of preference parameters to be estimated. If, as here, a linear utility model is assumed, then
$U_X = \beta_1 x_1 + \dots + \beta_m x_m$, but the method is not restricted to this particular form.
• An individual $i$ selects option $A$ if $V_{Ai} > V_{Xi}$ for all alternative options $X$.
A number of models are available for choice data; here the binomial logit model is used. This is the
most widely used choice model167 and is restricted to binary choice sets. The model assumes a
logistic link function between the excess utility of an alternative and the probability of choosing that
alternative. In other words, for a binary choice set k comparing scenarios 𝐴𝑘 and 𝐵𝑘, the log odds of
choosing 𝐴𝑘 over 𝐵𝑘 is given by 𝑈𝐴𝑘 −𝑈𝐵𝑘 (following the notation established in III.1.3, or
equivalently, the probability of choosing A is 𝑒𝑈𝐴𝑘 (𝑒𝑈𝐴𝑘 + 𝑒𝑈𝐵𝑘)⁄ . A number of arguments exist for
the validity of the logit model 168.
$U_{A_k}$ and $U_{B_k}$ are the values of the (assumed linear) utility function for the criteria levels in scenarios
$A_k$ and $B_k$ respectively. The linear coefficients for each criterion are the utility coefficients $e^{g_\omega}$ (or
the random equivalent $e^{\gamma_{i\omega}}$ for participant $i$ in the random preferences model). Thus the probability
$p_{A_k}$ of choosing $A_k$ over $B_k$ in the random preferences model satisfies
$$\mathrm{logit}(p_{A_k}) = U_{A_k} - U_{B_k} = \sum_{\omega=1}^{\Omega} \left( e^{\gamma_{i\omega}} x_{\omega A_k} - e^{\gamma_{i\omega}} x_{\omega B_k} \right) = \sum_{\omega=1}^{\Omega} e^{\gamma_{i\omega}} \left( x_{\omega A_k} - x_{\omega B_k} \right)$$
where $x_{\omega A_k}$ and $x_{\omega B_k}$ are the levels of criterion $\omega$ in alternatives $A_k$ and $B_k$ respectively.
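The choice probability defined by this equation can be sketched as follows (illustrative only; the coefficient values and criteria levels are hypothetical):

```python
import math

def choice_probability(gamma, x_a, x_b):
    """P(choose A over B) under the binomial logit model:
    logit(p_A) = sum_omega e^{gamma_omega} * (x_omega_A - x_omega_B)."""
    excess_utility = sum(
        math.exp(g) * (xa - xb) for g, xa, xb in zip(gamma, x_a, x_b)
    )
    return 1.0 / (1.0 + math.exp(-excess_utility))

# Hypothetical log utility coefficients for one participant (two criteria)
gamma = [math.log(2.0), math.log(0.5)]

# Scenario A has a lower level of the first (undesirable) criterion, so it
# should be preferred: p > 0.5 would mean A is more likely to be chosen
p = choice_probability(gamma, x_a=[0.1, 0.3], x_b=[0.2, 0.3])
print(round(p, 3))
```

When the two scenarios are identical the excess utility is zero and the probability is exactly 0.5, as the model requires.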
It is also possible to include a constant intercept term in the equation above in order to capture any
difference in utility between the choice set alternatives that is not explained by the criteria in the
model. That is, the equation above could be rewritten as
$$\mathrm{logit}(p_{A_k}) = U_{A_k} - U_{B_k} = \sum_{\omega=1}^{\Omega} e^{\gamma_{i\omega}} \left( x_{\omega A_k} - x_{\omega B_k} \right) + \mathit{ASC}$$
where $\mathit{ASC}$ represents any difference in utility between $A_k$ and $B_k$ that is not explained by the criteria
$\omega \in \{1, \dots, \Omega\}$. In the DCE literature this intercept term is referred to as an alternative-specific
constant. In some DCEs in other fields (such as market research) where the scenarios correspond to
familiar real-world alternatives (such as brands of consumer goods) and are labelled as such, ASCs
can be useful in capturing the effect of unmeasurable criteria (eg brand image) that cannot be
explicitly included in the model. In the preference elicitation context the choice sets are
hypothetical and unlabelled with any real-world interpretation; the ASC thus only represents any
inherent bias the participants may have between the first (or left-hand) alternative 𝐴𝑘 and the
second (or right-hand) alternative $B_k$. A preliminary frequentist analysis of the RRMS choice dataset
found that the ASC was not statistically significant at the 5% level, meaning that there is little
evidence for any such left-right bias. For this reason no ASC is included here, but it would be
straightforward to do so.
Having determined $p_{A_k}$, the likelihood of an observed choice $y_k$ is simply $y_k \sim \mathrm{Bernoulli}(p_{A_k})$.
As usual, both a “fixed preferences” and a “random preferences” version of the model are possible,
but in Bayesian MCMC applications the fixed preferences model is preferred here, as the next section
explains.
III.4.3.2 Individual vs collapsed analysis
Specifying the model with a Bernoulli likelihood (one choice set and subject per observation) in an
MCMC context results in a model that is prohibitively slow to update, based on initial trial runs. In
the fixed preferences model, significant time savings can be achieved by aggregating/collapsing the
data so that each observation gives the proportion of subjects choosing y=1 for a particular choice
set, and using a Binomial likelihood corresponding to the sum of the individual Bernoullis. The
downside of this specification is that it cannot incorporate random preferences, as it relies on the
choice probabilities being the same for all subjects. For practical reasons, therefore, only a fixed
preference choice analysis will be performed here. It may be worth recalling that a fixed preference
analysis was found to be appropriate in the patient ratings dataset; the same may be a reasonable
assumption for the choice dataset, as enrolment was simultaneous from the same population, with
participants randomised into one or the other dataset.
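The collapsing step can be sketched in pure Python (the records are hypothetical; each is a (choice set, y) pair, and the output gives the Binomial numerator and denominator for each choice set):

```python
from collections import defaultdict

def collapse_choices(records):
    """Collapse individual Bernoulli observations (choice_set_id, y) into
    per-choice-set Binomial counts: {choice_set_id: (successes, trials)}."""
    counts = defaultdict(lambda: [0, 0])
    for choice_set, y in records:
        counts[choice_set][0] += y   # number of subjects choosing alternative A (y = 1)
        counts[choice_set][1] += 1   # total responses observed for this choice set
    return {cs: tuple(c) for cs, c in counts.items()}

# Hypothetical raw data: three subjects each answering two choice sets
records = [(1, 1), (1, 0), (1, 1), (2, 0), (2, 0), (2, 1)]
print(collapse_choices(records))  # {1: (2, 3), 2: (1, 3)}
```

The collapsed counts can then be given a single Binomial likelihood per choice set, which is what makes the MCMC updates fast.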
III.4.3.3 Priors
In the fixed preferences model it is only necessary to specify priors for either the preference strengths
or the utility coefficients; here the latter are all assigned identical Gamma(1, 0.01) priors, as set out in
III.2.1.7.
III.4.4 Results
The results of the fixed preferences model are shown in Table 24 alongside the results of a
frequentist multinomial logit analysis conducted using the “mlogit” package in R version 3.2.1.
Normalised weights have only been calculated in the Bayesian model, since it is not straightforward
to obtain their standard errors in the frequentist framework (and the original coefficients are
sufficient to compare the two models).
Table 24 - Posterior distribution of preferences in the patient choice dataset
FIXED PREFERENCES (124 participants, 1755 choices in total)

Criterion | Unit | Frequentist mean | Frequentist SE | Bayesian mean | Bayesian SE | 2.5% | Median | 97.5%

Utility coefficients (on choice scale, i.e. effect on log odds of choice):
Relapse | 1 relapse/year | -0.80 | 0.31 | -0.85 | 0.31 | -1.46 | -0.85 | -0.26
Disability progression | 100% risk over 2 years | -12.50 | 0.65 | -12.45 | 0.65 | -13.75 | -12.44 | -11.19
PML | 100% risk over 2 years | -388.3 | 30.0 | -383.7 | 30.6 | -444.3 | -383.5 | -324.5
Allergic/hypersensitivity reactions | 100% risk over 2 years | -1.22 | 0.18 | -1.24 | 0.18 | -1.59 | -1.23 | -0.89
Serious allergic reactions | 100% risk over 2 years | -36.34 | 4.10 | -39.79 | 4.07 | -47.87 | -39.77 | -31.88
Depression | 100% risk over 2 years | -5.81 | 0.77 | -5.16 | 0.76 | -6.66 | -5.16 | -3.67

Normalised preference weights (Bayesian analysis only):
Relapse | 1 relapse/year | N/A | N/A | 0.2% | 0.1% | 0.1% | 0.2% | 0.3%
Disability progression | 100% risk over 2 years | N/A | N/A | 2.8% | 0.2% | 2.5% | 2.8% | 3.2%
PML | 100% risk over 2 years | N/A | N/A | 86.5% | 1.1% | 84.3% | 86.6% | 88.6%
Allergic/hypersensitivity reactions | 100% risk over 2 years | N/A | N/A | 0.3% | 0.0% | 0.2% | 0.3% | 0.4%
Serious allergic reactions | 100% risk over 2 years | N/A | N/A | 9.0% | 0.9% | 7.3% | 9.0% | 10.9%
Depression | 100% risk over 2 years | N/A | N/A | 1.2% | 0.2% | 0.8% | 1.2% | 1.5%
Residual deviance | N/A | N/A | N/A | 94.3 | 3.5 | 89.5 | 93.6 | 102.7
The results appear to show that the Bayesian model is working as expected and fits the data well:
the Bayesian estimates of the utility coefficients are close to those in the frequentist analysis, and
the residual deviance is well below the number of individual participants in the choice experiment.
III.4.5 Discussion
Although implementing a Bayesian binomial logit choice analysis is not in itself novel, these results
show that it can be done successfully within the specific parameterisation that I am using in this
thesis. This forms the starting point for a unified model that can combine choice data with other
sources of preference information, an approach that will be pursued later in this chapter.
Although many criteria appear in both this dataset and the investigator ratings dataset, the utility
coefficients cannot be directly compared, as the absolute scale in the ratings model is arbitrary; and
neither can the weights, as the overall set of criteria is not the same. It is however straightforward
to renormalise the weight estimates within the set of shared criteria, putting them on a comparable
basis, as shown in Table 25 (which is based on the median weights).
Table 25 – Comparison of criteria weights in the investigator ratings and patient choices datasets.
Criterion | Median renormalised weight (investigator ratings) | Median renormalised weight (patient choices)
Relapse | 14.5% | 0.03%
Disability progression | 22.5% | 3.05%
PML | 60.7% | 93.91%
Allergic/hypersensitivity reactions | 2.3% | 3.01%
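The renormalisation itself is straightforward arithmetic (a sketch with hypothetical weights; the figures in Table 25 come from the actual analyses):

```python
def renormalise(weights, shared):
    """Rescale the weights of a shared subset of criteria so they sum to 1."""
    total = sum(weights[c] for c in shared)
    return {c: weights[c] / total for c in shared}

# Hypothetical full weight set; renormalise over the three shared criteria
weights = {"relapse": 0.10, "disability": 0.15, "pml": 0.35, "other": 0.40}
shared = ["relapse", "disability", "pml"]
print(renormalise(weights, shared))
```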
A few conclusions are immediately obvious:
• PML has the highest weight in both models/datasets, as expected;
• The weights for PML and allergic/hypersensitivity reactions are of a similar order in both
datasets;
• The weights for disability progression and relapse are considerably lower in the patient
choice dataset – and in the case of relapse, the difference is extreme.
To recap on the chapter so far: Bayesian models for eliciting preferences based on individual ratings
data and individual choice data have been demonstrated and fitted using a shared parameterisation.
Three sets of individual elicitation data, originating from different methods, have been analysed.
The evidence so far on the homogeneity/heterogeneity of preferences has been mixed.
Later in this chapter, a more formal approach to comparing and combining these preference data
sources will be attempted. The next section, however, looks at another potential data source -
published elicitation studies - and investigates whether a meta-analysis of preferences is possible.
Chapter III.5
III.5 Bayesian meta-analysis of preferences
Just as decision makers frequently rely on evidence synthesis of existing studies to provide clinical
data instead of carrying out their own studies, they may also wish to rely on externally elicited
preferences. There could be many reasons for this - for example, elicitation exercises can be difficult
to design, take time and money to carry out, and it may be difficult to get access to enough of the right
kind of participants. Even if one has original elicited data available for analysis, it may sometimes be
desirable to borrow strength from external studies. An exploration of the homogeneity/heterogeneity
amongst preference elicitation studies may also be of interest in its own right. All this suggests a need
for a meta-analytical methodology that can aggregate the results of multiple elicitation studies;
however, no such methods have been described.
At least one review of patient preference studies in a particular disease area has been published, but
it was a qualitative analysis with no attempt made at numerical aggregation of the data 196. The
authors stated that “given the high degree of variability across choice-based preference-elicitation
studies with regard to design, sample composition, statistical analyses, and research questions, and
the relatively small literature, a meta-analysis was deemed untenable”. Some of these points are
beyond the meta-analyst’s control, but with appropriate methodology (to be explored in this
section) it may be possible to overcome some of the bigger issues relating to the heterogeneity of
studies’ design and statistical methods. The number of preference studies in the literature is always
increasing, and it does not seem infeasible that in some disease areas a large enough set of studies
from a relatively homogeneous population, suitable for meta-analysis, may be available soon if not
already. As interest in quantitative methods using preferences increases, and analysts and decision
makers seek to make sense of a growing volume of data, I anticipate an imminent upsurge in
demand for meta-analyses of preferences.
Since medical decisions will often involve more than two criteria, and in general not all of the source
studies will feature every relevant criterion, any meta-analytical method would need to be flexible
enough to handle problems and datasets with arbitrary dimensions, as well as a range of study
designs and analyses. This section attempts to develop a suitable model.
III.5.1 Data structure
III.5.1.1 Network-level constants
Let $\{1, \dots, \Omega\}$ be a set of outcomes, and $\{1, \dots, N_{PS}\}$ a set of preference elicitation studies.
III.5.1.2 Study-level constants
For each preference elicitation study $i \in \{1, \dots, N_{PS}\}$ the following constants describe the
dimensions:
$\mathit{NPC}_i \in \{1, \dots, \Omega\}$ — the number of criteria within study $i$. The criteria within study $i$
are ordered such that $\omega_{ij} \in \{1, \dots, \Omega\}$ refers to the $j$th criterion.
$\mathit{NP}_{ij} \in \{1, 2, 3, \dots\}$ — the number of individual points, levels or categories of criterion $j \in \{1, \dots, \mathit{NPC}_i\}$
for which a utility coefficient is provided in study $i$.
$\mathit{NL}_{ij} \in \{1, 2, 3, \dots\}$ — the number of levels of criterion $j$ for which a utility coefficient is
to be estimated by the model. Continuous criteria are assumed to have linear partial value
functions, so that $\mathit{NL}_{ij} = 1$; but for categorical criteria a separate coefficient is required for each
level, meaning that $\mathit{NL}_{ij} = \mathit{NP}_{ij}$.
III.5.1.3 Outcome-level preference data
The data consists of estimated coefficients from the utility models in the source studies and their
standard errors, denoted by 𝑐𝑖𝑗𝑘 and 𝜋𝑖𝑗𝑘 respectively for study i, outcome j, point k.
The coefficients often have to be transformed from their raw state in order to ensure consistency of
scale and category coding between studies. This is discussed in further detail in III.5.3.1 and III.5.3.2.
𝑙𝑖𝑗𝑘 ∈ 1,2, … ,𝑁𝐿𝑖𝑗 refers to the criterion level for which a utility coefficient is to be estimated at
the within-study coordinates criterion j, point k within study i. In the overall parameterisation of
preferences each level is represented as a separate criterion, denoted by 𝜔𝑖𝑗𝑘 ∈ 1,2, … , 𝑁𝑃𝐶𝑖𝑗.
Another variable 𝑥𝑖𝑗𝑘 is also used, containing (for continuous criteria) the value of criterion j to
which the utility coefficient 𝜋𝑖𝑗𝑘 relates. Categorical criteria are coded using dummy indicator
variables so that 𝑥𝑖𝑗𝑘 = either 0 or 1.
III.5.2 Dataset: RRMS
The method was applied to the relapsing remitting multiple sclerosis (RRMS) case study. The aim
was to perform a meta-analysis on the results of published patient preference elicitation studies
relating to the RRMS treatment outcomes and administration modes that have already been
encountered in the case study - namely, relapses, disability progression, liver enzyme elevation and
modes of administration.
Additional literature searches were carried out to identify utility/preference elicitation studies. Only
studies reporting patient preferences (as opposed to the preferences of some other group, such as
clinicians) were included. Each study was screened to determine which criteria had been included,
how these were defined and whether sufficient information was reported to enable inclusion in the
analysis. For further details see Appendix A.
The criteria that were included and the assumed form of their partial value functions are shown in
Table 26.
Table 26 – RRMS case study outcomes for the preference synthesis module. PVF = partial value function
Outcome | PVF type | Units / levels
Relapse frequency | Linear | Relapses per year
Risk of disability progression | Linear | Risk of progression
Route and frequency of administration | Categorical | Daily oral; 1-3x weekly intramuscular or subcutaneous; monthly intravenous
The precise categories used for administration modes vary between studies, reflecting the diversity
of treatment regimens available. In order to ensure a reasonable amount of data per category, and
for compatibility with the treatments in the RRMS case study and the ratings datasets, it was
decided to pool all self-administered injection methods (i.e. subcutaneous and intramuscular) at
frequencies from once weekly to once every 2 days (or thrice weekly).
The studies that contributed data to each case study are shown in Table 27.
Details of the search and screening process are set out in Appendix A.
Table 27 – Source studies for the RRMS dataset for the preference synthesis module.
RRMS study | Outcomes used | Number of participants
ARROYO 197 | Relapse frequency; disability progression risk; route and frequency of administration | 221
GARCIA-DOMINGUEZ 198 | Route and frequency of administration | 125
MANSFIELD 199 | Relapse frequency; route and frequency of administration | 301
POULOS 200 | Relapse frequency; disability progression risk | 189
UTZ 201 | Route and frequency of administration | 156
WILSON 2014 202 | Relapse frequency; disability progression risk; route and frequency of administration | 291
WILSON 2015 203 | Relapse frequency; route and frequency of administration | 50
III.5.3 Data extraction
Care must be taken in extracting the estimated coefficients from the source studies as outcomes
may be expressed on different scales and/or using different coding conventions. These factors can
affect both the interpretation of the coefficients and their statistical properties, as detailed below.
III.5.3.1 Categorical or continuous
Assuming that utility is linear in any continuous criteria (perhaps after a transformation) is a fairly
commonplace practice. However, many studies (particularly discrete choice experiments) instead
allow for non-linearity by estimating separate utility coefficients for a number of discrete levels of a
criterion (i.e. treating it as a categorical variable). In principle the same approach could be used in a
meta-analysis, but unfortunately studies will tend (more often than not) to use different discrete
levels, rendering the results incompatible. For meta-analytical purposes, therefore, it will often be
particularly convenient to assume linearity (or, perhaps, some other continuous
monotonic function with a simple parameterisation) after having examined the data to check that
such a relationship appears appropriate. Where a study reports a linear coefficient it can be used as
is; where a study reports discrete levels, these can be fitted within the linear framework by
performing a linear regression on the discrete points within the study. This regression can be
incorporated within the main model, as will be shown in III.5.5, so technically this is a form of
meta-regression 204,205.
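This within-study linearisation can be sketched as an inverse-variance weighted regression through the origin (a simplification that assumes utility is zero at the reference level and, for the slope's standard error, that the reported coefficients are independent; the correlation induced by dummy coding is the subject of III.5.3.2. All input values are hypothetical):

```python
import math

def linear_slope(levels, coefs, ses):
    """Inverse-variance weighted least-squares slope through the origin,
    fitted to utility coefficients reported at discrete criterion levels."""
    w = [1.0 / se**2 for se in ses]
    num = sum(wi * x * c for wi, x, c in zip(w, levels, coefs))
    den = sum(wi * x * x for wi, x in zip(w, levels))
    slope = num / den
    slope_se = math.sqrt(1.0 / den)  # valid only if the coefficients are independent
    return slope, slope_se

# Hypothetical study reporting utility coefficients at three discrete risk levels
levels = [0.01, 0.05, 0.10]       # e.g. risk of an adverse outcome
coefs = [-0.11, -0.52, -1.05]     # reported utility coefficients at those levels
ses = [0.05, 0.06, 0.08]          # their standard errors
slope, se = linear_slope(levels, coefs, ses)
print(round(slope, 2), round(se, 2))
```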
III.5.3.2 Coding schemes
When fitting regression models with categorical predictors, there is more than one coding scheme
that can be used to represent the categorical data.
III.5.3.2.1 Dummy coding
Dummy coding is probably the most widely used coding method. For a categorical variable $X$ with $n$
categories, dummy coding involves choosing a category to act as the “baseline” or “reference”.
Order the categories such that category 1 is the reference, and create $n-1$ indicator variables
$X_2, \dots, X_n$ such that $X_i = 1$ for an individual observation in category $i$ and 0 otherwise, as shown in
the example in Table 28 for $n = 4$. There is no indicator variable for the reference category 1, as the
dependent variable for observations in this category is reflected in the intercept term of the
regression equation. Including an extra variable for category 1 would add an unnecessary degree
of freedom to the model and hinder estimation.
Table 28 – Example of a 4-category variable and its dummy-coded indicator variables
X (original category) | X2 | X3 | X4
1 | 0 | 0 | 0
2 | 1 | 0 | 0
3 | 0 | 1 | 0
4 | 0 | 0 | 1
Because the intercept reflects the dependent variable in the reference category, the regression
coefficient $\beta_i$ corresponding to $X_i$ ($i > 1$) is interpreted as the difference in the dependent variable
between category $i$ and the reference category. As such, the estimated coefficients for the indicator
variables $X_i$ share a common baseline, and are therefore fundamentally correlated; this will now
be formalised in Theorem 4. The theorem and its proof are largely applicable to the general
regression context, but will be set out here in terms of DCEs so that the interpretation of the
assumptions is clear. There are four key assumptions that make the correlations easily tractable:
• independence of observations;
• equal variance of the dependent variable (i.e. perceived utility) in all predictor categories;
• level balance in the choice experiment design (i.e. the categories of the various predictor
variables occur with equal frequency in the choice sets); and
• orthogonality in the choice experiment design (i.e. there is no confounding between
different categories or variables in the choice design – in other words the effect of each
variable can be estimated independently of the level of other variables).
The first two are typical assumptions underlying almost all DCEs; level balance and orthogonality, on
the other hand, are desirable (but not universal) properties of well-designed DCEs 206.
Theorem 4: In an orthogonal level-balanced DCE, assuming independent observations and equal
variance of utility in all categories, under dummy coding the correlation between coefficients for
distinct levels of the same variable is 0.5 and the correlation between coefficients for different
variables is 0.
Proof:
Assume initially that all predictor variables are categorical.
Suppose that X is the predictor variable (with m categories) and X2,...,Xm are its dummy-coded
indicator variables for which we are trying to estimate the regression coefficients β2,..., β m. We can
assume without loss of generality that there is one other predictor Z with n categories (since if X is
the only predictor, then Z can still be said to exist with (trivially) n=1; and if there is more than one
other predictor with numbers of categories given by n1, n2, n3,,... then they can be combined into a
single predictor Z with the number of categories given by n1*n2*n3*... i.e. all possible combinations
of the original categories. The coefficients of Z will be linear combinations of the original
coefficients, and vice versa). Let Z2,...,Zn be the dummy-coded indicator variables for Z with
coefficients γ2,..., γ n.
Let 𝑌𝑖𝑗 denote the average value of the continuous dependent variable Y (in this case, utility
expressed as the log odds of choice) for observations in X category i ∈ 1,…,m and Z category j ∈
1, … , 𝑛. In an orthogonal design, all combinations of categories occur equally often206; therefore
there are observations in every combination of categories and 𝑌𝑖𝑗 is well-defined. It follows from
the assumptions of equal variance of utility across categories, and equal frequency of categories
within the data (level balance), that 𝑣𝑎𝑟(𝑌𝑖𝑗 ) is a constant 𝜎2 for all i,j.
Within a Z category j, the estimated difference in Y (utility) between X category i and the reference
category is given by Y_ij − Y_1j. The estimated coefficient β_i (i ∈ 2,…,m) will correspond to the
average of these estimated differences across all Z categories j in proportion to their frequency in
the choice sets. Under the assumption of level balance, all categories of Z occur with equal
frequency and thus the estimate is a simple average,

β_i = (1/n) ∑_{j=1}^n (Y_ij − Y_1j).

Thus var(β_i) = var((1/n) ∑_{j=1}^n (Y_ij − Y_1j)) = (1/n²) ∑_{j=1}^n 2σ² (by independence of observations)

= 2σ²/n.
Similarly γ_j = (1/m) ∑_{i=1}^m (Y_ij − Y_i1) and var(γ_j) = 2σ²/m.

For distinct i_1, i_2 ∈ 1,…,m,

cov(β_i1, β_i2) = (1/n²) ∑_{j=1}^n cov(Y_1j, Y_1j) (by independence of observations) = σ²/n.

Therefore the correlation coefficient between β_i1 and β_i2 is (σ²/n)/(2σ²/n) = 0.5 as required.

cov(β_i, γ_j) = (1/(mn)) (cov(Y_ij, Y_ij) − cov(Y_1j, Y_1j) − cov(Y_i1, Y_i1) + cov(Y_11, Y_11))

= (1/(mn)) (σ² − σ² − σ² + σ²) = 0 as required.
If Z was constructed as a combination of predictors, then the original coefficients are linear
combinations of 𝛾 and therefore also uncorrelated with 𝛽.
Finally note that if either X or Z is analysed as a continuous linear variable, then each category
contributes an estimate of β_i/width_i (or γ_j/width_j), where width_i (or width_j) refers to the
difference in magnitude between level i (or j) and the reference. In other words, continuous linear
coefficients within DCEs are effectively linear combinations of categorical coefficients, and will
therefore exhibit the same lack of correlation between variables.
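The result can be checked empirically. The following simulation is an illustrative sketch (not part of the thesis analysis): it assumes unit cell variance and one cell mean per category combination, computes the dummy-coded estimates directly from the cell means exactly as in the proof, and recovers the stated correlations.

```python
import numpy as np

# Monte Carlo check of Theorem 4: in a level-balanced, orthogonal design with
# independent observations and equal cell variance, dummy-coded coefficient
# estimates for levels of the same variable have correlation 0.5, and
# estimates for different variables are uncorrelated.
rng = np.random.default_rng(0)
m, n, sims = 4, 3, 40_000            # m levels of X, n levels of Z

# One cell mean Y_ij per (X level, Z level) combination, unit variance.
Y = rng.normal(size=(sims, m, n))

# Estimates exactly as in the proof:
# beta_i = (1/n) sum_j (Y_ij - Y_1j),  gamma_j = (1/m) sum_i (Y_ij - Y_i1)
beta = (Y - Y[:, [0], :]).mean(axis=2)    # shape (sims, m); column 0 is zero
gamma = (Y - Y[:, :, [0]]).mean(axis=1)   # shape (sims, n)

r_same = np.corrcoef(beta[:, 1], beta[:, 2])[0, 1]   # same variable: ~0.5
r_diff = np.corrcoef(beta[:, 1], gamma[:, 1])[0, 1]  # different variables: ~0
```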
III.5.3.2.2 Effects coding
Effects coding is similar to dummy coding but instead of setting all the indicators to 0 in the
reference category, they are instead set to -1, as shown in the example in Table 29 for 𝑛 = 4. This
means that the effective regression coefficient for the reference category is minus the sum of the
coefficients for the other categories, i.e. β_1 = −∑_{i=2}^n β_i. Consequently ∑_{i=1}^n β_i = 0, so the
regression intercept reflects the dependent variable not in the reference category but in the "grand
mean" across all categories. This has an appealing symmetry and sometimes makes for a more
convenient interpretation, particularly when interaction terms are included, making effects coding
popular in some fields such as discrete choice modelling 166.
Table 29 – Example of a 4-category variable and its effects-coded indicator variables

| X (original category) | X2 | X3 | X4 |
| --- | --- | --- | --- |
| 1 | -1 | -1 | -1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 |
Although effects coding gives 𝑛 coefficients for 𝑛 categories, knowing any n-1 coefficients fully
determines the final one, since they sum to zero; there are still therefore only n-1 degrees of
freedom. This of course means that there must be correlations between the coefficients, as the
following theorem shows. As before, independence of observations, equal variance of the
dependent variable (i.e. perceived utility) in all predictor categories, level balance and orthogonality
are assumed.
Theorem 5: In an orthogonal level-balanced DCE, assuming independent observations and equal
variance of utility in all categories, under effects coding the correlation between coefficients for
distinct levels of the same variable is -1/(m-1) (where m is the number of categories) and the
correlation between coefficients for different variables is 0.
Proof:
The assumptions and notation are as described in Theorem 4 and the proof proceeds similarly.
Again, assume initially that there are two predictors X and Z, both categorical. This time the
regression coefficients are (for X) β_1,…,β_m and (for Z) γ_1,…,γ_n.
Let 𝑌𝑖𝑗 denote the average value of the continuous dependent variable Y (in this case, utility
expressed as the log odds of choice) for observations in X category i ∈ 1,…,m and Z category j ∈
1,…,n. In an orthogonal design, all combinations of categories occur equally often206; therefore
there are observations in every combination of categories and 𝑌𝑖𝑗 is well-defined. It follows from
the assumptions of equal variance of utility across categories, and equal frequency of categories
within the data (level balance), that 𝑣𝑎𝑟(𝑌𝑖𝑗 ) is a constant 𝜎2 for all i,j.
Within a Z category j, the estimated difference in Y (utility) between X category i and the "grand
mean" of X categories is given by Y_ij − (1/m) ∑_{k=1}^m Y_kj. The estimated coefficient β_i (i ∈ 1,…,m)
will correspond to the average of these estimated differences across all Z categories j in proportion
to their frequency in the choice sets. Under the assumption of level balance, all categories of Z
occur with equal frequency and thus the estimate is

β_i = (1/n) ∑_{j=1}^n (Y_ij − (1/m) ∑_{k=1}^m Y_kj).
Thus var(β_i) = var((1/n) ∑_{j=1}^n (Y_ij − (1/m) ∑_{k=1}^m Y_kj))

= var((1/n) ∑_{j=1}^n (((m−1)/m) Y_ij − (1/m) ∑_{k≠i} Y_kj))

= (1/n²) ∑_{j=1}^n [((m−1)/m)² σ² + ((m−1)/m²) σ²] (by independence of observations)

= ((m−1)/(mn)) σ².

Similarly γ_j = (1/m) ∑_{i=1}^m (Y_ij − (1/n) ∑_{k=1}^n Y_ik) and var(γ_j) = ((n−1)/(mn)) σ².
For distinct i_1, i_2 ∈ 1,…,m,

cov(β_i1, β_i2) = cov((1/n) ∑_{j=1}^n Y_i1j − (1/(mn)) ∑_{j=1}^n ∑_{k=1}^m Y_kj , (1/n) ∑_{j=1}^n Y_i2j − (1/(mn)) ∑_{j=1}^n ∑_{k=1}^m Y_kj)

= 0 − σ²/(mn) − σ²/(mn) + σ²/(mn) = −σ²/(mn).

Therefore the correlation coefficient between β_i1 and β_i2 is (−σ²/(mn)) / (((m−1)/(mn)) σ²) = −1/(m−1) as required.
cov(β_i, γ_j) = cov((1/n) ∑_{p=1}^n Y_ip − (1/(mn)) ∑_{p=1}^n ∑_{k=1}^m Y_kp , (1/m) ∑_{k=1}^m Y_kj − (1/(mn)) ∑_{k=1}^m ∑_{p=1}^n Y_kp)

= σ²/(mn) − σ²/(mn) − σ²/(mn) + σ²/(mn) = 0 as required.
The cases involving more than two predictors or linear coefficients follow using the same logic as for
Theorem 4.
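Theorem 5 can be checked in the same way as Theorem 4. The following simulation is an illustrative sketch (not part of the thesis analysis), computing the effects-coded estimates from unit-variance cell means as in the proof:

```python
import numpy as np

# Monte Carlo check of Theorem 5: under effects coding in a level-balanced,
# orthogonal design, the correlation between coefficient estimates for levels
# of the same variable is -1/(m-1), and between variables it is 0.
rng = np.random.default_rng(1)
m, n, sims = 4, 3, 40_000
Y = rng.normal(size=(sims, m, n))     # cell means, unit variance

# Effects-coded estimates as in the proof:
# beta_i = (1/n) sum_j (Y_ij - (1/m) sum_k Y_kj)
beta = (Y - Y.mean(axis=1, keepdims=True)).mean(axis=2)   # shape (sims, m)
gamma = (Y - Y.mean(axis=2, keepdims=True)).mean(axis=1)  # shape (sims, n)

r_same = np.corrcoef(beta[:, 0], beta[:, 1])[0, 1]   # ~ -1/(m-1) = -1/3
r_diff = np.corrcoef(beta[:, 0], gamma[:, 0])[0, 1]  # ~ 0
```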
It is worth stressing again that Theorems 4 and 5 do not deal with correlations between coefficients
that may arise due to:
• Correlations between the underlying preference parameters, or
• Lack of orthogonality or balance in the choice experiment’s design
Rather, they are addressing fundamental and unavoidable structural correlations between the
variables reported in the data - correlations that will be present even given fixed utility parameters
and perfect experimental design.
Another coding scheme sometimes used for categorical variables is orthogonal coding, but this is
unlikely to be encountered much in this context, as it is used to generate a set of user-defined
contrasts, useful for testing specific a priori hypotheses. This is unlike dummy or effects coding,
which provide a comprehensive set of contrasts relative to the same baseline and hence are more
suited to preference elicitation studies, where a full characterisation of preferences is the aim.
Orthogonal coding by definition generates uncorrelated coefficients, so it can be handled easily
enough if necessary.
It is straightforward to identify whether a study uses dummy coding or effects coding (if the text
does not specify) by examining the results for any given categorical variable; the former should have
a reference category clearly indicated and n-1 coefficients for n categories; the latter will have n
coefficients that sum to zero.
In order to design a model for preference meta-analysis the correlations described in Theorems 4
and 5 must be allowed for in the likelihood, and this is made much simpler if the coefficients in the
data all use the same coding scheme. Fortunately, it is fairly straightforward to convert between
coding schemes (or change the reference category), as detailed in the next section.
III.5.4 Data rebasing
Since the outcomes, units and coding schemes in the data may vary, it will usually be necessary to
“rebase” the data to some extent – that is, to transform the reported coefficients and their standard
errors onto a consistent basis for analysis.
To transform standard errors, knowledge of correlations between the reported coefficients is
required. The assumption throughout is that the source DCEs have perfectly level balanced and
orthogonal designs, meaning that the correlations are as described in Theorems 4 and 5. Many
popular DCE designs such as fractional factorial arrays have good balance and orthogonality206 but
this may not always be the case. If a study is not balanced and orthogonal then details of the study
design could in principle allow better estimates of the correlations to be calculated 207.
The headings below set out the various kinds of transformations that were found to be necessary in
the RRMS dataset. This is mainly a general walkthrough of the issues involved – for more specifics of
the data extraction and rebasing process used in the case study, see Appendix A.
III.5.4.1 Variables with discrete utility coefficients
This section covers both categorical variables and continuous variables for which coefficients have
been estimated at discrete levels (i.e. no overall linear trend coefficient is provided).
III.5.4.1.1 Changing the coding scheme
Converting reported coefficients from a study to a different coding scheme is essentially just a
matter of constructing simple linear combinations of the coefficients, but care must be taken with
the standard errors due to the correlations between estimates.
Suppose the source study reports a set of coefficients β_i (i ∈ 1,…,n) for categories i of some n-
category variable X. (If the study uses dummy coding, then β_i is a constant 0 for the reference
category.) The aim is to convert these coefficients to dummy coding with a reference category of
our choice – assume without loss of generality this is category number 1.

For each category i ∈ 1,…,n it is necessary to generate a coefficient β*_i that corresponds to the
change in utility when moving from category 1 to category i (by the definition of dummy coding).

In other words β*_i = β_i − β_1.
To derive the standard error of β*_i one can use the following formula, which follows from the basic
properties of variance and covariance:

var(β_i ± β_j) = var(β_i) + var(β_j) ± 2 cor(β_i, β_j) √(var(β_i) var(β_j))

where cor(β_i, β_j) is the correlation coefficient between β_i and β_j, recalling (as per Theorems 4 and 5)
that this takes the value 0.5 for dummy-coded coefficients and −1/(n−1) for effects-coded coefficients.
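As an illustration, the conversion and the error propagation above can be sketched as follows (the function name and interface are illustrative, not from the thesis):

```python
import numpy as np

def effects_to_dummy(beta, se, ref=0):
    """Convert effects-coded coefficients (one per category) to dummy coding
    with the given reference category, propagating standard errors via
    var(b_i - b_ref) = var(b_i) + var(b_ref) - 2*r*sd_i*sd_ref,
    where r = -1/(n-1) for effects-coded coefficients (Theorem 5)."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    n = len(beta)
    r = -1.0 / (n - 1)
    beta_star = beta - beta[ref]
    var_star = se**2 + se[ref]**2 - 2 * r * se * se[ref]
    var_star[ref] = 0.0          # the reference coefficient is fixed at zero
    return beta_star, np.sqrt(var_star)
```

For example, with three effects-coded coefficients (−0.6, 0.1, 0.5), each with standard error 0.1, the dummy-coded coefficients relative to category 1 are (0, 0.7, 1.1), each non-reference coefficient having variance 0.01 + 0.01 − 2(−0.5)(0.01) = 0.03.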
III.5.4.1.2 Recalibrating intercept to zero (continuous variables only)
Once all of the coefficients have been converted (if necessary) to dummy coding, it is convenient to
adjust the x_ijk values for any continuous variables originally reported in discrete levels so that no
intercept term is required in the subsequent meta-regression. This is achieved by subtracting the
value of x_ij1 (i.e. the reference level) from x_ijk for all points k.

In formal terms, one defines x*_ijk = x_ijk − x_ij1.
This amounts to a uniform shift in the x axis which does not affect the linear trend coefficient (since
dU/dx = dU/dx*) but eliminates the need to fit an intercept, since for the reference level both the
coefficient and the adjusted value x*_ij1 are zero.

For some criteria (such as time-to-event criteria) a value of zero may seem to have a dubious
interpretation, but note that the correct interpretation of x*_ijk = 0 is an increase of zero in the value
of x_ijk relative to a plausible baseline, and not x_ijk = 0 per se.
III.5.4.1.3 Continuous transformations (continuous variables only)
For a given criterion there are often several different outcome measures or units used by different
studies. In order to carry out a meta-analysis, the coefficients all need to be expressed using the
same measure. Converting between outcome definitions is often easy for discrete coefficients. It is
the levels to which a coefficient relates, rather than the coefficient itself, that need transforming.
Many simple transformations (such as a change of units) are entirely straightforward. The criterion
levels to which a coefficient relates (including the reference, if applicable) simply need to be
converted to the desired measure. For example, given a coefficient expressing the utility difference
between 212 degrees Fahrenheit and a reference level of 32 degrees Fahrenheit, one can easily see
that exactly the same coefficient relates to the difference between 100 and 0 degrees Celsius.
In some cases, additional assumptions may be required. The disability progression criterion in the
RRMS case study is sometimes expressed as a risk of disability progression over a particular time
horizon, and sometimes as the expected time until a disability progression event. One strategy for
converting between the two is to assume a constant disability hazard θ; the risk of progressing
within a time period t is then given 208 by 1 − e^(−θt). Indeed, this assumption and transformation
were used in preparing the RRMS dataset.
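A minimal sketch of this conversion (assuming the event time is exponentially distributed, so that the expected time equals 1/θ; the function is illustrative, not the thesis data-preparation code):

```python
import math

def progression_risk(expected_time, horizon):
    """Risk of disability progression within `horizon`, given the expected
    time to progression, under a constant hazard theta = 1/expected_time:
    risk = 1 - exp(-theta * horizon)."""
    theta = 1.0 / expected_time
    return 1.0 - math.exp(-theta * horizon)
```

For example, an expected time to progression of 5 years implies a 5-year progression risk of 1 − e⁻¹, i.e. about 63%.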
III.5.4.2 Variables with linear coefficients
This section covers variables for which linear utility coefficients are provided in the source data.
III.5.4.2.1 Linear transformations
It is straightforward to handle a linear transformation (such as a change of units) in a predictor
variable x that was analysed in linear fashion in the source study. In this case the reported linear
coefficient simply needs to be multiplied by the constant ratio of the interval sizes (i.e. the interval
width to which the transformed coefficient should relate divided by the interval width over
which the original coefficient was defined). This should reflect both the number and size of the units
in each interval. The standard error is multiplied in the same way (multiplicative scaling by a
constant is a basic property of any standard deviation).
A particularly simple example is a change of units: a utility coefficient for weight gain, say, expressed
as a linear coefficient per 1g increase in weight needs multiplying by 1000 (along with its standard
error) to obtain a coefficient expressed per kg.
Similarly, where coefficients are reported that do not correspond to unit intervals, this must be
taken into account. Many DCEs report linear coefficients corresponding to either end of an
interval centred at zero utility, e.g. +1.36 at x = 5kg and −1.36 at x = 1kg. This corresponds to a utility
difference of 2.72 over an interval of 4kg, i.e. a coefficient of 2.72/4 = 0.68 per kg.
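The endpoint-based rescaling in this example can be sketched as (an illustrative helper, not from the thesis):

```python
def per_unit_coefficient(coef_low, x_low, coef_high, x_high):
    """Convert a pair of utility coefficients reported at the two ends of an
    interval into a single linear coefficient per unit of x."""
    return (coef_high - coef_low) / (x_high - x_low)
```

Applied to the example above, per_unit_coefficient(-1.36, 1, 1.36, 5) recovers the coefficient of 0.68 per kg.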
In the model that follows the interval widths can be passed to the model as data, so it is not
important to standardise the widths, but the units in which they are expressed must of course be
consistent.
Where criteria are measured using count/rate outcomes the time horizon is part of the units. If
coefficients use different time horizons, it will often be appropriate to align them simply by linearly
scaling the event rate and coefficient in proportion with the time horizon. For example, a utility of
0.5 associated with an additional 1 relapse per month is equivalent to a utility of 0.5/12 associated
with an additional 1 relapse per year, assuming a constant event rate throughout the period
(although one should always consider whether any such extrapolation of time horizons is
appropriate).
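This time-horizon alignment is itself a linear rescaling, which can be sketched as follows (illustrative helper; it assumes a constant event rate, as noted above):

```python
def rescale_rate_coefficient(coef, old_horizon, new_horizon):
    """Re-express a utility coefficient per unit event rate over old_horizon
    as a coefficient per unit rate over new_horizon (same time units),
    assuming a constant event rate throughout the period."""
    return coef * old_horizon / new_horizon
```

For the relapse example, rescale_rate_coefficient(0.5, 1, 12) converts a coefficient per additional relapse per month to the equivalent 0.5/12 per additional relapse per year.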
III.5.4.2.2 Non-linear transformations
Non-linear transformations are problematic if all that is reported is a linear coefficient since (over an
arbitrary interval) linearity on both the original and transformed scale is impossible. Often, however,
linear coefficients are estimated using only two discrete points at either end of an interval; if these
points are known then it is possible to change the linear scale by transforming the individual discrete
values as per III.5.4.1.3 (if one is satisfied that interpolating on the new linear scale is appropriate).
If linear coefficients were estimated using more than two discrete levels, they are incompatible with
linearity on any other scale. In the RRMS dataset, for example, it is assumed that utility is linear with
regard to risk of disability progression, meaning that studies which assumed linearity with regard to
time until disability progression had to be excluded.
III.5.4.3 Combining categorical variables
In some instances it may be necessary to combine two or more categorical predictor variables from
a utility elicitation study to obtain the required effect. In the RRMS dataset, for example, most
studies use commonly encountered combinations of administration route and frequency to elicit
preferences with regard to administration. Two studies, however, elicited preferences for route and
frequency as two separate categorical variables 197,201. Assuming utility independence between
these two sub-criteria, and statistical independence between their estimated utility coefficients (i.e.
orthogonality of the source DCE), one can obtain the coefficients for the combined criterion by
adding (or subtracting) the coefficients for the appropriate levels of each sub-criterion, and (due to
independence) adding their variances (after changing the coding scheme of each sub-criterion if
necessary, as described above).
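Under these independence assumptions the combination reduces to a simple sum, sketched here (names are illustrative, not from the thesis):

```python
import math

def combine_subcriteria(coef_route, se_route, coef_freq, se_freq):
    """Combine dummy-coded coefficients for two independent sub-criteria
    (e.g. administration route and frequency) into a coefficient for the
    combined criterion: the means add and, by statistical independence of
    the estimates, so do the variances."""
    return coef_route + coef_freq, math.sqrt(se_route**2 + se_freq**2)
```

For example, combining coefficients 0.3 (SE 0.1) and 0.2 (SE 0.1) gives 0.5 with SE √0.02.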
III.5.5 Statistical model
III.5.5.1 The concept: a generalised network meta-analysis of preferences
The basic concept behind the model is to borrow the structure from network meta-analysis and
apply it in the context of outcome preferences. Essentially this involves using a "network" of
outcomes and estimating the relative preferences between them from previous preference elicitation
studies, analogous to the way in which relative treatment effects from individual clinical studies are
combined in a network meta-analysis.
In the linear utility framework already established in this chapter, the utility function is given by

U = ∑_ω e^(g_ω) x_ω

where the coefficients are estimated using an elicitation method such as those discussed earlier in
the chapter. Under this linearity assumption, the utility ratio between criteria ω_1 and ω_2, i.e. the
relative level of utility for a unit increase in ω_1 compared to a unit increase in ω_2, takes a constant
value e^(g_ω1)/e^(g_ω2) regardless of the starting level of either criterion. Note also that this quantity is
independent of the (arbitrary) absolute scale on which the utility function is expressed. This in turn
suggests that the utility ratio may be homogeneous regardless of the particular elicitation method or
exercise that was used in each study. Again, this is analogous to the typical situation in network
meta-analysis where it is assumed that the relative treatment effects are homogeneous between
studies even though the absolute outcomes may show considerable variation.
Network diagrams can be drawn analogous to those in traditional network meta-analysis, as shown
in Figure 57 for the RRMS case study. The thickness of the line connecting any two criteria is
proportional to the number of studies providing data for the relevant preference ratio.
Figure 57 - Network diagram of preference elicitation studies concerning relapsing remitting multiple sclerosis treatment outcomes.
During the process of screening and data extraction it became apparent that most of the eligible
studies were discrete choice experiments using the logit model (for more details see Appendix A).
The utility coefficients from these studies are all on the same absolute scale where the coefficients
represent log odds ratios of choice, and therefore their absolute values can be directly combined
without the need to focus on pairwise relative utility ratios. The absolute utility coefficients also
appeared to be just as homogeneous as the utility ratios. It was therefore decided that in this case
the meta-analysis could be carried out on this absolute utility scale, although this should be kept
under review in any future applications. To express preferences from other types of elicitation study
on the same scale for analysis, they are multiplied within the model by an overall scaling constant
which is assigned a vague prior.
The analysis is then essentially a simple matter of performing simultaneous meta-analysis on each
criterion’s (scaled) utility coefficient. However, the task is made more complicated due to (i) the
need to incorporate a meta-regression on the criteria values used in each study and (ii) the fact that
the source studies can estimate and report their coefficients in different ways with different
statistical properties, as already discussed in III.5.3 and III.5.4.
III.5.5.2 Model specification
The observed coefficients (after any necessary rebasing) are assumed to follow conditional Normal
distributions, at least in the marginal sense.
In formal terms, in the random preferences version of the model, for study i ∈ 1,…,N_PS,
criterion j ∈ 1,…,N_PO_i and discrete point k ∈ 1,…,N_P_ij, the observed utility coefficient c_ijk
conditional on the corresponding preference parameter e^(g_ω_ijk) and study-specific scaling constant ζ_i
has a Normal marginal distribution with mean e^(γ_(l_ijk)) x_ijk ζ_i and variance π_ijk²:

c_ijk ~ N(e^(γ_(l_ijk)) x_ijk ζ_i , π_ijk²)

where e^(γ_(l_ijk)) ~ N(e^(g_ω_ijk), (e^(g_ω_ijk) σ_pref)²) is the random study-specific utility coefficient on the
logit scale for level k of within-study criterion j (similar to the random preferences distribution in the
ratings model - see III.3.3) and ω_ijk refers to the global criterion whose preferences relate to the
level l_ijk. In the fixed preferences model, e^(γ_(l_ijk)) = e^(g_ω_ijk).
However, within levels of a criterion the coefficients will be correlated, as set out in III.5.4. For any
criteria with more than 2 levels it is necessary to allow for correlations of 0.5 (dummy coding) or
-1/(n-1) (effects coding) between pairs of coefficients for the same criterion within a study. In
principle either coding scheme can be used; for the sake of simplicity the model here will be
constructed based on dummy coding for all criteria with any rebasing having already been carried
out within the data. Correlations of 0.5 between coefficients must therefore be allowed for. This is
achieved using the same variance decomposition technique described earlier in II.4.4, with an
auxiliary variable representing the portion of variability that is shared between the coefficients.
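The idea behind the decomposition can be sketched as follows (an illustrative simulation, not the thesis model code): writing each coefficient as its mean plus √½·π times a shared auxiliary variable plus √½·π times an independent residual gives cov(c_1, c_2) = ½π_1π_2, and hence correlation 0.5 whatever the individual variances.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.5])        # standard errors of two coefficients
sims = 200_000

u = rng.normal(size=(sims, 1))   # auxiliary variable: component shared
                                 # between the coefficients of one criterion
e = rng.normal(size=(sims, 2))   # independent components

# Mean omitted for clarity; half the variance comes from the shared term.
c = np.sqrt(0.5) * pi * u + np.sqrt(0.5) * pi * e

r = np.corrcoef(c[:, 0], c[:, 1])[0, 1]   # ~0.5 despite unequal variances
sds = c.std(axis=0)                       # ~pi: total variances preserved
```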
III.5.5.3 Priors
Priors must be specified for the preference strengths (or utility coefficients) and random preference
standard deviation; these are the same as set out for the ratings model in III.3.3.
It is also necessary to specify a prior for the study-specific scaling constants ζ_i. For logit-model
choice studies, ζ_i is assigned a fixed value of 1; for other types of study the prior used here is a vague
folded-Normal prior, i.e. ζ_i ~ N⁺(0, 100).
III.5.6 Results
III.5.6.1 Published RRMS studies only
Table 30 and Table 31 show summary statistics from the posterior distributions of the key
parameters and the residual deviance for the fixed and random effects versions of the model
respectively.
Table 30 - Posterior distribution of preferences in published RRMS preference elicitation studies; fixed preference model

FIXED PREFERENCES: 7 studies, 28 coefficient estimates in total

| | unit | mean | SE | 2.5% | median | 97.5% |
| --- | --- | --- | --- | --- | --- | --- |
| *Utility coefficients on choice scale (effect on log odds of choice)* | | | | | | |
| Relapse rate | 1 relapse/year | -1.085 | 0.041 | -1.167 | -1.085 | -1.005 |
| Disability progression | 100% risk | -1.465 | 0.063 | -1.590 | -1.465 | -1.343 |
| Daily oral vs daily subcutaneous | N/A | 0.851 | 0.034 | 0.785 | 0.851 | 0.919 |
| Monthly infusion vs daily subcutaneous | N/A | 0.461 | 0.023 | 0.417 | 0.461 | 0.506 |
| Weekly intramuscular vs daily subcutaneous | N/A | 0.178 | 0.012 | 0.154 | 0.177 | 0.202 |
| *Normalised preference weights* | | | | | | |
| Relapse rate | 1 relapse/year | 4.4% | 0.2% | 3.9% | 4.4% | 4.9% |
| Disability progression | 100% risk | 31.9% | 0.6% | 30.7% | 31.9% | 33.1% |
| Daily oral vs daily subcutaneous | N/A | 25.0% | 0.3% | 24.4% | 25.0% | 25.7% |
| Monthly infusion vs daily subcutaneous | N/A | 13.6% | 0.7% | 12.3% | 13.6% | 14.9% |
| Weekly intramuscular vs daily subcutaneous | N/A | 5.2% | 0.3% | 4.6% | 5.2% | 5.8% |
| Residual deviance | N/A | 373.5 | 3.7 | 368.3 | 372.8 | 382.3 |
Table 31 - Posterior distribution of preferences in published RRMS preference elicitation studies; random preference model

RANDOM PREFERENCES: 7 studies, 28 coefficient estimates in total

| | unit | mean | SE | 2.5% | median | 97.5% |
| --- | --- | --- | --- | --- | --- | --- |
| *Utility coefficients on choice scale (effect on log odds of choice)* | | | | | | |
| Relapse rate | 1 relapse/year | -1.474 | 0.523 | -2.710 | -1.385 | -0.762 |
| Disability progression | 100% risk | -3.195 | 1.606 | -6.907 | -2.850 | -1.549 |
| Daily oral vs daily subcutaneous | N/A | 2.718 | 0.775 | 1.557 | 2.600 | 4.560 |
| Monthly infusion vs daily subcutaneous | N/A | 0.610 | 0.291 | 0.265 | 0.545 | 1.341 |
| Weekly intramuscular vs daily subcutaneous | N/A | 0.517 | 0.428 | 0.119 | 0.373 | 1.717 |
| *Normalised preference weights* | | | | | | |
| Relapse rate | 1 relapse/year | 20.3% | 6.0% | 10.3% | 19.7% | 33.9% |
| Disability progression | 100% risk | 42.2% | 9.7% | 26.1% | 41.1% | 64.9% |
| Daily oral vs daily subcutaneous | N/A | 37.5% | 8.5% | 20.8% | 37.4% | 54.9% |
| Monthly infusion vs daily subcutaneous | N/A | 8.4% | 3.5% | 3.6% | 7.7% | 17.4% |
| Weekly intramuscular vs daily subcutaneous | N/A | 7.1% | 5.6% | 1.7% | 5.2% | 23.0% |
| Between-study proportional preference standard deviation | N/A | 0.65 | 0.17 | 0.42 | 0.62 | 1.06 |
| Residual deviance | N/A | 45.6 | 6.5 | 34.9 | 44.9 | 60.2 |
In both datasets the residual deviance in the fixed preference model is at least an order of
magnitude too high, indicating very poor fit; the random preferences model fits better but the
residual deviance still exceeds the number of observations, suggesting excessive heterogeneity or
non-Normality of the random preferences.
Inspection of the study-level residuals (not shown) reveals that most of the deviance is associated
with one or two outlying studies in each dataset, which do not fit well with the assumed Normality
of the random preference distribution. Exclusion of these studies improves the overall fit (see III.5.7).
III.5.6.2 Published RRMS studies and PROTECT patient choice results
Table 32 shows the results from a random preference analysis of the RRMS dataset augmented by
the inclusion of an extra study: the PROTECT RRMS patient choice study, providing coefficients (for
relapse rate and disability progression) obtained from a frequentist analysis (as already shown
alongside the Bayesian results in
Table 24).
Table 32 - Posterior distribution of preferences in published RRMS choice studies and summary data from PROTECT patient choice study; random preference model

RANDOM PREFERENCES: 8 studies, 30 coefficient estimates in total

| | unit | mean | SE | 2.5% | median | 97.5% |
| --- | --- | --- | --- | --- | --- | --- |
| *Utility coefficients on choice scale (effect on log odds of choice)* | | | | | | |
| Relapse rate | 1 relapse/year | -1.349 | 0.473 | -2.517 | -1.259 | -0.706 |
| Disability progression | 100% risk | -7.042 | 2.777 | -13.880 | -6.488 | -3.686 |
| Daily oral vs daily subcutaneous | N/A | 2.651 | 0.799 | 1.496 | 2.522 | 4.565 |
| Monthly infusion vs daily subcutaneous | N/A | 0.601 | 0.309 | 0.250 | 0.528 | 1.397 |
| Weekly intramuscular vs daily subcutaneous | N/A | 0.524 | 0.443 | 0.115 | 0.376 | 1.769 |
| *Normalised preference weights* | | | | | | |
| Relapse rate | 1 relapse/year | 12.7% | 4.4% | 5.9% | 12.0% | 22.9% |
| Disability progression | 100% risk | 62.6% | 8.5% | 45.8% | 62.6% | 79.5% |
| Daily oral vs daily subcutaneous | N/A | 24.7% | 6.8% | 12.8% | 24.2% | 39.6% |
| Monthly infusion vs daily subcutaneous | N/A | 5.6% | 2.7% | 2.2% | 5.0% | 12.6% |
| Weekly intramuscular vs daily subcutaneous | N/A | 4.9% | 4.0% | 1.0% | 3.5% | 16.2% |
| Between-study proportional preference standard deviation | N/A | 0.72 | 0.18 | 0.47 | 0.69 | 1.16 |
| Residual deviance | N/A | 47.3 | 6.8 | 36.2 | 46.7 | 62.4 |
The PROTECT study appears to fit well with the majority of the external studies, since including it has
increased the between-study preference heterogeneity only slightly; also, the increase in the
residual deviance is only 1.6, which compares favourably to the 2 additional observed coefficients.
The main difference in the population average utility coefficients is the coefficient for disability
progression, which is more than doubled when the PROTECT study is included. The other utility
coefficients are not substantially altered.
III.5.7 Discussion
As elicitation studies and quantitative health decision models become more widespread it is likely
that many will be interested in comparing or combining elicitation results. This method is a natural
adaptation of meta-analysis that provides a mathematical framework for this task.
The data did appear to show a higher degree of heterogeneity than one would perhaps expect or
wish to see when carrying out a meta-analysis of this kind, although with only 7 source studies in the
dataset it is difficult to make a fair assessment.
The between-study heterogeneity of preferences in the random preferences model can be
measured by the corresponding (proportional) standard deviation parameter – in other words, the
typical proportional variation in utility coefficients between studies - which was estimated at 65% in
the RRMS case study. Univariate meta-analyses often report the between-study heterogeneity using
an 𝐼2 statistic, representing the proportion of variability that is due to true heterogeneity rather than
random chance. A multivariate version of this statistic has been proposed 209 and is calculated as
𝐼2 = (𝑄 − 𝑑𝑓 + 1)/𝑄 where df is the degrees of freedom in the model (i.e. the number of
parameter estimates in the data minus the number of parameters to be estimated) and Q is a
multivariate analogue 209 of the Cochran Q-statistic 210, which is equal to the residual deviance. Here
this gives an estimate of 𝐼2 = 50% in the RRMS case study, indicating moderate to high
heterogeneity. (When the PROTECT patient choice results are included, the estimate is slightly lower
at 𝐼2 = 47%.)
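The I² calculation described above can be sketched as follows (an illustrative helper; truncation at zero is added here because I² is conventionally reported as non-negative):

```python
def multivariate_I2(Q, df):
    """Multivariate I^2 as described in the text: I^2 = (Q - df + 1)/Q, where
    Q is the residual deviance (multivariate Q-statistic) and df the degrees
    of freedom; truncated below at zero."""
    return max(0.0, (Q - df + 1) / Q)
```

For example, Q = 40 with df = 21 gives I² = (40 − 21 + 1)/40 = 50%.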
One possible reason for the heterogeneity is that some of the source data were biased and/or
incompatible with one another for reasons related to study design and conduct - preference
elicitation studies are relatively novel in the field, and place unfamiliar demands on researchers and
patients alike. There are many methodological aspects of these studies – from the statistical design
and analysis to the way information is provided to participants – that could potentially colour the
results.
It must be acknowledged, of course, that poor study design and conduct are not the only possible
explanation for these findings. It may be that the assumptions do not hold, and the model’s
structure and distributions do not correctly reflect the population-wide distribution of preferences.
But even if this is so in these cases, it does not imply that these models can never be appropriate,
especially at the sub-group level. Encouragingly, most of the residual deviance in the RRMS dataset
was associated with two studies (POULOS 200 and WILSON 2014 202); removing these from the dataset reduces the 𝐼2 estimate to 20%. This suggests the presence of effect modifiers differentiating
these studies from the others; a cursory examination of the study reports revealed no obvious
demographic or design variables that could be responsible, but a detailed comparison has not been
undertaken. It may be that participants’ responses are affected by the way questions are framed
and the explanatory materials that are provided (for further discussion of this point see III.6.4).
There is too little data here to judge properly, but should the studies or populations appear to fall into more than one class, then an approach based on mixture models might potentially be appropriate 211.

Discrete choice and “rating-based conjoint analysis” (i.e. absolute scenario ratings) were the only methodologies represented in the source studies. In principle any
method that gives weights or preference strengths with standard deviations/errors can be used.
Any preference estimates without a measure of uncertainty would require additional
assumptions/modelling.
There were no included choice studies that used the probit (as opposed to logit) model, which
technically results in a different utility scale. If any such studies were to be included, they could all be
given the same scale parameter, and since it has been shown that the difference in results between
the two models is usually negligible 172, using the same scale parameter value for all choice studies
seems unlikely to be problematic.
No allowance was made for correlations arising as a result of imbalanced choice study designs. Any
such correlations can in principle be calculated given the full study design 212; incorporating them
would require an extension to the model.
Additionally, one could choose to allow for correlations among the (random) preferences for
different criteria, either within- or between-study. This has not been attempted at this stage in order
to keep the model simple, but the code could be adapted to allow for multivariate random
preference distributions using the kind of construction set out in II.4.4. Note that such statistical
dependence of preferences between individuals in a population is a different concept to utility
dependence, which would contradict the assumptions of the linear-additive MCDA model.
The data structure used for categorical criteria can also in principle accommodate continuous
criteria where the aim is to estimate the utility associated with a set of discrete levels (and hence
allow for non-linearity). It would be fairly straightforward to interpolate the utility between these
levels in, say, piecewise linear fashion. This does, however, go somewhat beyond the scope of this
thesis, where the focus is on linear MCDA.
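The interpolation mentioned above might be sketched as follows; the criterion levels and utilities used here are hypothetical, purely to illustrate the idea:

```python
def interpolate_utility(x, levels, utilities):
    """Piecewise linear interpolation of utility between a set of
    discrete criterion levels (levels must be sorted ascending)."""
    if x <= levels[0]:
        return utilities[0]
    if x >= levels[-1]:
        return utilities[-1]
    for x0, x1, u0, u1 in zip(levels, levels[1:], utilities, utilities[1:]):
        if x0 <= x <= x1:
            return u0 + (u1 - u0) * (x - x0) / (x1 - x0)

# Hypothetical: utilities elicited at 0, 1 and 2 relapses/year,
# interpolated at 1.5 relapses/year
print(interpolate_utility(1.5, [0, 1, 2], [0.0, -1.1, -2.4]))  # -1.75
```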
Preference elicitation in medical decision-making is still relatively novel and clearly more experience
– and larger datasets – are needed to see whether population preferences are homogeneous
enough to apply this particular method (or indeed, to apply MCDA in benefit-risk at all). The results
from these datasets are mixed, and more research in this area is needed. The consistency and
homogeneity within studies appears to be good, however, so if significant between-study
heterogeneity is confirmed it would be worth asking how it arises. The studies used here do vary
widely geographically, and RRMS varies widely in severity and in patient characteristics, so
straightforward heterogeneity may be to blame. It is also possible, however, that the psychological impact of the way questions are framed (e.g. cognitive biases of scale related to the levels used) or the supporting information (outcome glossaries, etc.) presented to participants may influence the results of any given study.
In providing a framework for comparing studies and making a broad assessment of their
heterogeneity, the model presented here at least presents a first step toward understanding such
issues. Should any particular study characteristic be suspected of influencing the results,
incorporating this effect via a meta-regression coefficient could provide direct evidence.
Issues of between-study heterogeneity notwithstanding, as a proof of concept this was largely a
successful exercise – the model has been clearly set out on paper and shown to function as
expected, with working code that converges and allows estimation of the parameters of interest.
Here we saw that the method could be used to combine external studies with the PROTECT choice
data – however, this required a two-stage analysis, with summary results from the first stage being
used in the second. In its current form this model could also not incorporate results from the ratings
datasets as they are not expressed on the same scale. In the next section a different model for
combining inferences from multiple preference datasets will be constructed, one capable of using
the full source data (choices and/or ratings) in the original format rather than just summary
statistics.
III.6 Combining preferences from different methods
Given the Bayesian models already described in III.3, III.4 and III.5 for individual elicited ratings,
individual choices and summary preference data from previous analyses respectively, the logical
next step is to establish a unified model or framework that can make combined inferences based on
two or more of those models.
The literature in this area is somewhat limited. Several studies have previously carried out parallel
tests of different methods without synthesising the combined results 174,213,192,214-216.
There is a class of models originating in marketing research, sometimes known as “hybrid conjoint”,
that seek to combine different preference data formats. However, the formats they focus on are
absolute scenario ratings, which are not being used here, and directly elicited weights and partial
values (i.e. not broken down into pairwise comparisons, making the elicitation cognitively more
taxing than methods that break down the problem like AHP or swing weighting). Furthermore most
of these models are not Bayesian and do not even incorporate the elicited data fully in the
likelihood. One model from this school does use a fully Bayesian parameterisation that is similar in
concept to my goal here 217 – but again the data formats are restricted to absolute scenario ratings
and directly elicited weights and partial values. Another example incorporates choice data, and also
uses a Bayesian approach 175, but again it does not incorporate the elicited data in the likelihood,
instead using it to construct the priors. None of these models, therefore, can carry out a full Bayesian analysis of all the types of preference data that have been encountered in this chapter.
Within the field of choice modelling, there was some limited early work on combining and
comparing different preference data formats, and the idea’s potential was noted 218. More recently
a paper by Zhang et al presented a combined analysis of choice and ranking data, with different (but
overlapping) sets of criteria 173. All of these examples are however non-Bayesian and therefore not
directly compatible with the models here. Choice studies have also been developed that
supplement the choice tasks with additional questions (such as rating the strength of preference for
the selected choice 25,26), but this type of approach does not allow information to be borrowed across altogether separate studies.
Attempts to marry different preference data formats have also been made in models that formulate
MCDA very differently, for example using fuzzy mathematics 219. This is a completely different
paradigm that is not compatible with a Bayesian approach and therefore beyond the scope of this
thesis.
A new model for joint Bayesian analysis of multiple preference data types is therefore required. In
the Bayesian MCMC context this is reasonably straightforward to construct; given the models
already developed in this chapter, one essentially just needs to append the likelihoods together
using the same set of preference parameters. (This is one advantage of the Bayesian MCMC
approach: it is easy to specify arbitrarily complex models). The basic parameterisation used so far
within this chapter is compatible with each type of data provided that any differences in scale are
allowed for. To recap:
• Preference parameters derived from choice data (in either summary or individual format)
are on a fixed scale related to individual choice probabilities
• Preference parameters derived from criteria ratings data are on an arbitrary scale.
Again the underlying principle is that we are ultimately concerned not with the absolute utility scale
but with the values of the utility coefficients, on a relative basis, which are assumed to be
homogeneous in the population.
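The "append the likelihoods" construction can be sketched outside BUGS as a joint log-likelihood that sums the component log-likelihoods over one shared coefficient vector. The component forms below are deliberately simplified stand-ins (a binary logit choice component and a Normal log-ratio ratings component), not the exact models of III.3–III.5:

```python
import math

def choice_loglik(coefs, choices):
    # Each record: (attribute differences, option A minus B; 1 if A chosen)
    ll = 0.0
    for delta_x, chose_a in choices:
        delta_u = sum(c * x for c, x in zip(coefs, delta_x))
        p = 1.0 / (1.0 + math.exp(-delta_u))  # logit choice probability
        ll += math.log(p if chose_a else 1.0 - p)
    return ll

def ratings_loglik(coefs, ratings, sd):
    # Each record: ((i, j), observed log preference ratio of criteria i and j),
    # modelled as Normal around the log-ratio of the shared coefficients
    ll = 0.0
    for (i, j), obs in ratings:
        mu = math.log(abs(coefs[i]) / abs(coefs[j]))
        ll += -0.5 * ((obs - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    return ll

def joint_loglik(coefs, choices, ratings, sd):
    # The shared preference parameters enter every component
    return choice_loglik(coefs, choices) + ratings_loglik(coefs, ratings, sd)

coefs = [-1.0, -2.0]
print(joint_loglik(coefs, [((1, 0), 0)], [((1, 0), 0.7)], sd=1.0))
```

In the MCMC setting the sampler then explores the posterior implied by this summed likelihood and the priors; no new machinery is needed beyond concatenating the components.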
III.6.1 Datasets
The RRMS case study will be used to test the approach. The aim is to generate preference weights
for (i) the outcomes synthesised in Chapter II and (ii) the administration route and frequency for
each of the treatments (as shown in Table 4). We have already come across usable preference data
for some of these, but not others, as shown in Table 33.
Table 33 – Treatment outcomes and administration modes for the RRMS treatments, and the availability of corresponding preference data. IR = investigator ratings dataset, PR = patient ratings dataset, PC = patient choices dataset, PS = preference synthesis dataset.
Preference data available?
IR PR PC PS
Criteria for which outcomes have been synthesised:
Relapse rate Yes
Risk of disability progression Yes
Risk of liver enzyme elevation Yes
Risk of serious gastrointestinal disorders No
Risk of serious bradycardia No
Risk of macular edema No
Administration routes and frequencies:
Daily oral (self-administered) Yes
Daily injection (self-administered) Yes
1-3x per week injection (self-administered) Yes
All four datasets therefore have something to contribute to the overall network. However, they also
contain several other criteria that are not directly relevant to the case study. To avoid having to deal
with too cumbersome a model it would be convenient to drop some of these irrelevant criteria if
possible. However, the following points should be considered:
• For choice data, it is not advisable to drop any criteria; the analysis should include a
parameter for every criterion in the original dataset, since the impact of any excluded
variables on choice may manifest as additional uncertainty on the remaining parameter
estimates.
• For ratings data and preference meta-analysis data, criteria can sometimes be discarded but
it is necessary to pay close attention to the connectivity of the network structure, as
illustrated in Figure 58.
Figure 58 – Example of combined preference network. The tree structure (top left) might arise from swing weighting; the web structure (bottom left) from AHP or preference meta-analysis. These combine to give a more complex overall structure (right). Suppose the aim is to obtain weights for criteria A, B, C and D. Then X should certainly not be removed as this would leave D disconnected. Y can safely be removed without affecting the estimated weights. Discarding Z would still permit the weights to be estimated but would discard useful data.
In addition there may be concerns over the validity of some data, for example with the RRMS patient
ratings for continuous outcomes where the scale was not made clear. Table 34 applies this logic to
the RRMS case study and sets out whether each of the criteria encountered in each preference
dataset should be included in the overall model.
Table 34 – Overall RRMS preference model: Criteria from each dataset for inclusion/exclusion
Dataset RRMS criteria to include RRMS criteria to exclude Notes
PROTECT patient ratings (see III.3.2.2)
Administration mode (daily oral, daily injection, weekly injection, monthly infusion)
Relapse Disability progression Liver enzyme elevation PML Herpes reactivation Seizures Flu-like reactions Infusion/injection
reactions Allergic/hypersensitivity
reactions Serious allergic reactions Depression
All continuous criteria excluded owing to doubts over validity due to unclear scale in elicitation questions.
PROTECT patient choices (see III.4.2)
Relapse Disability progression PML Allergic/hypersensitivity
reactions Serious allergic reactions Depression
(None) Validity of choice model relies on all criteria being included
PROTECT investigator ratings (see III.3.2.1)
Relapse Disability progression Liver enzyme elevation PML Allergic/hypersensitivity
reactions Infusion/injection
reactions Administration mode
(daily oral, daily injection, weekly injection, monthly infusion)
Herpes reactivation Seizures Flu-like reactions Congenital abnormalities
Many outcomes are not relevant and can be excluded as they are only located on isolated branches of the investigator value tree [CROSS REF] and hence provide no indirect information on other outcomes. Monthly infusion, PML and allergic/hypersensitivity reactions are retained because they feature in the patient ratings and/or patient choices, and thus may provide indirect evidence on the main criteria of interest. Infusion/injection reactions is retained because it provides the only link in the value tree between allergic/hypersensitivity reactions and the other criteria.
Published RRMS preference studies (see III.5.2)
Relapse Disability progression Administration mode
(daily oral, daily injection, 1-3x weekly injection, monthly infusion)
Various depending on source study (see Appendix A)
“Weekly injection” expanded to 1-3x weekly due to heterogeneous definitions in source studies.
Although messy, this very piecemeal collection of datasets is well suited for illustrating the method’s
ability to combine fragments of data from different sources.
III.6.2 Statistical model
The model simply combines the components we have seen previously in this chapter (i.e. a choice
model, a ratings model and a preference synthesis model) in order to specify the joint likelihood of
observing all of the adopted datasets. The underlying preference parameters to be estimated are
shared between all of the components but there is no need to allow for any correlation or
dependence between the likelihoods provided that no individuals took part in more than one of the
elicitation studies (which seems a reasonable assumption in this case).
As discussed earlier in the chapter, the various components should be compatible in terms of the
utility scale – if any choice data is included in one or more components (as in the choice and
preference synthesis models here) then the overall utility is fixed to the scale where a unit
represents the log odds of choice; if not then the scale remains arbitrary. Either way, normalised
weights can be obtained. In the RRMS case study, the presence of choice data means that the
overall utility remains fixed to the logit-of-choice scale.
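As an illustration, the normalisation can plausibly be sketched as dividing each absolute coefficient by the sum of the absolute coefficients of the top-level criteria (here relapse, disability progression and the daily-oral administration contrast); this reproduces the posterior-mean weights reported later in Table 35, but treat the exact normalisation as an assumption of this sketch:

```python
def normalised_weights(coefs, top_level):
    """Normalise utility coefficients into weights summing to 1 over the
    top-level criteria; other contrasts share the same denominator."""
    total = sum(abs(coefs[name]) for name in top_level)
    return {name: abs(c) / total for name, c in coefs.items()}

# Posterior-mean coefficients from the fixed-preference model (Table 35)
coefs = {
    "relapse_rate": -1.14,
    "disability_progression": -1.61,
    "daily_oral_vs_sc": 0.92,
    "monthly_infusion_vs_sc": 0.48,
    "weekly_im_vs_sc": 0.19,
}
w = normalised_weights(
    coefs, ["relapse_rate", "disability_progression", "daily_oral_vs_sc"])
print(round(w["relapse_rate"], 3))  # 0.311, i.e. ~31% as in Table 35
```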
III.6.2.1 Fixed/random preferences
The amount of heterogeneity to allow in the preference parameters is no longer a simple binary
choice between fixed and random preferences. There are several places in the model where a
“fixed” or “random” parameterisation can be used, as shown in Figure 59.
Figure 59 – Hierarchical structure of the preference data, indicating the levels where random preference distributions can be used.
This leads to a multitude of possible variations on the model; it would take up too much space (and
not be particularly informative) to go through them all here. This section will focus on two versions:
• Fixed preferences (every participant’s preferences are equal to the population average)
• Study-level random preferences only (preferences vary between studies but not between
participants in a single study)
Individual-level random preferences will not be used. The model selection is based partly on the
practical grounds that the choice model runs prohibitively slowly with individual-level data (see
III.4.3.2) but can also be justified on a statistical basis, since the results from the ratings datasets
indicate that the fixed-preference model fits equally well (and should therefore be favoured due to
its relative simplicity 119).
III.6.2.2 Combining the datasets
Each dataset uses a different set of criteria (although each overlaps with at least one other, giving a
connected network overall). To run the models simultaneously a master set of outcomes (the union
of the outcome sets for each dataset) is used to define the underlying preference strength
parameters; indexing vectors are then used to pick out the appropriate parameters for each analysis.
It is easier to specify the model if categorical variables follow the same coding scheme in every
dataset they appear in, but this may not always be strictly necessary.
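The indexing-vector device might look like this in miniature (criterion names illustrative):

```python
# Master set: union of the outcome sets across datasets; its order
# defines the shared preference-parameter vector
master = ["relapse", "disability", "pml", "liver", "depression", "admin_oral"]

datasets = {
    "patient_choices": ["relapse", "disability", "pml", "depression"],
    "patient_ratings": ["relapse", "liver", "admin_oral"],
}

# One indexing vector per dataset: the position in `master` of each of
# its criteria, so every component likelihood reads the shared parameters
index = {name: [master.index(c) for c in crits]
         for name, crits in datasets.items()}
print(index["patient_ratings"])  # [0, 3, 5]
```

In the BUGS code this corresponds to supplying integer index vectors in the data so that each likelihood component refers to the shared coefficient beta[idx[k]] rather than holding its own copy of the parameter (an assumption about the implementation; the actual code is in Appendix B).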
For continuous criteria, care needs to be taken to ensure the outcome units to which the
preferences relate are equivalent; simple transformations may be necessary, sometimes requiring
additional assumptions (see III.5.4.2 for a discussion of the issues). In this instance:
• In the ratings datasets all criteria weights are based on comparisons of individual events
over a timescale that is unspecified (but presumably equivalent for all criteria)
• In the choice and preference synthesis datasets relapses are measured as a 1-year average
(i.e. expected) rate while the other outcomes are expressed as risks (i.e. binary expectations)
over a 2-year period.
In the second instance the time horizon of the relapse criterion is half that of the other criteria, while in the first the time horizons are (presumably) equal. To adjust for this, the relapse utility coefficient estimates from the ratings datasets need to be doubled.
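The doubling is just a linear change of time units; generically (function name mine, assuming utility accrues linearly with exposure time):

```python
def rescale_to_horizon(coef, source_years, target_years):
    """Re-express a utility coefficient elicited over one time horizon
    on another, assuming effects accrue linearly with exposure time."""
    return coef * target_years / source_years

# A ratings-based relapse coefficient (per single relapse), doubled to
# match the choice data's unit of 1 relapse/year sustained over the
# common 2-year horizon (coefficient value hypothetical)
print(rescale_to_horizon(-1.1, source_years=1, target_years=2))  # -2.2
```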
III.6.2.3 Predictive distributions of preferences
In the results shown in this chapter so far I have focused on the population-level averages. As with
the clinical outcome synthesis in II.6.3.2, one can also allow for between- and within-study variability
in the posterior distributions if desired.
The study-level predictive distribution is obtained by drawing simulations from the random
preference distribution, with the between-study standard deviation estimated by the model. The
individual-level predictive distribution is obtained by drawing from a Normal distribution centred on
the study-level average. In a model with random preferences by participant, the standard deviation
of this distribution should be the within-study between-participant preference standard deviation.
The models fitted here all assume fixed preferences by participant, however, and so there is no
difference between the predictive distribution of preferences at the study level and participant level.
Suppose however for a moment that the assumption of fixed preferences within studies was
incorrect. Any true between-participant heterogeneity within studies in the preference meta-
analysis dataset would be reflected in the variances of the coefficient estimates, together with the
within-participant (between-choice) random variability of utility. An estimated upper bound for the
between-participant heterogeneity can therefore be derived based on the average variance of the
coefficients from the external preference studies (this parallels the approach used in Chapter II). For
consistency with the parameterisation used so far, the standard deviation for a given utility
coefficient is expressed as a constant proportion of the mean.
Any true between-participant heterogeneity within studies in the ratings datasets would be swept
up in the ratings standard deviation parameter, together with the within-participant random
variability of ratings. The ratings standard deviation can therefore be used as an estimated upper
bound for the within-study between-participant heterogeneity in the ratings datasets. It should
however be multiplied by 1/√2 (i.e. halving the variance, to account for the fact that a rating
consists of judgements on 2 criteria and the predictive distribution of a preference parameter
reflects the variability of only 1 criterion); note also that the use of this parameter means that the
predictive distribution should be defined on the log scale.
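Numerically the adjustment is just the following (a sketch; 1.2 is roughly the ratings standard deviation estimated later in Table 38):

```python
import math

def participant_sd_upper_bound(ratings_sd):
    """Upper bound on the within-study between-participant preference SD
    (log scale): halve the ratings variance, since a rating involves
    judgements on two criteria but a preference parameter only one."""
    return ratings_sd / math.sqrt(2)

print(round(participant_sd_upper_bound(1.2), 3))  # 0.849
```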
These upper bounds on the heterogeneity within the ratings and preference meta-analysis datasets
will be used in parallel to estimate the predictive distribution of preferences at the individual level.
Any true between-participant heterogeneity in the choice dataset, although it will be reflected in the
posterior distributions, cannot be so easily captured within the model (without additional
calculations) as it is not directly represented within any existing variables.
III.6.3 Results
The BUGS code and data used to generate these results are provided in Appendix B.
III.6.3.1 Patient choices and preference synthesis
Table 35 and Table 36 show the posterior distributions of the key parameters and variables from
models combining the patient choice dataset with the published preference studies, with fixed
preferences and random (by study) preferences respectively. Computing 200,000 iterations
(100,000 for burn-in and 100,000 for the posterior estimates) in OpenBUGS (version 3.2.2 rev 1063)
on a Microsoft Surface Book 2 (i5-8350U 1.70 GHz quad core) running Windows 10 took 201 seconds
for the fixed preference model and 275 seconds for the random preference model, which does not
seem excessive for an MCMC analysis.
Table 35 - Posterior distribution of preferences based on published RRMS choice studies and full data from PROTECT patient choice study; fixed preference model
FIXED PREFERENCES
7 summary-data studies with 28 coefficient estimates
1 full-data study with 124 participants and 1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.14 0.04 -1.22 -1.14 -1.06
Disability progression 100% risk -1.61 0.07 -1.74 -1.61 -1.49
PML 100% risk -266.0 24.7 -314.8 -265.9 -217.9
Allergic/hypersensitivity reactions 100% risk -0.67 0.14 -0.94 -0.67 -0.39
Serious allergic reactions 100% risk -31.31 3.32 -37.83 -31.30 -24.83
Depression 100% risk -2.39 0.65 -3.66 -2.39 -1.12
Daily oral vs daily subcutaneous N/A 0.92 0.03 0.85 0.92 0.98
Monthly infusion vs daily subcutaneous N/A 0.48 0.02 0.43 0.48 0.52
Weekly intramuscular vs daily subcutaneous N/A 0.19 0.01 0.17 0.19 0.22
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 31.0% 0.6% 29.8% 31.0% 32.2%
Disability progression 100% risk 44.0% 0.6% 42.8% 44.0% 45.1%
Daily oral vs daily subcutaneous N/A 25.0% 0.3% 24.3% 25.0% 25.7%
Monthly infusion vs daily subcutaneous N/A 13.0% 0.6% 11.8% 13.0% 14.3%
Weekly intramuscular vs daily subcutaneous N/A 5.2% 0.3% 4.6% 5.2% 5.8%

Choice model residual deviance N/A 491.9 5.7 481.1 491.8 503.4
Preference synthesis residual deviance N/A 379.3 6.2 370.1 378.4 393.8
Total residual deviance N/A 871.2 4.6 864.1 870.6 882.1
Table 36 - Posterior distribution of preferences based on published RRMS choice studies and full data from PROTECT patient choice study; random (by study) preference model
RANDOM PREFERENCES (by study; fixed within studies)
7 summary-data studies with 28 coefficient estimates 1 full-data study with 124 participants and 1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.47 0.56 -2.84 -1.36 -0.72
Disability progression 100% risk -7.29 3.08 -13.83 -6.71 -3.99
PML 100% risk -268.1 90.8 -491.1 -251.4 -140.7
Allergic/hypersensitivity reactions 100% risk -25.07 33.99 -121.20 -12.55 -0.43
Serious allergic reactions 100% risk -39.66 4.42 -48.40 -39.63 -31.11
Depression 100% risk -5.11 0.89 -6.87 -5.10 -3.37
Daily oral vs daily subcutaneous N/A 2.77 0.79 1.61 2.64 4.67
Monthly infusion vs daily subcutaneous N/A 0.63 0.32 0.27 0.56 1.44
Weekly intramuscular vs daily subcutaneous N/A 0.56 0.48 0.10 0.41 1.87
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 13.1% 4.7% 5.9% 12.4% 24.1%
Disability progression 100% risk 62.2% 8.4% 45.8% 62.1% 79.0%
Daily oral vs daily subcutaneous N/A 24.7% 6.6% 13.0% 24.2% 39.0%
Monthly infusion vs daily subcutaneous N/A 5.6% 2.7% 2.3% 5.1% 12.4%
Weekly intramuscular vs daily subcutaneous N/A 5.0% 4.1% 0.9% 3.7% 16.3%

Between-study proportional preference standard deviation N/A 0.65 0.14 0.45 0.63 0.98
Choice model residual deviance N/A 94.2 3.5 89.4 93.6 102.7
Preference synthesis residual deviance N/A 45.4 6.5 34.8 44.8 59.9
Total residual deviance N/A 139.7 7.4 127.2 139.0 155.8
Again the residual deviance (compared to the number of observations) reveals very poor fit for the
fixed preference model but good fit for the random preference model.
It is worth noting that the preference weight posteriors obtained from this (random preference)
model and dataset, which includes the full PROTECT choice data, are in close agreement with those
obtained in III.5.6.2, where the same data was included in summary form. This is a reassuring finding that indicates the overall model structure and parameterisation is behaving appropriately. The only substantial difference between the two sets of results is the residual deviance, which is higher here than in III.5.6.2 because the incorporation of full individual-level data creates more scope for observations to deviate from the within-study averages.
III.6.3.2 Full model – patient choices, investigator ratings, patient ratings and preference
synthesis
Table 37 and Table 38 show the posterior distributions of the key parameters and variables from
models combining all relevant preference data, with fixed preferences and random (by study)
preferences respectively. Computing 200,000 iterations (100,000 for burn-in and 100,000 for the
posterior estimates) in OpenBUGS (version 3.2.2 rev 1063) on a Microsoft Surface Book 2 (i5-8350U
1.70 GHz quad core) running Windows 10 took 567 seconds for the fixed preference model and 388
seconds for the random preference model, which again does not seem excessive for an MCMC
analysis of this kind.
Table 37 - Posterior distribution of preferences based on all preference datasets; fixed preference model
FIXED PREFERENCES 7 summary-data studies with 28 coefficient estimates 3 full-data studies with 163 participants, 231 ratings and
1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.09 0.04 -1.17 -1.09 -1.02
Disability progression 100% risk -1.59 0.06 -1.71 -1.58 -1.46
PML 100% risk -247.0 26.1 -298.5 -247.0 -196.0
Liver enzyme elevation 100% risk -71.52 48.70 -198.90 -59.46 -15.37
Allergic/hypersensitivity reactions 100% risk -0.75 0.13 -1.01 -0.75 -0.50
Serious allergic reactions 100% risk -32.77 3.34 -39.35 -32.75 -26.27
Depression 100% risk -2.37 0.65 -3.64 -2.37 -1.10
Infusion/injection reactions 100% risk -6.40 3.80 -16.17 -5.51 -1.86
Daily oral vs daily subcutaneous N/A -0.90 0.03 -0.97 -0.90 -0.83
Monthly infusion vs daily subcutaneous N/A -0.47 0.02 -0.52 -0.47 -0.43
Weekly intramuscular vs daily subcutaneous N/A -0.19 0.01 -0.21 -0.19 -0.16
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 2.1% 1.4% 0.5% 1.7% 5.8%
Disability progression 100% risk 3.0% 2.0% 0.8% 2.5% 8.4%
Liver enzyme elevation 100% risk 93.1% 4.6% 81.1% 94.3% 98.2%
Daily oral vs daily subcutaneous N/A 1.7% 1.1% 0.4% 1.4% 4.7%
Monthly infusion vs daily subcutaneous N/A 0.9% 0.6% 0.2% 0.8% 2.5%
Weekly intramuscular vs daily subcutaneous N/A 0.4% 0.2% 0.1% 0.3% 1.0%

Ratings standard deviation N/A 1.33 0.06 1.21 1.33 1.46
Ratings model residual deviance N/A 230.0 21.5 189.8 229.3 274.1
Choice model residual deviance N/A 535.8 6.0 525.0 535.4 548.8
Preference synthesis residual deviance N/A 379.4 6.2 370.1 378.5 394.1
Total residual deviance N/A 1145 22.3 1104 1144 1191
Table 38 - Posterior distribution of preferences based on all preference datasets; random (by study) preference model
RANDOM PREFERENCES (by study; fixed within studies)
7 summary-data studies with 28 coefficient estimates 3 full-data studies with 163 participants, 231 ratings and
1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.62 0.62 -3.21 -1.48 -0.81
Disability progression 100% risk -7.26 2.21 -12.66 -6.85 -4.35
PML 100% risk -245.3 75.3 -431.6 -231.9 -139.2
Liver enzyme elevation 100% risk -21.22 23.23 -84.71 -13.88 -1.93
Allergic/hypersensitivity reactions 100% risk -5.92 7.22 -24.93 -3.64 -0.47
Serious allergic reactions 100% risk -39.48 4.39 -48.07 -39.46 -30.91
Depression 100% risk -5.10 0.88 -6.83 -5.10 -3.38
Infusion/injection reactions 100% risk -19.31 35.30 -107.50 -8.50 -1.01
Daily oral vs daily subcutaneous N/A -2.72 0.64 -4.21 -2.63 -1.72
Monthly infusion vs daily subcutaneous N/A -0.72 0.31 -1.50 -0.66 -0.33
Weekly intramuscular vs daily subcutaneous N/A -0.66 0.43 -1.76 -0.54 -0.17
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 6.5% 3.6% 1.4% 5.8% 15.2%
Disability progression 100% risk 28.8% 13.5% 7.0% 27.5% 56.9%
Liver enzyme elevation 100% risk 53.7% 20.3% 14.9% 54.9% 88.1%
Daily oral vs daily subcutaneous N/A 11.0% 5.4% 2.7% 10.4% 23.0%
Monthly infusion vs daily subcutaneous N/A 2.9% 1.7% 0.6% 2.5% 7.0%
Weekly intramuscular vs daily subcutaneous N/A 2.6% 2.0% 0.4% 2.0% 7.9%

Ratings standard deviation N/A 1.20 0.06 1.09 1.19 1.32
Between-study proportional preference standard deviation N/A 0.58 0.09 0.44 0.57 0.79
Ratings model residual deviance N/A 229.9 21.4 190.1 229.3 273.7
Choice model residual deviance N/A 94.4 3.6 89.5 93.7 103.3
Preference synthesis residual deviance N/A 45.8 6.6 34.9 45.1 60.8
Total residual deviance N/A 370.1 22.7 327.8 369.4 416.2
Based on the residual deviances, the random preference model clearly achieves a much better fit to
the data than the fixed preference model. Introducing the ratings data has reduced the between-
study preference standard deviation slightly compared to the model in Table 36, and the preference
weights remain in roughly the same proportions (with the exception of liver enzyme elevation,
which was not included in the previous model since it only appears in the ratings data).
III.6.3.3 Predictive distributions of preferences
The preceding tables represent the posterior distributions of the population-average preference
parameters.
Figure 60 shows the posterior 95% credibility intervals of the preference weights from the full model
(with random preferences by study) at the following levels of predictive variability (see III.6.2 for
more details):
• Population averages
• Study-level averages; includes between-study variability
• Individual preferences (1); includes between-study variability and an upper bound estimate
of between-participant variability in the preference meta-analysis dataset
• Individual preferences (2); includes between-study variability and an upper bound estimate
of between-participant variability in the ratings dataset
For administration, only the overall weight (i.e. corresponding to the utility difference between daily
oral and daily subcutaneous) is shown.
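The way the three predictive levels nest can be illustrated with a small simulation. This is a hedged Python sketch, not the fitted OpenBUGS model: the coefficient values, the between-participant SD and the Normal-on-the-proportional-scale error structure are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population-average utility coefficients (relapse, disability
# progression, liver enzyme elevation, administration) and proportional SDs,
# loosely echoing the magnitudes reported here -- not the fitted values.
mu = np.array([0.065, 0.288, 0.537, 0.110])
tau_prop = 0.58    # between-study proportional SD
sigma_prop = 0.20  # assumed between-participant proportional SD

def normalise(u):
    """Convert utility coefficients to weights that sum to 1."""
    return u / u.sum(axis=-1, keepdims=True)

n = 10_000
w_pop = normalise(mu)  # population-average weights: no predictive variability

# Study-level draws: coefficients deviate from the mean by a Normal error
# proportional to the mean (clipped at zero to keep coefficients valid).
coef_study = np.clip(mu * (1 + tau_prop * rng.standard_normal((n, 4))), 1e-9, None)
w_study = normalise(coef_study)

# Individual-level draws add between-participant variability on top.
coef_indiv = np.clip(coef_study * (1 + sigma_prop * rng.standard_normal((n, 4))), 1e-9, None)
w_indiv = normalise(coef_indiv)

for name, w in [("study", w_study), ("individual", w_indiv)]:
    lo, hi = np.percentile(w[:, 1], [2.5, 97.5])
    print(f"{name}-level disability weight, 95% interval: ({lo:.3f}, {hi:.3f})")
```

As in Figure 60, each added layer of variability widens the interval around the population-average weight.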
Figure 60 – Forest plot showing the posterior predictive distributions of preference weights in the full RRMS preference model, at various levels of predictive variability. The two versions of the individual-level predictive distributions are based on upper bounds for the individual-level variance and so the width of the distributions may be overstated.
Increasing the level of predictive variability adds to the uncertainty in the weights, widening the
credibility intervals; it also appears to shift the means slightly towards the null (the point where all
criteria are equally weighted, i.e. 25% each in this example).
The upper bound estimate (1) for the between-patient heterogeneity in the preference meta-
analysis dataset only increases the overall uncertainty very marginally above that resulting from
between-study heterogeneity, supporting the assumption that preferences are fixed within studies.
The upper bound estimate from the ratings dataset has more of an impact, but the estimated
standard deviation parameters seen earlier in Table 23 suggest that most of this impact must have
been due to within-participant random errors rather than between-participant heterogeneity.
Overall therefore the assumption of fixed preferences within studies appears to be sound.
III.6.4 Discussion
Overall this has provided a successful initial demonstration of the method's capability to combine
preference data: coherent posterior distributions have been obtained which appear to reflect a
combined average of the individual analyses. For example, consider the estimated utility ratio of
disability progression to relapse: in the preference synthesis model this is 2.2 (III.5.6.1), in the
choice model it is 14.6 (III.4.4), and in the ratings model it is roughly two thirds (III.3.4.3,
remembering to double the relapse coefficient as per III.6.2.2). The model that combines the first
two estimates gives a ratio of 5 (III.6.3.1); bringing in the third estimate gives a ratio of 4.5
(III.6.3.2). In other words, the combined estimates always lie between the original estimates.
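This "in between" behaviour is exactly what precision-weighted pooling produces. A toy illustration (not the fitted model; the precision values below are hypothetical):

```python
import math

# Pool two independent estimates of the disability/relapse utility ratio on
# the log scale using precision (inverse-variance) weights. Whatever the
# weights, the pooled value must lie between the original estimates.
log_ratios = {"preference synthesis": math.log(2.2), "choice model": math.log(14.6)}
precisions = {"preference synthesis": 4.0, "choice model": 1.0}  # hypothetical

pooled_log = sum(precisions[k] * log_ratios[k] for k in log_ratios) / sum(precisions.values())
pooled_ratio = math.exp(pooled_log)
print(f"pooled disability/relapse utility ratio: {pooled_ratio:.2f}")
```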
It is also encouraging to see the equivalence of the weights obtained in III.5.6.3 and III.6.3.1, where
the same PROTECT source data is analysed in two different formats. In the first instance the
PROTECT data was included in summary form in the preference meta-analysis model; in the second
instance the PROTECT data was analysed in a binomial logit choice model and combined with the
results of the preference meta-analysis. This is evidence that the novel constructions of the
preference meta-analysis model and the combined preference model are consistent with the
established principles of choice models.
These models provide the basis for a framework to combine disparate sources of preference data, or
simply to make comparisons and evaluate heterogeneity. I believe this constitutes a useful step
forward in the field of preference modelling. Unlike most previous work in this area, the underlying
datasets do not need to be based on precisely the same set of criteria. The model has been built
with generalisability to different datasets in mind, hopefully facilitating further applications, but due
to the complexity of the overall structure some elements of the code would need adaptation in
order to be used elsewhere.
The model fit is generally poor for fixed preferences, but adequate to excellent for random
preferences at the study level. Insofar as attempts have been made to incorporate random
preferences at the individual participant level (i.e. in the ratings datasets) there seemed to be little
resulting improvement in model fit; further investigation of this issue in the choice dataset was
abandoned as the random preference model was found to run prohibitively slowly. As discussed
above, the evidence seems to suggest that the assumption of fixed within-study preferences was
appropriate here; nevertheless it might be interesting to examine the impact of allowing for random
preferences at the individual level in other datasets.
The between-study (proportional) preference standard deviation parameter was estimated at 58% in
the full model, a little less than the 65% estimate from the external preference studies alone. These
are proportional figures, i.e. the standard deviation of a utility coefficient is the given percentage of
its mean. This suggests that the preferences from the PROTECT datasets are consistent with the
external studies and the level of heterogeneity already observed among them. The multivariate 𝐼2
statistic209 is not straightforward to calculate for the full preference model, however, as the
Q-statistic no longer coincides with the residual deviance (since the latter also reflects within-study
heterogeneity).
There has been little other work on Bayesian parametric utility modelling within health-related
fields. One model worthy of note is that put forward by Musal & Soyer for directly analysing
utility coefficients, which uses an interesting parameterisation whereby the likelihood of the
coefficients is characterised using beta distributions220,221. As here, a Dirichlet prior is used. However,
the model takes as data utility coefficients with no error bounds, which ignores any random
error or uncertainty introduced by the original elicitation method (or even bias, as the results in
III.3.4 suggest). It would be interesting to see whether Musal & Soyer's parameterisation could
extend to the elicitation data formats used here, but this may not be straightforward.
In principle results from any elicitation methods involving either choices or cardinal relative ratings
can be accommodated in my model. One could also incorporate scenario rankings by expressing
them as a series of choices170. In principle it would not be difficult to adapt the model to incorporate
absolute (Likert) ratings of scenarios, which have been combined with choice data elsewhere 175,222.
This was not done here because no such data was available (indeed I am not aware of such
methodology having been used for preference elicitation with regard to treatment outcomes).
Methods for ordinal criteria ratings have also been proposed 51,223 and it is possible that these (or
similar) approaches could be incorporated into the framework if this form of data is available.
Another data type not included is that arising from best-worst scaling 224. Analysis of such data can
be fairly complex; nevertheless, probabilistic models do exist 171,225 and it may be possible to
incorporate them in this Bayesian framework in future.
The fixed preference model fits just as well as the random preference model when only ratings and
choices are combined (results not shown), which is not surprising as (excluding the small number of
investigator ratings) the studies that produced these datasets were carried out by the same study
team (the PROTECT Patient & Public Involvement team 193) and in the same population (multiple
sclerosis patients at a London clinic). It is when the external studies are brought in that the
heterogeneity increases.
Heterogeneity ultimately indicates that one or more effect modifiers (that is, variables that influence
the relative level of preference for different outcomes) are unevenly distributed between the
source studies. Any such variation between studies might relate to one or more of the following
types of effect-modifying variable:
• Population characteristics - the distribution of patients’ demographic and/or clinical
attributes may vary between studies.
• Geographical characteristics – physical factors such as climate and societal factors (such as
cultural attitudes or aspects of the healthcare system) may contribute to heterogeneity
when studies are carried out in distinct geographical locations.
• Study design factors – aspects of study design such as the type of elicitation task (choices vs
ratings, etc.), the way questions are framed, and the choice of criteria and levels can vary
between studies and may influence the preference estimates 40. Studies may appear to use
the same clinical criteria but define them as emerging over different time horizons. Any
disparity among studies with regard to the covariates that are used to adjust the results may
also have an impact. Different recruitment methods may result in studies having patient
groups with different characteristics even if conducted in exactly the same population.
• Study conduct factors – Elicited preferences may be influenced by the wording of
explanatory materials (instructions and glossaries) provided to participants and the
style/depth of involvement of facilitators 40. It seems likely that this may be a significant
contributor to heterogeneity of preference elicitation studies, both in the RRMS case study
and more generally, since the elicitation tasks can be cognitively demanding and require
extensive background knowledge, and participants will often refer extensively to any
provided guidance in order to help make sense of the tasks.
In the results shown in this chapter, the heterogeneity between studies is measured by the
between-study preference standard deviation, estimated (in the random effects model) as 58% on
the proportional scale. Assuming Normality, this implies that about 95% of studies will have
preference parameters within 116% of the overall mean, which might sound like quite a lot of
heterogeneity given that a 100% deviation corresponds to a doubling (or halving) of the parameter
value. However, bear in mind that 58% is considerably lower than the within-participant ratings
standard deviation, which was estimated at 101% on the equivalent scale (see III.3.4.3). In light of
this the between-study heterogeneity appears quite reasonable.
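The arithmetic behind these comparisons can be checked directly. A minimal sketch (the "within 116%" figure uses the approximate 2-SD Normal 95% range; 1.96 SDs gives a slightly tighter bound):

```python
# Quick numeric check of the proportional-SD statements in this section.
tau = 0.58    # between-study proportional preference SD (full model)
sigma = 1.01  # within-participant ratings proportional SD (III.3.4.3)

half_width_2sd = 2 * tau      # "within 116% of the overall mean" (approx. 95%)
half_width_196 = 1.96 * tau   # exact Normal 95% half-width, slightly tighter

print(f"2 * tau    = {half_width_2sd:.0%}")
print(f"1.96 * tau = {half_width_196:.1%}")
print(f"within-participant SD is {sigma / tau:.2f}x the between-study SD")
```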
These results from the RRMS case study are fairly encouraging, and hopefully go some way to allay
concerns about the validity of benefit-risk preference modelling at the population level. It is not
clear to what degree the heterogeneity represents true variability in preferences, or reflects aspects
of study design and execution. Note that many potential data sources (several external preference
studies and most of the PROTECT patient ratings dataset) had to be excluded due to the potential for
range insensitivity bias (see III.1.3.3); were it not for these exclusions the heterogeneity could have
been much higher. Equally many of the included studies may have had flaws that were not
recognised, and the heterogeneity may have been far less if all of the studies were perfect. It
remains probable that there is also some variability in preferences between segments of the
population, as has been found in other elicitation studies226-228.
One could see any heterogeneity as a problem for preference modellers, but my overriding view is
that the ability to directly assess and quantify the uncertainty on preference weights – and (in this
case at least) to find it within fairly reasonable limits given their subjective nature and the diversity
of study populations and elicitation techniques – will be something of a boost to the growing field of
benefit-risk assessment. Although there is noticeable heterogeneity between the RRMS preference
datasets, the weights do appear to represent a deeper truth than simply the arbitrary whims of the
study participants.
Nevertheless, attempts to aggregate/synthesise preferences for decision making purposes should try
to avoid heterogeneity whenever possible by ensuring that literature reviews include only those
studies that are most relevant to the problem and the target population. Where heterogeneity
cannot be avoided, it may be helpful to explicitly model the impact of any effect modifiers (using
meta-regression, for example) or to examine subgroups of studies/patients to identify more
homogeneous classes. If this is not possible, then random effects (preferences) models can be used
and measures of heterogeneity presented as part of the results, as has been done here.
Looking at the preference weights for the RRMS outcomes, one could argue that the preference
weight on liver enzyme elevation appears too high. On the face of it, these results appear to suggest
that a liver enzyme elevation event is more serious than a disability progression event, even though
the former may be transient and not translate into long-term liver damage, whereas the latter is
effectively permanent by definition. This is not an artefact of the methodology but a reflection of
the preferences expressed by the participants. It is worth bearing in mind that many patients and
clinicians may deem it more important to avoid harm by action (i.e. adverse events due to
treatment) than harm by inaction (i.e. lack of efficacy), as the latter is part of the natural course
of the disease. Whether such a view is rational is a philosophical point that I cannot provide a
general resolution to here, but such issues may need to be considered by decision makers when
weighing up decision criteria.
The equal weighting of criteria in the prior has had a clear impact on the results. This could be seen
when comparing the ratings model results with a deterministic analysis (see III.3.4). In the combined
preference model, although there is no deterministic analysis to serve as a comparison, assigning
unequal prior weights does substantially change the posterior estimates (results not shown). It would
therefore appear that the selection of priors is something that will need to be considered carefully in
Bayesian preference elicitation.
The impact of the between-study heterogeneity on the posterior distribution of the preference
weights can be seen in Figure 60. Another consequence of the heterogeneity is that the central
estimates of the preference weights can vary considerably depending on which datasets are
included, as shown in Figure 61. It is not yet clear to what extent the uncertainty/heterogeneity of
weights impacts upon the overall value of the different treatment options, however. This will be
explored in the next chapter.
Figure 61 – Preference weights (posterior means) for the key benefit-risk criteria, for three different combinations of the source datasets.
Ultimately the model can now provide estimated RRMS outcome weights based on all of the
available preference data of sufficient quality. A sense of the magnitude of preference
heterogeneity among the population has also been obtained, which will be important when
interpreting and assessing the importance of the overall results. The final step in assessing the
benefit-risk balance, then, is to combine the preference distributions with the treatment outcome
distributions that were estimated in Chapter II. This combination of performance data and
preference data is at the heart of MCDA, and will be the subject of Chapter IV.
IV. Assessing the overall benefit-risk balance
Having obtained the posterior distribution of treatment outcomes (Chapter II) and outcome
preferences (Chapter III) the final step in assessing the benefit-risk balance is to put these pieces
together and evaluate the overall utility associated with each treatment.
In essence all that is required is to combine the clinical outcomes synthesised in Chapter II with the
preferences elicited/synthesised in Chapter III, but the devil is in the detail of how this is done given
the particular outcome definitions that have been used in each source study. This section will go
through the issues encountered and strategy followed in the RRMS case study, much of which is
expected to also be relevant to other applications. But each benefit-risk assessment is different and
complex, with its own idiosyncrasies in the criteria and data, and it is difficult to set out a generalised
procedure or anticipate all of the issues that may be encountered in other applications.
The “zeroes” outcomes in the RRMS case study (see II.4.7) will not be used here since (i) no
preference data was available and (ii) this would rather bias the benefit-risk assessment against
fingolimod, the only treatment with which two of these outcomes are associated. These outcomes
were only included in Chapter II to illustrate the model’s capability to synthesise such treatment-
specific adverse events.
Chapter IV.1
254
IV.1 Methods
IV.1.1 High level model structure
Figure 62 shows the overall modelling structure, putting together the clinical evidence modules from
Chapter II and the preference module from Chapter III.
Figure 62 – High level structure of the entire benefit-risk assessment model.
IV.1.2 Selection of outcomes and model versions
In the RRMS case study, a key issue to be settled is the question of which outcome definitions to use.
The preferences have been defined at the criteria level, but most of the criteria can be measured by
two or three different outcomes (as shown in Figure 10).
For the clinical evidence synthesis, fixed mappings with one group for each criterion will be used (i.e.
three mapping groups). This means that whichever outcome is chosen to represent a criterion in the
BR assessment, information on the other outcomes for that criterion is also incorporated, and
treatment rankings for that criterion are always the same. Applying the mappings this way achieves
what I would argue is a reasonable degree of “smoothing” (i.e. strongly within criteria, but not
between criteria) and can be thought of as an insurance policy against selecting the “wrong”
outcome and hence overlooking key trends in another similar one. However, this strategy may not
be viable in all cases, as it depends on the pattern of missingness/patchiness in the evidence
synthesis data; sometimes different approaches to mapping will be necessary. Moreover, this
mapping strategy does not completely address the problem. The rankings within a criterion may be
unaffected by the choice of outcome, but the magnitude of (cardinal) differences between
treatments is still very much outcome-dependent. If we are to quantify the level of preference
associated with these differences, then care needs to be taken to ensure the preferences are
assigned to the same outcome as that to which the elicited preferences originally related (or as close
as possible).
It is of course impossible to know exactly what outcome definitions participants had in their heads
during an elicitation exercise, but it is usually possible to access the wording of the elicitation
questions together with any definitions, notes or glossaries that were provided. The suggested
strategy therefore is to use whatever outcome definition (out of those for which clinical outcome
data can be synthesised) seems to best fit the wording in the elicitation materials. Again, this will
require careful judgement on a case by case basis. Table 39 sets out the logic that was applied for
the criteria in the RRMS case study.
Table 39 – Identification of outcomes to which preferences relate for the criteria in the RRMS case study.

Criterion | Outcomes | Outcome to which preferences assumed to relate, and why
Relapse | Annual relapse rate (ARR); relapse-free proportion over 2 years (RFP) | ARR – this measure was implied by the elicitation questions in the PROTECT datasets and by the reporting in the external studies.
Disability progression | Proportion experiencing disability progression, confirmed 3 months later (DP3); proportion experiencing disability progression, confirmed 6 months later (DP6) | DP3 – this is the definition more commonly adopted within the source studies.
Liver enzyme elevation | Proportion with ALT above upper limit of normal range (ALT1); proportion with ALT above 3x upper limit of normal range (ALT3); proportion with ALT above 5x upper limit of normal range (ALT5) | ALT1 – the elicitation questions did not specify the level at which enzymes were considered to be "elevated", but arguably this is the most literal interpretation.
It seems clear however that the interpretation of outcome definitions (especially imprecise ones) by
elicitation participants may vary, and I would suggest that attaching alternative outcomes to the
preferences should be a key focus for sensitivity analyses, especially in cases where there is
significant doubt such as the liver enzyme elevation outcomes in the RRMS case study.
The full preference model with random preferences by study will be used, as per III.6.3.2.
Since the absolute level of clinical outcomes may vary considerably between populations, particular
attention should be paid to the data sources used for the population calibration module so that the
result of the decision is appropriate to the levels of outcomes observed in the target population. It
would also be sensible to take similar caution with the data sources used for preferences, which may
vary in the population at large.
Another important question for a probabilistic benefit-risk assessment is the level of predictive
variability that is to be included – in other words, whether the aim is to assess the distribution of the
benefit-risk balance in terms of population-level averages, study-level averages or with full
individual-level variability. Here all three levels will be presented. It seems sensible to always be
consistent and use the same level of predictive variability for both preferences and clinical
outcomes. The individual-level predictive distributions use approximate upper bounds on the
individual-level variability, as described in II.5.3 for the clinical evidence synthesis and III.6.2 for
preferences (method 1 for individual preferences will be used here).
Once the outcomes, models and level of predictive variability have been chosen, the calculations
required in order to assess the overall benefit-risk balance of the treatments are straightforward:
1. Pick out the utility coefficient parameters (with the selected level of predictive variability)
from the preference module. Only select those that relate to the benefit-risk criteria, not
any preference parameters for other criteria in the datasets.
2. Normalise the parameters from step 1, i.e. convert to weights
3. For each treatment and criterion, the weighted partial utility is calculated as the product of
a. the weight from step 2; and
b. the selected outcome for that criterion, on the absolute scale (from the population
calibration module), with the selected level of predictive variability.
4. For each treatment, the overall utility or benefit-risk score is the sum across all criteria of its
weighted partial utilities from step 3.
Chapter IV.2
257
The overall benefit-risk score is on an arbitrary utility scale (which may include negative utilities) as I
am not following the convention of restricting utility to the interval [0,1]. For further discussion of
the utility parameterisation see III.2.1.1.
These calculations can easily be incorporated within the model, and this is strongly recommended as
it allows the exact posterior distribution of every calculated quantity to be reported. Rankings
based on the benefit-risk score, and the associated SUCRA statistics120 (see II.5.4), can (and should)
also be calculated in situ. Recall however that SUCRAs give no information on the magnitude (and
hence significance) of the differences in score, and so posterior summaries of the benefit-risk score
should also be presented.
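A minimal sketch of computing SUCRA from posterior score draws (toy data for three hypothetical treatments; in the thesis this is done in situ in BUGS):

```python
import numpy as np

rng = np.random.default_rng(1)

# SUCRA = (a - mean rank) / (a - 1), where rank 1 is best among a treatments;
# 1.0 means always ranked first, 0.0 always last.
scores = rng.normal([[-0.05], [-0.10], [-0.20]], 0.05, size=(3, 20_000))

# Rank treatments within each posterior draw (higher score = rank 1).
ranks = (-scores).argsort(axis=0).argsort(axis=0) + 1
a = scores.shape[0]
sucra = (a - ranks.mean(axis=1)) / (a - 1)
print(np.round(sucra, 3))
```

Note that, as the text cautions, the SUCRA discards the magnitude of the score differences: two treatments can have very different SUCRAs while their scores overlap almost completely.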
IV.2 Results
The BUGS code and data used to generate these results are provided in Appendix B.
The code is generalised at the “within-dataset” level in that it is not tailored to the dimensions of the
clinical evidence dataset, the PROTECT choice dataset, the PROTECT investigator ratings dataset, the
PROTECT patient ratings dataset or the preference network meta-analysis dataset. It is not
generalised to any combination of datasets, as the number of datasets here does not warrant it and
it would hinder the comprehensibility of the code.
Computing 200,000 iterations (100,000 for burn-in and 100,000 for the posterior estimates) in
OpenBUGS (version 3.2.2 rev 1063) on a Microsoft Surface Book 2 (i5-8350U 1.70 GHz quad core)
running Windows 10 took 1008 seconds, or just under 17 minutes – longer than for either the clinical
evidence synthesis or preference models in isolation, but still not unreasonably long from a practical
perspective.
IV.2.1 Benefit-risk scores
Table 40 shows the posterior distribution of each treatment's population-average overall utility (i.e.
the benefit-risk score), broken down to show the contribution of each criterion. Scores for
relapse, disability progression and liver enzyme are relative to a notional perfect treatment on which
these outcomes never occur. Scores for administration are relative to the baseline category (daily
subcutaneous injection).
The posterior total utility or benefit-risk score for each treatment is shown in Figure 63 for all three
levels of predictive variability.
Table 40 – Benefit-risk score by treatment, with breakdown by criterion. Figures are population-average posterior means and (standard deviations).

Treatment | Relapse | Disability progression | Liver enzyme elevation | Administration | TOTAL
Placebo | -0.045 (0.030) | -0.075 (0.041) | -0.055 (0.028) | 0.111 (0.055) | -0.066 (0.056)
Dimethyl fumarate | -0.024 (0.016) | -0.053 (0.030) | -0.081 (0.043) | 0.111 (0.055) | -0.049 (0.069)
Fingolimod | -0.022 (0.015) | -0.057 (0.032) | -0.164 (0.082) | 0.111 (0.055) | -0.134 (0.103)
Glatiramer acetate | -0.029 (0.019) | -0.055 (0.031) | -0.061 (0.031) | 0 (0) | -0.146 (0.039)
Interferon beta-1a (intramuscular) | -0.035 (0.023) | -0.062 (0.034) | -0.092 (0.048) | 0.027 (0.020) | -0.162 (0.052)
Interferon beta-1a (subcutaneous) | -0.033 (0.022) | -0.049 (0.029) | -0.155 (0.082) | 0.027 (0.020) | -0.210 (0.076)
Interferon beta-1b | -0.027 (0.018) | -0.046 (0.027) | -0.147 (0.076) | 0.027 (0.020) | -0.194 (0.073)
Laquinimod | -0.036 (0.024) | -0.055 (0.031) | -0.106 (0.054) | 0.111 (0.055) | -0.088 (0.076)
Teriflunomide | -0.031 (0.021) | -0.057 (0.033) | -0.102 (0.054) | 0.111 (0.055) | -0.082 (0.077)
Dimethyl fumarate has the highest average benefit-risk score, followed by placebo and then
teriflunomide and laquinimod. The three interferon-based medicines have the lowest scores.
Figure 63 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability. The markers and lines indicate the mean and 95% credibility limits. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Note the long left tail in the predictive distributions (especially at the individual level). This is
because in the model the annual relapse rate outcome has a lower bound of zero but no upper
bound. In reality however an RRMS patient cannot experience an unlimited number of relapses in a
year and it may be sensible to apply a cap to the modelled rates. Since this study concerns first-line
treatments, in Figure 64 the annual relapse rate has been capped at 3 on the assumption that any
patients with more severe relapse rates would be eligible for more aggressive second-line therapies.
Figure 64 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability, with a maximum of 3 relapses per year. The markers and lines indicate the mean and 95% credibility limits. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
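The capping adjustment amounts to truncating the posterior predictive relapse-rate draws before scoring. A sketch with hypothetical draws (the lognormal parameters are illustrative, not taken from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical posterior predictive draws of a patient's annual relapse rate.
arr_draws = rng.lognormal(mean=-0.7, sigma=1.0, size=100_000)

# Cap at 3 relapses/year before computing partial utilities, as in Figure 64.
arr_capped = np.minimum(arr_draws, 3.0)

print(f"uncapped 97.5th percentile: {np.percentile(arr_draws, 97.5):.2f}")
print(f"capped   97.5th percentile: {np.percentile(arr_capped, 97.5):.2f}")
```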
The first point to note is that there is considerable overlap in the credibility intervals, so no
treatment can be declared an outright winner or loser. This is particularly true of the study- and
individual-level predictive distributions (which are naturally wider due to the additional variability)
but there is also overlap between the population-average distributions. This should not be too
surprising: all of these drugs have proven their clinical value in trials and are in use by real-world
patients, but as with all disease-modifying therapies they can only ever have a limited impact on
RRMS symptoms. Furthermore these treatments are known to be in fairly close clinical equipoise,
with the appropriate choice of treatment for a patient depending on their own individual experience
of benefits and harms93. This is particularly relevant when considering the overall benefit-risk
balance (rather than any individual outcome), as the drugs with the highest efficacy may not
perform as well on safety. Although dimethyl fumarate has the best average benefit-risk score, the
extent of the overlap means that for any given patient (or subgroup of patients), it is possible that
any one of the treatments in the case study is the optimal one. As such, this analysis does not
present any compelling evidence for regulators to remove any of these treatments from the market.
Another point of interest is the position of placebo, which perhaps surprisingly has the second-best
score on average. Bearing in mind again the extent of the overlap, this does not mean that the
treatments which score less are always worse than placebo for every patient; it may however
suggest that the average patient (at least, among those who took part in the preference elicitation
studies) places a high value on safety and administrative convenience (i.e. the criteria on which
placebo outperforms all other treatments).
IV.2.2 Rankings
Figure 65 shows the SUCRA statistic 120 based on the overall benefit-risk score rankings at three
levels of predictive variability, with the annual relapse rate capped at 3 (i.e. the same distributions
shown in Figure 64).
Figure 65 – SUCRA statistic for the overall benefit-risk score of the RRMS treatments at three levels of predictive variability. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
The SUCRAs reflect the same underlying distributions as the forest plots in Figure 64 but more
clearly emphasise the relative ranks of the treatments. However, they do not convey any
information about the difference in score between ranks, which may not correspond to a substantial
change in overall utility.
Preferences for administration modes appear to have been a key driver of the results, with the four
lowest-scoring treatments all being injectable and the five highest-scoring all orally delivered.
IV.2.3 Sensitivity analyses
IV.2.3.1 Impact of exclusion of criteria from decision model
The treatment scores and rankings are considerably altered if the decision is not based on the full set
of criteria. The figures below show the (population average) SUCRA statistic by treatment for three
models with restricted sets of criteria.
Figure 66 – SUCRA statistic by treatment based on population-average benefit-risk score; efficacy outcomes only. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Based on efficacy alone, the best-performing drug is interferon beta-1b, followed by dimethyl
fumarate, fingolimod and subcutaneous interferon beta-1a (Figure 66).
Figure 67 – SUCRA statistic by treatment based on population-average benefit-risk score; liver safety only. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
For liver safety alone, the best treatment is placebo, followed by glatiramer acetate, dimethyl
fumarate and intramuscular interferon beta-1a (Figure 67). The rankings are in fact almost reversed
from the efficacy-only results, except in the cases of dimethyl fumarate and glatiramer acetate,
revealing their strength as all-round performers.
Figure 68 – SUCRA statistic by treatment based on population-average benefit risk score; efficacy and liver safety outcomes (but not administration). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
If efficacy and safety are included but the administration mode criterion is not (Figure 68) then the
injectable treatments fare somewhat better in the rankings compared to the main results (Figure
65), particularly glatiramer acetate which edges into first place very slightly ahead of dimethyl
fumarate.
IV.2.3.2 Impact of choice of outcomes for weighting
The figures below show the (population average) SUCRA statistic by treatment when, for disability
and liver enzyme elevation, the outcome to which the preference weight is assumed to relate is
changed from the default set out in Table 39. For relapses the elicitation questions did not leave as
much room for ambiguity with regard to the outcome definition, so this criterion has not been
subject to the same kind of sensitivity analysis.
IV.2.3.2.1 Disability progression
Figure 69 - SUCRA statistic by treatment based on population-average benefit risk score; disability progression weight relates to disability progression events confirmed 6 months later (rather than 3 months later in the main results). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Changing the meaning of the disability progression preference strength so that it relates to a
progression confirmed 6 months later (Figure 69) rather than 3 months later (main results, Figure 65)
has scarcely any effect on the SUCRAs.
IV.2.3.2.2 Liver enzyme elevation
Figure 70 - SUCRA statistic by treatment based on population-average benefit risk score; liver enzyme elevation weight relates to alanine aminotransferase above 3x upper limit of normal range (rather than simply above upper limit of normal range as in the main results). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Changing the meaning of the liver enzyme elevation preference strength so that it relates to an
elevation of alanine aminotransferase above 3x the upper limit of the normal range (Figure 70)
rather than 1x the upper limit (main results, Figure 65) has the effect of reducing the overall weight
on liver safety outcomes (since the more serious enzyme elevation is rarer and the magnitude of
difference between the rates on different treatments is less). This reduces the overall score for
placebo and glatiramer acetate (the two safest treatments) while giving a boost to other active
treatments, especially fingolimod (the worst performer on liver safety).
Figure 71 - SUCRA statistic by treatment based on population-average benefit risk score; liver enzyme elevation weight relates to alanine aminotransferase above 5x upper limit of normal range (rather than simply above upper limit of normal range as in the main results). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Changing the meaning of the liver enzyme elevation preference strength so that it relates to an
elevation of alanine aminotransferase above 5x the upper limit of the normal range (Figure 71)
rather than 3x the upper limit (Figure 70) further boosts the overall rankings for fingolimod,
subcutaneous interferon beta-1a and interferon beta-1b, the three worst drugs for liver safety. This
time their improvement in the rankings comes not only at the expense of placebo and glatiramer
acetate but also the other active treatments. However, it is not enough to change the overall
success of dimethyl fumarate.
IV.2.3.3 Impact of exclusion of preference datasets
The figures below show the (population average) SUCRA statistic by treatment when each of the
preference datasets is excluded in turn.
Figure 72 - SUCRA statistic by treatment based on population-average benefit risk score; preferences from published studies excluded. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Excluding the published summary data from external preference elicitation studies resulted in less
weight being placed on safety and administration, severely diminishing placebo’s standing and
changing some of the rankings for the active drugs (Figure 72) compared to the main results (Figure
65), but still not changing dimethyl fumarate’s overall lead.
Figure 73 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT patient choice dataset excluded. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Removing the patient choice data from the preference model (Figure 73) had little impact on the
overall rankings compared to the main model (Figure 65).
Figure 74 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT ratings datasets excluded. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Removing the ratings data from the preference model (Figure 74) shifted the preference weights
towards favouring safety, putting placebo in the lead overall and giving a boost to glatiramer acetate
while most other active drugs fared less well than in the main results (Figure 65).
IV.3 Discussion
IV.3.1 Bayesian MCDA
As far as I know, this represents the first MCDA-based benefit-risk assessment in which every one of
the clinical outcome and preference parameters is inferred from real-world evidence in a single,
comprehensive, fully Bayesian model. The decision problem was somewhat simplified compared to
real benefit-risk problems, as I did not attempt to incorporate an exhaustive set of safety criteria in
order to prevent the model becoming too cluttered and cumbersome to make an effective example;
however I have shown in II.4.7 how additional safety outcomes could be included.
The model has been designed to be generalizable to other datasets of arbitrary dimensions with few
changes required. Coding complex multi-stage multivariate models such as this can be a daunting
and time-consuming task, and although one must be realistic regarding how “user friendly” MCMC
modelling can ever be, I hope that providing a template will facilitate and encourage further
applications.
IV.3.2 Benefit-risk assessment of RRMS treatments
It is important to bear in mind that this is not a comprehensive BR assessment due to the narrow
scope of safety outcomes included (although the results obtained for liver safety do in some respects
appear very similar to those in the Cochrane review based on treatment adherence 81, as noted in
II.6.5).
Nevertheless, it may be worthwhile to consider the clinical implications of the results. Of the RRMS
drugs investigated, dimethyl fumarate has the best posterior benefit-risk score, and is the only active
treatment outperforming placebo. This is because it has a large effect size (although not the largest)
on the efficacy outcomes and a small effect size (although not the smallest) on the liver safety
outcomes, as well as having the most favourable administration method (daily oral). Owing to this
good all-round performance, the high SUCRA value for dimethyl fumarate is for the most part very
insensitive to the choice of outcomes for weighting and the inclusion/exclusion of the various
criteria and preference datasets. However, if the weight on safety is sufficiently increased then it is
outperformed by placebo, and if the administration outcome is excluded it is outperformed by
glatiramer acetate. The rankings in general appear fairly robust to the sensitivity analyses, although
there are some changes, especially with regard to the ranking of the placebo option. If the model is
used for real-world regulatory purposes, a more thorough sensitivity analysis may be
worthwhile, subjecting the model inputs to further alternate scenarios and perhaps examining the
thresholds required to change the decision outcome.
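One simple form of such a threshold (or "tipping point") analysis is to hold the criterion values fixed and scale the weight on one criterion until the preferred option flips. The Python sketch below uses invented criterion values and weights purely to illustrate the mechanics; it is not the RRMS model:

```python
import numpy as np

# Sketch of a threshold sensitivity analysis: scale the weight on one
# criterion until the preferred option changes. All values here are
# invented for illustration, not the RRMS model outputs.
values = {
    "drug":    np.array([0.8, 0.4, 1.0]),   # efficacy, safety, administration
    "placebo": np.array([0.2, 0.9, 1.0]),   # each on a common 0-1 value scale
}
base_weights = np.array([0.5, 0.3, 0.2])     # hypothetical elicited weights

def flip_threshold(vals_a, vals_b, base_w, idx, mults):
    """Smallest multiplier on weight idx at which option b overtakes a."""
    for m in mults:
        w = base_w.copy()
        w[idx] *= m
        w = w / w.sum()                      # renormalise after perturbation
        if w @ vals_b > w @ vals_a:
            return float(m)
    return None

m_star = flip_threshold(values["drug"], values["placebo"],
                        base_weights, idx=1, mults=np.linspace(1, 10, 901))
```

Repeating this scan over each weight, and over posterior draws rather than point estimates, would indicate which preference parameters the decision outcome is most sensitive to.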
It should be recognised that the uncertainty of the benefit-risk score for each treatment is large in
relation to the differences between treatments. In other words, there is considerable overlap in the
distributions of the benefit-risk score and there are no universal winners and losers. This is
especially true when one allows for predictive variability at the study or patient level, and serves as a
reminder that patient outcomes and preferences do vary and there is no one-size-fits-all treatment.
Based on these results there is no reason to conclude that any of these treatments should be
withdrawn altogether. What these results do suggest is that the higher-ranking treatments (such as
dimethyl fumarate and teriflunomide) should perhaps be offered to patients in the first instance
(since on average they provide the highest utility) with lower-ranked treatments being kept as
reserve options. It should be borne in mind however that this analysis includes only clinical trial
data; when a treatment has been licensed for some time then a more complete picture of the real-
world safety and effectiveness might be obtained from post-marketing data (and indeed some RRMS
treatments have been withdrawn on the basis of such data81,229). Also, treatment safety and
administration may have significant impact on patient satisfaction and these potential impacts
should not be overlooked in decision making.
The figures in IV.2.3.1 are a good illustration of how drastically the conclusions change depending on
whether one considers efficacy or safety in isolation, or a combination of the two, or also in
combination with administration modes. This demonstrates the importance of a multi-criteria
decision-making approach that can incorporate all aspects of treatment that matter to patients.
V. Conclusions
The research question posed in Chapter I of this thesis was:
“Can a modelling framework be developed that facilitates a fully Bayesian implementation of MCDA
for benefit-risk decision making; with parameters for clinical outcomes and associated preferences
directly informed by real-world data, and reflecting the uncertainties inherent in such data, while
respecting all relevant correlations and consistency relations?”
I believe that the model developed for the RRMS case study provides an affirmative answer to this
question. This chapter aims to place this result in context and consider its implications for research
and practice in the field of medical decision making.
V.1 Summary of results
The bullet points under the headings below give a high-level summary of the results described
earlier in this thesis.
V.1.1 Bayesian synthesis of clinical evidence for benefit-risk assessment (Chapter II)
Chapter II demonstrated a number of key methodological results that together enable principled
multivariate evidence synthesis of clinical outcomes for benefit-risk assessment:
• Development of a multivariate network meta-analysis model with full allowance for within-
and between-study outcome correlations. The study-specific treatment effects are defined
relative to a universal fixed baseline treatment, avoiding a potential asymmetry issue with
the more common parameterisation that allows the baseline treatment to vary by study.
• Extension of the multivariate NMA model to incorporate between-outcome mappings at the
mean level, useful for patchy data networks (i.e. summary trial data with a high degree of
missingness and heterogeneity in the outcome definitions).
• A novel construction of the multivariate Normal distribution via decomposition of the
variance, useful for specifying the multivariate NMA model in arbitrary dimensions in the
BUGS language.
• An extension to the multivariate NMA model allowing the inclusion of outcomes that are
known (or assumed) never to occur on certain treatments.
• Application to a multivariate outcome synthesis for several relapsing-remitting multiple
sclerosis treatments, revealing how the treatment effects (relative to placebo) compare with
regard to relapse, disability progression and liver safety outcomes.
• Estimation of the absolute-level distribution of outcomes within trial populations for the
same set of RRMS treatments, resulting in synthesised outcomes suitable for decision-
making using MCDA. The synthesis provides not only population averages but also the
predictive distribution of outcomes at the study or patient level.
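The variance-decomposition idea behind the multivariate Normal construction listed above can be illustrated outside BUGS: a multivariate Normal of arbitrary dimension can be assembled from independent univariate Normals via a factorisation of the covariance matrix. The Python sketch below uses the Cholesky factorisation, which is one standard such decomposition; it is a general illustration only, not the thesis BUGS code (described in Chapter II):

```python
import numpy as np

# With Sigma = L L^T (Cholesky), x = mu + L z for independent z_i ~ N(0, 1)
# has exactly the distribution N(mu, Sigma). Illustrative values only.
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
L = np.linalg.cholesky(Sigma)

z = rng.standard_normal((100_000, 3))   # independent univariate draws
x = mu + z @ L.T                        # correlated multivariate draws

emp_cov = np.cov(x, rowvar=False)       # should approximate Sigma
```

Building the multivariate node from univariate pieces in this spirit is what allows the model dimension to be left as data rather than hard-coded.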
V.1.2 Bayesian multi-criteria utility modelling (Chapter III)
In Chapter III, the work done in unifying preference elicitation methods in a probabilistic manner
resulted in some important methodological insights. These included an elucidation of the network
structure of two commonly used preference elicitation methods based on pairwise criteria ratings
(AHP and swing weighting), and establishing a general framework within which they both represent
special cases.
The main methodological achievement in Chapter III was a unified parametric model, based on the
assumption of linear additive utility, for Bayesian analysis of elicited criteria ratings data and choice
data from individuals, Bayesian meta-analysis of summary preference data from published studies,
and aggregation of (and examination of heterogeneity among) these sources of preference data.
Chapter III also provided some results regarding real-world multiple sclerosis patient preferences via
an application of the model to several sources of data in various formats. Inference of multiple
sclerosis patient preferences for relapses, disability progression, liver enzyme elevation and
administration mode was performed, not just in terms of the population average but also allowing
for predictive variability at the study or participant level.
The criteria were ranked as follows, from highest to lowest weight: 1 liver enzyme elevation event, 1
disability progression event, oral vs injectable administration, 1 relapse per year.
There was notable heterogeneity in preferences between studies but not much evidence of within-
study heterogeneity.
V.1.3 Assessing the overall benefit-risk balance (Chapter IV)
Chapter IV combined the models developed in Chapters II and III to give a fully Bayesian evidence-
based multi-criteria decision analysis model for choosing between medical treatments. The model
was applied to a set of relapsing-remitting multiple sclerosis treatments, providing an assessment of
the overall benefit-risk balance.
Based on the outcomes and data that were included in the model, the treatment with the most
favourable overall balance was dimethyl fumarate, followed by teriflunomide. The administration
mode was highly influential on the results, with other oral treatments also performing well and
injectable treatments generally assessed as unfavourable. The rankings for dimethyl fumarate and
teriflunomide were fairly robust to sensitivity analyses on subjective aspects of the model structure.
While there were clear differences between the mean benefit-risk scores for the various treatments,
there was considerable overlap when allowing for uncertainty, especially if predictive variability at
the study and/or patient level was allowed for.
V.2 Strengths
The particular modelling approach used in this thesis has a number of key strengths that make it
well-suited to benefit-risk assessment and other medical decisions:
• The Bayesian MCMC environment facilitates simultaneous modelling of all variables so that
the uncertainty of the data is automatically propagated to the model outputs, an advantage
of the methodology that has been noted elsewhere 230.
• The models are motivated by, and constructed around, real-world examples where the
available data do not necessarily conform to ideal standards. In particular, the real-world
practicality of any MCDA-based benefit-risk model depends on its ability to analyse patchy
clinical evidence and incorporate preferences from diverse sources. This thesis was able to
address many of the key limitations of earlier work in this area 40,53.
• The models can accommodate many different kinds of clinical outcomes, correlation
structures and preference elicitation formats with few changes required to the BUGS code.
V.3 Limitations
The limitations of the models have already been discussed within the preceding chapters, but an
overview will be provided here.
Bayesian statistics is not the only paradigm for modelling uncertainty, and totally different
approaches to uncertainty in MCDA have been used elsewhere, such as fuzzy sets 231 or stochastic
multicriteria acceptability analysis (SMAA) 30. However I believe that in the context of medical
decision-making the Bayesian approach makes particular sense due to its compatibility with the way
clinical evidence is gathered and analysed, and its ability to reflect any evidence that is available and
fill in any gaps with prior assumptions.
I have not provided any models that incorporate individual patient data (IPD) in the clinical evidence
synthesis, largely owing to the lack of data in the public domain. Availability of IPD for real benefit-
risk assessments is improving, however. Manufacturers will usually have such data when carrying
out their own assessments, and they are obliged to provide the regulator with IPD from pivotal
clinical trials when submitting a drug licence application in the EU 232. IPD has advantages over
summary data in that it reveals the shape of the within-study distribution of outcomes and
treatment effects, giving additional insight into the probability of key clinical or decision
thresholds being reached at the individual patient level, and allowing examination of heterogeneity
in the patient population. In particular, the benefit-risk balance in stratified subgroups can be
examined, potentially allowing more clarity in how the new drug fits in with licensing needs and
prescribing guidelines for different classes of patients, and thus allowing more specific conclusions
and recommendations to be reached. The main disadvantage of IPD is that these additional analyses
may be much more labour-intensive compared to an analysis based on summary data. Insofar as
IPD from some studies need to be aggregated with summary data from others, evidence synthesis
methods such as those in this thesis may still be required, and in such cases the available IPD may
provide useful information for the model, particularly with regard to estimating the within-study
correlations. For these reasons I believe there is still value in the summary-data approach used in
this thesis, despite the increasing availability of IPD.

The mapping-based imputations in the
multivariate network meta-analysis model operate under the assumption that treatment effects in
the published study populations follow the same distribution as in the general target population.
This leaves the model open to the influence of publication bias if it turns out that any outcomes are
unreported due to poor treatment performance. While this is of course a simplifying assumption
that one must bear in mind when interpreting the results, it is not unique to this model - the same
assumption is implicitly made in any review of clinical evidence that does not control for publication
bias. Estimating the mechanism underlying outcome missingness from benefit-risk assessment
data is likely to be difficult (especially given the typical number of studies), but it may be useful to
test the sensitivity of the results to the possibility of publication bias, for example by imposing a
fixed penalty on estimated unreported outcome measures.
The mappings also rely on at least one outcome in each mapping group being present in all included
studies, thus limiting the patchiness of data that can be included and/or the number of groups that
can be used.
I have not made any assessment of the reliability of the studies included in the clinical evidence
synthesis, choosing instead to rely on the screening performed by the Cochrane reviewers 81.
Similarly, I have not reviewed the literature for any trials carried out since the publication of the
Cochrane review, nor sought out any post-marketing surveillance data.
I have not set out any formal methods for evaluating inconsistency in the evidence (with regard to
treatment effects, mappings or preference ratios), despite the importance of the assumption of
consistency that underlies the models. Nor have I undertaken any formal assessment of
inconsistency in the RRMS case study. Evaluating inconsistency is an area that has received much
attention with regard to univariate network meta-analysis, and existing approaches could be
extended to the models developed here.
The preference modelling here was limited to linear-additive MCDA, but in principle the model could
easily be extended to other forms of additive utility function. That is, the utility function could take
the form
$$U = \sum_{\omega=1}^{\Omega} \mathrm{weight}_{\omega}\, \mathrm{PVF}_{\omega}(x_{\omega})$$

where $\mathrm{PVF}_{\omega}$ is any pre-specified monotonic partial value function for criterion $\omega$ and $\Omega$ is the number of criteria. This is simply a
transformation of the data and requires no significant alteration of the methodology. The utility
coefficients would be estimated as in the linear case, but they would be coefficients of 𝑃𝑉𝐹𝜔(𝑥𝜔)
rather than 𝑥𝜔. (This assumes the source data can be expressed on this transformed scale – if linear
coefficients are quoted with no reference to the underlying points used for elicitation, this may not
be possible). Going beyond the additive model and allowing utility functions with interaction terms
is likely to be more challenging unless all contributing data sources use the same model. 130
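The additive form above can be sketched concretely. In the Python snippet below the partial value functions and weights are invented for illustration; in practice the PVFs would be pre-specified monotonic functions elicited on each criterion's scale:

```python
import numpy as np

# Sketch of additive utility with non-linear partial value functions.
# The PVFs and weights are illustrative assumptions, not elicited values.
pvfs = [
    lambda x: 1 - x,            # linear (e.g. fewer relapses is better)
    lambda x: np.exp(-3 * x),   # assumed diminishing marginal disutility
    lambda x: 1 - x ** 2,       # another assumed monotonic shape
]
weights = np.array([0.2, 0.5, 0.3])   # hypothetical normalised weights

def utility(outcomes):
    """Linear-additive utility over transformed criterion values."""
    vals = np.array([f(x) for f, x in zip(pvfs, outcomes)])
    return float(weights @ vals)

u = utility([0.3, 0.1, 0.2])   # outcomes on each criterion's own scale
```

As noted above, the estimation machinery is unchanged: the coefficients are simply fitted to the transformed values rather than to the raw outcomes.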
With regard to the use of elicited preference data, I have not been concerned in this thesis with
issues of elicitation study design, choice of method, or whose preferences to use (except insofar as
these directly relate to the statistical analysis). These issues have been discussed elsewhere 82,233.
I have barely scratched the surface in terms of examining preferences among groups of individuals
and the philosophy of utilitarian decision making at the group level. There is a substantial literature
on these topics in terms of both theory 184,185,234 and application 235-238, and the citations given here
are just a few examples out of many.
V.4 Reflections on generalisability & applicability
These models, although developed to be generalisable, require further testing, as they have so far
been applied almost exclusively to a single problem (the RRMS case study). The only exception is the
preference meta-analysis model (III.5), which is currently also being tested on a small type II diabetes
dataset, where the initial results suggest it fits poorly. This may be indicative of the poor quality of
that dataset rather than any unsuitability of the model, but it emphasises the need for additional
test applications.
The aim was to code the models in a manner that would work on arbitrary datasets with no hard
coded parameters or dimensions. The individual models for each specific data type largely met this
requirement, but the overall model did not, due to the sheer multiplicity of combined data
structures that could in theory be incorporated. However, the model remains easily adaptable.
The use of elicited preferences in formal models is sometimes seen as controversial due to their
subjective nature, and some have raised concerns that a high degree of heterogeneity should be
expected 239. These concerns may be well-founded, or not – the only way to know for sure is to
actually try modelling the preferences arising from different elicitation studies, and the methods
developed here are well suited to this task. As long as the heterogeneity of preferences is within
reasonable limits (or homogeneous subgroups can be identified) then there would seem to be some
merit in the MCDA approach and the model can provide estimates of the necessary parameters.
Even in the absence of extensive data on heterogeneity, the model could help by simulating the
predictive distribution of the benefit-risk balance under various scenarios – for example, one could
answer questions such as “how much between-study preference heterogeneity is required to make
treatment X more favoured than treatment Y at least 50% of the time?”
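Scenario questions of this kind can be answered by simple Monte Carlo simulation over the predictive distribution. In the Python sketch below, the criterion values, mean weight and logit-scale heterogeneity model are all invented for illustration and are not the RRMS estimates:

```python
import numpy as np

rng = np.random.default_rng(42)

vals_x = np.array([0.9, 0.2])   # hypothetical criterion values, treatment X
vals_y = np.array([0.5, 0.6])   # hypothetical criterion values, treatment Y
mean_w = 0.4                    # assumed mean weight on criterion 1

def prob_x_preferred(tau, n=50_000):
    """P(X scores higher than Y) when the weight on criterion 1 varies
    between studies with standard deviation tau on the logit scale."""
    logit_mean = np.log(mean_w / (1 - mean_w))
    w1 = 1 / (1 + np.exp(-rng.normal(logit_mean, tau, n)))
    w = np.column_stack([w1, 1 - w1])
    return float(np.mean(w @ vals_x > w @ vals_y))

for tau in [0.0, 0.5, 1.0, 2.0]:
    print(f"tau = {tau:.1f}: P(X preferred) = {prob_x_preferred(tau):.3f}")
```

In this toy configuration the probability climbs with tau but plateaus below 50% because the median study still favours Y, illustrating that the simulation can also reveal when no amount of between-study heterogeneity would change the answer.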
With regard to the RRMS case study in particular, the preferences have a substantial amount of
between-study heterogeneity, but I do not believe that this is sufficient grounds to call the whole
modelling approach into question. Indeed, it may actually be an argument in favour of Bayesian
preference estimation. When heterogeneity is present it does of course increase the variability of
the benefit-risk balance and therefore make the task of the decision maker more complex, but the
advantage of Bayesian MCMC is that it is straightforward to incorporate this variability in the results
via the predictive distribution. And, of course, using a Bayesian MCDA model does not create this
heterogeneity, it merely reveals it. Using a simpler decision making approach might obscure this
aspect of the problem altogether and lead to poor decisions.
The validity of the assumption of an additive linear utility function (i.e. mutual utility independence
of the decision criteria) is not guaranteed. Elicitation studies can be designed to test this assumption
by including interaction coefficients in the utility model and testing their significance. Some have
argued that additive linearity is usually a reasonable assumption 180 whilst others have argued for a
more cautious approach, recommending that criteria are tested for any violations of this
assumption and the impact assessed40.
Another important structural assumption is that of proportionality between outcomes, which
underlies the mappings in the clinical evidence synthesis. This seems a reasonable starting point but
other non-proportional links between outcomes could instead be used if evidence or logic suggests
this would be more appropriate. Developing a test of the proportionality in the data may be of use
here.
V.5 Contribution to the field
This work has made several contributions to the state of the art in evidence synthesis, benefit-risk
assessment and preference modelling.
I believe the clinical evidence synthesis model represents the most powerful and flexible
multivariate NMA yet published, with full rigour when it comes to correlations. Furthermore, the
mapping-based approach to outcome imputation adds strength and helps to fill in patchy data
networks, an important development that should facilitate the application of multivariate NMA to
real-world situations where perfect data structures may not be available. I sometimes refer to this
methodology as “patchwork meta-analysis” since it uses meta-analytical techniques to patch
together fragments of data that might at first seem incompatible. Generalised model code has been
provided in Appendix B with a view to aiding future applications, and I am exploring possibilities for
hosting the code files online.
Several important advances have also been made with regard to the probabilistic modelling of
preferences for medical multi-criteria decisions. Firstly, the features of various preference elicitation
methodologies that relate to their statistical properties have been elucidated (such as whether the
data consists of ratings or rankings, the role of the network structure, and the use of substitution or
agglomeration to move through the network). Secondly, statistical models have been identified or
developed for the analysis of the most common types of individual-level elicitation data. Thirdly, a
unified Bayesian parametric framework has been proposed that can make joint inferences using
several such models in combination. Finally, a meta-analytical framework for comparison and
aggregation of summary preference data from previously published studies has been developed. All
of these novel approaches have been demonstrated using real-world preference data for RRMS
treatments.
In isolation, these are significant contributions to the field of probabilistic multi-criteria medical
decision making. Taken together, they demonstrate the feasibility and power of a holistic evidence-
based Bayesian modelling approach, and an important step forward in terms of the statistical rigour
and sophistication that can be applied to such problems.
Various other Bayesian models involving elicited preferences or utilities have appeared in the
literature, sometimes in the benefit-risk context. This work, however, stands out in terms of the rigour
with which the uncertainty is characterised and the range of data types that can be accommodated.
An alternative Bayesian model for uncertainty on elicited weights treats the preference weights
themselves as observations drawn from exchangeable distributions 178 but, unlike my model (where
the observations are the original ratings or choices) this approach does not account for asymmetries
that may result from the elicitation method. Others have also used a Bayesian analysis of choice
data in order to elicit preferences for benefit-risk assessment 240. However, this thesis goes further
by showing that such an analysis can be combined with other types of elicitation data using a
common parameterisation. Non-Bayesian probabilistic MCDA benefit-risk models have also
appeared: one such model obtained the uncertainty on weights using a bootstrap technique33, but
this procedure cannot be incorporated in a single-step Bayesian MCMC model and would not extend
easily to aggregation of multiple somewhat heterogeneous preference datasets. Another approach
is to elicit the uncertainty level alongside the central preference estimates27 but this requires
specially designed elicitation studies (and may not be compatible with many elicitation methods).
The Bayesian approach stands out for its ability to account for uncertainty in a principled manner in
a variety of data structures based on standard elicitation methods.
Bayesian MCMC modelling is a specialist field, and hence any one of the models discussed in this
thesis may be demanding to apply in practice. However, in pivotal benefit-risk assessments, many
drug manufacturers may already be using some form of MCDA, network meta-analysis and/or
Bayesian MCMC modelling to support their applications to regulators or for internal decision-
making. Given a familiarity with these methods, it should be quite feasible for some real-world
decision makers to combine them into an integrated model along the lines I have laid out here.
V.6 Future research priorities
Applying these methods to further datasets is naturally of key importance. Convergence of MCMC
algorithms may vary between datasets241, and the validity of the model assumptions cannot be
taken for granted.
In terms of further development of the model there are some obvious extensions that could be
attempted, including:
• Incorporating other types of outcome such as multinomial or time-to-event variables in the
clinical evidence synthesis.
• Altering the way the mappings are applied so that there is no single “baseline” outcome in
each mapping group that must be present in every study.
• Extending the preference models to allow for correlations between criteria preference
strengths – these correlations can be incorporated easily using the coding technique
described in II.4.4 but were omitted here for the sake of simplicity.
• Incorporating additional preference data formats such as absolute ratings of multi-criteria
scenarios (also known as conjoint analysis) or ordinal pairwise criteria ratings.
• Providing measures of the consistency of the treatment effects (where the evidence
network contains loops), outcome mappings (i.e. whether the proportionality assumption
holds) and preferences from different sources.
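The first of these extensions, correlation between criteria preference strengths, can be illustrated with a minimal sketch: place a multivariate normal on the log preference strengths and normalise each draw to the simplex so the draws can be used directly as MCDA weights. The means and covariance values below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 5_000

# Hypothetical posterior for three criteria preference strengths on the
# log scale, with a positive correlation between the first two criteria.
# Both the mean vector and the covariance matrix are invented.
mu = np.array([1.0, 0.5, 0.0])
cov = np.array([[0.20, 0.10, 0.00],
                [0.10, 0.20, 0.00],
                [0.00, 0.00, 0.20]])
log_strength = rng.multivariate_normal(mu, cov, size=n_draws)

# Normalise each draw to the simplex (softmax) to obtain MCDA weights.
strength = np.exp(log_strength)
weights = strength / strength.sum(axis=1, keepdims=True)

# Examine the correlation induced between the first two weights.
# Note the sum-to-one constraint itself induces dependence, so the
# weight-scale correlation differs from the log-scale correlation.
corr = np.corrcoef(weights[:, 0], weights[:, 1])[0, 1]
print(f"corr(w1, w2) = {corr:.2f}")
```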
Other targets for future research include:
• Simulation studies to examine the models’ performance more systematically across a range
of possible data structures.
• A more complete benefit-risk assessment of RRMS treatments, with a set of criteria that
better reflects the safety profile of the drugs than the simplified case study adopted here.
• Checking whether the assumption of preference independence holds in real patient
populations.
• Further examination of the homogeneity/heterogeneity of preferences in the patient
population, and the implications for MCDA-based benefit-risk assessment. The picture may
be different across different disease areas. Investigating subgroups (using latent class
analysis, for example228,242-244) may reveal predictable structures underlying any
heterogeneity.
• Clarification of best practice for elicitation, including consideration of how the framing of
elicitation questions can influence the results40 and how this can best be accounted for in
both study design and interpretation.
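The latent class analysis mentioned above can be sketched with a simple two-component Gaussian mixture fitted by expectation-maximisation to elicited weights for a single criterion. The data and all parameter values are simulated purely for illustration; a real analysis would model the full weight vectors and select the number of classes formally.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated elicited weights for one criterion in a heterogeneous
# population: two latent classes with different preference strengths.
# All numbers are invented for this illustration.
x = np.concatenate([rng.normal(0.2, 0.05, 300),   # class 1
                    rng.normal(0.6, 0.05, 200)])  # class 2

# Two-component Gaussian mixture fitted by EM.
prop = np.array([0.5, 0.5])   # mixture proportions
mu = np.array([0.1, 0.9])     # class means (initial guesses)
sd = np.array([0.2, 0.2])     # class standard deviations
for _ in range(200):
    # E-step: responsibility of each class for each observation
    dens = prop * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
           / (sd * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update proportions, means and standard deviations
    nk = resp.sum(axis=0)
    prop = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("class proportions:", np.round(prop, 2))
print("class means:", np.round(mu, 2))
```

If such structure is recovered in real preference data, the MCDA could report benefit-risk conclusions per class rather than averaging over a heterogeneous population.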
It will also be important to raise awareness of these methods among decision makers such
as pharmaceutical companies and regulators. Alongside this, however, it may be necessary to
develop methods that simplify implementation of the models. Although I have made efforts to keep
the models as generalisable and user-friendly as reasonably possible, it must be recognised that
running such complex models in current Gibbs sampling software is a task that inevitably requires
some expertise. Ultimately, if the approach’s worth can be proved through further case examples
and examination of the above issues, then it may be sensible to try to simplify implementation by
developing automated software routines (such as an R package, or a standalone program).
However, this is far from a trivial task given the unpredictability of MCMC convergence241.
Chapter V.7
V.7 Concluding summary
Assessment of the benefit-risk balance of treatments, and medical decision-making in general, can
be put on a more formal footing using multi-criteria decision analysis with explicit value judgements.
However, this method is largely unfamiliar in the health sciences and its reliability and technical
capabilities have not been properly evaluated.
This thesis shows that a Bayesian MCMC approach can successfully address many of the
technical challenges involved in jointly modelling clinical and preference variables, and provides a
framework for constructing fully probabilistic MCDA models for comparing treatments in terms of
several conflicting clinical outcomes. It also provides an illustration, via the RRMS case study, that
preferences arising from multiple study populations and elicitation methods can be combined in a
coherent model. These are significant steps forward in terms of MCDA modelling in healthcare,
since evidence-based medicine should include evidence from all reliable sources.
Nevertheless, the reliability and practicality of using this approach for decision-making requires
further research, with key priorities including further investigation of the distribution of preferences
within patient populations, establishment of reliable elicitation practices, and easing the
implementation of Bayesian MCMC simulation.
References
1. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ. 1996;312(7023):71-72.
2. EMA. ICH guideline E2C (R2) on periodic benefit-risk evaluation report (PBRER). 2012. 3. Mackay FJ. Post-marketing studies: the work of the Drug Safety Research Unit. Drug safety.
1998;19(5):343-353. 4. Levine MN, Julian JA. Registries That Show Efficacy: Good, but Not Good Enough. J Clin
Oncol. 2008;26(33):5316-5319. 5. Hughes D, Waddingham E, Mt‐Isa S, et al. Recommendations for benefit–risk assessment
methodologies and visual representations. Pharmacoepidemiology and Drug Safety. 2016;25(3):251-262.
6. Hughes DA, Bayoumi AM, Pirmohamed M. Current assessment of risk-benefit by regulators: is it time to introduce decision analyses? Clin Pharmacol Ther. 2007;82(2):123-127.
7. Bridges JFP, Hauber AB, Marshall D, et al. Conjoint Analysis Applications in Health—a Checklist: A Report of the ISPOR Good Research Practices for Conjoint Analysis Task Force. Value in Health. 2011;14(4):403-413.
8. Marshall D, Bridges JFP, Hauber B, et al. Conjoint Analysis Applications in Health — How are Studies being Designed and Reported? The Patient: Patient-Centered Outcomes Research. 2010;3(4):249-256.
9. EMA. Benefit-risk methodology. 2009; https://www.ema.europa.eu/en/about-us/support-research/benefit-risk-methodology. Accessed 05/05/2019, 2019.
10. PROTECT. About PROTECT. 2009; http://www.imi-protect.eu/about.shtml. Accessed 16/01/2019, 2019.
11. Coplan P, Noel R, Levitan B, Ferguson J, Mussen F. Development of a Framework for Enhancing the Transparency, Reproducibility and Communication of the Benefit–Risk Balance of Medicines. Clinical Pharmacology & Therapeutics. 2011;89(2):312-315.
12. Thokala P, Devlin N, Marsh K, et al. Multiple Criteria Decision Analysis for Health Care Decision Making-An Introduction: Report 1 of the ISPOR MCDA Emerging Good Practices Task Force. Value in Health. 2016;19(1):1-13.
13. Hammond JS, Keeney RL, Raiffa H. Smart choices: A practical guide to making better decisions. Harvard Business Review Press; 2015.
14. Mt-Isa S, Hallgreen CE, Wang N, et al. Balancing benefit and risk of medicines: a systematic review and classification of available methodologies. Pharmacoepidemiology and drug safety. 2014;23(7):667-678.
15. Ho MP, Gonzalez JM, Lerner HP, et al. Incorporating patient-preference evidence into regulatory decision making. Surgical endoscopy. 2015;29(10):2984-2993.
16. Sutton AJ, Cooper NJ, Abrams KR, Lambert PC, Jones DR. A Bayesian approach to evaluating net clinical benefit allowed for parameter uncertainty. Journal of clinical epidemiology. 2005;58(1):26-40.
17. Broekhuizen H, Groothuis-Oudshoorn CGM, van Til JA, Hummel JM, Ijzerman MJ. A Review and Classification of Approaches for Dealing with Uncertainty in Multi-Criteria Decision Analysis for Healthcare Decisions. Pharmacoeconomics. 2015;33(5):445-455.
18. Durbach IN, Stewart TJ. Modeling uncertainty in multi-criteria decision analysis. European Journal of Operational Research. 2012;223(1):1-14.
19. Wen S, Zhang L, Yang B. Two approaches to incorporate clinical data uncertainty into multiple criteria decision analysis for benefit-risk assessment of medicinal products. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2014;17(5):619-628.
20. Mosadeghi R, Warnken J, Tomlinson R, Mirfenderesk H. Uncertainty analysis in the application of multi-criteria decision-making methods in Australian strategic environmental decisions. Journal of Environmental Planning and Management. 2013;56(8):1097-1124.
21. Kangas AS, Kangas J. Probability, possibility and evidence: approaches to consider risk and uncertainty in forestry decision analysis. Forest Policy and Economics. 2004;6(2):169-188.
22. Stewart TJ, Durbach I. Dealing with uncertainties in MCDA. In: International Series in Operations Research and Management Science. Vol 233. 2016:467-496.
23. Svecova L, Fotr J, Vrbova L. A MULTI-CRITERIA EVALUATION OF ALTERNATIVES UNDER RISK. 6th International Days of Statistics and Economics. 2012:1090-1100.
24. Basak I. Probabilistic judgments specified partially in the Analytic Hierarchy Process. European Journal of Operational Research. 1998;108(1):153-164.
25. Bech M, Gyrd-Hansen D, Kjær T, Lauriden J, Sørensen J. Graded pairs comparison - Does strength of preference matter? Analysis of preferences for specialised nurse home visits for pain management. Health Economics. 2007;16(5):513-529.
26. Dekker T, Hess S, Brouwer R, Hofkes M. Decision uncertainty in multi-attribute stated preference studies. Resource and Energy Economics. 2016;43:57-73.
27. Jessop A. Using imprecise estimates for weights. Journal of the Operational Research Society. 2011;62(6):1048-1055.
28. Voltaire L, Pirrone C, Bailly D. Dealing with preference uncertainty in contingent willingness to pay for a nature protection program: A new approach. Ecological Economics. 2013;88:76-85.
29. Nixon R, Dierig C, Mt-Isa S, et al. A case study using the PrOACT-URL and BRAT frameworks for structured benefit risk assessment. Biometrical Journal. 2016;58(1):8-27.
30. Lahdelma R, Hokkanen J, Salminen P. SMAA - Stochastic multiobjective acceptability analysis. European Journal of Operational Research. 1998;106(1):137-143.
31. Tervonen T, van Valkenhoef G, Buskens E, Hillege HL, Postmus D. A stochastic multicriteria model for evidence-based decision making in drug benefit-risk analysis. Statistics in medicine. 2011;30(12):1419-1428.
32. van Valkenhoef G, Tervonen T, Zhao J, de Brock B, Hillege HL, Postmus D. Multicriteria benefit-risk assessment using network meta-analysis. Journal of clinical epidemiology. 2012;65(4):394-403.
33. Broekhuizen H, Groothuis-Oudshoorn CG, Hauber AB, Jansen JP, MJ IJ. Estimating the value of medical treatments to patients using probabilistic multi criteria decision analysis. BMC medical informatics and decision making. 2015;15:102.
34. Smith JQ. Bayesian decision analysis: Principles and practice. 2010. 35. Ashby D. Bayesian statistics in medicine: a 25 year review. Statistics in medicine.
2006;25(21):3589-3631. 36. Harrell FE, Shih YCT. Using full probability models to compute probabilities of actual interest
to decision makers. International Journal of Technology Assessment in Health Care. 2001;17(1):17-26.
37. Stangl DK. Bridging the gap between statistical analysis and decision making in public health research. Statistics in Medicine 2005; 24:503-511.
38. Ashby D, Smith AFM. Evidence-based medicine as Bayesian decision-making. Statistics in Medicine. 2000;19(23):3291-3305.
39. Costa MJ, He W, Jemiai Y, Zhao Y, Di Casoli C. The Case for a Bayesian Approach to Benefit-Risk Assessment: Overview and Future Directions. Therapeutic Innovation & Regulatory Science. 2017;51(5):568-574.
40. Garcia-Hernandez A. A Note on the Validity and Reliability of Multi-Criteria Decision Analysis for the Benefit-Risk Assessment of Medicines. Drug safety. 2015;38(11):1049-1057.
41. Muehlbacher AC. Patient-centric HTA: different strokes for different folks. Expert Review of Pharmacoeconomics & Outcomes Research. 2015;15(4):591-597.
42. Umar N, Schaarschmidt M, Schmieder A, Peitsch WK, Schoellgen I, Terris DD. Matching physicians' treatment recommendations to patients' treatment preferences is associated with improvement in treatment satisfaction. Journal of the European Academy of Dermatology and Venereology. 2013;27(6):763-770.
43. Efthimiou O, Mavridis D, Cipriani A, Leucht S, Bagos P, Salanti G. An approach for modelling multiple correlated outcomes in a network of interventions using odds ratios. Statistics in medicine. 2014;33(13):2275-2287.
44. Efthimiou O, Mavridis D, Riley RD, Cipriani A, Salanti G. Joint synthesis of multiple correlated outcomes in networks of interventions. Biostatistics (Oxford, England). 2015;16(1):84-97.
45. Hong H, Carlin BP, Shamliyan TA, et al. Comparing Bayesian and Frequentist Approaches for Multiple Outcome Mixed Treatment Comparisons. Medical Decision Making. 2013;33(5):702-714.
46. Ades AE, Lu G, Dias S, Mayo-Wilson E, Kounali D. Simultaneous synthesis of treatment effects and mapping to a common scale: an alternative to standardisation. Research synthesis methods. 2015;6(1):96-107.
47. Lu G, Kounali D, Ades AE. Simultaneous Multioutcome Synthesis and Mapping of Treatment Effects to a Common Scale. Value in Health. 2014;17(2):280-287.
48. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing. 2000;10(4):325-337.
49. CIOMS. Benefit-Risk Balance for Marketed Drugs: Evaluating Safety Signals, Report of CIOMS Working Group IV. In: Chemistry International -- Newsmagazine for IUPAC. Vol 21. 1999:48.
50. Mussen F, Salek S, Walker S. A quantitative approach to benefit-risk assessment of medicines - part 1: the development of a new model using multi-criteria decision analysis. Pharmacoepidemiology and drug safety. 2007;16 Suppl 1:S2-s15.
51. Caster O, Noren GN, Ekenberg L, Edwards IR. Quantitative benefit-risk assessment using only qualitative information on utilities. Medical decision making : an international journal of the Society for Medical Decision Making. 2012;32(6):E1-15.
52. Tervonen T. JSMAA: open source software for SMAA computations. International Journal of Systems Science. 2014;45(1):69-81.
53. Waddingham E, Mt-Isa S, Nixon R, Ashby D. A Bayesian approach to probabilistic sensitivity analysis in structured benefit-risk assessment. Biometrical Journal. 2016;58(1):28-42.
54. Tervonen T, Naci H, van Valkenhoef G, et al. Applying Multiple Criteria Decision Analysis to Comparative Benefit-Risk Assessment: Choosing among Statins in Primary Prevention. Medical Decision Making. 2015;35(7):859-871.
55. Marsh K, Ijzerman M, Thokala P, et al. Multiple Criteria Decision Analysis for Health Care Decision Making—Emerging Good Practices: Report 2 of the ISPOR MCDA Emerging Good Practices Task Force. Value in Health. 2016;19(2):125-137.
56. Glass GV. Primary, Secondary, and Meta-Analysis of Research. Educational Researcher. 1976;5(10):3-8.
57. Pearson K. Report on Certain Enteric Fever Inoculation Statistics. British Medical Journal. 1904;2(2288):1243-1246.
58. Ades AE. A chain of evidence with mixed comparisons: models for multi-parameter synthesis and consistency of evidence. Statistics in Medicine. 2003;22(19):2995-3016.
59. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. Journal of clinical epidemiology. 1997;50(6):683-691.
60. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine. 2004;23(20):3105-3124.
61. Lee AW. Review of mixed treatment comparisons in published systematic reviews shows marked increase since 2009. Journal of clinical epidemiology. 2014;67(2):138-143.
62. Jansen JP, Fleurence R, Devine B, et al. Interpreting Indirect Treatment Comparisons and Network Meta-Analysis for Health-Care Decision Making: Report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: Part 1. Value in Health. 2011;14(4):417-428.
63. Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 2: A general linear modelling framework for pair-wise and network meta-analysis of randomised controlled trials; 2011 (updated 2016) http://nicedsu.org.uk/wp-content/uploads/2017/05/TSD2-General-meta-analysis-corrected-2Sep2016v2.pdf. Accessed: 18/02/2020.
64. Salanti G, Kavvoura FK, Ioannidis JPA. Exploring the Geometry of Treatment Networks. Annals of Internal Medicine. 2008;148(7):544-553.
65. Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Medical decision making : an international journal of the Society for Medical Decision Making. 2013;33(5):641-656.
66. Lu G, Ades AE. Assessing Evidence Inconsistency in Mixed Treatment Comparisons. Journal of the American Statistical Association. 2006;101(474):447-459.
67. Ades AE, Sculpher M, Sutton A, et al. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics. 2006;24(1):1-19.
68. Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2009;172(4):789-811.
69. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Statistics in Medicine. 2010;29(7‐8):932-944.
70. Lu G, Welton NJ, Higgins JPT, White IR, Ades AE. Linear inference for mixed treatment comparison meta-analysis: A two-stage approach. Research synthesis methods. 2011;2(1):43-60.
71. Caster O, Edwards IR. Quantitative benefit-risk assessment of methylprednisolone in multiple sclerosis relapses. BMC neurology. 2015;15:206.
72. Juhaeri J, Amzal B, Chan E, et al. Wave 2 Case Study Report: Rimonabant. 2012. 73. Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis:
a comparative study. Statistics in medicine. 1995;14(24):2685-2699. 74. Warn DE, Thompson SG, Spiegelhalter DJ. Bayesian random effects meta-analysis of trials
with binary outcomes: methods for the absolute risk difference and relative risk scales. Statistics in Medicine. 2002;21(11):1601-1623.
75. Bucher HC, Griffith L, Guyatt GH, Opravil M. Meta-analysis of prophylactic treatments against Pneumocystis carinii pneumonia and toxoplasma encephalitis in HIV-infected patients. J Acquir Immune Defic Syndr Hum Retrovirol. 1997;15(2):104-114.
76. Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Statistics in medicine. 1996;15(24):2733-2749.
77. Lumley T. Network meta-analysis for indirect treatment comparisons. Statistics in medicine. 2002;21(16):2313-2324.
78. Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence Synthesis for Decision Making 2: A Generalized Linear Modeling Framework for Pairwise and Network Meta-analysis of Randomized Controlled Trials. Medical Decision Making. 2013;33(5):607-617.
79. Lu G, Ades A. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics. 2009;10(4):792-805.
80. Welton NJ, Caldwell DM, Adamopoulos E, Vedhara K. Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. Am J Epidemiol. 2009;169(9):1158-1165.
81. Tramacere I, Filippini G, Del Giovane C, et al. Immunomodulators and immunosuppressants for multiple sclerosis: a network meta-analysis. The Cochrane database of systematic reviews. 2013(6):Cd008933.
82. NICE. Guide to the methods of technology appraisal; 2013. https://www.nice.org.uk/process/pmg9/resources/guide-to-the-methods-of-technologyappraisal-2013-pdf-2007975843781 Accessed: 29/07/2019.
83. Bujkiewicz S, Thompson JR, Sutton AJ, et al. Multivariate meta-analysis of mixed outcomes: a Bayesian approach. Statistics in medicine. 2013;32(22):3926-3943.
84. Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Statistics in medicine. 2011;30(20):2481-2498.
85. Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Statistics in medicine. 2007;26(1):78-97.
86. Madan J, Chen Y-F, Aveyard P, et al. Synthesis of evidence on heterogeneous interventions with multiple outcomes recorded over multiple follow-up times reported inconsistently: a smoking cessation case-study. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2014;177(1):295-314.
87. Welton NJ, Cooper NJ, Ades AE, Lu G, Sutton AJ. Mixed treatment comparison with multiple outcomes reported inconsistently across trials: Evaluation of antivirals for treatment of influenza A and B. Statistics in Medicine. 2008;27(27):5620-5639.
88. Pedder H, Dias S, Bennetts M, Boucher M, Welton NJ. Modelling time-course relationships with multiple treatments: Model-based network meta-analysis for continuous summary outcomes. Research synthesis methods. 2019;10(2):267-286.
89. Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine. 1997; Sep 15;16(17):1965-82.
90. Dias S, Ades AE, Welton NJ, Jansen JP, Sutton AJ. Network Meta-Analysis for Decision-Making. Wiley; 2018.
91. Welton NJ, Sutton AJ, Cooper NJ, Abrams KR, Ades AE. Evidence Synthesis for Decision Making in Healthcare. 2012.
92. Rosati G. The prevalence of multiple sclerosis in the world: an update. Neurol Sci. 2001;22(2):117-139.
93. Gajofatto A, Benedetti MD. Treatment strategies for multiple sclerosis: When to start, when to change, when to stop? World Journal of Clinical Cases : WJCC. 2015;3(7):545-555.
94. Bornstein MB, Miller A, Slagle S, et al. A pilot trial of Cop 1 in exacerbating-remitting multiple sclerosis. N Engl J Med. 1987;317(7):408-414.
95. Cadavid D, Wolansky LJ, Skurnick J, Lincoln J, et al. Efficacy of treatment of MS with IFNbeta-1b or glatiramer acetate by monthly brain MRI in the BECOME study. Neurology. 2009;72(23):1976-1983.
96. Calabresi PA, Radue E-W, Goodin D, et al. Safety and efficacy of fingolimod in patients with relapsing-remitting multiple sclerosis (FREEDOMS II): a double-blind, randomised, placebo-controlled, phase 3 trial. The Lancet Neurology. 2014;13(6):545-556.
97. Comi G, Jeffery D, Kappos L, et al. Placebo-Controlled Trial of Oral Laquinimod for Multiple Sclerosis. New England Journal of Medicine. 2012;366(11):1000-1009.
98. Durelli L, Verdun E, Barbero P, et al. Every-other-day interferon beta-1b versus once-weekly interferon beta-1a for multiple sclerosis: results of a 2-year prospective randomised multicentre study (INCOMIN). The Lancet. 2002;359(9316):1453-1460.
99. Ebers GC. Randomised double-blind placebo-controlled study of interferon beta-1a in relapsing/remitting multiple sclerosis. The Lancet. 1998;352(9139):1498-1504.
100. Fox RJ, Miller DH, Phillips JT, et al. Placebo-Controlled Phase 3 Study of Oral BG-12 or Glatiramer in Multiple Sclerosis. New England Journal of Medicine. 2012;367(12):1087-1097.
101. Gold R, Kappos L, Arnold DL, et al. Placebo-Controlled Phase 3 Study of Oral BG-12 for Relapsing Multiple Sclerosis. New England Journal of Medicine. 2012;367(12):1098-1107.
102. Jacobs LD, Cookfair DL, Rudick RA, et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. Annals of Neurology. 1996;39(3):285-294.
103. Johnson KP, Brooks BR, Cohen JA, et al. Copolymer 1 reduces relapse rate and improves disability in relapsing‐remitting multiple sclerosis: Results of a phase III multicenter, double‐blind, placebo‐controlled trial. Neurology. 1995;45(7):1268-1276.
104. Kappos L, Radue E-W, O'Connor P, et al. A Placebo-Controlled Trial of Oral Fingolimod in Relapsing Multiple Sclerosis. New England Journal of Medicine. 2010;362(5):387-401.
105. Mikol DD, Barkhof F, Chang P, et al. Comparison of subcutaneous interferon beta-1a with glatiramer acetate in patients with relapsing multiple sclerosis (the REbif vs Glatiramer Acetate in Relapsing MS Disease [REGARD] study): a multicentre, randomised, parallel, open-label trial. The Lancet Neurology. 2008;7(10):903-914.
106. O'Connor P, Filippi M, Arnason B, et al. 250 microg or 500 microg interferon beta-1b versus 20 mg glatiramer acetate in relapsing-remitting multiple sclerosis: a prospective, randomised, multicentre study. The Lancet Neurology. 2009;8(10):889-897.
107. O'Connor P, Wolinsky JS, Confavreux C, et al. Randomized Trial of Oral Teriflunomide for Relapsing Multiple Sclerosis. New England Journal of Medicine. 2011;365(14):1293-1303.
108. Paty DW, Li DKB, Group TIMS. Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology. 1993;43(4):655-661.
109. Vollmer TL, Sorensen PS, Selmaj K, et al. A randomized placebo-controlled phase III trial of oral laquinimod for multiple sclerosis. Journal of Neurology. 2014;261(4):773-783.
110. Koch-Henriksen N, Sørensen PS, Christensen T, et al. A randomized study of two interferon- beta treatments in relapsing–remitting multiple sclerosis. Neurology. 2006;66(7):1056.
111. Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine. 2002;21(11):1575-1600.
112. Jansen JP. Network meta-analysis of survival data with fractional polynomials. BMC Medical Research Methodology. 2011;11(1):61.
113. Ouwens MJ, Philips Z, Jansen JP. Network meta-analysis of parametric survival curves. Research synthesis methods. 2010;1(3-4):258-271.
114. Franchini AJ, Dias S, Ades AE, Jansen JP, Welton NJ. Accounting for correlation in network meta-analysis with multi-arm trials. Research synthesis methods. 2012;3(2):142-160.
115. Riley RD, Jackson D, Salanti G, et al. Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples. BMJ. 2017;358.
116. Riley RD, Thompson JR, Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics. 2008;9(1):172-186.
117. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine. 2004;23:1351-1375.
118. Phillips R, Hazell L, Sauzet O, Cornelius V. Analysis and reporting of adverse events in randomised controlled trials: a review. BMJ open. 2019;9(2):e024537-e024537.
119. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(4):583-639.
120. Salanti G, Ades AE, Ioannidis JPA. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. Journal of Clinical Epidemiology. 2011;64(2):163-171.
121. Walker R, Schulz M, Arora B, et al. 057 Real world evidence (RWE) on long-term persistence of fingolimod in relapsing-remitting multiple sclerosis (RRMS) in australia. Journal of Neurology, Neurosurgery & Psychiatry. 2018;89(6):A23.
122. Team SD. Stan User's Guide. 2019; https://mc-stan.org/docs/2_22/stan-users-guide/index.html. Accessed 11/02/2020.
123. Spiegelhalter DJ, Thomas A, Best NG, Lunn DJ. WinBUGS User Manual, version 1.4. 2003; http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/manual14.pdf. Accessed 11/02/2020.
124. Jansen JP, Trikalinos T, Cappelleri JC, et al. Indirect Treatment Comparison/Network Meta-Analysis Study Questionnaire to Assess Relevance and Credibility to Inform Health Care Decision Making: An ISPOR-AMCP-NPC Good Practice Task Force Report. Value in Health. 2014;17(2):157-173.
125. Madan J, Stevenson MD, Cooper KL, Ades AE, Whyte S, Akehurst R. Consistency between direct and indirect trial evidence: Is direct evidence always more reliable? Value in Health. 2011;14(6):953-960.
126. Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629-634.
127. Figueira J, Greco S, Ehrgott M. Multiple Criteria Decision Analysis: State of the Art Surveys. New York: Springer; 2005.
128. Keeney RL, Raiffa H. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: Wiley; 1976.
129. Rothrock L, Yin J. Integrating Compensatory and Noncompensatory Decision-Making Strategies in Dynamic Task Environments. In: Kugler T, Smith JC, Connolly T, Son Y-J, eds. Decision Modeling and Behavior in Complex and Uncertain Environments. New York, NY: Springer New York; 2008:125-141.
130. Saint-Hilary G, Robert V, Gasparini M, Jaki T, Mozgunov P. A novel measure of drug benefit–risk assessment based on Scale Loss Score. Statistical Methods in Medical Research. 2019;28(9):2738-2753.
131. Pauly MV, McGuire TG, Barros PP. Handbook of Health Economics. Amsterdam, Netherlands: Elsevier Science & Technology; 2012.
132. Petrou S, Gray A. Economic evaluation using decision analytical modelling: design, conduct, analysis, and reporting. BMJ. 2011;342:d1766.
133. von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1943.
134. Dyer JS. Multiattribute Utility Theory (MAUT). In: Greco S, Ehrgott M, Figueira JR, eds. Multiple Criteria Decision Analysis State of the Art Surveys. Vol 1. New York, NY: Springer; 2016:285-314.
135. Phillips LD. A theory of requisite decision models. Acta Psychologica. 1984;56(1):29-48. 136. Anderson NH. Functional Measurement and Psychophysical Judgement. Psychological
Review. 1970;77(3):153-170. 137. Torrance GW, Boyle MH, Horwood SP. Application of multi-attribute utility theory to
measure social preferences for health states. Operations Research. 1982;30(6):1043-1069. 138. Whitehead SJ, Ali S. Health outcomes in economic evaluation: the QALY and utilities. British
Medical Bulletin. 2010;96(1):5-21. 139. Marsh K, Lanitis T, Neasham D, Orfanos P, Caro J. Assessing the Value of Healthcare
Interventions Using Multi-Criteria Decision Analysis: A Review of the Literature. Pharmacoeconomics. 2014;32(4):345-365.
140. Petrou S, Henderson J. Preference-Based Approaches to Measuring the Benefits of Perinatal Care. Birth. 2003;30(4):217-226.
141. Ryan M, Scott DA, Reeves C, et al. Eliciting public preferences for healthcare: a systematic review of techniques. Health technology assessment (Winchester, England). 2001;5(5):1-180.
142. Blinman P, King M, Norman R, Viney R, Stockler MR. Preferences for cancer treatments: An overview of methods and applications in oncology. Annals of Oncology. 2012;23(5):1104-1110.
143. Brett Hauber A, Fairchild AO, Reed Johnson F. Quantifying benefit-risk preferences for medical interventions: an overview of a growing empirical literature. Applied health economics and health policy. 2013;11(4):319-329.
144. Weernink MGM, Janus SIM, van Til JA, Raisch DW, van Manen JG, Ijzerman MJ. A Systematic Review to Identify the Use of Preference Elicitation Methods in Healthcare Decision Making. Pharmaceutical Medicine. 2014;28(4):175-185.
145. Huber J, Wittink DR, Fiedler JA, Miller R. The effectiveness of alternative preference elicitation procedures in predicting choice. Journal of Marketing Research. 1993;30(1):105-114.
146. Saaty RW. The analytic hierarchy process—what it is and how it is used. Mathematical Modelling. 1987;9(3):161-176.
147. Saaty TL, Vargas LG. Models, Methods, Concepts & Applications of the Analytic Hierarchy Process. Springer US; 2012.
148. Liberatore MJ, Nydick RL. The analytic hierarchy process in medical and health care decision making: A literature review. European Journal of Operational Research. 2008;189(1):194-207.
149. Lootsma FA. Scale sensitivity in the multiplicative AHP and SMART. Journal of Multi-Criteria Decision Analysis. 1993;2(2):87-110.
150. Finan JS, Hurley WJ. Transitive calibration of the AHP verbal scale. European Journal of Operational Research. 1999;112(2):367-372.
151. Laininen P, Hamalainen RP. Analyzing AHP-matrices by regression. European Journal of Operational Research. 2003;148(3):514-524.
152. Genest C, Rivest LP. A Statistical Look at Saaty's Method of Estimating Pairwise Preferences Expressed on a Ratio Scale. Journal of Mathematical Psychology. 1994;38(4):477-496.
153. Bana e Costa CA, Vansnick J-C. A critical analysis of the eigenvalue method used to derive priorities in AHP. European Journal of Operational Research. 2008;187(3):1422-1428.
154. de Jong P. A statistical approach to Saaty's scaling method for priorities. Journal of Mathematical Psychology. 1984;28(4):467-478.
155. Crawford G, Williams C. A note on the analysis of subjective judgment matrices. Journal of Mathematical Psychology. 1985;29(4):387-405.
156. Alho JM, Kangas J, Kolehmainen O. Uncertainty in expert predictions of the ecological consequences of forest plans. Journal of the Royal Statistical Society Series C: Applied Statistics. 1996;45(1):1-14.
157. Altuzarra A, Moreno-Jimenez JM, Salvador M. A Bayesian priorization procedure for AHP-group decision making. European Journal of Operational Research. 2007;182(1):367-382.
158. Bana e Costa CA, Vansnick J-C. Applications of the MACBETH Approach in the Framework of an Additive Aggregation Model. Journal of Multi-Criteria Decision Analysis. 1997;6(2):107-114.
159. Bana e Costa CA, De Corte J-M, Vansnick J-C. On the Mathematical Foundations of MACBETH. In: Greco S, Ehrgott M, Figueira JR, eds. Multiple Criteria Decision Analysis: State of the Art Surveys. New York, NY: Springer New York; 2016:421-463.
160. Dodgson J, Spackman M, Pearman A, Phillips L. Multi-criteria analysis: a manual. London School of Economics and Political Science, Department of Economic History;2009.
161. Arons AM, Krabbe PF. Probabilistic choice models in health-state valuation research: background, theories, assumptions and applications. Expert review of pharmacoeconomics & outcomes research. 2013;13(1):93-108.
162. Ryan M, Bate A, Eastmond CJ, Ludbrook A. Use of discrete choice experiments to elicit preferences. Quality in health care : QHC. 2001;10 Suppl 1:i55-60.
163. Ryan M, Gerard K, Amaya-Amaya M, eds. Using discrete choice experiments to value health and health care. Dordrecht: Springer Academic Publishers; 2008. The economics of non-market goods and resources.
164. Lancsar E, Louviere J. Conducting discrete choice experiments to inform healthcare decision making: a user's guide. Pharmacoeconomics. 2008;26(8):661-667.
165. Clark MD, Determann D, Petrou S, Moro D, de Bekker-Grob EW. Discrete Choice Experiments in Health Economics: A Review of the Literature. Pharmacoeconomics. 2014;32(9):883-902.
166. Muhlbacher A, Johnson FR. Choice Experiments to Quantify Preferences for Health and Healthcare: State of the Practice. Applied health economics and health policy. 2016;14(3):253-266.
167. Louviere J, Hensher D, Swait J. Stated choice methods: analysis and application. Cambridge: Cambridge University Press; 2000.
168. Cheu RL, Nguyen HT, Magoc T, Kreinovich V. Logit discrete choice model: A new distribution-free justification. Soft Computing. 2009;13(2):133-137.
169. McFadden D. Conditional logit analysis of qualitative choice behaviour. In: Zarembka P, ed. Frontiers in econometrics. New York: Academic Press; 1974.
170. Ben-Akiva M, Morikawa T, Shiroishi F. Analysis of the reliability of preference ranking data. Journal of Business Research. 1992;24(2):149-164.
171. Lancsar E, Louviere J, Donaldson C, Currie G, Burgess L. Best worst discrete choice experiments in health: Methods and an application. Social Science & Medicine. 2013;76:74-82.
172. Böckenholt U. Comparative judgments as an alternative to ratings: Identifying the scale origin. Psychol Methods. 2004;9(4):453-465.
173. Zhang J, Johnson FR, Mohamed AF, Hauber AB. Too many attributes: A test of the validity of combining discrete-choice and best-worst scaling data. Journal of Choice Modelling. 2015;15:1-13.
174. Chakraborty G, Ball D, Gaeth GJ, Jun S. The ability of ratings and choice conjoint to predict market shares - A Monte Carlo simulation. Journal of Business Research. 2002;55(3):237-249.
175. Marshall P, Bradlow ET. A unified approach to conjoint analysis models. Journal of the American Statistical Association. 2002;97(459):674-682.
176. Montibeller G, von Winterfeldt D. Cognitive and Motivational Biases in Decision and Risk Analysis. Risk Anal. 2015;35(7):1230-1250.
177. Steele K, Carmel Y, Cross J, Wilcox C. Uses and misuses of multicriteria decision analysis (MCDA) in environmental decision making. Risk Anal. 2009;29(1):26-33.
178. Saint-Hilary G, Cadour S, Robert V, Gasparini M. A simple way to unify multicriteria decision analysis (MCDA) and stochastic multicriteria acceptability analysis (SMAA) using a Dirichlet distribution in benefit–risk assessment. Biometrical Journal. 2017;59(3):567-578.
179. Salo AA, Hamalainen RP. On the Measurement of Preferences in the Analytic Hierarchy Process. Journal of Multi-Criteria Decision Analysis. 1997;6:309-319.
180. Edwards W. How to Use Multiattribute Utility Measurement for Social Decisionmaking. IEEE Transactions on Systems, Man, and Cybernetics. 1977;7(5):326-340.
181. Arrow KJ. Social Choice and Individual Values. New York: John Wiley & Sons; 1951.
182. Sen A. The Impossibility of a Paretian Liberal. Journal of Political Economy. 1970;78(1):152-157.
183. Keeney RL. A Group Preference Axiomatization with Cardinal Utility. Management Science. 1976;23(2):140-145.
184. Sen A. Collective choice and social welfare. 1970.
185. Keeney RL. Group Preference Axiomatization with Cardinal Utility. Management Science. 1976;23(2):140-145.
186. Harsanyi JC. Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility. Journal of Political Economy. 1955;63(4):309-321.
187. Hammond PJ. Harsanyi's Utilitarian Theorem: A Simpler Proof and Some Ethical Connotations. In: Selten R, ed. Rational Interaction: Essays in Honor of John C. Harsanyi. Berlin, Heidelberg: Springer Berlin Heidelberg; 1992:305-319.
188. Carlin BP. Bayes and empirical Bayes methods for data analysis. London: Chapman & Hall; 1996.
189. Haan P, Kemptner D, Uhlendorff A. Bayesian procedures as a numerical tool for the estimation of an intertemporal discrete choice model. Empirical Economics. 2015;49(3):1123-1141.
190. Daziano RA, Chiew E. On the effect of the prior of Bayes estimators of the willingness to pay for electric-vehicle driving range. Transportation Research Part D-Transport and Environment. 2013;21:7-13.
191. Boettger B, Thate-Waschke I-M, Bauersachs R, Kohlmann T, Wilke T. Preferences for anticoagulation therapy in atrial fibrillation: the patients' view. Journal of Thrombosis and Thrombolysis. 2015;40(4):406-415.
192. Lichtenstein GR, Waters HC, Kelly J, et al. Assessing drug treatment preferences of patients with Crohn's disease: A conjoint analysis. The Patient. 2010;3(2):113-123.
193. Hockley KA, D.; Das, S.; Hallgreen, C.; Mt-Isa, S.; Waddingham, E. ; Nicolas, R.; Talbot, S.; Stoeckert, I.; Genov, G.; Dil, Y.; Groves, J.; Johnson, R.; Lightbourne, A.; Mwangi, J.; Seal-Jones, R.; Elmachtoub, A.; Allen, C.; Thomson, A.; Lohrmann, E.; Micaleff, A.; Nixon, R.; Treacy, J.; Wise, L. PATIENT AND PUBLIC INVOLVEMENT REPORT version 1.0 - Recommendations for Patient and Public Involvement in the assessment of benefit and risk of medicines. PROTECT;2015.
194. Forman E, Peniwati K. Aggregating individual judgments and priorities with the Analytic Hierarchy Process. European Journal of Operational Research. 1998;108(1):165-169.
195. Lin C, Kou G. Bayesian revision of the individual pair-wise comparison matrices under consensus in AHP-GDM. Applied Soft Computing Journal. 2015;35:802-811.
196. Schatz NK, Fabiano GA, Cunningham CE, et al. Systematic Review of Patients’ and Parents’ Preferences for ADHD Treatment Options and Processes of Care. Patient. 2015;8(6):483-497.
197. Arroyo R, Sempere AP, Ruiz-Beato E, et al. Conjoint analysis to understand preferences of patients with multiple sclerosis for disease-modifying therapy attributes in Spain: a cross-sectional observational study. BMJ Open. 2017;7(3):e014433.
198. Garcia-Dominguez JM, Munoz D, Comellas M, Gonzalbo I, Lizan L, Polanco Sanchez C. Patient preferences for treatment of multiple sclerosis with disease-modifying therapies: a discrete choice experiment. Patient Prefer Adherence. 2016;10:1945-1956.
199. Mansfield C, Thomas N, Gebben D, Lucas M, Hauber AB. Preferences for Multiple Sclerosis Treatments: Using a Discrete-Choice Experiment to Examine Differences Across Subgroups of US Patients. Int J MS Care. 2017;19(4):172-183.
200. Poulos C, Kinter E, Yang JC, Bridges JF, Posner J, Reder AT. Patient Preferences for Injectable Treatments for Multiple Sclerosis in the United States: A Discrete-Choice Experiment. Patient. 2016;9(2):171-180.
201. Utz KS, Hoog J, Wentrup A, et al. Patient preferences for disease-modifying drugs in multiple sclerosis therapy: a choice-based conjoint analysis. Ther Adv Neurol Disord. 2014;7(6):263-275.
202. Wilson L, Loucks A, Bui C, et al. Patient centered decision making: use of conjoint analysis to determine risk-benefit trade-offs for preference sensitive treatment choices. J Neurol Sci. 2014;344(1-2):80-87.
203. Wilson LS, Loucks A, Gipson G, et al. Patient preferences for attributes of multiple sclerosis disease-modifying therapies: development and results of a ratings-based conjoint analysis. Int J MS Care. 2015;17(2):74-82.
204. Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA. Meta-analysis of multiple outcomes by regression with random effects. Statistics in Medicine. 1998;17(22):2537-2550.
205. Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine. 2002;21(11):1559-1573.
206. Johnson FR, Lancsar E, Marshall D, et al. Constructing Experimental Designs for Discrete-Choice Experiments: Report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force. Value in Health. 2013;16(1):3-13.
207. Kuhfeld W. Marketing Research Methods in SAS. Cary, NC, USA: SAS Institute Inc.; 2010.
208. Cox DR, Oakes D. Analysis of Survival Data. New York: Chapman and Hall; 1984.
209. Jackson D, White IR, Riley RD. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses. Statistics in Medicine. 2012;31(29):3805-3820.
210. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ (Clinical research ed). 2003;327(7414):557-560.
211. McLachlan G, Peel D. Finite Mixture Models. New York: Wiley; 2000.
212. Hensher DA, Rose JM, Greene WH. Applied Choice Analysis. 2nd ed. Cambridge: Cambridge University Press; 2015.
213. Boyle KJ, Holmes TP, Teisl MF, Roe B. A comparison of conjoint analysis response formats. American Journal of Agricultural Economics. 2001;83(2):441-454.
214. Pignone MP, Brenner AT, Hawley S, et al. Conjoint analysis versus rating and ranking for values elicitation and clarification in colorectal cancer screening. Journal of General Internal Medicine. 2012;27(1):45-50.
215. Wijnen BFM, Van Der Putten IM, Groothuis S, et al. Discrete-choice experiments versus rating scale exercises to evaluate the importance of attributes. Expert Review of Pharmacoeconomics and Outcomes Research. 2015;15(4):721-728.
216. Zagonari F. Choosing among weight-estimation methods for multi-criterion analysis: A case study for the design of multi-purpose offshore platforms. Applied Soft Computing Journal. 2016;39:1-10.
217. Ter Hofstede F, Kim Y, Wedel M. Bayesian prediction in hybrid conjoint analysis. Journal of Marketing Research. 2002;39(2):253-261.
218. Louviere JJ, Fox MF, Moore WL. Cross-task validity comparisons of stated preference choice models. Marketing Letters. 1993;4(3):205-213.
219. Herrera F, Herrera-Viedma E, Chiclana F. Multiperson decision-making based on multiplicative preference relations. European Journal of Operational Research. 2001;129(2):372-385.
220. Musal RM, Soyer R. Bayesian Modeling of Health State Preferences. 2010.
221. Musal RM, Soyer R, McCabe C, Kharroubi SA. Estimating the population utility function: A parametric Bayesian approach. European Journal of Operational Research. 2012;218(2):538-547.
222. Bacon L, Lenk P. Augmenting discrete-choice data to identify common preference scales for inter-subject analyses. Qme-Quantitative Marketing and Economics. 2012;10(4):453-474.
223. Leskinen P, Kangas AS, Kangas J. Rank-based modelling of preferences in multi-criteria decision making. European Journal of Operational Research. 2004;158(3):721-733.
224. Flynn TN, Louviere JJ, Peters TJ, Coast J. Best-worst scaling: What it can do for health care research and how to do it. Journal of Health Economics. 2007;26(1):171-189.
225. Marley AAJ, Louviere JJ. Some probabilistic models of best, worst, and best-worst choices. Journal of Mathematical Psychology. 2005;49(6):464-480.
226. Luyten J, Kessels R, Goos P, Beutels P. Public preferences for prioritizing preventive and curative health care interventions: a discrete choice experiment. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2015;18(2):224-233.
227. Schmieder A, Schaarschmidt M-L, Umar N, et al. Comorbidities significantly impact patients' preferences for psoriasis treatments. Journal of the American Academy of Dermatology. 2012;67(3):363-372.
228. Najafzadeh M, Gagne JJ, Choudhry NK, Polinski JM, Avorn J, Schneeweiss SS. Patients' Preferences in Anticoagulant Therapy Discrete Choice Experiment. Circulation-Cardiovascular Quality and Outcomes. 2014;7(6):912-919.
229. EMA. Use of multiple sclerosis medicine Lemtrada restricted while EMA review is ongoing. 2019; https://www.ema.europa.eu/en/news/use-multiple-sclerosis-medicine-lemtrada-restricted-while-ema-review-ongoing. Accessed 16 July 2019.
230. Cooper NJ, Sutton AJ, Abrams KR, Turner D, Wailoo A. Comprehensive decision analytical modelling in economic evaluation: a Bayesian approach. Health Economics. 2004;13(3):203-226.
231. Sun L, van Kooten GC. Comparing Fuzzy and Probabilistic Approaches to Preference Uncertainty in Non-Market Valuation. Environmental & Resource Economics. 2009;42(4):471-489.
232. EMA. ICH guideline E3 on structure and content of clinical study reports. 1 July 1996.
233. Mott DJ, Najafzadeh M. Whose preferences should be elicited for use in health-care decision-making? A case study using anticoagulant therapy. Expert Review of Pharmacoeconomics & Outcomes Research. 2016;16(1):33-39.
234. Keeney RL, Kirkwood CW. Group Decision Making Using Cardinal Social Welfare Functions. Management Science. 1975;22(4):430-437.
235. Greco S, Kadzinski M, Mousseau V, Slowinski R. Robust ordinal regression for multiple criteria group decision: UTA(GMS)-GROUP and UTADIS(GMS)-GROUP. Decision Support Systems. 2012;52(3):549-561.
236. Hahn ED. Judgmental consistency and consensus in stochastic multicriteria decision making. Expert Systems with Applications. 2010;37(5):3784-3791.
237. Kunsch PL. A statistical multi-criteria procedure with stochastic preferences. International Journal of Multicriteria Decision Making. 2010;1(1):49-73.
238. Moreno-Jiménez JM, Salvador M, Gargallo P, Altuzarra A. Systemic decision making in AHP: a Bayesian approach. Annals of Operations Research. 2014.
239. Caster O. Benefit-Risk Assessment in Pharmacovigilance. In: Bate A, ed. Evidence-Based Pharmacovigilance: Clinical and Quantitative Aspects. New York, NY: Springer New York; 2018:233-257.
240. Mukhopadhyay S, Dilley K, Oladipo A, Jokinen J. Hierarchical Bayesian Benefit–Risk Modeling and Assessment Using Choice Based Conjoint. Statistics in Biopharmaceutical Research. 2019;11(1):52-60.
241. Robert CP, Elvira V, Tawn N, Wu C. Accelerating MCMC algorithms. Wiley Interdisciplinary Reviews: Computational Statistics. 2018;10(5):e1435.
242. Daziano RA. Inference on mode preferences, vehicle purchases, and the energy paradox using a Bayesian structural choice model. Transportation Research Part B: Methodological. 2015;76:1-26.
243. Goossens LM, Utens CM, Smeenk FW, Donkers B, van Schayck OC, Rutten-van Molken MP. Should I stay or should I go home? A latent class analysis of a discrete choice experiment on hospital-at-home. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2014;17(5):588-596.
244. Magor TJ, Coote LV. Latent variables as a proxy for inherent preferences: A test of antecedent volition. Journal of Choice Modelling. 2014;13:24-36.
Appendices
Appendix A
Appendix A. Source data for RRMS case study
1 Clinical evidence synthesis
Network diagrams by outcome
Relapse rate
Relapse-free proportion
Disability progression confirmed 3 months later
Disability progression confirmed 6 months later
ALT above ULN
ALT above 3x ULN
ALT above 5x ULN
Serious GI disorders
Serious bradycardia and macular edema
DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA (IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Raw arm-level data
The tables below show the raw trial data for each of the RRMS case study outcomes in turn. N = total number of patients in arm, n = number of patients experiencing given binary outcome, se = standard error. * indicates that the value was estimated based on other reported quantities.
ANNUALISED RELAPSE RATE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(se) Drug N Estimate(se) Drug N Estimate(se)
BRAVO 2014 3 PL 450 0.34 (0.03) IA (IM) 447 0.26 (0.02) LQ 434 0.28 (0.03)
CONFIRM 2012 3 PL 363 0.4 (0.04) DF 359 0.22 (0.025) GA 350 0.29 (0.03)
ALLEGRO 2012 2 PL 556 0.39 (0.03) LQ 550 0.3 (0.02)
BECOME 2009 2 GA 39 0.33 (0.101*) IB 36 0.37 (0.112*)
BEYOND 2009 2 GA 448 0.34 (0.03*) IB 897 0.36 (0.022*)
DEFINE 2012 2 PL 408 0.36 (0.035) DF 409 0.17 (0.02)
FREEDOMS 2010 2 PL 418 0.4 (0.033) FM 425 0.18 (0.02)
FREEDOMS II 2014 2 PL 355 0.4 (0.035) FM 358 0.21 (0.02)
INCOMIN 2002 2 IB 94 0.5 (0.071) IA (IM) 88 0.7 (0.094)
JOHNSON 1995 2 PL 126 0.84 (0.09*) GA 125 0.59 (0.076*)
MSCRG 1996 2 PL 143 0.82 (0.083*) IA (IM) 158 0.67 (0.072*)
PRISMS 1998 2 PL 187 1.28 (0.091*) IA (SC) 189 1.73 (0.107*)
REGARD 2008 2 IA (SC) 381 0.3 (0.031*) GA 375 0.29 (0.03*)
TEMSO 2011 2 PL 363 0.54 (0.038) TF 358 0.37 (0.033)
BORNSTEIN 1987 2 PL 23 1.35 (0.266*) GA 25 0.3 (0.965*)
IFNB 1993 2 PL 112 1.27 (0.117*) IB 115 0.84 (0.095*)

Note: for Model 0, the number of relapse events and person-years are the required data items. As these were generally unreported, they were constructed so as to match the estimated annualised rates, i.e.

#events = annualised relapse rate × person-years,

with person-years set to (4/3)N (i.e. an assumption that each participant contributes two thirds of the 2-year study period on average).
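The reconstruction above can be sketched as follows; it simply inverts the annualised-rate definition under the stated two-thirds-exposure assumption (the function name is hypothetical):

```python
def reconstruct_counts(annualised_rate, N):
    """Reconstruct the event count and person-years required by Model 0
    from a reported annualised relapse rate, assuming each participant
    contributes two thirds of the 2-year study period on average."""
    person_years = 4 / 3 * N                 # person-years = (4/3) * N
    events = annualised_rate * person_years  # events = rate * person-years
    return events, person_years

# BRAVO 2014, placebo arm: annualised rate 0.34 with N = 450
events, person_years = reconstruct_counts(0.34, 450)
print(round(events), round(person_years))  # 204 600
```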
RELAPSE-FREE PROPORTION Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 275 (450) IA (IM) 308 (447) LQ 286 (434)
CONFIRM 2012 3 PL 214 (363) DF 255 (359) GA 238 (350)
ALLEGRO 2012 2 PL 290 (556) LQ 346 (550)
BEYOND 2009 2 GA 327 (448) IB 655 (897)
DEFINE 2012 2 PL 220 (408) DF 299 (410)
FREEDOMS 2010 2 PL 191 (418) FM 299 (425)
FREEDOMS II 2014 2 PL 187 (355) FM 256 (358)
INCOMIN 2002 2 IB 49 (96) IA (IM) 33 (92)
JOHNSON 1995 2 PL 34 (126) GA 42 (125)
MSCRG 1996 2 PL 23 (87) IA (IM) 32 (85)
PRISMS 1998 2 PL 30 (187) IA (SC) 59 (184)
REGARD 2008 2 IA (SC) 239 (386) GA 234 (378)
TEMSO 2011 2 PL 166 (363) TF 202 (358)
BORNSTEIN 1987 2 PL 6 (23) GA 14 (25)
IFNB 1993 2 PL 18 (112) IB 36 (115)
DISABILITY PROGRESSION CONFIRMED 3 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 60 (450) IA (IM) 47 (447) LQ 42 (434)
CONFIRM 2012 3 PL 62 (363) DF 47 (359) GA 56 (350)
ALLEGRO 2012 2 PL 87 (556) LQ 61 (550)
BEYOND 2009 2 GA 90 (448) IB 188 (897)
DEFINE 2012 2 PL 110 (408) DF 65 (409)
FREEDOMS 2010 2 PL 101 (418) FM 75 (425)
FREEDOMS II 2014 2 PL 103 (355) FM 91 (358)
JOHNSON 1995 2 PL 31 (126) GA 27 (125)
PRISMS 1998 2 PL 71 (187) IA (SC) 51 (189)
TEMSO 2011 2 PL 99 (363) TF 72 (358)
BORNSTEIN 1987 2 PL 11 (23) GA 5 (25)
DISABILITY PROGRESSION CONFIRMED 6 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 46 (450) IA (IM) 35 (447) LQ 28 (434)
CONFIRM 2012 3 PL 45 (363) DF 28 (359) GA 38 (350)
ALLEGRO 2012 2 PL 78 (556) LQ 54 (550)
FREEDOMS 2010 2 PL 79 (418) FM 53 (425)
FREEDOMS II 2014 2 PL 63 (355) FM 49 (358)
INCOMIN 2002 2 IB 13 (96) IA (IM) 28 (92)
MSCRG 1996 2 PL 50 (143) IA (IM) 35 (158)
REGARD 2008 2 IA (SC) 45 (386) GA 33 (378)
ALT ABOVE UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 84 (415) IA (IM) 131 (413) LQ 127 (384)
CONFIRM 2012 3 PL 149 (362) DF 167 (355) GA 129 (346)
ALLEGRO 2012 2 PL 99 (515) LQ 175 (504)
BEYOND 2009 2 GA 16 (445) IB 99 (888)
FREEDOMS II 2014 2 PL 18 (355) FM 62 (358)
PRISMS 1998 2 PL 2 (187) IA (SC) 10 (184)
REGARD 2008 2 IA (SC) 21 (381) GA 5 (375)
TEMSO 2011 2 PL 129 (360) TF 205 (358)
ALT ABOVE 3x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 10 (415) IA (IM) 11 (413) LQ 16 (384)
CONFIRM 2012 3 PL 23 (362) DF 20 (355) GA 24 (346)
ALLEGRO 2012 2 PL 8 (515) LQ 24 (504)
DEFINE 2012 2 PL 12.24 (408) DF 24.6 (410)
FREEDOMS 2010 2 PL 7 (418) FM 36 (425)
FREEDOMS II 2014 2 PL 12 (355) FM 33 (358)
TEMSO 2011 2 PL 24 (360) TF 24 (358)
ALT ABOVE 5x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 7 (415) IA (IM) 5 (413) LQ 4 (384)
CONFIRM 2012 3 PL 13 (363) DF 7 (355) GA 10 (346)
FREEDOMS 2010 2 PL 4 (418) FM 8 (425)
FREEDOMS II 2014 2 PL 4 (355) FM 8 (358)
SERIOUS GASTROINTESTINAL DISORDERS Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
CONFIRM 2012 3 PL 0 (363) DF 4 (359) GA 0 (351)
ALLEGRO 2012 2 PL 1 (556) LQ 8 (550)
DEFINE 2012 2 PL 0 (408) DF 4 (410)
TEMSO 2011 2 PL 1 (360) TF 8 (358)
SERIOUS BRADYCARDIA Arm 1 Arm 2
Study # arms Drug n(N) Drug n(N)
FREEDOMS 2010 2 PL 1 (418) FM 4 (425)
FREEDOMS II 2014 2 PL 1 (355) FM 0 (358)
MACULAR EDEMA Arm 1 Arm 2
Study # arms Drug n(N) Drug n(N)
FREEDOMS 2010 2 PL 0 (418) FM 0 (425)
FREEDOMS II 2014 2 PL 0 (355) FM 1 (358)
Arm-level data on Normal scale
The tables below show the transformed Normal trial data for each of the RRMS case study outcomes in turn. The uncertainty is described using variances (which relate to the distribution of each outcome within the study arm) rather than standard errors (which relate to the sampling distribution of the mean). The two are of course linked by the relation standard error = √(variance/N).
DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a,
IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. N =
total number of patients in arm, n = number of patients experiencing given binary outcome, va=variance. *
indicates that the value was estimated based on other reported quantities.
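For the binary outcomes, the tabulated estimates and variances are consistent with the empirical log-odds log(n/(N−n)) paired with a within-arm variance of N(1/n + 1/(N−n)), so that the standard error is √(1/n + 1/(N−n)). A minimal sketch under that assumption (the function name is hypothetical):

```python
import math

def log_odds_normal(n, N):
    """Convert a binary arm outcome (n events out of N patients) to a
    Normal approximation on the log-odds scale.

    Returns (estimate, variance), where the variance describes the
    within-arm distribution, so that se = sqrt(variance / N)."""
    estimate = math.log(n / (N - n))
    se = math.sqrt(1 / n + 1 / (N - n))  # standard error of the log-odds
    variance = se ** 2 * N               # within-arm variance = se^2 * N
    return estimate, variance

# FREEDOMS 2010, placebo arm: 191 of 418 patients relapse-free
est, va = log_odds_normal(191, 418)
print(round(est, 2), round(va, 2))  # -0.17 4.03
```

These values match the corresponding row of the "log odds of avoiding relapse" table.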
LOG ANNUALISED RELAPSE RATE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -1.08 (3.52) IA (IM) 447 -1.35 (2.66) LQ 434 -1.27 (5.02)
CONFIRM 2012 3 PL 363 -0.92 (3.69) DF 359 -1.51 (4.56) GA 350 -1.24 (4.02)
ALLEGRO 2012 2 PL 556 -0.94 (3.3) LQ 550 -1.2 (2.45)
BECOME 2009 2 GA 39 -1.11 (3.87) IB 36 -0.99 (3.44)
BEYOND 2009 2 GA 448 -1.08 (3.57) IB 897 -1.02 (3.37)
DEFINE 2012 2 PL 408 -1.02 (3.89) DF 409 -1.77 (4.38)
FREEDOMS 2010 2 PL 418 -0.92 (2.85) FM 425 -1.71 (4.06)
FREEDOMS II 2014 2 PL 355 -0.92 (2.75) FM 358 -1.56 (3.47)
INCOMIN 2002 2 IB 94 -0.69 (2.45) IA (IM) 88 -0.36 (1.75)
JOHNSON 1995 2 PL 126 -0.17 (1.45) GA 125 -0.53 (2.07)
MSCRG 1996 2 PL 143 -0.2 (1.48) IA (IM) 158 -0.4 (1.82)
PRISMS 1998 2 PL 187 0.25 (0.95) IA (SC) 189 0.55 (0.7)
REGARD 2008 2 IA (SC) 381 -1.2 (4.12) GA 375 -1.24 (4.13)
TEMSO 2011 2 PL 363 -0.62 (1.81) TF 358 -0.99 (2.86)
BORNSTEIN 1987 2 PL 23 0.3 (0.92) GA 25 -1.2 (4.43)
IFNB 1993 2 PL 112 0.24 (0.96) IB 115 -0.17 (1.45)
LOG ODDS OF AVOIDING RELAPSE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 0.45 (4.21) IA (IM) 447 0.8 (4.67) LQ 434 0.66 (4.45)
CONFIRM 2012 3 PL 363 0.36 (4.13) DF 359 0.9 (4.86) GA 350 0.75 (4.6)
ALLEGRO 2012 2 PL 556 0.09 (4.01) LQ 550 0.53 (4.29)
BEYOND 2009 2 GA 448 0.99 (5.07) IB 897 1 (5.08)
DEFINE 2012 2 PL 408 0.16 (4.02) DF 409 0.99 (5.06)
FREEDOMS 2010 2 PL 418 -0.17 (4.03) FM 425 0.86 (4.79)
FREEDOMS II 2014 2 PL 355 0.11 (4.01) FM 358 0.92 (4.91)
INCOMIN 2002 2 IB 94 0.04 (4) IA (IM) 88 -0.58 (4.35)
JOHNSON 1995 2 PL 126 -1 (5.08) GA 125 -0.68 (4.48)
MSCRG 1996 2 PL 143 -1.02 (5.14) IA (IM) 158 -0.5 (4.26)
PRISMS 1998 2 PL 187 -1.66 (7.42) IA (SC) 189 -0.75 (4.59)
REGARD 2008 2 IA (SC) 381 0.49 (4.24) GA 375 0.49 (4.24)
TEMSO 2011 2 PL 363 -0.17 (4.03) TF 358 0.26 (4.07)
BORNSTEIN 1987 2 PL 23 -1.04 (5.19) GA 25 0.24 (4.06)
IFNB 1993 2 PL 112 -1.65 (7.41) IB 115 -0.79 (4.65)
LOG ODDS OF DISABILITY PROGRESSION, CONFIRMED 3 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -1.87 (8.65) IA (IM) 447 -2.14 (10.63) LQ 434 -2.23 (11.44)
CONFIRM 2012 3 PL 363 -1.58 (7.06) DF 359 -1.89 (8.79) GA 350 -1.66 (7.44)
ALLEGRO 2012 2 PL 556 -1.68 (7.58) LQ 550 -2.08 (10.14)
BEYOND 2009 2 GA 448 -1.38 (6.23) IB 897 -1.33 (6.04)
DEFINE 2012 2 PL 408 -1 (5.08) DF 409 -1.67 (7.48)
FREEDOMS 2010 2 PL 418 -1.14 (5.46) FM 425 -1.54 (6.88)
FREEDOMS II 2014 2 PL 355 -0.89 (4.86) FM 358 -1.08 (5.27)
JOHNSON 1995 2 PL 126 -1.12 (5.39) GA 125 -1.29 (5.91)
PRISMS 1998 2 PL 187 -0.49 (4.25) IA (SC) 189 -1 (5.08)
TEMSO 2011 2 PL 363 -0.98 (5.04) TF 358 -1.38 (6.22)
BORNSTEIN 1987 2 PL 23 -0.09 (4.01) GA 25 -1.39 (6.25)
LOG ODDS OF DISABILITY PROGRESSION, CONFIRMED 6 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -2.17 (10.9) IA (IM) 447 -2.47 (13.86) LQ 434 -2.67 (16.57)
CONFIRM 2012 3 PL 363 -1.96 (9.21) DF 359 -2.47 (13.91) GA 350 -2.11 (10.33)
ALLEGRO 2012 2 PL 556 -1.81 (8.29) LQ 550 -2.22 (11.29)
FREEDOMS 2010 2 PL 418 -1.46 (6.52) FM 425 -1.95 (9.16)
FREEDOMS II 2014 2 PL 355 -1.53 (6.85) FM 358 -1.84 (8.46)
INCOMIN 2002 2 IB 94 -1.85 (8.54) IA (IM) 88 -0.83 (4.72)
MSCRG 1996 2 PL 143 -0.62 (4.4) IA (IM) 158 -1.26 (5.8)
REGARD 2008 2 IA (SC) 381 -2.03 (9.71) GA 375 -2.35 (12.55)
LOG ODDS OF ALT ABOVE UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -1.37 (6.19) IA (IM) 447 -0.77 (4.62) LQ 434 -0.7 (4.52)
CONFIRM 2012 3 PL 363 -0.36 (4.13) DF 359 -0.12 (4.01) GA 350 -0.52 (4.28)
ALLEGRO 2012 2 PL 556 -1.44 (6.44) LQ 550 -0.63 (4.41)
BEYOND 2009 2 GA 448 -3.29 (28.85) IB 897 -2.08 (10.1)
FREEDOMS II 2014 2 PL 355 -2.93 (20.78) FM 358 -1.56 (6.98)
PRISMS 1998 2 PL 187 -4.53 (94.51) IA (SC) 189 -2.86 (19.46)
REGARD 2008 2 IA (SC) 381 -2.84 (19.2) GA 375 -4.3 (76.01)
TEMSO 2011 2 PL 363 -0.58 (4.35) TF 358 0.29 (4.09)
LOG ODDS OF ALT ABOVE 3x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -3.7 (42.52) IA (IM) 447 -3.6 (38.57) LQ 434 -3.14 (25.04)
CONFIRM 2012 3 PL 363 -2.69 (16.81) DF 359 -2.82 (18.81) GA 350 -2.6 (15.49)
ALLEGRO 2012 2 PL 556 -4.15 (65.39) LQ 550 -3 (22.05)
DEFINE 2012 2 PL 408 -3.48 (34.36) DF 409 -2.75 (17.73)
FREEDOMS 2010 2 PL 418 -4.07 (60.73) FM 425 -2.38 (12.9)
FREEDOMS II 2014 2 PL 355 -3.35 (30.62) FM 358 -2.29 (11.95)
TEMSO 2011 2 PL 363 -2.64 (16.07) TF 358 -2.63 (15.99)
LOG ODDS OF ALT ABOVE 5x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -4.07 (60.3) IA (IM) 447 -4.4 (83.61) LQ 434 -4.55 (97.01)
CONFIRM 2012 3 PL 363 -3.29 (28.96) DF 359 -3.91 (51.73) GA 350 -3.51 (35.63)
FREEDOMS 2010 2 PL 418 -4.64 (105.51) FM 425 -3.95 (54.14)
FREEDOMS II 2014 2 PL 355 -4.47 (89.76) FM 358 -3.78 (45.77)
The variances given for the outcomes below are calculated as (0.025 + p)(0.975 − p) × 100/N (where p is the estimated risk), as per II.6.1.5.
RISK OF SERIOUS GASTROINTESTINAL DISORDERS Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
CONFIRM 2012 3 PL 363 0 (0.0067) DF 359 0.011 (0.0097) GA 350 0 (0.0069)
ALLEGRO 2012 2 PL 556 0.002 (0.0047) LQ 550 0.015 (0.0069)
DEFINE 2012 2 PL 408 0 (0.006) DF 409 0.01 (0.0082)
TEMSO 2011 2 PL 363 0.003 (0.0075) TF 358 0.022 (0.0126)
RISK OF SERIOUS BRADYCARDIA Arm 1 Arm 2
Study # arms Drug N Estimate(va) Drug N Estimate(va)
FREEDOMS 2010 2 PL 418 0.002 (0.0064) FM 425 0.009 (0.0078)
FREEDOMS II 2014 2 PL 355 0.003 (0.0076) FM 358 0 (0.0068)
RISK OF MACULAR EDEMA Arm 1 Arm 2
Study # arms Drug N Estimate(va) Drug N Estimate(va)
FREEDOMS 2010 2 PL 418 0 (0.0058) FM 425 0 (0.0057)
FREEDOMS II 2014 2 PL 355 0 (0.0069) FM 358 0.003 (0.0075)
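The variance formula above can be applied directly; note that a zero-event arm still receives a non-zero variance (the function name is hypothetical):

```python
def risk_variance(n, N):
    """Variance for a rare-event risk estimate on the probability scale,
    using the formula (0.025 + p)(0.975 - p) * 100 / N stated above,
    where p = n / N is the estimated risk."""
    p = n / N
    return (0.025 + p) * (0.975 - p) * 100 / N

# CONFIRM 2012, placebo arm: 0 serious GI events out of 363 patients
print(round(risk_variance(0, 363), 4))  # 0.0067
# TEMSO 2011, teriflunomide arm: 8 serious GI events out of 358 patients
print(round(risk_variance(8, 358), 4))  # 0.0126
```

Both values match the corresponding rows of the serious gastrointestinal disorders table.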
Outcome proportionality plots
Relapse-free proportion vs annualised relapse rate
These effects are assumed to occur in proportion in all mapping strategies used.
Disability progression confirmed 6 months later vs 3 months later
These effects are assumed to occur in proportion in all mapping strategies used.
ALT above 3x ULN vs ALT above ULN
These effects are assumed to occur in proportion in all the mapping strategies used.
ALT above 5x ULN vs ALT above ULN
These effects are assumed to occur in proportion in all mapping strategies used.
Disability progression confirmed 3 months later vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group and two-group models.
Disability progression confirmed 6 months later vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group and two-group models.
ALT above ULN vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group models.
ALT above 3xULN vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group models.
ALT above 5xULN vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group models.
2 PROTECT datasets
Investigator ratings
The raw pairwise ratings in the PROTECT investigator ratings dataset are shown in the table below.
PML = progressive multifocal leukoencephalopathy.
Pairwise comparison Participant 1 Participant 2 Participant 3
Avoid a relapse vs avoid a disability progression 0.7 0.7 0.6
Avoid a disability progression vs avoid PML 0.1 0.9 0.9
Daily subcutaneous -> daily oral vs avoid PML 0.01 0.1 0.1
Avoid herpes reactivation vs avoid PML 0.12 0.2 0.3
Avoid liver enzyme elevation vs avoid PML 0.2 0.2 0.2
Avoid seizures vs avoid PML 0.1 0.1 0.1
Avoid congenital abnormalities vs avoid PML 0.1 0.1 0.1
Avoid infusion/injection reactions vs avoid PML 0.05 0.05 0.05
Avoid allergic/hypersensitivity reactions vs avoid infusion/injection reactions 0.4 0.4 0.89
Avoid flu-like reactions vs avoid infusion/injection reactions 0.4 0.4 1.11
Daily subcutaneous -> daily oral vs daily subcutaneous -> monthly intravenous infusion 0.7 0.7 0.7
Daily subcutaneous -> daily oral vs daily subcutaneous -> weekly intramuscular 0.5 0.5 0.5
Patient ratings
The relative ratings for administration modes were the only data used from the PROTECT patient ratings study and are shown in the table below. NA indicates missing values, i.e. questions left unanswered by the respondent. As the AHP method was used to elicit these ratings, they all take a value of 1, 3, 5, 7 or 9, or a reciprocal thereof.
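The constraint that every rating lies on this 1-9 scale (with the table's decimals, e.g. 0.333333, being rounded reciprocals) can be checked with a small sketch (the function name and tolerance are hypothetical):

```python
# Saaty's fundamental AHP scale: 1, 3, 5, 7, 9 and their reciprocals
SAATY = [1, 3, 5, 7, 9]
SCALE_VALUES = [float(k) for k in SAATY] + [1 / k for k in SAATY]

def on_saaty_scale(x, tol=1e-4):
    """Check whether an elicited rating lies on the AHP verbal scale;
    a small tolerance absorbs the six-decimal rounding in the table."""
    return any(abs(x - v) < tol for v in SCALE_VALUES)

print(on_saaty_scale(0.333333), on_saaty_scale(0.7))  # True False
```

By contrast, the investigator ratings above (e.g. 0.7, 0.12) were elicited freely and need not fall on this scale.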
The six pairwise comparisons (columns in order) are: (1) monthly infusion vs weekly intramuscular; (2) daily subcutaneous vs monthly infusion; (3) weekly intramuscular vs daily subcutaneous; (4) daily oral vs weekly intramuscular; (5) daily subcutaneous vs daily oral; (6) daily oral vs monthly infusion.

Participant (1) (2) (3) (4) (5) (6)
1 0.111111 0.142857 7 3 0.333333 3
2 3 0.2 0.111111 5 5 5
3 3 0.333333 9 3 0.333333 9
4 NA NA 9 3 0.333333 7
5 5 0.333333 NA 3 0.333333 3
6 3 0.333333 3 0.333333 0.333333 0.333333
7 3 0.333333 5 3 0.333333 9
8 3 0.333333 3 1 1 0.333333
9 1 1 1 1 1 1
10 5 0.2 9 7 0.142857 1
11 9 0.111111 1 1 1 1
12 0.111111 3 0.2 5 1 5
13 1 1 1 5 0.2 5
14 3 0.333333 3 3 0.333333 3
15 5 0.333333 9 3 0.333333 0.333333
16 3 0.333333 3 3 0.333333 3
17 3 0.333333 1 3 0.333333 3
18 9 0.111111 1 3 0.333333 3
19 3 0.333333 0.2 3 0.333333 1
20 5 0.2 9 3 0.333333 3
21 3 0.333333 1 3 0.333333 0.2
22 9 9 0.142857 3 0.2 3
23 0.142857 0.142857 7 3 0.333333 3
24 0.2 5 9 3 0.333333 3
25 0.333333 0.333333 0.333333 3 0.333333 3
26 5 0.333333 0.2 3 0.2 5
27 3 0.333333 9 0.111111 0.111111 0.333333
28 3 0.333333 7 3 0.333333 0.2
29 1 0.142857 9 5 0.2 5
30 3 0.333333 3 3 0.333333 3
31 3 0.333333 9 5 0.333333 0.333333
32 NA NA NA NA NA NA
33 9 7 0.142857 3 0.142857 7
34 0.142857 0.2 5 3 0.333333 3
35 3 0.333333 5 3 0.333333 1
36 0.111111 0.111111 9 0.111111 9 0.111111
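The six pairwise ratings per participant determine a full 4x4 reciprocal comparison matrix over the four administration modes. Purely as an illustration of how such ratings are conventionally aggregated (the analysis in this thesis treats the ratings within a Bayesian model rather than by classical AHP aggregation), the sketch below builds participant 1's matrix and derives priority weights by the standard geometric-mean method; the mode ordering and variable names are assumptions.

```python
import math

# Administration modes, in an assumed order (illustrative only)
modes = ["monthly_iv", "weekly_im", "daily_sc", "daily_oral"]

# Participant 1's six AHP ratings: (row mode judged against column mode) -> value
ratings = {
    ("monthly_iv", "weekly_im"): 1 / 9,   # 0.111111
    ("daily_sc", "monthly_iv"): 1 / 7,    # 0.142857
    ("weekly_im", "daily_sc"): 7,
    ("daily_oral", "weekly_im"): 3,
    ("daily_sc", "daily_oral"): 1 / 3,    # 0.333333
    ("daily_oral", "monthly_iv"): 3,
}

# Build the reciprocal matrix: a[i][j] = rating of i over j, a[j][i] = 1/a[i][j]
n = len(modes)
a = [[1.0] * n for _ in range(n)]
for (r, c), v in ratings.items():
    i, j = modes.index(r), modes.index(c)
    a[i][j], a[j][i] = v, 1.0 / v

# Geometric-mean (logarithmic least squares) priority weights, normalised to sum to 1
gm = [math.prod(row) ** (1.0 / n) for row in a]
weights = [g / sum(gm) for g in gm]
```

For participant 1 this assigns the highest weight to daily oral administration, consistent with the raw ratings above.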
Patient choices
The PROTECT patient choice data consist of 1755 individual choices, generated by 124 individuals
presented with 16 choices each. Showing the data in full at the individual level would be
prohibitively long, so they are shown below in collapsed form, i.e. with one row per choice set (of
which 64 were used in total) giving, for each choice set, the number of participants who made each
choice out of those who responded. Missing responses have been excluded. This is the form in which
the data were analysed, and it is equivalent to the full data provided one is not concerned with
preference variability at the individual level.
N = total respondents, n = number of respondents choosing option B, ARR = annualised relapse rate,
DP = disability progression, PML = progressive multifocal leukoencephalopathy, A/H = allergic-
hypersensitivity reactions, SA = serious allergic reactions, DEP = depression.
Choice set | N | n | Option A: ARR | DP risk | PML risk | A/H risk | SA risk | DEP risk | Option B: ARR | DP risk | PML risk | A/H risk | SA risk | DEP risk
1 25 21 1 0.25 0 0 0 0.1 0.75 0.1 0 0.5 0 0.2
2 25 12 1 0.1 0.003 0.5 0.02 0.2 0.75 0.25 0 0.5 0.02 0.1
3 24 2 1 0.1 0 0 0.02 0.1 1 0.25 0.003 0.5 0 0.1
4 25 25 0.75 0.25 0 0.5 0.02 0.2 1 0.1 0 0 0 0.2
5 25 18 0.75 0.25 0.003 0.5 0 0.2 0.75 0.25 0.003 0 0.02 0.1
6 24 8 1 0.1 0 0.5 0 0.1 0.75 0.1 0.003 0 0.02 0.1
7 25 1 0.75 0.1 0.003 0 0 0.2 0.75 0.25 0.003 0.5 0.02 0.1
8 25 6 1 0.1 0.003 0 0 0.2 1 0.25 0 0 0.02 0.2
9 27 21 1 0.25 0.003 0 0 0.2 1 0.25 0 0.5 0 0.1
10 27 2 0.75 0.1 0 0 0.02 0.2 1 0.25 0.003 0 0.02 0.1
11 27 15 1 0.25 0 0.5 0 0.2 0.75 0.25 0 0 0.02 0.2
12 27 20 0.75 0.1 0.003 0.5 0 0.2 0.75 0.1 0 0.5 0.02 0.1
13 27 9 1 0.1 0 0.5 0.02 0.2 0.75 0.1 0.003 0.5 0.02 0.1
14 27 15 0.75 0.25 0.003 0 0 0.1 1 0.1 0.003 0 0.02 0.2
15 27 19 1 0.25 0 0 0.02 0.1 0.75 0.1 0.003 0 0 0.1
16 27 4 0.75 0.1 0.003 0.5 0.02 0.2 1 0.25 0.003 0.5 0 0.2
17 32 23 1 0.25 0.003 0 0 0.2 1 0.25 0 0.5 0 0.1
18 32 2 1 0.25 0 0 0 0.2 1 0.25 0 0 0.02 0.1
19 32 13 1 0.25 0 0.5 0 0.2 0.75 0.25 0 0 0.02 0.2
20 31 20 0.75 0.1 0.003 0.5 0 0.2 0.75 0.1 0 0.5 0.02 0.1
21 31 13 1 0.1 0 0.5 0.02 0.2 0.75 0.1 0.003 0.5 0.02 0.1
22 32 16 0.75 0.25 0.003 0 0 0.1 1 0.1 0.003 0 0.02 0.2
23 30 24 1 0.25 0 0 0.02 0.1 0.75 0.1 0.003 0 0 0.1
24 31 6 0.75 0.1 0.003 0.5 0.02 0.2 1 0.25 0.003 0.5 0 0.2
25 30 20 1 0.25 0 0 0 0.1 0.75 0.1 0 0.5 0 0.2
26 30 18 1 0.1 0.003 0.5 0.02 0.2 0.75 0.25 0 0.5 0.02 0.1
27 30 2 1 0.1 0 0 0.02 0.1 1 0.25 0.003 0.5 0 0.1
28 29 28 0.75 0.25 0 0.5 0.02 0.2 1 0.1 0 0 0 0.2
29 29 19 0.75 0.25 0.003 0.5 0 0.2 0.75 0.25 0.003 0 0.02 0.1
30 30 10 1 0.1 0 0.5 0 0.1 0.75 0.1 0.003 0 0.02 0.1
31 30 4 0.75 0.1 0.003 0 0 0.2 0.75 0.25 0.003 0.5 0.02 0.1
32 30 9 1 0.1 0.003 0 0 0.2 1 0.25 0 0 0.02 0.2
33 23 20 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0.003 0.5 0 0.1
34 22 8 0.75 0.25 0 0.5 0 0.2 1 0.25 0 0.5 0.02 0.1
35 23 14 1 0.25 0 0.5 0.02 0.2 1 0.25 0.003 0 0 0.1
36 22 2 1 0.1 0.003 0 0.02 0.2 0.75 0.25 0.003 0.5 0.02 0.2
37 23 1 1 0.1 0 0.5 0 0.2 1 0.25 0.003 0.5 0.02 0.1
38 21 11 0.75 0.1 0.003 0 0.02 0.1 0.75 0.25 0 0 0 0.1
39 23 22 1 0.1 0.003 0.5 0 0.1 0.75 0.1 0 0 0 0.2
40 23 4 0.75 0.1 0 0.5 0 0.1 1 0.1 0 0 0.02 0.2
41 22 18 0.75 0.25 0 0 0.02 0.1 1 0.1 0 0.5 0.02 0.1
42 22 20 0.75 0.1 0.003 0.5 0.02 0.2 1 0.1 0.003 0 0 0.1
43 22 12 1 0.1 0.003 0.5 0 0.2 0.75 0.25 0 0.5 0 0.1
44 22 3 0.75 0.25 0 0 0 0.2 0.75 0.25 0.003 0.5 0 0.1
45 22 19 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0 0.5 0.02 0.2
46 22 3 1 0.1 0.003 0.5 0.02 0.1 1 0.25 0.003 0 0.02 0.2
47 22 0 1 0.1 0 0 0 0.1 0.75 0.25 0.003 0 0 0.2
48 22 17 1 0.25 0 0 0 0.2 0.75 0.1 0 0 0.02 0.1
49 30 23 0.75 0.25 0 0 0.02 0.1 1 0.1 0 0.5 0.02 0.1
50 30 29 0.75 0.1 0.003 0.5 0.02 0.2 1 0.1 0.003 0 0 0.1
51 30 9 1 0.1 0.003 0.5 0 0.2 0.75 0.25 0 0.5 0 0.1
52 30 8 0.75 0.25 0 0 0 0.2 0.75 0.25 0.003 0.5 0 0.1
53 30 27 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0 0.5 0.02 0.2
54 30 3 1 0.1 0.003 0.5 0.02 0.1 1 0.25 0.003 0 0.02 0.2
55 30 1 1 0.1 0 0 0 0.1 0.75 0.25 0.003 0 0 0.2
56 30 26 1 0.25 0 0 0 0.2 0.75 0.1 0 0 0.02 0.1
57 32 31 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0.003 0.5 0 0.1
58 32 12 0.75 0.25 0 0.5 0 0.2 1 0.25 0 0.5 0.02 0.1
59 32 19 1 0.25 0 0.5 0.02 0.2 1 0.25 0.003 0 0 0.1
60 32 1 1 0.1 0.003 0 0.02 0.2 0.75 0.25 0.003 0.5 0.02 0.2
61 32 2 1 0.1 0 0.5 0 0.2 1 0.25 0.003 0.5 0.02 0.1
62 32 11 0.75 0.1 0.003 0 0.02 0.1 0.75 0.25 0 0 0 0.1
63 32 26 1 0.1 0.003 0.5 0 0.1 0.75 0.1 0 0 0 0.2
64 32 3 0.75 0.1 0 0.5 0 0.1 1 0.1 0 0 0.02 0.2
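Each collapsed row above supplies a binomial observation: of the N respondents shown a choice set, n chose option B. Under a logit choice model the probability of choosing B depends on the utility difference between the two options. The following is a minimal sketch of the resulting log-likelihood, not the model fitted in the thesis; the weight values are hypothetical.

```python
import math

def binomial_logit_loglik(beta, rows):
    """Log-likelihood of collapsed choice data under a logit choice model.

    rows: tuples (N, n, x) where N respondents saw the choice set, n chose
    option B, and x is the vector of attribute differences (B minus A).
    """
    ll = 0.0
    for N, n, x in rows:
        u = sum(b * xi for b, xi in zip(beta, x))  # utility difference B - A
        p = 1.0 / (1.0 + math.exp(-u))             # P(choose B)
        ll += n * math.log(p) + (N - n) * math.log(1.0 - p)
    return ll

# Choice set 1: A = (ARR 1, DP 0.25, PML 0, A/H 0, SA 0, DEP 0.1),
#               B = (ARR 0.75, DP 0.1, PML 0, A/H 0.5, SA 0, DEP 0.2)
x1 = (-0.25, -0.15, 0.0, 0.5, 0.0, 0.1)          # B minus A
rows = [(25, 21, x1)]
beta = (-1.0, -5.0, -500.0, -0.5, -20.0, -3.0)   # hypothetical disutility weights
ll = binomial_logit_loglik(beta, rows)
```

With all weights zero, every choice probability is 0.5 and the log-likelihood reduces to N log(1/2) per choice set, a useful sanity check.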
3 Preference meta-analysis
Literature search
A search was carried out on PubMed for the following expressions:
“multiple sclerosis”
AND
“preference”/”preferences”/”utility”/”utilities”/”elicitation”/”elicited”/”elicit”
AND
“patient”
AND
“relapsing”
AND
“remitting”
PubMed search term: ((((((((((preference) OR preferences) OR utility) OR utilities) OR elicit)
OR elicitation) OR elicited) AND multiple sclerosis) AND relapsing) AND remitting) AND patient
PubMed hits: 198 (search carried out 22/09/2017)
After title screening: 30 papers remaining
Papers were then screened for relevance of the methodology. Some papers were primarily clinical
and mentioned utility or preferences only in passing. Many were concerned not with preference
elicitation for individual criteria, but with surveying global utility or quality of life among the
MS patient population. These were excluded, leaving 9 papers.
Then each paper was assessed for compatibility with the RRMS case study, i.e. whether the following
requirements were met:
- Preferences should be elicited from multiple sclerosis patients.
- The criteria assessed must include some outcomes or treatment administration modes from
the RRMS case study, with definitions/scales that are either equivalent to those used in the
evidence synthesis or can be used to approximate the latter via simple transformations.
- The units of each criterion must be clearly expressed within the elicitation tasks.
Two studies were excluded as they did not satisfy the last point, with no criteria units specified
during the elicitation procedure:
• Sempere A, López VM, Gimenez-Martinez J, Ruiz-Beato E, Cuervo J, Maurino J. Using a
multidimensional unfolding approach to assess multiple sclerosis patient preferences for
disease-modifying therapy: a pilot study. Patient Preference and Adherence. 2017;11:995-999.
doi:10.2147/PPA.S129356
• Kremer IE, Evers SM, Jongen PJ, van der Weijden T, van de Kolk I, Hiligsmann M.
Identification and Prioritization of Important Attributes of Disease-Modifying Drugs in
Decision Making among Patients with Multiple Sclerosis: A Nominal Group Technique and
Best-Worst Scaling. PLoS One. 2016;11(11):e0164862. doi:10.1371/journal.pone.0164862
This resulted in a final set of 7 studies, providing the utility coefficients set out in the tables below
(full references are given in the bibliography). As the tables show, the studies use different
conventions: dummy or effects coding may be used to construct coefficients; and standard errors,
standard deviations or confidence intervals to report uncertainty. One study (Wilson 2014) reported
exponentiated coefficients.
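For the Wilson 2014 study, working on the log scale recovers conventional coefficients: take the log of each exponentiated point estimate, and convert its confidence interval to a standard error on the log scale (interval width divided by 3.92, i.e. twice 1.96). A sketch using the ARR = 0.5 row:

```python
import math

# Wilson 2014, ARR 0.5 vs reference ARR 1: exp(coef) 1.2, 95% CI (1.08, 1.32)
exp_coef, lo, hi = 1.2, 1.08, 1.32

coef = math.log(exp_coef)                   # log-scale coefficient
se = (math.log(hi) - math.log(lo)) / 3.92   # log-scale CI width / (2 * 1.96)
```

This reproduces the 0.1823 (se 0.0512) that appears for this study in the transformed tables further below.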
N = number of participants, ARR = annualised relapse rate, DP = disability progression, IM = intramuscular, IV = intravenous infusion, SC = subcutaneous, coef = coefficient, sd = standard deviation, se = standard error, CI = confidence interval.
Study: ARROYO N=221
Study type: absolute scenario ratings
Effects-coded coefficients
ARR coef sd
0.2 0.367 0.131
0.5 -0.367 0.131
Expected time to DP (years) coef sd
2 -0.445 0.131
5 0.445 0.131
Administration route coef sd
Oral 1.345 0.195
SC/IM -0.381 0.175
IV -0.965 0.195
Administration frequency coef sd
Daily -0.877 0.206
every 2 days - weekly -0.527 0.251
monthly 0.267 0.206
twice yearly 1.137 0.251
Study: GARCIA-DOMINGUEZ N=125
Linear coefficient
Expected time to DP coef se
1 year reduction 0.128 0.013
Dummy-coded coefficients
Administration modes coef se
Oral daily 0
IM weekly -0.849 0.113
SC several x week -0.943 0.103
Study: MANSFIELD N=301
Study type: discrete choice (logit)
Numbers visually approximated from graphs
Effects-coded coefficients
1-year DP risk coef 95% CI
0.15 -1.3 (-1.5,-1.1)
0.02 1.3 (1.1,1.5)
ARR coef 95% CI
0.125 0.25 (0.1,0.4)
0.167 0.2 (0.05,0.35)
0.2 0.05 (-0.1,0.2)
0.5 -0.5 (-0.65,-0.35)
Administration modes coef 95% CI
Oral daily 1.2 (1.0,1.4)
Injection 3x week -0.5 (-0.8,-0.2)
IV monthly -0.3 (-0.6,0)
IV every 6 months 0.25 (0,0.5)
Study: POULOS N=189
Study type: discrete choice (logit)
Numbers visually approximated from graphs
Effects-coded coefficients
ARR coef 95% CI
0.25 0.6 (0.4,0.8)
0.75 -0.1 (-0.25,0.05)
1 -0.5 (-0.7,-0.3)
Expected time to DP (years)
coef 95% CI
1 -0.9 (-1.2,-0.6)
2 -0.3 (-0.5,-0.1)
4 1.2 (0.9,1.5)
Study: UTZ N=156
Study type: discrete choice (logit)
Dummy-coded coefficients
Administration route coef sd
Oral 3.61 2.22
SC/IM -3.61 2.22
Administration frequency
coef sd
Daily -0.49 0.88
every 2 days - weekly 2.35 1.35
monthly 3.74 2.48
x3 daily -5.61 3.31
Study: WILSON 2014 N=291
Study type: discrete choice (logit)
Exponentiated coefficients reported
Dummy-coded coefficients
ARR exp(coef) 95% CI
1 1
0.5 1.2 (1.08,1.32)
0.2 1.53 (1.38,1.69)
Expected time to DP (years)
exp(coef) 95% CI
2 1
4 1.36 (1.23,1.50)
10 2.46 (2.22,2.72)
Administration modes exp(coef) 95% CI
SC daily 1
IM weekly 1.04 (0.93,1.18)
IV monthly 1.62* (1.54,1.71)
Oral daily 2.08 (1.84,2.35)
* reported as 1.52 in the original paper but presumed to be a typographical error
Study: WILSON 2015 N=50
Study type: absolute scenario ratings
Linear coefficients
ARR coef se
1 -0.05 0.06
Expected time to DP (years)
coef se
1 0.12 0.03
Administration modes coef se
Oral daily (reference) 0
IM 3x week -1.23 0.24
SC 3x week -1.41 0.24
IV monthly -0.86 0.24
Where studies reported standard deviations, these were converted to standard errors by dividing by
the square root of the number of participants. Where 95% confidence intervals were reported, the
interval width was divided by 3.92 to obtain the standard error.
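Both conversions are elementary (3.92 being twice the normal 97.5% quantile, 1.96); a sketch:

```python
import math

def se_from_sd(sd, n):
    """Standard error of a mean from a reported standard deviation and sample size."""
    return sd / math.sqrt(n)

def se_from_ci(lower, upper):
    """Standard error from a reported 95% confidence interval: width / (2 * 1.96)."""
    return (upper - lower) / 3.92

# Arroyo reported sd 0.131 with N = 221 participants
se1 = se_from_sd(0.131, 221)
# Mansfield reported a 95% CI of (-1.5, -1.1)
se2 = se_from_ci(-1.5, -1.1)
```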
Relapse rates were sometimes expressed in terms of the expected time between relapses, i.e. the
reciprocal of the rate, but not explicitly modelled as linear on that scale, meaning they could be
simply converted to annualised rates at the point of extraction (and have been expressed as such in
the tables above). The situation with regard to disability progression was less straightforward. The
evidence synthesis (and PROTECT preference datasets) used 2-year risk outcomes to measure this
criterion whereas the published elicitation studies used the expected time to disability progression
(in years), or in one study (Mansfield) a 1-year risk. It was decided that extrapolating preferences
from a 1-year to a 2-year risk horizon was too speculative, so this data was not included. Where
preferences were elicited regarding the expected time to disability progression, this was
transformed to a 2-year risk under a constant hazard assumption i.e. using the formula
P(progression within 2 years) = 1 − exp(−2/t), where t is the expected time until progression (see
III.5.4.1.3). This could not be done for the Garcia-Dominguez study, however, since this study
elicited a single linear coefficient on time to progression at three discrete levels. This fixes the utility
scale as linear in time to progression and renders it incompatible with the case study assumption of
linearity in progression risk. This coefficient was therefore not included.
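The constant-hazard transformation can be checked numerically. Applying it to the Poulos time-to-progression levels (1, 2 and 4 years, with 1 year as reference) reproduces the risk changes of −0.2325 and −0.4712 that appear in the transformed Poulos table further below:

```python
import math

def two_year_risk(t):
    """P(progression within 2 years) under a constant hazard 1/t,
    where t is the expected time to progression in years."""
    return 1.0 - math.exp(-2.0 / t)

ref = two_year_risk(1)               # reference level: t = 1 year
change_t2 = two_year_risk(2) - ref   # about -0.2325
change_t4 = two_year_risk(4) - ref   # about -0.4712
```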
Given the scale transformations applied to the data it is sensible to check that the assumption of
linear utility on the ARR and disability risk scales is appropriate. The graphs below plot the discrete
coefficient estimates on those scales (for the studies contributing such data) and appear sufficiently
linear for these purposes, albeit with too few data points to draw any firm conclusions regarding the
true relationships.
Rebasing was needed to make the coding scheme consistent throughout the dataset. Coefficients
were converted where necessary to dummy coding as described in III.5.4.1.1. Continuous
criteria were assumed linear so the choice of reference is arbitrary provided the criterion levels are
expressed as linear changes from the reference point. For the categorical criterion, administration
modes, “daily subcutaneous” was selected as the reference; where this was unavailable in any given
study, a study-specific alternative reference category was used and recorded in the data so the
model could adjust the parameters accordingly (i.e. by combining with the parameter for the study-
specific reference category). Creating a pooled category for intramuscular and subcutaneous
injections taken at least once a week (but not every day) was the best way to make efficient use of
the data.
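For a two-level effects-coded criterion the two coefficients are mirror images (plus and minus the same value c), so the dummy-coded coefficient for moving off the reference level is their difference, 2c, with the standard error scaling accordingly. A sketch using the Arroyo ARR coefficients, which reproduces the −0.734 (se 0.0176) in the rebased Arroyo table below; treating the two levels as perfectly anti-correlated is an assumption implicit in two-level effects coding:

```python
import math

# Arroyo effects-coded ARR coefficients: +0.367 at ARR 0.2, -0.367 at ARR 0.5
c, sd, n = 0.367, 0.131, 221

# Dummy coding with ARR 0.2 as reference: coefficient for a +0.3 change in ARR
dummy_coef = -c - c               # difference between the two levels: -2c
dummy_se = 2 * sd / math.sqrt(n)  # se of 2c, after converting sd to se
```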
Most of the published studies combined the administration mode and frequency into a single
criterion (as in the PROTECT preference datasets) but two studies (Arroyo, Utz) elicited preferences
for these two dimensions separately. It is straightforward to combine them by taking linear pairings
of the coefficients (see III.5.4.3).
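For example, the Arroyo combined coefficient for oral daily administration, relative to the daily subcutaneous reference, is (oral − SC/IM) + (daily − daily) = 1.345 − (−0.381) = 1.726, as in the rebased table below. A sketch of the pairing:

```python
# Arroyo route and frequency coefficients (elicited separately)
route = {"oral": 1.345, "sc_im": -0.381, "iv": -0.965}
freq = {"daily": -0.877, "monthly": 0.267}

def combined(r, f):
    """Coefficient for the mode (route r, frequency f) relative to daily SC."""
    return (route[r] + freq[f]) - (route["sc_im"] + freq["daily"])

oral_daily = combined("oral", "daily")   # 1.726
iv_monthly = combined("iv", "monthly")   # 0.56
```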
The tables below show the source data after transformations and rebasing.
Study: ARROYO
N=221
Study type: absolute scenario ratings
Change in ARR coef se
0.3 -0.734 0.0176
Change in 2-year DP risk
coef se
0.3024 -0.89 0.0176
Administration modes coef se
SC daily (reference) 0
Oral daily 1.726 0.022
IV monthly 0.56 0.210
Injection 2 days-weekly 0.35 0.025
Study: GARCIA-DOMINGUEZ
N=125
Study type: discrete choice (logit)
Administration modes coef se
Oral daily (reference) 0
Injection 2 days-weekly -0.849 0.113
Injection 2 days-weekly -0.943 0.103
Study: MANSFIELD N=301
Study type: discrete choice (logit)
Change in ARR coef se
0 (reference) 0
0.042 -0.05 0.1250
0.075 -0.2 0.1250
0.375 -0.75 0.1250
Administration modes coef se
Oral daily 0
Injection 2 days-weekly
-1.7 0.2101
IV monthly -1.5 0.2101
Study: POULOS N=189
Study type: discrete choice (logit)
Change in ARR coef se
0.5 -0.7 0.1552
0.75 -1.1 0.1767
Change in DP event risk
coef se
0 (reference) 0
-0.23254 0.6 0.2224
-0.4712 2.1 0.2651
Study: UTZ N=156
Study type: discrete choice (logit)
Administration modes coef se
Oral daily (reference) 0 0
Injection 2 days-weekly
-4.38 0.5045
Study: WILSON 2014 N=291
Study type: discrete choice (logit)
Change in ARR coef se
-0.5 0.1823 0.0512
-0.8 0.4253 0.0517
Change in DP event risk
coef se
0 0
-0.2387 0.3075 0.0506
-0.4509 0.9002 0.0518
Administration modes coef se
SC daily 0
Injection 2 days-weekly 0.0392 0.0607
IV monthly 0.4824 0.0267
Oral daily 0.7324 0.0624
Study: WILSON 2015 N=50
Study type: absolute scenario ratings
Change in ARR coef se
1 -0.05 0.06
Change in DP event risk
coef se
1 0.12 0.03
Administration modes coef se
Oral daily (reference) 0
Injection 2 days-weekly -1.23 0.24
Injection 2 days-weekly -1.41 0.24
IV monthly -0.86 0.24
Appendix B. BUGS code and data
1 Clinical evidence synthesis
Dictionary
The table below describes the key variables/parameters/constants.
d[t,] : Population average effect of treatment t on the outcome
ad[t,] : Magnitude (absolute value) of d[t,]
adelta[i,k,j] : Effect (relative to reference treatment) on outcome j of treatment in arm k of study i
delta[i,k,j] : Used in constructing adelta[i,k,j]; almost identical to adelta[i,k,j] but is non-zero for the reference treatment
sdelta[i,j] : Mean of adelta[i,k,j] across all trial arms k
b[] : Magnitude of the average (across treatments) mapping coefficient for the outcome, i.e. the absolute value of the average ratio between the outcome and the reference outcome in the appropriate group
sb[] : Mapping coefficient for the outcome with correct sign
sign[] : Known sign of treatment effect on the outcome; takes value 1 if all treatment effects on the outcome (relative to reference) are positive, or value -1 if all treatment effects on the outcome (relative to reference) are negative
impact[] : Takes value 1 if an increase in the outcome is beneficial, or value -1 if an increase in the outcome is harmful
beta[t,] : Treatment-specific mapping coefficient for the outcome relative to the reference outcome in the appropriate group, for treatment t
lbeta[t,] : As beta[t,], on the log scale
rho_b[] : Between-study propensity to correlate for the outcome; if equal for all outcomes, it is also the between-study correlation coefficient
rho_w[] : Within-study propensity to correlate for the outcome; if equal for all outcomes, it is also the within-study correlation coefficient
sig : Between-study random effects standard deviation
tau : Between-study random effects precision
mapsig : Between-treatment random mappings standard deviation
maptau : Between-treatment random mappings precision
y[i,k,j] : Observed value of outcome j in arm k of study i
va[i,k,j] : Observed variance of outcome j in arm k of study i
ns : Number of studies in dataset
nt : Number of treatments in dataset
na[i] : Number of arms in study i
maxarms : Highest number of arms in any study in dataset*
no[i] : Number of outcomes in study i
totalo : Total number of outcomes in dataset
ng : Number of outcome groups used for mappings
ogbase : Vector listing the first outcome in each group (plus a final component equal to totalo+1)
a[] : Population average absolute level of the outcome in the untreated population on Normal scale
alpha[ns+1,] : Study-level predictive distribution of the absolute level of the outcome in the untreated population on Normal scale
absd[t,] : Population average absolute level of the outcome on treatment t on Normal scale
pm_amu[ns+1,t,] : Study-level predictive distribution of the absolute level of the outcome on treatment t on Normal scale
pred_y[t,] : Individual-level predictive distribution of the absolute level of the outcome on treatment t on Normal scale
trad[t,] : Population average absolute level of the outcome on treatment t back-transformed to original scale
trad_pred_study[t,] : Study-level predictive distribution of the absolute level of the outcome on treatment t back-transformed to original scale
trad_pred_y[t,] : Individual-level predictive distribution of the absolute level of the outcome on treatment t back-transformed to original scale

* Due to the coding used to construct the covariance matrix, if there is a study with the maximum number of arms and outcomes then it is necessary to increase the value of maxarms by 1.
The treatments, outcomes and studies in the RRMS case study are numbered as follows:
Treatments
1 Placebo
2 Dimethyl fumarate
3 Fingolimod
4 Glatiramer acetate
5 Interferon beta-1a (intramuscular)
6 Interferon beta-1a (subcutaneous)
7 Interferon beta-1b
8 Laquinimod
9 Teriflunomide
Outcomes
1 Annualised relapse rate
2 Relapse-free proportion
3 Proportion undergoing disability progression; confirmed 3 months later
4 Proportion undergoing disability progression; confirmed 6 months later
5 Alanine aminotransferase above upper limit of normal range
6 Alanine aminotransferase above 3x upper limit of normal range
7 Alanine aminotransferase above 5x upper limit of normal range
8 Proportion with serious gastrointestinal disorders
9 Proportion with serious bradycardia
10 Proportion with macular edema
Studies
1 BRAVO 2014
2 CONFIRM 2012
3 ALLEGRO 2012
4 BECOME 2009
5 BEYOND 2009
6 DEFINE 2012
7 FREEDOMS 2010
8 FREEDOMS II 2014
9 INCOMIN 2002
10 JOHNSON 1995
11 MSCRG 1996
12 PRISMS 1998
13 REGARD 2008
14 TEMSO 2011
15 BORNSTEIN 1987
16 IFNB 1993
Treatment effects module code: Model 0
Variables and constants specific to Model 0

no1[i] : Number of Poisson-distributed outcomes in study i
no2[i] : Number of binary outcomes in study i modelled with odds ratios
totalo1 : Total number of Poisson-distributed outcomes in dataset
totalo2 : Total number of binary outcomes modelled with odds ratios in dataset
# This model uses random effects on outcomes 1-7; to obtain a fixed effects model, replace each line of the form delta[i,k,j] ~ dnorm(H[i,k,j],tau) with delta[i,k,j] <- H[i,k,j]. Outcomes 8-10 are modelled using fixed effects.
model {
  sig ~ dunif(0,10)          # prior for between-study sd of treatment effects
  tau <- pow(sig,-2)         # between-study precision of treatment effects
  for (i in 1:ns) {
    temp[i] <- sum(n[i,1:na[i]])   # variable n is not used
  }
  # outcome 1: relapse rate (Poisson)
  for (i in 1:ns) {
    for (j in 1:no1[i]) {
      mu[i,j] ~ dnorm(0,.001)
      for (k in 1:na[i]) {
        lambda[i,k,j] <- pi[i,k,j]*va[i,k,j]   # here va is the number of person-years
        y[i,k,j] ~ dpois(lambda[i,k,j])
        log(pi[i,k,j]) <- mu[i,j] + adelta[i,k,j]
        yhat[i,k,j] <- pi[i,k,j] * va[i,k,j]
        dev[i,k,j] <- 2 * (y[i,k,j] * (log(y[i,k,j])-log(yhat[i,k,j])) - (y[i,k,j] - yhat[i,k,j]))
        delta[i,k,j] ~ dnorm(H[i,k,j],tau)   # distribution of trial-specific treatment effect on outcome j in arm k
        adelta[i,k,j] <- (1-equals(k,1))*delta[i,k,j]   # treatment effect set to zero for reference treatment
        H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]]
      }
      prd[i,j] <- sum(dev[i,1:na[i],j])
    }
    # outcomes 2-7: common binary events (Binomial, via odds ratio)
    for (j in no1[i]+1:no1[i]+no2[i]) {
      mu[i,j] ~ dnorm(0,.001)
      for (k in 1:na[i]) {
        y[i,k,j] ~ dbin(pi[i,k,j],va[i,k,j])   # here va is denominator
        logit(pi[i,k,j]) <- mu[i,j] + adelta[i,k,j]
        yhat[i,k,j] <- pi[i,k,j] * va[i,k,j]
        dev[i,k,j] <- 2 * (y[i,k,j] * (log(y[i,k,j])-log(yhat[i,k,j])) + (va[i,k,j] - y[i,k,j]) * (log(va[i,k,j] - y[i,k,j]) - log(va[i,k,j] - yhat[i,k,j])))
        delta[i,k,j] ~ dnorm(H[i,k,j],tau)   # distribution of trial-specific treatment effect on outcome j in arm k
        adelta[i,k,j] <- (1-equals(k,1))*delta[i,k,j]   # treatment effect set to zero for reference treatment
        H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]]
      }
      prd[i,j] <- sum(dev[i,1:na[i],j])
    }
    # outcomes 8-10: common binary events (Binomial, via risk diff)
    for (j in no1[i]+no2[i]+1:no[i]) {
      mu[i,j] ~ dgamma(0.5,0.5)
      for (k in 1:na[i]) {
        y[i,k,j] ~ dbin(pi[i,k,j],va[i,k,j])   # here va is denominator
        pi[i,k,j] <- mu[i,j]+min(max(d[t[i,k],o[i,j]]-d[t[i,1],o[i,j]],-mu[i,j]), 1-mu[i,j])
        yhat[i,k,j] <- pi[i,k,j] * va[i,k,j]
        dev[i,k,j] <- 2 * (y[i,k,j] * (log(y[i,k,j])-log(yhat[i,k,j])) + (va[i,k,j] - y[i,k,j]) * (log(va[i,k,j] - y[i,k,j]) - log(va[i,k,j] - yhat[i,k,j])))
      }
      prd[i,j] <- sum(dev[i,1:na[i],j])
    }
    resdev[i] <- sum(prd[i,1:no[i]])
  }
  for (k in 2:nt) {
    for (j in 1:totalo1+totalo2) { d[k,j] ~ dnorm(0,.001) }   # prior for mean treatment effects
    for (j in totalo1+totalo2+1:totalo) { d[k,j] ~ dbeta(0.5,0.5) }
  }
  for (j in 1:totalo) {
    d[1,j] <- 0   # mean treatment effect is zero on reference treatment
  }
  for (k in 1:nt) {
    for (j in 1:totalo) {
      rank[k,j] <- equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-rank(d[,j],k))   # treatment rankings by outcome
      for (q in 1:nt) {
        rankprop[k,j,q] <- equals(rank[k,j],q)    # indicator for time spent at each rank
        cumrankprop[k,j,q] <- step(q-rank[k,j])   # indicator for time spent at or below each rank
      }
      sucra[k,j] <- sum(cumrankprop[k,j,1:nt-1])/(nt-1)   # SUCRA
    }
  }
  nmaresdev <- sum(resdev[])   # summed overall residual deviance
}
# END
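The rank/cumrankprop/sucra lines above compute, for each treatment and outcome, the surface under the cumulative ranking curve (SUCRA): the average, over ranks 1 to nt−1, of the proportion of posterior samples in which the treatment sits at or above that rank. A minimal post-hoc sketch of the same calculation from saved rank samples (variable names are illustrative, not the monitored node names):

```python
def sucra(rank_samples, nt):
    """SUCRA from posterior samples of one treatment's rank (1 = best)."""
    m = len(rank_samples)
    # proportion of samples at or better than each rank q = 1..nt-1
    cum = [sum(r <= q for r in rank_samples) / m for q in range(1, nt)]
    return sum(cum) / (nt - 1)
```

A treatment ranked best in every sample scores 1, and one ranked worst in every sample scores 0.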
Treatment effects module code: Model 1
# This model uses random effects; to obtain a fixed effects model, replace the line delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) with delta[i,k,j] <- H[i,k,j]
model {
  sig ~ dunif(0,10)          # prior for between-study sd of treatment effects
  tau <- pow(sig,-2)         # between-study precision of treatment effects
  for (i in 1:ns) {
    rc[i] <- 0
    E[i] ~ dnorm(0,1)   # normalised between-trial different-arm different-outcome covariance of treatment effects (delta)
    resdev[i] <- inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])   # residual deviance for study i
    cp[i,1:totalo*maxarms,1:totalo*maxarms] <- inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms])   # within-study coprecision matrix of outcomes in study i
    for (j in 1:no[i]) {
      G[i,j] ~ dnorm(0,1)     # normalised between-trial different-arm same-outcome covariance of treatment effects (delta)
      mu[i,j] ~ dnorm(0,.001) # "average" level of outcome j in study i across all trial arms
      delta[i,1,j] <- 0
      for (k in 1:na[i]) {
        D[i,k,j] <- mu[i,j] + delta[i,k,j] + signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5)
        # mean of outcome j in arm k of study i is the average across all arms plus the effect of treatment compared to average; final term induces required within-study covariance between different outcomes in same arm
        y[i,k,j] ~ dnorm(D[i,k,j],yprec[i,k,j])   # distribution of outcome j in arm k of study i
        prec[i,k,j] <- pow(va[i,k,j]/n[i,k],-1)   # overall variance of observed outcome y
        yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]]))   # remaining (unshared) precision of y[i,k,j] after accounting for covariance
      }
    }
    for (k in 1:na[i]) {
      B[i,k] ~ dnorm(0,1)   # normalised within-trial same-arm different-outcome covariance of observed outcomes (y)
    }
    for (k in 2:na[i]) {
      F[i,k] ~ dnorm(0,1)   # normalised between-trial same-arm different-outcome covariance of treatment effects (delta)
      for (j in 1:no[i]) {
        taud[i,k,j] <- tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))   # remaining (unshared) precision of delta after accounting for covariances
        delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j])   # distribution of trial-specific treatment effect on outcome j in arm k
        H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]] + signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k]+signr_b[o[i,j]]*sqrt(0.5-abs(rho_b[o[i,j]])*0.5)*G[i,j])*pow(tau,-0.5)
        # mean of treatment effect for outcome j in arm k of study i is the population average effect parameter for that treatment/outcome, with adjustments for correlations: different-arm/different-outcome, same-arm/different-outcome, same-arm/same-outcome
      }
    }
    for (x in 1:no[i]*na[i]) {
      arm[i,x] <- trunc(1+(x-1)/no[i])
      out[i,x] <- x-no[i]*trunc((x-1)/no[i])
      for (z in 1:no[i]*na[i]) {
        cv[i,x,z] <- pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(x,z))
      }
      for (j in no[i]*na[i]+1:totalo*maxarms) {
        cv[i,x,j] <- 0
        cv[i,j,x] <- 0
      }
      res[i,x] <- y[i,arm[i,x],out[i,x]] - mu[i,out[i,x]] - delta[i,arm[i,x],out[i,x]]
      pres[i,x] <- inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
    }
    for (j in no[i]*na[i]+1:totalo*maxarms-1) {
      for (k in j+1:totalo*maxarms) {
        cv[i,j,k] <- 0
        cv[i,k,j] <- 0
      }
      cv[i,j,j] <- 1
    }
    cv[i,totalo*maxarms,totalo*maxarms] <- 1
  }
  for (j in 1:totalo) {
    d[1,j] <- 0                    # mean treatment effect is zero on reference treatment
    signr_b[j] <- step(rho_b[j])   # sign of between-study correlations
    signr_w[j] <- step(rho_w[j])   # sign of within-study correlations
    for (k in 2:nt) { d[k,j] ~ dnorm(0,.001) }   # prior for mean treatment effects
  }
  for (k in 1:nt) {
    for (j in 1:totalo) {
      rank[k,j] <- equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-rank(d[,j],k))   # treatment rankings by outcome
      for (q in 1:nt) {
        rankprop[k,j,q] <- equals(rank[k,j],q)    # indicator for time spent at each rank
        cumrankprop[k,j,q] <- step(q-rank[k,j])   # indicator for time spent at or below each rank
      }
      sucra[k,j] <- sum(cumrankprop[k,j,1:nt-1])/(nt-1)   # SUCRA
    }
  }
  nmaresdev <- sum(resdev[])   # summed overall residual deviance
}
# END
Treatment effects module code: Model 1* (contrast-level data)
# This model uses random effects; to obtain a fixed effects model, replace the line delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) with delta[i,k,j] <- H[i,k,j]
model
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for(i in 1:ns)
Appendix B
327
E[i]~dnorm(0,1) # normalised between-trial different-arm different-
outcome covariance of treatment effects (delta)
A[i]~dnorm(0,1)
rc[i]<-1-n[i,1]/(n[i,1]+sum(n[i,2:na[i]])/(na[i]-1)) # estimate
between-arm correlation based on number of patients in trial arms
resdev[i]<-inprod(pres[i,1:no[i]*(na[i]-1)],res[i,1:no[i]*(na[i]-1)])
cp[i,1:totalo*(maxarms-1),1:totalo*(maxarms-1)]<-
inverse(cv[i,1:totalo*(maxarms-1),1:totalo*(maxarms-1)])
# within-study coprecision matrix of outcomes in study i
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome
covariance of treatment effects (delta)
C[i,j]~dnorm(0,1)
for (k in 2:na[i])
D[i,k,j] <- delta[i,k,j] +
signr_w[o[i,j]]*(sqrt(abs(rho_w[o[i,j]])*rc[i])*A[i]+signr_w[o[i,j]]*sqrt(a
bs(rho_w[o[i,j]])-abs(rho_w[o[i,j]])*rc[i])*B[i,k] +
signr_w[o[i,j]]*sqrt(rc[i]--
abs(rho_w[o[i,j]])*rc[i])*C[i,j])*pow(prec[i,k,j],-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j]) # distribution
of outcome j in arm k of study i
prec[i,k,j]<-pow(se[i,k,j],-2) # overall
variance of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])-
rc[i]+rc[i]*abs(rho_w[o[i,j]])) # remaining (unshared) precision of
y[i,k,j] after accounting for covariance
for (k in 2:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-
outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial same-arm
different-outcome covariance of treatment effects (delta)
for (j in 1:no[i])
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
# mean of treatment effect for outcome j in arm k of study i is the
population average effect parameter for that treatment/outcome, with
adjustments for correlations: different-arm/different-outcome, same-
arm/different-outcome, same-arm/same-outcome
for (x in 1:no[i]*(na[i]-1)) # indexing variable x loops
through all arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i])+1 # finds within-trial arm number
corresponding to each value of x
Appendix B
328
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*(na[i]-1)) cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*(rc[i]*sqrt(abs(rho_w[o[
i,out[i,x]]]*rho_w[o[i,out[i,z]]])+(sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i
,out[i,z]]]))-
rc[i]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(arm[i,x]
,arm[i,z])+(rc[i]-
rc[i]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(out[i,x]
,out[i,z]))+(1-rc[i]-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]]))+rc[i]*signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]
*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*(na[i]-1)+1:totalo*(maxarms-1))
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - delta[i,arm[i,x],out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*(na[i]-1)],res[i,1:no[i]*(na[i]-1)])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*(na[i]-1)+1:totalo*(maxarms-1)-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*(maxarms-1)) cv[i,j,k]<-0
# fill in redundant off-diagonal elements of the covariance matrix with
zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*(maxarms-1),totalo*(maxarms-1)]<-1 # fill in final
redundant diagonal element of the covariance matrix with a 1
for (j in 1:totalo)
d[1,j]<-0 # mean treatment effect is zero on reference
treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 2:nt) d[k,j]~dnorm(0,.001) # prior for mean
treatment effects
for (k in 1:nt) for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-
rank(d[,j],k)) # treatment rankings by outcome
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time
spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) # indicator for time
spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
nmaresdev<-sum(resdev[]) # summed overall residual
deviance
# END
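The indexing arithmetic above (arm[i,x] and out[i,x]) unflattens a single loop index x into an (arm, outcome) pair so that one loop can visit every arm/outcome combination. A minimal Python sketch of that mapping, using hypothetical dimensions of no = 3 outcomes and na = 3 arms (Model 1 indexes only the non-reference arms 2..na, hence the extra +1):

```python
# Sketch of the x -> (arm, outcome) unflattening used in the model code.
# BUGS trunc(1+(x-1)/no)+1 equals 1 + (x-1)//no + 1 in integer arithmetic.
def arm_out(x, no):
    arm = 1 + (x - 1) // no + 1        # within-trial arm number (2..na)
    out = x - no * ((x - 1) // no)     # within-trial outcome number (1..no)
    return arm, out

no, na = 3, 3  # hypothetical study dimensions
pairs = [arm_out(x, no) for x in range(1, no * (na - 1) + 1)]
print(pairs)
```

Each (arm, outcome) combination for the non-reference arms appears exactly once, which is what allows the covariance matrix cv to be indexed by x and z.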
Treatment effects module code: Model 2
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with delta[i,k,j] <- H[i,k,j]
model
sig~dunif(0,10) # prior for between-study sd of treatment
effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for(i in 1:ns)
E[i]~dnorm(0,1)
# normalised between-trial different-arm different-outcome
covariance of treatment effects (delta)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms]) # within-study
coprecision matrix of outcomes in study i
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial
different-arm same-outcome covariance of treatment effects (delta)
mu[i,j]~dnorm(0,.001) # "average" level of
outcome j in study i across all trial arms
sdelta[i,j]<-sum(adelta[i,1:na[i],j]) # effect of
"average" treatment in study i on outcome j relative to reference treatment
for (k in 1:na[i])
D[i,k,j] <- mu[i,j] + adelta[i,k,j] - sdelta[i,j]/na[i] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5) #
mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j])
# distribution of outcome j in arm k of
study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])) # remaining (unshared) precision of y after accounting for covariance
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-
arm different-outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial
same-arm different-outcome covariance of treatment effects (delta)
iszeroarm[i,k]<-equals(t[i,k],1) # equals 1 if arm k of
trial i is reference treatment
for (j in 1:no[i])
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]])) # remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j])
# distribution
of trial-specific treatment effect on outcome j in arm k
adelta[i,k,j]<-(1-iszeroarm[i,k])*delta[i,k,j]
# treatment
effect set to zero for reference treatment
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+sqrt(abs(rho_b[o[i,j]])-
abs(rho_b[o[i,j]])*0.5)*F[i,k]+ sqrt(0.5-abs(rho_b[o[i,j]])*0.5)*G[i,j])*
pow(tau,-0.5) # mean of treatment effect for outcome j in arm k of
study i is the population average effect parameter for that
treatment/outcome, with adjustments for correlations: different-
arm/different-outcome, same-arm/different-outcome, same-arm/same-outcome
for (x in 1:no[i]*na[i]) # indexing variable x loops
through all arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial
arm number corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial
outcome number corresponding to each value of x
for (z in 1:no[i]*na[i])
cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z)) # within-study covariance matrix
element representing covariance between arm/outcome combinations x and z;
matrix is needed to calculate residual deviance
for (j in no[i]*na[i]+1:totalo*maxarms) # covariance matrix needs
extra columns and rows to standardise its dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-
diagonal elements of the covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-
diagonal elements of the covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - mu[i,out[i,x]] -
adelta[i,arm[i,x],out[i,x]] + sdelta[i,out[i,x]]/na[i] # residual for
arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of
coprecision matrix (for residual deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1) #
covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms) cv[i,j,k]<-0 # fill in
redundant off-diagonal elements of the covariance matrix with zeroes
cv[i,k,j]<-0 # fill in
redundant off-diagonal elements of the covariance matrix with zeroes
cv[i,j,j]<-1 # fill in
redundant diagonal elements of the covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in
final redundant diagonal element of the covariance matrix with a 1
for (j in 1:totalo)
d[1,j]<-0 # mean treatment effect is zero on reference
treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 2:nt) d[k,j]~dnorm(0,.001) # prior for mean
treatment effects
for (k in 1:nt) for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-
1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-rank(d[,j],k)) # treatment
rankings by outcome
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) #
indicator for time spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
# END
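The rank, cumrankprop, and sucra lines above amount to estimating, per MCMC draw, the proportion of competing treatments that a given treatment beats: cumrankprop is the indicator I(rank <= q), and SUCRA averages it over q = 1..nt-1. A hedged Python sketch using a made-up vector of posterior ranks (the ranks array is purely illustrative, not data from the thesis):

```python
import numpy as np

# Illustrative SUCRA calculation: cumrankprop is I(rank <= q), and SUCRA
# averages those indicators over q = 1..nt-1 (and over MCMC draws).
rng = np.random.default_rng(0)
nt = 4
ranks = rng.integers(1, nt + 1, size=1000)   # fake posterior ranks, 1 = best
cum = np.array([(ranks <= q).mean() for q in range(1, nt)])  # est. P(rank <= q)
sucra = float(cum.sum() / (nt - 1))
print(round(sucra, 3))
```

A treatment that is always ranked first has SUCRA 1; one always ranked last has SUCRA 0.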
Treatment effects module code: Model 3
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with delta[i,k,j] <- H[i,k,j]
# This model uses random mappings; to obtain a fixed mappings model replace the red line of code with lbeta[k,j] <- log(abs(b[j]))
model
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for(i in 1:ns)
E[i]~dnorm(0,1) # normalised between-trial different-arm different-
outcome covariance of treatment effects (delta)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]]) #
residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms])
# within-study coprecision matrix of outcomes in study i
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome
covariance of treatment effects (delta)
mu[i,j]~dnorm(0,.001) # "average" level of outcome j in study
i across all trial arms
sdelta[i,j]<-sum(adelta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
D[i,k,j] <- mu[i,j] + adelta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j]) # distribution
of outcome j in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])) # remaining (unshared) precision of y[i,k,j] after accounting for covariance
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
adelta[i,k,j]<-(1-equals(t[i,k],1))*delta[i,k,j] # treatment effect
set to zero for reference treatment
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
# mean of treatment effect for outcome j in arm k of study i is the
population average effect parameter for that treatment/outcome, with
adjustments for correlations: different-arm/different-outcome, same-
arm/different-outcome, same-arm/same-outcome
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-
outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial same-arm
different-outcome covariance of treatment effects (delta)
for (x in 1:no[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*na[i]) cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*na[i]+1:totalo*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - mu[i,out[i,x]] -
adelta[i,arm[i,x],out[i,x]] + sdelta[i,out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms) cv[i,j,k]<-0
# fill in redundant off-diagonal elements of the covariance matrix with
zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in final redundant
diagonal element of the covariance matrix with a 1
for (m in 1:ng) # cycle through outcome groups
b[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base outcome
in each group
for (j in 2:totalo) b[j] ~ dnorm(0,.01) # mean mapping
for outcome j relative to outcome 1
for(j in 1:totalo) sb[j]<-sign[j]*abs(b[j]) # mean mapping
for outcome j with correct (known) sign
lb[j]<-log(abs(b[j]))
maptau~dgamma(.005,.005) I(1,) # Lu-Ades prior for
mapping precision
mapsig <- pow(maptau,-0.5) # sd of mappings on
outcome 1
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<-sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad[k,ogbase[m-1]]~dnorm(0,.001) # prior for population-mean treatment
effect of each treatment on outcome 1
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<-(beta[k,j]/beta[k,ogbase[m-1]])*ad[k,ogbase[m-1]]
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-sign[j]*exp(lbeta[k,j]) # treatment-specific
mappings with correct sign
lbeta[k,j] ~ dnorm(bW[k,j], lbetatau[k,j]) # treatment-
specific mapping distribution on log scale
lbetatau[k,j]<-pow(mapsig,-2)/(1-0.5) # precision
corresponding to half of mapping sd
bW[k,j]<-log(abs(b[j]))+sqrt(0.5)*mapsig*W[k] # mean mappings
with adjustment for correlations
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-
rank(d[,j],k)) # treatment rankings by outcome
d[k,j]<-ad[k,j] # assign known signs to treatment
effects
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time
spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) # indicator for time
spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
nmaresdev<-sum(resdev[]) # summed overall residual
deviance
# END
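The residual deviance bookkeeping in the models above builds a covariance matrix cv padded to common dimensions with an identity block, inverts it into the coprecision matrix cp, and computes resdev[i] as the inner product of residuals with cp rows. Because the padding block is the identity and the padded residual entries are zero, the padding leaves the quadratic form res' C^{-1} res unchanged. A small numerical sketch (the matrix and residual values are hypothetical):

```python
import numpy as np

# Sketch: padding the covariance matrix with an identity block and the
# residual vector with zeros leaves res' * inv(C) * res unchanged.
C = np.array([[1.0, 0.3], [0.3, 1.0]])   # hypothetical within-study covariance
r = np.array([0.5, -0.2])                # hypothetical residuals
pad = 2                                   # redundant rows/cols, as in the code
Cfull = np.eye(len(r) + pad)             # redundant diagonal filled with 1s
Cfull[:len(r), :len(r)] = C
cp = np.linalg.inv(Cfull)                # "coprecision" matrix
rfull = np.concatenate([r, np.zeros(pad)])
resdev = float(rfull @ cp @ rfull)
print(round(resdev, 6))
```

This is why the redundant diagonal elements must be set to 1 rather than 0: a zero diagonal would make cv singular and the inversion would fail.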
Treatment effects module code: Model 4a
Variables and constants specific to Models 4a and 4b:
no1[i]: Number of outcomes in study i excluding the binary outcomes with zeroes
totalo1: Total number of outcomes in the dataset excluding the binary outcomes with zeroes
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with delta[i,k,j] <- H[i,k,j]
# This model uses random mappings; to obtain a fixed mappings model replace the red line of code with lbeta[k,j] <- log(abs(b[j]))
model
### TREATMENT EFFECTS MODEL
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for (j in 1:totalo1) ff[j]<-1
for (j in totalo1+1:totalo) ff[j]<-1
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) d[k,j]<-ad[k,j]*(1-ze[j,k])
# assign known signs to treatment effects
for(i in 1:ns)
temp[i]<-no2[i]*ns2+sum(sw2[])*no[i] # unused variables
E[i]~dnorm(0,1) # normalised between-trial different-arm different-
outcome covariance of treatment effects (delta)
resdev[i]<-inprod(pres[i,1:no1[i]*na[i]],res[i,1:no1[i]*na[i]]) #
residual deviance for study i
cp[i,1:totalo1*maxarms,1:totalo1*maxarms]<-
inverse(cv[i,1:totalo1*maxarms,1:totalo1*maxarms])
# within-study coprecision matrix of outcomes in study i
for (j in 1:totalo1) mu[i,j]~dnorm(0,.01) # "average"
level of outcome j in study i across all trial arms
for (j in totalo1+1:totalo) mu[i,j]~dgamma(0.5,0.5)
for (j in 1:no1[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome
covariance of treatment effects (delta)
sdelta[i,j]<-sum(adelta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
D[i,k,j] <- mu[i,o[i,j]] + adelta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j]) # distribution
of outcome j in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])) # remaining (unshared) precision of y[i,k,j] after accounting for covariance
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-
outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial same-arm
different-outcome covariance of treatment effects (delta)
# iszeroarm[i,k]<-equals(t[i,k],1) # equals 1 if arm k of trial i is
reference treatment
for (j in 1:no1[i])
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
adelta[i,k,j]<-(1-ze[o[i,j],t[i,k]])*delta[i,k,j] # treatment effect set to zero where the ze indicator marks it as assumed zero
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
# mean of treatment effect for outcome j in arm k of study i is the
population average effect parameter for that treatment/outcome, with
adjustments for correlations: different-arm/different-outcome, same-
arm/different-outcome, same-arm/same-outcome
for (i in 1:ns2)
for (j in no1[sw2[i]]+1:no[sw2[i]])
for (k in 1:na[sw2[i]])
pi[sw2[i],k,j] <- mu[sw2[i],o[sw2[i],j]] +min(max(adelta[sw2[i],k,j],-
mu[sw2[i],o[sw2[i],j]]),1-mu[sw2[i],o[sw2[i],j]])
adelta[sw2[i],k,j]<- d[t[sw2[i],k],o[sw2[i],j]]
y[sw2[i],k,j]~dbin(pi[sw2[i],k,j],va[sw2[i],k,j])
yhat[sw2[i],k,j] <- pi[sw2[i],k,j] * va[sw2[i],k,j]
dev[sw2[i],k,j] <- 2 * (y[sw2[i],k,j] * (log(y[sw2[i],k,j])-
log(yhat[sw2[i],k,j])) + (va[sw2[i],k,j] - y[sw2[i],k,j]) *
(log(va[sw2[i],k,j] - y[sw2[i],k,j]) - log(va[sw2[i],k,j] -
yhat[sw2[i],k,j])))
prd[sw2[i],j]<-sum(dev[sw2[i],1:na[sw2[i]],j])
resdev2[i]<-sum(prd[sw2[i],no1[sw2[i]]+1:no[sw2[i]]])
### MAPPINGS
for (m in 1:ng) # cycle through outcome groups
sb[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base
outcome in each group
for (j in ogbase[m]+1:ogbase[m+1]-1)
sb[j]<-(sign[j]/sign[ogbase[m]])*abs(b[j])
for (j in 1:totalo) b[j] ~ dnorm(0,.01)
lb[j]<-log(abs(sb[j]))
maptau~dgamma(.005,.005) I(1,) # Lu-Ades prior for
mapping precision
mapsig <- pow(maptau,-0.5) # sd of mappings on
outcome 1
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<- sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad1[k,ogbase[m-1]]~dnorm(0,.001) I(0,)
ad2[k,ogbase[m-1]]~dbeta(0.5,0.5)
ad[k,ogbase[m-1]]<-sign[ogbase[m-1]]*(step(totalo1+0.5-ogbase[m-
1])*ad1[k,ogbase[m-1]]+step(ogbase[m-1]-totalo1-0.5)*ad2[k,ogbase[m-1]])
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<- (beta[k,j]/beta[k,ogbase[m-1]])*abs(ad[k,ogbase[m-1]] )
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-(sign[j]/sign[ogbase[m-1]])*exp(lbeta[k,j]) #
treatment-specific mappings with correct sign
lbeta[k,j] ~ dnorm(bW[k,j], lbetatau[k,j]) # treatment-
specific mapping distribution on log scale
lbetatau[k,j]<-pow(mapsig,-2)/(1-0.5) # precision
corresponding to half of mapping sd
bW[k,j]<-log(abs(b[j]))+sqrt(0.5)*mapsig*W[k] # mean mappings
with adjustment for correlations
### RESIDUAL DEVIANCE
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
for (i in 1:ns)
for (x in 1:no1[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no1[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no1[i]*trunc((x-1)/no1[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no1[i]*na[i]) cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no1[i]*na[i]+1:totalo1*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - mu[i,o[i,out[i,x]]] -
adelta[i,arm[i,x],out[i,x]] + sdelta[i,out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no1[i]*na[i]],res[i,1:no1[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no1[i]*na[i]+1:totalo1*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo1*maxarms) cv[i,j,k]<-0
# fill in redundant off-diagonal elements of the covariance matrix with
zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo1*maxarms,totalo1*maxarms]<-1 # fill in final redundant
diagonal element of the covariance matrix with a 1
# END
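For the binary outcomes modelled with dbin above, the per-cell deviance dev[...] is the standard binomial residual deviance, twice the saturated-minus-fitted log-likelihood. A sketch with made-up counts (the function name and example numbers are illustrative only):

```python
import math

# Binomial residual deviance contribution, as in the dev[...] line:
# 2*(y*log(y/yhat) + (n-y)*log((n-y)/(n-yhat))), with yhat = p*n.
def binom_dev(y, n, p):
    yhat = p * n
    dev = 0.0
    if y > 0:
        dev += y * (math.log(y) - math.log(yhat))
    if n - y > 0:
        dev += (n - y) * (math.log(n - y) - math.log(n - yhat))
    return 2.0 * dev

print(binom_dev(5, 20, 0.25))        # fitted mean equals observed count
print(round(binom_dev(8, 20, 0.25), 4))
```

When the fitted mean equals the observed count the contribution is zero, so summing these cells alongside the Gaussian quadratic forms gives a comparable overall residual deviance.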
Treatment effects module code: Model 4b
Variables and constants specific to Models 4a and 4b:
no1[i]: Number of outcomes in study i excluding the binary outcomes with zeroes
totalo1: Total number of outcomes in the dataset excluding the binary outcomes with zeroes
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with adelta[i,k,j] <- H[i,k,j]
# This model uses random mappings; to obtain a fixed mappings model replace the red line of code with lbeta[k,j] <- log(abs(b[j]))
model
### TREATMENT EFFECTS MODEL
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for (j in 1:totalo1) ff[j]<-1
for (j in totalo1+1:totalo) ff[j]<-1
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) d[k,j]<-ad[k,j]*(1-ze[j,k])
# assign known signs to treatment effects
for(i in 1:ns)
temp[i]<-no2[i]
E[i]~dnorm(0,1)
for (j in 1:no1[i])
mu[i,j]~dnorm(0,.01) # "average" level of outcome j in
study i across all trial arms
for (j in no1[i]+1:no[i])
mu[i,j]~dgamma(0.5,0.5) # "average" level of outcome j in
study i across all trial arms
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome covariance of treatment effects (delta)
sdelta[i,j]<-sum(delta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
Dmu[i,k,j] <- step(totalo1+0.5-o[i,j])*(mu[i,j] + delta[i,k,j] -
sdelta[i,j]) + step(o[i,j]-totalo1-0.5)*min(1,max(0,mu[i,j] + delta[i,k,j]
- sdelta[i,j]))
D[i,k,j] <- mu[i,j] + delta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j]/ff[o[i,j]],
-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
DD[i,k,j]<-step(totalo1+0.5-o[i,j])*D[i,k,j] + step(o[i,j]-totalo1-
0.5)*min(1,max(0,D[i,k,j]))
y[i,k,j]~dnorm(DD[i,k,j],yprec[i,k,j]) # distribution of outcome j
in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision
of observed outcome y
yprec[i,k,j] <- (prec[i,k,j]/(1-abs(rho_w[o[i,j]])))/ff[o[i,j]] #
remaining (unshared) precision of y after accounting for covariance
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
adelta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
delta[i,k,j]<-step(o[i,j]-totalo1-0.5)*(1-
ze[o[i,j],t[i,k]])*d[t[i,k],o[i,j]]+step(totalo1+0.5-o[i,j])*(1-
ze[o[i,j],t[i,k]])*adelta[i,k,j] # select appropriate
treatment effect parameter for this study arm and outcome
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-outcome
covariance of observed outcomes (y)
F[i,k]~dnorm(0,1) # normalised between-trial same-arm different-outcome covariance of treatment effects (delta)
### MAPPINGS
for (m in 1:ng) # cycle through outcome groups
sb[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base
outcome in each group
for (j in ogbase[m]+1:ogbase[m+1]-1)
sb[j]<-(sign[j]/sign[ogbase[m]])*abs(b[j])
for (j in 1:totalo) b[j] ~ dnorm(0,.01)
lb[j]<-log(abs(sb[j]))
maptau~dgamma(.005,.005) I(1,) # Lu-Ades prior for
mapping precision
mapsig <- pow(maptau,-0.5) # sd of mappings on
outcome 1
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<- sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad1[k,ogbase[m-1]]~dnorm(0,.001) I(0,)
ad2[k,ogbase[m-1]]~dbeta(0.5,0.5)
ad[k,ogbase[m-1]]<-sign[ogbase[m-1]]*(step(totalo1+0.5-ogbase[m-
1])*ad1[k,ogbase[m-1]]+step(ogbase[m-1]-totalo1-0.5)*ad2[k,ogbase[m-1]])
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<- (beta[k,j]/beta[k,ogbase[m-1]])*abs(ad[k,ogbase[m-1]] )
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-(sign[j]/sign[ogbase[m-1]])*exp(lbeta[k,j]) #
treatment-specific mappings with correct sign
lbeta[k,j] ~ dnorm(bW[k,j], lbetatau[k,j]) # treatment-
specific mapping distribution on log scale
lbetatau[k,j]<-pow(mapsig,-2)/(1-0.5) # precision
corresponding to half of mapping sd
bW[k,j]<-log(abs(b[j]))+sqrt(0.5)*mapsig*W[k] # mean mappings
with adjustment for correlations
### RESIDUAL DEVIANCE
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
for (i in 1:ns)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms]) # within-study
coprecision matrix of outcomes in study i
for (x in 1:no[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*na[i])
cv[i,x,z]<-
pow((prec[i,arm[i,x],out[i,x]]/ff[o[i,out[i,x]]])*(prec[i,arm[i,z],out[i,z]
]/ff[o[i,out[i,z]]]),-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*na[i]+1:totalo*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - Dmu[i,arm[i,x],out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms)
cv[i,j,k]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in final redundant
diagonal element of the covariance matrix with a 1
for (k in 1:nt)
for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-
rank(d[,j],k)) # treatment rankings by outcome
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) # indicator for time
spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
#END
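Throughout these models, within-arm correlation between outcomes is induced by adding a shared standard-normal latent term B[i,k], scaled by sqrt(|rho_w|)*pow(prec,-0.5), to the mean, while deflating the residual precision to prec/(1-|rho_w|). A Monte Carlo sketch (the rho and prec values are hypothetical) checking that this construction reproduces the intended marginal variance 1/prec and within-arm correlation rho:

```python
import numpy as np

# Shared-latent-normal construction: y_j = sqrt(rho/prec)*B + eps_j with
# Var(eps_j) = (1-rho)/prec gives Var(y_j) = 1/prec and Corr(y_1, y_2) = rho.
rng = np.random.default_rng(1)
rho, prec, n = 0.6, 4.0, 200_000
B = rng.standard_normal(n)                       # shared arm-level latent term
eps = rng.standard_normal((2, n)) * np.sqrt((1 - rho) / prec)
y = np.sqrt(rho / prec) * B + eps                # two outcomes, same arm
var1 = float(np.var(y[0]))
corr = float(np.corrcoef(y[0], y[1])[0, 1])
print(round(var1, 3), round(corr, 3))
```

The same device, with different scalings, generates the E, F, and G terms that induce the between-study correlations in the H[i,k,j] expressions.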
Population calibration module code
# This model assumes random effects for outcomes numbered up to and including totalo1 and fixed effects for the remaining outcomes.
### POPULATION CALIBRATION MODEL
for (i in 1:ns)
Q[i]~dnorm(0,1)
for (k in 1:na[i]) S[i,k]~dnorm(0,1)
for (j in 1:no1[i]) alpha[i,j]<-aalpha[i,j]
for (j in no1[i]+1:no[i]) alpha[i,j]<-min(1,max(0,aalpha[i,j]))
for (j in 1:no[i])
aalpha[i,j]~dnorm(amu[i,j],aprec[i,j])
amu[i,j]<-a[o[i,j]]+signr_b[o[i,j]]*zi*sqrt(abs(rho_b[o[i,j]]))*Q[i]
aprec[i,j]<-pow(zi,-2)/(1-abs(rho_b[o[i,j]]))
for (k in 1:na[i])
pm_y[i,k,j]<-y[i,k,j]
pm_va[i,k,j]<-va[i,k,j]
pm_va_prec[i,k,j]<-pow(pm_va[i,k,j]*sqrt(2/n[i,k]),-1)
pm_va[i,k,j]~dnorm(pm_va_mu[o[i,j]],pm_va_prec[i,k,j])
pm_prec[i,k,j]<-pow(pm_va[i,k,j]/n[i,k],-1)/((1-
abs(rho_w[o[i,j]]))*ff[o[i,j]])
pm_mu[i,k,j]<-step(o[i,j]-totalo1-0.5)*a[o[i,j]]+step(totalo1+0.5-
o[i,j])*alpha[i,j]+signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*sqrt(pm_va[i,k,
j]/n[i,k])*S[i,k] + cut(delta[i,k,j])
pm_y[i,k,j]~dnorm(pm_mu[i,k,j],pm_prec[i,k,j])
zi~dunif(0,10)
for (j in 1:totalo1)
a[j]~dnorm(0,.001)
for (j in totalo1+1:totalo)
a[j]~dgamma(0.5,0.5)
for (j in 1:totalo) for (k in 1:nt)
absd[k,j]<-step(totalo1+0.5-j)*(a[j]+d[k,j])+step(j-totalo1-
0.5)*max(0,min(1,a[j]+d[k,j]))
### PREDICTIVE DISTRIBUTIONS
E[ns+1]~dnorm(0,1)
Q[ns+1]~dnorm(0,1)
for (k in 1:nt)
F[ns+1,k]~dnorm(0,1)
S[ns+1,k]~dnorm(0,1)
for (j in 1:totalo1)
alpha[ns+1,j]<-aalpha[ns+1,j]
pm_va_mu[j]~dunif(0,100)
for (j in totalo1+1:totalo)
alpha[ns+1,j]<-max(0,aalpha[ns+1,j])
pm_va_mu[j]~dunif(0,0.0001)
for (j in 1:totalo)
G[ns+1,j]~dnorm(0,1)
amu[ns+1,j]<-a[j]+signr_b[j]*zi*sqrt(abs(rho_b[j]))*Q[ns+1]
aprec[ns+1,j]<-pow(zi,-2)/(1-abs(rho_b[j]))
aalpha[ns+1,j]~dnorm(amu[ns+1,j],aprec[ns+1,j])
for (k in 1:nt)
taud[ns+1,k,j]<-tau/(1-abs(rho_b[j])-0.5+0.5*abs(rho_b[j]))
adelta[ns+1,k,j] ~ dnorm(H[ns+1,k,j],taud[ns+1,k,j])
delta[ns+1,k,j]<-step(j-totalo1-0.5)*(1-ze[j,k])*d[k,j]+step(totalo1+0.5-j)*(1-ze[j,k])*adelta[ns+1,k,j] # select appropriate treatment effect parameter for this study arm and outcome
H[ns+1,k,j] <- d[k,j] + signr_b[j]*(sqrt(abs(rho_b[j])*0.5)*E[ns+1]+signr_b[j]*sqrt(abs(rho_b[j])-abs(rho_b[j])*0.5)*F[ns+1,k] + signr_b[j]*sqrt(0.5-abs(rho_b[j])*0.5)*G[ns+1,j])*pow(tau,-0.5)
pm_prec[ns+1,k,j]<-pow(pm_va_mu[j]*ff[j],-1)/(1-abs(rho_w[j]))
pm_amu[ns+1,k,j]<-step(j-totalo1-0.5)*min(1,max(0,a[j]+cut(delta[ns+1,k,j])))+step(totalo1+0.5-j)*(alpha[ns+1,j]+cut(delta[ns+1,k,j]))
pm_mu[ns+1,k,j]<-pm_amu[ns+1,k,j]+signr_w[j]*sqrt(abs(rho_w[j]))*sqrt(pm_va_mu[j])*S[ns+1,k]
apred_y[k,j]~dnorm(pm_mu[ns+1,k,j],pm_prec[ns+1,k,j])
pred_y[k,j]<-step(totalo1+0.5-j)*apred_y[k,j]+step(j-totalo1-0.5)*max(0,apred_y[k,j])
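The predictive block above induces correlation between outcomes and arms by mixing shared standard normals (Q, E, F, G, S) with weights of the form sqrt(|rho|) and scaling the residual variance by (1 − |rho|). A stdlib-only Python sketch of this device (not the thesis code; rho and the sample size are illustrative):

```python
import math
import random

# Two quantities that share the same latent normal q, each with weight
# sqrt(rho) on q and residual variance (1 - rho), have unit variance
# and correlation rho -- the construction used for amu/H/pm_mu above.
random.seed(1)
rho, n = 0.6, 50_000
x1, x2 = [], []
for _ in range(n):
    q = random.gauss(0, 1)  # shared latent normal
    x1.append(math.sqrt(rho) * q + math.sqrt(1 - rho) * random.gauss(0, 1))
    x2.append(math.sqrt(rho) * q + math.sqrt(1 - rho) * random.gauss(0, 1))

mx1, mx2 = sum(x1) / n, sum(x2) / n
cov = sum((a - mx1) * (b - mx2) for a, b in zip(x1, x2)) / n
sd1 = math.sqrt(sum((a - mx1) ** 2 for a in x1) / n)
sd2 = math.sqrt(sum((b - mx2) ** 2 for b in x2) / n)
corr = cov / (sd1 * sd2)
print(corr)  # close to rho = 0.6
```

Each of x1 and x2 has variance rho + (1 − rho) = 1 and covariance rho, so the empirical correlation converges to rho.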
### TRANSFORMATIONS (NOTE - HARD CODED TO OUTCOME TYPES FROM RRMS CASE STUDY)
for (k in 1:nt)
trad[k,1]<-exp(absd[k,1])
trad_pred_study[k,1]<-exp(pm_amu[ns+1,k,1])
trad_pred_y[k,1]<-exp(pred_y[k,1])
for (j in 2:7) trad[k,j]<-exp(absd[k,j])/(1+exp(absd[k,j]))
trad_pred_study[k,j]<-exp(pm_amu[ns+1,k,j])/(1+exp(pm_amu[ns+1,k,j]))
trad_pred_y[k,j]<-exp(pred_y[k,j])/(1+exp(pred_y[k,j]))
for (j in 8:10) trad[k,j] <- absd[k,j]
trad_pred_study[k,j]<-pm_amu[ns+1,k,j]
trad_pred_y[k,j]<-pred_y[k,j]
for (j in 11:12) trad[k,j]<-d[k,j]
trad_pred_study[k,j]<-d[k,j]
trad_pred_y[k,j]<-d[k,j]
# END
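The transformation block above back-transforms outcome 1 from the log scale, outcomes 2-7 from the log-odds scale, and leaves the remaining outcomes on their original scale. A minimal Python sketch of the same mapping (the outcome indexing follows the hard-coded RRMS case study):

```python
import math

# Back-transformations matching the trad[] block above: outcome 1 is a
# log rate (exp), outcomes 2-7 are log odds (inverse logit), and
# outcomes 8-12 are already on their natural scale.
def back_transform(outcome_index, value):
    if outcome_index == 1:
        return math.exp(value)                       # log rate -> rate
    if 2 <= outcome_index <= 7:
        return math.exp(value) / (1 + math.exp(value))  # log odds -> probability
    return value                                     # identity

print(back_transform(1, 0.0))  # 1.0
print(back_transform(2, 0.0))  # 0.5
```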
RRMS case study data
This section sets out the data in BUGS format (see Appendix A for details of the original source data).
Each version of the model requires a set of parameters specified in list format and a rectangular
array of trial data. Additionally, Models 4a and 4b require a second rectangular array to indicate
which treatment effects are assumed to equal zero.
The table below shows the list data for each version of the model.
Parameter values (list format) for Model 0
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0,0,0),ns=16,totalo1=1,totalo2=6,totalo=10,maxarms=4,nt=9,impact=c(-1,1,-1,-1,-1,-1,-1,-1,-1,-1))
Parameter values (list format) for Models 1, 1*, 2, 3 (7 outcomes only)
Model 3 (one mapping group) uses the following data in list format:
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6),rho_w=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6),ns=16,totalo=7,maxarms=4,nt=9,impact=c(-1,1,-1,-1,-1,-1,-1),sign=c(-1,1,-1,-1,1,1,1),ogbase=c(1,8),ng=1)
For Models 1, 1* and 2, remove the red and green data.
For Model 3 with two mapping groups, replace the data in green with ogbase=c(1,5,8),ng=2
For Model 3 with three mapping groups, replace the data in green with ogbase=c(1,3,5,8),ng=3
Parameter values (list format) for Models 4a, 4b
Model 4a (one mapping group) uses the following data in list format:
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0,0,0),rho_w=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6),totalo=10,totalo1=7,maxarms=4,nt=9,impact=c(-1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1),sign=c(-1,1,-1,-1,1,1,1,1,1,1,1,1),ogbase=c(1,8,9,10,11,12,13),ng=6,ns=16,ns2=6,sw2=c(2,3,6,7,8,14))
For Model 4b, remove the blue data.
For two mapping groups, replace the data in green with ogbase=c(1,5,8,9,10,11,12,13),ng=7
For three mapping groups, replace the data in green with ogbase=c(1,3,5,8,9,10,11,12,13),ng=8
The table below shows the additional “zeroes” data for Models 4a and 4b. Columns correspond to
treatments and rows to outcomes; a value of 1 indicates that the corresponding treatment effect
will be fixed at zero.
Table of assumed zeroes for Models 4a,4b
ze[,1] ze[,2] ze[,3] ze[,4] ze[,5] ze[,6] ze[,7] ze[,8] ze[,9]
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 1 0 1 1 1 0 0
1 1 0 1 1 1 1 1 1
1 1 0 1 1 1 1 1 1
END
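The zeroes matrix acts through the model code's delta assignment, where the treatment effect is multiplied by (1 − ze[j,k]), so a 1 in the table pins that treatment/outcome effect to zero. A small sketch with hypothetical effect values:

```python
# d[k][j]: effect of treatment k on outcome j (hypothetical numbers).
# ze[j][k] = 1 fixes that effect to zero, exactly as
# delta <- (1 - ze[j,k]) * d[k,j] does in the model code.
d = [[0.4, 0.2],
     [0.1, 0.3]]
ze = [[1, 0],   # outcome 1: treatment 1 (reference) fixed to zero
      [1, 0]]   # outcome 2: likewise
delta = [[(1 - ze[j][k]) * d[k][j] for j in range(2)] for k in range(2)]
print(delta)  # [[0.0, 0.0], [0.1, 0.3]]
```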
The table below shows the rectangular arrays of trial data that are required for each version of the
model.
Trial data (rectangular format) for Model 0
na[] no[] no1[] no2[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] o[,8]
o[,9] n[,1] n[,2] n[,3] t[,1] t[,2] t[,3] y[,1,1] y[,1,2]
y[,1,3] y[,1,4] y[,1,5] y[,1,6] y[,1,7] y[,1,8]
y[,1,9] y[,2,1] y[,2,2] y[,2,3] y[,2,4] y[,2,5]
y[,2,6] y[,2,7] y[,2,8] y[,2,9] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] y[,3,8]
y[,3,9] va[,1,1] va[,1,2] va[,1,3] va[,1,4] va[,1,5]
va[,1,6] va[,1,7] va[,1,8] va[,1,9] va[,2,1] va[,2,2]
va[,2,3] va[,2,4] va[,2,5] va[,2,6] va[,2,7] va[,2,8]
va[,2,9] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7] va[,3,8] va[,3,9]
3 7 1 6 1 2 3 4 5 6 7 NA
NA 450 447 434 1 5 8 204 275 60 46
84 10 7 NA NA 154 308 47 35 131 11 5
NA NA 161 286 42 28 127 16 4 NA NA
600 450 450 450 415 415 415 NA NA 596 447
447 447 413 413 413 NA NA 578 434 434 434
384 384 384 NA NA
3 8 1 6 1 2 3 4 5 6 7 8
NA 363 359 350 1 2 4 193 214 62 45
149 23 13 0 NA 105 255 47 28 167 20 7
4 NA 135 238 56 38 129 24 10 0 NA
484 363 363 363 362 362 363 363 NA 478 359
359 359 355 355 355 359 NA 466 350 350 350
346 346 346 351 NA
2 7 1 5 1 2 3 4 5 6 8 NA
NA 556 550 NA 1 8 NA 289 290 87 78
99 8 1 NA NA 219 346 61 54 175 24 8
NA NA NA NA NA NA NA NA NA NA NA
741.3333333 556 556 556 515 515 556 NA NA 733
550 550 550 504 504 550 NA NA NA NA NA
NA NA NA NA NA NA
2 1 1 0 1 NA NA NA NA NA NA NA
NA 39 36 NA 4 7 NA 17 NA NA NA
NA NA NA NA NA 17 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 52 NA NA NA NA NA NA NA NA 48
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 4 1 3 1 2 3 5 NA NA NA NA
NA 448 897 NA 4 7 NA 203 327 90 16
NA NA NA NA NA 430 655 188 99 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 597.3333333 448 448 445 NA NA NA NA NA
1196 897 897 888 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 5 1 3 1 2 3 6 8 NA NA NA
NA 408 409 NA 1 2 NA 195 220 110 12 0
NA NA NA NA 92 299 65 25 4 NA NA
NA NA NA NA NA NA NA NA NA NA NA
544 408 408 408 408 NA NA NA NA 545 410
409 410 410 NA NA NA NA NA NA NA NA
NA NA NA NA NA
2 8 1 5 1 2 3 4 6 7 9 10
NA 418 425 NA 1 3 NA 222 191 101 79 7
4 1 0 NA 101 299 75 53 36 8 4 0
NA NA NA NA NA NA NA NA NA NA
557.3333333 418 418 418 418 418 418 418 NA 566
425 425 425 425 425 425 425 NA NA NA NA
NA NA NA NA NA NA
2 9 1 6 1 2 3 4 5 6 7 9
10 355 358 NA 1 3 NA 189 187 103 63
18 12 4 1 0 100 256 91 49 62 33 8
0 1 NA NA NA NA NA NA NA NA NA
473.3333333 355 355 355 355 355 355 355 355 477
358 358 358 358 358 358 358 358 NA NA NA
NA NA NA NA NA NA
2 3 1 2 1 2 4 NA NA NA NA NA
NA 94 88 NA 7 5 NA 62 49 13 NA
NA NA NA NA NA 81 33 28 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 125.3333333 96 96 NA NA NA NA NA NA
117 92 92 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 3 1 2 1 2 3 NA NA NA NA NA
NA 126 125 NA 1 4 NA 141 34 31 NA
NA NA NA NA NA 97 42 27 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 168 126 126 NA NA NA NA NA NA 166
125 125 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 1 2 1 2 4 NA NA NA NA NA
NA 143 158 NA 1 5 NA 156 23 50 NA
NA NA NA NA NA 140 32 35 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 190.6666667 87 143 NA NA NA NA NA NA
210 85 158 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 4 1 3 1 2 3 5 NA NA NA NA
NA 187 189 NA 1 6 NA 319 30 71 2
NA NA NA NA NA 435 59 51 10 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 249.3333333 187 187 187 NA NA NA NA NA
252 184 189 184 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 4 1 3 1 2 4 5 NA NA NA NA
NA 381 375 NA 6 4 NA 150 239 45 21
NA NA NA NA NA 147 234 33 5 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 500 386 386 381 NA NA NA NA NA 508
378 378 375 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 6 1 4 1 2 3 5 6 8 NA NA
NA 363 358 NA 1 9 NA 261 166 99 129
24 1 NA NA NA 176 202 72 205 24 8
NA NA NA NA NA NA NA NA NA NA NA
NA 484 363 363 360 360 360 NA NA NA 477
358 358 358 358 358 NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 1 2 1 2 3 NA NA NA NA NA
NA 23 25 NA 1 4 NA 41 6 11 NA
NA NA NA NA NA 9 14 5 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 30.66666667 23 23 NA NA NA NA NA NA
33 25 25 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 2 1 1 1 2 NA NA NA NA NA NA
NA 112 115 NA 1 7 NA 189 18 NA NA
NA NA NA NA NA 128 36 NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 149.3333333 112 NA NA NA NA NA NA NA
153 115 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
END
Trial data (rectangular format) for Models 1, 2, 3
na[] no[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] n[,1] n[,2] n[,3]
t[,1] t[,2] t[,3] y[,1,1] y[,1,2] y[,1,3] y[,1,4]
y[,1,5] y[,1,6] y[,1,7] y[,2,1] y[,2,2] y[,2,3]
y[,2,4] y[,2,5] y[,2,6] y[,2,7] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] va[,1,1]
va[,1,2] va[,1,3] va[,1,4] va[,1,5] va[,1,6] va[,1,7]
va[,2,1] va[,2,2] va[,2,3] va[,2,4] va[,2,5] va[,2,6]
va[,2,7] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7]
3 7 1 2 3 4 5 6 7 450 447 434 1
5 8 -1.078809661 0.451985124 -1.871802177 -2.172773481 -
1.371301577 -3.701301974 -4.065357025 -1.347073648 0.79562585 -2.141316945 -
2.465675288 -0.766709748 -3.598556816 -4.401829262 -1.272965676 0.658779537 -
2.233592222 -2.674148649 -0.704888998 -3.135494216 -4.553876892 3.521753493
4.207792208 8.653846154 10.89647008 6.194252626 42.52469136
60.30287115 2.655451786 4.667126039 10.6281383 13.85638003
4.617210763 38.57281773 83.6122549 5.020610262 4.44991495
11.44047619 16.56896552 4.517785471 25.04347826 97.01052632
3 7 1 2 3 4 5 6 7 363 359 350 1
2 4 -0.916290732 0.362029709 -1.57997588 -1.955388893 -
0.35734586 -2.690505891 -3.292983797 -1.514127733 0.896872646 -1.892855586 -
2.469913865 -0.11844815 -2.818398258 -3.906292331 -1.237874356 0.753771802 -
1.658228077 -2.105417028 -0.520084949 -2.596497715 -3.514526067 3.691612479
4.132503293 7.060818776 9.208176101 4.129060718 16.80697704
28.96021978 4.560769534 4.859766214 8.788938898 13.90602072
4.014046375 18.80970149 51.73440066 4.015061307 4.595588235
7.44047619 10.33232119 4.276640589 15.49120083 35.6297619
2 6 1 2 3 4 5 6 NA 556 550 NA 1
8 NA -0.94160854 0.086384614 -1.68469465 -1.812901906 -
1.43556541 -4.149069462 NA -1.203972804 0.528318781 -2.081488625 -
2.21759188 -0.631271777 -2.995732274 NA NA NA NA NA NA
NA NA 3.302978061 4.007466943 7.576305664 8.291385045
6.440000971 65.39077909 NA 2.451712012 4.285673807 10.14113782
11.29405615 4.411914894 22.05 NA NA NA NA NA NA
NA NA
2 1 1 NA NA NA NA NA NA 39 36 NA 4
7 NA -1.108662625 NA NA NA NA NA NA -
0.994252273 NA NA NA NA NA NA NA NA NA NA
NA NA NA 3.868686582 NA NA NA NA NA NA
3.443987948 NA NA NA NA NA NA NA NA NA
NA NA NA NA
2 4 1 2 3 5 NA NA NA 448 897 NA 4
7 NA -1.078809661 0.994169625 -1.380723316 -3.288868197 NA
NA NA -1.021651248 0.995697509 -1.327413564 -2.075646471 NA
NA NA NA NA NA NA NA NA NA 3.574478436
5.07250992 6.229174426 28.84979604 NA NA NA 3.368066624
5.076077219 6.036438796 10.09517225 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 3 6 NA NA NA 408 409 NA 1
2 NA -1.021651248 0.157185584 -0.996613121 -3.47609869 NA
NA NA -1.771956842 0.990913372 -1.666254387 -2.751535313 NA
NA NA NA NA NA NA NA NA NA 3.894646152
4.024758221 5.078218426 34.36426117 NA NA NA 4.3758069
5.064931152 7.481261181 17.73049645 NA NA NA NA NA
NA NA NA NA NA
2 6 1 2 3 4 6 7 NA 418 425 NA 1
3 NA -0.916290732 -0.172676589 -1.143781257 -1.456552255 -
4.072683065 -4.639571613 NA -1.714798428 0.864161666 -1.540445041 -
1.948601941 -2.380060405 -3.95364468 NA NA NA NA NA NA
NA NA 2.85182696 4.029891367 5.457225849 6.524177589
60.73131734 105.5096618 NA 4.056923076 4.794420555 6.880952381
9.161341043 12.89810054 54.14418465 NA NA NA NA NA
NA NA NA
2 7 1 2 3 4 5 6 7 355 358 NA 1
3 NA -0.916290732 0.107144637 -0.894700099 -1.533619076 -
2.929711172 -3.352823797 -4.474491862 -1.560647748 0.920204631 -1.076389152 -
1.841520979 -1.563225069 -2.287317621 -3.778491613 NA NA NA NA
NA NA NA 2.747215428 4.01149096 4.8553321 6.85067406
20.77563469 30.61831876 89.76139601 3.465179 4.908241422
5.274889904 8.464698501 6.983653008 11.95002331 45.77285714 NA
NA NA NA NA NA NA
2 3 1 2 4 NA NA NA NA 94 88 NA 7
5 NA -0.693147181 0.041672696 -1.85389125 NA NA NA
NA -0.356674944 -0.581029882 -0.826678573 NA NA NA NA
NA NA NA NA NA NA NA 2.454896121 4.001736865
8.541241891 NA NA NA NA 1.747514885 4.347200822
4.723214286 NA NA NA NA NA NA NA NA NA
NA NA
2 3 1 2 3 NA NA NA NA 126 125 NA 1
4 NA -0.174353387 -0.995428052 -1.119889687 NA NA NA
NA -0.527632742 -0.68117099 -1.289130613 NA NA NA NA
NA NA NA NA NA NA NA 1.449615722 5.07544757
5.390831919 NA NA NA NA 2.069581078 4.482214573
5.905139834 NA NA NA NA NA NA NA NA NA
NA NA
2 3 1 2 4 NA NA NA NA 143 158 NA 1
5 NA -0.198450939 -1.023388867 -0.620576488 NA NA NA
NA -0.400477567 -0.504556011 -1.256836294 NA NA NA NA
NA NA NA NA NA NA NA 1.484054383 5.141983696
4.397634409 NA NA NA NA 1.817426485 4.260023585
5.79883856 NA NA NA NA NA NA NA NA NA
NA NA
2 4 1 2 3 5 NA NA NA 187 189 NA 1
6 NA 0.246860078 -1.655048424 -0.490910314 -4.527208645 NA
NA NA 0.548121409 -0.750776293 -0.995428052 -2.856470206 NA
NA NA NA NA NA NA NA NA NA 0.947953851
7.424416136 4.245871782 94.51081081 NA NA NA 0.700851385
4.590644068 5.07544757 19.45747126 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 4 5 NA NA NA 381 375 NA 6
4 NA -1.203972804 0.486030965 -2.025219988 -2.841581594 NA
NA NA -1.237874356 0.485507816 -2.347036856 -4.304065093 NA
NA NA NA NA NA NA NA NA NA 4.122317025
4.240913102 9.709742587 19.20119048 NA NA NA 4.131658462
4.240384615 12.55019763 76.01351351 NA NA NA NA NA
NA NA NA NA NA
2 5 1 2 3 5 6 NA NA 363 358 NA 1
9 NA -0.616186139 -0.17121594 -0.980829253 -0.582605306 -
2.63905733 NA NA -0.994252273 0.25841169 -1.379325692 0.292572058 -
2.633087163 NA NA NA NA NA NA NA NA NA
1.81239387 4.029386582 5.041666667 4.349139233 16.07142857 NA
NA 2.857256318 4.067149023 6.223970474 4.086210744 15.98852295
NA NA NA NA NA NA NA NA NA
2 3 1 2 3 NA NA NA NA 23 25 NA 1
4 NA 0.300104592 -1.041453875 -0.087011377 NA NA NA
NA -1.203972804 0.241162057 -1.386294361 NA NA NA NA
NA NA NA NA NA NA NA 0.916028512 5.18627451
4.007575758 NA NA NA NA 4.433022784 4.058441558 6.25
NA NA NA NA NA NA NA NA NA NA NA
2 2 1 2 NA NA NA NA NA 112 115 NA 1
7 NA 0.2390169 -1.652923024 NA NA NA NA NA -
0.174353387 -0.785928914 NA NA NA NA NA NA NA NA
NA NA NA NA 0.957245599 7.413711584 NA NA NA
NA NA 1.450496939 4.650140647 NA NA NA NA NA
NA NA NA NA NA NA NA
END
Trial data (rectangular format) for Model 1*
na[] no[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] n[,1] n[,2] n[,3]
t[,1] t[,2] t[,3] y[,2,1] y[,2,2] y[,2,3] y[,2,4]
y[,2,5] y[,2,6] y[,2,7] y[,3,1] y[,3,2] y[,3,3]
y[,3,4] y[,3,5] y[,3,6] y[,3,7] se[,2,1] se[,2,2]
se[,2,3] se[,2,4] se[,2,5] se[,2,6] se[,2,7] se[,3,1]
se[,3,2] se[,3,3] se[,3,4] se[,3,5] se[,3,6] se[,3,7]
3 7 1 2 3 4 5 6 7 450 447 434 1
5 8 -0.268263987 0.343640726 -0.269514768 -0.292901806
0.604591829 0.102745158 -0.336472237 -0.194156014 0.206794413 -
0.361790045 -0.501375168 0.666412578 0.565807758 -0.488519866 0.117331696
0.140682789 0.207382171 0.234974448 0.155223505 0.425196391
0.566620159 0.139263582 0.140013962 0.213521225 0.249783342
0.155481992 0.390132261 0.597940581
3 7 1 2 3 4 5 6 7 363 359 350 1
2 4 -0.597837001 0.534842937 -0.312879706 -0.514524972
0.238897709 -0.127892367 -0.613308534 -0.321583624 0.391742093 -
0.078252197 -0.150028135 -0.162739089 0.094008176 -0.22154227 0.151240949
0.157864688 0.20960204 0.25318434 0.150186586 0.314157483
0.473167185 0.147109942 0.156571258 0.201766684 0.234281442
0.153602721 0.300933192 0.426121508
2 6 1 2 3 4 5 6 NA 556 550 NA 1
8 NA -0.262364264 0.441934167 -0.396793976 -0.404689974
0.804293633 1.153337188 NA NA NA NA NA NA NA
NA 0.101971889 0.122473706 0.179066695 0.188274296 0.14001571
0.397114875 NA NA NA NA NA NA NA NA
2 1 1 NA NA NA NA NA NA 39 36 NA 4
7 NA 0.114410351 NA NA NA NA NA NA NA
NA NA NA NA NA NA 0.441433374 NA NA NA
NA NA NA NA NA NA NA NA NA NA
2 4 1 2 3 5 NA NA NA 448 897 NA 4
7 NA 0.057158414 0.001527884 0.053309752 1.213221726 NA
NA NA NA NA NA NA NA NA NA 0.108321553
0.130313145 0.143645374 0.275047703 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 3 6 NA NA NA 408 409 NA 1
2 NA -0.750305594 0.833727789 -0.669641267 0.724563377 NA
NA NA NA NA NA NA NA NA NA 0.142283153
0.149158634 0.175323147 0.357179195 NA NA NA NA NA
NA NA NA NA NA
2 6 1 2 3 4 6 7 NA 418 425 NA 1
3 NA -0.798507696 1.036838256 -0.396663784 -0.492049686
1.69262266 0.685926933 NA NA NA NA NA NA NA
NA 0.127938477 0.14464397 0.171014737 0.192780126 0.419092716
0.616290143 NA NA NA NA NA NA NA NA
2 7 1 2 3 4 5 6 7 355 358 NA 1
3 NA -0.644357016 0.813059994 -0.181689053 -0.307901903
1.366486103 1.065506177 0.69600025 NA NA NA NA NA
NA NA 0.131976914 0.158145965 0.168556582 0.207224715
0.279339086 0.345873877 0.617013894 NA NA NA NA NA
NA NA
2 3 1 2 4 NA NA NA NA 94 88 NA 7
5 NA 0.336472237 -0.622702579 1.027212677 NA NA NA
NA NA NA NA NA NA NA NA 0.214415578
0.303268327 0.380180437 NA NA NA NA NA NA NA
NA NA NA NA
2 3 1 2 3 NA NA NA NA 126 125 NA 1
4 NA -0.353279355 0.314257063 -0.169240926 NA NA NA
NA NA NA NA NA NA NA NA 0.167515776
0.275933047 0.300042495 NA NA NA NA NA NA NA
NA NA NA NA
2 3 1 2 4 NA NA NA NA 143 158 NA 1
5 NA -0.202026628 0.518832857 -0.636259806 NA NA NA
NA NA NA NA NA NA NA NA 0.147921269
0.250838798 0.25971946 NA NA NA NA NA NA NA
NA NA NA NA
2 4 1 2 3 5 NA NA NA 187 189 NA 1
6 NA 0.301261331 0.90427213 -0.504517738 1.670738438 NA
NA NA NA NA NA NA NA NA NA 0.093688208
0.252966168 0.222619444 0.779971146 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 4 5 NA NA NA 381 375 NA 6
4 NA -0.033901552 -0.000523149 -0.321816868 -1.462483499 NA
NA NA NA NA NA NA NA NA NA 0.147775118
0.14979552 0.242800499 0.50308998 NA NA NA NA NA
NA NA NA NA NA
2 5 1 2 3 5 6 NA NA 363 358 NA 1
9 NA -0.378066134 0.429627631 -0.398496439 0.875177364
0.005970167 NA NA NA NA NA NA NA NA NA
0.113903395 0.14986991 0.17684536 0.152954556 0.298219024 NA
NA NA NA NA NA NA NA NA
2 3 1 2 3 NA NA NA NA 23 25 NA 1
4 NA -1.504077397 1.282615932 -1.299282984 NA NA NA
NA NA NA NA NA NA NA NA 0.465991672
0.622758266 0.651338947 NA NA NA NA NA NA NA
NA NA NA NA
2 2 1 2 NA NA NA NA NA 112 115 NA 1
7 NA -0.413370288 0.86699411 NA NA NA NA NA
NA NA NA NA NA NA NA 0.145464266 0.326542278
NA NA NA NA NA NA NA NA NA NA NA
NA
END
Trial data (rectangular format) for Model 4a
na[] no[] no1[] no2[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] o[,8]
o[,9] n[,1] n[,2] n[,3] t[,1] t[,2] t[,3] y[,1,1] y[,1,2]
y[,1,3] y[,1,4] y[,1,5] y[,1,6] y[,1,7] y[,1,8]
y[,1,9] y[,2,1] y[,2,2] y[,2,3] y[,2,4] y[,2,5]
y[,2,6] y[,2,7] y[,2,8] y[,2,9] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] y[,3,8]
y[,3,9] va[,1,1] va[,1,2] va[,1,3] va[,1,4] va[,1,5]
va[,1,6] va[,1,7] va[,1,8] va[,1,9] va[,2,1] va[,2,2]
va[,2,3] va[,2,4] va[,2,5] va[,2,6] va[,2,7] va[,2,8]
va[,2,9] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7] va[,3,8] va[,3,9]
3 7 7 0 1 2 3 4 5 6 7 NA
NA 450 447 434 1 5 8 -1.078809661 0.451985124 -
1.871802177 -2.172773481 -1.371301577 -3.701301974 -4.065357025 NA NA -
1.347073648 0.79562585 -2.141316945 -2.465675288 -0.766709748 -3.598556816 -
4.401829262 NA NA -1.272965676 0.658779537 -2.233592222 -2.674148649 -
0.704888998 -3.135494216 -4.553876892 NA NA 3.521753493 4.207792208
8.653846154 10.89647008 6.194252626 42.52469136 60.30287115 NA
NA 2.655451786 4.667126039 10.6281383 13.85638003 4.617210763
38.57281773 83.6122549 NA NA 5.020610262 4.44991495
11.44047619 16.56896552 4.517785471 25.04347826 97.01052632 NA
NA
3 8 7 1 1 2 3 4 5 6 7 8
NA 363 359 350 1 2 4 -0.916290732 0.362029709 -
1.57997588 -1.955388893 -0.35734586 -2.690505891 -3.292983797 0 NA -
1.514127733 0.896872646 -1.892855586 -2.469913865 -0.11844815 -2.818398258 -
3.906292331 4 NA -1.237874356 0.753771802 -1.658228077 -2.105417028 -
0.520084949 -2.596497715 -3.514526067 0 NA 3.691612479 4.132503293
7.060818776 9.208176101 4.129060718 16.80697704 28.96021978 363
NA 4.560769534 4.859766214 8.788938898 13.90602072 4.014046375
18.80970149 51.73440066 359 NA 4.015061307 4.595588235
7.44047619 10.33232119 4.276640589 15.49120083 35.6297619 351
NA
2 7 6 1 1 2 3 4 5 6 8 NA
NA 556 550 NA 1 8 NA -0.94160854 0.086384614 -
1.68469465 -1.812901906 -1.43556541 -4.149069462 1 NA NA -
1.203972804 0.528318781 -2.081488625 -2.21759188 -0.631271777 -2.995732274 8
NA NA NA NA NA NA NA NA NA NA NA
3.302978061 4.007466943 7.576305664 8.291385045 6.440000971
65.39077909 556 NA NA 2.451712012 4.285673807 10.14113782
11.29405615 4.411914894 22.05 550 NA NA NA NA NA
NA NA NA NA NA NA
2 1 1 0 1 NA NA NA NA NA NA NA
NA 39 36 NA 4 7 NA -1.108662625 NA NA
NA NA NA NA NA NA -0.994252273 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA 3.868686582 NA NA NA NA NA NA
NA NA 3.443987948 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 448 897 NA 4 7 NA -1.078809661 0.994169625 -
1.380723316 -3.288868197 NA NA NA NA NA -1.021651248
0.995697509 -1.327413564 -2.075646471 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 3.574478436
5.07250992 6.229174426 28.84979604 NA NA NA NA NA
3.368066624 5.076077219 6.036438796 10.09517225 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 5 4 1 1 2 3 6 8 NA NA NA
NA 408 409 NA 1 2 NA -1.021651248 0.157185584 -
0.996613121 -3.47609869 0 NA NA NA NA -1.771956842
0.990913372 -1.666254387 -2.751535313 4 NA NA NA NA
NA NA NA NA NA NA NA NA NA 3.894646152
4.024758221 5.078218426 34.36426117 408 NA NA NA NA
4.3758069 5.064931152 7.481261181 17.73049645 410 NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 8 6 2 1 2 3 4 6 7 9 10
NA 418 425 NA 1 3 NA -0.916290732 -0.172676589 -
1.143781257 -1.456552255 -4.072683065 -4.639571613 1 0 NA -
1.714798428 0.864161666 -1.540445041 -1.948601941 -2.380060405 -3.95364468 4
0 NA NA NA NA NA NA NA NA NA NA
2.85182696 4.029891367 5.457225849 6.524177589 60.73131734
105.5096618 418 418 NA 4.056923076 4.794420555 6.880952381
9.161341043 12.89810054 54.14418465 425 425 NA NA NA
NA NA NA NA NA NA NA
2 9 7 2 1 2 3 4 5 6 7 9
10 355 358 NA 1 3 NA -0.916290732 0.107144637 -
0.894700099 -1.533619076 -2.929711172 -3.352823797 -4.474491862 1 0 -
1.560647748 0.920204631 -1.076389152 -1.841520979 -1.563225069 -2.287317621 -
3.778491613 0 1 NA NA NA NA NA NA NA NA
NA 2.747215428 4.01149096 4.8553321 6.85067406 20.77563469
30.61831876 89.76139601 355 355 3.465179 4.908241422
5.274889904 8.464698501 6.983653008 11.95002331 45.77285714 358
358 NA NA NA NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 94 88 NA 7 5 NA -0.693147181 0.041672696 -
1.85389125 NA NA NA NA NA NA -0.356674944 -0.581029882 -
0.826678573 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 2.454896121 4.001736865 8.541241891
NA NA NA NA NA NA 1.747514885 4.347200822
4.723214286 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 126 125 NA 1 4 NA -0.174353387 -0.995428052 -
1.119889687 NA NA NA NA NA NA -0.527632742 -0.68117099 -
1.289130613 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.449615722 5.07544757 5.390831919
NA NA NA NA NA NA 2.069581078 4.482214573
5.905139834 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 143 158 NA 1 5 NA -0.198450939 -1.023388867 -
0.620576488 NA NA NA NA NA NA -0.400477567 -0.504556011 -
1.256836294 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.484054383 5.141983696 4.397634409
NA NA NA NA NA NA 1.817426485 4.260023585
5.79883856 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 187 189 NA 1 6 NA 0.246860078 -1.655048424 -
0.490910314 -4.527208645 NA NA NA NA NA 0.548121409 -
0.750776293 -0.995428052 -2.856470206 NA NA NA NA NA NA
NA NA NA NA NA NA NA NA 0.947953851
7.424416136 4.245871782 94.51081081 NA NA NA NA NA
0.700851385 4.590644068 5.07544757 19.45747126 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 4 5 NA NA NA NA
NA 381 375 NA 6 4 NA -1.203972804 0.486030965 -
2.025219988 -2.841581594 NA NA NA NA NA -1.237874356
0.485507816 -2.347036856 -4.304065093 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 4.122317025
4.240913102 9.709742587 19.20119048 NA NA NA NA NA
4.131658462 4.240384615 12.55019763 76.01351351 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 6 5 1 1 2 3 5 6 8 NA NA
NA 363 358 NA 1 9 NA -0.616186139 -0.17121594 -
0.980829253 -0.582605306 -2.63905733 1 NA NA NA -0.994252273
0.25841169 -1.379325692 0.292572058 -2.633087163 8 NA NA
NA NA NA NA NA NA NA NA NA NA
1.81239387 4.029386582 5.041666667 4.349139233 16.07142857 360
NA NA NA 2.857256318 4.067149023 6.223970474 4.086210744
15.98852295 358 NA NA NA NA NA NA NA NA
NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 23 25 NA 1 4 NA 0.300104592 -1.041453875 -
0.087011377 NA NA NA NA NA NA -1.203972804 0.241162057 -
1.386294361 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.916028512 5.18627451 4.007575758
NA NA NA NA NA NA 4.433022784 4.058441558 6.25
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA
2 2 2 0 1 2 NA NA NA NA NA NA
NA 112 115 NA 1 7 NA 0.2390169 -1.652923024
NA NA NA NA NA NA NA -0.174353387 -0.785928914
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.957245599 7.413711584 NA NA
NA NA NA NA NA 1.450496939 4.650140647 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA
END
Trial data (rectangular format) for Model 4b
Sample variance for outcomes 8-10 is set to (0.025 + p)(0.975 − p) × 100/N, as per II.6.1.5.
na[] no[] no1[] no2[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] o[,8]
o[,9] n[,1] n[,2] n[,3] t[,1] t[,2] t[,3] y[,1,1] y[,1,2]
y[,1,3] y[,1,4] y[,1,5] y[,1,6] y[,1,7] y[,1,8]
y[,1,9] y[,2,1] y[,2,2] y[,2,3] y[,2,4] y[,2,5]
y[,2,6] y[,2,7] y[,2,8] y[,2,9] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] y[,3,8]
y[,3,9] va[,1,1] va[,1,2] va[,1,3] va[,1,4] va[,1,5]
va[,1,6] va[,1,7] va[,1,8] va[,1,9] va[,2,1] va[,2,2]
va[,2,3] va[,2,4] va[,2,5] va[,2,6] va[,2,7] va[,2,8]
va[,2,9] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7] va[,3,8] va[,3,9]
3 7 7 0 1 2 3 4 5 6 7 NA
NA 450 447 434 1 5 8 -1.078809661 0.451985124 -
1.871802177 -2.172773481 -1.371301577 -3.701301974 -4.065357025 NA NA -
1.347073648 0.795625850 -2.141316945 -2.465675288 -0.766709748 -3.598556816 -
4.401829262 NA NA -1.272965676 0.658779537 -2.233592222 -2.674148649 -
0.704888998 -3.135494216 -4.553876892 NA NA 3.521753493 4.207792208
8.653846154 10.896470082 6.194252626 42.524691358 60.302871148 NA
NA 2.655451786 4.667126039 10.628138298 13.856380028 4.617210763
38.572817730 83.612254902 NA NA 5.020610262 4.449914950
11.440476190 16.568965517 4.517785471 25.043478261 97.010526316 NA
NA
3 8 7 1 1 2 3 4 5 6 7 8
NA 363 359 350 1 2 4 -0.916290732 0.362029709 -
1.579975880 -1.955388893 -0.357345860 -2.690505891 -3.292983797 0.000000000
NA -1.514127733 0.896872646 -1.892855586 -2.469913865 -0.118448150 -
2.818398258 -3.906292331 0.011142061 NA -1.237874356 0.753771802 -
1.658228077 -2.105417028 -0.520084949 -2.596497715 -3.514526067 0.000000000
NA 3.691612479 4.132503293 7.060818776 9.208176101 4.129060718
16.806977042 28.960219780 0.006714876 NA 4.560769534 4.859766214
8.788938898 13.906020716 4.014046375 18.809701493 51.734400657
0.009703569 NA 4.015061307 4.595588235 7.440476190 10.332321188
4.276640589 15.491200828 35.629761905 0.006944444 NA
2 7 6 1 1 2 3 4 5 6 8 NA
NA 556 550 NA 1 8 NA -0.941608540 0.086384614 -
1.684694650 -1.812901906 -1.435565410 -4.149069462 0.001798561 NA NA -
1.203972804 0.528318781 -2.081488625 -2.217591880 -0.631271777 -2.995732274
0.014545455 NA NA NA NA NA NA NA NA NA
NA NA 3.302978061 4.007466943 7.576305664 8.291385045
6.440000971 65.390779093 0.004690719 NA NA 2.451712012
4.285673807 10.141137819 11.294056153 4.411914894 22.050000000
0.006905748 NA NA NA NA NA NA NA NA NA
NA NA
2 1 1 0 1 NA NA NA NA NA NA NA
NA 39 36 NA 4 7 NA -1.108662625 NA NA
NA NA NA NA NA NA -0.994252273 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA 3.868686582 NA NA NA NA NA NA
NA NA 3.443987948 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 448 897 NA 4 7 NA -1.078809661 0.994169625 -
1.380723316 -3.288868197 NA NA NA NA NA -1.021651248
0.995697509 -1.327413564 -2.075646471 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 3.574478436
5.072509920 6.229174426 28.849796037 NA NA NA NA NA
3.368066624 5.076077219 6.036438796 10.095172255 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 5 4 1 1 2 3 6 8 NA NA NA
NA 408 409 NA 1 2 NA -1.021651248 0.157185584 -
0.996613121 -3.476098690 0.000000000 NA NA NA NA -1.771956842
0.990913372 -1.666254387 -2.751535313 0.009756098 NA NA NA
NA NA NA NA NA NA NA NA NA NA
3.894646152 4.024758221 5.078218426 34.364261168 0.005974265 NA
NA NA NA 4.375806900 5.064931152 7.481261181 17.730496454
0.008182466 NA NA NA NA NA NA NA NA NA
NA NA NA NA
2 8 6 2 1 2 3 4 6 7 9 10
NA 418 425 NA 1 3 NA -0.916290732 -0.172676589 -
1.143781257 -1.456552255 -4.072683065 -4.639571613 0.002392344 0.000000000
NA -1.714798428 0.864161666 -1.540445041 -1.948601941 -2.380060405 -
3.953644680 0.009411765 0.000000000 NA NA NA NA NA NA
NA NA NA NA 2.851826960 4.029891367 5.457225849
6.524177589 60.731317344 105.509661836 0.006373685 0.005831340 NA
4.056923076 4.794420555 6.880952381 9.161341043 12.898100543
54.144184652 0.007818258 0.005735294 NA NA NA NA NA
NA NA NA NA NA
2 9 7 2 1 2 3 4 5 6 7 9
10 355 358 NA 1 3 NA -0.916290732 0.107144637 -
0.894700099 -1.533619076 -2.929711172 -3.352823797 -4.474491862 0.002816901
0.000000000 -1.560647748 0.920204631 -1.076389152 -1.841520979 -
1.563225069 -2.287317621 -3.778491613 0.000000000 0.002793296 NA NA
NA NA NA NA NA NA NA 2.747215428 4.011490960
4.855332100 6.850674060 20.775634685 30.618318756 89.761396011
0.007617781 0.006866197 3.465179000 4.908241422 5.274889904
8.464698501 6.983653008 11.950023310 45.772857143 0.006808659
0.007547718 NA NA NA NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 94 88 NA 7 5 NA -0.693147181 0.041672696 -
1.853891250 NA NA NA NA NA NA -0.356674944 -0.581029882 -
0.826678573 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 2.454896121 4.001736865 8.541241891
NA NA NA NA NA NA 1.747514885 4.347200822
4.723214286 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 126 125 NA 1 4 NA -0.174353387 -0.995428052 -
1.119889687 NA NA NA NA NA NA -0.527632742 -0.681170990 -
1.289130613 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.449615722 5.075447570 5.390831919
NA NA NA NA NA NA 2.069581078 4.482214573
5.905139834 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 143 158 NA 1 5 NA -0.198450939 -1.023388867 -
0.620576488 NA NA NA NA NA NA -0.400477567 -0.504556011 -
1.256836294 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.484054383 5.141983696 4.397634409
NA NA NA NA NA NA 1.817426485 4.260023585
5.798838560 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 187 189 NA 1 6 NA 0.246860078 -1.655048424 -
0.490910314 -4.527208645 NA NA NA NA NA 0.548121409 -
0.750776293 -0.995428052 -2.856470206 NA NA NA NA NA NA
NA NA NA NA NA NA NA NA 0.947953851
7.424416136 4.245871782 94.510810811 NA NA NA NA NA
0.700851385 4.590644068 5.075447570 19.457471264 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 4 5 NA NA NA NA
NA 381 375 NA 6 4 NA -1.203972804 0.486030965 -
2.025219988 -2.841581594 NA NA NA NA NA -1.237874356
0.485507816 -2.347036856 -4.304065093 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 4.122317025
4.240913102 9.709742587 19.201190476 NA NA NA NA NA
4.131658462 4.240384615 12.550197628 76.013513514 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 6 5 1 1 2 3 5 6 8 NA NA
NA 363 358 NA 1 9 NA -0.616186139 -0.171215940 -
0.980829253 -0.582605306 -2.639057330 0.002777778 NA NA NA -
0.994252273 0.258411690 -1.379325692 0.292572058 -2.633087163 0.022346369
NA NA NA NA NA NA NA NA NA NA NA
NA 1.812393870 4.029386582 5.041666667 4.349139233 16.071428571
0.007501715 NA NA NA 2.857256318 4.067149023 6.223970474
4.086210744 15.988522954 0.012599075 NA NA NA NA NA
NA NA NA NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 23 25 NA 1 4 NA 0.300104592 -1.041453875 -
0.087011377 NA NA NA NA NA NA -1.203972804 0.241162057 -
1.386294361 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.916028512 5.186274510 4.007575758
NA NA NA NA NA NA 4.433022784 4.058441558
6.250000000 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 2 2 0 1 2 NA NA NA NA NA NA
NA 112 115 NA 1 7 NA 0.239016900 -1.652923024
NA NA NA NA NA NA NA -0.174353387 -0.785928914
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.957245599 7.413711584 NA NA
NA NA NA NA NA 1.450496939 4.650140647 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA
END
2 Relative ratings
Dictionary
The table below describes the key variables/parameters/constants.
Name            Description
g[]             Average preference strength (log utility coefficient) for criterion
eg[]            Average utility coefficient for criterion
nr              Number of ratings in the data
rastudyid[i]    Identifier of the study/dataset to which the rating in row i belongs
subj[i]         Identifier of the participant who provided the rating in row i
nrastud         Number of source studies/datasets
nsubj           Number of participants
nrc             Number of criteria
raresdev        Residual deviance of the model
ragamma[k,j]    Preference strength for participant or study k (in the random preference model)
rasig           Ratings standard deviation
prefresig       Standard deviation of random preference distribution
ahpalpha        Average preference strength for reference category of administration variable in PROTECT patient ratings (required due to use of AHP elicitation method)
pvfa[]          AHP-style partial priority for level of administration variable
pvfaa           AHP-style partial priority for reference level of administration variable
pvfb[]          Keeney-Raiffa style partial value for level of administration variable
With regard to the RRMS case study data, the PROTECT investigator ratings are study 1 and the PROTECT patient ratings are study 2. The criteria are numbered as follows:
1 Relapse
2 Disability progression
3 PML
4 Herpes reactivation
5 Liver enzyme elevation
6 Seizures
7 Congenital abnormalities
8 Infusion/injection reactions
9 Allergic/hypersensitivity reactions
10 Flu-like reactions
11 Daily oral vs daily subcutaneous
12 Monthly infusion vs daily subcutaneous
13 Weekly intramuscular vs daily subcutaneous
Ratings model code (1)
# This model has an intercept term ahpalpha to fix the utility scale in absolute terms,
# designed for AHP data relating to categorical criteria (e.g. the PROTECT patient
# ratings). Where there is no such data (e.g. the PROTECT investigator ratings), remove
# the red and green items of code.
# The model shown uses random preferences at the individual level. For the
# fixed-preference model, replace the blue lines of code with ragamma[i,j] <- prefmu[i,j]
# Note that some of the model outputs (pvfa, pvfaa, pvfb, weightb) are hard coded to the
# criteria indices in the RRMS dataset.
# This model can be hard coded to work with a dataset formed by concatenating the
# PROTECT investigator ratings (participants 1-3) and the PROTECT patient ratings
# (participants 4-39). To do this, replace the green code with +step(i-3.5)*ahpalpha
model {
    for (i in 1:nr) { # loop through ratings
        temp[i] <- subj[i] + nsubj + rastudyid[i] + nrastud # these variables are unused in the fixed preference model
        # Utility model
        for (j in 1:nrc) { pmu[j,i] <- log(ragamma[subj[i],j])*cr[i,j] } # each criterion's expected contribution to log rating i
        ramu[i] <- sum(pmu[1:nrc,i]) # expected value (mean) of log rating i
        logra[i] <- log(ra[i])
        logra[i] ~ dnorm(ramu[i],ratau) # likelihood of observed log rating i
        radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual deviance contribution
    }
    for (i in 1:nsubj) { # loop through participants
        for (j in 1:nrc) { # loop through criteria
            ragamma[i,j] ~ dnorm(prefmu[i,j],prefretau[i,j]) I(0,) # random preferences by participant
            prefmu[i,j] <- eg[j] + ahpalpha
            prefretau[i,j] <- pow(prefresig*prefmu[i,j],-2) # random preference precision (sd proportional to mean)
        }
    }
    ahpalpha ~ dgamma(1,0.1) # prior for AHP intercept term (utility of reference category)
    ratau <- pow(rasig,-2)
    rasig ~ dunif(0,10) # prior for ratings standard deviation
    for (j in 1:nrc) {
        eg[j] ~ dgamma(1,0.01) # prior for utility coefficients
        g[j] <- log(eg[j])
        pvfb[j] <- eg[j]/eg[11] # Keeney-Raiffa-style partial values for intermediate admin levels (note hard coding)
        pvfa[j] <- (eg[j]+ahpalpha)/(sum(eg[11:13])+4*ahpalpha) # AHP-style partial priorities for intermediate admin levels (note hard coding)
    }
    pvfaa <- ahpalpha/(sum(eg[11:13])+4*ahpalpha) # AHP-style partial priority for reference admin level (note hard coding)
    prefresig ~ dunif(0,10) # prior for random preference standard deviation
    # weights
    for (i in 1:nrc) {
        weight[i] <- eg[i]/sum(eg[1:nrc]) # preference weights, i.e. normalised utility coefficients
    }
    for (i in 1:nrc) {
        weightb[i] <- eg[i]/sum(eg[1:11]) # weights excluding intermediate admin levels from the total (note hard coding)
    }
    raresdev <- sum(radev[1:nr]) # total residual deviance of the ratings model
}
# END
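The multiplicative utility model behind the ratings likelihood can be checked outside BUGS. The sketch below is a minimal plain-Python illustration with made-up coefficients (not fitted values from the thesis): the expected log rating is the sum of log utility coefficients weighted by the design entries cr, so a rating that trades one criterion against another estimates the ratio of their coefficients.

```python
import math

def expected_log_rating(gamma, cr):
    """Expected log rating: sum_j log(gamma_j) * cr_j,
    mirroring the pmu/ramu calculation in the BUGS model above."""
    return sum(math.log(g) * c for g, c in zip(gamma, cr))

# Hypothetical utility coefficients for three criteria (illustrative only)
gamma = [2.0, 4.0, 0.5]

# A rating comparing criterion 1 against criterion 2: cr = (+1, -1, 0)
mu = expected_log_rating(gamma, [1, -1, 0])

# exp(mu) recovers the implied rating, i.e. the coefficient ratio 2.0/4.0
print(round(math.exp(mu), 3))  # 0.5
```

This is why the data rows above consist mainly of a single +1 and a single -1 per rating: each observed rating is modelled as a (noisy) ratio of two utility coefficients on the log scale.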
Ratings model code (2)
# This model is hard coded to a dataset formed by concatenating the PROTECT investigator
# ratings (participants 1-3) and the PROTECT patient ratings (participants 4-39).
# The model includes the intercept term ahpalpha for the patient ratings but not the
# investigator ratings (i.e. when the variable subj is at least 4).
# This model uses random preferences at the study level.
model {
    for (i in 1:nr) { # loop through ratings
        temp[i] <- subj[i] + nsubj # these variables are unused
        # Utility model
        for (j in 1:nrc) { pmu[j,i] <- log(ragamma[rastudyid[i],j])*cr[i,j] } # each criterion's expected contribution to log rating i
        ramu[i] <- sum(pmu[1:nrc,i]) # expected value (mean) of log rating i
        logra[i] <- log(ra[i])
        logra[i] ~ dnorm(ramu[i],ratau) # likelihood of observed log rating i
        radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual deviance contribution
    }
    for (j in 1:nrc) { # loop through criteria
        for (k in 1:nrastud) { # loop through studies
            ragamma[k,j] ~ dnorm(prefmu[k,j],prefretau[j]) I(0,)
            prefmu[k,j] <- eg[j] + (k-1)*ahpalpha # includes ahpalpha for study 2 but not study 1
        }
    }
    ahpalpha ~ dgamma(1,0.01) # prior for AHP intercept term (utility of reference category)
    ratau <- pow(rasig,-2)
    rasig ~ dunif(0,10) # prior for ratings standard deviation
    for (j in 1:nrc) {
        eg[j] ~ dgamma(1,0.01) # prior for utility coefficients
        g[j] <- log(eg[j])
        prefretau[j] <- pow(prefresig*eg[j],-2)
    }
    prefresig ~ dunif(0,10) # prior for random preference standard deviation
    # weights
    for (i in 1:nrc) {
        weight[i] <- eg[i]/sum(eg[1:nrc]) # preference weights, i.e. normalised utility coefficients
    }
    raresdev <- sum(radev[1:nr]) # total residual deviance of the ratings model
}
# END
Ratings data
PROTECT investigator ratings list(nr=243,nrc=13,nrastud=1,nsubj=3)
rastudyid[] subj[] ra[] cr[,1] cr[,2] cr[,3] cr[,4]
cr[,5] cr[,6] cr[,7] cr[,8] cr[,9] cr[,10]
cr[,11] cr[,12] cr[,13]
1 1 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 1 0.1 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 1 0.01 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 1 0.12 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 1 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 1 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 1 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 1 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 2 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 2 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 2 0.2 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 2 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 2 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 2 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 2 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 3 0.6 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 3 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 3 0.3 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 3 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 3 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 3 0.888888889 0 0 0 0 0 0 0 -1 1
0 0 0 0
1 3 1.111111111 0 0 0 0 0 0 0 -1 0
1 0 0 0
1 3 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 3 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
END
PROTECT patient ratings list(nr=243,nrc=13,nrastud=2,nsubj=36)
rastudyid[] subj[] ra[] cr[,1] cr[,2] cr[,3] cr[,4]
cr[,5] cr[,6] cr[,7] cr[,8] cr[,9] cr[,10]
cr[,11] cr[,12] cr[,13]
2 1 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 1 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 1 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 1 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 1 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 1 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 2 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 2 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 2 0.111111111 0 0 0 0 0 0 0 0 0
0 0 0 1
2 2 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 2 5 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 2 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 3 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 3 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 3 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 3 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 3 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 3 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 4 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 4 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 4 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 4 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 5 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 5 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 5 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 5 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 5 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 6 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 6 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 7 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 7 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 7 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 7 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 7 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 7 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 8 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 8 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 8 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 8 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 9 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 9 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 9 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 9 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 9 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 9 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 10 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 10 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 10 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 10 7 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 10 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 10 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 11 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 11 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 11 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 11 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 11 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 11 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 12 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 12 3 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 12 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 12 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 12 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 12 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 13 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 13 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 13 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 13 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 13 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 13 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 14 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 14 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 14 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 14 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 14 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 14 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 15 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 15 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 15 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 15 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 15 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 15 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 16 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 16 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 16 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 16 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 16 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 16 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 17 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 17 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 17 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 17 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 18 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 18 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 18 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 18 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 18 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 19 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 19 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 19 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 19 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 20 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 20 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 20 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 20 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 20 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 20 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 21 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 21 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 21 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 21 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 21 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 21 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 22 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 22 9 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 22 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 22 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 22 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 22 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 23 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 23 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 23 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 23 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 23 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 23 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 24 0.2 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 24 5 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 24 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 24 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 24 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 24 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 0 0 1
2 25 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 25 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 26 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 26 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 26 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 26 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 26 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 26 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 27 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 27 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 27 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 27 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 27 0.111111111 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 27 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 28 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 28 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 28 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 28 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 29 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 29 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 29 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 29 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 29 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 29 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 30 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 30 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 30 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 30 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 31 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 31 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 31 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 33 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 33 7 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 33 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 33 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 33 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 33 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 34 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 34 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 34 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 34 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 34 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 35 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 35 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 35 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 35 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 35 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 35 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 36 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 36 9 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 1 -1 0
END
Both ratings datasets list(nr=243,nrc=13,nrastud=2,nsubj=39)
rastudyid[] subj[] ra[] cr[,1] cr[,2] cr[,3] cr[,4] cr[,5] cr[,6] cr[,7] cr[,8]
cr[,9] cr[,10] cr[,11] cr[,12] cr[,13]
1 1 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 1 0.1 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 1 0.01 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 1 0.12 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 1 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 1 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 1 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 1 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 2 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 2 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 2 0.2 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 2 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 2 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 2 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 2 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 3 0.6 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 3 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 3 0.3 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 3 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 3 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 3 0.888888889 0 0 0 0 0 0 0 -1 1
0 0 0 0
1 3 1.111111111 0 0 0 0 0 0 0 -1 0
1 0 0 0
1 3 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 3 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
2 4 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 4 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 4 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 4 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 4 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 4 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 5 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 5 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 5 0.111111111 0 0 0 0 0 0 0 0 0
0 0 0 1
2 5 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 5 5 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 5 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 6 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 6 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 6 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 6 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 7 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 7 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 7 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 7 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 8 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 8 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 8 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 9 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 9 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 10 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 10 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 10 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 10 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 10 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 10 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 11 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 11 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 11 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 11 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 11 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 11 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 12 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 12 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 12 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 12 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 12 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 12 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 13 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 13 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 13 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 13 7 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 13 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 13 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 14 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 14 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 14 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 14 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 14 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 14 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 15 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 15 3 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 15 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 15 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 15 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 15 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 16 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 16 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 16 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 16 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 16 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 16 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 17 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 17 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 17 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 17 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 18 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 18 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 18 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 19 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 19 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 19 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 19 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 20 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 20 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 20 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 20 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 20 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 20 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 21 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 21 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 21 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 21 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 21 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 21 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 22 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 22 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 22 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 22 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 22 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 22 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 23 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 23 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 23 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 23 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 23 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 23 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 24 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 24 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 24 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 24 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 24 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 24 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 25 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 25 9 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 25 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 25 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 25 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 25 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 26 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 26 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 26 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 26 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 26 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 26 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 27 0.2 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 27 5 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 27 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 27 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 27 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 27 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 0 1
2 28 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 28 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 29 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 29 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 29 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 29 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 29 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 29 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 30 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 30 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 30 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 30 0.111111111 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 31 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 31 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 31 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 31 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 32 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 32 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 32 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 32 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 32 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 32 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 33 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 33 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 33 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 33 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 33 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 33 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 34 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 34 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 34 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 36 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 36 7 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 36 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 36 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 36 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 36 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 37 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 37 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 37 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 37 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 37 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 37 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 38 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 38 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 38 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 38 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 38 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 38 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 39 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 39 9 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 1 -1 0
END
3 Choices
Dictionary
The table below describes the key variables/parameters/constants that have not already been
encountered in the ratings models.
Name        Description
nchc        Number of criteria in choice experiment
nchs        Number of choice sets in dataset
V[i,k]      Non-random component of utility of option k in choice set i
pr[i]       Probability of choosing option 2 in choice set i
ch_N[i]     Number of participants choosing an option in choice set i
ch_n[i]     Number of participants choosing option 2 in choice set i
chresdev    Residual deviance in choice model
The criteria in the PROTECT patient choice dataset are numbered as follows:
1. Relapse
2. Disability progression
3. PML
4. Allergic/hypersensitivity reactions
5. Serious allergic reactions
6. Depression
Choice model code
# This model assumes fixed preferences at the individual level.
model {
    for (i in 1:nchs) { # loop through choice sets
        # difference in utility between options, with logistically distributed random component
        logit(pr[i]) <- V[i,2] - V[i,1] # pr is the probability of choosing the right-hand option, corresponding to y=2
        ch_n[i] ~ dbin(pr[i],ch_N[i]) # likelihood
        # residual deviance calcs
        ch_nhat[i] <- pr[i]*ch_N[i]
        chdev[i] <- 2 * (ch_n[i]*(log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
        for (k in 1:2) { # loop through choice options
            for (j in 1:nchc) { pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k] } # each criterion's contribution to utility
            V[i,k] <- sum(pg[i,1:nchc,k]) # total utility of option k
        }
    }
    for (j in 1:nchc) {
        chgamma[j] <- eg[j] # fixed preference model
        temp[j] <- chc[j] # unused variable
    }
    for (j in 1:nchc) {
        eg[j] ~ dgamma(1,0.01) # prior for utility coefficients
        g[j] <- log(eg[j])
    }
    # weights
    for (i in 1:nchc) {
        weight[i] <- eg[i]/sum(eg[1:nchc]) # normalised weights
    }
    chresdev <- sum(chdev[1:nchs]) # residual deviance for choice model
}
# END
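The choice likelihood above is a standard binary logit. As a sanity check, the following plain-Python sketch (hypothetical coefficients, not PROTECT estimates) computes the probability of choosing option 2 from the same quantities the BUGS model uses: each option's utility is the signed, coefficient-weighted sum of its criterion levels, and the choice probability is the inverse logit of the utility difference.

```python
import math

def choice_probability(eg, chsign, cr1, cr2):
    """Probability of choosing option 2 in a choice set:
    V_k = sum_j chsign_j * eg_j * cr_jk, pr = 1/(1+exp(-(V2-V1)))."""
    v1 = sum(s * g * x for s, g, x in zip(chsign, eg, cr1))
    v2 = sum(s * g * x for s, g, x in zip(chsign, eg, cr2))
    return 1.0 / (1.0 + math.exp(-(v2 - v1)))

# Two undesirable criteria (chsign = -1, as in the data list above);
# option 2 offers a lower level of criterion 1, so it should be preferred
pr = choice_probability([1.0, 2.0], [-1, -1], [2.0, 0.25], [1.5, 0.25])
print(pr > 0.5)  # True
```

When the two options have identical criterion levels, the utility difference is zero and the probability is exactly 0.5, which is a useful quick test of any implementation.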
PROTECT patient choices data
list(nchs=64,nchc=6,chc=c(1,2,3,8,9,10),chsign=c(-1,-1,-1,-1,-1,-1))
ch_N[] ch_n[] ch_cr[,1,1] ch_cr[,2,1] ch_cr[,3,1] ch_cr[,4,1]
ch_cr[,5,1] ch_cr[,6,1] ch_cr[,1,2] ch_cr[,2,2] ch_cr[,3,2]
ch_cr[,4,2] ch_cr[,5,2] ch_cr[,6,2]
25 21 2 0.25 0 0 0 0.1 1.5 0.1 0 0.5 0
0.2
25 12 2 0.1 0.003 0.5 0.02 0.2 1.5 0.25 0 0.5
0.02 0.1
24 2 2 0.1 0 0 0.02 0.1 2 0.25 0.003 0.5 0
0.1
25 25 1.5 0.25 0 0.5 0.02 0.2 2 0.1 0 0 0
0.2
25 18 1.5 0.25 0.003 0.5 0 0.2 1.5 0.25 0.003 0
0.02 0.1
24 8 2 0.1 0 0.5 0 0.1 1.5 0.1 0.003 0
0.02 0.1
25 1 1.5 0.1 0.003 0 0 0.2 1.5 0.25 0.003 0.5
0.02 0.1
25 6 2 0.1 0.003 0 0 0.2 2 0.25 0 0
0.02 0.2
27 21 2 0.25 0.003 0 0 0.2 2 0.25 0 0.5 0
0.1
27 2 1.5 0.1 0 0 0.02 0.2 2 0.25 0.003 0
0.02 0.1
27 15 2 0.25 0 0.5 0 0.2 1.5 0.25 0 0
0.02 0.2
27 20 1.5 0.1 0.003 0.5 0 0.2 1.5 0.1 0 0.5
0.02 0.1
27 9 2 0.1 0 0.5 0.02 0.2 1.5 0.1 0.003 0.5
0.02 0.1
27 15 1.5 0.25 0.003 0 0 0.1 2 0.1 0.003 0
0.02 0.2
27 19 2 0.25 0 0 0.02 0.1 1.5 0.1 0.003 0 0
0.1
27 4 1.5 0.1 0.003 0.5 0.02 0.2 2 0.25 0.003 0.5 0
0.2
32 23 2 0.25 0.003 0 0 0.2 2 0.25 0 0.5 0
0.1
32 2 2 0.25 0 0 0 0.2 2 0.25 0 0
0.02 0.1
32 13 2 0.25 0 0.5 0 0.2 1.5 0.25 0 0
0.02 0.2
31 20 1.5 0.1 0.003 0.5 0 0.2 1.5 0.1 0 0.5
0.02 0.1
31 13 2 0.1 0 0.5 0.02 0.2 1.5 0.1 0.003 0.5
0.02 0.1
32 16 1.5 0.25 0.003 0 0 0.1 2 0.1 0.003 0
0.02 0.2
30 24 2 0.25 0 0 0.02 0.1 1.5 0.1 0.003 0 0
0.1
31 6 1.5 0.1 0.003 0.5 0.02 0.2 2 0.25 0.003 0.5 0
0.2
30 20 2 0.25 0 0 0 0.1 1.5 0.1 0 0.5 0
0.2
30 18 2 0.1 0.003 0.5 0.02 0.2 1.5 0.25 0 0.5
0.02 0.1
30 2 2 0.1 0 0 0.02 0.1 2 0.25 0.003 0.5 0
0.1
29 28 1.5 0.25 0 0.5 0.02 0.2 2 0.1 0 0 0
0.2
29 19 1.5 0.25 0.003 0.5 0 0.2 1.5 0.25 0.003 0
0.02 0.1
30 10 2 0.1 0 0.5 0 0.1 1.5 0.1 0.003 0
0.02 0.1
30 4 1.5 0.1 0.003 0 0 0.2 1.5 0.25 0.003 0.5
0.02 0.1
30 9 2 0.1 0.003 0 0 0.2 2 0.25 0 0
0.02 0.2
23 20 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0.003 0.5 0
0.1
22 8 1.5 0.25 0 0.5 0 0.2 2 0.25 0 0.5
0.02 0.1
23 14 2 0.25 0 0.5 0.02 0.2 2 0.25 0.003 0 0
0.1
22 2 2 0.1 0.003 0 0.02 0.2 1.5 0.25 0.003 0.5
0.02 0.2
23 1 2 0.1 0 0.5 0 0.2 2 0.25 0.003 0.5
0.02 0.1
21 11 1.5 0.1 0.003 0 0.02 0.1 1.5 0.25 0 0 0
0.1
23 22 2 0.1 0.003 0.5 0 0.1 1.5 0.1 0 0 0
0.2
23 4 1.5 0.1 0 0.5 0 0.1 2 0.1 0 0
0.02 0.2
22 18 1.5 0.25 0 0 0.02 0.1 2 0.1 0 0.5
0.02 0.1
22 20 1.5 0.1 0.003 0.5 0.02 0.2 2 0.1 0.003 0 0
0.1
22 12 2 0.1 0.003 0.5 0 0.2 1.5 0.25 0 0.5 0
0.1
22 3 1.5 0.25 0 0 0 0.2 1.5 0.25 0.003 0.5 0
0.1
22 19 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0 0.5
0.02 0.2
22 3 2 0.1 0.003 0.5 0.02 0.1 2 0.25 0.003 0
0.02 0.2
22 0 2 0.1 0 0 0 0.1 1.5 0.25 0.003 0 0
0.2
22 17 2 0.25 0 0 0 0.2 1.5 0.1 0 0
0.02 0.1
30 23 1.5 0.25 0 0 0.02 0.1 2 0.1 0 0.5
0.02 0.1
30 29 1.5 0.1 0.003 0.5 0.02 0.2 2 0.1 0.003 0 0
0.1
30 9 2 0.1 0.003 0.5 0 0.2 1.5 0.25 0 0.5 0
0.1
30 8 1.5 0.25 0 0 0 0.2 1.5 0.25 0.003 0.5 0
0.1
30 27 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0 0.5
0.02 0.2
30 3 2 0.1 0.003 0.5 0.02 0.1 2 0.25 0.003 0
0.02 0.2
30 1 2 0.1 0 0 0 0.1 1.5 0.25 0.003 0 0
0.2
30 26 2 0.25 0 0 0 0.2 1.5 0.1 0 0
0.02 0.1
32 31 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0.003 0.5 0
0.1
32 12 1.5 0.25 0 0.5 0 0.2 2 0.25 0 0.5
0.02 0.1
32 19 2 0.25 0 0.5 0.02 0.2 2 0.25 0.003 0 0
0.1
32 1 2 0.1 0.003 0 0.02 0.2 1.5 0.25 0.003 0.5
0.02 0.2
32 2 2 0.1 0 0.5 0 0.2 2 0.25 0.003 0.5
0.02 0.1
32 11 1.5 0.1 0.003 0 0.02 0.1 1.5 0.25 0 0 0
0.1
32 26 2 0.1 0.003 0.5 0 0.1 1.5 0.1 0 0 0
0.2
32 3 1.5 0.1 0 0.5 0 0.1 2 0.1 0 0
0.02 0.2
END
4 Preference meta-analysis
Dictionary
The table below describes the key variables/parameters/constants that have not already been
explained.
Name             Description
nps              Number of preference elicitation studies in dataset
pma_n[i]         Number of participants in study i
nop[i]           Number of outcomes in study i
np[i,j]          Number of coefficients reported for criterion j in study i
up[i,j,k]        Coefficient k for criterion j in study i
upse[i,j,k]      Standard error of up[i,j,k]
x[i,j,k]         Criterion value to which up[i,j,k] relates
zeta[i]          Scaling coefficient for study i
minlev[i,j]      Lowest categorical level for which utility is to be estimated for criterion j in study i
maxlev[i,j]      Highest categorical level for which utility is to be estimated for criterion j in study i
lev[i,j,k]       Categorical level to be estimated based on up[i,j,k] (=1 for linear criteria)
levsign[i,j,k]   Takes value 1 if the categorical level to be estimated based on up[i,j,k] is more favourable than the reference category in study i (or if it is a linear criterion for which higher values are more favourable); takes value -1 otherwise
base[i,j]        Integer code for administration reference category in study i
offset[m]        Parameter to use for reference category adjustment represented by code m
pmagamma[i,j,k]  Study-specific utility coefficient for level k of criterion j in study i (random preference model)
pmaresdev        Residual deviance in preference meta-analysis model
The criteria and studies in the RRMS preference meta-analysis dataset are numbered as follows:
Criteria
1. Relapse
2. Disability progression
3. Daily oral vs daily subcutaneous
4. Monthly infusion vs daily subcutaneous
5. Weekly injection vs daily subcutaneous
Studies
1. ARROYO
2. MANSFIELD
3. POULOS
4. WILSON 2014
5. WILSON 2015
6. GARCIA-DOMINGUEZ
7. UTZ
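Before the model code, it may help to see how a reported coefficient is mapped onto the common utility scale. The sketch below is a plain-Python illustration with invented numbers, following the fixed-preference form quoted in the model comments: a study-specific coefficient is the absolute distance of eg from the reference-category offset, re-signed by levsign, and multiplied by the criterion value x and the study scaling coefficient zeta.

```python
def expected_coefficient(eg_level, offset_base, levsign, x, zeta):
    """Expected reported coefficient theta = levsign * |eg - offset| * x * zeta,
    the fixed-preference form of theta[i,j,k] in the meta-analysis model."""
    return levsign * abs(eg_level - offset_base) * x * zeta

# Illustrative values only: a harm criterion (levsign = -1) observed at
# x = 0.25, study scaling zeta = 1, reference-category offset 0.5
print(expected_coefficient(2.0, 0.5, -1, 0.25, 1.0))  # -0.375
```

The per-study zeta absorbs differences in how strongly each study's coefficients are scaled, so the underlying eg parameters can be pooled across heterogeneous elicitation formats.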
Preference meta-analysis model code
# This model uses random preferences at the study level; for the fixed-preference
# model, replace the code in blue with:
# for (k in minlev[i,j]:maxlev[i,j]) pmagamma[i,j,k]<-abs(eg[op[i,j]+k-1]-offset[base[i,j]+1])
# Note that the scaling coefficients and base offsets are hard coded to the RRMS dataset.
model
prefresig~dunif(0,10) # prior for random preference standard deviation
for (i in 1:nps) # studies reporting preference coefficients
temp1[i]<-pma_n[i] # unused variable (fixed preference model)
for (j in 1:nop[i]) # loop through outcomes
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j]) # loop through utility estimates for outcome
j
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k]) # likelihood of
observed utility coefficient
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j] #
expected value of observed utility coefficient (with dummy coding
covariance adjustment)
prep[i,j,k] <- pow(upse[i,j,k],-2)*2 #
precision (with correction for dummy coding covariance)
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k] # residual
theta[i,j,k]<-
levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i] # expected
value of observed utility coefficient
# residual deviance calcs
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]],
pmares[i,j,1:np[i,j]] )
for (m in 1:k-1) pma_cv[i,j,k,m]<-
0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1) # normalised component of between-study
variance shared for study i outcome j
for (k in minlev[i,j]:maxlev[i,j]) # loop through levels for
which coefficients are to be estimated
pmatau[i,j,k]<-pow(prefresig*(eg[op[i,j]+k-1]-offset[base[i,j]+1]),-
2)*2
pmagamma[i,j,k]~dnorm(pmamu[i,j,k],pmatau[i,j,k]) I(0,) # random
utility coefficient distribution
pmamu[i,j,k]<-abs(eg[op[i,j]+k-1]-offset[base[i,j]+1]) +
sqrt(0.5)*BB[i,j]*prefresig*(eg[op[i,j]+k-1]-offset[base[i,j]+1]) # mean
of random utility coefficient distribution (with covariance adjustment)
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp]
)
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]]) # residual deviance contribution
for study i
#scaling coefficients (note hard coded to MS dataset)
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets (sets admin reference category; note hard coded to MS
dataset)
offset[1]<-0
offset[2]<-eg[3]
for (j in 1:npmac) eg[j] ~ dgamma(1,0.01) # prior for utility
coefficients
g[j]<-log(eg[j])
temp2[j]<-pmac[j] # unused variable
weight[j]<-eg[j]/sum(eg[1:npmac]) # normalised preference weights
weightb[j]<-eg[j]/sum(eg[1:3]) # normalised preference weights
excluding intermediate admin levels (note hard coding)
pmaresdev<-sum(pmardev[1:nps]) # residual deviance for preference
meta-analysis model
# END
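The final lines of the model above convert the utility coefficients eg[j] into normalised preference weights (weight[j] and weightb[j]). As a rough illustration of that calculation only, here is a Python sketch; the coefficient values are hypothetical, not draws from the RRMS analysis.

```python
# Hypothetical posterior draws of the utility coefficients eg[1..npmac];
# values are illustrative only, not taken from the RRMS dataset.
eg = [2.0, 1.0, 0.5, 0.3, 0.2]

# weight[j] = eg[j] / sum(eg): normalised preference weights
weight = [e / sum(eg) for e in eg]

# weightb[j] = eg[j] / sum(eg[1:3]): normalised over the first three
# criteria only, excluding the intermediate administration levels
weightb = [e / sum(eg[:3]) for e in eg]

assert abs(sum(weight) - 1.0) < 1e-12
```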
Preference meta-analysis dataset: RRMS
list(nps=7,maxp=3,npmac=5,pmac=c(1,2,5,6,7),nc=11,wi=c(1,1,0,1,1,0,0,0,0,0,0,0,0,0),wib=c(1,1,0,1,1,0,1,0,0,0,0,0,0,0))
pma_n[] nop[] op[,1] op[,2] op[,3] np[,1] np[,2]
np[,3] base[,1] base[,2] base[,3] ux[,1,1]
ux[,1,2] ux[,1,3] ux[,2,1] ux[,2,2] ux[,2,3]
ux[,3,1] ux[,3,2] ux[,3,3] up[,1,1] up[,1,2]
up[,1,3] up[,2,1] up[,2,2] up[,2,3] up[,3,1]
up[,3,2] up[,3,3] upse[,1,1] upse[,1,2] upse[,1,3]
upse[,2,1] upse[,2,2] upse[,2,3] upse[,3,1] upse[,3,2]
upse[,3,3] lev[,1,1] lev[,1,2] lev[,1,3] lev[,2,1]
lev[,2,2] lev[,2,3] lev[,3,1] lev[,3,2] lev[,3,3]
levsign[,1,1] levsign[,1,2] levsign[,1,3] levsign[,2,1]
levsign[,2,2] levsign[,2,3] levsign[,3,1] levsign[,3,2]
levsign[,3,3] minlev[,1] minlev[,2] minlev[,3] maxlev[,1]
maxlev[,2] maxlev[,3]
221 3 1 2 3 1 1 3 0 0 0 0.3
NA NA 0.302440605 NA NA 1 1 1 -0.734
NA NA -0.89 NA NA 1.726 0.56 0.35 0.017624027 NA
NA 0.017624027 NA NA 0.021564908 0.210222243 0.02516081 1
NA NA 1 NA NA 1 2 3 -1 NA NA -1 NA NA 1 1 1 1 1 1 1 1 3
301 2 1 3 NA 3 2 NA 0 1 NA 0.042
0.075 0.375 1 1 NA NA NA NA -0.05 -0.2 -0.75 -1.7000 -1.5 NA NA NA NA 0.124973966 0.124973966
0.124973966 0.210114901 0.210114901 NA NA NA NA 1 1
1 2 3 NA NA NA NA -1 -1 -1 -1 -1 NA NA NA NA 1 2 NA 1 3 NA
189 2 1 2 NA 2 2 NA 0 0 NA 0.5
0.75 NA -0.232544158 -0.471195376 NA NA NA
NA -0.7 -1.1 NA 0.6 2.1 NA NA NA NA
0.155172514 0.176739878 NA 0.222392803 0.265109817 NA NA
NA NA 1 1 NA 1 1 NA NA NA NA -1 -1 NA -1 -1 NA NA NA NA 1 1 NA 1
1 NA
291 3 1 2 3 2 2 3 0 0 0 -0.5 -0.8 NA -0.238651219 -0.450851312 NA 1 1 1
0.182321557 0.425267735 NA 0.3074847 0.90016135 NA
0.732367894 0.482426149 0.039220713 0.051191504 0.051695161 NA
0.050625239 0.051817522 NA 0.062410652 0.026711978 0.060736003 1
1 NA 1 1 NA 1 2 3 -1 -1 NA -1 -1 NA 1 1 1 1 1 1 1 1 3
50 2 1 3 NA 1 3 NA 0 1 NA 1
NA NA 1 1 1 NA NA NA -0.05 NA NA -1.23 -1.41 -0.86 NA NA NA 0.06 NA NA 0.24 0.24 0.24
NA NA NA 1 NA NA 2 2 3 NA NA
NA -1 NA NA -1 -1 -1 NA NA NA 1 2
NA 1 3 NA
125 1 3 NA NA 2 NA NA 1 NA NA 1 1
NA NA NA NA NA NA NA -0.849 -0.943
NA NA NA NA NA NA NA 0.113 0.103 NA NA
NA NA NA NA NA 2 2 NA NA NA NA
NA NA NA -1 -1 NA NA NA NA NA NA
NA 2 NA NA 2 NA NA
156 1 3 NA NA 1 NA NA 1 NA NA 1
NA NA NA NA NA NA NA NA -4.38 NA NA
NA NA NA NA NA NA 0.504493749 NA NA NA
NA NA NA NA NA 2 NA NA NA NA NA
NA NA NA -1 NA NA NA NA NA NA NA
NA 2 NA NA 2 NA NA
END
5 Combined preference model
Dictionary
The table below describes the key variables/parameters/constants that have not already been
explained.
Name                 Description
chc                  Vector of criteria numbers in choice data
rac                  Vector of criteria numbers in ratings data
pmac                 Vector of criteria numbers in preference meta-analysis data
nc                   Total number of criteria in model
wi[]                 Indicates whether criterion is to be included in weights
pred_egamma[]        Study-level predictive distribution of utility coefficient for criterion
pred_pref[]          Individual-level predictive distribution of utility coefficient for criterion
weight_pred_study[]  Study-level predictive distribution of preference weight for criterion
weight_pred_y[]      Individual-level predictive distribution of preference weight for criterion
totresdev            Total residual deviance in all preference models
The criteria are numbered as follows:
1. Relapse
2. Disability progression
3. PML
4. Liver enzyme elevation
5. Daily oral vs daily subcutaneous
6. Monthly infusion vs daily subcutaneous
7. Weekly intramuscular vs daily subcutaneous
8. Allergic/hypersensitivity reactions
9. Serious allergic reactions
10. Depression
11. Infusion/injection reactions
Combined preference model code – fixed preferences
model
for(i in 1:nchs) # loop through choice sets
# difference in utility between choices with logistically distributed random
component
logit(pr[i]) <- V[i,2] - V[i,1]
#p is probability of choosing right-hand option, corresponding to y=2
ch_n[i]~dbin(pr[i],ch_N[i])
ch_nhat[i]<-pr[i]*ch_N[i]
chdev[i]<- 2 * (ch_n[i] * (log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-
ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
for(k in 1:2)
for (j in 1:nchc) pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k]
#Utility model
V[i,k] <- sum(pg[i,1:nchc,k])
for (j in 1:nchc) chgamma[j]<-eg[chc[j]]
for(i in 1:nr) # loop through ratings
#Utility model
for (j in 1:nrc) pmu[j,i] <- log(ragamma[rastudyid[i],j])*(cr[i,j]-
0.5*equals(j,1)*cr[i,j]) # when j=1 relapse coefficient is halved due to
differing time horizons in ratings dataset
ramu[i] <- sum(pmu[1:nrc,i])
logra[i]<-log(ra[i])
logra[i]~dnorm(ramu[i],ratau)
radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual
deviance contribution
for (j in 1:nrc)
for (k in 1:nrastud)
ragamma[k,j]<-prefmu[k,j]
prefmu[k,j] <- eg[rac[j]] + (k-1)*ahpalpha # includes alpha
for study 2 but not study 1
ahpalpha~dgamma(1,0.1)
ratau<-pow(rasig,-2)
rasig~dunif(0,10)
for (i in 1:nps) # studies reporting preference coefficients
for (j in 1:nop[i])
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j])
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k])
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j]
prep[i,j,k] <- pow(upse[i,j,k],-2)*2
# precision (with correction for dummy coding covariance)
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k]
theta[i,j,k]<-
levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i]
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]],
pmares[i,j,1:np[i,j]] )
for (m in 1:k-1) pma_cv[i,j,k,m]<- 0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1)
for (k in minlev[i,j]:maxlev[i,j])
pmagamma[i,j,k]<-abs(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1])
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp] )
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]])
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets
offset[1]<-0
offset[2]<-eg[pmac[3]]
#priors
for (j in 1:nc) eg[j] ~ dgamma(1,0.01)
g[j]<-log(eg[j])
#weights
for(i in 1:nc)
wg[i]<-wi[i]*eg[i]
weight[i]<-wg[i]/sum(wg[1:nc])
weightb[i]<-eg[i]/sum(wg[1:nc])
pratau<-2*ratau
chresdev<-sum(chdev[1:nchs])
raresdev<-sum(radev[1:nr])
pmaresdev<-sum(pmardev[1:nps])
totresdev<-raresdev+chresdev+pmaresdev
# END
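The chdev[i] term in the model above is the standard binomial residual deviance for grouped choice data. A minimal Python sketch of the same formula, with made-up counts (not data from the thesis):

```python
import math

def choice_residual_deviance(n, N, p):
    """Binomial residual deviance for one choice set: n of N respondents
    chose the right-hand option, which the model predicts with probability
    p (so nhat = p*N). Assumes 0 < n < N so both log terms are defined."""
    nhat = p * N
    return 2 * (n * (math.log(n) - math.log(nhat))
                + (N - n) * (math.log(N - n) - math.log(N - nhat)))

# A perfectly fitted choice set contributes zero deviance...
assert abs(choice_residual_deviance(50, 100, 0.5)) < 1e-12
# ...while a badly fitted one contributes a large positive amount.
assert choice_residual_deviance(80, 100, 0.5) > 10
```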
Combined preference model code – random preferences
# This model includes predictive distributions on preferences, using method 1
# (as described in III.6.2.3) for the individual-level distributions. For
# method 2, replace the blue code with:
# pred_gamma[i]<-log(pred_egamma[i])
# pred_pref[i]~dlnorm(pred_gamma[i],pratau)
model
for(i in 1:nchs) # loop through choice sets
# difference in utility between choices with logistically distributed random
component
logit(pr[i]) <- V[i,2] - V[i,1]
#p is probability of choosing right-hand option, corresponding to y=2
ch_n[i]~dbin(pr[i],ch_N[i])
ch_nhat[i]<-pr[i]*ch_N[i]
chdev[i]<- 2 * (ch_n[i] * (log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-
ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
for(k in 1:2)
for (j in 1:nchc) pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k]
#Utility model
V[i,k] <- sum(pg[i,1:nchc,k])
for (j in 1:nchc) chgamma[j]~dnorm(eg[chc[j]],prefretau[j]) I(0,)
for(i in 1:nr) # loop through ratings
#Utility model
for (j in 1:nrc) pmu[j,i] <- log(ragamma[rastudyid[i],j])*(cr[i,j]-
0.5*equals(j,1)*cr[i,j]) # when j=1 relapse coefficient is halved due to
differing time horizons in ratings dataset
ramu[i] <- sum(pmu[1:nrc,i])
logra[i]<-log(ra[i])
logra[i]~dnorm(ramu[i],ratau)
radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual
deviance contribution
for (j in 1:nrc)
for (k in 1:nrastud)
ragamma[k,j]~dnorm(prefmu[k,j],raprefretau[k,j]) I(0,)
prefmu[k,j] <- eg[rac[j]] + (k-1)*ahpalpha # includes alpha
for study 2 but not study 1
raprefretau[k,j]<-pow(prefresig*prefmu[k,j],-2)
ahpalpha~dgamma(1,0.1)
ratau<-pow(rasig,-2)
rasig~dunif(0,10)
prefresig~dunif(0,10)
for (i in 1:nps) # studies reporting preference coefficients
for (j in 1:nop[i])
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j])
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k])
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j]
prep[i,j,k] <- pow(upse[i,j,k],-2)*2
# precision (with correction for dummy coding covariance)
pmava[i,j,k]<-pma_n[i]*pow(upse[i,j,k],2)
pmava_prec[i,j,k]<-pow(pmava[i,j,k]*sqrt(2/pma_n[i]),-1)
pmava[i,j,k]~dnorm(pmavam[i,j,k],pmava_prec[i,j,k]) I(0,)
pmavam[i,j,k]<-pmava_mu*cut(theta[i,j,k])*cut(theta[i,j,k])
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k]
theta[i,j,k]<-
levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i]
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]],
pmares[i,j,1:np[i,j]] )
for (m in 1:k-1) pma_cv[i,j,k,m]<-
0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1)
for (k in minlev[i,j]:maxlev[i,j])
pmatau[i,j,k]<-pow(prefresig*(eg[pmac[op[i,j]+k-1]]-
offset[base[i,j]+1]),-2)*2
pmagamma[i,j,k]~dnorm(pmamu[i,j,k],pmatau[i,j,k]) I(0,)
pmamu[i,j,k]<-abs(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1]) +
sqrt(0.5)*BB[i,j]*prefresig*(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1])
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp] )
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]])
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets
offset[1]<-0
offset[2]<-eg[pmac[3]]
#priors
for (j in 1:nc) eg[j] ~ dgamma(1,0.01)
g[j]<-log(eg[j])
prefretau[j]<-pow(prefresig*eg[j],-2)
#weights
for(i in 1:nc)
wg[i]<-wi[i]*eg[i]
weight[i]<-wg[i]/sum(wg[1:nc])
weightb[i]<-eg[i]/sum(wg[1:nc])
pmaprec_mu[i]<-pow(pmava_mu*eg[i]*eg[i],-1)
pmava_mu~dunif(0,100)
# predictive preferences
for (i in 1:nc) pred_egamma[i]~dnorm(eg[i],prefretau[i]) I(0,)
pred_wegamma[i]<-wi[i]*pred_egamma[i]
weight_pred_study[i]<-pred_wegamma[i]/sum(pred_wegamma[1:nc])
weightb_pred_study[i]<-pred_egamma[i]/sum(pred_wegamma[1:nc])
pred_pref[i]~dnorm(pred_egamma[i],pmaprec_mu[i]) I(0,)
pred_wpref[i]<-wi[i]*pred_pref[i]
weight_pred_y[i]<-pred_wpref[i]/sum(pred_wpref[1:nc])
weightb_pred_y[i]<-pred_pref[i]/sum(pred_wpref[1:nc])
pratau<-2*ratau
chresdev<-sum(chdev[1:nchs])
raresdev<-sum(radev[1:nr])
pmaresdev<-sum(pmardev[1:nps])
totresdev<-raresdev+chresdev+pmaresdev
# END
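In the random-preference model above, the study-level predictive weights (weight_pred_study) come from drawing each coefficient from a normal distribution truncated at zero, with standard deviation prefresig times its mean, and renormalising. A rough Python sketch under those assumptions; the eg and prefresig values are illustrative only.

```python
import random

random.seed(1)

def truncnorm_pos(mu, sd):
    """Rejection sampler for Normal(mu, sd) truncated to (0, inf)."""
    while True:
        x = random.gauss(mu, sd)
        if x > 0:
            return x

eg = [2.0, 1.0, 0.5]   # hypothetical posterior means of eg[]
prefresig = 0.3        # hypothetical between-study preference sd

# One predictive draw per criterion, then renormalise to weights:
pred = [truncnorm_pos(e, prefresig * e) for e in eg]
weight_pred = [p / sum(pred) for p in pred]

assert all(p > 0 for p in pred)
assert abs(sum(weight_pred) - 1.0) < 1e-12
```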
Ratings dataset for combined preference model
# The combined model needs this altered data file for the ratings dataset (but
# for the choice or preference meta-analysis datasets the same data files
# already presented can be used).
list(nr=231,nrc=9,nc=11,wi=c(1,1,0,1,1,0,0,0,0,0,0,0,0,0),nrastud=2,rac=c(1,2,3,4,5,6,7,11,8))
rastudyid[] ra[] cr[,1] cr[,2] cr[,3] cr[,4] cr[,5] cr[,6] cr[,7] cr[,8] cr[,9]
1 0.7 1 -1 0 0 0 0 0 0 0
1 0.1 0 1 -1 0 0 0 0 0 0
1 0.01 0 0 -1 0 1 0 0 0 0
1 0.2 0 0 -1 1 0 0 0 0 0
1 0.05 0 0 -1 0 0 0 0 1 0
1 0.4 0 0 0 0 0 0 0 -1 1
1 0.7 0 0 0 0 -1 1 0 0 0
1 0.5 0 0 0 0 -1 0 1 0 0
1 0.7 1 -1 0 0 0 0 0 0 0
1 0.9 0 1 -1 0 0 0 0 0 0
1 0.1 0 0 -1 0 1 0 0 0 0
1 0.2 0 0 -1 1 0 0 0 0 0
1 0.05 0 0 -1 0 0 0 0 1 0
1 0.4 0 0 0 0 0 0 0 -1 1
1 0.7 0 0 0 0 -1 1 0 0 0
1 0.5 0 0 0 0 -1 0 1 0 0
1 0.6 1 -1 0 0 0 0 0 0 0
1 0.9 0 1 -1 0 0 0 0 0 0
1 0.1 0 0 -1 0 1 0 0 0 0
1 0.2 0 0 -1 1 0 0 0 0 0
1 0.05 0 0 -1 0 0 0 0 1 0
1 0.888888889 0 0 0 0 0 0 0 -1 1
1 0.7 0 0 0 0 -1 1 0 0 0
1 0.5 0 0 0 0 -1 0 1 0 0
2 0.111111111 0 0 0 0 0 1 -1 0 0
2 0.142857143 0 0 0 0 0 -1 0 0 0
2 7 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 0.111111111 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 5 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 9 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 7 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 0.333333333 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 5 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 9 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 1 0 0 0 0 0 1 -1 0 0
2 1 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 7 0 0 0 0 1 0 -1 0 0
2 0.142857143 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 0.111111111 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 0.111111111 0 0 0 0 0 1 -1 0 0
2 3 0 0 0 0 0 -1 0 0 0
2 0.2 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 1 0 0 0 0 0 1 -1 0 0
2 1 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 0.111111111 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 0.2 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.2 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 9 0 0 0 0 0 -1 0 0 0
2 0.142857143 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 0.142857143 0 0 0 0 0 1 -1 0 0
2 0.142857143 0 0 0 0 0 -1 0 0 0
2 7 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 0.2 0 0 0 0 0 1 -1 0 0
2 5 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 0.333333333 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 0.333333333 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 0.2 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 0.111111111 0 0 0 0 1 0 -1 0 0
2 0.111111111 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 7 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.2 0 0 0 0 1 -1 0 0 0
2 1 0 0 0 0 0 1 -1 0 0
2 0.142857143 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 7 0 0 0 0 0 -1 0 0 0
2 0.142857143 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.142857143 0 0 0 0 -1 0 0 0 0
2 7 0 0 0 0 1 -1 0 0 0
2 0.142857143 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 5 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 5 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 0.111111111 0 0 0 0 0 1 -1 0 0
2 0.111111111 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 0.111111111 0 0 0 0 1 0 -1 0 0
2 9 0 0 0 0 -1 0 0 0 0
2 0.111111111 0 0 0 0 1 -1 0 0 0
END
Initial values for combined preference model
list(eg=c(1,1,1,1,1,0.5,0.5,1,1,1,1))
6 Full MCDA model
Dictionary
The table below describes the key variables/parameters/constants that have not already been
explained.
Name        Description
admin[t]    Administration mode for treatment t: =0 for daily subcutaneous, =1 for daily oral, =2 for monthly infusion, =3 for 1-3x weekly injection
preflink    Vector listing the preference parameter number corresponding to each mapping group
wo          Vector listing the outcome number used in the MCDA calculations for each criterion/mapping group
wib[]       Indicates whether criterion is to be included in the alternative weight calculation (which gives a weight for intermediate admin levels but does not include them in the sum used for normalisation)
totresdev   Total residual deviance in the model
The set of “outcomes” in the RRMS clinical evidence synthesis dataset is extended to include the administration modes as follows:
Outcomes
1 Annualised relapse rate
2 Relapse-free proportion
3 Proportion undergoing disability progression; confirmed 3 months later
4 Proportion undergoing disability progression; confirmed 6 months later
5 Alanine aminotransferase above upper limit of normal range
6 Alanine aminotransferase above 3x upper limit of normal range
7 Alanine aminotransferase above 5x upper limit of normal range
8 Proportion with serious gastrointestinal disorders
9 Proportion with serious bradycardia
10 Proportion with macular edema
11 Indicator variable for daily oral administration
12 Indicator variable for administration by 1-3x weekly injection
In other words, d[t,11]=1 for treatments with daily oral administration, d[t,12]=1 for treatments
administered by 1-3x weekly injection, and d[t,11]=d[t,12]=0 for treatments administered by daily
subcutaneous injection. An indicator for monthly infusion is not required as there are no such
treatments in the dataset.
Full MCDA model code
# This model uses fixed mappings in three groups, random preferences by study and method 1 (see III.6.2.3) for predictive preferences at the individual level.
# This model includes the three “zeroes” outcomes in the evidence synthesis but excludes them from the MCDA model. Due to the way the model is coded, it is necessary to assign utility coefficient parameters for these outcomes even though they are not included in the MCDA calculations. The parameters eg[12], eg[13] and eg[14] are used for this purpose and are assigned the deterministic value 1. This has led to some hard coding where the number 14 is used directly to represent the number of criteria in some loops, while the parameter nc needs to retain the value 11 for other purposes in the model.
model
### TREATMENT EFFECTS MODEL
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for (j in 1:totalo1) ff[j]<-1
for (j in totalo1+1:totalo) ff[j]<-1
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) d[k,j]<-ad[k,j]*(1-ze[j,k])
# assign known signs to treatment effects
for(i in 1:ns)
temp[i]<-no2[i]
E[i]~dnorm(0,1)
for (j in 1:no1[i])
mu[i,j]~dnorm(0,.01) # "average" level of outcome j in
study i across all trial arms
for (j in no1[i]+1:no[i])
mu[i,j]~dgamma(0.5,0.5) # "average" level of outcome j in
study i across all trial arms
for (j in 1:no[i])
G[i,j]~dnorm(0,1)
sdelta[i,j]<-sum(delta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
Dmu[i,k,j] <- step(totalo1+0.5-o[i,j])*(mu[i,j] + delta[i,k,j] -
sdelta[i,j]) + step(o[i,j]-totalo1-0.5)*min(1,max(0,mu[i,j] + delta[i,k,j]
- sdelta[i,j]))
D[i,k,j] <- mu[i,j] + delta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j]/ff[o[i,j]],
-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
DD[i,k,j]<-step(totalo1+0.5-o[i,j])*D[i,k,j] + step(o[i,j]-totalo1-
0.5)*min(1,max(0,D[i,k,j]))
y[i,k,j]~dnorm(DD[i,k,j],yprec[i,k,j]) # distribution of outcome j
in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision
of observed outcome y
yprec[i,k,j] <- (prec[i,k,j]/(1-abs(rho_w[o[i,j]])))/ff[o[i,j]] #
remaining (unshared) precision of y after accounting for covariance
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
adelta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
delta[i,k,j]<-step(o[i,j]-totalo1-0.5)*(1-
ze[o[i,j],t[i,k]])*d[t[i,k],o[i,j]]+step(totalo1+0.5-o[i,j])*(1-
ze[o[i,j],t[i,k]])*adelta[i,k,j] # select appropriate
treatment effect parameter for this study arm and outcome
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-outcome
covariance of observed outcomes (y)
F[i,k]~dnorm(0,1)
### MAPPINGS
for (m in 1:ng) # cycle through outcome groups
sb[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base
outcome in each group
for (j in ogbase[m]+1:ogbase[m+1]-1)
sb[j]<-(sign[j]/sign[ogbase[m]])*abs(b[j])
for (j in 1:totalo) b[j] ~ dnorm(0,.01)
lb[j]<-log(abs(sb[j]))
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<- sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad1[k,ogbase[m-1]]~dnorm(0,.001) I(0,)
ad2[k,ogbase[m-1]]~dbeta(0.5,0.5)
ad[k,ogbase[m-1]]<-sign[ogbase[m-1]]*(step(totalo1+0.5-ogbase[m-
1])*ad1[k,ogbase[m-1]]+step(ogbase[m-1]-totalo1-0.5)*ad2[k,ogbase[m-1]])
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<- (beta[k,j]/beta[k,ogbase[m-1]])*abs(ad[k,ogbase[m-1]] )
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-sb[j]
### RESIDUAL DEVIANCE
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
for (i in 1:ns)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms]) # within-study coprecision
matrix of outcomes in study i
for (x in 1:no[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*na[i])
cv[i,x,z]<-
pow((prec[i,arm[i,x],out[i,x]]/ff[o[i,out[i,x]]])*(prec[i,arm[i,z],out[i,z]
]/ff[o[i,out[i,z]]]),-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*na[i]+1:totalo*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - Dmu[i,arm[i,x],out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms)
cv[i,j,k]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in final redundant diagonal
element of the covariance matrix with a 1
### POPULATION CALIBRATION MODEL
for (i in 1:ns)
Q[i]~dnorm(0,1)
for (k in 1:na[i]) S[i,k]~dnorm(0,1)
for (j in 1:no1[i]) alpha[i,j]<-aalpha[i,j]
for (j in no1[i]+1:no[i]) alpha[i,j]<-min(1,max(0,aalpha[i,j]))
for (j in 1:no[i])
aalpha[i,j]~dnorm(amu[i,j],aprec[i,j])
amu[i,j]<-a[o[i,j]]+signr_b[o[i,j]]*zi*sqrt(abs(rho_b[o[i,j]]))*Q[i]
aprec[i,j]<-pow(zi,-2)/(1-abs(rho_b[o[i,j]]))
for (k in 1:na[i])
pm_y[i,k,j]<-y[i,k,j]
pm_va[i,k,j]<-va[i,k,j]
pm_va_prec[i,k,j]<-pow(pm_va[i,k,j]*sqrt(2/n[i,k]),-1)
pm_va[i,k,j]~dnorm(pm_va_mu[o[i,j]],pm_va_prec[i,k,j])
pm_prec[i,k,j]<-pow(pm_va[i,k,j]/n[i,k],-1)/((1-
abs(rho_w[o[i,j]]))*ff[o[i,j]])
pm_mu[i,k,j]<-step(o[i,j]-totalo1-0.5)*a[o[i,j]]+step(totalo1+0.5-
o[i,j])*alpha[i,j]+signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*sqrt(pm_va[i,k,
j]/n[i,k])*S[i,k] + cut(delta[i,k,j])
pm_y[i,k,j]~dnorm(pm_mu[i,k,j],pm_prec[i,k,j])
zi~dunif(0,10)
for (j in 1:totalo1)
a[j]~dnorm(0,.001)
for (j in totalo1+1:totalo)
a[j]~dgamma(0.5,0.5)
for (j in 1:totalo) for (k in 1:nt)
absd[k,j]<-step(totalo1+0.5-j)*(a[j]+d[k,j])+step(j-totalo1-
0.5)*max(0,min(1,a[j]+d[k,j]))
### PREDICTIVE DISTRIBUTIONS
E[ns+1]~dnorm(0,1)
Q[ns+1]~dnorm(0,1)
for (k in 1:nt)
F[ns+1,k]~dnorm(0,1)
S[ns+1,k]~dnorm(0,1)
for (j in 1:totalo1)
alpha[ns+1,j]<-aalpha[ns+1,j]
pm_va_mu[j]~dunif(0,100)
for (j in totalo1+1:totalo)
alpha[ns+1,j]<-max(0,aalpha[ns+1,j])
pm_va_mu[j]~dunif(0,0.0001)
for (j in 1:totalo)
G[ns+1,j]~dnorm(0,1)
amu[ns+1,j]<-a[j]+signr_b[j]*zi*sqrt(abs(rho_b[j]))*Q[ns+1]
aprec[ns+1,j]<-pow(zi,-2)/(1-abs(rho_b[j]))
aalpha[ns+1,j]~dnorm(amu[ns+1,j],aprec[ns+1,j])
for (k in 1:nt)
taud[ns+1,k,j]<-tau/(1-abs(rho_b[j])-0.5+0.5*abs(rho_b[j]))
adelta[ns+1,k,j] ~ dnorm(H[ns+1,k,j],taud[ns+1,k,j])
delta[ns+1,k,j]<-step(j-totalo1-0.5)*(1-ze[j,k])*d[k,j]+step(totalo1+0.5-
j)*(1-ze[j,k])*adelta[ns+1,k,j] # select appropriate
treatment effect parameter for this study arm and outcome
H[ns+1,k,j] <- d[k,j] +
signr_b[j]*(sqrt(abs(rho_b[j])*0.5)*E[ns+1]+signr_b[j]*sqrt(abs(rho_b[j])-
abs(rho_b[j])*0.5)*F[ns+1,k] + signr_b[j]*sqrt(0.5-
abs(rho_b[j])*0.5)*G[ns+1,j])* pow(tau,-0.5)
pm_prec[ns+1,k,j]<-pow(pm_va_mu[j]*ff[j],-1)/(1-abs(rho_w[j]))
pm_amu[ns+1,k,j]<-step(j-totalo1-
0.5)*min(1,max(0,a[j]+cut(delta[ns+1,k,j])))+step(totalo1+0.5-
j)*(alpha[ns+1,j]+cut(delta[ns+1,k,j]))
pm_mu[ns+1,k,j]<- pm_amu[ns+1,k,j]
+signr_w[j]*sqrt(abs(rho_w[j]))*sqrt(pm_va_mu[j])*S[ns+1,k]
apred_y[k,j]~dnorm(pm_mu[ns+1,k,j],pm_prec[ns+1,k,j])
pred_y[k,j]<- step(totalo1+0.5-j)*apred_y[k,j] + step(j-totalo1-
0.5)*max(0,apred_y[k,j])
# Assign admin levels
for (k in 1:nt) d[k,11]<-equals(admin[k],1) # Daily oral indicator
d[k,12]<-equals(admin[k],3) # every 2 days-weekly injection indicator
### TRANSFORMATIONS, RANKINGS, WEIGHTS, MCDA
for (m in 1:ng)
wgt[m]<-wg[preflink[m]]
weight[m]<-wgt[m]/sum(wgt[1:ng])
for (j in 1:14) weightb[j]<-eg[j]/sum(wgt[1:ng])
for (k in 1:nt)
wbr[k]<-sum(pbr[k,1:ng])
wbr_pred_study[k]<-sum(pbr_pred_study[k,1:ng])
wbr_pred_y[k]<-sum(pbr_pred_y[k,1:ng])
trad[k,1]<-exp(absd[k,1])
trad_pred_study[k,1]<-min(3,exp(pm_amu[ns+1,k,1]))
trad_pred_y[k,1]<-min(3,exp(pred_y[k,1]))
for (j in 2:7) trad[k,j]<-exp(absd[k,j])/(1+exp(absd[k,j]))
trad_pred_study[k,j]<-exp(pm_amu[ns+1,k,j])/(1+exp(pm_amu[ns+1,k,j]))
trad_pred_y[k,j]<-exp(pred_y[k,j])/(1+exp(pred_y[k,j]))
for (j in 8:10) trad[k,j] <- absd[k,j]
trad_pred_study[k,j]<-pm_amu[ns+1,k,j]
trad_pred_y[k,j]<-pred_y[k,j]
#admin route categories are deterministic
trad[k,11]<-d[k,11]
trad[k,12]<-d[k,12]
trad_pred_study[k,11]<-d[k,11]
trad_pred_study[k,12]<-d[k,12]
trad_pred_y[k,11]<-d[k,11]
trad_pred_y[k,12]<-d[k,12]
for (m in 1:ng)
pbr[k,m]<-impact[wo[m]]*trad[k,wo[m]]*weightb[preflink[m]]*wib[preflink[m]]
pbr_pred_study[k,m]<-impact[wo[m]]*trad_pred_study[k,wo[m]]*weightb_pred_study[m]*wib[preflink[m]]
pbr_pred_y[k,m]<-impact[wo[m]]*trad_pred_y[k,wo[m]]*weightb_pred_y[m]*wib[preflink[m]]
rank[k,m]<-equals(impact[ogbase[m]],-1)*rank(wbr[],k)+equals(impact[ogbase[m]],1)*(nt+1-rank(wbr[],k)) # treatment rankings by outcome
for (q in 1:nt)
rankprop[k,m,q]<-equals(rank[k,m],q)
cumrankprop[k,m,q]<-step(q-rank[k,m]) # indicator for time spent at or below each rank
sucra[k,m]<-sum(cumrankprop[k,m,1:nt-1])/(nt-1) # SUCRA
for (k in 1:nt) totrank[k]<-rank(wbr[],k)
totrank_pred_study[k]<-rank(wbr_pred_study[],k)
totrank_pred_y[k]<-rank(wbr_pred_y[],k)
### PREFERENCE MODEL
for(i in 1:nchs) # loop through choice sets
# difference in utility between choices with logistically distributed random component
logit(pr[i]) <- V[i,2] - V[i,1]
# pr is the probability of choosing the right-hand option, corresponding to y=2
ch_n[i]~dbin(pr[i],ch_N[i])
ch_nhat[i]<-pr[i]*ch_N[i]
chdev[i]<- 2 * (ch_n[i] * (log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
for(k in 1:2)
for (j in 1:nchc) pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k]
#Utility model
V[i,k] <- sum(pg[i,1:nchc,k])
for (j in 1:nchc) chgamma[j]~dnorm(eg[chc[j]],prefretau[j]) I(0,)
for(i in 1:nr) # loop through ratings
#Utility model
for (j in 1:nrc) pmu[j,i] <- log(ragamma[rastudyid[i],j])*(cr[i,j]-0.5*equals(j,1)*cr[i,j]) # includes adjustment for relapse time horizon
ramu[i] <- sum(pmu[1:nrc,i])
logra[i]<-log(ra[i])
logra[i]~dnorm(ramu[i],ratau)
radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual deviance contribution
for (j in 1:nrc)
for (k in 1:nrastud)
ragamma[k,j]~dnorm(prefmu[k,j],raprefretau[k,j]) I(0,)
prefmu[k,j] <- eg[rac[j]] + (k-1)*ahpalpha # includes alpha for study 2 but not study 1
raprefretau[k,j]<-pow(prefresig*prefmu[k,j],-2)
ahpalpha~dgamma(1,0.1)
ratau<-pow(rasig,-2)
rasig~dunif(0,10)
prefresig~dunif(0,10)
for (i in 1:nps) # studies reporting preference coefficients
for (j in 1:nop[i])
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j])
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k])
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j]
prep[i,j,k] <- pow(upse[i,j,k],-2)*2
# precision (with correction for dummy coding covariance)
pmava[i,j,k]<-pma_n[i]*pow(upse[i,j,k],2)
pmava_prec[i,j,k]<-pow(pmava[i,j,k]*sqrt(2/pma_n[i]),-1)
pmava[i,j,k]~dnorm(pmavam[i,j,k],pmava_prec[i,j,k]) I(0,)
pmavam[i,j,k]<-pmava_mu*cut(theta[i,j,k])*cut(theta[i,j,k])
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k]
theta[i,j,k]<-levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i]
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]], pmares[i,j,1:np[i,j]])
for (m in 1:k-1) pma_cv[i,j,k,m]<- 0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1)
for (k in minlev[i,j]:maxlev[i,j])
pmatau[i,j,k]<-pow(prefresig*(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1]),-2)*2
pmagamma[i,j,k]~dnorm(pmamu[i,j,k],pmatau[i,j,k]) I(0,)
pmamu[i,j,k]<-abs(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1]) + sqrt(0.5)*BB[i,j]*prefresig*(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1])
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp] )
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]])
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets
offset[1]<-0
offset[2]<-eg[pmac[3]]
#priors
for (j in 1:nc) eg[j] ~ dgamma(1,0.01)
g[j]<-log(eg[j])
eg[12]<-1
eg[13]<-1
eg[14]<-1
#weights
for(i in 1:14)
wg[i]<-wi[i]*eg[i]
pmaprec_mu[i]<-pow(pmava_mu*eg[i]*eg[i],-1)
prefretau[i]<-pow(prefresig*eg[i],-2)
pmava_mu~dunif(0,100)
# predictive preferences
for (i in 1:ng)
pred_egamma[i]~dnorm(eg[preflink[i]],prefretau[preflink[i]]) I(0,)
pred_wegamma[i]<-wi[preflink[i]]*pred_egamma[i]
weight_pred_study[i]<-pred_wegamma[i]/sum(pred_wegamma[1:ng])
weightb_pred_study[i]<-pred_egamma[i]/sum(pred_wegamma[1:ng])
pred_pref[i]~dnorm(pred_egamma[i],pmaprec_mu[preflink[i]]) I(0,)
pred_wpref[i]<-wi[preflink[i]]*pred_pref[i]
weight_pred_y[i]<-pred_wpref[i]/sum(pred_wpref[1:ng])
weightb_pred_y[i]<-pred_pref[i]/sum(pred_wpref[1:ng])
pratau<-2*ratau
chresdev<-sum(chdev[1:nchs])
raresdev<-sum(radev[1:nr])
pmaresdev<-sum(pmardev[1:nps])
totresdev<-raresdev+chresdev+pmaresdev+nmaresdev
# END
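The TRANSFORMATIONS/MCDA block above maps each treatment effect back to its natural scale (exp for the log relapse rate ratio, inverse-logit for log odds ratios, identity otherwise), weights it, and sums across criteria. A minimal Python sketch of that scoring step follows; all names and numbers are illustrative, not taken from the model output.

```python
import math

# Hypothetical values for one treatment on three outcome types
# (made up for illustration only).
effects = {"log_ARR": -0.44, "logit_avoid_relapse": -0.61, "utility_score": 0.10}
impact  = {"log_ARR": -1, "logit_avoid_relapse": -1, "utility_score": 1}
raw_wgt = {"log_ARR": 2.0, "logit_avoid_relapse": 1.0, "utility_score": 0.5}

def transform(name, x):
    # Mirrors the trad[] step: exp for the log rate ratio,
    # inverse-logit for log odds, identity for everything else.
    if name == "log_ARR":
        return math.exp(x)
    if name.startswith("logit"):
        return math.exp(x) / (1 + math.exp(x))
    return x

total = sum(raw_wgt.values())
weights = {k: v / total for k, v in raw_wgt.items()}

# Weighted benefit-risk score: the impact indicator (-1 for harms,
# +1 for benefits) orients every criterion so that higher is better.
wbr = sum(impact[k] * transform(k, effects[k]) * weights[k] for k in effects)
```

In the full model this calculation runs inside the MCMC loop, so each treatment's score wbr[k] acquires a posterior distribution rather than a single value.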
Full MCDA model data
The data from the clinical evidence synthesis model (4b) and the combined preference model are used again here. However, the list-formatted clinical evidence synthesis data must be replaced with the following:
Parameter values (list format) for full MCDA model
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0,0,0),
     rho_w=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6),
     ns=16, totalo=10, totalo1=7, maxarms=4, nt=9,
     impact=c(-1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1),
     sign=c(-1,1,-1,-1,1,1,1,1,1,1,1,1),
     ogbase=c(1,3,5,8,9,10,11,12,13),
     wo=c(1,3,5,8,9,10,11,12,13),
     ng=8,
     preflink=c(1,2,4,11,12,13,5,7),
     admin=c(1,1,1,0,3,3,3,1,1),
     wib=c(1,1,0,1,1,0,1,0,0,0,0,0,0,0))
Initial values for full MCDA model
list(eg=c(1,1,1,1,1,0.5,0.5,1,1,1,1,NA,NA,NA))
Appendix C. Additional results and sensitivity analyses
1 Clinical evidence synthesis models
Estimated mappings
The tables below summarise the posterior distributions of the mappings between outcomes in the
final treatment effects model for both fixed and random mappings (in the latter case, the average
across treatments is reported) according to the following grouping strategies:
- One group: all efficacy and liver safety outcomes in one mapping group
- Two groups: all efficacy outcomes in one group, all liver safety outcomes in another group
- Three groups: both relapse outcomes in one group, both disability progression outcomes in
a second group and all liver safety outcomes in a third group.
Note that serious gastrointestinal disorders, serious bradycardia and macular edema are not subject
to mappings and therefore do not feature in these results.
FIXED MAPPINGS MODEL
1 group 2 groups 3 groups
mean sd mean sd mean sd
Mapping relative to group reference outcome (reference outcomes have constant mapping of 1)
log ARR 1 1 1 1 1 1
logit avoid relapse 1.510 0.608 1.327 0.408 1.188 0.344
logit 3M DP -0.838 0.269 -0.762 0.189 1 1
logit 6M DP -0.972 0.324 -0.894 0.243 -1.047 0.315
logit ALT>ULN 2.132 0.817 1 1 1 1
logit ALT>3xULN 1.670 0.725 0.817 0.177 0.793 0.183
logit ALT>5xULN 0.509 0.455 0.267 0.186 0.302 0.196
RANDOM MAPPINGS MODEL
1 group 2 groups 3 groups
mean sd mean sd mean sd
Mapping relative to group reference outcome (reference outcomes have constant mapping of 1)
log ARR 1 1 1 1 1 1
logit avoid relapse 2.270 0.933 1.396 0.482 1.235 0.363
logit 3M DP -1.046 0.415 -0.843 0.236 1 1
logit 6M DP -1.316 0.558 -0.981 0.322 -1.061 0.327
logit ALT>ULN 2.933 1.130 1 1 1 1
logit ALT>3xULN 2.216 1.007 0.818 0.215 0.740 0.210
logit ALT>5xULN 0.665 0.596 0.259 0.206 0.258 0.205
Treatment rankings by outcome
The figures below show the proportion of MCMC simulations each treatment spent at each rank, for
each individual clinical outcome in turn. The rankings are based upon the population-average
treatment effects in the final model (random effects on efficacy and liver safety; fixed effects on
serious gastrointestinal disorders, serious bradycardia and macular edema; three mapping groups;
all between-outcome correlations=0.6) and both fixed-mapping and random-mapping versions are
shown.
The rankings for serious gastrointestinal disorders, serious bradycardia and macular edema are not shown, as these outcomes do not contribute to the benefit-risk assessment in the RRMS case study.
DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a,
IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LM = laquinimod, TF = teriflunomide. ARR =
annualised relapse rate, RFP = relapse-free proportion, DP3 = proportion experiencing disability progression
confirmed 3 months later, DP6 = proportion experiencing disability progression confirmed 6 months later, ALT
= proportion with alanine aminotransferase above upper limit of normal range, ALT3 = proportion with
alanine aminotransferase above 3x upper limit of normal range, ALT5 = proportion with alanine
aminotransferase above 5x upper limit of normal range.
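The rank proportions plotted below can be reproduced from MCMC output along the following lines; this is a Python sketch with made-up draws, whereas in the model itself the ranks come from the rank(wbr[],k) step of the BUGS code. It also computes the SUCRA statistic used later in this appendix.

```python
# Illustrative Monte Carlo ranking: given posterior draws of a score for each
# treatment (oriented so that higher = better), compute the proportion of
# iterations each treatment spends at each rank, and SUCRA.
draws = [  # one row per MCMC iteration, one column per treatment (made up)
    [0.9, 0.5, 0.1],
    [0.8, 0.6, 0.2],
    [0.4, 0.7, 0.3],
    [0.9, 0.2, 0.5],
]
nt = len(draws[0])
rank_prop = [[0.0] * nt for _ in range(nt)]  # rank_prop[t][r] = Pr(rank r+1)

for row in draws:
    order = sorted(range(nt), key=lambda t: row[t], reverse=True)
    for r, t in enumerate(order):
        rank_prop[t][r] += 1 / len(draws)

# SUCRA: average proportion of time spent at or below each of the first
# nt-1 ranks; 1 for a treatment always ranked first, 0 if always last.
sucra = [sum(sum(rank_prop[t][: r + 1]) for r in range(nt - 1)) / (nt - 1)
         for t in range(nt)]
```

With these illustrative draws the first treatment ranks first in three of four iterations and second otherwise, giving a SUCRA of 0.875.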
Fixed mappings
[Figures: proportion of simulations spent at each rank, by outcome, fixed-mappings model]
Random mappings
[Figures: proportion of simulations spent at each rank, by outcome, random-mappings model]
Sensitivity to assumed correlations
The two tables below show the posterior mean and standard deviation of the key parameters in the
treatment effects module (Model 3, random effects, one mapping group) with the correlations
between all pairs of outcomes (at both the within- and between-study levels) set to 0, 0.3, 0.6 (as
per the main results in II.6.1.4) and 0.9. The tables use fixed and random mappings respectively.
The treatment-outcome combinations with no data (instead estimated via the mappings) are shown
in grey.
RANDOM EFFECTS FIXED MAPPINGS 1 GROUP MODEL 3
All correlations = 0 All correlations = 0.3 All correlations = 0.6 All correlations = 0.9
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.429 0.118 -0.414 0.106 -0.437 0.120 -0.448 0.156
FM -0.680 0.145 -0.612 0.136 -0.626 0.159 -0.637 0.221
GA -0.267 0.085 -0.261 0.078 -0.289 0.088 -0.338 0.124
IA (IM) -0.276 0.087 -0.257 0.078 -0.265 0.087 -0.268 0.110
IA (SC) -0.340 0.117 -0.343 0.107 -0.399 0.125 -0.484 0.185
IB -0.540 0.128 -0.507 0.118 -0.553 0.141 -0.630 0.215
LM -0.354 0.090 -0.330 0.083 -0.351 0.098 -0.367 0.132
TF -0.366 0.124 -0.343 0.110 -0.354 0.124 -0.350 0.153
Log odds ratio of avoiding relapse (vs placebo)
DF -0.557 0.124 -0.608 0.127 -0.607 0.143 -0.597 0.191
FM -0.888 0.147 -0.898 0.154 -0.870 0.177 -0.839 0.237
GA -0.348 0.102 -0.384 0.104 -0.403 0.111 -0.448 0.146
IA (IM) -0.361 0.105 -0.378 0.106 -0.369 0.111 -0.356 0.133
IA (SC) -0.453 0.166 -0.510 0.160 -0.558 0.165 -0.638 0.208
IB -0.708 0.153 -0.748 0.157 -0.772 0.176 -0.833 0.241
LM -0.462 0.098 -0.485 0.103 -0.488 0.116 -0.487 0.155
TF -0.476 0.142 -0.504 0.145 -0.492 0.155 -0.466 0.192
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.272 0.102 -0.329 0.111 -0.359 0.130 -0.388 0.168
FM -0.429 0.131 -0.483 0.145 -0.514 0.174 -0.550 0.234
GA -0.170 0.073 -0.208 0.079 -0.239 0.095 -0.293 0.134
IA (IM) -0.176 0.074 -0.204 0.079 -0.218 0.090 -0.232 0.114
IA (SC) -0.222 0.109 -0.276 0.116 -0.332 0.140 -0.423 0.202
IB -0.346 0.131 -0.404 0.138 -0.458 0.168 -0.549 0.242
LM -0.225 0.082 -0.263 0.091 -0.289 0.108 -0.319 0.142
TF -0.233 0.102 -0.272 0.106 -0.292 0.124 -0.303 0.154
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.393 0.143 -0.389 0.136 -0.414 0.148 -0.435 0.181
FM -0.615 0.172 -0.570 0.172 -0.591 0.191 -0.615 0.245
GA -0.245 0.101 -0.246 0.096 -0.275 0.107 -0.328 0.143
IA (IM) -0.253 0.100 -0.241 0.093 -0.251 0.101 -0.260 0.122
IA (SC) -0.312 0.138 -0.324 0.135 -0.380 0.153 -0.471 0.212
IB -0.498 0.181 -0.481 0.176 -0.530 0.195 -0.617 0.265
LM -0.324 0.112 -0.310 0.108 -0.333 0.121 -0.356 0.151
TF -0.336 0.142 -0.324 0.136 -0.338 0.148 -0.342 0.174
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.797 0.155 0.856 0.165 0.857 0.180 0.861 0.220
FM 1.281 0.235 1.274 0.242 1.235 0.250 1.217 0.282
GA 0.497 0.135 0.540 0.135 0.568 0.139 0.646 0.170
IA (IM) 0.520 0.152 0.536 0.152 0.523 0.153 0.515 0.171
IA (SC) 0.655 0.253 0.727 0.245 0.797 0.248 0.933 0.286
IB 1.022 0.243 1.063 0.250 1.099 0.262 1.214 0.314
LM 0.664 0.141 0.685 0.144 0.690 0.153 0.704 0.183
TF 0.682 0.194 0.712 0.201 0.697 0.212 0.674 0.245
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.567 0.156 0.666 0.164 0.665 0.172 0.678 0.194
FM 0.921 0.262 0.993 0.244 0.960 0.245 0.961 0.260
GA 0.356 0.126 0.423 0.132 0.444 0.137 0.511 0.159
IA (IM) 0.373 0.138 0.418 0.140 0.407 0.139 0.407 0.148
IA (SC) 0.471 0.216 0.567 0.215 0.621 0.223 0.737 0.253
IB 0.735 0.246 0.830 0.244 0.857 0.254 0.959 0.287
LM 0.479 0.153 0.538 0.155 0.540 0.160 0.559 0.176
TF 0.485 0.167 0.552 0.172 0.539 0.178 0.528 0.199
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.183 0.144 0.232 0.166 0.198 0.152 0.177 0.141
FM 0.299 0.240 0.348 0.252 0.289 0.223 0.255 0.205
GA 0.116 0.097 0.147 0.110 0.132 0.105 0.134 0.108
IA (IM) 0.121 0.103 0.145 0.111 0.122 0.098 0.107 0.090
IA (SC) 0.156 0.144 0.198 0.159 0.186 0.154 0.194 0.161
IB 0.241 0.201 0.290 0.215 0.256 0.201 0.251 0.202
LM 0.155 0.127 0.187 0.137 0.160 0.124 0.146 0.118
TF 0.160 0.138 0.194 0.149 0.162 0.133 0.139 0.119
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.355 0.339 1.545 0.465 1.510 0.608 1.575 0.915
logit 3M DP -0.669 0.283 -0.817 0.277 -0.838 0.269 -0.877 0.295
logit 6M DP -0.934 0.309 -0.968 0.345 -0.972 0.324 -0.996 0.351
logit ALT>ULN 1.958 0.531 2.192 0.696 2.132 0.817 2.235 1.160
logit ALT>3xULN 1.415 0.525 1.715 0.624 1.670 0.725 1.774 0.966
logit ALT>5xULN 0.472 0.410 0.603 0.480 0.509 0.455 0.479 0.482
Between-study treatment effects sd 0.226 0.038 0.255 0.041 0.367 0.050 0.879 0.097
Residual deviance 161.2 17.5 161.2 17.5 163.0 17.9 160.7 18.2
RANDOM EFFECTS RANDOM MAPPINGS 1 GROUP MODEL 3
All correlations = 0 All correlations = 0.3 All correlations = 0.6 All correlations = 0.9
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.543 0.152 -0.487 0.133 -0.443 0.142 -0.373 0.151
FM -0.670 0.138 -0.715 0.142 -0.730 0.163 -0.707 0.224
GA -0.317 0.103 -0.298 0.095 -0.274 0.101 -0.241 0.111
IA (IM) -0.219 0.088 -0.230 0.083 -0.216 0.087 -0.204 0.097
IA (SC) -0.210 0.093 -0.251 0.095 -0.272 0.106 -0.310 0.146
IB -0.398 0.118 -0.421 0.123 -0.421 0.129 -0.439 0.157
LM -0.258 0.095 -0.271 0.089 -0.253 0.091 -0.241 0.099
TF -0.325 0.132 -0.325 0.124 -0.296 0.120 -0.265 0.130
Log odds ratio of avoiding relapse (vs placebo)
DF -0.648 0.145 -0.741 0.149 -0.832 0.158 -0.932 0.184
FM -0.905 0.151 -0.867 0.148 -0.856 0.158 -0.860 0.199
GA -0.488 0.137 -0.552 0.140 -0.647 0.134 -0.800 0.147
IA (IM) -0.352 0.123 -0.375 0.117 -0.388 0.118 -0.412 0.130
IA (SC) -0.565 0.184 -0.547 0.166 -0.573 0.161 -0.615 0.183
IB -0.678 0.170 -0.715 0.163 -0.761 0.170 -0.859 0.202
LM -0.374 0.113 -0.400 0.111 -0.401 0.115 -0.423 0.130
TF -0.442 0.160 -0.490 0.161 -0.512 0.168 -0.542 0.194
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.401 0.148 -0.349 0.132 -0.314 0.129 -0.274 0.129
FM -0.386 0.128 -0.408 0.133 -0.398 0.138 -0.397 0.169
GA -0.235 0.110 -0.213 0.097 -0.193 0.094 -0.184 0.098
IA (IM) -0.206 0.110 -0.201 0.102 -0.195 0.102 -0.191 0.108
IA (SC) -0.328 0.169 -0.378 0.194 -0.489 0.241 -0.583 0.294
IB -0.348 0.150 -0.351 0.145 -0.350 0.151 -0.377 0.176
LM -0.289 0.124 -0.285 0.119 -0.283 0.124 -0.269 0.128
TF -0.287 0.141 -0.275 0.129 -0.259 0.127 -0.239 0.129
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.454 0.201 -0.396 0.167 -0.356 0.155 -0.318 0.141
FM -0.483 0.147 -0.511 0.157 -0.508 0.157 -0.511 0.184
GA -0.286 0.142 -0.259 0.122 -0.241 0.116 -0.222 0.110
IA (IM) -0.253 0.127 -0.244 0.117 -0.245 0.118 -0.243 0.121
IA (SC) -0.302 0.154 -0.312 0.148 -0.319 0.147 -0.344 0.159
IB -0.763 0.384 -0.710 0.359 -0.836 0.391 -0.984 0.402
LM -0.355 0.144 -0.352 0.141 -0.352 0.141 -0.346 0.142
TF -0.377 0.285 -0.359 0.241 -0.357 0.252 -0.344 0.257
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.585 0.171 0.671 0.178 0.669 0.181 0.698 0.201
FM 1.292 0.288 1.240 0.271 1.233 0.257 1.210 0.256
GA 0.310 0.126 0.333 0.130 0.310 0.127 0.331 0.140
IA (IM) 0.533 0.177 0.557 0.167 0.583 0.169 0.591 0.188
IA (SC) 1.130 0.451 1.030 0.398 1.133 0.366 1.269 0.363
IB 1.329 0.330 1.296 0.313 1.304 0.297 1.336 0.317
LM 0.678 0.145 0.700 0.141 0.721 0.144 0.739 0.164
TF 0.758 0.211 0.812 0.210 0.875 0.212 0.900 0.239
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.462 0.175 0.531 0.178 0.539 0.179 0.546 0.179
FM 1.135 0.282 1.057 0.262 1.079 0.248 1.063 0.230
GA 0.374 0.175 0.398 0.171 0.402 0.171 0.412 0.172
IA (IM) 0.333 0.171 0.359 0.169 0.366 0.169 0.368 0.162
IA (SC) 0.571 0.397 0.612 0.383 0.732 0.484 0.874 0.612
IB 0.858 0.577 0.882 0.530 0.998 0.641 1.172 0.817
LM 0.614 0.245 0.663 0.241 0.753 0.241 0.827 0.219
TF 0.381 0.171 0.416 0.170 0.403 0.168 0.398 0.170
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.189 0.162 0.198 0.156 0.175 0.139 0.129 0.125
FM 0.345 0.284 0.333 0.258 0.303 0.237 0.227 0.213
GA 0.141 0.134 0.140 0.124 0.124 0.108 0.095 0.099
IA (IM) 0.128 0.124 0.132 0.121 0.120 0.108 0.091 0.097
IA (SC) 0.203 0.229 0.208 0.211 0.216 0.221 0.188 0.228
IB 0.306 0.336 0.299 0.296 0.295 0.303 0.252 0.303
LM 0.165 0.149 0.173 0.148 0.157 0.132 0.119 0.119
TF 0.189 0.217 0.187 0.187 0.170 0.175 0.128 0.160
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.972 0.753 1.961 0.699 2.270 0.933 2.804 1.377
logit 3M DP -1.078 0.458 -0.988 0.390 -1.046 0.415 -1.115 0.464
logit 6M DP -1.375 0.672 -1.225 0.519 -1.316 0.558 -1.435 0.598
logit ALT>ULN 2.748 1.023 2.621 0.891 2.933 1.130 3.442 1.613
logit ALT>3xULN 1.907 0.803 1.907 0.751 2.216 1.007 2.635 1.338
logit ALT>5xULN 0.681 0.592 0.650 0.539 0.665 0.596 0.572 0.614
Between-study treatment effects sd 0.160 0.044 0.186 0.043 0.250 0.052 0.613 0.097
Between-treatment mapping sd 0.571 0.186 0.499 0.182 0.566 0.161 0.617 0.162
Residual deviance 154.8 16.9 157.3 16.9 161.0 17.5 162.9 18.4
The table below shows the results when a vague prior is assigned to each outcome’s “propensity to correlate” (as defined in II.4.4.1.3). Initially a uniform prior on the interval (-1,1) was attempted; however, in some cases this failed to converge well, so a uniform prior on the interval (-0.9,0.9) was used instead. Two variations are presented: in one, the between- and within-study correlation propensities are assumed equal; in the other, they are allowed to differ.
RANDOM EFFECTS MODEL 3
Fixed mappings Random mappings
Vague prior on each outcome’s correlation propensity:
between-study = within-study | between-study ≠ within-study | between-study = within-study | between-study ≠ within-study
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.395 0.112 -0.416 0.116 -0.450 0.157 -0.468 0.151
FM -0.591 0.140 -0.592 0.141 -0.652 0.169 -0.670 0.156
GA -0.247 0.086 -0.267 0.087 -0.280 0.114 -0.295 0.111
IA (IM) -0.246 0.079 -0.247 0.081 -0.232 0.093 -0.230 0.088
IA (SC) -0.403 0.116 -0.401 0.122 -0.301 0.120 -0.294 0.122
IB -0.500 0.126 -0.502 0.126 -0.445 0.137 -0.433 0.131
LM -0.308 0.083 -0.306 0.086 -0.280 0.097 -0.289 0.096
TF -0.318 0.106 -0.319 0.109 -0.309 0.132 -0.325 0.127
Log odds ratio of avoiding relapse (vs placebo)
DF -0.595 0.118 -0.623 0.119 -0.652 0.141 -0.677 0.135
FM -0.893 0.125 -0.889 0.126 -0.895 0.136 -0.897 0.137
GA -0.373 0.107 -0.402 0.105 -0.465 0.140 -0.498 0.133
IA (IM) -0.372 0.095 -0.371 0.099 -0.359 0.109 -0.364 0.112
IA (SC) -0.614 0.154 -0.609 0.164 -0.596 0.174 -0.606 0.173
IB -0.758 0.145 -0.759 0.145 -0.709 0.158 -0.717 0.159
LM -0.466 0.088 -0.460 0.094 -0.403 0.103 -0.398 0.103
TF -0.480 0.130 -0.478 0.134 -0.460 0.148 -0.470 0.149
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.324 0.104 -0.332 0.100 -0.367 0.135 -0.374 0.131
FM -0.483 0.130 -0.469 0.117 -0.407 0.120 -0.399 0.119
GA -0.203 0.078 -0.214 0.075 -0.219 0.094 -0.227 0.096
IA (IM) -0.202 0.073 -0.197 0.070 -0.206 0.092 -0.203 0.094
IA (SC) -0.334 0.120 -0.323 0.114 -0.376 0.178 -0.366 0.167
IB -0.412 0.130 -0.403 0.120 -0.384 0.141 -0.371 0.143
LM -0.254 0.080 -0.245 0.076 -0.285 0.106 -0.282 0.108
TF -0.261 0.097 -0.254 0.093 -0.279 0.120 -0.281 0.123
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.371 0.135 -0.395 0.133 -0.403 0.174 -0.417 0.175
FM -0.551 0.165 -0.557 0.153 -0.500 0.150 -0.500 0.146
GA -0.233 0.099 -0.254 0.098 -0.257 0.125 -0.267 0.125
IA (IM) -0.231 0.089 -0.234 0.088 -0.243 0.112 -0.248 0.116
IA (SC) -0.380 0.144 -0.383 0.143 -0.338 0.152 -0.336 0.155
IB -0.474 0.174 -0.481 0.165 -0.652 0.333 -0.686 0.367
LM -0.290 0.101 -0.290 0.097 -0.339 0.133 -0.340 0.134
TF -0.300 0.124 -0.303 0.122 -0.340 0.212 -0.353 0.231
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.868 0.194 0.914 0.211 0.742 0.217 0.736 0.231
FM 1.307 0.249 1.308 0.270 1.271 0.292 1.271 0.284
GA 0.543 0.165 0.588 0.173 0.392 0.163 0.389 0.163
IA (IM) 0.544 0.157 0.547 0.170 0.565 0.183 0.555 0.182
IA (SC) 0.905 0.273 0.903 0.296 1.026 0.383 1.055 0.395
IB 1.114 0.273 1.122 0.294 1.280 0.358 1.282 0.343
LM 0.681 0.150 0.675 0.162 0.694 0.161 0.677 0.156
TF 0.702 0.205 0.702 0.218 0.808 0.245 0.786 0.237
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.686 0.181 0.715 0.187 0.573 0.200 0.545 0.191
FM 1.034 0.250 1.026 0.255 1.086 0.284 1.048 0.277
GA 0.431 0.152 0.463 0.157 0.423 0.187 0.410 0.182
IA (IM) 0.431 0.143 0.429 0.147 0.379 0.172 0.356 0.163
IA (SC) 0.718 0.249 0.710 0.261 0.685 0.395 0.641 0.371
IB 0.884 0.265 0.882 0.271 0.913 0.502 0.859 0.463
LM 0.542 0.156 0.533 0.160 0.640 0.233 0.608 0.230
TF 0.552 0.175 0.547 0.180 0.451 0.188 0.427 0.179
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.240 0.174 0.237 0.173 0.226 0.166 0.205 0.163
FM 0.363 0.257 0.340 0.244 0.370 0.267 0.331 0.256
GA 0.151 0.117 0.154 0.119 0.156 0.130 0.145 0.126
IA (IM) 0.151 0.115 0.142 0.110 0.147 0.123 0.130 0.115
IA (SC) 0.252 0.195 0.235 0.185 0.250 0.224 0.221 0.215
IB 0.309 0.229 0.293 0.220 0.330 0.283 0.294 0.275
LM 0.188 0.136 0.175 0.130 0.191 0.149 0.166 0.139
TF 0.194 0.147 0.182 0.141 0.199 0.173 0.183 0.176
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.589 0.425 1.582 0.426 1.820 0.584 1.845 0.613
logit 3M DP -0.856 0.294 -0.834 0.278 -0.996 0.361 -0.969 0.339
logit 6M DP -0.974 0.360 -0.990 0.356 -1.174 0.483 -1.207 0.560
logit ALT>ULN 2.348 0.779 2.350 0.798 2.639 0.915 2.555 0.805
logit ALT>3xULN 1.861 0.691 1.844 0.696 1.997 0.862 1.834 0.684
logit ALT>5xULN 0.660 0.528 0.614 0.495 0.730 0.562 0.636 0.521
Between-study treatment effects sd 0.299 0.059 0.331 0.072 0.246 0.064 0.261 0.076
Between-treatment mapping sd N/A N/A 0.370 0.258 0.401 0.182 0.429 0.182
Residual deviance 137.6 19.6 135.1 20.3 136.8 18.9 133.4 19.9
Between-study correlation propensity
log ARR 0.328 0.294 0.152 0.422 0.274 0.330 0.169 0.428
logit avoid relapse -0.344 0.211 -0.453 0.295 -0.216 0.345 -0.309 0.451
logit 3M DP -0.028 0.441 -0.427 0.410 -0.061 0.437 -0.326 0.461
logit 6M DP 0.179 0.443 -0.262 0.475 0.110 0.462 -0.177 0.497
logit ALT>ULN 0.404 0.246 0.440 0.275 0.401 0.287 0.332 0.388
logit ALT>3xULN 0.360 0.301 0.308 0.425 0.357 0.346 0.195 0.494
logit ALT>5xULN 0.465 0.368 0.211 0.522 0.454 0.369 0.164 0.525
Within-study correlation propensity
log ARR 0.328 0.294 0.287 0.423 0.274 0.330 0.237 0.417
logit avoid relapse -0.344 0.211 -0.244 0.356 -0.216 0.345 -0.123 0.433
logit 3M DP -0.028 0.441 0.243 0.456 -0.061 0.437 0.116 0.470
logit 6M DP 0.179 0.443 0.279 0.474 0.110 0.462 0.147 0.495
logit ALT>ULN 0.404 0.246 0.272 0.420 0.401 0.287 0.200 0.468
logit ALT>3xULN 0.360 0.301 0.187 0.446 0.357 0.346 0.202 0.455
logit ALT>5xULN 0.465 0.368 0.281 0.505 0.454 0.369 0.277 0.489
The graphs below show the SUCRA statistic by outcome for the models above, based on rankings of the (population average) treatment effects.
[Figure grid: SUCRA by outcome under each assumption — assumed correlation coefficients 0, 0.3, 0.6 and 0.9, plus vague priors on each outcome’s correlation propensity (between-study = within-study, and between-study ≠ within-study) — shown for both fixed and random mappings]
Sensitivity to random effects standard deviation prior
The two tables below show the posterior mean and standard deviation of the key parameters in the treatment effects module (Model 3, random effects) with an alternative uniform prior on the random effects standard deviation, σ ~ Uniform(0,2) (as recommended by the NICE Decision Support Unit 78), alongside σ ~ Uniform(0,10) (as per the main results in II.6.1.4). The tables use one mapping group and three mapping groups respectively. The treatment-outcome combinations with no data (instead estimated via the mappings) are shown in grey.
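The practical difference between the two priors can be quantified directly: a Uniform(0,10) prior places most of its mass on heterogeneity values that would be extreme for treatment effects on the log scale, whereas Uniform(0,2) is far less diffuse. A quick illustrative calculation (the threshold of 1 is an arbitrary choice for this sketch, not a value used in the thesis):

```python
def prior_mass_above(threshold, upper):
    """P(sigma > threshold) when sigma ~ Uniform(0, upper)."""
    return max(0.0, (upper - threshold) / upper)

# A between-study sd above ~1 on the log scale would be extreme
# for relative treatment effects.
p_wide   = prior_mass_above(1.0, 10.0)  # Uniform(0,10) prior
p_narrow = prior_mass_above(1.0, 2.0)   # Uniform(0,2) prior
```

Here p_wide is 0.9 and p_narrow is 0.5; as the tables below suggest, the data dominate either prior, so the posterior summaries under the two choices are very similar.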
RANDOM EFFECTS 1 GROUP MODEL 3
Fixed mappings Random mappings
σ ~ Uniform(0,2) σ ~ Uniform(0,10) σ ~ Uniform(0,2) σ ~ Uniform(0,10)
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.428 0.119 -0.437 0.120 -0.445 0.135 -0.443 0.142
FM -0.617 0.159 -0.626 0.159 -0.720 0.161 -0.730 0.163
GA -0.285 0.087 -0.289 0.088 -0.247 0.105 -0.274 0.101
IA (IM) -0.260 0.086 -0.265 0.087 -0.202 0.087 -0.216 0.087
IA (SC) -0.395 0.123 -0.399 0.125 -0.232 0.108 -0.272 0.106
IB -0.548 0.140 -0.553 0.141 -0.392 0.124 -0.421 0.129
LM -0.344 0.096 -0.351 0.098 -0.235 0.095 -0.253 0.091
TF -0.349 0.122 -0.354 0.124 -0.276 0.119 -0.296 0.120
Log odds ratio of avoiding relapse (vs placebo)
DF -0.608 0.139 -0.607 0.143 -0.838 0.152 -0.832 0.158
FM -0.876 0.172 -0.870 0.177 -0.856 0.158 -0.856 0.158
GA -0.405 0.108 -0.403 0.111 -0.664 0.140 -0.647 0.134
IA (IM) -0.371 0.109 -0.369 0.111 -0.389 0.121 -0.388 0.118
IA (SC) -0.564 0.164 -0.558 0.165 -0.580 0.166 -0.573 0.161
IB -0.781 0.172 -0.772 0.176 -0.764 0.172 -0.761 0.170
LM -0.489 0.112 -0.488 0.116 -0.395 0.117 -0.401 0.115
TF -0.496 0.154 -0.492 0.155 -0.518 0.170 -0.512 0.168
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.348 0.127 -0.359 0.130 -0.324 0.129 -0.314 0.129
FM -0.500 0.169 -0.514 0.174 -0.405 0.139 -0.398 0.138
GA -0.233 0.093 -0.239 0.095 -0.192 0.093 -0.193 0.094
IA (IM) -0.212 0.088 -0.218 0.090 -0.204 0.106 -0.195 0.102
IA (SC) -0.324 0.136 -0.332 0.140 -0.521 0.245 -0.489 0.241
IB -0.447 0.163 -0.458 0.168 -0.351 0.148 -0.350 0.151
LM -0.280 0.105 -0.289 0.108 -0.292 0.125 -0.283 0.124
TF -0.283 0.120 -0.292 0.124 -0.265 0.127 -0.259 0.127
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.405 0.148 -0.414 0.148 -0.360 0.154 -0.356 0.155
FM -0.580 0.190 -0.591 0.191 -0.509 0.157 -0.508 0.157
GA -0.271 0.108 -0.275 0.107 -0.237 0.119 -0.241 0.116
IA (IM) -0.247 0.100 -0.251 0.101 -0.251 0.122 -0.245 0.118
IA (SC) -0.375 0.153 -0.380 0.153 -0.312 0.145 -0.319 0.147
IB -0.524 0.197 -0.530 0.195 -0.880 0.402 -0.836 0.391
LM -0.326 0.120 -0.333 0.121 -0.362 0.141 -0.352 0.141
TF -0.333 0.148 -0.338 0.148 -0.379 0.299 -0.357 0.252
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.869 0.178 0.857 0.180 0.642 0.174 0.669 0.181
FM 1.258 0.249 1.235 0.250 1.231 0.258 1.233 0.257
GA 0.578 0.140 0.568 0.139 0.287 0.120 0.310 0.127
IA (IM) 0.532 0.154 0.523 0.153 0.589 0.172 0.583 0.169
IA (SC) 0.815 0.251 0.797 0.248 1.164 0.372 1.133 0.366
IB 1.125 0.262 1.099 0.262 1.295 0.304 1.304 0.297
LM 0.701 0.152 0.690 0.153 0.725 0.143 0.721 0.144
TF 0.710 0.213 0.697 0.212 0.890 0.212 0.875 0.212
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.675 0.171 0.665 0.172 0.526 0.178 0.539 0.179
FM 0.979 0.246 0.960 0.245 1.083 0.254 1.079 0.248
GA 0.452 0.136 0.444 0.137 0.391 0.170 0.402 0.171
IA (IM) 0.415 0.139 0.407 0.139 0.359 0.169 0.366 0.169
IA (SC) 0.635 0.223 0.621 0.223 0.736 0.525 0.732 0.484
IB 0.878 0.255 0.857 0.254 1.016 0.701 0.998 0.641
LM 0.548 0.158 0.540 0.160 0.759 0.240 0.753 0.241
TF 0.549 0.179 0.539 0.178 0.393 0.167 0.403 0.168
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.204 0.153 0.198 0.152 0.153 0.143 0.175 0.139
FM 0.298 0.227 0.289 0.223 0.273 0.251 0.303 0.237
GA 0.136 0.106 0.132 0.105 0.110 0.117 0.124 0.108
IA (IM) 0.125 0.100 0.122 0.098 0.105 0.109 0.120 0.108
IA (SC) 0.192 0.157 0.186 0.154 0.193 0.237 0.216 0.221
IB 0.266 0.206 0.256 0.201 0.267 0.324 0.295 0.303
LM 0.165 0.126 0.160 0.124 0.141 0.142 0.157 0.132
TF 0.167 0.134 0.162 0.133 0.152 0.179 0.170 0.175
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.543 0.621 1.510 0.608 2.589 1.148 2.270 0.933
logit 3M DP -0.830 0.273 -0.838 0.269 -1.241 0.556 -1.046 0.415
logit 6M DP -0.973 0.338 -0.972 0.324 -1.541 0.710 -1.316 0.558
logit ALT>ULN 2.208 0.862 2.132 0.817 3.301 1.351 2.933 1.130
logit ALT>3xULN 1.730 0.766 1.670 0.725 2.492 1.183 2.216 1.007
logit ALT>5xULN 0.535 0.486 0.509 0.455 0.644 0.659 0.665 0.596
Between-study treatment effects sd 0.367 0.050 0.367 0.050 0.242 0.050 0.250 0.052
Between-treatment mapping sd N/A N/A N/A N/A 0.614 0.164 0.566 0.161
Residual deviance 162.8 17.9 163.0 17.9 161.4 17.4 161.0 17.5
RANDOM EFFECTS 3 GROUPS MODEL 3
Fixed mappings Random mappings
σ ~ Uniform(0,2) σ ~ Uniform(0,10) σ ~ Uniform(0,2) σ ~ Uniform(0,10)
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.666 0.132 -0.657 0.132 -0.670 0.137 -0.658 0.140
FM -0.774 0.147 -0.764 0.148 -0.759 0.152 -0.759 0.153
GA -0.474 0.101 -0.466 0.100 -0.474 0.107 -0.467 0.108
IA (IM) -0.275 0.093 -0.270 0.092 -0.271 0.095 -0.270 0.094
IA (SC) -0.339 0.116 -0.335 0.114 -0.330 0.118 -0.327 0.117
IB -0.549 0.120 -0.541 0.119 -0.505 0.126 -0.505 0.126
LM -0.248 0.097 -0.244 0.096 -0.238 0.095 -0.236 0.096
TF -0.387 0.144 -0.385 0.141 -0.395 0.148 -0.392 0.147
Log odds ratio of avoiding relapse (vs placebo)
DF -0.741 0.141 -0.748 0.143 -0.728 0.145 -0.739 0.149
FM -0.861 0.154 -0.870 0.154 -0.875 0.160 -0.875 0.159
GA -0.528 0.114 -0.533 0.115 -0.509 0.120 -0.515 0.124
IA (IM) -0.308 0.107 -0.309 0.106 -0.301 0.110 -0.304 0.110
IA (SC) -0.382 0.139 -0.386 0.140 -0.371 0.148 -0.377 0.151
IB -0.614 0.143 -0.619 0.142 -0.644 0.156 -0.644 0.158
LM -0.276 0.107 -0.279 0.107 -0.283 0.111 -0.286 0.111
TF -0.433 0.160 -0.439 0.160 -0.419 0.162 -0.421 0.163
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.509 0.170 -0.499 0.170 -0.510 0.173 -0.507 0.173
FM -0.403 0.165 -0.397 0.166 -0.380 0.168 -0.391 0.170
GA -0.482 0.153 -0.475 0.151 -0.464 0.156 -0.464 0.158
IA (IM) -0.264 0.157 -0.260 0.156 -0.297 0.170 -0.294 0.169
IA (SC) -0.593 0.224 -0.594 0.225 -0.661 0.270 -0.656 0.259
IB -0.718 0.249 -0.708 0.240 -0.639 0.229 -0.648 0.231
LM -0.425 0.151 -0.420 0.150 -0.402 0.155 -0.409 0.154
TF -0.343 0.252 -0.341 0.250 -0.350 0.242 -0.348 0.245
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.523 0.204 -0.514 0.206 -0.499 0.205 -0.494 0.198
FM -0.412 0.181 -0.406 0.181 -0.409 0.182 -0.412 0.183
GA -0.496 0.185 -0.491 0.187 -0.465 0.185 -0.458 0.178
IA (IM) -0.271 0.163 -0.267 0.162 -0.289 0.163 -0.281 0.158
IA (SC) -0.594 0.230 -0.595 0.230 -0.529 0.220 -0.530 0.215
IB -0.749 0.312 -0.741 0.313 -0.875 0.372 -0.833 0.358
LM -0.433 0.166 -0.428 0.166 -0.433 0.168 -0.431 0.163
TF -0.358 0.288 -0.359 0.295 -0.350 0.304 -0.341 0.302
Log odds ratio of ALT above upper limit of normal range (vs placebo)
         Mean    sd      Mean    sd      Mean    sd      Mean    sd
DF       0.292   0.210   0.299   0.210   0.289   0.213   0.301   0.209
FM       1.379   0.283   1.388   0.276   1.368   0.277   1.380   0.282
GA      -0.170   0.214  -0.170   0.214  -0.188   0.221  -0.183   0.215
IA (IM)  0.577   0.210   0.582   0.209   0.582   0.208   0.590   0.206
IA (SC)  1.099   0.433   1.067   0.427   1.077   0.425   1.075   0.427
IB       0.929   0.377   0.935   0.374   0.959   0.373   0.958   0.369
LM       0.765   0.164   0.769   0.162   0.756   0.159   0.761   0.157
TF       0.711   0.247   0.716   0.243   0.774   0.242   0.781   0.244
Appendix C
420
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
         Mean    sd      Mean    sd      Mean    sd      Mean    sd
DF       0.232   0.172   0.237   0.173   0.209   0.167   0.215   0.166
FM       1.082   0.268   1.086   0.277   1.111   0.289   1.072   0.284
GA      -0.129   0.168  -0.127   0.167  -0.119   0.158  -0.113   0.152
IA (IM)  0.455   0.188   0.458   0.188   0.402   0.190   0.397   0.185
IA (SC)  0.878   0.411   0.852   0.415   0.800   0.520   0.774   0.496
IB       0.746   0.363   0.749   0.364   0.893   0.652   0.828   0.558
LM       0.609   0.189   0.610   0.191   0.681   0.245   0.657   0.241
TF       0.551   0.197   0.554   0.197   0.464   0.189   0.466   0.191
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
         Mean    sd      Mean    sd      Mean    sd      Mean    sd
DF       0.084   0.089   0.088   0.092   0.080   0.087   0.072   0.087
FM       0.419   0.290   0.424   0.293   0.441   0.284   0.371   0.310
GA      -0.054   0.085  -0.052   0.084  -0.060   0.093  -0.047   0.080
IA (IM)  0.171   0.131   0.174   0.133   0.169   0.132   0.145   0.137
IA (SC)  0.325   0.254   0.321   0.259   0.322   0.296   0.264   0.275
IB       0.276   0.222   0.282   0.232   0.362   0.394   0.287   0.322
LM       0.228   0.156   0.231   0.159   0.237   0.158   0.201   0.169
TF       0.211   0.158   0.213   0.158   0.222   0.181   0.187   0.181
Average mapping (reference outcome has constant mapping of 1)
                      Mean    sd      Mean    sd      Mean    sd      Mean    sd
log ARR               1       1       1       1       1       1       1       1
logit avoid relapse   1.159   0.332   1.188   0.344   1.237   0.379   1.235   0.363
logit 3M DP           1       1       1       1       1       1       1       1
logit 6M DP          -1.041   0.290  -1.047   0.315  -1.097   0.336  -1.061   0.327
logit ALT>ULN         1       1       1       1       1       1       1       1
logit ALT>3xULN       0.796   0.177   0.793   0.183   0.778   0.236   0.740   0.210
logit ALT>5xULN       0.300   0.194   0.302   0.196   0.317   0.202   0.258   0.205
Between-study treatment effects sd   0.277   0.053   0.274   0.052   0.261   0.052   0.262   0.053
Between-treatment mapping sd         N/A     N/A     N/A     N/A     0.292   0.177   0.264   0.170
Residual deviance                    162.3   17.9    163.0   17.9    161.6   17.7    162.1   17.8
2 Preference models
Sensitivity to priors
The two tables below show the posterior mean and standard deviation of the key parameters in the
ratings model (both PROTECT ratings datasets) under two alternative uniform priors on the ratings
standard deviation, σ_rat ~ Uniform(0, 2) and σ_rat ~ Uniform(0, 50), alongside
σ_rat ~ Uniform(0, 10) (the prior used for the main results in III.3.4.3).
FIXED PREFERENCES                                         σ_rat ~ Uniform(0,2)  σ_rat ~ Uniform(0,10)  σ_rat ~ Uniform(0,50)
                                                  Unit    Mean     sd           Mean     sd            Mean     sd
Preference weights
  Relapse                                      1 event    11.6%    6.6%         11.5%    6.5%          11.6%    6.6%
  Disability progression                       1 event    15.1%    6.2%         15.0%    6.2%          15.1%    6.1%
  PML                                          1 event    32.7%    6.5%         32.8%    6.4%          32.8%    6.4%
  Herpes reactivation                          1 event     8.5%    4.9%          8.6%    4.9%           8.5%    4.9%
  Liver enzyme elevation                       1 event     8.8%    5.0%          8.8%    5.0%           8.8%    5.0%
  Seizures                                     1 event     5.1%    3.2%          5.1%    3.2%           5.1%    3.2%
  Congenital abnormalities                     1 event     5.1%    3.2%          5.1%    3.3%           5.1%    3.2%
  Infusion/injection reactions                 1 event     4.6%    2.5%          4.7%    2.5%           4.6%    2.5%
  Allergic/hypersensitivity reactions          1 event     3.9%    3.1%          3.9%    3.1%           3.9%    3.1%
  Flu-like reactions                           1 event     4.1%    3.3%          4.1%    3.3%           4.1%    3.3%
  Administration (daily oral vs daily subcutaneous)  N/A   0.5%    0.2%          0.5%    0.2%           0.5%    0.2%
  Administration (monthly infusion vs daily subcutaneous)  N/A  0.3%  0.1%       0.3%    0.1%           0.3%    0.1%
  Administration (weekly intramuscular vs daily subcutaneous)  N/A  0.2%  0.1%   0.2%    0.1%           0.2%    0.1%
Ratings standard deviation                         N/A     1.17    0.05          1.17    0.05           1.17    0.05
Residual deviance                                  N/A   241.9    22.0         242.0    22.1          242.1    22.0
RANDOM PREFERENCES BY PARTICIPANT                         σ_rat ~ Uniform(0,2)  σ_rat ~ Uniform(0,10)  σ_rat ~ Uniform(0,50)
                                                  Unit    Mean     sd           Mean     sd            Mean     sd
Preference weights
  Relapse                                      1 event    12.0%    6.4%         12.0%    6.7%          12.1%    6.3%
  Disability progression                       1 event    15.9%    6.2%         15.9%    6.4%          16.1%    6.4%
  PML                                          1 event    32.6%    7.1%         32.6%    7.0%          32.6%    7.0%
  Herpes reactivation                          1 event     8.7%    4.8%          8.4%    4.6%           8.6%    4.5%
  Liver enzyme elevation                       1 event     8.9%    4.8%          9.0%    5.1%           8.7%    4.8%
  Seizures                                     1 event     5.0%    3.1%          5.2%    3.1%           5.3%    3.4%
  Congenital abnormalities                     1 event     5.0%    3.1%          4.9%    2.8%           5.1%    3.1%
  Infusion/injection reactions                 1 event     4.3%    2.4%          4.4%    2.4%           4.2%    2.4%
  Allergic/hypersensitivity reactions          1 event     3.5%    2.8%          3.3%    2.5%           3.2%    2.3%
  Flu-like reactions                           1 event     3.6%    2.8%          3.8%    3.0%           3.6%    2.7%
  Administration (daily oral vs daily subcutaneous)  N/A   0.6%    0.2%          0.6%    0.2%           0.6%    0.2%
  Administration (monthly infusion vs daily subcutaneous)  N/A  0.4%  0.1%       0.4%    0.1%           0.4%    0.1%
  Administration (weekly intramuscular vs daily subcutaneous)  N/A  0.2%  0.1%   0.2%    0.1%           0.2%    0.1%
Ratings standard deviation                         N/A     1.02    0.06          1.01    0.06           1.02    0.06
Proportional between-participant preference
standard deviation                                 N/A     0.32    0.04          0.33    0.04           0.33    0.04
Residual deviance                                  N/A   242.0    21.9         241.9    21.9          242.0    22.1
The table below shows the posterior mean and standard deviation of the key parameters in the
ratings model (both PROTECT ratings datasets, random preferences by participant) under two
alternative uniform priors on the random-preferences standard deviation, σ_pref ~ Uniform(0, 2) and
σ_pref ~ Uniform(0, 50), alongside σ_pref ~ Uniform(0, 10) (the prior used for the main results in III.3.4.3).
RANDOM PREFERENCES BY PARTICIPANT                         σ_pref ~ Uniform(0,2)  σ_pref ~ Uniform(0,10)  σ_pref ~ Uniform(0,50)
                                                  Unit    Mean     sd            Mean     sd             Mean     sd
Preference weights
  Relapse                                      1 event    11.3%    6.3%          12.0%    6.7%           12.5%    6.4%
  Disability progression                       1 event    14.9%    5.8%          15.9%    6.4%           16.0%    6.0%
  PML                                          1 event    33.4%    6.9%          32.6%    7.0%           32.3%    7.0%
  Herpes reactivation                          1 event     8.8%    4.8%           8.4%    4.6%            8.6%    4.8%
  Liver enzyme elevation                       1 event     9.6%    5.2%           9.0%    5.1%            8.7%    4.6%
  Seizures                                     1 event     5.3%    3.1%           5.2%    3.1%            5.0%    2.9%
  Congenital abnormalities                     1 event     5.0%    3.0%           4.9%    2.8%            4.9%    2.9%
  Infusion/injection reactions                 1 event     4.2%    2.2%           4.4%    2.4%            4.3%    2.2%
  Allergic/hypersensitivity reactions          1 event     3.4%    2.5%           3.3%    2.5%            3.4%    2.5%
  Flu-like reactions                           1 event     3.5%    2.6%           3.8%    3.0%            3.7%    2.7%
  Administration (daily oral vs daily subcutaneous)  N/A   0.6%    0.3%           0.6%    0.2%            0.6%    0.2%
  Administration (monthly infusion vs daily subcutaneous)  N/A  0.4%  0.2%        0.4%    0.1%            0.4%    0.2%
  Administration (weekly intramuscular vs daily subcutaneous)  N/A  0.2%  0.1%    0.2%    0.1%            0.2%    0.1%
Ratings standard deviation                         N/A     1.02    0.06           1.01    0.06            1.02    0.06
Proportional between-participant preference
standard deviation                                 N/A     0.33    0.04           0.33    0.04            0.32    0.04
Residual deviance                                  N/A   242.0    22.1          241.9    21.9           242.0    22.0
The table below shows the posterior mean and standard deviation of the key parameters in the
preference meta-analysis model (random preferences by study) under two alternative uniform priors
on the random-preferences standard deviation, σ_pref ~ Uniform(0, 2) and σ_pref ~ Uniform(0, 50),
alongside σ_pref ~ Uniform(0, 10) (the prior used for the main results in III.5.6.1).
RANDOM PREFERENCES BY STUDY                               σ_pref ~ Uniform(0,2)  σ_pref ~ Uniform(0,10)  σ_pref ~ Uniform(0,50)
                                                  Unit    Mean     SE            Mean     SE             Mean     SE
Utility coefficients on choice scale (i.e. effect on log odds of choice)
  Relapse rate                          1 relapse/year   -1.486    0.511         -1.474    0.523         -1.362    0.453
  Disability progression                     100% risk   -3.199    1.681         -3.195    1.606         -3.121    1.737
  Daily oral vs daily subcutaneous                 N/A    2.719    0.752          2.718    0.775          1.593    0.584
  Monthly infusion vs daily subcutaneous           N/A    0.611    0.287          0.610    0.291          4.004    1.051
  Weekly intramuscular vs daily subcutaneous       N/A    0.529    0.433          0.517    0.428          0.362    0.276
Normalised preference weights
  Relapse rate                          1 relapse/year   17.8%     5.3%          20.3%     6.0%          13.3%     4.1%
  Disability progression                     100% risk   36.7%     9.5%          42.2%     9.7%          29.4%     8.9%
  Daily oral vs daily subcutaneous                 N/A   32.4%     6.9%          37.5%     8.5%          15.2%     3.6%
  Monthly infusion vs daily subcutaneous           N/A    7.2%     2.6%           8.4%     3.5%          38.7%     6.5%
  Weekly intramuscular vs daily subcutaneous       N/A    6.0%     4.1%           7.1%     5.6%           3.4%     2.0%
Between-study proportional preference
standard deviation                                 N/A    0.64     0.16           0.65     0.17           0.66     0.16
Residual deviance                                  N/A   45.6      6.5           45.6      6.5           45.7      6.6
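On the choice scale, the normalised weights above are in essence each coefficient's absolute swing over the unit shown, expressed as a share of the total swing. A sketch of that normalisation (the coefficients here are illustrative round numbers, not the posterior draws, so the resulting weights only roughly resemble the table):

```python
# Convert utility coefficients (per the units shown in the table) into
# normalised preference weights. Values are illustrative, not posterior draws.
coefs = {
    "relapse rate (1 relapse/year)": -1.47,
    "disability progression (100% risk)": -3.20,
    "daily oral vs daily subcutaneous": 2.72,
    "monthly infusion vs daily subcutaneous": 0.61,
    "weekly intramuscular vs daily subcutaneous": 0.52,
}

total = sum(abs(v) for v in coefs.values())
weights = {k: abs(v) / total for k, v in coefs.items()}
# weights sum to 1 by construction; disability progression carries the most
```

In the Bayesian setting this normalisation is applied draw by draw, which is why the weights carry posterior standard errors of their own.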
The table below shows the posterior mean and standard deviation of the key parameters in the full
preference model (random preferences by study) under two alternative uniform priors on the
random-preferences standard deviation, σ_pref ~ Uniform(0, 2) and σ_pref ~ Uniform(0, 50),
alongside σ_pref ~ Uniform(0, 10) (the prior used for the main results in III.5.6.1).
RANDOM PREFERENCES BY STUDY                               σ_pref ~ Uniform(0,2)  σ_pref ~ Uniform(0,10)  σ_pref ~ Uniform(0,50)
                                                  Unit    Mean     SE            Mean     SE             Mean     SE
Utility coefficients on choice scale (i.e. effect on log odds of choice)
  Relapse rate                          1 relapse/year    -1.86     1.02          -1.62     0.62          -1.77     0.64
  Disability progression                     100% risk    -7.31     2.23          -7.26     2.21          -7.23     2.16
  PML                                        100% risk  -244.5     75.5         -245.3     75.3         -245.6     75.3
  Liver enzyme elevation                     100% risk   -22.75    25.87         -21.22    23.23         -20.83    24.34
  Allergic/hypersensitivity reactions        100% risk    -6.31     8.23          -5.92     7.22          -5.95     7.47
  Serious allergic reactions                 100% risk   -39.56     4.40         -39.48     4.39         -39.48     4.39
  Depression                                 100% risk    -5.10     0.89          -5.10     0.88          -5.08     0.88
  Infusion/injection reactions               100% risk   -20.85    35.38         -19.31    35.30         -19.30    32.32
  Daily oral vs daily subcutaneous                 N/A    -2.79     0.67          -2.72     0.64          -2.75     0.64
  Monthly infusion vs daily subcutaneous           N/A    -0.76     0.33          -0.72     0.31          -0.74     0.30
  Weekly intramuscular vs daily subcutaneous       N/A    -0.70     0.44          -0.66     0.43          -0.69     0.43
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
  Relapse rate                          1 relapse/year     7.0%     4.4%           6.5%     3.6%           7.2%     4.3%
  Disability progression                     100% risk    27.9%    13.4%          28.8%    13.5%          29.0%    13.4%
  Liver enzyme elevation                     100% risk    54.2%    20.4%          53.7%    20.3%          52.6%    20.6%
  Daily oral vs daily subcutaneous                 N/A    10.8%     5.4%          11.0%     5.4%          11.2%     4.0%
  Monthly infusion vs daily subcutaneous           N/A     2.9%     1.7%           2.9%     1.7%           6.4%    13.4%
  Weekly intramuscular vs daily subcutaneous       N/A     2.6%     2.0%           2.6%     2.0%           6.0%    20.6%
Ratings standard deviation                         N/A     1.19     0.06           1.20     0.06           1.19     0.06
Proportional between-study preference
standard deviation                                 N/A     0.59     0.09           0.58     0.09           0.58     0.09
Ratings model residual deviance                    N/A   230.0     21.5          229.9     21.4          230.1     21.5
Choice model residual deviance                     N/A    94.4      3.6           94.4      3.6           94.4      3.6
Preference synthesis residual deviance             N/A    45.6      6.5           45.8      6.6           45.6      6.5
Total residual deviance                            N/A   370.0     22.8          370.1     22.7          370.1     22.7
The table below shows the posterior mean and standard deviation of the key parameters in the full
preference model (random preferences by study) under an alternative prior on the utility
coefficients, eg_ω ~ N+(0, 10000) (a folded Normal distribution; see II.4.8), alongside
eg_ω ~ Gamma(1, 0.01) (the prior used for the main results in III.5.6.1).
RANDOM PREFERENCES BY STUDY                               eg_ω ~ Gamma(1, 0.01)  eg_ω ~ N+(0, 10000)
                                                  Unit    Mean     SE            Mean     SE
Utility coefficients on choice scale (i.e. effect on log odds of choice)
  Relapse rate                          1 relapse/year    -1.62     0.62          -1.78     0.68
  Disability progression                     100% risk    -7.26     2.21          -7.32     2.23
  PML                                        100% risk  -245.3     75.3         -201.4     43.5
  Liver enzyme elevation                     100% risk   -21.22    23.23         -26.48    27.54
  Allergic/hypersensitivity reactions        100% risk    -5.92     7.22          -7.47     9.29
  Serious allergic reactions                 100% risk   -39.48     4.39         -39.49     4.36
  Depression                                 100% risk    -5.10     0.88          -5.10     0.90
  Infusion/injection reactions               100% risk   -19.31    35.30         -22.72    30.99
  Daily oral vs daily subcutaneous                 N/A    -2.72     0.64          -2.78     0.66
  Monthly infusion vs daily subcutaneous           N/A    -0.72     0.31          -0.76     0.32
  Weekly intramuscular vs daily subcutaneous       N/A    -0.66     0.43          -0.71     0.44
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
  Relapse rate                          1 relapse/year     6.5%     3.6%           6.4%     3.8%
  Disability progression                     100% risk    28.8%    13.5%          25.7%    13.0%
  Liver enzyme elevation                     100% risk    53.7%    20.3%          57.9%    20.1%
  Daily oral vs daily subcutaneous                 N/A    11.0%     5.4%          10.0%     5.3%
  Monthly infusion vs daily subcutaneous           N/A     2.9%     1.7%           2.7%     1.7%
  Weekly intramuscular vs daily subcutaneous       N/A     2.6%     2.0%           2.5%     1.9%
Ratings standard deviation                         N/A     1.20     0.06           1.20     0.06
Proportional between-study preference
standard deviation                                 N/A     0.58     0.09           0.59     0.09
Ratings model residual deviance                    N/A   229.9     21.4          229.9     21.4
Choice model residual deviance                     N/A    94.4      3.6           94.5      3.6
Preference synthesis residual deviance             N/A    45.8      6.6           45.6      6.5
Total residual deviance                            N/A   370.1     22.7          370.0     22.7
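Both priors in this comparison are intended to be vague on the positive half-line: N+(0, 10000) is the absolute value of a Normal with variance 10000 (standard deviation 100), while Gamma(1, 0.01) with rate 0.01 is an Exponential with mean 100. A quick sampling sketch of their comparable diffuseness (illustrative only, not thesis code):

```python
import random

random.seed(2)
N = 100_000

# Folded Normal N+(0, 10000): absolute value of Normal(0, sd = 100)
folded = [abs(random.gauss(0, 100)) for _ in range(N)]

# Gamma(1, rate 0.01) is Exponential(rate 0.01), with mean 1/0.01 = 100
gamma_draws = [random.expovariate(0.01) for _ in range(N)]

mean_folded = sum(folded) / N      # theory: 100 * sqrt(2 / pi), about 79.8
mean_gamma = sum(gamma_draws) / N  # theory: 100
```

Since both distributions spread their mass over a far wider range than the plausible coefficient values, the posterior is dominated by the likelihood under either choice, consistent with the near-identical columns above.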
3 MCDA model
Sensitivity to assumed correlations
The graphs below show the SUCRA statistic (based on population-average benefit-risk scores) for the
treatments in the RRMS case study under different assumptions about the correlations between
pairs of outcomes in the evidence synthesis.
All correlations = 0
All correlations = 0.3
All correlations = 0.6
All correlations = 0.9
Vague prior on correlation propensities, between-study = within-study
Vague prior on correlation propensities, between-study ≠ within-study
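For reference, SUCRA (surface under the cumulative ranking curve) averages each treatment's cumulative rank probabilities across the posterior draws; it equals 1 for a treatment certain to rank first on benefit-risk score and 0 for one certain to rank last. A minimal sketch of its computation from posterior score draws (toy draws, assuming higher score = better; not the thesis code):

```python
def sucra(score_draws):
    """score_draws: list of posterior iterations, each a list of per-treatment
    benefit-risk scores (higher = better). Returns one SUCRA per treatment."""
    n_iter = len(score_draws)
    n_trt = len(score_draws[0])
    # rank_counts[t][r]: iterations in which treatment t takes rank r (0 = best)
    rank_counts = [[0] * n_trt for _ in range(n_trt)]
    for draws in score_draws:
        order = sorted(range(n_trt), key=lambda t: draws[t], reverse=True)
        for r, t in enumerate(order):
            rank_counts[t][r] += 1
    sucras = []
    for t in range(n_trt):
        probs = [c / n_iter for c in rank_counts[t]]
        cum, total = 0.0, 0.0
        for r in range(n_trt - 1):  # cumulative probabilities over ranks 1..T-1
            cum += probs[r]
            total += cum
        sucras.append(total / (n_trt - 1))
    return sucras

# Toy posterior draws for three treatments: A always best, C always worst
draws = [[3.0, 2.0, 1.0], [2.9, 2.1, 0.9], [3.1, 1.9, 1.1]]
# sucra(draws) -> [1.0, 0.5, 0.0]
```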
Sensitivity to priors
The graphs below show the SUCRA statistic (based on population-average benefit-risk scores) for the
treatments in the RRMS case study under alternative priors for the random-effects standard
deviation, the utility coefficients, and the random-preferences standard deviation.
Main priors
σ ~ Uniform(0, 2)
eg_ω ~ N+(0, 10000) (see II.4.8)
σ_pref ~ Uniform(0, 2)
Sensitivity to initial values
The graphs below show the SUCRA statistic (based on population-average benefit-risk scores) for the
treatments in the RRMS case study under alternative sets of initial values for the utility coefficients
(as specified above each graph in BUGS format).
eg=c(1,1,1,1,1,0.5,0.5,1,1,1,1,NA,NA,NA)
eg=c(0.5,0.5,0.5,0.5,0.5,0.1,0.1,0.1,0.5,0.5,0.5,NA,NA,NA)
eg=c(4,3,4,3,4,1,2,4,3,4,3,NA,NA,NA)
eg=c(0.1,0.2,0.3,0.4,0.5,0.01,0.05,0.1,0.2,0.3,0.4,NA,NA,NA)
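Agreement across chains started from dispersed initial values, as examined here, is conventionally summarised by the Gelman-Rubin statistic (R-hat), which compares between-chain and within-chain variance and approaches 1 at convergence. A minimal sketch for a single scalar parameter (toy chains, not thesis output):

```python
import random

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter,
    given several chains of equal length from dispersed starting values."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m               # mean within-chain
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

random.seed(3)
# Two chains that, after warm-up, sample the same target despite different starts
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
rhat = gelman_rubin(chains)  # close to 1 when the chains agree
```

Values of R-hat well above 1 would indicate that chains from different initial values had not mixed, in which case the SUCRA graphs above could not be compared meaningfully.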