Bayesian statistics in the assessment of the benefit-risk balance of medicines using Multi Criteria Decision Analysis
Submitted for the degree of PhD
by
Edward Waddingham
Imperial Clinical Trials Unit, School of Public Health, Faculty of Medicine, Imperial College London.
Declaration of Originality
I hereby declare that the work in this thesis is my own contribution. Any published and unpublished
work of others has been acknowledged in the text and a list of references is given.
Copyright Declaration
The copyright of this thesis rests with the author. Unless otherwise indicated, its contents are
licensed under a Creative Commons Attribution-NonCommercial 4.0 International Licence (CC BY-NC).
Under this licence, you may copy and redistribute the material in any medium or format. You may
also create and distribute modified versions of the work. This is on the condition that: you credit the
author and do not use it, or any derivative works, for a commercial purpose.
When reusing or sharing this work, ensure you make the licence terms clear to others by naming the
licence and linking to the licence text. Where a work has been adapted, you should indicate that the
work has been changed and describe those changes.
Please seek permission from the copyright holder for uses of this work that are not included in this
licence or permitted under UK Copyright Law.
Acknowledgements
I am deeply grateful to my supervisors Professor Deborah Ashby and Professor Paul Matthews.
Their help, advice, inspiration, reassurance, feedback, and patience have been immensely valuable in
producing this work.
Many thanks also to Dr Marc Chadeau-Hyam and Professor Nicky Welton, who examined this thesis
and whose feedback has greatly improved the manuscript.
I am indebted to my colleagues from PROTECT (Pharmacoepidemiological Research on Outcomes of
Therapeutics by a European ConsorTium, funded by the Innovative Medicines Initiative – www.imi-protect.eu)
for my introduction to benefit-risk assessment while I was still an MSc student, and for the boost
it gave to my fledgling career in biostatistical research. PROTECT has also influenced this thesis
more directly, providing a great deal of the data I have used. In particular I am grateful to the
late Richard Nixon for leading the natalizumab case study team, to Kimberley Hockley and her Patient
& Public Involvement team for their work on patient preference elicitation (which provided much of
the data used herein), and to Shahrul Mt-Isa for his technical expertise and support.
Additional thanks to Kimberley Hockley for her comments on a draft of the manuscript.
This work was funded by an Imperial College PhD Scholarship.
Abstract
Medical decisions such as benefit-risk assessments of treatments should be based on the best
clinical evidence but also require subjective value judgements regarding the impact of disease and
treatment outcomes. This thesis argues for a Bayesian implementation of Multi-Criteria Decision
Analysis (MCDA) for such problems. It seeks to establish whether suitable Bayesian models can be
constructed given the variety of data formats and the interdependencies between the many
variables involved.
A modelling framework is developed for joint multivariate Bayesian inference of treatment effects
and preference values based on data from clinical trials and stated preference studies. This method
allows the sampling uncertainty of the parameters to be reflected in the analysis, overcoming a
recognised shortcoming of MCDA. Markov Chain Monte Carlo simulation is used to derive the
posterior distributions. The models are illustrated using a case study involving treatments for
relapsing remitting multiple sclerosis.
The clinical evidence synthesis has several advantages over existing multivariate evidence synthesis
models, including a comprehensive flexible allowance for correlations, compatibility with any
number of treatments and outcomes, and the ability to estimate unreported treatment-outcome
combinations.
The preference models can analyse data from a variety of elicitation methods such as discrete
choice, Analytic Hierarchy Process and swing weighting. In the case of swing weighting, no Bayesian
analysis has previously been presented, and the results suggest a possible flaw in the standard
deterministic analysis that may bias the preference estimates when judgements are subject to
random variability. A novel meta-analysis model for preference elicitation studies is also presented.
The framework has the unique ability to analyse data from multiple methods jointly to yield a
common set of preference parameters.
These results demonstrate the flexibility of the Bayesian approach, and the depth of insight it can
provide into the impact of uncertainty and heterogeneity in multi-criteria medical decisions.
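The core idea summarised above, that posterior samples of treatment effects and preference weights can be propagated jointly into MCDA benefit-risk scores, can be sketched in a few lines. This is purely an illustration, not the thesis model: the two treatments, two criteria, Beta-distributed criterion values and Dirichlet-distributed weights are all invented stand-ins for posterior draws that would in practice come from MCMC.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 4000  # stand-in for the number of MCMC posterior draws

# Hypothetical posterior samples (one row per draw) for two treatments
# on two criteria, a benefit and a risk, both already mapped onto a
# common 0-1 value scale via partial value functions.
values = {
    "A": np.column_stack([rng.beta(40, 10, n_draws),   # benefit value
                          rng.beta(30, 10, n_draws)]), # risk value
    "B": np.column_stack([rng.beta(30, 10, n_draws),
                          rng.beta(40, 10, n_draws)]),
}

# Hypothetical posterior samples of preference weights (each draw sums
# to 1), e.g. from a Bayesian model of elicited preference data.
weights = rng.dirichlet([6, 4], n_draws)

# MCDA benefit-risk score per draw: weighted sum of criterion values.
# Doing this per draw carries parameter uncertainty through to the score.
scores = {t: (v * weights).sum(axis=1) for t, v in values.items()}

# The output is a posterior distribution over scores, so decision
# quantities such as P(A scores higher than B) fall out directly.
p_a_best = float(np.mean(scores["A"] > scores["B"]))
print(round(float(np.mean(scores["A"] - scores["B"])), 3), round(p_a_best, 3))
```

Deterministic MCDA would collapse `values` and `weights` to point estimates before scoring; keeping the draws instead is what lets sampling uncertainty be reflected in the final benefit-risk assessment.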
Table of Contents
Declaration of Originality ................................................................................................................... 2
Copyright Declaration ........................................................................................................................ 2
Acknowledgements ........................................................................................................................... 3
Abstract ............................................................................................................................................ 4
List of abbreviations ........................................................................................................................ 10
List of figures ................................................................................................................................... 12
List of tables .................................................................................................................................... 17
I. Introduction ............................................................................................................................ 19
I.1 Background to the thesis .................................................................................................. 19
I.1.1 Evidence-based medical decisions ............................................................................ 19
I.1.2 Benefit-risk balance .................................................................................................. 19
I.1.3 Benefit-risk assessment in practice ........................................................................... 20
I.1.4 Structured/quantitative benefit-risk assessment ...................................................... 22
I.1.5 Multi-Criteria Decision Analysis (MCDA) ................................................................... 23
I.1.6 Uncertainty, decision making and Bayesian statistics ................................................ 25
I.2 Purpose of the research ................................................................................................... 28
I.2.1 Motivations .............................................................................................................. 28
I.2.2 Research question .................................................................................................... 29
I.3 Methods/strategy ............................................................................................................ 31
I.3.1 Work plan and thesis structure ................................................................................. 31
I.3.2 Case study ................................................................................................................ 31
I.3.3 Software ................................................................................................................... 31
I.4 Literature search .............................................................................................................. 32
I.4.1 Search strategy ......................................................................................................... 32
I.4.2 Literature search flowchart....................................................................................... 34
I.4.3 Literature on quantitative benefit-risk assessment ................................................... 34
II. Bayesian synthesis of clinical evidence for benefit-risk assessment .......................................... 36
II.1 Background, aims & objectives ......................................................................................... 36
II.1.1 Introduction ............................................................................................................. 36
II.1.2 Aim, objectives, scope .............................................................................................. 43
II.1.3 Synopsis of literature ................................................................................................ 44
II.2 High level model structure ............................................................................................... 49
II.3 Data ................................................................................................................................. 50
II.3.1 Data structure .......................................................................................................... 50
II.3.2 Dataset: Relapsing-remitting multiple sclerosis ......................................................... 51
II.4 Treatment effects module ................................................................................................ 58
II.4.1 Initial (naïve) model: all outcomes independent (Model 0) ....................................... 58
II.4.2 Correlated non-zero outcomes (Model 1) ................................................................. 60
II.4.3 Contrast-level data (Model 1*) ................................................................................. 64
II.4.4 BUGS coding via variance decomposition ................................................................. 65
II.4.5 Fixed baseline (Model 2) ........................................................................................... 73
II.4.6 Mappings (Model 3) ................................................................................................. 76
II.4.7 Outcomes with zeroes (Models 4a and 4b) ............................................................... 80
II.4.8 Priors ........................................................................................................................ 83
II.4.9 Assessing model fit and complexity .......................................................................... 84
II.5 Population calibration module ......................................................................................... 87
II.5.1 Statistical model ....................................................................................................... 87
II.5.2 Priors ........................................................................................................................ 89
II.5.3 Outputs .................................................................................................................... 90
II.5.4 Rankings ................................................................................................................... 92
II.6 Results ............................................................................................................................. 93
II.6.1 Treatment effects module ........................................................................................ 93
II.6.2 Population calibration module ................................................................................ 122
II.6.3 Final synthesised outcomes on absolute scale ........................................................ 123
II.6.4 Rankings ................................................................................................................. 130
II.6.5 Conclusions regarding RRMS treatments ................................................................ 134
II.6.6 Sensitivity analyses ................................................................................................. 135
II.7 Discussion ...................................................................................................................... 135
III. Bayesian multi-criteria utility modelling ............................................................................. 142
III.1 Background, aims & objectives ....................................................................................... 142
III.1.1 Introduction ........................................................................................................... 142
III.1.2 Preference elicitation methods ............................................................................... 147
III.1.3 Data types .............................................................................................................. 154
III.1.4 Allowing for uncertainty in preferences .................................................................. 167
III.1.5 Aim and objectives ................................................................................................. 168
III.2 High level model structure ............................................................................................. 170
III.2.1 Notes on preference parameters ............................................................................ 170
III.3 Bayesian analysis of elicited ratings ................................................................................ 178
III.3.2 Datasets ................................................................................................................. 181
III.3.3 Statistical model ..................................................................................................... 185
III.3.4 Results.................................................................................................................... 191
III.3.5 Discussion .............................................................................................................. 202
III.4 Bayesian analysis of choice data ..................................................................................... 204
III.4.1 Data structure ........................................................................................................ 204
III.4.2 Dataset - PROTECT patient choice data ................................................................... 204
III.4.3 Choice model ......................................................................................................... 205
III.4.4 Results.................................................................................................................... 207
III.4.5 Discussion .............................................................................................................. 208
III.5 Bayesian meta-analysis of preferences ........................................................................... 210
III.5.1 Data structure ........................................................................................................ 211
III.5.2 Dataset: RRMS ........................................................................................................ 211
III.5.3 Data extraction ....................................................................................................... 213
III.5.4 Data rebasing ......................................................................................................... 219
III.5.5 Statistical model ..................................................................................................... 223
III.5.6 Results.................................................................................................................... 226
III.5.7 Discussion .............................................................................................................. 229
III.6 Combining preferences from different methods ............................................................. 232
III.6.1 Datasets ................................................................................................................. 233
III.6.2 Statistical model ..................................................................................................... 237
III.6.3 Results.................................................................................................................... 240
III.6.4 Discussion .............................................................................................................. 247
IV. Assessing the overall benefit-risk balance .......................................................................... 253
IV.1 Methods ........................................................................................................................ 254
IV.1.1 High level model structure ...................................................................................... 254
IV.1.2 Selection of outcomes and model versions ............................................................. 254
IV.2 Results ........................................................................................................................... 257
IV.2.1 Benefit-risk scores .................................................................................................. 257
IV.2.2 Rankings ................................................................................................................. 260
IV.2.3 Sensitivity analyses ................................................................................................. 262
IV.3 Discussion ...................................................................................................................... 268
IV.3.1 Bayesian MCDA ...................................................................................................... 268
IV.3.2 Benefit-risk assessment of RRMS treatments .......................................................... 269
V. Conclusions ........................................................................................................................... 271
V.1 Summary of results ........................................................................................................ 271
V.1.1 Bayesian synthesis of clinical evidence for benefit-risk assessment (Chapter II) ...... 271
V.1.2 Bayesian multi-criteria utility modelling (Chapter III) .............................................. 272
V.1.3 Assessing the overall benefit-risk balance (Chapter IV) ........................................... 272
V.2 Strengths ....................................................................................................................... 273
V.3 Limitations ..................................................................................................................... 273
V.4 Reflections on generalisability & applicability ................................................................. 275
V.5 Contribution to the field ................................................................................................. 277
V.6 Future research priorities ............................................................................................... 278
V.7 Concluding summary ...................................................................................................... 280
References .................................................................................................................................... 281
Appendices.................................................................................................................................... 293
List of abbreviations
Abbreviation Full name
AHP Analytic Hierarchy Process
ALT Alanine Aminotransferase
ARR Annualised Relapse Rate
BR Benefit-Risk
BRA Benefit-Risk Assessment
BUGS Bayesian Inference Using Gibbs Sampling
CC Continuity Correction
CI Confidence Interval or Credibility Interval
DCE Discrete Choice Experiment
DF Dimethyl Fumarate
DP Disability Progression
EDSS Expanded Disability Status Scale
EMA European Medicines Agency
EU European Union
FDA Food & Drug Administration
FM Fingolimod
GA Glatiramer Acetate
GI Gastrointestinal
IA (IM) Interferon beta-1a (intramuscular)
IA (SC) Interferon beta-1a (subcutaneous)
IB Interferon beta-1b
IM Intramuscular
IV Intravenous
JAGS Just Another Gibbs Sampler
LQ Laquinimod
MA Meta-Analysis
MACBETH Measuring Attractiveness by a Categorical-Based Evaluation Technique
MAUT Multi-Attribute Utility Theory
MCDA Multi-Criteria Decision Analysis
MCMC Markov Chain Monte Carlo
MED Macular Edema
MHRA Medicines and Healthcare products Regulatory Agency
MNL Multinomial logit
MS Multiple Sclerosis
NICE National Institute for Health and Care Excellence
NMA Network Meta-Analysis
PL Placebo
PML Progressive Multifocal Leukoencephalopathy
PROTECT Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium
PVF Partial Value Function
RFP Relapse-Free Proportion
RRMS Relapsing Remitting Multiple Sclerosis
SBC Serious Bradycardia
SC Subcutaneous
SD Standard Deviation
SE Standard Error
SGI Serious Gastrointestinal disorders
SUCRA Surface Under the Cumulative Ranking Curve
TF Teriflunomide
UK United Kingdom
ULN Upper Limit of Normal range
USA United States of America
List of figures
Figure 1 – Literature search flowchart ............................................................................................. 34
Figure 2 – Network diagram: pairwise meta-analysis ....................................................................... 38
Figure 3 – Network diagram: simple network meta-analysis (i).. ...................................................... 38
Figure 4 – Network diagram: simple network meta-analysis (ii). ...................................................... 39
Figure 5 – A disconnected network involving treatments A, B, C, D and E (top) is made connected by
the addition of treatment F (bottom). ............................................................................................. 41
Figure 6 – Venn diagram illustrating the relationships between various types of meta-analysis model.
........................................................................................................................................................ 45
Figure 7 – Pictorial representation of the types of meta-analysis model discussed in this section.. .. 47
Figure 8 - High-level model structure, focusing on clinical evidence synthesis. ................................. 49
Figure 9 – Network diagram for the RRMS case study (all outcomes combined). .............................. 53
Figure 10 - Outcomes for the RRMS case study ................................................................................ 55
Figure 11 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 0, fixed effects. ..................................................................................................................... 94
Figure 12 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 0, random effects (except serious GI disorders, serious bradycardia and macular edema). ... 95
Figure 13 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 1 (random effects), with all correlations between outcomes set to zero ............................... 98
Figure 14 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 1 (random effects), with all correlations between outcomes set to 0.6. ................................ 99
Figure 15 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 2 (random effects), with all correlations between outcomes set to zero. ............................ 102
Figure 16 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 2 (random effects), with all correlations between outcomes set to zero. ............................ 103
Figure 17 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 2, correlations of 0.6). ........................................................................................... 105
Figure 18 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 3 (random effects, fixed mappings, one mapping group, all correlation coefficients between
outcomes = 0.6).. ........................................................................................................................... 106
Figure 19 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 3 (random effects, random mappings, one mapping group, all correlation coefficients
between outcomes = 0.6). ............................................................................................................. 107
Figure 20 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 3, correlations of 0.6, fixed mappings in one group). ............................................. 108
Figure 21 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 3, correlations of 0.6, random mappings in one group). ........................................ 109
Figure 22 - Posterior credibility intervals of relative treatment effects (population averages) from
Model 4b (random effects, random mappings, one mapping group, all correlation coefficients between
outcomes = 0.6, sample variances estimated as (0.025 + p)(0.975 − p) × 100/N for the
“zeroes” outcomes). ...................................................................................... 116
Figure 23 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Model 4b, correlations of 0.6, random mappings in one group). ...................................... 117
Figure 24 - Posterior distributions of relative treatment effects (population averages) on Normal
scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 119
Figure 25 - Posterior distributions of relative treatment effects (population averages) on Normal
scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 120
Figure 26 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Final model, fixed mappings in three groups). .................................................................. 121
Figure 27 – Deviance and complexity (leverage) per observation for individual studies in the RRMS
dataset (Final model, random mappings in three groups). ............................................................. 122
Figure 28 - Posterior distributions of absolute treatment outcomes (population averages) on Normal
scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 125
Figure 29 - Posterior distributions of absolute treatment outcomes (population averages) on their
original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects,
three mapping groups, all correlation coefficients between outcomes = 0.6). ................................ 126
Figure 30 - Posterior distributions of absolute treatment outcomes (study-level averages) on their
original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects,
three mapping groups, all correlation coefficients between outcomes = 0.6). ................................ 128
Figure 31 - Posterior distributions of absolute treatment outcomes (individual-level) on their original
scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three
mapping groups, all correlation coefficients between outcomes = 0.6). ......................................... 129
Figure 32 - SUCRA based on population averages; fixed mapping model ........................................ 130
Figure 33 - SUCRA based on population averages; random mapping model .................................. 131
Figure 34 - SUCRA based on population averages: serious gastrointestinal events ....................... 132
Figure 35 - SUCRA for the efficacy and liver outcomes in the three-group fixed-mapping model: the
impact of predictive variability. ..................................................................................................... 133
Figure 36 - Probabilistic rankings for the population average relapse rate, one-group random
mappings model ............................................................................................................................ 134
Figure 37 – A “star”-shaped evidence network with six active treatments (A, B, C, D, E, F) and placebo
(P). ................................................................................................................................................ 140
Figure 38 – Example of an AHP judgement matrix. ......................................................................... 148
Figure 39 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (i) ........................................................................................................................................... 150
Figure 40 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (ii) .......................................................................................................................................... 151
Figure 41 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (iii) ......................................................................................................................................... 152
Figure 42 – Swing weighting example using RRMS treatment outcomes and administration modes:
step (iv).. ....................................................................................................................................... 153
Figure 43 – Example of a binary choice set..................................................................................... 154
Figure 44 - Simple example of a network of outcome preferences (i). ............................................ 158
Figure 45 - Simple example of a network of outcome preferences (ii). ........................................... 159
Figure 46 – Example of a “web” network with six outcomes/criteria .............................................. 160
Figure 47 – Example of a “fan” network with six outcomes/criteria ............................................... 160
Figure 48 – Hierarchical elicitation network for 10 criteria arranged in two groups of three and one
group of four, using the agglomeration rule and webs at both levels of the hierarchy. .................. 162
Figure 49 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one
group of four, using the substitution rule and webs at both levels of the hierarchy. ...................... 163
Figure 50 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one
group of four, using the substitution rule and fans at both levels of the hierarchy – that is, a tree.
...................................................................................................................................................... 164
Figure 51 - Hierarchical elicitation network in Figure 50, shown before identification of criteria for
promotion, i.e. in value tree format. .............................................................................................. 165
Figure 52 - High-level model structure, focusing on preference modelling. .................................... 170
Figure 53 – Two ways to display preferences for categorical variables – an example using criteria
from the RRMS case study (but fictional data). .............................................................................. 174
Figure 54 – Value tree for the RRMS investigator ratings dataset before elicitation. ...................... 183
Figure 55 – Value tree for the RRMS investigator ratings dataset after the elicitation process is
complete. ...................................................................................................................................... 184
Figure 56 – Elicitation network diagram for administration modes in the PROTECT RRMS patient
ratings data. .................................................................................................................................. 185
Figure 57 - Network diagram of preference elicitation studies concerning relapsing remitting multiple
sclerosis treatment outcomes........................................................................................................ 224
Figure 58 – Example of combined preference network .................................................................. 234
Figure 59 – Hierarchical structure of the preference data, indicating the levels where random
preference distributions can be used. ............................................................................................ 238
Figure 60 – Forest plot showing the posterior predictive distributions of preference weights in the
full RRMS preference model, at various levels of predictive variability. .......................................... 246
Figure 61 – Preference weights (posterior means) for the key benefit-risk criteria, for three different
combinations of the source datasets. ............................................................................................ 252
Figure 62 – High level structure of the entire benefit-risk assessment model. ................................ 254
Figure 63 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability.
...................................................................................................................................................... 258
Figure 64 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability,
with a maximum of 3 relapses per year. ........................................................................................ 259
Figure 65 - SUCRA statistic for the overall benefit-risk score of the RRMS treatments at three levels of
predictive variability. ..................................................................................................................... 261
Figure 66 – SUCRA statistic by treatment based on population-average benefit risk score; efficacy
outcomes only ............................................................................................................................... 262
Figure 67 – SUCRA statistic by treatment based on population-average benefit risk score; liver safety
only. ........................................................................................... 263
Figure 68 – SUCRA statistic by treatment based on population-average benefit risk score; efficacy
and liver safety outcomes (but not administration). ...................................................................... 263
Figure 69 - SUCRA statistic by treatment based on population-average benefit risk score; disability
progression weight relates to disability progression events confirmed 6 months later (rather than 3
months later in the main results). .................................................................................................. 264
Figure 70 - SUCRA statistic by treatment based on population-average benefit risk score; liver
enzyme elevation weight relates to alanine aminotransferase above 3x upper limit of normal range
(rather than simply above upper limit of normal range as in the main results). ............................. 265
Figure 71 - SUCRA statistic by treatment based on population-average benefit risk score; liver
enzyme elevation weight relates to alanine aminotransferase above 5x upper limit of normal range
(rather than simply above upper limit of normal range as in the main results). ............................. 266
Figure 72 - SUCRA statistic by treatment based on population-average benefit risk score; preferences
from published studies excluded. .................................................................................................. 267
Figure 73 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT
patient choice dataset excluded. ................................................................................................... 267
Figure 74 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT
ratings datasets excluded. ............................................................................................................. 268
List of tables
Table 1 - Proportion of patients experiencing effects of treatment for a fictional chronic disease. ... 23
Table 2 - Proportion of patients experiencing effects of treatment for a fictional chronic disease. ... 24
Table 3 - Proportion of patients experiencing effects of treatment for a fictional chronic disease –
with illustrative weights .................................................................................................................. 24
Table 4 – Treatments in the RRMS case study. ................................................................................. 52
Table 5 – Published trial reports providing data to the RRMS case study. ......................................... 56
Table 6 – Distributions commonly used for modelling clinical outcomes at group level. .................. 58
Table 7 – Posterior mean effect estimates from Model 2. ................................................................ 76
Table 8 – Priors for treatment effect module parameters ................................................................ 84
Table 9 – Priors for the population calibration module. ................................................................... 90
Table 10 – Posterior distributions from Model 4: effect of varying mapping groups (random effects,
all correlation coefficients between outcomes = 0.6). .................................................................... 110
Table 11 – Posterior distributions from Models 4a and 4b for the “zeroes” outcomes (fixed effects,
no correlations between outcomes), and empirical treatment effect estimates. .......................... 112
Table 12 - Posterior distributions from Models 4a and 4b (with 100x inflated sample variances) for
the “zeroes” outcomes (fixed effects, no correlations between outcomes), and empirical treatment
effect estimates. .......................................................................... 114
Table 13 - Posterior distributions of untreated population outcomes on Normal scale from
population calibration module....................................................................................................... 123
Table 14 – Data structure for a “fan” network with six outcomes A, B, C, D, E and F. ..................... 179
Table 15 – Data structure for a “web” network with six criteria A, B, C, D, E and F. ........................ 179
Table 16 – Data structure for the network in Figure 49 with ten outcomes A, B, C, D, E, F, G, H, I and
J. ................................................................................................................................................... 180
Table 17 – Data structure for the tree in Figure 50 with ten outcomes A, B, C, D, E, F, G, H, I and J.
...................................................................................................................................................... 181
Table 18 – Priors for ratings model parameters ............................................................................. 190
Table 19 – Mean preference weights for individual participants in the investigator ratings dataset;
deterministic analysis and Bayesian analysis with sensitivity to assumed ratings standard deviation
...................................................................................................................................................... 193
Table 20 – Posterior distribution of preferences for simultaneous analysis of all participants in the
investigator ratings dataset ........................................................................................................... 194
Table 21 – Median preferences for a single participant in the patient ratings dataset; deterministic
analysis and Bayesian analysis with sensitivity to assumed ratings standard deviation ................... 196
Table 22 – Posterior distribution of preferences for simultaneous analysis of all participants in the
patient ratings dataset .................................................................................................................. 198
Table 23 – Posterior distribution of preferences for simultaneous analysis of all participants in the
investigator ratings and patient ratings datasets ........................................................................... 200
Table 24 - Posterior distribution of preferences in the patient choice dataset ................................ 208
Table 25 – Comparison of criteria weights in the investigator ratings and patient choices datasets.
...................................................................................................................................................... 209
Table 26 – RRMS case study outcomes for the preference synthesis module. PVF = partial value
function......................................................................................................................................... 212
Table 27 – Source studies for the RRMS dataset for the preference synthesis module. .................. 213
Table 28 – Example of a 4-category variable and its dummy-coded indicator variables .................. 214
Table 29 – Example of a 4-category variable and its effects-coded indicator variables ................... 216
Table 30 - Posterior distribution of preferences in published RRMS preference elicitaton studies;
fixed preference model ................................................................................................................. 226
Table 31 - Posterior distribution of preferences in published RRMS preference elicitation studies;
random preference model ............................................................................................................. 227
Table 32 - Posterior distribution of preferences in published RRMS choice studies and summary data
from PROTECT patient choice study; random preference model .................................................... 228
Table 33 – Treatment outcomes and administration modes for the RRMS treatments, and the
availability of corresponding preference data. ............................................................................... 233
Table 34 – Overall RRMS preference model: Criteria from each dataset for inclusion/exclusion .... 236
Table 35 - Posterior distribution of preferences based on published RRMS choice studies and full
data from PROTECT patient choice study; fixed preference model ................................................. 241
Table 36 - Posterior distribution of preferences based on published RRMS choice studies and full
data from PROTECT patient choice study; random (by study) preference model ............................ 242
Table 37 - Posterior distribution of preferences based on all preference datasets; fixed preference
model ............................................................................................................................................ 244
Table 38 - Posterior distribution of preferences based on all preference datasets; random (by study)
preference model .......................................................................................................................... 245
Table 39 – Identification of outcomes to which preferences relate for the criteria in the RRMS case
study. ............................................................................................................................................ 255
Table 40 – Benefit-risk score by treatment, with breakdown by criterion. Figures are population
average posterior means and (standard deviations). ..................................................................... 258
Chapter I.1
I. Introduction
I.1 Background to the thesis
I.1.1 Evidence-based medical decisions
Modern healthcare provision is guided by the concept of “evidence-based medicine”, which has
been defined as “the conscientious, explicit, and judicious use of current best evidence in making
decisions about the care of individual patients” 1.
Ultimately most medical decisions are a choice between treatments (or the option of no treatment)
for a particular indication. For example, a patient may need to choose which (if any) drug to take; a
clinician may need to choose which treatment to prescribe; a healthcare provider, which drugs to
stock; a regulator, which treatments to license; or a pharmaceutical company, which potential new
drugs to develop. All of these decisions ultimately influence the patient’s choice of treatment.
Medical statisticians aim to provide evidence to inform decisions such as these. However, they often
do not attempt to directly answer the overall question of which treatment(s) is/are optimal. Instead
they break the problem down into smaller, more focused questions that can be answered more
easily (which treatment is best with respect to efficacy outcome A, or safety outcome B, for
example). This is a necessary part of the evidence gathering process, but unless it is made clear how
the answers to those smaller questions can be put together to answer the overall question, then
arguably the job is only half done.
As the examples above illustrate, within the healthcare field there are a number of different decision
makers and types of decision. This thesis focuses in particular on regulatory benefit-risk
assessments, as explained in the next section, but many of the principles apply to medical decision
making in the more general context.
I.1.2 Benefit-risk balance
Ensuring the safety of medicines is a key priority for drug developers and regulators, but in practice
there is no such thing as a drug that is 100% safe. All pharmaceutical treatments are associated
with a risk of adverse events of one kind or another, even if the events are mild or the risk is low (or
confined to certain subgroups of the patient population).
The key question then, when deciding whether a treatment is appropriate, is: do the benefits
outweigh the risks?
In the context of regulatory licensing this is known as the benefit-risk balance. The concept of
balance recognises that there is an implicit trade-off between benefit and risk – in other words,
there is (perhaps within limits) no level of benefit, or of risk, that on its own should tip the scales and
lead to a treatment being approved (or not approved) for the market; rather, treatments with a high
level of risk should balance this with a high level of benefit, and treatments with modest benefits
should only be associated with modest risks. This reflects the fact that there is a range of needs and
expectations in the patient population. Not all patients require or demand the same level of
effectiveness, but those who opt for more aggressive therapy may do so in the knowledge that the
risks may also increase.
This trade-off or balance is the key consideration in getting drugs through development, clinical
trials, and ultimately onto the market. This process culminates in a benefit-risk assessment, when
the drug’s clinical benefits and risks (in other words, its efficacy and safety profile) are weighed
against one another to determine whether it is fit for general use 2. Such an assessment is typically
carried out by a jurisdiction’s medicine regulator when it considers whether to issue a marketing
licence for the treatment in question, and periodically thereafter as new data emerges. The
pharmaceutical company that manufactures the drug may also carry out similar assessments to
support its licence application/renewal, or earlier during the development process.
I.1.3 Benefit-risk assessment in practice
A real-world benefit-risk assessment carried out for regulatory purposes should reflect the latest
clinical evidence, which may be drawn from several studies. Various data sources may be available,
but those most likely to be suitable for benefit-risk assessment are:
• Randomised controlled trials (RCTs) are generally seen as the gold standard for assessing the
relative efficacy and safety of two or more treatments; the randomised design eliminates
the selection bias that can occur in observational studies, ensuring the treatment groups are
comparable and avoiding confounding between treatment allocation and outcome. The
main limitations of RCTs for benefit-risk assessment are that they are of limited duration and
are carried out in relatively small numbers of patients, and thus cannot always establish
outcomes that occur rarely or take a long time to manifest.
• Post-marketing studies are carried out after a drug has been through the RCT process and
released onto the market3. These studies typically use observational designs which leave
their relative effect estimates more open to bias, and so are not often used to supplant RCT
evidence, especially for efficacy outcomes. However, since post-marketing studies can
follow much larger groups of patients for much longer periods than RCTs, they can be a
valuable source of data on rare and/or long-term adverse events which could not be
measured in a trial and may be important in a benefit-risk assessment.
• Registries typically collect information on a routine basis from a large number of patients
distributed across multiple sites in a healthcare system, and so may be well placed to
provide data on a set of patients that is highly representative of the target population for a
given decision. As such, registry data can be useful for estimating the baseline distribution
of outcomes experienced among untreated patients (or patients on the current standard of
care). However, as a form of observational data, and with enrolment and data entry
practices that may vary between sites and among personnel, registry data is usually less
suitable for deriving effect estimates4.
Evidence synthesis methods such as meta-analyses may help to combine the study results into a
coherent overall picture. However, with or without such techniques, there are a number of
complicating factors that can cause difficulties in gathering and combining the evidence, and
establishing the appropriate balance of efficacy and safety, including:
• Comparators: If alternative treatments already exist for the same indication, it may
sometimes be appropriate to assess the new drug’s benefit-risk profile relative to these
comparators, rather than in isolation 2. Hereinafter the term “decision set” will be used to
refer to the group of treatments that are included in a benefit-risk assessment – in other
words, the drug in question and any relevant comparators. The more treatments there are in a
decision set, the more complex the assessment process becomes.
• Multiple benefits and risks: within each of the categories “benefit” and “risk” there may be
several clinical outcomes to consider; and the set of outcomes may vary between a
treatment and its comparators. It can take a significant amount of time and effort to
examine what can be a large volume of data and pick out a coherent set of outcomes on
which to base the assessment 5. Furthermore, if evidence syntheses are performed, these
should reflect the possibility of correlations between outcomes.
• Few source studies: At or soon after the point of licensing, the number of studies providing
data on a drug is likely to be small; a good deal of uncertainty can therefore remain in the
evidence.
• Limited, sparse or heterogeneously defined data: It may not always be possible to find
clinical evidence for all of the relevant clinical outcomes for each treatment in the decision
set. Additionally, studies may adopt different definitions or measurement scales for a given
outcome, leading to compatibility problems.
• Establishing appropriate trade-offs: Although it may sometimes be clear, the appropriate
level of trade-off between benefits and risks is, in general, a question of subjective value
judgements.
Traditionally, regulatory benefit-risk assessments have been carried out by committees who
consider the clinical evidence (typically presented as a written summary of individual study findings)
and come to a judgement regarding the overall benefit-risk balance. Often no attempt is made to
present source data side-by-side in a form suitable for direct comparison. In the early 21st century
there arose concerns that this approach lacks rigour and transparency 6. The factors listed above
contribute to a highly complex evidence base that will often simply be too difficult to weigh up
reliably unless one can perform additional analyses to elucidate the key differences in outcomes
between treatments and/or work through the implicit value trade-offs. Attempting to disentangle
all of the strands of such a problem “in one’s head” can lead to poor decisions because there is a
limit on the number of factors people can weigh up simultaneously, meaning that some aspects of
the problem may be misjudged or overlooked 7,8.
Structured benefit-risk assessment methods have been gaining momentum as a way of addressing
these concerns.
I.1.4 Structured/quantitative benefit-risk assessment
Recently, regulators including the European Medicines Agency (EMA) in the EU, the MHRA (Medicines
and Healthcare products Regulatory Agency) in the UK and the Food & Drug Administration (FDA) in the USA
began to show interest in the use of more formal decision-making techniques for benefit-risk
assessment.
In 2009 the EMA embarked on a three-year project looking into the feasibility of adopting methods
from decision theory for this purpose9. Other similar initiatives were launched, on both sides of the
Atlantic, in collaboration between regulators, pharmaceutical companies and academics 10-12. These
projects identified a number of methods that may be suitable and explored these via applications to
a number of topical problems in the field of drug regulation.
These methods range from simple stepwise frameworks that encourage structured thinking and
documentation of the decision process 11,13, through to fully quantitative decision analysis
techniques, of which a leading example is multi-criteria decision analysis (MCDA) 14. Quantitative in
this sense means that preferences for specific benefits and risks are explicitly incorporated in the
assessment and used to weigh the effects of each treatment 5. Preferences can be elicited from
patients or other stakeholders. Modelling preferences in this way is somewhat novel in the health
sciences and is not always straightforward, but can provide evidence fundamental to understanding
and making decisions. It may be worth noting that in their definition of “evidence-based” medicine,
Sackett et al refer to “thoughtful identification and compassionate use of individual patients'
predicaments, rights, and preferences in making clinical decisions”1.
Such methods have begun to make an impact on industry and regulatory practice. The EMA has
issued guidance on benefit-risk assessments stating that “the assumptions, considerations, and
judgement or weighting that support the conclusions of the benefit-risk evaluation should be clear”
and acknowledging that quantitative methods may sometimes be used2. Regulators in the USA have
begun carrying out their own elicitation studies in order to inform real-world decisions 15.
I.1.5 Multi-Criteria Decision Analysis (MCDA)
MCDA is a formal framework for breaking down complex decisions into a series of simpler
judgements that logically lead to an overall solution. The key value trade-offs are identified and
addressed (e.g. how many occurrences of a particular adverse event can be tolerated for a given
level of benefit), facilitating critical thinking about the problem and transparent communication of
the final decision. MCDA in the broad sense refers to a family of related methods, with histories of
use in various fields and minor differences in their formulations and terminology. Most
implementations of MCDA require decision makers to explicitly specify their value judgements
(preferences) in quantitative terms, as shown in the example below.
Table 1 shows the proportion of patients experiencing the key benefit and side effects of two
treatments for a fictional chronic disease. A patient faced with choosing between these two
treatment options must decide whether the additional chance of benefit on Drug B (15%) outweighs
the elevated risk of cardiovascular events (4%), a potentially serious side effect.
Table 1 - Proportion of patients experiencing effects of treatment for a fictional chronic disease.
Treatment    >50% reduction in disease symptoms    Cardiovascular events
Drug A       30%                                   0%
Drug B       45%                                   4%
In this simple example, with only one trade-off to consider, it is probably not so difficult to come to a
decision without any further analysis: the decision maker can simply weigh up a 4% increase in
cardiovascular events against a 15% increase in the chance of benefit. However, consider Table 2,
which includes evidence on two additional risks, liver damage and seizures.
Table 2 - Proportion of patients experiencing effects of treatment for a fictional chronic disease.
Treatment    Benefit: >50% reduction    Cardiovascular    Liver     Seizures
             in disease symptoms        events            damage
Drug A       30%                        0%                3%        2%
Drug B       45%                        4%                1%        0%
This time there are more trade-offs to consider and the problem starts to become too complex to be
handled in the decision maker’s head – particularly if the decision maker is a regulator with
professional responsibility for public safety. Simply forming an opinion without explicit
consideration of the underlying trade-offs is not likely to be a satisfactory approach; regardless of
the decision maker’s confidence in his or her judgement, the decision that is ultimately made should
be defensible as being transparently based on sound evidence and reasoning. MCDA involves
breaking the problem down into a set of simpler trade-offs and clearly setting these out. Suppose,
for example, the decision maker forms the following opinions regarding the various effects of
treatment: (i) a cardiovascular event is the most serious outcome to avoid; (ii) reducing disease
symptoms by 50% is about half as important as avoiding a cardiovascular event; (iii) a seizure is also
about half as important as a cardiovascular event; (iv) liver damage is only one quarter as important
as a cardiovascular event. These judgements can be expressed as a vector of weights, shown as an
additional row in Table 3.
Table 3 - Proportion of patients experiencing effects of treatment for a fictional chronic disease – with illustrative weights
Treatment    Benefit: >50% reduction    Cardiovascular    Liver     Seizures
             in disease symptoms        events            damage
Drug A       30%                        0%                3%        2%
Drug B       45%                        4%                1%        0%
Weight       22%                        45%               11%       22%
The overall weighted effect in favour of Drug A is then calculated as the weighted average of the
individual effects (paying careful attention to the signs so that a positive sign indicates that the
effect favours Drug A and a negative sign favours Drug B), i.e.
Net benefit on drug A = 22% x (30% - 45%) + 45% x (4% - 0%) + 11% x (1% - 3%) + 22% x (0% - 2%)
= -2.2%
In other words, the overall weighted effect shows that on the basis of the specified preference
weights, Drug B is the favoured choice. This is an idealised illustration of the MCDA approach using
a simplified version of the method (sometimes referred to as net clinical benefit 16). By explicitly
valuing the trade-offs underlying the decision, and then putting those values together with the data in a
principled fashion, the logical course of action is revealed. Furthermore the decision has been made
on a transparent basis, facilitating critical appraisal or future reviews, and helping to ensure
consistency with other decisions. Another important benefit of the method is the ability to
sensitivity-test decisions by repeating the analysis with different assumptions, clinical data values or
preference trade-offs.
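The arithmetic above can be reproduced in a few lines. The following is a minimal sketch in Python, using the effects and weights from Table 3 and the same sign convention as the text (a positive contribution favours Drug A):

```python
# Net clinical benefit of Drug A relative to Drug B, using the illustrative
# effects and weights from Table 3. Effects are proportions of patients
# experiencing each outcome; a positive contribution favours Drug A.
weights = {"benefit": 0.22, "cardio": 0.45, "liver": 0.11, "seizure": 0.22}
drug_a  = {"benefit": 0.30, "cardio": 0.00, "liver": 0.03, "seizure": 0.02}
drug_b  = {"benefit": 0.45, "cardio": 0.04, "liver": 0.01, "seizure": 0.00}

# "benefit" is a favourable outcome (more is better); the other criteria are
# unfavourable (fewer events is better), so their differences are reversed.
favourable = {"benefit"}

net_benefit = sum(
    w * ((drug_a[c] - drug_b[c]) if c in favourable else (drug_b[c] - drug_a[c]))
    for c, w in weights.items()
)
print(round(net_benefit, 3))  # -0.022, i.e. -2.2%: Drug B is favoured overall
```

Re-running this calculation with perturbed weights or effect values is precisely the kind of sensitivity testing just described.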
The tables above are examples of effects tables, displays used in benefit-risk assessment which
show data for all the key favourable and unfavourable effects for the treatments in the decision set.
I.1.6 Uncertainty, decision making and Bayesian statistics
The deterministic nature of MCDA has often been recognised as problematic. The problem is not
insurmountable, but exactly how uncertainty should be incorporated is an open question; this issue
has been acknowledged in healthcare 17-19 and other fields 20-23.
Various methods have been proposed to handle uncertainty in preferences, including:
o Altering the elicitation tasks to include direct elicitation of uncertainty levels 24-26 or
preference intervals 27,28. These techniques require participants to state not only point
estimates of their preferences in the usual way but also to provide some indication of the
certainty of those estimates. A number of variations on this technique exist – to give just
two examples, participants might be asked to suggest a plausible range for the estimate, or
to estimate probabilities from the cumulative distribution function at various points in the
distribution. These responses can be used to derive estimates of the underlying
distributions – for example, if the preference distributions are assumed to follow specific
parametric forms then their parameters can be estimated from elicited ranges by treating
them as confidence intervals 27. There are some problems with this approach, however.
Firstly, it may increase the cognitive burden on participants. Secondly, translating the stated
ranges or certainty measures into estimates of the actual preference parameters may rest
on some rather strong assumptions regarding both the shape of the underlying preference
distributions and the participants’ ability to characterise those distributions accurately.
o Conducting one-way sensitivity analyses 29. This is a simplistic form of analysing the
impact of uncertainty, performed by repeating the decision analysis with different values for
the preference parameters. The key problem with this approach is that it provides no
information on the probability of those values being observed, and hence gives no sense of
the distribution of the results.
o Stochastic Multicriteria Acceptability Analysis (SMAA) 30, which assumes complete
uncertainty over preferences (or minimal information such as criteria rankings only), and
performs an analysis treating all possible weight combinations as equally likely. Where no
elicited preference data exists, this may be a reasonable approach, but where there is some
data available, SMAA cannot make best use of it to narrow down the estimates31,32.
o External estimation of probability distributions: This approach involves estimating the
distribution of preferences outside the main analysis model (for example by bootstrapping a
sample of preferences33). The resulting distributions can then be fed into the main MCDA
model. This approach will tend to require tailoring to the specific study and sample on
which it is based, and is therefore difficult to generalise to arbitrary datasets; it also requires
the distributions to be estimated separately to the main model, which may be laborious,
approximate, and lacking in elegance compared to a holistic one-step analysis.
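To make the SMAA idea mentioned above concrete, the following sketch samples weights uniformly from the simplex and computes rank-acceptability indices. The performance scores are purely illustrative values invented for this example, not data from any real assessment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical performance matrix: rows = treatments, columns = criteria
# (values are illustrative only, scaled to 0-1 partial value scores)
scores = np.array([
    [0.8, 0.3, 0.6],   # treatment A
    [0.5, 0.7, 0.4],   # treatment B
])

n_samples = 100_000
# With no preference information, SMAA treats all weight vectors on the
# simplex as equally likely; Dirichlet(1, 1, 1) gives uniform sampling.
weights = rng.dirichlet(np.ones(scores.shape[1]), size=n_samples)

overall = weights @ scores.T            # overall MCDA score per weight sample
winners = overall.argmax(axis=1)
# Rank-acceptability index: fraction of weight space in which each option wins
acceptability = np.bincount(winners, minlength=2) / n_samples
print(acceptability)
```

With elicited preference data available, the uniform Dirichlet draw would ideally be replaced by samples from an informed distribution of weights — which is precisely what the standard SMAA approach does not provide.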
I would argue however that the best approach may be a long-established paradigm that allows the
parameters of a system (here, the preferences and treatment effects) to themselves be regarded as
random variables subject to uncertainty. I refer of course to Bayesian statistics, which provides a
principled means for constructing credibility distributions of statistical parameters given observed
data. In the Bayesian paradigm there is no need to augment the data with additional questions
relating to uncertainty, as the uncertainty level is inferred from the data and model structure.
Furthermore the parameter distributions are informed and constrained by the evidence, not
completely uncertain as in SMAA. Unlike in one-way sensitivity analyses, a full characterisation of
the parameter distributions is obtained.
Bayes’ theorem, and the principles of Bayesian inference, have been known about for hundreds of
years, and their potential applicability to decision making has long been recognised34. Owing in part
to computational difficulties, however, applications in medicine were few and far between
throughout much of the 20th century. Finally, in the 1990s, advances in computing made it feasible
to sample from arbitrary posterior distributions using Markov Chain Monte Carlo (MCMC)
techniques, leading to an increase in applied Bayesian work 35.
The use of statistical methods to assist decision making is not novel in healthcare. Meta-analysis, for
example, has long been associated with systematic reviews of evidence, functioning as an important
tool for aggregation and efficient communication of the results. Nevertheless it has been argued
that statisticians should do more to bridge the gap between their analyses and decision-making, and
that Bayesian thinking and utilitarianism are natural tools for this task36-38. Others have noted the
promise of Bayesian methods for benefit-risk assessment 39. There are a number of reasons why
Bayesian thinking is well suited to the problem of incorporating uncertainty in MCDA and benefit-
risk assessment, such as:
- The long alliance between Bayes and decision making under uncertainty 34,38;
- Treating parameters as random variables translates well to inference on functions of
multiple parameters;
- The ability to supplement evidence with priors if data is lacking;
- MCMC allows construction of (almost) arbitrarily complex models; provided the likelihood
and priors can be specified, there is no need for closed-form posteriors. Similarly, the
distribution of any derived variables can be obtained.
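The last point can be illustrated with a deliberately simple conjugate example. The event counts below are hypothetical, and independent Beta-Binomial draws stand in for MCMC output: once posterior samples exist, the distribution of any derived quantity, such as a risk difference, follows automatically.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trial data: (events, patients) on two treatments
events = {"A": (12, 100), "B": (25, 100)}

# Beta(1, 1) priors give Beta posteriors; these draws play the role of
# MCMC samples from the joint posterior
post = {t: rng.beta(1 + e, 1 + n - e, size=50_000) for t, (e, n) in events.items()}

# Any derived quantity inherits its uncertainty automatically
risk_diff = post["B"] - post["A"]
print(np.mean(risk_diff), np.mean(risk_diff > 0))
```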
I.2 Purpose of the research
I.2.1 Motivations
Researchers, pharmaceutical companies and regulators have begun to show an interest in
quantitative benefit-risk assessment using MCDA-style approaches. It is recognised that such
methods can be valuable in helping decision makers to make sense of the data and clarify the
decision process. MCDA is a relatively new and unfamiliar method in the health sciences, however,
and - even in its standard, simple deterministic form - is not always implemented well, in terms of
both the technical details of its application and the suitability of its underlying assumptions. One
mistake frequently seen in health science applications is the confounding of outcome importance with
incidence (see III.1.3.3). Even with a good understanding of the method, significant practical
difficulties remain that must be overcome before the use of MCDA for benefit-risk assessment can
become more widely adopted. A summary of some of the pitfalls to watch out for – and mistakes
made by existing studies – has been provided by Garcia-Hernandez 40.
However, if done well, the use of methods such as MCDA could bring benefits. Incorporating patient
preferences in health decision making is recognised as advantageous in theory 41 and alignment of
treatment prescriptions with patient preferences has been shown to result in greater treatment
satisfaction 42.
Many of the challenges in using MCDA to support benefit-risk assessments are statistical in nature.
As noted above, summaries of multiple clinical outcomes of multiple treatments derived from
multiple studies may need to be combined, pushing the limits of current meta-analytical techniques.
Benefit-risk is by nature a multidimensional problem, but typically meta-analyses are restricted to a
single outcome and a single treatment contrast, and extending these dimensions in a rigorous
manner is difficult – one cannot simply carry out separate analyses of the various contrasts
and outcomes, as they are linked by correlations and consistency relations. Extensions to either
multiple contrasts or multiple outcomes exist 43-47, but models that can handle both situations are
few and none are entirely satisfactory, for reasons that will be discussed at greater length in the next
chapter. Further problems arise when the evidence base is sparse or outcome definitions are not
consistent.
One notable challenge with using MCDA for healthcare decision-making is the method’s deterministic
nature. The standard version of MCDA makes no allowance for sampling error or other uncertainties
in the data, or the preference parameters, and therefore no indication of the robustness of the
conclusions. A Bayesian implementation of MCDA would go some way towards addressing this, and
demonstrating that such an analysis is possible is the main focus of this thesis. Bayesian (or other
probabilistic) modelling of outcome preferences is again not straightforward because of correlations
and consistency relations (see below) among them. Allowing for correlations among parameters is
particularly important in MCDA, because the method computes an overall score as a linear function
of the individual parameters, and this means that the presence of correlations (or equivalently,
covariances) contributes additional terms to the variance (since var(aX + bY) = a²var(X) +
b²var(Y) + 2ab·cov(X, Y), where X, Y are random variables and a, b are linear coefficients).
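As a quick numerical check of this identity, the sketch below simulates correlated normal variables (the covariance values and coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated draws to check var(aX + bY) = a^2 var(X) + b^2 var(Y) + 2ab cov(X, Y)
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
x, y = rng.multivariate_normal([0, 0], cov, size=100_000).T
a, b = 0.3, 0.7

lhs = np.var(a * x + b * y)
# bias=True matches np.var's default normalisation, so the identity is exact
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # the two quantities agree
```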
“Consistency relations” refers to relationships among the model parameters that must hold in order
for the estimates to have a coherent logical interpretation when considered as a whole. Briefly, this
means that (for example) when a parameter representing the difference between quantity A and
quantity B is added to a parameter representing the difference between quantity B and quantity C,
the result always corresponds to the difference between A and C. Consistency is a key concept
underlying the models in this thesis; subsequent chapters discuss in more detail the relevance of
consistency relations to estimating treatment effects (see II.4.2), mappings between outcomes
(II.4.6), and preferences (III.2.1.4).

The use of elicited preference values is particularly novel in the
field of benefit-risk assessment, and raises interesting questions relating to the uncertainty and
homogeneity of preferences among patient populations and other stakeholders, how to make
decisions in the face of preference heterogeneity, and whether the elicitation process introduces
further uncertainties that may impact the elicited results. In principle at least, Bayesian statistics
(with its perspective of parameters as random variables) can provide a convenient framework for
propagating preference uncertainty and examining/accounting for heterogeneity (for example using
random effects or hierarchical models). There has to date been little research into Bayesian
modelling of elicited preferences in healthcare, however.
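As a very rough sketch of what such treatment of preference heterogeneity might look like, the example below applies partial pooling to hypothetical elicited weights with an assumed common elicitation error, using a method-of-moments shortcut rather than a full Bayesian fit:

```python
import numpy as np

# Hypothetical elicited weights for one criterion from five respondents,
# each with an elicitation (within-respondent) standard error
w = np.array([0.55, 0.40, 0.70, 0.48, 0.62])
se = np.full(5, 0.08)

# Method-of-moments estimate of between-respondent variance (heterogeneity);
# this shortcut assumes a common elicitation error across respondents
tau2 = max(np.var(w, ddof=1) - se[0] ** 2, 0.0)

# Partial pooling: each respondent's weight shrinks toward the mean by an
# amount that depends on how elicitation noise compares to heterogeneity
shrink = tau2 / (tau2 + se ** 2)
pooled = shrink * w + (1 - shrink) * w.mean()
print(pooled)
```

In a full Bayesian hierarchical model the heterogeneity variance would itself receive a prior and be estimated jointly with the individual weights, but the shrinkage behaviour is the same in spirit.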
For its utility as a benefit-risk assessment tool to be established, the Bayesian MCDA approach must
be shown to be able to address the issues discussed above for real-world benefit-risk decision
problems.
I.2.2 Research question
The overall question this research addresses is:
“Can a modelling framework be developed that facilitates a fully Bayesian implementation of MCDA
for benefit-risk decision making; with parameters for clinical outcomes and associated preferences
directly informed by real-world data, and reflecting the uncertainties inherent in such data, while
respecting all relevant correlations and consistency relations?”
Here “real-world data” refers to evidence obtained from actual studies carried out in the target
patient population (or other substantially similar population), which would be considered
appropriate to inform a real regulatory benefit-risk assessment. It excludes data consisting of
assumptions or estimates made by the decision maker, or fictional or idealised data fabricated
purely to illustrate the methodology.
I.3 Methods/strategy
I.3.1 Work plan and thesis structure
The modelling work involved in this project divides into three parts, dealt with in Chapters II, III and
IV in turn. Chapters II and III deal with methods for Bayesian inference of the two types of data
needed for quantitative benefit-risk assessment using MCDA: in Chapter II, clinical data concerning
the effects of treatments; and in Chapter III, preference data concerning the relative importance of
those effects.
Specific aims and objectives are set out in each of Chapters II and III. In each case the same broad
work strategy will apply:
• Identify key modelling issues to be addressed
• Review the literature for existing methods
• Try to develop new approaches where existing methods fall short
• Test approaches on case study data
• Evaluate the results and draw conclusions
Having developed the necessary methodological tools in Chapters II and III, Chapter IV then brings
the parameter distributions together and performs the MCDA calculations using probabilistic
simulation, addressing the overall goal of creating a full Bayesian benefit-risk assessment model.
Finally, chapter V reflects on the overall findings and puts the results in context.
I.3.2 Case study
A case study based on relapsing multiple sclerosis treatments will be used throughout the thesis to
work through the methodological issues.
I.3.3 Software
The project will require complex modelling of clinical evidence and preferences, possibly from a
diversity of data sources. The uncertainty from the various data sources will be propagated through
to the final benefit-risk decision model. This means that the modelling approach will need to be
modular, multivariate and customisable.
For this reason no attempt will be made to derive closed-form posteriors at any stage of the
modelling, as this would impose tight restrictions on the model structure. Instead the approach will
be to construct complex models that simultaneously define the priors, the likelihood of the various
data inputs and the derivation of the variables that determine the overall benefit-risk balance.
MCMC techniques can then be used to sample from the joint posterior, with the result that the
uncertainty of the inputs is automatically propagated to the outputs.
Models will be specified in the BUGS language 48 and run using either WinBUGS (version 1.4.3) or
OpenBUGS (version 3.2.2). Any frequentist analyses required for comparison purposes will be run
using R (version 3.2.1).
I.4 Literature search
This section sets out the literature search strategy and results, together with an overview of the
literature in the field of quantitative benefit-risk assessment. Chapters II and III each include a
synopsis of the relevant technical literature drawn from a wider field.
I.4.1 Search strategy
The literature search strategy comprised three searches in parallel, as detailed in the following
subsections and carried out on the PubMed, Scopus and Web of Science databases. After the
searches, a screening process was carried out in which duplicate records found by more than one
database or search were discarded, and references were examined for relevance, first based on title,
then abstract, and finally the full text. Forward and backward citation tracking was also carried out
on key references in order to pick up any additional publications of interest.
I.4.1.1 Search 1: Quantitative benefit risk assessment
Purpose: to establish the current state of the art regarding applications of quantitative benefit-risk
methods that either (i) are Bayesian or otherwise probabilistic; or (ii) focus on preference elicitation
methods
Scope: Journal articles, reviews and books, no time limit.
Keywords:
ANY OF: "benefit risk", "risk benefit", "benefit and risk", "risk and benefit", "benefit harm", "harm benefit", "benefit and harm", "harm and benefit", "net benefit", "net clinical benefit"
AND ANY OF: "Bayesian", "probabilistic", "stochastic", "AHP", "Analytic Hierarchy Process", "Swing weighting", "MACBETH", "Measuring Attractiveness by a Categorical Based Evaluation Technique", "DCE", "discrete choice"
AND ANY OF: "MCDA", "MCDM", "MAUT", "multi criteria", "multiple criteria", "multi attribute", "multiple attributes", "multi outcome", "multiple outcomes", "multi endpoint", "multiple endpoint", "multivariate", "quantitative", "weighted", "utility"
I.4.1.2 Search 2: Bayesian meta-analysis
Purpose: to avoid duplicating work developing the meta-analytical techniques required for the
project.
Scope: Reviews and books, no time limit.
Keywords:
ANY OF: "meta analysis", "evidence synthesis", "indirect treatment comparison", "mixed treatment comparison"
AND ANY OF: "Bayesian", "probabilistic", "stochastic"
I.4.1.3 Search 3: Preference elicitation
Purpose: to identify any existing methods for analysing stated preference data with explicit
allowance for sampling variability, potentially from any field of study
Scope: Journal articles, reviews and books, no time limit.
Keywords:
ANY OF: "preference", "value", "utility", "weights", "judgements", "choice model", "stated preference", "conjoint analysis"
AND ANY OF: "Bayesian", "probabilistic", "regression", "stochastic"
AND ANY OF: "MCDA", "MCDM", "MAUT", "multi criteria", "multi attribute", "multiple attributes", "multi outcome", "multiple outcomes", "multi endpoint", "multiple endpoint", "AHP", "Analytic Hierarchy Process", "Swing weighting", "MACBETH", "Measuring Attractiveness by a Categorical Based Evaluation Technique", "DCE", "discrete choice"
I.4.2 Literature search flowchart
Figure 1 summarises the steps of the literature review. Searches were run on 2 June 2016.
Figure 1 – Literature search flowchart
I.4.3 Literature on quantitative benefit-risk assessment
MCDA has been used in various fields dating back at least to the 1970s; however, its use in benefit-
risk assessment of medicines is a more recent development. CIOMS (Council for International
Organisations of Medical Sciences) Working Group IV (1998) noted that it would be desirable if
regulatory decisions could be made on a firmer, more quantitative basis 49. The idea gained
momentum throughout the following decade and in 2009 two major European initiatives were
launched with the aim of evaluating the usefulness of quantitative benefit-risk assessment
methods9,10. A number of authors noted that MCDA could in principle be applied to benefit-risk
problems and/or demonstrated simple deterministic examples14,29,50.
Figure 1 (content, reconstructed as text):
• Initial search hits – Search 1: PubMed 66, Scopus 183, Web of Science 93; Search 2: PubMed 480, Scopus 404, Web of Science 366; Search 3: PubMed 201, Scopus 1029, Web of Science 1197; total before duplicates removed: 4019
• Removal of duplicates – 1389 duplicates removed; 2630 hits remaining
• Title and abstract screening – 1967 rejected; 663 hits remaining
• Full text screening – 504 rejected; 159 hits remaining
• Citation tracking – 61 added; 220 citations in the final set

Quantitative benefit-risk assessments allowing for uncertainty in the input parameters can also be
found in the literature, although these frequently apply only to specific examples with relatively
simple models or problem structures, rather than presenting a generalizable framework for
uncertainty in MCDA.
In 2005 Sutton et al used Bayesian modelling to derive distributions of the benefit-risk balance of
warfarin, an anticoagulant 16. The decision model used was known as Net Clinical Benefit (NCB) and
essentially corresponds to a special case of MCDA with binary outcomes and linear utility functions,
simplifying the statistics required. Furthermore, the case study was relatively simple with only one
benefit, one risk and two treatment options. Although this paper illustrates the value of a Bayesian
quantitative benefit-risk assessment, the net clinical benefit framework is somewhat restrictive,
limiting the applicability of this approach to other problems.
Hughes et al. used probabilistic methods to allow for uncertainty of treatment effects in benefit-risk
assessment using a “decision tree” framework that is very similar to MCDA 6. However, their
method relies on the existence of evidence from studies directly comparing each treatment of
interest, which is not always available. Caster et al. extended this approach by considering the
uncertainty of utilities given only qualitative preference data51. In both cases the approach used was
effective but not fully generalizable.
Stochastic multi-criteria acceptability analysis (SMAA) is a variation of MCDA, designed for situations
where preference weights are unknown or only partially known 30. SMAA ranks alternatives by
exploring all possible combinations of weights using MCDA and calculating how often each
alternative is chosen. SMAA has been applied to benefit-risk assessment 31, using a specialised
software package 52 to carry out a probabilistic benefit-risk assessment using Monte Carlo
simulations to allow for uncertainty in both clinical parameters and preferences.
SMAA provides a useful computational approach for obtaining results from a multi-criteria decision
model in the absence of clear preference data, but it does not provide any guidance on how the
underlying parameter distributions can be derived starting from clinical data or from elicited
preferences. In a world of evidence-based medicine, sound methods are needed for making
inferences from real data and these are absent from the standard SMAA approach.
Waddingham et al carried out a Bayesian quantitative benefit-risk assessment of natalizumab for
relapsing-remitting multiple sclerosis, using MCDA53. The Bayesian model was applied successfully
but the modelling approach was overly simplistic, lacking correlations, and required a number of ad
hoc alterations to fit the data; a more rigorous and generalisable approach is needed.
MCDA in healthcare has generated enough interest that a number of “good practice” guides have
appeared in recent years 5,54,55.
II. Bayesian synthesis of clinical evidence for benefit-risk assessment
II.1 Background, aims & objectives
II.1.1 Introduction
In medical science, there is a close alliance between decision-making, systematic reviews and meta-
analyses. These are highly related disciplines, each with its own particular emphasis but all relating
to the process of gathering, summarising, and interpreting existing evidence.
Meta-analysis, or evidence synthesis, focuses on the quantitative, statistical aspects of this process.
It is essentially a technique for combining treatment effect estimates from multiple clinical studies to
give an overall “average” estimate. Much of the modern discipline of meta-analysis was pioneered
by Gene Glass in the late twentieth century 56, but its roots in clinical research go back at least as far
as 1904 57.
The simplest and most familiar form of meta-analysis is known as pairwise meta-analysis because it
focuses on a single pair of treatments. A set of head-to-head studies involving both treatments is
identified, and some relative outcome measure (such as a difference in a continuous outcome, an
odds ratio, or a hazard ratio) is extracted from each study; the overall combined estimate can be
derived in a number of ways but essentially represents an average of the individual study estimates,
weighted by the inverse of their variances.
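The inverse-variance computation is simple enough to state in a few lines. The study-level estimates below are illustrative numbers, and the sketch is a fixed-effect model that ignores between-study heterogeneity:

```python
import numpy as np

# Hypothetical per-study effect estimates (e.g. log odds ratios) and variances
effects = np.array([0.42, 0.18, 0.55, 0.31])
variances = np.array([0.04, 0.09, 0.12, 0.06])

# Fixed-effect pooled estimate: inverse-variance weighted average
w = 1.0 / variances
pooled = np.sum(w * effects) / np.sum(w)
pooled_var = 1.0 / np.sum(w)     # variance of the pooled estimate
print(pooled, pooled_var)
```

Precise studies (small variances) dominate the average, and the pooled variance is always smaller than that of any single study.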
Researchers have found it necessary to extend or adapt this approach in various ways in order to
estimate two or more related parameters that inform a decision. This family of approaches is
sometimes referred to by the umbrella term evidence synthesis. For example, one may wish to
jointly estimate a set of probabilities that inform a decision model, or to combine parameters
estimated from distinct sets of studies 58. Such methods are often employed in health economics -
where clinical outcomes and costs, and their relationships with one another, are to be considered
jointly - and for similar reasons, would appear to be well suited to benefit-risk assessments.
It is often tempting, but dangerous, to perform “evidence synthesis” in a naïve/informal manner.
One common example of poor practice is to pool randomised clinical trials with respect to a binary
event by simply adding up, for each treatment, the patient and event numbers in each trial arm
featuring that treatment (“naïve pooling”). This method, although holding great appeal due to its
simplicity, is not to be recommended. Within-study relative effects are generally more
homogeneous than the absolute baseline level of outcome, which often varies widely between
studies in different groups of patients. Randomisation is used within trials to eliminate the risk of
confounding the effects of treatment with between-study variations in baseline outcomes 59, but
naïve pooling derives the relative effect at the between-study level, comparing
groups of patients from different trials on a pooled basis. This defeats the object of randomisation,
which is to ensure that relative effects are only derived from comparisons between groups that
share the same baseline characteristics. Ultimately the naïve pooling approach results in evidence
that is of a similar grade to data from observational studies, in that it may be biased due to
confounding between the effects of treatment and any baseline differences in patient
characteristics59. The magnitude of this bias will depend on the extent of heterogeneity in baseline
outcomes between the source studies, and may be acceptably small if the study populations are very
similar; it can be avoided altogether, however, by using a more principled method to derive the
combined effects.
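The bias is easy to demonstrate with two fabricated trials in which the true within-trial risk difference is identical (-0.10) but the baseline risks and arm sizes differ:

```python
# Two hypothetical trials with very different baseline risks but the same
# within-trial risk difference of -0.10 for the active treatment
trials = [
    # (control events, control n, active events, active n)
    (40, 100, 30, 100),    # high-risk population, balanced arms
    (30, 200, 5, 100),     # low-risk population, unbalanced arm sizes
]

# Naïve pooling: add up events and patients across trials per arm
ce = sum(t[0] for t in trials); cn = sum(t[1] for t in trials)
ae = sum(t[2] for t in trials); an = sum(t[3] for t in trials)
naive_rd = ae / an - ce / cn

# Within-study effects, which preserve the benefit of randomisation
per_trial_rd = [(t[2] / t[3]) - (t[0] / t[1]) for t in trials]
print(naive_rd, per_trial_rd)
```

Both trials agree that the treatment reduces risk by 0.10, yet naïve pooling returns roughly -0.06, because the pooled arms contain different mixes of high- and low-risk patients.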
Formal evidence synthesis methods should be designed to eliminate any such confounding
influences and obtain estimates on a scale that is suitable for comparison 60. This is arguably of
particular importance when the treatment effect estimates are to be fed into quantitative decision
models such as MCDA, as any model is only as reliable as the data it is given. However, the
complicated nature of some benefit-risk assessments can present significant obstacles to evidence
synthesis. In particular, the following factors (many of which were also discussed in the previous
chapter) can be challenging to deal with:
Comparators
There may be a need to compare two or more treatments where no direct head-to-head studies
have been carried out. A simple example of such an indirect comparison is when one wishes to
compare two drugs that have each only been clinically evaluated alongside placebo (or some other
standard treatment), not directly against each other. A naïve comparison of the absolute level of an
outcome in the active arms of both studies will confound the difference between the treatments’
effects with the difference in characteristics between the two study populations. In this simple case,
it is straightforward to adjust the estimate to eliminate this confounding by subtracting the
difference in outcome between the untreated populations, i.e. the two placebo arms. This is
equivalent to comparing the relative effect of treatment (active vs placebo) in the two studies.
Generalising this technique to more complex datasets, and combining direct and indirect evidence to
obtain a coherent set of estimated effects for three or more treatments while avoiding confounding,
requires a more technical level of analysis.
Network meta-analysis (NMA) has established itself as a powerful evidence synthesis technique for
such situations 61-63. The method is a generalisation of standard meta-analysis that accommodates
both direct and indirect comparisons. This can be represented by means of a network structure,
where each treatment is represented by a node and lines between nodes represent head-to-head
studies, as shown in the examples below. The use of network diagrams such as these is widespread
among applications of network meta-analysis 64.
Figure 2 – Network diagram: pairwise meta-analysis
Pairwise meta-analysis is concerned with a single contrast AB (that is, a comparison between two
treatments A and B). All the evidence is drawn from studies directly comparing the two.
Figure 3 – Network diagram: simple network meta-analysis (i). The numbers given are (estimate, variance) for the treatment contrasts in the direction indicated by the arrows.
When more treatments are involved, there may not be direct evidence on all the treatment
contrasts. Network meta-analysis provides a solution by following the chain of evidence around the
network. In the simple case shown in Figure 3, separate trials provide information on the contrasts
AB and BC; the missing contrast AC can be estimated as the sum AB + BC 60 under the assumption
that the contrasts are expressed on an additive scale, perhaps after a transformation (for example,
taking the logarithm of a contrast expressed on a multiplicative scale). Since AB and BC are
estimated from different trials, and hence independent, var(AC) = var(AB + BC) = var (AB) + var(BC),
and so the indirect comparison has greater uncertainty than the direct evidence – and the greater
the number of steps in the chain, the greater the additional uncertainty. Here, the indirect estimate
of AC has expectation 1+2=3 and variance 1+2=3. Note that even if treatment B is not of direct
interest, its inclusion in the dataset has allowed an estimate of the contrast AC. Furthermore, since
AB and BC are obtained within randomised studies, the estimate AC = AB + BC is itself free of most of
the confounding and bias that may occur in non-randomised comparisons 59. There is an
assumption, however, that a treatment contrast measured in one study is representative of the same
treatment effect in the other study population(s). In other words, the populations are homogeneous
with regard to the relative treatment effects; there are no differences between the populations that
act as effect modifiers.
Figure 4 – Network diagram: simple network meta-analysis (ii). The numbers given are (estimate, standard deviation) for the treatment contrasts in the direction indicated by the arrows.
Sometimes both direct and indirect evidence contributes to a contrast estimate. In the example
shown in Figure 4 there is now information on the previously missing contrast AC (from a head-to-
head trial of A versus C). Attempting to combine the evidence to get a clear picture of the relative
performance of the three treatments is not straightforward since the evidence in some networks
(specifically, networks with closed loops such as that formed by the three treatments in Figure 4)
may show inconsistency. Network meta-analysis aims to resolve any inconsistency that may occur
due to random chance and produce a set of estimates that are entirely consistent with one another.
More systematic inconsistency between the treatment contrasts indicates uneven distribution of
effect modifiers between studies and invalidates the model’s assumptions 65.
Estimating each treatment contrast from the direct evidence alone (in this case, simply using the
numbers shown in Figure 4) is not satisfactory when there is any inconsistency between those
contrasts. It is axiomatic that relative treatment effects should be transitive (XY + YZ = XZ for any X,
Y, Z) and that the (additive) effect of a treatment relative to itself is zero. Consequently it must be
the case that AB + BC + CA = AA = 0; in this example however, using the direct estimates gives AB +
BC + CA = 1 + 2 - 2 = 1. In a similar vein, we can see that for any given contrast there is inconsistency
between the direct and indirect estimates; for AC, say, the direct estimate is 2 and the indirect
estimate is obtained as AB + BC, giving 3.
One might attempt to perform a meta-analysis on each treatment comparison independently,
deriving for each contrast an overall estimate that corresponds to an inverse-variance-weighted
average of the direct and indirect evidence. For example, the combined estimate for AC in the
example in Figure 4 would be 3/4 × (1 × 2 + 1/3 × 3) = 2.25, where the weights 1 and 1/3 are the
inverse variances of the direct and indirect estimates and 3/4 is the normalising factor.
Unfortunately, however, following this
approach for all treatment contrasts in the network will also in general produce treatment effect
estimates that are not consistent with one another (verifying this for the example in Figure 4 is left
as an exercise). The need for consistency means there is mutual dependence among the treatment
effects that is not respected if we estimate them independently of one another.
Network meta-analysis solves this problem by building consistency into the model structure via one
or more consistency equations. The consistency equation for the example in Figure 4 is AC = AB +
BC.
The key insight is that the model does not need an independent parameter for every treatment
contrast, but only for a subset of them (the basic parameters). The consistency equations
provide a means of calculating the remaining contrasts (the functional parameters) from the basic
parameters 66. Using the Figure 4 example, it is only necessary to introduce a model parameter for
two of the three contrasts AB, BC and AC; these two are the basic parameters, and the final
(functional) treatment effect parameter is evaluated via the consistency equation. This creates the
necessary dependence among the full set of treatment contrasts. The statistical model is
constructed based on only the basic parameters, which are independent.
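A miniature fixed-effect version of this construction can be written down directly for the Figure 4 network, treating d_AB and d_AC as the basic parameters. The variances below are those implied by the worked example in the text, and the whole thing is a sketch rather than a full network meta-analysis, which would normally also model heterogeneity:

```python
import numpy as np

# Direct evidence from the Figure 4 network: contrasts AB = 1 (variance 1),
# BC = 2 (variance 2), AC = 2 (variance 1), as implied by the worked example
y = np.array([1.0, 2.0, 2.0])
var = np.array([1.0, 2.0, 1.0])

# Basic parameters: d_AB and d_AC (effects relative to reference treatment A).
# Each row maps an observed contrast onto the basic parameters; BC is the
# functional parameter d_AC - d_AB given by the consistency equation.
X = np.array([
    [1.0, 0.0],    # AB
    [-1.0, 1.0],   # BC = AC - AB
    [0.0, 1.0],    # AC
])

# Weighted least squares: a fixed-effect network meta-analysis in miniature
W = np.diag(1.0 / var)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
d_AB, d_AC = beta
d_BC = d_AC - d_AB
print(d_AB, d_BC, d_AC)
```

This yields d_AB = 0.75, d_BC = 1.5 and d_AC = 2.25 — a set that, unlike the independent pairwise estimates, satisfies AB + BC = AC exactly.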
Any network structure can be used so long as it is connected 67 (that is, any two nodes in the
network diagram are connected by a chain of one or more studies). Studies involving additional
treatments (beyond those directly relevant to the decision) can sometimes be introduced in order to
connect a disconnected network. For example, a network consisting of the direct comparisons AB,
BC, AC and DE is disconnected, but introducing a new treatment F allows studies of BF and DF to be
included, resulting in a connected network (Figure 5).
Figure 5 – A disconnected network involving treatments A, B, C, D and E (top) is made connected by the addition of treatment F (bottom).
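Checking connectedness is a standard graph traversal. The sketch below reproduces the Figure 5 example, with treatment labels as in the figure; the function is illustrative, not taken from any NMA package:

```python
# Minimal connectivity check for an evidence network (BFS over treatments)
from collections import deque

def connected(treatments, studies):
    """True if every treatment is reachable from every other via studies."""
    adj = {t: set() for t in treatments}
    for a, b in studies:
        adj[a].add(b)
        adj[b].add(a)
    seen, queue = {treatments[0]}, deque([treatments[0]])
    while queue:
        for nxt in adj[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(treatments)

nodes = ["A", "B", "C", "D", "E"]
print(connected(nodes, [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]))  # False
print(connected(nodes + ["F"], [("A", "B"), ("B", "C"), ("A", "C"),
                                ("D", "E"), ("B", "F"), ("D", "F")]))      # True
```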
The set of basic parameters is not uniquely determined – in principle the modeller has some choice
over which parameters are to be treated as basic and which as functional, subject to certain
constraints 66. In practice the basic parameters are usually chosen to be the set of treatment effects
relative to some chosen reference treatment (often no treatment, placebo or standard of care). This
results in the consistency equations for an arbitrary evidence network taking the form XY = AY – AX
where A is the reference treatment and X, Y are any other treatments.
Constructing the model this way ensures that a consistent set of treatment effects is obtained. The
data themselves, however, may be inconsistent. Any inconsistency in the evidence will ultimately be
reflected in the variance of the treatment effect estimates. Excessive inconsistency in the evidence
network (exceeding that which may occur due to random sampling error) indicates that the assumed
consistency equations do not hold, suggesting that studies are heterogeneous with regard to effect
modifiers and casting doubt on the validity of the analysis. Network meta-analysis thus provides a
principled framework for comparing a set of several treatments based on summary data from clinical
trials (no data at the individual participant level is required), and therefore provides an ideal starting
point for the models to be explored in this chapter; however, models will be needed that go beyond
standard network meta-analyses.
Few source studies
Unlike typical meta-analyses, many benefit-risk assessments (particularly those carried out on fairly
new drugs) must rely on a small number of studies for each treatment. Chance imbalances in the
distribution of any effect modifiers are more likely when the number of studies is low, potentially
increasing heterogeneity and inconsistency. It is all the more important, therefore, to ensure the
uncertainty of the treatment effect estimates is allowed for in the decision process rather than
relying on deterministic methods.
Multiple outcomes
Benefit-risk assessment is by definition a multivariate problem, with at least one benefit and one risk
to consider. Extending network meta-analysis into the multivariate domain has rarely been
attempted, however. Most published NMAs include only one key outcome of treatment, or perform
separate analyses for several outcomes independently, ignoring any correlations between the
outcomes. It has been noted that this approach is not particularly satisfactory, as correlations are
almost certain to exist and ignoring them will tend to understate the uncertainty in model outputs 68.
Correlations between outcomes can occur at both the within-study and between-study levels. A
within-study correlation means that the outcomes exhibit some mutual dependency as their values
vary from patient to patient in a study; a between-study correlation indicates mutual dependency in
the average values of the outcomes from study to study.
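The distinction between the two levels can be illustrated by simulation (all numbers hypothetical): study-level true means for two outcomes are drawn with a between-study correlation, and the observed arm means additionally inherit a diluted share of the within-study correlation.

```python
import numpy as np

rng = np.random.default_rng(1)

rho_between, rho_within = 0.8, 0.5
n_studies, n_patients = 200, 50

# Between-study level: the pair of true study means is correlated.
cov_b = [[1.0, rho_between], [rho_between, 1.0]]
study_means = rng.multivariate_normal([0.0, 0.0], cov_b, size=n_studies)

# Within-study level: patients' outcome pairs vary around the study means
# with their own correlation; observed study means inherit a small share.
cov_w = [[1.0, rho_within], [rho_within, 1.0]]
observed = np.array([
    m + rng.multivariate_normal([0.0, 0.0], cov_w, size=n_patients).mean(axis=0)
    for m in study_means
])
# Correlation of observed study means: close to rho_between here, since
# the within-study contribution shrinks with the number of patients.
print(np.corrcoef(observed.T)[0, 1])
```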
Summary data
Although the availability of individual patient data (IPD) from clinical studies is improving, it cannot
be relied upon, especially for older studies. Furthermore, analyses based on IPD from multiple
studies may be more complex and time-consuming than a regulatory benefit-risk assessment would
allow. This thesis therefore concentrates on clinical data that is summarised by treatment. Clinical
IPD is considered beyond scope; however, where it is available it may provide an alternative
framework for dealing with some of the same issues (e.g. within-study correlations, heterogeneity).
Limited, sparse or heterogeneously defined data
One practical issue with performing a multivariate evidence synthesis is that not all of the source
studies may have data on every outcome of interest. Furthermore, for some outcomes there may
be several alternative definitions adopted by different studies; this can be extremely frustrating for
reviewers who may end up with a piecemeal collection of outcomes, clearly clinically related to one
another yet too different to be reliably pooled for a meta-analysis.
Datasets with missing outcomes for some treatments, and/or with outcome definitions that are not
consistent from one study to the next, are hereinafter referred to as patchy.
It is not difficult to see that the ability to model a direct relationship between outcomes within a
meta-analysis could be invaluable in terms of maximising the useful information that can be gleaned
from patchy data, and even imputing missing outcomes for certain treatments. Developing such a
model is therefore the focus of this chapter.
II.1.2 Aim, objectives, scope
The overall aim of this part of the project is to establish a generalised method for Bayesian
multivariate evidence synthesis that is designed with the ability to exploit relationships between
outcomes in patchy datasets, so that the number of treatment effects that can be estimated is
maximised and all treatments can be compared in respect of all outcomes, such as may be required
in a benefit-risk assessment.
The methodology will be based on network meta-analysis. It may sometimes be the case that a
benefit-risk decision only relies on a single study, or a pairwise univariate or multivariate meta-
analysis, but there is no need to dwell on these methods here as they are already well established
(and besides, they are special restricted cases of the more general approach set out here). Instead
the chapter works within the framework of network meta-analysis, concentrating specifically on
issues relating to multiple outcomes and data sparseness which may be present in real-world
treatment decisions. Insofar as these issues are relevant to pairwise meta-analysis, or other
restricted special cases of network meta-analysis, the discussion and results herein will also apply.
It is assumed that the data available consists of arm-level aggregate summaries of randomised trials
unless otherwise stated. In principle observational data could be used, provided that one recognises
and accepts the elevated risk of biases in such studies, including selection bias and attrition bias,
and bears in mind that a model is only as good as the data supplied to it.
Specific objectives to be addressed are:
• to find or create a working Bayesian multivariate network meta-analysis model;
• to investigate (and if necessary, develop) the model’s ability to provide the types of
parameter estimates required in benefit-risk assessment such as missing treatment-
outcome combinations and outcomes on absolute scales;
• to develop a fully Bayesian interface between the evidence synthesis model and MCDA
models.
Methods for evaluating inconsistency in evidence networks will not be directly addressed in this
thesis. Nevertheless it remains important to verify that the evidence is consistent when carrying
out network meta-analyses. Methods for assessing inconsistency in univariate networks have
been discussed at some length in the literature 65,66,69,70; future research should aim to extend
these approaches to the multivariate models in this chapter.
II.1.3 Synopsis of literature
II.1.3.1 Evidence synthesis for benefit-risk assessment
The few existing attempts at evidence synthesis for quantitative benefit-risk assessments have
tended to use either very simple datasets and/or pragmatic approaches that do not strictly follow
the statistical principles underlying evidence synthesis.
A quantitative benefit-risk assessment of treatments for depression 31 used probabilistic simulations
of multiple clinical outcomes but this was limited to data from a single trial, and with no allowance
for outcome correlations, which are likely to exist and may be influential upon the results 40,68. Later
benefit-risk assessments of statins 54 and antidepressants 32 by some of the same authors used
network meta-analysis on multiple outcomes but did not allow for correlations or have to deal with
any gaps in the data.
Caster et al carried out a benefit-risk assessment of methylprednisolone in multiple sclerosis 71, but
obtained their data by analysing individual treatment arms from multiple studies, rather than
focusing on the contrasts between arms within studies. In general such an approach risks
confounding the effects of treatment with characteristics of the study populations, as it sidesteps
randomisation, and thus exposes trial data to many of the same problems as observational data.
Contrast-based NMA models, which calculate treatment effects within studies, respect
randomisation and are therefore preferred.
Among the reports and publications on benefit-risk by the PROTECT initiative 10 were a methodology
review that identified NMA as a potentially useful tool for benefit-risk assessment 14, and two
applications using contrast-based NMAs to synthesise multiple outcomes for benefit-risk assessment
purposes 53,72. The models, however, included various ad hoc approximations and modifications in
order to patch together the available data at the expense of generalisability and rigour. Again, there
was no allowance for correlations between outcomes.
Beyond the field of benefit-risk assessment, however, a number of principled Bayesian evidence
synthesis methods have been developed, and recently extensions into the multivariate domain have
appeared.
II.1.3.2 Network meta-analysis
Bayesian meta-analysis methods gathered momentum around the turn of the millennium as Gibbs
sampling made such models more practical. Smith et al developed a generalised Bayesian random
effects model for pairwise meta-analysis of binary outcomes based on the log odds ratio 73. Warn et
al later extended this work 74, developing analogous models based on the relative risk and/or risk
difference.
The concept of network meta-analysis, making use of indirect comparisons between treatments as
well as direct randomised trial evidence, also emerged during this period 59,75,76, and its potential was
quickly recognised. An early version of NMA was proposed by Lumley 77 but arguably the most
successful framework is that of Lu and Ades 60 who showed how the Bayesian meta-analysis models
of Smith et al 73 could be extended to indirect comparisons. This was later developed into a highly
influential canonical framework 78.
Interest in the technique has since accelerated, with numerous applications 61 and adaptations 79,80.
It has quickly gained acceptance as an important tool for researchers and modellers, features in
high-profile systematic reviews 81 and is covered in regulatory guidance 82.
II.1.3.3 Multivariate meta-analysis and network meta-analysis
Bayesian bivariate or multivariate (pairwise) meta-analysis has been proposed on a number of
occasions 83-85, and recently some multivariate network meta-analysis models have appeared 43-45.
Univariate network meta-analysis and (univariate or multivariate) pairwise meta-analysis can be
seen as special cases of multivariate network meta-analysis (with one outcome and two treatments
respectively), as shown in the Venn diagram in Figure 6.
Figure 6 – Venn diagram illustrating the relationships between various types of meta-analysis model. Multivariate network meta-analysis is the most generalised model, with the other types of model corresponding to special cases.
Many of these models allow for correlations among outcomes, but only a few models in the
literature define any mappings or other structural relationships between outcomes (“Models with
linked outcomes”, Figure 6). The models that do not link outcomes in this way can “borrow
strength” in the sense of reducing the posterior variance of the treatment effects, but they cannot
impute unreported treatment-outcome combinations in patchy datasets. To do this one must also
introduce structural relationships between outcomes into the model – in other words, to provide
equations that define explicit mathematical connections between the outcomes.
One approach that can be used is to specify structural relationships that can logically be seen to
follow between specific outcomes in a model, perhaps given some simple assumptions. For
instance, a one-year survival probability p₁ and a two-year survival probability p₂ are clearly related
and can be explicitly linked by the equation p₂ = p₁² assuming a constant hazard. Similar context-
specific relationships have previously been used in multivariate NMAs 86-88.
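A quick numerical check of this constant-hazard link (values hypothetical):

```python
import math

# If survival is S(t) = exp(-lambda * t), then p2 = S(2) = S(1)**2 = p1**2.
p1 = 0.9                 # hypothetical one-year survival probability
lam = -math.log(p1)      # implied constant hazard
p2 = math.exp(-2 * lam)  # two-year survival under the same hazard
print(abs(p2 - p1 ** 2) < 1e-12)  # True
```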
One particular model for pairwise meta-analysis with structural links between outcomes has
appeared in two published applications 46,47. In each of these datasets the outcomes all represent
the same underlying clinical concept, but are measured using different test instruments which
express the results on different scales. As such the outcomes are assumed to be in strict linear
correspondence with one another at the between-study level (the relationship may be less perfect
at the within-study level due to measurement error) 46,47. In other words, the model links outcomes
with linear mappings between the study-specific treatment effects.
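A small simulation (all values hypothetical) illustrates such a linear mapping between the study-specific effects of two outcomes, blurred by within-study measurement error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: outcome 2 measures the same clinical concept as
# outcome 1 on a different scale, so the study-specific treatment effects
# are assumed to satisfy delta2 = a + b * delta1 at the between-study level.
a, b = 0.2, 1.5
delta1 = rng.normal(0.5, 0.1, size=30)  # simulated study-specific effects
delta2 = a + b * delta1                 # strict linear correspondence

# Within studies, measurement error blurs the relationship.
obs2 = delta2 + rng.normal(0.0, 0.05, size=30)
slope, intercept = np.polyfit(delta1, obs2, 1)
print(round(slope, 1), round(intercept, 1))  # roughly recovers b and a
```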
Linking outcomes to one another via a known mathematical relationship may certainly be useful in
multivariate evidence synthesis, but this approach relies on the form of that relationship being
straightforward to identify a priori. On many occasions, however, it may seem likely that outcomes
are related to one another but the precise nature of the relationship may not be clear a priori.
One possible approach in such situations might be to adapt the linear mapping model 46,47 so that
the mappings apply to the population average treatment effect parameters. Although the
assumption of a proportional linear relationship between outcomes is a strong one, applying the
mappings at the population-average level allows the outcomes to be less than perfectly correlated at
the between-study level in a random effects model, thereby potentially allowing the mappings to be
used for outcomes that are more loosely related. Allowing the mappings to vary between
treatments may provide some additional “wiggle room” and permit such a model to be used where
outcomes do not always occur in exactly the same proportions. A similar use of mappings has been
proposed before and applied to Bayesian meta-analysis of HIV with a view to establishing surrogacy
relationships between outcomes89. The model was limited to two treatments and fixed effects,
however, and would need to be extended beyond these limitations to be useful as a general tool for
benefit-risk evidence synthesis.
Figure 7 – Pictorial representation of the types of meta-analysis model discussed in this section. Top row: pairwise models for comparing two treatments only. Bottom row: models for use in connected evidence networks (an example network is shown). Each node in the evidence network represents a treatment and lines connecting nodes indicate the existence of head-to-head trial evidence. Different coloured circles represent different outcomes (the multivariate models are here illustrated with three outcomes) and white lines represent structural links between the outcomes for each treatment.
In all current multivariate NMA models the treatment effect parameters in each trial are expressed
relative to a study-specific control treatment – in other words, the baseline for the relative effect
parameters may vary from study to study depending on which treatments are present. This results
in the variance of the treatment outcome in the baseline arm (when the baseline is an active
treatment) being lower than in the other arms. Arguably this may not be a substantial problem in
most circumstances since it concerns only the prior variance, which will only have a significant
impact on the results when actual data is scarce; and also since the main target of inference is the
relative effects, while the baseline outcome is merely a nuisance parameter. Nevertheless it may
still be possible to avoid the issue with an alternative parameterisation. Others have previously
identified these problems with the usual parameterisation 45 but as a solution proposed a model
with a baseline that is not always identifiable from the data, with no allowance for within-study
correlations, and which appears not to converge when applied to the RRMS dataset. The same
authors also develop an alternative “arm-based” parameterisation where the absolute treatment
outcomes in each study arm are modelled directly; however, this type of model is not favoured here
as it risks confounding the treatment effects with differences in trial sample characteristics 59.
Existing multivariate NMA models also have many important practical limitations, with every model
identified in the literature having at least one of the following restrictions:
• limits on the number/types of treatments and outcomes that can be incorporated;
• model code that is tailored to the dimensions of a particular dataset;
• requirement for the user to specify unwieldy covariance arrays in the data; and/or
• failure to allow for correlations that must or may exist between variables.
Chapter II.2
49
II.2 High level model structure
The evidence synthesis strategy will use the overall model structure depicted within the blue area of
Figure 8, which also shows how this fits together with the other modelling components to be
described in the next chapter (shown in faded tones).
The two key Bayesian models to be covered in this chapter are the treatment effects module, which
extracts and aggregates the relative treatment contrasts from a set of randomised controlled trials,
and the population calibration module, which applies those contrasts to the overall distribution of
outcomes observed across the population in a selected set of studies (perhaps some or all of the
same trials used to estimate the treatment effects, and/or alternative data sources). This strategy
allows outcomes to be estimated on the absolute scale while also avoiding confounding between
treatment effects and population characteristics, and is often used in health economic models
where the absolute level of outcomes is of key importance 90,91.
Figure 8 - High-level model structure, focusing on clinical evidence synthesis.
Chapter II.3
50
II.3 Data
II.3.1 Data structure
The models are designed to work with any dataset consisting of a set of randomised controlled trials
with any number of treatment arms and outcome measures.
II.3.1.1 Network-level constants
Let {1, …, NT} be a set of treatments, where t = 1 is the reference treatment relative to which all
the other treatments’ effects are expressed (usually placebo). Let {1, …, NO} be a set of outcomes,
and {1, …, NS} a set of studies.
II.3.1.2 Study-level constants
For each study i ∈ {1, …, NS} the following constants are taken to be known:
NAi ∈ {1, 2, …}, the number of treatment groups/arms
NOi ∈ {1, …, NO}, the number of outcomes reported within study i
The treatment arms k ∈ {1, …, NAi} and outcomes j ∈ {1, …, NOi} within study i are ordered such
that tik ∈ {1, …, NT} refers to the treatment in the kth arm and ωij ∈ {1, …, NO} refers to the jth
outcome.
For each k ∈ {1, …, NAi}, nik ∈ {1, 2, …} refers to the number of patients in the kth treatment arm.
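The study-level constants above can be mirrored in a small, hypothetical data container (an illustrative sketch only; the actual model inputs are the BUGS data files in Appendix B):

```python
from dataclasses import dataclass

# Illustrative container mirroring the notation above (names hypothetical).
@dataclass
class Study:
    treatments: list  # t_ik: treatment index for each arm k
    outcomes: list    # omega_ij: outcome index for each reported outcome j
    n: list           # n_ik: number of patients in each arm

    @property
    def NA(self):     # number of treatment arms
        return len(self.treatments)

    @property
    def NO(self):     # number of outcomes reported in this study
        return len(self.outcomes)

study = Study(treatments=[1, 2], outcomes=[1, 3], n=[120, 118])
print(study.NA, study.NO)  # 2 2
```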
II.3.1.3 Arm/outcome-level data
In all of the models that follow, each outcome 𝑗 within each arm 𝑘 of each study 𝑖 is assumed to
have one of the following likelihoods based on well-known distributions:
• Normal likelihood with the observed within-arm mean and sample variance supplied as
given data and denoted by 𝑦𝑖𝑘𝑗 and 𝑣𝑖𝑘𝑗 respectively. (In practice, the variances may
instead be supplied as standard errors or standard deviations provided that appropriate
transformations are applied to the formulae/code given herein. It is straightforward to
convert between these measures given the patient numbers.) Multivariate normal
distributions will be used to allow for correlations between outcomes, with the correlation
coefficients either estimated in the model or supplied as additional data.
• Binomial likelihood with the observed within-arm number of events (out of 𝑛𝑖𝑘) supplied as
given data and denoted by 𝑦𝑖𝑘𝑗
• Poisson likelihood with the observed within-arm number of events and person-years of
exposure supplied as given data and denoted by 𝑦𝑖𝑘𝑗 and 𝑐𝑖𝑘𝑗 respectively
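As a sketch, the three likelihood choices can be written as arm-level log-likelihood contributions (univariate case; function names are illustrative, not thesis code):

```python
import math

def loglik_normal(y, v, mu):
    """Observed arm mean y with sampling variance v around true mean mu."""
    return -0.5 * (math.log(2 * math.pi * v) + (y - mu) ** 2 / v)

def loglik_binomial(y, n, p):
    """y events out of n patients, with event probability p."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log(1 - p))

def loglik_poisson(y, c, rate):
    """y events over c person-years of exposure at the given event rate."""
    mu = rate * c
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# The likelihood favours parameter values near the observed data:
print(loglik_binomial(30, 100, 0.3) > loglik_binomial(30, 100, 0.5))  # True
```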
Whichever likelihood is used, the mean of the distribution (and/or any higher-level model
parameters on which the mean depends) is the principal unknown quantity regarding which
inferences are to be made. These parameters relate to the underlying population mean for each
outcome and can be estimated via Bayes’ Theorem.
In formal terms, if 𝝋 refers to the set of treatment effect parameters that are the target for
inference, and 𝒚 is the vector of observed values 𝑦𝑖𝑘𝑗, then the likelihood of the data conditional on
𝝋 is 𝑃(𝒚|𝝋) and, by Bayes’ Theorem, the joint posterior distribution of 𝝋 is
𝑃(𝝋|𝒚) ∝ 𝑃(𝒚|𝝋)𝑝(𝝋)
where 𝑝(𝝋) is the joint prior distribution.
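For a single scalar effect this posterior can be sketched numerically on a grid; the Normal prior, observed effect and standard error below are all hypothetical:

```python
import numpy as np

# Grid sketch of P(phi | y) proportional to P(y | phi) p(phi) for a single
# normal-mean treatment effect phi.
phi = np.linspace(-3, 3, 601)
prior = np.exp(-0.5 * phi ** 2)             # Normal(0, 1) prior, unnormalised
y, se = 1.2, 0.5                            # hypothetical observed effect and SE
lik = np.exp(-0.5 * ((y - phi) / se) ** 2)  # Normal likelihood
post = prior * lik
post /= post.sum() * (phi[1] - phi[0])      # normalise on the grid
print(phi[np.argmax(post)])                 # posterior mode, shrunk toward the prior mean
```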
II.3.2 Dataset: Relapsing-remitting multiple sclerosis
Multiple sclerosis (MS) is a disease of the immune system, characterised by aggressive immune
action against the myelin insulation of the body’s own neurons. The resulting nerve damage
can cause a variety of physical, sensory and cognitive symptoms 92.
The most common type of MS is relapsing-remitting MS (RRMS). RRMS patients suffer periodic
symptomatic attacks (relapses) together with a more general trend of worsening disability
over time. No cure for RRMS exists, but there are several disease-modifying therapies
that vary in their effectiveness in reducing the frequency of relapses and delaying clinical disease
progression, and in the nature and frequency of their side effects. For many years the standard first-
line treatments were injectable drugs, but recently a number of oral drugs have appeared on the
market that show potential as first-line therapies, as they can be easily self-administered and are
reasonably well tolerated: dimethyl fumarate, fingolimod, teriflunomide and laquinimod. Many
patients on the older injectable therapies are expected to switch to one of these new drugs 93.
However, coming to firm conclusions as to the relative merits of these treatments is hindered by a
lack of direct trial evidence.
A recent Cochrane review of RRMS treatments 81 presented a network meta-analysis of dimethyl
fumarate, fingolimod, teriflunomide and laquinimod together with a number of other RRMS
treatments (generally these were either older treatments, or drugs with more safety concerns that
are reserved for more aggressive disease). The review was very thorough in its identification and
evaluation of trials for inclusion but was limited to two efficacy outcomes (the
patients avoiding relapse and disability progression) and one safety/acceptability outcome (the
proportion of patients adhering to treatment), and no attempt was made to allow for any
correlations between these outcomes (although the existence of correlations seems likely).
Furthermore, there were different definitions of the efficacy outcomes among the source studies;
where studies provided two definitions, either only one version was extracted or both were analysed
in separate NMAs. The dataset used in this chapter is based on the studies in the Cochrane review,
but uses an expanded set of outcomes. The treatment options are also restricted here to the
established first-line treatments and second-generation oral drugs specified below, and only at the
usual prescribed dosages. In total, 16 studies provided data 94-109, covering 8 active treatments and
placebo.
II.3.2.1 Treatments
Treatment regimens are defined in accordance with the substances, dosages, administration routes
and frequencies shown in Table 4; studies or study arms not meeting these definitions are excluded
from the dataset (see II.3.2.3 for details of the exclusions). Dosages were selected in accordance
with prescribing guidelines.
Table 4 – Treatments in the RRMS case study.

| Abbreviation | Substance | Route of administration | Dose & frequency |
| PL | Placebo | As appropriate to each study | As appropriate to each study |
| DF | Dimethyl fumarate | Oral | 240 mg 2x daily |
| FM | Fingolimod | Oral | 500 µg 1x daily |
| GA | Glatiramer acetate | Subcutaneous injection | 20 mg 1x daily |
| IA (IM) | Interferon beta-1a | Intramuscular injection | 30 µg 1x weekly |
| IA (SC) | Interferon beta-1a | Subcutaneous injection | 44 µg 3x weekly |
| IBB | Interferon beta-1b | Subcutaneous injection | 250 µg 1x every 2 days |
| LQ | Laquinimod | Oral | 600 µg 1x daily |
| TF | Teriflunomide | Oral | 14 mg 1x daily |
Figure 9 is a network diagram showing where the studies in the dataset provide direct evidence
comparing the treatments above. A line between two treatments indicates the existence of a head-
to-head trial comparing them; the line’s thickness is proportional to the number of such trials (in this
case the maximum is three head-to-head trials, between placebo and glatiramer acetate). Not all
studies report all outcomes, however, so for any given outcome there may be fewer links in the
network than are shown here. A network diagram for each outcome is provided in Appendix A.1.
Figure 9 – Network diagram for the RRMS case study (all outcomes combined). The thickness of the links is proportional to the number of studies directly comparing the linked treatments.
II.3.2.2 Outcomes
Definitions of all outcomes used are given below; study data must meet these definitions for
inclusion. The time horizon for all outcomes is 24 months. Only a small number of studies provide
outcomes at other time periods, e.g. 12 or 36 months, and rather than include these it was considered
appropriate to focus on a single universal time horizon in this instance.
1. Annualised relapse rate (ARR): The mean number of relapses per subject per year, where a
relapse is defined as a new episode of significantly worsened neurological symptoms not
attributable to any disease other than multiple sclerosis and separated from previous
relapses by at least 30 days. Some studies required relapses to persist for at least 24 hours,
and other studies 48 hours; data based on either definition was accepted.
2. Relapse-free proportion (RFP): The proportion of subjects without relapses, where a relapse
is defined as above.
3. Proportion experiencing disability progression, confirmed 3 months later (DP3): The proportion of
subjects avoiding disability progression, defined as a 1-point increase in the Expanded
Disability Status Scale (EDSS), confirmed on two occasions three months apart. Slight
variations on this definition used by some studies were accepted, whereby the required
EDSS increase is 0.5 if the starting value is above 5 or 5.5 and/or is 1.5 if the starting EDSS is
0.
4. Proportion experiencing disability progression, confirmed 6 months later (DP6): The
proportion of subjects avoiding disability progression, defined as above but confirmed on
two occasions six months apart.
5. Proportion with ALT above upper limit (ALT1): the proportion of subjects with alanine
aminotransferase levels above the upper limit of the normal range, as revealed by a blood
test at any point within the follow-up period.
6. Proportion with ALT above 3x upper limit (ALT3): the proportion of subjects with alanine
aminotransferase levels above 3x the upper limit of the normal range, as revealed by a blood
test at any point within the follow-up period.
7. Proportion with ALT above 5x upper limit (ALT5): the proportion of subjects with alanine
aminotransferase levels above 5x the upper limit of the normal range, as revealed by a blood
test at any point within the follow-up period.
8. Proportion with serious gastrointestinal disorders (SGI): the proportion of subjects
experiencing, at least once during the follow-up period, any serious adverse event classed as
gastrointestinal or one of the following listed serious adverse events: diarrhoea, nausea,
upper abdominal pain, abdominal pain, gastritis, gastroenteritis, vomiting, abdominal
discomfort, appendicitis.
9. Proportion with serious bradycardia (SBC): the proportion of subjects experiencing at least
one serious adverse event classed as bradycardia during the follow-up period.
10. Proportion with macular edema (MED): the proportion of subjects experiencing at least one
serious adverse event classed as macular edema at any point within the follow-up period.
Outcomes 1-4 are the most commonly encountered measures of relapse and disability progression,
the key indicators of efficacy that are assessed in all RRMS clinical trials. It is not practical however
to adopt a comprehensive set of safety outcomes here for all of the treatments in the dataset, as
this would make the chapter rather unwieldy and difficult to digest. The safety outcomes (5-10)
have therefore been selected with a view to illustrating the methodology rather than making an
exhaustive assessment of the safety profile. The selection aims to include some adverse events
(outcomes 5-8) that occur on multiple treatments, some that have only been observed on one
treatment (outcomes 9-10), and some that are clearly closely related to one another (outcomes 5-7).
From a clinical perspective the selection may appear somewhat arbitrary, with some treatments’
safety profiles better represented than others, and comparing the safety of the featured treatments
purely on the basis of these illustrative results is not recommended.
Figure 10 is a hierarchical diagram showing the outcomes and the criteria they represent.
Figure 10 - Outcomes for the RRMS case study. Blue cells are the outcomes in the dataset; green cells are decision criteria (i.e. specific benefits and risks) that can be measured by the outcomes below them, and yellow cells represent the broad grouping into benefits and risks. This hierarchical structure will be exploited in some of the models in this chapter.

Treatment effects
• Benefits (efficacy)
  o Reduction in relapses: relapse rate; relapse-free proportion
  o Slowing of disability progression: proportion progressing, confirmed 3 months later; proportion progressing, confirmed 6 months later
• Risks (safety)
  o ALT elevation: ALT > ULN; ALT > 3 x ULN; ALT > 5 x ULN
  o Gastrointestinal disorders: proportion with serious gastrointestinal events
  o Cardiac disorders: proportion with serious bradycardia
  o Eye disorders: proportion with macular edema
II.3.2.3 Source studies
Table 5 – Published trial reports providing data to the RRMS case study. The outcomes reported by each study (among ARR, RFP, DP3, DP6, ALT1, ALT3, ALT5, SGI, SBC and MED) are tabulated in Appendix A.

| Name & publication year | Number of subjects | Treatments |
| 1. BRAVO 2014 109 | 1331 | PL, IA (IM), LQ |
| 2. CONFIRM 2012 100 | 1072 | PL, DF, GA |
| 3. ALLEGRO 2012 97 | 1106 | PL, LQ |
| 4. BECOME 2009 95 | 75 | GA, IB |
| 5. BEYOND 2009 106 | 1345 | GA, IB |
| 6. DEFINE 2012 101 | 817 | PL, DF |
| 7. FREEDOMS 2010 104 | 843 | PL, FM |
| 8. FREEDOMS II 2014 96 | 713 | PL, FM |
| 9. INCOMIN 2002 98 | 182 | IA (IM), IB |
| 10. JOHNSON 1995 103 | 251 | PL, GA |
| 11. MSCRG 1996 102 | 301 | PL, IA (IM) |
| 12. PRISMS 1998 99 | 376 | PL, IA (SC) |
| 13. REGARD 2008 105 | 756 | GA, IA (SC) |
| 14. TEMSO 2011 107 | 721 | PL, TF |
| 15. BORNSTEIN 1987 94 | 48 | PL, GA |
| 16. IFNB 1993 108 | 227 | PL, IB |
The following three- or four-arm studies had one treatment arm excluded due to the use of non-
standard dosages:
• CONFIRM 2012 100: study arm receiving 720 mg dimethyl fumarate daily
• DEFINE 2012 101: study arm receiving 720 mg dimethyl fumarate daily
• FREEDOMS 2010 104: study arm receiving 1.25mg fingolimod daily
• FREEDOMS II 2014 96: study arm receiving 1.25mg fingolimod daily
• PRISMS 1998 99: study arm receiving 22µg subcutaneous interferon beta-1a three times per
week
• BEYOND 2009 106: study arm receiving 500µg interferon beta-1b every 2 days
• IFNB 1993 108: study arm receiving 500µg interferon beta-1b every 2 days
• TEMSO 2011 107: study arm receiving 7mg teriflunomide daily
Additionally, one two-arm study was excluded altogether because it included a treatment arm
receiving 22µg subcutaneous interferon beta-1a three times per week 110.
II.3.2.4 Extraction
Figures were extracted from the published study reports. Where figures were not quoted, results
were approximated visually from graphs where possible.
Annualised relapse rate standard errors were frequently not quoted; the missing values were
imputed based on the mean rate and number of patients (assuming Poisson-distributed relapses,
the variance of the sample mean is equal to the mean divided by the number of patients; later the
sensitivity to the imputation was checked by systematically altering the variances for this outcome,
with negligible impact on the model results).
The source data is tabulated in Appendix A and the BUGS data files are set out in Appendix B.
Chapter II.4
58
II.4 Treatment effects module
II.4.1 Initial (naïve) model: all outcomes independent (Model 0)
A first step towards constructing a true multivariate NMA model is to perform NMA simultaneously
but separately for each outcome in the network. The validity of this approach relies on an implicit
assumption that all outcomes occur independently of one another (at both the within- and between-
study levels). This is unlikely to hold in practice, but provides a simple starting point for getting to
grips with the high-level model structure.
The outcomes to be modelled may be represented by a number of types of variable. Table 6 shows
the most common outcome types encountered in the health sciences, the probability distributions
that are typically used to model the likelihood of observed data, their corresponding Normal
approximations, and the corresponding linear treatment contrasts that are used to compare pairs of
treatments. It is not however intended to be an exhaustive list and other modelling approaches are
available.
Table 6 – Distributions commonly used for modelling clinical outcomes at group level. Domain refers to the range of values taken by the approximate Normal statistic. *after possible transformation to account for skew/kurtosis; may in practice include integer-valued or fractional variables.

| Outcome type | Sampling distribution at arm level | Approximate Normal sampling distribution | Domain | Between-arm contrast |
| Continuous measurement* | group mean Y ~ Normal(μ, SE²) | Y ~ Normal(μ, SE²) | ℝ | Difference |
| (Potentially recurrent) event counts | group total count Y ~ Poisson(μ) | log(Y) ~ Normal(log(μ), 1/μ) | ℝ | Log rate ratio |
| Binary outcomes | group total count Y ~ Binomial(n, p) | logit(Y/n) ~ Normal(logit(p), 1/(np) + 1/(n(1 − p))) | ℝ | Log odds ratio |
| | | Y/n ~ Normal(p, p(1 − p)/n) | [0,1] | Risk difference |
| | | log(Y/n) ~ Normal(log(p), 1/(np) − 1/n) | (−∞,0] | Log relative risk |
For modelling continuous outcomes at the treatment-arm level it will be assumed that the Normal
distribution (using the sample standard error) is suitable. Whilst this is not the only continuous
distribution that could be employed, its mathematical tractability and well-understood properties
make it an obvious choice. Variables with skew or kurtosis can be handled via data transformations,
meaning that the Normal distribution is flexible enough for most purposes. The natural contrast
between treatment arms is the linear difference, which itself is Normally distributed given a
Normally distributed outcome in each arm.
For count outcomes taking integer values, typical distributions are the Poisson (or sometimes the
Negative Binomial). Alternatively, it is common practice in biostatistics (in Poisson regression, for
example) to model such outcomes by assigning a Normal sampling distribution to the log incidence
rate, with standard error estimated as 1/√(# 𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑒𝑣𝑒𝑛𝑡𝑠). The conventional contrast for
count data is the incidence rate ratio, or its logarithm, which is equal to the difference in the log
event rates and can thus also be modelled using a Normal distribution.
For binary outcomes, the Binomial distribution is the natural likelihood for outcomes at the
treatment-arm level. Using this likelihood, various contrasts between treatment arms are possible
and have previously been applied in evidence synthesis models 74,111. The (log) odds ratio tends to
be favoured in most applications due to its mathematical properties; however it is undefined for
proportions of 0 or 100%. A risk difference model may be more appropriate in such circumstances.
The relative risk is an alternative that has been used elsewhere; it is not used here as it offers little
advantage over the odds ratio and risk difference approaches, but in principle a model such as that
used by Warn et al 74 could be employed. For the odds-ratio based model, instead of a binomial likelihood, a Normal distribution can be assigned to the log odds in each treatment arm, with standard error estimated by √(1/successes + 1/failures).
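For illustration (a hedged sketch; the function name and guard are mine, not the thesis's), the log-odds transform and its approximate standard error can be computed as:

```python
import math

def log_odds_normal_approx(successes, failures):
    """Approximate Normal statistic for a binary outcome on the log-odds scale.

    Returns the empirical log odds and the usual standard-error estimate
    sqrt(1/successes + 1/failures); undefined when either count is zero.
    """
    if successes == 0 or failures == 0:
        raise ValueError("log odds undefined for proportions of 0 or 100%")
    log_odds = math.log(successes / failures)
    se = math.sqrt(1.0 / successes + 1.0 / failures)
    return log_odds, se

# Example: 30 events among 100 patients
lo, se = log_odds_normal_approx(30, 70)
```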
Outcome types other than those in Table 6 may sometimes be encountered. Categorical outcomes
(with more than two categories) can be modelled using the multinomial distribution, or as a
combination of Binomial variables. Survival/time-to-event variables will require careful
consideration, as they are normally reported using a proportional hazards model that does not
require specification of an arm-level likelihood. Approaches that may be used for evidence synthesis
of such variables have been described elsewhere 112,113 but are considered beyond the scope of this
thesis.
Whichever likelihood and treatment contrast is adopted, the parameterisation of the treatment effects follows the same principles, in line with conventions established in the literature 78:
• Study-specific "baseline" parameters: the mean of outcome j in arm 1 of study i is denoted by μ_ij.
• Basic population-average treatment effect parameters: the mean effect on outcome ω of treatment t relative to treatment 1 is denoted by d_ωt.
• Functional population-average treatment effects: for any two treatments t1 and t2, the mean effect on outcome ω of t2 relative to t1 is calculated as d_ωt2 − d_ωt1 to ensure consistency (see II.1.1).
• Study-specific random effects: δ_ikj ~ N(d_ω_ij,t_ik − d_ω_ij,t_i1, σ²) is the (marginal) distribution of the treatment effect on outcome j in arm k of study i, relative to arm 1.
• Treatment effect correlations within multi-arm trials: for a given outcome j, estimates of δ_ik1j and δ_ik2j are linked by their common baseline, where k1, k2 > 1 are distinct arms in study i. A correlation between them therefore needs to be allowed for; this is usually taken to be 0.5, under the assumption of equal between-study variance across treatment contrasts 76 (see the formal specification of the random effects distribution in II.4.2 for an explanation).
The BUGS code for this model, as applied to the RRMS dataset, is given in Appendix B, illustrating
how the model is put together for continuous, count and binary outcomes. A random effects model
is applied to outcomes ARR, RFP, DP3, DP6, ALT1, ALT3 and ALT5 while a fixed effects model is used
for SGI, SBC and MED (due to the low number of studies contributing data for these outcomes).
II.4.2 Correlated non-zero outcomes (Model 1)
Model 0 includes all of the outcomes simultaneously, but not jointly. The outcomes are statistically independent and equivalent results could be achieved by running a series of separate NMAs. This independent model is convenient but lacks rigour; ignoring correlations in multivariate analyses has been shown to affect the treatment effect estimates 40,114,115. The impact will be potentially
even greater on variables defined as functions of several treatment effect parameters (which
essentially describes the MCDA scores that will be constructed later) because the correlations
between parameters then add additional terms to the overall variance.
In terms of specifying the likelihood, it is important to allow for correlations in outcomes at two
levels of the model:
• Between-study correlations: these describe correlations among the random effects 𝛿𝑖𝑘𝑗 , i.e.
they relate to the random variability of the average treatment effect from study to study
(random effects models only)
• Within-study correlations: these describe correlations among the observed outcomes 𝑦𝑖𝑘𝑗
conditional on 𝛿𝑖𝑘𝑗 , i.e. they relate to random variability at the level of individual subjects
Implementing the within-study correlations in the distribution of Y for binary or count data is
somewhat problematic. Correlated versions of the Binomial, Poisson, and Negative Binomial
distributions are not straightforward to describe mathematically and in any event are not (currently)
supported by BUGS. However, the multivariate Normal distribution does not suffer from these
problems. The Poisson and Binomial distributions can both be approximated by Normal distributions
as set out in Table 6, unless the number of patients is small or the underlying event rate is close to 0 (or 100%). To induce the correlations, these Normal distributions can be combined into a multivariate Normal.
This does however present an issue with some kinds of outcome when values of zero (or 100%) are observed. Odds, the log rate, and the Normal approximation to the variance of the risk are all undefined for such values.
definable for such values. This issue will be revisited later (see II.4.7) but for now these outcomes
are excluded from the model, which in the context of the RRMS case study means dropping serious
gastrointestinal events, serious bradycardia and macular edema. The remaining outcomes are
transformed to the Normal scale indicated in Table 6.
The NOi-length vector 𝒚𝑖𝑘 (now referring to the transformed outcomes on the Normal scale) in arm
k of study i is thus given a multivariate Normal likelihood:
𝒚𝑖𝑘~ 𝑀𝑉𝑁(𝝁𝑖 + 𝜹𝑖𝑘 , 𝐂𝐕𝑖𝑘 ) (1)
where 𝜹𝑖𝑘 is a vector of length NOi whose elements are the study-specific treatment effects for arm
k relative to arm 1 in respect of outcomes 1 to NOi , 𝝁𝑖 is the study-specific baseline vector also of
length NOi , representing the outcomes in arm 1 of study i, and 𝐂𝐕𝑖𝑘 is the within-study covariance
matrix. We can rewrite (1) to more explicitly show the elements of the mean vector and covariance
matrix:
y_ik ~ MVN( ( μ_i1 + δ_ik1,  μ_i2 + δ_ik2,  …,  μ_iNOi + δ_ikNOi )ᵀ ,

            ( var(y_ik1)           cov(y_ik1, y_ik2)    ⋯  cov(y_ik1, y_ikNOi)
              cov(y_ik1, y_ik2)    var(y_ik2)           ⋯  cov(y_ik2, y_ikNOi)
              ⋮                    ⋮                    ⋱  ⋮
              cov(y_ik1, y_ikNOi)  cov(y_ik2, y_ikNOi)  ⋯  var(y_ikNOi) ) )
The diagonal terms of 𝐂𝐕𝑖𝑘 are the empirical within-study outcome variances 𝑣𝑎𝑟(𝑦𝑖𝑘𝑗) which are
provided in the arm-level data from the source studies. Note that this is the variance of the sample
mean, i.e. the squared standard error; if studies instead report sample variances, then the sample
mean variance can be easily obtained by dividing by the number of subjects. (It may be worth
examining the raw sample variances as a cursory check on the compatibility of the source studies,
however. Sample mean variances may differ greatly in magnitude between studies due to
differences in sample size, but one would normally expect the sample variances to be of a similar
magnitude in homogeneous populations).
The off-diagonal terms, however, cannot be estimated from typical arm-level summary data unless
the studies specifically report them; the approach taken here is to derive the covariances within the
model by using assumed values (or prior distributions) for the within-study correlations. Initially it
will be assumed that these correlations are equal to a fixed value ρw so that 𝑐𝑜𝑣(𝑦𝑖𝑘𝑗1 , 𝑦𝑖𝑘𝑗2) =
𝜌𝑤√𝑣𝑎𝑟(𝑦𝑖𝑘𝑗1)𝑣𝑎𝑟(𝑦𝑖𝑘𝑗2) for all pairs of outcomes 𝑗1, 𝑗2; later this assumption will be relaxed.
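As an illustrative sketch (not the thesis's BUGS code; names and values are mine), the within-study covariance matrix under a common correlation ρw can be assembled from the reported variances:

```python
import numpy as np

def within_study_cov(variances, rho_w):
    """Build CV_ik from arm-level outcome variances (squared standard errors),
    assuming a common within-study correlation rho_w between all pairs of
    outcomes: cov(y_j1, y_j2) = rho_w * sqrt(var_j1 * var_j2)."""
    se = np.sqrt(np.asarray(variances, dtype=float))
    cov = rho_w * np.outer(se, se)   # rho_w * sqrt(v_j1) * sqrt(v_j2)
    np.fill_diagonal(cov, se ** 2)   # diagonal holds the variances themselves
    return cov

# Example: three outcomes with variances 0.04, 0.09, 0.25 and rho_w = 0.3
cv = within_study_cov([0.04, 0.09, 0.25], rho_w=0.3)
```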
Other methods for handling the within-study correlations have also been proposed43,44,116.
By definition δ_ijk = 0 for k = 1. For k > 1 in the random effects model the δ_ijk are jointly described by a multivariate normal distribution, which allows for outcomes to be correlated at the between-study level. Here, however, in addition to the correlations between outcomes, there are also correlations linking the same outcome in different treatment arms to consider. For a given outcome j, estimates of δ_ik1j and δ_ik2j are linked by their common baseline, where k1, k2 > 1 are distinct arms in study i. A correlation between them therefore needs to be allowed for; this is usually taken to be 0.5, under the assumption of equal between-study variance across treatment contrasts 76. This follows when one considers the variance of δ_ik2j − δ_ik1j, which by assumption is σ² (since it is itself a relative treatment contrast within study i) but is also equal to σ² + σ² − 2ρσ² (by the formula for the variance of a difference), implying the correlation coefficient ρ = 0.5 43,46,47,78.
If the between-study correlation coefficient for different outcomes in the same trial arm is taken to be ρ_b, then the correlation coefficient for different outcomes in different trial arms is the product 0.5ρ_b 44. This can be seen by considering the covariance between δ_ik2j1 − δ_ik1j1 and δ_ik2j2 − δ_ik1j2 for distinct outcomes j1, j2 in distinct arms k1, k2 of study i. Each of these two expressions is, by consistency, a treatment contrast within study i. Indeed they are the same treatment contrast but for outcomes j1, j2 respectively. Therefore the covariance cov(δ_ik2j1 − δ_ik1j1, δ_ik2j2 − δ_ik1j2) = ρ_b σ². But, using well-known properties of the covariance,

cov(δ_ik2j1 − δ_ik1j1, δ_ik2j2 − δ_ik1j2)
= cov(δ_ik2j1, δ_ik2j2) + cov(δ_ik1j1, δ_ik1j2) − cov(δ_ik2j1, δ_ik1j2) − cov(δ_ik1j1, δ_ik2j2)
= 2ρ_b σ² − cov(δ_ik2j1, δ_ik1j2) − cov(δ_ik1j1, δ_ik2j2)

Therefore cov(δ_ik2j1, δ_ik1j2) + cov(δ_ik1j1, δ_ik2j2) = ρ_b σ².
Under the assumption that the correlation does not depend upon the ordering of the treatments, cov(δ_ik2j1, δ_ik1j2) = cov(δ_ik1j1, δ_ik2j2) = 0.5ρ_b σ², and therefore the correlation coefficient for different outcomes in different trial arms is 0.5ρ_b.
Using the notation 𝑀𝑉𝑁𝑗𝑘 to indicate that the components of the multivariate normal distribution
are indexed over values of both 𝑗 and 𝑘, we have:
δ_i ~ MVN_jk(d_i^R, Σ_i)   (2)

where d_i^R is a vector of length NO_i × (NA_i − 1) whose elements are d_ω_ij,t_ik − d_ω_ij,t_i1 (indexed by (j, k) ∈ {1, …, NO_i} × {2, …, NA_i}) and Σ_i is a (NO_i × (NA_i − 1)) × (NO_i × (NA_i − 1)) between-study treatment effects covariance matrix. The diagonal elements of Σ_i are equal to the random-effects variance σ² and the off-diagonal elements are equal to either 0.5σ² (same j, different k), ρ_b σ² (different j, same k), or 0.5ρ_b σ² (different j, different k). If we order the elements of δ_i and d_i^R lexicographically (advancing through values of j first, then values of k starting at k = 2), then (2) can be written:
( δ_i21, …, δ_i2NOi, δ_i31, …, δ_i3NOi, …, δ_iNAi1, …, δ_iNAiNOi )ᵀ
  ~ MVN( ( d_ω_i1,t_i2 − d_ω_i1,t_i1, …, d_ω_iNOi,t_i2 − d_ω_iNOi,t_i1,
           d_ω_i1,t_i3 − d_ω_i1,t_i1, …, d_ω_iNOi,t_i3 − d_ω_iNOi,t_i1,
           …,
           d_ω_i1,t_iNAi − d_ω_i1,t_i1, …, d_ω_iNOi,t_iNAi − d_ω_iNOi,t_i1 )ᵀ , Σ_i )
where Σ_i takes the block form

Σ_i = σ² ( A  B  ⋯  B
           B  A  ⋯  B
           ⋮  ⋮  ⋱  ⋮
           B  B  ⋯  A )

with

A = ( 1    ρ_b  ⋯  ρ_b          B = ( 0.5     0.5ρ_b  ⋯  0.5ρ_b
      ρ_b  1    ⋯  ρ_b                0.5ρ_b  0.5     ⋯  0.5ρ_b
      ⋮    ⋮    ⋱  ⋮                  ⋮       ⋮       ⋱  ⋮
      ρ_b  ρ_b  ⋯  1 )                0.5ρ_b  0.5ρ_b  ⋯  0.5 )

consisting of (NA_i − 1) × (NA_i − 1) sub-matrices with compound symmetry, each of size NO_i × NO_i. (To pick out an element Σ_i[(j1, k1), (j2, k2)], k1 − 1 and k2 − 1 give the row and column coordinates of the relevant sub-matrix and j1 and j2 give the row and column coordinates of the relevant element within the sub-matrix.) The corresponding fixed effect model is obtained by replacing δ_ijk with its mean according to the distribution above.
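This block structure can be assembled in a few lines (an illustrative reconstruction, not the thesis's code; σ², ρ_b and the dimensions are free parameters):

```python
import numpy as np

def between_study_cov(n_outcomes, n_arms, sigma2, rho_b):
    """Build the between-study covariance matrix Sigma_i for a study with
    n_arms arms (arms 2..NA relative to arm 1) and n_outcomes outcomes.

    Entry [(j1,k1),(j2,k2)] equals sigma2 times 1, rho_b, 0.5 or 0.5*rho_b
    according to whether the j and k indices match, ordering (j,k) j-fastest."""
    m = n_arms - 1                                 # number of non-baseline arms
    A = np.full((n_outcomes, n_outcomes), rho_b)   # same-arm block
    np.fill_diagonal(A, 1.0)
    B = 0.5 * A                                    # different-arm block
    blocks = [[A if k1 == k2 else B for k2 in range(m)] for k1 in range(m)]
    return sigma2 * np.block(blocks)

# Example: 3 outcomes, a 3-arm study, sigma^2 = 2, rho_b = 0.4
S = between_study_cov(n_outcomes=3, n_arms=3, sigma2=2.0, rho_b=0.4)
```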
II.4.3 Contrast-level data (Model 1*)
Instead of using the full arm-level data, the model can also be expressed in terms of contrast-level
data relative to the first (baseline) trial arm. In this alternative formulation, the data supplied to the
model consists of the contrasts 𝑦𝑖𝑘𝑗𝑐 = 𝑦𝑖𝑘𝑗 − 𝑦𝑖1𝑗 (𝑘 ∈ 2,… , NA𝑖) together with their estimated
variances, standard deviations or standard errors. This version eliminates the need for the
parameters 𝜇𝑖𝑗 as the mean of 𝑦𝑖𝑘𝑗𝑐 is simply equal to 𝛿𝑖𝑘𝑗, the difference between the means of 𝑦𝑖𝑘𝑗
and 𝑦𝑖1𝑗 in the arm-level model.
It is necessary however to allow for additional correlations between the contrasts in trials with more than two treatment arms: y^c_ik1j and y^c_ik2j will be correlated (for k1, k2 ∈ 2, …, NA_i) since they both depend on y_i1j, the outcome in the baseline arm. It has been shown 114 that the covariance is equal to the sampling variance of y_i1j, given by var(y_i1j)/n_i1. If arms k1, k2 have roughly the same number of patients (i.e. n_ik1 ≈ n_ik2) and the outcome variance is assumed to be equal across trial arms (i.e. var(y_i1j) ≈ var(y_ik1j) ≈ var(y_ik2j)), then the correlation coefficient between the sampling distributions of y^c_ik1j and y^c_ik2j is given by

[var(y_i1j)/n_i1] / [SE(y^c_ik1j) SE(y^c_ik2j)]
≈ [var(y_i1j)/n_i1] / SE(y^c_ikj)²
≈ [var(y_i1j)/n_i1] / [var(y_i1j)/n_i1 + var(y_ikj)/n_ik]
≈ n_ik / (n_i1 + n_ik)   where k = k1 or k2.
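A quick simulation spot-check of this correlation (a sketch under the stated equal-variance assumption; sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = n3 = 200        # equal allocation across three arms
reps = 20_000
var_y = 1.0               # common underlying outcome variance

# Simulate arm-level sample means, then form contrasts against arm 1
y1 = rng.normal(0.0, np.sqrt(var_y / n1), size=reps)
y2 = rng.normal(0.0, np.sqrt(var_y / n2), size=reps)
y3 = rng.normal(0.0, np.sqrt(var_y / n3), size=reps)
c2, c3 = y2 - y1, y3 - y1          # both contrasts share the baseline y1

corr = np.corrcoef(c2, c3)[0, 1]   # theory: n_k / (n_1 + n_k) = 0.5 here
```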
In all other respects the model for contrast-level data is defined identically to the arm-level version,
and the two models give equivalent results. It will not be used in this chapter, but code and data are
available in Appendix B.
II.4.4 BUGS coding via variance decomposition
In many existing Bayesian MCMC software packages (such as WinBUGS and OpenBUGS as used in
this thesis, or JAGS), implementing an indexed multivariate normal distribution with arbitrary
dimensions poses difficulties, and hence all multivariate NMAs to date have to some extent "hard-coded" the model to be specific to the dimensions of the dataset. It is possible, however, to replace the multivariate Normal with a combination of univariate Normals that can be coded in arbitrary dimensions by exploiting the structure of the covariance matrix Σ_i. This technique may not be required if alternative software with more flexible provision for multivariate distributions is used, in which case this section can be skipped and the multivariate distributions specified as already described.
The key insight here is that the correlation coefficient 𝜌 between two Normally distributed variables
A and B (both with variance σ2) can be interpreted as the proportion of A’s variability that is shared
with B (i.e. their covariance), and vice versa. This follows since 𝑐𝑜𝑣(𝐴, 𝐵) = 𝜌𝜎2. The remaining
part of A’s variance is (1 − 𝜌)𝜎2, which it experiences independently of B (and B experiences the
same amount of variability independently of A). If we think of A as a mean plus a Normal random
term,
𝐴 = 𝑚𝑒𝑎𝑛𝐴 + 휀𝐴 휀𝐴 ~ 𝑁(0, 𝜎2)
Then we can rewrite this to partition the random term into two separate independent Normal
variables
𝐴 = 𝑚𝑒𝑎𝑛𝐴 + 휀𝐴∗ + 휀𝐴𝐵 휀𝐴∗ ~ 𝑁(0, (1 − 𝜌)𝜎2), 휀𝐴𝐵 ~ 𝑁(0, 𝜌𝜎
2)
(it should be easy to see that 휀𝐴∗ + 휀𝐴𝐵 has the same distribution as 휀𝐴 due to standard properties
of Normal distributions). B can be written similarly as
𝐵 = 𝑚𝑒𝑎𝑛𝐵 + 휀𝐵∗ + 휀𝐴𝐵 휀𝐵∗ ~ 𝑁(0, (1 − 𝜌)𝜎2)
Chapter II.4
66
Thus 휀𝐴𝐵 explicitly represents the portion of variability that is shared between A and B, and
𝑐𝑜𝑣(𝐴, 𝐵) = 𝑣𝑎𝑟(휀𝐴𝐵) = 𝜌𝜎2.
This approach partitions the variance of A and B into shared and independent components. This
allows the multivariate distribution of A and B to be constructed by specifying the mutually
independent variables 휀𝐴∗, 휀𝐵∗ and 휀𝐴𝐵 together with the correlation coefficient 𝜌.
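A small simulation illustrates the decomposition (a sketch with arbitrary parameter values and means of my choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, rho, reps = 4.0, 0.6, 200_000

# Shared component eps_ab and independent components eps_a, eps_b
eps_ab = rng.normal(0.0, np.sqrt(rho * sigma2), size=reps)
eps_a = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=reps)
eps_b = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=reps)

A = 1.0 + eps_a + eps_ab   # mean_A = 1.0
B = 2.0 + eps_b + eps_ab   # mean_B = 2.0

# Empirically, var(A) ~ sigma2 and corr(A, B) ~ rho
var_a = A.var()
corr_ab = np.corrcoef(A, B)[0, 1]
```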
The multivariate distributions of 𝒚𝑖𝑘 and 𝛅i shown in (1) and (2) have a slightly more complex
correlation structure than the variables A and B in this simple example, but the same technique of
partitioning the variance into shared and independent components can be used to construct the
required multivariate distributions as combinations of mutually independent variables, as shown in
the following subsections.
II.4.4.1.1 Constant non-negative correlation
Assuming for now that ρ_b is constant for all pairs of outcomes and 0 ≤ ρ_b ≤ 1, the definition of Σ_i implies that the variance σ² of δ_ijk can be partitioned into the following components:

(i) 0.5ρ_b σ² is shared as covariance with δ_ij`k` for all j`, k`;
(ii) an additional (ρ_b − 0.5ρ_b)σ² = 0.5ρ_b σ² is shared as covariance with δ_ij`k for all j`;
(iii) an additional (0.5 − 0.5ρ_b)σ² is shared as covariance with δ_ijk` for all k`; and
(iv) a remaining σ² − 0.5ρ_b σ² − (0.5 − 0.5ρ_b)σ² − 0.5ρ_b σ² = 0.5σ² − 0.5ρ_b σ² is unique to δ_ijk.

This allows the multivariate normal distribution to be expressed as a combination of independent univariate normal distributions. (2) is equivalent to

δ_ijk ~ N( d^R_ω_ij,t_ik + E_i + F_ik + G_ij , (0.5 − 0.5ρ_b)σ² )

where E_i ~ N(0, 0.5ρ_b σ²) corresponds to covariance (i), F_ik ~ N(0, (ρ_b − 0.5ρ_b)σ²) corresponds to covariance (ii), G_ij ~ N(0, (0.5 − 0.5ρ_b)σ²) corresponds to covariance (iii), and the remaining variance (0.5 − 0.5ρ_b)σ² is the final component (iv).
Equivalently, one can let 𝐸𝑖, 𝐹𝑖𝑘, 𝐺𝑖𝑗 ~ 𝑁(0,1) and rescale to the appropriate standard deviation (i.e.
multiply by √(0.5ρbσ2), etc) within the definition of 𝛿𝑖𝑗𝑘. This allows the variance σ2 to vary by arm
or by outcome, if desired.
This is a particular example of the result in Theorem 1. The i subscript does not contribute to the
theorem and can be ignored if preferred, but has been left in so that the notation matches the rest
of this chapter.
Theorem 1: If θ_i is a given vector of length NO_i × NA_i (indexed by (j, k) ∈ {1, …, NO_i} × {1, …, NA_i}) and Σ_i is a covariance matrix of size (NO_i × NA_i) × (NO_i × NA_i) where each element Σ_i[(j1, k1), (j2, k2)] is defined as follows:

Σ_i[(j1, k1), (j2, k2)] =
  σ_ijk²                j1 = j2 = j, k1 = k2 = k
  r σ_ijk1 σ_ijk2       j1 = j2 = j, k1 ≠ k2
  ρ σ_ij1k σ_ij2k       j1 ≠ j2, k1 = k2 = k
  rρ σ_ij1k1 σ_ij2k2    j1 ≠ j2, k1 ≠ k2

for j1, j2 ∈ 1, …, NO_i and k1, k2 ∈ 1, …, NA_i, then δ_i ~ MVN(θ_i, Σ_i) is equivalent to

δ_ijk ~ N( θ_ijk + √(rρ) σ_ijk E_i + √(ρ − rρ) σ_ijk F_ik + √(r − rρ) σ_ijk G_ij , (1 − r − ρ + rρ) σ_ijk² )

where E_i, F_ik, G_ij ~ N(0, 1) i.i.d.

(Note that the variance (1 − r − ρ + rρ)σ_ijk² is guaranteed to be nonnegative for r, ρ ∈ [0, 1].)

Proof:

Given the latter specification, it is clear that the marginal distribution of each δ_ijk is Normal with a mean of θ_ijk as required.

Any linear combination of the δ_ijk is also clearly Normal and so the δ_ijk are jointly Normally distributed.

It is therefore necessary simply to verify that the variances and covariances among the δ_ijk are equal to the corresponding elements of Σ_i.

var(δ_ijk) = rρ σ_ijk² + (ρ − rρ) σ_ijk² + (r − rρ) σ_ijk² + (1 − r − ρ + rρ) σ_ijk² = σ_ijk² as required.

cov(δ_ij1k, δ_ij2k) = rρ σ_ij1k σ_ij2k cov(E_i, E_i) + (ρ − rρ) σ_ij1k σ_ij2k cov(F_ik, F_ik) = ρ σ_ij1k σ_ij2k as required.

cov(δ_ijk1, δ_ijk2) = rρ σ_ijk1 σ_ijk2 cov(E_i, E_i) + (r − rρ) σ_ijk1 σ_ijk2 cov(G_ij, G_ij) = r σ_ijk1 σ_ijk2 as required.

cov(δ_ij1k1, δ_ij2k2) = rρ σ_ij1k1 σ_ij2k2 cov(E_i, E_i) = rρ σ_ij1k1 σ_ij2k2 as required.
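The equivalence in Theorem 1 can be spot-checked by simulation (an illustrative sketch with arbitrary r, ρ and σ; two outcomes and two arms for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
r, rho, sigma = 0.5, 0.4, 1.5
n_out, n_arm, reps = 2, 2, 300_000

# Shared standard-Normal components E_i, F_ik, G_ij plus an independent residual
E = rng.normal(size=reps)
F = rng.normal(size=(n_arm, reps))
G = rng.normal(size=(n_out, reps))
resid = rng.normal(size=(n_out, n_arm, reps))

delta = (np.sqrt(r * rho) * sigma * E
         + np.sqrt(rho - r * rho) * sigma * F[None, :, :]
         + np.sqrt(r - r * rho) * sigma * G[:, None, :]
         + np.sqrt(1 - r - rho + r * rho) * sigma * resid)

# Empirical covariances should match the stated structure
same_k = np.cov(delta[0, 0], delta[1, 0])[0, 1]     # theory: rho * sigma^2
same_j = np.cov(delta[0, 0], delta[0, 1])[0, 1]     # theory: r * sigma^2
diff_both = np.cov(delta[0, 0], delta[1, 1])[0, 1]  # theory: r * rho * sigma^2
```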
II.4.4.1.2 Examples of this decomposition in the evidence synthesis model
The theorem above makes it unnecessary to explicitly specify a multivariate normal distribution in
BUGS when writing a multivariate NMA model, provided that the assumed correlation structure can
be taken to hold.
Substituting θ_i = d_i^R, σ_ijk = σ_i, ρ = ρ_b, r = 0.5 gives the distribution of the study-specific treatment effects δ_ijk.

Substituting θ_i = μ_i + δ_ik, σ_ijk = √v_ikj, ρ = ρ_w, r = 0 gives the distribution of the observed outcomes y_ikj in the model for arm-level data.

Substituting θ_i = μ_i + δ_ik, σ_ijk = √v_ikj, ρ = ρ_w, r = n_ik/(n_i1 + n_ik) gives the distribution of the observed outcomes y^c_ikj in the model for contrast-level data, under the assumption that n_ik is broadly equal for all k ∈ 2, …, NA_i and that the underlying outcome variance is broadly equal in all treatment arms.
II.4.4.1.3 A more general correlation structure
The construction of the random effects multivariate normal distribution used above assumes a
universal non-negative correlation coefficient between all outcome pairs. By adapting the
construction slightly, it is possible to incorporate a broader class of covariance structures, with
correlation coefficients that can vary (both in sign and magnitude) between outcome pairs. The
universal correlation coefficient ρ (at the within- or between-study level) is replaced by a vector ρ = (ρ_1, …, ρ_NO) across the set of outcomes, with a somewhat altered interpretation, as detailed in
Theorem 2 below. Again, the i subscript does not contribute to the theorem and can be ignored if
preferred, but has been left in so that the notation matches the rest of this chapter.
Theorem 2:
Let ρ = (ρ_1, …, ρ_NO) be a vector whose elements lie in the interval [−1, 1], and let

δ_ijk ~ N( θ_ijk + sign(ρ_j)√(r|ρ_j|) σ_ijk E_i + sign(ρ_j)√(|ρ_j| − r|ρ_j|) σ_ijk F_ik + sign(ρ_j)√(r − r|ρ_j|) σ_ijk G_ij , (1 − r − |ρ_j| + r|ρ_j|) σ_ijk² )

where E_i, F_ik, G_ij ~ N(0, 1) i.i.d. and sign(x) = x/|x|,

noting that the variance (1 − r − |ρ_j| + r|ρ_j|)σ_ijk² is still guaranteed to be nonnegative for ρ_j ∈ [−1, 1], r ∈ [0, 1].
This is equivalent to the multivariate normal distribution δ_i ~ MVN(θ_i, Σ_i) where the covariance matrix Σ_i is now defined as follows:

Σ_i[(j1, k1), (j2, k2)] =
  σ_ijk²                                                    j1 = j2 = j, k1 = k2 = k
  r σ_ijk1 σ_ijk2                                           j1 = j2 = j, k1 ≠ k2
  sign(ρ_j1) sign(ρ_j2) √(|ρ_j1 ρ_j2|) σ_ij1k σ_ij2k        j1 ≠ j2, k1 = k2 = k
  sign(ρ_j1) sign(ρ_j2) r √(|ρ_j1 ρ_j2|) σ_ij1k1 σ_ij2k2    j1 ≠ j2, k1 ≠ k2
Proof: again it is sufficient to verify the elements of the covariance matrix.
var(δ_ijk) = r|ρ_j| σ_ijk² + (|ρ_j| − r|ρ_j|) σ_ijk² + (r − r|ρ_j|) σ_ijk² + (1 − r − |ρ_j| + r|ρ_j|) σ_ijk² = σ_ijk² as required.

cov(δ_ij1k, δ_ij2k) = sign(ρ_j1) sign(ρ_j2) r √(|ρ_j1 ρ_j2|) σ_ij1k σ_ij2k + sign(ρ_j1) sign(ρ_j2) √((|ρ_j1| − r|ρ_j1|)(|ρ_j2| − r|ρ_j2|)) σ_ij1k σ_ij2k
= sign(ρ_j1) sign(ρ_j2) σ_ij1k σ_ij2k ( r√(|ρ_j1 ρ_j2|) + (1 − r)√(|ρ_j1 ρ_j2|) )
= sign(ρ_j1) sign(ρ_j2) √(|ρ_j1 ρ_j2|) σ_ij1k σ_ij2k as required.

cov(δ_ijk1, δ_ijk2) = r|ρ_j| σ_ijk1 σ_ijk2 + (r − r|ρ_j|) σ_ijk1 σ_ijk2 = r σ_ijk1 σ_ijk2 as required.

cov(δ_ij1k1, δ_ij2k2) = sign(ρ_j1) sign(ρ_j2) r √(|ρ_j1 ρ_j2|) σ_ij1k1 σ_ij2k2 as required.
In this version of the model, the correlation between outcomes j1 and j2 in the same trial arm is equal to sign(ρ_j1) sign(ρ_j2) √(|ρ_j1 ρ_j2|). The parameter ρ_j is no longer strictly a correlation coefficient,
but can be thought of as the propensity of outcome j to correlate with other outcomes. In terms of
magnitude the correlation between elements 𝛿𝑖𝑗1𝑘 and 𝛿𝑖𝑗2𝑘 is the geometric mean of the
correlation propensities ρ𝑗1 and ρ𝑗2 , with positive sign if the signs of ρ𝑗1 and ρ𝑗2match, and negative
if they do not. This results in a class of correlation structures that have a particular kind of symmetry
in the sense that each outcome blindly shares its “correlation propensity” with all other outcomes
equally, with any difference in the correlation coefficients being due to their own respective
propensities; an outcome cannot selectively favour any particular others for correlation. In
particular, if outcomes j1 and j2 are uncorrelated, then either ρ𝑗1 = 0 or ρ𝑗2 = 0, and so at least one
of them must be uncorrelated with every outcome in the model.
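A sketch of this signed structure (illustrative; the propensity values are made up, and the single-arm case r = 0 is used so only the within-arm component is shared):

```python
import numpy as np

rng = np.random.default_rng(3)
rho = np.array([0.8, -0.5, 0.3])   # per-outcome correlation propensities
reps = 300_000

# Same-arm case: one shared component F plus independent residuals, unit variances
F = rng.normal(size=reps)
resid = rng.normal(size=(3, reps))

sign = np.sign(rho)
delta = ((sign * np.sqrt(np.abs(rho)))[:, None] * F
         + np.sqrt(1 - np.abs(rho))[:, None] * resid)

# corr(delta_j1, delta_j2) = sign(rho_j1) sign(rho_j2) sqrt(|rho_j1 rho_j2|)
c01 = np.corrcoef(delta[0], delta[1])[0, 1]   # theory: -sqrt(0.40)
c02 = np.corrcoef(delta[0], delta[2])[0, 1]   # theory: +sqrt(0.24)
```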
This structure permits the use of negative correlations between certain outcomes if desired, but
does not allow all outcomes to be negatively correlated with one another (except in the case with
only two outcomes); rather, the outcomes are partitioned according to the sign of ρ𝑗 into two sets
with positive intra-set correlations and negative inter-set correlations. This is expected to be flexible
enough for most purposes: the scenario where all pairs of outcomes are negatively correlated seems
somewhat improbable, and in any case it is always possible to reverse the sign of an outcome
variable and hence also reverse its correlations, if desired.
Although this structure places restrictions on the space of possible covariance matrices, one
advantage of this is that it is a sufficient (but not necessary) condition for the covariance matrix to be
positive-definite, which is a fundamental requirement of a multivariate Normal distribution. This
result is formalised in the theorem below. Once again, the i subscript plays no part in the theorem
and can be ignored if preferred, but has been left in so that the notation matches the rest of this
chapter.
Theorem 3: A matrix that can be written in the form 𝒊 as defined in Theorem 2 is always positive-
definite, but the converse does not hold.
Proof: First, suppose that we have a matrix Σ_i as defined.

For clarity, drop the subscript i and define R_j = sign(ρ_j)√(|ρ_j|). Note that R_j² = |ρ_j|.

The elements of the multivariate normal distribution are indexed by pairs (j, k) ∈ {1, …, NO} × {1, …, NA}. Ordering these lexicographically (advancing through values of j first, then values of k) gives the following form for Σ:
Σ consists of NA × NA sub-matrices, each of size NO × NO. The diagonal (k, k) sub-matrix has diagonal elements σ_jk² and off-diagonal (j1, j2) elements R_j1 R_j2 σ_j1k σ_j2k; the off-diagonal (k1, k2) sub-matrix (k1 ≠ k2) has diagonal elements r σ_jk1 σ_jk2 and off-diagonal (j1, j2) elements r R_j1 R_j2 σ_j1k1 σ_j2k2. For example, with NO = 2 and NA = 2,

Σ = ( σ_11²              R_1R_2 σ_11σ_21    r σ_11σ_12         r R_1R_2 σ_11σ_22
      R_1R_2 σ_11σ_21    σ_21²              r R_1R_2 σ_21σ_12  r σ_21σ_22
      r σ_11σ_12         r R_1R_2 σ_21σ_12  σ_12²              R_1R_2 σ_12σ_22
      r R_1R_2 σ_11σ_22  r σ_21σ_22         R_1R_2 σ_12σ_22    σ_22² )

(To pick out an element Σ[(j1, k1), (j2, k2)], k1 and k2 give the row and column coordinates of the relevant sub-matrix and j1 and j2 give the row and column coordinates of the relevant element within the sub-matrix.)
To prove that Σ is positive-definite, it is necessary to show that xᵀΣx > 0 for any non-zero column vector x of length NO × NA. Again this is written as x_jk, (j, k) ∈ {1, …, NO} × {1, …, NA}.

It is possible to directly evaluate xᵀΣx:

xᵀΣx = ∑_{k=1}^{NA} ∑_{j=1}^{NO} x_jk ( σ_jk² x_jk + ∑_{s≠j} R_j R_s σ_jk σ_sk x_sk + r ∑_{t≠k} σ_jk σ_jt x_jt + r ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_st )

then rearrange and complete squares:

= ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} σ_jk² x_jk² + ∑_{j=1}^{NO} ∑_{s≠j} R_j R_s σ_jk σ_sk x_jk x_sk ) + ∑_{k=1}^{NA} ∑_{j=1}^{NO} ( r ∑_{t≠k} σ_jk σ_jt x_jk x_jt + r ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st )

= ∑_{k=1}^{NA} [ ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{j=1}^{NO} (1 − |ρ_j|) σ_jk² x_jk² ] + r ∑_{j=1}^{NO} ∑_{k=1}^{NA} ( ∑_{t≠k} σ_jk σ_jt x_jk x_jt + ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st )

= ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{k=1}^{NA} ∑_{j=1}^{NO} (1 − r − |ρ_j|) σ_jk² x_jk² + r ∑_{j=1}^{NO} ( ∑_{k=1}^{NA} σ_jk x_jk )² + r ∑_{j=1}^{NO} ∑_{k=1}^{NA} ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st   (3)

Now observe that ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² expands as follows:

( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² = ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² + ∑_{j=1}^{NO} |ρ_j| ∑_{k=1}^{NA} ∑_{t≠k} σ_jk σ_jt x_jk x_jt + ∑_{k=1}^{NA} ∑_{j=1}^{NO} ∑_{s≠j} R_j R_s σ_jk σ_sk x_jk x_sk + ∑_{j=1}^{NO} ∑_{k=1}^{NA} ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st

Rearranging and completing squares again gives

∑_{j=1}^{NO} ∑_{k=1}^{NA} ∑_{s≠j} ∑_{t≠k} R_j R_s σ_jk σ_st x_jk x_st
= ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² − ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² − ∑_{j=1}^{NO} |ρ_j| [ ( ∑_{k=1}^{NA} σ_jk x_jk )² − ∑_{k=1}^{NA} σ_jk² x_jk² ] − ∑_{k=1}^{NA} [ ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² − ∑_{j=1}^{NO} |ρ_j| σ_jk² x_jk² ]
= ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² − ∑_{j=1}^{NO} |ρ_j| ( ∑_{k=1}^{NA} σ_jk x_jk )² + ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² − ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )²

Substituting this into (3) gives

xᵀΣx = ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{k=1}^{NA} ∑_{j=1}^{NO} (1 − r − |ρ_j|) σ_jk² x_jk² + r ∑_{j=1}^{NO} ( ∑_{k=1}^{NA} σ_jk x_jk )² + r ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² − r ∑_{j=1}^{NO} |ρ_j| ( ∑_{k=1}^{NA} σ_jk x_jk )² + r ∑_{j=1}^{NO} ∑_{k=1}^{NA} |ρ_j| σ_jk² x_jk² − r ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )²

= r ( ∑_{j=1}^{NO} ∑_{k=1}^{NA} R_j σ_jk x_jk )² + r ∑_{j=1}^{NO} (1 − |ρ_j|) ( ∑_{k=1}^{NA} σ_jk x_jk )² + (1 − r) ∑_{k=1}^{NA} ( ∑_{j=1}^{NO} R_j σ_jk x_jk )² + ∑_{k=1}^{NA} ∑_{j=1}^{NO} (1 − r − |ρ_j| + r|ρ_j|) σ_jk² x_jk²

For r ∈ [0, 1] every term in the final expression is guaranteed to be nonnegative, and at least one term must be strictly positive. Therefore xᵀΣx > 0 and Σ is positive-definite.
To show that the converse does not hold, it is sufficient to provide a counterexample. One possible positive-definite correlation matrix that does not conform to the correlation structure described above is

A = (  1    −0.5   0
      −0.5   1    −0.5
       0    −0.5   1  )

with NO = 3, NA = 1.
If 𝑨 could be expressed with the parameters described above we would have 𝑅1𝑅3 = 0 and 𝑅1𝑅2 =
𝑅2𝑅3 = −0.5 which cannot simultaneously hold.
A is positive-definite since

(x1, x2, x3) A (x1, x2, x3)ᵀ = x1² + x2² + x3² − x1x2 − x2x3 = (x1 − x2/2)² + (x3 − x2/2)² + x2²/2 > 0 unless x1 = x2 = x3 = 0.
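The counterexample is easy to confirm numerically (a quick check of my own, not part of the thesis):

```python
import numpy as np

# The tridiagonal correlation matrix from the counterexample
A = np.array([
    [ 1.0, -0.5,  0.0],
    [-0.5,  1.0, -0.5],
    [ 0.0, -0.5,  1.0],
])

# All eigenvalues strictly positive => A is positive-definite
eigvals = np.linalg.eigvalsh(A)
```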
II.4.5 Fixed baseline (Model 2)
Although the model above allows correlations between outcomes to be incorporated, the existence
of active-active trials within the dataset means there is potentially a problem with the model,
stemming from the definitions of 𝜇 and 𝛿.
Within study i, the mean value of outcome j in the first (or “baseline”) arm (𝑘 = 1) is given by 𝜇𝑖𝑗
and in all other arms (𝑘 > 1) is given by 𝜇𝑖𝑗 + 𝛿𝑖𝑗𝑘 . In other words, the parameterisation is
asymmetrical across trial arms, with the mean outcome having higher prior variability for 𝑘 > 1 than
for 𝑘 = 1 (since 𝜇 and 𝛿 are assumed independent). Usually 𝑘 = 1 represents placebo or no
treatment, and the asymmetry perhaps makes intuitive sense, but in trials without a placebo arm
𝑘 = 1 represents the outcome on an arbitrarily chosen active treatment and the asymmetry is
undesirable.
This issue has been noted elsewhere45 but the solutions put forward by the authors do not appear to result in satisfactory models, as explained below.
Since it is only the prior variance of the arm-level outcomes that is affected by this issue with the
model structure, and the main target of inference is the relative effects, this issue may not usually be
of great concern. Still, it may be possible to avoid it altogether by a simple change in the
parameterisation so that the treatment effects are expressed relative to a fixed baseline of placebo /
no treatment in every trial. In other words, redefine 𝛿𝑖𝑗𝑘 = 0 for 𝑡𝑖𝑘 = 1. For 𝑘 > 1, in the random
effects model, the 𝛿𝑖𝑗𝑘 are jointly described by the following distribution:
\[
\boldsymbol{\delta}_i \sim MVN(\boldsymbol{d}_i,\, \boldsymbol{\Sigma}_i)
\]

where \(\boldsymbol{d}_i\) is a vector of length \(N_{O_i} \times N_{A_i}\) whose elements are \(d_{\omega_{ij} t_{ik}}\) (indexed by \((j, k) \in \{1, \dots, N_{O_i}\} \times \{1, \dots, N_{A_i}\}\)) and \(\boldsymbol{\Sigma}_i\) is of dimension \((N_{O_i} \times N_{A_i}) \times (N_{O_i} \times N_{A_i})\) but otherwise defined as before. The between-arm correlations now apply in any trial with at least two active treatments (rather than only in multi-arm trials as before). The diagonal elements of \(\boldsymbol{\Sigma}_i\) are equal to the random-effects variance \(\sigma^2\) and the off-diagonal elements are equal to either \(0.5\sigma^2\) (same \(j\), different \(k\)), \(\rho_b \sigma^2\) (different \(j\), same \(k\)), or \(0.5\rho_b \sigma^2\) (different \(j\), different \(k\)). If we order the elements of \(\boldsymbol{\delta}_i\) and \(\boldsymbol{d}_i\) lexicographically (advancing through values of \(j\) first, then values of \(k\)), then the distribution can be written
\[
\begin{pmatrix}
\delta_{i11} \\ \vdots \\ \delta_{i1N_{O_i}} \\ \delta_{i21} \\ \vdots \\ \delta_{i2N_{O_i}} \\ \vdots \\ \delta_{iN_{A_i}1} \\ \vdots \\ \delta_{iN_{A_i}N_{O_i}}
\end{pmatrix}
\sim MVN\!\left(
\begin{pmatrix}
d_{\omega_{i1}t_{i1}} \\ \vdots \\ d_{\omega_{iN_{O_i}}t_{i1}} \\ d_{\omega_{i1}t_{i2}} \\ \vdots \\ d_{\omega_{iN_{O_i}}t_{i2}} \\ \vdots \\ d_{\omega_{i1}t_{iN_{A_i}}} \\ \vdots \\ d_{\omega_{iN_{O_i}}t_{iN_{A_i}}}
\end{pmatrix}
,\; \boldsymbol{\Sigma}_i \right)
\]
where \(\boldsymbol{\Sigma}_i\) takes the form

\[
\boldsymbol{\Sigma}_i = \sigma^2
\begin{pmatrix}
\mathbf{P} & \tfrac{1}{2}\mathbf{P} & \cdots & \tfrac{1}{2}\mathbf{P} \\
\tfrac{1}{2}\mathbf{P} & \mathbf{P} & \cdots & \tfrac{1}{2}\mathbf{P} \\
\vdots & \vdots & \ddots & \vdots \\
\tfrac{1}{2}\mathbf{P} & \tfrac{1}{2}\mathbf{P} & \cdots & \mathbf{P}
\end{pmatrix},
\qquad
\mathbf{P} =
\begin{pmatrix}
1 & \rho_b & \cdots & \rho_b \\
\rho_b & 1 & \cdots & \rho_b \\
\vdots & \vdots & \ddots & \vdots \\
\rho_b & \rho_b & \cdots & 1
\end{pmatrix}
\]

consisting of \(N_{A_i} \times N_{A_i}\) sub-matrices, each of size \(N_{O_i} \times N_{O_i}\): the diagonal sub-matrices are \(\mathbf{P}\) and the off-diagonal sub-matrices are \(\tfrac{1}{2}\mathbf{P}\). To pick out an element \(\boldsymbol{\Sigma}_i[(j_1,k_1),(j_2,k_2)]\), \(k_1\) and \(k_2\) give the row and column coordinates of the relevant sub-matrix and \(j_1\) and \(j_2\) give the row and column coordinates of the relevant element within the sub-matrix.
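The same covariance structure can be assembled programmatically from the four element-wise rules (an illustrative sketch with example values; variable names here are my own, not thesis code):

```python
# Illustrative construction of the Model 2 between-study covariance matrix,
# following the block structure described in the text.
NO, NA = 3, 2      # outcomes per study, arms per study (example values)
sigma2 = 0.25      # random-effects variance sigma^2 (chosen for exact arithmetic)
rho_b = 0.5        # between-outcome correlation

def cov_element(j1, k1, j2, k2):
    """Covariance between delta_{i,j1,k1} and delta_{i,j2,k2} per the stated rules."""
    arm = 1.0 if k1 == k2 else 0.5        # halved across different arms
    out = 1.0 if j1 == j2 else rho_b      # scaled by rho_b across different outcomes
    return sigma2 * arm * out

# Assemble the full (NO*NA) x (NO*NA) matrix, advancing through j within each k
Sigma = [[cov_element(j1, k1, j2, k2)
          for k2 in range(NA) for j2 in range(NO)]
         for k1 in range(NA) for j1 in range(NO)]

assert Sigma[0][0] == sigma2                      # same j, same k
assert Sigma[0][1] == rho_b * sigma2              # different j, same k
assert Sigma[0][NO] == 0.5 * sigma2               # same j, different k
assert Sigma[0][NO + 1] == 0.5 * rho_b * sigma2   # different j, different k
```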
A first attempt at accommodating this revised definition of \(\boldsymbol{\delta}_i\) might simply be to redefine \(\mu_{ij}\) to be the mean value of outcome \(j\) in study \(i\) on placebo (and indeed this model is suggested by Hong et al45). However, \(\mu_{ij}\) thus defined cannot be estimated from the data for trials without a placebo arm, which results in a model that does not converge.

Instead, redefine \(\mu_{ij}\) to be the average value of outcome \(j\) across all arms of study \(i\). This quantity is readily identifiable from the arm-level data. Then replace (1) with

\[
\boldsymbol{y}_{ik} \sim MVN\!\left(\boldsymbol{\mu}_i + \boldsymbol{\delta}_{ik} - \frac{1}{N_{A_i}}\sum_{m=1}^{N_{A_i}} \boldsymbol{\delta}_{im},\; \mathbf{CV}_{ik}\right)
\]

It is important to note, however, that this parameterisation can only be used for arm-level, not contrast-level, data.
One can interpret this model in two ways:

• \(\boldsymbol{\delta}_{ik} - \frac{1}{N_{A_i}}\sum_{m=1}^{N_{A_i}} \boldsymbol{\delta}_{im}\) is the treatment effect vector for arm \(k\) relative to the outcome vector in the “average” arm, whose value is given by \(\boldsymbol{\mu}_i\); or

• \(\boldsymbol{\delta}_{ik}\) is the treatment effect vector for arm \(k\) relative to the outcome on placebo, whose value is given by \(\boldsymbol{\mu}_i - \frac{1}{N_{A_i}}\sum_{m=1}^{N_{A_i}} \boldsymbol{\delta}_{im}\)
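A small numerical sketch (my own illustration, with made-up numbers) shows why this parameterisation is identifiable from arm-level data: the arm means constructed this way always average back to \(\mu\), so \(\mu\) is pinned down even when no placebo arm is observed:

```python
# Hypothetical study with three active arms and one outcome (illustrative values)
mu = 1.8                    # study-level average outcome across arms
delta = [0.4, -0.1, 0.6]    # treatment effects for the three arms
NA = len(delta)

dbar = sum(delta) / NA
arm_means = [mu + d - dbar for d in delta]  # mean outcome in each arm under Model 2

# The arm means average back to mu, so mu is identified from arm-level data alone
assert abs(sum(arm_means) / NA - mu) < 1e-12

# Equivalently, delta_k is the effect relative to an implied placebo level mu - dbar
implied_placebo = mu - dbar
assert all(abs(am - (implied_placebo + d)) < 1e-12
           for am, d in zip(arm_means, delta))
```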
This parameterisation for 𝜇 and 𝛿 has previously been used in univariate NMAs60 but not in the multivariate setting. Under this model the prior outcome variance is better behaved, in that it is always the placebo arms that have the lower variance, while active treatments are treated symmetrically. An active treatment, not just placebo, can be used as the baseline if desired, but because of these properties a placebo baseline is to be preferred whenever possible.
Again it is straightforward to obtain the corresponding fixed effect model by replacing 𝛿𝑖𝑗𝑘 with its
mean according to the distribution above.
II.4.6 Mappings (Model 3)
The models described so far do not address the issues of patchy data. In patchy networks, outcomes
may be completely missing for some treatments, or studies may adopt different clinical thresholds
or definitions. The way I am proposing to address these issues is to provide structural links between
the mean treatment effects 𝑑𝜔𝑡 via proportional mappings between different outcomes. Specifically,
the treatment effect parameters for pairs of outcomes 1, 2 will be linked by equations of the form
𝑑𝜔1𝑡 = 𝑏𝑑𝜔2𝑡 for some constant 𝑏; in other words, linear mappings with no intercept. The mapping
parameters 𝑏 are to be estimated based on the ratios 𝑑𝜔1𝑡/𝑑𝜔2𝑡 of treatment effects estimated
from the trial data.
The concept can be illustrated using the RRMS case study. Table 7 shows the posterior mean effect
size (relative to placebo) for every treatment and outcome that could be estimated by Model 2.
Table 7 – Posterior mean effect estimates from Model 2.
Outcome
Treatment ARR RFP DP3 DP6 ALT ALT3 ALT5
PL 0 0 0 0 0 0 0
DF -0.70 -0.71 -0.53 -0.47 0.25 0.23 -0.53
FM -0.73 -0.92 -0.30 -0.41 1.37 1.30 0.64
GA -0.38 -0.63 -0.18 -0.08 -0.12 0.32 -0.11
IA (IM) -0.20 -0.38 -0.30 -0.44 0.63 0.21 -0.33
IA (SC) 0.03 -0.80 -0.65 0.48 1.61
IBB -0.41 -0.77 -0.11 -1.51 1.11
LQ -0.20 -0.33 -0.39 -0.48 0.74 0.84 -0.34
TF -0.38 -0.43 -0.40 0.88 0.00
The missing entries in the table are the treatment-outcome combinations that were not reported in
the data. The mapping-based model aims to fill in these gaps by, essentially, observing the ratios
between the column entries within other rows and applying the average ratios to the rows with
missing entries to impute the values. At the same time it will smooth the observed values somewhat
to better fit the average mapping ratios.
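The gap-filling idea can be sketched numerically (a toy example with invented effect estimates; the thesis model estimates the mapping jointly within the Bayesian NMA rather than by averaging ratios):

```python
# Toy data: treatment effects on two related outcomes; treatment D lacks outcome 2
effects_outcome1 = {"A": -0.70, "B": -0.50, "C": -0.40, "D": -0.60}
effects_outcome2 = {"A": -0.35, "B": -0.26, "C": -0.19}  # D unobserved

# Estimate the proportional mapping b from treatments observed on both outcomes
ratios = [effects_outcome2[t] / effects_outcome1[t] for t in effects_outcome2]
b = sum(ratios) / len(ratios)

# Impute the missing entry via d_{2,D} = b * d_{1,D}
imputed_D = b * effects_outcome1["D"]
assert imputed_D < 0  # the imputed effect inherits the sign of d_{1,D}
```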
Introducing mappings into the model has appealing potential benefits in patchy networks:
• Estimation of treatment effect parameters for outcomes that are not reported in the data
by mapping from other outcomes that are reported. This facilitates the estimation of a
standard set of outcomes for each treatment to take forward for decision making, when the
outcomes as reported are not standardised across treatments.
• Increasing the extent of “borrowing strength” between closely related outcomes. The mappings will tend to smooth the results between outcomes that are mapped to one another, which may be helpful in some situations, such as when there is any uncertainty over which version of an outcome to take forward in a decision analysis. Choosing any one outcome risks discarding valuable information if mappings are not used, but with mappings in place, the results from any one outcome will automatically be influenced by trends in those with which it is mapped, so no data is truly discarded no matter which outcome is chosen, and in a sense the choice becomes less critical.
The mappings could take a number of forms but, in the absence of any specific hypotheses on the relationships between outcomes, proportional mappings appear an obvious and straightforward starting point. It is logical not to include an additive/intercept term, as a null treatment such as placebo would have no effect on any outcome (as per the first row of Table 7).
The approach is similar to one employed by Lu et al and Ades et al 46,47, but their mappings were
applied to the study-specific random effects 𝜹. Moreover, their mappings were only used for
outcomes that measured the same underlying clinical concept, and hence could safely be assumed
to occur in linear proportion to one another. Here I am proposing to use the mappings to link
different but related clinical concepts, and as such the assumption that the treatment effects on
different outcomes occur in consistent proportions regardless of treatment is somewhat stronger.
In a sense this strong assumption is the price that must be paid for the additional inferences the
mappings make available. In order to verify that the proportionality assumption holds for the RRMS
dataset, Appendix A contains two-way plots of the posterior mean treatment effect estimates 𝑑𝜔𝑡
from a univariate NMA model for each outcome. Overall the plots appear to correspond reasonably
well to straight lines through the origin, suggesting that the estimates do indeed occur in fairly
constant proportions, although there is some spread around the apparent trend lines and a few
outliers.
Recall from II.1.1 that, since treatment contrasts are transitive (i.e. AC = AB + BC for treatments A, B
and C, where AB is the contrast comparing B to A, etc.), it is necessary to parameterise the model in
a way that gives estimates which are consistent with regard to transitivity. For this reason only the
basic treatment effect parameters 𝑑𝜔𝑡 (comparing each treatment t > 1 to the reference treatment
1) were independently defined in the model; the remaining treatment effects are found via the
consistency equations (for example, the effect for t2 relative to t1 is calculated as 𝑑𝜔𝑡2 − 𝑑𝜔𝑡1 ). An
analogous situation arises with the mapping ratios, which are also transitive (on a multiplicative
scale) in the sense that the ratio 𝑑𝜔3𝑡 / 𝑑𝜔1𝑡 is the product of the ratios 𝑑𝜔3𝑡 / 𝑑𝜔2𝑡 and
𝑑𝜔2𝑡 / 𝑑𝜔1𝑡 . Accordingly the model should also exhibit consistency of mappings, and so rather than
defining a mapping parameter for every pair of outcomes, only the basic mapping parameters 𝑏𝜔𝑡
(the ratio 𝑑𝜔𝑡 / 𝑑1𝑡 between the effects on outcome ω > 1 and outcome 1) are independently
defined in the model, leaving the remaining mapping ratios between outcomes ω2 and ω1 to be
calculated as 𝑏𝜔2𝑡/𝑏𝜔1𝑡 if required.
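Both kinds of consistency can be illustrated with a few lines of arithmetic (my own sketch with arbitrary numbers):

```python
# Basic parameters relative to reference treatment 1 and reference outcome w1
d = {("w1", "t2"): -0.8, ("w1", "t3"): -0.5}  # basic treatment effects on outcome w1
b = {"w2": 0.6, "w3": 1.5}                     # basic mapping parameters vs outcome w1

# Transitivity of contrasts: effect of t3 relative to t2 on outcome w1
d_t3_vs_t2 = d[("w1", "t3")] - d[("w1", "t2")]
assert abs(d_t3_vs_t2 - 0.3) < 1e-9

# Transitivity of mappings: the ratio between outcomes w3 and w2 is b_w3 / b_w2
b_w3_vs_w2 = b["w3"] / b["w2"]
assert abs(b_w3_vs_w2 - 2.5) < 1e-9
```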
Thus, using fixed mappings for all treatments: for ω > 1, the mapping equation for each treatment t is specified in the model as 𝑑𝜔𝑡 = 𝑏𝜔𝑑1𝑡, where 𝑏𝜔 maps the treatment’s effect on outcome 1 to its effect on outcome ω.
In the same way as Lu et al and Ades et al46,47, only the absolute value of the mappings is allowed to be random: the sign of each treatment effect 𝑑𝜔𝑡 is assumed to be known a priori for each outcome and is taken to be the same for all treatments. The reference treatment is usually a placebo/no treatment option and therefore specifying the signs of the treatment effects in advance should not be too controversial, at least for treatments that have already cleared early phase or pivotal trials, since as a rule these will have a nonnegative effect on efficacy and a negative effect on safety.
For the “fixed-mapping” model as described, the mappings are identical for all treatments, and thus the average treatment effects 𝑑𝜔𝑡 are kept in strict proportion. The strength of the proportionality assumption can be relaxed somewhat by use of a “random-mapping” model, where mappings are allowed to vary between treatments (but they remain similar in the sense that they are drawn from the same distribution, and always respect the known signs of the treatment effects). For ω > 1, the mapping equation for treatment t is 𝑑𝜔𝑡 = 𝛽𝜔𝑡𝑑1𝑡, where 𝛽𝜔𝑡 maps the treatment’s effect on outcome 1 to its effect on outcome ω.
It is convenient to define the distribution of the random mappings on the logarithmic scale – i.e. by assigning a distribution to log(𝛽𝜔𝑡) – for two reasons. The first is that constant variability of mappings on the log scale corresponds to linear mappings with variability proportional to their magnitude, which fits well with the multiplicative nature of the mappings and prevents the lower tail of the distribution from straying into negative territory.
The second reason for the log transformation relates to correlations between mappings. For outcomes ω1, ω2 > 1, the estimated mappings 𝛽𝜔1𝑡 and 𝛽𝜔2𝑡 will be correlated across treatments since, for a given treatment, they are estimated by the absolute ratios |𝑑𝜔1𝑡/𝑑1𝑡| and |𝑑𝜔2𝑡/𝑑1𝑡| respectively, sharing a common denominator 𝑑1𝑡 (which is essentially a weighted average of random-effects estimates 𝛿, in turn estimated as linear differences between observed data points). These correlations are much more easily expressed on the logarithmic scale, which replaces the ratios with linear differences, i.e. log(𝛽𝜔𝑡) = log(|𝑑𝜔𝑡|) − log(|𝑑1𝑡|). Even so, the correlation coefficients cannot be specified in advance with any accuracy, as they derive from the relative variances of the estimates log(|𝑑𝜔𝑡|), and hence depend not only on the network structure but also on the magnitude of 𝑑𝜔𝑡. However, under the assumption that log(|𝑑𝜔𝑡|) is of equal variance for different values of ω, the correlation between log(𝛽𝜔1𝑡) and log(𝛽𝜔2𝑡) will on average be 0.5, and this seems a reasonable starting assumption.
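The 0.5 figure arises because the two log mappings share the term −log|𝑑1𝑡|; a quick simulation under the equal-variance assumption confirms it (an illustrative sketch, not thesis code):

```python
import random
random.seed(42)

# Independent, equal-variance "estimates" of log|d_w| for outcomes 1, w1 and w2
n = 100_000
log_d1 = [random.gauss(0, 1) for _ in range(n)]
log_dw1 = [random.gauss(0, 1) for _ in range(n)]
log_dw2 = [random.gauss(0, 1) for _ in range(n)]

# log(beta_w) = log|d_w| - log|d_1|, sharing the common denominator term
log_b1 = [a - c for a, c in zip(log_dw1, log_d1)]
log_b2 = [a - c for a, c in zip(log_dw2, log_d1)]

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    vu = sum((a - mu) ** 2 for a in u)
    vv = sum((b - mv) ** 2 for b in v)
    return cov / (vu * vv) ** 0.5

r = corr(log_b1, log_b2)
assert 0.48 < r < 0.52  # theoretical value is 0.5
```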
The random mapping distribution is therefore defined as log(𝜷𝒕) ~ 𝑀𝑉𝑁(log(𝒃), 𝑸), where 𝒃 = (𝑏2, 𝑏3, …, 𝑏𝑁𝑂) is the vector of average mappings and 𝑸 is a covariance matrix with diagonal terms equal to the between-treatment mapping variance 𝜎𝑚𝑎𝑝² and off-diagonal terms equal to 0.5𝜎𝑚𝑎𝑝²:

\[
\begin{pmatrix} \log(\beta_{2t}) \\ \log(\beta_{3t}) \\ \vdots \\ \log(\beta_{N_O t}) \end{pmatrix}
\sim MVN\!\left(
\begin{pmatrix} \log(b_2) \\ \log(b_3) \\ \vdots \\ \log(b_{N_O}) \end{pmatrix},\;
\sigma_{map}^2
\begin{pmatrix}
1 & 0.5 & \cdots & 0.5 \\
0.5 & 1 & \cdots & 0.5 \\
\vdots & \vdots & \ddots & \vdots \\
0.5 & 0.5 & \cdots & 1
\end{pmatrix}
\right)
\]
As the assumption of proportionality may be considered too strong to apply universally across a
given set of outcomes, the mappings can be applied only within certain subsets of outcomes that are
especially closely related, rather than between all outcomes simultaneously. A number of mapping
schemes have been evaluated within the RRMS case study, as follows:
• One-group model: all outcomes grouped together.
• Two-group model: all efficacy outcomes grouped together, all liver-safety outcomes grouped
together.
• Three-group model: both relapse outcomes grouped together, both disability progression
outcomes grouped together, all liver-safety outcomes grouped together (groups correspond
to the green cells in Figure 10).
• No mappings (alternatively, this can be thought of as a model with a group for each
outcome).
The groupings apply only to the mappings and do not impose any restrictions on the within- or
between-study correlations.
II.4.7 Outcomes with zeroes (Models 4a and 4b)
When correlations were introduced to the models above, it was convenient to drop three binary
outcomes from the decision set (serious gastrointestinal events, serious bradycardia and macular
edema) as, due to the presence of zero rates in the data, they could not be expressed on a scale
suitable for modelling with a multivariate normal distribution. This section revisits these outcomes
and considers how they can be included in the multivariate model.
Many of the measurement scales typically used for binary outcomes cannot cope well with proportions of zero (or, equivalently, 100%). Odds (and therefore odds ratios) cannot be defined for a study arm with zero observed events for a given outcome, and the log relative risk comparing study arms is likewise undefined in one direction or the other.
For a binary outcome in a benefit-risk assessment, any observed zeroes will usually occur alongside
non-zero observations (if all the observations were zeroes, the outcome would not differentiate
between treatments and could be excluded from the analysis). When considering such outcomes, it
may be helpful to consider whether the zeroes are likely to be chance observations from a
distribution with nonzero expectation or, alternatively, if any of the underlying average rates may
actually be zero (for example, adverse events that only occur on certain treatments). In the former
case, it is probably most straightforward to handle any isolated zero data points by adding an equal
continuity correction to all study arms, as is common practice117, and using the log odds scale for
binary outcomes as in the models above, as this is the most convenient scale for nonzero binary
rates. In the latter case, however, with zero average rates expected, this approach seems
unsatisfactory: not only would this require extensive modification to the data, but the log odds scale
simply seems inappropriate when it is mathematically unable to express the true expected rate.
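The continuity-correction route for isolated zeroes can be sketched as follows (illustrative only; 0.5 is the conventional correction, applied equally to all arms as the text describes):

```python
import math

def log_odds(events, n, cc=0.5):
    """Log odds with a continuity correction added to both cells of the arm."""
    p = (events + cc) / (n + 2 * cc)
    return math.log(p / (1 - p))

# Without correction, the log odds is undefined for a zero count ...
try:
    p0 = 0 / 30
    undefined = math.log(p0 / (1 - p0))
except ValueError:
    undefined = None
assert undefined is None  # log(0) raises

# ... but the corrected version is finite for every arm in the comparison
lo_treated = log_odds(0, 30)  # arm with zero events
lo_control = log_odds(4, 30)
log_odds_ratio = lo_treated - lo_control
assert math.isfinite(log_odds_ratio)
```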
Arguably modelling the proportion in each study arm, with the risk difference as contrast, is the
natural solution for such outcomes. However, using this scale together with the multivariate normal
distributions in the model presents a few further issues.
With a between-study random effects distribution on the risk difference scale, values of 𝛿 may
exceed the theoretical range [-1,1] of the risk difference (and an even tighter range may apply
depending on the corresponding arm-level probabilities). This should not present many issues
provided that any parameters or variables dependent on 𝛿 are subject to an appropriate
ceiling/floor, which is straightforward to apply in BUGS. The arm-level probability 𝑝𝑖𝑘𝑗 of the jth outcome occurring in the kth arm of study i is specified as follows:

\[
p_{ikj} = \min(\max(\mu_{ij} + \delta_{ikj},\, 0),\, 1)
\]

What this means is that the portions of the tails of the random effects distribution that extend beyond the range are replaced by probability masses at the limits. For the RRMS dataset, there are few studies providing data for the outcomes that have zero rates, so it will typically be assumed in this section that fixed effect models (and therefore, no between-study correlations) are to be fitted for these outcomes; in other words 𝛿𝑖𝑘𝑗 = 𝑑𝜔𝑖𝑗𝑡𝑖𝑘. The study-level likelihood is more problematic.
One approach is to use a Binomial likelihood for the outcome in question (as per Model 0) – call this Model 4a:

\[
y_{ikj} \sim \mathrm{Binomial}(n_{ik},\, p_{ikj})
\]
The main drawback of this approach is that it renders it unfeasible to incorporate within-study
correlations with other outcomes. One might argue that, if the outcomes with zeroes are adverse
events caused by particular treatments, rather than part of the usual disease course, then it may be
reasonable to assume that these outcomes exhibit a low degree of correlation with other outcomes
in the model. However, some correlations may remain since, for example, the incidence of an
adverse event and the efficacy of treatment may both be influenced by a patient’s particular
pharmacodynamics and pharmacokinetics.
A model that allows for within-study correlations would therefore be desirable. This can be
achieved using a multivariate Normal likelihood, but two problems present themselves:
• the conventional approximate Normal variance of the risk, p(1−p)/n, estimated using the observed proportion, gives a variance of zero when no events are observed;
• the Normal distribution gives a non-zero likelihood of proportions below zero.
The second point is arguably of little practical concern: parameters can be constrained to avoid
impossible values, for example via priors or by applying a ceiling/floor within the model. However,
the first point, estimating the variance of the data points with zero events, must be dealt with before
a Normal model can be fitted. One approach may be to use a continuity correction.
Adding a constant continuity correction to the observed outcome proportion in all treatment arms
allows the Normal approximation to be used – and hence, permits within-study correlations – while preserving the observed risk differences. This is Model 4b. The estimated variances may be spurious; however, they serve as a starting point. By comparing the model’s results to Model 4a under the assumption of zero within-study correlations, it may be possible to tune the sample variances to give the “right” results, and perhaps derive a rule of thumb to estimate the sample variance for observed zero proportions.
Assuming a constant within-study correlation coefficient 𝜌𝑤 between all pairs of outcomes, the
Model 4b likelihood is:
\[
\tilde{\boldsymbol{y}}_{ik}/n_{ik} \sim MVN\!\left(
\begin{pmatrix} p_{ik1} \\ p_{ik2} \\ \vdots \\ p_{ikN_{O_i}} \end{pmatrix},\;
\begin{pmatrix}
S_{ik1} & \rho_w\sqrt{S_{ik1}S_{ik2}} & \cdots & \rho_w\sqrt{S_{ik1}S_{ikN_{O_i}}} \\
\rho_w\sqrt{S_{ik1}S_{ik2}} & S_{ik2} & \cdots & \rho_w\sqrt{S_{ik2}S_{ikN_{O_i}}} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_w\sqrt{S_{ik1}S_{ikN_{O_i}}} & \rho_w\sqrt{S_{ik2}S_{ikN_{O_i}}} & \cdots & S_{ikN_{O_i}}
\end{pmatrix}
\right)
\]

where \(\tilde{\boldsymbol{y}}_{ik}/n_{ik} = \boldsymbol{y}_{ik}/n_{ik} + cc\), i.e. the observed proportions are adjusted by adding a continuity correction; \(S_{ikj} = \alpha\, p_{ikj}(1 - p_{ikj})/n_{ik}\), based on the approximate sampling distribution of a proportion; and \(\alpha\) is a pre-specified constant used to scale the variances.
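An illustrative construction of this covariance matrix (names and numbers are my own; the thesis fits the corresponding model in BUGS):

```python
# Model 4b within-arm covariance on the risk-difference scale (illustrative sketch)
rho_w = 0.2   # assumed common within-study correlation
alpha = 1.0   # variance scaling constant
n_ik = 120    # arm sample size
p = [0.02, 0.10, 0.30]  # modelled probabilities for three outcomes

# Approximate sampling variance of each proportion, scaled by alpha
S = [alpha * pj * (1 - pj) / n_ik for pj in p]

cov = [[S[j] if j == l else rho_w * (S[j] * S[l]) ** 0.5
        for l in range(len(p))]
       for j in range(len(p))]

# Symmetric, with the stated diagonal and off-diagonal structure
assert all(cov[j][l] == cov[l][j] for j in range(3) for l in range(3))
assert abs(cov[0][0] - 0.02 * 0.98 / 120) < 1e-15
```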
A similar method based on rate differences should be possible for count outcomes, which also
present difficulties when zero rates are encountered. The RRMS dataset contains no such examples,
however, so this will not be explored here.
It is worth bearing in mind that outcomes with zeroes will often be adverse events specific to
particular treatments – and as such it will normally be more appropriate to assume a zero treatment
effect for treatments with no data, rather than using mappings to fill in the gaps based on the rates
observed on other treatments. “If it’s not reported, it doesn’t happen” may often be a fair
assumption when it comes to adverse events. However, this is not always the case. Reporting
practices vary118, and adverse events can go unmentioned in published trial reports for a number of
reasons, for example if they occur below a certain threshold rate, were not specified in the study
protocol, are judged to be clinically insignificant or unrelated to treatment, or simply at the
discretion of the investigators. In a trial of an active drug against placebo, or a pairwise meta-
analysis, allowing adverse events to go unreported will usually bias the benefit-risk assessment in
favour of the active drug. Where a more complex evidence network is used, the bias can go in
various directions depending on the structure of the evidence network and the (unknown) true rates
of unreported events; scenario-based sensitivity analyses could help to unpick the impact.
For the purpose of this thesis it will be assumed that serious gastrointestinal events, serious
bradycardia and macular edema did not occur unless reported, and the models shown here
therefore do not apply mappings to these outcomes, instead assuming a zero effect wherever there
is no data. This is implemented by multiplying each treatment effect by a constant
specific to each treatment/outcome combination that takes the value 0 as needed (or 1 otherwise).
The details can be found with the code in Appendix B. This same method can be used in any
applications where it is believed a priori that some outcomes never occur on any treatment.
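A minimal sketch of the indicator mechanism (my own illustration with hypothetical assignments; the actual constants are supplied with the BUGS code in Appendix B):

```python
# Indicator z[outcome][treatment]: 0 forces a zero effect where an adverse event
# is assumed never to occur on that treatment; 1 leaves the effect unchanged.
z = {"bradycardia": {"FM": 1, "GA": 0}}  # hypothetical assignments
d_raw = {"FM": 0.9, "GA": 0.4}           # unconstrained effect parameters

d = {t: z["bradycardia"][t] * d_raw[t] for t in d_raw}
assert d == {"FM": 0.9, "GA": 0.0}  # GA's effect is forced to zero
```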
II.4.8 Priors
The general principle adopted here is to use “minimally informative” priors wherever possible. This
means they establish the appropriate scale/sign but go no further. Table 8 lists each relevant
parameter and the prior(s) it is given.
Outcomes 1-7 are expressed on scales where the outcomes and treatment effects can take any value
on the real line, whereas outcomes 8-10 (the “zeroes” outcomes) are expressed on the risk
difference scale where outcomes are restricted to the interval [0,1]. Different priors are employed
accordingly.
The random effects standard deviation is assigned a vague Uniform prior as has previously been
recommended 78; in evidence networks with very few studies where the posterior distribution of the
standard deviation is dominated by the prior, it might be appropriate to be more informative. The
random mappings precision uses a prior that has been suggested in a similar model 46,47.
The signs of the treatment effects are taken to be known, and it is only necessary to assign priors to the magnitudes of the effects 𝑑𝜔𝑡 and mappings 𝑏𝜔. The upper half of a Normal distribution centred on 0 is sometimes used for this purpose (this is sometimes known as a folded Normal), and is denoted below by N+(0, variance). Certain priors cause error messages in BUGS in some models; it is therefore sometimes necessary to adopt different priors in different models, as indicated. The sensitivity of the results to the choice of priors is explored later.
Table 8 – Priors for treatment effect module parameters

• 𝑑𝜔𝑡 – population-average treatment effect (compared to placebo) of treatment t on outcome ω:
  – for 𝑑𝜔𝑡 ∈ ℝ: |𝑑𝜔𝑡| ~ N+(0, 1000)
  – for 𝑑𝜔𝑡 ∈ [-1, 1]: |𝑑𝜔𝑡| ~ Beta(0.5, 0.5)

• 𝜇𝑖𝑗 – “baseline” value of outcome j in study i (refers either to arm 1 or the average of arms, depending on model version):
  – for 𝜇𝑖𝑗 ∈ ℝ: 𝜇𝑖𝑗 ~ N(0, 1000) or N(0, 100); 𝜇𝑖𝑗 ~ N(0, 1) for some outcomes in Model 0 with fixed effects
  – for 𝜇𝑖𝑗 ∈ [0, 1]: 𝜇𝑖𝑗 ~ Gamma(0.5, 0.5) (although this distribution can take values above 1, these values are effectively censored by the model)

• σ – standard deviation of the random effects distribution: σ ~ Uniform(0, 10)

• 𝑏𝜔 – average mapping coefficient for outcome ω: |𝑏𝜔| ~ N+(0, 1000)

• 𝜏𝑚𝑎𝑝 = 1/𝜎𝑚𝑎𝑝² – precision of the random mappings distribution: 𝜏𝑚𝑎𝑝 ~ Gamma(0.05, 0.05), censored on the interval (1, ∞)
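The folded-Normal prior N+(0, 1000) can be sketched by taking absolute values of draws from N(0, 1000) (illustrative only; in BUGS this is typically achieved by truncation or by modelling the absolute value directly):

```python
import math
import random
random.seed(7)

# Draws from the folded Normal N+(0, 1000): |X| where X ~ N(0, variance 1000)
sd = math.sqrt(1000)
draws = [abs(random.gauss(0, sd)) for _ in range(100_000)]

assert min(draws) >= 0  # support is the nonnegative half-line

# The folded-Normal mean is sd * sqrt(2/pi), approx 25.2 here
mean_theory = sd * math.sqrt(2 / math.pi)
sample_mean = sum(draws) / len(draws)
assert abs(sample_mean - mean_theory) < 0.5
```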
II.4.9 Assessing model fit and complexity
The suitability of a statistical model for a particular dataset is often described in terms of its fit and
complexity. In broad terms, fit measures how closely the model describes the data (the closer the
fitted/predicted values are to the observed values, the better the fit) and complexity measures the
level of detail in the model structure (typically taken to mean the number of parameters). Models
with better fit and lower complexity are favoured, although these are often conflicting objectives –
for example, a model with a parameter for every observation will achieve perfect fit but is overly
complex, while at the other extreme, a model where all the fitted values are identical is of minimal
complexity but will exhibit poor fit. For this reason it is good practice to select models based on
both fit and complexity rather than focusing exclusively on one or the other.
Various measures of model fit and complexity have been developed but, in the Bayesian context, it
has been argued that the residual deviance is a natural measure of model fit, and the Deviance
Information Criterion a reasonable model selection measure that reflects both fit and complexity 119.
In a univariate Normal model, the residual deviance can be simply understood as the sum of the
squared standardised residuals for all observations in the data, where the standardised residual for
an observation is the number of standard deviations by which the observed value differs from the
fitted value. Extending this to the case of a multivariate Normal likelihood, the residual deviance
𝐷(𝝋), where 𝝋 refers to the set of parameters that are the target for inference, is calculated as
\[
D(\boldsymbol{\varphi}) = \sum_i \left(\boldsymbol{y}_i - E[\boldsymbol{Y}_i \mid \boldsymbol{\varphi}]\right)^T \mathbf{CV}_i^{-1} \left(\boldsymbol{y}_i - E[\boldsymbol{Y}_i \mid \boldsymbol{\varphi}]\right)
\]

where \(\mathbf{CV}_i^{-1}\) is the within-study coprecision matrix, i.e. the inverse of \(\mathbf{CV}_i\). If the model fits well then the mean residual deviance \(\bar{D}(\boldsymbol{\varphi})\) should be similar to or less than the number of independent observations. One can also calculate each study’s contribution to the residual deviance (i.e. the summand on the right-hand side of the definition above, for each study i) to reveal if any issues of poor fit can be traced back to individual studies.
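For a single study with two outcomes, the residual-deviance contribution can be computed directly (an illustrative sketch with invented numbers; the 2×2 matrix inverse is written out by hand):

```python
# Study-level contribution to the residual deviance (illustrative, two outcomes)
y = [0.30, -0.10]       # observed outcome vector for the study
fitted = [0.25, -0.05]  # E[Y | parameters]
CV = [[0.010, 0.003],   # within-study covariance matrix
      [0.003, 0.020]]

# Invert the 2x2 covariance matrix analytically
det = CV[0][0] * CV[1][1] - CV[0][1] * CV[1][0]
CVinv = [[CV[1][1] / det, -CV[0][1] / det],
         [-CV[1][0] / det, CV[0][0] / det]]

r = [y[0] - fitted[0], y[1] - fitted[1]]  # residual vector
# Quadratic form r^T CV^{-1} r = this study's deviance contribution
dev = sum(r[a] * CVinv[a][b] * r[b] for a in range(2) for b in range(2))
assert dev > 0
```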
The complexity of the model is described by the effective number of parameters 𝑝𝐷. Counting the number of parameters is not always as straightforward as it might sound, particularly in hierarchical random effects models, but 𝑝𝐷 can be calculated as the mean residual deviance less the residual deviance evaluated at the posterior mean, i.e. \(p_D = \bar{D}(\boldsymbol{\varphi}) - D(\bar{\boldsymbol{\varphi}})\). The deviance information criterion \(DIC = \bar{D}(\boldsymbol{\varphi}) + p_D\) is used to compare models in terms of both their fit and complexity; models with lower DIC are favoured. The contribution of individual studies to 𝑝𝐷 (known as leverages) can also be calculated119; the leverage is a measure of the influence a study has on the estimated parameters.
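These quantities can be illustrated with a toy conjugate-Normal model, where the effective number of parameters should come out close to the single mean parameter being estimated (my own sketch, not thesis code):

```python
import random
random.seed(3)

# Toy model: y_i ~ N(theta, s^2) with s known; posterior is theta ~ N(ybar, s^2/n)
s, n = 2.0, 25
y = [random.gauss(1.0, s) for _ in range(n)]
ybar = sum(y) / n

def deviance(theta):
    """Residual deviance: sum of squared standardised residuals."""
    return sum((yi - theta) ** 2 for yi in y) / s ** 2

# Posterior draws of theta
draws = [random.gauss(ybar, s / n ** 0.5) for _ in range(50_000)]

Dbar = sum(deviance(t) for t in draws) / len(draws)  # mean residual deviance
D_at_mean = deviance(sum(draws) / len(draws))        # deviance at the posterior mean
pD = Dbar - D_at_mean                                # effective number of parameters
DIC = Dbar + pD

assert 0.9 < pD < 1.1  # close to the one parameter (theta) being estimated
```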
It is worth bearing in mind that these measures evaluate the fit and complexity of the multivariate
Normal approximation to the likelihood; if the data/parameters are such that the approximation is
poor then they may not accurately reflect the fit and complexity of the “exact” underlying model.
Additionally, although inconsistency in the evidence network may be one factor contributing to poor
model fit, the residual deviance does not directly evaluate inconsistency, which should ideally be
assessed by other methods when performing a network meta-analysis (see II.7).
Chapter II.5
II.5 Population calibration module
For multi-outcome decision-making purposes such as benefit-risk assessments or health economic
evaluations it is sometimes essential to translate the relative treatment effects from a meta-analysis
into real-world outcome estimates on an absolute scale, e.g. proportions and incidences rather than
relative risks or rate ratios. For this to happen, the relative effects must be combined with and
calibrated by a typical baseline value (or distribution) 90,91. This is akin to the situation in simple
linear regression where the slope coefficient (relative-effect) expresses the fundamental relationship
between variables but a constant intercept term is required in order to estimate predicted values
(absolute-effects). In the context of the RRMS case study the baseline or intercept term
corresponds to the outcome level in an untreated population of RRMS patients. I have chosen to
estimate its posterior distribution by a random-effects multivariate meta-analysis of the absolute
outcome levels across the set of all trial arms in the RRMS dataset, adjusted by the corresponding
treatment contrast from the treatment effects model. This assumes that the aim of the case study is
to assess the benefits and risks of the RRMS treatment in a generalised trial-eligible Western
population of RRMS patients, and is a convenient approach for illustration purposes as it makes use
of all the source data already identified for the case study. However it should be recognised that
outcomes on absolute scales tend to be much more heterogeneous than relative effects, and the
resulting wide distributions will contribute to the uncertainty of the overall results. When models
are used to inform real decisions, a better approach may be to carefully select the source data to
provide a more homogeneous sample that is highly relevant to the target population. Possible
approaches that could be used include:
• selecting a subset of the studies used in the treatment effects model, eg based on
demographic similarity to the target population. This approach would require minimal
changes to the models described here, being simply a matter of indexing;
• using a different set of studies altogether; note however that if there are any treated study
arms in the population calibration dataset that are not in the treatment effects dataset, and
a random effects model is used for the treatment effects, then the corresponding study-
specific treatment effects (needed for adjustment) will have a high degree of uncertainty.
• constructing explicit posteriors, perhaps based on external data (eg national/local statistics
or patient registries).
II.5.1 Statistical model
The model is superficially similar to that used for the main multivariate NMA in the treatment
contrast module, but is used rather differently. No inference is made regarding the underlying
treatment contrasts; these are assumed to be “known” (what this really means is that the posterior
distributions from the treatment effects module are used, but with no inferential feedback from the
population calibration model – this is achieved by using the “cut” function in BUGS). The aim of this
module is to model the baseline distribution of outcomes on the absolute scale in an untreated
population. The population-average value of outcome in an untreated population is denoted by
𝑎𝜔 , 𝜔 ∈ 1, … , 𝑁𝑂.
In the untreated population model a multivariate normal distribution is again assigned to 𝒚𝑖𝑘, the
NOi-length vector of observed outcomes in arm k of study i:
𝒚𝑖𝑘~ 𝑀𝑉𝑁(𝜶𝑖 + 𝜹𝑖𝑘 , 𝐂𝐕𝑖𝑘 ) (4)
where 𝜶𝑖 = (𝛼𝑖1, … , 𝛼𝑖𝑁𝑂𝑖) is the vector of untreated study-specific population means relating to the
NOi outcomes in study i, 𝜹𝑖𝑘 is the vector of “known” treatment effects in arm k of study i, and 𝐂𝐕𝑖𝑘
is the within-study covariance matrix, here taken to be the same as in the treatment effects module.
𝜶𝑖 is given a multivariate distribution, 𝜶𝑖 ~ 𝑀𝑉𝑁(𝒂𝑖 , 𝒁𝒊), where 𝒂𝑖 is a vector of length 𝑁𝑂𝑖 whose
elements are 𝑎𝜔𝑖𝑗 and 𝒁𝒊 is a between-study covariance matrix of size 𝑁𝑂𝑖 × 𝑁𝑂𝑖. For simplicity and
coherence it is assumed here that the correlation between untreated outcomes 𝛼𝑖𝑗1 and 𝛼𝑖𝑗2 is the
same as the correlation between treatment effects 𝛿𝑖𝑗1𝑘 and 𝛿𝑖𝑗2𝑘 in the treatment effects module.
In other words, the diagonal elements of 𝒁𝒊 are equal to ζ² (the between-study variance) and the
off-diagonal elements are equal to 𝜌𝑏ζ² (or, if outcome-specific correlation propensities are used,
𝑠𝑖𝑔𝑛(𝜌𝑏𝑗1)𝑠𝑖𝑔𝑛(𝜌𝑏𝑗2)√(|𝜌𝑏𝑗1𝜌𝑏𝑗2|)ζ²). It is however theoretically possible for correlations
between the untreated outcomes to differ from those between the treatment effects, and the
model can accommodate this if desired.
This is analogous to a “random effects” model; the corresponding “fixed effects” model is obtained
by replacing 𝛼𝑖𝑗 with its mean 𝑎𝜔𝑖𝑗 in (4) and can be used when there is little between-study
heterogeneity in the untreated outcomes.
For outcomes on a restricted scale, such as the binary risk difference, it is worth remembering to
apply the appropriate floor/ceiling to 𝛼𝑖𝑗 by using the min/max functions in the BUGS language
before passing the values on to any dependent nodes.
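The same floor/ceiling logic can be sketched outside BUGS; the following Python fragment is an illustrative analogue (the function name and bounds are chosen here for a risk difference on [−1, 1]; it is not the model code from Appendix B):

```python
def clamp_risk_difference(alpha, lo=-1.0, hi=1.0):
    """Apply a floor/ceiling to a sampled baseline value, mirroring the
    min/max construction used for restricted-scale outcomes in BUGS."""
    return max(lo, min(hi, alpha))
```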
The variance decomposition described in II.4.4 is used again to express the multivariate Normal
distributions above as combinations of univariate Normals.
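The mechanics of such a decomposition can be illustrated for the bivariate case: a bivariate Normal is factorised into a marginal univariate draw followed by a conditional univariate draw. This is a generic sketch of the standard factorisation (function names and parameterisation are illustrative, not taken from the thesis code):

```python
import math
import random

def conditional_params(m1, m2, s1, s2, rho, y1):
    """Mean and sd of y2 | y1 for a bivariate Normal with means (m1, m2),
    standard deviations (s1, s2) and correlation rho."""
    cond_mean = m2 + rho * (s2 / s1) * (y1 - m1)
    cond_sd = s2 * math.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_sd

def mvn2_draw(m1, m2, s1, s2, rho, rng):
    """Sample (y1, y2) from the bivariate Normal as two chained univariate
    draws: y1 from its marginal, then y2 conditionally on y1."""
    y1 = rng.gauss(m1, s1)
    cm, cs = conditional_params(m1, m2, s1, s2, rho, y1)
    return y1, rng.gauss(cm, cs)
```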
Since the parameters of interest (i.e. the mean and variance of the untreated outcome distributions)
can be estimated directly from every study (in combination with the
assumed treatment contrasts from the treatment effects module) there is no need for the
parameterisation described in II.4.5, which was only necessary in the treatment contrasts model
because the untreated “baseline” outcomes were inestimable in studies with no untreated/placebo
group.
In some circumstances where no events are observed in untreated study arms, it may be reasonable
to assume an untreated event rate of zero rather than attempting to infer the rate. For example, the
occurrence of many treatment-related adverse events in untreated control arms may be zero – or
close enough to make no practical difference. The approach taken here, however, will be to model
the underlying rates in such instances.
It is possible to calculate the residual deviance for the population calibration module, just as in the
treatment effects module, but if (as here) the same data is used for both modules then the residual
deviance will also be the same in both. This is because a non-zero deviance only occurs when there
is inconsistency in the relative treatment contrasts (due either to mismatched treatment effects
within a “loop” in the network diagram or inconsistent mapping ratios) - and these contrasts are the
same in both modules. The study-specific “baseline” is defined differently in the two modules, but
in either case it is not subject to any consistency constraints that prevent it from fitting perfectly,
and therefore does not contribute to the residual deviance.
It is worth noting that performing such an analysis of outcomes on the absolute scale in order to
make differential inferences about the effect of treatments (or other trial characteristics) would not
be advisable as differences in studies or populations could confound any true effects; here, however,
the intention is simply to describe the extent of variability in untreated populations rather than to
seek to explain or classify it. The treatment effects model only makes inferences regarding
treatment contrasts, which are assumed to be homogeneous due to randomisation.
II.5.2 Priors
The priors used for the population calibration model are shown in Table 9 and mirror those used in
the treatment effects model.
Table 9 – Priors for the population calibration module.
Parameter name | Parameter description | Prior(s)
𝑎𝜔 | Population-average value of untreated outcome | 𝑎𝜔 ∈ ℝ: 𝑎𝜔 ~ N(0, 1000); 𝑎𝜔 ∈ [−1, 1]: 𝑎𝜔 ~ Beta(0.5, 0.5)
ζ | Standard deviation of random untreated outcomes distribution | ζ ~ Uniform(0, 10)
II.5.3 Outputs
The untreated population-average parameters 𝑎𝜔 are combined with the population-average
contrasts 𝑑𝜔𝑡 to give absolute-scale treatment effects 𝑥𝜔𝑡 :
𝑥𝜔𝑡 = 𝑎𝜔 + 𝑑𝜔𝑡 , 𝜔 ∈ 1, … , 𝑁𝑂, 𝑡 ∈ 1, … , 𝑁𝑇
These parameters are population averages and, by the law of large numbers, their posteriors exhibit
little uncertainty provided that there are sufficient numbers of patients and/or studies in the trial data.
For decision-making purposes, however, it may not be the uncertainty on the average that is most
relevant but the variability around that average within a typical population. It is therefore often
more relevant to consider the predictive distribution of outcomes which incorporates this additional
variability around the average. Owing to the structure of the model, there are two levels of
additional uncertainty to consider. In the first place, one can allow for the between-study variability
in untreated outcomes and treatment effects by simulating new values (or rather, vectors) from the
posterior distributions of 𝜶𝑖 and 𝜹𝑖𝑘, with i in this case referring to a hypothetical unobserved study
that includes all treatments and outcomes. The posterior of 𝜽𝑖𝑘 = 𝜶𝑖 + 𝜹𝑖𝑘 for this hypothetical
study then corresponds to the predictive distribution of the study-level average outcome vector, the
elements of which will have the same expectation as 𝑥𝜔𝑡 but a larger variance, reflecting the
observed between-study heterogeneity. This predictive distribution can be sampled within the
Bayesian MCMC environment by calculating 𝜽𝑖𝑘 within the model code and sampling from its
posterior.
This allows one to assess the distribution of treatment effects – and thus, ultimately, the benefit-risk
balance – in terms of “study-level” averages. This need not necessarily be literally interpreted as
referring to different “studies” but perhaps as regions, towns, clinics etc depending on how the
study populations are recruited. An average at this level might suffice for many decision-making
purposes, such as determining the costs of an intervention, where it is the aggregate sum to be
entered in the accounts that is key. It is important to recognise, however, that the patient-level
variability within these “study”-sized units has still not been accounted for.
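The variance inflation at the study level can be sketched as follows. This is a simplified illustration that assumes, for clarity, independent between-study deviations for the baseline and the treatment effect, with ζ and τ denoting their between-study standard deviations (τ is a label introduced for this sketch only; in the full model these draws would also be correlated across outcomes):

```python
import math
import random

def predictive_sd(zeta, tau):
    """SD of theta_new = alpha_new + delta_new for a hypothetical new study,
    assuming (for this sketch) independent between-study deviations for the
    untreated baseline (sd zeta) and the treatment effect (sd tau)."""
    return math.sqrt(zeta ** 2 + tau ** 2)

def draw_study_level(a, d, zeta, tau, rng):
    """One predictive draw of the study-level average outcome: the same
    expectation a + d as the population average, but with between-study
    variability added on top."""
    alpha_new = rng.gauss(a, zeta)  # new-study untreated baseline
    delta_new = rng.gauss(d, tau)   # new-study treatment effect
    return alpha_new + delta_new
```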
For this, one must construct the predictive distribution of outcomes allowing for both between-
study and within-study (patient-level) variability. This is achieved by simulating new values (vectors)
from the posterior distribution of 𝒚𝑖𝑘 , with i again referring to the hypothetical unobserved study
with all treatments and outcomes. The mean vector of 𝒚𝑖𝑘 is given by 𝜽𝑖𝑘 = 𝜶𝑖 + 𝜹𝑖𝑘 , and the correlations
between components are the same as elsewhere in the model. The variance is however unknown.
For observed values of 𝒚𝑖𝑘 in the data, the sample variance was used as an estimator of the true
variance, as is conventional; for the hypothetical unobserved study there is no sample variance to
use and another approach is needed. The strategy used here is to estimate the marginal variance of
𝑦𝑖𝑗𝑘 for each outcome j within the model by assuming that the sample variances in the data are
observations from a common Normal distribution, the mean of which can be used as the variance
estimate for the new hypothetical study. To obtain the full patient-level variability in the predictive
distribution (rather than the sampling variability of a group average), the number of patients in each
treatment group should be set to 1.
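A minimal Python sketch of this strategy (helper names are hypothetical; the thesis implements the equivalent logic in BUGS, and the pooled estimate here is simplified to the plain mean of the sample variances):

```python
import random

def pooled_within_study_variance(sample_variances):
    """Estimate the within-study (patient-level) variance for a hypothetical
    new study as the mean of the observed studies' sample variances, a
    simplification of treating them as draws from a common distribution."""
    return sum(sample_variances) / len(sample_variances)

def draw_patient_level(theta, sample_variances, rng):
    """Predictive draw for a single patient: with the group size set to 1,
    the full within-study variance applies rather than a standard error."""
    var = pooled_within_study_variance(sample_variances)
    return rng.gauss(theta, var ** 0.5)
```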
(As an aside, it may be worth noting that one could also use this variance estimate in place of the
sample variances when specifying the distribution of 𝒚𝑖𝑘 for observed studies i. This will not
however be pursued here as it goes rather beyond the scope of this thesis, being more a matter of
general statistical modelling than one specific to multivariate evidence synthesis or benefit-risk.
Initial attempts suggest that it does not make much difference in this case.)
Strictly speaking the estimated variance from this method will reflect both between-patient
heterogeneity and measurement error of outcomes; it should therefore be an estimated upper
bound on the between-patient heterogeneity rather than a straightforward estimate.
Again, allowing for the additional patient-level uncertainty means that 𝒚𝑖𝑘 has greater variance than
the study-level mean 𝜽𝑖𝑘 = 𝜶𝑖 + 𝜹𝑖𝑘 but the same mean.
As a final step, the output parameters can be transformed back to their original scale (i.e. converting
log odds back to proportions, etc) if desired.
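For example, a log odds can be mapped back to a proportion with the standard inverse-logit transform, shown here in Python for illustration:

```python
import math

def logodds_to_proportion(log_odds):
    """Inverse-logit: map an absolute-scale log odds back to a proportion
    on [0, 1]."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```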
II.5.4 Rankings
It is straightforward to create treatment rankings for each outcome based on the population-level
average outcomes or the study- or patient-level predictive distributions. This does not require any
particular assumptions about a decision maker’s relative preferences for different outcomes - the
only value judgements that need to be specified are the (hopefully self-evident) impact signs for
each outcome: the impact is either positive (higher values are better) or negative (lower values are
better) depending on the outcome definition. This additional information can be supplied in the
data. Within the RRMS case study, only the proportion avoiding relapse has positive impact; for the
other outcomes, a lower value is better.
The rankings can be calculated within each MCMC iteration of the model so that a posterior
distribution of rankings is obtained. One way to present these distributions is via the Surface Under
the Cumulative Ranking Curve (SUCRA) statistic120, which summarises the posterior rankings for each
treatment as an overall rating between 0 and 1, where 0 is a treatment that is always outranked by
all others and 1 is a treatment that always outranks all others. One way to interpret the SUCRA for a
given treatment is that it represents the expected (or average) proportion of its competitors that it
outranks.
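As an illustration of how rankings and SUCRAs might be computed from MCMC output (function names are illustrative; the SUCRA uses the standard identity SUCRA = (NT − mean rank)/(NT − 1), which is equivalent to the expected proportion of competitors outranked):

```python
def ranks_one_iteration(values, higher_is_better):
    """Rank treatments (1 = best) for one MCMC draw of one outcome;
    higher_is_better is the outcome's impact sign supplied in the data."""
    order = sorted(range(len(values)), key=lambda t: values[t],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, t in enumerate(order):
        ranks[t] = position + 1
    return ranks

def sucra(rank_samples, n_treatments):
    """SUCRA for one treatment from its posterior rank samples: 1 if it
    always outranks all competitors, 0 if it is always outranked."""
    mean_rank = sum(rank_samples) / len(rank_samples)
    return (n_treatments - mean_rank) / (n_treatments - 1)
```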
The rankings should be the same whether calculated in the treatment effects module or the
population calibration module (since the difference between the two is the untreated baseline,
which is equal for all treatments), and will also be the same regardless of whether the outcomes
have been transformed to another scale (since scale transformations are monotonic and have no
effect on ordering relations). The distribution of rankings will however depend on whether they are
based on the population-level average outcomes or the study- or patient-level predictive
distributions.
Rankings and SUCRAs are ordinal summaries that give no information about the magnitude of the
difference between treatments. Just because a treatment outranks another does not mean that the
difference is clinically or statistically significant, and the difference between adjacent ranks among a
set of treatments can vary greatly, potentially making the SUCRA somewhat misleading120. For this
reason SUCRAs should always be presented and interpreted alongside (rather than instead of)
summaries of the posterior distribution such as credibility intervals.
Chapter II.6
II.6 Results
Simulations were performed using the Markov Chain Monte Carlo technique in either WinBUGS
(version 1.4.3) 48 or OpenBUGS (version 3.2.2 rev 1063 - www.openbugs.net). Initial values were
generated within BUGS for the majority of models. 100,000 iterations were discarded to allow for
“burn-in”; the posterior statistics were then derived from a further 100,000 iterations. Convergence
was assessed by inspection of the sample histories. Model fit was assessed by calculation of the
mean residual deviance119, which in a well-fitting model should be similar to (or less than) the
number of independent observations.
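For a Normal likelihood with known variance, each observation's residual deviance contribution is its squared standardised residual, which has expectation 1 under a well-fitting model. The following sketch (illustrative, evaluated at point estimates rather than averaged over the posterior as in the thesis) shows why the total is then close to the number of observations:

```python
def residual_deviance_contribution(y, mu, sigma):
    """Residual deviance of one Normal observation with known sd: the
    squared standardised residual, with expectation 1 under the model."""
    return ((y - mu) / sigma) ** 2

def total_residual_deviance(observations):
    """Sum the contributions over (y, mu, sigma) triples; in a well-fitting
    model this total is close to the number of observations."""
    return sum(residual_deviance_contribution(y, mu, sigma)
               for y, mu, sigma in observations)
```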
Appendix B contains the BUGS code and RRMS data files used to generate the results.
II.6.1 Treatment effects module
II.6.1.1 Model 0: all outcomes independent
Figure 11 and Figure 12 show posterior summary statistics for the key parameters of the
naïve/univariate Model 0, with fixed and random effects respectively on the relapse, disability and
liver safety outcomes. The number of observations for the remaining outcomes (serious
gastrointestinal disorders, serious bradycardia and macular edema) is not sufficient to justify a
random effects model and so fixed effects have been used for these outcomes in both models. The
risk difference for these outcomes has also been magnified by a factor of 10 for clarity. Only
treatment-outcome combinations with data are shown.
Figure 11 - Posterior credibility intervals of relative treatment effects (population averages) from Model 0, fixed effects. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 12 - Posterior credibility intervals of relative treatment effects (population averages) from Model 0, random effects (except serious GI disorders, serious bradycardia and macular edema). Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
The treatment effect estimates are largely similar from both models, albeit with slightly wider
distributions in the random effects model.
The fixed effect model does not fit well, with a mean residual deviance of 199.5, well in excess of
169 (the number of observations in the dataset). The residual deviance in the random effects model
is somewhat better at 174.5 but still exceeds the number of observations. For the remainder of the
chapter, all models will use random effects on relapse, disability progression and liver safety
outcomes, and fixed effects on serious GI disorders, serious bradycardia and macular edema, unless
otherwise indicated.
II.6.1.2 Model 1: Correlated non-zero outcomes
Figure 13 summarises the posterior distribution of the key parameters of Model 1, with all
correlation coefficients set to zero. Again only parameters relating to non-missing data are shown.
Figure 14 shows the results from Model 1 with assumed correlation coefficients of 0.6 between all
pairs of outcomes (at both the between- and within-study levels). This is probably not a realistic
correlation structure but serves to illustrate the model’s capabilities (see II.6.1.6 for more discussion
of this point).
Figure 13 - Posterior credibility intervals of relative treatment effects (population averages) from Model 1 (random effects), with all correlations between outcomes set to zero. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Figure 14 - Posterior credibility intervals of relative treatment effects (population averages) from Model 1 (random effects), with all correlations between outcomes set to 0.6. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
As expected, the treatment effect posteriors in Figure 13 appear much the same as in Model 0
(see Figure 12), apparently confirming that the Normal approximation to the true likelihood is
reasonable for the purpose of estimating the treatment effects. The mean residual deviance of 155.5
is in fairly close agreement with Model 0 (where a value of 156.9 is obtained for this restricted set of
outcomes) and is only slightly higher than the number of observations (152), indicating reasonable
fit.
Any slight differences between the two sets of results (from Model 0 and Model 1 with no
correlations) are due to a combination of three factors: (i) the difference between the original
Binomial/Poisson likelihood and the (approximate) Normal likelihood, (ii) the use of different priors
due to the different parameter scales associated with each version of the likelihood, and (iii) random
artefacts of the MCMC process (since the sampling algorithm only approximates the true posterior
distribution). Inspection of the results reveals no differences of particular significance or concern in
this case.
Comparing Figure 13 and Figure 14 reveals that including the correlations has had the following
impacts:
- The posterior standard deviation of the treatment effects has increased and the residual
deviance has increased slightly from 155.5 to 161.5, indicating worsening heterogeneity and
fit. This is likely to be because the assumed correlation structure (correlation of 0.6 between
all pairs of outcomes) is not realistic for this dataset, since some outcomes are likely to be
negatively correlated (see II.6.1.6).
- There have been various minor changes to the estimated treatment effect means, some
increasing and some decreasing, but with no overall systematic trend.
By this point it should be clear from the results that there is no straightforward way to rank the
treatments in terms of their overall benefit-risk balance as the results vary by outcome. Indeed, the
best- and worst-performing treatments are different for almost every outcome (not counting
placebo).
The treatment effect estimates mostly seem reasonable, but there are some slight surprises:
• The performance of subcutaneous interferon beta-1a and interferon beta-1b with regard to
disability progression is rather confusing, with each of these treatments performing very
well on one outcome measure but very poorly on the other.
• It is unexpected that so many treatments should perform better than placebo with regard to
ALT elevation above 5x the upper limit of the normal range.
While these could be chance findings, it is also possible that they are the result of bias due to
inconsistency in the network. In the case of ALT elevation above 5x the upper limit of the normal
range, the network is especially sparse, making the estimates particularly vulnerable to any chance
findings, biased studies or between-study heterogeneity.
II.6.1.3 Model 2: fixed baseline
Figure 15 and Figure 16 summarise the posterior distributions of the key parameters of Model 2,
with all correlation coefficients again set either to 0 or 0.6. Again only parameters relating to non-
missing data are shown.
Figure 15 - Posterior credibility intervals of relative treatment effects (population averages) from Model 2 (random effects), with all correlations between outcomes set to zero. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Figure 16 - Posterior credibility intervals of relative treatment effects (population averages) from Model 2 (random effects), with all correlations between outcomes set to 0.6. Markers indicate means and lines indicate 95% credibility intervals. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Again, with correlations set to zero there are some minor changes in the treatment effect estimates
between this model and Models 0 and 1, but these appear insubstantial. For the most part, there
also appears to be little difference between the results of Model 1 and 2, even with correlations
present, suggesting that there may be little practical difference between the variable- and fixed-
baseline parameterisations; however, it is the fixed-baseline version that will be taken forward here
owing to its theoretical properties.
One can examine the study-level contributions to the fit and complexity statistics in order to identify
any outlying studies. However, since each study contributes a different number of observations
(treatment arms multiplied by outcomes), the studies with more observations will tend to contribute
more. To adjust for this, it may be helpful simply to divide each study’s contribution by the number
of observations in order to obtain the average contribution per observation. This is done in Figure
17, which plots each study’s average contribution to the residual deviance per observation
(horizontal axis) against its average contribution to the complexity (leverage) per observation
(vertical axis). If a study fits well then the deviance per observation should not be much greater than
1, resulting in an overall model deviance roughly equal to the number of observations. The leverage
per observation should generally be less than 1, as otherwise the model will have more parameters
than observations (since the effective number of parameters is the sum of the study-level
leverages). Any study with both a high deviance contribution and a high leverage can be regarded as
an outlier that is adversely affecting the overall model fit and complexity.
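The per-observation adjustment and outlier screen described above might be sketched as follows (the threshold values are illustrative, not values used in the thesis):

```python
def per_observation(total_contribution, n_arms, n_outcomes):
    """Average a study's total deviance (or leverage) contribution over its
    number of observations (treatment arms x outcomes reported)."""
    return total_contribution / (n_arms * n_outcomes)

def flag_outliers(studies, dev_threshold=1.5, lev_threshold=1.0):
    """Return names of studies whose per-observation deviance AND leverage
    both exceed the thresholds: poor fit combined with high influence."""
    return [name for name, dev, lev in studies
            if dev > dev_threshold and lev > lev_threshold]
```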
[Scatter plot: leverage per observation (study-level average, vertical axis, 0 to 1.2) against deviance contribution per observation (study-level average, horizontal axis, 0 to 3) for Model 2 with correlations of 0.6; the outlying point is labelled BORNSTEIN 1987.]
Figure 17 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 2, correlations of 0.6).
In this case the majority of studies show reasonable deviance contributions (clustered close to 1) and
leverages (all lying below 1) except for BORNSTEIN 198794, which has a high deviance contribution.
As the earliest trial in the dataset by several years, there may be differences in population
characteristics, aspects of clinical care or study conduct that result in its treatment effects being
heterogeneous with the other studies. It is also the smallest study in the dataset and therefore the
most prone to chance sampling error. Since the leverage for this study is low, however, its influence
on the overall results should be modest and it seems reasonable to retain it in the dataset.
II.6.1.4 Model 3: mappings
Figure 18 and Figure 19 summarise the posterior distribution of the key parameters of Model 3, for
fixed mappings and random-by-treatment mappings applied respectively, with all outcomes in one
mapping group, and all correlation coefficients set to 0.6. The treatment effect parameters with
missing data have now been included as they can be estimated via the mappings. The mapping
ratios themselves are not shown here but details can be found in Appendix C, along with results for
alternative correlation structures.
Figure 18 - Posterior credibility intervals of relative treatment effects (population averages) from Model 3 (random effects, fixed mappings, one mapping group, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
Figure 19 - Posterior credibility intervals of relative treatment effects (population averages) from Model 3 (random effects, random mappings, one mapping group, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. sd = standard deviation.
The results from both models are for the most part very similar, but some of the more extreme
effect sizes appear to be moderated in the fixed-mapping model. The random-mapping model
shows marginally better fit. Curiously, the complexity of the random-mapping model (as measured
by the effective number of parameters pd) is less than that of the fixed-mapping model, which goes
contrary to expectations. The reason for this is not immediately clear, but it may be that pd is a
misleading measure for this structure of model and should not be too heavily relied upon for model
selection.
Plots of the average deviance contribution per observation against the average leverage per
observation are shown in Figure 20 and Figure 21 below. For the fixed mapping model there is little
change from Model 2, but in the random-mapping model the leverage of the outlying study
(BORNSTEIN 1987) is reduced almost to zero. This indicates that this study is having very little
impact on the results, presumably because (i) the mappings allow strength to be borrowed from
elsewhere in the network, and (ii) the random-mappings formulation means that mappings derived
from this study have little impact on the rest of the model, since only one other study uses the same
active treatment.
Figure 20 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 3, correlations of 0.6, fixed mappings in one group).
[Scatter plot: leverage per observation (study-level average, 0 to 1.2) against deviance contribution per observation (study-level average, 0 to 3) for Model 3 with fixed mappings; the outlying point is labelled BORNSTEIN 1987.]
Figure 21 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 3, correlations of 0.6, random mappings in one group).
The mappings have allowed the missing treatment-outcome combinations to be estimated, at the
cost of some distortion in the estimates of the other effects (compared to the estimates obtained
from Model 2, without mappings). Effectively the estimates are all brought more in line with the
proportionality assumption.
The extent of this distorting effect may depend on how the outcomes are grouped for mapping
purposes. To examine this, Table 10 explores the impact of altering the mapping groups within the
random-mapping Model 3. Again all correlation coefficients have been set to 0.6. Rows
corresponding to missing data are highlighted.
[Scatter plot for Figure 21: leverage per observation (study-level average, 0 to 1.2) against deviance contribution per observation (study-level average, 0 to 2.5) for Model 3 with random mappings; the point labelled BORNSTEIN 1987 now shows near-zero leverage.]
Table 10 – Posterior distributions from Model 3: effect of varying mapping groups (random effects, all correlation coefficients between outcomes = 0.6). DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LM = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
RANDOM EFFECTS, RANDOM MAPPINGS MODEL
1 group 2 groups 3 groups No mappings
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.443 0.142 -0.620 0.138 -0.658 0.140 -0.699 0.181
FM -0.730 0.163 -0.714 0.158 -0.759 0.153 -0.726 0.184
GA -0.274 0.101 -0.460 0.109 -0.467 0.108 -0.379 0.142
IA (IM) -0.216 0.087 -0.266 0.089 -0.270 0.094 -0.201 0.160
IA (SC) -0.272 0.106 -0.338 0.111 -0.327 0.117 0.026 0.192
IB -0.421 0.129 -0.521 0.128 -0.505 0.126 -0.407 0.171
LM -0.253 0.091 -0.259 0.090 -0.236 0.096 -0.205 0.179
TF -0.296 0.120 -0.380 0.135 -0.392 0.147 -0.377 0.257
Log odds ratio of avoiding relapse (vs placebo)
DF 0.832 0.158 0.755 0.143 0.739 0.149 0.715 0.184
FM 0.856 0.158 0.816 0.149 0.875 0.159 0.921 0.194
GA 0.647 0.134 0.559 0.120 0.515 0.124 0.632 0.160
IA (IM) 0.388 0.118 0.331 0.106 0.304 0.110 0.384 0.182
IA (SC) 0.573 0.161 0.496 0.149 0.377 0.151 0.804 0.227
IB 0.761 0.170 0.704 0.158 0.644 0.158 0.766 0.214
LM 0.401 0.115 0.358 0.105 0.286 0.111 0.326 0.184
TF 0.512 0.168 0.449 0.156 0.421 0.163 0.429 0.275
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.314 0.129 -0.472 0.137 -0.507 0.173 -0.528 0.201
FM -0.398 0.138 -0.481 0.134 -0.391 0.170 -0.296 0.201
GA -0.193 0.094 -0.355 0.114 -0.464 0.158 -0.179 0.194
IA (IM) -0.195 0.102 -0.208 0.088 -0.294 0.169 -0.301 0.250
IA (SC) -0.489 0.241 -0.356 0.170 -0.656 0.259 -0.650 0.277
IB -0.350 0.151 -0.425 0.144 -0.648 0.231 -0.110 0.288
LM -0.283 0.124 -0.239 0.098 -0.409 0.154 -0.385 0.214
TF -0.259 0.127 -0.283 0.122 -0.348 0.245 -0.400 0.288
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.356 0.155 -0.530 0.179 -0.494 0.198 -0.473 0.286
FM -0.508 0.157 -0.566 0.167 -0.412 0.183 -0.412 0.213
GA -0.241 0.116 -0.411 0.147 -0.458 0.178 -0.082 0.268
IA (IM) -0.245 0.118 -0.241 0.103 -0.281 0.158 -0.438 0.219
IA (SC) -0.319 0.147 -0.330 0.138 -0.530 0.215 0.477 0.390
IB -0.836 0.391 -0.623 0.289 -0.833 0.358 -1.513 0.408
LM -0.352 0.141 -0.281 0.120 -0.431 0.163 -0.483 0.221
TF -0.357 0.252 -0.336 0.177 -0.341 0.302 -0.046 31.600
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.669 0.181 0.322 0.202 0.301 0.209 0.254 0.229
FM 1.233 0.257 1.300 0.269 1.380 0.282 1.368 0.313
GA 0.310 0.127 -0.120 0.204 -0.183 0.215 -0.124 0.220
IA (IM) 0.583 0.169 0.617 0.198 0.590 0.206 0.628 0.221
IA (SC) 1.133 0.366 1.278 0.406 1.075 0.427 1.614 0.463
IB 1.304 0.297 1.063 0.355 0.958 0.369 1.109 0.377
LM 0.721 0.144 0.830 0.150 0.761 0.157 0.735 0.191
TF 0.875 0.212 0.795 0.234 0.781 0.244 0.875 0.275
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.539 0.179 0.242 0.164 0.215 0.166 0.225 0.277
FM 1.079 0.248 1.007 0.263 1.072 0.284 1.300 0.309
GA 0.402 0.171 -0.079 0.154 -0.113 0.152 0.316 0.301
IA (IM) 0.366 0.169 0.454 0.191 0.397 0.185 0.213 0.392
IA (SC) 0.732 0.484 1.203 0.744 0.774 0.496 -0.060 31.650
IB 0.998 0.641 0.958 0.555 0.828 0.558 -0.069 31.640
LM 0.753 0.241 0.801 0.237 0.657 0.241 0.835 0.326
TF 0.403 0.168 0.509 0.187 0.466 0.191 -0.003 0.376
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.175 0.139 0.075 0.083 0.072 0.087 -0.529 0.454
FM 0.303 0.237 0.306 0.250 0.371 0.310 0.643 0.460
GA 0.124 0.108 -0.029 0.071 -0.047 0.080 -0.113 0.410
IA (IM) 0.120 0.108 0.150 0.133 0.145 0.137 -0.327 0.540
IA (SC) 0.216 0.221 0.376 0.381 0.264 0.275 -0.183 31.560
IB 0.295 0.303 0.302 0.319 0.287 0.322 -0.088 31.700
LM 0.157 0.132 0.226 0.183 0.201 0.169 -0.338 0.581
TF 0.170 0.175 0.188 0.177 0.187 0.181 0.095 31.670
Average mapping (reference outcomes have constant mapping of 1)
log ARR 1 1 1 1 1 1 N/A N/A
logit avoid relapse 2.270 0.933 1.396 0.482 1.235 0.363 N/A N/A
logit 3M DP -1.046 0.415 -0.843 0.236 1 1 N/A N/A
logit 6M DP -1.316 0.558 -0.981 0.322 -1.061 0.327 N/A N/A
logit ALT>ULN 2.933 1.130 1 1 1 1 N/A N/A
logit ALT>3xULN 2.216 1.007 0.818 0.215 0.740 0.210 N/A N/A
logit ALT>5xULN 0.665 0.596 0.259 0.206 0.258 0.205 N/A N/A
Between-study treatment effects sd 0.250 0.052 0.256 0.049 0.262 0.053 0.221 0.061
Between-treatment mapping sd 0.566 0.161 0.263 0.158 0.264 0.170 N/A N/A
Residual deviance 161.0 17.5 161.5 17.5 162.1 17.8 161.4 17.6
Comparing the columns of Table 10, the general trend appears to be that the higher the number of
mapping groups, the closer the estimates are to those from the model without mappings (for effects
that the latter model is able to estimate). In other words, the distortion induced by the mappings is
smaller when the mappings are only applied between more closely related outcomes, as one would
expect. This confirms that one should consider carefully how to group the outcomes when using
mappings, and try to ensure that outcomes remain as similar as possible within each group.
II.6.1.5 Models 4a/4b: outcomes with zeroes
Table 11 shows posterior summary statistics from Models 4a and 4b, for only the three outcomes
that were excluded from Models 1-3 due to the presence of zeroes (serious gastrointestinal
disorders, serious bradycardia and macular edema). Fixed effects have been used for these
outcomes owing to the small number of observations. All correlations involving these three
outcomes have been set to zero, i.e. they are modelled in a univariate fashion, and they have not
been subjected to mappings. This allows for a fair comparison between the estimates produced by
the two models and the empirical estimates of the treatment effects (also shown in the table;
obtained as the difference in observed proportions between study arms, taking an average weighted
by patient numbers where more than one study contributes). Both models used identical
beta(0.5,0.5) priors on the treatment effects d and on the “baseline” probabilities. A range of constant
continuity corrections (added to the observed proportion in all study arms) has been used in
conjunction with Model 4b.
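The empirical estimate described above is simply a patient-weighted average of per-study risk differences. A minimal sketch (with hypothetical event counts, not the actual RRMS data):

```python
# Empirical treatment effect for a "zeroes" outcome: per-study risk
# differences (treatment minus control proportion), averaged across
# studies with weights proportional to patient numbers. The event counts
# below are hypothetical, purely for illustration.

def empirical_risk_difference(studies):
    """studies: iterable of (events_trt, n_trt, events_ctl, n_ctl)."""
    weighted_sum, total_weight = 0.0, 0
    for r_t, n_t, r_c, n_c in studies:
        rd = r_t / n_t - r_c / n_c   # risk difference in this study
        weight = n_t + n_c           # weight by patients contributed
        weighted_sum += weight * rd
        total_weight += weight
    return weighted_sum / total_weight

# Two hypothetical studies of the same treatment-outcome pair
print(round(empirical_risk_difference([(12, 600, 5, 590), (8, 410, 3, 400)]), 5))
```

Weighting by total patients per study is the convention described above; inverse-variance weighting would be an alternative.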
Table 11 – Posterior distributions from Models 4a and 4b for the “zeroes” outcomes (fixed effects, no correlations between outcomes), and empirical treatment effect estimates. Only treatment-outcome combinations with data are shown; all other treatment effect parameters for these outcomes are set to zero. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, LQ = laquinimod, TF = teriflunomide, cc = continuity correction.
Mean sd 2.50% 97.50% Empirical estimate Mean – empirical difference
Model 4a (binomial likelihood)
Serious GI, DF 0.01012 0.00381 0.00381 0.01860 0.01041 -0.00029
Serious GI, GA 0.00142 0.00201 0.00000 0.00714 0 0.00142
Serious GI, LQ 0.01215 0.00584 0.00137 0.02463 0.01275 -0.00060
Serious GI, TF 0.01860 0.00892 0.00214 0.03776 0.01957 -0.00097
Serious bradycardia, FM 0.00209 0.00232 0.00000 0.00825 0.00251 -0.00042
Macular edema, FM 0.00124 0.00140 0.00000 0.00501 0.00128 -0.00004
Residual deviance 161.2 17.39 128.7 197.1 n/a n/a
Model 4b (normal likelihood, cc=0.01)
Serious GI, DF 0.01043 0.00031 0.00982 0.01104 0.01041 0.00002
Serious GI, GA 0.00013 0.00015 0.00000 0.00055 0 0.00013
Serious GI, LQ 0.01274 0.00034 0.01207 0.01341 0.01275 -0.00001
Serious GI, TF 0.01956 0.00059 0.01842 0.02070 0.01957 -0.00001
Serious bradycardia, FM 0.00211 0.00030 0.00152 0.00270 0.00251 -0.00040
Macular edema, FM 0.00104 0.00027 0.00051 0.00157 0.00128 -0.00024
Residual deviance 481.1 18.3 446.9 518.8 n/a n/a
Model 4b (normal likelihood, cc=0.025)
Serious GI, DF 0.01047 0.00043 0.00964 0.01130 0.01041 0.00006
Serious GI, GA 0.00021 0.00024 0.00000 0.00088 0 0.00021
Serious GI, LQ 0.01274 0.00046 0.01185 0.01364 0.01275 -0.00001
Serious GI, TF 0.01955 0.00075 0.01809 0.02101 0.01957 -0.00002
Serious bradycardia, FM 0.00252 0.00043 0.00166 0.00337 0.00251 0.00000
Macular edema, FM 0.00103 0.00042 0.00017 0.00185 0.00128 -0.00025
Residual deviance 319.3 18.4 285.0 357.3 n/a n/a
Model 4b (normal likelihood, cc=0.05)
Serious GI, DF 0.01050 0.00057 0.00940 0.01162 0.01041 0.00009
Serious GI, GA 0.00031 0.00036 0.00000 0.00127 0 0.00031
Serious GI, LQ 0.01273 0.00060 0.01155 0.01391 0.01275 -0.00002
Serious GI, TF 0.01954 0.00095 0.01768 0.02142 0.01957 -0.00003
Serious bradycardia, FM 0.00266 0.00059 0.00150 0.00381 0.00251 0.00015
Macular edema, FM 0.00098 0.00057 0.00002 0.00213 0.00128 -0.00030
Residual deviance 253.4 18.3 219.3 291.1 n/a n/a
Model 4b (normal likelihood, cc=0.10)
Serious GI, DF 0.01053 0.00076 0.00905 0.01202 0.01041 0.00012
Serious GI, GA 0.00044 0.00050 0.00000 0.00179 0 0.00044
Serious GI, LQ 0.01272 0.00080 0.01116 0.01429 0.01275 -0.00003
Serious GI, TF 0.01953 0.00125 0.01709 0.02198 0.01957 -0.00004
Serious bradycardia, FM 0.00270 0.00081 0.00110 0.00428 0.00251 0.00019
Macular edema, FM 0.00096 0.00070 0.00001 0.00248 0.00128 -0.00032
Residual deviance 217.2 18.4 183.1 255.0 n/a n/a
Model 4b (normal likelihood, cc=0.25)
Serious GI, DF 0.01056 0.00107 0.00848 0.01267 0.01041 0.00015
Serious GI, GA 0.00067 0.00075 0.00000 0.00267 0 0.00067
Serious GI, LQ 0.01270 0.00113 0.01050 0.01491 0.01275 -0.00005
Serious GI, TF 0.01949 0.00172 0.01612 0.02287 0.01957 -0.00008
Serious bradycardia, FM 0.00260 0.00117 0.00028 0.00489 0.00251 0.00009
Macular edema, FM 0.00102 0.00087 0.00000 0.00306 0.00128 -0.00026
Residual deviance 195.6 18.5 161.2 233.6 n/a n/a
Both Models 4a and 4b come reasonably close to the empirical treatment effect means. Model 4b,
using a Normal approximation, in fact appears to perform slightly better in this regard than Model
4a, which is perhaps surprising, but in either case the discrepancy is too small to be of any concern.
In Model 4b the treatment effect means are insensitive to the continuity correction, as expected.
The treatment effect standard deviations increase with the continuity correction but always fall
short of those from the exact binomial likelihood by an order of magnitude or two. The same is true
of the upper 97.5% points, which may be a more sensible measure of spread since the distribution
is skewed by the floor at zero. Conversely, the residual deviance decreases as the continuity
correction increases, because the higher variance produces smaller standardised residuals for any
points that do not fit perfectly, but it always remains above that from the exact binomial likelihood.
By artificially inflating the sample variances supplied to model 4b, the posterior treatment effect
distributions can be calibrated to broadly match those in the exact binomial model. Table 12 shows
the impact of inflating the sample variances by a factor of 100.
Table 12 - Posterior distributions from Models 4a and 4b (with 100x inflated sample variances) for the “zeroes” outcomes (fixed effects, no correlations between outcomes), and empirical treatment effect estimates. Only treatment-outcome combinations with data are shown; all other treatment effect parameters for these outcomes are set to zero. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, LQ = laquinimod, TF = teriflunomide, cc = continuity correction.
Mean sd 2.50% 97.50% Empirical estimate Mean – empirical difference
Model 4a (binomial likelihood)
Serious GI, DF 0.01012 0.00381 0.00381 0.01860 0.01041 -0.00029
Serious GI, GA 0.00142 0.00201 0.00000 0.00714 0 0.00142
Serious GI, LQ 0.01215 0.00584 0.00137 0.02463 0.01275 -0.00060
Serious GI, TF 0.01860 0.00892 0.00214 0.03776 0.01957 -0.00097
Serious bradycardia, FM 0.00209 0.00232 0.00000 0.00825 0.00251 -0.00042
Macular edema, FM 0.00124 0.00140 0.00000 0.00501 0.00128 -0.00004
Residual deviance 161.2 17.39 128.7 197.1 n/a n/a
Model 4b (normal likelihood, cc=0.01)
Serious GI, DF 0.01016 0.00319 0.00377 0.01638 0.01041 -0.00025
Serious GI, GA 0.00170 0.00189 0.00000 0.00669 0 0.00170
Serious GI, LQ 0.01215 0.00351 0.00518 0.01898 0.01275 -0.00060
Serious GI, TF 0.01835 0.00622 0.00608 0.03016 0.01957 -0.00122
Serious bradycardia, FM 0.00221 0.00206 0.00000 0.00722 0.00251 -0.00031
Macular edema, FM 0.00159 0.00161 0.00000 0.00568 0.00128 0.00031
Residual deviance 177.1 18.2 143.2 214.5 n/a n/a
Model 4b (normal likelihood, cc=0.025)
Serious GI, DF 0.00992 0.00449 0.00102 0.01874 0.01041 -0.00049
Serious GI, GA 0.00257 0.00289 0.00000 0.01027 0 0.00257
Serious GI, LQ 0.01164 0.00482 0.00188 0.02090 0.01275 -0.00111
Serious GI, TF 0.01758 0.00792 0.00171 0.03308 0.01957 -0.00199
Serious bradycardia, FM 0.00294 0.00283 0.00001 0.00997 0.00251 0.00043
Macular edema, FM 0.00227 0.00235 0.00000 0.00830 0.00128 0.00099
Residual deviance 175.4 18.1 141.9 212.5 n/a n/a
Model 4b (normal likelihood, cc=0.05)
Serious GI, DF 0.00964 0.00567 0.00021 0.02129 0.01041 -0.00076
Serious GI, GA 0.00357 0.00399 0.00001 0.01416 0 0.00357
Serious GI, LQ 0.01101 0.00610 0.00030 0.02328 0.01275 -0.00174
Serious GI, TF 0.01680 0.00961 0.00040 0.03629 0.01957 -0.00277
Serious bradycardia, FM 0.00367 0.00364 0.00001 0.01280 0.00251 0.00116
Macular edema, FM 0.00304 0.00320 0.00001 0.01128 0.00128 0.00176
Residual deviance 174.3 18.1 140.6 211.4 n/a n/a
Model 4b (normal likelihood, cc=0.10)
Serious GI, DF 0.00971 0.00697 0.00006 0.02480 0.01041 -0.00069
Serious GI, GA 0.00496 0.00550 0.00001 0.01965 0 0.00496
Serious GI, LQ 0.01059 0.00737 0.00007 0.02640 0.01275 -0.00216
Serious GI, TF 0.01637 0.01147 0.00013 0.04110 0.01957 -0.00320
Serious bradycardia, FM 0.00465 0.00474 0.00001 0.01674 0.00251 0.00214
Macular edema, FM 0.00403 0.00428 0.00001 0.01515 0.00128 0.00275
Residual deviance 173.5 18.0 139.9 210.4 n/a n/a
Model 4b (normal likelihood, cc=0.25)
Serious GI, DF 0.01062 0.00878 0.00004 0.03076 0.01041 0.00021
Serious GI, GA 0.00731 0.00807 0.00001 0.02872 0 0.00731
Serious GI, LQ 0.01105 0.00906 0.00004 0.03183 0.01275 -0.00170
Serious GI, TF 0.01698 0.01404 0.00005 0.04935 0.01957 -0.00259
Serious bradycardia, FM 0.00615 0.00641 0.00001 0.02262 0.00251 0.00364
Macular edema, FM 0.00568 0.00610 0.00001 0.02165 0.00128 0.00440
Residual deviance 172.6 18.0 139.3 209.6 n/a n/a
Although not perfect, a continuity correction of 0.025 with sample variances inflated by 100 provides
a reasonable fit to the treatment effect posteriors in the binomial model. The residual deviance has
also moved in the right direction, albeit with a little way to go to match the fit of the exact binomial
model. The inflation factor and continuity correction could be refined, or alternative approximations
used, to replicate the posteriors more closely if desired (perhaps even using a different formula for
different outcomes, studies or arms). In this instance a broad-brush approach is sufficient to
illustrate the principle, and further refinement will not be sought; instead the continuity correction
of 0.025 and inflation factor of 100 will be adopted. This implies an estimated variance of
(0.025 + p)(0.975 − p) × 100 / N for the mean proportion of patients experiencing a given
outcome in a study arm with observed proportion p and N patients. Whether this approximation is
generalisable to other datasets is not clear at this stage and may be worthy of further investigation.
Having derived the inflated sample variances, it is a good idea to remove the continuity correction
from the estimated proportions themselves (i.e. the data items y). As the same continuity correction
is added to the observed proportion in all study arms, it should make no difference to the risk
difference d. However, the same dataset will be used for the population calibration module, where
preserving the true proportions at arm level will be more important.
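The adopted data preparation can be sketched in a few lines (hypothetical counts; the function name is illustrative): the variance uses the continuity-corrected proportion with the inflation factor of 100, while the data value y retains the uncorrected proportion, as just described.

```python
# Sketch of the sample-variance formula adopted above for the "zeroes"
# outcomes: continuity correction cc = 0.025 applied inside the variance
# only, with an inflation factor of 100. The data value y keeps the
# uncorrected observed proportion, as required by the population
# calibration module. Counts are hypothetical.

CC, INFLATION = 0.025, 100

def approx_normal_data(events, n):
    p = events / n                                 # observed proportion
    var = (CC + p) * (1 - CC - p) * INFLATION / n  # (0.025+p)(0.975-p)*100/N
    return p, var                                  # y retains the true proportion

print(approx_normal_data(0, 250))   # a zero-count arm still gets a positive variance
```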
When the variances have been adjusted appropriately, Model 4b can then be re-run with within-
study correlations if desired, as shown in Figure 22 where within-study correlations of 0.6 between
all pairs of outcomes have been assumed.
Figure 22 - Posterior credibility intervals of relative treatment effects (population averages) from Model 4b (random effects, random mappings, one mapping group, all correlation coefficients between outcomes = 0.6, sample variances estimated as (0.025 + p)(0.975 − p) × 100 / N for the “zeroes” outcomes). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have a zero effect. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
To recap, the following process has been used to derive these results:
1. Obtain posterior statistics from the model using the exact likelihood for the “zeroes”
outcomes (in this case, Binomial) without allowing for within-study correlations
2. Use a continuity correction to obtain a Normal approximation to the likelihood, and (still
assuming no within-study correlations) adjust the sample variances in the data to obtain
posterior distributions that match those in step 1.
3. Obtain final posterior estimates from the adjusted-variance Normal model, this time with an
allowance for within-study correlations.
Although it requires several model runs and two versions of the code, this procedure appears to be a
reasonable way to incorporate within-study correlations, which is prohibitively difficult using a
Binomial likelihood.
A plot of the average deviance contribution per observation against the average leverage per observation for this model is shown in Figure 23, and looks very similar to those seen earlier.
Figure 23 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Model 4b, correlations of 0.6, random mappings in one group).
[Scatter plot for Figure 23: leverage per observation (study-level average) on the y-axis against deviance contribution per observation (study-level average) on the x-axis, for Model 4b with random mappings in one group. BORNSTEIN 1987 is the labelled outlying study.]
Either Model 4a or 4b can in principle incorporate random effects, between-study correlations or
mappings to the new outcomes, but as these do not seem appropriate for the RRMS case study, no
such results are used here.
II.6.1.6 Final models
To allow estimation of all the treatment effects but only map between outcome measures that
relate to the same benefit or risk, the three-group version of model 4b (with the amended sample
variance formula for the “zeroes” outcomes, as set out in the previous section) will be used for the
remainder of the thesis. As before, fixed effects will be used for the “zeroes” outcomes and random
effects for all others, and fixed correlations of 0.6 between all pairs of outcomes at the between-
and within-study levels will be assumed. The value of 0.6 is somewhat arbitrary, having been chosen
simply as the moderate “middling” option of the three positive constants that were considered when
building the models (results for the alternative values of 0.3 and 0.9 are available in Appendix C). In
reality a uniformly positive correlation between all of the RRMS outcomes seems implausible, since
one would expect some pairs of outcomes (such as the annualised relapse rate and the relapse-free
proportion) to be
negatively correlated. Indeed, the results corroborate this, with the random effects standard
deviation and residual deviance both increasing for higher (positive) assumed correlations, indicating
worsening model fit. Negative correlations can be allowed for using the extended correlation
structure described in II.4.4.1.3 and results on this basis are presented in Appendix C; this probably
gives a more realistic model but also increases run-time, so for practical reasons correlations of 0.6
have been assumed instead.
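The constraint underlying this choice can be illustrated numerically: an exchangeable correlation matrix (a common coefficient ρ between every pair of K outcomes) is positive-definite only when ρ lies between −1/(K−1) and 1, so any common value of 0.3, 0.6 or 0.9 is admissible, whereas a common negative value across ten outcomes would be tightly restricted. A short check, assuming NumPy is available:

```python
import numpy as np

# An exchangeable correlation matrix (common coefficient rho between all
# pairs of K outcomes) has eigenvalues 1 + (K-1)*rho and 1 - rho, so it
# is positive-definite only when -1/(K-1) < rho < 1. This verifies the
# constraint numerically for the ten RRMS outcomes.

def equicorrelation_is_valid(K, rho):
    R = np.full((K, K), float(rho))
    np.fill_diagonal(R, 1.0)
    return bool(np.all(np.linalg.eigvalsh(R) > 0))

print(equicorrelation_is_valid(10, 0.6))    # True: any 0 < rho < 1 is fine
print(equicorrelation_is_valid(10, -0.2))   # False: -0.2 < -1/9
```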
Figure 24 and Figure 25 summarise the posterior distribution of the key parameters from the fixed-
and random-mapping versions of this final model respectively.
Figure 24 - Posterior distributions of relative treatment effects (population averages) on Normal scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, fixed mappings in three groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; hollow markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have a zero effect. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 25 - Posterior distributions of relative treatment effects (population averages) on Normal scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, random mappings in three groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have a zero effect. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
The fixed- and random-mapping models appear very similar both in terms of the estimated
treatment effects and the overall fit. The most obvious difference is that the treatment effects
which are imputed by mappings have wider distributions in the random-mapping model. Either
model could be suitable as the basis for a benefit-risk assessment, and ultimately the choice
between them may come down to one’s view of the proportionality assumption, which is stronger in
the fixed-mapping version. The choice between these models will be discussed further in IV.1.2.
Plots of the average deviance contribution per observation against the average leverage per
observation for these models are shown in Figure 26 and Figure 27 below, and again the pattern is
very similar to those encountered earlier, with BORNSTEIN 1987 the only outlying study but not of
major concern due to its low leverage.
Figure 26 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Final model, fixed mappings in three groups).
[Scatter plot for Figure 26: leverage per observation (study-level average) on the y-axis against deviance contribution per observation (study-level average) on the x-axis, for the final model with fixed mappings in three groups. BORNSTEIN 1987 is the labelled outlying study.]
Figure 27 – Deviance and complexity (leverage) per observation for individual studies in the RRMS dataset (Final model, random mappings in three groups).
II.6.2 Population calibration module
Table 13 shows the posterior distribution of the key untreated population parameters from the
population calibration module.
Table 13 - Posterior distributions of untreated population outcomes on Normal scale from population calibration module.
Mean sd 2.5% 97.5%
Untreated population averages on Normal scale
Log annualised relapse rate -0.473 0.273 -1.012 0.066
Log odds of avoiding relapse -0.292 0.282 -0.849 0.264
Log odds of disability progression, confirmed 3 months later -1.036 0.308 -1.643 -0.430
Log odds of disability progression, confirmed 6 months later -1.424 0.341 -2.099 -0.754
Log odds of ALT > ULN -2.128 0.353 -2.833 -1.447
Log odds of ALT > 3x ULN -3.279 0.363 -3.998 -2.568
Log odds of ALT > 5x ULN -4.009 0.447 -4.892 -3.140
Proportion with serious gastrointestinal events 0.0002309 0.0004090 0.0000001 0.0015180
Proportion with serious bradycardia 0.0029820 0.0015840 0.0001516 0.0061160
Proportion with macular edema 0.0007984 0.0008261 0.0000013 0.0029080
Untreated population averages on transformed scale
Annualised relapse rate 0.647 0.180 0.364 1.068
Proportion avoiding relapse 0.429 0.068 0.300 0.566
Proportion with disability progression, confirmed 3 months later 0.266 0.059 0.162 0.394
Proportion with disability progression, confirmed 6 months later 0.200 0.054 0.109 0.320
Proportion with ALT > ULN 0.111 0.035 0.056 0.191
Proportion with ALT > 3x ULN 0.038 0.014 0.018 0.071
Proportion with ALT > 5x ULN 0.020 0.009 0.007 0.041
Proportion with serious gastrointestinal events 0.0002309 0.0004090 0.0000001 0.0015180
Proportion with serious bradycardia 0.0029820 0.0015840 0.0001516 0.0061160
Proportion with macular edema 0.0007984 0.0008261 0.0000013 0.0029080
Between-study heterogeneity sd 1.076 0.112 0.881 1.317
The distributions appear perfectly plausible and reflect the data well. As expected, there is
considerably more between-study heterogeneity in the untreated outcomes than in the relative
treatment effects, with the standard deviation here being roughly 4 times larger than in the
treatment effects module.
The proportions for the final three outcomes are so low that one could probably quite reasonably
set them equal to zero for most practical purposes, but they will be retained in the model here.
II.6.3 Final synthesised outcomes on absolute scale
II.6.3.1 Population-average outcomes
Figure 28 summarises the posterior distribution of the population-average outcomes on the Normal
scale used for modelling. The relapse rate (which is almost always less than one) and the odds of
relapse, disability progression and liver enzyme elevation (which are almost always below one)
appear negative on this scale because of the logarithmic transformation.
Figure 29 shows the same distributions, back-transformed to their original scales. On this scale the
absolute level of outcome is always positive.
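The back-transformations involved are the exponential (for the log annualised relapse rate) and the inverse logit (for the log odds outcomes). A minimal sketch, using the untreated posterior means from Table 13 purely as example inputs:

```python
import math

# Back-transformation from the Normal modelling scale to the original
# scales: exp() for the log annualised relapse rate and the inverse logit
# (expit) for outcomes modelled as log odds. The inputs are the untreated
# posterior means from Table 13, used here only as example values.

def expit(x):
    """Inverse logit: maps a log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(math.exp(-0.473), 3))   # annualised relapse rate, approx. 0.623
print(round(expit(-1.036), 3))      # P(3-month confirmed progression), approx. 0.262
```

Note that these point transformations do not exactly reproduce the original-scale means in Table 13 (0.647 and 0.266), presumably because those summaries are computed from the back-transformed posterior samples rather than by transforming the posterior means.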
Figure 28 - Posterior distributions of absolute treatment outcomes (population averages) on Normal scale from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 29 - Posterior distributions of absolute treatment outcomes (population averages) on their original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
II.6.3.2 Predictive distributions
Figure 30 summarises the posterior predictive distributions of the study-level average outcomes.
Note that for the final three outcomes, which employ fixed effects (and fixed untreated values),
there is no change from Figure 29. The annual relapse rate has been capped at 3 relapses per year
as otherwise the tail of the distribution contains arbitrarily (and unrealistically) high values.
Figure 31 summarises the posterior predictive distributions of the patient-level outcomes. In this
case the probabilities of relapse and disability progression are frequently capped at both 0 and 1 (i.e.
the minimum and maximum values for a probability), to the extent that the 95% credibility intervals
cover practically the entire interval [0,1].
Figure 30 - Posterior distributions of absolute treatment outcomes (study-level averages) on their original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
Figure 31 - Posterior distributions of absolute treatment outcomes (individual-level) on their original scales from the final model (fixed effects on “zeroes” outcomes, otherwise random effects, three mapping groups, all correlation coefficients between outcomes = 0.6). Markers indicate means and lines indicate 95% credibility intervals. Solid markers and lines are treatment-outcome combinations for which data was available; white markers and dashed lines are estimated by mappings. Any treatment-outcome combinations not shown are assumed to have the same distributions as placebo. DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. GI = gastrointestinal. sd = standard deviation.
II.6.4 Rankings
Figure 32 and Figure 33 show the SUCRA statistics (see II.5.4) based on population average outcomes
in the fixed mapping and random mapping models respectively, for the three different grouping
structures. The changing formats within Figure 32 reflect that the rankings are equivalent for any
outcomes that are grouped together, an inherent property of the fixed-mapping model. In
particular, in the one-group model (top left), the treatment effects for every outcome occur in the
same proportions (relative to one another) for all treatments. This means that the treatment
rankings are essentially equivalent for all outcomes, but the rankings for efficacy outcomes (where
the treatment effects have a positive impact on the patient) are reversed for the liver safety
outcomes (where the impact of treatment is negative); this results in the SUCRAs for efficacy and
safety always summing to 1. This is not the case with more than one mapping group, where the
rankings can differ between the groups.
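The SUCRA calculation itself reduces to a simple function of the posterior mean rank: with T treatments and rank 1 best, SUCRA = (T − mean rank)/(T − 1). A minimal sketch on simulated effects (not the fitted RRMS model):

```python
import numpy as np

# SUCRA from posterior rank samples. With T treatments and rank 1 = best,
# SUCRA = (T - mean rank) / (T - 1): 1 for a treatment always ranked
# first, 0 for one always ranked last. The treatment effects here are
# simulated, purely for illustration.

rng = np.random.default_rng(1)
T, S = 4, 10_000                    # treatments, posterior samples
means = [0.0, -0.3, -0.5, -0.2]     # hypothetical effects (lower = better)
effects = rng.normal(means, 0.15, size=(S, T))

# Rank treatments within each posterior sample (1 = smallest effect)
ranks = effects.argsort(axis=1).argsort(axis=1) + 1
sucra = (T - ranks.mean(axis=0)) / (T - 1)
print(np.round(sucra, 2))           # treatment 2 (best mean effect) scores highest
```

Because the ranks within each sample always sum to T(T+1)/2, the SUCRAs across all treatments sum to T/2, which is why shrinking all scores towards 0.5 (as with the predictive distributions discussed below) leaves that total unchanged.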
Figure 32 - SUCRA based on population averages; fixed mapping model
Figure 33 - SUCRA based on population averages; random mapping model
The rankings for serious gastrointestinal events are largely unaffected by the mappings and are
shown in Figure 34. The rankings for serious bradycardia and macular edema do not merit a graph:
fingolimod is always ranked lowest and all other treatments rank equal first.
Figure 34 - SUCRA based on population averages: serious gastrointestinal events
Figure 35 shows how the SUCRA figures change depending on the level of predictive variability that
is accounted for. The figures are based on the efficacy and liver safety outcomes in the three-group
fixed-mapping model but the impact is similar for other outcomes/models. If one considers
predictive distributions at either the study or patient level, instead of population averages, the
increased variability of the treatment effects feeds through to the rankings, which become more
random and less systematic. The ultimate effect on the SUCRA statistics, as revealed in Figure 35, is
that all treatments’ scores are shrunk towards 0.5, the value that indicates a neutral distribution of
rankings. This effect is particularly pronounced in the patient-level predictive distributions for the
RRMS dataset, a stark result which suggests that treatment is a very poor predictor of the outcomes
an average RRMS patient will experience, with patient-to-patient heterogeneity and random chance
playing a far greater role.
One can also observe that the equivalence of rankings for outcomes in the same group is slightly
disrupted by the additional variability in the predictive distributions.
Figure 35 - SUCRA for the efficacy and liver outcomes in the three-group fixed-mapping model: the impact of predictive variability.
If the underlying rankings themselves are of interest, rather than the high-level summary provided
by SUCRA, Figure 36 provides one possible visualisation, showing the proportion of posterior
samples in which each treatment was at each ranking level for a particular outcome statistic (in this
case the population average annualised relapse rate in the one-group random mappings model –
similar graphs for the other outcomes/models are available in Appendix C). These particular
rankings underlie the SUCRAs shown by the darkest green bars in the top-left graph in Figure 33.
Graphs such as this may help to compare performance when the SUCRAs do not distinguish clearly
between treatments (such as glatiramer acetate, subcutaneous interferon beta-1a and laquinimod in
this example), but being based on rankings they still do not convey any sense of whether differences
in ranks correspond to clinically meaningful differences in outcome. For this one must examine the
posterior distributions of the treatment effects.
Figure 36 - Probabilistic rankings for the population average relapse rate, one-group random mappings model
II.6.5 Conclusions regarding RRMS treatments
Given the results on all ten outcomes, a decision maker must somehow combine all ten sets of
results into an overall score or rule that ranks the treatments. The next chapter will address this
problem by incorporating additional parameters in the model relating to outcome importance. At
this stage, however, one can still make a few general observations about the treatments’
performance:
• Fingolimod ranks highest in terms of relapse prevention but is not quite so outstanding with
regard to disability progression, and is one of the worst treatments for most safety
outcomes.
• Interferon beta-1b performs well on all efficacy measures but poorly on liver safety.
• Dimethyl fumarate and glatiramer acetate both perform well on efficacy (where dimethyl
fumarate has a slight lead) and liver safety (where glatiramer acetate is second only to
placebo), but both are associated with serious gastrointestinal adverse events.
• Intramuscular interferon beta-1a does well on liver safety but not so well on efficacy.
The efficacy findings are broadly in line with the results of the Cochrane review 81, but there are some
differences. Most notably, glatiramer acetate appears less effective in our analysis and dimethyl
fumarate appears more effective. This difference persists even when the analysis is restricted to the
Cochrane efficacy outcomes and performed on a univariate basis. The difference appears to relate to
our differing approaches to trials using non-standard dosages: the Cochrane review pooled all
dosages for each treatment, whereas here, study arms that did not use the normal dosage were
excluded. Dimethyl fumarate and glatiramer acetate were among the drugs with more than one
dosage used in trials. Surprisingly, the Cochrane review’s safety results (based on discontinuation
due to any adverse events) are largely in line with the liver safety rankings here (based on biomarker
tests); this similarity may be a chance finding or could perhaps indicate that elevated liver enzymes
can act as a proxy for lack of tolerability in a more general sense. It is worth bearing in mind, however, that lack of efficacy can be intolerable too: a recent observational post-marketing review of RRMS 121 found that compliance with treatment was highest for fingolimod and other highly effective drugs, suggesting that most MS patients may value efficacy more highly than safety.
II.6.6 Sensitivity analyses
Results of the sensitivity analysis on the assumed priors and correlation parameters are shown in
Appendix C. Alternative non-informative priors were found to have little impact on the results.
Assuming extreme values for the correlation coefficients (0 or 0.9) had some impact on the
individual treatment effect estimates but not so much on the rankings; vague priors on the
correlation propensities did affect the rankings somewhat but not sufficiently to have much impact
on the overall conclusions.
II.7 Discussion
Multivariate network meta-analysis models are relatively novel, but this is an active research area
and various models have recently been proposed, as described in the literature synopsis. Few
applications exist, however, beyond the examples used to introduce the models, and clearly more
experience is needed to evaluate the various approaches’ reliability, practicality and generalizability.
For reasons already discussed, Bayesian multivariate NMA lends itself well to the demands of
quantitative benefit-risk assessment, and the growth of this field may lead to more applications in
future. The computations themselves do not take especially long on modern computers; all of the
models in this chapter that assumed constant correlations between outcomes took less than 5
minutes to run 200,000 iterations in OpenBUGS (version 3.2.2 rev 1063) on a Microsoft Surface Book
2 (i5-8350U 1.70 GHz quad core) running Windows 10; where random correlation propensities were
used, the run time did not exceed 30 minutes. Bayesian modelling using MCMC remains a highly
specialised discipline, however, and the expertise required to set up and interpret such analyses is
likely to be the most significant barrier to more widespread adoption.
The family of models presented here share a combination of features that stands out compared to
other published multivariate NMA models, particularly with a view to benefit-risk assessment:
estimation of missing treatment-outcome combinations, estimation of outcomes on the absolute
scale, flexibility in the assumed correlation structure, code that needs minimal adaptation for each
dataset, no need to specify covariance arrays in the data. Among the factors that may discourage
researchers and analysts from employing multivariate NMA models are complexity of
implementation and patchiness of data, and these models address both issues.
There are however a number of limitations. The model developed here relies on multivariate
Normal distributions to allow for within-study correlations since they are mathematically tractable
and can be used to approximate most other common distributions. Even implementing multivariate
Normal distributions is not always easy in BUGS, however, and the novel construction developed in
II.4.4 was used in order to facilitate the coding. Other software packages (for example Stan 122) may
be able to overcome this limitation.
Avoiding the Normal approximations altogether may be difficult as extending other distributions into
the multivariate domain is not straightforward. In the case of several binomial outcomes, one
possible approach might be to characterise the joint likelihood by specifying all of the conditional
probabilities, but this may be rather cumbersome for large numbers of outcomes and it would
presumably be very difficult to parameterise and code such a model for an arbitrary dataset.
Combining different outcome types will increase the difficulty. It seems likely that using
transformations to achieve an approximation of multivariate Normality will remain the favoured
approach.
Within the multivariate Normal framework, binary events with zero rates can be handled on a risk
difference (or rate difference) scale, but if within-study correlations are to be included, this must be
implemented by post-hoc tuning of the variance data to obtain the correct posteriors. This is not an
ideal solution – not only does it lack rigour, it also makes the process somewhat longwinded as
several model runs are needed – once to establish the results using the “exact” likelihood in a
univariate context; several more runs to tune a Normal likelihood to achieve equivalent results; and
then a final run using the Normal likelihood with within-study correlations. A formula can be derived
for the sample variances (as a function of the sample proportion) that were fed to the final model in
the RRMS case study, but whether this will be generalisable to other datasets is unclear and until
this is established it would be prudent to repeat the same iterative tuning procedure.
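The idea behind the tuning can be illustrated outside BUGS by moment matching. The sketch below is a simplified stand-in for the iterative multi-run procedure described above: it chooses the mean and variance of a Normal summary so that a flat-prior Normal analysis reproduces the exact conjugate posterior, here a Beta posterior with a Beta(1,1) prior (an assumption made purely for illustration):

```python
def tuned_normal_moments(x, n, a=1.0, b=1.0):
    """Moment-match a Normal summary for a binomial proportion so that
    a flat-prior Normal analysis reproduces the mean and variance of
    the exact Beta(a + x, b + n - x) posterior. Unlike the naive
    variance estimate p*(1 - p)/n with p = x/n, this stays positive
    when x = 0."""
    a_post, b_post = a + x, b + n - x
    mean = a_post / (a_post + b_post)
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, var

m, v = tuned_normal_moments(0, 50)          # zero events out of 50 patients
naive_var = (0 / 50) * (1 - 0 / 50) / 50    # collapses to zero
print(m, v, naive_var)
```

The point of the sketch is that a principled positive variance can be attached to a zero-event arm, whereas the usual plug-in variance estimate degenerates to zero.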
It may also be possible to use the inverse relative risk scale for binary outcomes with zero rates, but
this has not been pursued here as it has no clear advantage over the risk difference, presents similar
issues with estimating the sample variance, and suffers from a slightly awkward interpretation.
Any zeroes in the data for outcomes modelled on the logit scale or log rate scale could be handled
via a continuity correction, although this was not necessary with any of the RRMS outcomes we
considered.
The decomposition of the multivariate Normal distribution into an array of univariate Normals is
restricted to a particular class of correlation structures, albeit one that is arguably sufficiently broad for
most contexts. It has the advantage of using a relatively small number of parameters to encode the
covariance matrix, thus leaving sufficient degrees of freedom to allow the model to be fitted in the
small patchy datasets for which it is designed.
It appears that allowing for correlations in the RRMS case study may have only had a substantial
impact on the results when the unrealistic assumption of universal positive correlations was
imposed, and not for more realistic correlation structures. It remains to be seen if this is always the
case for other datasets, however. Simulation studies may help to clarify the model’s performance, and the impact of allowing for correlations, in a range of different scenarios.
Not all possible types of outcomes have been included in the model, most notably survival/time-to-
event outcomes. Given data on the survival probabilities in each treatment arm at the time point(s)
of interest, it should in principle be straightforward to incorporate such outcomes if one assumes
proportional hazards as per the Cox model. For example, if outcome k in a trial is the survival
probability at a fixed time point, the relative treatment effect 𝑑𝑡𝑘 can be defined as the log hazard
ratio between treatment t and the reference treatment 1; then the log hazard ratio of treatment 𝑡2
compared to 𝑡1 is 𝑑𝑡2𝑘 − 𝑑𝑡1𝑘 and the ratio of the survival probabilities (i.e. the inverse relative risk
of failure) is the exponential of the hazard ratio, i.e. exp(𝑒𝑑𝑡2𝑘−𝑑𝑡1𝑘). This should make it straightforward
to specify (a Normal approximation of) the binomial likelihood of the survival data using the inverse
relative risk to compare treatments.
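As a hedged numerical aside (illustrative, not thesis code), the proportional-hazards link between arm-level survival probabilities and the log hazard ratio can be checked directly, since under proportional hazards S_t = S_ref raised to the power exp(d), i.e. a complementary log-log contrast:

```python
import math

def log_hazard_ratio(s_t, s_ref):
    """Log hazard ratio implied by survival probabilities at a common
    time point, assuming proportional hazards: S_t = S_ref ** exp(d),
    so d = log(-log S_t) - log(-log S_ref) (a complementary log-log
    contrast). Probabilities must lie strictly between 0 and 1."""
    return math.log(-math.log(s_t)) - math.log(-math.log(s_ref))

# hypothetical survival probabilities at the time point of interest
d = log_hazard_ratio(s_t=0.9, s_ref=0.8)
# round-trip check: raising S_ref to the hazard ratio recovers S_t
print(d, 0.8 ** math.exp(d))
```

This is one convenient route from reported survival probabilities to the log hazard ratio scale on which the treatment effects 𝑑𝑡𝑘 are defined.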
Categorical outcomes with more than two categories (i.e. multinomial) have not yet been
incorporated; allowing for correlations involving these outcomes could be difficult, but it may be
possible to find a way, perhaps by using the “zeroes trick” in BUGS to specify a custom likelihood 123.
Throughout the RRMS case study I have assumed that the same between-study standard deviation
sigma applies to the treatment effects on all outcomes, although they are expressed on different
scales and represent different clinical effects. It may be more realistic to explore using a different
standard deviation parameter for each outcome, but for reasons of parsimony this was not done
here. Expressing the multivariate Normal distribution as a combination of univariate Normals is still
possible in such a model, as shown by Theorems 1 and 2.
The assumption of proportionality between outcomes, underlying the mappings, is a strong one.
This can be mitigated to some extent by grouping the outcomes so that mappings are only applied
between outcomes that are particularly closely related, and by using random mappings so that the
proportions do not have to be exactly equal on all treatments. Potentially one could go further and
build more flexibility into the mapping relationships by adding more parameters – for example,
introducing a power parameter q so that the mapping equation for treatment t becomes 𝑑𝜔𝑡𝑞 =
𝛽𝜔𝑡𝑑1𝑡𝑞. This power parameter could be assigned different constant values as a form of sensitivity
analysis, or given a vague prior.
The mappings, as currently defined, should only be used where the treatment effects on the
outcome(s) used as the baseline for mappings are non-missing for all treatments. Should any of
these treatment effects be missing, the baseline outcome will itself be estimated via the mappings
and the model will not behave as intended. This issue may be avoidable if one reparameterises the
model such that the baseline outcome varies by treatment according to data availability, and this
could facilitate applications to datasets with a higher degree of patchiness. There are other
restrictions in the model that could perhaps be overcome, such as the assumptions of equal variance
across treatment contrasts and across log treatment effects that give rise to fixed correlations of 0.5
between random effects and between mappings respectively.
The mappings cannot be used for outcomes where the effect size comparing a treatment to baseline
t=1 is zero for some outcomes and non-zero for others in the same mapping group, as this is not
consistent with a linear scaling between the effects on different outcomes. In particular, adverse
effects that are only associated with one (or a subset) of the treatments may produce spurious non-
zero results for any other treatments in the model if they are subjected to mappings. In practice this
should not present problems if one selects the mapping groups appropriately.
Another method to allow for correlations or mappings, not pursued here, is to define explicit
structural relationships between outcomes. For example, using the RRMS case study, one could link
the four relapse and disability progression outcomes by assuming that within each study there is a
constant annualised relapse rate r; that a proportion φ of relapses lead to a disability progression
that persists for 3 months; and a proportion θ of such disability progressions persist for a further 3
months. The mathematics of rates and proportions then implies that
𝑃(𝑎𝑣𝑜𝑖𝑑𝑖𝑛𝑔 𝑟𝑒𝑙𝑎𝑝𝑠𝑒) = 𝑒−𝐴𝑅𝑅∗𝑡,
𝑃(𝑑𝑖𝑠𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑝𝑟𝑜𝑔𝑟𝑒𝑠𝑠𝑖𝑜𝑛 𝑐𝑜𝑛𝑓𝑖𝑟𝑚𝑒𝑑 3 𝑚𝑜𝑛𝑡ℎ𝑠 𝑙𝑎𝑡𝑒𝑟) = 𝜑(1 − 𝑃(𝑎𝑣𝑜𝑖𝑑𝑖𝑛𝑔 𝑟𝑒𝑙𝑎𝑝𝑠𝑒)), and
𝑃(𝑑𝑖𝑠𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑝𝑟𝑜𝑔𝑟𝑒𝑠𝑠𝑖𝑜𝑛 𝑐𝑜𝑛𝑓𝑖𝑟𝑚𝑒𝑑 6 𝑚𝑜𝑛𝑡ℎ𝑠 𝑙𝑎𝑡𝑒𝑟) = 𝜃𝜑(1 − 𝑃(𝑎𝑣𝑜𝑖𝑑𝑖𝑛𝑔 𝑟𝑒𝑙𝑎𝑝𝑠𝑒))
where t is the number of years of follow-up. Or one could model the distribution of ALT within each
study as a continuous variable and derive the binary outcomes used in the models by using
thresholds. This kind of approach has not been pursued here for two main reasons:
(i) It is difficult to give any generic recipes for such models as they will be highly dependent
on context. The mapping-based models on the other hand take a very generalised form.
(ii) The assumptions underlying a specific structural model may be somewhat stronger than
the relatively loose proportionality condition used for the models in this chapter.
Placing vague priors on the mappings and correlation coefficients allows one to say very
little at all about the extent and form of any relationships while still allowing for the
possibility that they exist.
However, there are situations when specifying explicit structural relationships may have advantages,
such as when the nature of the relationships is known with certainty a priori, or when there is readily
available data on the structural parameters (such as φ and θ as defined above).
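A small numerical sketch of the structural relationships above (illustrative parameter values only; here 𝜑 is taken to apply to the probability of experiencing at least one relapse, i.e. 1 − P(avoiding relapse)):

```python
import math

def structural_outcomes(arr, phi, theta, t):
    """Outcome probabilities implied by the structural relationships
    above. `arr` is the annualised relapse rate, `phi` the proportion
    of relapsing patients with a progression confirmed at 3 months,
    `theta` the proportion of those confirmed again 3 months later,
    and `t` the follow-up in years. phi is applied to the probability
    of at least one relapse, i.e. 1 - P(avoiding relapse)."""
    p_avoid = math.exp(-arr * t)
    p_cdp3 = phi * (1 - p_avoid)
    p_cdp6 = theta * p_cdp3
    return p_avoid, p_cdp3, p_cdp6

# illustrative parameter values only
p_avoid, p_cdp3, p_cdp6 = structural_outcomes(arr=0.35, phi=0.4, theta=0.8, t=2)
print(round(p_avoid, 3), round(p_cdp3, 3), round(p_cdp6, 3))
```

The sketch shows how two structural parameters (𝜑 and 𝜃) tie the four relapse and disability progression outcomes together, in contrast to the looser proportionality assumptions of the mapping-based models.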
Here mappings have been used merely as a tool to synthesise missing outcome measures, but the
model could also be used to investigate the mappings themselves in order to establish surrogacy
relationships between outcomes, as has previously been done with a similar model 89.
It is good practice to only rely on network meta-analyses when the data are consistent; that is, there
should be little conflict between effects estimates that form closed loops in a network. Where
mappings are used, one should also pay attention to the consistency of the mapping ratios.
Techniques have been developed to assess treatment effect consistency in univariate NMAs 43,44 and
in principle a multivariate extension to handle mapping consistency should be possible; this has not
been addressed directly in this thesis but is a target for future work.
The fixed-baseline parameterisation of the treatment effects module (as opposed to the more
common variable-baseline approach) needs summary data for each trial arm, rather than contrast-
level data comparing each treatment to baseline within a trial. Arm-level data is generally published,
but should it not be available for all studies then the corresponding variable-baseline model can be
used instead (with additional within-study correlations reflecting that contrasts observed within a
trial are expressed relative to the same baseline arm and hence correlated). The population
calibration module does however fundamentally require arm-level data, and so any contrast-only
studies would need to be omitted from the data for this part of the model.
Many biostatistical methods assume that relative treatment effects comparing pairs of arms within
RCTs tend to be homogeneous throughout the population, with most study- and patient-level
variables affecting only the baseline outcome levels. Where variables do impact on the relative
treatment effects, these are known as effect modifiers, and their influence can cause problems in
evidence synthesis 65,124,125. When combining evidence from multiple studies, or extrapolating to
different populations, any differences in the distribution of effect modifiers can lead to confounding
and bias in the treatment effect estimates.
The key assumption in (multivariate) NMA (and hence a key limitation on when it can be applied) is
that the distribution of effect modifiers is the same in all studies (and in the population that is the
target of the decision). A special case where this assumption may not hold is in the presence of
publication bias: when some outcomes for some treatments are not reported due to poor
performance, then outcome missingness may be a (proxy for one or more) effect modifier(s), and
there may be heterogeneity in effects between the studies and the target population. In practice
there may be little way to avoid the data being coloured by publication bias but methods exist to
detect it 126 and one could make allowance for its impact, say by being conservative in other ways
such as in the setting of priors or utility trade-offs. This relies, however, on being able to anticipate
the likely direction of the impact that publication bias has on the evidence synthesis. For trials with
a placebo (or established active treatment) control arm, publication bias will usually favour the
active/experimental treatment arm; where trials compare two new treatments (such as many multi-
arm trials), the direction of bias may be less obvious. In network meta-analyses, the network
structure becomes significant in determining the likely impact of any bias: in simple “star”-shaped
networks of placebo-controlled studies, for example (see Figure 37), the bias will usually act in
favour of active drugs whose outcomes go unreported and against those where reporting is more
complete; in more complex networks the impact may be less straightforward, although sensitivity
analyses could be used to test various scenarios.
Figure 37 – A “star”-shaped evidence network with six active treatments (A, B, C, D, E, F) and placebo (P).
It is not immediately clear what is driving the between-study heterogeneity in the RRMS case study.
The populations of the source studies mostly appear very similar in terms of the distribution of age,
gender and disability level, but show considerable variation in geographical location, time since MS
diagnosis and treatment history, among other factors. It may be that one or more of these factors is
acting as an effect modifier; specific hypotheses could be tested by including these factors as
covariates in the model (in other words, turning the network meta-analysis into a network meta-
regression).
As previously noted, the modelling in this chapter is not intended to reflect the full safety profile of
the RRMS treatments involved, as to do so would have resulted in an impractically large example
dataset. The safety outcomes adopted have been chosen to illustrate the model’s features and
capabilities. It would therefore be wrong to interpret the results as a reflection of the overall
benefit-risk balance for these drugs. However, a fairly comprehensive set of efficacy outcomes has
been included, and to the extent that the data are reliable and consistent, the results for the
outcomes presented here should be meaningful.
In summary, the modelling approaches detailed in this chapter make many unique and useful
contributions to the field of multivariate evidence synthesis, including:
• efficient coding of arbitrary multivariate NMA models with flexible correlation structures
• mappings that exploit related outcome definitions to fill in gaps in data and borrow strength
• ability to incorporate outcomes with zero rates
• a population calibration model accompanying the main evidence synthesis, going beyond
treatment contrasts to estimate the distribution of real-world outcomes.
Ultimately the aims of the chapter have been achieved. A model (or rather family of models) has
been constructed that can perform a principled Bayesian evidence-based multivariate synthesis of a
variety of outcomes, for a number of different treatments, with variables on a scale suitable for use
in a multi-criteria decision model. Implementing such a decision model, and deriving Bayesian
estimates of its preference-related parameters, is the focus of the next chapter.
III. Bayesian multi-criteria utility modelling
III.1 Background, aims & objectives
III.1.1 Introduction
A benefit-risk assessment is an example of what is known in decision theory as a multicriteria
decision. The idea is to select between a group of alternatives (here, treatments) on the basis of
several criteria (here, the benefits and risks contributing to the overall balance). The previous
chapter discussed methods for obtaining estimates of each treatment’s performance in relation to
each criterion, resulting in an effects table such as the simple examples in I.1.5 or the forest plots
from the RRMS case study in II.6.3.
In this chapter the existence of a fully populated effects table is taken as given; now the focus shifts
to using these estimates to formulate an overall decision. Multicriteria decisions can be trivial if one
treatment is optimal with regard to all criteria. In general, however, the treatments may be ranked
differently depending which criterion is used as the basis for judgement. Multicriteria decision
analysis (MCDA) is a discipline that breaks down decisions in order to resolve conflicts between
criteria and determine which treatment is favoured overall.
III.1.1.1 MCDA
An informal example of the MCDA approach, applied to the context of benefit-risk, was given in
I.1.5. The discussion here will focus more on the theoretical and technical underpinnings.
MCDA is not so much a single method as a field of study; the term can be applied to a diverse family
of approaches to decision problems 127. They all however have the following general characteristics
(the interpretation in the benefit-risk context is given in parentheses):
• The decision amounts to a choice between a number of alternatives (i.e. treatments)
• Several criteria are relevant to the decision (i.e. efficacy and safety outcomes)
• Each alternative is assessed with regard to each criterion (i.e. the effects of each treatment
on each outcome are evaluated)
• (Optionally) a method is provided to aggregate the individual criterion assessments into an
overall assessment that accounts for all criteria (i.e. the overall benefit-risk balance is
evaluated).
The final step (aggregation) is present in some formulations of MCDA but not others 12. The version
of MCDA regarded here as canonical is that set out by Keeney and Raiffa 128, which has its roots in
multi-attribute utility theory and uses explicit elicitation of the trade-offs between criteria to
construct an aggregated score.
This formulation of MCDA is an example of a compensatory decision-making method, which means
that changing the value of one criterion can be compensated for (in terms of overall value) by a
change in one or more of the others 129. For example, an increase in unwanted side effects might be
mitigated by an increase in efficacy, leaving the overall perceived value of treatment unchanged.
This is what is meant by a trade-off 128. In non-compensatory decision-making this does not hold and
a variety of other decision rules may apply. For example, any alternative where one particular
criterion exceeds (or falls below) a certain threshold might be automatically accepted (or rejected);
such an approach has been proposed in the benefit-risk context 130. It is not too difficult to imagine
that an individual patient may have a non-compensatory attitude to benefits and risks of treatment.
From the perspective of a regulator or drug developer making decisions based on outcomes at the
population level, non-compensatory attitudes are possible for extremely poor outcome values
(efficacy less than placebo, or risk at unacceptably high levels) but, crucially, such “non-starter”
drugs should usually be easy to identify and can be screened out prior to applying MCDA. Indeed, in
real life it seems highly unlikely that any such treatments would ever get through early phase trials
and reach the stage where a population benefit-risk assessment is carried out. Any risks that have
not emerged prior to large Phase III trials are unlikely to occur with high enough frequencies in the
treated population to justify a non-compensatory approach. It does not seem unreasonable to
suppose that a population-level decision-maker will exhibit compensatory attitudes over the range
of outcomes represented by viable real-world alternatives, and this assumption will underlie the
methods in this chapter. If the decision maker takes a non-compensatory perspective, then the
MCDA model proposed by Saint-Hilary et al 130 may be a viable approach that still has much in
common with the models described here.
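The two-stage logic described above (screen out non-starters on non-compensatory grounds, then apply a compensatory weighted score to the survivors) can be sketched as follows; the treatments, criterion values, weights and safety floor are all hypothetical and are assumed to be on a 0–1 value scale where higher is better:

```python
def screen_then_score(effects, weights, floors):
    """Two-stage rule: drop any 'non-starter' whose value on a floored
    criterion falls below the non-compensatory threshold, then rank the
    survivors by a compensatory weighted sum. Values are on a 0-1
    value scale where higher is better; all names are hypothetical."""
    viable = {t: crit for t, crit in effects.items()
              if all(crit[c] >= floors[c] for c in floors)}
    scores = {t: sum(weights[c] * crit[c] for c in weights)
              for t, crit in viable.items()}
    best = max(scores, key=scores.get)
    return best, scores

effects = {
    "drug_A": {"efficacy": 0.8, "safety": 0.4},
    "drug_B": {"efficacy": 0.6, "safety": 0.7},
    "drug_C": {"efficacy": 0.9, "safety": 0.1},   # breaches the safety floor
}
best, scores = screen_then_score(effects,
                                 weights={"efficacy": 0.7, "safety": 0.3},
                                 floors={"safety": 0.2})
print(best, scores)
```

Note how drug_C is excluded outright despite the best efficacy score, while the remaining treatments are compared compensatorily.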
There is plenty of common ground between MCDA and health economic decision models 131. MCDA
aggregates multiple outcomes by placing them on a utility scale, while economic models typically
express outcomes on a monetary or QALY scale (based on implicit or explicit valuations of the
underlying trade-offs). Economic models can also be used to analyse decisions with uncertain
consequences (using decision trees, for example) 132. Much of this work will therefore translate
readily to the realm of economic models; however, it seems more appropriate to use MCDA as the
main basis for the methodology since (i) economic costs are usually not considered in benefit-risk
assessments, and (ii) the explicit focus on preferences and trade-offs in MCDA provides helpful
clarity on the underlying decision principles.
III.1.1.2 Multi-attribute utility theory
Multi-attribute utility theory (MAUT) is an extension of utility theory (which was originally
formulated to deal with only a single criterion) into the multi-attribute (i.e. multi-criteria) domain. It
defines preferences using the concept of utility, which is a cardinal measure of value or
satisfaction 133. In MAUT an individual’s utility is taken to be a real-valued function (the utility
function) of a number of underlying criteria. The defining characteristic of a utility function U(X) is
that 𝐸𝐴[𝑈(𝑿)] ≥ 𝐸𝐵[𝑈(𝑿)] if and only if A is at least as desirable as B, where A and B are probability
distributions over the space of multi-attribute consequences X and 𝐸𝐴, 𝐸𝐵 are the corresponding
expectations 128. A theorem by von Neumann and Morgenstern guarantees the existence of such a
function given some basic axioms regarding the nature of preferences, and shows that it is unique
up to a positive linear transformation 133. The latter point makes intuitive sense since utility thus
defined has no absolute meaning; all that matters is whether the expected utility of A exceeds that
of B, and this does not depend on any particular linear scale. Various versions of MAUT have been
developed (depending for example on whether the criteria can be evaluated with or without
certainty) 134, but it is a more closely knit family than MCDA as a whole, as some MCDA methods do not attempt
to quantify preferences in a cardinal fashion.
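The defining property 𝐸𝐴[𝑈(𝑿)] ≥ 𝐸𝐵[𝑈(𝑿)] can be illustrated by Monte Carlo with an invented concave utility function (an assumption for illustration, reflecting risk aversion): two hypothetical lotteries with equal expected consequences need not have equal expected utility.

```python
import numpy as np

rng = np.random.default_rng(0)

def utility(x):
    # a hypothetical concave (risk-averse) utility over one consequence
    return np.sqrt(x)

# two invented lotteries over consequences with the same expected value:
# A is a sure thing, B is an even-odds gamble
a = np.full(100_000, 0.50)
b = rng.choice([0.10, 0.90], size=100_000)

eu_a, eu_b = utility(a).mean(), utility(b).mean()
print(round(eu_a, 3), round(eu_b, 3))   # A is preferred: higher expected utility
```

Under a concave utility the certain option A is preferred to the gamble B even though their expected consequences coincide, which is exactly the distinction the expected-utility criterion is designed to capture.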
It is of course impossible to account for every single criterion influencing an individual’s overall utility
– potentially this could include any aspect of life from the weather to what they had for breakfast –
but this is not necessary in order to apply MAUT; one can simplify the utility model by focusing on a
set of criteria that are relevant to the decision at hand 128,135.
MAUT provides a rigorous axiomatic mathematical framework for multi-criteria decisions through
the eyes of a rational decision maker (i.e. one who seeks to maximise expected utility 133). While it
can be argued that individual decision makers may often not live up to expectations of rationality,
those such as regulators who make official decisions on behalf of the general public should generally
aim for their decisions to be justifiable, and hence rational 134.
Certain classes of multi-attribute utility functions (MAUFs) are often favoured due to their
tractability:
• Additive and linear in partial values: Such a MAUF can be expressed as a weighted sum of
partial values for the various criteria, where a partial value (or partial utility) function is itself
a utility function restricted to one criterion. In this class the partial value functions (PVFs)
need not be linear. The general form is:
𝑈 = ∑𝜔 𝑤𝑒𝑖𝑔ℎ𝑡𝜔𝑃𝑉𝐹𝜔(𝑥𝜔), summing over all criteria 𝜔
If the utility function is of this form then the criteria are said to be mutually utility
independent 128,134. In simple terms this means the criteria weights, 𝑤𝑒𝑖𝑔ℎ𝑡𝜔 , are
independent of the criteria partial utilities, 𝑃𝑉𝐹𝜔(𝑥𝜔). Such models have been favoured in
part due to their high statistical tractability, which facilitates elicitation of preferences 34.
Applications date back at least as far as the 1960s 136.
As the scale of the utility function is essentially arbitrary, as per the von Neumann-
Morgenstern theorem 133, the weights and partial values are unique only up to overall scaling
constants. It is conventional however to normalise the weights to sum to 1, and to scale the
partial values such that the maximum partial value for any alternative is 1 and the minimum
partial value is 0. These constraints allow a unique solution to be identified and limit overall
utility to the interval [0,1].
• Additive and linear in criteria: this is a subset of the above class where each PVF is also a
linear function of the underlying criterion measure 𝑥𝜔. Equivalently, the MAUF is itself an
additive linear function of the criteria and can be written as
𝑈 = 𝛼 + ∑𝜔 𝑈𝐶𝜔𝑥𝜔
where 𝑈𝐶𝜔 is a utility coefficient reflecting both 𝑤𝑒𝑖𝑔ℎ𝑡𝜔 and the linear coefficient of 𝑥𝜔
within its partial value function, and 𝛼 is an overall intercept term.
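A minimal numerical sketch of the additive form with the conventional normalisation (weights summing to 1, partial values scaled to [0, 1] across the alternatives); the criterion measurements are invented and assumed higher-is-better:

```python
import numpy as np

def additive_mauf(x, weights):
    """Additive utility with linear partial value functions: each
    criterion is rescaled so the best alternative scores 1 and the
    worst 0 (assumes every criterion varies across alternatives and
    higher raw values are better), and weights are normalised to sum
    to 1, so overall utility lies in [0, 1]."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    lo, hi = x.min(axis=0), x.max(axis=0)
    partial = (x - lo) / (hi - lo)        # 0-1 partial values
    return partial @ w

# invented criterion measurements: rows = alternatives, cols = criteria
x = [[0.30, 0.90],
     [0.50, 0.60],
     [0.70, 0.20]]
u = additive_mauf(x, weights=[2, 1])      # first criterion weighted twice the second
print(np.round(u, 3))
```

With this normalisation an alternative attains utility 1 only if it is best on every criterion, which makes the resulting scores directly comparable across alternatives.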
Although the Bayesian paradigm has been associated with decision theory for a long time, it has only
recently begun to be taken seriously with regard to multi-attribute problems. This is due in part to
the advent of MCMC sampling, which has helped to overcome serious difficulties in computation.
Now that this issue has been overcome there are good arguments supporting the use of Bayesian
approaches in this field 34 (see also I.1.5).
III.1.1.3 MCDA and MAUT in health
Multi-criteria utility elicitation techniques have been used for some time to construct health-related
quality of life measures 137,138. These composite measures are essentially utility values calculated
based on a number of underlying outcome measures; the parameters for the utility function are
estimated using the principles of MAUT (typically elicited using absolute scenario ratings). Such
measures tend not to be suitable for benefit-risk assessment as they do not capture the specific risks
associated with each treatment.
Beginning in the 1990s, various attempts have been made to use MCDA to perform holistic
assessments of treatments and other health interventions. Such analyses have been used for
purposes including health technology assessments, economic evaluations and clinical prescribing
recommendations 139.
Around the turn of the millennium it was recognised that multi-criteria decision making techniques
could have the potential to put benefit-risk assessment on a more formal and reliable footing 6,49.
This led to several exploratory initiatives aiming to demonstrate proof of the concept by government
bodies, institutions, organisations and consortia such as the European Medicines Agency 9,
Innovative Medicines Initiative 10, International Society for Pharmacoeconomics and Outcomes Research 12, and the
Pharmaceutical Research and Manufacturers of America. Many projects have now issued guidance
on best practice for MCDA in benefit-risk 5,11,55. There have also been a number of reviews and
recommendations specifically concerning utility elicitation methods for use in health-related
fields 140,141,142,143.
This activity is reflected in an increased use of quantitative benefit-risk methods by pharmaceutical
companies and other industry stakeholders 15.
III.1.1.4 Notation and nomenclature
A great deal of disparity and ambiguity of terminology exists in MCDA, owing in part to the
independent parallel development of methods that would later be grouped under the MCDA
umbrella. The following conventions will be adopted here:
• The MCDA decision variables are known as criteria.
• Clinical variables are known as outcomes. There may be more than one outcome that can be
used to measure each criterion (for example, see Figure 10).
• Preferences is a general umbrella term for utility function parameters.
• Utility coefficient is the linear coefficient for a criterion in the utility function.
• (Preference) weights are utility coefficients normalised to sum to 1 across all criteria.
• Preference strength for a criterion is the log of the absolute value of its utility coefficient.
• The utility ratio of two outcomes is the absolute value of the ratio between their utility
coefficients.
• Relative preference strength for two criteria is the difference between their preference
strengths (or equivalently, the log utility ratio).
The reasoning behind using these particular numerical measures will become clearer when the
model parameterisation is set out in section III.2 and those that follow. In the meantime, defining these terms
facilitates discussion of existing preference elicitation methods using consistent terminology.
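To make the relationships between these quantities concrete, here is a minimal Python sketch using invented utility coefficients; note that normalising by absolute values is an assumption made here for illustration, since risk criteria carry negative coefficients.

```python
import math

# Invented utility coefficients for three criteria (one benefit, two risks).
utility_coeffs = {"benefit": 0.6, "risk_A": -0.3, "risk_B": -0.1}

# Preference weights: coefficients normalised to sum to 1 across all criteria
# (absolute values are used here, since the risk coefficients are negative).
total = sum(abs(c) for c in utility_coeffs.values())
weights = {k: abs(c) / total for k, c in utility_coeffs.items()}

# Preference strength: log of the absolute value of the utility coefficient.
strengths = {k: math.log(abs(c)) for k, c in utility_coeffs.items()}

# Utility ratio of two outcomes, and the equivalent relative preference
# strength (the difference of their preference strengths).
ratio = abs(utility_coeffs["benefit"] / utility_coeffs["risk_A"])  # 2.0
rel_strength = strengths["benefit"] - strengths["risk_A"]          # log(2)
assert math.isclose(rel_strength, math.log(ratio))
```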
III.1.2 Preference elicitation methods
A wide variety of methods have been developed for eliciting the parameters of the utility function
(often simply referred to here as preferences) 141,143,144. Elicitation can be carried out in groups, in one-to-one sessions, or individually via paper, internet, or telephone. Participants are required to make
judgements regarding either:
• The value of the decision criteria themselves, in isolation or via pairwise comparisons (such
methods are known as compositional); or
• The overall value of scenarios involving several criteria at once (such methods are known as
decompositional) 145.
Elicitation methods also vary in terms of the format of judgements that participants are asked to
express, and thus the types of data that need to be analysed. This will be covered in more detail in
III.1.3. First, the paragraphs below briefly introduce some of the well-known elicitation methods
that have been employed to elicit preferences for health outcomes. This is by no means intended as
an exhaustive list. Later in the chapter it will be shown how data from most of these methods can
be analysed in a generalised Bayesian framework.
III.1.2.1 Analytic Hierarchy Process
The Analytic Hierarchy Process (AHP) is a framework for multi-criteria decision making that was
originally developed by Saaty 146 and has spawned a substantial literature, with applications in many
fields and various technical adaptations/extensions of the methodology having been developed 147.
AHP features heavily in the literature on decision-making in other disciplines and there have also
been applications in health-related fields 148.
AHP is a compositional method that includes a technique for eliciting priorities (the AHP terminology
for preference weights or partial values) based on exhaustive pairwise comparisons of criteria (or
criterion levels). A judgement matrix is created with a row and a column for each criterion (or level),
such as that in Figure 38; participants fill in the matrix with estimates of the utility ratios between
the corresponding row and column.
Figure 38 – Example of an AHP judgement matrix. C1-C4 are the criteria (or criterion levels) to be compared; a symbol in each white cell indicates where a judgement is to be entered estimating the utility ratio between the row and column criteria. The grey cells do not need to be filled in.
The numerical judgements in the matrix are usually expressed using values from 1 to 9 (or reciprocals thereof) 146, where 1 represents equal importance between the criteria and 9 (or 1/9)
represents an extreme difference in importance. Alternative numerical scales have also been
proposed 149-151. Typically the questions are administered in a paper or electronic questionnaire for
participants to complete individually.
A number of methods exist for deriving the weights or partial values from the judgement matrix.
Saaty originally proposed deriving the weight vector as the eigenvector of the matrix and this
remains the standard method, but it has been criticised for (among other reasons) its deterministic
nature and apparent lack of sound principles 152,153. An alternative regression-based method for
analysing AHP results has been proposed at various times in order to address these issues 149,151,154-156; this tends to give weight estimates equivalent to those of the eigenvector method, but with the advantages that it is based on well-founded axioms and provides insight into the statistical properties of the preference estimates 152,153. The regression-based AHP analysis is particularly important for this
thesis because it is conducive to a Bayesian implementation 157.
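As a concrete sketch (with an invented, perfectly consistent 3×3 judgement matrix), the eigenvector derivation can be computed as follows, alongside the normalised row geometric means, to which the regression (log least squares) solution reduces for a complete matrix:

```python
import numpy as np

# Invented reciprocal judgement matrix: entry (i, j) is the judged utility
# ratio of criterion i to criterion j. This example is perfectly consistent.
A = np.array([
    [1.0,  2.0, 4.0],
    [0.5,  1.0, 2.0],
    [0.25, 0.5, 1.0],
])

# Saaty's method: weights are the principal eigenvector, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

# For a complete matrix, the log least squares solution reduces to the
# normalised row geometric means; for a consistent matrix the two agree.
gm = np.exp(np.log(A).mean(axis=1))
gm_weights = gm / gm.sum()

print(weights)     # ≈ [0.571, 0.286, 0.143]
print(gm_weights)  # identical here, since A is consistent
```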
III.1.2.2 Measuring Attractiveness by a Categorical BasEd TecHnique (MACBETH)
MACBETH is a compositional MCDA tool that, like AHP, requires participants to fill out judgement
matrices to provide judgements of the utility ratio between criteria on a pairwise basis. However, a
strictly verbal scale is used to express judgements. If one criterion is judged more important than
another, the difference is qualified as very weak, weak, moderate, strong, very strong, or extreme 158.
Unlike the AHP judgement scale, these descriptions do not correspond to fixed numerical
differences. Instead, a specialised MACBETH software package derives numerical weights from the
judgement matrices that satisfy the verbal comparative judgements (and the implicit ordering
between them) using an internal algorithm, and also calculates a measure of consistency. The
specifics of the process are too complex to go into here but have been detailed elsewhere 159.
III.1.2.3 Swing weighting
Swing weighting is another compositional method based on pairwise criteria ratings. Rather than
evaluating all possible pairs of criteria, however, a hierarchical tree structure is used 160. The method
emphasises the need to be clear about the amount of “swing” in each criterion when evaluating the
pairwise ratings, indicating its roots in Keeney-Raiffa’s canonical work on MCDA, which uses a similar
elicitation procedure 128.
Swing weighting is facilitated by use of a tree diagram. The series of figures below illustrates the process using a simplified tree diagram based on RRMS treatment outcomes and administration modes (see III.3.2.1 for the original version); the data have been invented for this example. The
details of the swing weighting process vary but the following is typical:
(i) A tree diagram is constructed with the criteria arranged in hierarchical groups (Figure
39). Here benefit criteria are shown in green, risk criteria in red, and administration
modes in blue. Note that the level of the swing has been emphasised in bold here for
each outcome; without quantifying the swings this way it is impossible to give
meaningful weights. For categorical variables such as the administration modes, it is
first necessary to rank the levels in terms of desirability and to use the lowest-ranked (in
this case daily subcutaneous) as a reference to define the swings.
Figure 39 – Swing weighting example using RRMS treatment outcomes and administration modes: step (i)
(ii) At each yellow cell in the middle level of the hierarchy, its subordinate cells (i.e. those
criteria on the right which branch off from it) are weighted numerically, with a notional
value of 100% for the most important branch and values between 0 and 100% for the
other branches, reflecting their relative value (Figure 40).
Figure 40 – Swing weighting example using RRMS treatment outcomes and administration modes: step (ii)
(iii) The top-ranked criterion in each group is “promoted” to its parent cell and the same
weighting process then takes place at the next level up the hierarchy (Figure 41). In
trees with more hierarchical levels, this process continues all the way up the hierarchy.
Figure 41 – Swing weighting example using RRMS treatment outcomes and administration modes: step (iii)
(iv) Each criterion’s overall weight is determined by multiplying its weight by those of its
“parent” cells at higher levels of the hierarchy (Figure 42). These weights are on an
arbitrary scale; it is conventional to normalise them to sum to 1 at the end of the
process.
Figure 42 – Swing weighting example using RRMS treatment outcomes and administration modes: step (iv). Final weights are shown in bold (right).
Note that this analysis is entirely deterministic. A fuller description of the process is given by
Mussen et al 50. The use of hierarchies and other network structures for elicitation will be discussed
further in III.1.3.3.1.
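The arithmetic of steps (ii)-(iv) can be sketched in a few lines of Python; the group and criterion names echo the RRMS example, but the weights are invented purely for illustration.

```python
# Invented upper-level weights (step iii) and within-group weights (step ii).
group_weights = {"benefits": 1.00, "risks": 0.60, "administration": 0.30}
local_weights = {
    "benefits": {"relapse_rate": 1.00, "disability_progression": 0.80},
    "risks": {"PML": 1.00, "flu_like_reactions": 0.40},
    "administration": {"oral": 1.00, "infusion": 0.50},
}

# Step (iv): multiply each criterion's weight by its parent's weight, then
# normalise the results to sum to 1.
raw = {
    crit: group_weights[group] * w
    for group, crits in local_weights.items()
    for crit, w in crits.items()
}
total = sum(raw.values())
final_weights = {crit: w / total for crit, w in raw.items()}
assert abs(sum(final_weights.values()) - 1.0) < 1e-9
```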
III.1.2.4 Choice experiments
Choice experiments (or discrete choice experiments, DCEs) are fundamentally different from the
methods discussed above as they are decompositional in nature and the responses are ordinal, not
cardinal. In other words, rather than asking participants to directly quantify or qualify their relative
preferences for individual criteria, DCEs consist of several choice tasks, each of which requires
participants to choose the most appealing option from a choice set consisting of a number of
different multi-criteria scenarios. The responses are then analysed (using regression-based
methods) to infer the strength of preference for each individual criterion. Figure 43 is an example of
a choice task used to elicit preferences for benefit-risk assessment.
Figure 43 – Example of a binary choice set. This is from the PROTECT RRMS patient choice experiment, which will be described in more detail in III.4.2. PML = progressive multifocal leukoencephalopathy.
The statistical properties of the estimates are dependent on the precise design of the DCE (i.e. which
scenarios are shown to which participants), which should be tailored to the problem under
investigation. DCEs were developed outside the Keeney-Raiffa 128 linear MCDA framework used in
this thesis and are not necessarily constrained by the same assumptions regarding values/utilities,
but can be designed to elicit preferences for use in the linear MCDA setting.
Of all the well-known methods for preference elicitation, it is probably discrete choice experiments
that have the most extensive literature 161; they have also been used for some time in health-related fields 162-166. DCEs have been employed in benefit-risk modelling using MCDA, for example in the PROTECT initiative 72.
III.1.3 Data types
This section sets out a formal system for classification of the data formats and structures commonly
used for preference elicitation.
III.1.3.1 Rankings and choices
Rankings express relative differences in value (giving no information as to the absolute scale) and are
expressed as ordinals. Choice data, in which the most highly valued element of a set is chosen, is a
type of partial ranking data in which only the top ranking is supplied. The majority of this section
deals with choice data; it will be explained below how choice methods can be extended to deal with
full rankings.
Choice task responses are analysed using choice models. The multinomial logit model (an extension
of binary logistic regression) is the most popular model for various reasons, both practical 167 and theoretical 168. The analysis of choice data typically works according to the following principles:
• The utility V_Xi of a scenario X to an individual i is assumed to consist of (i) a deterministic component U_X defined as a specific function of the criteria, with parameters to be estimated, and (ii) an individual-specific random error term ε_i. That is, V_Xi = U_X + ε_i and U_X = f(x_1, …, x_m; β), where x_1, …, x_m are the criteria values in scenario X and β is the set of preference parameters to be estimated. If a linear utility model is assumed, then U_X = β_1·x_1 + … + β_m·x_m, but the method is not restricted to this particular form.
• An individual i selects option A if V_Ai > V_Xi for all alternative options X.
• For the multinomial logit model, it is assumed that the error terms follow a Gumbel (extreme value type I) distribution, with the result that the probability of selecting option A in a choice task is given by P_A = e^(U_A) / Σ_X e^(U_X), where the summation is over all possible options X 169.
• Given data on which options were selected, the coefficients β and their standard errors can be estimated by regression.
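A minimal sketch of the multinomial logit choice probability, assuming a linear utility with invented coefficients β and two invented scenarios:

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P_A = exp(U_A) / sum over all options X of exp(U_X)."""
    exps = [math.exp(u) for u in utilities]
    s = sum(exps)
    return [e / s for e in exps]

beta = [0.5, -2.0]                    # invented preference parameters
scenarios = [[1.0, 0.1], [2.0, 0.4]]  # criteria values for options A and B

# Linear utility model: U_X = beta_1 * x_1 + ... + beta_m * x_m.
U = [sum(b * x for b, x in zip(beta, s)) for s in scenarios]
probs = choice_probabilities(U)
print(probs)  # option A is chosen with probability ≈ 0.525
```

In practice β would be estimated by maximising the likelihood of the observed choices, for example with a standard multinomial logit fitting routine.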
Extensions of the multinomial logit model, such as the exploded (or rank ordered) logit and
sequential best worst logit 170,171 have been developed to allow the analysis of full rankings data (as
opposed to the partial rankings provided by choice data). This is done by noting that a ranking of
several options can be broken down into a series of statistically independent choices: in the exploded
logit model, for example, it is assumed that participants first choose the best option, then choose
the best of those that remain, and so on. Any ranking data can thus be re-expressed as choice data
and analysed accordingly.
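The "exploding" of a full ranking into a series of choices can be sketched as:

```python
def explode_ranking(ranking):
    """Re-express a full ranking (best first) as a sequence of choice tasks:
    at each step the top remaining option is chosen from those still left."""
    tasks = []
    remaining = list(ranking)
    while len(remaining) > 1:
        tasks.append((remaining[0], list(remaining)))  # (chosen, choice set)
        remaining = remaining[1:]
    return tasks

print(explode_ranking(["A", "B", "C"]))
# [('A', ['A', 'B', 'C']), ('B', ['B', 'C'])]
```

Each resulting (choice, choice set) pair can then be analysed as an independent multinomial logit observation.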
A popular alternative to the logit model is the probit model, where the errors follow a Normal (as
opposed to Gumbel) distribution. In practice there is usually little difference between the estimates
of either model, as the two distributions are very similar except in the tails 172.
Choice (and by extension, ranking) models are founded on sound statistical principles and utilise a
probabilistic, regression-based analysis. This all hints that a Bayesian implementation should be
natural and reasonably straightforward. Indeed, Bayesian applications and adaptations/extensions
of choice models already exist, and have been used in the health sciences 165.
One issue that has been identified with choice models is that their reliability decreases (and the
cognitive burden on participants increases) as the number of criteria increases 7,8. Benefit-risk
assessments may often contain more criteria than is recommended. One solution to this may be to
split the criteria between two or more separate choice experiments (with some overlap) and use a
model that can combine the results. Alternatively, some other elicitation method could be used to
augment the number of criteria. Such approaches require a model that can estimate preferences
based on two or more datasets jointly. This has been attempted before on a limited and non-Bayesian basis 173; a Bayesian approach that can accommodate other data formats such as relative ratings
would be particularly useful for benefit-risk assessments.
III.1.3.2 Absolute ratings
Absolute ratings, in the context of healthcare, can only be used to express the participant’s
judgement of his or her overall health state. This is because there is a natural universal absolute
scale for overall health, ranging from 0 (“dead”) to 1 (“perfect health”). There is however no
universal absolute scale for ratings that express the importance of an individual criterion or
outcome; such judgements can only be evaluated relative to one another.
The only way to use absolute ratings to evaluate outcome preferences, therefore, is to ask
participants to rate multi-criteria scenarios on the natural absolute scale described above, and to use
these ratings as the dependent variable in a regression in order to estimate the parameters of the
utility function.
The use of such a model is common in marketing-based preference elicitation studies (which tend to
be known as conjoint analyses) 174, and there is a substantial literature on its theory and applications,
including Bayesian versions 175. In principle such models could be included in the scope of this
chapter. However, due to the lack of applications in the benefit-risk field and a lack of relevant data,
approaches based on absolute scenario ratings will not be pursued here.
III.1.3.3 Relative ratings
Relative ratings express the ratio of preference intensity between scenarios by comparing their
utilities (or incremental utilities). Most commonly, relative ratings are used to express the relative
importance of individual criteria/outcomes.
For example, a rating task might require participants – RRMS patients, say – to express their relative preference for “avoiding a disability progression” versus “reducing the relapse rate by 1 relapse/year”. The response could be any positive number, with (for example) a value of 0.5 indicating that avoiding a disability progression is half as important as the relapse rate reduction, or a value of 100 indicating that it is 100 times more important. Negative ratings are not encountered, as it is usually clear whether a criterion has a positive or negative impact; this is made particularly transparent in this example by the use of “avoiding” and “reducing” in the criteria descriptions.
Sometimes ratings are elicited on what may at first glance appear to be an absolute scale, with one
elicited value per criterion (rather than per pair of criteria). For example, an elicitation task involving
m criteria might ask participants to “place the most important criterion at value 100 and the others
at appropriate values x_i between 0 and 100”. In such instances the data should be analysed as m-1 relative ratings of value x_i/100 rather than as absolute ratings. The value of 100 has no meaning and
is simply an arbitrary fixed anchor point used to establish scale. It is recommended (and assumed in
this chapter) that any such data are transformed onto the relative scale before use.
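This transformation can be sketched as follows (the criterion names and ratings are invented):

```python
# Anchored ratings: the most important criterion is placed at 100 and the
# others at values between 0 and 100 (invented example data).
anchored = {"relapse_rate": 100, "PML": 80, "flu_like_reactions": 25}

# Re-express as m-1 relative ratings against the anchor criterion.
anchor = max(anchored, key=anchored.get)
relative = {k: v / anchored[anchor] for k, v in anchored.items() if k != anchor}
print(relative)  # {'PML': 0.8, 'flu_like_reactions': 0.25}
```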
Asking subjects to rate the relative importance of outcomes can however be somewhat woolly,
depending on how the questions are phrased – for what exactly are they being asked to compare? If
an RRMS patient rates the benefit of “relapse prevention” to be five times as important as the risk of
“serious gastrointestinal events”, are they comparing a single relapse to a single serious
gastrointestinal event? Or are they implicitly also weighting them by their incidences, judging
relapses as more important merely because they occur more frequently? Or, given an unclear
question, might they alight on an answer that lies between these two extremes?
The burden of disease outcomes is a function both of their frequency* and their seriousness, and a
failure to disentangle the two (and/or clearly communicate the required task to participants) will
render any elicited data practically meaningless. Unfortunately, this is a common pitfall in
preference elicitation, sometimes known as range insensitivity bias 176. To account correctly for
frequency and seriousness there are two possible approaches:
(i) Elicit judgements that reflect both seriousness and frequency and evaluate the decision with
regard only to this elicited data; or
(ii) Elicit judgements that relate only to seriousness and combine these with other evidence on
the frequencies.
* For clarity, the discussion here is phrased in terms of frequency, i.e. the typical objective clinical measure for either binary or count outcomes. For other outcomes other clinical measures may be used (eg continuous variables representing clinical severity, time to event, etc) and can be substituted into the argument accordingly. The key point is that these quantities are measurable at the clinical level and do not have to be elicited from individual subjects.
Option (i) is employed in many formulations of MCDA (such as classical AHP 146) and may well be
appropriate if objective data on frequency is unavailable; in the benefit-risk context, however, an
evidence-based analysis should (presumably) reflect the best available clinical evidence on
frequency (or other objectively estimable clinical measures) and restrict the use of subjective
judgements (which may be subject to cognitive biases) to seriousness alone, i.e. follow option (ii).
This means ensuring that elicitation tasks comparing outcomes should always clearly refer to fixed
intervals of the outcomes involved, as is emphasised in swing weighting. If (as will usually be
assumed here) the partial value functions for the outcomes in question are linear, then the relative
importance depends only on the interval width (not its location on the overall scale), and the
elicitation tasks can be phrased accordingly, eg “compare a reduction of 1 in the annual relapse rate
against a reduction of 5% in the serious gastrointestinal event risk.”
One common practice is to set the interval width for each criterion equal to the difference between
the best and worst alternative with respect to that criterion 160,177. In a fully Bayesian MCDA, this
approach is not feasible as these intervals are not fixed but random quantities. Instead predefined
fixed intervals must be used, and arguably a single unit (of whatever outcome measure is used) is
the simplest to communicate to participants.
III.1.3.3.1 Structure of relative rating tasks
Given a set of criteria, there is more than one way to break down the preference elicitation problem
into a series of pairwise comparisons.
Relative preference intensities are of course transitive, so that given preference ratios for A over B,
and for B over C, one can derive the ratio for A over C as their product, as illustrated in Figure 44.
Figure 44 - Simple example of a network of outcome preferences (i). The preference ratio for A over B is 2, and for B over C is 3, so one can deduce that the preference ratio for A over C is 6.
But what if one has also directly elicited the ratio for A over C (Figure 45)? A direct and indirect
estimate will be available, and these may not be consistent.
Figure 45 - Simple example of a network of outcome preferences (ii). In this case there is inconsistency between the direct estimate for the preference ratio of A over C (4) and its indirect estimate (6).
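Using the numbers from Figures 44 and 45, the direct and indirect estimates, and their discrepancy (which is additive on the log scale of relative preference strength), can be computed as:

```python
import math

# Elicited preference ratios from the example network.
direct = {("A", "B"): 2.0, ("B", "C"): 3.0, ("A", "C"): 4.0}

indirect_AC = direct[("A", "B")] * direct[("B", "C")]  # 6.0, via B
direct_AC = direct[("A", "C")]                         # 4.0

# On the log scale the inconsistency is an additive discrepancy.
discrepancy = math.log(direct_AC) - math.log(indirect_AC)
print(round(discrepancy, 3))  # log(4/6) ≈ -0.405
```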
There is a clear parallel with the network meta-analysis models of the preceding chapter, where
indirect and direct evidence are combined to inform pooled estimates of the treatment contrasts.
Indeed, the situation is perfectly analogous, as shown by the similarity between Figure 4 and Figure
45. This parallel will be drawn upon throughout this chapter, particularly in section III.4 where a
“network meta-analysis” model for preferences is proposed.
Fans and webs
It is in situations where the network of preference comparisons contains loops that pooling of direct
and indirect estimates takes place, and thus the possibility of inconsistency emerges. When
designing an elicitation experiment, one can in theory elicit any combination of comparisons that
forms a connected network of outcomes. It is worth highlighting two particular approaches,
representing opposite extremes:
• Compare all pairs of outcomes on an exhaustive basis, resulting in a fully-linked network
(here called a “web”) such as the example shown in Figure 46. This provides the maximum
amount of data but requires participants to make the largest possible number of
comparisons (which is n(n-1)/2 for a network with n outcomes), and possibly generates
many inconsistencies. This approach is employed by methods including AHP and MACBETH,
some of which also provide means of calculating the inconsistency in the network.
Figure 46 – Example of a “web” network with six outcomes/criteria (left). Often the comparisons are entered into a triangular “matrix” such as that shown on the right.
• Choose one outcome relative to which the other outcomes are all compared, resulting in a
network of comparisons as in Figure 47, with no loops (and hence no inconsistencies). For n
outcomes this results in n-1 comparisons, the smallest possible number in a connected
network.
Figure 47 – Example of a “fan” network with six outcomes/criteria (left). The number of comparisons required (right) is much lower than for a “web”.
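The comparison counts for the two extremes follow directly:

```python
def web_comparisons(n):
    """Exhaustive pairwise comparisons in a fully linked network ('web')."""
    return n * (n - 1) // 2

def fan_comparisons(n):
    """Comparisons against a single reference outcome ('fan')."""
    return n - 1

# For the six-outcome networks of Figures 46 and 47:
print(web_comparisons(6), fan_comparisons(6))  # 15 5
```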
Hierarchical elicitation structures
Many elicitation methods use networks with a hierarchical structure. In a simple two-level hierarchy
this means that the outcomes/criteria are divided into several groups; comparisons are performed (i)
between criteria within each group at the lower level of the hierarchy and (ii) between the groups at
the upper level; these are then combined to give ratings for the full set of outcomes (see III.1.2.3 for
an example). For example, the upper level of the hierarchy might consist of the two groups
“Benefits” and “Risks”, containing the benefit and risk outcomes respectively. The overall
importance of an individual benefit (say) is a combination of its importance relative to the other
benefits (lower level), and the importance of benefits relative to risks (upper level). Hierarchies can
be extended to any number of levels (given sufficient criteria).
Reasons for using a hierarchical elicitation structure may include:
• To mitigate the impact of cognitive biases in the elicitation process 176. For example,
equalisation bias can result in comparisons being biased towards equality. The more
unequal in importance the criteria being compared, the bigger a problem equalisation bias
will be. To mitigate this, hierarchies allow highly unequal criteria to be compared indirectly via criteria of intermediate importance, or allow several less important criteria to be bundled together for comparison with a single more important criterion.
• Reducing the number of comparisons – if one is using an elicitation method that uses webs
(eg AHP, MACBETH) then introducing hierarchies will reduce the number of judgements that
need to be elicited from each participant.
• Simply to structure the problem and as a guide to the thought process, or to aid in
communicating the results 160.
Most methods use one of two rules for transferring preferences up and down the hierarchy:
1. Agglomeration – this means a set of criteria at the lower level are represented at the upper
level as an agglomerated group, and thus any judgements at the upper level reflect the total
importance of the set. In the benefit-risk example, the upper-level comparison would be
between all benefits and all risks. The agglomeration rule is commonly employed in AHP.
Figure 48 contains an example of a network diagram for a two-level hierarchy using the
agglomeration rule with web structures at every level. This structure is typical of AHP.
Figure 48 – Hierarchical elicitation network for 10 criteria arranged in two groups of three and one group of four, using the agglomeration rule and webs at both levels of the hierarchy. The table on the right shows the comparisons that need to be performed by participants.
2. Substitution – this means that a set of criteria at the lower level are represented at the
upper level by a single member, usually taken to be the most important in the set (but more
on this below). In the benefit-risk context, the upper-level comparison would be between
the most important benefit and the most important risk. The substitution rule is commonly
employed in swing weighting (an example was shown in III.1.2.3).
Figure 49 is an example of a network diagram for a two-level hierarchy using the substitution
rule with web structures at every level.
Figure 49 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one group of four, using the substitution rule and webs at both levels of the hierarchy. The table on the right shows the comparisons that need to be performed by participants.
Figure 50 is an example of a network diagram for a two-level hierarchy using the substitution
rule with fan structures at every level. This type of network can be called a tree and is
typical of swing weighting (as in the example in III.1.2.3).
In a tree with a given number of criteria, the number of levels has no effect on the number
of comparisons (this is straightforward to prove by mathematical induction on the number
of criteria).
Figure 50 - Hierarchical elicitation network for 10 criteria arranged in two groups of three and one group of four, using the substitution rule and fans at both levels of the hierarchy – that is, a tree. The table on the right shows the comparisons that need to be performed by participants.
The first stage in the elicitation process is usually to decide which criteria are to be
promoted/substituted; before this is done the network must be drawn differently. For example,
prior to elicitation the tree in Figure 50 would typically be drawn as in Figure 51, known in the swing-weighting context as a tree diagram or value tree (see also the example in III.1.2.3).
Figure 51 - Hierarchical elicitation network in Figure 50, shown before identification of criteria for promotion, i.e. in value tree format.
While the agglomeration rule may be appropriate for some decision problems, its use in rating
outcomes for healthcare decisions poses difficulties due to a lack of clarity of scale, a pitfall
already encountered in the preceding section. If asked to value “all benefits” against “all risks”,
what exactly will a participant be expected to mentally compare? Is he/she expected to weigh the
outcomes according to the frequencies with which they are expected to occur? To answer in the
affirmative is arguably more the more natural interpretation for the participant, but the negative is
the only way to obtain preferences that can be combined with externally estimated frequencies. In
any case one cannot be sure of what basis the participant has used to mentally agglomerate the
outcomes, and so any data are as good as meaningless. One could in principle come up with an
explicit agglomeration formula, such as giving even weight to a fixed interval on each criterion, but
this would have to be made clear in the elicitation questions and would be rather cognitively taxing
on the participants.
The substitution rule is therefore the favoured approach here. A common convention is to follow a
bottom-up procedure whereby the lower levels are evaluated first, and the most important criterion
at each level is promoted to the next level up 29,50. The selection of the most important criterion for
promotion may reflect an intuition that importance should increase as one advances through the
levels. However, there is no obvious reason why this approach should be universally recommended;
where differences in weight between criteria may be significant, a better strategy would appear to
be to group criteria with others of similar importance, and promote the most important outcome
from groups of overall low importance, and the least important outcome from groups of overall high
importance (i.e. to aim to equalise the importance as one moves up the levels in order to minimise
the impact of equalisation bias).
The outcomes to be promoted can be selected in advance, using a priori (or pilot study) estimates of
the likely ratings to fix the network structure for all participants. Alternatively, some elicitation
methods may facilitate a dynamic process (such as the bottom-up procedure described above) that
allows the promoted outcomes to vary between participants.
Note that the lower down the hierarchy a given criterion is, the more comparisons are involved in
determining its weight – and therefore the more uncertainty there is on the weight, giving more
scope for random error. This uneven uncertainty structure has not been accounted for in some previous attempts to incorporate uncertainty into MCDA, which treat all weights symmetrically with regard to uncertainty 178. This may also be an argument against using too many hierarchical levels in
an elicitation network: too many levels may result in too much uncertainty on the weights.
Due to the use of an unclear agglomeration rule, I would be wary of using the traditional hierarchical
AHP method for benefit-risk preference elicitation, but single-level AHP matrices are not affected by
this and can be analysed with the models in this chapter.
III.1.3.3.2 Rating scales
It is acknowledged that different numerical scales can be used to discretise or verbalise the ratio
judgements, and it has been argued on psychological grounds that the choice of scale affects the results 149,179.
Such considerations are beyond the scope of this thesis and any numerical ratings will be analysed as
continuous variables on the scale on which they are originally expressed.
III.1.3.4 Summary data from published elicitation studies
Preference elicitation studies are now becoming more widespread in the medical literature, and for
any given disease there is a reasonable chance that researchers will be able to find several published
studies with elicited preferences for several outcomes pertinent to that disease and/or its
treatments. To the best of my knowledge, however, there have been no attempts to perform meta-
analysis upon the results of elicitation studies. This may simply be because no appropriate method
has yet been proposed. As with all meta-analyses, the benefits of such a method would be twofold:
Chapter III.1
167
the ability to aggregate the results of multiple studies and obtain an overall result; and the ability to
assess the level of preference heterogeneity between studies. I would argue that both of these are
urgent needs in the field of quantitative benefit-risk assessment for the following reasons:
• If the value of the preference-weighted approach to benefit-risk assessment at the
population level is to be established, then it must be shown that preferences are (at least
some of the time) reasonably homogeneous.
• If quantitative benefit-risk assessments are to be used to inform real-world regulatory
decisions, they should be able to incorporate all available relevant data on preferences.
It strikes me therefore that developing a meta-analysis model for preferences would be particularly
helpful to the field at the present time and this will be one objective of this chapter.
One possibility is to employ an approach analogous to network meta-analysis. A standard NMA
compares a set of treatments with respect to some outcome measure of interest, and each source
study provides information on that measure for a subset of the treatments. Here, by analogy, we
would like to compare a set of outcomes with respect to the strength of preference for those
outcomes – but apart from this change in context, the method requires little adjustment. Just as a
relative treatment effect measure comparing two treatments is assumed to be homogeneous
between studies in standard meta-analysis, the key assumption here is that the ratio of preference
strength between any two outcomes is homogeneous. This is consistent with an additive linear-in-
criteria utility function and the assumptions of the elicitation methods described in this chapter.
The method could be used to obtain preference estimates without carrying out a new elicitation
exercise. It could also prove useful if employed alongside new preference elicitation studies, both as
a check on the external validity of the results and also for planning purposes (eg obtaining prior
estimates of the results in order to establish sample size). Despite its potential usefulness, meta-
analysis of preferences does not feature in the MCDA literature.
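To make the proposal concrete, note that under the homogeneity assumption, pooling an elicited preference ratio for a single pair of outcomes across studies reduces, in the simplest fixed-effect case, to standard inverse-variance pooling of log-ratios; a full network version would generalise this across many outcome pairs. The sketch below is purely illustrative — the function and all numbers are hypothetical, not taken from any study.

```python
import math

def pool_log_ratios(log_ratios, variances):
    """Fixed-effect inverse-variance pooling of log preference ratios.

    Each study contributes an estimated log-ratio of preference strength
    between the same pair of outcomes, with a sampling variance.
    Returns the pooled log-ratio and its variance.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * lr for w, lr in zip(weights, log_ratios)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Hypothetical log preference ratios for one outcome pair from 3 studies
log_ratios = [0.8, 1.1, 0.9]
variances = [0.04, 0.09, 0.05]
pooled, var = pool_log_ratios(log_ratios, variances)
pooled_ratio = math.exp(pooled)  # back on the preference-ratio scale
```

The pooled variance is smaller than any single study's variance, which is the first of the two benefits of meta-analysis noted above; heterogeneity assessment would require extending this to a random-effects formulation.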
III.1.4 Allowing for uncertainty in preferences
As we have seen in III.1.2, the most popular ratings models for benefit-risk preference elicitation are
deterministic, providing no natural way to allow for uncertainty. Indeed, within healthcare fields,
allowance for uncertainty in preference modelling is more often than not missing, or carried out
using one-way sensitivity or scenario analyses 139. The latter approaches are by nature approximate
and cannot fully characterise the uncertainty in model outputs 17. Given the level of unfamiliarity
with preference modelling in the field and the associated potential for bias or imprecision 40,50, not to
mention heterogeneity within the population, these are discomfiting findings; a more sophisticated
approach to preference uncertainty is surely warranted in order to address any concerns that the
preference estimates are imprecise or that preferences themselves are highly variable.
In a few cases, more advanced probabilistic simulation methods have been used, allowing full simulation
of the uncertain model outputs139. There may still be room for improvement, however.
Probabilistic approaches to preference uncertainty in benefit-risk have tended to either assume no
knowledge of the distribution of preferences (using Stochastic Multicriteria Acceptability Analysis,
which simply explores all possible combinations of weights), or to estimate the distribution based on
external data (which adds another layer to the analysis and relies on an appropriate data source
being available). See I.1.6 for more discussion of these methods.
A further problem is presented by the diversity of elicitation methods and data types; this
has led to a segmentation of the field, with most ratings methods being tied to a specific data
structure - for example, AHP uses webs and swing weighting uses trees. This makes it hard to use all
of the available data or to compare the findings of heterogeneously designed studies.
A parametric generalised Bayesian approach that is designed to incorporate data from a range of
common preference elicitation methods would directly address these issues, allowing uncertainty to
be estimated directly (within the model) from an arbitrary dataset and propagated to the outputs.
III.1.5 Aim and objectives
In light of the above, the overall aim of this part of the project is to develop just such a Bayesian
framework for analysis and meta-analysis of preference data for benefit-risk assessment using
MCDA. Ideally the framework should:
• be able to accommodate original (raw) preference elicitation data in the form of either
choices or relative ratings of criteria, and originating from as many of the following methods
as possible: AHP, swing weighting, MACBETH, discrete choice experiments; since these are
the types of preference data most often used for quantitative benefit-risk assessment 139;
• be able to accommodate summary preference data from previous elicitation studies where
no raw data is available;
• be fully Bayesian, with new Bayesian models to be developed where no suitable models
exist for a given data type; and
• use a common preference parameterisation that allows inferences to be made on multiple
data types simultaneously.
This should improve the ability of analysts using MCDA for benefit-risk assessments to make the
most of all available data while accounting for uncertainty in a Bayesian manner. Ultimately this will
help to ensure that the perspectives of stakeholders – as revealed by their stated preferences - can
be fairly reflected in the decision making process.
It will be assumed that the overall utility function is an additive linear combination of the partial
values on all of the criteria, and that partial values for continuous criteria are themselves linear in
the underlying outcome measures. Categorical criteria will also be accommodated.
As the focus here is on analysing and interpreting preference elicitation studies, I will not delve too
deeply into issues relating to the design and execution of such studies, except insofar as these
impact directly on the methods to be used for analysis. The RRMS case study will be used as a
motivating example throughout the chapter, with a view to estimating preference weights for the
outcomes synthesised in Chapter II.
Chapter III.2
170
III.2 High level model structure
The evidence synthesis strategy will use the overall model structure depicted within the blue area of
Figure 52, which also shows how this fits together with the other modelling components already
described in Chapter II (shown in faded tones).
Figure 52 - High-level model structure, focusing on preference modelling.
III.2.1 Notes on preference parameters
The details of the various models will follow in their respective sections but there are some general
points worth making at this stage.
III.2.1.1 Assumed form of the utility function
Throughout this chapter, it will be assumed that utility is linear in criteria (see III.1.1.2). This should
not be unduly restrictive, as the underlying criteria measures can be transformed to another scale if
it appears this will improve linearity, provided an appropriate monotonic transformation exists. This
means that the assumption of linearity in criteria is not in practice much stronger than that of
linearity in partial values. It is however stronger than merely assuming that preferences are
compensatory. It has been argued that linearity in criteria is usually a reasonable assumption
provided that every criterion always has a monotonically increasing or decreasing partial value
function irrespective of the value of other criteria 180, and it is hard to imagine any clinical benefits or
risks that would not fulfil this condition.
A further simplification will be made to the utility function for the purposes of this chapter. The
focus will be on comparing treatments in terms of the (additive) utility difference between them,
and therefore the intercept term will be omitted, with the caveat that utility will no longer always lie
within the interval [0,1].
This reduces the problem of identifying the utility function to estimating the utility coefficients UC_ω,
where

U = Σ_ω UC_ω x_ω

Again, the utility coefficients are in general identifiable and interpretable only up to an overall
scaling constant; to standardise the scale, normalised “preference weights” w_ω will be defined for
each criterion ω as follows:

w_ω = UC_ω / Σ_ω′ UC_ω′
It is important to note however that these are not the same as the traditional MCDA weights
(𝑤𝑒𝑖𝑔ℎ𝑡𝜔). While 𝑤𝑒𝑖𝑔ℎ𝑡𝜔 corresponds to the entire domain of 𝑃𝑉𝐹𝜔, 𝑤𝜔 corresponds to a unit
change in 𝑥𝜔.
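In code, the reduced utility function and the normalised preference weights are straightforward; the sketch below (Python, with arbitrary illustrative coefficients, not case-study values) mirrors the two formulas above.

```python
def utility(uc, x):
    """Additive linear utility: U = sum over criteria of UC_w * x_w."""
    return sum(c * v for c, v in zip(uc, x))

def preference_weights(uc):
    """Normalised preference weights w_w = UC_w / sum(UC)."""
    total = sum(uc)
    return [c / total for c in uc]

# Illustrative utility coefficients for three criteria
uc = [2.0, 1.0, 1.0]
w = preference_weights(uc)           # [0.5, 0.25, 0.25]
# Utility difference implied by a change in the criteria measures x
du = utility(uc, [0.1, -0.2, 0.0])
```

Because the intercept has been omitted, `du` is interpreted as a utility difference between treatments rather than an absolute utility on [0,1], exactly as described above.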
III.2.1.2 Shared preference parameters
As shown in Figure 52 and discussed in III.1.3, the preference module will be used to analyse various
types of elicited and published preference data. Each of these data types will have different
statistical properties and will therefore require different statistical models. However, inferences will
be made on the same set of preference strength parameters which will be common to all models.
By running the models simultaneously, combined inferences based on multiple data sources can be
obtained.
Specifically, any model featuring criterion ω will use the same preference strength parameter g_ω
and utility coefficient UC_ω for that criterion. In terms of magnitude, the utility coefficients UC_ω are
equal to the exponentiated preference strengths e^(g_ω).
III.2.1.3 Known signs
It will be assumed that the sign of each utility coefficient is known a priori; these signs will be passed
to the model as data. A similar assumption was made regarding the sign of the treatment effects in
Chapter II.
This assumption should usually be easily satisfied, at least for continuous criteria, since the sign of
the utility coefficient should be straightforward to deduce from the definition of each criterion’s outcome
measure(s). For categorical criteria, it may sometimes be necessary to carry out a preliminary
analysis (eg using standard deterministic elicitation methods, or simply by inspecting the data) to
determine the signs.
III.2.1.4 Parameter scales
As has been noted (see III.1.1.2), in the general MAUT framework utility is expressed on an arbitrary
scale: in other words, the utility coefficients are only unique up to an overall scaling constant. This is
not always true of the coefficients obtained by different elicitation methods, however.
In the case of preferences obtained from choice models, the utility scale is fixed, since the
coefficients are related to the participants’ observed choice behaviour. For example, in the binomial
logit choice model, a coefficient of 1 represents an increase of 1 in the log odds of choosing a
particular alternative.
Preferences arising from relative ratings, however, are not calibrated to any absolute scale and so
there are infinitely many sets of coefficients that will fit the data. When relative ratings are analysed
alongside choice data with shared preference parameters, the choice analysis will fix the scale and
ensure a unique solution is found. Where only relative ratings are analysed, however, the analysis
will have an excessive degree of freedom which will be reflected as additional uncertainty in the
posterior distributions of the preference strengths or utility coefficients. There are two possible
strategies for dealing with this:
• Eliminate the extra degree of freedom from the analysis altogether by arbitrarily fixing the
value of 𝑔𝜔0 (typically at zero) for a particular outcome 𝜔0. This has the advantage of
ensuring a unique solution at the parameter inference level (which may help with model
convergence in an MCMC context); the disadvantage however is that the symmetry of the
model is compromised (in the Bayesian context this situation corresponds to a highly
lopsided prior, where the preference strength on one outcome is known with certainty and
the others are random).
• Allow the extra degree of freedom when making inferences on the preference strengths 𝑔𝜔,
but fix the scale when carrying the preferences forward for further analysis (i.e. in the actual
benefit-risk assessment). For example, defining
weight_ω = e^(g_ω) / Σ_i e^(g_i)
will give a set of normalised utility coefficients for use in benefit-risk assessment or any
other post-hoc analysis, without the additional posterior uncertainty associated with an
arbitrary absolute scale. This approach preserves the model’s symmetry but could
potentially give rise to MCMC convergence issues.
Here the latter approach will be adopted on symmetry grounds; convergence of the preference
parameters will be monitored to ensure this approach remains appropriate.
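The two identification strategies can be illustrated numerically. In the sketch below (Python; the preference strengths are arbitrary illustrative values), the normalisation in the second function corresponds to the weight_ω definition above, and the final two lines confirm that adding a constant to all preference strengths — the free degree of freedom — leaves the normalised weights unchanged.

```python
import math

def fix_reference(g, ref=0):
    """Strategy 1: eliminate the free scale by anchoring one preference
    strength (here g[ref]) at zero."""
    return [gi - g[ref] for gi in g]

def normalised_weights(g):
    """Strategy 2: keep the free scale during inference, but normalise
    the exponentiated strengths when carrying preferences forward."""
    expg = [math.exp(gi) for gi in g]
    total = sum(expg)
    return [e / total for e in expg]

g = [0.0, 1.0, 2.0]
# Adding any constant to all strengths leaves the weights unchanged:
w1 = normalised_weights(g)
w2 = normalised_weights([gi + 5.0 for gi in g])
```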
Preference ratios, like treatment contrasts and mapping ratios, are transitive (see II.1.1 and II.4.6)
and whichever parameterisation is used should ensure that consistency is maintained. Both of the
approaches above can be seen to respect consistency since it is the preference strength (or its
exponentiated equivalent, the utility coefficient) for each criterion that is assigned a parameter
value; any preference ratios are calculated from these parameters and hence are guaranteed to be
consistent. This is not to say, however, that the observed preference ratios in the data are
guaranteed to be consistent when they are drawn from multiple source studies. Consistency in the
data is an assumption that the model relies on when combining disparate sources of evidence, and
one that should be verified in any applications. One way to check this assumption is simply to
inspect the observed preference ratios; developing more formal methods for evaluating preference
inconsistency is a priority for future work in this area.
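As a minimal illustration of such an inspection, observed ratios can be checked for "closure" around each triangle of contrasts on the logarithmic scale (log r_AB + log r_BC should approximately equal log r_AC). The ratios below are invented for the example; a zero residual indicates a consistent triad.

```python
import math

def triangle_residual(r_ab, r_bc, r_ac):
    """Log-scale closure residual for a triad of observed preference
    ratios; zero indicates perfect consistency."""
    return math.log(r_ab) + math.log(r_bc) - math.log(r_ac)

# A perfectly consistent triad of ratios ...
res_ok = triangle_residual(2.0, 3.0, 6.0)
# ... and an inconsistent one (the direct A-C ratio is too small)
res_bad = triangle_residual(2.0, 3.0, 4.0)
```

Formalising what counts as a "large" residual (relative to elicitation noise) is exactly the open question flagged above.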
III.2.1.5 Categorical criteria
The conventional way to describe preferences for a categorical criterion in MCDA is (i) to assign an
overall weight that represents the impact on utility of moving from the least favoured to the most
favoured level, and (ii) to express the relative utility of any intervening levels on a scale from 0 to 1
(i.e. as a partial value function).
The 0-1 scale will generally not be used here, as it is more straightforward to report all of the utility
coefficients for intermediate levels on the same scale as the overall weight. Note however that only
the overall weight (i.e. the difference between the most and least favoured levels) is included in the
100% total; to also add the weights for the other levels would exaggerate the criterion’s importance (as
they simply represent smaller portions of the criterion’s overall weight). This is essentially a matter of
reporting and does not affect the underlying preference parameters. An illustration is provided in
Figure 53 using criteria that will be explored later in the chapter as part of the RRMS case study.
Figure 53 – Two ways to display preferences for categorical variables – an example using criteria from the RRMS case study (but fictional data). Mode of administration is a 4-category variable, coded using 3 indicator variables with “daily subcutaneous” as the reference category. LEFT: The convention adopted in this thesis. The height of the bars shows the magnitude of the utility coefficient for each of the 3 admin indicator variables alongside the other RRMS criteria. These weights have been normalised so that the dark blue bars (i.e. excluding all admin contrasts but the weightiest) sum to 100%. RIGHT: The same information presented in more conventional MCDA fashion. Only the weightiest admin contrast is shown on the overall weight scale, and all weights are normalised to sum to 100%; the relative utility of the administration levels is shown separately as a partial value function.
Dummy coding will be used for categorical variables (see III.5.3.2) so that the utility coefficient for
each categorical level represents the difference in utility between that level and the reference level.
It is convenient to choose either the least or most favoured category as the reference so that one of
the coefficients represents the overall criterion weight. It turns out that using the least favoured
category as the reference makes for the most convenient model specification, as will be explained in
III.3.3.2.
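The dummy coding described here can be sketched as follows (Python; the observed values are invented, but the administration levels and the "daily subcutaneous" reference category match the mode-of-administration example in Figure 53):

```python
def dummy_code(levels, values, reference):
    """Code a categorical criterion as indicator variables against a
    reference level; returns one 0/1 column per non-reference level."""
    others = [lev for lev in levels if lev != reference]
    return {lev: [1 if v == lev else 0 for v in values] for lev in others}

# Mode-of-administration levels, with the reference category first
levels = ["daily subcutaneous", "daily oral", "weekly intramuscular",
          "monthly infusion"]
# Invented observations for three alternatives
obs = ["daily oral", "daily subcutaneous", "monthly infusion"]
codes = dummy_code(levels, obs, reference="daily subcutaneous")
# The reference level is coded as all zeros across the indicator columns
```

Each utility coefficient then multiplies one indicator column, so it directly measures the utility difference between that level and the reference level, as stated above.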
III.2.1.6 Random preferences
Some models will use “random preference” formulations to allow for variation in preferences
(between studies and/or individuals), analogous to the random effects models widely used in
statistics. Typically random effects statistical models use a Normal distribution centred on the
population-average parameter to characterise the distribution of the random effects and I do not
propose to break with this convention. However, there is more than one scale such a distribution
could be fitted on as the parameters can be expressed either as preference strengths 𝑔𝜔 or utility
coefficients 𝑒𝑔𝜔 .
There are good arguments for fitting the random preferences distribution on the preference
strength scale, i.e. with mean 𝑔𝜔, and a fixed standard deviation. Specifically, preference strengths
can take any value on the real line, so under this method there is no need to censor or truncate the
distribution. Furthermore, a fixed standard deviation on the (logarithmic) preference strength scale
corresponds to a standard deviation which is a fixed proportion of the mean on the (exponentiated)
utility coefficient scale, and this seems intuitively appropriate: to use a monetary analogy, one would
expect to see greater absolute variation in the value individuals assign to a £500,000 house
compared to a 50p apple.
Based on preliminary model runs, however, it appears that models using random preference
strengths formulated on the logarithmic preference strength scale tend to exhibit somewhat poor
convergence. Better convergence is observed when the random effects are assigned a Normal
distribution on the utility coefficient scale. On this scale it is necessary to additionally ensure that
the random coefficients are strictly positive; this is easily achieved in BUGS by specifying a Normal
distribution left-censored at zero. It is also straightforward in BUGS to specify the random effects
standard deviation as a fixed proportion of the mean, thus approximating the desirable property of a
random effects distribution referred to in the previous paragraph.
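A simulation sketch of this random-preference formulation is given below (Python; the mean and coefficient of variation are arbitrary illustrations, and simple rejection sampling stands in for BUGS's left-censored Normal — strictly speaking this yields a truncated rather than censored distribution, but it serves to show the strictly positive draws with standard deviation proportional to the mean):

```python
import random

def random_utility_coefficient(mean_uc, cv=0.3, rng=random):
    """Draw a strictly positive random utility coefficient from a Normal
    with sd a fixed proportion (cv) of the mean, rejecting non-positive
    draws to keep the coefficient above zero."""
    while True:
        draw = rng.gauss(mean_uc, cv * mean_uc)
        if draw > 0:
            return draw

random.seed(42)
# Per-individual (or per-study) coefficients around a population mean of 10
draws = [random_utility_coefficient(10.0) for _ in range(5000)]
```

With the truncation point several standard deviations below the mean, the rejection step fires rarely and the sample mean stays close to the population-average coefficient.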
When using random preference distributions, the implicit assumption is that it is valid to derive
population-level utility coefficients as a (weighted) average of the utility coefficients for each study
or individual. The question of how to aggregate preferences at the group level has long been
recognised as a key issue in utility-based economics: see for example 181-184. Support for the
approach used here is provided by a theorem by Keeney 185 (building on Harsanyi’s utilitarianian
theorem 186,187), which states that the only population utility function satisfying certain basic axioms
is a weighted average of the individual utility functions. Determining the weightings for the average
requires consideration of how important one individual’s utility is compared to another. Under the
general definition of utility, doing this in a fair and equal manner is not straightforward as a person’s
utility may also reflect the wellbeing of those around them (i.e. altruism) 186. In the benefit-risk
context we are restricting the utility function to measures of one’s personal health, and assuming a
regulatory perspective that assigns equal importance to all individuals, so a straightforward average
seems appropriate.
III.2.1.7 Priors
The preference priors can also (in principle) be expressed on either the preference strength or utility
coefficient scale, and there may be arguments for various distributions on one scale or the other.
Here identical Gamma distributions with shape parameter 1 and rate parameter 0.01 will be
assigned to the utility coefficients – i.e. e^(g_ω) ~ Gamma(1, 0.01). There are theoretical justifications
for the Gamma prior. Firstly, the distribution has a floor at zero but no upper bound, as required for
the utility coefficients. Furthermore, a Gamma prior with shape parameter 1 on the utility
coefficients is equivalent to a Dirichlet(𝟏) distribution on the normalised weights (where 𝟏 is a
vector with one component per criterion, each equal to 1), also known as a flat Dirichlet distribution,
which is uniform over the weight space and therefore a natural choice for an uninformative prior in
this context. This distribution’s suitability for modelling weights in benefit-risk assessment has been
noted elsewhere 178.
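This Gamma–Dirichlet equivalence is a standard distributional fact, and it is easy to check by simulation; the sketch below (Python standard library only; the number of criteria is an arbitrary illustration) normalises independent Gamma(1, 0.01) draws — shape 1 makes these Exponential(0.01) draws — and recovers the flat-Dirichlet prior mean of 1/(number of criteria) for each weight.

```python
import random

random.seed(1)
n_criteria, n_draws = 4, 20000
mean_weights = [0.0] * n_criteria

for _ in range(n_draws):
    # Gamma(shape=1, rate) is Exponential(rate); the rate cancels on
    # normalisation, so it only sets the absolute coefficient scale
    draws = [random.expovariate(0.01) for _ in range(n_criteria)]
    total = sum(draws)
    for j in range(n_criteria):
        mean_weights[j] += draws[j] / total / n_draws

# Under a flat Dirichlet, each normalised weight has prior mean 1/n_criteria
```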
On more practical grounds, preliminary model runs indicate that this choice of prior gives good
convergence compared to other possible priors that were investigated.
The rate parameter of the Gamma distribution has no impact on the distribution of the normalised
(relative) weights, but determines the absolute scale of the utility coefficients. This parameter
should therefore be set so that the distribution covers an a priori feasible range for the utility
coefficients.
In principle the priors in a Bayesian model should be set by referring to external evidence or expert
intuition rather than using the main model data. In the absence of any external or expert reference
points for the feasible range of utility coefficients, however, preliminary (non-Bayesian) analyses of
the datasets used in this chapter were carried out and the range of utility coefficients examined to
inform the prior (essentially an example of the “empirical Bayes” approach 188). When the rate
parameter of the Gamma distribution is set to 0.01, the 2.5% to 97.5% interval of the prior
corresponds well to the observed range of utility coefficients, with the maximum observed utility
coefficient of 388 (from a deterministic analysis of a discrete choice dataset; see III.4.4) lying just
above the 97.5% point. The Gamma(1, 0.01) distribution was therefore adopted as the prior.
Other reasonable noninformative priors for the utility coefficients are possible. Normal priors with
wide variance have been used elsewhere 189, and a sensitivity analysis using such a Normal prior is
included (although due to the fixing of signs in this model, only the upper half of a Normal distribution
centred on zero will be used for this purpose).
Priors can also be defined on the utility ratio scale relative to a selected reference criterion, which
may seem a natural approach when cost is included as the reference 190 but in the benefit-risk
assessment context the inherent asymmetry in treating one criterion differently from the others is
less appealing.
The remaining sections in this chapter set out the particular preference models, datasets and results
that will be used to make inferences on these parameters.
Chapter III.3
178
III.3 Bayesian analysis of elicited ratings
There is little literature on Bayesian relative ratings models. Searching the literature reveals no
Bayesian implementations of swing weighting. A Bayesian implementation of AHP has been shown
to be possible and to have advantages over traditional methods 157; but so far it has been limited to
the AHP format only; a model that generalises to other ratings data (and/or forms part of a larger
Bayesian benefit-risk assessment analysis) has not been demonstrated.
The underlying data structure is a network of criteria of the general form described in III.3.1.3; the
data themselves are the relative ratings elicited for the pairs of criteria linked in the network.
III.3.1.1 Network-level constants
Let there be a set of criteria (indexed below by ω), and a set of participants 1, …, n (typically it will
be assumed that these are individuals, but a single “participant” can also be a group of people who
give ratings on a collective consensus basis). Let 1, …, K be the set of contrasts, i.e. the pairs of
criteria participants are asked to compare, corresponding to the links on the network diagram.
III.3.1.2 Contrast-level ratings data
The ratings data (if complete) consists of a set of K relative ratings 𝑧𝑖𝑘 (k ∈ 1,… , 𝐾) for each
participant i ∈ 1, … , 𝑛, where each k represents a specific contrast between a pair of outcomes. To
show which outcomes are involved in each contrast, an indicator variable 𝜒𝜔𝑘 is created for each
criterion ω and contrast k, taking the value 1 when ω is the “headline” element of contrast k, -1
when ω is the “baseline” element of contrast k, and 0 otherwise. In a slight abuse of notation, ω1 −
ω2 will sometimes be used to refer to the contrast between “headline” criterion ω1 and “baseline”
criterion ω2, although ratings are not straightforward subtractions but rather ratios of utility
coefficients (much of the modelling will however be carried out on the logarithmic “preference
strength” scale, where ratings do indeed correspond to subtractive differences).
III.3.1.3 Examples of contrast structures
The set of K criteria contrasts for which ratings are available is determined by the design of the
elicitation tasks. This section shows how the contrasts are coded for some example network
structures.
III.3.1.3.1 Fans
A particularly simple network structure for elicitation is the fan, as illustrated in Figure 47. In this
case a suitable set of contrasts is 𝐵 − 𝐴, 𝐶 − 𝐴 , 𝐷 − 𝐴, 𝐸 − 𝐴, 𝐹 − 𝐴, with 5 elements.
(Inverting any of these contrasts would also result in a suitable set.) The data would look like Table
14 (although the rows of data may be ordered differently).
Table 14 – Data structure for a “fan” network with six outcomes A, B, C, D, E and F. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0
1             | 3          | ·           | -1   | 0    | 0    | 1    | 0    | 0
1             | 4          | ·           | -1   | 0    | 0    | 0    | 1    | 0
1             | 5          | ·           | -1   | 0    | 0    | 0    | 0    | 1
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
2             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0
2             | 3          | ·           | -1   | 0    | 0    | 1    | 0    | 0
2             | 4          | ·           | -1   | 0    | 0    | 0    | 1    | 0
2             | 5          | ·           | -1   | 0    | 0    | 0    | 0    | 1
etc...
III.3.1.3.2 Webs
The web structure (as in Figure 46) contains a contrast between all possible distinct pairs of
outcomes, and hence for m outcomes results in m(m−1)/2 contrasts, the maximum possible. In the case
of Figure 46 this gives 15 contrasts, namely 𝐵 − 𝐴, 𝐶 − 𝐴, 𝐷 − 𝐴, 𝐸 − 𝐴, 𝐹 − 𝐴, 𝐶 − 𝐵, 𝐷 − 𝐵, 𝐸 −
𝐵, 𝐹 − 𝐵, 𝐷 − 𝐶, 𝐸 − 𝐶, 𝐹 − 𝐶, 𝐸 − 𝐷, 𝐹 − 𝐷, 𝐹 − 𝐸 and data resembling Table 15 (again, the rows
may be permuted).
Table 15 – Data structure for a “web” network with six criteria A, B, C, D, E and F. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0
1             | 3          | ·           | -1   | 0    | 0    | 1    | 0    | 0
1             | 4          | ·           | -1   | 0    | 0    | 0    | 1    | 0
1             | 5          | ·           | -1   | 0    | 0    | 0    | 0    | 1
1             | 6          | ·           | 0    | -1   | 1    | 0    | 0    | 0
1             | 7          | ·           | 0    | -1   | 0    | 1    | 0    | 0
1             | 8          | ·           | 0    | -1   | 0    | 0    | 1    | 0
1             | 9          | ·           | 0    | -1   | 0    | 0    | 0    | 1
1             | 10         | ·           | 0    | 0    | -1   | 1    | 0    | 0
1             | 11         | ·           | 0    | 0    | -1   | 0    | 1    | 0
1             | 12         | ·           | 0    | 0    | -1   | 0    | 0    | 1
1             | 13         | ·           | 0    | 0    | 0    | -1   | 1    | 0
1             | 14         | ·           | 0    | 0    | 0    | -1   | 0    | 1
1             | 15         | ·           | 0    | 0    | 0    | 0    | -1   | 1
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0
etc...
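Contrast sets and indicator rows for these designs can be generated programmatically; the sketch below (Python; function names are my own, not from any package) reproduces the 5 fan contrasts and 15 web contrasts for outcomes A–F, together with the ±1/0 indicator coding used in the tables.

```python
from itertools import combinations

def fan_contrasts(outcomes):
    """Fan: every other outcome contrasted against the first (baseline)."""
    base = outcomes[0]
    return [(head, base) for head in outcomes[1:]]

def web_contrasts(outcomes):
    """Web: all distinct pairs of outcomes, as (headline, baseline)."""
    return [(head, base) for base, head in combinations(outcomes, 2)]

def indicator_row(contrast, outcomes):
    """Indicator coding: +1 for the headline, -1 for the baseline."""
    head, base = contrast
    return [1 if o == head else -1 if o == base else 0 for o in outcomes]

outcomes = list("ABCDEF")
fan = fan_contrasts(outcomes)              # 5 contrasts: B-A, ..., F-A
web = web_contrasts(outcomes)              # 15 contrasts, as in Table 15
row = indicator_row(("B", "A"), outcomes)  # [-1, 1, 0, 0, 0, 0]
```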
III.3.1.3.3 Trees and other hierarchical network structures
The same principles can be used to prepare elicitation data for trees and other hierarchical network
structures.
The network in Figure 49 includes 15 contrasts, namely 𝐵 − 𝐴, 𝐶 − 𝐴 , 𝐶 − 𝐵, 𝐸 − 𝐷, 𝐹 − 𝐷, 𝐹 −
𝐸, 𝐻 − 𝐺, 𝐼 − 𝐺, 𝐽 − 𝐺, 𝐼 − 𝐻, 𝐽 − 𝐻, 𝐽 − 𝐼, 𝐷 − 𝐴, 𝐺 − 𝐴, 𝐺 − 𝐷 and data resembling Table
16 (again, the rows may be permuted).
Table 16 – Data structure for the network in Figure 49 with ten outcomes A, B, C, D, E, F, G, H, I and J. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk | χ_Gk | χ_Hk | χ_Ik | χ_Jk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 3          | ·           | 0    | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 4          | ·           | 0    | 0    | 0    | -1   | 1    | 0    | 0    | 0    | 0    | 0
1             | 5          | ·           | 0    | 0    | 0    | -1   | 0    | 1    | 0    | 0    | 0    | 0
1             | 6          | ·           | 0    | 0    | 0    | 0    | -1   | 1    | 0    | 0    | 0    | 0
1             | 7          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1    | 0    | 0
1             | 8          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 1    | 0
1             | 9          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 0    | 1
1             | 10         | ·           | 0    | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1    | 0
1             | 11         | ·           | 0    | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 1
1             | 12         | ·           | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1
1             | 13         | ·           | -1   | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0
1             | 14         | ·           | -1   | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0    | 0
1             | 15         | ·           | 0    | 0    | 0    | -1   | 0    | 0    | 1    | 0    | 0    | 0
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
etc...
The tree in Figure 50 includes 9 contrasts, namely 𝐵 − 𝐴, 𝐶 − 𝐴 , 𝐸 − 𝐷, 𝐹 − 𝐷,𝐻 − 𝐺, 𝐼 − 𝐺, 𝐽 −
𝐺, 𝐷 − 𝐴, 𝐺 − 𝐴 and data resembling Table 17 (again, the rows may be permuted).
Table 17 – Data structure for the tree in Figure 50 with ten outcomes A, B, C, D, E, F, G, H, I and J. The ratings data are here represented by the placeholder symbol ·

Participant i | Contrast k | Rating z_ik | χ_Ak | χ_Bk | χ_Ck | χ_Dk | χ_Ek | χ_Fk | χ_Gk | χ_Hk | χ_Ik | χ_Jk
1             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 2          | ·           | -1   | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0
1             | 3          | ·           | 0    | 0    | 0    | -1   | 1    | 0    | 0    | 0    | 0    | 0
1             | 4          | ·           | 0    | 0    | 0    | -1   | 0    | 1    | 0    | 0    | 0    | 0
1             | 5          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 1    | 0    | 0
1             | 6          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 1    | 0
1             | 7          | ·           | 0    | 0    | 0    | 0    | 0    | 0    | -1   | 0    | 0    | 1
1             | 8          | ·           | -1   | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0
1             | 9          | ·           | -1   | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0    | 0
2             | 1          | ·           | -1   | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
etc...
These examples used the substitution rule to structure the hierarchy. An agglomeration rule, if well-
defined a priori (and not dependent upon any vague/uncertain frequencies), could also be
incorporated by including more indicator variables (with appropriate coefficients) in the relevant
contrasts.
III.3.2 Datasets
To illustrate the application of the ratings-based elicitation model, two datasets from a recent
benefit-risk methodology project will be used, as detailed in the sections below.
III.3.2.1 PROTECT investigator ratings for RRMS treatment outcomes
This dataset consists of relative ratings for treatment outcomes, derived using swing weighting and a
value tree structure, and used in an early example of benefit-risk assessment using MCDA 29,53.
Ratings were determined by consensus within the case study team; prior to this two individuals had
given their ratings. Here these will be regarded as 3 independent participants (although it is recognised
that in reality this is probably not the case, as the two individuals were part of the team).
This elicitation exercise was part of a proof-of-concept MCDA-based benefit-risk assessment into the
RRMS drugs glatiramer acetate, intramuscular interferon beta-1a, and natalizumab. The first two
drugs have already been encountered in the comparison of first-line RRMS therapies in Chapter II,
whereas natalizumab was excluded as it is usually reserved as second-line treatment for more
aggressive cases.
A full listing of this dataset is shown in Appendix A.2.
The “swing weighting” methodology was used to elicit ratings. This uses a tree structure sometimes
known as a value tree (Figure 54). The benefits and one of the risks (liver enzyme elevation) are
essentially the same as in Chapter II but a number of other risks are also included:
• Herpes reactivation – the reactivation of dormant herpes infections, a risk of many RRMS
treatments due to their immunosuppressive nature.
• Progressive Multifocal Leukoencephalopathy (PML) - a brain infection by the John
Cunningham Virus, causing severe disability and death if untreated, this is a rare but very
serious risk associated with natalizumab.
• Congenital abnormalities – the risk of teratogenic disorders due to treatment.
• Seizures – a known risk of interferon beta-1a.
• Infusion/injection reactions – localised reactions to administration by injection or infusion
• Allergic/hypersensitivity reactions – systemic allergic reactions to treatment
• Flu-like reactions – transient systemic flu-like malaise as a result of treatment, usually
resolving in a few days
The route of administration for a treatment can have a significant impact on patient preferences 191,192, and the following routes of administration were included in the ratings exercise:
• Daily oral (self-administered)
• Daily subcutaneous injection (self-administered)
• Weekly intramuscular injection (self-administered)
• Monthly intravenous infusion (in clinic)
Figure 54 – Value tree for the RRMS investigator ratings dataset before elicitation. Administration modes (blue), clinical benefits (green) and risks (red) of treatment for the PROTECT RRMS investigator ratings data. The yellow cells are the comparison points at which the swing weighting methodology is to be applied.
Swing weighting takes place at each yellow cell, resulting in relative ratings for its “children”. Here
the weighting was carried out “bottom-up” (i.e. right to left in the orientation shown) with the
substitution rule (as per the example in III.1.2.3). Promoting the criteria changes the shape of the
network; after this process the tree appears rather different (Figure 55).
[Figure 54 depicts the following hierarchy:
Treatment effects
  Benefits: Reduction in relapses; Slowdown in disability progression
  Risks: Infection (Herpes reactivation; PML); Congenital abnormalities; Liver enzyme elevation; Seizures; Other (Infusion/injection reactions; Allergic/hypersensitivity reactions; Flu-like reactions)
  Administration: daily oral vs daily subcutaneous; monthly infusion vs daily subcutaneous; weekly intramuscular vs daily subcutaneous]
Figure 55 – Value tree for the RRMS investigator ratings dataset after the elicitation process is complete. Administration modes (blue), clinical benefits (green) and risks (red) of treatment for the PROTECT RRMS investigator ratings data.
III.3.2.2 PROTECT patient ratings for RRMS patient outcomes
This dataset consists of criteria ratings originally elicited within PROTECT’s workstream on patient
and public involvement 193. As with the investigator ratings, the project concerned preferences for
outcomes of treatment with the RRMS drugs glatiramer acetate, intramuscular interferon beta-1a,
and natalizumab. The data were elicited using the (classical) AHP method 146, in a paper-based
survey issued to RRMS patients at a London clinic. The study design, consent processes and ethical
approval have been described elsewhere 193. The initial analysis of these data revealed a problem
with the way the survey questions had been worded, however, and as a result there is substantial
doubt over the validity of the elicited preferences. The problems were twofold and both were
[Figure 55 shows the flattened value tree, with all leaf criteria — Relapse, Disability progression, PML, Herpes reactivation, Congenital abnormalities, Liver enzyme elevation, Seizures, Infusion/injection reactions, Allergic/hypersensitivity reactions, Flu-like reactions, and the three administration contrasts (daily oral, monthly infusion and weekly intramuscular, each vs daily subcutaneous) — promoted to a single level.]
common pitfalls that have already been discussed: firstly, the “amount” of each outcome used for
the comparisons was not specified, meaning that the interpretation of the ratings is not clear, an
example of range insensitivity bias (see III.1.3.3). Secondly, no rule was specified for moving
between the hierarchies, causing further problems (see III.3.1.3.3). Not all of the ratings in the
dataset are affected by these issues, however. Among the criteria being compared were the route
and frequency of administration for each drug, which should give valid ratings as they are
unambiguously defined and located at the same level of the elicitation hierarchy.
Using just the administration categories within the AHP framework results in the elicitation network
– a web – shown in Figure 56. Note that the administration categories feature alone rather than as
pairwise contrasts; this is a feature of the AHP methodology that will be discussed further in III.3.3.2.
Figure 56 – Elicitation network diagram for administration modes in the PROTECT RRMS patient ratings data.
A full listing of this dataset is provided in Appendix A.2.
III.3.3 Statistical model
III.3.3.1 Key model features and parameters
The model has its roots in the regression-based AHP analysis that has been proposed before on
various occasions 149,151,154-156, including in a Bayesian context 157. Here, however, the model will be generalised beyond AHP to apply more widely to any cardinal relative ratings data, including data obtained from other elicitation methods such as swing weighting.
Recall from III.2.1 that the population-average utility function is assumed to be linear and additive in all criteria, taking the form

$$U = \sum_{\omega=1}^{\Omega} e^{g_\omega} x_\omega$$

where $x_\omega$ is the independent variable representing criterion $\omega$ and $g_\omega$ is its associated preference strength.
If preferences are assumed to vary between units (such as studies or individuals), a random preference model can be used, so that the utility function for unit $i$ is

$$U_i = \sum_{\omega=1}^{\Omega} e^{\gamma_{i\omega}} x_\omega$$

where $\gamma_{i\omega}$ is unit $i$'s preference strength for criterion $\omega$. (The distribution of $\gamma_{i\omega}$, or rather $e^{\gamma_{i\omega}}$, is set out below.)
As previously discussed, the absolute scale of the utility function is arbitrary and irrelevant to multicriteria decision making. Decision options are always evaluated relative to one another rather than by reference to some external benchmark, so it is the relative magnitudes of the utility coefficients $e^{\gamma_{i\omega}}$ for the set of criteria $\omega \in 1,\dots,\Omega$ that influence decision-making behaviour. In other words, if one assumes homogeneous preferences among a population, then for any pair of criteria $\omega_1, \omega_2 \in 1,\dots,\Omega$ it is the preference ratios $e^{\gamma_{i\omega_2}}/e^{\gamma_{i\omega_1}}$ that must be homogeneous, rather than the absolute values $e^{\gamma_{i\omega_1}}$ and $e^{\gamma_{i\omega_2}}$.
It is axiomatic that preferences should be transitive: if one perceives A as twice as attractive as B, and B as twice as attractive as C, then A should appear four times as attractive as C. This is guaranteed by the model, since the preference ratios are naturally transitive under multiplication, i.e.

$$\frac{e^{\gamma_{i\omega_3}}}{e^{\gamma_{i\omega_1}}} = \frac{e^{\gamma_{i\omega_3}}}{e^{\gamma_{i\omega_2}}} \times \frac{e^{\gamma_{i\omega_2}}}{e^{\gamma_{i\omega_1}}}.$$
A rating by individual $i$ comparing outcome $\omega_2$ to $\omega_1$ is an estimate or "measurement" (subject to measurement error) of the ratio of utility coefficients $e^{\gamma_{i\omega_2}}/e^{\gamma_{i\omega_1}}$.
To construct the likelihood for the ratings model, these ratio estimates are transformed to the logarithmic scale and assumed to consist of (i) a deterministic component $\gamma_{i\omega_2} - \gamma_{i\omega_1}$ and (ii) a Normal error term $\varepsilon_{ik}$ with variance $\sigma_{rat}^2$. The error terms are assumed to be independent, meaning that any "mistake" made by a participant in a rating task (i.e. a deviation from their true underlying preferences) is independent of any "mistakes" they make on other ratings.

In other words, if $z_{i\omega_1\omega_2}$ is individual $i$'s rating of $\omega_2$ compared to $\omega_1$, then $\log(z_{i\omega_1\omega_2})$ follows a Normal distribution with mean $\gamma_{i\omega_2} - \gamma_{i\omega_1}$ and variance $\sigma_{rat}^2$:

$$\log(z_{i\omega_1\omega_2}) \sim N(\gamma_{i\omega_2} - \gamma_{i\omega_1},\ \sigma_{rat}^2)$$

Equivalently, using the notation of III.3.1.2,

$$\log(z_{ik}) \sim N\!\left(\sum_{\omega=1}^{\Omega} \gamma_{i\omega}\chi_{\omega k},\ \sigma_{rat}^2\right) \quad \text{for every pairwise contrast } k \in 1,\dots,K.$$
The use of an additive error term on the log ratio scale is equivalent to assuming multiplicative
errors on the ratio scale.
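As a quick check on this formulation, the measurement model is easy to simulate. The sketch below (illustrative Python, not the BUGS code in Appendix B; the parameter values are arbitrary) draws ratings for one pairwise contrast and confirms that the average log rating recovers the preference-strength difference $\gamma_{i\omega_2} - \gamma_{i\omega_1}$:

```python
import math
import random

random.seed(1)

gamma_1, gamma_2 = 0.2, 1.3  # preference strengths for criteria w1, w2 (arbitrary)
sigma_rat = 0.5              # standard deviation of log ratings (arbitrary)

# A stated rating z is the true coefficient ratio e^(gamma_2 - gamma_1)
# perturbed by multiplicative error: log(z) ~ N(gamma_2 - gamma_1, sigma_rat^2).
ratings = [math.exp(random.gauss(gamma_2 - gamma_1, sigma_rat))
           for _ in range(20000)]

# The mean log rating estimates gamma_2 - gamma_1 = 1.1, so the geometric
# mean of the ratings estimates the coefficient ratio e^1.1.
mean_log = sum(math.log(z) for z in ratings) / len(ratings)
geometric_mean = math.exp(mean_log)
```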
The variance $\sigma_{rat}^2$ is here assumed to be constant for all participants for the sake of parsimony, but in principle it would be straightforward to allow for heterogeneous variances. The value of $\sigma_{rat}^2$ reflects the deviation of the stated ratings from the underlying preference strength ratios. It is this variability that allows the model to incorporate inconsistencies in the ratings. Thus, if a vague prior is assigned to $\sigma_{rat}^2$, its posterior will reflect the overall level of inconsistency in the dataset. In some datasets (such as trees evaluated by only one individual), there is no scope for inconsistency in the data and hence no data-based estimate of $\sigma_{rat}^2$ will be possible. In such cases the posterior distribution will reflect only the prior.
The model can be constructed with either "fixed" or "random" preferences, somewhat analogous to fixed- and random-effects meta-analysis but at the level of individual participants rather than studies. In the random preference version of the model, $\gamma_{i\omega}$ is allowed to vary between individuals to accommodate the presence of preference heterogeneity in the population. For the reasons discussed in III.2.1.6, a Normal distribution on the exponentiated scale is used, with mean $e^{g_\omega}$ (the population-average utility coefficient for criterion $\omega$) and standard deviation proportional to the mean (with the coefficient of proportionality denoted $\sigma_{pref}$), i.e.

$$e^{\gamma_{i\omega}} \sim N\!\left(e^{g_\omega},\ (e^{g_\omega}\sigma_{pref})^2\right) \quad \text{for an individual participant } i.$$
No allowance is made for any correlations among $\gamma_{i\omega}$ or $e^{\gamma_{i\omega}}$ for distinct values of $\omega$. I am not aware of any compelling reason to believe that statistical correlations must exist between the preferences for different criteria, although their presence is not implausible. The model could in principle be extended to incorporate such correlations using an approach similar to that employed for the
extended to incorporate such correlations using an approach similar to that employed for the
between-study outcome correlations in Chapter II. For this initial proof of concept it was felt that
this was an unnecessary layer of complexity.
In the fixed preferences version of the model, preferences are assumed to be perfectly
homogeneous among the participants, i.e. 𝛾𝑖𝜔 = 𝑔𝜔.
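The proportional-standard-deviation form of the random preference distribution can be sketched as follows (illustrative Python with arbitrary values, not the thesis model code); because the spread scales with the mean, heterogeneity is the same in relative terms regardless of a criterion's weight:

```python
import math
import random

random.seed(2)

g = math.log(4.0)   # population-average preference strength (arbitrary)
sigma_pref = 0.2    # proportional between-participant standard deviation (arbitrary)

# Draw individual utility coefficients e^gamma_i ~ N(e^g, (e^g * sigma_pref)^2).
# At this sigma_pref, negative draws are a negligible (5-sigma) possibility.
coeffs = [random.gauss(math.exp(g), math.exp(g) * sigma_pref)
          for _ in range(20000)]

mean_coeff = sum(coeffs) / len(coeffs)
# The relative (coefficient-of-variation) spread recovers sigma_pref.
rel_sd = (sum((c - mean_coeff) ** 2 for c in coeffs) / len(coeffs)) ** 0.5 / mean_coeff
```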
It is also possible to combine data from more than one elicitation study in the model; in such instances it may be desirable to add another hierarchy of random effects so that preferences can vary between studies in addition to (or instead of) between individuals.
III.3.3.2 Categorical variables
Ratings-based elicitation methodologies vary in their parameterisation of the levels of a categorical
variable. Some methodologies such as AHP estimate a utility coefficient for every level, whereas
others such as swing weighting fix one level as a reference and estimate coefficients for all but the
reference level. The utility associated with the reference level is analogous to the intercept term in
linear regression – i.e. it is a nuisance parameter that is not important when comparing treatments, where only the differences in utility between levels matter. Furthermore, the parameterisation used
in AHP has been criticised because the intercept term can interfere with judgements expressed on
the ratio scale179. The model developed here therefore uses a parameterisation similar to swing-
weighting that is based on utility differences and does not estimate the reference level. Instead, an
indicator variable is constructed for each level apart from the reference and these indicators are
treated as separate criteria, with their estimated utility coefficients representing the difference in
utility from the reference level (i.e. the convention known as dummy coding – see III.5.3.2 for more
discussion of coding schemes).
To give an example, if Q is a categorical variable with $n$ levels $0, 1, \dots, n-1$, then it will be represented in the utility function by $n-1$ criteria $\omega_1, \dots, \omega_{n-1}$. The population-average utility function (focusing here solely on these $n-1$ criteria for clarity, although there may be others) is

$$U = \sum_{q=1}^{n-1} e^{g_{\omega_q}} x_{\omega_q}$$

where $x_{\omega_q}$ is an indicator variable taking the value 1 when $Q=q$ and 0 otherwise.
This way, each coefficient $e^{g_{\omega_q}}$ represents the change in utility associated with moving from level 0 of Q (the reference level) to level $q$. When $Q=0$ (the reference level), $U=0$. In other words, there is no intercept term in the utility function. In principle this lack of intercept should not concern us since (as we have already seen) the absolute level of utility is arbitrary and irrelevant; and indeed this
parameterisation works fine with ratings methodologies such as swing weighting that ask
participants to compare changes in utility rather than absolute levels.
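The dummy-coding convention can be made concrete with a small sketch (illustrative Python with made-up coefficients; the level labels are hypothetical, not the thesis data):

```python
import math

# Hypothetical 4-level categorical variable Q (e.g. administration mode), with
# level 0 as the reference. One utility coefficient e^(g_q) per non-reference
# level, each representing the utility gain over the reference.
g = {1: math.log(0.5), 2: math.log(0.3), 3: math.log(0.9)}  # arbitrary values

def utility(q):
    """Partial utility of Q under dummy coding: 0 at the reference level,
    e^(g_q) otherwise (the indicator x_q equals 1 only when Q == q)."""
    return 0.0 if q == 0 else math.exp(g[q])
```

There is no intercept: the reference level contributes exactly zero utility, and each coefficient is directly the difference from the reference.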
In the AHP elicitation method, however, participants are required to compare the absolute levels of utility associated with the levels of a categorical variable. In order to analyse ratings elicited using this method, an additional nuisance intercept parameter $\alpha_{AHP}$ is included in the utility function to represent the reference level; that is, the utility function is

$$U = \alpha_{AHP} + \sum_{q=1}^{n-1} e^{g_{\omega_q}} x_{\omega_q}$$

The mean utility coefficient for any other level $q$ on the absolute AHP utility scale is then equal to $\alpha_{AHP} + e^{g_{\omega_q}}$.
For coding purposes it is convenient to choose the least favoured level (i.e. the level with the lowest utility) as the reference. If the most favoured level were used as the reference, then the coefficient on the AHP scale would be $\alpha_{AHP} - e^{g_{\omega_q}}$ and additional constraints would need to be imposed to ensure that this quantity remained strictly positive (since its logarithm informs the likelihood). If any intermediate level were used as the reference, then some coefficients would be equal to $\alpha_{AHP} + e^{g_{\omega_q}}$ and some equal to $\alpha_{AHP} - e^{g_{\omega_q}}$, overcomplicating the model code (and additionally, no single $e^{g_{\omega_q}}$ would correspond to the entire criterion preference weight).
As discussed in III.2.1.5, it is conventional in MCDA to present the preference parameters for the
administration modes as a partial value function taking values from 0 (least preferred) to 1 (most
preferred) for each category, and a weight that corresponds to the entire range. This can be done
based on the utility coefficient parameterisation used here; the weight for the range is simply the largest utility coefficient $e^{g_{\omega_q}}$ among the categorical levels, and the partial value for each level is its utility coefficient expressed as a proportion of the weight for the range. If the maximum coefficient
is not identifiable in advance, presenting the results this way (with fully simulated posterior
distributions) will tend to require an additional post hoc model run so that the appropriate
calculations can be specified.
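This conversion can be sketched as follows (illustrative Python with arbitrary coefficient values, not the thesis posterior samples):

```python
# Utility coefficients e^(g) for the non-reference administration levels
# (arbitrary illustrative values; the reference level has utility 0 and is
# assumed here to be the least preferred).
coeffs = {"daily oral": 0.9, "monthly infusion": 0.3, "weekly intramuscular": 0.6}

# The weight for the whole criterion range is the largest coefficient...
weight = max(coeffs.values())

# ...and each level's partial value is its coefficient as a proportion of that
# weight, giving the conventional 0 (least preferred) to 1 (most preferred) scale.
partial_values = {level: c / weight for level, c in coeffs.items()}
partial_values["reference level"] = 0.0
```

In a full Bayesian analysis this calculation would be applied draw by draw to the posterior samples, which is why an additional post hoc model run may be needed when the maximum coefficient is not identifiable in advance.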
III.3.3.3 Priors
The main prior distributions used are set out in Table 18 and will be used throughout this chapter
unless otherwise stated. From time to time alternative priors may be used to investigate specific
points; this will be made clear at the time. The priors for the standard deviation parameters may
appear rather wide, but it was felt to be sensible for the distribution to cover multiple orders of
magnitude so as not to obscure any extreme heterogeneity caused by combining different
methodologies/scales in different populations, and the possibility of framing biases, etc.
Table 18 – Priors for ratings model parameters

$g_\omega$ — population-average preference strength for criterion $\omega$.
Prior: $e^{g_\omega} \sim \mathrm{Gamma}(1, 0.01)$ (see III.2.1.7 for more details).

$\alpha_{AHP}$ — nuisance parameter representing the utility of the "reference" level of the categorical administration variable in AHP-style ratings.
Prior: $\alpha_{AHP} \sim \mathrm{Gamma}(1, 0.1)$ (similar to the prior for $e^{g_\omega}$ but over a smaller scale, as the reference level is expected to have a relatively low utility).

$\sigma_{rat}$ — standard deviation of log ratings.
Prior: $\sigma_{rat} \sim \mathrm{Uniform}(0, 10)$. This vague prior is expected to be more than wide enough to cover all plausible values, since a deviation of 10 on the logarithmic scale (on which the random distribution of ratings is defined) corresponds to multiplication or division by $e^{10} \approx 22{,}000$, i.e. a change of several orders of magnitude.

$\sigma_{pref}$ — proportional standard deviation of the random preference distribution.
Prior: $\sigma_{pref} \sim \mathrm{Uniform}(0, 10)$. This was the widest uniform prior that allowed the models to run without errors. The random distribution of preferences is defined such that a deviation of 10 corresponds to a 10-fold increase (or decrease) in any given utility coefficient; the variability in ratings (i.e. ratios of utility coefficients) will be somewhat higher still, and this should be sufficient to cover any plausible between-participant variation in preferences.
III.3.3.4 Initial values for MCMC simulation
Preliminary runs reveal that, unlike most models in this thesis where BUGS is able to generate
suitable initial values, in the ratings model initial values for the utility coefficients must be supplied
by the user in order for the model to converge properly. This is likely to be because the utility
coefficients at intermediate levels of the administration variable must be lower than the coefficient
at the highest level and BUGS does not recognise this restriction when generating initial values. The
initial values used here are $e^{g_\omega} = 1$ for any criterion used to derive a preference weight, and $e^{g_\omega} = 0.5$ for the intermediate levels of the administration variable. If the model converges
properly then the results should not be sensitive to the choice of initial values; to confirm this, a
sensitivity analysis of the overall MCDA results to alternative sets of initial values is presented in
Appendix C.
III.3.4 Results
Appendix B contains the BUGS code and data files used to generate the results.
III.3.4.1 PROTECT investigator ratings
Simulations were performed using the Markov Chain Monte Carlo technique in either WinBUGS
(version 1.4.3) 48 or OpenBUGS (version 3.2.2 rev 1063 - www.openbugs.net). Initial values were
generated within BUGS for the majority of models. 100,000 iterations were discarded to allow for
“burn-in”; the posterior statistics were then derived from a further 100,000 iterations. Convergence
was assessed by inspection of the sample histories. Model fit was assessed by calculation of the
mean residual deviance119, which in a well-fitting model should be similar to (or less than) the
number of independent observations.
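For the Normal ratings likelihood with a given $\sigma_{rat}$, the residual deviance reduces to a sum of squared standardised residuals. The following sketch (illustrative Python with toy numbers, not the thesis calculations) shows the quantity being compared with the number of observations:

```python
def residual_deviance(log_ratings, means, sigma):
    """Residual deviance for a Normal likelihood with known variance:
    the sum of squared standardised residuals. In a well-fitting model
    this should be comparable to the number of observations."""
    return sum(((y - m) / sigma) ** 2 for y, m in zip(log_ratings, means))

# Toy example: every observed log rating sits exactly one sigma from its
# fitted mean, so the deviance equals the number of observations (4).
obs = [0.5, 1.5, -0.5, 2.5]
fitted = [0.0, 1.0, 0.0, 2.0]
deviance = residual_deviance(obs, fitted, sigma=0.5)
```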
Table 19 shows each participant’s mean preference weights calculated using the standard
deterministic method (see III.1.2.3) and also in the fixed effects Bayesian model described above, for
three different uniform priors on the ratings standard deviation.
Note that (as will be the case throughout this chapter) the utility coefficients and/or weights for
continuous criteria are expressed with regard to fixed unit intervals in the underlying outcome
measures. These units are provided in the results tables. It is important to recognise that the
outcome values used for elicitation, and/or experienced by patients on real-world treatments, may
be very different. The given “weight” for a criterion should therefore not by itself be interpreted as
a measure of that criterion’s influence on the benefit-risk balance without also considering the likely
real-world levels of the associated outcome measure.
Table 19 – Mean preference weights for individual participants in the investigator ratings dataset; deterministic analysis and Bayesian analysis with sensitivity to the assumed ratings standard deviation

Participant 1 — normalised preference weights. Columns: Unit; Deterministic; Posterior mean with σrat prior Uniform(0,0.01); Posterior mean with σrat prior Uniform(0,1); Posterior mean with σrat prior Uniform(0,10).
Clinical outcomes
Relapse 1 event 3.9% 3.9% 5.8% 7.0%
Disability progression 1 event 5.6% 5.6% 6.5% 7.1%
PML 1 event 55.9% 55.9% 45.0% 34.9%
Herpes reactivation 1 event 6.7% 6.7% 6.8% 7.1%
Liver enzyme elevation 1 event 11.2% 11.2% 9.5% 8.8%
Seizures 1 event 5.6% 5.6% 6.0% 6.6%
Congenital abnormalities 1 event 5.6% 5.6% 6.0% 6.7%
Infusion/injection reactions 1 event 2.8% 2.8% 5.4% 6.9%
Allergic/hypersensitivity reactions 1 event 1.1% 1.1% 3.5% 5.5%
Flu-like reactions 1 event 1.1% 1.1% 3.5% 5.5%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 0.6% 0.6% 1.9% 3.9%
Monthly infusion N/A 0.4% 0.4% 2.5% 5.6%
Weekly intramuscular N/A 0.3% 0.3% 1.9% 4.9%
Participant 2 — normalised preference weights. Columns: Unit; Deterministic; Posterior mean with σrat prior Uniform(0,0.01); Posterior mean with σrat prior Uniform(0,1); Posterior mean with σrat prior Uniform(0,10).
Clinical outcomes
Relapse 1 event 19.0% 19.0% 11.8% 10.6%
Disability progression 1 event 27.1% 27.1% 17.7% 15.3%
PML 1 event 30.1% 30.1% 31.4% 28.6%
Herpes reactivation 1 event 6.0% 6.0% 7.5% 7.6%
Liver enzyme elevation 1 event 6.0% 6.0% 7.4% 7.6%
Seizures 1 event 3.0% 3.0% 4.5% 5.2%
Congenital abnormalities 1 event 3.0% 3.0% 4.5% 5.2%
Infusion/injection reactions 1 event 1.5% 1.5% 4.1% 5.2%
Allergic/hypersensitivity reactions 1 event 0.6% 0.6% 2.7% 4.0%
Flu-like reactions 1 event 0.6% 0.6% 2.7% 4.0%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 3.0% 3.0% 5.8% 6.6%
Monthly infusion N/A 2.1% 2.1% 5.4% 6.6%
Weekly intramuscular N/A 1.5% 1.5% 4.2% 5.5%
Participant 3 — normalised preference weights. Columns: Unit; Deterministic; Posterior mean with σrat prior Uniform(0,0.01); Posterior mean with σrat prior Uniform(0,1); Posterior mean with σrat prior Uniform(0,10).
Clinical outcomes
Relapse 1 event 15.9% 15.9% 11.0% 10.4%
Disability progression 1 event 26.5% 26.6% 18.5% 16.9%
PML 1 event 29.5% 29.5% 29.7% 28.1%
Herpes reactivation 1 event 8.8% 8.9% 9.4% 9.2%
Liver enzyme elevation 1 event 5.9% 5.9% 7.0% 7.2%
Seizures 1 event 2.9% 3.0% 4.1% 4.5%
Congenital abnormalities 1 event 2.9% 3.0% 4.1% 4.5%
Infusion/injection reactions 1 event 1.5% 1.5% 3.0% 3.5%
Allergic/hypersensitivity reactions 1 event 1.3% 1.3% 3.6% 4.5%
Flu-like reactions 1 event 1.6% 1.6% 4.3% 5.2%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 2.9% 3.0% 5.2% 5.8%
Monthly infusion N/A 2.1% 2.1% 4.8% 5.7%
Weekly intramuscular N/A 1.5% 1.5% 3.7% 4.6%
This shows that the methods give the same central estimate of the weights when the standard
deviation is effectively zero, but when a nonzero standard deviation is allowed for, the estimates diverge considerably, with the weights being dragged towards their joint prior (which weights all 11
criteria equally at 9%). In other words the value of the standard deviation parameter can have a
strong impact on the results - but analysing individual trees like this provides no actual evidence on
what the standard deviation should be; its posterior essentially just reflects its prior. In the Bayesian
model, however, we can analyse all three participants together and thus obtain a data-based
posterior of the ratings standard deviation that reflects the inconsistency between individuals’
ratings. Results from this model are shown in Table 20 together with the preference weights and the
calculated residual deviance. Because of the small number of participants in this rating exercise,
only the fixed effects results are provided. The figures in Table 20 are based upon a uniform
standard deviation prior on the interval (0,10) which is intended to be wide enough to cover the full
range of plausible values (see III.3.3.3).
Table 20 – Posterior distribution of preferences for simultaneous analysis of all participants in the investigator ratings dataset
FIXED PREFERENCES — 3 participants, 36 ratings in total. Normalised preference weights; columns: Unit; Mean; sd; 2.5%; Median; 97.5%.
Clinical outcomes
Relapse 1 event 10.1% 3.8% 4.2% 9.7% 18.9%
Disability progression 1 event 15.2% 3.8% 8.5% 15.0% 23.4%
PML 1 event 40.1% 4.2% 31.6% 40.1% 48.0%
Herpes reactivation 1 event 8.1% 2.7% 3.9% 7.7% 14.4%
Liver enzyme elevation 1 event 8.3% 2.8% 4.0% 8.0% 14.8%
Seizures 1 event 4.5% 1.6% 2.1% 4.2% 8.4%
Congenital abnormalities 1 event 4.5% 1.6% 2.1% 4.2% 8.4%
Infusion/injection reactions 1 event 2.8% 1.0% 1.4% 2.7% 5.4%
Allergic/hypersensitivity reactions 1 event 1.8% 1.0% 0.6% 1.5% 4.2%
Flu-like reactions 1 event 1.9% 1.0% 0.6% 1.7% 4.5%
Modes of administration (reference = daily subcutaneous)
Daily oral N/A 2.7% 1.0% 1.2% 2.5% 5.2%
Monthly infusion N/A 2.2% 1.2% 0.7% 1.9% 5.4%
Weekly intramuscular N/A 1.6% 0.9% 0.5% 1.4% 4.0%
Ratings standard deviation N/A 0.63 0.10 0.47 0.62 0.87
Residual deviance N/A 35.0 8.4 20.5 34.3 53.2
The number of observations in the dataset is 36 (3 individuals providing 12 ratings each); since the
mean residual deviance is slightly lower, at 35, this indicates a good model fit. The posterior
distribution of the ratings standard deviation lies well below the upper bound of the prior,
suggesting that the prior was indeed suitably vague (this is also confirmed by a sensitivity analysis in
Appendix C).
III.3.4.2 PROTECT patient ratings
Table 21 shows the median preferences for administration modes for a single (arbitrarily chosen)
participant in the PROTECT patient ratings dataset, derived using both the Bayesian model (for two
different priors on the ratings standard deviation) and the standard deterministic “eigenvalue”
method 146.
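For reference, the deterministic "eigenvalue" method extracts priorities as the normalised principal eigenvector of the pairwise comparison matrix. A minimal power-iteration sketch (illustrative Python using a toy 3x3 perfectly consistent matrix, not the real patient data):

```python
def ahp_priorities(matrix, iterations=50):
    """Approximate the principal eigenvector of a positive reciprocal
    comparison matrix by power iteration, normalised to sum to 1
    (the AHP priorities)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
    return v

# Perfectly consistent comparisons derived from weights (4, 2, 1), so that
# a_ij = w_i / w_j; the priorities should come out as 4/7, 2/7, 1/7.
A = [[1, 2, 4],
     [0.5, 1, 2],
     [0.25, 0.5, 1]]
priorities = ahp_priorities(A)
```

For an inconsistent matrix (as in real elicited data) the principal eigenvector no longer reproduces the stated ratios exactly, which is precisely the inconsistency that the Bayesian model absorbs through σrat.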
In this instance, as only the administration categories feature in this dataset and the ratings were
elicited using the AHP methodology, the results are presented in two ways:
• as AHP-style partial “priorities” – that is, the utility coefficients 𝛼𝐴𝐻𝑃 + 𝑒𝑔𝜔 for every
category but the reference and simply 𝛼𝐴𝐻𝑃 for the reference, normalised to sum to 1 across
all categories; and
• as a partial value function on a scale from 0 to 1, representing each category’s utility
coefficient as a proportion of the maximum.
Medians have been shown instead of means because the partial value distributions are highly
skewed.
Because this ratings dataset uses a web of pairwise comparisons, rather than the tree used in the
investigator ratings dataset, there is scope for inconsistency in a single individual’s ratings and an
estimate of the ratings standard deviation can therefore be obtained.
Table 21 – Median preferences for a single participant in the patient ratings dataset; deterministic analysis and Bayesian analysis with sensitivity to assumed ratings standard deviation
Participant 1. Columns: Deterministic; Posterior median with σrat prior Uniform(0,0.01); Uniform(0,0.1); Uniform(0,1); Uniform(0,10); Uniform(0,100).
Partial priorities for modes of administration (on strictly positive absolute scale from AHP)
Daily subcutaneous 0.05 0.06 0.06 0.06 0.03 0.03
Daily oral 0.42 0.42 0.42 0.40 0.35 0.35
Monthly infusion 0.13 0.13 0.13 0.14 0.19 0.19
Weekly intramuscular 0.40 0.39 0.39 0.37 0.33 0.33
Partial values for modes of administration (relative to reference = daily subcutaneous)
Daily oral 1.00 1.00 1.00 1.00 1.00 1.00
Monthly infusion 0.21 0.19 0.20 0.23 0.48 0.47
Weekly intramuscular 0.94 0.93 0.93 0.93 0.96 0.97
Ratings standard deviation N/A 0.01 0.10 0.90 2.35 2.34
Residual deviance N/A 43,080.0 435.1 10.7 4.3 4.3
As was found with the investigator ratings dataset, the standard deterministic analysis and the
Bayesian analysis are in close agreement for a single participant if the ratings standard deviation is
constrained to be small enough, but can diverge considerably otherwise due to shrinkage towards
the prior weights. In this instance the true value of the standard deviation can be estimated from
the data. The estimated standard deviation (approximately 2.34) is essentially identical in the last two columns, indicating that it is a true data-based estimate rather than an artefact of the prior; this in turn strongly suggests that the
narrower priors in the previous two columns are unduly restrictive. Examining the residual deviance
(bearing in mind there are 6 observations) backs up this conclusion, with the model fit clearly
superior when the ratings standard deviation is not tightly constrained. This means that the best-
fitting model is somewhat sensitive to the prior weights, suggesting a need for particular care when
setting these priors and perhaps some kind of sensitivity analysis.
Table 22 summarises the posterior distribution of preferences for administration modes (and other key variables) based on a simultaneous analysis of all participants in the PROTECT patient ratings dataset. Results from a fixed preference model and a random (by participant) preference model are shown. As before, the ratings standard deviation is assigned a uniform prior on the interval (0,10) on the grounds that this includes all plausible values.
Table 22 – Posterior distribution of preferences for simultaneous analysis of all participants in the patient ratings dataset
FIXED PREFERENCES — 36 participants, 207 ratings in total. Columns: Mean; sd; 2.5%; Median; 97.5%.
Partial priorities for modes of administration (on strictly positive absolute scale from AHP)
Daily subcutaneous 0.09 0.05 0.00 0.09 0.18
Daily oral 0.42 0.04 0.34 0.41 0.49
Monthly infusion 0.29 0.03 0.24 0.29 0.36
Weekly intramuscular 0.20 0.02 0.16 0.20 0.24
Partial values for modes of administration (relative to reference = daily subcutaneous)
Daily oral 1.00 N/A (constant) 1.00 1.00 1.00
Monthly infusion 0.61 0.15 0.30 0.62 0.92
Weekly intramuscular 0.30 0.15 0.02 0.31 0.55
Ratings standard deviation 1.21 0.06 1.10 1.21 1.34
Residual deviance 205.9 20.3 168.0 205.3 247.4

RANDOM PREFERENCES BY PARTICIPANT — 36 participants, 207 ratings in total. Columns: Mean; sd; 2.5%; Median; 97.5%.
Partial priorities for modes of administration (on strictly positive absolute scale from AHP)
Daily subcutaneous 0.09 0.05 0.01 0.10 0.18
Daily oral 0.42 0.04 0.34 0.42 0.50
Monthly infusion 0.29 0.03 0.23 0.29 0.36
Weekly intramuscular 0.20 0.02 0.16 0.19 0.24
Partial values for modes of administration (relative to reference = daily subcutaneous)
Daily oral 1.00 N/A (constant) 1.00 1.00 1.00
Monthly infusion 0.60 0.16 0.29 0.60 0.92
Weekly intramuscular 0.29 0.15 0.02 0.30 0.55
Ratings standard deviation 1.06 0.07 0.93 1.05 1.20
Proportional between-participant preference standard deviation 0.33 0.04 0.25 0.34 0.41
Residual deviance 206.1 20.3 168.4 205.5 247.6
The mean residual deviance is almost identical in both the fixed and random preference models, and
lies just below 207, the number of observations, indicating that both models fit the data well, indeed
equally well. In such instances it is good practice 119 to favour the model with fewer parameters i.e.
the fixed preference model.
The preferences for administration modes seem reasonably consistent with those in the investigator
ratings dataset in that the categories are ranked equivalently, although the estimated partial values
differ somewhat.
There is not much that can be inferred about the homogeneity (or otherwise) of the study
populations by comparing the standard deviation estimates in each dataset. The estimated ratings
standard deviation in the fixed preference model here (mean=1.21) is higher than the estimate
obtained from the investigator ratings dataset (mean=0.635), but it would be presumptuous to
interpret this as indicating a true difference in populations, since the latter estimate is based on only
3 participants and comes from a tree structure that provides minimal information on such a
parameter.
III.3.4.3 PROTECT investigator and patient ratings
Table 23 shows the results for a ratings model combining both the PROTECT patient and investigator
ratings. Results are shown for two versions of the model: fixed preferences, and random (by
participant) preferences. A version allowing for preference variation at the between-study level has
also been developed but has not been fitted here, partly to avoid generating too many results and
partly because there are only two studies providing data. This model will however be used later in
the chapter when more studies (using different elicitation methods) are added. From here on, the preferences for administration modes are presented only as partial values, not as AHP-style partial priorities.
Table 23 – Posterior distribution of preferences for simultaneous analysis of all participants in the investigator ratings and patient ratings datasets
FIXED PREFERENCES — 2 studies, 39 participants and 243 ratings in total. Columns: Unit; Mean; sd; 2.5%; Median; 97.5%.
Preference weights
Relapse 1 event 11.5% 6.5% 2.7% 10.1% 27.5%
Disability progression 1 event 15.0% 6.2% 5.6% 14.2% 29.3%
PML 1 event 32.8% 6.4% 20.8% 32.6% 45.9%
Herpes reactivation 1 event 8.6% 4.9% 2.2% 7.5% 20.7%
Liver enzyme elevation 1 event 8.8% 5.0% 2.3% 7.7% 21.2%
Seizures 1 event 5.1% 3.2% 1.2% 4.3% 13.5%
Congenital abnormalities 1 event 5.1% 3.3% 1.2% 4.3% 13.6%
Infusion/injection reactions 1 event 4.7% 2.5% 1.3% 4.2% 11.0%
Allergic/hypersensitivity reactions 1 event 3.9% 3.1% 0.6% 3.0% 12.2%
Flu-like reactions 1 event 4.1% 3.3% 0.6% 3.2% 12.8%
Administration (daily oral vs daily subcutaneous) N/A 0.5% 0.2% 0.2% 0.5% 1.0%
Administration (monthly infusion vs daily subcutaneous) N/A 0.3% 0.1% 0.1% 0.3% 0.6%
Administration (weekly intramuscular vs daily subcutaneous) N/A 0.2% 0.1% 0.1% 0.2% 0.4%
Ratings standard deviation N/A 1.17 0.05 1.06 1.16 1.28
Residual deviance N/A 242.0 22.1 200.9 241.3 286.9

RANDOM PREFERENCES BY PARTICIPANT — 2 studies, 39 participants and 243 ratings in total. Columns: Unit; Mean; sd; 2.5%; Median; 97.5%.
Preference weights
Relapse 1 event 12.0% 6.7% 2.6% 10.7% 28.6%
Disability progression 1 event 15.9% 6.4% 5.7% 15.0% 31.0%
PML 1 event 32.6% 7.0% 20.1% 32.3% 47.5%
Herpes reactivation 1 event 8.4% 4.6% 2.3% 7.4% 19.9%
Liver enzyme elevation 1 event 9.0% 5.1% 2.4% 7.9% 21.7%
Seizures 1 event 5.2% 3.1% 1.3% 4.5% 13.0%
Congenital abnormalities 1 event 4.9% 2.8% 1.3% 4.2% 12.2%
Infusion/injection reactions 1 event 4.4% 2.4% 1.3% 3.9% 10.7%
Allergic/hypersensitivity reactions 1 event 3.3% 2.5% 0.6% 2.6% 9.9%
Flu-like reactions 1 event 3.8% 3.0% 0.6% 2.9% 11.9%
Administration (daily oral vs daily subcutaneous) N/A 0.6% 0.2% 0.3% 0.6% 1.1%
Administration (monthly infusion vs daily subcutaneous) N/A 0.4% 0.1% 0.2% 0.4% 0.7%
Administration (weekly intramuscular vs daily subcutaneous) N/A 0.2% 0.1% 0.1% 0.2% 0.5%
Ratings standard deviation N/A 1.01 0.06 0.90 1.01 1.14
Proportional between-participant preference standard deviation N/A 0.33 0.04 0.25 0.33 0.40
Residual deviance N/A 241.9 21.9 200.7 241.2 286.5
There is very little difference between the results for the two versions of the model and the
estimated between-participant preference standard deviation in the second model is low –
suggesting the simple fixed preference model is most appropriate. The residual deviance is very
similar in both models (again indicating the fixed preference model should be favoured), and falls
slightly below the number of observations (243), indicating a good fit.
The model appears to be combining the two datasets in an appropriate manner. As expected, the
partial values for administration modes fall between the estimates from each individual dataset, and
the preference weights agree with those in Table 20, as they are drawn only from the investigator
ratings dataset.
In this instance, the main benefit of adding in the patient ratings data is only a modest refinement in
the estimates of preferences for administration modes (since these are the only criteria in the
patient ratings dataset). However, these results provide an important proof of concept, showing
that ratings from separate studies using different sets of outcomes can be combined in a single
analysis. As well as obtaining combined results, this provides a means to examine the between-study
heterogeneity of the elicited preferences. In this instance, another benefit of including the
patient ratings data is that it allows one to obtain a data-based estimate of the ratings standard
deviation to feed back into the investigator ratings analysis, which otherwise had to rely only on the
prior to estimate this parameter.
The ratings standard deviation and between-participant standard deviation cannot be directly
compared as they are defined on different scales. To convert the estimated mean ratings standard
deviation of 1.01 to the proportional absolute scale used for the between-participant preference
standard deviation requires the following steps:
• Start on log ratings scale: 1.01
• Divide by √2 to obtain standard deviation associated with a single preference strength
rather than a rating which compares two preference strengths (assuming independence of
preferences), still on log ratings scale: 0.71
• Exponentiate to obtain standard multiplicative factor on absolute scale: 2.04
• Subtract 1 to obtain proportional standard deviation on absolute scale: 1.04
(As it happens, this set of transformations is approximated by the identity when the starting figure is
close to 1, but this is a mere coincidence.)
The estimated between-participant preference standard deviation of 0.33 is therefore considerably
lower than the within-participant random error, indicating a good degree of preference homogeneity
within and between the study populations, even if individual participants apparently found it hard to
always express their preferences consistently or accurately.
III.3.5 Discussion
The Bayesian model appears viable and brings with it the advantages inherent to all Bayesian MCMC
models: the ability to allow for uncertainty in the data/parameters and propagate the full
uncertainty distribution through the model outputs and any subsequent calculations together with
the ability to allow for prior information if desired. Furthermore it has been shown here that the
model can combine more than one source of ratings despite differences in the original elicitation
methods.
The results in Table 19 show that this model’s Bayesian method gives weights that are noticeably
influenced by the prior. Here, this resulted in shrinkage towards equal weighting on all criteria.
Given that some outcomes are a priori more “important” than others, such even weighting in the
prior may be undesirable; it may be more appropriate to use weights from a pilot study, or an
empirical Bayes approach where the prior is based on the data. One way to specify such a prior is to
set $e^{g_\omega} \sim \mathrm{Gamma}(w_\omega, 0.01)$, where $w_\omega$ is an empirical deterministic estimate of the weight on
outcome $\omega$.
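A sketch of what such an empirically centred prior implies (illustrative only; the weight estimates `w` are hypothetical, and the Gamma is parameterised by shape and rate as elsewhere in the model, so numpy's `scale` is 1/rate). Because the coefficients share a common rate, the implied prior on the normalised weights is a Dirichlet distribution whose mean equals `w` when the weights sum to 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical deterministic weight estimates w_omega for three outcomes (sum to 1)
w = np.array([0.5, 0.3, 0.2])

# Prior draws of the utility coefficients e^{g_omega} ~ Gamma(shape = w_omega, rate = 0.01);
# numpy's Gamma takes shape and scale, so scale = 1 / rate = 100
samples = rng.gamma(shape=w, scale=100.0, size=(10_000, len(w)))

# Normalising each draw shows the implied prior on the preference weights
weights = samples / samples.sum(axis=1, keepdims=True)
print(weights.mean(axis=0))  # prior mean weights, centred at w
```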
Moreover, the standard deviations estimated from the data (Tables 21 to 23) were sufficient to
cause substantial disagreement with the deterministic analysis. Indeed, this was the case despite
the fact that the standard deterministic analysis was not the same in both cases (swing weighting for
the investigator ratings, eigenvalue AHP for the patient ratings). This is an unexpected result that
raises questions over whether a deterministic analysis of ratings data can be relied upon to provide a
robust central estimate. It would be of value to confirm these findings in other datasets, and
perhaps using alternative models for the distribution of ratings, but on the face of these results
there is a strong argument for favouring the Bayesian analysis over standard deterministic
techniques for ratings even if only point estimates of preferences are required.
Another advantage of the Bayesian analysis is that it provides a principled framework for combining
ratings from multiple participants. It is of course possible to use the standard deterministic analyses
and calculate averages across participants – but the averages can be taken at the level of either the
input ratings or the output parameters, and there has been substantial discussion in the literature as
to the appropriateness and underlying assumptions of these two approaches 157,194,195. The use of a
Bayesian hierarchical model means this debate can be avoided, and provides a principled solution
based on a clear underlying model, with the added bonus that MCMC allows one to derive full
posterior distributions rather than just summary statistics.
Insofar as the datasets could be compared (i.e. the preferences for administration modes), they
appeared reasonably homogeneous; there will be more scope to investigate the homogeneity of
preferences in the sections that follow.
It is somewhat surprising that residual deviance was virtually the same for the fixed and random
preference versions of the model, as this is generally not the case in fixed- and random-effects meta-
analysis. The individual deviances for each observation do vary between the models, but after
summation the same total is obtained. It is not yet clear if this is an indicator of the homogeneity in
the datasets used here or an inherent property of the model.
Although the model was designed to be generalisable to a variety of datasets, there are some
limitations. Non-standard variants of criteria ratings data (such as ordinal ratings) are not currently
supported. The model also cannot directly analyse the verbal judgements made in MACBETH unless
these are first translated to a numerical scale of some kind.
The estimated preferences appear quite reasonable; certainly there are no figures that stand out as
implausible, and the criterion with the greatest weight is PML, which is fairly unarguably the most
debilitating clinical outcome. It is worth recalling that the participants in the investigator ratings
dataset were not truly three independent individuals (see III.3.2.1), but nevertheless they do still
represent real-world preference estimates. Therefore, with regard to the RRMS case study, the
ratings model has now provided some real-world evidence on relative preference weights for:
• ten clinical outcomes, three of which were included in the evidence synthesis in Chapter II
and are therefore of primary interest (relapses, disability progression, liver enzyme
elevation)
• four modes of administration, covering all the RRMS treatments that were featured in
Chapter II.
This is a good start, but there is more usable evidence on RRMS preferences beyond just these
relative ratings, as the remainder of this chapter will demonstrate.
Chapter III.4
III.4 Bayesian analysis of choice data
As discussed in III.1.2.4, Bayesian models for analysing discrete choice experiments have already
been established. The aim of this section is therefore not to describe a new methodology, but rather
to use standard methods to carry out a Bayesian analysis of a choice dataset that is relevant to the
RRMS study. Later, the results will be compared and combined with those from the methods
developed elsewhere in this chapter, and used to inform a preference-weighted benefit-risk
assessment based on the evidence synthesis in Chapter II.
III.4.1 Data structure
The data are drawn from a single discrete choice experiment involving several individual
participants. All choice tasks in this dataset are binary; examples of analyses involving more
alternatives per choice set can be found in the literature 167.
III.4.1.1 Study-level constants
Let $\{1, \dots, \Omega\}$ be a set of outcomes, and $\{1, \dots, n\}$ a set of participants (again, it will typically be
assumed that these are individuals, but a single “participant” can also be a group of people making
choices on a collective consensus basis). Let $\{1, \dots, N_{CS}\}$ index the completed choice sets in the
dataset, where $N_{CS}$ is the number of observations.
III.4.1.2 Choice-set-level data
For each observation $k \in \{1, \dots, N_{CS}\}$ the following variables are supplied:
• $\mathit{subject}_k \in \{1, \dots, n\}$, an identifier for the subject or participant who completed the choice
task
• $x_{\omega j}$ ($\omega \in \{1, \dots, \Omega\}$, $j \in \{A_k, B_k\}$), the value of criterion $\omega$ in scenario $j$ within the
choice set
• $y_k \in \{0, 1\}$, a binary variable indicating whether the participant chose scenario $A_k$ ($y_k = 1$)
or scenario $B_k$ ($y_k = 0$)
III.4.2 Dataset - PROTECT patient choice data
This dataset consists of choice data originally elicited within PROTECT’s workstream on patient and
public involvement 193, like the patient ratings in III.3.2.2. The choice tasks were performed in a
paper-based survey issued to RRMS patients at a London clinic and the study design, ethical
approval and consent processes have been described elsewhere193. The choice experiment was
designed with 64 choice tasks; these were divided between 4 versions of the survey with 16 choice
tasks each so as to ease the burden on participants. The criteria used were: relapse rate, risk of
disability progression, risk of PML, risk of allergic/hypersensitivity reactions, risk of serious allergic
reactions, and risk of depression.
The first four of these are by now familiar, but the latter two are new and represent additional
possible risks of treatment. These are listed as possible side effects in the Summary of Product
Characteristics for at least one of the investigated treatments.
• Serious allergic reactions – systemic anaphylactic reactions to treatment requiring
hospitalisation
• Depression – thoughts of hopelessness, lack of self-worth, suicidal ideation.
A listing of the dataset (in abridged format) is shown in Appendix A.2.
III.4.3 Choice model
III.4.3.1 Binomial logit
Recall from III.1.3.1 the basic principles of choice models:
• The utility $V_{Xi}$ of a scenario $X$ to an individual $i$ is assumed to consist of (i) a deterministic
component $U_X$, defined as a specific function of the criteria with parameters to be
estimated, and (ii) an individual-specific random error term $\varepsilon_i$. That is, $V_{Xi} = U_X + \varepsilon_i$ and
$U_X = f(x_1, \dots, x_m; \boldsymbol{\beta})$, where $x_1, \dots, x_m$ are the criteria values in scenario $X$ and $\boldsymbol{\beta}$ is the set
of preference parameters to be estimated. If, as here, a linear utility model is assumed, then
$U_X = \beta_1 x_1 + \dots + \beta_m x_m$, but the method is not restricted to this particular form.
• An individual $i$ selects option $A$ if $V_{Ai} > V_{Xi}$ for all alternative options $X$.
A number of models are available for choice data; here the binomial logit model is used. This is the
most widely used choice model167 and is restricted to binary choice sets. The model assumes a
logistic link function between the excess utility of an alternative and the probability of choosing that
alternative. In other words, for a binary choice set k comparing scenarios 𝐴𝑘 and 𝐵𝑘, the log odds of
choosing 𝐴𝑘 over 𝐵𝑘 is given by 𝑈𝐴𝑘 −𝑈𝐵𝑘 (following the notation established in III.1.3, or
equivalently, the probability of choosing A is 𝑒𝑈𝐴𝑘 (𝑒𝑈𝐴𝑘 + 𝑒𝑈𝐵𝑘)⁄ . A number of arguments exist for
the validity of the logit model 168.
$U_{A_k}$ and $U_{B_k}$ are the values of the (assumed linear) utility function for the criteria levels in scenarios
$A_k$ and $B_k$ respectively. The linear coefficients for each criterion are the utility coefficients $e^{g_\omega}$ (or
the random equivalent $e^{\gamma_{i\omega}}$ for participant $i$ in the random preferences model). Thus the probability
$p_{A_k}$ of choosing $A_k$ over $B_k$ in the random preferences model satisfies
$$\mathrm{logit}(p_{A_k}) = U_{A_k} - U_{B_k} = \sum_{\omega=1}^{\Omega} \left( e^{\gamma_{i\omega}} x_{\omega A_k} - e^{\gamma_{i\omega}} x_{\omega B_k} \right) = \sum_{\omega=1}^{\Omega} e^{\gamma_{i\omega}} \left( x_{\omega A_k} - x_{\omega B_k} \right)$$
where $x_{\omega A_k}$ and $x_{\omega B_k}$ are the levels of criterion $\omega$ in alternatives $A_k$ and $B_k$ respectively.
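The choice probability defined by this equation can be sketched as follows (illustrative only; the coefficient values and criteria levels are hypothetical):

```python
import math

def choice_probability(gamma, x_a, x_b):
    """P(choose A over B) under the binomial logit model:
    logit(p_A) = sum_omega e^{gamma_omega} * (x_omega_A - x_omega_B)."""
    excess_utility = sum(
        math.exp(g) * (xa - xb) for g, xa, xb in zip(gamma, x_a, x_b)
    )
    return 1.0 / (1.0 + math.exp(-excess_utility))

# Hypothetical log utility coefficients for one participant (two criteria)
gamma = [math.log(2.0), math.log(0.5)]

# Scenario A has a lower level of the first (undesirable) criterion, so it
# should be preferred: p > 0.5 would mean A is more likely to be chosen
p = choice_probability(gamma, x_a=[0.1, 0.3], x_b=[0.2, 0.3])
print(round(p, 3))
```

When the two scenarios are identical the excess utility is zero and the probability is exactly 0.5, as the model requires.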
It is also possible to include a constant intercept term in the equation above in order to capture any
difference in utility between the choice set alternatives that is not explained by the criteria in the
model. That is, the equation above could be rewritten as
$$\mathrm{logit}(p_{A_k}) = U_{A_k} - U_{B_k} = \sum_{\omega=1}^{\Omega} e^{\gamma_{i\omega}} \left( x_{\omega A_k} - x_{\omega B_k} \right) + \mathit{ASC}$$
where $\mathit{ASC}$ represents any difference in utility between $A_k$ and $B_k$ that is not explained by the criteria
$\omega \in \{1, \dots, \Omega\}$. In the DCE literature this intercept term is referred to as an alternative-specific
constant. In some DCEs in other fields (such as market research) where the scenarios correspond to
familiar real-world alternatives (such as brands of consumer goods) and are labelled as such, ASCs
can be useful in capturing the effect of unmeasurable criteria (eg brand image) that cannot be
explicitly included in the model. In the preference elicitation context the choice sets are
hypothetical and unlabelled with any real-world interpretation; the ASC thus only represents any
inherent bias the participants may have between the first (or left-hand) alternative 𝐴𝑘 and the
second (or right-hand) alternative $B_k$. A preliminary frequentist analysis of the RRMS choice dataset
found that the ASC was not statistically significant at the 5% level, meaning that there is little
evidence for any such left-right bias. For this reason no ASC is included here, but it would be
straightforward to do so.
Having determined $p_{A_k}$, the likelihood of an observed choice $y_k$ is simply $y_k \sim \mathrm{Bernoulli}(p_{A_k})$.
As usual, both a “fixed preferences” and a “random preferences” version of the model are possible,
but in Bayesian MCMC applications the fixed preferences model is preferred here, as the next section
explains.
III.4.3.2 Individual vs collapsed analysis
Specifying the model with a Bernoulli likelihood (one choice set and subject per observation) in an
MCMC context results in a model that is prohibitively slow to update, based on initial trial runs. In
the fixed preferences model, significant time savings can be achieved by aggregating/collapsing the
data so that each observation gives the proportion of subjects choosing y=1 for a particular choice
set, and using a Binomial likelihood corresponding to the sum of the individual Bernoullis. The
downside of this specification is that it cannot incorporate random preferences, as it relies on the
choice probabilities being the same for all subjects. For practical reasons, therefore, only a fixed
preference choice analysis will be performed here. It may be worth recalling that a fixed preference
analysis was found to be appropriate in the patient ratings dataset; the same may be a reasonable
assumption for the choice dataset, as enrolment was simultaneous from the same population, with
participants randomised into one or the other dataset.
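The collapsing step can be sketched in pure Python (the records are hypothetical; each is a (choice set, y) pair, and the output gives the Binomial numerator and denominator for each choice set):

```python
from collections import defaultdict

def collapse_choices(records):
    """Collapse individual Bernoulli observations (choice_set_id, y) into
    per-choice-set Binomial counts: {choice_set_id: (successes, trials)}."""
    counts = defaultdict(lambda: [0, 0])
    for choice_set, y in records:
        counts[choice_set][0] += y   # number of subjects choosing alternative A (y = 1)
        counts[choice_set][1] += 1   # total responses observed for this choice set
    return {cs: tuple(c) for cs, c in counts.items()}

# Hypothetical raw data: three subjects each answering two choice sets
records = [(1, 1), (1, 0), (1, 1), (2, 0), (2, 0), (2, 1)]
print(collapse_choices(records))  # {1: (2, 3), 2: (1, 3)}
```

The collapsed counts can then be given a single Binomial likelihood per choice set, which is what makes the MCMC updates fast.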
III.4.3.3 Priors
In the fixed preferences model it is only necessary to specify priors for either the preference strengths
or the utility coefficients; here the latter are all assigned identical Gamma(1, 0.01) priors, as set out in
III.2.1.7.
III.4.4 Results
The results of the fixed preferences model are shown in Table 24 alongside the results of a
frequentist multinomial logit analysis conducted using the “mlogit” package in R version 3.2.1.
Normalised weights have only been calculated in the Bayesian model, since it is not straightforward
to obtain their standard errors in the frequentist framework (and the original coefficients are
sufficient to compare the two models).
Table 24 - Posterior distribution of preferences in the patient choice dataset
FIXED PREFERENCES (124 participants, 1755 choices in total)

Criterion | Unit | Frequentist mean | Frequentist SE | Bayesian mean | Bayesian SE | 2.5% | Median | 97.5%

Utility coefficients (on choice scale, i.e. effect on log odds of choice):
Relapse | 1 relapse/year | -0.80 | 0.31 | -0.85 | 0.31 | -1.46 | -0.85 | -0.26
Disability progression | 100% risk over 2 years | -12.50 | 0.65 | -12.45 | 0.65 | -13.75 | -12.44 | -11.19
PML | 100% risk over 2 years | -388.3 | 30.0 | -383.7 | 30.6 | -444.3 | -383.5 | -324.5
Allergic/hypersensitivity reactions | 100% risk over 2 years | -1.22 | 0.18 | -1.24 | 0.18 | -1.59 | -1.23 | -0.89
Serious allergic reactions | 100% risk over 2 years | -36.34 | 4.10 | -39.79 | 4.07 | -47.87 | -39.77 | -31.88
Depression | 100% risk over 2 years | -5.81 | 0.77 | -5.16 | 0.76 | -6.66 | -5.16 | -3.67

Normalised preference weights (Bayesian analysis only):
Relapse | 1 relapse/year | N/A | N/A | 0.2% | 0.1% | 0.1% | 0.2% | 0.3%
Disability progression | 100% risk over 2 years | N/A | N/A | 2.8% | 0.2% | 2.5% | 2.8% | 3.2%
PML | 100% risk over 2 years | N/A | N/A | 86.5% | 1.1% | 84.3% | 86.6% | 88.6%
Allergic/hypersensitivity reactions | 100% risk over 2 years | N/A | N/A | 0.3% | 0.0% | 0.2% | 0.3% | 0.4%
Serious allergic reactions | 100% risk over 2 years | N/A | N/A | 9.0% | 0.9% | 7.3% | 9.0% | 10.9%
Depression | 100% risk over 2 years | N/A | N/A | 1.2% | 0.2% | 0.8% | 1.2% | 1.5%
Residual deviance | N/A | N/A | N/A | 94.3 | 3.5 | 89.5 | 93.6 | 102.7
The results appear to show that the Bayesian model is working as expected and fits the data well:
the Bayesian estimates of the utility coefficients are close to those in the frequentist analysis, and
the residual deviance is well below the number of individual participants in the choice experiment.
III.4.5 Discussion
Although implementing a Bayesian binomial logit choice analysis is not in itself novel, these results
show that it can be done successfully within the specific parameterisation that I am using in this
thesis. This forms the starting point for a unified model that can combine choice data with other
sources of preference information, an approach that will be pursued later in this chapter.
Although many criteria appear in both this dataset and the investigator ratings dataset, the utility
coefficients cannot be directly compared, as the absolute scale in the ratings model is arbitrary; and
neither can the weights, as the overall set of criteria is not the same. It is however straightforward
to renormalise the weight estimates within the set of shared criteria, putting them on a comparable
basis, as shown in Table 25 (which is based on the median weights).
Table 25 – Comparison of criteria weights in the investigator ratings and patient choices datasets.
Criterion | Median renormalised weight (investigator ratings) | Median renormalised weight (patient choices)
Relapse | 14.5% | 0.03%
Disability progression | 22.5% | 3.05%
PML | 60.7% | 93.91%
Allergic/hypersensitivity reactions | 2.3% | 3.01%
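The renormalisation itself is straightforward arithmetic (a sketch with hypothetical weights; the figures in Table 25 come from the actual analyses):

```python
def renormalise(weights, shared):
    """Rescale the weights of a shared subset of criteria so they sum to 1."""
    total = sum(weights[c] for c in shared)
    return {c: weights[c] / total for c in shared}

# Hypothetical full weight set; renormalise over the three shared criteria
weights = {"relapse": 0.10, "disability": 0.15, "pml": 0.35, "other": 0.40}
shared = ["relapse", "disability", "pml"]
print(renormalise(weights, shared))
```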
A few conclusions are immediately obvious:
• PML has the highest weight in both models/datasets, as expected;
• The weights for PML and allergic/hypersensitivity reactions are of a similar order in both
datasets;
• The weights for disability progression and relapse are considerably lower in the patient
choice dataset – and in the case of relapse, the difference is extreme.
To recap on the chapter so far: Bayesian models for eliciting preferences based on individual ratings
data and individual choice data have been demonstrated and fitted using a shared parameterisation.
Three sets of individual elicitation data, originating from different methods, have been analysed.
The evidence so far on the homogeneity/heterogeneity of preferences has been mixed.
Later in this chapter, a more formal approach to comparing and combining these preference data
sources will be attempted. The next section, however, looks at another potential data source -
published elicitation studies - and investigates whether a meta-analysis of preferences is possible.
Chapter III.5
III.5 Bayesian meta-analysis of preferences
Just as decision makers frequently rely on evidence synthesis of existing studies to provide clinical
data instead of carrying out their own studies, they may also wish to rely on externally elicited
preferences. There could be many reasons for this - for example, elicitation exercises can be difficult
to design, take time and money to carry out, and it may be difficult to get access to enough of the right
kind of participants. Even if one has original elicited data available for analysis, it may sometimes be
desirable to borrow strength from external studies. An exploration of the homogeneity/heterogeneity
amongst preference elicitation studies may also be of interest in its own right. All this suggests a need
for a meta-analytical methodology that can aggregate the results of multiple elicitation studies;
however, no such methods have been described.
At least one review of patient preference studies in a particular disease area has been published, but
it was a qualitative analysis with no attempt made at numerical aggregation of the data 196. The
authors stated that “given the high degree of variability across choice-based preference-elicitation
studies with regard to design, sample composition, statistical analyses, and research questions, and
the relatively small literature, a meta-analysis was deemed untenable”. Some of these points are
beyond the meta-analyst’s control, but with appropriate methodology (to be explored in this
section) it may be possible to overcome some of the bigger issues relating to the heterogeneity of
studies’ design and statistical methods. The number of preference studies in the literature is always
increasing, and it does not seem infeasible that in some disease areas a large enough set of studies
from a relatively homogeneous population, suitable for meta-analysis, may be available soon if not
already. As interest in quantitative methods using preferences increases, and analysts and decision
makers seek to make sense of a growing volume of data, I anticipate an imminent upsurge in
demand for meta-analyses of preferences.
Since medical decisions will often involve more than two criteria, and in general not all of the source
studies will feature every relevant criterion, any meta-analytical method would need to be flexible
enough to handle problems and datasets with arbitrary dimensions, as well as a range of study
designs and analyses. This section attempts to develop a suitable model.
III.5.1 Data structure
III.5.1.1 Network-level constants
Let $\{1, \dots, \Omega\}$ be a set of outcomes, and $\{1, \dots, N_{PS}\}$ a set of preference elicitation studies.
III.5.1.2 Study-level constants
For each preference elicitation study $i \in \{1, \dots, N_{PS}\}$ the following constants describe the
dimensions:
$\mathit{NPC}_i \in \{1, \dots, \Omega\}$ — the number of criteria within study $i$. The criteria within study $i$
are ordered such that $\omega_{ij} \in \{1, \dots, \Omega\}$ refers to the $j$th criterion.
$\mathit{NP}_{ij} \in \{1, 2, 3, \dots\}$ — the number of individual points, levels or categories of criterion $j \in \{1, \dots, \mathit{NPC}_i\}$
for which a utility coefficient is provided in study $i$.
$\mathit{NL}_{ij} \in \{1, 2, 3, \dots\}$ — the number of levels of criterion $j$ for which a utility coefficient is
to be estimated by the model. Continuous criteria are assumed to have linear partial value
functions, so that $\mathit{NL}_{ij} = 1$; but for categorical criteria a separate coefficient is required for each
level, meaning that $\mathit{NL}_{ij} = \mathit{NP}_{ij}$.
III.5.1.3 Outcome-level preference data
The data consists of estimated coefficients from the utility models in the source studies and their
standard errors, denoted by 𝑐𝑖𝑗𝑘 and 𝜋𝑖𝑗𝑘 respectively for study i, outcome j, point k.
The coefficients often have to be transformed from their raw state in order to ensure consistency of
scale and category coding between studies. This is discussed in further detail in III.5.3.1 and III.5.3.2.
𝑙𝑖𝑗𝑘 ∈ 1,2, … ,𝑁𝐿𝑖𝑗 refers to the criterion level for which a utility coefficient is to be estimated at
the within-study coordinates criterion j, point k within study i. In the overall parameterisation of
preferences each level is represented as a separate criterion, denoted by 𝜔𝑖𝑗𝑘 ∈ 1,2, … , 𝑁𝑃𝐶𝑖𝑗.
Another variable 𝑥𝑖𝑗𝑘 is also used, containing (for continuous criteria) the value of criterion j to
which the utility coefficient 𝜋𝑖𝑗𝑘 relates. Categorical criteria are coded using dummy indicator
variables so that 𝑥𝑖𝑗𝑘 = either 0 or 1.
III.5.2 Dataset: RRMS
The method was applied to the relapsing remitting multiple sclerosis (RRMS) case study. The aim
was to perform a meta-analysis on the results of published patient preference elicitation studies
relating to the RRMS treatment outcomes and administration modes that have already been
encountered in the case study - namely, relapses, disability progression, liver enzyme elevation and
modes of administration.
Additional literature searches were carried out to identify utility/preference elicitation studies. Only
studies reporting patient preferences (as opposed to the preferences of some other group, such as
clinicians) were included. Each study was screened to determine which criteria had been included,
how these were defined and whether sufficient information was reported to enable inclusion in the
analysis. For further details see Appendix A.
The criteria that were included and the assumed form of their partial value functions are shown in
Table 26.
Table 26 – RRMS case study outcomes for the preference synthesis module. PVF = partial value function
Outcome | PVF type | Units / levels
Relapse frequency | Linear | Relapses per year
Risk of disability progression | Linear | Risk of progression
Route and frequency of administration | Categorical | Daily oral; 1-3x weekly intramuscular or subcutaneous; monthly intravenous
The precise categories used for administration modes vary between studies, reflecting the diversity
of treatment regimens available. In order to ensure a reasonable amount of data per category, and
for compatibility with the treatments in the RRMS case study and the ratings datasets, it was
decided to pool all self-administered injection methods (i.e. subcutaneous and intramuscular) at
frequencies from once weekly to once every 2 days (or thrice weekly).
The studies that contributed data to each case study are shown in Table 27.
Details of the search and screening process are set out in Appendix A.
Table 27 – Source studies for the RRMS dataset for the preference synthesis module.
RRMS study | Outcomes used | Number of participants
ARROYO 197 | Relapse frequency; disability progression risk; route and frequency of administration | 221
GARCIA-DOMINGUEZ 198 | Route and frequency of administration | 125
MANSFIELD 199 | Relapse frequency; route and frequency of administration | 301
POULOS 200 | Relapse frequency; disability progression risk | 189
UTZ 201 | Route and frequency of administration | 156
WILSON 2014 202 | Relapse frequency; disability progression risk; route and frequency of administration | 291
WILSON 2015 203 | Relapse frequency; route and frequency of administration | 50
III.5.3 Data extraction
Care must be taken in extracting the estimated coefficients from the source studies as outcomes
may be expressed on different scales and/or using different coding conventions. These factors can
affect both the interpretation of the coefficients and their statistical properties, as detailed below.
III.5.3.1 Categorical or continuous
Assuming that utility is linear in any continuous criteria (perhaps after a transformation) is a fairly
commonplace practice. However, many studies (particularly discrete choice experiments) instead
allow for non-linearity by estimating separate utility coefficients for a number of discrete levels of a
criterion (i.e. treating it as a categorical variable). In principle the same approach could be used in a
meta-analysis, but unfortunately studies will tend (more often than not) to use different discrete
levels, rendering the results incompatible. For meta-analytical purposes, therefore, it will often be
particularly convenient to assume linearity (or, perhaps, some other continuous
monotonic function with a simple parameterisation) after having examined the data to check that
such a relationship appears appropriate. Where a study reports a linear coefficient it can be used as
is; where a study reports discrete levels, these can be fitted within the linear framework by
performing a linear regression on the discrete points within the study. This regression can be
incorporated within the main model, as will be shown in III.5.5, so technically this is a form of
meta-regression 204,205.
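This within-study linearisation can be sketched as an inverse-variance weighted regression through the origin (a simplification that assumes utility is zero at the reference level and, for the slope's standard error, that the reported coefficients are independent; the correlation induced by dummy coding is the subject of III.5.3.2. All input values are hypothetical):

```python
import math

def linear_slope(levels, coefs, ses):
    """Inverse-variance weighted least-squares slope through the origin,
    fitted to utility coefficients reported at discrete criterion levels."""
    w = [1.0 / se**2 for se in ses]
    num = sum(wi * x * c for wi, x, c in zip(w, levels, coefs))
    den = sum(wi * x * x for wi, x in zip(w, levels))
    slope = num / den
    slope_se = math.sqrt(1.0 / den)  # valid only if the coefficients are independent
    return slope, slope_se

# Hypothetical study reporting utility coefficients at three discrete risk levels
levels = [0.01, 0.05, 0.10]       # e.g. risk of an adverse outcome
coefs = [-0.11, -0.52, -1.05]     # reported utility coefficients at those levels
ses = [0.05, 0.06, 0.08]          # their standard errors
slope, se = linear_slope(levels, coefs, ses)
print(round(slope, 2), round(se, 2))
```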
III.5.3.2 Coding schemes
When fitting regression models with categorical predictors, there is more than one coding scheme
that can be used to represent the categorical data.
III.5.3.2.1 Dummy coding
Dummy coding is probably the most widely used coding method. For a categorical variable $X$ with $n$
categories, dummy coding involves choosing a category to act as the “baseline” or “reference”.
Order the categories such that category 1 is the reference, and create $n-1$ indicator variables
$X_2, \dots, X_n$ such that $X_i = 1$ for an individual observation in category $i$ and 0 otherwise, as shown in
the example in Table 28 for $n = 4$. There is no indicator variable for the reference category 1, as the
dependent variable for observations in this category is reflected in the intercept term of the
regression equation. Including an extra variable for category 1 would add an unnecessary degree
of freedom to the model and hinder estimation.
Table 28 – Example of a 4-category variable and its dummy-coded indicator variables
X (original category) | X2 | X3 | X4
1 | 0 | 0 | 0
2 | 1 | 0 | 0
3 | 0 | 1 | 0
4 | 0 | 0 | 1
Because the intercept reflects the dependent variable in the reference category, the regression
coefficient $\beta_i$ corresponding to $X_i$ ($i > 1$) is interpreted as the difference in the dependent variable
between category $i$ and the reference category. As such, the estimated coefficients for the indicator
variables $X_i$ share a common baseline, and are therefore fundamentally correlated; this will now
be formalised in Theorem 4. The theorem and its proof are largely applicable to the general
regression context, but will be set out here in terms of DCEs so that the interpretation of the
assumptions is clear. There are four key assumptions that make the correlations easily tractable:
• independence of observations;
• equal variance of the dependent variable (i.e. perceived utility) in all predictor categories;
• level balance in the choice experiment design (i.e. the categories of the various predictor
variables occur with equal frequency in the choice sets); and
• orthogonality in the choice experiment design (i.e. there is no confounding between
different categories or variables in the choice design – in other words the effect of each
variable can be estimated independently of the level of other variables).
The first two are typical assumptions underlying almost all DCEs; level balance and orthogonality, on
the other hand, are desirable (but not universal) properties of well-designed DCEs 206.
Theorem 4: In an orthogonal level-balanced DCE, assuming independent observations and equal
variance of utility in all categories, under dummy coding the correlation between coefficients for
distinct levels of the same variable is 0.5 and the correlation between coefficients for different
variables is 0.
Proof:
Assume initially that all predictor variables are categorical.
Suppose that X is the predictor variable (with m categories) and X2,...,Xm are its dummy-coded
indicator variables for which we are trying to estimate the regression coefficients β2,..., β m. We can
assume without loss of generality that there is one other predictor Z with n categories (since if X is
the only predictor, then Z can still be said to exist with (trivially) n=1; and if there is more than one
other predictor with numbers of categories given by n1, n2, n3,,... then they can be combined into a
single predictor Z with the number of categories given by n1*n2*n3*... i.e. all possible combinations
of the original categories. The coefficients of Z will be linear combinations of the original
coefficients, and vice versa). Let Z2,...,Zn be the dummy-coded indicator variables for Z with
coefficients γ2,..., γ n.
Let 𝑌𝑖𝑗 denote the average value of the continuous dependent variable Y (in this case, utility
expressed as the log odds of choice) for observations in X category i ∈ 1,…,m and Z category j ∈
1, … , 𝑛. In an orthogonal design, all combinations of categories occur equally often206; therefore
there are observations in every combination of categories and 𝑌𝑖𝑗 is well-defined. It follows from
the assumptions of equal variance of utility across categories, and equal frequency of categories
within the data (level balance), that 𝑣𝑎𝑟(𝑌𝑖𝑗 ) is a constant 𝜎2 for all i,j.
Within a Z category j, the estimated difference in Y (utility) between X category i and the reference
category is given by Y_ij − Y_1j. The estimated coefficient β_i (i ∈ 2,…,m) will correspond to the
average of these estimated differences across all Z categories j in proportion to their frequency in
the choice sets. Under the assumption of level balance, all categories of Z occur with equal
frequency and thus the estimate is a simple average,

β_i = (1/n) ∑_{j=1}^n (Y_ij − Y_1j).

Thus var(β_i) = var((1/n) ∑_{j=1}^n (Y_ij − Y_1j)) = (1/n²) ∑_{j=1}^n 2σ² (by independence of observations)

= 2σ²/n.
Similarly γ_j = (1/m) ∑_{i=1}^m (Y_ij − Y_i1) and var(γ_j) = 2σ²/m.

For distinct i_1, i_2 ∈ 1,…,m,

cov(β_i1, β_i2) = (1/n²) ∑_{j=1}^n cov(Y_1j, Y_1j) (by independence of observations) = σ²/n.

Therefore the correlation coefficient between β_i1 and β_i2 is (σ²/n)/(2σ²/n) = 0.5 as required.

cov(β_i, γ_j) = (1/(mn)) (cov(Y_ij, Y_ij) − cov(Y_1j, Y_1j) − cov(Y_i1, Y_i1) + cov(Y_11, Y_11))

= (1/(mn)) (σ² − σ² − σ² + σ²) = 0 as required.
If Z was constructed as a combination of predictors, then the original coefficients are linear
combinations of 𝛾 and therefore also uncorrelated with 𝛽.
Finally note that if either X or Z is analysed as a continuous linear variable, then each category
contributes an estimate of β_i/width_i (or γ_j/width_j), where width_i (or width_j) refers to the
difference in magnitude between level i (or j) and the reference. In other words, continuous linear
coefficients within DCEs are effectively linear combinations of categorical coefficients, and will
therefore exhibit the same lack of correlation between variables.
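The result can be checked empirically. The following simulation is an illustrative sketch (not part of the thesis analysis): it assumes unit cell variance and one cell mean per category combination, computes the dummy-coded estimates directly from the cell means exactly as in the proof, and recovers the stated correlations.

```python
import numpy as np

# Monte Carlo check of Theorem 4: in a level-balanced, orthogonal design with
# independent observations and equal cell variance, dummy-coded coefficient
# estimates for levels of the same variable have correlation 0.5, and
# estimates for different variables are uncorrelated.
rng = np.random.default_rng(0)
m, n, sims = 4, 3, 40_000            # m levels of X, n levels of Z

# One cell mean Y_ij per (X level, Z level) combination, unit variance.
Y = rng.normal(size=(sims, m, n))

# Estimates exactly as in the proof:
# beta_i = (1/n) sum_j (Y_ij - Y_1j),  gamma_j = (1/m) sum_i (Y_ij - Y_i1)
beta = (Y - Y[:, [0], :]).mean(axis=2)    # shape (sims, m); column 0 is zero
gamma = (Y - Y[:, :, [0]]).mean(axis=1)   # shape (sims, n)

r_same = np.corrcoef(beta[:, 1], beta[:, 2])[0, 1]   # same variable: ~0.5
r_diff = np.corrcoef(beta[:, 1], gamma[:, 1])[0, 1]  # different variables: ~0
```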
III.5.3.2.2 Effects coding
Effects coding is similar to dummy coding but instead of setting all the indicators to 0 in the
reference category, they are instead set to -1, as shown in the example in Table 29 for 𝑛 = 4. This
means that the effective regression coefficient for the reference category is minus the sum of the
coefficients for the other categories, i.e. β_1 = −∑_{i=2}^n β_i. Consequently ∑_{i=1}^n β_i = 0, so the
regression intercept reflects the dependent variable not in the reference category but in the "grand
mean" across all categories. This has an appealing symmetry and sometimes makes for a more
convenient interpretation, particularly when interaction terms are included, making effects coding
popular in some fields such as discrete choice modelling 166.
Table 29 – Example of a 4-category variable and its effects-coded indicator variables

| X (original category) | X2 | X3 | X4 |
| --- | --- | --- | --- |
| 1 | -1 | -1 | -1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 |
Although effects coding gives 𝑛 coefficients for 𝑛 categories, knowing any n-1 coefficients fully
determines the final one, since they sum to zero; there are still therefore only n-1 degrees of
freedom. This of course means that there must be correlations between the coefficients, as the
following theorem shows. As before, independence of observations, equal variance of the
dependent variable (i.e. perceived utility) in all predictor categories, level balance and orthogonality
are assumed.
Theorem 5: In an orthogonal level-balanced DCE, assuming independent observations and equal
variance of utility in all categories, under effects coding the correlation between coefficients for
distinct levels of the same variable is -1/(m-1) (where m is the number of categories) and the
correlation between coefficients for different variables is 0.
Proof:
The assumptions and notation are as described in Theorem 4 and the proof proceeds similarly.
Again, assume initially that there are two predictors X and Z, both categorical. This time the
regression coefficients are (for X) β_1,…,β_m and (for Z) γ_1,…,γ_n.
Let 𝑌𝑖𝑗 denote the average value of the continuous dependent variable Y (in this case, utility
expressed as the log odds of choice) for observations in X category i ∈ 1,…,m and Z category j ∈
1,…,n. In an orthogonal design, all combinations of categories occur equally often206; therefore
there are observations in every combination of categories and 𝑌𝑖𝑗 is well-defined. It follows from
the assumptions of equal variance of utility across categories, and equal frequency of categories
within the data (level balance), that 𝑣𝑎𝑟(𝑌𝑖𝑗 ) is a constant 𝜎2 for all i,j.
Within a Z category j, the estimated difference in Y (utility) between X category i and the "grand
mean" of X categories is given by Y_ij − (1/m) ∑_{k=1}^m Y_kj. The estimated coefficient β_i (i ∈ 1,…,m)
will correspond to the average of these estimated differences across all Z categories j in proportion
to their frequency in the choice sets. Under the assumption of level balance, all categories of Z
occur with equal frequency and thus the estimate is

β_i = (1/n) ∑_{j=1}^n (Y_ij − (1/m) ∑_{k=1}^m Y_kj).
Thus var(β_i) = var((1/n) ∑_{j=1}^n (Y_ij − (1/m) ∑_{k=1}^m Y_kj))

= var((1/n) ∑_{j=1}^n (((m−1)/m) Y_ij − (1/m) ∑_{k≠i} Y_kj))

= (1/n²) ∑_{j=1}^n [((m−1)/m)² σ² + ((m−1)/m²) σ²] (by independence of observations)

= ((m−1)/(mn)) σ².

Similarly γ_j = (1/m) ∑_{i=1}^m (Y_ij − (1/n) ∑_{k=1}^n Y_ik) and var(γ_j) = ((n−1)/(mn)) σ².
For distinct i_1, i_2 ∈ 1,…,m,

cov(β_i1, β_i2) = cov((1/n) ∑_{j=1}^n Y_i1j − (1/(mn)) ∑_{j=1}^n ∑_{k=1}^m Y_kj , (1/n) ∑_{j=1}^n Y_i2j − (1/(mn)) ∑_{j=1}^n ∑_{k=1}^m Y_kj)

= 0 − σ²/(mn) − σ²/(mn) + σ²/(mn) = −σ²/(mn).

Therefore the correlation coefficient between β_i1 and β_i2 is (−σ²/(mn)) / (((m−1)/(mn)) σ²) = −1/(m−1) as required.
cov(β_i, γ_j) = cov((1/n) ∑_{p=1}^n Y_ip − (1/(mn)) ∑_{p=1}^n ∑_{k=1}^m Y_kp , (1/m) ∑_{k=1}^m Y_kj − (1/(mn)) ∑_{k=1}^m ∑_{p=1}^n Y_kp)

= σ²/(mn) − σ²/(mn) − σ²/(mn) + σ²/(mn) = 0 as required.
The cases involving more than two predictors or linear coefficients follow using the same logic as for
Theorem 4.
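Theorem 5 can be checked in the same way as Theorem 4. The following simulation is an illustrative sketch (not part of the thesis analysis), computing the effects-coded estimates from unit-variance cell means as in the proof:

```python
import numpy as np

# Monte Carlo check of Theorem 5: under effects coding in a level-balanced,
# orthogonal design, the correlation between coefficient estimates for levels
# of the same variable is -1/(m-1), and between variables it is 0.
rng = np.random.default_rng(1)
m, n, sims = 4, 3, 40_000
Y = rng.normal(size=(sims, m, n))     # cell means, unit variance

# Effects-coded estimates as in the proof:
# beta_i = (1/n) sum_j (Y_ij - (1/m) sum_k Y_kj)
beta = (Y - Y.mean(axis=1, keepdims=True)).mean(axis=2)   # shape (sims, m)
gamma = (Y - Y.mean(axis=2, keepdims=True)).mean(axis=1)  # shape (sims, n)

r_same = np.corrcoef(beta[:, 0], beta[:, 1])[0, 1]   # ~ -1/(m-1) = -1/3
r_diff = np.corrcoef(beta[:, 0], gamma[:, 0])[0, 1]  # ~ 0
```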
It is worth stressing again that Theorems 4 and 5 do not deal with correlations between coefficients
that may arise due to:
• Correlations between the underlying preference parameters, or
• Lack of orthogonality or balance in the choice experiment’s design
Rather, they are addressing fundamental and unavoidable structural correlations between the
variables reported in the data - correlations that will be present even given fixed utility parameters
and perfect experimental design.
Another coding scheme sometimes used for categorical variables is orthogonal coding, but this is
unlikely to be encountered much in this context, as it is used to generate a set of user-defined
contrasts, useful for testing specific a priori hypotheses. This is unlike dummy or effects coding,
which provide a comprehensive set of contrasts relative to the same baseline and hence are more
suited to preference elicitation studies, where a full characterisation of preferences is the aim.
Orthogonal coding by definition generates uncorrelated coefficients, so it can be handled easily
enough if necessary.
It is straightforward to identify whether a study uses dummy coding or effects coding (if the text
does not specify) by examining the results for any given categorical variable; the former should have
a reference category clearly indicated and n-1 coefficients for n categories; the latter will have n
coefficients that sum to zero.
In order to design a model for preference meta-analysis the correlations described in Theorems 4
and 5 must be allowed for in the likelihood, and this is made much simpler if the coefficients in the
data all use the same coding scheme. Fortunately, it is fairly straightforward to convert between
coding schemes (or change the reference category), as detailed in the next section.
III.5.4 Data rebasing
Since the outcomes, units and coding schemes in the data may vary, it will usually be necessary to
“rebase” the data to some extent – that is, to transform the reported coefficients and their standard
errors onto a consistent basis for analysis.
To transform standard errors, knowledge of correlations between the reported coefficients is
required. The assumption throughout is that the source DCEs have perfectly level balanced and
orthogonal designs, meaning that the correlations are as described in Theorems 4 and 5. Many
popular DCE designs such as fractional factorial arrays have good balance and orthogonality206 but
this may not always be the case. If a study is not balanced and orthogonal then details of the study
design could in principle allow better estimates of the correlations to be calculated 207.
The headings below set out the various kinds of transformations that were found to be necessary in
the RRMS dataset. This is mainly a general walkthrough of the issues involved – for more specifics of
the data extraction and rebasing process used in the case study, see Appendix A.
III.5.4.1 Variables with discrete utility coefficients
This section covers both categorical variables and continuous variables for which coefficients have
been estimated at discrete levels (i.e. no overall linear trend coefficient is provided).
III.5.4.1.1 Changing the coding scheme
Converting reported coefficients from a study to a different coding scheme is essentially just a
matter of constructing simple linear combinations of the coefficients, but care must be taken with
the standard errors due to the correlations between estimates.
Suppose the source study reports a set of coefficients β_i (i ∈ 1,…,n) for categories i of some n-
category variable X. (If the study uses dummy coding, then β_i is a constant 0 for the reference
category.) The aim is to convert these coefficients to dummy coding with a reference category of
our choice – assume without loss of generality this is category number 1.

For each category i ∈ 1,…,n it is necessary to generate a coefficient β*_i that corresponds to the
change in utility when moving from category 1 to category i (by the definition of dummy coding).

In other words β*_i = β_i − β_1.
To derive the standard error of β*_i one can use the following formula, which follows from the basic
properties of variance and covariance:

var(β_i ± β_j) = var(β_i) + var(β_j) ± 2 cor(β_i, β_j) √(var(β_i) var(β_j))

where cor(β_i, β_j) is the correlation coefficient between β_i and β_j, recalling (as per Theorems 4 and 5)
that this takes the value 0.5 for dummy-coded coefficients and −1/(n−1) for effects-coded coefficients.
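As an illustration, the conversion and the error propagation above can be sketched as follows (the function name and interface are illustrative, not from the thesis):

```python
import numpy as np

def effects_to_dummy(beta, se, ref=0):
    """Convert effects-coded coefficients (one per category) to dummy coding
    with the given reference category, propagating standard errors via
    var(b_i - b_ref) = var(b_i) + var(b_ref) - 2*r*sd_i*sd_ref,
    where r = -1/(n-1) for effects-coded coefficients (Theorem 5)."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    n = len(beta)
    r = -1.0 / (n - 1)
    beta_star = beta - beta[ref]
    var_star = se**2 + se[ref]**2 - 2 * r * se * se[ref]
    var_star[ref] = 0.0          # the reference coefficient is fixed at zero
    return beta_star, np.sqrt(var_star)
```

For example, with three effects-coded coefficients (−0.6, 0.1, 0.5), each with standard error 0.1, the dummy-coded coefficients relative to category 1 are (0, 0.7, 1.1), each non-reference coefficient having variance 0.01 + 0.01 − 2(−0.5)(0.01) = 0.03.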
III.5.4.1.2 Recalibrating intercept to zero (continuous variables only)
Once all of the coefficients have been converted (if necessary) to dummy coding, it is convenient to
adjust the x_ijk values for any continuous variables originally reported in discrete levels so that no
intercept term is required in the subsequent meta-regression. This is achieved by subtracting the
value of x_ij1 (i.e. the reference level) from x_ijk for all points k.

In formal terms, one defines x*_ijk = x_ijk − x_ij1.
This amounts to a uniform shift in the x axis which does not affect the linear trend coefficient (since
dU/dx = dU/dx*) but eliminates the need to fit an intercept, since for the reference level both the
coefficient and the adjusted value x*_ij1 are zero.

For some criteria (such as time-to-event criteria) a value of zero may seem to have a dubious
interpretation, but note that the correct interpretation of x*_ijk = 0 is an increase of zero in the value
of x_ijk relative to a plausible baseline, and not x_ijk = 0 per se.
III.5.4.1.3 Continuous transformations (continuous variables only)
For a given criterion there are often several different outcome measures or units used by different
studies. In order to carry out a meta-analysis, the coefficients all need to be expressed using the
same measure. Converting between outcome definitions is often easy for discrete coefficients. It is
the levels to which a coefficient relates, rather than the coefficient itself, that need transforming.
Many simple transformations (such as a change of units) are entirely straightforward. The criterion
levels to which a coefficient relates (including the reference, if applicable) simply need to be
converted to the desired measure. For example, given a coefficient expressing the utility difference
between 212 degrees Fahrenheit and a reference level of 32 degrees Fahrenheit, one can easily see
that exactly the same coefficient relates to the difference between 100 and 0 degrees Celsius.
In some cases, additional assumptions may be required. The disability progression criterion in the
RRMS case study is sometimes expressed as a risk of disability progression over a particular time
horizon, and sometimes as the expected time until a disability progression event. One strategy for
converting between the two is to assume a constant disability hazard θ; the risk of progressing
within a time period t is then given 208 by 1 − e^(−θt). Indeed, this assumption and transformation
were used in preparing the RRMS dataset.
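A minimal sketch of this conversion (assuming the event time is exponentially distributed, so that the expected time equals 1/θ; the function is illustrative, not the thesis data-preparation code):

```python
import math

def progression_risk(expected_time, horizon):
    """Risk of disability progression within `horizon`, given the expected
    time to progression, under a constant hazard theta = 1/expected_time:
    risk = 1 - exp(-theta * horizon)."""
    theta = 1.0 / expected_time
    return 1.0 - math.exp(-theta * horizon)
```

For example, an expected time to progression of 5 years implies a 5-year progression risk of 1 − e⁻¹, i.e. about 63%.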
III.5.4.2 Variables with linear coefficients
This section covers variables for which linear utility coefficients are provided in the source data.
III.5.4.2.1 Linear transformations
It is straightforward to handle a linear transformation (such as a change of units) in a predictor
variable x that was analysed in linear fashion in the source study. In this case the reported linear
coefficient simply needs to be multiplied by the constant ratio of the interval sizes (i.e. the interval
width to which the transformed coefficient should relate divided by the interval width over
which the original coefficient was defined). This should reflect both the number and size of the units
in each interval. The standard error is multiplied in the same way (multiplicative scaling by a
constant is a basic property of any standard deviation).
A particularly simple example is a change of units: a utility coefficient for weight gain, say, expressed
as a linear coefficient per 1g increase in weight needs multiplying by 1000 (along with its standard
error) to obtain a coefficient expressed per kg.
Similarly, where coefficients are reported that do not correspond to unit intervals, this must be
taken into account. Many DCEs report linear coefficients corresponding to either end of an
interval centred at zero utility, e.g. +1.36 at x = 5kg and −1.36 at x = 1kg. This corresponds to a utility
difference of 2.72 over an interval of 4kg, i.e. a coefficient of 2.72/4 = 0.68 per kg.
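The endpoint-based rescaling in this example can be sketched as (an illustrative helper, not from the thesis):

```python
def per_unit_coefficient(coef_low, x_low, coef_high, x_high):
    """Convert a pair of utility coefficients reported at the two ends of an
    interval into a single linear coefficient per unit of x."""
    return (coef_high - coef_low) / (x_high - x_low)
```

Applied to the example above, per_unit_coefficient(-1.36, 1, 1.36, 5) recovers the coefficient of 0.68 per kg.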
In the model that follows the interval widths can be passed to the model as data, so it is not
important to standardise the widths, but the units in which they are expressed must of course be
consistent.
Where criteria are measured using count/rate outcomes the time horizon is part of the units. If
coefficients use different time horizons, it will often be appropriate to align them simply by linearly
scaling the event rate and coefficient in proportion with the time horizon. For example, a utility of
0.5 associated with an additional 1 relapse per month is equivalent to a utility of 0.5/12 associated
with an additional 1 relapse per year, assuming a constant event rate throughout the period
(although one should always consider whether any such extrapolation of time horizons is
appropriate).
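This time-horizon alignment is itself a linear rescaling, which can be sketched as follows (illustrative helper; it assumes a constant event rate, as noted above):

```python
def rescale_rate_coefficient(coef, old_horizon, new_horizon):
    """Re-express a utility coefficient per unit event rate over old_horizon
    as a coefficient per unit rate over new_horizon (same time units),
    assuming a constant event rate throughout the period."""
    return coef * old_horizon / new_horizon
```

For the relapse example, rescale_rate_coefficient(0.5, 1, 12) converts a coefficient per additional relapse per month to the equivalent 0.5/12 per additional relapse per year.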
III.5.4.2.2 Non-linear transformations
Non-linear transformations are problematic if all that is reported is a linear coefficient since (over an
arbitrary interval) linearity on both the original and transformed scale is impossible. Often, however,
linear coefficients are estimated using only two discrete points at either end of an interval; if these
points are known then it is possible to change the linear scale by transforming the individual discrete
values as per III.5.4.1.3 (if one is satisfied that interpolating on the new linear scale is appropriate).
If linear coefficients were estimated using more than two discrete levels, they are incompatible with
linearity on any other scale. In the RRMS dataset, for example, it is assumed that utility is linear with
regard to risk of disability progression, meaning that studies which assumed linearity with regard to
time until disability progression had to be excluded.
III.5.4.3 Combining categorical variables
In some instances it may be necessary to combine two or more categorical predictor variables from
a utility elicitation study to obtain the required effect. In the RRMS dataset, for example, most
studies use commonly encountered combinations of administration route and frequency to elicit
preferences with regard to administration. Two studies, however, elicited preferences for route and
frequency as two separate categorical variables 197,201. Assuming utility independence between
these two sub-criteria, and statistical independence between their estimated utility coefficients (i.e.
orthogonality of the source DCE), one can obtain the coefficients for the combined criterion by
adding (or subtracting) the coefficients for the appropriate levels of each sub-criterion, and (due to
independence) adding their variances (after changing the coding scheme of each sub-criterion if
necessary, as described above).
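Under these independence assumptions the combination reduces to a simple sum, sketched here (names are illustrative, not from the thesis):

```python
import math

def combine_subcriteria(coef_route, se_route, coef_freq, se_freq):
    """Combine dummy-coded coefficients for two independent sub-criteria
    (e.g. administration route and frequency) into a coefficient for the
    combined criterion: the means add and, by statistical independence of
    the estimates, so do the variances."""
    return coef_route + coef_freq, math.sqrt(se_route**2 + se_freq**2)
```

For example, combining coefficients 0.3 (SE 0.1) and 0.2 (SE 0.1) gives 0.5 with SE √0.02.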
III.5.5 Statistical model
III.5.5.1 The concept: a generalised network meta-analysis of preferences
The basic concept behind the model is to borrow the structure from network meta-analysis and
apply it in the context of outcome preferences. Essentially this involves using a "network" of
outcomes and estimating the relative preferences between them from previous preference elicitation
studies, analogous to the way in which relative treatment effects from individual clinical studies are
combined in a network meta-analysis.
In the linear utility framework already established in this chapter, the utility function is given by

U = ∑_ω e^(g_ω) x_ω

where the coefficients are estimated using an elicitation method such as those discussed earlier in
the chapter. Under this linearity assumption, the utility ratio between criteria ω_1 and ω_2, i.e. the
relative level of utility for a unit increase in ω_1 compared to a unit increase in ω_2, takes a constant
value e^(g_ω1)/e^(g_ω2) regardless of the starting level of either criterion. Note also that this quantity is
independent of the (arbitrary) absolute scale on which the utility function is expressed. This in turn
suggests that the utility ratio may be homogeneous regardless of the particular elicitation method or
exercise that was used in each study. Again, this is analogous to the typical situation in network
meta-analysis where it is assumed that the relative treatment effects are homogeneous between
studies even though the absolute outcomes may show considerable variation.
Network diagrams can be drawn analogous to those in traditional network meta-analysis, as shown
in Figure 57 for the RRMS case study. The thickness of the line connecting any two criteria is
proportional to the number of studies providing data for the relevant preference ratio.
Figure 57 - Network diagram of preference elicitation studies concerning relapsing remitting multiple sclerosis treatment outcomes.
During the process of screening and data extraction it became apparent that most of the eligible
studies were discrete choice experiments using the logit model (for more details see Appendix A).
The utility coefficients from these studies are all on the same absolute scale where the coefficients
represent log odds ratios of choice, and therefore their absolute values can be directly combined
without the need to focus on pairwise relative utility ratios. The absolute utility coefficients also
appeared to be just as homogeneous as the utility ratios. It was therefore decided that in this case
the meta-analysis could be carried out on this absolute utility scale, although this should be kept
under review in any future applications. To express preferences from other types of elicitation study
on the same scale for analysis, they are multiplied within the model by an overall scaling constant
which is assigned a vague prior.
The analysis is then essentially a simple matter of performing simultaneous meta-analysis on each
criterion’s (scaled) utility coefficient. However, the task is made more complicated due to (i) the
need to incorporate a meta-regression on the criteria values used in each study and (ii) the fact that
the source studies can estimate and report their coefficients in different ways with different
statistical properties, as already discussed in III.5.3 and III.5.4.
III.5.5.2 Model specification
The observed coefficients (after any necessary rebasing) are assumed to follow conditional Normal
distributions, at least in the marginal sense.
In formal terms, in the random preferences version of the model, for study i ∈ 1,…,N_PS,
criterion j ∈ 1,…,N_PO_i and discrete point k ∈ 1,…,N_P_ij, the observed utility coefficient c_ijk
conditional on the corresponding preference parameter e^(g_ω_ijk) and study-specific scaling constant ζ_i
has a Normal marginal distribution with mean e^(γ_(l_ijk)) x_ijk ζ_i and variance π_ijk²:

c_ijk ~ N(e^(γ_(l_ijk)) x_ijk ζ_i , π_ijk²)

where e^(γ_(l_ijk)) ~ N(e^(g_ω_ijk), (e^(g_ω_ijk) σ_pref)²) is the random study-specific utility coefficient on the
logit scale for level k of within-study criterion j (similar to the random preferences distribution in the
ratings model - see III.3.3) and ω_ijk refers to the global criterion whose preferences relate to the
level l_ijk. In the fixed preferences model, e^(γ_(l_ijk)) = e^(g_ω_ijk).
However, within levels of a criterion the coefficients will be correlated, as set out in III.5.4. For any
criteria with more than 2 levels it is necessary to allow for correlations of 0.5 (dummy coding) or
-1/(n-1) (effects coding) between pairs of coefficients for the same criterion within a study. In
principle either coding scheme can be used; for the sake of simplicity the model here will be
constructed based on dummy coding for all criteria with any rebasing having already been carried
out within the data. Correlations of 0.5 between coefficients must therefore be allowed for. This is
achieved using the same variance decomposition technique described earlier in II.4.4, with an
auxiliary variable representing the portion of variability that is shared between the coefficients.
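The idea behind the decomposition can be sketched as follows (an illustrative simulation, not the thesis model code): writing each coefficient as its mean plus √½·π times a shared auxiliary variable plus √½·π times an independent residual gives cov(c_1, c_2) = ½π_1π_2, and hence correlation 0.5 whatever the individual variances.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.5])        # standard errors of two coefficients
sims = 200_000

u = rng.normal(size=(sims, 1))   # auxiliary variable: component shared
                                 # between the coefficients of one criterion
e = rng.normal(size=(sims, 2))   # independent components

# Mean omitted for clarity; half the variance comes from the shared term.
c = np.sqrt(0.5) * pi * u + np.sqrt(0.5) * pi * e

r = np.corrcoef(c[:, 0], c[:, 1])[0, 1]   # ~0.5 despite unequal variances
sds = c.std(axis=0)                       # ~pi: total variances preserved
```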
III.5.5.3 Priors
Priors must be specified for the preference strengths (or utility coefficients) and random preference
standard deviation; these are the same as set out for the ratings model in III.3.3.
It is also necessary to specify a prior for the study-specific scaling constants ζ_i. For logit-model
choice studies, ζ_i is assigned a fixed value of 1; for other types of study the prior used here is a vague
folded-Normal prior, i.e. ζ_i ~ N⁺(0, 100).
III.5.6 Results
III.5.6.1 Published RRMS studies only
Table 30 and Table 31 show summary statistics from the posterior distributions of the key
parameters and the residual deviance for the fixed and random effects versions of the model
respectively.
Table 30 - Posterior distribution of preferences in published RRMS preference elicitation studies; fixed preference model

FIXED PREFERENCES: 7 studies, 28 coefficient estimates in total

| | unit | mean | SE | 2.5% | median | 97.5% |
| --- | --- | --- | --- | --- | --- | --- |
| *Utility coefficients on choice scale (effect on log odds of choice)* | | | | | | |
| Relapse rate | 1 relapse/year | -1.085 | 0.041 | -1.167 | -1.085 | -1.005 |
| Disability progression | 100% risk | -1.465 | 0.063 | -1.590 | -1.465 | -1.343 |
| Daily oral vs daily subcutaneous | N/A | 0.851 | 0.034 | 0.785 | 0.851 | 0.919 |
| Monthly infusion vs daily subcutaneous | N/A | 0.461 | 0.023 | 0.417 | 0.461 | 0.506 |
| Weekly intramuscular vs daily subcutaneous | N/A | 0.178 | 0.012 | 0.154 | 0.177 | 0.202 |
| *Normalised preference weights* | | | | | | |
| Relapse rate | 1 relapse/year | 4.4% | 0.2% | 3.9% | 4.4% | 4.9% |
| Disability progression | 100% risk | 31.9% | 0.6% | 30.7% | 31.9% | 33.1% |
| Daily oral vs daily subcutaneous | N/A | 25.0% | 0.3% | 24.4% | 25.0% | 25.7% |
| Monthly infusion vs daily subcutaneous | N/A | 13.6% | 0.7% | 12.3% | 13.6% | 14.9% |
| Weekly intramuscular vs daily subcutaneous | N/A | 5.2% | 0.3% | 4.6% | 5.2% | 5.8% |
| Residual deviance | N/A | 373.5 | 3.7 | 368.3 | 372.8 | 382.3 |
Table 31 - Posterior distribution of preferences in published RRMS preference elicitation studies; random preference model

RANDOM PREFERENCES: 7 studies, 28 coefficient estimates in total

| | unit | mean | SE | 2.5% | median | 97.5% |
| --- | --- | --- | --- | --- | --- | --- |
| *Utility coefficients on choice scale (effect on log odds of choice)* | | | | | | |
| Relapse rate | 1 relapse/year | -1.474 | 0.523 | -2.710 | -1.385 | -0.762 |
| Disability progression | 100% risk | -3.195 | 1.606 | -6.907 | -2.850 | -1.549 |
| Daily oral vs daily subcutaneous | N/A | 2.718 | 0.775 | 1.557 | 2.600 | 4.560 |
| Monthly infusion vs daily subcutaneous | N/A | 0.610 | 0.291 | 0.265 | 0.545 | 1.341 |
| Weekly intramuscular vs daily subcutaneous | N/A | 0.517 | 0.428 | 0.119 | 0.373 | 1.717 |
| *Normalised preference weights* | | | | | | |
| Relapse rate | 1 relapse/year | 20.3% | 6.0% | 10.3% | 19.7% | 33.9% |
| Disability progression | 100% risk | 42.2% | 9.7% | 26.1% | 41.1% | 64.9% |
| Daily oral vs daily subcutaneous | N/A | 37.5% | 8.5% | 20.8% | 37.4% | 54.9% |
| Monthly infusion vs daily subcutaneous | N/A | 8.4% | 3.5% | 3.6% | 7.7% | 17.4% |
| Weekly intramuscular vs daily subcutaneous | N/A | 7.1% | 5.6% | 1.7% | 5.2% | 23.0% |
| Between-study proportional preference standard deviation | N/A | 0.65 | 0.17 | 0.42 | 0.62 | 1.06 |
| Residual deviance | N/A | 45.6 | 6.5 | 34.9 | 44.9 | 60.2 |
In both datasets the residual deviance in the fixed preference model is at least an order of
magnitude too high, indicating very poor fit; the random preferences model fits better but the
residual deviance still exceeds the number of observations, suggesting excessive heterogeneity or
non-Normality of the random preferences.
Inspection of the study-level residuals (not shown) reveals that most of the deviance is associated
with one or two outlying studies in each dataset, which do not fit well with the assumed Normality
of the random preference distribution. Exclusion of these studies improves the overall fit (see III.5.7).
III.5.6.2 Published RRMS studies and PROTECT patient choice results
Table 32 shows the results from a random preference analysis of the RRMS dataset augmented by
the inclusion of an extra study: the PROTECT RRMS patient choice study, providing coefficients (for
relapse rate and disability progression) obtained from a frequentist analysis (as already shown
alongside the Bayesian results in
Table 24).
Table 32 - Posterior distribution of preferences in published RRMS choice studies and summary data from PROTECT patient choice study; random preference model

RANDOM PREFERENCES: 8 studies, 30 coefficient estimates in total

| | unit | mean | SE | 2.5% | median | 97.5% |
| --- | --- | --- | --- | --- | --- | --- |
| *Utility coefficients on choice scale (effect on log odds of choice)* | | | | | | |
| Relapse rate | 1 relapse/year | -1.349 | 0.473 | -2.517 | -1.259 | -0.706 |
| Disability progression | 100% risk | -7.042 | 2.777 | -13.880 | -6.488 | -3.686 |
| Daily oral vs daily subcutaneous | N/A | 2.651 | 0.799 | 1.496 | 2.522 | 4.565 |
| Monthly infusion vs daily subcutaneous | N/A | 0.601 | 0.309 | 0.250 | 0.528 | 1.397 |
| Weekly intramuscular vs daily subcutaneous | N/A | 0.524 | 0.443 | 0.115 | 0.376 | 1.769 |
| *Normalised preference weights* | | | | | | |
| Relapse rate | 1 relapse/year | 12.7% | 4.4% | 5.9% | 12.0% | 22.9% |
| Disability progression | 100% risk | 62.6% | 8.5% | 45.8% | 62.6% | 79.5% |
| Daily oral vs daily subcutaneous | N/A | 24.7% | 6.8% | 12.8% | 24.2% | 39.6% |
| Monthly infusion vs daily subcutaneous | N/A | 5.6% | 2.7% | 2.2% | 5.0% | 12.6% |
| Weekly intramuscular vs daily subcutaneous | N/A | 4.9% | 4.0% | 1.0% | 3.5% | 16.2% |
| Between-study proportional preference standard deviation | N/A | 0.72 | 0.18 | 0.47 | 0.69 | 1.16 |
| Residual deviance | N/A | 47.3 | 6.8 | 36.2 | 46.7 | 62.4 |
The PROTECT study appears to fit well with the majority of the external studies, since including it has
increased the between-study preference heterogeneity only slightly; also, the increase in the
residual deviance is only 1.6, which compares favourably to the 2 additional observed coefficients.
The main difference in the population average utility coefficients is the coefficient for disability
progression, which is more than doubled when the PROTECT study is included. The other utility
coefficients are not substantially altered.
III.5.7 Discussion
As elicitation studies and quantitative health decision models become more widespread it is likely
that many will be interested in comparing or combining elicitation results. This method is a natural
adaptation of meta-analysis that provides a mathematical framework for this task.
The data did appear to show a higher degree of heterogeneity than one would perhaps expect or
wish to see when carrying out a meta-analysis of this kind, although with only 7 source studies in the
dataset it is difficult to make a fair assessment.
The between-study heterogeneity of preferences in the random preferences model can be
measured by the corresponding (proportional) standard deviation parameter – in other words, the
typical proportional variation in utility coefficients between studies - which was estimated at 65% in
the RRMS case study. Univariate meta-analyses often report the between-study heterogeneity using
an 𝐼2 statistic, representing the proportion of variability that is due to true heterogeneity rather than
random chance. A multivariate version of this statistic has been proposed 209 and is calculated as
𝐼2 = (𝑄 − 𝑑𝑓 + 1)/𝑄 where df is the degrees of freedom in the model (i.e. the number of
parameter estimates in the data minus the number of parameters to be estimated) and Q is a
multivariate analogue 209 of the Cochran Q-statistic 210, which is equal to the residual deviance. Here
this gives an estimate of 𝐼2 = 50% in the RRMS case study, indicating moderate to high
heterogeneity. (When the PROTECT patient choice results are included, the estimate is slightly lower
at 𝐼2 = 47%.)
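The I² calculation described above can be sketched as follows (an illustrative helper; truncation at zero is added here because I² is conventionally reported as non-negative):

```python
def multivariate_I2(Q, df):
    """Multivariate I^2 as described in the text: I^2 = (Q - df + 1)/Q, where
    Q is the residual deviance (multivariate Q-statistic) and df the degrees
    of freedom; truncated below at zero."""
    return max(0.0, (Q - df + 1) / Q)
```

For example, Q = 40 with df = 21 gives I² = (40 − 21 + 1)/40 = 50%.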
One possible reason for the heterogeneity is that some of the source data were biased and/or
incompatible with one another for reasons related to study design and conduct - preference
elicitation studies are relatively novel in the field, and place unfamiliar demands on researchers and
patients alike. There are many methodological aspects of these studies – from the statistical design
and analysis to the way information is provided to participants – that could potentially colour the
results.
It must be acknowledged, of course, that poor study design and conduct are not the only possible
explanation for these findings. It may be that the assumptions do not hold, and the model’s
structure and distributions do not correctly reflect the population-wide distribution of preferences.
But even if this is so in these cases, it does not imply that these models can never be appropriate,
especially at the sub-group level. Encouragingly, most of the residual deviance in the RRMS dataset
was associated with two studies (POULOS 200 and WILSON 2014 202); removing these from the dataset reduces the 𝐼2 estimate to 20%. This suggests the presence of effect modifiers differentiating
these studies from the others; a cursory examination of the study reports revealed no obvious
demographic or design variables that could be responsible, but a detailed comparison has not been
undertaken. It may be that participants’ responses are affected by the way questions are framed
and the explanatory materials that are provided (for further discussion of this point see III.6.4).
There is too little data here to judge properly, but should the studies or populations appear to fall into more than one class, then an approach based on mixture models might potentially be appropriate 211.

Discrete choice and “rating-based conjoint analysis” (i.e. absolute scenario ratings) were the only methodologies represented in the source studies. In principle any
method that gives weights or preference strengths with standard deviations/errors can be used.
Any preference estimates without a measure of uncertainty would require additional
assumptions/modelling.
There were no included choice studies that used the probit (as opposed to logit) model, which
technically results in a different utility scale. If any such studies were to be included, they could all be
given the same scale parameter, and since it has been shown that the difference in results between
the two models is usually negligible 172, using the same scale parameter value for all choice studies
seems unlikely to be problematic.
No allowance was made for correlations arising as a result of imbalanced choice study designs. Any
such correlations can in principle be calculated given the full study design 212; incorporating them
would require an extension to the model.
Additionally, one could choose to allow for correlations among the (random) preferences for
different criteria, either within- or between-study. This has not been attempted at this stage in order
to keep the model simple, but the code could be adapted to allow for multivariate random
preference distributions using the kind of construction set out in II.4.4. Note that such statistical
dependence of preferences between individuals in a population is a different concept to utility
dependence, which would contradict the assumptions of the linear-additive MCDA model.
The data structure used for categorical criteria can also in principle accommodate continuous
criteria where the aim is to estimate the utility associated with a set of discrete levels (and hence
allow for non-linearity). It would be fairly straightforward to interpolate the utility between these
levels in, say, piecewise linear fashion. This does, however, go somewhat beyond the scope of this
thesis, where the focus is on linear MCDA.
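The interpolation mentioned above might be sketched as follows; the criterion levels and utilities used here are hypothetical, purely to illustrate the idea:

```python
def interpolate_utility(x, levels, utilities):
    """Piecewise linear interpolation of utility between a set of
    discrete criterion levels (levels must be sorted ascending)."""
    if x <= levels[0]:
        return utilities[0]
    if x >= levels[-1]:
        return utilities[-1]
    for x0, x1, u0, u1 in zip(levels, levels[1:], utilities, utilities[1:]):
        if x0 <= x <= x1:
            return u0 + (u1 - u0) * (x - x0) / (x1 - x0)

# Hypothetical: utilities elicited at 0, 1 and 2 relapses/year,
# interpolated at 1.5 relapses/year
print(interpolate_utility(1.5, [0, 1, 2], [0.0, -1.1, -2.4]))  # -1.75
```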
Preference elicitation in medical decision-making is still relatively novel and clearly more experience
– and larger datasets – are needed to see whether population preferences are homogeneous
enough to apply this particular method (or indeed, to apply MCDA in benefit-risk at all). The results
from these datasets are mixed, and more research in this area is needed. The consistency and
homogeneity within studies appears to be good, however, so if significant between-study
heterogeneity is confirmed it would be worth asking how it arises. The studies used here do vary
widely geographically, and RRMS varies widely in severity and in patient characteristics, so
straightforward heterogeneity may be to blame. It is also possible, however, that the psychological impact of the way questions are framed (e.g. cognitive biases of scale related to the levels used) or the supporting information (outcome glossaries, etc.) presented to participants may influence the results of any given study.
In providing a framework for comparing studies and making a broad assessment of their
heterogeneity, the model presented here at least presents a first step toward understanding such
issues. Should any particular study characteristic be suspected of influencing the results,
incorporating this effect via a meta-regression coefficient could provide direct evidence.
Issues of between-study heterogeneity notwithstanding, as a proof of concept this was largely a
successful exercise – the model has been clearly set out on paper and shown to function as
expected, with working code that converges and allows estimation of the parameters of interest.
Here we saw that the method could be used to combine external studies with the PROTECT choice
data – however, this required a two-stage analysis, with summary results from the first stage being
used in the second. In its current form this model could also not incorporate results from the ratings
datasets as they are not expressed on the same scale. In the next section a different model for
combining inferences from multiple preference datasets will be constructed, one capable of using
the full source data (choices and/or ratings) in the original format rather than just summary
statistics.
III.6 Combining preferences from different methods
Given the Bayesian models already described in III.3, III.4 and III.5 for individual elicited ratings,
individual choices and summary preference data from previous analyses respectively, the logical
next step is to establish a unified model or framework that can make combined inferences based on
two or more of those models.
The literature in this area is somewhat limited. Several studies have previously carried out parallel
tests of different methods without synthesising the combined results 174,213,192,214-216.
There is a class of models originating in marketing research, sometimes known as “hybrid conjoint”,
that seek to combine different preference data formats. However, the formats they focus on are
absolute scenario ratings, which are not being used here, and directly elicited weights and partial
values (i.e. not broken down into pairwise comparisons, making the elicitation cognitively more
taxing than methods that break down the problem like AHP or swing weighting). Furthermore most
of these models are not Bayesian and do not even incorporate the elicited data fully in the
likelihood. One model from this school does use a fully Bayesian parameterisation that is similar in
concept to my goal here 217 – but again the data formats are restricted to absolute scenario ratings
and directly elicited weights and partial values. Another example incorporates choice data, and also
uses a Bayesian approach 175, but again it does not incorporate the elicited data in the likelihood,
instead using it to construct the priors. None of these models, therefore, can carry out a full Bayesian analysis of all the types of preference data that have been encountered in this chapter.
Within the field of choice modelling, there was some limited early work on combining and
comparing different preference data formats, and the idea’s potential was noted 218. More recently
a paper by Zhang et al presented a combined analysis of choice and ranking data, with different (but
overlapping) sets of criteria 173. All of these examples are however non-Bayesian and therefore not
directly compatible with the models here. Choice studies have also been developed that
supplement the choice tasks with additional questions (such as rating the strength of preference for
the selected choice 25,26), but this type of approach does not allow information to be borrowed across altogether separate studies.
Attempts to marry different preference data formats have also been made in models that formulate
MCDA very differently, for example using fuzzy mathematics 219. This is a completely different
paradigm that is not compatible with a Bayesian approach and therefore beyond the scope of this
thesis.
A new model for joint Bayesian analysis of multiple preference data types is therefore required. In
the Bayesian MCMC context this is reasonably straightforward to construct; given the models
already developed in this chapter, one essentially just needs to append the likelihoods together
using the same set of preference parameters. (This is one advantage of the Bayesian MCMC
approach: it is easy to specify arbitrarily complex models). The basic parameterisation used so far
within this chapter is compatible with each type of data provided that any differences in scale are
allowed for. To recap:
• Preference parameters derived from choice data (in either summary or individual format)
are on a fixed scale related to individual choice probabilities
• Preference parameters derived from criteria ratings data are on an arbitrary scale.
Again the underlying principle is that we are ultimately concerned not with the absolute utility scale
but with the values of the utility coefficients, on a relative basis, which are assumed to be
homogeneous in the population.
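The "append the likelihoods" construction can be sketched outside BUGS as a joint log-likelihood that sums the component log-likelihoods over one shared coefficient vector. The component forms below are deliberately simplified stand-ins (a binary logit choice component and a Normal log-ratio ratings component), not the exact models of III.3–III.5:

```python
import math

def choice_loglik(coefs, choices):
    # Each record: (attribute differences, option A minus B; 1 if A chosen)
    ll = 0.0
    for delta_x, chose_a in choices:
        delta_u = sum(c * x for c, x in zip(coefs, delta_x))
        p = 1.0 / (1.0 + math.exp(-delta_u))  # logit choice probability
        ll += math.log(p if chose_a else 1.0 - p)
    return ll

def ratings_loglik(coefs, ratings, sd):
    # Each record: ((i, j), observed log preference ratio of criteria i and j),
    # modelled as Normal around the log-ratio of the shared coefficients
    ll = 0.0
    for (i, j), obs in ratings:
        mu = math.log(abs(coefs[i]) / abs(coefs[j]))
        ll += -0.5 * ((obs - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    return ll

def joint_loglik(coefs, choices, ratings, sd):
    # The shared preference parameters enter every component
    return choice_loglik(coefs, choices) + ratings_loglik(coefs, ratings, sd)

coefs = [-1.0, -2.0]
print(joint_loglik(coefs, [((1, 0), 0)], [((1, 0), 0.7)], sd=1.0))
```

In the MCMC setting the sampler then explores the posterior implied by this summed likelihood and the priors; no new machinery is needed beyond concatenating the components.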
III.6.1 Datasets
The RRMS case study will be used to test the approach. The aim is to generate preference weights
for (i) the outcomes synthesised in Chapter II and (ii) the administration route and frequency for
each of the treatments (as shown in Table 4). We have already come across usable preference data
for some of these, but not others, as shown in Table 33.
Table 33 – Treatment outcomes and administration modes for the RRMS treatments, and the availability of corresponding preference data. IR = investigator ratings dataset, PR = patient ratings dataset, PC = patient choices dataset, PS = preference synthesis dataset.
Preference data available?
IR PR PC PS
Criteria for which outcomes have been synthesised:
Relapse rate Yes
Risk of disability progression Yes
Risk of liver enzyme elevation Yes
Risk of serious gastrointestinal disorders No
Risk of serious bradycardia No
Risk of macular edema No
Administration routes and frequencies:
Daily oral (self-administered) Yes
Daily injection (self-administered) Yes
1-3x per week injection (self-administered) Yes
All four datasets therefore have something to contribute to the overall network. However, they also
contain several other criteria that are not directly relevant to the case study. To avoid having to deal
with too cumbersome a model it would be convenient to drop some of these irrelevant criteria if
possible. However, the following points should be considered:
• For choice data, it is not advisable to drop any criteria; the analysis should include a
parameter for every criterion in the original dataset, since the impact of any excluded
variables on choice may manifest as additional uncertainty on the remaining parameter
estimates.
• For ratings data and preference meta-analysis data, criteria can sometimes be discarded but
it is necessary to pay close attention to the connectivity of the network structure, as
illustrated in Figure 58.
Figure 58 – Example of combined preference network. The tree structure (top left) might arise from swing weighting; the web structure (bottom left) from AHP or preference meta-analysis. These combine to give a more complex overall structure (right). Suppose the aim is to obtain weights for criteria A, B, C and D. Then X should certainly not be removed as this would leave D disconnected. Y can safely be removed without affecting the estimated weights. Discarding Z would still permit the weights to be estimated but would discard useful data.
In addition there may be concerns over the validity of some data, for example with the RRMS patient
ratings for continuous outcomes where the scale was not made clear. Table 34 applies this logic to
the RRMS case study and sets out whether each of the criteria encountered in each preference
dataset should be included in the overall model.
Table 34 – Overall RRMS preference model: Criteria from each dataset for inclusion/exclusion
Dataset RRMS criteria to include RRMS criteria to exclude Notes
PROTECT patient ratings (see III.3.2.2)
Administration mode (daily oral, daily injection, weekly injection, monthly infusion)
Relapse Disability progression Liver enzyme elevation PML Herpes reactivation Seizures Flu-like reactions Infusion/injection
reactions Allergic/hypersensitivity
reactions Serious allergic reactions Depression
All continuous criteria excluded owing to doubts over validity due to unclear scale in elicitation questions.
PROTECT patient choices (see III.4.2)
Relapse Disability progression PML Allergic/hypersensitivity
reactions Serious allergic reactions Depression
(None) Validity of choice model relies on all criteria being included
PROTECT investigator ratings (see III.3.2.1)
Relapse Disability progression Liver enzyme elevation PML Allergic/hypersensitivity
reactions Infusion/injection
reactions Administration mode
(daily oral, daily injection, weekly injection, monthly infusion)
Herpes reactivation Seizures Flu-like reactions Congenital abnormalities
Many outcomes are not relevant and can be excluded as they are only located on isolated branches of the investigator value tree [CROSS REF] and hence provide no indirect information on other outcomes. Monthly infusion, PML and allergic/hypersensitivity reactions are retained because they feature in the patient ratings and/or patient choices, and thus may provide indirect evidence on the main criteria of interest. Infusion/injection reactions is retained because it provides the only link in the value tree between allergic/hypersensitivity reactions and the other criteria.
Published RRMS preference studies (see III.5.2)
Relapse Disability progression Administration mode
(daily oral, daily injection, 1-3x weekly injection, monthly infusion)
Various depending on source study (see Appendix A)
“Weekly injection” expanded to 1-3x weekly due to heterogeneous definitions in source studies.
Although messy, this very piecemeal collection of datasets is well suited for illustrating the method’s
ability to combine fragments of data from different sources.
III.6.2 Statistical model
The model simply combines the components we have seen previously in this chapter (i.e. a choice
model, a ratings model and a preference synthesis model) in order to specify the joint likelihood of
observing all of the adopted datasets. The underlying preference parameters to be estimated are
shared between all of the components but there is no need to allow for any correlation or
dependence between the likelihoods provided that no individuals took part in more than one of the
elicitation studies (which seems a reasonable assumption in this case).
As discussed earlier in the chapter, the various components should be compatible in terms of the
utility scale – if any choice data is included in one or more components (as in the choice and
preference synthesis models here) then the overall utility is fixed to the scale where a unit
represents the log odds of choice; if not then the scale remains arbitrary. Either way, normalised
weights can be obtained. In the RRMS case study, the presence of choice data means that the
overall utility remains fixed to the logit-of-choice scale.
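As an illustration, the normalisation can plausibly be sketched as dividing each absolute coefficient by the sum of the absolute coefficients of the top-level criteria (here relapse, disability progression and the daily-oral administration contrast); this reproduces the posterior-mean weights reported later in Table 35, but treat the exact normalisation as an assumption of this sketch:

```python
def normalised_weights(coefs, top_level):
    """Normalise utility coefficients into weights summing to 1 over the
    top-level criteria; other contrasts share the same denominator."""
    total = sum(abs(coefs[name]) for name in top_level)
    return {name: abs(c) / total for name, c in coefs.items()}

# Posterior-mean coefficients from the fixed-preference model (Table 35)
coefs = {
    "relapse_rate": -1.14,
    "disability_progression": -1.61,
    "daily_oral_vs_sc": 0.92,
    "monthly_infusion_vs_sc": 0.48,
    "weekly_im_vs_sc": 0.19,
}
w = normalised_weights(
    coefs, ["relapse_rate", "disability_progression", "daily_oral_vs_sc"])
print(round(w["relapse_rate"], 3))  # 0.311, i.e. ~31% as in Table 35
```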
III.6.2.1 Fixed/random preferences
The amount of heterogeneity to allow in the preference parameters is no longer a simple binary
choice between fixed and random preferences. There are several places in the model where a
“fixed” or “random” parameterisation can be used, as shown in Figure 59.
Figure 59 – Hierarchical structure of the preference data, indicating the levels where random preference distributions can be used.
This leads to a multitude of possible variations on the model; it would take up too much space (and
not be particularly informative) to go through them all here. This section will focus on two versions:
• Fixed preferences (every participant’s preferences are equal to the population average)
• Study-level random preferences only (preferences vary between studies but not between
participants in a single study)
Individual-level random preferences will not be used. The model selection is based partly on the
practical grounds that the choice model runs prohibitively slowly with individual-level data (see
III.4.3.2) but can also be justified on a statistical basis, since the results from the ratings datasets
indicate that the fixed-preference model fits equally well (and should therefore be favoured due to
its relative simplicity 119).
III.6.2.2 Combining the datasets
Each dataset uses a different set of criteria (although each overlaps with at least one other, giving a
connected network overall). To run the models simultaneously a master set of outcomes (the union
of the outcome sets for each dataset) is used to define the underlying preference strength
parameters; indexing vectors are then used to pick out the appropriate parameters for each analysis.
It is easier to specify the model if categorical variables follow the same coding scheme in every
dataset they appear in, but this may not always be strictly necessary.
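The indexing-vector device might look like this in miniature (criterion names illustrative):

```python
# Master set: union of the outcome sets across datasets; its order
# defines the shared preference-parameter vector
master = ["relapse", "disability", "pml", "liver", "depression", "admin_oral"]

datasets = {
    "patient_choices": ["relapse", "disability", "pml", "depression"],
    "patient_ratings": ["relapse", "liver", "admin_oral"],
}

# One indexing vector per dataset: the position in `master` of each of
# its criteria, so every component likelihood reads the shared parameters
index = {name: [master.index(c) for c in crits]
         for name, crits in datasets.items()}
print(index["patient_ratings"])  # [0, 3, 5]
```

In the BUGS code this corresponds to supplying integer index vectors in the data so that each likelihood component refers to the shared coefficient beta[idx[k]] rather than holding its own copy of the parameter (an assumption about the implementation; the actual code is in Appendix B).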
For continuous criteria, care needs to be taken to ensure the outcome units to which the
preferences relate are equivalent; simple transformations may be necessary, sometimes requiring
additional assumptions (see III.5.4.2 for a discussion of the issues). In this instance:
• In the ratings datasets all criteria weights are based on comparisons of individual events
over a timescale that is unspecified (but presumably equivalent for all criteria)
• In the choice and preference synthesis datasets relapses are measured as a 1-year average
(i.e. expected) rate while the other outcomes are expressed as risks (i.e. binary expectations)
over a 2-year period.
In the second instance the time horizon of the relapse criterion is half that of the other criteria, while in the first the time horizons are (presumably) equal. To adjust for this, the relapse utility coefficient estimates from the ratings datasets need to be doubled.
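The doubling is just a linear change of time units; generically (function name mine, assuming utility accrues linearly with exposure time):

```python
def rescale_to_horizon(coef, source_years, target_years):
    """Re-express a utility coefficient elicited over one time horizon
    on another, assuming effects accrue linearly with exposure time."""
    return coef * target_years / source_years

# A ratings-based relapse coefficient (per single relapse), doubled to
# match the choice data's unit of 1 relapse/year sustained over the
# common 2-year horizon (coefficient value hypothetical)
print(rescale_to_horizon(-1.1, source_years=1, target_years=2))  # -2.2
```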
III.6.2.3 Predictive distributions of preferences
In the results shown in this chapter so far I have focused on the population-level averages. As with
the clinical outcome synthesis in II.6.3.2, one can also allow for between- and within-study variability
in the posterior distributions if desired.
The study-level predictive distribution is obtained by drawing simulations from the random
preference distribution, with the between-study standard deviation estimated by the model. The
individual-level predictive distribution is obtained by drawing from a Normal distribution centred on
the study-level average. In a model with random preferences by participant, the standard deviation
of this distribution should be the within-study between-participant preference standard deviation.
The models fitted here all assume fixed preferences by participant, however, and so there is no
difference between the predictive distribution of preferences at the study level and participant level.
Suppose however for a moment that the assumption of fixed preferences within studies was
incorrect. Any true between-participant heterogeneity within studies in the preference meta-
analysis dataset would be reflected in the variances of the coefficient estimates, together with the
within-participant (between-choice) random variability of utility. An estimated upper bound for the
between-participant heterogeneity can therefore be derived based on the average variance of the
coefficients from the external preference studies (this parallels the approach used in Chapter II). For
consistency with the parameterisation used so far, the standard deviation for a given utility
coefficient is expressed as a constant proportion of the mean.
Any true between-participant heterogeneity within studies in the ratings datasets would be swept
up in the ratings standard deviation parameter, together with the within-participant random
variability of ratings. The ratings standard deviation can therefore be used as an estimated upper
bound for the within-study between-participant heterogeneity in the ratings datasets. It should
however be multiplied by 1/√2 (i.e. halving the variance, to account for the fact that a rating
consists of judgements on 2 criteria and the predictive distribution of a preference parameter
reflects the variability of only 1 criterion); note also that the use of this parameter means that the
predictive distribution should be defined on the log scale.
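Numerically the adjustment is just the following (a sketch; 1.2 is roughly the ratings standard deviation estimated later in Table 38):

```python
import math

def participant_sd_upper_bound(ratings_sd):
    """Upper bound on the within-study between-participant preference SD
    (log scale): halve the ratings variance, since a rating involves
    judgements on two criteria but a preference parameter only one."""
    return ratings_sd / math.sqrt(2)

print(round(participant_sd_upper_bound(1.2), 3))  # 0.849
```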
These upper bounds on the heterogeneity within the ratings and preference meta-analysis datasets
will be used in parallel to estimate the predictive distribution of preferences at the individual level.
Any true between-participant heterogeneity in the choice dataset, although it will be reflected in the
posterior distributions, cannot be so easily captured within the model (without additional
calculations) as it is not directly represented within any existing variables.
III.6.3 Results
The BUGS code and data used to generate these results are provided in Appendix B.
III.6.3.1 Patient choices and preference synthesis
Table 35 and Table 36 show the posterior distributions of the key parameters and variables from
models combining the patient choice dataset with the published preference studies, with fixed
preferences and random (by study) preferences respectively. Computing 200,000 iterations
(100,000 for burn-in and 100,000 for the posterior estimates) in OpenBUGS (version 3.2.2 rev 1063)
on a Microsoft Surface Book 2 (i5-8350U 1.70 GHz quad core) running Windows 10 took 201 seconds
for the fixed preference model and 275 seconds for the random preference model, which does not
seem excessive for an MCMC analysis.
Table 35 - Posterior distribution of preferences based on published RRMS choice studies and full data from PROTECT patient choice study; fixed preference model
FIXED PREFERENCES
7 summary-data studies with 28 coefficient estimates
1 full-data study with 124 participants and 1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.14 0.04 -1.22 -1.14 -1.06
Disability progression 100% risk -1.61 0.07 -1.74 -1.61 -1.49
PML 100% risk -266.0 24.7 -314.8 -265.9 -217.9
Allergic/hypersensitivity reactions 100% risk -0.67 0.14 -0.94 -0.67 -0.39
Serious allergic reactions 100% risk -31.31 3.32 -37.83 -31.30 -24.83
Depression 100% risk -2.39 0.65 -3.66 -2.39 -1.12
Daily oral vs daily subcutaneous N/A 0.92 0.03 0.85 0.92 0.98
Monthly infusion vs daily subcutaneous N/A 0.48 0.02 0.43 0.48 0.52
Weekly intramuscular vs daily subcutaneous N/A 0.19 0.01 0.17 0.19 0.22
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 31.0% 0.6% 29.8% 31.0% 32.2%
Disability progression 100% risk 44.0% 0.6% 42.8% 44.0% 45.1%
Daily oral vs daily subcutaneous N/A 25.0% 0.3% 24.3% 25.0% 25.7%
Monthly infusion vs daily subcutaneous N/A 13.0% 0.6% 11.8% 13.0% 14.3%
Weekly intramuscular vs daily subcutaneous N/A 5.2% 0.3% 4.6% 5.2% 5.8%

Choice model residual deviance N/A 491.9 5.7 481.1 491.8 503.4
Preference synthesis residual deviance N/A 379.3 6.2 370.1 378.4 393.8
Total residual deviance N/A 871.2 4.6 864.1 870.6 882.1
Table 36 - Posterior distribution of preferences based on published RRMS choice studies and full data from PROTECT patient choice study; random (by study) preference model
RANDOM PREFERENCES (by study; fixed within studies)
7 summary-data studies with 28 coefficient estimates 1 full-data study with 124 participants and 1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.47 0.56 -2.84 -1.36 -0.72
Disability progression 100% risk -7.29 3.08 -13.83 -6.71 -3.99
PML 100% risk -268.1 90.8 -491.1 -251.4 -140.7
Allergic/hypersensitivity reactions 100% risk -25.07 33.99 -121.20 -12.55 -0.43
Serious allergic reactions 100% risk -39.66 4.42 -48.40 -39.63 -31.11
Depression 100% risk -5.11 0.89 -6.87 -5.10 -3.37
Daily oral vs daily subcutaneous N/A 2.77 0.79 1.61 2.64 4.67
Monthly infusion vs daily subcutaneous N/A 0.63 0.32 0.27 0.56 1.44
Weekly intramuscular vs daily subcutaneous N/A 0.56 0.48 0.10 0.41 1.87
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 13.1% 4.7% 5.9% 12.4% 24.1%
Disability progression 100% risk 62.2% 8.4% 45.8% 62.1% 79.0%
Daily oral vs daily subcutaneous N/A 24.7% 6.6% 13.0% 24.2% 39.0%
Monthly infusion vs daily subcutaneous N/A 5.6% 2.7% 2.3% 5.1% 12.4%
Weekly intramuscular vs daily subcutaneous N/A 5.0% 4.1% 0.9% 3.7% 16.3%

Between-study proportional preference standard deviation N/A 0.65 0.14 0.45 0.63 0.98
Choice model residual deviance N/A 94.2 3.5 89.4 93.6 102.7
Preference synthesis residual deviance N/A 45.4 6.5 34.8 44.8 59.9
Total residual deviance N/A 139.7 7.4 127.2 139.0 155.8
Again the residual deviance (compared to the number of observations) reveals very poor fit for the
fixed preference model but good fit for the random preference model.
It is worth noting that the preference weight posteriors obtained from this (random preference)
model and dataset, which includes the full PROTECT choice data, are in close agreement with those
obtained in III.5.6.2, where the same data was included in summary form. This is a reassuring finding that indicates the overall model structure and parameterisation is behaving appropriately. The only substantial difference between the two sets of results is the residual deviance, which is higher here than in III.5.6.2 because the incorporation of full individual-level data creates more scope for observations to deviate from the within-study averages.
III.6.3.2 Full model – patient choices, investigator ratings, patient ratings and preference
synthesis
Table 37 and Table 38 show the posterior distributions of the key parameters and variables from
models combining all relevant preference data, with fixed preferences and random (by study)
preferences respectively. Computing 200,000 iterations (100,000 for burn-in and 100,000 for the
posterior estimates) in OpenBUGS (version 3.2.2 rev 1063) on a Microsoft Surface Book 2 (i5-8350U
1.70 GHz quad core) running Windows 10 took 567 seconds for the fixed preference model and 388
seconds for the random preference model, which again does not seem excessive for an MCMC
analysis of this kind.
Table 37 - Posterior distribution of preferences based on all preference datasets; fixed preference model
FIXED PREFERENCES 7 summary-data studies with 28 coefficient estimates 3 full-data studies with 163 participants, 231 ratings and
1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.09 0.04 -1.17 -1.09 -1.02
Disability progression 100% risk -1.59 0.06 -1.71 -1.58 -1.46
PML 100% risk -247.0 26.1 -298.5 -247.0 -196.0
Liver enzyme elevation 100% risk -71.52 48.70 -198.90 -59.46 -15.37
Allergic/hypersensitivity reactions 100% risk -0.75 0.13 -1.01 -0.75 -0.50
Serious allergic reactions 100% risk -32.77 3.34 -39.35 -32.75 -26.27
Depression 100% risk -2.37 0.65 -3.64 -2.37 -1.10
Infusion/injection reactions 100% risk -6.40 3.80 -16.17 -5.51 -1.86
Daily oral vs daily subcutaneous N/A -0.90 0.03 -0.97 -0.90 -0.83
Monthly infusion vs daily subcutaneous N/A -0.47 0.02 -0.52 -0.47 -0.43
Weekly intramuscular vs daily subcutaneous N/A -0.19 0.01 -0.21 -0.19 -0.16
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 2.1% 1.4% 0.5% 1.7% 5.8%
Disability progression 100% risk 3.0% 2.0% 0.8% 2.5% 8.4%
Liver enzyme elevation 100% risk 93.1% 4.6% 81.1% 94.3% 98.2%
Daily oral vs daily subcutaneous N/A 1.7% 1.1% 0.4% 1.4% 4.7%
Monthly infusion vs daily subcutaneous N/A 0.9% 0.6% 0.2% 0.8% 2.5%
Weekly intramuscular vs daily subcutaneous N/A 0.4% 0.2% 0.1% 0.3% 1.0%

Ratings standard deviation N/A 1.33 0.06 1.21 1.33 1.46
Ratings model residual deviance N/A 230.0 21.5 189.8 229.3 274.1
Choice model residual deviance N/A 535.8 6.0 525.0 535.4 548.8
Preference synthesis residual deviance N/A 379.4 6.2 370.1 378.5 394.1
Total residual deviance N/A 1145 22.3 1104 1144 1191
Table 38 - Posterior distribution of preferences based on all preference datasets; random (by study) preference model
RANDOM PREFERENCES (by study; fixed within studies)
7 summary-data studies with 28 coefficient estimates 3 full-data studies with 163 participants, 231 ratings and
1755 choices
unit mean SE 2.5% median 97.5%
Utility coefficients on choice scale; i.e. effect on log odds of choice
Relapse rate 1 relapse/year -1.62 0.62 -3.21 -1.48 -0.81
Disability progression 100% risk -7.26 2.21 -12.66 -6.85 -4.35
PML 100% risk -245.3 75.3 -431.6 -231.9 -139.2
Liver enzyme elevation 100% risk -21.22 23.23 -84.71 -13.88 -1.93
Allergic/hypersensitivity reactions 100% risk -5.92 7.22 -24.93 -3.64 -0.47
Serious allergic reactions 100% risk -39.48 4.39 -48.07 -39.46 -30.91
Depression 100% risk -5.10 0.88 -6.83 -5.10 -3.38
Infusion/injection reactions 100% risk -19.31 35.30 -107.50 -8.50 -1.01
Daily oral vs daily subcutaneous N/A -2.72 0.64 -4.21 -2.63 -1.72
Monthly infusion vs daily subcutaneous N/A -0.72 0.31 -1.50 -0.66 -0.33
Weekly intramuscular vs daily subcutaneous N/A -0.66 0.43 -1.76 -0.54 -0.17
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
Relapse rate 1 relapse/year 6.5% 3.6% 1.4% 5.8% 15.2%
Disability progression 100% risk 28.8% 13.5% 7.0% 27.5% 56.9%
Liver enzyme elevation 100% risk 53.7% 20.3% 14.9% 54.9% 88.1%
Daily oral vs daily subcutaneous N/A 11.0% 5.4% 2.7% 10.4% 23.0%
Monthly infusion vs daily subcutaneous N/A 2.9% 1.7% 0.6% 2.5% 7.0%
Weekly intramuscular vs daily subcutaneous N/A 2.6% 2.0% 0.4% 2.0% 7.9%

Ratings standard deviation N/A 1.20 0.06 1.09 1.19 1.32
Between-study proportional preference standard deviation N/A 0.58 0.09 0.44 0.57 0.79
Ratings model residual deviance N/A 229.9 21.4 190.1 229.3 273.7
Choice model residual deviance N/A 94.4 3.6 89.5 93.7 103.3
Preference synthesis residual deviance N/A 45.8 6.6 34.9 45.1 60.8
Total residual deviance N/A 370.1 22.7 327.8 369.4 416.2
Based on the residual deviances, the random preference model clearly achieves a much better fit to
the data than the fixed preference model. Introducing the ratings data has reduced the between-
study preference standard deviation slightly compared to the model in Table 36, and the preference
weights remain in roughly the same proportions (with the exception of liver enzyme elevation,
which was not included in the previous model since it only appears in the ratings data).
III.6.3.3 Predictive distributions of preferences
The preceding tables represent the posterior distributions of the population-average preference
parameters.
Figure 60 shows the posterior 95% credibility intervals of the preference weights from the full model
(with random preferences by study) at the following levels of predictive variability (see III.6.2 for
more details):
• Population averages
• Study-level averages; includes between-study variability
• Individual preferences (1); includes between-study variability and an upper bound estimate
of between-participant variability in the preference meta-analysis dataset
• Individual preferences (2); includes between-study variability and an upper bound estimate
of between-participant variability in the ratings dataset
For administration, only the overall weight (i.e. corresponding to the utility difference between daily
oral and daily subcutaneous) is shown.
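The way the three predictive levels nest can be illustrated with a small simulation. This is a hedged Python sketch, not the fitted OpenBUGS model: the coefficient values, the between-participant SD and the Normal-on-the-proportional-scale error structure are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population-average utility coefficients (relapse, disability
# progression, liver enzyme elevation, administration) and proportional SDs,
# loosely echoing the magnitudes reported here -- not the fitted values.
mu = np.array([0.065, 0.288, 0.537, 0.110])
tau_prop = 0.58    # between-study proportional SD
sigma_prop = 0.20  # assumed between-participant proportional SD

def normalise(u):
    """Convert utility coefficients to weights that sum to 1."""
    return u / u.sum(axis=-1, keepdims=True)

n = 10_000
w_pop = normalise(mu)  # population-average weights: no predictive variability

# Study-level draws: coefficients deviate from the mean by a Normal error
# proportional to the mean (clipped at zero to keep coefficients valid).
coef_study = np.clip(mu * (1 + tau_prop * rng.standard_normal((n, 4))), 1e-9, None)
w_study = normalise(coef_study)

# Individual-level draws add between-participant variability on top.
coef_indiv = np.clip(coef_study * (1 + sigma_prop * rng.standard_normal((n, 4))), 1e-9, None)
w_indiv = normalise(coef_indiv)

for name, w in [("study", w_study), ("individual", w_indiv)]:
    lo, hi = np.percentile(w[:, 1], [2.5, 97.5])
    print(f"{name}-level disability weight, 95% interval: ({lo:.3f}, {hi:.3f})")
```

As in Figure 60, each added layer of variability widens the interval around the population-average weight.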
Figure 60 – Forest plot showing the posterior predictive distributions of preference weights in the full RRMS preference model, at various levels of predictive variability. The two versions of the individual-level predictive distributions are based on upper bounds for the individual-level variance and so the width of the distributions may be overstated.
Increasing the level of predictive variability adds to the uncertainty in the weights, widening the
credibility intervals; it also appears to shift the means slightly towards the null (the point where all
criteria are equally weighted, i.e. 25% each in this example).
The upper bound estimate (1) for the between-patient heterogeneity in the preference meta-
analysis dataset only increases the overall uncertainty very marginally above that resulting from
between-study heterogeneity, supporting the assumption that preferences are fixed within studies.
The upper bound estimate from the ratings dataset has more of an impact, but the estimated
standard deviation parameters seen earlier in Table 23 suggest that most of this impact must have
been due to within-participant random errors rather than between-participant heterogeneity.
Overall therefore the assumption of fixed preferences within studies appears to be sound.
III.6.4 Discussion
Overall this has provided a successful initial demonstration of the method's capability to combine
preference data: coherent posterior distributions have been obtained which appear to reflect a
combined average of the individual analyses. For example, consider the estimated utility ratio of
disability progression to relapse: in the preference synthesis model this is 2.2 (III.5.6.1), in the
choice model it is 14.6 (III.4.4), and in the ratings model it is roughly two thirds (III.3.4.3,
remembering to double the relapse coefficient as per III.6.2.2). The model that combines the first
two estimates gives a ratio of 5 (III.6.3.1); bringing in the third estimate gives a ratio of 4.5
(III.6.3.2). In other words, the combined estimates always lie between the original estimates.
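This "in between" behaviour is exactly what precision-weighted pooling produces. A toy illustration (not the fitted model; the precision values below are hypothetical):

```python
import math

# Pool two independent estimates of the disability/relapse utility ratio on
# the log scale using precision (inverse-variance) weights. Whatever the
# weights, the pooled value must lie between the original estimates.
log_ratios = {"preference synthesis": math.log(2.2), "choice model": math.log(14.6)}
precisions = {"preference synthesis": 4.0, "choice model": 1.0}  # hypothetical

pooled_log = sum(precisions[k] * log_ratios[k] for k in log_ratios) / sum(precisions.values())
pooled_ratio = math.exp(pooled_log)
print(f"pooled disability/relapse utility ratio: {pooled_ratio:.2f}")
```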
It is also encouraging to see the equivalence of the weights obtained in III.5.6.3 and III.6.3.1, where
the same PROTECT source data is analysed in two different formats. In the first instance the
PROTECT data was included in summary form in the preference meta-analysis model; in the second
instance the PROTECT data was analysed in a binomial logit choice model and combined with the
results of the preference meta-analysis. This is evidence that the novel constructions of the
preference meta-analysis model and the combined preference model are consistent with the
established principles of choice models.
These models provide the basis for a framework to combine disparate sources of preference data, or
simply to make comparisons and evaluate heterogeneity. I believe this constitutes a useful step
forward in the field of preference modelling. Unlike most previous work in this area, the underlying
datasets do not need to be based on precisely the same set of criteria. The model has been built
with generalisability to different datasets in mind, hopefully facilitating further applications, but due
to the complexity of the overall structure some elements of the code would need adaptation in
order to be used elsewhere.
The model fit is generally poor for fixed preferences, but adequate to excellent for random
preferences at the study level. Insofar as attempts have been made to incorporate random
preferences at the individual participant level (i.e. in the ratings datasets) there seemed to be little
resulting improvement in model fit; further investigation of this issue in the choice dataset was
abandoned as the random preference model was found to run prohibitively slowly. As discussed
above, the evidence seems to suggest that the assumption of fixed within-study preferences was
appropriate here; nevertheless it might be interesting to examine the impact of allowing for random
preferences at the individual level in other datasets.
The between-study (proportional) preference standard deviation parameter was estimated at 58% in
the full model, a little less than the 65% estimate from the external preference studies alone. These
are proportional figures, i.e. the standard deviation of a utility coefficient is the given percentage of
its mean. This suggests that the preferences from the PROTECT datasets are consistent with the
external studies and the level of heterogeneity already observed among them. The multivariate 𝐼2
statistic209 is not straightforward to calculate for the full preference model, however, as the
Q-statistic no longer coincides with the residual deviance (since the latter also reflects within-study
heterogeneity).
There has been little other work on Bayesian parametric utility modelling within health-related
fields. One model worthy of note is that put forward by Musal & Soyer for directly analysing
utility coefficients, which uses an interesting parameterisation whereby the likelihood of the
coefficients is characterised using beta distributions220,221. As here, a Dirichlet prior is used. However,
the model takes as data utility coefficients with no error bounds, which ignores any random
error or uncertainty introduced by the original elicitation method (or even bias, as the results in
III.3.4 suggest). It would be interesting to see whether Musal & Soyer's parameterisation could
extend to the elicitation data formats used here, but this may not be straightforward.
In principle results from any elicitation methods involving either choices or cardinal relative ratings
can be accommodated in my model. One could also incorporate scenario rankings by expressing
them as a series of choices170. In principle it would not be difficult to adapt the model to incorporate
absolute (Likert) ratings of scenarios, which have been combined with choice data elsewhere 175,222.
This was not done here because no such data was available (indeed I am not aware of such
methodology having been used for preference elicitation with regard to treatment outcomes).
Methods for ordinal criteria ratings have also been proposed 51,223 and it is possible that these (or
similar) approaches could be incorporated into the framework if this form of data is available.
Another data type not included is that arising from best-worst scaling 224. Analysis of such data can
be fairly complex; nevertheless, probabilistic models do exist 171,225 and it may be possible to
incorporate them in this Bayesian framework in future.
The fixed preference model fits just as well as the random preference model when only ratings and
choices are combined (results not shown), which is not surprising as (excluding the small number of
investigator ratings) the studies that produced these datasets were carried out by the same study
team (the PROTECT Patient & Public Involvement team 193) and in the same population (multiple
sclerosis patients at a London clinic). It is when the external studies are brought in that the
heterogeneity increases.
Heterogeneity ultimately indicates that one or more effect modifiers (that is, variables that influence
the relative level of preference for different outcomes) are unevenly distributed between the
source studies. Any such variation between studies might relate to one or more of the following
types of effect-modifying variable:
• Population characteristics - the distribution of patients’ demographic and/or clinical
attributes may vary between studies.
• Geographical characteristics – physical factors such as climate and societal factors (such as
cultural attitudes or aspects of the healthcare system) may contribute to heterogeneity
when studies are carried out in distinct geographical locations.
• Study design factors – aspects of study design such as the type of elicitation task (choices vs
ratings, etc.), the way questions are framed, and the choice of criteria and levels can vary
between studies and may influence the preference estimates 40. Studies may appear to use
the same clinical criteria but define them as emerging over different time horizons. Any
disparity among studies with regard to the covariates that are used to adjust the results may
also have an impact. Different recruitment methods may result in studies having patient
groups with different characteristics even if conducted in exactly the same population.
• Study conduct factors – Elicited preferences may be influenced by the wording of
explanatory materials (instructions and glossaries) provided to participants and the
style/depth of involvement of facilitators 40. It seems likely that this may be a significant
contributor to heterogeneity of preference elicitation studies, both in the RRMS case study
and more generally, since the elicitation tasks can be cognitively demanding and require
extensive background knowledge, and participants will often refer extensively to any
provided guidance in order to help make sense of the tasks.
In the results shown in this chapter, the heterogeneity between studies is measured by the
between-study preference standard deviation, estimated (in the random effects model) as 58% on
the proportional scale. Assuming Normality, this implies that about 95% of studies will have
preference parameters within 116% of the overall mean, which might sound like quite a lot of
heterogeneity given that a 100% deviation corresponds to a doubling (or halving) of the parameter
value. However, bear in mind that 58% is considerably lower than the within-participant ratings
standard deviation, which was estimated at 101% on the equivalent scale (see III.3.4.3). In light of
this the between-study heterogeneity appears quite reasonable.
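The arithmetic behind these comparisons can be checked directly. A minimal sketch (the "within 116%" figure uses the approximate 2-SD Normal 95% range; 1.96 SDs gives a slightly tighter bound):

```python
# Quick numeric check of the proportional-SD statements in this section.
tau = 0.58    # between-study proportional preference SD (full model)
sigma = 1.01  # within-participant ratings proportional SD (III.3.4.3)

half_width_2sd = 2 * tau      # "within 116% of the overall mean" (approx. 95%)
half_width_196 = 1.96 * tau   # exact Normal 95% half-width, slightly tighter

print(f"2 * tau    = {half_width_2sd:.0%}")
print(f"1.96 * tau = {half_width_196:.1%}")
print(f"within-participant SD is {sigma / tau:.2f}x the between-study SD")
```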
These results from the RRMS case study are fairly encouraging, and hopefully go some way to allay
concerns about the validity of benefit-risk preference modelling at the population level. It is not
clear to what degree the heterogeneity represents true variability in preferences, or reflects aspects
of study design and execution. Note that many potential data sources (several external preference
studies and most of the PROTECT patient ratings dataset) had to be excluded due to the potential for
range insensitivity bias (see III.1.3.3); were it not for these exclusions the heterogeneity could have
been much higher. Equally many of the included studies may have had flaws that were not
recognised, and the heterogeneity may have been far less if all of the studies were perfect. It
remains probable that there is also some variability in preferences between segments of the
population, as has been found in other elicitation studies226-228.
One could see any heterogeneity as a problem for preference modellers, but my overriding view is
that the ability to directly assess and quantify the uncertainty on preference weights – and (in this
case at least) to find it within fairly reasonable limits given their subjective nature and the diversity
of study populations and elicitation techniques – will be something of a boost to the growing field of
benefit-risk assessment. Although there is noticeable heterogeneity between the RRMS preference
datasets, the weights do appear to represent a deeper truth than simply the arbitrary whims of the
study participants.
Nevertheless, attempts to aggregate/synthesise preferences for decision making purposes should try
to avoid heterogeneity whenever possible by ensuring that literature reviews include only those
studies that are most relevant to the problem and the target population. Where heterogeneity
cannot be avoided, it may be helpful to explicitly model the impact of any effect modifiers (using
meta-regression, for example) or to examine subgroups of studies/patients to identify more
homogeneous classes. If this is not possible, then random effects (preferences) models can be used
and measures of heterogeneity presented as part of the results, as has been done here.
Looking at the preference weights for the RRMS outcomes, one could argue that the preference
weight on liver enzyme elevation appears too high. On the face of it, these results appear to suggest
that a liver enzyme elevation event is more serious than a disability progression event, even though
the former may be transient and not translate into long-term liver damage, whereas the latter is
effectively permanent by definition. This is not an artefact of the methodology but a reflection of
the preferences expressed by the participants. It is worth bearing in mind that many patients and
clinicians may deem it more important to avoid harm by action (i.e. adverse events due to
treatment) than harm by inaction (i.e. lack of efficacy), as the latter is part of the natural course
of the disease. Whether such a view is rational is a philosophical point that I cannot provide a
general resolution to here, but such issues may need to be considered by decision makers when
weighing up decision criteria.
The equal weighting of criteria in the prior has had a clear impact on the results. This could be seen
when comparing the ratings model results with a deterministic analysis (see III.3.4). In the combined
preference model, although there is no deterministic analysis to serve as a comparison, assigning
unequal prior weights does substantially change the posterior estimates (results not shown). It would
therefore appear that the selection of priors is something that will need to be considered carefully in
Bayesian preference elicitation.
The impact of the between-study heterogeneity on the posterior distribution of the preference
weights can be seen in Figure 60. Another consequence of the heterogeneity is that the central
estimates of the preference weights can vary considerably depending on which datasets are
included, as shown in Figure 61. It is not yet clear to what extent the uncertainty/heterogeneity of
weights impacts upon the overall value of the different treatment options, however. This will be
explored in the next chapter.
Figure 61 – Preference weights (posterior means) for the key benefit-risk criteria, for three different combinations of the source datasets.
Ultimately the model can now provide estimated RRMS outcome weights based on all of the
available preference data of sufficient quality. A sense of the magnitude of preference
heterogeneity among the population has also been obtained, which will be important when
interpreting and assessing the importance of the overall results. The final step in assessing the
benefit-risk balance, then, is to combine the preference distributions with the treatment outcome
distributions that were estimated in Chapter II. This combination of performance data and
preference data is at the heart of MCDA, and will be the subject of Chapter IV.
IV. Assessing the overall benefit-risk balance
Having obtained the posterior distribution of treatment outcomes (Chapter II) and outcome
preferences (Chapter III) the final step in assessing the benefit-risk balance is to put these pieces
together and evaluate the overall utility associated with each treatment.
In essence all that is required is to combine the clinical outcomes synthesised in Chapter II with the
preferences elicited/synthesised in Chapter III, but the devil is in the detail of how this is done given
the particular outcome definitions that have been used in each source study. This section will go
through the issues encountered and strategy followed in the RRMS case study, much of which is
expected to also be relevant to other applications. But each benefit-risk assessment is different and
complex, with its own idiosyncrasies in the criteria and data, and it is difficult to set out a generalised
procedure or anticipate all of the issues that may be encountered in other applications.
The “zeroes” outcomes in the RRMS case study (see II.4.7) will not be used here since (i) no
preference data was available and (ii) this would rather bias the benefit-risk assessment against
fingolimod, the only treatment with which two of these outcomes are associated. These outcomes
were only included in Chapter II to illustrate the model’s capability to synthesise such treatment-
specific adverse events.
Chapter IV.1
254
IV.1 Methods
IV.1.1 High level model structure
Figure 62 shows the overall modelling structure, putting together the clinical evidence modules from
Chapter II and the preference module from Chapter III.
Figure 62 – High level structure of the entire benefit-risk assessment model.
IV.1.2 Selection of outcomes and model versions
In the RRMS case study, a key issue to be settled is the question of which outcome definitions to use.
The preferences have been defined at the criteria level, but most of the criteria can be measured by
two or three different outcomes (as shown in Figure 10).
For the clinical evidence synthesis, fixed mappings with one group for each criterion will be used (i.e.
three mapping groups). This means that whichever outcome is chosen to represent a criterion in the
BR assessment, information on the other outcomes for that criterion is also incorporated, and
treatment rankings for that criterion are always the same. Applying the mappings this way achieves
what I would argue is a reasonable degree of “smoothing” (i.e. strongly within criteria, but not
between criteria) and can be thought of as an insurance policy against selecting the “wrong”
outcome and hence overlooking key trends in another similar one. However, this strategy may not
be viable in all cases, as it depends on the pattern of missingness/patchiness in the evidence
synthesis data; sometimes different approaches to mapping will be necessary. Moreover, this
mapping strategy does not completely address the problem. The rankings within a criterion may be
unaffected by the choice of outcome, but the magnitude of (cardinal) differences between
treatments is still very much outcome-dependent. If we are to quantify the level of preference
associated with these differences, then care needs to be taken to ensure the preferences are
assigned to the same outcome as that to which the elicited preferences originally related (or as close
as possible).
It is of course impossible to know exactly what outcome definitions participants had in their heads
during an elicitation exercise, but it is usually possible to access the wording of the elicitation
questions together with any definitions, notes or glossaries that were provided. The suggested
strategy therefore is to use whatever outcome definition (out of those for which clinical outcome
data can be synthesised) seems to best fit the wording in the elicitation materials. Again, this will
require careful judgement on a case by case basis. Table 39 sets out the logic that was applied for
the criteria in the RRMS case study.
Table 39 – Identification of outcomes to which preferences relate for the criteria in the RRMS case study.

Criterion | Outcomes | Outcome to which preferences assumed to relate, and why
Relapse | Annual relapse rate (ARR); relapse-free proportion over 2 years (RFP) | ARR – this measure was implied by the elicitation questions in the PROTECT datasets and by the reporting in the external studies.
Disability progression | Proportion experiencing disability progression, confirmed 3 months later (DP3); proportion experiencing disability progression, confirmed 6 months later (DP6) | DP3 – this is the definition more commonly adopted within the source studies.
Liver enzyme elevation | Proportion with ALT above upper limit of normal range (ALT1); proportion with ALT above 3x upper limit of normal range (ALT3); proportion with ALT above 5x upper limit of normal range (ALT5) | ALT1 – the elicitation questions did not specify the level at which enzymes were considered to be "elevated", but arguably this is the most literal interpretation.
It seems clear however that the interpretation of outcome definitions (especially imprecise ones) by
elicitation participants may vary, and I would suggest that attaching alternative outcomes to the
preferences should be a key focus for sensitivity analyses, especially in cases where there is
significant doubt such as the liver enzyme elevation outcomes in the RRMS case study.
The full preference model with random preferences by study will be used, as per III.6.3.2.
Since the absolute level of clinical outcomes may vary considerably between populations, particular
attention should be paid to the data sources used for the population calibration module so that the
result of the decision is appropriate to the levels of outcomes observed in the target population. It
would also be sensible to take similar caution with the data sources used for preferences, which may
vary in the population at large.
Another important question for a probabilistic benefit-risk assessment is the level of predictive
variability that is to be included – in other words, whether the aim is to assess the distribution of the
benefit-risk balance in terms of population-level averages, study-level averages or with full
individual-level variability. Here all three levels will be presented. It seems sensible to always be
consistent and use the same level of predictive variability for both preferences and clinical
outcomes. The individual-level predictive distributions use approximate upper bounds on the
individual-level variability, as described in II.5.3 for the clinical evidence synthesis and III.6.2 for
preferences (method 1 for individual preferences will be used here).
Once the outcomes, models and level of predictive variability have been chosen, the calculations
required in order to assess the overall benefit-risk balance of the treatments are straightforward:
1. Pick out the utility coefficient parameters (with the selected level of predictive variability)
from the preference module. Only select those that relate to the benefit-risk criteria, not
any preference parameters for other criteria in the datasets.
2. Normalise the parameters from step 1, i.e. convert to weights
3. For each treatment and criterion, the weighted partial utility is calculated as the product of
a. the weight from step 2; and
b. the selected outcome for that criterion, on the absolute scale (from the population
calibration module), with the selected level of predictive variability.
4. For each treatment, the overall utility or benefit-risk score is the sum across all criteria of its
weighted partial utilities from step 3.
Chapter IV.2
257
The overall benefit-risk score is on an arbitrary utility scale (which may include negative utilities) as I
am not following the convention of restricting utility to the interval [0,1]. For further discussion of
the utility parameterisation see III.2.1.1.
These calculations can easily be incorporated within the model, and this is strongly recommended as
it allows the exact posterior distribution of every calculated quantity to be reported. Rankings
based on the benefit-risk score, and the associated SUCRA statistics120 (see II.5.4), can (and should)
also be calculated in situ. Recall however that SUCRAs give no information on the magnitude (and
hence significance) of the differences in score, and so posterior summaries of the benefit-risk score
should also be presented.
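A minimal sketch of computing SUCRA from posterior score draws (toy data for three hypothetical treatments; in the thesis this is done in situ in BUGS):

```python
import numpy as np

rng = np.random.default_rng(1)

# SUCRA = (a - mean rank) / (a - 1), where rank 1 is best among a treatments;
# 1.0 means always ranked first, 0.0 always last.
scores = rng.normal([[-0.05], [-0.10], [-0.20]], 0.05, size=(3, 20_000))

# Rank treatments within each posterior draw (higher score = rank 1).
ranks = (-scores).argsort(axis=0).argsort(axis=0) + 1
a = scores.shape[0]
sucra = (a - ranks.mean(axis=1)) / (a - 1)
print(np.round(sucra, 3))
```

Note that, as the text cautions, the SUCRA discards the magnitude of the score differences: two treatments can have very different SUCRAs while their scores overlap almost completely.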
IV.2 Results
The BUGS code and data used to generate these results are provided in Appendix B.
The code is generalised at the “within-dataset” level in that it is not tailored to the dimensions of the
clinical evidence dataset, the PROTECT choice dataset, the PROTECT investigator ratings dataset, the
PROTECT patient ratings dataset or the preference network meta-analysis dataset. It is not
generalised to any combination of datasets, as the number of datasets here does not warrant it and
it would hinder the comprehensibility of the code.
Computing 200,000 iterations (100,000 for burn-in and 100,000 for the posterior estimates) in
OpenBUGS (version 3.2.2 rev 1063) on a Microsoft Surface Book 2 (i5-8350U 1.70 GHz quad core)
running Windows 10 took 1008 seconds, or just under 17 minutes – longer than for either the clinical
evidence synthesis or preference models in isolation, but still not unreasonably long from a practical
perspective.
IV.2.1 Benefit-risk scores
Table 40 shows the posterior distribution of each treatment's population-average overall utility (i.e.
the benefit-risk score), broken down to show the contribution of each criterion. Scores for
relapse, disability progression and liver enzyme are relative to a notional perfect treatment on which
these outcomes never occur. Scores for administration are relative to the baseline category (daily
subcutaneous injection).
The posterior total utility or benefit-risk score for each treatment is shown in Figure 63 for all three
levels of predictive variability.
Table 40 – Benefit-risk score by treatment, with breakdown by criterion. Figures are population-average posterior means and (standard deviations).

Treatment | Relapse | Disability progression | Liver enzyme elevation | Administration | TOTAL
Placebo | -0.045 (0.030) | -0.075 (0.041) | -0.055 (0.028) | 0.111 (0.055) | -0.066 (0.056)
Dimethyl fumarate | -0.024 (0.016) | -0.053 (0.030) | -0.081 (0.043) | 0.111 (0.055) | -0.049 (0.069)
Fingolimod | -0.022 (0.015) | -0.057 (0.032) | -0.164 (0.082) | 0.111 (0.055) | -0.134 (0.103)
Glatiramer acetate | -0.029 (0.019) | -0.055 (0.031) | -0.061 (0.031) | 0 (0) | -0.146 (0.039)
Interferon beta-1a (intramuscular) | -0.035 (0.023) | -0.062 (0.034) | -0.092 (0.048) | 0.027 (0.020) | -0.162 (0.052)
Interferon beta-1a (subcutaneous) | -0.033 (0.022) | -0.049 (0.029) | -0.155 (0.082) | 0.027 (0.020) | -0.210 (0.076)
Interferon beta-1b | -0.027 (0.018) | -0.046 (0.027) | -0.147 (0.076) | 0.027 (0.020) | -0.194 (0.073)
Laquinimod | -0.036 (0.024) | -0.055 (0.031) | -0.106 (0.054) | 0.111 (0.055) | -0.088 (0.076)
Teriflunomide | -0.031 (0.021) | -0.057 (0.033) | -0.102 (0.054) | 0.111 (0.055) | -0.082 (0.077)
Dimethyl fumarate has the highest average benefit-risk score, followed by placebo and then
teriflunomide and laquinimod. The three interferon-based medicines have the lowest scores.
Figure 63 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability. The markers and lines indicate the mean and 95% credibility limits. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Note the long left tail in the predictive distributions (especially at the individual level). This is
because in the model the annual relapse rate outcome has a lower bound of zero but no upper
bound. In reality however an RRMS patient cannot experience an unlimited number of relapses in a
year and it may be sensible to apply a cap to the modelled rates. Since this study concerns first-line
treatments, in Figure 64 the annual relapse rate has been capped at 3 on the assumption that any
patients with more severe relapse rates would be eligible for more aggressive second-line therapies.
Figure 64 – Posterior benefit-risk score for RRMS treatments at three levels of predictive variability, with a maximum of 3 relapses per year. The markers and lines indicate the mean and 95% credibility limits. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
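The capping adjustment amounts to truncating the posterior predictive relapse-rate draws before scoring. A sketch with hypothetical draws (the lognormal parameters are illustrative, not taken from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical posterior predictive draws of a patient's annual relapse rate.
arr_draws = rng.lognormal(mean=-0.7, sigma=1.0, size=100_000)

# Cap at 3 relapses/year before computing partial utilities, as in Figure 64.
arr_capped = np.minimum(arr_draws, 3.0)

print(f"uncapped 97.5th percentile: {np.percentile(arr_draws, 97.5):.2f}")
print(f"capped   97.5th percentile: {np.percentile(arr_capped, 97.5):.2f}")
```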
The first point to note is that there is considerable overlap in the credibility intervals, so no
treatment can be declared an outright winner or loser. This is particularly true of the study- and
individual-level predictive distributions (which are naturally wider due to the additional variability)
but there is also overlap between the population-average distributions. This should not be too
surprising: all of these drugs have proven their clinical value in trials and are in use by real-world
patients, but as with all disease-modifying therapies they can only ever have a limited impact on
RRMS symptoms. Furthermore these treatments are known to be in fairly close clinical equipoise,
with the appropriate choice of treatment for a patient depending on their own individual experience
of benefits and harms93. This is particularly relevant when considering the overall benefit-risk
balance (rather than any individual outcome), as the drugs with the highest efficacy may not
perform as well on safety. Although dimethyl fumarate has the best average benefit-risk score, the
extent of the overlap means that for any given patient (or subgroup of patients), it is possible that
any one of the treatments in the case study is the optimal one. As such, this analysis does not
present any compelling evidence for regulators to remove any of these treatments from the market.
Another point of interest is the position of placebo, which perhaps surprisingly has the second-best
score on average. Bearing in mind again the extent of the overlap, this does not mean that the
treatments which score less are always worse than placebo for every patient; it may however
suggest that the average patient (at least, among those who took part in the preference elicitation
studies) places a high value on safety and administrative convenience (i.e. the criteria on which
placebo outperforms all other treatments).
IV.2.2 Rankings
Figure 65 shows the SUCRA statistic 120 based on the overall benefit-risk score rankings at three
levels of predictive variability, with the annual relapse rate capped at 3 (i.e. the same distributions
shown in Figure 64).
Figure 65 – SUCRA statistic for the overall benefit-risk score of the RRMS treatments at three levels of predictive variability. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
The SUCRAs reflect the same underlying distributions as the forest plots in Figure 64 but more
clearly emphasise the relative ranks of the treatments. However, they do not convey any
information about the difference in score between ranks, which may not correspond to a substantial
change in overall utility.
Preferences for administration modes appear to have been a key driver of the results, with the four
lowest-scoring treatments all being injectable and the five highest-scoring all orally delivered.
IV.2.3 Sensitivity analyses
IV.2.3.1 Impact of exclusion of criteria from decision model
The treatment scores and rankings are considerably altered if the decision is not based on the full set
of criteria. The figures below show the (population average) SUCRA statistic by treatment for three
models with restricted sets of criteria.
Figure 66 – SUCRA statistic by treatment based on population-average benefit-risk score; efficacy outcomes only. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Based on efficacy alone, the best-performing drug is interferon beta-1b, followed by dimethyl
fumarate, fingolimod and subcutaneous interferon beta-1a (Figure 66).
Figure 67 – SUCRA statistic by treatment based on population-average benefit-risk score; liver safety only. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
For liver safety alone, the best treatment is placebo, followed by glatiramer acetate, dimethyl
fumarate and intramuscular interferon beta-1a (Figure 67). The rankings are in fact almost reversed
from the efficacy-only results, except in the cases of dimethyl fumarate and glatiramer acetate,
revealing their strength as all-round performers.
Figure 68 – SUCRA statistic by treatment based on population-average benefit risk score; efficacy and liver safety outcomes (but not administration). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
If efficacy and safety are included but the administration mode criterion is not (Figure 68) then the
injectable treatments fare somewhat better in the rankings compared to the main results (Figure
65), particularly glatiramer acetate which edges into first place very slightly ahead of dimethyl
fumarate.
IV.2.3.2 Impact of choice of outcomes for weighting
The figures below show the (population average) SUCRA statistic by treatment when, for disability
and liver enzyme elevation, the outcome to which the preference weight is assumed to relate is
changed from the default set out in Table 39. For relapses the elicitation questions did not leave as
much room for ambiguity with regard to the outcome definition, so this criterion has not been
subject to the same kind of sensitivity analysis.
IV.2.3.2.1 Disability progression
Figure 69 - SUCRA statistic by treatment based on population-average benefit risk score; disability progression weight relates to disability progression events confirmed 6 months later (rather than 3 months later in the main results). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Changing the meaning of the disability progression preference strength so that it relates to a
progression confirmed 6 months later (Figure 69) rather than 3 months later (main results, Figure 65)
has scarcely any effect on the SUCRAs.
IV.2.3.2.2 Liver enzyme elevation
Figure 70 - SUCRA statistic by treatment based on population-average benefit risk score; liver enzyme elevation weight relates to alanine aminotransferase above 3x upper limit of normal range (rather than simply above upper limit of normal range as in the main results). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Changing the meaning of the liver enzyme elevation preference strength so that it relates to an
elevation of alanine aminotransferase above 3x the upper limit of the normal range (Figure 70)
rather than 1x the upper limit (main results, Figure 65) has the effect of reducing the overall weight
on liver safety outcomes (since the more serious enzyme elevation is rarer and the magnitude of
difference between the rates on different treatments is less). This reduces the overall score for
placebo and glatiramer acetate (the two safest treatments) while giving a boost to other active
treatments, especially fingolimod (the worst performer on liver safety).
Figure 71 - SUCRA statistic by treatment based on population-average benefit risk score; liver enzyme elevation weight relates to alanine aminotransferase above 5x upper limit of normal range (rather than simply above upper limit of normal range as in the main results). PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Changing the meaning of the liver enzyme elevation preference strength so that it relates to an
elevation of alanine aminotransferase above 5x the upper limit of the normal range (Figure 71)
rather than 3x the upper limit (Figure 70) further boosts the overall rankings for fingolimod,
subcutaneous interferon beta-1a and interferon beta-1b, the three worst drugs for liver safety. This
time their improvement in the rankings comes not only at the expense of placebo and glatiramer
acetate but also the other active treatments. However, it is not enough to change the overall
success of dimethyl fumarate.
IV.2.3.3 Impact of exclusion of preference datasets
The figures below show the (population average) SUCRA statistic by treatment when each of the
preference datasets is excluded in turn.
Figure 72 - SUCRA statistic by treatment based on population-average benefit risk score; preferences from published studies excluded. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Excluding the published summary data from external preference elicitation studies resulted in less
weight being placed on safety and administration, severely diminishing placebo’s standing and
changing some of the rankings for the active drugs (Figure 72) compared to the main results (Figure
65), but still not changing dimethyl fumarate’s overall lead.
Figure 73 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT patient choice dataset excluded. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Removing the patient choice data from the preference model (Figure 73) had little impact on the
overall rankings compared to the main model (Figure 65).
Figure 74 - SUCRA statistic by treatment based on population-average benefit risk score; PROTECT ratings datasets excluded. PL = placebo, DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a, SC = subcutaneous interferon beta-1b, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Removing the ratings data from the preference model (Figure 74) shifted the preference weights
towards favouring safety, putting placebo in the lead overall and giving a boost to glatiramer acetate
while most other active drugs fared less well than in the main results (Figure 65).
IV.3 Discussion
IV.3.1 Bayesian MCDA
As far as I know, this represents the first MCDA-based benefit-risk assessment in which every one of
the clinical outcome and preference parameters is inferred from real-world evidence in a single,
comprehensive, fully Bayesian model. The decision problem was somewhat simplified compared to
real benefit-risk problems, as I did not attempt to incorporate an exhaustive set of safety criteria in
order to prevent the model becoming too cluttered and cumbersome to make an effective example;
however I have shown in II.4.7 how additional safety outcomes could be included.
The model has been designed to be generalizable to other datasets of arbitrary dimensions with few
changes required. Coding complex multi-stage multivariate models such as this can be a daunting
and time-consuming task, and although one must be realistic regarding how “user friendly” MCMC
modelling can ever be, I hope that providing a template will facilitate and encourage further
applications.
IV.3.2 Benefit-risk assessment of RRMS treatments
It is important to bear in mind that this is not a comprehensive BR assessment due to the narrow
scope of safety outcomes included (although the results obtained for liver safety do in some respects
appear very similar to those in the Cochrane review based on treatment adherence 81, as noted in
II.6.5).
Nevertheless, it may be worthwhile to consider the clinical implications of the results. Of the RRMS
drugs investigated, dimethyl fumarate has the best posterior benefit-risk score, and is the only active
treatment outperforming placebo. This is because it has a large effect size (although not the largest)
on the efficacy outcomes and a small effect size (although not the smallest) on the liver safety
outcomes, as well as having the most favourable administration method (daily oral). Owing to this
good all-round performance, the high SUCRA value for dimethyl fumarate is for the most part very
insensitive to the choice of outcomes for weighting and the inclusion/exclusion of the various
criteria and preference datasets. However, if the weight on safety is sufficiently increased then it is
outperformed by placebo, and if the administration outcome is excluded it is outperformed by
glatiramer acetate. The rankings in general appear fairly robust to the sensitivity analyses, although
there are some changes, especially with regard to the ranking of the placebo option. If the model is
used for real-world regulatory purposes, a more thorough sensitivity analysis may be
worthwhile, subjecting the model inputs to further alternate scenarios and perhaps examining the
thresholds required to change the decision outcome.
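One simple form of such a threshold (or "tipping point") analysis is to hold the criterion values fixed and scale the weight on one criterion until the preferred option flips. The Python sketch below uses invented criterion values and weights purely to illustrate the mechanics; it is not the RRMS model:

```python
import numpy as np

# Sketch of a threshold sensitivity analysis: scale the weight on one
# criterion until the preferred option changes. All values here are
# invented for illustration, not the RRMS model outputs.
values = {
    "drug":    np.array([0.8, 0.4, 1.0]),   # efficacy, safety, administration
    "placebo": np.array([0.2, 0.9, 1.0]),   # each on a common 0-1 value scale
}
base_weights = np.array([0.5, 0.3, 0.2])     # hypothetical elicited weights

def flip_threshold(vals_a, vals_b, base_w, idx, mults):
    """Smallest multiplier on weight idx at which option b overtakes a."""
    for m in mults:
        w = base_w.copy()
        w[idx] *= m
        w = w / w.sum()                      # renormalise after perturbation
        if w @ vals_b > w @ vals_a:
            return float(m)
    return None

m_star = flip_threshold(values["drug"], values["placebo"],
                        base_weights, idx=1, mults=np.linspace(1, 10, 901))
```

Repeating this scan over each weight, and over posterior draws rather than point estimates, would indicate which preference parameters the decision outcome is most sensitive to.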
It should be recognised that the uncertainty of the benefit-risk score for each treatment is large in
relation to the differences between treatments. In other words, there is considerable overlap in the
distributions of the benefit-risk score and there are no universal winners and losers. This is
especially true when one allows for predictive variability at the study or patient level, and serves as a
reminder that patient outcomes and preferences do vary and there is no one-size-fits-all treatment.
Based on these results there is no reason to conclude that any of these treatments should be
withdrawn altogether. What these results do suggest is that the higher-ranking treatments (such as
dimethyl fumarate and teriflunomide) should perhaps be offered to patients in the first instance
(since on average they provide the highest utility) with lower-ranked treatments being kept as
reserve options. It should be borne in mind however that this analysis includes only clinical trial
data; when a treatment has been licensed for some time then a more complete picture of the real-
world safety and effectiveness might be obtained from post-marketing data (and indeed some RRMS
treatments have been withdrawn on the basis of such data81,229). Also, treatment safety and
administration may have significant impact on patient satisfaction and these potential impacts
should not be overlooked in decision making.
The figures in IV.2.3.1 are a good illustration of how drastically the conclusions change depending on
whether one considers efficacy or safety in isolation, or a combination of the two, or also in
combination with administration modes. This demonstrates the importance of a multi-criteria
decision-making approach that can incorporate all aspects of treatment that matter to patients.
V. Conclusions
The research question posed in Chapter I of this thesis was:
“Can a modelling framework be developed that facilitates a fully Bayesian implementation of MCDA
for benefit-risk decision making; with parameters for clinical outcomes and associated preferences
directly informed by real-world data, and reflecting the uncertainties inherent in such data, while
respecting all relevant correlations and consistency relations?”
I believe that the model developed for the RRMS case study provides an affirmative answer to this
question. This chapter aims to place this result in context and consider its implications for research
and practice in the field of medical decision making.
V.1 Summary of results
The bullet points under the headings below give a high-level summary of the results described
earlier in this thesis.
V.1.1 Bayesian synthesis of clinical evidence for benefit-risk assessment (Chapter II)
Chapter II demonstrated a number of key methodological results that together enable principled
multivariate evidence synthesis of clinical outcomes for benefit-risk assessment:
• Development of a multivariate network meta-analysis model with full allowance for within-
and between-study outcome correlations. The study-specific treatment effects are defined
relative to a universal fixed baseline treatment, avoiding a potential asymmetry issue with
the more common parameterisation that allows the baseline treatment to vary by study.
• Extension of the multivariate NMA model to incorporate between-outcome mappings at the
mean level, useful for patchy data networks (i.e. summary trial data with a high degree of
missingness and heterogeneity in the outcome definitions).
• A novel construction of the multivariate Normal distribution via decomposition of the
variance, useful for specifying the multivariate NMA model in arbitrary dimensions in the
BUGS language.
• An extension to the multivariate NMA model allowing the inclusion of outcomes that are
known (or assumed) never to occur on certain treatments.
• Application to a multivariate outcome synthesis for several relapsing-remitting multiple
sclerosis treatments, revealing how the treatment effects (relative to placebo) compare with
regard to relapse, disability progression and liver safety outcomes.
• Estimation of the absolute-level distribution of outcomes within trial populations for the
same set of RRMS treatments, resulting in synthesised outcomes suitable for decision-
making using MCDA. The synthesis provides not only population averages but also the
predictive distribution of outcomes at the study or patient level.
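The variance-decomposition idea behind the multivariate Normal construction listed above can be illustrated outside BUGS: a multivariate Normal of arbitrary dimension can be assembled from independent univariate Normals via a factorisation of the covariance matrix. The Python sketch below uses the Cholesky factorisation, which is one standard such decomposition; it is a general illustration only, not the thesis BUGS code (described in Chapter II):

```python
import numpy as np

# With Sigma = L L^T (Cholesky), x = mu + L z for independent z_i ~ N(0, 1)
# has exactly the distribution N(mu, Sigma). Illustrative values only.
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
L = np.linalg.cholesky(Sigma)

z = rng.standard_normal((100_000, 3))   # independent univariate draws
x = mu + z @ L.T                        # correlated multivariate draws

emp_cov = np.cov(x, rowvar=False)       # should approximate Sigma
```

Building the multivariate node from univariate pieces in this spirit is what allows the model dimension to be left as data rather than hard-coded.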
V.1.2 Bayesian multi-criteria utility modelling (Chapter III)
In Chapter III, the work done in unifying preference elicitation methods in a probabilistic manner
resulted in some important methodological insights. These included an elucidation of the network
structure of two commonly used preference elicitation methods based on pairwise criteria ratings
(AHP and swing weighting), and establishing a general framework within which they both represent
special cases.
The main methodological achievement in Chapter III was a unified parametric model, based on the
assumption of linear additive utility, for Bayesian analysis of elicited criteria ratings data and choice
data from individuals, Bayesian meta-analysis of summary preference data from published studies,
and aggregation of (and examination of heterogeneity among) these sources of preference data.
Chapter III also provided some results regarding real-world multiple sclerosis patient preferences via
an application of the model to several sources of data in various formats. Inference of multiple
sclerosis patient preferences for relapses, disability progression, liver enzyme elevation and
administration mode was performed, not just in terms of the population average but also allowing
for predictive variability at the study or participant level.
The criteria were ranked as follows, from highest to lowest weight: 1 liver enzyme elevation event, 1
disability progression event, oral vs injectable administration, 1 relapse per year.
There was notable heterogeneity in preferences between studies but not much evidence of within-
study heterogeneity.
V.1.3 Assessing the overall benefit-risk balance (Chapter IV)
Chapter IV combined the models developed in Chapters II and III to give a fully Bayesian evidence-
based multi-criteria decision analysis model for choosing between medical treatments. The model
was applied to a set of relapsing-remitting multiple sclerosis treatments, providing an assessment of
the overall benefit-risk balance.
Based on the outcomes and data that were included in the model, the treatment with the most
favourable overall balance was dimethyl fumarate, followed by teriflunomide. The administration
mode was highly influential on the results, with other oral treatments also performing well and
injectable treatments generally assessed as unfavourable. The rankings for dimethyl fumarate and
teriflunomide were fairly robust to sensitivity analyses on subjective aspects of the model structure.
While there were clear differences between the mean benefit-risk scores for the various treatments,
there was considerable overlap when allowing for uncertainty, especially if predictive variability at
the study and/or patient level was allowed for.
V.2 Strengths
The particular modelling approach used in this thesis has a number of key strengths that make it
well-suited to benefit-risk assessment and other medical decisions:
• The Bayesian MCMC environment facilitates simultaneous modelling of all variables so that
the uncertainty of the data is automatically propagated to the model outputs, an advantage
of the methodology that has been noted elsewhere 230.
• The models are motivated by, and constructed around, real-world examples where the
available data do not necessarily conform to ideal standards. In particular, the real-world
practicality of any MCDA-based benefit-risk model depends on its ability to analyse patchy
clinical evidence and incorporate preferences from diverse sources. This thesis was able to
address many of the key limitations of earlier work in this area 40,53.
• The models can accommodate many different kinds of clinical outcomes, correlation
structures and preference elicitation formats with few changes required to the BUGS code.
V.3 Limitations
The limitations of the models have already been discussed within the preceding chapters, but an
overview will be provided here.
Bayesian statistics is not the only paradigm for modelling uncertainty, and totally different
approaches to uncertainty in MCDA have been used elsewhere, such as fuzzy sets 231 or stochastic
multicriteria acceptability analysis (SMAA) 30. However I believe that in the context of medical
decision-making the Bayesian approach makes particular sense due to its compatibility with the way
clinical evidence is gathered and analysed, and its ability to reflect any evidence that is available and
fill in any gaps with prior assumptions.
I have not provided any models that incorporate individual patient data (IPD) in the clinical evidence
synthesis, largely owing to the lack of data in the public domain. Availability of IPD for real benefit-
risk assessments is improving, however. Manufacturers will usually have such data when carrying
out their own assessments, and they are obliged to provide the regulator with IPD from pivotal
clinical trials when submitting a drug licence application in the EU 232. IPD has advantages over
summary data in that it reveals the shape of the within-study distribution of outcomes and
treatment effects, giving additional insight into the probability of key clinical or decision
thresholds being reached at the individual patient level, and allowing examination of heterogeneity
in the patient population. In particular, the benefit-risk balance in stratified subgroups can be
examined, potentially allowing more clarity in how the new drug fits in with licensing needs and
prescribing guidelines for different classes of patients, and thus allowing more specific conclusions
and recommendations to be reached. The main disadvantage of IPD is that these additional analyses
may be much more labour-intensive compared to an analysis based on summary data. Insofar as
IPD from some studies need to be aggregated with summary data from others, evidence synthesis
methods such as those in this thesis may still be required, and in such cases the available IPD may
provide useful information for the model, particularly with regard to estimating the within-study
correlations. For these reasons I believe there is still value in the summary-data approach used in
this thesis, despite the increasing availability of IPD.

The mapping-based imputations in the
multivariate network meta-analysis model operate under the assumption that treatment effects in
the published study populations follow the same distribution as in the general target population.
This leaves the model open to the influence of publication bias if it turns out that any outcomes are
unreported due to poor treatment performance. While this is of course a simplifying assumption
that one must bear in mind when interpreting the results, it is not unique to this model - the same
assumption is implicitly made in any review of clinical evidence that does not control for publication
bias. Estimating the mechanism underlying outcome missingness from benefit-risk assessment
data is likely to be difficult (especially given the typical number of studies), but it may be useful to
test the sensitivity of the results to the possibility of publication bias, for example by imposing a
fixed penalty on estimated unreported outcome measures.
The mappings also rely on at least one outcome in each mapping group being present in all included
studies, thus limiting the patchiness of data that can be included and/or the number of groups that
can be used.
I have not made any assessment of the reliability of the studies included in the clinical evidence
synthesis, choosing instead to rely on the screening performed by the Cochrane reviewers 81.
Similarly, I have not reviewed the literature for any trials carried out since the publication of the
Cochrane review, nor sought out any post-marketing surveillance data.
I have not set out any formal methods for evaluating inconsistency in the evidence (with regard to
treatment effects, mappings or preference ratios), despite the importance of the assumption of
consistency that underlies the models. Nor have I undertaken any formal assessment of
inconsistency in the RRMS case study. Evaluating inconsistency is an area that has received much
attention with regard to univariate network meta-analysis, and existing approaches could be
extended to the models developed here.
The preference modelling here was limited to linear-additive MCDA, but in principle the model could
easily be extended to other forms of additive utility function. That is, the utility function could take
the form
$$U = \sum_{\omega=1}^{\Omega} \mathrm{weight}_{\omega}\, \mathrm{PVF}_{\omega}(x_{\omega})$$

where $\mathrm{PVF}_{\omega}$ is any pre-specified monotonic partial value function for criterion $\omega$ and $\Omega$ is the number of criteria. This is simply a
transformation of the data and requires no significant alteration of the methodology. The utility
coefficients would be estimated as in the linear case, but they would be coefficients of 𝑃𝑉𝐹𝜔(𝑥𝜔)
rather than 𝑥𝜔. (This assumes the source data can be expressed on this transformed scale – if linear
coefficients are quoted with no reference to the underlying points used for elicitation, this may not
be possible). Going beyond the additive model and allowing utility functions with interaction terms
is likely to be more challenging unless all contributing data sources use the same model. 130
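The additive form above can be sketched concretely. In the Python snippet below the partial value functions and weights are invented for illustration; in practice the PVFs would be pre-specified monotonic functions elicited on each criterion's scale:

```python
import numpy as np

# Sketch of additive utility with non-linear partial value functions.
# The PVFs and weights are illustrative assumptions, not elicited values.
pvfs = [
    lambda x: 1 - x,            # linear (e.g. fewer relapses is better)
    lambda x: np.exp(-3 * x),   # assumed diminishing marginal disutility
    lambda x: 1 - x ** 2,       # another assumed monotonic shape
]
weights = np.array([0.2, 0.5, 0.3])   # hypothetical normalised weights

def utility(outcomes):
    """Linear-additive utility over transformed criterion values."""
    vals = np.array([f(x) for f, x in zip(pvfs, outcomes)])
    return float(weights @ vals)

u = utility([0.3, 0.1, 0.2])   # outcomes on each criterion's own scale
```

As noted above, the estimation machinery is unchanged: the coefficients are simply fitted to the transformed values rather than to the raw outcomes.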
With regard to the use of elicited preference data, I have not been concerned in this thesis with
issues of elicitation study design, choice of method, or whose preferences to use (except insofar as
these directly relate to the statistical analysis). These issues have been discussed elsewhere 82,233.
I have barely scratched the surface in terms of examining preferences among groups of individuals
and the philosophy of utilitarian decision making at the group level. There is a substantial literature
on these topics in terms of both theory 184,185,234 and application 235-238, and the citations given here
are just a few examples out of many.
V.4 Reflections on generalisability & applicability
These models, although developed to be generalisable, require further testing, as they have so far
been applied almost exclusively to a single problem (the RRMS case study). The only exception is the
preference meta-analysis model (III.5), which is currently also being tested on a small type II diabetes
dataset, where the initial results suggest it fits poorly. This may be indicative of the poor quality of
that dataset rather than any unsuitability of the model, but it emphasises the need for additional
test applications.
The aim was to code the models in a manner that would work on arbitrary datasets with no hard
coded parameters or dimensions. The individual models for each specific data type largely met this
requirement, but the overall model did not, due to the sheer multiplicity of combined data
structures that could in theory be incorporated. However, the model remains easily adaptable.
The use of elicited preferences in formal models is sometimes seen as controversial due to their
subjective nature, and some have raised concerns that a high degree of heterogeneity should be
expected 239. These concerns may be well-founded, or not – the only way to know for sure is to
actually try modelling the preferences arising from different elicitation studies, and the methods
developed here are well suited to this task. As long as the heterogeneity of preferences is within
reasonable limits (or homogeneous subgroups can be identified) then there would seem to be some
merit in the MCDA approach and the model can provide estimates of the necessary parameters.
Even in the absence of extensive data on heterogeneity, the model could help by simulating the
predictive distribution of the benefit-risk balance under various scenarios – for example, one could
answer questions such as “how much between-study preference heterogeneity is required to make
treatment X more favoured than treatment Y at least 50% of the time?”
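Scenario questions of this kind can be answered by simple Monte Carlo simulation over the predictive distribution. In the Python sketch below, the criterion values, mean weight and logit-scale heterogeneity model are all invented for illustration and are not the RRMS estimates:

```python
import numpy as np

rng = np.random.default_rng(42)

vals_x = np.array([0.9, 0.2])   # hypothetical criterion values, treatment X
vals_y = np.array([0.5, 0.6])   # hypothetical criterion values, treatment Y
mean_w = 0.4                    # assumed mean weight on criterion 1

def prob_x_preferred(tau, n=50_000):
    """P(X scores higher than Y) when the weight on criterion 1 varies
    between studies with standard deviation tau on the logit scale."""
    logit_mean = np.log(mean_w / (1 - mean_w))
    w1 = 1 / (1 + np.exp(-rng.normal(logit_mean, tau, n)))
    w = np.column_stack([w1, 1 - w1])
    return float(np.mean(w @ vals_x > w @ vals_y))

for tau in [0.0, 0.5, 1.0, 2.0]:
    print(f"tau = {tau:.1f}: P(X preferred) = {prob_x_preferred(tau):.3f}")
```

In this toy configuration the probability climbs with tau but plateaus below 50% because the median study still favours Y, illustrating that the simulation can also reveal when no amount of between-study heterogeneity would change the answer.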
With regard to the RRMS case study in particular, the preferences have a substantial amount of
between-study heterogeneity, but I do not believe that this is sufficient grounds to call the whole
modelling approach into question. Indeed, it may actually be an argument in favour of Bayesian
preference estimation. When heterogeneity is present it does of course increase the variability of
the benefit-risk balance and therefore make the task of the decision maker more complex, but the
advantage of Bayesian MCMC is that it is straightforward to incorporate this variability in the results
via the predictive distribution. And, of course, using a Bayesian MCDA model does not create this
heterogeneity, it merely reveals it. Using a simpler decision making approach might obscure this
aspect of the problem altogether and lead to poor decisions.
The validity of the assumption of an additive linear utility function (i.e. mutual utility independence
of the decision criteria) is not guaranteed. Elicitation studies can be designed to test this assumption
by including interaction coefficients in the utility model and testing their significance. Some have
argued that additive linearity is usually a reasonable assumption 180 whilst others have argued for a
more cautious approach, recommending that criteria are tested for any violations of this
assumption and the impact assessed40.
Another important structural assumption is that of proportionality between outcomes, which
underlies the mappings in the clinical evidence synthesis. This seems a reasonable starting point but
other non-proportional links between outcomes could instead be used if evidence or logic suggests
this would be more appropriate. Developing a test of the proportionality in the data may be of use
here.
V.5 Contribution to the field
This work has made several contributions to the state of the art in evidence synthesis, benefit-risk
assessment and preference modelling.
I believe the clinical evidence synthesis model represents the most powerful and flexible
multivariate NMA yet published, with full rigour when it comes to correlations. Furthermore, the
mapping-based approach to outcome imputation adds strength and helps to fill in patchy data
networks, an important development that should facilitate the application of multivariate NMA to
real-world situations where perfect data structures may not be available. I sometimes refer to this
methodology as “patchwork meta-analysis” since it uses meta-analytical techniques to patch
together fragments of data that might at first seem incompatible. Generalised model code has been
provided in Appendix B with a view to aiding future applications, and I am exploring possibilities for
hosting the code files online.
Several important advances have also been made with regard to the probabilistic modelling of
preferences for medical multi-criteria decisions. Firstly, the features of various preference elicitation
methodologies that relate to their statistical properties have been elucidated (such as whether the
data consists of ratings or rankings, the role of the network structure, and the use of substitution or
agglomeration to move through the network). Secondly, statistical models have been identified or
developed for the analysis of the most common types of individual-level elicitation data. Thirdly, a
unified Bayesian parametric framework has been proposed that can make joint inferences using
several such models in combination. Finally, a meta-analytical framework for comparison and
aggregation of summary preference data from previously published studies has been developed. All
of these novel approaches have been demonstrated using real-world preference data for RRMS
treatments.
In isolation, these are significant contributions to the field of probabilistic multi-criteria medical
decision making. Taken together, they demonstrate the feasibility and power of a holistic evidence-
based Bayesian modelling approach, and an important step forward in terms of the statistical rigour
and sophistication that can be applied to such problems.
Various other Bayesian models involving elicited preferences or utilities have appeared in the
literature, sometimes in the benefit-risk context. This work, however, stands out in terms of the rigour
with which the uncertainty is characterised and the range of data types that can be accommodated.
An alternative Bayesian model for uncertainty on elicited weights treats the preference weights
themselves as observations drawn from exchangeable distributions 178 but, unlike my model (where
the observations are the original ratings or choices) this approach does not account for asymmetries
that may result from the elicitation method. Others have also used a Bayesian analysis of choice
data in order to elicit preferences for benefit-risk assessment 240. However, this thesis goes further
by showing that such an analysis can be combined with other types of elicitation data using a
common parameterisation. Non-Bayesian probabilistic MCDA benefit-risk models have also
appeared: one such model obtained the uncertainty on weights using a bootstrap technique33, but
this procedure cannot be incorporated in a single-step Bayesian MCMC model and would not extend
easily to aggregation of multiple somewhat heterogeneous preference datasets. Another approach
is to elicit the uncertainty level alongside the central preference estimates27 but this requires
specially designed elicitation studies (and may not be compatible with many elicitation methods).
The Bayesian approach stands out for its ability to account for uncertainty in a principled manner in
a variety of data structures based on standard elicitation methods.
Bayesian MCMC modelling is a specialist field, and hence any one of the models discussed in this
thesis may be demanding to apply in practice. However, in pivotal benefit-risk assessments, many
drug manufacturers may already be using some form of MCDA, network meta-analysis and/or
Bayesian MCMC modelling to support their applications to regulators or for internal decision-
making. Given a familiarity with these methods, it should be quite feasible for some real-world
decision makers to combine them into an integrated model along the lines I have laid out here.
V.6 Future research priorities
Applying these methods to further datasets is naturally of key importance. Convergence of MCMC
algorithms may vary between datasets241, and the validity of the model assumptions cannot be
taken for granted.
In terms of further development of the model there are some obvious extensions that could be
attempted, including:
• Incorporating other types of outcome such as multinomial or time-to-event variables in the
clinical evidence synthesis.
• Altering the way the mappings are applied so that there is no single “baseline” outcome in
each mapping group that must be present in every study.
• Extending the preference models to allow for correlations between criteria preference
strengths – these correlations can be incorporated easily using the coding technique
described in II.4.4 but were omitted here for the sake of simplicity.
• Incorporating additional preference data formats such as absolute ratings of multi-criteria
scenarios (also known as conjoint analysis) or ordinal pairwise criteria ratings.
• Providing measures of the consistency of the treatment effects (where the evidence
network contains loops), outcome mappings (i.e. whether the proportionality assumption
holds) and preferences from different sources.
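The first of these extensions, correlation between criteria preference strengths, can be illustrated with a minimal sketch: place a multivariate normal on the log preference strengths and normalise each draw to the simplex so the draws can be used directly as MCDA weights. The means and covariance values below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 5_000

# Hypothetical posterior for three criteria preference strengths on the
# log scale, with a positive correlation between the first two criteria.
# Both the mean vector and the covariance matrix are invented.
mu = np.array([1.0, 0.5, 0.0])
cov = np.array([[0.20, 0.10, 0.00],
                [0.10, 0.20, 0.00],
                [0.00, 0.00, 0.20]])
log_strength = rng.multivariate_normal(mu, cov, size=n_draws)

# Normalise each draw to the simplex (softmax) to obtain MCDA weights.
strength = np.exp(log_strength)
weights = strength / strength.sum(axis=1, keepdims=True)

# Examine the correlation induced between the first two weights.
# Note the sum-to-one constraint itself induces dependence, so the
# weight-scale correlation differs from the log-scale correlation.
corr = np.corrcoef(weights[:, 0], weights[:, 1])[0, 1]
print(f"corr(w1, w2) = {corr:.2f}")
```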
Other targets for future research include:
• Simulation studies to examine the models’ performance more systematically across a range
of possible data structures.
• A more complete benefit-risk assessment of RRMS treatments, with a set of criteria that
better reflects the safety profile of the drugs than the simplified case study adopted here.
• Checking whether the assumption of preference independence holds in real patient
populations.
• Further examination of the homogeneity/heterogeneity of preferences in the patient
population, and the implications for MCDA-based benefit-risk assessment. The picture may
be different across different disease areas. Investigating subgroups (using latent class
analysis, for example228,242-244) may reveal predictable structures underlying any
heterogeneity.
• Clarification of best practice for elicitation, including consideration of how the framing of
elicitation questions can influence the results40 and how this can best be accounted for in
both study design and interpretation.
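The latent class analysis mentioned above can be sketched with a simple two-component Gaussian mixture fitted by expectation-maximisation to elicited weights for a single criterion. The data and all parameter values are simulated purely for illustration; a real analysis would model the full weight vectors and select the number of classes formally.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated elicited weights for one criterion in a heterogeneous
# population: two latent classes with different preference strengths.
# All numbers are invented for this illustration.
x = np.concatenate([rng.normal(0.2, 0.05, 300),   # class 1
                    rng.normal(0.6, 0.05, 200)])  # class 2

# Two-component Gaussian mixture fitted by EM.
prop = np.array([0.5, 0.5])   # mixture proportions
mu = np.array([0.1, 0.9])     # class means (initial guesses)
sd = np.array([0.2, 0.2])     # class standard deviations
for _ in range(200):
    # E-step: responsibility of each class for each observation
    dens = prop * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
           / (sd * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update proportions, means and standard deviations
    nk = resp.sum(axis=0)
    prop = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("class proportions:", np.round(prop, 2))
print("class means:", np.round(mu, 2))
```

If such structure is recovered in real preference data, the MCDA could report benefit-risk conclusions per class rather than averaging over a heterogeneous population.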
It will also be important to raise awareness of these methods among decision makers such
as pharmaceutical companies and regulators. Alongside this, however, it may be necessary to
develop methods that simplify implementation of the models. Although I have made efforts to keep
the models as generalisable and user-friendly as reasonably possible, it must be recognised that
running such complex models in current Gibbs sampling software is a task that inevitably requires
some expertise. Ultimately, if the approach’s worth can be proved through further case examples
and examination of the above issues, then it may be sensible to try to simplify implementation by
developing automated software routines (such as an R package, or a standalone program).
However, this is far from a trivial task given the unpredictability of MCMC convergence241.
Chapter V.7
V.7 Concluding summary
Assessment of the benefit-risk balance of treatments, and medical decision-making in general, can
be put on a more formal footing using multi-criteria decision analysis with explicit value judgements.
However, this method is largely unfamiliar in the health sciences and its reliability and technical
capabilities have not been properly evaluated.
This thesis shows that a Bayesian MCMC approach can successfully address many of the
technical challenges involved in jointly modelling clinical and preference variables, and provides a
framework for constructing fully probabilistic MCDA models for comparing treatments in terms of
several conflicting clinical outcomes. It also provides an illustration, via the RRMS case study, that
preferences arising from multiple study populations and elicitation methods can be combined in a
coherent model. These are significant steps forward in terms of MCDA modelling in healthcare,
since evidence-based medicine should include evidence from all reliable sources.
Nevertheless, the reliability and practicality of using this approach for decision-making requires
further research, with key priorities including further investigation of the distribution of preferences
within patient populations, establishment of reliable elicitation practices, and easing the
implementation of Bayesian MCMC simulation.
References
1. Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. BMJ. 1996;312(7023):71-72.
2. EMA. ICH guideline E2C (R2) on periodic benefit-risk evaluation report (PBRER). 2012. 3. Mackay FJ. Post-marketing studies: the work of the Drug Safety Research Unit. Drug safety.
1998;19(5):343-353. 4. Levine MN, Julian JA. Registries That Show Efficacy: Good, but Not Good Enough. J Clin
Oncol. 2008;26(33):5316-5319. 5. Hughes D, Waddingham E, Mt‐Isa S, et al. Recommendations for benefit–risk assessment
methodologies and visual representations. Pharmacoepidemiology and Drug Safety. 2016;25(3):251-262.
6. Hughes DA, Bayoumi AM, Pirmohamed M. Current assessment of risk-benefit by regulators: is it time to introduce decision analyses? Clin Pharmacol Ther. 2007;82(2):123-127.
7. Bridges JFP, Hauber AB, Marshall D, et al. Conjoint Analysis Applications in Health—a Checklist: A Report of the ISPOR Good Research Practices for Conjoint Analysis Task Force. Value in Health. 2011;14(4):403-413.
8. Marshall D, Bridges JFP, Hauber B, et al. Conjoint Analysis Applications in Health — How are Studies being Designed and Reported? The Patient: Patient-Centered Outcomes Research. 2010;3(4):249-256.
9. EMA. Benefit-risk methodology. 2009; https://www.ema.europa.eu/en/about-us/support-research/benefit-risk-methodology. Accessed 05/05/2019, 2019.
10. PROTECT. About PROTECT. 2009; http://www.imi-protect.eu/about.shtml. Accessed 16/01/2019, 2019.
11. Coplan P, Noel R, Levitan B, Ferguson J, Mussen F. Development of a Framework for Enhancing the Transparency, Reproducibility and Communication of the Benefit–Risk Balance of Medicines. Clinical Pharmacology & Therapeutics. 2011;89(2):312-315.
12. Thokala P, Devlin N, Marsh K, et al. Multiple Criteria Decision Analysis for Health Care Decision Making-An Introduction: Report 1 of the ISPOR MCDA Emerging Good Practices Task Force. Value in Health. 2016;19(1):1-13.
13. Hammond JS, Keeney RL, Raiffa H. Smart choices: A practical guide to making better decisions. Harvard Business Review Press; 2015.
14. Mt-Isa S, Hallgreen CE, Wang N, et al. Balancing benefit and risk of medicines: a systematic review and classification of available methodologies. Pharmacoepidemiology and drug safety. 2014;23(7):667-678.
15. Ho MP, Gonzalez JM, Lerner HP, et al. Incorporating patient-preference evidence into regulatory decision making. Surgical endoscopy. 2015;29(10):2984-2993.
16. Sutton AJ, Cooper NJ, Abrams KR, Lambert PC, Jones DR. A Bayesian approach to evaluating net clinical benefit allowed for parameter uncertainty. Journal of clinical epidemiology. 2005;58(1):26-40.
17. Broekhuizen H, Groothuis-Oudshoorn CGM, van Til JA, Hummel JM, Ijzerman MJ. A Review and Classification of Approaches for Dealing with Uncertainty in Multi-Criteria Decision Analysis for Healthcare Decisions. Pharmacoeconomics. 2015;33(5):445-455.
18. Durbach IN, Stewart TJ. Modeling uncertainty in multi-criteria decision analysis. European Journal of Operational Research. 2012;223(1):1-14.
19. Wen S, Zhang L, Yang B. Two approaches to incorporate clinical data uncertainty into multiple criteria decision analysis for benefit-risk assessment of medicinal products. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2014;17(5):619-628.
20. Mosadeghi R, Warnken J, Tomlinson R, Mirfenderesk H. Uncertainty analysis in the application of multi-criteria decision-making methods in Australian strategic environmental decisions. Journal of Environmental Planning and Management. 2013;56(8):1097-1124.
21. Kangas AS, Kangas J. Probability, possibility and evidence: approaches to consider risk and uncertainty in forestry decision analysis. Forest Policy and Economics. 2004;6(2):169-188.
22. Stewart TJ, Durbach I. Dealing with uncertainties in MCDA. In: International Series in Operations Research and Management Science. Vol 233. 2016:467-496.
23. Svecova L, Fotr J, Vrbova L. A MULTI-CRITERIA EVALUATION OF ALTERNATIVES UNDER RISK. 6th International Days of Statistics and Economics. 2012:1090-1100.
24. Basak I. Probabilistic judgments specified partially in the Analytic Hierarchy Process. European Journal of Operational Research. 1998;108(1):153-164.
25. Bech M, Gyrd-Hansen D, Kjær T, Lauriden J, Sørensen J. Graded pairs comparison - Does strength of preference matter? Analysis of preferences for specialised nurse home visits for pain management. Health Economics. 2007;16(5):513-529.
26. Dekker T, Hess S, Brouwer R, Hofkes M. Decision uncertainty in multi-attribute stated preference studies. Resource and Energy Economics. 2016;43:57-73.
27. Jessop A. Using imprecise estimates for weights. Journal of the Operational Research Society. 2011;62(6):1048-1055.
28. Voltaire L, Pirrone C, Bailly D. Dealing with preference uncertainty in contingent willingness to pay for a nature protection program: A new approach. Ecological Economics. 2013;88:76-85.
29. Nixon R, Dierig C, Mt-Isa S, et al. A case study using the PrOACT-URL and BRAT frameworks for structured benefit risk assessment. Biometrical Journal. 2016;58(1):8-27.
30. Lahdelma R, Hokkanen J, Salminen P. SMAA - Stochastic multiobjective acceptability analysis. European Journal of Operational Research. 1998;106(1):137-143.
31. Tervonen T, van Valkenhoef G, Buskens E, Hillege HL, Postmus D. A stochastic multicriteria model for evidence-based decision making in drug benefit-risk analysis. Statistics in medicine. 2011;30(12):1419-1428.
32. van Valkenhoef G, Tervonen T, Zhao J, de Brock B, Hillege HL, Postmus D. Multicriteria benefit-risk assessment using network meta-analysis. Journal of clinical epidemiology. 2012;65(4):394-403.
33. Broekhuizen H, Groothuis-Oudshoorn CG, Hauber AB, Jansen JP, MJ IJ. Estimating the value of medical treatments to patients using probabilistic multi criteria decision analysis. BMC medical informatics and decision making. 2015;15:102.
34. Smith JQ. Bayesian decision analysis: Principles and practice. 2010. 35. Ashby D. Bayesian statistics in medicine: a 25 year review. Statistics in medicine.
2006;25(21):3589-3631. 36. Harrell FE, Shih YCT. Using full probability models to compute probabilities of actual interest
to decision makers. International Journal of Technology Assessment in Health Care. 2001;17(1):17-26.
37. Stangl DK. Bridging the gap between statistical analysis and decision making in public health research. Statistics in Medicine 2005; 24:503-511.
38. Ashby D, Smith AFM. Evidence-based medicine as Bayesian decision-making. Statistics in Medicine. 2000;19(23):3291-3305.
39. Costa MJ, He W, Jemiai Y, Zhao Y, Di Casoli C. The Case for a Bayesian Approach to Benefit-Risk Assessment: Overview and Future Directions. Therapeutic Innovation & Regulatory Science. 2017;51(5):568-574.
40. Garcia-Hernandez A. A Note on the Validity and Reliability of Multi-Criteria Decision Analysis for the Benefit-Risk Assessment of Medicines. Drug safety. 2015;38(11):1049-1057.
41. Muehlbacher AC. Patient-centric HTA: different strokes for different folks. Expert Review of Pharmacoeconomics & Outcomes Research. 2015;15(4):591-597.
42. Umar N, Schaarschmidt M, Schmieder A, Peitsch WK, Schoellgen I, Terris DD. Matching physicians' treatment recommendations to patients' treatment preferences is associated with improvement in treatment satisfaction. Journal of the European Academy of Dermatology and Venereology. 2013;27(6):763-770.
43. Efthimiou O, Mavridis D, Cipriani A, Leucht S, Bagos P, Salanti G. An approach for modelling multiple correlated outcomes in a network of interventions using odds ratios. Statistics in medicine. 2014;33(13):2275-2287.
44. Efthimiou O, Mavridis D, Riley RD, Cipriani A, Salanti G. Joint synthesis of multiple correlated outcomes in networks of interventions. Biostatistics (Oxford, England). 2015;16(1):84-97.
45. Hong H, Carlin BP, Shamliyan TA, et al. Comparing Bayesian and Frequentist Approaches for Multiple Outcome Mixed Treatment Comparisons. Medical Decision Making. 2013;33(5):702-714.
46. Ades AE, Lu G, Dias S, Mayo-Wilson E, Kounali D. Simultaneous synthesis of treatment effects and mapping to a common scale: an alternative to standardisation. Research synthesis methods. 2015;6(1):96-107.
47. Lu G, Kounali D, Ades AE. Simultaneous Multioutcome Synthesis and Mapping of Treatment Effects to a Common Scale. Value in Health. 2014;17(2):280-287.
48. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing. 2000;10(4):325-337.
49. CIOMS. Benefit-Risk Balance for Marketed Drugs: Evaluating Safety Signals, Report of CIOMS Working Group IV. In: Chemistry International -- Newsmagazine for IUPAC. Vol 21. 1999:48.
50. Mussen F, Salek S, Walker S. A quantitative approach to benefit-risk assessment of medicines - part 1: the development of a new model using multi-criteria decision analysis. Pharmacoepidemiology and drug safety. 2007;16 Suppl 1:S2-s15.
51. Caster O, Noren GN, Ekenberg L, Edwards IR. Quantitative benefit-risk assessment using only qualitative information on utilities. Medical decision making : an international journal of the Society for Medical Decision Making. 2012;32(6):E1-15.
52. Tervonen T. JSMAA: open source software for SMAA computations. International Journal of Systems Science. 2014;45(1):69-81.
53. Waddingham E, Mt-Isa S, Nixon R, Ashby D. A Bayesian approach to probabilistic sensitivity analysis in structured benefit-risk assessment. Biometrical Journal. 2016;58(1):28-42.
54. Tervonen T, Naci H, van Valkenhoef G, et al. Applying Multiple Criteria Decision Analysis to Comparative Benefit-Risk Assessment: Choosing among Statins in Primary Prevention. Medical Decision Making. 2015;35(7):859-871.
55. Marsh K, Ijzerman M, Thokala P, et al. Multiple Criteria Decision Analysis for Health Care Decision Making—Emerging Good Practices: Report 2 of the ISPOR MCDA Emerging Good Practices Task Force. Value in Health. 2016;19(2):125-137.
56. Glass GV. Primary, Secondary, and Meta-Analysis of Research. Educational Researcher. 1976;5(10):3-8.
57. Pearson K. Report on Certain Enteric Fever Inoculation Statistics. British Medical Journal. 1904;2(2288):1243-1246.
58. Ades AE. A chain of evidence with mixed comparisons: models for multi-parameter synthesis and consistency of evidence. Statistics in Medicine. 2003;22(19):2995-3016.
59. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. Journal of clinical epidemiology. 1997;50(6):683-691.
60. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine. 2004;23(20):3105-3124.
61. Lee AW. Review of mixed treatment comparisons in published systematic reviews shows marked increase since 2009. Journal of clinical epidemiology. 2014;67(2):138-143.
62. Jansen JP, Fleurence R, Devine B, et al. Interpreting Indirect Treatment Comparisons and Network Meta-Analysis for Health-Care Decision Making: Report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: Part 1. Value in Health. 2011;14(4):417-428.
63. Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 2: A general linear modelling framework for pair-wise and network meta-analysis of randomised controlled trials; 2011 (updated 2016) http://nicedsu.org.uk/wp-content/uploads/2017/05/TSD2-General-meta-analysis-corrected-2Sep2016v2.pdf. Accessed: 18/02/2020.
64. Salanti G, Kavvoura FK, Ioannidis JPA. Exploring the Geometry of Treatment Networks. Annals of Internal Medicine. 2008;148(7):544-553.
65. Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Medical decision making : an international journal of the Society for Medical Decision Making. 2013;33(5):641-656.
66. Lu G, Ades AE. Assessing Evidence Inconsistency in Mixed Treatment Comparisons. Journal of the American Statistical Association. 2006;101(474):447-459.
67. Ades AE, Sculpher M, Sutton A, et al. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics. 2006;24(1):1-19.
68. Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2009;172(4):789-811.
69. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Statistics in Medicine. 2010;29(7‐8):932-944.
70. Lu G, Welton NJ, Higgins JPT, White IR, Ades AE. Linear inference for mixed treatment comparison meta-analysis: A two-stage approach. Research synthesis methods. 2011;2(1):43-60.
71. Caster O, Edwards IR. Quantitative benefit-risk assessment of methylprednisolone in multiple sclerosis relapses. BMC neurology. 2015;15:206.
72. Juhaeri J, Amzal B, Chan E, et al. Wave 2 Case Study Report: Rimonabant. 2012. 73. Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis:
a comparative study. Statistics in medicine. 1995;14(24):2685-2699. 74. Warn DE, Thompson SG, Spiegelhalter DJ. Bayesian random effects meta-analysis of trials
with binary outcomes: methods for the absolute risk difference and relative risk scales. Statistics in Medicine. 2002;21(11):1601-1623.
75. Bucher HC, Griffith L, Guyatt GH, Opravil M. Meta-analysis of prophylactic treatments against Pneumocystis carinii pneumonia and toxoplasma encephalitis in HIV-infected patients. J Acquir Immune Defic Syndr Hum Retrovirol. 1997;15(2):104-114.
76. Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Statistics in medicine. 1996;15(24):2733-2749.
77. Lumley T. Network meta-analysis for indirect treatment comparisons. Statistics in medicine. 2002;21(16):2313-2324.
78. Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence Synthesis for Decision Making 2: A Generalized Linear Modeling Framework for Pairwise and Network Meta-analysis of Randomized Controlled Trials. Medical Decision Making. 2013;33(5):607-617.
79. Lu G, Ades A. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics. 2009;10(4):792-805.
80. Welton NJ, Caldwell DM, Adamopoulos E, Vedhara K. Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. Am J Epidemiol. 2009;169(9):1158-1165.
81. Tramacere I, Filippini G, Del Giovane C, et al. Immunomodulators and immunosuppressants for multiple sclerosis: a network meta-analysis. The Cochrane database of systematic reviews. 2013(6):Cd008933.
82. NICE. Guide to the methods of technology appraisal; 2013. https://www.nice.org.uk/process/pmg9/resources/guide-to-the-methods-of-technologyappraisal-2013-pdf-2007975843781 Accessed: 29/07/2019.
83. Bujkiewicz S, Thompson JR, Sutton AJ, et al. Multivariate meta-analysis of mixed outcomes: a Bayesian approach. Statistics in medicine. 2013;32(22):3926-3943.
84. Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Statistics in medicine. 2011;30(20):2481-2498.
85. Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Statistics in medicine. 2007;26(1):78-97.
86. Madan J, Chen Y-F, Aveyard P, et al. Synthesis of evidence on heterogeneous interventions with multiple outcomes recorded over multiple follow-up times reported inconsistently: a smoking cessation case-study. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2014;177(1):295-314.
87. Welton NJ, Cooper NJ, Ades AE, Lu G, Sutton AJ. Mixed treatment comparison with multiple outcomes reported inconsistently across trials: Evaluation of antivirals for treatment of influenza A and B. Statistics in Medicine. 2008;27(27):5620-5639.
88. Pedder H, Dias S, Bennetts M, Boucher M, Welton NJ. Modelling time-course relationships with multiple treatments: Model-based network meta-analysis for continuous summary outcomes. Research synthesis methods. 2019;10(2):267-286.
89. Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine. 1997; Sep 15;16(17):1965-82.
90. Dias S, Ades AE, Welton NJ, Jansen JP, Sutton AJ. Network Meta-Analysis for Decision-Making. Wiley; 2018.
91. Welton NJ, Sutton AJ, Cooper NJ, Abrams KR, Ades AE. Evidence Synthesis for Decision Making in Healthcare. 2012.
92. Rosati G. The prevalence of multiple sclerosis in the world: an update. Neurol Sci. 2001;22(2):117-139.
93. Gajofatto A, Benedetti MD. Treatment strategies for multiple sclerosis: When to start, when to change, when to stop? World Journal of Clinical Cases : WJCC. 2015;3(7):545-555.
94. Bornstein MB, Miller A, Slagle S, et al. A pilot trial of Cop 1 in exacerbating-remitting multiple sclerosis. N Engl J Med. 1987;317(7):408-414.
95. Cadavid D, Wolansky LJ, Skurnick J, Lincoln J, et al. Efficacy of treatment of MS with IFNbeta-1b or glatiramer acetate by monthly brain MRI in the BECOME study. Neurology. 2009;72(23):1976-1983.
96. Calabresi PA, Radue E-W, Goodin D, et al. Safety and efficacy of fingolimod in patients with relapsing-remitting multiple sclerosis (FREEDOMS II): a double-blind, randomised, placebo-controlled, phase 3 trial. The Lancet Neurology. 2014;13(6):545-556.
97. Comi G, Jeffery D, Kappos L, et al. Placebo-Controlled Trial of Oral Laquinimod for Multiple Sclerosis. New England Journal of Medicine. 2012;366(11):1000-1009.
98. Durelli L, Verdun E, Barbero P, et al. Every-other-day interferon beta-1b versus once-weekly interferon beta-1a for multiple sclerosis: results of a 2-year prospective randomised multicentre study (INCOMIN). The Lancet. 2002;359(9316):1453-1460.
99. Ebers GC. Randomised double-blind placebo-controlled study of interferon beta-1a in relapsing/remitting multiple sclerosis. The Lancet. 1998;352(9139):1498-1504.
100. Fox RJ, Miller DH, Phillips JT, et al. Placebo-Controlled Phase 3 Study of Oral BG-12 or Glatiramer in Multiple Sclerosis. New England Journal of Medicine. 2012;367(12):1087-1097.
101. Gold R, Kappos L, Arnold DL, et al. Placebo-Controlled Phase 3 Study of Oral BG-12 for Relapsing Multiple Sclerosis. New England Journal of Medicine. 2012;367(12):1098-1107.
102. Jacobs LD, Cookfair DL, Rudick RA, et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. Annals of Neurology. 1996;39(3):285-294.
103. Johnson KP, Brooks BR, Cohen JA, et al. Copolymer 1 reduces relapse rate and improves disability in relapsing‐remitting multiple sclerosis: Results of a phase III multicenter, double‐blind, placebo‐controlled trial. Neurology. 1995;45(7):1268-1276.
104. Kappos L, Radue E-W, O'Connor P, et al. A Placebo-Controlled Trial of Oral Fingolimod in Relapsing Multiple Sclerosis. New England Journal of Medicine. 2010;362(5):387-401.
105. Mikol DD, Barkhof F, Chang P, et al. Comparison of subcutaneous interferon beta-1a with glatiramer acetate in patients with relapsing multiple sclerosis (the REbif vs Glatiramer Acetate in Relapsing MS Disease [REGARD] study): a multicentre, randomised, parallel, open-label trial. The Lancet Neurology. 2008;7(10):903-914.
106. O'Connor P, Filippi M, Arnason B, et al. 250 microg or 500 microg interferon beta-1b versus 20 mg glatiramer acetate in relapsing-remitting multiple sclerosis: a prospective, randomised, multicentre study. The Lancet Neurology. 2009;8(10):889-897.
107. O'Connor P, Wolinsky JS, Confavreux C, et al. Randomized Trial of Oral Teriflunomide for Relapsing Multiple Sclerosis. New England Journal of Medicine. 2011;365(14):1293-1303.
108. Paty DW, Li DKB, Group TIMS. Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology. 1993;43(4):655-661.
109. Vollmer TL, Sorensen PS, Selmaj K, et al. A randomized placebo-controlled phase III trial of oral laquinimod for multiple sclerosis. Journal of Neurology. 2014;261(4):773-783.
110. Koch-Henriksen N, Sørensen PS, Christensen T, et al. A randomized study of two interferon- beta treatments in relapsing–remitting multiple sclerosis. Neurology. 2006;66(7):1056.
111. Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine. 2002;21(11):1575-1600.
112. Jansen JP. Network meta-analysis of survival data with fractional polynomials. BMC Medical Research Methodology. 2011;11(1):61.
113. Ouwens MJ, Philips Z, Jansen JP. Network meta-analysis of parametric survival curves. Research synthesis methods. 2010;1(3-4):258-271.
114. Franchini AJ, Dias S, Ades AE, Jansen JP, Welton NJ. Accounting for correlation in network meta-analysis with multi-arm trials. Research synthesis methods. 2012;3(2):142-160.
115. Riley RD, Jackson D, Salanti G, et al. Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples. BMJ. 2017;358.
116. Riley RD, Thompson JR, Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics. 2008;9(1):172-186.
117. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine. 2004;23:1351-1375.
118. Phillips R, Hazell L, Sauzet O, Cornelius V. Analysis and reporting of adverse events in randomised controlled trials: a review. BMJ open. 2019;9(2):e024537-e024537.
119. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(4):583-639.
120. Salanti G, Ades AE, Ioannidis JPA. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. Journal of Clinical Epidemiology. 2011;64(2):163-171.
121. Walker R, Schulz M, Arora B, et al. 057 Real world evidence (RWE) on long-term persistence of fingolimod in relapsing-remitting multiple sclerosis (RRMS) in australia. Journal of Neurology, Neurosurgery & Psychiatry. 2018;89(6):A23.
122. Team SD. Stan User's Guide. 2019; https://mc-stan.org/docs/2_22/stan-users-guide/index.html. Accessed 11/02/2020.
123. Spiegelhalter DJ, Thomas A, Best NG, Lunn DJ. WinBUGS User Manual, version 1.4. 2003; http://www.mrc-bsu.cam.ac.uk/wp-content/uploads/manual14.pdf. Accessed 11/02/2020.
124. Jansen JP, Trikalinos T, Cappelleri JC, et al. Indirect Treatment Comparison/Network Meta-Analysis Study Questionnaire to Assess Relevance and Credibility to Inform Health Care Decision Making: An ISPOR-AMCP-NPC Good Practice Task Force Report. Value in Health. 2014;17(2):157-173.
125. Madan J, Stevenson MD, Cooper KL, Ades AE, Whyte S, Akehurst R. Consistency between direct and indirect trial evidence: Is direct evidence always more reliable? Value in Health. 2011;14(6):953-960.
126. Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629-634.
127. Figueira J, Greco S, Ehrgott M. Multiple Criteria Decision Analysis: State of the Art Surveys. New York: Springer; 2005.
128. Keeney RL, Raiffa H. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: Wiley; 1976.
129. Rothrock L, Yin J. Integrating Compensatory and Noncompensatory Decision-Making Strategies in Dynamic Task Environments. In: Kugler T, Smith JC, Connolly T, Son Y-J, eds. Decision Modeling and Behavior in Complex and Uncertain Environments. New York, NY: Springer New York; 2008:125-141.
130. Saint-Hilary G, Robert V, Gasparini M, Jaki T, Mozgunov P. A novel measure of drug benefit–risk assessment based on Scale Loss Score. Statistical Methods in Medical Research. 2019;28(9):2738-2753.
131. Pauly MV, McGuire TG, Barros PP. Handbook of Health Economics. Amsterdam, Netherlands: Elsevier Science & Technology; 2012.
132. Petrou S, Gray A. Economic evaluation using decision analytical modelling: design, conduct, analysis, and reporting. BMJ. 2011;342:d1766.
133. von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1943.
134. Dyer JS. Multiattribute Utility Theory (MAUT). In: Greco S, Ehrgott M, Figueira JR, eds. Multiple Criteria Decision Analysis State of the Art Surveys. Vol 1. New York, NY: Springer; 2016:285-314.
135. Phillips LD. A theory of requisite decision models. Acta Psychologica. 1984;56(1):29-48. 136. Anderson NH. Functional Measurement and Psychophysical Judgement. Psychological
Review. 1970;77(3):153-170. 137. Torrance GW, Boyle MH, Horwood SP. Application of multi-attribute utility theory to
measure social preferences for health states. Operations Research. 1982;30(6):1043-1069. 138. Whitehead SJ, Ali S. Health outcomes in economic evaluation: the QALY and utilities. British
Medical Bulletin. 2010;96(1):5-21. 139. Marsh K, Lanitis T, Neasham D, Orfanos P, Caro J. Assessing the Value of Healthcare
Interventions Using Multi-Criteria Decision Analysis: A Review of the Literature. Pharmacoeconomics. 2014;32(4):345-365.
140. Petrou S, Henderson J. Preference-Based Approaches to Measuring the Benefits of Perinatal Care. Birth. 2003;30(4):217-226.
141. Ryan M, Scott DA, Reeves C, et al. Eliciting public preferences for healthcare: a systematic review of techniques. Health technology assessment (Winchester, England). 2001;5(5):1-180.
142. Blinman P, King M, Norman R, Viney R, Stockler MR. Preferences for cancer treatments: An overview of methods and applications in oncology. Annals of Oncology. 2012;23(5):1104-1110.
143. Brett Hauber A, Fairchild AO, Reed Johnson F. Quantifying benefit-risk preferences for medical interventions: an overview of a growing empirical literature. Applied health economics and health policy. 2013;11(4):319-329.
144. Weernink MGM, Janus SIM, van Til JA, Raisch DW, van Manen JG, Ijzerman MJ. A Systematic Review to Identify the Use of Preference Elicitation Methods in Healthcare Decision Making. Pharmaceutical Medicine. 2014;28(4):175-185.
145. Huber J, Wittink DR, Fiedler JA, Miller R. The effectiveness of alternative preference elicitation procedures in predicting choice. Journal of Marketing Research. 1993;30(1):105-114.
146. Saaty RW. The analytic hierarchy process—what it is and how it is used. Mathematical Modelling. 1987;9(3):161-176.
147. Saaty TL, Vargas LG. Models, Methods, Concepts & Applications of the Analytic Hierarchy Process. Springer US; 2012.
148. Liberatore MJ, Nydick RL. The analytic hierarchy process in medical and health care decision making: A literature review. European Journal of Operational Research. 2008;189(1):194-207.
149. Lootsma FA. Scale sensitivity in the multiplicative AHP and SMART. Journal of Multi-Criteria Decision Analysis. 1993;2(2):87-110.
150. Finan JS, Hurley WJ. Transitive calibration of the AHP verbal scale. European Journal of Operational Research. 1999;112(2):367-372.
151. Laininen P, Hamalainen RP. Analyzing AHP-matrices by regression. European Journal of Operational Research. 2003;148(3):514-524.
152. Genest C, Rivest LP. A Statistical Look at Saaty's Method of Estimating Pairwise Preferences Expressed on a Ratio Scale. Journal of Mathematical Psychology. 1994;38(4):477-496.
153. Bana e Costa CA, Vansnick J-C. A critical analysis of the eigenvalue method used to derive priorities in AHP. European Journal of Operational Research. 2008;187(3):1422-1428.
154. de Jong P. A statistical approach to Saaty's scaling method for priorities. Journal of Mathematical Psychology. 1984;28(4):467-478.
155. Crawford G, Williams C. A note on the analysis of subjective judgment matrices. Journal of Mathematical Psychology. 1985;29(4):387-405.
156. Alho JM, Kangas J, Kolehmainen O. Uncertainty in expert predictions of the ecological consequences of forest plans. Journal of the Royal Statistical Society Series C: Applied Statistics. 1996;45(1):1-14.
157. Altuzarra A, Moreno-Jimenez JM, Salvador M. A Bayesian priorization procedure for AHP-group decision making. European Journal of Operational Research. 2007;182(1):367-382.
158. Bana e Costa CA, Vansnick J-C. Applications of the MACBETH Approach in the Framework of an Additive Aggregation Model. Journal of Multi-Criteria Decision Analysis. 1997;6(2):107-114.
159. Bana e Costa CA, De Corte J-M, Vansnick J-C. On the Mathematical Foundations of MACBETH. In: Greco S, Ehrgott M, Figueira JR, eds. Multiple Criteria Decision Analysis: State of the Art Surveys. New York, NY: Springer New York; 2016:421-463.
160. Dodgson J, Spackman M, Pearman A, Phillips L. Multi-criteria analysis: a manual. London School of Economics and Political Science, Department of Economic History;2009.
161. Arons AM, Krabbe PF. Probabilistic choice models in health-state valuation research: background, theories, assumptions and applications. Expert review of pharmacoeconomics & outcomes research. 2013;13(1):93-108.
162. Ryan M, Bate A, Eastmond CJ, Ludbrook A. Use of discrete choice experiments to elicit preferences. Quality in health care : QHC. 2001;10 Suppl 1:i55-60.
163. Ryan M, Gerard K, Amaya-Amaya M, eds. Using discrete choice experiments to value health and health care. Dordrecht: Springer Academic Publishers; 2008. The economics of non-market goods and resources.
164. Lancsar E, Louviere J. Conducting discrete choice experiments to inform healthcare decision making: a user's guide. Pharmacoeconomics. 2008;26(8):661-667.
165. Clark MD, Determann D, Petrou S, Moro D, de Bekker-Grob EW. Discrete Choice Experiments in Health Economics: A Review of the Literature. Pharmacoeconomics. 2014;32(9):883-902.
166. Muhlbacher A, Johnson FR. Choice Experiments to Quantify Preferences for Health and Healthcare: State of the Practice. Applied health economics and health policy. 2016;14(3):253-266.
167. Louviere J, Hensher D, Swait J. Stated choice methods: analysis and application. Cambridge: Cambridge University Press; 2000.
168. Cheu RL, Nguyen HT, Magoc T, Kreinovich V. Logit discrete choice model: A new distribution-free justification. Soft Computing. 2009;13(2):133-137.
169. McFadden D. Conditional logit analysis of qualitative choice behaviour. In: Zarembka P, ed. Frontiers in econometrics. New York: Academic Press; 1974.
170. Ben-Akiva M, Morikawa T, Shiroishi F. Analysis of the reliability of preference ranking data. Journal of Business Research. 1992;24(2):149-164.
171. Lancsar E, Louviere J, Donaldson C, Currie G, Burgess L. Best worst discrete choice experiments in health: Methods and an application. Social Science & Medicine. 2013;76:74-82.
172. Böckenholt U. Comparative judgments as an alternative to ratings: Identifying the scale origin. Psychol Methods. 2004;9(4):453-465.
173. Zhang J, Johnson FR, Mohamed AF, Hauber AB. Too many attributes: A test of the validity of combining discrete-choice and best-worst scaling data. Journal of Choice Modelling. 2015;15:1-13.
174. Chakraborty G, Ball D, Gaeth GJ, Jun S. The ability of ratings and choice conjoint to predict market shares - A Monte Carlo simulation. Journal of Business Research. 2002;55(3):237-249.
175. Marshall P, Bradlow ET. A unified approach to conjoint analysis models. Journal of the American Statistical Association. 2002;97(459):674-682.
176. Montibeller G, von Winterfeldt D. Cognitive and Motivational Biases in Decision and Risk Analysis. Risk Anal. 2015;35(7):1230-1250.
177. Steele K, Carmel Y, Cross J, Wilcox C. Uses and misuses of multicriteria decision analysis (MCDA) in environmental decision making. Risk Anal. 2009;29(1):26-33.
178. Saint-Hilary G, Cadour S, Robert V, Gasparini M. A simple way to unify multicriteria decision analysis (MCDA) and stochastic multicriteria acceptability analysis (SMAA) using a Dirichlet distribution in benefit–risk assessment. Biometrical Journal. 2017;59(3):567-578.
179. Salo AA, Hamalainen RP. On the Measurement of Preferences in the Analytic Hierarchy Process. Journal of Multi-Criteria Decision Analysis. 1997;6:309-319.
180. Edwards W. How to Use Multiattribute Utility Measurement for Social Decisionmaking. IEEE Transactions on Systems, Man, and Cybernetics. 1977;7(5):326-340.
181. Arrow KJ. Social Choice and Individual Values. New York: John Wiley & Sons; 1951.
182. Sen A. The Impossibility of a Paretian Liberal. Journal of Political Economy. 1970;78(1):152-157.
183. Keeney RL. A Group Preference Axiomatization with Cardinal Utility. Management Science. 1976;23(2):140-145.
184. Sen A. Collective choice and social welfare. 1970.
185. Keeney RL. Group Preference Axiomatization with Cardinal Utility. Management Science. 1976;23(2):140-145.
186. Harsanyi JC. Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility. Journal of Political Economy. 1955;63(4):309-321.
187. Hammond PJ. Harsanyi's Utilitarian Theorem: A Simpler Proof and Some Ethical Connotations. In: Selten R, ed. Rational Interaction: Essays in Honor of John C. Harsanyi. Berlin, Heidelberg: Springer Berlin Heidelberg; 1992:305-319.
188. Carlin BP. Bayes and empirical Bayes methods for data analysis. London: Chapman & Hall; 1996.
189. Haan P, Kemptner D, Uhlendorff A. Bayesian procedures as a numerical tool for the estimation of an intertemporal discrete choice model. Empirical Economics. 2015;49(3):1123-1141.
190. Daziano RA, Chiew E. On the effect of the prior of Bayes estimators of the willingness to pay for electric-vehicle driving range. Transportation Research Part D-Transport and Environment. 2013;21:7-13.
191. Boettger B, Thate-Waschke I-M, Bauersachs R, Kohlmann T, Wilke T. Preferences for anticoagulation therapy in atrial fibrillation: the patients' view. Journal of Thrombosis and Thrombolysis. 2015;40(4):406-415.
192. Lichtenstein GR, Waters HC, Kelly J, et al. Assessing drug treatment preferences of patients with Crohn's disease: A conjoint analysis. The Patient. 2010;3(2):113-123.
193. Hockley KA, D.; Das, S.; Hallgreen, C.; Mt-Isa, S.; Waddingham, E. ; Nicolas, R.; Talbot, S.; Stoeckert, I.; Genov, G.; Dil, Y.; Groves, J.; Johnson, R.; Lightbourne, A.; Mwangi, J.; Seal-Jones, R.; Elmachtoub, A.; Allen, C.; Thomson, A.; Lohrmann, E.; Micaleff, A.; Nixon, R.; Treacy, J.; Wise, L. PATIENT AND PUBLIC INVOLVEMENT REPORT version 1.0 - Recommendations for Patient and Public Involvement in the assessment of benefit and risk of medicines. PROTECT;2015.
194. Forman E, Peniwati K. Aggregating individual judgments and priorities with the Analytic Hierarchy Process. European Journal of Operational Research. 1998;108(1):165-169.
195. Lin C, Kou G. Bayesian revision of the individual pair-wise comparison matrices under consensus in AHP-GDM. Applied Soft Computing Journal. 2015;35:802-811.
196. Schatz NK, Fabiano GA, Cunningham CE, et al. Systematic Review of Patients’ and Parents’ Preferences for ADHD Treatment Options and Processes of Care. Patient. 2015;8(6):483-497.
197. Arroyo R, Sempere AP, Ruiz-Beato E, et al. Conjoint analysis to understand preferences of patients with multiple sclerosis for disease-modifying therapy attributes in Spain: a cross-sectional observational study. BMJ Open. 2017;7(3):e014433.
198. Garcia-Dominguez JM, Munoz D, Comellas M, Gonzalbo I, Lizan L, Polanco Sanchez C. Patient preferences for treatment of multiple sclerosis with disease-modifying therapies: a discrete choice experiment. Patient Prefer Adherence. 2016;10:1945-1956.
199. Mansfield C, Thomas N, Gebben D, Lucas M, Hauber AB. Preferences for Multiple Sclerosis Treatments: Using a Discrete-Choice Experiment to Examine Differences Across Subgroups of US Patients. Int J MS Care. 2017;19(4):172-183.
200. Poulos C, Kinter E, Yang JC, Bridges JF, Posner J, Reder AT. Patient Preferences for Injectable Treatments for Multiple Sclerosis in the United States: A Discrete-Choice Experiment. Patient. 2016;9(2):171-180.
201. Utz KS, Hoog J, Wentrup A, et al. Patient preferences for disease-modifying drugs in multiple sclerosis therapy: a choice-based conjoint analysis. Ther Adv Neurol Disord. 2014;7(6):263-275.
202. Wilson L, Loucks A, Bui C, et al. Patient centered decision making: use of conjoint analysis to determine risk-benefit trade-offs for preference sensitive treatment choices. J Neurol Sci. 2014;344(1-2):80-87.
203. Wilson LS, Loucks A, Gipson G, et al. Patient preferences for attributes of multiple sclerosis disease-modifying therapies: development and results of a ratings-based conjoint analysis. Int J MS Care. 2015;17(2):74-82.
204. Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA. Meta-analysis of multiple outcomes by regression with random effects. Statistics in Medicine. 1998;17(22):2537-2550.
205. Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine. 2002;21(11):1559-1573.
206. Johnson FR, Lancsar E, Marshall D, et al. Constructing Experimental Designs for Discrete-Choice Experiments: Report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force. Value in Health. 2013;16(1):3-13.
207. Kuhfeld W. Marketing Research Methods in SAS. Cary, NC, USA: SAS Institute Inc.; 2010.
208. Cox DR, Oakes D. Analysis of Survival Data. New York: Chapman and Hall; 1984.
209. Jackson D, White IR, Riley RD. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses. Statistics in Medicine. 2012;31(29):3805-3820.
210. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ (Clinical research ed). 2003;327(7414):557-560.
211. McLachlan G, Peel D. Finite Mixture Models. New York: Wiley; 2000.
212. Hensher DA, Rose JM, Greene WH. Applied Choice Analysis. 2nd ed. Cambridge: Cambridge University Press; 2015.
213. Boyle KJ, Holmes TP, Teisl MF, Roe B. A comparison of conjoint analysis response formats. American Journal of Agricultural Economics. 2001;83(2):441-454.
214. Pignone MP, Brenner AT, Hawley S, et al. Conjoint analysis versus rating and ranking for values elicitation and clarification in colorectal cancer screening. Journal of General Internal Medicine. 2012;27(1):45-50.
215. Wijnen BFM, Van Der Putten IM, Groothuis S, et al. Discrete-choice experiments versus rating scale exercises to evaluate the importance of attributes. Expert Review of Pharmacoeconomics and Outcomes Research. 2015;15(4):721-728.
216. Zagonari F. Choosing among weight-estimation methods for multi-criterion analysis: A case study for the design of multi-purpose offshore platforms. Applied Soft Computing Journal. 2016;39:1-10.
217. Ter Hofstede F, Kim Y, Wedel M. Bayesian prediction in hybrid conjoint analysis. Journal of Marketing Research. 2002;39(2):253-261.
218. Louviere JJ, Fox MF, Moore WL. Cross-task validity comparisons of stated preference choice models. Marketing Letters. 1993;4(3):205-213.
219. Herrera F, Herrera-Viedma E, Chiclana F. Multiperson decision-making based on multiplicative preference relations. European Journal of Operational Research. 2001;129(2):372-385.
220. Musal RM, Soyer R. Bayesian Modeling of Health State Preferences. 2010.
221. Musal RM, Soyer R, McCabe C, Kharroubi SA. Estimating the population utility function: A parametric Bayesian approach. European Journal of Operational Research. 2012;218(2):538-547.
222. Bacon L, Lenk P. Augmenting discrete-choice data to identify common preference scales for inter-subject analyses. Qme-Quantitative Marketing and Economics. 2012;10(4):453-474.
223. Leskinen P, Kangas AS, Kangas J. Rank-based modelling of preferences in multi-criteria decision making. European Journal of Operational Research. 2004;158(3):721-733.
224. Flynn TN, Louviere JJ, Peters TJ, Coast J. Best-worst scaling: What it can do for health care research and how to do it. Journal of Health Economics. 2007;26(1):171-189.
225. Marley AAJ, Louviere JJ. Some probabilistic models of best, worst, and best-worst choices. Journal of Mathematical Psychology. 2005;49(6):464-480.
226. Luyten J, Kessels R, Goos P, Beutels P. Public preferences for prioritizing preventive and curative health care interventions: a discrete choice experiment. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2015;18(2):224-233.
227. Schmieder A, Schaarschmidt M-L, Umar N, et al. Comorbidities significantly impact patients' preferences for psoriasis treatments. Journal of the American Academy of Dermatology. 2012;67(3):363-372.
228. Najafzadeh M, Gagne JJ, Choudhry NK, Polinski JM, Avorn J, Schneeweiss SS. Patients' Preferences in Anticoagulant Therapy Discrete Choice Experiment. Circulation-Cardiovascular Quality and Outcomes. 2014;7(6):912-919.
229. EMA. Use of multiple sclerosis medicine Lemtrada restricted while EMA review is ongoing. 2019; https://www.ema.europa.eu/en/news/use-multiple-sclerosis-medicine-lemtrada-restricted-while-ema-review-ongoing. Accessed 16 July 2019.
230. Cooper NJ, Sutton AJ, Abrams KR, Turner D, Wailoo A. Comprehensive decision analytical modelling in economic evaluation: a Bayesian approach. Health Economics. 2004;13(3):203-226.
231. Sun L, van Kooten GC. Comparing Fuzzy and Probabilistic Approaches to Preference Uncertainty in Non-Market Valuation. Environmental & Resource Economics. 2009;42(4):471-489.
232. EMA. ICH guideline E3 on structure and content of clinical study reports. 1 July 1996.
233. Mott DJ, Najafzadeh M. Whose preferences should be elicited for use in health-care decision-making? A case study using anticoagulant therapy. Expert Review of Pharmacoeconomics & Outcomes Research. 2016;16(1):33-39.
234. Keeney RL, Kirkwood CW. Group Decision Making Using Cardinal Social Welfare Functions. Management Science. 1975;22(4):430-437.
235. Greco S, Kadzinski M, Mousseau V, Slowinski R. Robust ordinal regression for multiple criteria group decision: UTA(GMS)-GROUP and UTADIS(GMS)-GROUP. Decision Support Systems. 2012;52(3):549-561.
236. Hahn ED. Judgmental consistency and consensus in stochastic multicriteria decision making. Expert Systems with Applications. 2010;37(5):3784-3791.
237. Kunsch PL. A statistical multi-criteria procedure with stochastic preferences. International Journal of Multicriteria Decision Making. 2010;1(1):49-73.
238. Moreno-Jiménez JM, Salvador M, Gargallo P, Altuzarra A. Systemic decision making in AHP: a Bayesian approach. Annals of Operations Research. 2014.
239. Caster O. Benefit-Risk Assessment in Pharmacovigilance. In: Bate A, ed. Evidence-Based Pharmacovigilance: Clinical and Quantitative Aspects. New York, NY: Springer New York; 2018:233-257.
240. Mukhopadhyay S, Dilley K, Oladipo A, Jokinen J. Hierarchical Bayesian Benefit–Risk Modeling and Assessment Using Choice Based Conjoint. Statistics in Biopharmaceutical Research. 2019;11(1):52-60.
241. Robert CP, Elvira V, Tawn N, Wu C. Accelerating MCMC algorithms. Wiley Interdisciplinary Reviews: Computational Statistics. 2018;10(5):e1435.
242. Daziano RA. Inference on mode preferences, vehicle purchases, and the energy paradox using a Bayesian structural choice model. Transportation Research Part B: Methodological. 2015;76:1-26.
243. Goossens LM, Utens CM, Smeenk FW, Donkers B, van Schayck OC, Rutten-van Molken MP. Should I stay or should I go home? A latent class analysis of a discrete choice experiment on hospital-at-home. Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2014;17(5):588-596.
244. Magor TJ, Coote LV. Latent variables as a proxy for inherent preferences: A test of antecedent volition. Journal of Choice Modelling. 2014;13:24-36.
Appendices
Appendix A
Appendix A. Source data for RRMS case study
1 Clinical evidence synthesis
Network diagrams by outcome
Relapse rate
Relapse-free proportion
Disability progression confirmed 3 months later
Disability progression confirmed 6 months later
ALT above ULN
ALT above 3x ULN
ALT above 5x ULN
Serious GI disorders
Serious bradycardia and macular edema
DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA (IM) = intramuscular interferon beta-1a, IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide.
Raw arm-level data
The tables below show the raw trial data for each of the RRMS case study outcomes in turn. N = total number of patients in arm, n = number of patients experiencing given binary outcome, se = standard error. * indicates that the value was estimated based on other reported quantities.
ANNUALISED RELAPSE RATE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(se) Drug N Estimate(se) Drug N Estimate(se)
BRAVO 2014 3 PL 450 0.34 (0.03) IA (IM) 447 0.26 (0.02) LQ 434 0.28 (0.03)
CONFIRM 2012 3 PL 363 0.4 (0.04) DF 359 0.22 (0.025) GA 350 0.29 (0.03)
ALLEGRO 2012 2 PL 556 0.39 (0.03) LQ 550 0.3 (0.02)
BECOME 2009 2 GA 39 0.33 (0.101*) IB 36 0.37 (0.112*)
BEYOND 2009 2 GA 448 0.34 (0.03*) IB 897 0.36 (0.022*)
DEFINE 2012 2 PL 408 0.36 (0.035) DF 409 0.17 (0.02)
FREEDOMS 2010 2 PL 418 0.4 (0.033) FM 425 0.18 (0.02)
FREEDOMS II 2014 2 PL 355 0.4 (0.035) FM 358 0.21 (0.02)
INCOMIN 2002 2 IB 94 0.5 (0.071) IA (IM) 88 0.7 (0.094)
JOHNSON 1995 2 PL 126 0.84 (0.09*) GA 125 0.59 (0.076*)
MSCRG 1996 2 PL 143 0.82 (0.083*) IA (IM) 158 0.67 (0.072*)
PRISMS 1998 2 PL 187 1.28 (0.091*) IA (SC) 189 1.73 (0.107*)
REGARD 2008 2 IA (SC) 381 0.3 (0.031*) GA 375 0.29 (0.03*)
TEMSO 2011 2 PL 363 0.54 (0.038) TF 358 0.37 (0.033)
BORNSTEIN 1987 2 PL 23 1.35 (0.266*) GA 25 0.3 (0.965*)
IFNB 1993 2 PL 112 1.27 (0.117*) IB 115 0.84 (0.095*)

Note: for Model 0, the number of relapse events and person-years are the required data items. As these were generally unreported, they were constructed so as to match the estimated annualised rates, i.e.

#events = annualised relapse rate × person-years,

with person-years set to (4/3)N (i.e. an assumption that each participant contributes two thirds of the 2-year study period on average).
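The reconstruction above can be sketched as follows; it simply inverts the annualised-rate definition under the stated two-thirds-exposure assumption (the function name is hypothetical):

```python
def reconstruct_counts(annualised_rate, N):
    """Reconstruct the event count and person-years required by Model 0
    from a reported annualised relapse rate, assuming each participant
    contributes two thirds of the 2-year study period on average."""
    person_years = 4 / 3 * N                 # person-years = (4/3) * N
    events = annualised_rate * person_years  # events = rate * person-years
    return events, person_years

# BRAVO 2014, placebo arm: annualised rate 0.34 with N = 450
events, person_years = reconstruct_counts(0.34, 450)
print(round(events), round(person_years))  # 204 600
```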
RELAPSE-FREE PROPORTION Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 275 (450) IA (IM) 308 (447) LQ 286 (434)
CONFIRM 2012 3 PL 214 (363) DF 255 (359) GA 238 (350)
ALLEGRO 2012 2 PL 290 (556) LQ 346 (550)
BEYOND 2009 2 GA 327 (448) IB 655 (897)
DEFINE 2012 2 PL 220 (408) DF 299 (410)
FREEDOMS 2010 2 PL 191 (418) FM 299 (425)
FREEDOMS II 2014 2 PL 187 (355) FM 256 (358)
INCOMIN 2002 2 IB 49 (96) IA (IM) 33 (92)
JOHNSON 1995 2 PL 34 (126) GA 42 (125)
MSCRG 1996 2 PL 23 (87) IA (IM) 32 (85)
PRISMS 1998 2 PL 30 (187) IA (SC) 59 (184)
REGARD 2008 2 IA (SC) 239 (386) GA 234 (378)
TEMSO 2011 2 PL 166 (363) TF 202 (358)
BORNSTEIN 1987 2 PL 6 (23) GA 14 (25)
IFNB 1993 2 PL 18 (112) IB 36 (115)
DISABILITY PROGRESSION CONFIRMED 3 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 60 (450) IA (IM) 47 (447) LQ 42 (434)
CONFIRM 2012 3 PL 62 (363) DF 47 (359) GA 56 (350)
ALLEGRO 2012 2 PL 87 (556) LQ 61 (550)
BEYOND 2009 2 GA 90 (448) IB 188 (897)
DEFINE 2012 2 PL 110 (408) DF 65 (409)
FREEDOMS 2010 2 PL 101 (418) FM 75 (425)
FREEDOMS II 2014 2 PL 103 (355) FM 91 (358)
JOHNSON 1995 2 PL 31 (126) GA 27 (125)
PRISMS 1998 2 PL 71 (187) IA (SC) 51 (189)
TEMSO 2011 2 PL 99 (363) TF 72 (358)
BORNSTEIN 1987 2 PL 11 (23) GA 5 (25)
DISABILITY PROGRESSION CONFIRMED 6 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 46 (450) IA (IM) 35 (447) LQ 28 (434)
CONFIRM 2012 3 PL 45 (363) DF 28 (359) GA 38 (350)
ALLEGRO 2012 2 PL 78 (556) LQ 54 (550)
FREEDOMS 2010 2 PL 79 (418) FM 53 (425)
FREEDOMS II 2014 2 PL 63 (355) FM 49 (358)
INCOMIN 2002 2 IB 13 (96) IA (IM) 28 (92)
MSCRG 1996 2 PL 50 (143) IA (IM) 35 (158)
REGARD 2008 2 IA (SC) 45 (386) GA 33 (378)
ALT ABOVE UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 84 (415) IA (IM) 131 (413) LQ 127 (384)
CONFIRM 2012 3 PL 149 (362) DF 167 (355) GA 129 (346)
ALLEGRO 2012 2 PL 99 (515) LQ 175 (504)
BEYOND 2009 2 GA 16 (445) IB 99 (888)
FREEDOMS II 2014 2 PL 18 (355) FM 62 (358)
PRISMS 1998 2 PL 2 (187) IA (SC) 10 (184)
REGARD 2008 2 IA (SC) 21 (381) GA 5 (375)
TEMSO 2011 2 PL 129 (360) TF 205 (358)
ALT ABOVE 3x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 10 (415) IA (IM) 11 (413) LQ 16 (384)
CONFIRM 2012 3 PL 23 (362) DF 20 (355) GA 24 (346)
ALLEGRO 2012 2 PL 8 (515) LQ 24 (504)
DEFINE 2012 2 PL 12.24 (408) DF 24.6 (410)
FREEDOMS 2010 2 PL 7 (418) FM 36 (425)
FREEDOMS II 2014 2 PL 12 (355) FM 33 (358)
TEMSO 2011 2 PL 24 (360) TF 24 (358)
ALT ABOVE 5x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
BRAVO 2014 3 PL 7 (415) IA (IM) 5 (413) LQ 4 (384)
CONFIRM 2012 3 PL 13 (363) DF 7 (355) GA 10 (346)
FREEDOMS 2010 2 PL 4 (418) FM 8 (425)
FREEDOMS II 2014 2 PL 4 (355) FM 8 (358)
SERIOUS GASTROINTESTINAL DISORDERS Arm 1 Arm 2 Arm 3
Study # arms Drug n(N) Drug n(N) Drug n(N)
CONFIRM 2012 3 PL 0 (363) DF 4 (359) GA 0 (351)
ALLEGRO 2012 2 PL 1 (556) LQ 8 (550)
DEFINE 2012 2 PL 0 (408) DF 4 (410)
TEMSO 2011 2 PL 1 (360) TF 8 (358)
SERIOUS BRADYCARDIA Arm 1 Arm 2
Study # arms Drug n(N) Drug n(N)
FREEDOMS 2010 2 PL 1 (418) FM 4 (425)
FREEDOMS II 2014 2 PL 1 (355) FM 0 (358)
MACULAR EDEMA Arm 1 Arm 2
Study # arms Drug n(N) Drug n(N)
FREEDOMS 2010 2 PL 0 (418) FM 0 (425)
FREEDOMS II 2014 2 PL 0 (355) FM 1 (358)
Arm-level data on Normal scale
The tables below show the transformed Normal trial data for each of the RRMS case study outcomes in turn. The uncertainty is described using variances (which relate to the distribution of each outcome within the study arm) rather than standard errors (which relate to the sampling distribution of the mean). The two are of course linked by the relation standard error = √(variance/N).
DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a,
IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LQ = laquinimod, TF = teriflunomide. N =
total number of patients in arm, n = number of patients experiencing given binary outcome, va=variance. *
indicates that the value was estimated based on other reported quantities.
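For the binary outcomes, the tabulated estimates and variances are consistent with the empirical log-odds log(n/(N−n)) paired with a within-arm variance of N(1/n + 1/(N−n)), so that the standard error is √(1/n + 1/(N−n)). A minimal sketch under that assumption (the function name is hypothetical):

```python
import math

def log_odds_normal(n, N):
    """Convert a binary arm outcome (n events out of N patients) to a
    Normal approximation on the log-odds scale.

    Returns (estimate, variance), where the variance describes the
    within-arm distribution, so that se = sqrt(variance / N)."""
    estimate = math.log(n / (N - n))
    se = math.sqrt(1 / n + 1 / (N - n))  # standard error of the log-odds
    variance = se ** 2 * N               # within-arm variance = se^2 * N
    return estimate, variance

# FREEDOMS 2010, placebo arm: 191 of 418 patients relapse-free
est, va = log_odds_normal(191, 418)
print(round(est, 2), round(va, 2))  # -0.17 4.03
```

These values match the corresponding row of the "log odds of avoiding relapse" table.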
LOG ANNUALISED RELAPSE RATE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -1.08 (3.52) IA (IM) 447 -1.35 (2.66) LQ 434 -1.27 (5.02)
CONFIRM 2012 3 PL 363 -0.92 (3.69) DF 359 -1.51 (4.56) GA 350 -1.24 (4.02)
ALLEGRO 2012 2 PL 556 -0.94 (3.3) LQ 550 -1.2 (2.45)
BECOME 2009 2 GA 39 -1.11 (3.87) IB 36 -0.99 (3.44)
BEYOND 2009 2 GA 448 -1.08 (3.57) IB 897 -1.02 (3.37)
DEFINE 2012 2 PL 408 -1.02 (3.89) DF 409 -1.77 (4.38)
FREEDOMS 2010 2 PL 418 -0.92 (2.85) FM 425 -1.71 (4.06)
FREEDOMS II 2014 2 PL 355 -0.92 (2.75) FM 358 -1.56 (3.47)
INCOMIN 2002 2 IB 94 -0.69 (2.45) IA (IM) 88 -0.36 (1.75)
JOHNSON 1995 2 PL 126 -0.17 (1.45) GA 125 -0.53 (2.07)
MSCRG 1996 2 PL 143 -0.2 (1.48) IA (IM) 158 -0.4 (1.82)
PRISMS 1998 2 PL 187 0.25 (0.95) IA (SC) 189 0.55 (0.7)
REGARD 2008 2 IA (SC) 381 -1.2 (4.12) GA 375 -1.24 (4.13)
TEMSO 2011 2 PL 363 -0.62 (1.81) TF 358 -0.99 (2.86)
BORNSTEIN 1987 2 PL 23 0.3 (0.92) GA 25 -1.2 (4.43)
IFNB 1993 2 PL 112 0.24 (0.96) IB 115 -0.17 (1.45)
LOG ODDS OF AVOIDING RELAPSE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 0.45 (4.21) IA (IM) 447 0.8 (4.67) LQ 434 0.66 (4.45)
CONFIRM 2012 3 PL 363 0.36 (4.13) DF 359 0.9 (4.86) GA 350 0.75 (4.6)
ALLEGRO 2012 2 PL 556 0.09 (4.01) LQ 550 0.53 (4.29)
BEYOND 2009 2 GA 448 0.99 (5.07) IB 897 1 (5.08)
DEFINE 2012 2 PL 408 0.16 (4.02) DF 409 0.99 (5.06)
FREEDOMS 2010 2 PL 418 -0.17 (4.03) FM 425 0.86 (4.79)
FREEDOMS II 2014 2 PL 355 0.11 (4.01) FM 358 0.92 (4.91)
INCOMIN 2002 2 IB 94 0.04 (4) IA (IM) 88 -0.58 (4.35)
JOHNSON 1995 2 PL 126 -1 (5.08) GA 125 -0.68 (4.48)
MSCRG 1996 2 PL 143 -1.02 (5.14) IA (IM) 158 -0.5 (4.26)
PRISMS 1998 2 PL 187 -1.66 (7.42) IA (SC) 189 -0.75 (4.59)
REGARD 2008 2 IA (SC) 381 0.49 (4.24) GA 375 0.49 (4.24)
TEMSO 2011 2 PL 363 -0.17 (4.03) TF 358 0.26 (4.07)
BORNSTEIN 1987 2 PL 23 -1.04 (5.19) GA 25 0.24 (4.06)
IFNB 1993 2 PL 112 -1.65 (7.41) IB 115 -0.79 (4.65)
LOG ODDS OF DISABILITY PROGRESSION, CONFIRMED 3 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -1.87 (8.65) IA (IM) 447 -2.14 (10.63) LQ 434 -2.23 (11.44)
CONFIRM 2012 3 PL 363 -1.58 (7.06) DF 359 -1.89 (8.79) GA 350 -1.66 (7.44)
ALLEGRO 2012 2 PL 556 -1.68 (7.58) LQ 550 -2.08 (10.14)
BEYOND 2009 2 GA 448 -1.38 (6.23) IB 897 -1.33 (6.04)
DEFINE 2012 2 PL 408 -1 (5.08) DF 409 -1.67 (7.48)
FREEDOMS 2010 2 PL 418 -1.14 (5.46) FM 425 -1.54 (6.88)
FREEDOMS II 2014 2 PL 355 -0.89 (4.86) FM 358 -1.08 (5.27)
JOHNSON 1995 2 PL 126 -1.12 (5.39) GA 125 -1.29 (5.91)
PRISMS 1998 2 PL 187 -0.49 (4.25) IA (SC) 189 -1 (5.08)
TEMSO 2011 2 PL 363 -0.98 (5.04) TF 358 -1.38 (6.22)
BORNSTEIN 1987 2 PL 23 -0.09 (4.01) GA 25 -1.39 (6.25)
LOG ODDS OF DISABILITY PROGRESSION, CONFIRMED 6 MONTHS LATER Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -2.17 (10.9) IA (IM) 447 -2.47 (13.86) LQ 434 -2.67 (16.57)
CONFIRM 2012 3 PL 363 -1.96 (9.21) DF 359 -2.47 (13.91) GA 350 -2.11 (10.33)
ALLEGRO 2012 2 PL 556 -1.81 (8.29) LQ 550 -2.22 (11.29)
FREEDOMS 2010 2 PL 418 -1.46 (6.52) FM 425 -1.95 (9.16)
FREEDOMS II 2014 2 PL 355 -1.53 (6.85) FM 358 -1.84 (8.46)
INCOMIN 2002 2 IB 94 -1.85 (8.54) IA (IM) 88 -0.83 (4.72)
MSCRG 1996 2 PL 143 -0.62 (4.4) IA (IM) 158 -1.26 (5.8)
REGARD 2008 2 IA (SC) 381 -2.03 (9.71) GA 375 -2.35 (12.55)
LOG ODDS OF ALT ABOVE UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -1.37 (6.19) IA (IM) 447 -0.77 (4.62) LQ 434 -0.7 (4.52)
CONFIRM 2012 3 PL 363 -0.36 (4.13) DF 359 -0.12 (4.01) GA 350 -0.52 (4.28)
ALLEGRO 2012 2 PL 556 -1.44 (6.44) LQ 550 -0.63 (4.41)
BEYOND 2009 2 GA 448 -3.29 (28.85) IB 897 -2.08 (10.1)
FREEDOMS II 2014 2 PL 355 -2.93 (20.78) FM 358 -1.56 (6.98)
PRISMS 1998 2 PL 187 -4.53 (94.51) IA (SC) 189 -2.86 (19.46)
REGARD 2008 2 IA (SC) 381 -2.84 (19.2) GA 375 -4.3 (76.01)
TEMSO 2011 2 PL 363 -0.58 (4.35) TF 358 0.29 (4.09)
LOG ODDS OF ALT ABOVE 3x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -3.7 (42.52) IA (IM) 447 -3.6 (38.57) LQ 434 -3.14 (25.04)
CONFIRM 2012 3 PL 363 -2.69 (16.81) DF 359 -2.82 (18.81) GA 350 -2.6 (15.49)
ALLEGRO 2012 2 PL 556 -4.15 (65.39) LQ 550 -3 (22.05)
DEFINE 2012 2 PL 408 -3.48 (34.36) DF 409 -2.75 (17.73)
FREEDOMS 2010 2 PL 418 -4.07 (60.73) FM 425 -2.38 (12.9)
FREEDOMS II 2014 2 PL 355 -3.35 (30.62) FM 358 -2.29 (11.95)
TEMSO 2011 2 PL 363 -2.64 (16.07) TF 358 -2.63 (15.99)
LOG ODDS OF ALT ABOVE 5x UPPER LIMIT OF NORMAL RANGE Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
BRAVO 2014 3 PL 450 -4.07 (60.3) IA (IM) 447 -4.4 (83.61) LQ 434 -4.55 (97.01)
CONFIRM 2012 3 PL 363 -3.29 (28.96) DF 359 -3.91 (51.73) GA 350 -3.51 (35.63)
FREEDOMS 2010 2 PL 418 -4.64 (105.51) FM 425 -3.95 (54.14)
FREEDOMS II 2014 2 PL 355 -4.47 (89.76) FM 358 -3.78 (45.77)
The variances given for the outcomes below are calculated as (0.025 + p)(0.975 − p) × 100/N (where p is the estimated risk), as per II.6.1.5.
RISK OF SERIOUS GASTROINTESTINAL DISORDERS Arm 1 Arm 2 Arm 3
Study # arms Drug N Estimate(va) Drug N Estimate(va) Drug N Estimate(va)
CONFIRM 2012 3 PL 363 0 (0.0067) DF 359 0.011 (0.0097) GA 350 0 (0.0069)
ALLEGRO 2012 2 PL 556 0.002 (0.0047) LQ 550 0.015 (0.0069)
DEFINE 2012 2 PL 408 0 (0.006) DF 409 0.01 (0.0082)
TEMSO 2011 2 PL 363 0.003 (0.0075) TF 358 0.022 (0.0126)
RISK OF SERIOUS BRADYCARDIA Arm 1 Arm 2
Study # arms Drug N Estimate(va) Drug N Estimate(va)
FREEDOMS 2010 2 PL 418 0.002 (0.0064) FM 425 0.009 (0.0078)
FREEDOMS II 2014 2 PL 355 0.003 (0.0076) FM 358 0 (0.0068)
RISK OF MACULAR EDEMA Arm 1 Arm 2
Study # arms Drug N Estimate(va) Drug N Estimate(va)
FREEDOMS 2010 2 PL 418 0 (0.0058) FM 425 0 (0.0057)
FREEDOMS II 2014 2 PL 355 0 (0.0069) FM 358 0.003 (0.0075)
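The variance formula above can be applied directly; note that a zero-event arm still receives a non-zero variance (the function name is hypothetical):

```python
def risk_variance(n, N):
    """Variance for a rare-event risk estimate on the probability scale,
    using the formula (0.025 + p)(0.975 - p) * 100 / N stated above,
    where p = n / N is the estimated risk."""
    p = n / N
    return (0.025 + p) * (0.975 - p) * 100 / N

# CONFIRM 2012, placebo arm: 0 serious GI events out of 363 patients
print(round(risk_variance(0, 363), 4))  # 0.0067
# TEMSO 2011, teriflunomide arm: 8 serious GI events out of 358 patients
print(round(risk_variance(8, 358), 4))  # 0.0126
```

Both values match the corresponding rows of the serious gastrointestinal disorders table.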
Outcome proportionality plots
Relapse-free proportion vs annualised relapse rate
These effects are assumed to occur in proportion in all mapping strategies used.
Disability progression confirmed 6 months later vs 3 months later
These effects are assumed to occur in proportion in all mapping strategies used.
ALT above 3x ULN vs ALT above ULN
These effects are assumed to occur in proportion in all the mapping strategies used.
ALT above 5x ULN vs ALT above ULN
These effects are assumed to occur in proportion in all mapping strategies used.
Disability progression confirmed 3 months later vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group and two-group models.
Disability progression confirmed 6 months later vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group and two-group models.
ALT above ULN vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group models.
ALT above 3xULN vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group models.
ALT above 5xULN vs annualised relapse rate
These effects are assumed to occur in proportion in the one-group models.
2 PROTECT datasets
Investigator ratings
The raw pairwise ratings in the PROTECT investigator ratings dataset are shown in the table below.
PML = progressive multifocal leukoencephalopathy.
Pairwise comparison Participant 1 Participant 2 Participant 3
Avoid a relapse vs avoid a disability progression 0.7 0.7 0.6
Avoid a disability progression vs avoid PML 0.1 0.9 0.9
Daily subcutaneous -> daily oral vs avoid PML 0.01 0.1 0.1
Avoid herpes reactivation vs avoid PML 0.12 0.2 0.3
Avoid liver enzyme elevation vs avoid PML 0.2 0.2 0.2
Avoid seizures vs avoid PML 0.1 0.1 0.1
Avoid congenital abnormalities vs avoid PML 0.1 0.1 0.1
Avoid infusion/injection reactions vs avoid PML 0.05 0.05 0.05
Avoid allergic/hypersensitivity reactions vs avoid infusion/injection reactions 0.4 0.4 0.89
Avoid flu-like reactions vs avoid infusion/injection reactions 0.4 0.4 1.11
Daily subcutaneous -> daily oral vs daily subcutaneous -> monthly intravenous infusion 0.7 0.7 0.7
Daily subcutaneous -> daily oral vs daily subcutaneous -> weekly intramuscular 0.5 0.5 0.5
Patient ratings
The relative ratings for administration modes were the only data used from the PROTECT patient ratings study and are shown in the table below. NA indicates missing values, i.e. questions left unanswered by the respondent. As the AHP method was used to elicit these ratings, they all take a value of 1, 3, 5, 7 or 9, or a reciprocal thereof.
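The constraint that every rating lies on this 1-9 scale (with the table's decimals, e.g. 0.333333, being rounded reciprocals) can be checked with a small sketch (the function name and tolerance are hypothetical):

```python
# Saaty's fundamental AHP scale: 1, 3, 5, 7, 9 and their reciprocals
SAATY = [1, 3, 5, 7, 9]
SCALE_VALUES = [float(k) for k in SAATY] + [1 / k for k in SAATY]

def on_saaty_scale(x, tol=1e-4):
    """Check whether an elicited rating lies on the AHP verbal scale;
    a small tolerance absorbs the six-decimal rounding in the table."""
    return any(abs(x - v) < tol for v in SCALE_VALUES)

print(on_saaty_scale(0.333333), on_saaty_scale(0.7))  # True False
```

By contrast, the investigator ratings above (e.g. 0.7, 0.12) were elicited freely and need not fall on this scale.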
The six pairwise comparisons (columns in order) are: (1) monthly infusion vs weekly intramuscular; (2) daily subcutaneous vs monthly infusion; (3) weekly intramuscular vs daily subcutaneous; (4) daily oral vs weekly intramuscular; (5) daily subcutaneous vs daily oral; (6) daily oral vs monthly infusion.

Participant (1) (2) (3) (4) (5) (6)
1 0.111111 0.142857 7 3 0.333333 3
2 3 0.2 0.111111 5 5 5
3 3 0.333333 9 3 0.333333 9
4 NA NA 9 3 0.333333 7
5 5 0.333333 NA 3 0.333333 3
6 3 0.333333 3 0.333333 0.333333 0.333333
7 3 0.333333 5 3 0.333333 9
8 3 0.333333 3 1 1 0.333333
9 1 1 1 1 1 1
10 5 0.2 9 7 0.142857 1
11 9 0.111111 1 1 1 1
12 0.111111 3 0.2 5 1 5
13 1 1 1 5 0.2 5
14 3 0.333333 3 3 0.333333 3
15 5 0.333333 9 3 0.333333 0.333333
16 3 0.333333 3 3 0.333333 3
17 3 0.333333 1 3 0.333333 3
18 9 0.111111 1 3 0.333333 3
19 3 0.333333 0.2 3 0.333333 1
20 5 0.2 9 3 0.333333 3
21 3 0.333333 1 3 0.333333 0.2
22 9 9 0.142857 3 0.2 3
23 0.142857 0.142857 7 3 0.333333 3
24 0.2 5 9 3 0.333333 3
25 0.333333 0.333333 0.333333 3 0.333333 3
26 5 0.333333 0.2 3 0.2 5
27 3 0.333333 9 0.111111 0.111111 0.333333
28 3 0.333333 7 3 0.333333 0.2
29 1 0.142857 9 5 0.2 5
30 3 0.333333 3 3 0.333333 3
31 3 0.333333 9 5 0.333333 0.333333
32 NA NA NA NA NA NA
33 9 7 0.142857 3 0.142857 7
34 0.142857 0.2 5 3 0.333333 3
35 3 0.333333 5 3 0.333333 1
36 0.111111 0.111111 9 0.111111 9 0.111111
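The six pairwise ratings per participant determine a full 4x4 reciprocal comparison matrix over the four administration modes. Purely as an illustration of how such ratings are conventionally aggregated (the analysis in this thesis treats the ratings within a Bayesian model rather than by classical AHP aggregation), the sketch below builds participant 1's matrix and derives priority weights by the standard geometric-mean method; the mode ordering and variable names are assumptions.

```python
import math

# Administration modes, in an assumed order (illustrative only)
modes = ["monthly_iv", "weekly_im", "daily_sc", "daily_oral"]

# Participant 1's six AHP ratings: (row mode judged against column mode) -> value
ratings = {
    ("monthly_iv", "weekly_im"): 1 / 9,   # 0.111111
    ("daily_sc", "monthly_iv"): 1 / 7,    # 0.142857
    ("weekly_im", "daily_sc"): 7,
    ("daily_oral", "weekly_im"): 3,
    ("daily_sc", "daily_oral"): 1 / 3,    # 0.333333
    ("daily_oral", "monthly_iv"): 3,
}

# Build the reciprocal matrix: a[i][j] = rating of i over j, a[j][i] = 1/a[i][j]
n = len(modes)
a = [[1.0] * n for _ in range(n)]
for (r, c), v in ratings.items():
    i, j = modes.index(r), modes.index(c)
    a[i][j], a[j][i] = v, 1.0 / v

# Geometric-mean (logarithmic least squares) priority weights, normalised to sum to 1
gm = [math.prod(row) ** (1.0 / n) for row in a]
weights = [g / sum(gm) for g in gm]
```

For participant 1 this assigns the highest weight to daily oral administration, consistent with the raw ratings above.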
Patient choices
The PROTECT patient choice data consist of 1755 individual choices, generated by 124 individuals
presented with 16 choices each. Showing the data in full at the individual level would be
prohibitively long, so they are shown below in collapsed form, i.e. with one row per choice set (of
which 64 were used in total) giving, for each choice set, the number of participants who made each
choice out of those who responded. Missing responses have been excluded. This is the form in which
the data were analysed, and it is equivalent to the full data provided one is not concerned with
preference variability at the individual level.
N = total respondents, n = number of respondents choosing option B, ARR = annualised relapse rate,
DP = disability progression, PML = progressive multifocal leukoencephalopathy, A/H = allergic-
hypersensitivity reactions, SA = serious allergic reactions, DEP = depression.
Choice set | N | n | Option A: ARR | DP risk | PML risk | A/H risk | SA risk | DEP risk | Option B: ARR | DP risk | PML risk | A/H risk | SA risk | DEP risk
1 25 21 1 0.25 0 0 0 0.1 0.75 0.1 0 0.5 0 0.2
2 25 12 1 0.1 0.003 0.5 0.02 0.2 0.75 0.25 0 0.5 0.02 0.1
3 24 2 1 0.1 0 0 0.02 0.1 1 0.25 0.003 0.5 0 0.1
4 25 25 0.75 0.25 0 0.5 0.02 0.2 1 0.1 0 0 0 0.2
5 25 18 0.75 0.25 0.003 0.5 0 0.2 0.75 0.25 0.003 0 0.02 0.1
6 24 8 1 0.1 0 0.5 0 0.1 0.75 0.1 0.003 0 0.02 0.1
7 25 1 0.75 0.1 0.003 0 0 0.2 0.75 0.25 0.003 0.5 0.02 0.1
8 25 6 1 0.1 0.003 0 0 0.2 1 0.25 0 0 0.02 0.2
9 27 21 1 0.25 0.003 0 0 0.2 1 0.25 0 0.5 0 0.1
10 27 2 0.75 0.1 0 0 0.02 0.2 1 0.25 0.003 0 0.02 0.1
11 27 15 1 0.25 0 0.5 0 0.2 0.75 0.25 0 0 0.02 0.2
12 27 20 0.75 0.1 0.003 0.5 0 0.2 0.75 0.1 0 0.5 0.02 0.1
13 27 9 1 0.1 0 0.5 0.02 0.2 0.75 0.1 0.003 0.5 0.02 0.1
14 27 15 0.75 0.25 0.003 0 0 0.1 1 0.1 0.003 0 0.02 0.2
15 27 19 1 0.25 0 0 0.02 0.1 0.75 0.1 0.003 0 0 0.1
16 27 4 0.75 0.1 0.003 0.5 0.02 0.2 1 0.25 0.003 0.5 0 0.2
17 32 23 1 0.25 0.003 0 0 0.2 1 0.25 0 0.5 0 0.1
18 32 2 1 0.25 0 0 0 0.2 1 0.25 0 0 0.02 0.1
19 32 13 1 0.25 0 0.5 0 0.2 0.75 0.25 0 0 0.02 0.2
20 31 20 0.75 0.1 0.003 0.5 0 0.2 0.75 0.1 0 0.5 0.02 0.1
21 31 13 1 0.1 0 0.5 0.02 0.2 0.75 0.1 0.003 0.5 0.02 0.1
22 32 16 0.75 0.25 0.003 0 0 0.1 1 0.1 0.003 0 0.02 0.2
23 30 24 1 0.25 0 0 0.02 0.1 0.75 0.1 0.003 0 0 0.1
24 31 6 0.75 0.1 0.003 0.5 0.02 0.2 1 0.25 0.003 0.5 0 0.2
25 30 20 1 0.25 0 0 0 0.1 0.75 0.1 0 0.5 0 0.2
26 30 18 1 0.1 0.003 0.5 0.02 0.2 0.75 0.25 0 0.5 0.02 0.1
27 30 2 1 0.1 0 0 0.02 0.1 1 0.25 0.003 0.5 0 0.1
28 29 28 0.75 0.25 0 0.5 0.02 0.2 1 0.1 0 0 0 0.2
29 29 19 0.75 0.25 0.003 0.5 0 0.2 0.75 0.25 0.003 0 0.02 0.1
30 30 10 1 0.1 0 0.5 0 0.1 0.75 0.1 0.003 0 0.02 0.1
31 30 4 0.75 0.1 0.003 0 0 0.2 0.75 0.25 0.003 0.5 0.02 0.1
32 30 9 1 0.1 0.003 0 0 0.2 1 0.25 0 0 0.02 0.2
33 23 20 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0.003 0.5 0 0.1
34 22 8 0.75 0.25 0 0.5 0 0.2 1 0.25 0 0.5 0.02 0.1
35 23 14 1 0.25 0 0.5 0.02 0.2 1 0.25 0.003 0 0 0.1
36 22 2 1 0.1 0.003 0 0.02 0.2 0.75 0.25 0.003 0.5 0.02 0.2
37 23 1 1 0.1 0 0.5 0 0.2 1 0.25 0.003 0.5 0.02 0.1
38 21 11 0.75 0.1 0.003 0 0.02 0.1 0.75 0.25 0 0 0 0.1
39 23 22 1 0.1 0.003 0.5 0 0.1 0.75 0.1 0 0 0 0.2
40 23 4 0.75 0.1 0 0.5 0 0.1 1 0.1 0 0 0.02 0.2
41 22 18 0.75 0.25 0 0 0.02 0.1 1 0.1 0 0.5 0.02 0.1
42 22 20 0.75 0.1 0.003 0.5 0.02 0.2 1 0.1 0.003 0 0 0.1
43 22 12 1 0.1 0.003 0.5 0 0.2 0.75 0.25 0 0.5 0 0.1
44 22 3 0.75 0.25 0 0 0 0.2 0.75 0.25 0.003 0.5 0 0.1
45 22 19 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0 0.5 0.02 0.2
46 22 3 1 0.1 0.003 0.5 0.02 0.1 1 0.25 0.003 0 0.02 0.2
47 22 0 1 0.1 0 0 0 0.1 0.75 0.25 0.003 0 0 0.2
48 22 17 1 0.25 0 0 0 0.2 0.75 0.1 0 0 0.02 0.1
49 30 23 0.75 0.25 0 0 0.02 0.1 1 0.1 0 0.5 0.02 0.1
50 30 29 0.75 0.1 0.003 0.5 0.02 0.2 1 0.1 0.003 0 0 0.1
51 30 9 1 0.1 0.003 0.5 0 0.2 0.75 0.25 0 0.5 0 0.1
52 30 8 0.75 0.25 0 0 0 0.2 0.75 0.25 0.003 0.5 0 0.1
53 30 27 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0 0.5 0.02 0.2
54 30 3 1 0.1 0.003 0.5 0.02 0.1 1 0.25 0.003 0 0.02 0.2
55 30 1 1 0.1 0 0 0 0.1 0.75 0.25 0.003 0 0 0.2
56 30 26 1 0.25 0 0 0 0.2 0.75 0.1 0 0 0.02 0.1
57 32 31 0.75 0.25 0.003 0 0.02 0.2 0.75 0.1 0.003 0.5 0 0.1
58 32 12 0.75 0.25 0 0.5 0 0.2 1 0.25 0 0.5 0.02 0.1
59 32 19 1 0.25 0 0.5 0.02 0.2 1 0.25 0.003 0 0 0.1
60 32 1 1 0.1 0.003 0 0.02 0.2 0.75 0.25 0.003 0.5 0.02 0.2
61 32 2 1 0.1 0 0.5 0 0.2 1 0.25 0.003 0.5 0.02 0.1
62 32 11 0.75 0.1 0.003 0 0.02 0.1 0.75 0.25 0 0 0 0.1
63 32 26 1 0.1 0.003 0.5 0 0.1 0.75 0.1 0 0 0 0.2
64 32 3 0.75 0.1 0 0.5 0 0.1 1 0.1 0 0 0.02 0.2
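Each collapsed row above supplies a binomial observation: of the N respondents shown a choice set, n chose option B. Under a logit choice model the probability of choosing B depends on the utility difference between the two options. The following is a minimal sketch of the resulting log-likelihood, not the model fitted in the thesis; the weight values are hypothetical.

```python
import math

def binomial_logit_loglik(beta, rows):
    """Log-likelihood of collapsed choice data under a logit choice model.

    rows: tuples (N, n, x) where N respondents saw the choice set, n chose
    option B, and x is the vector of attribute differences (B minus A).
    """
    ll = 0.0
    for N, n, x in rows:
        u = sum(b * xi for b, xi in zip(beta, x))  # utility difference B - A
        p = 1.0 / (1.0 + math.exp(-u))             # P(choose B)
        ll += n * math.log(p) + (N - n) * math.log(1.0 - p)
    return ll

# Choice set 1: A = (ARR 1, DP 0.25, PML 0, A/H 0, SA 0, DEP 0.1),
#               B = (ARR 0.75, DP 0.1, PML 0, A/H 0.5, SA 0, DEP 0.2)
x1 = (-0.25, -0.15, 0.0, 0.5, 0.0, 0.1)          # B minus A
rows = [(25, 21, x1)]
beta = (-1.0, -5.0, -500.0, -0.5, -20.0, -3.0)   # hypothetical disutility weights
ll = binomial_logit_loglik(beta, rows)
```

With all weights zero, every choice probability is 0.5 and the log-likelihood reduces to N log(1/2) per choice set, a useful sanity check.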
3 Preference meta-analysis
Literature search
A search was carried out on PubMed for the following expressions:
“multiple sclerosis”
AND
“preference”/”preferences”/”utility”/”utilities”/”elicitation”/”elicited”/”elicit”
AND
“patient”
AND
“relapsing”
AND
“remitting”
PubMed search term: ((((((((((preference) OR preferences) OR utility) OR utilities) OR elicit)
OR elicitation) OR elicited) AND multiple sclerosis) AND relapsing) AND remitting) AND patient
PubMed hits: 198 (search carried out 22/09/2017)
After title screening: 30 papers remaining
Papers were then screened for relevance of the methodology. Some papers were primarily clinical
and mentioned utility or preferences only in passing. Many were concerned not with preference
elicitation for individual criteria, but with surveying global utility or quality of life among the
MS patient population. These were excluded, leaving 9 papers.
Then each paper was assessed for compatibility with the RRMS case study, i.e. whether the following
requirements were met:
- Preferences should be elicited from multiple sclerosis patients.
- The criteria assessed must include some outcomes or treatment administration modes from
the RRMS case study, with definitions/scales that are either equivalent to those used in the
evidence synthesis or can be used to approximate the latter via simple transformations.
- The units of each criterion must be clearly expressed within the elicitation tasks.
Two studies were excluded as they did not satisfy the last point, with no criteria units specified
during the elicitation procedure:
• Sempere A, López VM, Gimenez-Martinez J, Ruiz-Beato E, Cuervo J, Maurino J. Using a
multidimensional unfolding approach to assess multiple sclerosis patient preferences for
disease-modifying therapy: a pilot study. Patient Preference and Adherence. 2017;11:995-999.
doi:10.2147/PPA.S129356
• Kremer IE, Evers SM, Jongen PJ, van der Weijden T, van de Kolk I, Hiligsmann M.
Identification and Prioritization of Important Attributes of Disease-Modifying Drugs in
Decision Making among Patients with Multiple Sclerosis: A Nominal Group Technique and
Best-Worst Scaling. PLoS One. 2016;11(11):e0164862. doi:10.1371/journal.pone.0164862
This resulted in a final set of 7 studies, providing the utility coefficients set out in the tables below
(full references are given in the bibliography). As the tables show, the studies use different
conventions: dummy or effects coding may be used to construct coefficients; and standard errors,
standard deviations or confidence intervals to report uncertainty. One study (Wilson 2014) reported
exponentiated coefficients.
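For the Wilson 2014 study, working on the log scale recovers conventional coefficients: take the log of each exponentiated point estimate, and convert its confidence interval to a standard error on the log scale (interval width divided by 3.92, i.e. twice 1.96). A sketch using the ARR = 0.5 row:

```python
import math

# Wilson 2014, ARR 0.5 vs reference ARR 1: exp(coef) 1.2, 95% CI (1.08, 1.32)
exp_coef, lo, hi = 1.2, 1.08, 1.32

coef = math.log(exp_coef)                   # log-scale coefficient
se = (math.log(hi) - math.log(lo)) / 3.92   # log-scale CI width / (2 * 1.96)
```

This reproduces the 0.1823 (se 0.0512) that appears for this study in the transformed tables further below.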
N = number of participants, ARR = annualised relapse rate, DP = disability progression, IM = intramuscular, IV = intravenous infusion, SC = subcutaneous, coef = coefficient, sd = standard deviation, se = standard error, CI = confidence interval.
Study: ARROYO N=221
Study type: absolute scenario ratings
Effects-coded coefficients
ARR coef sd
0.2 0.367 0.131
0.5 -0.367 0.131
Expected time to DP (years) coef sd
2 -0.445 0.131
5 0.445 0.131
Administration route coef sd
Oral 1.345 0.195
SC/IM -0.381 0.175
IV -0.965 0.195
Administration frequency coef sd
Daily -0.877 0.206
every 2 days - weekly -0.527 0.251
monthly 0.267 0.206
twice yearly 1.137 0.251
Study: GARCIA-DOMINGUEZ N=125
Linear coefficient
Expected time to DP coef se
1 year reduction 0.128 0.013
Dummy-coded coefficients
Administration modes coef se
Oral daily 0
IM weekly -0.849 0.113
SC several x week -0.943 0.103
Study: MANSFIELD N=301
Study type: discrete choice (logit)
Numbers visually approximated from graphs
Effects-coded coefficients
1-year DP risk coef 95% CI
0.15 -1.3 (-1.5,-1.1)
0.02 1.3 (1.1,1.5)
ARR coef 95% CI
0.125 0.25 (0.1,0.4)
0.167 0.2 (0.05,0.35)
0.2 0.05 (-0.1,0.2)
0.5 -0.5 (-0.65,-0.35)
Administration modes coef 95% CI
Oral daily 1.2 (1.0,1.4)
Injection 3x week -0.5 (-0.8,-0.2)
IV monthly -0.3 (-0.6,0)
IV every 6 months 0.25 (0,0.5)
Study: POULOS N=189
Study type: discrete choice (logit)
Numbers visually approximated from graphs
Effects-coded coefficients
ARR coef 95% CI
0.25 0.6 (0.4,0.8)
0.75 -0.1 (-0.25,0.05)
1 -0.5 (-0.7,-0.3)
Expected time to DP (years)
coef 95% CI
1 -0.9 (-1.2,-0.6)
2 -0.3 (-0.5,-0.1)
4 1.2 (0.9,1.5)
Study: UTZ N=156
Study type: discrete choice (logit)
Dummy-coded coefficients
Administration route coef sd
Oral 3.61 2.22
SC/IM -3.61 2.22
Administration frequency
coef sd
Daily -0.49 0.88
every 2 days - weekly 2.35 1.35
monthly 3.74 2.48
x3 daily -5.61 3.31
Study: WILSON 2014 N=291
Study type: discrete choice (logit)
Exponentiated coefficients reported
Dummy-coded coefficients
ARR exp(coef) 95% CI
1 1
0.5 1.2 (1.08,1.32)
0.2 1.53 (1.38,1.69)
Expected time to DP (years)
exp(coef) 95% CI
2 1
4 1.36 (1.23,1.50)
10 2.46 (2.22,2.72)
Administration modes exp(coef) 95% CI
SC daily 1
IM weekly 1.04 (0.93,1.18)
IV monthly 1.62* (1.54,1.71)
Oral daily 2.08 (1.84,2.35)
* reported as 1.52 in the original paper but presumed to be a typographical error
Study: WILSON 2015 N=50
Study type: absolute scenario ratings
Linear coefficients
ARR coef se
1 -0.05 0.06
Expected time to DP (years)
coef se
1 0.12 0.03
Administration modes coef se
Oral daily (reference) 0
IM 3x week -1.23 0.24
SC 3x week -1.41 0.24
IV monthly -0.86 0.24
Where studies reported standard deviations, these were converted to standard errors by dividing by
the square root of the number of participants. Where 95% confidence intervals were reported, the
interval width was divided by 3.92 to obtain the standard error.
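Both conversions are elementary (3.92 being twice the normal 97.5% quantile, 1.96); a sketch:

```python
import math

def se_from_sd(sd, n):
    """Standard error of a mean from a reported standard deviation and sample size."""
    return sd / math.sqrt(n)

def se_from_ci(lower, upper):
    """Standard error from a reported 95% confidence interval: width / (2 * 1.96)."""
    return (upper - lower) / 3.92

# Arroyo reported sd 0.131 with N = 221 participants
se1 = se_from_sd(0.131, 221)
# Mansfield reported a 95% CI of (-1.5, -1.1)
se2 = se_from_ci(-1.5, -1.1)
```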
Relapse rates were sometimes expressed in terms of the expected time between relapses, i.e. the
reciprocal of the rate, but not explicitly modelled as linear on that scale, meaning they could be
simply converted to annualised rates at the point of extraction (and have been expressed as such in
the tables above). The situation with regard to disability progression was less straightforward. The
evidence synthesis (and PROTECT preference datasets) used 2-year risk outcomes to measure this
criterion whereas the published elicitation studies used the expected time to disability progression
(in years), or in one study (Mansfield) a 1-year risk. It was decided that extrapolating preferences
from a 1-year to a 2-year risk horizon was too speculative, so this data was not included. Where
preferences were elicited regarding the expected time to disability progression, this was
transformed to a 2-year risk under a constant hazard assumption i.e. using the formula
P(progression within 2 years) = 1 − exp(−2/t), where t is the expected time until progression (see
III.5.4.1.3). This could not be done for the Garcia-Dominguez study, however, since this study
elicited a single linear coefficient on time to progression at three discrete levels. This fixes the utility
scale as linear in time to progression and renders it incompatible with the case study assumption of
linearity in progression risk. This coefficient was therefore not included.
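The constant-hazard transformation can be checked numerically. Applying it to the Poulos time-to-progression levels (1, 2 and 4 years, with 1 year as reference) reproduces the risk changes of −0.2325 and −0.4712 that appear in the transformed Poulos table further below:

```python
import math

def two_year_risk(t):
    """P(progression within 2 years) under a constant hazard 1/t,
    where t is the expected time to progression in years."""
    return 1.0 - math.exp(-2.0 / t)

ref = two_year_risk(1)               # reference level: t = 1 year
change_t2 = two_year_risk(2) - ref   # about -0.2325
change_t4 = two_year_risk(4) - ref   # about -0.4712
```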
Given the scale transformations applied to the data it is sensible to check that the assumption of
linear utility on the ARR and disability risk scales is appropriate. The graphs below plot the discrete
coefficient estimates on those scales (for the studies contributing such data) and appear sufficiently
linear for these purposes, albeit with too few data points to draw any firm conclusions regarding the
true relationships.
Rebasing was needed to make the coding scheme consistent throughout the dataset. Coefficients
were converted where necessary to dummy coding as described in III.5.4.1.1. Continuous
criteria were assumed linear so the choice of reference is arbitrary provided the criterion levels are
expressed as linear changes from the reference point. For the categorical criterion, administration
modes, “daily subcutaneous” was selected as the reference; where this was unavailable in any given
study, a study-specific alternative reference category was used and recorded in the data so the
model could adjust the parameters accordingly (i.e. by combining with the parameter for the study-
specific reference category). Creating a pooled category for intramuscular and subcutaneous
injections taken at least once a week (but not every day) was the best way to make efficient use of
the data.
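For a two-level effects-coded criterion the two coefficients are mirror images (plus and minus the same value c), so the dummy-coded coefficient for moving off the reference level is their difference, 2c, with the standard error scaling accordingly. A sketch using the Arroyo ARR coefficients, which reproduces the −0.734 (se 0.0176) in the rebased Arroyo table below; treating the two levels as perfectly anti-correlated is an assumption implicit in two-level effects coding:

```python
import math

# Arroyo effects-coded ARR coefficients: +0.367 at ARR 0.2, -0.367 at ARR 0.5
c, sd, n = 0.367, 0.131, 221

# Dummy coding with ARR 0.2 as reference: coefficient for a +0.3 change in ARR
dummy_coef = -c - c               # difference between the two levels: -2c
dummy_se = 2 * sd / math.sqrt(n)  # se of 2c, after converting sd to se
```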
Most of the published studies combined the administration mode and frequency into a single
criterion (as in the PROTECT preference datasets) but two studies (Arroyo, Utz) elicited preferences
for these two dimensions separately. It is straightforward to combine them by taking linear pairings
of the coefficients (see III.5.4.3).
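For example, the Arroyo combined coefficient for oral daily administration, relative to the daily subcutaneous reference, is (oral − SC/IM) + (daily − daily) = 1.345 − (−0.381) = 1.726, as in the rebased table below. A sketch of the pairing:

```python
# Arroyo route and frequency coefficients (elicited separately)
route = {"oral": 1.345, "sc_im": -0.381, "iv": -0.965}
freq = {"daily": -0.877, "monthly": 0.267}

def combined(r, f):
    """Coefficient for the mode (route r, frequency f) relative to daily SC."""
    return (route[r] + freq[f]) - (route["sc_im"] + freq["daily"])

oral_daily = combined("oral", "daily")   # 1.726
iv_monthly = combined("iv", "monthly")   # 0.56
```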
The tables below show the source data after transformations and rebasing.
Study: ARROYO
N=221
Study type: absolute scenario ratings
Change in ARR coef se
0.3 -0.734 0.0176
Change in 2-year DP risk
coef se
0.3024 -0.89 0.0176
Administration modes coef se
SC daily (reference) 0
Oral daily 1.726 0.022
IV monthly 0.56 0.210
Injection 2 days-weekly 0.35 0.025
Study: GARCIA-DOMINGUEZ
N=125
Study type: discrete choice (logit)
Administration modes coef se
Oral daily (reference) 0
Injection 2 days-weekly -0.849 0.113
Injection 2 days-weekly -0.943 0.103
Study: MANSFIELD N=301
Study type: discrete choice (logit)
Change in ARR coef se
0 (reference) 0
0.042 -0.05 0.1250
0.075 -0.2 0.1250
0.375 -0.75 0.1250
Administration modes coef se
Oral daily 0
Injection 2 days-weekly
-1.7 0.2101
IV monthly -1.5 0.2101
Study: POULOS N=189
Study type: discrete choice (logit)
Change in ARR coef se
0.5 -0.7 0.1552
0.75 -1.1 0.1767
Change in DP event risk
coef se
0 (reference) 0
-0.23254 0.6 0.2224
-0.4712 2.1 0.2651
Study: UTZ N=156
Study type: discrete choice (logit)
Administration modes coef se
Oral daily (reference) 0 0
Injection 2 days-weekly
-4.38 0.5045
Study: WILSON 2014 N=291
Study type: discrete choice (logit)
Change in ARR coef se
-0.5 0.1823 0.0512
-0.8 0.4253 0.0517
Change in DP event risk
coef se
0 0
-0.2387 0.3075 0.0506
-0.4509 0.9002 0.0518
Administration modes coef se
SC daily 0
Injection 2 days-weekly 0.0392 0.0607
IV monthly 0.4824 0.0267
Oral daily 0.7324 0.0624
Study: WILSON 2015 N=50
Study type: absolute scenario ratings
Change in ARR coef se
1 -0.05 0.06
Change in DP event risk
coef se
1 0.12 0.03
Administration modes coef se
Oral daily (reference) 0
Injection 2 days-weekly -1.23 0.24
Injection 2 days-weekly -1.41 0.24
IV monthly -0.86 0.24
Appendix B. BUGS code and data
1 Clinical evidence synthesis
Dictionary
The table below describes the key variables/parameters/constants.
d[t,] : Population average effect of treatment t on the outcome
ad[t,] : Magnitude (absolute value) of d[t,]
adelta[i,k,j] : Effect (relative to reference treatment) on outcome j of treatment in arm k of study i
delta[i,k,j] : Used in constructing adelta[i,k,j]; almost identical to adelta[i,k,j] but is non-zero for the reference treatment
sdelta[i,j] : Mean of adelta[i,k,j] across all trial arms k
b[] : Magnitude of the average (across treatments) mapping coefficient for the outcome, i.e. the absolute value of the average ratio between the outcome and the reference outcome in the appropriate group
sb[] : Mapping coefficient for the outcome with correct sign
sign[] : Known sign of treatment effect on the outcome; takes value 1 if all treatment effects on the outcome (relative to reference) are positive, or value -1 if all treatment effects on the outcome (relative to reference) are negative
impact[] : Takes value 1 if an increase in the outcome is beneficial, or value -1 if an increase in the outcome is harmful
beta[t,] : Treatment-specific mapping coefficient for the outcome relative to the reference outcome in the appropriate group, for treatment t
lbeta[t,] : As beta[t,], on the log scale
rho_b[] : Between-study propensity to correlate for the outcome; if equal for all outcomes, it is also the between-study correlation coefficient
rho_w[] : Within-study propensity to correlate for the outcome; if equal for all outcomes, it is also the within-study correlation coefficient
sig : Between-study random effects standard deviation
tau : Between-study random effects precision
mapsig : Between-treatment random mappings standard deviation
maptau : Between-treatment random mappings precision
y[i,k,j] : Observed value of outcome j in arm k of study i
va[i,k,j] : Observed variance of outcome j in arm k of study i
ns : Number of studies in dataset
nt : Number of treatments in dataset
na[i] : Number of arms in study i
maxarms : Highest number of arms in any study in dataset*
no[i] : Number of outcomes in study i
totalo : Total number of outcomes in dataset
ng : Number of outcome groups used for mappings
ogbase : Vector listing the first outcome in each group (plus a final component equal to totalo+1)
a[] : Population average absolute level of the outcome in the untreated population on Normal scale
alpha[ns+1,] : Study-level predictive distribution of the absolute level of the outcome in the untreated population on Normal scale
absd[t,] : Population average absolute level of the outcome on treatment t on Normal scale
pm_amu[ns+1,t,] : Study-level predictive distribution of the absolute level of the outcome on treatment t on Normal scale
pred_y[t,] : Individual-level predictive distribution of the absolute level of the outcome on treatment t on Normal scale
trad[t,] : Population average absolute level of the outcome on treatment t back-transformed to original scale
trad_pred_study[t,] : Study-level predictive distribution of the absolute level of the outcome on treatment t back-transformed to original scale
trad_pred_y[t,] : Individual-level predictive distribution of the absolute level of the outcome on treatment t back-transformed to original scale

* Due to the coding used to construct the covariance matrix, if there is a study with the maximum number of arms and outcomes then it is necessary to increase the value of maxarms by 1.
The treatments, outcomes and studies in the RRMS case study are numbered as follows:
Treatments
1 Placebo
2 Dimethyl fumarate
3 Fingolimod
4 Glatiramer acetate
5 Interferon beta-1a (intramuscular)
6 Interferon beta-1a (subcutaneous)
7 Interferon beta-1b
8 Laquinimod
9 Teriflunomide
Outcomes
1 Annualised relapse rate
2 Relapse-free proportion
3 Proportion undergoing disability progression; confirmed 3 months later
4 Proportion undergoing disability progression; confirmed 6 months later
5 Alanine aminotransferase above upper limit of normal range
6 Alanine aminotransferase above 3x upper limit of normal range
7 Alanine aminotransferase above 5x upper limit of normal range
8 Proportion with serious gastrointestinal disorders
9 Proportion with serious bradycardia
10 Proportion with macular edema
Studies
1 BRAVO 2014
2 CONFIRM 2012
3 ALLEGRO 2012
4 BECOME 2009
5 BEYOND 2009
6 DEFINE 2012
7 FREEDOMS 2010
8 FREEDOMS II 2014
9 INCOMIN 2002
10 JOHNSON 1995
11 MSCRG 1996
12 PRISMS 1998
13 REGARD 2008
14 TEMSO 2011
15 BORNSTEIN 1987
16 IFNB 1993
Treatment effects module code: Model 0
Variables and constants specific to Model 0

no1[i] : Number of Poisson-distributed outcomes in study i
no2[i] : Number of binary outcomes in study i modelled with odds ratios
totalo1 : Total number of Poisson-distributed outcomes in dataset
totalo2 : Total number of binary outcomes modelled with odds ratios in dataset
# This model uses random effects on outcomes 1-7; to obtain a fixed effects model, replace each line of the form delta[i,k,j] ~ dnorm(H[i,k,j],tau) with delta[i,k,j] <- H[i,k,j]. Outcomes 8-10 are modelled using fixed effects.
model {
  sig ~ dunif(0,10)          # prior for between-study sd of treatment effects
  tau <- pow(sig,-2)         # between-study precision of treatment effects
  for (i in 1:ns) {
    temp[i] <- sum(n[i,1:na[i]])   # variable n is not used
  }
  # outcome 1: relapse rate (Poisson)
  for (i in 1:ns) {
    for (j in 1:no1[i]) {
      mu[i,j] ~ dnorm(0,.001)
      for (k in 1:na[i]) {
        lambda[i,k,j] <- pi[i,k,j]*va[i,k,j]   # here va is the number of person-years
        y[i,k,j] ~ dpois(lambda[i,k,j])
        log(pi[i,k,j]) <- mu[i,j] + adelta[i,k,j]
        yhat[i,k,j] <- pi[i,k,j] * va[i,k,j]
        dev[i,k,j] <- 2 * (y[i,k,j] * (log(y[i,k,j])-log(yhat[i,k,j])) - (y[i,k,j] - yhat[i,k,j]))
        delta[i,k,j] ~ dnorm(H[i,k,j],tau)   # distribution of trial-specific treatment effect on outcome j in arm k
        adelta[i,k,j] <- (1-equals(k,1))*delta[i,k,j]   # treatment effect set to zero for reference treatment
        H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]]
      }
      prd[i,j] <- sum(dev[i,1:na[i],j])
    }
    # outcomes 2-7: common binary events (Binomial, via odds ratio)
    for (j in no1[i]+1:no1[i]+no2[i]) {
      mu[i,j] ~ dnorm(0,.001)
      for (k in 1:na[i]) {
        y[i,k,j] ~ dbin(pi[i,k,j],va[i,k,j])   # here va is denominator
        logit(pi[i,k,j]) <- mu[i,j] + adelta[i,k,j]
        yhat[i,k,j] <- pi[i,k,j] * va[i,k,j]
        dev[i,k,j] <- 2 * (y[i,k,j] * (log(y[i,k,j])-log(yhat[i,k,j])) + (va[i,k,j] - y[i,k,j]) * (log(va[i,k,j] - y[i,k,j]) - log(va[i,k,j] - yhat[i,k,j])))
        delta[i,k,j] ~ dnorm(H[i,k,j],tau)   # distribution of trial-specific treatment effect on outcome j in arm k
        adelta[i,k,j] <- (1-equals(k,1))*delta[i,k,j]   # treatment effect set to zero for reference treatment
        H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]]
      }
      prd[i,j] <- sum(dev[i,1:na[i],j])
    }
    # outcomes 8-10: common binary events (Binomial, via risk diff)
    for (j in no1[i]+no2[i]+1:no[i]) {
      mu[i,j] ~ dgamma(0.5,0.5)
      for (k in 1:na[i]) {
        y[i,k,j] ~ dbin(pi[i,k,j],va[i,k,j])   # here va is denominator
        pi[i,k,j] <- mu[i,j]+min(max(d[t[i,k],o[i,j]]-d[t[i,1],o[i,j]],-mu[i,j]), 1-mu[i,j])
        yhat[i,k,j] <- pi[i,k,j] * va[i,k,j]
        dev[i,k,j] <- 2 * (y[i,k,j] * (log(y[i,k,j])-log(yhat[i,k,j])) + (va[i,k,j] - y[i,k,j]) * (log(va[i,k,j] - y[i,k,j]) - log(va[i,k,j] - yhat[i,k,j])))
      }
      prd[i,j] <- sum(dev[i,1:na[i],j])
    }
    resdev[i] <- sum(prd[i,1:no[i]])
  }
  for (k in 2:nt) {
    for (j in 1:totalo1+totalo2) { d[k,j] ~ dnorm(0,.001) }   # prior for mean treatment effects
    for (j in totalo1+totalo2+1:totalo) { d[k,j] ~ dbeta(0.5,0.5) }
  }
  for (j in 1:totalo) {
    d[1,j] <- 0   # mean treatment effect is zero on reference treatment
  }
  for (k in 1:nt) {
    for (j in 1:totalo) {
      rank[k,j] <- equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-rank(d[,j],k))   # treatment rankings by outcome
      for (q in 1:nt) {
        rankprop[k,j,q] <- equals(rank[k,j],q)    # indicator for time spent at each rank
        cumrankprop[k,j,q] <- step(q-rank[k,j])   # indicator for time spent at or below each rank
      }
      sucra[k,j] <- sum(cumrankprop[k,j,1:nt-1])/(nt-1)   # SUCRA
    }
  }
  nmaresdev <- sum(resdev[])   # summed overall residual deviance
}
# END
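The rank/cumrankprop/sucra lines above compute, for each treatment and outcome, the surface under the cumulative ranking curve (SUCRA): the average, over ranks 1 to nt−1, of the proportion of posterior samples in which the treatment sits at or above that rank. A minimal post-hoc sketch of the same calculation from saved rank samples (variable names are illustrative, not the monitored node names):

```python
def sucra(rank_samples, nt):
    """SUCRA from posterior samples of one treatment's rank (1 = best)."""
    m = len(rank_samples)
    # proportion of samples at or better than each rank q = 1..nt-1
    cum = [sum(r <= q for r in rank_samples) / m for q in range(1, nt)]
    return sum(cum) / (nt - 1)
```

A treatment ranked best in every sample scores 1, and one ranked worst in every sample scores 0.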
Treatment effects module code: Model 1
# This model uses random effects; to obtain a fixed effects model, replace the line delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) with delta[i,k,j] <- H[i,k,j]
model {
  sig ~ dunif(0,10)          # prior for between-study sd of treatment effects
  tau <- pow(sig,-2)         # between-study precision of treatment effects
  for (i in 1:ns) {
    rc[i] <- 0
    E[i] ~ dnorm(0,1)   # normalised between-trial different-arm different-outcome covariance of treatment effects (delta)
    resdev[i] <- inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])   # residual deviance for study i
    cp[i,1:totalo*maxarms,1:totalo*maxarms] <- inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms])   # within-study coprecision matrix of outcomes in study i
    for (j in 1:no[i]) {
      G[i,j] ~ dnorm(0,1)     # normalised between-trial different-arm same-outcome covariance of treatment effects (delta)
      mu[i,j] ~ dnorm(0,.001) # "average" level of outcome j in study i across all trial arms
      delta[i,1,j] <- 0
      for (k in 1:na[i]) {
        D[i,k,j] <- mu[i,j] + delta[i,k,j] + signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5)
        # mean of outcome j in arm k of study i is the average across all arms plus the effect of treatment compared to average; final term induces required within-study covariance between different outcomes in same arm
        y[i,k,j] ~ dnorm(D[i,k,j],yprec[i,k,j])   # distribution of outcome j in arm k of study i
        prec[i,k,j] <- pow(va[i,k,j]/n[i,k],-1)   # overall variance of observed outcome y
        yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]]))   # remaining (unshared) precision of y[i,k,j] after accounting for covariance
      }
    }
    for (k in 1:na[i]) {
      B[i,k] ~ dnorm(0,1)   # normalised within-trial same-arm different-outcome covariance of observed outcomes (y)
    }
    for (k in 2:na[i]) {
      F[i,k] ~ dnorm(0,1)   # normalised between-trial same-arm different-outcome covariance of treatment effects (delta)
      for (j in 1:no[i]) {
        taud[i,k,j] <- tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))   # remaining (unshared) precision of delta after accounting for covariances
        delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j])   # distribution of trial-specific treatment effect on outcome j in arm k
        H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]] + signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k]+signr_b[o[i,j]]*sqrt(0.5-abs(rho_b[o[i,j]])*0.5)*G[i,j])*pow(tau,-0.5)
        # mean of treatment effect for outcome j in arm k of study i is the population average effect parameter for that treatment/outcome, with adjustments for correlations: different-arm/different-outcome, same-arm/different-outcome, same-arm/same-outcome
      }
    }
    for (x in 1:no[i]*na[i]) {
      arm[i,x] <- trunc(1+(x-1)/no[i])
      out[i,x] <- x-no[i]*trunc((x-1)/no[i])
      for (z in 1:no[i]*na[i]) {
        cv[i,x,z] <- pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(x,z))
      }
      for (j in no[i]*na[i]+1:totalo*maxarms) {
        cv[i,x,j] <- 0
        cv[i,j,x] <- 0
      }
      res[i,x] <- y[i,arm[i,x],out[i,x]] - mu[i,out[i,x]] - delta[i,arm[i,x],out[i,x]]
      pres[i,x] <- inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
    }
    for (j in no[i]*na[i]+1:totalo*maxarms-1) {
      for (k in j+1:totalo*maxarms) {
        cv[i,j,k] <- 0
        cv[i,k,j] <- 0
      }
      cv[i,j,j] <- 1
    }
    cv[i,totalo*maxarms,totalo*maxarms] <- 1
  }
  for (j in 1:totalo) {
    d[1,j] <- 0                    # mean treatment effect is zero on reference treatment
    signr_b[j] <- step(rho_b[j])   # sign of between-study correlations
    signr_w[j] <- step(rho_w[j])   # sign of within-study correlations
    for (k in 2:nt) { d[k,j] ~ dnorm(0,.001) }   # prior for mean treatment effects
  }
  for (k in 1:nt) {
    for (j in 1:totalo) {
      rank[k,j] <- equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-rank(d[,j],k))   # treatment rankings by outcome
      for (q in 1:nt) {
        rankprop[k,j,q] <- equals(rank[k,j],q)    # indicator for time spent at each rank
        cumrankprop[k,j,q] <- step(q-rank[k,j])   # indicator for time spent at or below each rank
      }
      sucra[k,j] <- sum(cumrankprop[k,j,1:nt-1])/(nt-1)   # SUCRA
    }
  }
  nmaresdev <- sum(resdev[])   # summed overall residual deviance
}
# END
Treatment effects module code: Model 1* (contrast-level data)
# This model uses random effects; to obtain a fixed effects model, replace the line delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) with delta[i,k,j] <- H[i,k,j]
model
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for(i in 1:ns)
Appendix B
327
E[i]~dnorm(0,1) # normalised between-trial different-arm different-
outcome covariance of treatment effects (delta)
A[i]~dnorm(0,1)
rc[i]<-1-n[i,1]/(n[i,1]+sum(n[i,2:na[i]])/(na[i]-1)) # estimate
between-arm correlation based on number of patients in trial arms
resdev[i]<-inprod(pres[i,1:no[i]*(na[i]-1)],res[i,1:no[i]*(na[i]-1)])
cp[i,1:totalo*(maxarms-1),1:totalo*(maxarms-1)]<-
inverse(cv[i,1:totalo*(maxarms-1),1:totalo*(maxarms-1)])
# within-study coprecision matrix of outcomes in study i
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome
covariance of treatment effects (delta)
C[i,j]~dnorm(0,1)
for (k in 2:na[i])
D[i,k,j] <- delta[i,k,j] +
signr_w[o[i,j]]*(sqrt(abs(rho_w[o[i,j]])*rc[i])*A[i]+signr_w[o[i,j]]*sqrt(a
bs(rho_w[o[i,j]])-abs(rho_w[o[i,j]])*rc[i])*B[i,k] +
signr_w[o[i,j]]*sqrt(rc[i]--
abs(rho_w[o[i,j]])*rc[i])*C[i,j])*pow(prec[i,k,j],-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j]) # distribution
of outcome j in arm k of study i
prec[i,k,j]<-pow(se[i,k,j],-2) # overall
variance of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])-
rc[i]+rc[i]*abs(rho_w[o[i,j]])) # remaining (unshared) precision of
y[i,k,j] after accounting for covariance
for (k in 2:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-
outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial same-arm
different-outcome covariance of treatment effects (delta)
for (j in 1:no[i])
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
H[i,k,j] <- d[t[i,k],o[i,j]] - d[t[i,1],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
# mean of treatment effect for outcome j in arm k of study i is the
population average effect parameter for that treatment/outcome, with
adjustments for correlations: different-arm/different-outcome, same-
arm/different-outcome, same-arm/same-outcome
for (x in 1:no[i]*(na[i]-1)) # indexing variable x loops
through all arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i])+1 # finds within-trial arm number
corresponding to each value of x
Appendix B
328
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*(na[i]-1)) cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*(rc[i]*sqrt(abs(rho_w[o[
i,out[i,x]]]*rho_w[o[i,out[i,z]]])+(sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i
,out[i,z]]]))-
rc[i]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(arm[i,x]
,arm[i,z])+(rc[i]-
rc[i]*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(out[i,x]
,out[i,z]))+(1-rc[i]-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]]))+rc[i]*signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]
*sqrt(abs(rho_w[o[i,out[i,x]]]*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*(na[i]-1)+1:totalo*(maxarms-1))
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - delta[i,arm[i,x],out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*(na[i]-1)],res[i,1:no[i]*(na[i]-1)])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*(na[i]-1)+1:totalo*(maxarms-1)-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*(maxarms-1)) cv[i,j,k]<-0
# fill in redundant off-diagonal elements of the covariance matrix with
zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*(maxarms-1),totalo*(maxarms-1)]<-1 # fill in final
redundant diagonal element of the covariance matrix with a 1
for (j in 1:totalo)
d[1,j]<-0 # mean treatment effect is zero on reference
treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 2:nt) d[k,j]~dnorm(0,.001) # prior for mean
treatment effects
for (k in 1:nt) for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-
rank(d[,j],k)) # treatment rankings by outcome
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time
spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) # indicator for time
spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
nmaresdev<-sum(resdev[]) # summed overall residual
deviance
# END
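The indexing arithmetic above (arm[i,x] and out[i,x]) unflattens a single loop index x into an (arm, outcome) pair so that one loop can visit every arm/outcome combination. A minimal Python sketch of that mapping, using hypothetical dimensions of no = 3 outcomes and na = 3 arms (Model 1 indexes only the non-reference arms 2..na, hence the extra +1):

```python
# Sketch of the x -> (arm, outcome) unflattening used in the model code.
# BUGS trunc(1+(x-1)/no)+1 equals 1 + (x-1)//no + 1 in integer arithmetic.
def arm_out(x, no):
    arm = 1 + (x - 1) // no + 1        # within-trial arm number (2..na)
    out = x - no * ((x - 1) // no)     # within-trial outcome number (1..no)
    return arm, out

no, na = 3, 3  # hypothetical study dimensions
pairs = [arm_out(x, no) for x in range(1, no * (na - 1) + 1)]
print(pairs)
```

Each (arm, outcome) combination for the non-reference arms appears exactly once, which is what allows the covariance matrix cv to be indexed by x and z.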
Treatment effects module code: Model 2
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with delta[i,k,j] <- H[i,k,j]
model
sig~dunif(0,10) # prior for between-study sd of treatment
effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for(i in 1:ns)
E[i]~dnorm(0,1)
# normalised between-trial different-arm different-outcome
covariance of treatment effects (delta)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms]) # within-study
coprecision matrix of outcomes in study i
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial
different-arm same-outcome covariance of treatment effects (delta)
mu[i,j]~dnorm(0,.001) # "average" level of
outcome j in study i across all trial arms
sdelta[i,j]<-sum(adelta[i,1:na[i],j]) # effect of
"average" treatment in study i on outcome j relative to reference treatment
for (k in 1:na[i])
D[i,k,j] <- mu[i,j] + adelta[i,k,j] - sdelta[i,j]/na[i] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5) #
mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j])
# distribution of outcome j in arm k of
study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])) # remaining (unshared) precision of y after accounting for covariance
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-
arm different-outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial
same-arm different-outcome covariance of treatment effects (delta)
iszeroarm[i,k]<-equals(t[i,k],1) # equals 1 if arm k of
trial i is reference treatment
for (j in 1:no[i])
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]])) # remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j])
# distribution
of trial-specific treatment effect on outcome j in arm k
adelta[i,k,j]<-(1-iszeroarm[i,k])*delta[i,k,j]
# treatment
effect set to zero for reference treatment
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+sqrt(abs(rho_b[o[i,j]])-
abs(rho_b[o[i,j]])*0.5)*F[i,k]+ sqrt(0.5-abs(rho_b[o[i,j]])*0.5)*G[i,j])*
pow(tau,-0.5) # mean of treatment effect for outcome j in arm k of
study i is the population average effect parameter for that
treatment/outcome, with adjustments for correlations: different-
arm/different-outcome, same-arm/different-outcome, same-arm/same-outcome
for (x in 1:no[i]*na[i]) # indexing variable x loops
through all arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial
arm number corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial
outcome number corresponding to each value of x
for (z in 1:no[i]*na[i])
cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z)) # within-study covariance matrix
element representing covariance between arm/outcome combinations x and z;
matrix is needed to calculate residual deviance
for (j in no[i]*na[i]+1:totalo*maxarms) # covariance matrix needs
extra columns and rows to standardise its dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-
diagonal elements of the covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-
diagonal elements of the covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - mu[i,out[i,x]] -
adelta[i,arm[i,x],out[i,x]] + sdelta[i,out[i,x]]/na[i] # residual for
arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of
coprecision matrix (for residual deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1) #
covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms) cv[i,j,k]<-0 # fill in
redundant off-diagonal elements of the covariance matrix with zeroes
cv[i,k,j]<-0 # fill in
redundant off-diagonal elements of the covariance matrix with zeroes
cv[i,j,j]<-1 # fill in
redundant diagonal elements of the covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in
final redundant diagonal element of the covariance matrix with a 1
for (j in 1:totalo)
d[1,j]<-0 # mean treatment effect is zero on reference
treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 2:nt) d[k,j]~dnorm(0,.001) # prior for mean
treatment effects
for (k in 1:nt) for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-
1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-rank(d[,j],k)) # treatment
rankings by outcome
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) #
indicator for time spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
# END
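The rank, cumrankprop, and sucra lines above amount to estimating, per MCMC draw, the proportion of competing treatments that a given treatment beats: cumrankprop is the indicator I(rank <= q), and SUCRA averages it over q = 1..nt-1. A hedged Python sketch using a made-up vector of posterior ranks (the ranks array is purely illustrative, not data from the thesis):

```python
import numpy as np

# Illustrative SUCRA calculation: cumrankprop is I(rank <= q), and SUCRA
# averages those indicators over q = 1..nt-1 (and over MCMC draws).
rng = np.random.default_rng(0)
nt = 4
ranks = rng.integers(1, nt + 1, size=1000)   # fake posterior ranks, 1 = best
cum = np.array([(ranks <= q).mean() for q in range(1, nt)])  # est. P(rank <= q)
sucra = float(cum.sum() / (nt - 1))
print(round(sucra, 3))
```

A treatment that is always ranked first has SUCRA 1; one always ranked last has SUCRA 0.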
Treatment effects module code: Model 3
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with delta[i,k,j] <- H[i,k,j]
# This model uses random mappings; to obtain a fixed mappings model replace the red line of code with lbeta[k,j] <- log(abs(b[j]))
model
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for(i in 1:ns)
E[i]~dnorm(0,1) # normalised between-trial different-arm different-
outcome covariance of treatment effects (delta)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]]) #
residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms])
# within-study coprecision matrix of outcomes in study i
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome
covariance of treatment effects (delta)
mu[i,j]~dnorm(0,.001) # "average" level of outcome j in study
i across all trial arms
sdelta[i,j]<-sum(adelta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
D[i,k,j] <- mu[i,j] + adelta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j]) # distribution
of outcome j in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])) # remaining (unshared) precision of y[i,k,j] after accounting for covariance
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
adelta[i,k,j]<-(1-equals(t[i,k],1))*delta[i,k,j] # treatment effect
set to zero for reference treatment
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
# mean of treatment effect for outcome j in arm k of study i is the
population average effect parameter for that treatment/outcome, with
adjustments for correlations: different-arm/different-outcome, same-
arm/different-outcome, same-arm/same-outcome
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-
outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial same-arm
different-outcome covariance of treatment effects (delta)
for (x in 1:no[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*na[i]) cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*na[i]+1:totalo*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - mu[i,out[i,x]] -
adelta[i,arm[i,x],out[i,x]] + sdelta[i,out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms) cv[i,j,k]<-0
# fill in redundant off-diagonal elements of the covariance matrix with
zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in final redundant
diagonal element of the covariance matrix with a 1
for (m in 1:ng) # cycle through outcome groups
b[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base outcome
in each group
for (j in 2:totalo) b[j] ~ dnorm(0,.01) # mean mapping
for outcome j relative to outcome 1
for(j in 1:totalo) sb[j]<-sign[j]*abs(b[j]) # mean mapping
for outcome j with correct (known) sign
lb[j]<-log(abs(b[j]))
maptau~dgamma(.005,.005) I(1,) # Lu-Ades prior for
mapping precision
mapsig <- pow(maptau,-0.5) # sd of mappings on
outcome 1
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<-sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad[k,ogbase[m-1]]~dnorm(0,.001) # prior for population-mean treatment
effect of each treatment on outcome 1
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<-(beta[k,j]/beta[k,ogbase[m-1]])*ad[k,ogbase[m-1]]
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-sign[j]*exp(lbeta[k,j]) # treatment-specific
mappings with correct sign
lbeta[k,j] ~ dnorm(bW[k,j], lbetatau[k,j]) # treatment-
specific mapping distribution on log scale
lbetatau[k,j]<-pow(mapsig,-2)/(1-0.5) # precision
corresponding to half of mapping sd
bW[k,j]<-log(abs(b[j]))+sqrt(0.5)*mapsig*W[k] # mean mappings
with adjustment for correlations
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-
rank(d[,j],k)) # treatment rankings by outcome
d[k,j]<-ad[k,j] # assign known signs to treatment
effects
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time
spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) # indicator for time
spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
nmaresdev<-sum(resdev[]) # summed overall residual
deviance
# END
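The residual deviance bookkeeping in the models above builds a covariance matrix cv padded to common dimensions with an identity block, inverts it into the coprecision matrix cp, and computes resdev[i] as the inner product of residuals with cp rows. Because the padding block is the identity and the padded residual entries are zero, the padding leaves the quadratic form res' C^{-1} res unchanged. A small numerical sketch (the matrix and residual values are hypothetical):

```python
import numpy as np

# Sketch: padding the covariance matrix with an identity block and the
# residual vector with zeros leaves res' * inv(C) * res unchanged.
C = np.array([[1.0, 0.3], [0.3, 1.0]])   # hypothetical within-study covariance
r = np.array([0.5, -0.2])                # hypothetical residuals
pad = 2                                   # redundant rows/cols, as in the code
Cfull = np.eye(len(r) + pad)             # redundant diagonal filled with 1s
Cfull[:len(r), :len(r)] = C
cp = np.linalg.inv(Cfull)                # "coprecision" matrix
rfull = np.concatenate([r, np.zeros(pad)])
resdev = float(rfull @ cp @ rfull)
print(round(resdev, 6))
```

This is why the redundant diagonal elements must be set to 1 rather than 0: a zero diagonal would make cv singular and the inversion would fail.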
Treatment effects module code: Model 4a
Variables and constants specific to Models 4a and 4b:
no1[i]: Number of outcomes in study i excluding the binary outcomes with zeroes
totalo1: Total number of outcomes in the dataset excluding the binary outcomes with zeroes
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with delta[i,k,j] <- H[i,k,j]
# This model uses random mappings; to obtain a fixed mappings model replace the red line of code with lbeta[k,j] <- log(abs(b[j]))
model
### TREATMENT EFFECTS MODEL
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for (j in 1:totalo1) ff[j]<-1
for (j in totalo1+1:totalo) ff[j]<-1
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) d[k,j]<-ad[k,j]*(1-ze[j,k])
# assign known signs to treatment effects
for(i in 1:ns)
temp[i]<-no2[i]*ns2+sum(sw2[])*no[i] # unused variables
E[i]~dnorm(0,1) # normalised between-trial different-arm different-
outcome covariance of treatment effects (delta)
resdev[i]<-inprod(pres[i,1:no1[i]*na[i]],res[i,1:no1[i]*na[i]]) #
residual deviance for study i
cp[i,1:totalo1*maxarms,1:totalo1*maxarms]<-
inverse(cv[i,1:totalo1*maxarms,1:totalo1*maxarms])
# within-study coprecision matrix of outcomes in study i
for (j in 1:totalo1) mu[i,j]~dnorm(0,.01) # "average"
level of outcome j in study i across all trial arms
for (j in totalo1+1:totalo) mu[i,j]~dgamma(0.5,0.5)
for (j in 1:no1[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome
covariance of treatment effects (delta)
sdelta[i,j]<-sum(adelta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
D[i,k,j] <- mu[i,o[i,j]] + adelta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j],-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
y[i,k,j]~dnorm(D[i,k,j],yprec[i,k,j]) # distribution
of outcome j in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision of observed outcome y
yprec[i,k,j] <- prec[i,k,j]/(1-abs(rho_w[o[i,j]])) # remaining (unshared) precision of y[i,k,j] after accounting for covariance
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-
outcome covariance of observed outcomes (y)
F[i,k] ~ dnorm(0,1) # normalised between-trial same-arm
different-outcome covariance of treatment effects (delta)
# iszeroarm[i,k]<-equals(t[i,k],1) # equals 1 if arm k of trial i is
reference treatment
for (j in 1:no1[i])
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
delta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
adelta[i,k,j]<-(1-ze[o[i,j],t[i,k]])*delta[i,k,j] # treatment effect set to zero where the ze indicator marks it as assumed zero
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
# mean of treatment effect for outcome j in arm k of study i is the
population average effect parameter for that treatment/outcome, with
adjustments for correlations: different-arm/different-outcome, same-
arm/different-outcome, same-arm/same-outcome
for (i in 1:ns2)
for (j in no1[sw2[i]]+1:no[sw2[i]])
for (k in 1:na[sw2[i]])
pi[sw2[i],k,j] <- mu[sw2[i],o[sw2[i],j]] +min(max(adelta[sw2[i],k,j],-
mu[sw2[i],o[sw2[i],j]]),1-mu[sw2[i],o[sw2[i],j]])
adelta[sw2[i],k,j]<- d[t[sw2[i],k],o[sw2[i],j]]
y[sw2[i],k,j]~dbin(pi[sw2[i],k,j],va[sw2[i],k,j])
yhat[sw2[i],k,j] <- pi[sw2[i],k,j] * va[sw2[i],k,j]
dev[sw2[i],k,j] <- 2 * (y[sw2[i],k,j] * (log(y[sw2[i],k,j])-
log(yhat[sw2[i],k,j])) + (va[sw2[i],k,j] - y[sw2[i],k,j]) *
(log(va[sw2[i],k,j] - y[sw2[i],k,j]) - log(va[sw2[i],k,j] -
yhat[sw2[i],k,j])))
prd[sw2[i],j]<-sum(dev[sw2[i],1:na[sw2[i]],j])
resdev2[i]<-sum(prd[sw2[i],no1[sw2[i]]+1:no[sw2[i]]])
### MAPPINGS
for (m in 1:ng) # cycle through outcome groups
sb[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base
outcome in each group
for (j in ogbase[m]+1:ogbase[m+1]-1)
sb[j]<-(sign[j]/sign[ogbase[m]])*abs(b[j])
for (j in 1:totalo) b[j] ~ dnorm(0,.01)
lb[j]<-log(abs(sb[j]))
maptau~dgamma(.005,.005) I(1,) # Lu-Ades prior for
mapping precision
mapsig <- pow(maptau,-0.5) # sd of mappings on
outcome 1
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<- sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad1[k,ogbase[m-1]]~dnorm(0,.001) I(0,)
ad2[k,ogbase[m-1]]~dbeta(0.5,0.5)
ad[k,ogbase[m-1]]<-sign[ogbase[m-1]]*(step(totalo1+0.5-ogbase[m-
1])*ad1[k,ogbase[m-1]]+step(ogbase[m-1]-totalo1-0.5)*ad2[k,ogbase[m-1]])
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<- (beta[k,j]/beta[k,ogbase[m-1]])*abs(ad[k,ogbase[m-1]] )
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-(sign[j]/sign[ogbase[m-1]])*exp(lbeta[k,j]) #
treatment-specific mappings with correct sign
lbeta[k,j] ~ dnorm(bW[k,j], lbetatau[k,j]) # treatment-
specific mapping distribution on log scale
lbetatau[k,j]<-pow(mapsig,-2)/(1-0.5) # precision
corresponding to half of mapping sd
bW[k,j]<-log(abs(b[j]))+sqrt(0.5)*mapsig*W[k] # mean mappings
with adjustment for correlations
### RESIDUAL DEVIANCE
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
for (i in 1:ns)
for (x in 1:no1[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no1[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no1[i]*trunc((x-1)/no1[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no1[i]*na[i]) cv[i,x,z]<-
pow(prec[i,arm[i,x],out[i,x]]*prec[i,arm[i,z],out[i,z]],-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no1[i]*na[i]+1:totalo1*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - mu[i,o[i,out[i,x]]] -
adelta[i,arm[i,x],out[i,x]] + sdelta[i,out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no1[i]*na[i]],res[i,1:no1[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no1[i]*na[i]+1:totalo1*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo1*maxarms) cv[i,j,k]<-0
# fill in redundant off-diagonal elements of the covariance matrix with
zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo1*maxarms,totalo1*maxarms]<-1 # fill in final redundant
diagonal element of the covariance matrix with a 1
# END
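For the binary outcomes modelled with dbin above, the per-cell deviance dev[...] is the standard binomial residual deviance, twice the saturated-minus-fitted log-likelihood. A sketch with made-up counts (the function name and example numbers are illustrative only):

```python
import math

# Binomial residual deviance contribution, as in the dev[...] line:
# 2*(y*log(y/yhat) + (n-y)*log((n-y)/(n-yhat))), with yhat = p*n.
def binom_dev(y, n, p):
    yhat = p * n
    dev = 0.0
    if y > 0:
        dev += y * (math.log(y) - math.log(yhat))
    if n - y > 0:
        dev += (n - y) * (math.log(n - y) - math.log(n - yhat))
    return 2.0 * dev

print(binom_dev(5, 20, 0.25))        # fitted mean equals observed count
print(round(binom_dev(8, 20, 0.25), 4))
```

When the fitted mean equals the observed count the contribution is zero, so summing these cells alongside the Gaussian quadratic forms gives a comparable overall residual deviance.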
Treatment effects module code: Model 4b
Variables and constants specific to Models 4a and 4b:
no1[i]: Number of outcomes in study i excluding the binary outcomes with zeroes
totalo1: Total number of outcomes in the dataset excluding the binary outcomes with zeroes
# This model uses random effects; to obtain a fixed effects model replace the blue line of code with adelta[i,k,j] <- H[i,k,j]
# This model uses random mappings; to obtain a fixed mappings model replace the red line of code with lbeta[k,j] <- log(abs(b[j]))
model
### TREATMENT EFFECTS MODEL
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for (j in 1:totalo1) ff[j]<-1
for (j in totalo1+1:totalo) ff[j]<-1
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) d[k,j]<-ad[k,j]*(1-ze[j,k])
# assign known signs to treatment effects
for(i in 1:ns)
temp[i]<-no2[i]
E[i]~dnorm(0,1)
for (j in 1:no1[i])
mu[i,j]~dnorm(0,.01) # "average" level of outcome j in
study i across all trial arms
for (j in no1[i]+1:no[i])
mu[i,j]~dgamma(0.5,0.5) # "average" level of outcome j in
study i across all trial arms
for (j in 1:no[i])
G[i,j]~dnorm(0,1) # normalised between-trial different-arm same-outcome covariance of treatment effects (delta)
sdelta[i,j]<-sum(delta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
Dmu[i,k,j] <- step(totalo1+0.5-o[i,j])*(mu[i,j] + delta[i,k,j] -
sdelta[i,j]) + step(o[i,j]-totalo1-0.5)*min(1,max(0,mu[i,j] + delta[i,k,j]
- sdelta[i,j]))
D[i,k,j] <- mu[i,j] + delta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j]/ff[o[i,j]],
-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
DD[i,k,j]<-step(totalo1+0.5-o[i,j])*D[i,k,j] + step(o[i,j]-totalo1-
0.5)*min(1,max(0,D[i,k,j]))
y[i,k,j]~dnorm(DD[i,k,j],yprec[i,k,j]) # distribution of outcome j
in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision
of observed outcome y
yprec[i,k,j] <- (prec[i,k,j]/(1-abs(rho_w[o[i,j]])))/ff[o[i,j]] #
remaining (unshared) precision of y after accounting for covariance
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
adelta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
delta[i,k,j]<-step(o[i,j]-totalo1-0.5)*(1-
ze[o[i,j],t[i,k]])*d[t[i,k],o[i,j]]+step(totalo1+0.5-o[i,j])*(1-
ze[o[i,j],t[i,k]])*adelta[i,k,j] # select appropriate
treatment effect parameter for this study arm and outcome
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-outcome
covariance of observed outcomes (y)
F[i,k]~dnorm(0,1) # normalised between-trial same-arm different-outcome covariance of treatment effects (delta)
### MAPPINGS
for (m in 1:ng) # cycle through outcome groups
sb[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base
outcome in each group
for (j in ogbase[m]+1:ogbase[m+1]-1)
sb[j]<-(sign[j]/sign[ogbase[m]])*abs(b[j])
for (j in 1:totalo) b[j] ~ dnorm(0,.01)
lb[j]<-log(abs(sb[j]))
maptau~dgamma(.005,.005) I(1,) # Lu-Ades prior for
mapping precision
mapsig <- pow(maptau,-0.5) # sd of mappings on
outcome 1
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<- sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad1[k,ogbase[m-1]]~dnorm(0,.001) I(0,)
ad2[k,ogbase[m-1]]~dbeta(0.5,0.5)
ad[k,ogbase[m-1]]<-sign[ogbase[m-1]]*(step(totalo1+0.5-ogbase[m-
1])*ad1[k,ogbase[m-1]]+step(ogbase[m-1]-totalo1-0.5)*ad2[k,ogbase[m-1]])
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<- (beta[k,j]/beta[k,ogbase[m-1]])*abs(ad[k,ogbase[m-1]] )
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-(sign[j]/sign[ogbase[m-1]])*exp(lbeta[k,j]) #
treatment-specific mappings with correct sign
lbeta[k,j] ~ dnorm(bW[k,j], lbetatau[k,j]) # treatment-
specific mapping distribution on log scale
lbetatau[k,j]<-pow(mapsig,-2)/(1-0.5) # precision
corresponding to half of mapping sd
bW[k,j]<-log(abs(b[j]))+sqrt(0.5)*mapsig*W[k] # mean mappings
with adjustment for correlations
### RESIDUAL DEVIANCE
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
for (i in 1:ns)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms]) # within-study
coprecision matrix of outcomes in study i
for (x in 1:no[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*na[i])
cv[i,x,z]<-
pow((prec[i,arm[i,x],out[i,x]]/ff[o[i,out[i,x]]])*(prec[i,arm[i,z],out[i,z]
]/ff[o[i,out[i,z]]]),-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*na[i]+1:totalo*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - Dmu[i,arm[i,x],out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms)
cv[i,j,k]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in final redundant
diagonal element of the covariance matrix with a 1
for (k in 1:nt)
for (j in 1:totalo)
rank[k,j]<-equals(impact[j],-1)*rank(d[,j],k)+equals(impact[j],1)*(nt+1-
rank(d[,j],k)) # treatment rankings by outcome
for (q in 1:nt)
rankprop[k,j,q]<-equals(rank[k,j],q) # indicator for time spent at each rank
cumrankprop[k,j,q]<-step(q-rank[k,j]) # indicator for time
spent at or below each rank
sucra[k,j]<-sum(cumrankprop[k,j,1:nt-1])/(nt-1) # SUCRA
#END
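Throughout these models, within-arm correlation between outcomes is induced by adding a shared standard-normal latent term B[i,k], scaled by sqrt(|rho_w|)*pow(prec,-0.5), to the mean, while deflating the residual precision to prec/(1-|rho_w|). A Monte Carlo sketch (the rho and prec values are hypothetical) checking that this construction reproduces the intended marginal variance 1/prec and within-arm correlation rho:

```python
import numpy as np

# Shared-latent-normal construction: y_j = sqrt(rho/prec)*B + eps_j with
# Var(eps_j) = (1-rho)/prec gives Var(y_j) = 1/prec and Corr(y_1, y_2) = rho.
rng = np.random.default_rng(1)
rho, prec, n = 0.6, 4.0, 200_000
B = rng.standard_normal(n)                       # shared arm-level latent term
eps = rng.standard_normal((2, n)) * np.sqrt((1 - rho) / prec)
y = np.sqrt(rho / prec) * B + eps                # two outcomes, same arm
var1 = float(np.var(y[0]))
corr = float(np.corrcoef(y[0], y[1])[0, 1])
print(round(var1, 3), round(corr, 3))
```

The same device, with different scalings, generates the E, F, and G terms that induce the between-study correlations in the H[i,k,j] expressions.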
Population calibration module code
# This model assumes random effects for outcomes numbered up to and including totalo1 and fixed effects for the remaining outcomes.
### POPULATION CALIBRATION MODEL
for (i in 1:ns)
Q[i]~dnorm(0,1)
for (k in 1:na[i]) S[i,k]~dnorm(0,1)
for (j in 1:no1[i]) alpha[i,j]<-aalpha[i,j]
for (j in no1[i]+1:no[i]) alpha[i,j]<-min(1,max(0,aalpha[i,j]))
for (j in 1:no[i])
aalpha[i,j]~dnorm(amu[i,j],aprec[i,j])
amu[i,j]<-a[o[i,j]]+signr_b[o[i,j]]*zi*sqrt(abs(rho_b[o[i,j]]))*Q[i]
aprec[i,j]<-pow(zi,-2)/(1-abs(rho_b[o[i,j]]))
for (k in 1:na[i])
pm_y[i,k,j]<-y[i,k,j]
pm_va[i,k,j]<-va[i,k,j]
pm_va_prec[i,k,j]<-pow(pm_va[i,k,j]*sqrt(2/n[i,k]),-1)
pm_va[i,k,j]~dnorm(pm_va_mu[o[i,j]],pm_va_prec[i,k,j])
pm_prec[i,k,j]<-pow(pm_va[i,k,j]/n[i,k],-1)/((1-
abs(rho_w[o[i,j]]))*ff[o[i,j]])
pm_mu[i,k,j]<-step(o[i,j]-totalo1-0.5)*a[o[i,j]]+step(totalo1+0.5-
o[i,j])*alpha[i,j]+signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*sqrt(pm_va[i,k,
j]/n[i,k])*S[i,k] + cut(delta[i,k,j])
pm_y[i,k,j]~dnorm(pm_mu[i,k,j],pm_prec[i,k,j])
zi~dunif(0,10)
for (j in 1:totalo1)
a[j]~dnorm(0,.001)
for (j in totalo1+1:totalo)
a[j]~dgamma(0.5,0.5)
for (j in 1:totalo) for (k in 1:nt)
absd[k,j]<-step(totalo1+0.5-j)*(a[j]+d[k,j])+step(j-totalo1-
0.5)*max(0,min(1,a[j]+d[k,j]))
### PREDICTIVE DISTRIBUTIONS
E[ns+1]~dnorm(0,1)
Q[ns+1]~dnorm(0,1)
for (k in 1:nt)
F[ns+1,k]~dnorm(0,1)
S[ns+1,k]~dnorm(0,1)
for (j in 1:totalo1)
alpha[ns+1,j]<-aalpha[ns+1,j]
pm_va_mu[j]~dunif(0,100)
for (j in totalo1+1:totalo)
alpha[ns+1,j]<-max(0,aalpha[ns+1,j])
pm_va_mu[j]~dunif(0,0.0001)
for (j in 1:totalo)
G[ns+1,j]~dnorm(0,1)
amu[ns+1,j]<-a[j]+signr_b[j]*zi*sqrt(abs(rho_b[j]))*Q[ns+1]
aprec[ns+1,j]<-pow(zi,-2)/(1-abs(rho_b[j]))
aalpha[ns+1,j]~dnorm(amu[ns+1,j],aprec[ns+1,j])
for (k in 1:nt)
taud[ns+1,k,j]<-tau/(1-abs(rho_b[j])-0.5+0.5*abs(rho_b[j]))
adelta[ns+1,k,j] ~ dnorm(H[ns+1,k,j],taud[ns+1,k,j])
delta[ns+1,k,j]<-step(j-totalo1-0.5)*(1-ze[j,k])*d[k,j]+step(totalo1+0.5-j)*(1-ze[j,k])*adelta[ns+1,k,j] # select appropriate treatment effect parameter for this study arm and outcome
H[ns+1,k,j] <- d[k,j] + signr_b[j]*(sqrt(abs(rho_b[j])*0.5)*E[ns+1]+signr_b[j]*sqrt(abs(rho_b[j])-abs(rho_b[j])*0.5)*F[ns+1,k] + signr_b[j]*sqrt(0.5-abs(rho_b[j])*0.5)*G[ns+1,j])*pow(tau,-0.5)
pm_prec[ns+1,k,j]<-pow(pm_va_mu[j]*ff[j],-1)/(1-abs(rho_w[j]))
pm_amu[ns+1,k,j]<-step(j-totalo1-0.5)*min(1,max(0,a[j]+cut(delta[ns+1,k,j])))+step(totalo1+0.5-j)*(alpha[ns+1,j]+cut(delta[ns+1,k,j]))
pm_mu[ns+1,k,j]<-pm_amu[ns+1,k,j]+signr_w[j]*sqrt(abs(rho_w[j]))*sqrt(pm_va_mu[j])*S[ns+1,k]
apred_y[k,j]~dnorm(pm_mu[ns+1,k,j],pm_prec[ns+1,k,j])
pred_y[k,j]<-step(totalo1+0.5-j)*apred_y[k,j]+step(j-totalo1-0.5)*max(0,apred_y[k,j])
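The predictive block above induces correlation between outcomes and arms by mixing shared standard normals (Q, E, F, G, S) with weights of the form sqrt(|rho|) and scaling the residual variance by (1 − |rho|). A stdlib-only Python sketch of this device (not the thesis code; rho and the sample size are illustrative):

```python
import math
import random

# Two quantities that share the same latent normal q, each with weight
# sqrt(rho) on q and residual variance (1 - rho), have unit variance
# and correlation rho -- the construction used for amu/H/pm_mu above.
random.seed(1)
rho, n = 0.6, 50_000
x1, x2 = [], []
for _ in range(n):
    q = random.gauss(0, 1)  # shared latent normal
    x1.append(math.sqrt(rho) * q + math.sqrt(1 - rho) * random.gauss(0, 1))
    x2.append(math.sqrt(rho) * q + math.sqrt(1 - rho) * random.gauss(0, 1))

mx1, mx2 = sum(x1) / n, sum(x2) / n
cov = sum((a - mx1) * (b - mx2) for a, b in zip(x1, x2)) / n
sd1 = math.sqrt(sum((a - mx1) ** 2 for a in x1) / n)
sd2 = math.sqrt(sum((b - mx2) ** 2 for b in x2) / n)
corr = cov / (sd1 * sd2)
print(corr)  # close to rho = 0.6
```

Each of x1 and x2 has variance rho + (1 − rho) = 1 and covariance rho, so the empirical correlation converges to rho.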
### TRANSFORMATIONS (NOTE - HARD CODED TO OUTCOME TYPES FROM RRMS CASE STUDY)
for (k in 1:nt)
trad[k,1]<-exp(absd[k,1])
trad_pred_study[k,1]<-exp(pm_amu[ns+1,k,1])
trad_pred_y[k,1]<-exp(pred_y[k,1])
for (j in 2:7) trad[k,j]<-exp(absd[k,j])/(1+exp(absd[k,j]))
trad_pred_study[k,j]<-exp(pm_amu[ns+1,k,j])/(1+exp(pm_amu[ns+1,k,j]))
trad_pred_y[k,j]<-exp(pred_y[k,j])/(1+exp(pred_y[k,j]))
for (j in 8:10) trad[k,j] <- absd[k,j]
trad_pred_study[k,j]<-pm_amu[ns+1,k,j]
trad_pred_y[k,j]<-pred_y[k,j]
for (j in 11:12) trad[k,j]<-d[k,j]
trad_pred_study[k,j]<-d[k,j]
trad_pred_y[k,j]<-d[k,j]
# END
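The transformation block above back-transforms outcome 1 from the log scale, outcomes 2-7 from the log-odds scale, and leaves the remaining outcomes on their original scale. A minimal Python sketch of the same mapping (the outcome indexing follows the hard-coded RRMS case study):

```python
import math

# Back-transformations matching the trad[] block above: outcome 1 is a
# log rate (exp), outcomes 2-7 are log odds (inverse logit), and
# outcomes 8-12 are already on their natural scale.
def back_transform(outcome_index, value):
    if outcome_index == 1:
        return math.exp(value)                       # log rate -> rate
    if 2 <= outcome_index <= 7:
        return math.exp(value) / (1 + math.exp(value))  # log odds -> probability
    return value                                     # identity

print(back_transform(1, 0.0))  # 1.0
print(back_transform(2, 0.0))  # 0.5
```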
RRMS case study data
This section sets out the data in BUGS format (see Appendix A for details of the original source data).
Each version of the model requires a set of parameters specified in list format and a rectangular
array of trial data. Additionally, Models 4a and 4b require a second rectangular array to indicate
which treatment effects are assumed to equal zero.
The table below shows the list data for each version of the model.
Parameter values (list format) for Model 0
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0,0,0),ns=16,totalo1=1,totalo2=6,totalo=10,maxarms=4,nt=9,impact=c(-1,1,-1,-1,-1,-1,-1,-1,-1,-1))
Parameter values (list format) for Models 1, 1*, 2, 3 (7 outcomes only)
Model 3 (one mapping group) uses the following data in list format:
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6),rho_w=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6),ns=16,totalo=7,maxarms=4,nt=9,impact=c(-1,1,-1,-1,-1,-1,-1),sign=c(-1,1,-1,-1,1,1,1),ogbase=c(1,8),ng=1)
For Models 1, 1* and 2, remove the red and green data.
For Model 3 with two mapping groups, replace the data in green with ogbase=c(1,5,8),ng=2
For Model 3 with three mapping groups, replace the data in green with ogbase=c(1,3,5,8),ng=3
Parameter values (list format) for Models 4a, 4b
Model 4a (one mapping group) uses the following data in list format:
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0,0,0),rho_w=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6),totalo=10,totalo1=7,maxarms=4,nt=9,impact=c(-1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1),sign=c(-1,1,-1,-1,1,1,1,1,1,1,1,1),ogbase=c(1,8,9,10,11,12,13),ng=6,ns=16,ns2=6,sw2=c(2,3,6,7,8,14))
For Model 4b, remove the blue data.
For two mapping groups, replace the data in green with ogbase=c(1,5,8,9,10,11,12,13),ng=7
For three mapping groups, replace the data in green with ogbase=c(1,3,5,8,9,10,11,12,13),ng=8
The table below shows the additional “zeroes” data for Models 4a and 4b. Columns correspond to
treatments and rows to outcomes; a value of 1 indicates that the corresponding treatment effect
will be fixed at zero.
Table of assumed zeroes for Models 4a,4b
ze[,1] ze[,2] ze[,3] ze[,4] ze[,5] ze[,6] ze[,7] ze[,8] ze[,9]
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
1 0 1 0 1 1 1 0 0
1 1 0 1 1 1 1 1 1
1 1 0 1 1 1 1 1 1
END
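The zeroes matrix acts through the model code's delta assignment, where the treatment effect is multiplied by (1 − ze[j,k]), so a 1 in the table pins that treatment/outcome effect to zero. A small sketch with hypothetical effect values:

```python
# d[k][j]: effect of treatment k on outcome j (hypothetical numbers).
# ze[j][k] = 1 fixes that effect to zero, exactly as
# delta <- (1 - ze[j,k]) * d[k,j] does in the model code.
d = [[0.4, 0.2],
     [0.1, 0.3]]
ze = [[1, 0],   # outcome 1: treatment 1 (reference) fixed to zero
      [1, 0]]   # outcome 2: likewise
delta = [[(1 - ze[j][k]) * d[k][j] for j in range(2)] for k in range(2)]
print(delta)  # [[0.0, 0.0], [0.1, 0.3]]
```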
The table below shows the rectangular arrays of trial data that are required for each version of the
model.
Trial data (rectangular format) for Model 0
na[] no[] no1[] no2[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] o[,8]
o[,9] n[,1] n[,2] n[,3] t[,1] t[,2] t[,3] y[,1,1] y[,1,2]
y[,1,3] y[,1,4] y[,1,5] y[,1,6] y[,1,7] y[,1,8]
y[,1,9] y[,2,1] y[,2,2] y[,2,3] y[,2,4] y[,2,5]
y[,2,6] y[,2,7] y[,2,8] y[,2,9] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] y[,3,8]
y[,3,9] va[,1,1] va[,1,2] va[,1,3] va[,1,4] va[,1,5]
va[,1,6] va[,1,7] va[,1,8] va[,1,9] va[,2,1] va[,2,2]
va[,2,3] va[,2,4] va[,2,5] va[,2,6] va[,2,7] va[,2,8]
va[,2,9] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7] va[,3,8] va[,3,9]
3 7 1 6 1 2 3 4 5 6 7 NA
NA 450 447 434 1 5 8 204 275 60 46
84 10 7 NA NA 154 308 47 35 131 11 5
NA NA 161 286 42 28 127 16 4 NA NA
600 450 450 450 415 415 415 NA NA 596 447
447 447 413 413 413 NA NA 578 434 434 434
384 384 384 NA NA
3 8 1 6 1 2 3 4 5 6 7 8
NA 363 359 350 1 2 4 193 214 62 45
149 23 13 0 NA 105 255 47 28 167 20 7
4 NA 135 238 56 38 129 24 10 0 NA
484 363 363 363 362 362 363 363 NA 478 359
359 359 355 355 355 359 NA 466 350 350 350
346 346 346 351 NA
2 7 1 5 1 2 3 4 5 6 8 NA
NA 556 550 NA 1 8 NA 289 290 87 78
99 8 1 NA NA 219 346 61 54 175 24 8
NA NA NA NA NA NA NA NA NA NA NA
741.3333333 556 556 556 515 515 556 NA NA 733
550 550 550 504 504 550 NA NA NA NA NA
NA NA NA NA NA NA
2 1 1 0 1 NA NA NA NA NA NA NA
NA 39 36 NA 4 7 NA 17 NA NA NA
NA NA NA NA NA 17 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 52 NA NA NA NA NA NA NA NA 48
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 4 1 3 1 2 3 5 NA NA NA NA
NA 448 897 NA 4 7 NA 203 327 90 16
NA NA NA NA NA 430 655 188 99 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 597.3333333 448 448 445 NA NA NA NA NA
1196 897 897 888 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 5 1 3 1 2 3 6 8 NA NA NA
NA 408 409 NA 1 2 NA 195 220 110 12 0
NA NA NA NA 92 299 65 25 4 NA NA
NA NA NA NA NA NA NA NA NA NA NA
544 408 408 408 408 NA NA NA NA 545 410
409 410 410 NA NA NA NA NA NA NA NA
NA NA NA NA NA
2 8 1 5 1 2 3 4 6 7 9 10
NA 418 425 NA 1 3 NA 222 191 101 79 7
4 1 0 NA 101 299 75 53 36 8 4 0
NA NA NA NA NA NA NA NA NA NA
557.3333333 418 418 418 418 418 418 418 NA 566
425 425 425 425 425 425 425 NA NA NA NA
NA NA NA NA NA NA
2 9 1 6 1 2 3 4 5 6 7 9
10 355 358 NA 1 3 NA 189 187 103 63
18 12 4 1 0 100 256 91 49 62 33 8
0 1 NA NA NA NA NA NA NA NA NA
473.3333333 355 355 355 355 355 355 355 355 477
358 358 358 358 358 358 358 358 NA NA NA
NA NA NA NA NA NA
2 3 1 2 1 2 4 NA NA NA NA NA
NA 94 88 NA 7 5 NA 62 49 13 NA
NA NA NA NA NA 81 33 28 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 125.3333333 96 96 NA NA NA NA NA NA
117 92 92 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 3 1 2 1 2 3 NA NA NA NA NA
NA 126 125 NA 1 4 NA 141 34 31 NA
NA NA NA NA NA 97 42 27 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 168 126 126 NA NA NA NA NA NA 166
125 125 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 1 2 1 2 4 NA NA NA NA NA
NA 143 158 NA 1 5 NA 156 23 50 NA
NA NA NA NA NA 140 32 35 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 190.6666667 87 143 NA NA NA NA NA NA
210 85 158 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 4 1 3 1 2 3 5 NA NA NA NA
NA 187 189 NA 1 6 NA 319 30 71 2
NA NA NA NA NA 435 59 51 10 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 249.3333333 187 187 187 NA NA NA NA NA
252 184 189 184 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 4 1 3 1 2 4 5 NA NA NA NA
NA 381 375 NA 6 4 NA 150 239 45 21
NA NA NA NA NA 147 234 33 5 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 500 386 386 381 NA NA NA NA NA 508
378 378 375 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 6 1 4 1 2 3 5 6 8 NA NA
NA 363 358 NA 1 9 NA 261 166 99 129
24 1 NA NA NA 176 202 72 205 24 8
NA NA NA NA NA NA NA NA NA NA NA
NA 484 363 363 360 360 360 NA NA NA 477
358 358 358 358 358 NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 1 2 1 2 3 NA NA NA NA NA
NA 23 25 NA 1 4 NA 41 6 11 NA
NA NA NA NA NA 9 14 5 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 30.66666667 23 23 NA NA NA NA NA NA
33 25 25 NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
2 2 1 1 1 2 NA NA NA NA NA NA
NA 112 115 NA 1 7 NA 189 18 NA NA
NA NA NA NA NA 128 36 NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA 149.3333333 112 NA NA NA NA NA NA NA
153 115 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA
END
Trial data (rectangular format) for Models 1, 2, 3
na[] no[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] n[,1] n[,2] n[,3]
t[,1] t[,2] t[,3] y[,1,1] y[,1,2] y[,1,3] y[,1,4]
y[,1,5] y[,1,6] y[,1,7] y[,2,1] y[,2,2] y[,2,3]
y[,2,4] y[,2,5] y[,2,6] y[,2,7] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] va[,1,1]
va[,1,2] va[,1,3] va[,1,4] va[,1,5] va[,1,6] va[,1,7]
va[,2,1] va[,2,2] va[,2,3] va[,2,4] va[,2,5] va[,2,6]
va[,2,7] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7]
3 7 1 2 3 4 5 6 7 450 447 434 1
5 8 -1.078809661 0.451985124 -1.871802177 -2.172773481 -
1.371301577 -3.701301974 -4.065357025 -1.347073648 0.79562585 -2.141316945 -
2.465675288 -0.766709748 -3.598556816 -4.401829262 -1.272965676 0.658779537 -
2.233592222 -2.674148649 -0.704888998 -3.135494216 -4.553876892 3.521753493
4.207792208 8.653846154 10.89647008 6.194252626 42.52469136
60.30287115 2.655451786 4.667126039 10.6281383 13.85638003
4.617210763 38.57281773 83.6122549 5.020610262 4.44991495
11.44047619 16.56896552 4.517785471 25.04347826 97.01052632
3 7 1 2 3 4 5 6 7 363 359 350 1
2 4 -0.916290732 0.362029709 -1.57997588 -1.955388893 -
0.35734586 -2.690505891 -3.292983797 -1.514127733 0.896872646 -1.892855586 -
2.469913865 -0.11844815 -2.818398258 -3.906292331 -1.237874356 0.753771802 -
1.658228077 -2.105417028 -0.520084949 -2.596497715 -3.514526067 3.691612479
4.132503293 7.060818776 9.208176101 4.129060718 16.80697704
28.96021978 4.560769534 4.859766214 8.788938898 13.90602072
4.014046375 18.80970149 51.73440066 4.015061307 4.595588235
7.44047619 10.33232119 4.276640589 15.49120083 35.6297619
2 6 1 2 3 4 5 6 NA 556 550 NA 1
8 NA -0.94160854 0.086384614 -1.68469465 -1.812901906 -
1.43556541 -4.149069462 NA -1.203972804 0.528318781 -2.081488625 -
2.21759188 -0.631271777 -2.995732274 NA NA NA NA NA NA
NA NA 3.302978061 4.007466943 7.576305664 8.291385045
6.440000971 65.39077909 NA 2.451712012 4.285673807 10.14113782
11.29405615 4.411914894 22.05 NA NA NA NA NA NA
NA NA
2 1 1 NA NA NA NA NA NA 39 36 NA 4
7 NA -1.108662625 NA NA NA NA NA NA -
0.994252273 NA NA NA NA NA NA NA NA NA NA
NA NA NA 3.868686582 NA NA NA NA NA NA
3.443987948 NA NA NA NA NA NA NA NA NA
NA NA NA NA
2 4 1 2 3 5 NA NA NA 448 897 NA 4
7 NA -1.078809661 0.994169625 -1.380723316 -3.288868197 NA
NA NA -1.021651248 0.995697509 -1.327413564 -2.075646471 NA
NA NA NA NA NA NA NA NA NA 3.574478436
5.07250992 6.229174426 28.84979604 NA NA NA 3.368066624
5.076077219 6.036438796 10.09517225 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 3 6 NA NA NA 408 409 NA 1
2 NA -1.021651248 0.157185584 -0.996613121 -3.47609869 NA
NA NA -1.771956842 0.990913372 -1.666254387 -2.751535313 NA
NA NA NA NA NA NA NA NA NA 3.894646152
4.024758221 5.078218426 34.36426117 NA NA NA 4.3758069
5.064931152 7.481261181 17.73049645 NA NA NA NA NA
NA NA NA NA NA
2 6 1 2 3 4 6 7 NA 418 425 NA 1
3 NA -0.916290732 -0.172676589 -1.143781257 -1.456552255 -
4.072683065 -4.639571613 NA -1.714798428 0.864161666 -1.540445041 -
1.948601941 -2.380060405 -3.95364468 NA NA NA NA NA NA
NA NA 2.85182696 4.029891367 5.457225849 6.524177589
60.73131734 105.5096618 NA 4.056923076 4.794420555 6.880952381
9.161341043 12.89810054 54.14418465 NA NA NA NA NA
NA NA NA
2 7 1 2 3 4 5 6 7 355 358 NA 1
3 NA -0.916290732 0.107144637 -0.894700099 -1.533619076 -
2.929711172 -3.352823797 -4.474491862 -1.560647748 0.920204631 -1.076389152 -
1.841520979 -1.563225069 -2.287317621 -3.778491613 NA NA NA NA
NA NA NA 2.747215428 4.01149096 4.8553321 6.85067406
20.77563469 30.61831876 89.76139601 3.465179 4.908241422
5.274889904 8.464698501 6.983653008 11.95002331 45.77285714 NA
NA NA NA NA NA NA
2 3 1 2 4 NA NA NA NA 94 88 NA 7
5 NA -0.693147181 0.041672696 -1.85389125 NA NA NA
NA -0.356674944 -0.581029882 -0.826678573 NA NA NA NA
NA NA NA NA NA NA NA 2.454896121 4.001736865
8.541241891 NA NA NA NA 1.747514885 4.347200822
4.723214286 NA NA NA NA NA NA NA NA NA
NA NA
2 3 1 2 3 NA NA NA NA 126 125 NA 1
4 NA -0.174353387 -0.995428052 -1.119889687 NA NA NA
NA -0.527632742 -0.68117099 -1.289130613 NA NA NA NA
NA NA NA NA NA NA NA 1.449615722 5.07544757
5.390831919 NA NA NA NA 2.069581078 4.482214573
5.905139834 NA NA NA NA NA NA NA NA NA
NA NA
2 3 1 2 4 NA NA NA NA 143 158 NA 1
5 NA -0.198450939 -1.023388867 -0.620576488 NA NA NA
NA -0.400477567 -0.504556011 -1.256836294 NA NA NA NA
NA NA NA NA NA NA NA 1.484054383 5.141983696
4.397634409 NA NA NA NA 1.817426485 4.260023585
5.79883856 NA NA NA NA NA NA NA NA NA
NA NA
2 4 1 2 3 5 NA NA NA 187 189 NA 1
6 NA 0.246860078 -1.655048424 -0.490910314 -4.527208645 NA
NA NA 0.548121409 -0.750776293 -0.995428052 -2.856470206 NA
NA NA NA NA NA NA NA NA NA 0.947953851
7.424416136 4.245871782 94.51081081 NA NA NA 0.700851385
4.590644068 5.07544757 19.45747126 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 4 5 NA NA NA 381 375 NA 6
4 NA -1.203972804 0.486030965 -2.025219988 -2.841581594 NA
NA NA -1.237874356 0.485507816 -2.347036856 -4.304065093 NA
NA NA NA NA NA NA NA NA NA 4.122317025
4.240913102 9.709742587 19.20119048 NA NA NA 4.131658462
4.240384615 12.55019763 76.01351351 NA NA NA NA NA
NA NA NA NA NA
2 5 1 2 3 5 6 NA NA 363 358 NA 1
9 NA -0.616186139 -0.17121594 -0.980829253 -0.582605306 -
2.63905733 NA NA -0.994252273 0.25841169 -1.379325692 0.292572058 -
2.633087163 NA NA NA NA NA NA NA NA NA
1.81239387 4.029386582 5.041666667 4.349139233 16.07142857 NA
NA 2.857256318 4.067149023 6.223970474 4.086210744 15.98852295
NA NA NA NA NA NA NA NA NA
2 3 1 2 3 NA NA NA NA 23 25 NA 1
4 NA 0.300104592 -1.041453875 -0.087011377 NA NA NA
NA -1.203972804 0.241162057 -1.386294361 NA NA NA NA
NA NA NA NA NA NA NA 0.916028512 5.18627451
4.007575758 NA NA NA NA 4.433022784 4.058441558 6.25
NA NA NA NA NA NA NA NA NA NA NA
2 2 1 2 NA NA NA NA NA 112 115 NA 1
7 NA 0.2390169 -1.652923024 NA NA NA NA NA -
0.174353387 -0.785928914 NA NA NA NA NA NA NA NA
NA NA NA NA 0.957245599 7.413711584 NA NA NA
NA NA 1.450496939 4.650140647 NA NA NA NA NA
NA NA NA NA NA NA NA
END
Trial data (rectangular format) for Model 1*
na[] no[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] n[,1] n[,2] n[,3]
t[,1] t[,2] t[,3] y[,2,1] y[,2,2] y[,2,3] y[,2,4]
y[,2,5] y[,2,6] y[,2,7] y[,3,1] y[,3,2] y[,3,3]
y[,3,4] y[,3,5] y[,3,6] y[,3,7] se[,2,1] se[,2,2]
se[,2,3] se[,2,4] se[,2,5] se[,2,6] se[,2,7] se[,3,1]
se[,3,2] se[,3,3] se[,3,4] se[,3,5] se[,3,6] se[,3,7]
3 7 1 2 3 4 5 6 7 450 447 434 1
5 8 -0.268263987 0.343640726 -0.269514768 -0.292901806
0.604591829 0.102745158 -0.336472237 -0.194156014 0.206794413 -
0.361790045 -0.501375168 0.666412578 0.565807758 -0.488519866 0.117331696
0.140682789 0.207382171 0.234974448 0.155223505 0.425196391
0.566620159 0.139263582 0.140013962 0.213521225 0.249783342
0.155481992 0.390132261 0.597940581
3 7 1 2 3 4 5 6 7 363 359 350 1
2 4 -0.597837001 0.534842937 -0.312879706 -0.514524972
0.238897709 -0.127892367 -0.613308534 -0.321583624 0.391742093 -
0.078252197 -0.150028135 -0.162739089 0.094008176 -0.22154227 0.151240949
0.157864688 0.20960204 0.25318434 0.150186586 0.314157483
0.473167185 0.147109942 0.156571258 0.201766684 0.234281442
0.153602721 0.300933192 0.426121508
2 6 1 2 3 4 5 6 NA 556 550 NA 1
8 NA -0.262364264 0.441934167 -0.396793976 -0.404689974
0.804293633 1.153337188 NA NA NA NA NA NA NA
NA 0.101971889 0.122473706 0.179066695 0.188274296 0.14001571
0.397114875 NA NA NA NA NA NA NA NA
2 1 1 NA NA NA NA NA NA 39 36 NA 4
7 NA 0.114410351 NA NA NA NA NA NA NA
NA NA NA NA NA NA 0.441433374 NA NA NA
NA NA NA NA NA NA NA NA NA NA
2 4 1 2 3 5 NA NA NA 448 897 NA 4
7 NA 0.057158414 0.001527884 0.053309752 1.213221726 NA
NA NA NA NA NA NA NA NA NA 0.108321553
0.130313145 0.143645374 0.275047703 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 3 6 NA NA NA 408 409 NA 1
2 NA -0.750305594 0.833727789 -0.669641267 0.724563377 NA
NA NA NA NA NA NA NA NA NA 0.142283153
0.149158634 0.175323147 0.357179195 NA NA NA NA NA
NA NA NA NA NA
2 6 1 2 3 4 6 7 NA 418 425 NA 1
3 NA -0.798507696 1.036838256 -0.396663784 -0.492049686
1.69262266 0.685926933 NA NA NA NA NA NA NA
NA 0.127938477 0.14464397 0.171014737 0.192780126 0.419092716
0.616290143 NA NA NA NA NA NA NA NA
2 7 1 2 3 4 5 6 7 355 358 NA 1
3 NA -0.644357016 0.813059994 -0.181689053 -0.307901903
1.366486103 1.065506177 0.69600025 NA NA NA NA NA
NA NA 0.131976914 0.158145965 0.168556582 0.207224715
0.279339086 0.345873877 0.617013894 NA NA NA NA NA
NA NA
2 3 1 2 4 NA NA NA NA 94 88 NA 7
5 NA 0.336472237 -0.622702579 1.027212677 NA NA NA
NA NA NA NA NA NA NA NA 0.214415578
0.303268327 0.380180437 NA NA NA NA NA NA NA
NA NA NA NA
2 3 1 2 3 NA NA NA NA 126 125 NA 1
4 NA -0.353279355 0.314257063 -0.169240926 NA NA NA
NA NA NA NA NA NA NA NA 0.167515776
0.275933047 0.300042495 NA NA NA NA NA NA NA
NA NA NA NA
2 3 1 2 4 NA NA NA NA 143 158 NA 1
5 NA -0.202026628 0.518832857 -0.636259806 NA NA NA
NA NA NA NA NA NA NA NA 0.147921269
0.250838798 0.25971946 NA NA NA NA NA NA NA
NA NA NA NA
2 4 1 2 3 5 NA NA NA 187 189 NA 1
6 NA 0.301261331 0.90427213 -0.504517738 1.670738438 NA
NA NA NA NA NA NA NA NA NA 0.093688208
0.252966168 0.222619444 0.779971146 NA NA NA NA NA
NA NA NA NA NA
2 4 1 2 4 5 NA NA NA 381 375 NA 6
4 NA -0.033901552 -0.000523149 -0.321816868 -1.462483499 NA
NA NA NA NA NA NA NA NA NA 0.147775118
0.14979552 0.242800499 0.50308998 NA NA NA NA NA
NA NA NA NA NA
2 5 1 2 3 5 6 NA NA 363 358 NA 1
9 NA -0.378066134 0.429627631 -0.398496439 0.875177364
0.005970167 NA NA NA NA NA NA NA NA NA
0.113903395 0.14986991 0.17684536 0.152954556 0.298219024 NA
NA NA NA NA NA NA NA NA
2 3 1 2 3 NA NA NA NA 23 25 NA 1
4 NA -1.504077397 1.282615932 -1.299282984 NA NA NA
NA NA NA NA NA NA NA NA 0.465991672
0.622758266 0.651338947 NA NA NA NA NA NA NA
NA NA NA NA
2 2 1 2 NA NA NA NA NA 112 115 NA 1
7 NA -0.413370288 0.86699411 NA NA NA NA NA
NA NA NA NA NA NA NA 0.145464266 0.326542278
NA NA NA NA NA NA NA NA NA NA NA
NA
END
Trial data (rectangular format) for Model 4a
na[] no[] no1[] no2[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] o[,8]
o[,9] n[,1] n[,2] n[,3] t[,1] t[,2] t[,3] y[,1,1] y[,1,2]
y[,1,3] y[,1,4] y[,1,5] y[,1,6] y[,1,7] y[,1,8]
y[,1,9] y[,2,1] y[,2,2] y[,2,3] y[,2,4] y[,2,5]
y[,2,6] y[,2,7] y[,2,8] y[,2,9] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] y[,3,8]
y[,3,9] va[,1,1] va[,1,2] va[,1,3] va[,1,4] va[,1,5]
va[,1,6] va[,1,7] va[,1,8] va[,1,9] va[,2,1] va[,2,2]
va[,2,3] va[,2,4] va[,2,5] va[,2,6] va[,2,7] va[,2,8]
va[,2,9] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7] va[,3,8] va[,3,9]
3 7 7 0 1 2 3 4 5 6 7 NA
NA 450 447 434 1 5 8 -1.078809661 0.451985124 -
1.871802177 -2.172773481 -1.371301577 -3.701301974 -4.065357025 NA NA -
1.347073648 0.79562585 -2.141316945 -2.465675288 -0.766709748 -3.598556816 -
4.401829262 NA NA -1.272965676 0.658779537 -2.233592222 -2.674148649 -
0.704888998 -3.135494216 -4.553876892 NA NA 3.521753493 4.207792208
8.653846154 10.89647008 6.194252626 42.52469136 60.30287115 NA
NA 2.655451786 4.667126039 10.6281383 13.85638003 4.617210763
38.57281773 83.6122549 NA NA 5.020610262 4.44991495
11.44047619 16.56896552 4.517785471 25.04347826 97.01052632 NA
NA
3 8 7 1 1 2 3 4 5 6 7 8
NA 363 359 350 1 2 4 -0.916290732 0.362029709 -
1.57997588 -1.955388893 -0.35734586 -2.690505891 -3.292983797 0 NA -
1.514127733 0.896872646 -1.892855586 -2.469913865 -0.11844815 -2.818398258 -
3.906292331 4 NA -1.237874356 0.753771802 -1.658228077 -2.105417028 -
0.520084949 -2.596497715 -3.514526067 0 NA 3.691612479 4.132503293
7.060818776 9.208176101 4.129060718 16.80697704 28.96021978 363
NA 4.560769534 4.859766214 8.788938898 13.90602072 4.014046375
18.80970149 51.73440066 359 NA 4.015061307 4.595588235
7.44047619 10.33232119 4.276640589 15.49120083 35.6297619 351
NA
2 7 6 1 1 2 3 4 5 6 8 NA
NA 556 550 NA 1 8 NA -0.94160854 0.086384614 -
1.68469465 -1.812901906 -1.43556541 -4.149069462 1 NA NA -
1.203972804 0.528318781 -2.081488625 -2.21759188 -0.631271777 -2.995732274 8
NA NA NA NA NA NA NA NA NA NA NA
3.302978061 4.007466943 7.576305664 8.291385045 6.440000971
65.39077909 556 NA NA 2.451712012 4.285673807 10.14113782
11.29405615 4.411914894 22.05 550 NA NA NA NA NA
NA NA NA NA NA NA
2 1 1 0 1 NA NA NA NA NA NA NA
NA 39 36 NA 4 7 NA -1.108662625 NA NA
NA NA NA NA NA NA -0.994252273 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA 3.868686582 NA NA NA NA NA NA
NA NA 3.443987948 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 448 897 NA 4 7 NA -1.078809661 0.994169625 -
1.380723316 -3.288868197 NA NA NA NA NA -1.021651248
0.995697509 -1.327413564 -2.075646471 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 3.574478436
5.07250992 6.229174426 28.84979604 NA NA NA NA NA
3.368066624 5.076077219 6.036438796 10.09517225 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 5 4 1 1 2 3 6 8 NA NA NA
NA 408 409 NA 1 2 NA -1.021651248 0.157185584 -
0.996613121 -3.47609869 0 NA NA NA NA -1.771956842
0.990913372 -1.666254387 -2.751535313 4 NA NA NA NA
NA NA NA NA NA NA NA NA NA 3.894646152
4.024758221 5.078218426 34.36426117 408 NA NA NA NA
4.3758069 5.064931152 7.481261181 17.73049645 410 NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 8 6 2 1 2 3 4 6 7 9 10
NA 418 425 NA 1 3 NA -0.916290732 -0.172676589 -
1.143781257 -1.456552255 -4.072683065 -4.639571613 1 0 NA -
1.714798428 0.864161666 -1.540445041 -1.948601941 -2.380060405 -3.95364468 4
0 NA NA NA NA NA NA NA NA NA NA
2.85182696 4.029891367 5.457225849 6.524177589 60.73131734
105.5096618 418 418 NA 4.056923076 4.794420555 6.880952381
9.161341043 12.89810054 54.14418465 425 425 NA NA NA
NA NA NA NA NA NA NA
2 9 7 2 1 2 3 4 5 6 7 9
10 355 358 NA 1 3 NA -0.916290732 0.107144637 -
0.894700099 -1.533619076 -2.929711172 -3.352823797 -4.474491862 1 0 -
1.560647748 0.920204631 -1.076389152 -1.841520979 -1.563225069 -2.287317621 -
3.778491613 0 1 NA NA NA NA NA NA NA NA
NA 2.747215428 4.01149096 4.8553321 6.85067406 20.77563469
30.61831876 89.76139601 355 355 3.465179 4.908241422
5.274889904 8.464698501 6.983653008 11.95002331 45.77285714 358
358 NA NA NA NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 94 88 NA 7 5 NA -0.693147181 0.041672696 -
1.85389125 NA NA NA NA NA NA -0.356674944 -0.581029882 -
0.826678573 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 2.454896121 4.001736865 8.541241891
NA NA NA NA NA NA 1.747514885 4.347200822
4.723214286 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 126 125 NA 1 4 NA -0.174353387 -0.995428052 -
1.119889687 NA NA NA NA NA NA -0.527632742 -0.68117099 -
1.289130613 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.449615722 5.07544757 5.390831919
NA NA NA NA NA NA 2.069581078 4.482214573
5.905139834 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 143 158 NA 1 5 NA -0.198450939 -1.023388867 -
0.620576488 NA NA NA NA NA NA -0.400477567 -0.504556011 -
1.256836294 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.484054383 5.141983696 4.397634409
NA NA NA NA NA NA 1.817426485 4.260023585
5.79883856 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 187 189 NA 1 6 NA 0.246860078 -1.655048424 -
0.490910314 -4.527208645 NA NA NA NA NA 0.548121409 -
0.750776293 -0.995428052 -2.856470206 NA NA NA NA NA NA
NA NA NA NA NA NA NA NA 0.947953851
7.424416136 4.245871782 94.51081081 NA NA NA NA NA
0.700851385 4.590644068 5.07544757 19.45747126 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 4 5 NA NA NA NA
NA 381 375 NA 6 4 NA -1.203972804 0.486030965 -
2.025219988 -2.841581594 NA NA NA NA NA -1.237874356
0.485507816 -2.347036856 -4.304065093 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 4.122317025
4.240913102 9.709742587 19.20119048 NA NA NA NA NA
4.131658462 4.240384615 12.55019763 76.01351351 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 6 5 1 1 2 3 5 6 8 NA NA
NA 363 358 NA 1 9 NA -0.616186139 -0.17121594 -
0.980829253 -0.582605306 -2.63905733 1 NA NA NA -0.994252273
0.25841169 -1.379325692 0.292572058 -2.633087163 8 NA NA
NA NA NA NA NA NA NA NA NA NA
1.81239387 4.029386582 5.041666667 4.349139233 16.07142857 360
NA NA NA 2.857256318 4.067149023 6.223970474 4.086210744
15.98852295 358 NA NA NA NA NA NA NA NA
NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 23 25 NA 1 4 NA 0.300104592 -1.041453875 -
0.087011377 NA NA NA NA NA NA -1.203972804 0.241162057 -
1.386294361 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.916028512 5.18627451 4.007575758
NA NA NA NA NA NA 4.433022784 4.058441558 6.25
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA
2 2 2 0 1 2 NA NA NA NA NA NA
NA 112 115 NA 1 7 NA 0.2390169 -1.652923024
NA NA NA NA NA NA NA -0.174353387 -0.785928914
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.957245599 7.413711584 NA NA
NA NA NA NA NA 1.450496939 4.650140647 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA
END
Trial data (rectangular format) for Model 4b
Sample variance for outcomes 8-10 is set to (0.025 + p)(0.975 − p) × 100/N, as per II.6.1.5.
na[] no[] no1[] no2[] o[,1] o[,2] o[,3] o[,4] o[,5] o[,6] o[,7] o[,8]
o[,9] n[,1] n[,2] n[,3] t[,1] t[,2] t[,3] y[,1,1] y[,1,2]
y[,1,3] y[,1,4] y[,1,5] y[,1,6] y[,1,7] y[,1,8]
y[,1,9] y[,2,1] y[,2,2] y[,2,3] y[,2,4] y[,2,5]
y[,2,6] y[,2,7] y[,2,8] y[,2,9] y[,3,1] y[,3,2]
y[,3,3] y[,3,4] y[,3,5] y[,3,6] y[,3,7] y[,3,8]
y[,3,9] va[,1,1] va[,1,2] va[,1,3] va[,1,4] va[,1,5]
va[,1,6] va[,1,7] va[,1,8] va[,1,9] va[,2,1] va[,2,2]
va[,2,3] va[,2,4] va[,2,5] va[,2,6] va[,2,7] va[,2,8]
va[,2,9] va[,3,1] va[,3,2] va[,3,3] va[,3,4] va[,3,5]
va[,3,6] va[,3,7] va[,3,8] va[,3,9]
3 7 7 0 1 2 3 4 5 6 7 NA
NA 450 447 434 1 5 8 -1.078809661 0.451985124 -
1.871802177 -2.172773481 -1.371301577 -3.701301974 -4.065357025 NA NA -
1.347073648 0.795625850 -2.141316945 -2.465675288 -0.766709748 -3.598556816 -
4.401829262 NA NA -1.272965676 0.658779537 -2.233592222 -2.674148649 -
0.704888998 -3.135494216 -4.553876892 NA NA 3.521753493 4.207792208
8.653846154 10.896470082 6.194252626 42.524691358 60.302871148 NA
NA 2.655451786 4.667126039 10.628138298 13.856380028 4.617210763
38.572817730 83.612254902 NA NA 5.020610262 4.449914950
11.440476190 16.568965517 4.517785471 25.043478261 97.010526316 NA
NA
3 8 7 1 1 2 3 4 5 6 7 8
NA 363 359 350 1 2 4 -0.916290732 0.362029709 -
1.579975880 -1.955388893 -0.357345860 -2.690505891 -3.292983797 0.000000000
NA -1.514127733 0.896872646 -1.892855586 -2.469913865 -0.118448150 -
2.818398258 -3.906292331 0.011142061 NA -1.237874356 0.753771802 -
1.658228077 -2.105417028 -0.520084949 -2.596497715 -3.514526067 0.000000000
NA 3.691612479 4.132503293 7.060818776 9.208176101 4.129060718
16.806977042 28.960219780 0.006714876 NA 4.560769534 4.859766214
8.788938898 13.906020716 4.014046375 18.809701493 51.734400657
0.009703569 NA 4.015061307 4.595588235 7.440476190 10.332321188
4.276640589 15.491200828 35.629761905 0.006944444 NA
2 7 6 1 1 2 3 4 5 6 8 NA
NA 556 550 NA 1 8 NA -0.941608540 0.086384614 -
1.684694650 -1.812901906 -1.435565410 -4.149069462 0.001798561 NA NA -
1.203972804 0.528318781 -2.081488625 -2.217591880 -0.631271777 -2.995732274
0.014545455 NA NA NA NA NA NA NA NA NA
NA NA 3.302978061 4.007466943 7.576305664 8.291385045
6.440000971 65.390779093 0.004690719 NA NA 2.451712012
4.285673807 10.141137819 11.294056153 4.411914894 22.050000000
0.006905748 NA NA NA NA NA NA NA NA NA
NA NA
2 1 1 0 1 NA NA NA NA NA NA NA
NA 39 36 NA 4 7 NA -1.108662625 NA NA
NA NA NA NA NA NA -0.994252273 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA 3.868686582 NA NA NA NA NA NA
NA NA 3.443987948 NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 448 897 NA 4 7 NA -1.078809661 0.994169625 -
1.380723316 -3.288868197 NA NA NA NA NA -1.021651248
0.995697509 -1.327413564 -2.075646471 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 3.574478436
5.072509920 6.229174426 28.849796037 NA NA NA NA NA
3.368066624 5.076077219 6.036438796 10.095172255 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 5 4 1 1 2 3 6 8 NA NA NA
NA 408 409 NA 1 2 NA -1.021651248 0.157185584 -
0.996613121 -3.476098690 0.000000000 NA NA NA NA -1.771956842
0.990913372 -1.666254387 -2.751535313 0.009756098 NA NA NA
NA NA NA NA NA NA NA NA NA NA
3.894646152 4.024758221 5.078218426 34.364261168 0.005974265 NA
NA NA NA 4.375806900 5.064931152 7.481261181 17.730496454
0.008182466 NA NA NA NA NA NA NA NA NA
NA NA NA NA
2 8 6 2 1 2 3 4 6 7 9 10
NA 418 425 NA 1 3 NA -0.916290732 -0.172676589 -
1.143781257 -1.456552255 -4.072683065 -4.639571613 0.002392344 0.000000000
NA -1.714798428 0.864161666 -1.540445041 -1.948601941 -2.380060405 -
3.953644680 0.009411765 0.000000000 NA NA NA NA NA NA
NA NA NA NA 2.851826960 4.029891367 5.457225849
6.524177589 60.731317344 105.509661836 0.006373685 0.005831340 NA
4.056923076 4.794420555 6.880952381 9.161341043 12.898100543
54.144184652 0.007818258 0.005735294 NA NA NA NA NA
NA NA NA NA NA
2 9 7 2 1 2 3 4 5 6 7 9
10 355 358 NA 1 3 NA -0.916290732 0.107144637 -
0.894700099 -1.533619076 -2.929711172 -3.352823797 -4.474491862 0.002816901
0.000000000 -1.560647748 0.920204631 -1.076389152 -1.841520979 -
1.563225069 -2.287317621 -3.778491613 0.000000000 0.002793296 NA NA
NA NA NA NA NA NA NA 2.747215428 4.011490960
4.855332100 6.850674060 20.775634685 30.618318756 89.761396011
0.007617781 0.006866197 3.465179000 4.908241422 5.274889904
8.464698501 6.983653008 11.950023310 45.772857143 0.006808659
0.007547718 NA NA NA NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 94 88 NA 7 5 NA -0.693147181 0.041672696 -
1.853891250 NA NA NA NA NA NA -0.356674944 -0.581029882 -
0.826678573 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 2.454896121 4.001736865 8.541241891
NA NA NA NA NA NA 1.747514885 4.347200822
4.723214286 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 126 125 NA 1 4 NA -0.174353387 -0.995428052 -
1.119889687 NA NA NA NA NA NA -0.527632742 -0.681170990 -
1.289130613 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.449615722 5.075447570 5.390831919
NA NA NA NA NA NA 2.069581078 4.482214573
5.905139834 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 3 3 0 1 2 4 NA NA NA NA NA
NA 143 158 NA 1 5 NA -0.198450939 -1.023388867 -
0.620576488 NA NA NA NA NA NA -0.400477567 -0.504556011 -
1.256836294 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 1.484054383 5.141983696 4.397634409
NA NA NA NA NA NA 1.817426485 4.260023585
5.798838560 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 4 4 0 1 2 3 5 NA NA NA NA
NA 187 189 NA 1 6 NA 0.246860078 -1.655048424 -
0.490910314 -4.527208645 NA NA NA NA NA 0.548121409 -
0.750776293 -0.995428052 -2.856470206 NA NA NA NA NA NA
NA NA NA NA NA NA NA NA 0.947953851
7.424416136 4.245871782 94.510810811 NA NA NA NA NA
0.700851385 4.590644068 5.075447570 19.457471264 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 4 4 0 1 2 4 5 NA NA NA NA
NA 381 375 NA 6 4 NA -1.203972804 0.486030965 -
2.025219988 -2.841581594 NA NA NA NA NA -1.237874356
0.485507816 -2.347036856 -4.304065093 NA NA NA NA NA
NA NA NA NA NA NA NA NA NA 4.122317025
4.240913102 9.709742587 19.201190476 NA NA NA NA NA
4.131658462 4.240384615 12.550197628 76.013513514 NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
2 6 5 1 1 2 3 5 6 8 NA NA
NA 363 358 NA 1 9 NA -0.616186139 -0.171215940 -
0.980829253 -0.582605306 -2.639057330 0.002777778 NA NA NA -
0.994252273 0.258411690 -1.379325692 0.292572058 -2.633087163 0.022346369
NA NA NA NA NA NA NA NA NA NA NA
NA 1.812393870 4.029386582 5.041666667 4.349139233 16.071428571
0.007501715 NA NA NA 2.857256318 4.067149023 6.223970474
4.086210744 15.988522954 0.012599075 NA NA NA NA NA
NA NA NA NA NA NA NA
2 3 3 0 1 2 3 NA NA NA NA NA
NA 23 25 NA 1 4 NA 0.300104592 -1.041453875 -
0.087011377 NA NA NA NA NA NA -1.203972804 0.241162057 -
1.386294361 NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.916028512 5.186274510 4.007575758
NA NA NA NA NA NA 4.433022784 4.058441558
6.250000000 NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA
2 2 2 0 1 2 NA NA NA NA NA NA
NA 112 115 NA 1 7 NA 0.239016900 -1.652923024
NA NA NA NA NA NA NA -0.174353387 -0.785928914
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA 0.957245599 7.413711584 NA NA
NA NA NA NA NA 1.450496939 4.650140647 NA NA
NA NA NA NA NA NA NA NA NA NA NA
NA NA NA
END
2 Relative ratings
Dictionary
The table below describes the key variables/parameters/constants.
Name            Description
g[]             Average preference strength (log utility coefficient) for criterion
eg[]            Average utility coefficient for criterion
nr              Number of ratings in the data
rastudyid[i]    Identifier of the study/dataset to which the rating in row i belongs
subj[i]         Identifier of the participant who provided the rating in row i
nrastud         Number of source studies/datasets
nsubj           Number of participants
nrc             Number of criteria
raresdev        Residual deviance of the model
ragamma[k,j]    Preference strength for participant or study k (in the random preference model)
rasig           Ratings standard deviation
prefresig       Standard deviation of random preference distribution
ahpalpha        Average preference strength for reference category of administration variable in PROTECT patient ratings (required due to use of AHP elicitation method)
pvfa[]          AHP-style partial priority for level of administration variable
pvfaa           AHP-style partial priority for reference level of administration variable
pvfb[]          Keeney-Raiffa style partial value for level of administration variable
With regard to the RRMS case study data, the PROTECT investigator ratings are study 1 and the PROTECT patient ratings are study 2. The criteria are numbered as follows:
1 Relapse
2 Disability progression
3 PML
4 Herpes reactivation
5 Liver enzyme elevation
6 Seizures
7 Congenital abnormalities
8 Infusion/injection reactions
9 Allergic/hypersensitivity reactions
10 Flu-like reactions
11 Daily oral vs daily subcutaneous
12 Monthly infusion vs daily subcutaneous
13 Weekly intramuscular vs daily subcutaneous
Ratings model code (1)
# This model has an intercept term ahpalpha to fix the utility scale in absolute terms,
# designed for AHP data relating to categorical criteria (e.g. the PROTECT patient
# ratings). Where there is no such data (e.g. the PROTECT investigator ratings), remove
# the red and green items of code.
# The model shown uses random preferences at the individual level. For the
# fixed-preference model, replace the blue lines of code with ragamma[i,j] <- prefmu[i,j]
# Note that some of the model outputs (pvfa, pvfaa, pvfb, weightb) are hard coded to the
# criteria indices in the RRMS dataset.
# This model can be hard coded to work with a dataset formed by concatenating the
# PROTECT investigator ratings (participants 1-3) and the PROTECT patient ratings
# (participants 4-39). To do this, replace the green code with +step(i-3.5)*ahpalpha
model {
    for (i in 1:nr) { # loop through ratings
        temp[i] <- subj[i] + nsubj + rastudyid[i] + nrastud # these variables are unused in the fixed preference model
        # Utility model
        for (j in 1:nrc) { pmu[j,i] <- log(ragamma[subj[i],j])*cr[i,j] } # each criterion's expected contribution to log rating i
        ramu[i] <- sum(pmu[1:nrc,i]) # expected value (mean) of log rating i
        logra[i] <- log(ra[i])
        logra[i] ~ dnorm(ramu[i],ratau) # likelihood of observed log rating i
        radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual deviance contribution
    }
    for (i in 1:nsubj) { # loop through participants
        for (j in 1:nrc) { # loop through criteria
            ragamma[i,j] ~ dnorm(prefmu[i,j],prefretau[i,j]) I(0,) # random preferences by participant
            prefmu[i,j] <- eg[j] + ahpalpha
            prefretau[i,j] <- pow(prefresig*prefmu[i,j],-2) # random preference precision (sd proportional to mean)
        }
    }
    ahpalpha ~ dgamma(1,0.1) # prior for AHP intercept term (utility of reference category)
    ratau <- pow(rasig,-2)
    rasig ~ dunif(0,10) # prior for ratings standard deviation
    for (j in 1:nrc) {
        eg[j] ~ dgamma(1,0.01) # prior for utility coefficients
        g[j] <- log(eg[j])
        pvfb[j] <- eg[j]/eg[11] # Keeney-Raiffa-style partial values for intermediate admin levels (note hard coding)
        pvfa[j] <- (eg[j]+ahpalpha)/(sum(eg[11:13])+4*ahpalpha) # AHP-style partial priorities for intermediate admin levels (note hard coding)
    }
    pvfaa <- ahpalpha/(sum(eg[11:13])+4*ahpalpha) # AHP-style partial priority for reference admin level (note hard coding)
    prefresig ~ dunif(0,10) # prior for random preference standard deviation
    # weights
    for (i in 1:nrc) {
        weight[i] <- eg[i]/sum(eg[1:nrc]) # preference weights, i.e. normalised utility coefficients
    }
    for (i in 1:nrc) {
        weightb[i] <- eg[i]/sum(eg[1:11]) # weights excluding intermediate admin levels from the total (note hard coding)
    }
    raresdev <- sum(radev[1:nr]) # total residual deviance of the ratings model
}
# END
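The multiplicative utility model behind the ratings likelihood can be checked outside BUGS. The sketch below is a minimal plain-Python illustration with made-up coefficients (not fitted values from the thesis): the expected log rating is the sum of log utility coefficients weighted by the design entries cr, so a rating that trades one criterion against another estimates the ratio of their coefficients.

```python
import math

def expected_log_rating(gamma, cr):
    """Expected log rating: sum_j log(gamma_j) * cr_j,
    mirroring the pmu/ramu calculation in the BUGS model above."""
    return sum(math.log(g) * c for g, c in zip(gamma, cr))

# Hypothetical utility coefficients for three criteria (illustrative only)
gamma = [2.0, 4.0, 0.5]

# A rating comparing criterion 1 against criterion 2: cr = (+1, -1, 0)
mu = expected_log_rating(gamma, [1, -1, 0])

# exp(mu) recovers the implied rating, i.e. the coefficient ratio 2.0/4.0
print(round(math.exp(mu), 3))  # 0.5
```

This is why the data rows above consist mainly of a single +1 and a single -1 per rating: each observed rating is modelled as a (noisy) ratio of two utility coefficients on the log scale.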
Ratings model code (2)
# This model is hard coded to a dataset formed by concatenating the PROTECT investigator
# ratings (participants 1-3) and the PROTECT patient ratings (participants 4-39).
# The model includes the intercept term ahpalpha for the patient ratings but not the
# investigator ratings (i.e. when the variable subj is at least 4).
# This model uses random preferences at the study level.
model {
    for (i in 1:nr) { # loop through ratings
        temp[i] <- subj[i] + nsubj # these variables are unused
        # Utility model
        for (j in 1:nrc) { pmu[j,i] <- log(ragamma[rastudyid[i],j])*cr[i,j] } # each criterion's expected contribution to log rating i
        ramu[i] <- sum(pmu[1:nrc,i]) # expected value (mean) of log rating i
        logra[i] <- log(ra[i])
        logra[i] ~ dnorm(ramu[i],ratau) # likelihood of observed log rating i
        radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual deviance contribution
    }
    for (j in 1:nrc) { # loop through criteria
        for (k in 1:nrastud) { # loop through studies
            ragamma[k,j] ~ dnorm(prefmu[k,j],prefretau[j]) I(0,)
            prefmu[k,j] <- eg[j] + (k-1)*ahpalpha # includes ahpalpha for study 2 but not study 1
        }
    }
    ahpalpha ~ dgamma(1,0.01) # prior for AHP intercept term (utility of reference category)
    ratau <- pow(rasig,-2)
    rasig ~ dunif(0,10) # prior for ratings standard deviation
    for (j in 1:nrc) {
        eg[j] ~ dgamma(1,0.01) # prior for utility coefficients
        g[j] <- log(eg[j])
        prefretau[j] <- pow(prefresig*eg[j],-2)
    }
    prefresig ~ dunif(0,10) # prior for random preference standard deviation
    # weights
    for (i in 1:nrc) {
        weight[i] <- eg[i]/sum(eg[1:nrc]) # preference weights, i.e. normalised utility coefficients
    }
    raresdev <- sum(radev[1:nr]) # total residual deviance of the ratings model
}
# END
Ratings data
PROTECT investigator ratings list(nr=243,nrc=13,nrastud=1,nsubj=3)
rastudyid[] subj[] ra[] cr[,1] cr[,2] cr[,3] cr[,4]
cr[,5] cr[,6] cr[,7] cr[,8] cr[,9] cr[,10]
cr[,11] cr[,12] cr[,13]
1 1 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 1 0.1 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 1 0.01 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 1 0.12 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 1 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 1 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 1 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 1 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 2 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 2 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 2 0.2 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 2 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 2 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 2 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 2 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 3 0.6 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 3 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 3 0.3 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 3 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 3 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 3 0.888888889 0 0 0 0 0 0 0 -1 1
0 0 0 0
1 3 1.111111111 0 0 0 0 0 0 0 -1 0
1 0 0 0
1 3 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 3 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
END
PROTECT patient ratings list(nr=243,nrc=13,nrastud=2,nsubj=36)
rastudyid[] subj[] ra[] cr[,1] cr[,2] cr[,3] cr[,4]
cr[,5] cr[,6] cr[,7] cr[,8] cr[,9] cr[,10]
cr[,11] cr[,12] cr[,13]
2 1 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 1 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 1 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 1 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 1 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 1 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 2 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 2 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 2 0.111111111 0 0 0 0 0 0 0 0 0
0 0 0 1
2 2 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 2 5 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 2 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 3 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 3 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 3 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 3 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 3 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 3 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 4 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 4 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 4 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 4 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 5 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 5 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 5 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 5 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 5 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 6 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 6 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 7 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 7 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 7 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 7 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 7 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 7 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 8 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 8 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 8 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 8 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 9 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 9 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 9 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 9 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 9 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 9 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 10 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 10 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 10 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 10 7 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 10 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 10 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 11 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 11 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 11 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 11 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 11 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 11 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 12 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 12 3 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 12 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 12 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 12 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 12 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 13 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 13 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 13 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 13 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 13 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 13 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 14 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 14 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 14 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 14 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 14 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 14 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 15 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 15 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 15 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 15 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 15 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 15 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 16 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 16 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 16 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 16 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 16 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 16 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 17 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 17 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 17 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 17 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 18 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 18 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 18 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 18 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 18 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 19 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 19 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 19 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 19 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 20 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 20 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 20 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 20 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 20 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 20 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 21 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 21 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 21 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 21 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 21 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 21 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 22 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 22 9 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 22 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 22 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 22 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 22 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 23 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 23 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 23 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 23 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 23 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 23 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 24 0.2 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 24 5 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 24 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 24 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 24 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 24 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 0 0 1
2 25 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 25 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 25 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 26 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 26 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 26 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 26 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 26 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 26 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 27 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 27 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 27 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 27 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 27 0.111111111 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 27 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 28 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 28 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 28 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 28 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 29 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 29 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 29 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 29 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 29 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 29 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 30 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 30 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 30 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 30 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 31 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 31 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 31 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 33 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 33 7 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 33 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 33 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 33 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 33 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 34 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 34 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 34 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 34 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 34 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 35 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 35 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 35 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 35 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 35 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 35 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 36 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 36 9 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 36 0.111111111 0 0 0 0 0 0 0 0 0
0 1 -1 0
END
Both ratings datasets list(nr=243,nrc=13,nrastud=2,nsubj=39)
rastudyid[] subj[] ra[] cr[,1] cr[,2] cr[,3] cr[,4] cr[,5] cr[,6] cr[,7] cr[,8]
cr[,9] cr[,10] cr[,11] cr[,12] cr[,13]
1 1 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 1 0.1 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 1 0.01 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 1 0.12 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 1 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 1 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 1 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 1 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 1 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 1 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 2 0.7 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 2 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 2 0.2 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 2 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 2 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 2 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 1 0
0 0 0
1 2 0.4 0 0 0 0 0 0 0 -1 0 1
0 0 0
1 2 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 2 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
1 3 0.6 1 -1 0 0 0 0 0 0 0 0
0 0 0
1 3 0.9 0 1 -1 0 0 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 0 0 0 0
1 0 0
1 3 0.3 0 0 -1 1 0 0 0 0 0 0
0 0 0
1 3 0.2 0 0 -1 0 1 0 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 1 0 0 0 0
0 0 0
1 3 0.1 0 0 -1 0 0 0 1 0 0 0
0 0 0
1 3 0.05 0 0 -1 0 0 0 0 1 0 0
0 0 0
1 3 0.888888889 0 0 0 0 0 0 0 -1 1
0 0 0 0
1 3 1.111111111 0 0 0 0 0 0 0 -1 0
1 0 0 0
1 3 0.7 0 0 0 0 0 0 0 0 0 0
-1 1 0
1 3 0.5 0 0 0 0 0 0 0 0 0 0
-1 0 1
2 4 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 4 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 4 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 4 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 4 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 4 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 5 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 5 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 5 0.111111111 0 0 0 0 0 0 0 0 0
0 0 0 1
2 5 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 5 5 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 5 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 6 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 6 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 6 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 6 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 6 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 7 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 7 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 7 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 7 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 8 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 8 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 8 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 8 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 9 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 9 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 9 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 10 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 10 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 10 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 10 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 10 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 10 9 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 11 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 11 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 11 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 11 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 11 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 11 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 12 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 12 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 12 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 12 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 12 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 12 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 13 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 13 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 13 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 13 7 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 13 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 13 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 14 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 14 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 14 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 14 1 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 14 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 14 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 15 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 15 3 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 15 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 15 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 15 1 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 15 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 16 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 16 1 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 16 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 16 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 16 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 16 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 17 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 17 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 17 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 17 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 17 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 18 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 18 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 18 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 18 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 19 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 19 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 19 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 19 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 19 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 20 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 20 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 20 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 20 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 20 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 20 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 21 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 21 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 21 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 21 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 21 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 21 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 22 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 22 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 22 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 22 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 22 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 22 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 23 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 23 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 23 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 23 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 23 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 23 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 24 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 24 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 24 1 0 0 0 0 0 0 0 0 0 0
0 0 1
2 24 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 24 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 24 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 25 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 25 9 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 25 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 25 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 25 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 25 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 26 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 26 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 26 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 26 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 26 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 26 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 27 0.2 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 27 5 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 27 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 27 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 27 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 27 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 0 0 1
2 28 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 28 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 28 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 29 5 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 29 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 29 0.2 0 0 0 0 0 0 0 0 0 0
0 0 1
2 29 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 29 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 29 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 30 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 30 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 30 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 30 0.111111111 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 30 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 31 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 31 7 0 0 0 0 0 0 0 0 0 0
0 0 1
2 31 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 31 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 31 0.2 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 32 1 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 32 0.142857143 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 32 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 32 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 32 0.2 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 32 5 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 33 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 33 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 33 3 0 0 0 0 0 0 0 0 0 0
0 0 1
2 33 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 33 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 33 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 34 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 34 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 34 5 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 34 0.333333333 0 0 0 0 0 0 0 0 0
0 1 -1 0
2 36 9 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 36 7 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 36 0.142857143 0 0 0 0 0 0 0 0 0
0 0 0 1
2 36 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 36 0.142857143 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 36 7 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 37 0.142857143 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 37 0.2 0 0 0 0 0 0 0 0 0 0
0 -1 0
2 37 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 37 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 37 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 37 3 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 38 3 0 0 0 0 0 0 0 0 0 0
0 1 -1
2 38 0.333333333 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 38 5 0 0 0 0 0 0 0 0 0 0
0 0 1
2 38 3 0 0 0 0 0 0 0 0 0 0
1 0 -1
2 38 0.333333333 0 0 0 0 0 0 0 0 0
0 -1 0 0
2 38 1 0 0 0 0 0 0 0 0 0 0
1 -1 0
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 0 1 -1
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 0 -1 0
2 39 9 0 0 0 0 0 0 0 0 0 0
0 0 1
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 1 0 -1
2 39 9 0 0 0 0 0 0 0 0 0 0
-1 0 0
2 39 0.111111111 0 0 0 0 0 0 0 0 0
0 1 -1 0
END
3 Choices
Dictionary
The table below describes the key variables/parameters/constants that have not already been
encountered in the ratings models.
Name        Description
nchc        Number of criteria in choice experiment
nchs        Number of choice sets in dataset
V[i,k]      Non-random component of utility of option k in choice set i
pr[i]       Probability of choosing option 2 in choice set i
ch_N[i]     Number of participants choosing an option in choice set i
ch_n[i]     Number of participants choosing option 2 in choice set i
chresdev    Residual deviance in choice model
The criteria in the PROTECT patient choice dataset are numbered as follows:
1. Relapse
2. Disability progression
3. PML
4. Allergic/hypersensitivity reactions
5. Serious allergic reactions
6. Depression
Choice model code
# This model assumes fixed preferences at the individual level.
model {
    for (i in 1:nchs) { # loop through choice sets
        # difference in utility between options, with logistically distributed random component
        logit(pr[i]) <- V[i,2] - V[i,1] # pr is the probability of choosing the right-hand option, corresponding to y=2
        ch_n[i] ~ dbin(pr[i],ch_N[i]) # likelihood
        # residual deviance calcs
        ch_nhat[i] <- pr[i]*ch_N[i]
        chdev[i] <- 2 * (ch_n[i]*(log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
        for (k in 1:2) { # loop through choice options
            for (j in 1:nchc) { pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k] } # each criterion's contribution to utility
            V[i,k] <- sum(pg[i,1:nchc,k]) # total utility of option k
        }
    }
    for (j in 1:nchc) {
        chgamma[j] <- eg[j] # fixed preference model
        temp[j] <- chc[j] # unused variable
    }
    for (j in 1:nchc) {
        eg[j] ~ dgamma(1,0.01) # prior for utility coefficients
        g[j] <- log(eg[j])
    }
    # weights
    for (i in 1:nchc) {
        weight[i] <- eg[i]/sum(eg[1:nchc]) # normalised weights
    }
    chresdev <- sum(chdev[1:nchs]) # residual deviance for choice model
}
# END
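The choice likelihood above is a standard binary logit. As a sanity check, the following plain-Python sketch (hypothetical coefficients, not PROTECT estimates) computes the probability of choosing option 2 from the same quantities the BUGS model uses: each option's utility is the signed, coefficient-weighted sum of its criterion levels, and the choice probability is the inverse logit of the utility difference.

```python
import math

def choice_probability(eg, chsign, cr1, cr2):
    """Probability of choosing option 2 in a choice set:
    V_k = sum_j chsign_j * eg_j * cr_jk, pr = 1/(1+exp(-(V2-V1)))."""
    v1 = sum(s * g * x for s, g, x in zip(chsign, eg, cr1))
    v2 = sum(s * g * x for s, g, x in zip(chsign, eg, cr2))
    return 1.0 / (1.0 + math.exp(-(v2 - v1)))

# Two undesirable criteria (chsign = -1, as in the data list above);
# option 2 offers a lower level of criterion 1, so it should be preferred
pr = choice_probability([1.0, 2.0], [-1, -1], [2.0, 0.25], [1.5, 0.25])
print(pr > 0.5)  # True
```

When the two options have identical criterion levels, the utility difference is zero and the probability is exactly 0.5, which is a useful quick test of any implementation.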
PROTECT patient choices data
list(nchs=64,nchc=6,chc=c(1,2,3,8,9,10),chsign=c(-1,-1,-1,-1,-1,-1))
ch_N[] ch_n[] ch_cr[,1,1] ch_cr[,2,1] ch_cr[,3,1] ch_cr[,4,1]
ch_cr[,5,1] ch_cr[,6,1] ch_cr[,1,2] ch_cr[,2,2] ch_cr[,3,2]
ch_cr[,4,2] ch_cr[,5,2] ch_cr[,6,2]
25 21 2 0.25 0 0 0 0.1 1.5 0.1 0 0.5 0
0.2
25 12 2 0.1 0.003 0.5 0.02 0.2 1.5 0.25 0 0.5
0.02 0.1
24 2 2 0.1 0 0 0.02 0.1 2 0.25 0.003 0.5 0
0.1
25 25 1.5 0.25 0 0.5 0.02 0.2 2 0.1 0 0 0
0.2
25 18 1.5 0.25 0.003 0.5 0 0.2 1.5 0.25 0.003 0
0.02 0.1
24 8 2 0.1 0 0.5 0 0.1 1.5 0.1 0.003 0
0.02 0.1
25 1 1.5 0.1 0.003 0 0 0.2 1.5 0.25 0.003 0.5
0.02 0.1
25 6 2 0.1 0.003 0 0 0.2 2 0.25 0 0
0.02 0.2
27 21 2 0.25 0.003 0 0 0.2 2 0.25 0 0.5 0
0.1
27 2 1.5 0.1 0 0 0.02 0.2 2 0.25 0.003 0
0.02 0.1
27 15 2 0.25 0 0.5 0 0.2 1.5 0.25 0 0
0.02 0.2
27 20 1.5 0.1 0.003 0.5 0 0.2 1.5 0.1 0 0.5
0.02 0.1
27 9 2 0.1 0 0.5 0.02 0.2 1.5 0.1 0.003 0.5
0.02 0.1
27 15 1.5 0.25 0.003 0 0 0.1 2 0.1 0.003 0
0.02 0.2
27 19 2 0.25 0 0 0.02 0.1 1.5 0.1 0.003 0 0
0.1
27 4 1.5 0.1 0.003 0.5 0.02 0.2 2 0.25 0.003 0.5 0
0.2
32 23 2 0.25 0.003 0 0 0.2 2 0.25 0 0.5 0
0.1
32 2 2 0.25 0 0 0 0.2 2 0.25 0 0
0.02 0.1
32 13 2 0.25 0 0.5 0 0.2 1.5 0.25 0 0
0.02 0.2
31 20 1.5 0.1 0.003 0.5 0 0.2 1.5 0.1 0 0.5
0.02 0.1
31 13 2 0.1 0 0.5 0.02 0.2 1.5 0.1 0.003 0.5
0.02 0.1
32 16 1.5 0.25 0.003 0 0 0.1 2 0.1 0.003 0
0.02 0.2
30 24 2 0.25 0 0 0.02 0.1 1.5 0.1 0.003 0 0
0.1
31 6 1.5 0.1 0.003 0.5 0.02 0.2 2 0.25 0.003 0.5 0
0.2
30 20 2 0.25 0 0 0 0.1 1.5 0.1 0 0.5 0
0.2
30 18 2 0.1 0.003 0.5 0.02 0.2 1.5 0.25 0 0.5
0.02 0.1
30 2 2 0.1 0 0 0.02 0.1 2 0.25 0.003 0.5 0
0.1
29 28 1.5 0.25 0 0.5 0.02 0.2 2 0.1 0 0 0
0.2
29 19 1.5 0.25 0.003 0.5 0 0.2 1.5 0.25 0.003 0
0.02 0.1
30 10 2 0.1 0 0.5 0 0.1 1.5 0.1 0.003 0
0.02 0.1
30 4 1.5 0.1 0.003 0 0 0.2 1.5 0.25 0.003 0.5
0.02 0.1
30 9 2 0.1 0.003 0 0 0.2 2 0.25 0 0
0.02 0.2
23 20 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0.003 0.5 0
0.1
22 8 1.5 0.25 0 0.5 0 0.2 2 0.25 0 0.5
0.02 0.1
23 14 2 0.25 0 0.5 0.02 0.2 2 0.25 0.003 0 0
0.1
22 2 2 0.1 0.003 0 0.02 0.2 1.5 0.25 0.003 0.5
0.02 0.2
23 1 2 0.1 0 0.5 0 0.2 2 0.25 0.003 0.5
0.02 0.1
21 11 1.5 0.1 0.003 0 0.02 0.1 1.5 0.25 0 0 0
0.1
23 22 2 0.1 0.003 0.5 0 0.1 1.5 0.1 0 0 0
0.2
23 4 1.5 0.1 0 0.5 0 0.1 2 0.1 0 0
0.02 0.2
22 18 1.5 0.25 0 0 0.02 0.1 2 0.1 0 0.5
0.02 0.1
22 20 1.5 0.1 0.003 0.5 0.02 0.2 2 0.1 0.003 0 0
0.1
22 12 2 0.1 0.003 0.5 0 0.2 1.5 0.25 0 0.5 0
0.1
22 3 1.5 0.25 0 0 0 0.2 1.5 0.25 0.003 0.5 0
0.1
22 19 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0 0.5
0.02 0.2
22 3 2 0.1 0.003 0.5 0.02 0.1 2 0.25 0.003 0
0.02 0.2
22 0 2 0.1 0 0 0 0.1 1.5 0.25 0.003 0 0
0.2
22 17 2 0.25 0 0 0 0.2 1.5 0.1 0 0
0.02 0.1
30 23 1.5 0.25 0 0 0.02 0.1 2 0.1 0 0.5
0.02 0.1
30 29 1.5 0.1 0.003 0.5 0.02 0.2 2 0.1 0.003 0 0
0.1
30 9 2 0.1 0.003 0.5 0 0.2 1.5 0.25 0 0.5 0
0.1
30 8 1.5 0.25 0 0 0 0.2 1.5 0.25 0.003 0.5 0
0.1
30 27 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0 0.5
0.02 0.2
30 3 2 0.1 0.003 0.5 0.02 0.1 2 0.25 0.003 0
0.02 0.2
30 1 2 0.1 0 0 0 0.1 1.5 0.25 0.003 0 0
0.2
30 26 2 0.25 0 0 0 0.2 1.5 0.1 0 0
0.02 0.1
32 31 1.5 0.25 0.003 0 0.02 0.2 1.5 0.1 0.003 0.5 0
0.1
32 12 1.5 0.25 0 0.5 0 0.2 2 0.25 0 0.5
0.02 0.1
32 19 2 0.25 0 0.5 0.02 0.2 2 0.25 0.003 0 0
0.1
32 1 2 0.1 0.003 0 0.02 0.2 1.5 0.25 0.003 0.5
0.02 0.2
32 2 2 0.1 0 0.5 0 0.2 2 0.25 0.003 0.5
0.02 0.1
32 11 1.5 0.1 0.003 0 0.02 0.1 1.5 0.25 0 0 0
0.1
32 26 2 0.1 0.003 0.5 0 0.1 1.5 0.1 0 0 0
0.2
32 3 1.5 0.1 0 0.5 0 0.1 2 0.1 0 0
0.02 0.2
END
4 Preference meta-analysis
Dictionary
The table below describes the key variables/parameters/constants that have not already been
explained.
Name             Description
nps              Number of preference elicitation studies in dataset
pma_n[i]         Number of participants in study i
nop[i]           Number of outcomes in study i
np[i,j]          Number of coefficients reported for criterion j in study i
up[i,j,k]        Coefficient k for criterion j in study i
upse[i,j,k]      Standard error of up[i,j,k]
x[i,j,k]         Criterion value to which up[i,j,k] relates
zeta[i]          Scaling coefficient for study i
minlev[i,j]      Lowest categorical level for which utility is to be estimated for criterion j in study i
maxlev[i,j]      Highest categorical level for which utility is to be estimated for criterion j in study i
lev[i,j,k]       Categorical level to be estimated based on up[i,j,k] (=1 for linear criteria)
levsign[i,j,k]   Takes value 1 if the categorical level to be estimated based on up[i,j,k] is more favourable than the reference category in study i (or if it is a linear criterion for which higher values are more favourable); takes value -1 otherwise
base[i,j]        Integer code for administration reference category in study i
offset[m]        Parameter to use for reference category adjustment represented by code m
pmagamma[i,j,k]  Study-specific utility coefficient for level k of criterion j in study i (random preference model)
pmaresdev        Residual deviance in preference meta-analysis model
The criteria and studies in the RRMS preference meta-analysis dataset are numbered as follows:
Criteria
1. Relapse
2. Disability progression
3. Daily oral vs daily subcutaneous
4. Monthly infusion vs daily subcutaneous
5. Weekly injection vs daily subcutaneous
Studies
1. ARROYO
2. MANSFIELD
3. POULOS
4. WILSON 2014
5. WILSON 2015
6. GARCIA-DOMINGUEZ
7. UTZ
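Before the model code, it may help to see how a reported coefficient is mapped onto the common utility scale. The sketch below is a plain-Python illustration with invented numbers, following the fixed-preference form quoted in the model comments: a study-specific coefficient is the absolute distance of eg from the reference-category offset, re-signed by levsign, and multiplied by the criterion value x and the study scaling coefficient zeta.

```python
def expected_coefficient(eg_level, offset_base, levsign, x, zeta):
    """Expected reported coefficient theta = levsign * |eg - offset| * x * zeta,
    the fixed-preference form of theta[i,j,k] in the meta-analysis model."""
    return levsign * abs(eg_level - offset_base) * x * zeta

# Illustrative values only: a harm criterion (levsign = -1) observed at
# x = 0.25, study scaling zeta = 1, reference-category offset 0.5
print(expected_coefficient(2.0, 0.5, -1, 0.25, 1.0))  # -0.375
```

The per-study zeta absorbs differences in how strongly each study's coefficients are scaled, so the underlying eg parameters can be pooled across heterogeneous elicitation formats.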
Preference meta-analysis model code
# This model uses random preferences at the study level; for the fixed-preference
# model, replace the code in blue with:
# for (k in minlev[i,j]:maxlev[i,j]) pmagamma[i,j,k]<-abs(eg[op[i,j]+k-1]-offset[base[i,j]+1])
# Note that the scaling coefficients and base offsets are hard coded to the RRMS dataset.
model
prefresig~dunif(0,10) # prior for random preference standard deviation
for (i in 1:nps) # studies reporting preference coefficients
temp1[i]<-pma_n[i] # unused variable (fixed preference model)
for (j in 1:nop[i]) # loop through outcomes
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j]) # loop through utility estimates for outcome
j
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k]) # likelihood of
observed utility coefficient
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j] #
expected value of observed utility coefficient (with dummy coding
covariance adjustment)
prep[i,j,k] <- pow(upse[i,j,k],-2)*2 #
precision (with correction for dummy coding covariance)
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k] # residual
theta[i,j,k]<-
levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i] # expected
value of observed utility coefficient
# residual deviance calcs
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]],
pmares[i,j,1:np[i,j]] )
for (m in 1:k-1) pma_cv[i,j,k,m]<-
0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1) # normalised component of between-study
variance shared for study i outcome j
for (k in minlev[i,j]:maxlev[i,j]) # loop through levels for
which coefficients are to be estimated
pmatau[i,j,k]<-pow(prefresig*(eg[op[i,j]+k-1]-offset[base[i,j]+1]),-
2)*2
pmagamma[i,j,k]~dnorm(pmamu[i,j,k],pmatau[i,j,k]) I(0,) # random
utility coefficient distribution
pmamu[i,j,k]<-abs(eg[op[i,j]+k-1]-offset[base[i,j]+1]) +
sqrt(0.5)*BB[i,j]*prefresig*(eg[op[i,j]+k-1]-offset[base[i,j]+1]) # mean
of random utility coefficient distribution (with covariance adjustment)
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp]
)
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]]) # residual deviance contribution
for study i
#scaling coefficients (note hard coded to MS dataset)
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets (sets admin reference category; note hard coded to MS
dataset)
offset[1]<-0
offset[2]<-eg[3]
for (j in 1:npmac) eg[j] ~ dgamma(1,0.01) # prior for utility
coefficients
g[j]<-log(eg[j])
temp2[j]<-pmac[j] # unused variable
weight[j]<-eg[j]/sum(eg[1:npmac]) # normalised preference weights
weightb[j]<-eg[j]/sum(eg[1:3]) # normalised preference weights
excluding intermediate admin levels (note hard coding)
pmaresdev<-sum(pmardev[1:nps]) # residual deviance for preference
meta-analysis model
# END
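The final lines of the model above convert the utility coefficients eg[j] into normalised preference weights (weight[j] and weightb[j]). As a rough illustration of that calculation only, here is a Python sketch; the coefficient values are hypothetical, not draws from the RRMS analysis.

```python
# Hypothetical posterior draws of the utility coefficients eg[1..npmac];
# values are illustrative only, not taken from the RRMS dataset.
eg = [2.0, 1.0, 0.5, 0.3, 0.2]

# weight[j] = eg[j] / sum(eg): normalised preference weights
weight = [e / sum(eg) for e in eg]

# weightb[j] = eg[j] / sum(eg[1:3]): normalised over the first three
# criteria only, excluding the intermediate administration levels
weightb = [e / sum(eg[:3]) for e in eg]

assert abs(sum(weight) - 1.0) < 1e-12
```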
Preference meta-analysis dataset: RRMS
list(nps=7,maxp=3,npmac=5,pmac=c(1,2,5,6,7),nc=11,wi=c(1,1,0,1,1,0,0,0,0,0,0,0,0,0),wib=c(1,1,0,1,1,0,1,0,0,0,0,0,0,0))
pma_n[] nop[] op[,1] op[,2] op[,3] np[,1] np[,2]
np[,3] base[,1] base[,2] base[,3] ux[,1,1]
ux[,1,2] ux[,1,3] ux[,2,1] ux[,2,2] ux[,2,3]
ux[,3,1] ux[,3,2] ux[,3,3] up[,1,1] up[,1,2]
up[,1,3] up[,2,1] up[,2,2] up[,2,3] up[,3,1]
up[,3,2] up[,3,3] upse[,1,1] upse[,1,2] upse[,1,3]
upse[,2,1] upse[,2,2] upse[,2,3] upse[,3,1] upse[,3,2]
upse[,3,3] lev[,1,1] lev[,1,2] lev[,1,3] lev[,2,1]
lev[,2,2] lev[,2,3] lev[,3,1] lev[,3,2] lev[,3,3]
levsign[,1,1] levsign[,1,2] levsign[,1,3] levsign[,2,1]
levsign[,2,2] levsign[,2,3] levsign[,3,1] levsign[,3,2]
levsign[,3,3] minlev[,1] minlev[,2] minlev[,3] maxlev[,1]
maxlev[,2] maxlev[,3]
221 3 1 2 3 1 1 3 0 0 0 0.3
NA NA 0.302440605 NA NA 1 1 1 -0.734
NA NA -0.89 NA NA 1.726 0.56 0.35 0.017624027 NA
NA 0.017624027 NA NA 0.021564908 0.210222243 0.02516081 1
NA NA 1 NA NA 1 2 3 -1 NA NA -1 NA NA 1 1 1 1 1 1 1 1 3
301 2 1 3 NA 3 2 NA 0 1 NA 0.042
0.075 0.375 1 1 NA NA NA NA -0.05 -0.2 -0.75 -1.7000 -1.5 NA NA NA NA 0.124973966 0.124973966
0.124973966 0.210114901 0.210114901 NA NA NA NA 1 1
1 2 3 NA NA NA NA -1 -1 -1 -1 -1 NA NA NA NA 1 2 NA 1 3 NA
189 2 1 2 NA 2 2 NA 0 0 NA 0.5
0.75 NA -0.232544158 -0.471195376 NA NA NA
NA -0.7 -1.1 NA 0.6 2.1 NA NA NA NA
0.155172514 0.176739878 NA 0.222392803 0.265109817 NA NA
NA NA 1 1 NA 1 1 NA NA NA NA -1 -1 NA -1 -1 NA NA NA NA 1 1 NA 1
1 NA
291 3 1 2 3 2 2 3 0 0 0 -0.5 -0.8 NA -0.238651219 -0.450851312 NA 1 1 1
0.182321557 0.425267735 NA 0.3074847 0.90016135 NA
0.732367894 0.482426149 0.039220713 0.051191504 0.051695161 NA
0.050625239 0.051817522 NA 0.062410652 0.026711978 0.060736003 1
1 NA 1 1 NA 1 2 3 -1 -1 NA -1 -1 NA 1 1 1 1 1 1 1 1 3
50 2 1 3 NA 1 3 NA 0 1 NA 1
NA NA 1 1 1 NA NA NA -0.05 NA NA -1.23 -1.41 -0.86 NA NA NA 0.06 NA NA 0.24 0.24 0.24
NA NA NA 1 NA NA 2 2 3 NA NA
NA -1 NA NA -1 -1 -1 NA NA NA 1 2
NA 1 3 NA
125 1 3 NA NA 2 NA NA 1 NA NA 1 1
NA NA NA NA NA NA NA -0.849 -0.943
NA NA NA NA NA NA NA 0.113 0.103 NA NA
NA NA NA NA NA 2 2 NA NA NA NA
NA NA NA -1 -1 NA NA NA NA NA NA
NA 2 NA NA 2 NA NA
156 1 3 NA NA 1 NA NA 1 NA NA 1
NA NA NA NA NA NA NA NA -4.38 NA NA
NA NA NA NA NA NA 0.504493749 NA NA NA
NA NA NA NA NA 2 NA NA NA NA NA
NA NA NA -1 NA NA NA NA NA NA NA
NA 2 NA NA 2 NA NA
END
5 Combined preference model
Dictionary
The table below describes the key variables/parameters/constants that have not already been
explained.
Name                 Description
chc                  Vector of criteria numbers in choice data
rac                  Vector of criteria numbers in ratings data
pmac                 Vector of criteria numbers in preference meta-analysis data
nc                   Total number of criteria in model
wi[]                 Indicates whether criterion is to be included in weights
pred_egamma[]        Study-level predictive distribution of utility coefficient for criterion
pred_pref[]          Individual-level predictive distribution of utility coefficient for criterion
weight_pred_study[]  Study-level predictive distribution of preference weight for criterion
weight_pred_y[]      Individual-level predictive distribution of preference weight for criterion
totresdev            Total residual deviance in all preference models
The criteria are numbered as follows:
1. Relapse
2. Disability progression
3. PML
4. Liver enzyme elevation
5. Daily oral vs daily subcutaneous
6. Monthly infusion vs daily subcutaneous
7. Weekly intramuscular vs daily subcutaneous
8. Allergic/hypersensitivity reactions
9. Serious allergic reactions
10. Depression
11. Infusion/injection reactions
Combined preference model code – fixed preferences
model
for(i in 1:nchs) # loop through choice sets
# difference in utility between choices with logistically distributed random
component
logit(pr[i]) <- V[i,2] - V[i,1]
#p is probability of choosing right-hand option, corresponding to y=2
ch_n[i]~dbin(pr[i],ch_N[i])
ch_nhat[i]<-pr[i]*ch_N[i]
chdev[i]<- 2 * (ch_n[i] * (log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-
ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
for(k in 1:2)
for (j in 1:nchc) pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k]
#Utility model
V[i,k] <- sum(pg[i,1:nchc,k])
for (j in 1:nchc) chgamma[j]<-eg[chc[j]]
for(i in 1:nr) # loop through ratings
#Utility model
for (j in 1:nrc) pmu[j,i] <- log(ragamma[rastudyid[i],j])*(cr[i,j]-
0.5*equals(j,1)*cr[i,j]) # when j=1 relapse coefficient is halved due to
differing time horizons in ratings dataset
ramu[i] <- sum(pmu[1:nrc,i])
logra[i]<-log(ra[i])
logra[i]~dnorm(ramu[i],ratau)
radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual
deviance contribution
for (j in 1:nrc)
for (k in 1:nrastud)
ragamma[k,j]<-prefmu[k,j]
prefmu[k,j] <- eg[rac[j]] + (k-1)*ahpalpha # includes alpha
for study 2 but not study 1
ahpalpha~dgamma(1,0.1)
ratau<-pow(rasig,-2)
rasig~dunif(0,10)
for (i in 1:nps) # studies reporting preference coefficients
for (j in 1:nop[i])
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j])
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k])
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j]
prep[i,j,k] <- pow(upse[i,j,k],-2)*2
# precision (with correction for dummy coding covariance)
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k]
theta[i,j,k]<-
levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i]
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]],
pmares[i,j,1:np[i,j]] )
for (m in 1:k-1) pma_cv[i,j,k,m]<- 0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1)
for (k in minlev[i,j]:maxlev[i,j])
pmagamma[i,j,k]<-abs(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1])
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp] )
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]])
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets
offset[1]<-0
offset[2]<-eg[pmac[3]]
#priors
for (j in 1:nc) eg[j] ~ dgamma(1,0.01)
g[j]<-log(eg[j])
#weights
for(i in 1:nc)
wg[i]<-wi[i]*eg[i]
weight[i]<-wg[i]/sum(wg[1:nc])
weightb[i]<-eg[i]/sum(wg[1:nc])
pratau<-2*ratau
chresdev<-sum(chdev[1:nchs])
raresdev<-sum(radev[1:nr])
pmaresdev<-sum(pmardev[1:nps])
totresdev<-raresdev+chresdev+pmaresdev
# END
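The chdev[i] term in the model above is the standard binomial residual deviance for grouped choice data. A minimal Python sketch of the same formula, with made-up counts (not data from the thesis):

```python
import math

def choice_residual_deviance(n, N, p):
    """Binomial residual deviance for one choice set: n of N respondents
    chose the right-hand option, which the model predicts with probability
    p (so nhat = p*N). Assumes 0 < n < N so both log terms are defined."""
    nhat = p * N
    return 2 * (n * (math.log(n) - math.log(nhat))
                + (N - n) * (math.log(N - n) - math.log(N - nhat)))

# A perfectly fitted choice set contributes zero deviance...
assert abs(choice_residual_deviance(50, 100, 0.5)) < 1e-12
# ...while a badly fitted one contributes a large positive amount.
assert choice_residual_deviance(80, 100, 0.5) > 10
```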
Combined preference model code – random preferences
# This model includes predictive distributions on preferences, using method 1
# (as described in III.6.2.3) for the individual-level distributions. For
# method 2, replace the blue code with:
# pred_gamma[i]<-log(pred_egamma[i])
# pred_pref[i]~dlnorm(pred_gamma[i],pratau)
model
for(i in 1:nchs) # loop through choice sets
# difference in utility between choices with logistically distributed random
component
logit(pr[i]) <- V[i,2] - V[i,1]
#p is probability of choosing right-hand option, corresponding to y=2
ch_n[i]~dbin(pr[i],ch_N[i])
ch_nhat[i]<-pr[i]*ch_N[i]
chdev[i]<- 2 * (ch_n[i] * (log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-
ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
for(k in 1:2)
for (j in 1:nchc) pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k]
#Utility model
V[i,k] <- sum(pg[i,1:nchc,k])
for (j in 1:nchc) chgamma[j]~dnorm(eg[chc[j]],prefretau[j]) I(0,)
for(i in 1:nr) # loop through ratings
#Utility model
for (j in 1:nrc) pmu[j,i] <- log(ragamma[rastudyid[i],j])*(cr[i,j]-
0.5*equals(j,1)*cr[i,j]) # when j=1 relapse coefficient is halved due to
differing time horizons in ratings dataset
ramu[i] <- sum(pmu[1:nrc,i])
logra[i]<-log(ra[i])
logra[i]~dnorm(ramu[i],ratau)
radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual
deviance contribution
for (j in 1:nrc)
for (k in 1:nrastud)
ragamma[k,j]~dnorm(prefmu[k,j],raprefretau[k,j]) I(0,)
prefmu[k,j] <- eg[rac[j]] + (k-1)*ahpalpha # includes alpha
for study 2 but not study 1
raprefretau[k,j]<-pow(prefresig*prefmu[k,j],-2)
ahpalpha~dgamma(1,0.1)
ratau<-pow(rasig,-2)
rasig~dunif(0,10)
prefresig~dunif(0,10)
for (i in 1:nps) # studies reporting preference coefficients
for (j in 1:nop[i])
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j])
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k])
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j]
prep[i,j,k] <- pow(upse[i,j,k],-2)*2
# precision (with correction for dummy coding covariance)
pmava[i,j,k]<-pma_n[i]*pow(upse[i,j,k],2)
pmava_prec[i,j,k]<-pow(pmava[i,j,k]*sqrt(2/pma_n[i]),-1)
pmava[i,j,k]~dnorm(pmavam[i,j,k],pmava_prec[i,j,k]) I(0,)
pmavam[i,j,k]<-pmava_mu*cut(theta[i,j,k])*cut(theta[i,j,k])
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k]
theta[i,j,k]<-
levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i]
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]],
pmares[i,j,1:np[i,j]] )
for (m in 1:k-1) pma_cv[i,j,k,m]<-
0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1)
for (k in minlev[i,j]:maxlev[i,j])
pmatau[i,j,k]<-pow(prefresig*(eg[pmac[op[i,j]+k-1]]-
offset[base[i,j]+1]),-2)*2
pmagamma[i,j,k]~dnorm(pmamu[i,j,k],pmatau[i,j,k]) I(0,)
pmamu[i,j,k]<-abs(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1]) +
sqrt(0.5)*BB[i,j]*prefresig*(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1])
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp] )
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]])
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets
offset[1]<-0
offset[2]<-eg[pmac[3]]
#priors
for (j in 1:nc) eg[j] ~ dgamma(1,0.01)
g[j]<-log(eg[j])
prefretau[j]<-pow(prefresig*eg[j],-2)
#weights
for(i in 1:nc)
wg[i]<-wi[i]*eg[i]
weight[i]<-wg[i]/sum(wg[1:nc])
weightb[i]<-eg[i]/sum(wg[1:nc])
pmaprec_mu[i]<-pow(pmava_mu*eg[i]*eg[i],-1)
pmava_mu~dunif(0,100)
# predictive preferences
for (i in 1:nc) pred_egamma[i]~dnorm(eg[i],prefretau[i]) I(0,)
pred_wegamma[i]<-wi[i]*pred_egamma[i]
weight_pred_study[i]<-pred_wegamma[i]/sum(pred_wegamma[1:nc])
weightb_pred_study[i]<-pred_egamma[i]/sum(pred_wegamma[1:nc])
pred_pref[i]~dnorm(pred_egamma[i],pmaprec_mu[i]) I(0,)
pred_wpref[i]<-wi[i]*pred_pref[i]
weight_pred_y[i]<-pred_wpref[i]/sum(pred_wpref[1:nc])
weightb_pred_y[i]<-pred_pref[i]/sum(pred_wpref[1:nc])
pratau<-2*ratau
chresdev<-sum(chdev[1:nchs])
raresdev<-sum(radev[1:nr])
pmaresdev<-sum(pmardev[1:nps])
totresdev<-raresdev+chresdev+pmaresdev
# END
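In the random-preference model above, the study-level predictive weights (weight_pred_study) come from drawing each coefficient from a normal distribution truncated at zero, with standard deviation prefresig times its mean, and renormalising. A rough Python sketch under those assumptions; the eg and prefresig values are illustrative only.

```python
import random

random.seed(1)

def truncnorm_pos(mu, sd):
    """Rejection sampler for Normal(mu, sd) truncated to (0, inf)."""
    while True:
        x = random.gauss(mu, sd)
        if x > 0:
            return x

eg = [2.0, 1.0, 0.5]   # hypothetical posterior means of eg[]
prefresig = 0.3        # hypothetical between-study preference sd

# One predictive draw per criterion, then renormalise to weights:
pred = [truncnorm_pos(e, prefresig * e) for e in eg]
weight_pred = [p / sum(pred) for p in pred]

assert all(p > 0 for p in pred)
assert abs(sum(weight_pred) - 1.0) < 1e-12
```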
Ratings dataset for combined preference model
# The combined model needs this altered data file for the ratings dataset (but
# for the choice or preference meta-analysis datasets the same data files
# already presented can be used).
list(nr=231,nrc=9,nc=11,wi=c(1,1,0,1,1,0,0,0,0,0,0,0,0,0),nrastud=2,rac=c(1,2,3,4,5,6,7,11,8))
rastudyid[] ra[] cr[,1] cr[,2] cr[,3] cr[,4] cr[,5] cr[,6] cr[,7] cr[,8] cr[,9]
1 0.7 1 -1 0 0 0 0 0 0 0
1 0.1 0 1 -1 0 0 0 0 0 0
1 0.01 0 0 -1 0 1 0 0 0 0
1 0.2 0 0 -1 1 0 0 0 0 0
1 0.05 0 0 -1 0 0 0 0 1 0
1 0.4 0 0 0 0 0 0 0 -1 1
1 0.7 0 0 0 0 -1 1 0 0 0
1 0.5 0 0 0 0 -1 0 1 0 0
1 0.7 1 -1 0 0 0 0 0 0 0
1 0.9 0 1 -1 0 0 0 0 0 0
1 0.1 0 0 -1 0 1 0 0 0 0
1 0.2 0 0 -1 1 0 0 0 0 0
1 0.05 0 0 -1 0 0 0 0 1 0
1 0.4 0 0 0 0 0 0 0 -1 1
1 0.7 0 0 0 0 -1 1 0 0 0
1 0.5 0 0 0 0 -1 0 1 0 0
1 0.6 1 -1 0 0 0 0 0 0 0
1 0.9 0 1 -1 0 0 0 0 0 0
1 0.1 0 0 -1 0 1 0 0 0 0
1 0.2 0 0 -1 1 0 0 0 0 0
1 0.05 0 0 -1 0 0 0 0 1 0
1 0.888888889 0 0 0 0 0 0 0 -1 1
1 0.7 0 0 0 0 -1 1 0 0 0
1 0.5 0 0 0 0 -1 0 1 0 0
2 0.111111111 0 0 0 0 0 1 -1 0 0
2 0.142857143 0 0 0 0 0 -1 0 0 0
2 7 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 0.111111111 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 5 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 9 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 7 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 0.333333333 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 5 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 9 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 1 0 0 0 0 0 1 -1 0 0
2 1 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 7 0 0 0 0 1 0 -1 0 0
2 0.142857143 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 0.111111111 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 0.111111111 0 0 0 0 0 1 -1 0 0
2 3 0 0 0 0 0 -1 0 0 0
2 0.2 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 1 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 1 0 0 0 0 0 1 -1 0 0
2 1 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 0.111111111 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 0.2 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 1 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.2 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 9 0 0 0 0 0 -1 0 0 0
2 0.142857143 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 0.142857143 0 0 0 0 0 1 -1 0 0
2 0.142857143 0 0 0 0 0 -1 0 0 0
2 7 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 0.2 0 0 0 0 0 1 -1 0 0
2 5 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 0.333333333 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 0.333333333 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 5 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 0.2 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 0.111111111 0 0 0 0 1 0 -1 0 0
2 0.111111111 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 7 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.2 0 0 0 0 1 -1 0 0 0
2 1 0 0 0 0 0 1 -1 0 0
2 0.142857143 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 0.2 0 0 0 0 -1 0 0 0 0
2 5 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 3 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 5 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 0.333333333 0 0 0 0 1 -1 0 0 0
2 9 0 0 0 0 0 1 -1 0 0
2 7 0 0 0 0 0 -1 0 0 0
2 0.142857143 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.142857143 0 0 0 0 -1 0 0 0 0
2 7 0 0 0 0 1 -1 0 0 0
2 0.142857143 0 0 0 0 0 1 -1 0 0
2 0.2 0 0 0 0 0 -1 0 0 0
2 5 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 3 0 0 0 0 1 -1 0 0 0
2 3 0 0 0 0 0 1 -1 0 0
2 0.333333333 0 0 0 0 0 -1 0 0 0
2 5 0 0 0 0 0 0 1 0 0
2 3 0 0 0 0 1 0 -1 0 0
2 0.333333333 0 0 0 0 -1 0 0 0 0
2 1 0 0 0 0 1 -1 0 0 0
2 0.111111111 0 0 0 0 0 1 -1 0 0
2 0.111111111 0 0 0 0 0 -1 0 0 0
2 9 0 0 0 0 0 0 1 0 0
2 0.111111111 0 0 0 0 1 0 -1 0 0
2 9 0 0 0 0 -1 0 0 0 0
2 0.111111111 0 0 0 0 1 -1 0 0 0
END
Initial values for combined preference model
list(eg=c(1,1,1,1,1,0.5,0.5,1,1,1,1))
6 Full MCDA model
Dictionary
The table below describes the key variables/parameters/constants that have not already been
explained.
Name        Description
admin[t]    Administration mode for treatment t: =0 for daily subcutaneous, =1 for daily oral, =2 for monthly infusion, =3 for 1-3x weekly injection
preflink    Vector listing the preference parameter number corresponding to each mapping group
wo          Vector listing the outcome number used in the MCDA calculations for each criterion/mapping group
wib[]       Indicates whether criterion is to be included in the alternative weight calculation (which gives a weight for intermediate admin levels but does not include them in the sum used for normalisation)
totresdev   Total residual deviance in the model
The set of “outcomes” in the RRMS clinical evidence synthesis dataset is extended to include the administration modes as follows:
Outcomes
1 Annualised relapse rate
2 Relapse-free proportion
3 Proportion undergoing disability progression; confirmed 3 months later
4 Proportion undergoing disability progression; confirmed 6 months later
5 Alanine aminotransferase above upper limit of normal range
6 Alanine aminotransferase above 3x upper limit of normal range
7 Alanine aminotransferase above 5x upper limit of normal range
8 Proportion with serious gastrointestinal disorders
9 Proportion with serious bradycardia
10 Proportion with macular edema
11 Indicator variable for daily oral administration
12 Indicator variable for administration by 1-3x weekly injection
In other words, d[t,11]=1 for treatments with daily oral administration, d[t,12]=1 for treatments
administered by 1-3x weekly injection, and d[t,11]=d[t,12]=0 for treatments administered by daily
subcutaneous injection. An indicator for monthly infusion is not required as there are no such
treatments in the dataset.
Full MCDA model code
# This model uses fixed mappings in three groups, random preferences by study and method 1 (see III.6.2.3) for predictive preferences at the individual level.
# This model includes the three “zeroes” outcomes in the evidence synthesis but excludes them from the MCDA model. Due to the way the model is coded, it is necessary to assign utility coefficient parameters for these outcomes even though they are not included in the MCDA calculations. The parameters eg[12], eg[13] and eg[14] are used for this purpose and are assigned the deterministic value 1. This has led to some hard coding where the number 14 is used directly to represent the number of criteria in some loops, while the parameter nc needs to retain the value 11 for other purposes in the model.
model
### TREATMENT EFFECTS MODEL
sig~dunif(0,10) # prior for between-study sd of treatment effects
tau<-pow(sig,-2) # between-study precision of treatment effects
for (j in 1:totalo1) ff[j]<-1
for (j in totalo1+1:totalo) ff[j]<-1
for (j in 1:totalo)
ad[1,j]<-0 # mean treatment effect is zero on
reference treatment
signr_b[j]<-step(rho_b[j]) # sign of between-study
correlations
signr_w[j]<-step(rho_w[j]) # sign of within-study
correlations
for (k in 1:nt) d[k,j]<-ad[k,j]*(1-ze[j,k])
# assign known signs to treatment effects
for(i in 1:ns)
temp[i]<-no2[i]
E[i]~dnorm(0,1)
for (j in 1:no1[i])
mu[i,j]~dnorm(0,.01) # "average" level of outcome j in
study i across all trial arms
for (j in no1[i]+1:no[i])
mu[i,j]~dgamma(0.5,0.5) # "average" level of outcome j in
study i across all trial arms
for (j in 1:no[i])
G[i,j]~dnorm(0,1)
sdelta[i,j]<-sum(delta[i,1:na[i],j])/na[i] # effect of "average"
treatment in study i on outcome j relative to reference
for (k in 1:na[i])
Dmu[i,k,j] <- step(totalo1+0.5-o[i,j])*(mu[i,j] + delta[i,k,j] -
sdelta[i,j]) + step(o[i,j]-totalo1-0.5)*min(1,max(0,mu[i,j] + delta[i,k,j]
- sdelta[i,j]))
D[i,k,j] <- mu[i,j] + delta[i,k,j] - sdelta[i,j] +
signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*B[i,k]*pow(prec[i,k,j]/ff[o[i,j]],
-0.5)
# mean of outcome j in arm k of study i is the average across all arms plus
the effect of treatment compared to average; final term induces required
within-study covariance between different outcomes in same arm
DD[i,k,j]<-step(totalo1+0.5-o[i,j])*D[i,k,j] + step(o[i,j]-totalo1-
0.5)*min(1,max(0,D[i,k,j]))
y[i,k,j]~dnorm(DD[i,k,j],yprec[i,k,j]) # distribution of outcome j
in arm k of study i
prec[i,k,j]<-pow(va[i,k,j]/n[i,k],-1) # overall precision
of observed outcome y
yprec[i,k,j] <- (prec[i,k,j]/(1-abs(rho_w[o[i,j]])))/ff[o[i,j]] #
remaining (unshared) precision of y after accounting for covariance
taud[i,k,j]<-tau/(1-abs(rho_b[o[i,j]])-0.5+0.5*abs(rho_b[o[i,j]]))
# remaining (unshared) precision of delta after accounting for covariances
adelta[i,k,j] ~ dnorm(H[i,k,j],taud[i,k,j]) # distribution of
trial-specific treatment effect on outcome j in arm k
delta[i,k,j]<-step(o[i,j]-totalo1-0.5)*(1-
ze[o[i,j],t[i,k]])*d[t[i,k],o[i,j]]+step(totalo1+0.5-o[i,j])*(1-
ze[o[i,j],t[i,k]])*adelta[i,k,j] # select appropriate
treatment effect parameter for this study arm and outcome
H[i,k,j] <- d[t[i,k],o[i,j]] +
signr_b[o[i,j]]*(sqrt(abs(rho_b[o[i,j]])*0.5)*E[i]+signr_b[o[i,j]]*sqrt(abs
(rho_b[o[i,j]])-abs(rho_b[o[i,j]])*0.5)*F[i,k] + signr_b[o[i,j]]*sqrt(0.5-
abs(rho_b[o[i,j]])*0.5)*G[i,j])* pow(tau,-0.5)
for (k in 1:na[i])
B[i,k]~dnorm(0,1) # normalised within-trial same-arm different-outcome
covariance of observed outcomes (y)
F[i,k]~dnorm(0,1)
### MAPPINGS
for (m in 1:ng) # cycle through outcome groups
sb[ogbase[m]]<-sign[ogbase[m]] # mean mapping is +/-1 for base
outcome in each group
for (j in ogbase[m]+1:ogbase[m+1]-1)
sb[j]<-(sign[j]/sign[ogbase[m]])*abs(b[j])
for (j in 1:totalo) b[j] ~ dnorm(0,.01)
lb[j]<-log(abs(sb[j]))
for (k in 2:nt)
W[k]~dnorm(0,1) # normalised covariation of mappings
for (m in 2:ng+1)
beta[k,ogbase[m-1]]<- sign[ogbase[m-1]] # treatment-specific mapping is
+/-1 for base outcome in each group
ad1[k,ogbase[m-1]]~dnorm(0,.001) I(0,)
ad2[k,ogbase[m-1]]~dbeta(0.5,0.5)
ad[k,ogbase[m-1]]<-sign[ogbase[m-1]]*(step(totalo1+0.5-ogbase[m-
1])*ad1[k,ogbase[m-1]]+step(ogbase[m-1]-totalo1-0.5)*ad2[k,ogbase[m-1]])
for (j in ogbase[m-1]+1:ogbase[m]-1)
ad[k,j]<- (beta[k,j]/beta[k,ogbase[m-1]])*abs(ad[k,ogbase[m-1]] )
# population-mean treatment effect on outcome j is mapped from mean effect
on base outcome for that group
beta[k,j]<-sb[j]
### RESIDUAL DEVIANCE
nmaresdev<-sum(resdev[]) # summed
overall residual deviance
for (i in 1:ns)
resdev[i]<-inprod(pres[i,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# residual deviance for study i
cp[i,1:totalo*maxarms,1:totalo*maxarms]<-
inverse(cv[i,1:totalo*maxarms,1:totalo*maxarms]) # within-study coprecision
matrix of outcomes in study i
for (x in 1:no[i]*na[i]) # indexing variable x loops through all
arm/outcome combinations in study i
arm[i,x]<-trunc(1+(x-1)/no[i]) # finds within-trial arm number
corresponding to each value of x
out[i,x]<-x-no[i]*trunc((x-1)/no[i]) # finds within-trial outcome
number corresponding to each value of x
for (z in 1:no[i]*na[i])
cv[i,x,z]<-
pow((prec[i,arm[i,x],out[i,x]]/ff[o[i,out[i,x]]])*(prec[i,arm[i,z],out[i,z]
]/ff[o[i,out[i,z]]]),-
0.5)*(signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[
i,x]]]*rho_w[o[i,out[i,z]]]))*equals(arm[i,x],arm[i,z])+(1-
signr_w[o[i,out[i,x]]]*signr_w[o[i,out[i,z]]]*sqrt(abs(rho_w[o[i,out[i,x]]]
*rho_w[o[i,out[i,z]]])))*equals(x,z))
# within-study covariance matrix element representing covariance between
arm/outcome combinations x and z; matrix is needed to calculate residual
deviance
for (j in no[i]*na[i]+1:totalo*maxarms)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
cv[i,x,j]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
cv[i,j,x]<-0 # fill in redundant off-diagonal elements of the
covariance matrix with zeroes
res[i,x]<-y[i,arm[i,x],out[i,x]] - Dmu[i,arm[i,x],out[i,x]]
# residual for arm/outcome combination x in study i
pres[i,x]<-inprod(cp[i,x,1:no[i]*na[i]],res[i,1:no[i]*na[i]])
# inner product of residuals and row of coprecision matrix (for residual
deviance calculation)
for (j in no[i]*na[i]+1:totalo*maxarms-1)
# covariance matrix needs extra columns and rows to standardise its
dimensions across studies
for (k in j+1:totalo*maxarms)
cv[i,j,k]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,k,j]<-0 # fill in redundant off-diagonal elements of
the covariance matrix with zeroes
cv[i,j,j]<-1 # fill in redundant diagonal elements of the
covariance matrix with 1s
cv[i,totalo*maxarms,totalo*maxarms]<-1 # fill in final redundant diagonal
element of the covariance matrix with a 1
### POPULATION CALIBRATION MODEL
for (i in 1:ns)
Q[i]~dnorm(0,1)
for (k in 1:na[i]) S[i,k]~dnorm(0,1)
for (j in 1:no1[i]) alpha[i,j]<-aalpha[i,j]
for (j in no1[i]+1:no[i]) alpha[i,j]<-min(1,max(0,aalpha[i,j]))
for (j in 1:no[i])
aalpha[i,j]~dnorm(amu[i,j],aprec[i,j])
amu[i,j]<-a[o[i,j]]+signr_b[o[i,j]]*zi*sqrt(abs(rho_b[o[i,j]]))*Q[i]
aprec[i,j]<-pow(zi,-2)/(1-abs(rho_b[o[i,j]]))
for (k in 1:na[i])
pm_y[i,k,j]<-y[i,k,j]
pm_va[i,k,j]<-va[i,k,j]
pm_va_prec[i,k,j]<-pow(pm_va[i,k,j]*sqrt(2/n[i,k]),-1)
pm_va[i,k,j]~dnorm(pm_va_mu[o[i,j]],pm_va_prec[i,k,j])
pm_prec[i,k,j]<-pow(pm_va[i,k,j]/n[i,k],-1)/((1-
abs(rho_w[o[i,j]]))*ff[o[i,j]])
pm_mu[i,k,j]<-step(o[i,j]-totalo1-0.5)*a[o[i,j]]+step(totalo1+0.5-
o[i,j])*alpha[i,j]+signr_w[o[i,j]]*sqrt(abs(rho_w[o[i,j]]))*sqrt(pm_va[i,k,
j]/n[i,k])*S[i,k] + cut(delta[i,k,j])
pm_y[i,k,j]~dnorm(pm_mu[i,k,j],pm_prec[i,k,j])
zi~dunif(0,10)
for (j in 1:totalo1)
a[j]~dnorm(0,.001)
for (j in totalo1+1:totalo)
a[j]~dgamma(0.5,0.5)
for (j in 1:totalo) for (k in 1:nt)
absd[k,j]<-step(totalo1+0.5-j)*(a[j]+d[k,j])+step(j-totalo1-
0.5)*max(0,min(1,a[j]+d[k,j]))
### PREDICTIVE DISTRIBUTIONS
E[ns+1]~dnorm(0,1)
Q[ns+1]~dnorm(0,1)
for (k in 1:nt)
F[ns+1,k]~dnorm(0,1)
S[ns+1,k]~dnorm(0,1)
for (j in 1:totalo1)
alpha[ns+1,j]<-aalpha[ns+1,j]
pm_va_mu[j]~dunif(0,100)
for (j in totalo1+1:totalo)
alpha[ns+1,j]<-max(0,aalpha[ns+1,j])
pm_va_mu[j]~dunif(0,0.0001)
for (j in 1:totalo)
G[ns+1,j]~dnorm(0,1)
amu[ns+1,j]<-a[j]+signr_b[j]*zi*sqrt(abs(rho_b[j]))*Q[ns+1]
aprec[ns+1,j]<-pow(zi,-2)/(1-abs(rho_b[j]))
aalpha[ns+1,j]~dnorm(amu[ns+1,j],aprec[ns+1,j])
for (k in 1:nt)
taud[ns+1,k,j]<-tau/(1-abs(rho_b[j])-0.5+0.5*abs(rho_b[j]))
adelta[ns+1,k,j] ~ dnorm(H[ns+1,k,j],taud[ns+1,k,j])
delta[ns+1,k,j]<-step(j-totalo1-0.5)*(1-ze[j,k])*d[k,j]+step(totalo1+0.5-
j)*(1-ze[j,k])*adelta[ns+1,k,j] # select appropriate
treatment effect parameter for this study arm and outcome
H[ns+1,k,j] <- d[k,j] +
signr_b[j]*(sqrt(abs(rho_b[j])*0.5)*E[ns+1]+signr_b[j]*sqrt(abs(rho_b[j])-
abs(rho_b[j])*0.5)*F[ns+1,k] + signr_b[j]*sqrt(0.5-
abs(rho_b[j])*0.5)*G[ns+1,j])* pow(tau,-0.5)
pm_prec[ns+1,k,j]<-pow(pm_va_mu[j]*ff[j],-1)/(1-abs(rho_w[j]))
pm_amu[ns+1,k,j]<-step(j-totalo1-
0.5)*min(1,max(0,a[j]+cut(delta[ns+1,k,j])))+step(totalo1+0.5-
j)*(alpha[ns+1,j]+cut(delta[ns+1,k,j]))
pm_mu[ns+1,k,j]<- pm_amu[ns+1,k,j]
+signr_w[j]*sqrt(abs(rho_w[j]))*sqrt(pm_va_mu[j])*S[ns+1,k]
apred_y[k,j]~dnorm(pm_mu[ns+1,k,j],pm_prec[ns+1,k,j])
pred_y[k,j]<- step(totalo1+0.5-j)*apred_y[k,j] + step(j-totalo1-
0.5)*max(0,apred_y[k,j])
# Assign admin levels
for (k in 1:nt) d[k,11]<-equals(admin[k],1) # Daily oral indicator
d[k,12]<-equals(admin[k],3) # every 2 days-weekly injection indicator
### TRANSFORMATIONS, RANKINGS, WEIGHTS, MCDA
for (m in 1:ng)
wgt[m]<-wg[preflink[m]]
weight[m]<-wgt[m]/sum(wgt[1:ng])
for (j in 1:14) weightb[j]<-eg[j]/sum(wgt[1:ng])
for (k in 1:nt)
wbr[k]<-sum(pbr[k,1:ng])
wbr_pred_study[k]<-sum(pbr_pred_study[k,1:ng])
wbr_pred_y[k]<-sum(pbr_pred_y[k,1:ng])
trad[k,1]<-exp(absd[k,1])
trad_pred_study[k,1]<-min(3,exp(pm_amu[ns+1,k,1]))
trad_pred_y[k,1]<-min(3,exp(pred_y[k,1]))
for (j in 2:7) trad[k,j]<-exp(absd[k,j])/(1+exp(absd[k,j]))
trad_pred_study[k,j]<-exp(pm_amu[ns+1,k,j])/(1+exp(pm_amu[ns+1,k,j]))
trad_pred_y[k,j]<-exp(pred_y[k,j])/(1+exp(pred_y[k,j]))
for (j in 8:10) trad[k,j] <- absd[k,j]
trad_pred_study[k,j]<-pm_amu[ns+1,k,j]
trad_pred_y[k,j]<-pred_y[k,j]
#admin route categories are deterministic
trad[k,11]<-d[k,11]
trad[k,12]<-d[k,12]
trad_pred_study[k,11]<-d[k,11]
trad_pred_study[k,12]<-d[k,12]
trad_pred_y[k,11]<-d[k,11]
trad_pred_y[k,12]<-d[k,12]
for (m in 1:ng)
pbr[k,m]<-impact[wo[m]]*trad[k,wo[m]]*weightb[preflink[m]]*wib[preflink[m]]
pbr_pred_study[k,m]<-impact[wo[m]]*trad_pred_study[k,wo[m]]*weightb_pred_study[m]*wib[preflink[m]]
pbr_pred_y[k,m]<-impact[wo[m]]*trad_pred_y[k,wo[m]]*weightb_pred_y[m]*wib[preflink[m]]
rank[k,m]<-equals(impact[ogbase[m]],-1)*rank(wbr[],k)+equals(impact[ogbase[m]],1)*(nt+1-rank(wbr[],k)) # treatment rankings by outcome
for (q in 1:nt)
rankprop[k,m,q]<-equals(rank[k,m],q)
cumrankprop[k,m,q]<-step(q-rank[k,m]) # indicator for time spent at or below each rank
sucra[k,m]<-sum(cumrankprop[k,m,1:nt-1])/(nt-1) # SUCRA
for (k in 1:nt) totrank[k]<-rank(wbr[],k)
totrank_pred_study[k]<-rank(wbr_pred_study[],k)
totrank_pred_y[k]<-rank(wbr_pred_y[],k)
### PREFERENCE MODEL
for(i in 1:nchs) # loop through choice sets
# difference in utility between choices with logistically distributed random component
logit(pr[i]) <- V[i,2] - V[i,1]
# pr is the probability of choosing the right-hand option, corresponding to y=2
ch_n[i]~dbin(pr[i],ch_N[i])
ch_nhat[i]<-pr[i]*ch_N[i]
chdev[i]<- 2 * (ch_n[i] * (log(ch_n[i])-log(ch_nhat[i])) + (ch_N[i]-ch_n[i])*(log(ch_N[i]-ch_n[i]) - log(ch_N[i]-ch_nhat[i])))
for(k in 1:2)
for (j in 1:nchc) pg[i,j,k] <- chsign[j]*chgamma[j]*ch_cr[i,j,k]
#Utility model
V[i,k] <- sum(pg[i,1:nchc,k])
for (j in 1:nchc) chgamma[j]~dnorm(eg[chc[j]],prefretau[j]) I(0,)
for(i in 1:nr) # loop through ratings
#Utility model
for (j in 1:nrc) pmu[j,i] <- log(ragamma[rastudyid[i],j])*(cr[i,j]-0.5*equals(j,1)*cr[i,j]) # includes adjustment for relapse time horizon
ramu[i] <- sum(pmu[1:nrc,i])
logra[i]<-log(ra[i])
logra[i]~dnorm(ramu[i],ratau)
radev[i] <- (logra[i]-ramu[i]) * (logra[i]-ramu[i]) * ratau # residual deviance contribution
for (j in 1:nrc)
for (k in 1:nrastud)
ragamma[k,j]~dnorm(prefmu[k,j],raprefretau[k,j]) I(0,)
prefmu[k,j] <- eg[rac[j]] + (k-1)*ahpalpha # includes alpha for study 2 but not study 1
raprefretau[k,j]<-pow(prefresig*prefmu[k,j],-2)
ahpalpha~dgamma(1,0.1)
ratau<-pow(rasig,-2)
rasig~dunif(0,10)
prefresig~dunif(0,10)
for (i in 1:nps) # studies reporting preference coefficients
for (j in 1:nop[i])
AA[i,j]~dnorm(0,1)
for (k in 1:np[i,j])
up[i,j,k]~dnorm(atheta[i,j,k],prep[i,j,k])
atheta[i,j,k]<-theta[i,j,k] + sqrt(0.5)*upse[i,j,k]*AA[i,j]
prep[i,j,k] <- pow(upse[i,j,k],-2)*2
# precision (with correction for dummy coding covariance)
pmava[i,j,k]<-pma_n[i]*pow(upse[i,j,k],2)
pmava_prec[i,j,k]<-pow(pmava[i,j,k]*sqrt(2/pma_n[i]),-1)
pmava[i,j,k]~dnorm(pmavam[i,j,k],pmava_prec[i,j,k]) I(0,)
pmavam[i,j,k]<-pmava_mu*cut(theta[i,j,k])*cut(theta[i,j,k])
pmares[i,j,k] <- up[i,j,k]-theta[i,j,k]
theta[i,j,k]<-levsign[i,j,k]*pmagamma[i,j,lev[i,j,k]]*ux[i,j,k]*zeta[i]
pma_cv[i,j,k,k]<- pow(upse[i,j,k],2)
pmapres[i,j,k]<-inprod(pma_cp[i,j,k,1:np[i,j]], pmares[i,j,1:np[i,j]])
for (m in 1:k-1) pma_cv[i,j,k,m]<- 0.5*upse[i,j,k]*upse[i,j,m]
pma_cv[i,j,m,k] <- pma_cv[i,j,k,m]
for (k in np[i,j]+1:maxp) pma_cv[i,j,k,k]<-1
for (m in 1:k-1) pma_cv[i,j,k,m]<-0
pma_cv[i,j,m,k]<-0
BB[i,j]~dnorm(0,1)
for (k in minlev[i,j]:maxlev[i,j])
pmatau[i,j,k]<-pow(prefresig*(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1]),-2)*2
pmagamma[i,j,k]~dnorm(pmamu[i,j,k],pmatau[i,j,k]) I(0,)
pmamu[i,j,k]<-abs(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1]) + sqrt(0.5)*BB[i,j]*prefresig*(eg[pmac[op[i,j]+k-1]]-offset[base[i,j]+1])
pma_cp[i,j,1:maxp,1:maxp] <- inverse(pma_cv[i,j,1:maxp,1:maxp] )
pmadev[i,j]<- inprod(pmapres[i,j,1:np[i,j]], pmares[i,j,1:np[i,j]] )
pmardev[i]<-sum(pmadev[i,1:nop[i]])
zeta[2]<-1
zeta[3]<-1
zeta[4]<-1
zeta[6]<-1
zeta[7]<-1
zeta[8]<-1
zeta[1]~dnorm(0,.01) I(0,)
zeta[5]~dnorm(0,.01) I(0,)
#base offsets
offset[1]<-0
offset[2]<-eg[pmac[3]]
#priors
for (j in 1:nc) eg[j] ~ dgamma(1,0.01)
g[j]<-log(eg[j])
eg[12]<-1
eg[13]<-1
eg[14]<-1
#weights
for(i in 1:14)
wg[i]<-wi[i]*eg[i]
pmaprec_mu[i]<-pow(pmava_mu*eg[i]*eg[i],-1)
prefretau[i]<-pow(prefresig*eg[i],-2)
pmava_mu~dunif(0,100)
# predictive preferences
for (i in 1:ng)
pred_egamma[i]~dnorm(eg[preflink[i]],prefretau[preflink[i]]) I(0,)
pred_wegamma[i]<-wi[preflink[i]]*pred_egamma[i]
weight_pred_study[i]<-pred_wegamma[i]/sum(pred_wegamma[1:ng])
weightb_pred_study[i]<-pred_egamma[i]/sum(pred_wegamma[1:ng])
pred_pref[i]~dnorm(pred_egamma[i],pmaprec_mu[preflink[i]]) I(0,)
pred_wpref[i]<-wi[preflink[i]]*pred_pref[i]
weight_pred_y[i]<-pred_wpref[i]/sum(pred_wpref[1:ng])
weightb_pred_y[i]<-pred_pref[i]/sum(pred_wpref[1:ng])
pratau<-2*ratau
chresdev<-sum(chdev[1:nchs])
raresdev<-sum(radev[1:nr])
pmaresdev<-sum(pmardev[1:nps])
totresdev<-raresdev+chresdev+pmaresdev+nmaresdev
# END
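The TRANSFORMATIONS/MCDA block above maps each treatment effect back to its natural scale (exp for the log relapse rate ratio, inverse-logit for log odds ratios, identity otherwise), weights it, and sums across criteria. A minimal Python sketch of that scoring step follows; all names and numbers are illustrative, not taken from the model output.

```python
import math

# Hypothetical values for one treatment on three outcome types
# (made up for illustration only).
effects = {"log_ARR": -0.44, "logit_avoid_relapse": -0.61, "utility_score": 0.10}
impact  = {"log_ARR": -1, "logit_avoid_relapse": -1, "utility_score": 1}
raw_wgt = {"log_ARR": 2.0, "logit_avoid_relapse": 1.0, "utility_score": 0.5}

def transform(name, x):
    # Mirrors the trad[] step: exp for the log rate ratio,
    # inverse-logit for log odds, identity for everything else.
    if name == "log_ARR":
        return math.exp(x)
    if name.startswith("logit"):
        return math.exp(x) / (1 + math.exp(x))
    return x

total = sum(raw_wgt.values())
weights = {k: v / total for k, v in raw_wgt.items()}

# Weighted benefit-risk score: the impact indicator (-1 for harms,
# +1 for benefits) orients every criterion so that higher is better.
wbr = sum(impact[k] * transform(k, effects[k]) * weights[k] for k in effects)
```

In the full model this calculation runs inside the MCMC loop, so each treatment's score wbr[k] acquires a posterior distribution rather than a single value.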
Full MCDA model data
The data from the clinical evidence synthesis model (4b) and the combined preference model are used again here. However, the list-formatted clinical evidence synthesis data must be replaced with the following:
Parameter values (list format) for full MCDA model
list(rho_b=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0,0,0),
     rho_w=c(0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6),
     ns=16, totalo=10, totalo1=7, maxarms=4, nt=9,
     impact=c(-1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1),
     sign=c(-1,1,-1,-1,1,1,1,1,1,1,1,1),
     ogbase=c(1,3,5,8,9,10,11,12,13),
     wo=c(1,3,5,8,9,10,11,12,13),
     ng=8,
     preflink=c(1,2,4,11,12,13,5,7),
     admin=c(1,1,1,0,3,3,3,1,1),
     wib=c(1,1,0,1,1,0,1,0,0,0,0,0,0,0))
Initial values for full MCDA model
list(eg=c(1,1,1,1,1,0.5,0.5,1,1,1,1,NA,NA,NA))
Appendix C. Additional results and sensitivity analyses
1 Clinical evidence synthesis models
Estimated mappings
The tables below summarise the posterior distributions of the mappings between outcomes in the
final treatment effects model for both fixed and random mappings (in the latter case, the average
across treatments is reported) according to the following grouping strategies:
- One group: all efficacy and liver safety outcomes in one mapping group
- Two groups: all efficacy outcomes in one group, all liver safety outcomes in another group
- Three groups: both relapse outcomes in one group, both disability progression outcomes in
a second group and all liver safety outcomes in a third group.
Note that serious gastrointestinal disorders, serious bradycardia and macular edema are not subject
to mappings and therefore do not feature in these results.
FIXED MAPPINGS MODEL
1 group 2 groups 3 groups
mean sd mean sd mean sd
Mapping relative to group reference outcome (reference outcomes have constant mapping of 1)
log ARR 1 1 1 1 1 1
logit avoid relapse 1.510 0.608 1.327 0.408 1.188 0.344
logit 3M DP -0.838 0.269 -0.762 0.189 1 1
logit 6M DP -0.972 0.324 -0.894 0.243 -1.047 0.315
logit ALT>ULN 2.132 0.817 1 1 1 1
logit ALT>3xULN 1.670 0.725 0.817 0.177 0.793 0.183
logit ALT>5xULN 0.509 0.455 0.267 0.186 0.302 0.196
RANDOM MAPPINGS MODEL
1 group 2 groups 3 groups
mean sd mean sd mean sd
Mapping relative to group reference outcome (reference outcomes have constant mapping of 1)
log ARR 1 1 1 1 1 1
logit avoid relapse 2.270 0.933 1.396 0.482 1.235 0.363
logit 3M DP -1.046 0.415 -0.843 0.236 1 1
logit 6M DP -1.316 0.558 -0.981 0.322 -1.061 0.327
logit ALT>ULN 2.933 1.130 1 1 1 1
logit ALT>3xULN 2.216 1.007 0.818 0.215 0.740 0.210
logit ALT>5xULN 0.665 0.596 0.259 0.206 0.258 0.205
Treatment rankings by outcome
The figures below show the proportion of MCMC simulations each treatment spent at each rank, for
each individual clinical outcome in turn. The rankings are based upon the population-average
treatment effects in the final model (random effects on efficacy and liver safety; fixed effects on
serious gastrointestinal disorders, serious bradycardia and macular edema; three mapping groups;
all between-outcome correlations=0.6) and both fixed-mapping and random-mapping versions are
shown.
The rankings for serious gastrointestinal disorders, serious bradycardia and macular edema are not shown, as these outcomes do not contribute to the benefit-risk assessment in the RRMS case study.
DF = dimethyl fumarate, FM = fingolimod, GA = glatiramer acetate, IA(IM) = intramuscular interferon beta-1a,
IA (SC) = subcutaneous interferon beta-1a, IB = interferon beta-1b, LM = laquinimod, TF = teriflunomide. ARR =
annualised relapse rate, RFP = relapse-free proportion, DP3 = proportion experiencing disability progression
confirmed 3 months later, DP6 = proportion experiencing disability progression confirmed 6 months later, ALT
= proportion with alanine aminotransferase above upper limit of normal range, ALT3 = proportion with
alanine aminotransferase above 3x upper limit of normal range, ALT5 = proportion with alanine
aminotransferase above 5x upper limit of normal range.
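The rank proportions plotted below can be reproduced from MCMC output along the following lines; this is a Python sketch with made-up draws, whereas in the model itself the ranks come from the rank(wbr[],k) step of the BUGS code. It also computes the SUCRA statistic used later in this appendix.

```python
# Illustrative Monte Carlo ranking: given posterior draws of a score for each
# treatment (oriented so that higher = better), compute the proportion of
# iterations each treatment spends at each rank, and SUCRA.
draws = [  # one row per MCMC iteration, one column per treatment (made up)
    [0.9, 0.5, 0.1],
    [0.8, 0.6, 0.2],
    [0.4, 0.7, 0.3],
    [0.9, 0.2, 0.5],
]
nt = len(draws[0])
rank_prop = [[0.0] * nt for _ in range(nt)]  # rank_prop[t][r] = Pr(rank r+1)

for row in draws:
    order = sorted(range(nt), key=lambda t: row[t], reverse=True)
    for r, t in enumerate(order):
        rank_prop[t][r] += 1 / len(draws)

# SUCRA: average proportion of time spent at or below each of the first
# nt-1 ranks; 1 for a treatment always ranked first, 0 if always last.
sucra = [sum(sum(rank_prop[t][: r + 1]) for r in range(nt - 1)) / (nt - 1)
         for t in range(nt)]
```

With these illustrative draws the first treatment ranks first in three of four iterations and second otherwise, giving a SUCRA of 0.875.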
Fixed mappings
[Figures: proportion of simulations spent at each rank, by outcome, fixed-mappings model]
Random mappings
[Figures: proportion of simulations spent at each rank, by outcome, random-mappings model]
Sensitivity to assumed correlations
The two tables below show the posterior mean and standard deviation of the key parameters in the
treatment effects module (Model 3, random effects, one mapping group) with the correlations
between all pairs of outcomes (at both the within- and between-study levels) set to 0, 0.3, 0.6 (as
per the main results in II.6.1.4) and 0.9. The tables use fixed and random mappings respectively.
The treatment-outcome combinations with no data (instead estimated via the mappings) are shown
in grey.
RANDOM EFFECTS FIXED MAPPINGS 1 GROUP MODEL 3
All correlations = 0 All correlations = 0.3 All correlations = 0.6 All correlations = 0.9
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.429 0.118 -0.414 0.106 -0.437 0.120 -0.448 0.156
FM -0.680 0.145 -0.612 0.136 -0.626 0.159 -0.637 0.221
GA -0.267 0.085 -0.261 0.078 -0.289 0.088 -0.338 0.124
IA (IM) -0.276 0.087 -0.257 0.078 -0.265 0.087 -0.268 0.110
IA (SC) -0.340 0.117 -0.343 0.107 -0.399 0.125 -0.484 0.185
IB -0.540 0.128 -0.507 0.118 -0.553 0.141 -0.630 0.215
LM -0.354 0.090 -0.330 0.083 -0.351 0.098 -0.367 0.132
TF -0.366 0.124 -0.343 0.110 -0.354 0.124 -0.350 0.153
Log odds ratio of avoiding relapse (vs placebo)
DF -0.557 0.124 -0.608 0.127 -0.607 0.143 -0.597 0.191
FM -0.888 0.147 -0.898 0.154 -0.870 0.177 -0.839 0.237
GA -0.348 0.102 -0.384 0.104 -0.403 0.111 -0.448 0.146
IA (IM) -0.361 0.105 -0.378 0.106 -0.369 0.111 -0.356 0.133
IA (SC) -0.453 0.166 -0.510 0.160 -0.558 0.165 -0.638 0.208
IB -0.708 0.153 -0.748 0.157 -0.772 0.176 -0.833 0.241
LM -0.462 0.098 -0.485 0.103 -0.488 0.116 -0.487 0.155
TF -0.476 0.142 -0.504 0.145 -0.492 0.155 -0.466 0.192
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.272 0.102 -0.329 0.111 -0.359 0.130 -0.388 0.168
FM -0.429 0.131 -0.483 0.145 -0.514 0.174 -0.550 0.234
GA -0.170 0.073 -0.208 0.079 -0.239 0.095 -0.293 0.134
IA (IM) -0.176 0.074 -0.204 0.079 -0.218 0.090 -0.232 0.114
IA (SC) -0.222 0.109 -0.276 0.116 -0.332 0.140 -0.423 0.202
IB -0.346 0.131 -0.404 0.138 -0.458 0.168 -0.549 0.242
LM -0.225 0.082 -0.263 0.091 -0.289 0.108 -0.319 0.142
TF -0.233 0.102 -0.272 0.106 -0.292 0.124 -0.303 0.154
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.393 0.143 -0.389 0.136 -0.414 0.148 -0.435 0.181
FM -0.615 0.172 -0.570 0.172 -0.591 0.191 -0.615 0.245
GA -0.245 0.101 -0.246 0.096 -0.275 0.107 -0.328 0.143
IA (IM) -0.253 0.100 -0.241 0.093 -0.251 0.101 -0.260 0.122
IA (SC) -0.312 0.138 -0.324 0.135 -0.380 0.153 -0.471 0.212
IB -0.498 0.181 -0.481 0.176 -0.530 0.195 -0.617 0.265
LM -0.324 0.112 -0.310 0.108 -0.333 0.121 -0.356 0.151
TF -0.336 0.142 -0.324 0.136 -0.338 0.148 -0.342 0.174
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.797 0.155 0.856 0.165 0.857 0.180 0.861 0.220
FM 1.281 0.235 1.274 0.242 1.235 0.250 1.217 0.282
GA 0.497 0.135 0.540 0.135 0.568 0.139 0.646 0.170
IA (IM) 0.520 0.152 0.536 0.152 0.523 0.153 0.515 0.171
IA (SC) 0.655 0.253 0.727 0.245 0.797 0.248 0.933 0.286
IB 1.022 0.243 1.063 0.250 1.099 0.262 1.214 0.314
LM 0.664 0.141 0.685 0.144 0.690 0.153 0.704 0.183
TF 0.682 0.194 0.712 0.201 0.697 0.212 0.674 0.245
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.567 0.156 0.666 0.164 0.665 0.172 0.678 0.194
FM 0.921 0.262 0.993 0.244 0.960 0.245 0.961 0.260
GA 0.356 0.126 0.423 0.132 0.444 0.137 0.511 0.159
IA (IM) 0.373 0.138 0.418 0.140 0.407 0.139 0.407 0.148
IA (SC) 0.471 0.216 0.567 0.215 0.621 0.223 0.737 0.253
IB 0.735 0.246 0.830 0.244 0.857 0.254 0.959 0.287
LM 0.479 0.153 0.538 0.155 0.540 0.160 0.559 0.176
TF 0.485 0.167 0.552 0.172 0.539 0.178 0.528 0.199
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.183 0.144 0.232 0.166 0.198 0.152 0.177 0.141
FM 0.299 0.240 0.348 0.252 0.289 0.223 0.255 0.205
GA 0.116 0.097 0.147 0.110 0.132 0.105 0.134 0.108
IA (IM) 0.121 0.103 0.145 0.111 0.122 0.098 0.107 0.090
IA (SC) 0.156 0.144 0.198 0.159 0.186 0.154 0.194 0.161
IB 0.241 0.201 0.290 0.215 0.256 0.201 0.251 0.202
LM 0.155 0.127 0.187 0.137 0.160 0.124 0.146 0.118
TF 0.160 0.138 0.194 0.149 0.162 0.133 0.139 0.119
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.355 0.339 1.545 0.465 1.510 0.608 1.575 0.915
logit 3M DP -0.669 0.283 -0.817 0.277 -0.838 0.269 -0.877 0.295
logit 6M DP -0.934 0.309 -0.968 0.345 -0.972 0.324 -0.996 0.351
logit ALT>ULN 1.958 0.531 2.192 0.696 2.132 0.817 2.235 1.160
logit ALT>3xULN 1.415 0.525 1.715 0.624 1.670 0.725 1.774 0.966
logit ALT>5xULN 0.472 0.410 0.603 0.480 0.509 0.455 0.479 0.482
Between-study treatment effects sd 0.226 0.038 0.255 0.041 0.367 0.050 0.879 0.097
Residual deviance 161.2 17.5 161.2 17.5 163.0 17.9 160.7 18.2
RANDOM EFFECTS RANDOM MAPPINGS 1 GROUP MODEL 3
All correlations = 0 All correlations = 0.3 All correlations = 0.6 All correlations = 0.9
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.543 0.152 -0.487 0.133 -0.443 0.142 -0.373 0.151
FM -0.670 0.138 -0.715 0.142 -0.730 0.163 -0.707 0.224
GA -0.317 0.103 -0.298 0.095 -0.274 0.101 -0.241 0.111
IA (IM) -0.219 0.088 -0.230 0.083 -0.216 0.087 -0.204 0.097
IA (SC) -0.210 0.093 -0.251 0.095 -0.272 0.106 -0.310 0.146
IB -0.398 0.118 -0.421 0.123 -0.421 0.129 -0.439 0.157
LM -0.258 0.095 -0.271 0.089 -0.253 0.091 -0.241 0.099
TF -0.325 0.132 -0.325 0.124 -0.296 0.120 -0.265 0.130
Log odds ratio of avoiding relapse (vs placebo)
DF -0.648 0.145 -0.741 0.149 -0.832 0.158 -0.932 0.184
FM -0.905 0.151 -0.867 0.148 -0.856 0.158 -0.860 0.199
GA -0.488 0.137 -0.552 0.140 -0.647 0.134 -0.800 0.147
IA (IM) -0.352 0.123 -0.375 0.117 -0.388 0.118 -0.412 0.130
IA (SC) -0.565 0.184 -0.547 0.166 -0.573 0.161 -0.615 0.183
IB -0.678 0.170 -0.715 0.163 -0.761 0.170 -0.859 0.202
LM -0.374 0.113 -0.400 0.111 -0.401 0.115 -0.423 0.130
TF -0.442 0.160 -0.490 0.161 -0.512 0.168 -0.542 0.194
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.401 0.148 -0.349 0.132 -0.314 0.129 -0.274 0.129
FM -0.386 0.128 -0.408 0.133 -0.398 0.138 -0.397 0.169
GA -0.235 0.110 -0.213 0.097 -0.193 0.094 -0.184 0.098
IA (IM) -0.206 0.110 -0.201 0.102 -0.195 0.102 -0.191 0.108
IA (SC) -0.328 0.169 -0.378 0.194 -0.489 0.241 -0.583 0.294
IB -0.348 0.150 -0.351 0.145 -0.350 0.151 -0.377 0.176
LM -0.289 0.124 -0.285 0.119 -0.283 0.124 -0.269 0.128
TF -0.287 0.141 -0.275 0.129 -0.259 0.127 -0.239 0.129
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.454 0.201 -0.396 0.167 -0.356 0.155 -0.318 0.141
FM -0.483 0.147 -0.511 0.157 -0.508 0.157 -0.511 0.184
GA -0.286 0.142 -0.259 0.122 -0.241 0.116 -0.222 0.110
IA (IM) -0.253 0.127 -0.244 0.117 -0.245 0.118 -0.243 0.121
IA (SC) -0.302 0.154 -0.312 0.148 -0.319 0.147 -0.344 0.159
IB -0.763 0.384 -0.710 0.359 -0.836 0.391 -0.984 0.402
LM -0.355 0.144 -0.352 0.141 -0.352 0.141 -0.346 0.142
TF -0.377 0.285 -0.359 0.241 -0.357 0.252 -0.344 0.257
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.585 0.171 0.671 0.178 0.669 0.181 0.698 0.201
FM 1.292 0.288 1.240 0.271 1.233 0.257 1.210 0.256
GA 0.310 0.126 0.333 0.130 0.310 0.127 0.331 0.140
IA (IM) 0.533 0.177 0.557 0.167 0.583 0.169 0.591 0.188
IA (SC) 1.130 0.451 1.030 0.398 1.133 0.366 1.269 0.363
IB 1.329 0.330 1.296 0.313 1.304 0.297 1.336 0.317
LM 0.678 0.145 0.700 0.141 0.721 0.144 0.739 0.164
TF 0.758 0.211 0.812 0.210 0.875 0.212 0.900 0.239
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.462 0.175 0.531 0.178 0.539 0.179 0.546 0.179
FM 1.135 0.282 1.057 0.262 1.079 0.248 1.063 0.230
GA 0.374 0.175 0.398 0.171 0.402 0.171 0.412 0.172
IA (IM) 0.333 0.171 0.359 0.169 0.366 0.169 0.368 0.162
IA (SC) 0.571 0.397 0.612 0.383 0.732 0.484 0.874 0.612
IB 0.858 0.577 0.882 0.530 0.998 0.641 1.172 0.817
LM 0.614 0.245 0.663 0.241 0.753 0.241 0.827 0.219
TF 0.381 0.171 0.416 0.170 0.403 0.168 0.398 0.170
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.189 0.162 0.198 0.156 0.175 0.139 0.129 0.125
FM 0.345 0.284 0.333 0.258 0.303 0.237 0.227 0.213
GA 0.141 0.134 0.140 0.124 0.124 0.108 0.095 0.099
IA (IM) 0.128 0.124 0.132 0.121 0.120 0.108 0.091 0.097
IA (SC) 0.203 0.229 0.208 0.211 0.216 0.221 0.188 0.228
IB 0.306 0.336 0.299 0.296 0.295 0.303 0.252 0.303
LM 0.165 0.149 0.173 0.148 0.157 0.132 0.119 0.119
TF 0.189 0.217 0.187 0.187 0.170 0.175 0.128 0.160
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.972 0.753 1.961 0.699 2.270 0.933 2.804 1.377
logit 3M DP -1.078 0.458 -0.988 0.390 -1.046 0.415 -1.115 0.464
logit 6M DP -1.375 0.672 -1.225 0.519 -1.316 0.558 -1.435 0.598
logit ALT>ULN 2.748 1.023 2.621 0.891 2.933 1.130 3.442 1.613
logit ALT>3xULN 1.907 0.803 1.907 0.751 2.216 1.007 2.635 1.338
logit ALT>5xULN 0.681 0.592 0.650 0.539 0.665 0.596 0.572 0.614
Between-study treatment effects sd 0.160 0.044 0.186 0.043 0.250 0.052 0.613 0.097
Between-treatment mapping sd 0.571 0.186 0.499 0.182 0.566 0.161 0.617 0.162
Residual deviance 154.8 16.9 157.3 16.9 161.0 17.5 162.9 18.4
The table below shows the results when a vague prior is assigned to each outcome’s “propensity to correlate” (as defined in II.4.4.1.3). Initially a uniform prior on the interval (-1,1) was attempted; however, in some cases this failed to converge well, so a uniform prior on the interval (-0.9,0.9) was used instead. Two variations are presented: in one, the between- and within-study correlation propensities are assumed equal; in the other, they are allowed to differ.
RANDOM EFFECTS MODEL 3
Fixed mappings Random mappings
Vague prior on each outcome’s correlation propensity:
between-study = within-study | between-study ≠ within-study | between-study = within-study | between-study ≠ within-study
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.395 0.112 -0.416 0.116 -0.450 0.157 -0.468 0.151
FM -0.591 0.140 -0.592 0.141 -0.652 0.169 -0.670 0.156
GA -0.247 0.086 -0.267 0.087 -0.280 0.114 -0.295 0.111
IA (IM) -0.246 0.079 -0.247 0.081 -0.232 0.093 -0.230 0.088
IA (SC) -0.403 0.116 -0.401 0.122 -0.301 0.120 -0.294 0.122
IB -0.500 0.126 -0.502 0.126 -0.445 0.137 -0.433 0.131
LM -0.308 0.083 -0.306 0.086 -0.280 0.097 -0.289 0.096
TF -0.318 0.106 -0.319 0.109 -0.309 0.132 -0.325 0.127
Log odds ratio of avoiding relapse (vs placebo)
DF -0.595 0.118 -0.623 0.119 -0.652 0.141 -0.677 0.135
FM -0.893 0.125 -0.889 0.126 -0.895 0.136 -0.897 0.137
GA -0.373 0.107 -0.402 0.105 -0.465 0.140 -0.498 0.133
IA (IM) -0.372 0.095 -0.371 0.099 -0.359 0.109 -0.364 0.112
IA (SC) -0.614 0.154 -0.609 0.164 -0.596 0.174 -0.606 0.173
IB -0.758 0.145 -0.759 0.145 -0.709 0.158 -0.717 0.159
LM -0.466 0.088 -0.460 0.094 -0.403 0.103 -0.398 0.103
TF -0.480 0.130 -0.478 0.134 -0.460 0.148 -0.470 0.149
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.324 0.104 -0.332 0.100 -0.367 0.135 -0.374 0.131
FM -0.483 0.130 -0.469 0.117 -0.407 0.120 -0.399 0.119
GA -0.203 0.078 -0.214 0.075 -0.219 0.094 -0.227 0.096
IA (IM) -0.202 0.073 -0.197 0.070 -0.206 0.092 -0.203 0.094
IA (SC) -0.334 0.120 -0.323 0.114 -0.376 0.178 -0.366 0.167
IB -0.412 0.130 -0.403 0.120 -0.384 0.141 -0.371 0.143
LM -0.254 0.080 -0.245 0.076 -0.285 0.106 -0.282 0.108
TF -0.261 0.097 -0.254 0.093 -0.279 0.120 -0.281 0.123
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.371 0.135 -0.395 0.133 -0.403 0.174 -0.417 0.175
FM -0.551 0.165 -0.557 0.153 -0.500 0.150 -0.500 0.146
GA -0.233 0.099 -0.254 0.098 -0.257 0.125 -0.267 0.125
IA (IM) -0.231 0.089 -0.234 0.088 -0.243 0.112 -0.248 0.116
IA (SC) -0.380 0.144 -0.383 0.143 -0.338 0.152 -0.336 0.155
IB -0.474 0.174 -0.481 0.165 -0.652 0.333 -0.686 0.367
LM -0.290 0.101 -0.290 0.097 -0.339 0.133 -0.340 0.134
TF -0.300 0.124 -0.303 0.122 -0.340 0.212 -0.353 0.231
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.868 0.194 0.914 0.211 0.742 0.217 0.736 0.231
FM 1.307 0.249 1.308 0.270 1.271 0.292 1.271 0.284
GA 0.543 0.165 0.588 0.173 0.392 0.163 0.389 0.163
IA (IM) 0.544 0.157 0.547 0.170 0.565 0.183 0.555 0.182
IA (SC) 0.905 0.273 0.903 0.296 1.026 0.383 1.055 0.395
IB 1.114 0.273 1.122 0.294 1.280 0.358 1.282 0.343
LM 0.681 0.150 0.675 0.162 0.694 0.161 0.677 0.156
TF 0.702 0.205 0.702 0.218 0.808 0.245 0.786 0.237
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.686 0.181 0.715 0.187 0.573 0.200 0.545 0.191
FM 1.034 0.250 1.026 0.255 1.086 0.284 1.048 0.277
GA 0.431 0.152 0.463 0.157 0.423 0.187 0.410 0.182
IA (IM) 0.431 0.143 0.429 0.147 0.379 0.172 0.356 0.163
IA (SC) 0.718 0.249 0.710 0.261 0.685 0.395 0.641 0.371
IB 0.884 0.265 0.882 0.271 0.913 0.502 0.859 0.463
LM 0.542 0.156 0.533 0.160 0.640 0.233 0.608 0.230
TF 0.552 0.175 0.547 0.180 0.451 0.188 0.427 0.179
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.240 0.174 0.237 0.173 0.226 0.166 0.205 0.163
FM 0.363 0.257 0.340 0.244 0.370 0.267 0.331 0.256
GA 0.151 0.117 0.154 0.119 0.156 0.130 0.145 0.126
IA (IM) 0.151 0.115 0.142 0.110 0.147 0.123 0.130 0.115
IA (SC) 0.252 0.195 0.235 0.185 0.250 0.224 0.221 0.215
IB 0.309 0.229 0.293 0.220 0.330 0.283 0.294 0.275
LM 0.188 0.136 0.175 0.130 0.191 0.149 0.166 0.139
TF 0.194 0.147 0.182 0.141 0.199 0.173 0.183 0.176
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.589 0.425 1.582 0.426 1.820 0.584 1.845 0.613
logit 3M DP -0.856 0.294 -0.834 0.278 -0.996 0.361 -0.969 0.339
logit 6M DP -0.974 0.360 -0.990 0.356 -1.174 0.483 -1.207 0.560
logit ALT>ULN 2.348 0.779 2.350 0.798 2.639 0.915 2.555 0.805
logit ALT>3xULN 1.861 0.691 1.844 0.696 1.997 0.862 1.834 0.684
logit ALT>5xULN 0.660 0.528 0.614 0.495 0.730 0.562 0.636 0.521
Between-study treatment effects sd 0.299 0.059 0.331 0.072 0.246 0.064 0.261 0.076
Between-treatment mapping sd N/A N/A 0.370 0.258 0.401 0.182 0.429 0.182
Residual deviance 137.6 19.6 135.1 20.3 136.8 18.9 133.4 19.9
Between-study correlation propensity
log ARR 0.328 0.294 0.152 0.422 0.274 0.330 0.169 0.428
logit avoid relapse -0.344 0.211 -0.453 0.295 -0.216 0.345 -0.309 0.451
logit 3M DP -0.028 0.441 -0.427 0.410 -0.061 0.437 -0.326 0.461
logit 6M DP 0.179 0.443 -0.262 0.475 0.110 0.462 -0.177 0.497
logit ALT>ULN 0.404 0.246 0.440 0.275 0.401 0.287 0.332 0.388
logit ALT>3xULN 0.360 0.301 0.308 0.425 0.357 0.346 0.195 0.494
logit ALT>5xULN 0.465 0.368 0.211 0.522 0.454 0.369 0.164 0.525
Within-study correlation propensity
log ARR 0.328 0.294 0.287 0.423 0.274 0.330 0.237 0.417
logit avoid relapse -0.344 0.211 -0.244 0.356 -0.216 0.345 -0.123 0.433
logit 3M DP -0.028 0.441 0.243 0.456 -0.061 0.437 0.116 0.470
logit 6M DP 0.179 0.443 0.279 0.474 0.110 0.462 0.147 0.495
logit ALT>ULN 0.404 0.246 0.272 0.420 0.401 0.287 0.200 0.468
logit ALT>3xULN 0.360 0.301 0.187 0.446 0.357 0.346 0.202 0.455
logit ALT>5xULN 0.465 0.368 0.281 0.505 0.454 0.369 0.277 0.489
The graphs below show the SUCRA statistic by outcome for the models above, based on rankings of the (population average) treatment effects.
[Figure grid: SUCRA by outcome under each assumption — assumed correlation coefficients 0, 0.3, 0.6 and 0.9, plus vague priors on each outcome’s correlation propensity (between-study = within-study, and between-study ≠ within-study) — shown for both fixed and random mappings]
Sensitivity to random effects standard deviation prior
The two tables below show the posterior mean and standard deviation of the key parameters in the treatment effects module (Model 3, random effects) with an alternative uniform prior on the random effects standard deviation, σ ~ Uniform(0,2) (as recommended by the NICE Decision Support Unit 78), alongside σ ~ Uniform(0,10) (as per the main results in II.6.1.4). The tables use one mapping group and three mapping groups respectively. The treatment-outcome combinations with no data (instead estimated via the mappings) are shown in grey.
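The practical difference between the two priors can be quantified directly: a Uniform(0,10) prior places most of its mass on heterogeneity values that would be extreme for treatment effects on the log scale, whereas Uniform(0,2) is far less diffuse. A quick illustrative calculation (the threshold of 1 is an arbitrary choice for this sketch, not a value used in the thesis):

```python
def prior_mass_above(threshold, upper):
    """P(sigma > threshold) when sigma ~ Uniform(0, upper)."""
    return max(0.0, (upper - threshold) / upper)

# A between-study sd above ~1 on the log scale would be extreme
# for relative treatment effects.
p_wide   = prior_mass_above(1.0, 10.0)  # Uniform(0,10) prior
p_narrow = prior_mass_above(1.0, 2.0)   # Uniform(0,2) prior
```

Here p_wide is 0.9 and p_narrow is 0.5; as the tables below suggest, the data dominate either prior, so the posterior summaries under the two choices are very similar.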
RANDOM EFFECTS 1 GROUP MODEL 3
Fixed mappings Random mappings
σ ~ Uniform(0,2) σ ~ Uniform(0,10) σ ~ Uniform(0,2) σ ~ Uniform(0,10)
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.428 0.119 -0.437 0.120 -0.445 0.135 -0.443 0.142
FM -0.617 0.159 -0.626 0.159 -0.720 0.161 -0.730 0.163
GA -0.285 0.087 -0.289 0.088 -0.247 0.105 -0.274 0.101
IA (IM) -0.260 0.086 -0.265 0.087 -0.202 0.087 -0.216 0.087
IA (SC) -0.395 0.123 -0.399 0.125 -0.232 0.108 -0.272 0.106
IB -0.548 0.140 -0.553 0.141 -0.392 0.124 -0.421 0.129
LM -0.344 0.096 -0.351 0.098 -0.235 0.095 -0.253 0.091
TF -0.349 0.122 -0.354 0.124 -0.276 0.119 -0.296 0.120
Log odds ratio of avoiding relapse (vs placebo)
DF -0.608 0.139 -0.607 0.143 -0.838 0.152 -0.832 0.158
FM -0.876 0.172 -0.870 0.177 -0.856 0.158 -0.856 0.158
GA -0.405 0.108 -0.403 0.111 -0.664 0.140 -0.647 0.134
IA (IM) -0.371 0.109 -0.369 0.111 -0.389 0.121 -0.388 0.118
IA (SC) -0.564 0.164 -0.558 0.165 -0.580 0.166 -0.573 0.161
IB -0.781 0.172 -0.772 0.176 -0.764 0.172 -0.761 0.170
LM -0.489 0.112 -0.488 0.116 -0.395 0.117 -0.401 0.115
TF -0.496 0.154 -0.492 0.155 -0.518 0.170 -0.512 0.168
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.348 0.127 -0.359 0.130 -0.324 0.129 -0.314 0.129
FM -0.500 0.169 -0.514 0.174 -0.405 0.139 -0.398 0.138
GA -0.233 0.093 -0.239 0.095 -0.192 0.093 -0.193 0.094
IA (IM) -0.212 0.088 -0.218 0.090 -0.204 0.106 -0.195 0.102
IA (SC) -0.324 0.136 -0.332 0.140 -0.521 0.245 -0.489 0.241
IB -0.447 0.163 -0.458 0.168 -0.351 0.148 -0.350 0.151
LM -0.280 0.105 -0.289 0.108 -0.292 0.125 -0.283 0.124
TF -0.283 0.120 -0.292 0.124 -0.265 0.127 -0.259 0.127
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.405 0.148 -0.414 0.148 -0.360 0.154 -0.356 0.155
FM -0.580 0.190 -0.591 0.191 -0.509 0.157 -0.508 0.157
GA -0.271 0.108 -0.275 0.107 -0.237 0.119 -0.241 0.116
IA (IM) -0.247 0.100 -0.251 0.101 -0.251 0.122 -0.245 0.118
IA (SC) -0.375 0.153 -0.380 0.153 -0.312 0.145 -0.319 0.147
IB -0.524 0.197 -0.530 0.195 -0.880 0.402 -0.836 0.391
LM -0.326 0.120 -0.333 0.121 -0.362 0.141 -0.352 0.141
TF -0.333 0.148 -0.338 0.148 -0.379 0.299 -0.357 0.252
Log odds ratio of ALT above upper limit of normal range (vs placebo)
DF 0.869 0.178 0.857 0.180 0.642 0.174 0.669 0.181
FM 1.258 0.249 1.235 0.250 1.231 0.258 1.233 0.257
GA 0.578 0.140 0.568 0.139 0.287 0.120 0.310 0.127
IA (IM) 0.532 0.154 0.523 0.153 0.589 0.172 0.583 0.169
IA (SC) 0.815 0.251 0.797 0.248 1.164 0.372 1.133 0.366
IB 1.125 0.262 1.099 0.262 1.295 0.304 1.304 0.297
LM 0.701 0.152 0.690 0.153 0.725 0.143 0.721 0.144
TF 0.710 0.213 0.697 0.212 0.890 0.212 0.875 0.212
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
DF 0.675 0.171 0.665 0.172 0.526 0.178 0.539 0.179
FM 0.979 0.246 0.960 0.245 1.083 0.254 1.079 0.248
GA 0.452 0.136 0.444 0.137 0.391 0.170 0.402 0.171
IA (IM) 0.415 0.139 0.407 0.139 0.359 0.169 0.366 0.169
IA (SC) 0.635 0.223 0.621 0.223 0.736 0.525 0.732 0.484
IB 0.878 0.255 0.857 0.254 1.016 0.701 0.998 0.641
LM 0.548 0.158 0.540 0.160 0.759 0.240 0.753 0.241
TF 0.549 0.179 0.539 0.178 0.393 0.167 0.403 0.168
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
DF 0.204 0.153 0.198 0.152 0.153 0.143 0.175 0.139
FM 0.298 0.227 0.289 0.223 0.273 0.251 0.303 0.237
GA 0.136 0.106 0.132 0.105 0.110 0.117 0.124 0.108
IA (IM) 0.125 0.100 0.122 0.098 0.105 0.109 0.120 0.108
IA (SC) 0.192 0.157 0.186 0.154 0.193 0.237 0.216 0.221
IB 0.266 0.206 0.256 0.201 0.267 0.324 0.295 0.303
LM 0.165 0.126 0.160 0.124 0.141 0.142 0.157 0.132
TF 0.167 0.134 0.162 0.133 0.152 0.179 0.170 0.175
Average mapping (reference outcome has constant mapping of 1)
log ARR 1 1 1 1 1 1 1 1
logit avoid relapse 1.543 0.621 1.510 0.608 2.589 1.148 2.270 0.933
logit 3M DP -0.830 0.273 -0.838 0.269 -1.241 0.556 -1.046 0.415
logit 6M DP -0.973 0.338 -0.972 0.324 -1.541 0.710 -1.316 0.558
logit ALT>ULN 2.208 0.862 2.132 0.817 3.301 1.351 2.933 1.130
logit ALT>3xULN 1.730 0.766 1.670 0.725 2.492 1.183 2.216 1.007
logit ALT>5xULN 0.535 0.486 0.509 0.455 0.644 0.659 0.665 0.596
Between-study treatment effects sd 0.367 0.050 0.367 0.050 0.242 0.050 0.250 0.052
Between-treatment mapping sd N/A N/A N/A N/A 0.614 0.164 0.566 0.161
Residual deviance 162.8 17.9 163.0 17.9 161.4 17.4 161.0 17.5
RANDOM EFFECTS 3 GROUPS MODEL 3
Fixed mappings Random mappings
σ ~ Uniform(0,2) σ ~ Uniform(0,10) σ ~ Uniform(0,2) σ ~ Uniform(0,10)
mean sd mean sd mean sd mean sd
Log annual relapse rate ratio (vs placebo)
DF -0.666 0.132 -0.657 0.132 -0.670 0.137 -0.658 0.140
FM -0.774 0.147 -0.764 0.148 -0.759 0.152 -0.759 0.153
GA -0.474 0.101 -0.466 0.100 -0.474 0.107 -0.467 0.108
IA (IM) -0.275 0.093 -0.270 0.092 -0.271 0.095 -0.270 0.094
IA (SC) -0.339 0.116 -0.335 0.114 -0.330 0.118 -0.327 0.117
IB -0.549 0.120 -0.541 0.119 -0.505 0.126 -0.505 0.126
LM -0.248 0.097 -0.244 0.096 -0.238 0.095 -0.236 0.096
TF -0.387 0.144 -0.385 0.141 -0.395 0.148 -0.392 0.147
Log odds ratio of avoiding relapse (vs placebo)
DF -0.741 0.141 -0.748 0.143 -0.728 0.145 -0.739 0.149
FM -0.861 0.154 -0.870 0.154 -0.875 0.160 -0.875 0.159
GA -0.528 0.114 -0.533 0.115 -0.509 0.120 -0.515 0.124
IA (IM) -0.308 0.107 -0.309 0.106 -0.301 0.110 -0.304 0.110
IA (SC) -0.382 0.139 -0.386 0.140 -0.371 0.148 -0.377 0.151
IB -0.614 0.143 -0.619 0.142 -0.644 0.156 -0.644 0.158
LM -0.276 0.107 -0.279 0.107 -0.283 0.111 -0.286 0.111
TF -0.433 0.160 -0.439 0.160 -0.419 0.162 -0.421 0.163
Log odds ratio of disability progression confirmed 3 months later (vs placebo)
DF -0.509 0.170 -0.499 0.170 -0.510 0.173 -0.507 0.173
FM -0.403 0.165 -0.397 0.166 -0.380 0.168 -0.391 0.170
GA -0.482 0.153 -0.475 0.151 -0.464 0.156 -0.464 0.158
IA (IM) -0.264 0.157 -0.260 0.156 -0.297 0.170 -0.294 0.169
IA (SC) -0.593 0.224 -0.594 0.225 -0.661 0.270 -0.656 0.259
IB -0.718 0.249 -0.708 0.240 -0.639 0.229 -0.648 0.231
LM -0.425 0.151 -0.420 0.150 -0.402 0.155 -0.409 0.154
TF -0.343 0.252 -0.341 0.250 -0.350 0.242 -0.348 0.245
Log odds ratio of disability progression confirmed 6 months later (vs placebo)
DF -0.523 0.204 -0.514 0.206 -0.499 0.205 -0.494 0.198
FM -0.412 0.181 -0.406 0.181 -0.409 0.182 -0.412 0.183
GA -0.496 0.185 -0.491 0.187 -0.465 0.185 -0.458 0.178
IA (IM) -0.271 0.163 -0.267 0.162 -0.289 0.163 -0.281 0.158
IA (SC) -0.594 0.230 -0.595 0.230 -0.529 0.220 -0.530 0.215
IB -0.749 0.312 -0.741 0.313 -0.875 0.372 -0.833 0.358
LM -0.433 0.166 -0.428 0.166 -0.433 0.168 -0.431 0.163
TF -0.358 0.288 -0.359 0.295 -0.350 0.304 -0.341 0.302
Log odds ratio of ALT above upper limit of normal range (vs placebo)
         Mean    sd      Mean    sd      Mean    sd      Mean    sd
DF       0.292   0.210   0.299   0.210   0.289   0.213   0.301   0.209
FM       1.379   0.283   1.388   0.276   1.368   0.277   1.380   0.282
GA      -0.170   0.214  -0.170   0.214  -0.188   0.221  -0.183   0.215
IA (IM)  0.577   0.210   0.582   0.209   0.582   0.208   0.590   0.206
IA (SC)  1.099   0.433   1.067   0.427   1.077   0.425   1.075   0.427
IB       0.929   0.377   0.935   0.374   0.959   0.373   0.958   0.369
LM       0.765   0.164   0.769   0.162   0.756   0.159   0.761   0.157
TF       0.711   0.247   0.716   0.243   0.774   0.242   0.781   0.244
Appendix C
420
Log odds ratio of ALT above 3x upper limit of normal range (vs placebo)
         Mean    sd      Mean    sd      Mean    sd      Mean    sd
DF       0.232   0.172   0.237   0.173   0.209   0.167   0.215   0.166
FM       1.082   0.268   1.086   0.277   1.111   0.289   1.072   0.284
GA      -0.129   0.168  -0.127   0.167  -0.119   0.158  -0.113   0.152
IA (IM)  0.455   0.188   0.458   0.188   0.402   0.190   0.397   0.185
IA (SC)  0.878   0.411   0.852   0.415   0.800   0.520   0.774   0.496
IB       0.746   0.363   0.749   0.364   0.893   0.652   0.828   0.558
LM       0.609   0.189   0.610   0.191   0.681   0.245   0.657   0.241
TF       0.551   0.197   0.554   0.197   0.464   0.189   0.466   0.191
Log odds ratio of ALT above 5x upper limit of normal range (vs placebo)
         Mean    sd      Mean    sd      Mean    sd      Mean    sd
DF       0.084   0.089   0.088   0.092   0.080   0.087   0.072   0.087
FM       0.419   0.290   0.424   0.293   0.441   0.284   0.371   0.310
GA      -0.054   0.085  -0.052   0.084  -0.060   0.093  -0.047   0.080
IA (IM)  0.171   0.131   0.174   0.133   0.169   0.132   0.145   0.137
IA (SC)  0.325   0.254   0.321   0.259   0.322   0.296   0.264   0.275
IB       0.276   0.222   0.282   0.232   0.362   0.394   0.287   0.322
LM       0.228   0.156   0.231   0.159   0.237   0.158   0.201   0.169
TF       0.211   0.158   0.213   0.158   0.222   0.181   0.187   0.181
Average mapping (reference outcome has constant mapping of 1)
                      Mean    sd      Mean    sd      Mean    sd      Mean    sd
log ARR               1       1       1       1       1       1       1       1
logit avoid relapse   1.159   0.332   1.188   0.344   1.237   0.379   1.235   0.363
logit 3M DP           1       1       1       1       1       1       1       1
logit 6M DP          -1.041   0.290  -1.047   0.315  -1.097   0.336  -1.061   0.327
logit ALT>ULN         1       1       1       1       1       1       1       1
logit ALT>3xULN       0.796   0.177   0.793   0.183   0.778   0.236   0.740   0.210
logit ALT>5xULN       0.300   0.194   0.302   0.196   0.317   0.202   0.258   0.205
Between-study treatment effects sd   0.277   0.053   0.274   0.052   0.261   0.052   0.262   0.053
Between-treatment mapping sd         N/A     N/A     N/A     N/A     0.292   0.177   0.264   0.170
Residual deviance                    162.3   17.9    163.0   17.9    161.6   17.7    162.1   17.8
2 Preference models
Sensitivity to priors
The two tables below show the posterior mean and standard deviation of the key parameters in the
ratings model (both PROTECT ratings datasets) under two alternative uniform priors on the ratings
standard deviation, σ_rat ~ Uniform(0, 2) and σ_rat ~ Uniform(0, 50), alongside
σ_rat ~ Uniform(0, 10) (the prior used for the main results in III.3.4.3).
FIXED PREFERENCES                                         σ_rat ~ Uniform(0,2)  σ_rat ~ Uniform(0,10)  σ_rat ~ Uniform(0,50)
                                                  Unit    Mean     sd           Mean     sd            Mean     sd
Preference weights
  Relapse                                      1 event    11.6%    6.6%         11.5%    6.5%          11.6%    6.6%
  Disability progression                       1 event    15.1%    6.2%         15.0%    6.2%          15.1%    6.1%
  PML                                          1 event    32.7%    6.5%         32.8%    6.4%          32.8%    6.4%
  Herpes reactivation                          1 event     8.5%    4.9%          8.6%    4.9%           8.5%    4.9%
  Liver enzyme elevation                       1 event     8.8%    5.0%          8.8%    5.0%           8.8%    5.0%
  Seizures                                     1 event     5.1%    3.2%          5.1%    3.2%           5.1%    3.2%
  Congenital abnormalities                     1 event     5.1%    3.2%          5.1%    3.3%           5.1%    3.2%
  Infusion/injection reactions                 1 event     4.6%    2.5%          4.7%    2.5%           4.6%    2.5%
  Allergic/hypersensitivity reactions          1 event     3.9%    3.1%          3.9%    3.1%           3.9%    3.1%
  Flu-like reactions                           1 event     4.1%    3.3%          4.1%    3.3%           4.1%    3.3%
  Administration (daily oral vs daily subcutaneous)  N/A   0.5%    0.2%          0.5%    0.2%           0.5%    0.2%
  Administration (monthly infusion vs daily subcutaneous)  N/A  0.3%  0.1%       0.3%    0.1%           0.3%    0.1%
  Administration (weekly intramuscular vs daily subcutaneous)  N/A  0.2%  0.1%   0.2%    0.1%           0.2%    0.1%
Ratings standard deviation                         N/A     1.17    0.05          1.17    0.05           1.17    0.05
Residual deviance                                  N/A   241.9    22.0         242.0    22.1          242.1    22.0
RANDOM PREFERENCES BY PARTICIPANT                         σ_rat ~ Uniform(0,2)  σ_rat ~ Uniform(0,10)  σ_rat ~ Uniform(0,50)
                                                  Unit    Mean     sd           Mean     sd            Mean     sd
Preference weights
  Relapse                                      1 event    12.0%    6.4%         12.0%    6.7%          12.1%    6.3%
  Disability progression                       1 event    15.9%    6.2%         15.9%    6.4%          16.1%    6.4%
  PML                                          1 event    32.6%    7.1%         32.6%    7.0%          32.6%    7.0%
  Herpes reactivation                          1 event     8.7%    4.8%          8.4%    4.6%           8.6%    4.5%
  Liver enzyme elevation                       1 event     8.9%    4.8%          9.0%    5.1%           8.7%    4.8%
  Seizures                                     1 event     5.0%    3.1%          5.2%    3.1%           5.3%    3.4%
  Congenital abnormalities                     1 event     5.0%    3.1%          4.9%    2.8%           5.1%    3.1%
  Infusion/injection reactions                 1 event     4.3%    2.4%          4.4%    2.4%           4.2%    2.4%
  Allergic/hypersensitivity reactions          1 event     3.5%    2.8%          3.3%    2.5%           3.2%    2.3%
  Flu-like reactions                           1 event     3.6%    2.8%          3.8%    3.0%           3.6%    2.7%
  Administration (daily oral vs daily subcutaneous)  N/A   0.6%    0.2%          0.6%    0.2%           0.6%    0.2%
  Administration (monthly infusion vs daily subcutaneous)  N/A  0.4%  0.1%       0.4%    0.1%           0.4%    0.1%
  Administration (weekly intramuscular vs daily subcutaneous)  N/A  0.2%  0.1%   0.2%    0.1%           0.2%    0.1%
Ratings standard deviation                         N/A     1.02    0.06          1.01    0.06           1.02    0.06
Proportional between-participant preference
standard deviation                                 N/A     0.32    0.04          0.33    0.04           0.33    0.04
Residual deviance                                  N/A   242.0    21.9         241.9    21.9          242.0    22.1
The table below shows the posterior mean and standard deviation of the key parameters in the
ratings model (both PROTECT ratings datasets, random preferences by participant) under two
alternative uniform priors on the random-preferences standard deviation, σ_pref ~ Uniform(0, 2) and
σ_pref ~ Uniform(0, 50), alongside σ_pref ~ Uniform(0, 10) (the prior used for the main results in III.3.4.3).
RANDOM PREFERENCES BY PARTICIPANT                         σ_pref ~ Uniform(0,2)  σ_pref ~ Uniform(0,10)  σ_pref ~ Uniform(0,50)
                                                  Unit    Mean     sd            Mean     sd             Mean     sd
Preference weights
  Relapse                                      1 event    11.3%    6.3%          12.0%    6.7%           12.5%    6.4%
  Disability progression                       1 event    14.9%    5.8%          15.9%    6.4%           16.0%    6.0%
  PML                                          1 event    33.4%    6.9%          32.6%    7.0%           32.3%    7.0%
  Herpes reactivation                          1 event     8.8%    4.8%           8.4%    4.6%            8.6%    4.8%
  Liver enzyme elevation                       1 event     9.6%    5.2%           9.0%    5.1%            8.7%    4.6%
  Seizures                                     1 event     5.3%    3.1%           5.2%    3.1%            5.0%    2.9%
  Congenital abnormalities                     1 event     5.0%    3.0%           4.9%    2.8%            4.9%    2.9%
  Infusion/injection reactions                 1 event     4.2%    2.2%           4.4%    2.4%            4.3%    2.2%
  Allergic/hypersensitivity reactions          1 event     3.4%    2.5%           3.3%    2.5%            3.4%    2.5%
  Flu-like reactions                           1 event     3.5%    2.6%           3.8%    3.0%            3.7%    2.7%
  Administration (daily oral vs daily subcutaneous)  N/A   0.6%    0.3%           0.6%    0.2%            0.6%    0.2%
  Administration (monthly infusion vs daily subcutaneous)  N/A  0.4%  0.2%        0.4%    0.1%            0.4%    0.2%
  Administration (weekly intramuscular vs daily subcutaneous)  N/A  0.2%  0.1%    0.2%    0.1%            0.2%    0.1%
Ratings standard deviation                         N/A     1.02    0.06           1.01    0.06            1.02    0.06
Proportional between-participant preference
standard deviation                                 N/A     0.33    0.04           0.33    0.04            0.32    0.04
Residual deviance                                  N/A   242.0    22.1          241.9    21.9           242.0    22.0
The table below shows the posterior mean and standard deviation of the key parameters in the
preference meta-analysis model (random preferences by study) under two alternative uniform priors
on the random-preferences standard deviation, σ_pref ~ Uniform(0, 2) and σ_pref ~ Uniform(0, 50),
alongside σ_pref ~ Uniform(0, 10) (the prior used for the main results in III.5.6.1).
RANDOM PREFERENCES BY STUDY                               σ_pref ~ Uniform(0,2)  σ_pref ~ Uniform(0,10)  σ_pref ~ Uniform(0,50)
                                                  Unit    Mean     SE            Mean     SE             Mean     SE
Utility coefficients on choice scale (i.e. effect on log odds of choice)
  Relapse rate                          1 relapse/year   -1.486    0.511         -1.474    0.523         -1.362    0.453
  Disability progression                     100% risk   -3.199    1.681         -3.195    1.606         -3.121    1.737
  Daily oral vs daily subcutaneous                 N/A    2.719    0.752          2.718    0.775          1.593    0.584
  Monthly infusion vs daily subcutaneous           N/A    0.611    0.287          0.610    0.291          4.004    1.051
  Weekly intramuscular vs daily subcutaneous       N/A    0.529    0.433          0.517    0.428          0.362    0.276
Normalised preference weights
  Relapse rate                          1 relapse/year   17.8%     5.3%          20.3%     6.0%          13.3%     4.1%
  Disability progression                     100% risk   36.7%     9.5%          42.2%     9.7%          29.4%     8.9%
  Daily oral vs daily subcutaneous                 N/A   32.4%     6.9%          37.5%     8.5%          15.2%     3.6%
  Monthly infusion vs daily subcutaneous           N/A    7.2%     2.6%           8.4%     3.5%          38.7%     6.5%
  Weekly intramuscular vs daily subcutaneous       N/A    6.0%     4.1%           7.1%     5.6%           3.4%     2.0%
Between-study proportional preference
standard deviation                                 N/A    0.64     0.16           0.65     0.17           0.66     0.16
Residual deviance                                  N/A   45.6      6.5           45.6      6.5           45.7      6.6
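On the choice scale, the normalised weights above are in essence each coefficient's absolute swing over the unit shown, expressed as a share of the total swing. A sketch of that normalisation (the coefficients here are illustrative round numbers, not the posterior draws, so the resulting weights only roughly resemble the table):

```python
# Convert utility coefficients (per the units shown in the table) into
# normalised preference weights. Values are illustrative, not posterior draws.
coefs = {
    "relapse rate (1 relapse/year)": -1.47,
    "disability progression (100% risk)": -3.20,
    "daily oral vs daily subcutaneous": 2.72,
    "monthly infusion vs daily subcutaneous": 0.61,
    "weekly intramuscular vs daily subcutaneous": 0.52,
}

total = sum(abs(v) for v in coefs.values())
weights = {k: abs(v) / total for k, v in coefs.items()}
# weights sum to 1 by construction; disability progression carries the most
```

In the Bayesian setting this normalisation is applied draw by draw, which is why the weights carry posterior standard errors of their own.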
The table below shows the posterior mean and standard deviation of the key parameters in the full
preference model (random preferences by study) under two alternative uniform priors on the
random-preferences standard deviation, σ_pref ~ Uniform(0, 2) and σ_pref ~ Uniform(0, 50),
alongside σ_pref ~ Uniform(0, 10) (the prior used for the main results in III.5.6.1).
RANDOM PREFERENCES BY STUDY                               σ_pref ~ Uniform(0,2)  σ_pref ~ Uniform(0,10)  σ_pref ~ Uniform(0,50)
                                                  Unit    Mean     SE            Mean     SE             Mean     SE
Utility coefficients on choice scale (i.e. effect on log odds of choice)
  Relapse rate                          1 relapse/year    -1.86     1.02          -1.62     0.62          -1.77     0.64
  Disability progression                     100% risk    -7.31     2.23          -7.26     2.21          -7.23     2.16
  PML                                        100% risk  -244.5     75.5         -245.3     75.3         -245.6     75.3
  Liver enzyme elevation                     100% risk   -22.75    25.87         -21.22    23.23         -20.83    24.34
  Allergic/hypersensitivity reactions        100% risk    -6.31     8.23          -5.92     7.22          -5.95     7.47
  Serious allergic reactions                 100% risk   -39.56     4.40         -39.48     4.39         -39.48     4.39
  Depression                                 100% risk    -5.10     0.89          -5.10     0.88          -5.08     0.88
  Infusion/injection reactions               100% risk   -20.85    35.38         -19.31    35.30         -19.30    32.32
  Daily oral vs daily subcutaneous                 N/A    -2.79     0.67          -2.72     0.64          -2.75     0.64
  Monthly infusion vs daily subcutaneous           N/A    -0.76     0.33          -0.72     0.31          -0.74     0.30
  Weekly intramuscular vs daily subcutaneous       N/A    -0.70     0.44          -0.66     0.43          -0.69     0.43
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
  Relapse rate                          1 relapse/year     7.0%     4.4%           6.5%     3.6%           7.2%     4.3%
  Disability progression                     100% risk    27.9%    13.4%          28.8%    13.5%          29.0%    13.4%
  Liver enzyme elevation                     100% risk    54.2%    20.4%          53.7%    20.3%          52.6%    20.6%
  Daily oral vs daily subcutaneous                 N/A    10.8%     5.4%          11.0%     5.4%          11.2%     4.0%
  Monthly infusion vs daily subcutaneous           N/A     2.9%     1.7%           2.9%     1.7%           6.4%    13.4%
  Weekly intramuscular vs daily subcutaneous       N/A     2.6%     2.0%           2.6%     2.0%           6.0%    20.6%
Ratings standard deviation                         N/A     1.19     0.06           1.20     0.06           1.19     0.06
Proportional between-study preference
standard deviation                                 N/A     0.59     0.09           0.58     0.09           0.58     0.09
Ratings model residual deviance                    N/A   230.0     21.5          229.9     21.4          230.1     21.5
Choice model residual deviance                     N/A    94.4      3.6           94.4      3.6           94.4      3.6
Preference synthesis residual deviance             N/A    45.6      6.5           45.8      6.6           45.6      6.5
Total residual deviance                            N/A   370.0     22.8          370.1     22.7          370.1     22.7
The table below shows the posterior mean and standard deviation of the key parameters in the full
preference model (random preferences by study) under an alternative prior on the utility
coefficients, eg_ω ~ N+(0, 10000) (a folded Normal distribution; see II.4.8), alongside
eg_ω ~ Gamma(1, 0.01) (the prior used for the main results in III.5.6.1).
RANDOM PREFERENCES BY STUDY                               eg_ω ~ Gamma(1, 0.01)  eg_ω ~ N+(0, 10000)
                                                  Unit    Mean     SE            Mean     SE
Utility coefficients on choice scale (i.e. effect on log odds of choice)
  Relapse rate                          1 relapse/year    -1.62     0.62          -1.78     0.68
  Disability progression                     100% risk    -7.26     2.21          -7.32     2.23
  PML                                        100% risk  -245.3     75.3         -201.4     43.5
  Liver enzyme elevation                     100% risk   -21.22    23.23         -26.48    27.54
  Allergic/hypersensitivity reactions        100% risk    -5.92     7.22          -7.47     9.29
  Serious allergic reactions                 100% risk   -39.48     4.39         -39.49     4.36
  Depression                                 100% risk    -5.10     0.88          -5.10     0.90
  Infusion/injection reactions               100% risk   -19.31    35.30         -22.72    30.99
  Daily oral vs daily subcutaneous                 N/A    -2.72     0.64          -2.78     0.66
  Monthly infusion vs daily subcutaneous           N/A    -0.72     0.31          -0.76     0.32
  Weekly intramuscular vs daily subcutaneous       N/A    -0.66     0.43          -0.71     0.44
Normalised preference weights for synthesised RRMS outcomes and treatment administration modes
  Relapse rate                          1 relapse/year     6.5%     3.6%           6.4%     3.8%
  Disability progression                     100% risk    28.8%    13.5%          25.7%    13.0%
  Liver enzyme elevation                     100% risk    53.7%    20.3%          57.9%    20.1%
  Daily oral vs daily subcutaneous                 N/A    11.0%     5.4%          10.0%     5.3%
  Monthly infusion vs daily subcutaneous           N/A     2.9%     1.7%           2.7%     1.7%
  Weekly intramuscular vs daily subcutaneous       N/A     2.6%     2.0%           2.5%     1.9%
Ratings standard deviation                         N/A     1.20     0.06           1.20     0.06
Proportional between-study preference
standard deviation                                 N/A     0.58     0.09           0.59     0.09
Ratings model residual deviance                    N/A   229.9     21.4          229.9     21.4
Choice model residual deviance                     N/A    94.4      3.6           94.5      3.6
Preference synthesis residual deviance             N/A    45.8      6.6           45.6      6.5
Total residual deviance                            N/A   370.1     22.7          370.0     22.7
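Both priors in this comparison are intended to be vague on the positive half-line: N+(0, 10000) is the absolute value of a Normal with variance 10000 (standard deviation 100), while Gamma(1, 0.01) with rate 0.01 is an Exponential with mean 100. A quick sampling sketch of their comparable diffuseness (illustrative only, not thesis code):

```python
import random

random.seed(2)
N = 100_000

# Folded Normal N+(0, 10000): absolute value of Normal(0, sd = 100)
folded = [abs(random.gauss(0, 100)) for _ in range(N)]

# Gamma(1, rate 0.01) is Exponential(rate 0.01), with mean 1/0.01 = 100
gamma_draws = [random.expovariate(0.01) for _ in range(N)]

mean_folded = sum(folded) / N      # theory: 100 * sqrt(2 / pi), about 79.8
mean_gamma = sum(gamma_draws) / N  # theory: 100
```

Since both distributions spread their mass over a far wider range than the plausible coefficient values, the posterior is dominated by the likelihood under either choice, consistent with the near-identical columns above.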
3 MCDA model
Sensitivity to assumed correlations
The graphs below show the SUCRA statistic (based on population-average benefit-risk scores) for the
treatments in the RRMS case study under different assumptions about the correlations between
pairs of outcomes in the evidence synthesis.
All correlations = 0
All correlations = 0.3
All correlations = 0.6
All correlations = 0.9
Vague prior on correlation propensities, between-study = within-study
Vague prior on correlation propensities, between-study ≠ within-study
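For reference, SUCRA (surface under the cumulative ranking curve) averages each treatment's cumulative rank probabilities across the posterior draws; it equals 1 for a treatment certain to rank first on benefit-risk score and 0 for one certain to rank last. A minimal sketch of its computation from posterior score draws (toy draws, assuming higher score = better; not the thesis code):

```python
def sucra(score_draws):
    """score_draws: list of posterior iterations, each a list of per-treatment
    benefit-risk scores (higher = better). Returns one SUCRA per treatment."""
    n_iter = len(score_draws)
    n_trt = len(score_draws[0])
    # rank_counts[t][r]: iterations in which treatment t takes rank r (0 = best)
    rank_counts = [[0] * n_trt for _ in range(n_trt)]
    for draws in score_draws:
        order = sorted(range(n_trt), key=lambda t: draws[t], reverse=True)
        for r, t in enumerate(order):
            rank_counts[t][r] += 1
    sucras = []
    for t in range(n_trt):
        probs = [c / n_iter for c in rank_counts[t]]
        cum, total = 0.0, 0.0
        for r in range(n_trt - 1):  # cumulative probabilities over ranks 1..T-1
            cum += probs[r]
            total += cum
        sucras.append(total / (n_trt - 1))
    return sucras

# Toy posterior draws for three treatments: A always best, C always worst
draws = [[3.0, 2.0, 1.0], [2.9, 2.1, 0.9], [3.1, 1.9, 1.1]]
# sucra(draws) -> [1.0, 0.5, 0.0]
```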
Sensitivity to priors
The graphs below show the SUCRA statistic (based on population-average benefit-risk scores) for the
treatments in the RRMS case study under alternative priors for the random-effects standard
deviation, the utility coefficients, and the random-preferences standard deviation.
Main priors
σ ~ Uniform(0, 2)
eg_ω ~ N+(0, 10000) (see II.4.8)
σ_pref ~ Uniform(0, 2)
Sensitivity to initial values
The graphs below show the SUCRA statistic (based on population-average benefit-risk scores) for the
treatments in the RRMS case study under alternative sets of initial values for the utility coefficients
(as specified above each graph in BUGS format).
eg=c(1,1,1,1,1,0.5,0.5,1,1,1,1,NA,NA,NA)
eg=c(0.5,0.5,0.5,0.5,0.5,0.1,0.1,0.1,0.5,0.5,0.5,NA,NA,NA)
eg=c(4,3,4,3,4,1,2,4,3,4,3,NA,NA,NA)
eg=c(0.1,0.2,0.3,0.4,0.5,0.01,0.05,0.1,0.2,0.3,0.4,NA,NA,NA)
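Agreement across chains started from dispersed initial values, as examined here, is conventionally summarised by the Gelman-Rubin statistic (R-hat), which compares between-chain and within-chain variance and approaches 1 at convergence. A minimal sketch for a single scalar parameter (toy chains, not thesis output):

```python
import random

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter,
    given several chains of equal length from dispersed starting values."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m               # mean within-chain
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

random.seed(3)
# Two chains that, after warm-up, sample the same target despite different starts
chains = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
rhat = gelman_rubin(chains)  # close to 1 when the chains agree
```

Values of R-hat well above 1 would indicate that chains from different initial values had not mixed, in which case the SUCRA graphs above could not be compared meaningfully.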