QUANTITATIVE METHODS TO SUPPORT DRUG BENEFIT-RISK ASSESSMENT
Ola Caster
Report Series / Department of Computer & Systems Sciences No. 14-001
© Ola Caster, Stockholm University 2014
ISSN 1101-8526
ISBN 978-91-7447-856-3
Printed in Sweden by US-AB, Stockholm 2014
Distributor: Department of Computer and Systems Sciences
To all of you who took care of the world
while I was busy pretending to save it
Abstract
Joint evaluation of drugs’ beneficial and adverse effects is required in many
situations, in particular to inform decisions on initial or sustained marketing
of drugs, or to guide the treatment of individual patients. This synthesis,
known as benefit-risk assessment, is without doubt important: timely deci-
sions supported by transparent and sound assessments can reduce mortality
and morbidity in potentially large groups of patients. At the same time, it can
be hugely complex: drug effects are generally disparate in nature and likeli-
hood, and the information that needs to be processed is diverse, uncertain,
deficient, or even unavailable. Hence there is a clear need for methods that
can reliably and efficiently support the benefit-risk assessment process. For
already marketed drugs, this process often starts with the detection of previ-
ously unknown risks that are subsequently integrated with all other relevant
information for joint analysis.
In this thesis, quantitative methods are devised to support different as-
pects of drug benefit-risk assessment, and the practical usefulness of these
methods is demonstrated in clinically relevant case studies. Shrinkage re-
gression is adapted and implemented for large-scale screening in collections
of individual case reports, leading to the discovery of a link between methyl-
prednisolone and hepatotoxicity. This adverse effect is then considered as
part of a complete benefit-risk assessment of methylprednisolone in multiple
sclerosis relapses, set in a general framework of probabilistic decision analy-
sis. Two methods devised in the thesis substantively contribute to this as-
sessment: one for efficient generation of utility distributions for the consid-
ered clinical outcomes, driven by modelling of qualitative information; and
one for computing risk limits for rare and otherwise non-quantifiable adverse
effects, based on collections of individual case reports.
Sammanfattning
In many situations, joint evaluation of the beneficial and harmful effects of drugs is necessary, in particular as a basis for deciding whether a drug should be allowed to enter or remain on the market, or to guide the treatment of individual patients. This synthesis, called benefit-risk assessment, is unquestionably important: timely decisions based on transparent and thorough assessments can reduce mortality and morbidity in potentially large groups of patients. At the same time, the synthesis can be extremely complex: drug effects generally differ widely in nature and frequency, and the information that must be processed is diverse, uncertain, deficient, or at worst unavailable. There is therefore a clear need for methods that can reliably and efficiently support the benefit-risk assessment process. For already approved drugs, this process often starts with the discovery of previously unknown risks, which are then integrated with all other relevant information for joint analysis.
This thesis presents quantitative methods to support different aspects of drug benefit-risk assessment, and the practical usefulness of these methods is demonstrated in clinically relevant case studies. Shrinkage regression is adapted and implemented for large-scale screening of collections of individual case reports, leading to the discovery of a link between methylprednisolone and hepatotoxicity. This adverse effect is then considered as part of a complete benefit-risk assessment of methylprednisolone in multiple sclerosis relapses, carried out within a general framework based on probabilistic decision analysis. Two methods presented in the thesis contribute substantially to this assessment: one for efficiently generating distributions over the utilities of the considered clinical outcomes, based on modelling of qualitative information; and one for computing, from collections of individual case reports, risk limits for rare and otherwise non-quantifiable adverse effects.
List of publications
This thesis consists of the following original publications, which are referred
to in the text by their bolded Roman numerals.
I Caster O, Norén GN, Madigan D, Bate A. Large-Scale Regression-
Based Pattern Discovery: The Example of Screening the WHO
Global Drug Safety Database. Statistical Analysis and Data Mining,
2010. 3(4):197-208.
II Caster O, Edwards IR. Reflections on Attribution and Decisions in
Pharmacovigilance. Drug Safety, 2010. 33(10):805-809.
III Caster O, Conforti A, Viola E, Edwards IR. Methylprednisolone-
induced hepatotoxicity: experiences from global adverse drug reac-
tion surveillance. European Journal of Clinical Pharmacology,
2014. DOI: 10.1007/s00228-013-1632-3.
IV Caster O, Ekenberg L. Combining Second-Order Belief Distribu-
tions with Qualitative Statements in Decision Analysis. Lecture
Notes in Economics and Mathematical Systems, 2012. 658:67-87.
V Caster O, Norén GN, Ekenberg L, Edwards IR. Quantitative Benefit-
Risk Assessment Using Only Qualitative Information on Utilities.
Medical Decision Making, 2012. 32(6):E1-E15.
VI Caster O, Norén GN, Edwards IR. Computing limits on medicine
risks based on collections of individual case reports. Submitted for
publication.
VII Caster O, Edwards IR. Quantitative benefit-risk assessment of
methylprednisolone in multiple sclerosis relapses. In manuscript.
Reprints of I, II, III, IV, and V were made with kind permission from the
publishers.
Contents
1 Introduction
   1.1 General background
   1.2 Methods to support drug benefit-risk assessment
   1.3 Aim
   1.4 Overview of the thesis
2 Detection and evaluation of new drug risks
   2.1 Introduction
      2.1.1 Adverse drug reaction surveillance
      2.1.2 Disproportionality analysis
      2.1.3 Causality evaluation
   2.2 Contributions
      2.2.1 Regression as a basis for large-scale screening
      2.2.2 Uncertainty in causal relationships
      2.2.3 Real-world prospective use
   2.3 Empirical appraisals
      2.3.1 Analysis of highlighted drug - adverse drug reaction pairs
      2.3.2 Stability over time
      2.3.3 Retrospective performance evaluation
      2.3.4 Exploratory investigation of prospective use
      2.3.5 Conclusions
   2.4 Related work
      2.4.1 Regression in adverse drug reaction surveillance
      2.4.2 Improved disproportionality analysis
      2.4.3 Other developments in adverse drug reaction surveillance
      2.4.4 Causality evaluation in pharmacovigilance
3 Drug benefit-risk assessment
   3.1 Introduction
      3.1.1 General overview
      3.1.2 Frequency and desirability of drug effects
      3.1.3 Decision analysis
      3.1.4 Decision problems in benefit-risk assessment
      3.1.5 Accommodating uncertainty in decision analysis
   3.2 Contributions
      3.2.1 Qualitative utility modelling
      3.2.2 Risk quantification for rare adverse effects
      3.2.3 A clinically significant benefit-risk assessment case study
   3.3 Empirical appraisals
      3.3.1 Qualitative utility modelling
      3.3.2 Risk quantification for rare adverse effects
   3.4 Related work
      3.4.1 Work related to specific contributions
      3.4.2 Other benefit-risk assessment frameworks
4 Conclusions and future directions
5 Acknowledgements and afterword
6 Appendices
   A Decision analysis and the need for absolute risk
   B Monetary objectives in benefit-risk assessment
   C Technical notes on qualitative utility modelling
7 Bibliography
Abbreviations
ADR Adverse drug reaction
AIDS Acquired immunodeficiency syndrome
BARDI Bayesian adverse reaction diagnostic instrument
DES Discrete event simulation
EDSS Expanded disability status scale
EMA European Medicines Agency
FDA Food and Drug Administration
HIV Human immunodeficiency virus
IC Information Component
ITM Inverse transformation method
LLR Lasso logistic regression
MCDA Multi-criteria decision analysis
MCMC Markov chain Monte Carlo
MCV4 Meningococcal conjugate vaccine
MS Multiple sclerosis
NICE National Institute for Health and Clinical Excellence
NNH Number needed to harm
NNT Number needed to treat
PML Progressive multifocal leukoencephalopathy
QALY Quality-adjusted life year
UK United Kingdom
US United States
WHO World Health Organisation
1 Introduction
1.1 General background
Drugs are used to treat, prevent, or cure disease. Today they are considered a
natural and essential part of healthcare in most societies, though the vast
majority of all drugs used in Western medicine are less than 100 years old.
Undisputedly, the extent and breadth of this development has brought benefit
upon humanity. Examples of reduced mortality and suffering are numerous
and significant: deadly and disabling infectious diseases such as smallpox,
polio, and diphtheria have been eradicated globally or locally thanks to suc-
cessful immunisation programmes [1]; the discovery of insulin has trans-
formed type I diabetes from an incurable and lethal disease to a condition
that can be kept under control if well managed [2]; modern antiretroviral
therapy profoundly reduces the progression rate from HIV infection to out-
break of AIDS [3]; chemotherapy continuously pushes the borders for cancer
survival [4]; and so on.
At the same time, it is clear that drugs are not safe in the common under-
standing of the word. They interfere with physiological processes throughout
the human body, often in ways that are incompletely understood. Not infrequently, drugs have harmful effects: one study estimated that adverse drug reactions
(ADRs) had caused about 100,000 deaths in the US during 1994 [5]; another
that ADRs leading to hospital admission had caused over 5,000 deaths in the
UK during 2002, suggesting a total annual ADR-related mortality in the
order of 10,000 people [6]. Although many of these deaths may have been
preventable, the figures are massive. As a reference point for the latter study,
less than 3,500 people were reported to have been killed in road accidents in
Great Britain during 2002 [7].
Because of this two-sidedness, many activities and decisions in relation to
drugs are inherently delicate. Drug regulation is one fundamental example: it
must be decided which drugs should be allowed to enter or remain on the
market, and in which conditions their use should be mandated. Drug therapy
is another: from the set of drugs available for a certain disease, it must be
decided which one should be used – if any – by a specific patient in a spe-
cific context. The immense importance of such decisions should be quite
clear, as they directly influence people’s health. Lives can be saved or lost; suffering can be reduced or induced. It is therefore no surprise that drugs are among the most extensively regulated of all products [8].
Decisions of this nature require joint evaluation of a drug’s beneficial and
adverse effects, preferably in relation to other available alternatives. Such
evaluation is commonly referred to as benefit-risk assessment, and in recog-
nition of its widespread acceptance this term will be used throughout this
thesis. The term’s construction is nonetheless somewhat illogical: benefit is certain and manifest, something good that can be experienced, whereas risk is only the possibility of something harmful that might happen [9].
Benefit-risk assessments are difficult by design. Imagine a scenario with
only two effects to consider, amelioration of the disease and a single adverse
effect, whose respective likelihood and nature were precisely known. An
assessment would have to consider both the desirability of the beneficial
effect, which depends on the nature of the disease in its untreated and re-
duced forms, and the undesirability of the adverse effect, which relates for
example to its seriousness and persistence. Not only are these effects most
often widely different in a qualitative sense, but the total reckoning of the
situation must also account for their respective likelihood to occur, which
again may be hugely different. This is a tormenting exercise for the human
mind.
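A minimal numerical sketch makes the weighing concrete (all probabilities and utilities below are hypothetical, chosen purely for illustration; utilities are on a 0-1 scale where 1 represents full health):

```python
# Hypothetical two-effect scenario: amelioration of the disease versus a
# single adverse effect, with likelihood and nature precisely known.
# All figures below are illustrative assumptions, not data.

p_benefit = 0.60    # probability that the drug ameliorates the disease
p_adr     = 0.01    # probability of the adverse effect

u_untreated = 0.50  # utility of the untreated disease (0-1 scale)
u_improved  = 0.90  # utility after successful treatment
u_adr       = 0.20  # utility if the adverse effect occurs

# Simplification: the three outcomes are treated as mutually exclusive.
eu_treat = (p_benefit * u_improved
            + p_adr * u_adr
            + (1 - p_benefit - p_adr) * u_untreated)
eu_no_treat = u_untreated

print(round(eu_treat, 3), eu_no_treat)  # prints: 0.737 0.5
```

Even in this idealised setting, the conclusion hinges on how the utilities are set: disagreement about the value assigned to the adverse effect or to the untreated disease can tip the balance, which is precisely what makes the unaided mental exercise so demanding.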
In reality, though, the situation is even worse. Knowledge is never com-
plete, and there are several effects to consider simultaneously, in particular
on the risk side. The information on their respective likelihood and nature
stems from diverse sources such as controlled trials, observational studies,
and anecdotal reports. Much of this information is fraught with inherent un-
certainties and deficiencies, and essential information may even be unavail-
able. Hence, it is a major challenge to merely comprehend, not to mention
disentangle, the aggregated complexity of benefit-risk assessment.
Yet, decisions are inescapable. Their complexity in conjunction with their
importance to the lives of patients suggests that benefit-risk assessments
must be approached with responsibility and rigor. Whereas regulatory drug
approval decisions have traditionally been made by expert committees with-
out the support from structured methods [10], the current attitude towards
the use of such methods is generally positive [11]. The world’s leading regu-
latory agencies FDA (Food and Drug Administration) in the US and EMA
(European Medicines Agency) in Europe now run and engage in scientific
programmes that investigate various methodological approaches [12-14].
Academic research is diverse and plentiful [15], but this area is still very
much in development. No method appears to be even close to widespread
acceptance, and the question how to assess benefit and risk in a given situa-
tion is yet unresolved.
To further add to the complexity, the approval of a drug is really more of
a starting point than an endpoint. Pre-marketing investigations of drug efficacy are performed in human subjects who do not fully represent the populations likely to use the drugs in clinical practice [16]. Overall clinical effectiveness is therefore unknown at approval. Further, drugs with evident hazards are typically screened out by extensive toxicity testing, so that for marketed drugs adverse effects are generally much rarer than beneficial effects. Because only a few thousand people will have
been exposed to a drug at its first marketing, it is doubtful whether adverse
effects of lower incidence than 1 in 1,000 will be detected in the pre-
marketing testing [16].
This necessitates continuous surveillance for previously unknown adverse
effects throughout a drug’s lifecycle, using the most appropriate data sources
and methods [17, 18]. It has long been recognised scientifically that the de-
tection of a new risk with a marketed drug calls for rapid and consistent revi-
sion of its benefit-to-risk balance, ideally in relation to alternative treatments
[19]. This enables rapid decisions to be made about continued marketing of
the drug, or about use in an individual patient. It also increases the under-
standing of the significance of the newly discovered risk. This lifecycle per-
spective is now emphasised also in a regulatory context [14], and since 2012
global guidelines call for Periodic Benefit-Risk Update Reports rather than
Periodic Safety Update Reports [20]. In this new paradigm, pharmaceutical
companies are expected to conduct benefit-risk assessments in the face of
new important information for their marketed drugs. However, no guidance
on methodology is provided.
Such is the context of this thesis: a complex and demanding reality in symbiotic coexistence with multi-faceted scientific method development.
To serve patients in the best possible way, ambitions must be set high. Any
benefit-risk assessment method which unduly delays decisions to move to-
wards necessary warnings of risk or further analysis compromises patient
safety; any approach that requires extensive new data and work is likely to
be too expensive for frequent routine use; and any method that is misleading
and lacks transparency cannot be justified.
1.2 Methods to support drug benefit-risk assessment
This thesis endorses the contemporary notion of benefit-risk assessment as a
dynamic process throughout the lifecycle of a drug. Consequently, there is a multitude of method types that could be considered supportive of the benefit-risk assessment process. Likewise, there is a multitude of areas
in which method development would be possible or even desirable, to enable
a higher standard of benefit-risk assessment, and ultimately to better serve
patients. In this thesis, three specific areas with potential for improvement
are considered: first-pass screening to detect previously unknown drug risks,
generation of values for the desirability of pertinent drug effects, and risk
quantification for rare adverse effects.
As mentioned, post-marketing ADR surveillance is a necessity in view of
the inherent limitations of pre-marketing drug trials. For more than a decade, a cornerstone of this surveillance has been the screening of large collections of individual case reports using data mining methods [18, 21]. Because any
potential new risk highlighted in this first-pass filtering needs to be clinically
assessed prior to further action, the optimal data mining approach should
detect real emerging problems as early as possible while keeping the rate of
false discoveries at a minimum. However, routine methods [22-25] are fairly
simplistic in that they are all based on two-dimensional data projections, considering one pair of drug and ADR term at a time. This has theoretical drawbacks
that may lead to sub-optimal performance [26]. Whereas multiple regression
could possibly mitigate some of the issues with the routine methods, its im-
plementation in large-scale applications such as ADR surveillance is a major
computational and operational challenge [27].
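The regression alternative can be sketched in miniature (a toy illustration under simplified assumptions, written for this summary; it is not the VigiBase implementation of publication I): the presence of one ADR term is regressed on indicator variables for all reported drugs, and an L1 ("lasso") penalty forces most coefficients to exactly zero, so that distortion from co-reported drugs is handled jointly rather than pair by pair.

```python
import numpy as np

def lasso_logistic(X, y, lam=0.01, lr=0.5, n_iter=2000):
    """L1-penalised ('lasso') logistic regression fitted by proximal
    gradient descent; the penalty shrinks most coefficients to exactly
    zero, leaving a short list of drugs to highlight for review."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(ADR | drugs)
        w -= lr * (X.T @ (p - y) / n)           # gradient step on the loss
        b -= lr * np.mean(p - y)                # intercept is not penalised
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w, b

rng = np.random.default_rng(0)
n_reports, n_drugs = 5000, 10
# Toy report collection: one row per case report, columns flag reported drugs.
X = rng.binomial(1, 0.1, size=(n_reports, n_drugs)).astype(float)
# Drug 1 is often co-reported with drug 0 (e.g. routine co-prescription) ...
X[:, 1] = np.maximum(X[:, 1], X[:, 0] * rng.binomial(1, 0.8, n_reports))
# ... but only drug 0 truly raises the probability of the ADR.
y = rng.binomial(1, 0.02 + 0.25 * X[:, 0]).astype(float)

w, _ = lasso_logistic(X, y)
print(np.flatnonzero(w > 0.1))  # drug 0 should dominate the highlighted list
```

In this toy setting the merely co-reported drug 1 would inflate a pairwise disproportionality measure, whereas the joint lasso model can leave its coefficient at zero.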
The area of potential improvement just described relates to the detection
of potential new drug risks, which after proper clinical evaluation could be
further communicated and, if significant, trigger a full benefit-risk assess-
ment. At the other end of the continuum reside those methods that feed di-
rectly into the joint analysis of drugs’ favourable and unfavourable effects.
To appreciate the needs and potential improvements in that region of the
overall process, a basic understanding of the elements that make up struc-
tured benefit-risk assessments is required.
The focus of this thesis lies on methods that include all effects relevant to
a particular assessment; that can accommodate information relating both to
the frequency and desirability of those effects; that can compare treatment
alternatives; and that provide an actionable and transparent quantitative syn-
thesis. These features are here considered essential for an assessment to
really achieve its purpose, and this restriction fits well with recommenda-
tions from recent systematic methods appraisals [12, 13, 15]. Important ex-
amples include variations of multi-criteria decision analysis (MCDA) [28-
32] and approaches based on aggregating utility over time, e.g. as quality-
adjusted life years (QALYs), either in decision trees [33, 34], Markov mod-
els [35, 36], or in patient-level discrete event simulation (DES) models [37].
All of these methods have unique properties that may suit the analyst’s preferences and the specific requirements of the situation at hand more or less well. (For a detailed discussion, see Section 3.4.2.) However,
they all quantify – in some way – two intrinsic dimensions of the considered
drug effects, whether those are presented as outcomes, health states, decision
criteria, or something else. Those dimensions are frequency and desirability,
which are further described in Section 3.1.2. This thesis highlights one spe-
cific area of potential improvement corresponding to each of these two di-
mensions, presented briefly in turn below.
Quantifying the desirability of drug effects typically amounts to trans-
forming preferences into values. Ideally one would turn to the relevant pa-
tient population to elicit their preferences for the set of effects that apply in a
given benefit-risk assessment [38, 39]. However, this is costly and cannot be
done promptly; therefore it is unlikely to be a generally feasible approach in
response to a newly discovered significant risk for a marketed drug. Another
alternative is to turn to the literature for estimates of desirability [40, 41],
although this requires overcoming significant challenges: for example, esti-
mates may differ dramatically between respondent groups even if elicited
with the same method [40], and they may vary considerably within the same
group of respondents if elicited with different methods [42]. The most serious situation, however, is when there are no estimates available at all,
which forces the analyst to use questionable substitute values elicited for
other effects, possibly in unrepresentative populations, or else to make plain
ad hoc value assignments. Such situations are not too rare, and may arise
even though the effect is central to the assessment at hand, and the purpose
of the assessment is to inform an important policy decision. One example is
the substitution of Guillain-Barré syndrome in adolescents by multiple scle-
rosis (MS) in adults [43]; for further examples, see Table 2 in Section 3.1.2.
At the same time, logically or clinically implied qualitative preference in-
formation should be immediately available in most situations. For example,
hepatitis that requires transplantation is worse than hepatitis that spontane-
ously resolves, and persistent disabilities such as deafness are universally
acknowledged as less desirable than transient and mild conditions like sea-
sonal rhinitis. Hence, methods that could usefully accommodate such infor-
mation within a quantitative analysis framework might enable quick and
cheap assessments relieved of the requirement for external estimates of desirability.
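As a crude illustration of how purely qualitative statements can drive a quantitative analysis (a brute-force rejection-sampling sketch of the general idea; the method devised in publication V achieves this far more efficiently), one can sample utility vectors uniformly and retain only those consistent with the stated ordering:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical outcomes on a 0-1 utility scale:
#   u[:, 0]: full recovery
#   u[:, 1]: hepatitis that spontaneously resolves
#   u[:, 2]: hepatitis that requires transplantation
# Qualitative statement only: u[:, 0] > u[:, 1] > u[:, 2].
u = rng.random((200_000, 3))
u = u[(u[:, 0] > u[:, 1]) & (u[:, 1] > u[:, 2])]

# The retained samples form a joint utility distribution that encodes
# nothing beyond the stated ordering; it can be fed directly into a
# probabilistic decision analysis.
print(u.mean(axis=0).round(2))  # approximately [0.75, 0.5, 0.25]
```

Rejection sampling wastes most draws as the number of outcomes and constraints grows, which is one reason a more efficient generation scheme is needed in practice.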
As regards the other dimension of interest – frequency – an important
practical issue is that the risk of rare adverse effects can be difficult to quan-
tify. Randomised clinical trials are typically too small, and even very large
observational studies may be insufficient for some rare adverse effects of
importance [44]. Although ad hoc approaches may be possible in some in-
stances [45], they do not provide a generally feasible solution. Empirical
studies show that individual case reports are by far the most frequently used
source of evidence in safety-related regulatory actions, such as market with-
drawals of drugs [46-49]. Because those withdrawals must have been pre-
ceded by benefit-risk assessments, whether by structured methods or not,
individual case reports may sometimes carry quantitative risk information in some vague and unspecified sense. However, as far as we are aware, there has been no real attempt to elucidate when and how such information could be harnessed from individual case reports, and what type of quantification might be feasible. In light of the worldwide avail-
ability and abundance of individual case reports, any information they could
contribute on the risk of rare adverse effects could prove very useful.
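The flavour of such quantification can be conveyed by a deliberately crude sketch (the assumptions, figures, and bounds below are my own illustration for this summary; the actual model and formulae are derived in publication VI):

```python
# Illustrative bounds on the risk of a rare adverse effect, derived from a
# count of individual case reports and an exposure estimate. Hypothetical
# figures; not the formulae of publication VI.

n_reports = 40          # case reports of the adverse effect worldwide
n_exposed = 2_000_000   # estimated number of patients exposed to the drug

# Lower limit on risk: assume every report is a genuine, distinct case.
# Under-reporting can only make the true risk higher than this.
risk_lower = n_reports / n_exposed

# Upper limit on risk: additionally assume that at least a fraction r_min
# of all true cases was reported, so the true case count is bounded above.
r_min = 0.05
risk_upper = n_reports / (r_min * n_exposed)

print(f"risk between {risk_lower:.1e} and {risk_upper:.1e} per exposed patient")
# prints: risk between 2.0e-05 and 4.0e-04 per exposed patient
```

Each bound is only as credible as its assumption: duplicate or erroneous reports undermine the lower limit, and a minimum reporting fraction is rarely known with confidence, which is why the conditions of validity matter as much as the formulae themselves.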
The overall framework for benefit-risk assessment adopted in this thesis
is probabilistic decision analysis of patient-oriented treatment decisions. The
nature of benefit-risk assessment fits well with decision-analytic principles
(see Section 3.1.3), and decision analysis allows for information from dispa-
rate sources to be combined. Further, its use in this context is supported by
external method reviews [12, 13]. Probabilistic sensitivity analysis [50] is a
generic and natural way to handle uncertainty, whose use in this application
has been externally recommended [51, 52]. It is important to realise, how-
ever, that the three focus areas for method development presented above,
with their corresponding proposed solutions later in this thesis, are tied to
this general framework to various degrees. Nonetheless, the framework has a
crucial role to play in demonstrating how the different contributions fit to-
gether, and how they can be practically combined within a single benefit-risk
assessment.
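How the pieces fit together can be sketched in miniature (a hypothetical two-alternative treatment decision with illustrative distributions, not the model used later in this thesis): every uncertain quantity is assigned a probability distribution, expected utility is computed for each alternative in each Monte Carlo draw, and the residual uncertainty in the conclusion is reported directly.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 50_000  # Monte Carlo draws for the probabilistic sensitivity analysis

# Uncertain parameters as distributions (all illustrative assumptions):
p_benefit = rng.beta(60, 40, n_draws)   # P(disease ameliorated | treatment)
p_adr = rng.beta(2, 198, n_draws)       # P(adverse effect | treatment)
u_adr = rng.uniform(0.1, 0.3, n_draws)  # uncertain utility of the ADR state
u_improved, u_untreated = 0.9, 0.5      # utilities taken as known here

# Expected utility per draw, treating the three outcomes as mutually exclusive.
eu_treat = (p_benefit * u_improved
            + p_adr * u_adr
            + (1 - p_benefit - p_adr) * u_untreated)
eu_no_treat = u_untreated

# Decision-relevant summary: across the joint parameter uncertainty,
# how often is treatment the preferred alternative?
print(f"P(treat preferred) = {np.mean(eu_treat > eu_no_treat):.2f}")
```

With these particular assumptions treatment dominates in virtually every draw; shifting the distributions, for instance raising the likelihood or severity of the adverse effect, would make the output probability an informative measure of decision uncertainty.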
1.3 Aim
The aim of this thesis is to devise quantitative methods to support the drug
benefit-risk assessment process, defined broadly to include, if applicable, the
detection of new adverse effects as triggers for subsequent complete benefit-
risk assessment. Proposed methods should be compatible with, but not nec-
essarily dependent on, a general framework of probabilistic decision analy-
sis, and they should facilitate a more efficient, more accurate, and more
transparent benefit-risk assessment process.
In view of the areas of potential improvement presented in Section 1.2,
the following specific objectives all contribute towards this overall aim:
(i) to propose a feasible approach by which regression can be used in
ADR surveillance for large-scale screening of collections of indi-
vidual case reports;
(ii) to enable qualitative preference information with respect to rele-
vant drug effects to be incorporated into quantitative benefit-risk
assessment; and
(iii) to elucidate what type of quantitative risk information for rare adverse effects may be available from collections of individual
case reports, and to specify how it could be extracted.
1.4 Overview of the thesis
The core of this thesis is made up of Sections 2 and 3. The former covers
issues related to the detection and evaluation of new drug risks and specifi-
cally seeks to address objective (i) as specified above. Section 3 is devoted
to the evaluative part of the benefit-risk assessment process: it first sets up
the general framework of probabilistic decision analysis, then devises meth-
ods corresponding to objectives (ii) and (iii), and finally demonstrates in a
real-world prospective case study of clinical significance how all methods
proposed in this thesis can contribute to the same assessment. Within this
structure, publications I, II, and III belong to Section 2, while IV, V, VI,
and VII belong to Section 3.
As a direct response to objective (i), publication I is the main contribution
of Section 2. It describes the first ever implementation and validation of
shrinkage regression as a method for large-scale screening of databases of
individual case reports. This approach, referred to as lasso logistic regression
(LLR), has the potential to account for the impact of covariates, such as co-
reported drugs, that may confound or otherwise distort the traditionally used
measures. It may therefore improve the accuracy and promptness of the
screening process. II discusses what aspects need to be considered in evalu-
ating whether hypotheses generated by screening methods such as LLR rep-
resent true causal relationships, and how this relates to subsequent actions
such as communication to wider audiences. Finally, III is a short report on
the likely causal association between methylprednisolone and hepatotoxicity.
This association was initially highlighted by LLR when prospectively
screening the WHO global individual case safety report database VigiBase,
as part of an exploratory investigation of the practical usefulness of LLR.
Due to the choice of using probabilistic decision analysis as the general
framework for benefit-risk assessment in this thesis, objective (ii) will
henceforth relate specifically to qualitative relations between utilities in de-
cision problems. This objective is not fully addressed until V, where an effi-
cient probabilistic approach is presented that permits flexible sets of qualita-
tive relations to be specified. However, the core algorithm of this approach is
introduced already in IV, which is an application-independent publication
with more focus on the mathematical and statistical particulars. Further, a
proposed solution to meet objective (iii) is presented in VI, in the form of a
mathematical model that links individual case reporting to drug exposures
and adverse events in the real world. From this model, formulae for upper
and lower limits on the risk of adverse effects from drugs are derived, to-
gether with assumptions required for the formulae to be valid.
Lastly, publication VII extends the detection of a new drug risk in III by
incorporating it into a full quantitative benefit-risk assessment of methyl-
prednisolone in MS relapses. This case study is of high clinical importance,
considering that methylprednisolone is essentially the only treatment given
to MS patients specifically to manage relapses. However, no formal benefit-
risk assessment has been performed for methylprednisolone in this context,
to investigate if and possibly how it should be used. The assessment in VII
makes direct use of the methods proposed in V and VI, and is indirectly
dependent on I, considering that the link between methylprednisolone and
hepatotoxicity was highlighted by LLR. Therefore, VII serves both as a real-
ity check with respect to the usefulness of the methods devised in this thesis,
and as a pedagogical aid to explaining how the different publications relate
to each other. The latter aspect becomes evident in Figure 1, where these
relations are illustrated: all other publications feed into VII, either directly or
indirectly.
Figure 1. Overview of the publications included in this thesis and their inter-relations.
2 Detection and evaluation of new drug risks
2.1 Introduction
2.1.1 Adverse drug reaction surveillance
The notion that drugs need to be continuously monitored for potential safety
problems is about 50 years old, triggered by the tragic thalidomide disaster
[53]. This is the concern of the discipline of pharmacovigilance, and follows
logically from the fact that pre-marketing clinical trial programmes are too
small and too short, and include too limited a set of patients treated under too
narrow conditions, to detect all risks attached to a drug [16]. Col-
lection and analysis of individual case reports pertaining to the real-world
use of drugs has long been recognised as the mainstay of ADR surveillance
for new risks [54]. Although several complementary approaches are avail-
able today, such as screening of longitudinal electronic patient records [55-
57] and cohort event monitoring [58, 59], individual case reports remain the
most important source of information [17, 60, 61]. Within the context of this
thesis, databases of individual case reports will be the only considered data
source for ADR surveillance.
Ideally, submitted individual case reports represent suspicions by health
care professionals or patients that one or more drugs have caused an adverse
reaction [62]. The clinical suspicion is one strength; the wide coverage in
terms of both patient populations and drugs is another [63]. The main limita-
tion is that not all ADRs are recognised, and far from all that are recognised
are reported [63]. Also, the extent of this under-reporting varies across drugs
and reactions [64].
The particular database of individual case reports considered in this thesis
is VigiBase, which is a large global repository [65]. VigiBase now contains
more than eight million reports and grows at a rate of several hundred thou-
sand reports per year. For this and other databases of the same magnitude,
the development of automated screening methods to generate hypotheses on
potentially causal drug-ADR associations has been necessary [66]. The
amount of data generated is simply too massive for exhaustive manual
evaluation, and assessors need to be guided towards issues more likely to
represent real drug safety signals. Whether labelled ‘knowledge discovery in
databases’ [67], ‘data mining’ [21], or ‘pattern discovery’ [68], this is an
important application of computer science methods in pharmacovigilance.
2.1.2 Disproportionality analysis
If the only information considered on the reports are the listed drugs and
ADR terms, a database of individual case reports can be viewed as a set of
transactions: for every report, each drug and each ADR is either present or
not. In principle, therefore, well known measures from association rule min-
ing such as support and confidence could be used [69]. If the drug is denoted
by $x$ and the ADR by $y$, support and confidence can be defined as $P(x \wedge y)$
and $P(y \mid x)$, respectively. In other words, support is the proportion of all
reports that contain both the drug and the ADR, and confidence is the pro-
portion of the reports on the drug that also contain the ADR.
However, drug-ADR pairs with high support and confidence may not rep-
resent interesting reporting patterns: if the ADR and/or the drug are overall
common in the database, high values would indeed be expected. This obser-
vation triggered the alternative measure lift, which in the same notation is
defined in the following way:

$$\mathrm{lift} = \frac{P(x \wedge y)}{P(x)\,P(y)} \tag{1}$$
This means that, in the context of databases of individual case reports, lift is
the ratio between the observed reporting frequency of the drug together with
the ADR, and the frequency expected if they were reported independently of
each other. Because focus will be on those drug-ADR pairs with an observed
reporting frequency that is disproportionately high in comparison to the ex-
pected, screening of this type is usually referred to as disproportionality
analysis. This bivariate screening approach was essentially the only one
available in ADR surveillance when I was published [21], and vastly domi-
nates in practice still today.
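The three measures above can be computed directly from transaction-style report data. The following sketch illustrates the definitions on a toy set of reports; the drug and ADR names are hypothetical, and this is an illustration of the measures rather than of any production screening system.

```python
# Each report is represented as the set of drug and ADR terms listed on it.
# All drug and ADR names here are hypothetical.
reports = [
    {"drugA", "nausea"},
    {"drugA", "drugB", "rash"},
    {"drugB", "rash"},
    {"drugB", "nausea", "rash"},
    {"drugC"},
]

def support(reports, drug, adr):
    """Proportion of all reports that list both the drug and the ADR."""
    return sum(drug in r and adr in r for r in reports) / len(reports)

def confidence(reports, drug, adr):
    """Proportion of the reports on the drug that also list the ADR."""
    on_drug = [r for r in reports if drug in r]
    return sum(adr in r for r in on_drug) / len(on_drug)

def lift(reports, drug, adr):
    """Observed co-reporting frequency over the frequency expected if the
    drug and the ADR were reported independently of each other."""
    p_drug = sum(drug in r for r in reports) / len(reports)
    p_adr = sum(adr in r for r in reports) / len(reports)
    return support(reports, drug, adr) / (p_drug * p_adr)

print(support(reports, "drugB", "rash"))     # 0.6
print(confidence(reports, "drugB", "rash"))  # 1.0
print(round(lift(reports, "drugB", "rash"), 2))
```

Here drugB and rash appear together on three of the five reports, so both support and confidence are high; the lift exceeds one because the co-reporting is more frequent than expected under independence.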
The traditionally used disproportionality measures are based on the lift or
some closely related metric, extended or complemented with protection
against spurious findings [22-25]. For an overview, see [66]. In I, shrinkage
regression is compared specifically against the Information Component (IC):
$$\mathrm{IC} = \log_2 \frac{O}{E} \tag{2}$$

where $O$ and $E$ are the observed and expected numbers of reports, re-
spectively. $E$ is given by $n_x n_y / N$, where $n_x$ is the total number of re-
ports on the drug; $n_y$ is the total number of reports on the ADR; and $N$ is the
total number of reports in the database. It should be noted that the observed-
to-expected ratio $O/E$ is precisely the lift presented in Equation (1), only
with both numerator and denominator multiplied by the factor $N$. Credibility
intervals for the IC are obtained via the Gamma distribution [70]. Screening
in practice highlights drug-ADR pairs whose lower endpoint of a 95% credi-
bility interval for the IC exceeds zero.
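As an illustration, one published shrinkage variant of the IC adds 0.5 to both the observed and expected counts and takes the interval endpoints from Gamma quantiles. The sketch below (using SciPy, and assuming that particular parameterisation) shows how the IC and its 95% credibility interval might be computed; the exact formulation used in routine VigiBase screening may differ.

```python
import math
from scipy.stats import gamma

def ic_with_interval(n_xy, n_x, n_y, n_total):
    """IC = log2(O/E) with a 95% credibility interval.

    Sketch of one published shrinkage variant: 0.5 is added to the observed
    and expected counts, and the posterior of the observed-to-expected
    ratio is taken to be Gamma(shape = O + 0.5, rate = E + 0.5)."""
    expected = n_x * n_y / n_total                      # E = n_x * n_y / N
    ic = math.log2((n_xy + 0.5) / (expected + 0.5))
    lo = math.log2(gamma.ppf(0.025, a=n_xy + 0.5, scale=1 / (expected + 0.5)))
    hi = math.log2(gamma.ppf(0.975, a=n_xy + 0.5, scale=1 / (expected + 0.5)))
    return ic, lo, hi

# Hypothetical counts: 20 reports of the pair where only 2 would be expected.
ic, lo, hi = ic_with_interval(n_xy=20, n_x=100, n_y=200, n_total=10_000)
print(round(ic, 2), round(lo, 2), round(hi, 2))
```

In this example the lower interval endpoint exceeds zero, so the pair would be highlighted under the screening rule described above.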
In view of the observational nature of individual case reports, the fact that
bivariate measures like the IC consider only one drug and one ADR at the
time appears to leave much room for improvement. However, given the size
and complexity of these databases, any increase in methodological sophisti-
cation will need to tackle issues with computational complexity and interpre-
tation.
2.1.3 Causality evaluation
Attributing causality of an adverse effect to a drug is challenging. Since in-
dividual case reports are collected in an unsystematic manner, it is clear that
an unexpectedly high reporting rate does not per se imply a causal link be-
tween the drug and the ADR. Highlighted drug-ADR pairs, e.g. from dispro-
portionality screening, represent hypotheses on causality. These need to be
further evaluated manually prior to any subsequent synthesis, such as con-
sideration in a wider benefit-risk context.
Evaluations of causality typically consider clinical particulars on the re-
ports that speak in favour of a true relationship, such that no other agent or
co-morbidity is suspected; that the time relationship is suggestive; that the
reaction, if reversible by nature, abates upon drug removal, and possibly re-
emerges following re-exposure; or that supportive laboratory findings are at
hand [71]. However, even if no very strong index reports are available, sheer
numbers of well documented reports can be quite convincing if no other
possible explanation can be found [72]. In addition to the available reported
information, it is crucial to consider orthogonal information from other
sources in evaluating a possibly causal relationship [73]. For example, other
observational data, collected with or without controls, often need to be used.
This is because drugs are likely to be a minority cause for adverse reactions,
and controlled clinical trials usually lack statistical power to detect ADRs in
such situations. In practice this means that phenomenological as well as
probabilistic information will be needed for the evaluation.
2.2 Contributions
2.2.1 Regression as a basis for large-scale screening
2.2.1.1 Background and motivation
Several limitations have been identified with traditional disproportionality
analysis. To begin with, confounding is a major threat to disproportionality
analysis, at least theoretically. Confounding is a well known phenomenon in
statistical analysis of observational data, and has been discussed to some
extent also in the data mining literature [74-76]. A confounder is some vari-
able $z$ with direct associations to some other variables $x$ and $y$. Conse-
quently an apparent association arises between $x$ and $y$, which will be de-
tected by crude measures of disproportionality that disregard the impact of
other covariates. Of particular interest in I is the so-called innocent bystander
bias known in ADR surveillance [77], whereby a drug is wrongly implicated
with an ADR because this drug is excessively co-reported with another drug,
which, in turn, is directly associated with that same ADR.
Further, masking is a phenomenon which may distort measures that con-
trast observed frequencies to an expected value based on the marginal fre-
quencies of the constituent events. Hence lift and all traditional dispropor-
tionality measures used in ADR surveillance are susceptible. In this particu-
lar application, the issue is that the overall reporting rate of some ADR $y$,
$P(y)$, becomes elevated if one or more drugs are reported excessively with
that ADR. When this overall reporting rate is used as the reference rate for
other drugs, highlighting of associations may be delayed or altogether hin-
dered [78]. Looking at the ratio on the right hand side of Equation (1), the
denominator $P(x)\,P(y)$ is unduly inflated relative to the numerator $P(x \wedge y)$.
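A small numeric illustration of masking, with hypothetical counts: suppose some drug A is reported with an ADR twice as often as expected, and another drug then floods the database with reports of that same ADR.

```python
def lift_counts(n_xy, n_x, n_y, n_total):
    """Equation (1) expressed in report counts: N * n_xy / (n_x * n_y)."""
    return n_total * n_xy / (n_x * n_y)

# Hypothetical baseline: 100,000 reports in total; drug A and the ADR are
# co-reported twice as often as expected under independent reporting.
print(lift_counts(n_xy=10, n_x=500, n_y=1_000, n_total=100_000))   # 2.0

# Another drug now contributes 4,000 additional reports of the same ADR.
# Drug A's own reporting is unchanged, but the inflated marginal ADR
# count drags its lift below 1: the association is masked.
print(lift_counts(n_xy=10, n_x=500, n_y=5_000, n_total=104_000))   # 0.416
```

The masked association would no longer be highlighted by any measure that uses the overall ADR reporting rate as its reference, which is the motivation for estimating the background rate separately, as LLR does via its intercept.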
To overcome these limitations, I aimed to implement and evaluate shrink-
age regression as an alternative screening method in ADR surveillance, on
account of its theoretical benefits relative to disproportionality analysis.
2.2.1.2 Implementing lasso logistic regression for ADR surveillance
In regression, the effect of a given explanatory variable on the outcome vari-
able of interest is estimated conditional on all other explanatory variables
included in the model. Hence, in theory, by including all reported drugs as
explanatory variables in a regression model for the reporting of some ADR,
innocent bystander biases should be eliminated. Further, if the model in-
cludes an intercept corresponding to the background reporting of the ADR,
masking effects should be accounted for. For these reasons, objective (i) in
Section 1.3 calls for a feasible approach by which regression can be used in
ADR surveillance for large-scale screening of collections of individual case
reports.
Given the binary transaction data considered in ADR surveillance, the
starting point in I is the logistic regression model:
$$\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \alpha + \sum_{j=1}^{p} \beta_j x_j \tag{3}$$

where $x_j$ is a binary indicator for the presence of the $j$:th drug on the reports;
$\beta_j$ is the coefficient that measures the influence of the reporting of the $j$:th
drug on the reporting of the ADR; and $\alpha$ is the intercept. Shrinkage is then
added via the following constraint on the coefficients:

$$\sum_{j=1}^{p} |\beta_j| \leq t \tag{4}$$
A general advantage with using statistical shrinkage for very large regression
models is that numerical issues such as non-convergence and instability in
estimation are avoided [27]. The specific shrinkage induced by the constraint
in Equation (4) is called lasso, and typically results in a majority of coeffi-
cients being shrunk down to exactly zero [79].
The computational task considered in I is massive: about 2,000 models, one
per ADR, each with one explanatory variable per drug and one data point per
report in the database. These counts correspond to the numbers of available
ADRs, drugs, and reports in the VigiBase extract from mid 2007 that
was used in I. Six years later, in May 2013, the corresponding figures were
2,200 ADRs, 17,000 drugs, and 8.1 million reports. Given that the database-
wide screen in I required several weeks, this growth gives a flavour of some
of the challenges associated with computer-intensive methods such as
shrinkage regression in this context. For the analyses in I, and in subsequent
prospective VigiBase screens, we have used the lasso logistic regression
(LLR) algorithm developed by Genkin et al. [27]. While this has been suffi-
cient so far, future work may consider later developments of possibly more
efficient algorithms [80].
In a given model, the lasso shrinkage effectively dichotomises drugs into
those with and those without positive coefficients. The drugs in
the former group all have positive reporting correlations with the ADR
strong enough to withstand the shrinkage; therefore those drugs were con-
sidered highlighted with the ADR by LLR. To limit the computational bur-
den, the level of shrinkage was not optimised with respect to predictive per-
formance. Instead, it was pragmatically set so that LLR overall would yield
the same number of positive drug-ADR associations as the IC.
2.2.1.3 Empirical evaluation: results and conclusions
Adapting and implementing LLR for use in ADR surveillance in the manner
just described is one of two main contributions in I. The other is the extensive
empirical evaluation of LLR that was done relative to the IC, in three
parts: a comparison of the subsets of drug-ADR pairs highlighted by each
method at one specific point in time; an investigation of the stability over
time in the methods’ highlighted associations; and a retrospective analysis
relative to a set of established drug safety issues. Notably, the screening of
VigiBase presented in I was the first large-scale use of regression in ADR
surveillance. As it appears, it was even the first such use in binary transac-
tion data from any application.
The results from the empirical evaluation demonstrate that LLR in fact
brings added value relative to the IC, in particular by unmasking associations
and thus enabling earlier discovery. One example is the established link be-
tween the antidepressant drug venlafaxine and the ADR rhabdomyolysis, i.e.
degeneration of muscle fibres [81]. Figure I:5 shows how LLR highlights
this drug-ADR pair retrospectively already in 2001, while the IC leaves it
undiscovered at the end of the database extract in mid 2007. A virtue of the
analysis is the isolation of the unmasking effect achieved by comparing LLR
not only to the IC, but also to a modified LLR that is forced to use an inter-
cept equivalent to the crude background reporting rate $P(y)$. As seen in Fig-
ure I:5, this modified LLR conforms precisely to the IC, which demonstrates
that, in this example, the entire difference is due to unmasking by LLR rela-
tive to the IC. In the retrospective analysis against a set of established drug
safety issues, LLR offered earlier detection than the IC in 4 of 45 cases, two
of which could be attributed to unmasking (see Figure I:8).
Whereas it was clearly observed that LLR did adjust for confounding by
co-reported drugs, this effect was not as evidently linked to practical benefits
as unmasking was. A likely contributory explanation for this observation is the
nature of the retrospective evaluation, which focussed on timeliness of detec-
tion and only included positive test cases.
The overall conclusion from I is that while LLR does bring conceptual
and practical advantages, it should be used to complement rather than re-
place the routinely used disproportionality measures. This is primarily be-
cause detection with LLR was slower than with the IC for some issues, a few
of which may be explained by too strong adjustment for co-reported drugs
by LLR. Also, the empirical basis of some estimated regression coefficients
was opaque, which may make interpretation and communication with do-
main experts difficult. Finally, more work to improve computational effi-
ciency would be needed before LLR or another shrinkage regression ap-
proach could be relied upon as one’s sole screening method. Nonetheless, it
remains a very interesting alternative.
2.2.2 Uncertainty in causal relationships
II is a reflection paper that discusses the concept of causality in pharma-
covigilance, the role of available sources of evidence, and decisions on how
to manage a potentially causal relationship. As such it is a useful bridge from
I, which concerns the generation of causality hypotheses, to the rest of the
thesis, where considered adverse effects have been evaluated with respect to
causality at some point.
In II it is argued that a series of individual case reports pertaining to an at-
tribution hypothesis can carry high evidential value, depending on the clini-
cal particulars of the reports. The Bradford-Hill criteria [82] are essential in
such an assessment, and will typically require consideration of information
external to that available on the reports. Further, it is argued that the logic in
using formal studies to prove or disprove attribution hypotheses is flawed.
Such studies have an important role in quantifying the strength of associa-
tions, in particular among events that are not too rare. However, making a
judgement on the probability of an attribution hypothesis is a very different
matter that requires a much broader approach, in which individual case re-
ports constitute one important piece. The term ‘notional probability’ is intro-
duced to emphasise that the probability of an attribution hypothesis cannot
simply be measured the way that quantitative associations can.
A recurrent theme in II is the uncertainty that is always present for early
hypotheses that attribute an adverse effect to a drug, either because there is
very little and often weak evidence, or because there are conflicting types of
evidence of inherently different nature. II proposes a classification of hy-
potheses as either tentative or strong based on the magnitude of this uncer-
tainty as captured by the notional probability. It also discusses how the level
of uncertainty may influence decisions to communicate or initiate further
investigation. Some later preliminary work outside this thesis has built on
these ideas and considered how uncertainty with respect to causality can be
formally accounted for in benefit-risk assessment [83].
2.2.3 Real-world prospective use
The approach proposed in I has yielded tangible results in prospective
screening of VigiBase to discover new risks with marketed drugs. In an ex-
ploratory investigation of its practical usefulness, only those drug-ADR pairs
were considered that were highlighted as potentially causal associations by
LLR at that point in time, but never by routine IC-based screening. After
careful clinical assessment of a select subset of the outputted drug-ADR
pairs, including evaluation with respect to causality, four were deemed con-
vincing enough to be publicly disseminated as so-called signals. Three of
these are external to this thesis (see Table 1), and the fourth is presented in
III. Not only are these findings reassuring with respect to the practical value
of shrinkage regression in ADR surveillance, but they are also, as far as we
are aware, unique in pharmacovigilance. Logistic regression has been used
earlier to study reporting associations [25, 84], but never in a prospective
and open-ended context. See also Section 2.4.1.
Table 1. Prospective signals from screening VigiBase with LLR. Re-generated from Caster et al. [85].
Drug              Adverse reaction(s)                          Date of publication
Mometasone        Arrhythmia                                   August 2012 (WHO PN issue 2012-4)
Propylthiouracil  Stevens-Johnson syndrome, erythema           April 2013 (WHO PN issue 2013-2)
                  multiforme, epidermal necrolysis
Fluoxetine        Deafness                                     July 2013 (WHO PN issue 2013-3)
WHO PN = WHO Pharmaceuticals Newsletter, available at
www.who.int/medicines/publications/newsletter
Figure 2. Retrospective measures of association between methylprednisolone and hepatitis in VigiBase from 1986 to 2012. IC = information component; LLR = lasso logistic regression. Error bars for the IC correspond to 95% credibility intervals.
The initial discovery that led to III concerned the drug methylpredniso-
lone and the ADR term hepatitis. Figure 2 displays retrospectively generated
IC values and estimated coefficients for this drug-ADR pair over time.
Whereas LLR first highlighted the association in 2002, the IC has yet
to do so. The subsequent clinical assessment was not limited to the specific
term hepatitis, but used a broadened definition of hepatotoxicity. III reports
on the available data in VigiBase for methylprednisolone in relation to this
adverse effect. Because previous literature case reports have suggested a link
specifically between high-dose methylprednisolone and hepatotoxicity [86],
the analysis in III is stratified by dose.
The primary result is that high-dose methylprednisolone is reported quite
frequently with hepatotoxicity in VigiBase, including three reports that
strongly implicate causality. Several of the Bradford-Hill criteria discussed
in II are relevant, not least the strong temporal connections observed in these
reports: onset of hepatotoxicity occurred with reasonable delay after drug
dispensing, and it recurred with later re-exposure, in two of the patients sev-
eral times. Such time patterns are very unlikely to be due to chance. Further,
no other drugs were suspected to have caused the reactions, and for two of
the three patients other common causes of hepatic damage were ruled out by
laboratory testing. When adding to this evidence the coherence with previous
case reports in the literature [87, 88] and several plausible underlying
mechanisms [86], our judgment is that the hypothesis of hepatotoxicity being
attributable to high-dose methylprednisolone should be considered strong in
the terminology of II, even in the absence of formal studies. There may even
be dose dependency, given that hepatotoxicity was considerably less fre-
quently reported with low-dose methylprednisolone, and no convincing re-
ports were identified.
Communication of this strong causality hypothesis initially discovered
with regression-based screening makes a suitable endpoint to the contribu-
tions of the first part of this thesis. The analysis in III revealed that hepato-
toxicity in conjunction with methylprednisolone use often was reported as
serious, and sometimes had a fatal outcome. This circumstance, combined
with the fairly heavy burden of other adverse effects of corticosteroids
[89] and an identified need to investigate the benefit-risk profile of
methylprednisolone in MS relapse management [90, 91], triggered the subse-
quent analysis in VII (see Section 3.2.3).
2.3 Empirical appraisals
Section 1.3 calls for quantitative methods that support the benefit-risk as-
sessment process and that offer an improvement with respect to efficiency,
accuracy, and transparency. In relation to this overall aim, three specific
objectives were listed; the first of these is addressed here in Section 2 with
the adaptation and implementation of LLR for screening of collections of
individual case reports in ADR surveillance. A critical question from a sci-
entific perspective is whether sufficient and appropriate efforts have been
made to conclude whether the proposed approach contributes to fulfilling the
aim.
LLR as presented in this thesis has been empirically investigated on two
occasions: as a part of I (see Section 2.2.1.3) and in a prospective setting
after the publication of I (see Section 2.2.3). The former investigation con-
sisted of three different parts to be discussed in turn below, followed by a
discussion of the latter investigation.
2.3.1 Analysis of highlighted drug - adverse drug reaction pairs
In Section 3.1 of I, a series of analyses are performed that attempt to charac-
terise and quantify the differences between the respective outputs from LLR
and standard IC-based screening. The basic methodology is to use the two
methods in parallel to screen the entire VigiBase as it appeared at one spe-
cific point in time. Although these analyses are limited in the sense that they
consider only a single time point, the results provide useful baseline infor-
mation, such as the overlap in the two methods’ respective sets of high-
lighted drug-ADR pairs.
This section also contains mechanistic analyses that attempt to directly
measure the presumed LLR effects of adjusting for confounding by co-
reported drugs and unmasking. The methodology behind those analyses is
fairly involved and to some extent dependent on the particular regression
software used. The basic philosophy is to change some aspect of LLR
thought to be responsible for the effect of interest, and then to compare the
modified LLR both to standard LLR and to the IC. If the modified LLR be-
haves like the IC, but unlike standard LLR, this is indirect proof that the
aspect that was altered does bring about the effect of interest. While logically
sensible, this methodology has practical issues. In particular it is difficult to
tell when the modified LLR behaves like the IC, because the two methods
are so unlike each other. Although minor adjustments were made, such as
shifting the IC to the same logarithmic scale as the estimated coefficients,
it is likely that the mechanistic investigations would have been easier to carry out if
regression had been used indirectly in a propensity score-based approach.
(Cf. the discussion on the work by Tatonetti et al. [92] in Section 2.4.1.)
Nevertheless, examples such as that of venlafaxine and rhabdomyolysis (see
Section 2.2.1.3) provide quite convincing evidence, in this particular exam-
ple that masking does occur, and that LLR adjusts for it in the presumed
manner, i.e. by estimating the background reporting rate separately via the
model intercept.
2.3.2 Stability over time
As is clear for example from Figure 2, both the IC and LLR produce meas-
ures of association that fluctuate considerably over time. It could be sus-
pected that such fluctuations would be worse with a more sophisticated
method like LLR, which, if true, would speak against its practical usefulness.
This hypothesis is investigated in Section 3.2 of I: snapshots of VigiBase
as of 31st March 2005 and 30th June 2005 are recreated, and both LLR and
the IC are used for complete screening at both points in time. Then, for the
newly highlighted drug-ADR pairs on 30th June 2005 relative to 31st March
2005, it is ascertained what proportion are still highlighted two years
later, i.e. on 30th June 2007. Perhaps surprisingly, this proportion was the same
for both methods.
Clearly this methodology is simplistic in that it considers only three dif-
ferent points in time. This precludes identification of more complex temporal
trends. However, it was judged that the computational burden of screening
VigiBase at several additional time points and the severe challenges associ-
ated with estimating and interpreting temporal trends would offset the possi-
ble additional insights that could be gained.
2.3.3 Retrospective performance evaluation
The appraisals described thus far yield observations on various properties of
LLR, contrasted to those of a standard method. While these observations
provide practically useful information, they do not measure the really hard
endpoint, which is method performance. This endpoint can have multiple
meanings, but in this instance the features of accuracy and efficiency called
for in the aim offer useful guidance.
Section 3.3 of publication I contains one type of performance evaluation,
which focuses on timeliness of detection. A reference set was constructed by
extracting 45 historical drug safety issues from an external reference [93],
and LLR and the IC were retrospectively applied to VigiBase to determine at
what point in time they would have been able to highlight each of the drug-
ADR pairs in the reference set.
The results are shown in Figure I:8. The main finding is that the vast ma-
jority of the established safety issues were highlighted by both methods, with
small or no time differences. Where differences were seen, some were in
favour of LLR and some of the IC; among the former, unmasking was the
mechanistically confirmed cause in two cases.
The retrospective design of this evaluation has the obvious drawback
compared to a prospective study that it can only approximate real-world use
of the methods. For example, it is impossible to recreate the sharp clinical
assessments that normally follow after the screening phase, and that natu-
rally are performed without knowing whether a specific issue will become
established as a causal relation in the future. At the same time, the retrospec-
tive design is much more efficient, and has been used to benchmark per-
formance in pharmacovigilance for a long time [94]. Also, there are issues
with using clinical assessment as the reference when evaluating perform-
ance: inter-individual agreement can be poor [95], and it can be difficult for
assessors to isolate the causality aspect from other aspects such as clinical
significance in their classification of potential findings as true or false.
Rather, the main limitation of the performance evaluation presented in I is
the lack of negative controls in the reference set, which precludes any con-
clusions regarding specificity. This oversight is particularly troublesome in
the evaluation of a method like LLR, with its presumed virtue of being able
to adjust for confounding by co-reported drugs, and thereby to reduce the
rate of false positive findings. A complementary performance evaluation that
rectified this methodological issue would therefore be of great value; how-
ever, subsequent investigations by other researchers have to some extent
already filled this empirical void (see Section 2.4.1).
A concluding remark in relation to this evaluation is that the appropriate
methodology for benchmarking performance of hypothesis-generating meth-
ods to detect new drug risks is a constant subject of debate in pharmacovigi-
lance. Some of the issues have been brought up here, but interested readers
may consult e.g. Hauben and Norén for further details [21].
2.3.4 Exploratory investigation of prospective use
As mentioned in Section 2.2.3, LLR has been field-tested in a prospective
real-world setting. Some of its highlighted drug-ADR pairs in VigiBase at a
specific point in time were taken further for proper clinical assessment with
the intent of publicly disseminating any resulting signals.
The outcome of this exploratory investigation, as previously reported, was
the publication of four signals. In a pragmatic, result-oriented context this is
a very good outcome: it is tangible evidence that the method has the capabil-
ity of doing what it is intended to do. However, from a scientific standpoint
there are several limitations with this exploratory investigation that limit the
type of conclusions that can be drawn.
First, the evaluation is not fit to compare the respective performance of
different methods. It considers only drug-ADR pairs highlighted by LLR and
never highlighted by standard IC screening. The reason for this approach is
again pragmatic: it is of great interest whether a new method can detect sig-
nals overlooked by standard methods, and this rather drastic selection proc-
ess seeks to answer that question as promptly as possible. The second main
limitation is the lack of control. There is no proper record of the initial num-
ber of highlighted drug-ADR pairs, how many were initially reviewed and
why, and how many went on to end-stage clinical assessment.
On the whole, therefore, this investigation adds only limited empirically
based knowledge. It can be seen as a basic form of quality assurance: if no
signals at all had been detected, there would have been reason for concern.
2.3.5 Conclusions
Methods like LLR have the potential to spark off a full benefit-risk assess-
ment, but have no role to play in the process of jointly evaluating favourable
and unfavourable effects. Also, there is a required intermediate step of cau-
sality evaluation before a full benefit-risk assessment can be considered.
Therefore, referring back directly to the overall thesis aim in Section
1.3, it is difficult to see how the use of LLR instead of other screening meth-
ods in ADR surveillance could increase the transparency of the benefit-risk
assessment process. It might increase accuracy, insofar as the accuracy of the
screening step contributes to that of the overall process. But
foremost, it could clearly increase the efficiency by enabling earlier detec-
tion of real emerging risks.
Of all the empirical appraisals discussed, it is really only the retrospective
performance evaluation in Section 2.3.3 that forms an appropriate basis on
which to judge whether the use of LLR in ADR surveillance would make the
benefit-risk assessment process more efficient. The conclusion that can be
drawn from the results is largely the one already presented (see Section
2.2.1.3): because it demonstrably can unmask some associations and offer
earlier detection, LLR should be considered an interesting complement to
existing methods. The detection of the link between methylprednisolone and
hepatotoxicity in III is a good example of this, since it is very unlikely that
this issue would have become known to us by other routes, and therefore the
clinically important benefit-risk assessment in VII is directly attributable to
LLR as devised in I. However, a complete replacement of standard methods
by LLR is not mandated based on the available results from the retrospective
performance evaluation. In particular, some established drug safety issues
were detected later by LLR.
Two remarks are called for. First, the conclusions in publication I are to
some extent based on factors that have not been empirically appraised and
that therefore have not been brought up for discussion here. These include an
increased computational burden from using LLR, and a presumed difficulty
in communicating its output to clinical assessors. Whereas both could be
properly investigated, there is reason to believe that the former is a real issue
that would require substantial effort to mitigate. Secondly, these conclu-
sions are based strictly on the work presented in this thesis. There is later
work that is of direct relevance, which is discussed in Section 2.4.
2.4 Related work
2.4.1 Regression in adverse drug reaction surveillance
As mentioned earlier, logistic regression was proposed as a natural develop-
ment from standard disproportionality measures long before the publication
of I. However, it was used only sporadically, with a very limited set of pre-
dictors, and as a method for refined analysis of specific issues of interest
rather than a screening tool [25, 84]. Regression was also proposed as a suit-
able basis for drug-drug interaction surveillance [96, 97], although later
work has demonstrated its inferiority for that purpose relative to more tai-
lored methods [98].
Right after the publication of I, An et al. presented a shrinkage logistic
regression analysis in the Canadian national database of individual case re-
ports [99]. Although it is methodologically more sophisticated than LLR and
an interesting contribution, this analysis, like the earlier ones, resembles an
in-depth study of a specific issue more than a screening approach: only four ADRs
and six predictor variables were considered.
The use of standard (i.e. non-shrinkage) logistic regression in ADR sur-
veillance has recently been facilitated by its implementation into commercial
software [100]. In that sense regression should now be readily available as a
prospective tool, although the extent of its use is not publicly known. The
main disadvantage with non-shrinkage regression is that the number of ex-
planatory variables is restricted to a few hundred, which necessitates pre-
selection of drugs and other possible predictors [101]. On the other hand,
confidence intervals around estimated coefficients are much easier to obtain
than for example with LLR.
Two empirical comparisons between this implementation of logistic re-
gression and standard disproportionality analysis have recently been pub-
lished [101, 102]. The analysis by Berlin et al. was based on an internal
company database of roughly 900,000 individual case reports and a test set
comprising 14 drugs spread over about 600 drug-ADR pairs [102]. In con-
trast, Harpaz et al. compared methods in an extract from the national US
database with about 4.7 million reports, using a test set that contained 180
drugs and only four different ADRs [101]. In the former study no overall
improvement with logistic regression was observed, whereas the latter re-
ported a markedly higher predictive performance for regression than for
disproportionality analysis.
Another recent study by Tatonetti et al. utilised logistic regression indi-
rectly by matching reports on propensity scores prior to computing dispro-
portionality metrics [92]. Although slightly cumbersome, this approach has
the advantage of injecting the multivariate properties from regression into
whatever disproportionality metric the analyst desires to use, without chang-
ing its appearance. This study also included empirical comparisons of the
sort just discussed, wherein the regression-based method clearly outper-
formed traditional disproportionality analysis.
On the whole it is difficult to draw any firm conclusions regarding the
usefulness of regression as a screening tool in ADR surveillance. The pub-
lished evaluations are diverse both in methods and results, and they all
commit the common mistake of including as positive test cases known rather
than emerging drug safety issues, which makes it very difficult to draw con-
clusions about real-world performance [21, 85]. Also, practical issues such
as computational complexity are most often overlooked. Nonetheless, the
studies by Harpaz et al. and Tatonetti et al. clearly speak in favour of regres-
sion, and the conclusion from Section 2.3.5 that it offers an interesting alter-
native to traditional disproportionality analysis has, in the author’s opinion,
been reinforced rather than weakened.
2.4.2 Improved disproportionality analysis
Another strand of research that is relevant in relation to I is the modification
of disproportionality analysis to mitigate the effects of confounding and
masking. A general strategy to reduce confounding effects is adjustment by
stratification [103]. With this well known statistical technique, the database
is divided into subsets as defined by one or several categorical covariates.
The applied measure of association is then calculated separately for each
subset, and in the end all subset-specific values are combined into an overall
adjusted value. Stratification, for example on patient characteristics, has
been utilised for a long time in ADR surveillance [97], although its useful-
ness has been questioned on empirical grounds [104]. In any case, it is not a
feasible method to reduce confounding by co-reported drugs, since the re-
sulting number of strata quickly becomes unmanageable as the number of
considered covariate drugs is increased.
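As a toy illustration of the combination step described above, the Mantel-Haenszel estimator pools stratum-specific 2x2 tables into one adjusted odds ratio. This is one standard way to combine subset-specific values (the IC itself is combined differently in practice), and all counts below are invented:

```python
# Mantel-Haenszel pooling of stratum-specific 2x2 tables: one standard
# way to combine an association measure calculated separately per stratum.
# Each stratum: (a, b, c, d) = reports with drug+ADR, drug only,
# ADR only, neither. Counts are illustrative, not real data.

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

strata = [
    (20, 480, 100, 9400),  # e.g. patients under 65
    (30, 270, 200, 4500),  # e.g. patients 65 and over
]
print(round(mantel_haenszel_or(strata), 2))  # → 2.94
```

The example also hints at why stratifying on co-reported drugs is infeasible: with k binary drug covariates the number of strata grows as 2^k.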
Various methods to identify and adjust for masking have recently been
proposed [105-108]. Their general strategy is to measure, in some way, how
much a specific drug-ADR pair contributes to the reporting of the drug
and/or the ADR, and then to exclude all reports that contain drug-ADR pairs
with a too high contribution according to some threshold. Compared to re-
gression, this approach is coarser as it dichotomises drug-ADR pairs as ei-
ther too influential or not, and then removes all impact from the former and
none from the latter. Instead, regression attempts to estimate, for a given
ADR, how much of the reporting could be independently attributed to
each drug, and how much should be considered general background
reporting. On the other hand, disproportionality analysis adjusted for mask-
ing is computationally much more efficient.
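A minimal sketch of such a thresholding strategy (not any specific published method; the reports, drug names, and the 0.5 threshold are all invented):

```python
# Sketch of the coarse unmasking strategy described above: score how much
# each drug contributes to an ADR's reporting, then exclude reports that
# contain a drug whose contribution exceeds a threshold. All data invented.

from collections import Counter

reports = [
    {"drugs": {"drugA", "drugB"}, "adr": "hepatitis"},
    {"drugs": {"drugA"}, "adr": "hepatitis"},
    {"drugs": {"drugC"}, "adr": "hepatitis"},
    {"drugs": {"drugB"}, "adr": "rash"},
]

def unmask(reports, adr, threshold=0.5):
    adr_reports = [r for r in reports if r["adr"] == adr]
    contrib = Counter(d for r in adr_reports for d in r["drugs"])
    n = len(adr_reports)
    # Dichotomise: a drug is "too influential" if it appears on more than
    # the threshold share of the ADR's reports
    dominant = {d for d, c in contrib.items() if c / n > threshold}
    # Remove all impact from dominant drugs, none from the rest
    return [r for r in reports if not (r["adr"] == adr and r["drugs"] & dominant)]

print(len(unmask(reports, "hepatitis")))  # → 2
```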
The overall conclusion from these studies is that masking is generally
quite rare, but can have substantial impact on specific drugs and ADRs. This
goes against the results in I, where as many as half of all drug-ADR pairs
uniquely highlighted by LLR could be attributed to the crude background
reporting rate being higher than the corresponding LLR estimate.
The study by Juhlin et al. [107] is particularly relevant as a reference point
for I, since both are conducted in VigiBase. Using their simple unmasking
strategy in parallel to standard IC analysis, Juhlin et al. increased their total
number of disproportionately reported drug-ADR pairs by about 3%. In I,
LLR’s uniquely highlighted drug-ADR pairs attributable to a lower estimate
of background reporting amounted to an increase of 10% relative to standard
IC analysis. Interestingly, this general difference is very visible in the spe-
cific example venlafaxine-rhabdomyolysis, which was investigated also by
Juhlin et al.: retrospectively, this drug-ADR pair is highlighted by standard
IC in quarter 3, 2010; by IC adjusted for masking in quarter 1, 2007; and by
LLR, as reported in Section 2.2.1.3, in quarter 3, 2001. Whether these differ-
ences are due to regression over-adjusting for masking effects or the simpler
methods under-adjusting is very difficult to tell, though most likely the an-
swer is a combination. Which approach is most useful in practice is a matter
for future empirical investigation.
2.4.3 Other developments in adverse drug reaction surveillance
Other recent promising initiatives in ADR surveillance have moved away
from the exclusive focus on reporting rates that is inherent to both dispropor-
tionality analysis and regression as used in I and in the methods discussed in
Section 2.4.1. One example is screening based on deviating time-to-onset
patterns [109]. Of particular interest are approaches that combine several
different variables suggestive of a causal association into a common predic-
tive model. Not only does this make sense conceptually, but empirical results
have been encouraging [110-112]. An interesting prospect for the future
would be to wed such a method with LLR or another report-level regression
approach, since this might very well yield synergistic benefits.
2.4.4 Causality evaluation in pharmacovigilance
It is difficult to discuss work related to II, since it is conceptual and does not
present results from developing or applying new methods. Yet, the topic of
causality has been at the very heart of pharmacovigilance from its inception
and into the present day [71, 113, 114]. Drug litigations are one relatively
recent focus area within causality evaluation [115].
While II is primarily concerned with the general hypothesis of a drug’s
attribution to an adverse effect and the possibly diverse sources of evidence
attached to it, most of the previous work on causality has focussed on the
assessment of individual cases. One example is the so called Naranjo
method, where ten different questions are used to score an individual adverse
event with respect to its likelihood of being caused by the suspected drug
[113]. Some of the reasoning in II is traceable in the so called BARDI
method, which still applies to individual cases, but uses orthogonal and his-
torical information to form a prior probability of the drug causing the ad-
verse effect [116].
3 Drug benefit-risk assessment
3.1 Introduction
3.1.1 General overview
As mentioned in Section 1.1, what is commonly referred to as benefit-risk
assessment entails the joint evaluation of a drug’s beneficial and adverse
effects. This evaluation could be the natural extension to discovering a new
risk with the drug, which is a typical scenario in the post-marketing setting.
However, benefit-risk assessments could be conducted for many other rea-
sons and in many other contexts. Regulatory market entry decisions consti-
tute one important and common example; individual treatment decisions
another.
By their nature, benefit-risk assessments are tied to a particular intended
use of the drug. This is primarily because the nature of the sought benefit
may vary greatly with the indication, but also the effectiveness of the drug
may be different. The risks are likely to be similar in different indications,
unless the patient groups are different, or the drug is used differently, for
example with respect to dose, duration, or route of administration.
Benefit-risk assessments may adopt the perspective of a population or an
individual; which perspective is applicable will often be clear from the con-
text. Further, benefit-risk assessments may be comparative, which means
that other available treatment alternatives for the same indication are in-
cluded for comparison.
This thesis focuses on comparative population-level benefit-risk assess-
ment in the post-marketing setting of a newly discovered risk. However, the
contributions and discussions are often relevant in a broader sense, which is
quite natural. For example, an assessment at market entry is conceptually the
same as one post-marketing, although the latter may comprise more effects
and require more information to be processed. Similarly, a method that
works for comparative assessments should work also if alternative active
treatments are disregarded.
Finally, it may be worthwhile to re-iterate the intrinsically dynamic nature
of benefit-risk assessment: Any evaluation is predicated on the current state
of knowledge and is subject to change at any time. Not only may safety
problems emerge, but new evidence may be obtained as regards the drug’s
effectiveness in clinical real-world use. Further, other drugs may appear that
are more effective and/or safer.
3.1.2 Frequency and desirability of drug effects
When assessing a certain effect from a drug, whether positive or negative,
both the relative frequency and the desirability – or the undesirability – of
the effect are of profound importance. As regards frequency, it is for exam-
ple quite clear that, all else being equal, an antibiotic that treats an
infection in 75% of patients is better than one that is effective in only 50% of
patients. Desirability is less tangible but clearly depends on the intrinsic na-
ture of the effect – in essence to what extent it does good or bad – as well as
its duration. Trivially, it is preferable to contract a mild rather than a severe
headache as a side effect from drug treatment; also, one and the same ad-
verse effect, e.g. insomnia, can range from a mere nuisance if lasting for a
few days to a real handicap if persistent for months. For now we shall take
these bland statements as heuristic motivation to discuss what type of infor-
mation that may be available in relation to the frequency and desirability of a
given effect. However, there is also a theoretical basis for declaring them the
two primary dimensions of benefit-risk assessment (see Section 3.1.3). Reas-
suringly, already some of the earliest approaches to structured benefit-risk
assessment adhered to these principles, whether knowingly or not [19, 117].
The same holds true for competitive methods proposed more recently (see
Section 3.4.2).
Frequency is a very well known and important entity in drug investiga-
tions. Marketing applications are based on randomised clinical trials, so
called phase III trials, which measure the relative frequency of patients in the
drug and placebo arms meeting some defined efficacy endpoint [118]. Some-
times comparison is made to another drug rather than to placebo [119]. Be-
cause a fair portion of patients typically respond to treatment, these trials can
statistically demonstrate a difference in efficacy, if present, for the well de-
fined patient population considered. However, with typically less than 2,000
patients exposed during the pre-marketing trials, a quite common adverse
effect occurring at a rate of about 1 in 1,000 or less may not have been ob-
served at all [16, 117].
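The arithmetic behind this statement is straightforward: with per-patient risk p and n independently exposed patients, the probability of observing at least one case is 1 - (1 - p)^n. A short sketch using the figures from the text:

```python
# Probability of seeing at least one case of an adverse effect with
# per-patient risk p among n independently exposed trial patients.

def prob_at_least_one(p, n):
    return 1 - (1 - p) ** n

# An ADR at a rate of 1 in 1,000 with 2,000 patients exposed:
print(round(prob_at_least_one(0.001, 2000), 2))  # → 0.86
# i.e. roughly a 14% chance that not a single case is observed pre-marketing
```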
After marketing, frequency continues to be measured. So called phase IV
trials are often designed in a similar fashion to phase III trials, though they
are usually conducted in a regular clinical setting [118]. Such trials may be
large enough to detect and quantify some additional adverse effects. Fur-
thermore, in the post-marketing setting there is the possibility to conduct
observational studies, for example in healthcare utilisation databases [120].
Such studies can include a huge number of patients treated under real-life
conditions, and can be used to directly compare alternative drugs, both with
respect to effectiveness and risk. While these studies, by their size, are more
apt to measure the frequency of rare adverse effects than are randomised
clinical trials, they too are likely to be underpowered for many important
adverse effects [44]. Also, observational studies are not randomised, which
means that confounding must be accounted for to ensure validity of the re-
sults [120].
Desirability is fundamentally different from frequency in that it generally
involves subjectivity. Different people may value quite differently the
(un)desirability of, for example, living with cancer, experiencing rhabdo-
myolysis as an adverse reaction, or suffering from allergic rhinitis. The in-
formation basis for this less tangible dimension appears to be weaker than
that for frequency, and benefit-risk assessors may well incorporate their own
explicit or implicit value judgements [121]. Desirability is more extensively
studied in the area of health resource allocation [122], and large repositories
of health-related quality estimates are available [40, 41]. However, it is far
from clear how such information can be reliably included into benefit-risk
assessments, in particular as estimates vary greatly across elicitation meth-
ods and respondent characteristics [40, 42, 122]. Also, these estimates are
generally not derived in a context focused on adverse effects from drugs
with all their variations in nature and outcome. It can therefore be suspected
that values for the desirability of important drug effects will be unavailable
from time to time; this supposition is also supported by several examples
from published benefit-risk assessments (see Table 2). Targeted elicitation in
the relevant patient population with a specific benefit-risk assessment in
mind is clearly possible [38, 39], but requires time and resources.
A final remark is that some benefit-risk assessments consider only fre-
quency, for example by directly subtracting the number of excess events of
one harmful effect caused by the drug from the number of events of another
harmful effect prevented by the drug [123]. However, this is not advisable as
it implicitly assumes that the two effects are equally (un)desirable, which
they in general will not be. This approach can possibly be traced back to the
logic of the number needed to harm (NNH), which is the reciprocal of the
absolute risk increase associated with a drug, and its equivalent NNT, the
number needed to treat [124]. The NNH and NNT may be useful summary
metrics of frequency estimates, but the subtraction of one from the other is
not a viable benefit-risk assessment method [12].
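As a worked illustration of these summary metrics (the risk figures below are invented):

```python
# NNT and NNH as reciprocals of absolute risk differences between the
# treated and control groups. All rates are illustrative.

def nnt(risk_treated, risk_control):
    """Number needed to treat to prevent one event (benefit)."""
    return 1 / (risk_control - risk_treated)

def nnh(risk_treated, risk_control):
    """Number needed to harm: patients treated per one excess adverse event."""
    return 1 / (risk_treated - risk_control)

# The drug prevents the disease event in some patients...
print(round(nnt(0.10, 0.15)))  # → 20
# ...but causes an adverse effect in others:
print(round(nnh(0.02, 0.01)))  # → 100
# Subtracting one event count from the other, as criticised above, would
# implicitly treat these two quite different outcomes as equally undesirable.
```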
3.1.3 Decision analysis
Two common arguments for using structured and possibly quantitative
methods to support benefit-risk assessment are the prospects of increasing
transparency and consistency [15, 121]. Decision analysis is one such ap-
proach that has been proposed as generally useful for benefit-risk assessment
[12, 28, 33, 125]. Here are three arguments put forward to support this view:
Table 2. Examples of benefit-risk assessments where quantitative information on the
desirability of one or more drug effects is missing, requiring the use of substitute
values or ad hoc assumptions.

Assessment | Real effect(s) | Substitute effect or assumption
MCV4 vaccination for meningococcal disease [43] | Guillain-Barré syndrome in adolescents | MS in adults [126]
Natalizumab and interferon ß-1a in MS [35] | All side effects except PML, for both treatments | State of being treated with interferon ß-1a, based on description [127]
Natalizumab and interferon ß-1a in MS [35] | PML | Worst possible health state according to the Health Utilities Index III
Isoniazid in tuberculosis [36] | Hepatotoxicity, primarily non-serious(a) | Liver transplantation [128]
Alosetron in irritable bowel syndrome [34] | Adequate relief from treatment | Average of irritable and non-irritable bowel syndrome
Alosetron in irritable bowel syndrome [34] | Outpatient complication | Average of irritable bowel syndrome and hospitalisation
Raloxifene and tamoxifen in breast cancer prevention [129] | Invasive breast cancer, hip fracture, endometrial cancer, stroke, and pulmonary embolism | Weight 1.0(b)
Raloxifene and tamoxifen in breast cancer prevention [129] | In situ breast cancer and deep vein thrombosis | Weight 0.5(b)
Raloxifene and tamoxifen in breast cancer prevention [129] | Colles fracture, spine fracture, and cataract | Weight 0.0(b)

(a) The probability of this effect was a pooled estimate from three studies, where one reported
zero hospitalisations and zero fatalities [130], and another reported a single hospitalisation
and no sequelae [131]
(b) These weights are apparently arbitrary
MCV4 = meningococcal conjugate vaccine
MS = multiple sclerosis
PML = progressive multifocal leukoencephalopathy
- Benefit-risk assessments are often used to inform decisions: Should
a certain drug be allowed to enter or remain on the market, and
which, if any, out of several available drugs should a particular pa-
tient use? The ultimate purpose of decision analysis is to evaluate
complex decision problems.

- A core component of decision analysis is the structuring of prob-
lems. This seems useful given the diverse body of information pre-
sented in Section 3.1.2 that is typically at hand in benefit-risk
assessments. Particularly attractive is the decision-analytical decompo-
sition into probability and utility, which maps well to the benefit-risk
dimensions of frequency and desirability. (More elaboration on this
point follows further below.)

- Benefit-risk assessments are dependent on the context, for example
as seen in the distinction between population- and individual-level
evaluations. Decision analysis may well be general enough to ac-
commodate different contexts within a common framework.
Decision analysis encompasses both prescriptive theories of rational
choice and methods and tools to aid in a practical decision problem. In es-
sence, therefore, it is applied decision theory [132]. One important topic in
classical decision theory is the choice between actions whose respective
possible outcomes have (objectively or subjectively) determined probabili-
ties attached to them. If a decision-maker can assign utilities to the outcomes
such that they represent her preferences, then the all-dominating principle of
maximising expected utility can – and should – be used to choose an optimal
action. An alternative’s expected utility is its probability-weighted sum of
utilities over its possible outcomes. If there are multiple criteria in a deci-
sion, utilities can be constructed by assigning each criterion an importance
weight [133].
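A minimal sketch of the principle of maximising expected utility, with hypothetical alternatives, probabilities, and utilities:

```python
# Expected utility as the probability-weighted sum of utilities over an
# alternative's possible outcomes; the alternative with the highest
# expected utility is chosen. All numbers are invented.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs; probabilities sum to 1."""
    return sum(p * u for p, u in outcomes)

alternatives = {
    "drug":         [(0.75, 1.0), (0.20, 0.6), (0.05, 0.0)],  # cured / unchanged / serious ADR
    "no treatment": [(0.40, 1.0), (0.60, 0.6)],               # spontaneous recovery / unchanged
}

best = max(alternatives, key=lambda a: expected_utility(alternatives[a]))
print(best)  # → drug  (0.87 vs 0.76)
```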
Ramsey was the first to present a unified framework for probabilities and
utilities [134]. von Neumann and Morgenstern formalised this in the case of
objective probabilities [135], and Savage in the case of subjective probabili-
ties [136]. These theories start out with defining a set of axioms supposed to
represent a rational decision maker, from which follow the existence of utili-
ties, and subsequently the principle of maximising expected utility itself.
Notably, because probabilities and utilities suffice to compute expected util-
ity, the theory implies that information on these two dimensions is enough to
prescribe an action [12]. Relative frequencies are estimates of probability,
and desirability as discussed previously is really the same thing as utility,
since both should correspond to preferences for various outcomes. This
shows that frequency and desirability really are the two primary dimensions
of benefit-risk assessment, provided that desirability can be valued coher-
ently with respect to preferences. The terms ‘frequency’ and ‘desirability’
will be used for the general concepts, whereas ‘probability’ and ‘utility’ will
be used in decision-analytical contexts.
It should be stressed that expected utility theory requires one set of prob-
abilities for each considered action; functions of probabilities such as ratios
or differences are in general not sufficient. In our considered application,
probabilities either correspond to effectiveness or risks, and clearly the re-
quirement applies to both. However, when frequency is measured in relation
to drug effects, as discussed in Section 3.1.2, ratios or differences are com-
monly used for comparative purposes. Appendix A elaborates on this dis-
crepancy, using risk as the example of probability.
The construct of expected utility offers a first important decomposition of
decision problems: The probability that a given state of nature occurs is a
wholly separate entity from the utility of the consequences if that state of
nature were to occur. The wider discipline of decision analysis offers further
decomposition and other structural support to disentangle a decision prob-
lem. Key steps preceding probability and utility assignment and actual
evaluation include definition of the problem, definition of the objective(s),
identification of the viable alternatives, and structural modelling of the prob-
lem, for example graphically [137]. Often sensitivity analysis is mentioned
as a crucial step following the primary evaluation [138]. Computerised deci-
sion support systems that can aid in both structuring and evaluating a deci-
sion problem are readily available [139].
Influence diagrams and decision trees are the two principal types of
graphical models available to structure a decision problem [137]. This thesis
will only consider decision trees, which are acyclic graphical models with
nodes connected by edges. Decision nodes, chance nodes for uncertain
events, and terminal nodes for consequences are typically denoted by
squares, circles, and triangles, respectively. Edges stemming from chance
nodes are assigned probability variables, and consequence nodes are as-
signed utility variables. For a formal description of decision trees, see Sec-
tion 2 in IV. For some simple examples, see Figure 3 and Figure 4 in Ap-
pendix A.
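As a sketch of how such a tree can be evaluated (by the standard roll-back procedure: chance nodes average over their children, decision nodes pick the best child), using an invented two-alternative tree:

```python
# Minimal decision-tree evaluation by rolling back the tree: terminal
# nodes return their utility, chance nodes a probability-weighted average,
# decision nodes the maximum over their children. The tree is invented.

def evaluate(node):
    kind = node["type"]
    if kind == "terminal":
        return node["utility"]
    if kind == "chance":
        return sum(p * evaluate(child) for p, child in node["children"])
    if kind == "decision":
        return max(evaluate(child) for _, child in node["children"])
    raise ValueError(kind)

tree = {
    "type": "decision",
    "children": [
        ("treat", {"type": "chance", "children": [
            (0.9, {"type": "terminal", "utility": 1.0}),  # recovery
            (0.1, {"type": "terminal", "utility": 0.2}),  # adverse outcome
        ]}),
        ("do not treat", {"type": "chance", "children": [
            (0.5, {"type": "terminal", "utility": 1.0}),
            (0.5, {"type": "terminal", "utility": 0.4}),
        ]}),
    ],
}
print(round(evaluate(tree), 2))  # → 0.92 (the "treat" branch)
```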
3.1.4 Decision problems in benefit-risk assessment
If a benefit-risk assessment concerns the therapy of a single patient, the deci-
sion problem is clearly defined: What drug, if any, should that particular
patient choose? The objective is to maximise the individual patient’s health.
For an early example, see [140].
Since the ultimate objective of all drug use is to maximise patients’
health, essentially the same decision problem can be used for population-
level as for individual-level assessments. The analysis would then pertain to
a hypothetical patient representing the relevant population. If the available
information is granular enough to differentiate between patients according to
their characteristics, a series of corresponding analyses are needed.
This approach is used for all benefit-risk assessments presented in this
thesis, in V and VII. It implies that for policy decisions, where the real ques-
tion might be ‘Should this drug be allowed on the market with this specific
indication for use?’, it is actually a projected decision problem that is being
analysed. While such population inference based on analysis of patient-
centred decisions has been implicitly suggested on a number of occasions
[33, 141, 142], its general motivation and implications deserve more atten-
tion. One important logical implication is that benefit-risk assessments, also
at the population level, should primarily consider patients’ values and pref-
erences, and not those of e.g. policy makers or physicians. This draws on
significant philosophical issues outside the scope of this thesis. For example,
to what extent should patients be empowered to decide for themselves what
risks they are willing to take, and who should assume responsibility when
adversities do occur?
It should be noted that in this thesis, patient health is considered the only
relevant objective for benefit-risk assessment. An additional monetary objec-
tive is discussed in Appendix B.
Once the decision problem and its objective are set, the viable alternatives
should be identified. As a minimum, the drug of interest and the option not
to treat should be considered. The no-treatment option will implicitly intro-
duce the unmanaged disease as a reference point for the drug’s adverse ef-
fects, which corresponds well to common-sense principles. For example,
having vomiting as a common side effect is considered acceptable for che-
motherapeutic agents, but would be unacceptable for, say, a contraceptive
drug. Many newly introduced drugs have not been compared head to head
with another active treatment [119]. However, there are strong arguments
why benefit-risk assessments should be comparative nonetheless. In clinical
practice there are typically several drugs available for the same indication,
each with its own profile of benefit and risks; and further, each decision to
keep a drug off the market will require patients in need of treatment to use
some other drug, which suggests that those other drugs should be considered
in the decision [19, 120, 143]. This perspective is also emphasised by the
FDA in their newly developed approach to benefit-risk assessment [14].
As mentioned in Section 3.1.3, decision trees will be the only type of
structural model considered here. A discussion of this choice within the con-
text of the benefit-risk assessment application is deferred to Section 3.4.2. In
this application, a decision tree should aim to portray what could happen to
the patient following the different available alternatives. Uncertain events
such as onset of adverse effects or cessation of disease are included as
chance nodes with associated probability variables. Each branch depicts a
specific chain of events whose entirety makes up one of the possible clinical
outcomes; the consequence node in which the branch terminates associates
this clinical outcome with a utility variable. Two questions are of primary
interest: What uncertain events should be considered, and how should they
be modelled in a common structure?
It appears evident that some change in relation to the disease should be
included among the considered events. Prevention or alleviation of this dis-
ease will be accounted for as drug benefit. Apart from that, at least in a ma-
jority of cases, the only other relevant events will be adverse effects. Since
inclusion of all known adverse effects is likely to result in an unnecessarily
complex model, a key issue is to define a reasonable subset. One approach is
to limit inclusion to those that are thought to have a real impact on the bene-
fit-risk profile, e.g. the commonest and the most serious [117].
Once the set of relevant events has been identified, there will be several
possible ways in which to model them in a tree structure. Although this
should be an important design choice in the analysis, it appears to be rarely
discussed in the context of benefit-risk assessments. However, there are
guidelines for general medical decision problems, many of which are appli-
cable also in benefit-risk assessment [144]. It should be generally reasonable
to simplify as much as possible without compromising too severely the cor-
respondence to clinical reality. For example, rare events can often be mod-
elled as mutually exclusive.
3.1.5 Accommodating uncertainty in decision analysis
Once the model has been specified, probability and utility variables need to
be assigned values so that evaluation can proceed. Classical decision theory
requires exact specifications of probabilities and utilities, which often repre-
sents an idealised view of reality. The overview in Section 3.1.2 should
make it quite clear that uncertainty is a serious issue in the benefit-risk as-
sessment application. This suggests that approaches to accommodating un-
certainty in decision analysis need to be considered.
Ways to extend decision theory to allow for uncertainty in the form of
imprecise probability variables have long been a subject of research [145-149].
One of the most popular approaches is the use of interval probabilities,
whereby a decision maker can specify upper and lower bounds for probabil-
ity variables of a decision model [150-152]. Several methods for evaluating
decision problems with such input have been suggested [153-155], and a
decision support system that allows interval as well as qualitative constraints
for both probabilities and utilities has been developed [139]. The main prob-
lem with interval approaches is that aggregation of probabilities and utilities
results in expected utilities that also take the form of intervals. Since there is
no degree of belief attached to the intervals of the input variables, a resulting
expected utility difference interval covering zero can only be interpreted to
mean that either of the corresponding alternatives could be preferable, which
precludes discrimination. For benefit-risk assessment, the interval approach appears to
be too coarse. At least for some parameters of a decision problem, e.g. those
relating to frequency of beneficial outcomes, the uncertainty will typically be
better characterised than can be conveyed by an interval.
Probabilistic decision analysis is an alternative approach in which one
specifies probability distributions over the parameters of a decision problem.
Such distributions are far more informative than plain intervals and should
therefore significantly improve the chances of meaningful analysis. Interest-
ingly, one of the earliest applications of this approach was in the area of
medical decision making [50], and so called probabilistic sensitivity analysis
now appears to be commonplace in the medical domain [51, 52]. In its basic
form, it relies on specification of distributions for all available parameters,
followed by Monte Carlo simulation. Expected utilities are computed in each
iteration of the simulation, and the result is a simulated distribution over all
expected utilities and their differences. Intuitive inference from the evalua-
tion is provided by the respective proportions of all iterations in which the
different alternatives obtain the highest expected utility [33]; these propor-
tions are referred to as preference rates in V and VII.
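The simulation loop just described can be sketched in a few lines of Python. The two-alternative decision problem below, along with all distributions and numerical values, is entirely hypothetical and chosen only to illustrate the mechanics of computing preference rates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_iter = 10_000

# Hypothetical two-alternative problem (all distributions illustrative):
# 'treat' may prevent the disease outcome but may cause an adverse effect;
# 'no treatment' leaves the disease outcome as it is.
p_benefit = rng.beta(40, 10, n_iter)       # probability of benefit
p_adverse = rng.beta(2, 50, n_iter)        # probability of adverse effect
u_healthy = rng.uniform(0.8, 1.0, n_iter)  # utility: disease prevented
u_adverse = rng.uniform(0.0, 0.4, n_iter)  # utility: adverse effect
u_disease = rng.uniform(0.4, 0.7, n_iter)  # utility: untreated disease

# Expected utility of each alternative, computed once per iteration
# (the adverse effect is modelled as mutually exclusive with the others).
eu_treat = (p_adverse * u_adverse
            + (1 - p_adverse) * (p_benefit * u_healthy
                                 + (1 - p_benefit) * u_disease))
eu_none = u_disease

# Preference rate: proportion of iterations in which 'treat' obtains the
# highest expected utility.
pref_treat = float(np.mean(eu_treat > eu_none))
```

In a real assessment the tree structure, the parameter distributions, and the utility model would of course come from the evidence and elicitation discussed in the surrounding sections.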
The use of probabilistic decision analysis in the medical domain is gener-
ally pragmatic: most often standard distributions are employed, and key as-
pects are practical use and interpretability of results. The methodology itself
is continuously improving [156-158], also outside the medical domain [159].
The scientific community devoted to imprecise probabilities has re-
searched probabilistic decision analysis from another perspective, emphasis-
ing the mathematical foundations [160, 161]. Here, a probability variable
with its associated uncertainty captured in a probability distribution is re-
ferred to as a second-order probability, and more careful thought is given to
the task of modelling the input provided by a decision maker within such a
framework. Less attention is given to the practical aspects of evaluating de-
cision problems, and simulation techniques appear to be rarely used. How-
ever, exceptions do exist [162, 163].
There are accounts of uncertainty other than the probabilistic one, of
which fuzzy set theory [164] appears to be the best developed and most
widely applied. Notably, it has been successfully integrated with decision
analysis as a means to handle uncertainty [165, 166]. Fuzzy set theory has
also been used in medical applications [167], including some very specific
decision problems [168]. However, for the work in this thesis, probabilistic
decision analysis was found to be a superior general methodology for ac-
commodating uncertainty compared to fuzzy set theory and other alterna-
tives. The main reasons are:
- the support that comes with the well developed theory and computational machinery from mathematical statistics, easily available in programming languages for scientific computing;
- the wide acceptance that the probabilistic approach already has gained in the area of medical decision making; and
- the relative simplicity of the probability concept, which facilitates communication of analysis methodology and results with subject matter experts like regulators and physicians.
An illustration of the entire framework of probabilistic decision analysis
in the context of benefit-risk assessment is provided in Figure VII:1.
3.2 Contributions
3.2.1 Qualitative utility modelling
3.2.1.1 Background and motivation
Decision analysis with probability and utility variables specified as intervals
can readily accommodate qualitative utility statements [153]. However, prior
to IV, very little work had been done to incorporate the corresponding capa-
bility into probabilistic decision analysis. The only method of which we are
aware was designed for multi-criteria decision analysis (MCDA) preference
weights with equal uniform distributions and a complete ordinal ranking
among each other [163]. This was not, however, deemed appropriate for our
purposes, partly because complete rankings are likely to be too inflexible,
and partly because our intention was to use decision trees rather than
MCDA.
Our primary motivation for exploring the possibilities to merge qualita-
tive utility modelling with the quantitative framework of probabilistic deci-
sion analysis was to offer practically useful support in the benefit-risk as-
sessment application. As discussed in Section 1.2 and further in Section
3.1.2, numerical information on the desirability of drug effects can be unreli-
able or missing, which superficially implies that active elicitation studies
would be required for accurate assessment. However, qualitative relations
among the possible clinical outcomes of a benefit-risk assessment are likely
to be much easier and cheaper to obtain, and hence it would be very valuable
to be able to make use of them in the analysis. Indeed, this is the background
to the second objective stated in Section 1.3, and the motivation for the pro-
posed solution presented below. Obviously, other applications of decision
analysis could benefit from such a development as well.
3.2.1.2 Description of the technical challenge
The likely explanation for the lack of useful methods for qualitative utility
modelling within probabilistic decision analysis is the technical rather than
the conceptual difficulty. The core of the challenge lies in the fact that
qualitative relationships such as ‘utility u_i is greater than utility u_j’
induce dependencies among variables. This means that one can no longer simulate the
variables independently of each other, but simulation must be performed
from a joint, multivariate, distribution. Probability theory has a mature
framework for some types of dependence, for example as captured by the
covariance matrix in the multivariate Gaussian distribution. This fact has
been utilised to model dependence in MCDA models with uncertain inputs
[169]. However, the type of dependence that results from qualitative con-
straints is very different in nature, characterised by the fact that the multi-
variate densities are zero over vast regions of their supports.
Some notation is needed to more clearly describe the issue. Let U_i be the
random variable over the utility u_i prior to the introduction of any qualitative
constraints, and let Ũ_i be the corresponding random variable after all con-
straints involving u_i have been considered. In the simplest possible scenario,
all n utilities are related in a strict ordering such as u_1 < u_2 < … < u_n. (Here, the
indexing conforms to the ordering, for notational convenience.) In other
words, the resulting random variables are jointly defined in the following
way:

(Ũ_1, …, Ũ_n) ~ (U_1, …, U_n) | U_1 < U_2 < … < U_n    (5)
It is conceptually clear that, apart from a multiplicative scaling factor,
(Ũ_1, …, Ũ_n) should have the same probability density function as
(U_1, …, U_n) whenever the constraint u_1 < u_2 < … < u_n is fulfilled, and zero
otherwise. Rather, the issue is technical: how does one sample from the dis-
tribution of (Ũ_1, …, Ũ_n)? It is possible to sample from U_1, …, U_n separately,
since they are independent, and then retain samples that fulfil the constraint.
However, this approach, known as rejection sampling, becomes computation-
ally very inefficient unless n is low. The reason is simply that the constraint
becomes vanishingly improbable as n grows, and so one needs a large num-
ber of samples to retain just a single one for further analysis. For example, if
all n variables are equidistributed, the probability of the total order-
ing constraint is 1/n!. This probability is 1 in 120 if n = 5 and 1 in about 3.6
million if n = 10.
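The 1/n! acceptance rate is easy to confirm empirically. The following sketch simulates rejection sampling for a strict ordering of independent uniform variables; the function name and the trial count are illustrative choices.

```python
import math
import random

random.seed(1)

def rejection_acceptance(n, trials=200_000):
    """Empirical acceptance rate when sampling n independent U(0,1)
    values and retaining only draws satisfying u_1 < u_2 < ... < u_n."""
    accepted = 0
    for _ in range(trials):
        u = [random.random() for _ in range(n)]
        if all(u[i] < u[i + 1] for i in range(n - 1)):
            accepted += 1
    return accepted / trials

# The theoretical acceptance probability is 1/n! for equidistributed,
# independent continuous variables: 1/120 for n = 5.
rate_5 = rejection_acceptance(5)  # close to 1/120 ≈ 0.0083
```

Already at a dozen variables, essentially no draws would survive, which is what motivates the exact sampling methods described next.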
3.2.1.3 Proposed computational solutions
In IV, it is first shown that the constrained joint distribution exists and can be
characterised by a multidimensional integral. It is then demonstrated how
efficient sampling can be achieved in the case of a strict ordering where all
variables have the same uniform distribution. The proposed solution is to
tailor a generic technique called the inverse transformation method (ITM)
[170] to this particular situation.
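For the special case of a common U(0,1) distribution and a strict ordering, a standard inverse-transformation construction for uniform order statistics conveys the idea of exact, rejection-free sampling. This is a textbook route sketched under that assumption, not necessarily the exact algorithm of IV.

```python
import random

random.seed(2)

def ordered_uniforms(n):
    """Draw (u_(1), ..., u_(n)), the order statistics of n independent
    U(0,1) variables, by sequential inverse transformation: the maximum
    has CDF x^n, so u_(n) = V^(1/n) with V ~ U(0,1); conditional on
    u_(n), the remaining values are order statistics of n - 1 uniforms
    on (0, u_(n)), and so on down to the minimum."""
    u = [0.0] * n
    upper = 1.0
    for k in range(n, 0, -1):
        upper *= random.random() ** (1.0 / k)
        u[k - 1] = upper
    return u

sample = ordered_uniforms(5)  # five strictly increasing values in (0, 1)
```

Each call costs only n uniform draws, regardless of how improbable the ordering would be under rejection sampling.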
From this basic result, which in itself is only moderately useful, two or-
thogonal extensions are made. The first is also described in IV, and permits
the variables to have arbitrary uniform distributions, although they must
still be strictly ordered, at least group-wise. The proposed algorithm operates
basically by first splitting up the unit interval into smaller sub-intervals, and
then determining all possible configurations, i.e. all ways in which the vari-
ables can be distributed over these sub-intervals. Thereafter the joint distri-
bution is simulated as a mixture distribution over the permissible configura-
tions.
The second extension, which is two-fold, is introduced and applied in V.
Again assuming that all variables have the same uniform distribution, it is
described first how an arbitrary, though necessarily coherent, set of qualita-
tive relations can be allowed, and secondly how minimum differences in
utility can be introduced between individual variables or groups of variables.
The generalisation from strict orderings to arbitrary sets of relations is
practically very significant, since a benefit-risk assessment will typically
contain a fair number of clinical outcomes, which cannot in general be ex-
pected to be easily put in a strict preference order. Conceptually this devel-
opment is rather uncomplicated: each possible permutation of the variables
is either compatible or not with the set of relations, and clearly only the
compatible ones are of interest. Because the variables are equidistributed
and independent, every possible order, and therefore every possible permuta-
tion, is equally likely when randomly sampling from the unconstrained distribution. Conse-
quently one can, in every iteration, first sample values as if the variables
were strictly ordered (using ITM as presented in IV); then randomly select
one of the permissible permutations; and finally map the sampled values to
the variables according to the particular permutation selected. The main
computational issue relates to the identification of the permissible permuta-
tions. For example, the MCV4 vaccination case study in V contains twelve
utility variables, which means that there are about 480 million possible per-
mutations. Hence it is not feasible to make the random selection directly
from the set of all possible permutations and then test the randomly selected
permutation against the relations; rather the set of permissible permutations
must be constructed from the relations prior to random selection. In this ex-
ample, only 210 permutations actually conform to the relations displayed in
Figure V:7.
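The sample-then-permute scheme can be sketched as follows, using a hypothetical set of relations on four variables. For brevity this sketch filters all n! permutations, which is exactly what becomes infeasible for large n; a real implementation would generate the permissible permutations (the linear extensions) directly from the relations.

```python
import itertools
import random

random.seed(3)

# Hypothetical set of qualitative relations on four utility variables
# (indices 0..3), each pair (a, b) read as "utility a < utility b".
relations = [(0, 1), (0, 2), (1, 3), (2, 3)]

def permissible_permutations(n, relations):
    """Construct, once and up front, the set of permutations compatible
    with the relations; only these are sampled from afterwards."""
    result = []
    for perm in itertools.permutations(range(n)):
        rank = {var: pos for pos, var in enumerate(perm)}
        if all(rank[a] < rank[b] for a, b in relations):
            result.append(perm)
    return result

perms = permissible_permutations(4, relations)  # here: 2 of 24 permutations

def sample_utilities(perms, n):
    """One simulation iteration: draw strictly ordered values (by sorting
    n iid uniforms), pick a permissible permutation at random, and map the
    ordered values onto the variables according to that permutation."""
    ordered = sorted(random.random() for _ in range(n))
    perm = random.choice(perms)
    u = [0.0] * n
    for pos, var in enumerate(perm):
        u[var] = ordered[pos]
    return u
```

Every sampled vector satisfies all stated relations by construction, and every compatible ordering is equally likely, as argued above.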
The extension to allow minimum utility differences is not quite as signifi-
cant, but can to some extent compensate for the incapability of assigning the
variables different distributions. Computationally this development fol-
lows almost immediately from the core ITM-based algorithm presented in
IV; for details, please see Appendix B in V.
Overall this computational framework may appear quite rigid and there-
fore insufficient to support the practical application: effectively one must
choose either arbitrary uniform distributions in combination with a strict
ordering, as presented in IV, or a common uniform distribution for all vari-
ables combined with flexible relations and minimum utility differences, as in
V. However, as we shall see, the latter option works well in practice, at least
in the assessments performed to date. Interested readers are referred to Ap-
pendix C for a detailed technical discussion on alternative solutions and pos-
sible extensions.
3.2.1.4 Utility modelling in benefit-risk assessment
The computational framework from V with arbitrary qualitative relations and
minimum utility differences was applied in that same paper to three previ-
ously published benefit-risk assessments based on probabilistic decision
analysis: the antihistamines terfenadine and chlorpheniramine for the treat-
ment of allergic rhinitis [33], MCV4 vaccination against meningococcal
disease [43], and alosetron for the treatment of irritable bowel syndrome
[34]. The main aim of V was to investigate whether a framework of prob-
abilistic decision analysis that relies on qualitative utility modelling is suffi-
cient to reach conclusive results.
All utilities were initially assigned uniform distributions on the entire unit
interval. This design choice can be viewed pragmatically as a means to let
the qualitative utility modelling rather than the initial distributions be the
primary driver towards the resulting utility distributions. More theoreti-
cally, if we again view a benefit-risk assessment for a population as an
analysis of a treatment decision problem for a hypothetical representative
patient, the standard uniform distribution should be a reasonable representa-
tion of our ignorance regarding this patient’s numerical preferences for vari-
ous clinical outcomes. There is a close conceptual correspondence to non-
informative prior distributions as used in Bayesian statistics.
The main practical concern with relying on standard uniform distributions
for all utility variables is that the qualitative relations might be insufficient to
differentiate between outcomes of intrinsically different nature and desirabil-
ity. This is where the minimum utility differences have an important role to
play, which is usefully illustrated in the terfenadine/chlorpheniramine case
study: despite a small model (see Figure V:1) with few qualitative relations
(see Figure V:2), the utilities are well spread out in terms of marginal distri-
butions, as seen in Figure V:3a. However, they are evenly spread out, which
fails to recognise the inherent difference between stating, on the one hand,
that having allergic rhinitis is less desirable than not having it, and, on the
other hand, that a lethal ventricular tachyarrhythmia is less desirable than a
non-lethal. This deficiency is mitigated by including, in this model, a mini-
mum utility difference between lethal and non-lethal outcomes. The effect
on the resulting distributions is immediately clear, as shown in Figure V:3b.
Because the exact value of such a minimum utility difference cannot be
known to any greater extent than the actual utilities themselves, sensitivity
analysis is performed over the possible values. Figure V:5 shows how the
resulting preference rate for one of the alternatives in this model varies with
the minimum utility difference.
The general modelling strategy in V was to include only such relations
that are either logically implied or that could be considered widely agreed
upon from a clinical perspective. This reflects the use of a common model
for an entire population, and additional relations could be included if the
model were to be used by an individual patient in an actual treatment deci-
sion. While minimum utility differences between lethal and non-lethal out-
comes were used in all three case studies, the MCV4 vaccination assessment
also separated permanently disabling clinical outcomes from non-serious
clinical outcomes. The structural models as well as the distributions over
probability variables were the same as those used in the originally published
assessments. All three case studies were originally evaluated based on a
standard metric (QALYs; see Section 3.4.2.2), and a comparison was made
between the original results and the results obtained with our framework
based on qualitative utility modelling.
3.2.1.5 Results and conclusions
Three main conclusions were drawn based on the results in V:
- the proposed approach appears to be sound, since its results overall agreed with those of the standard reference method;
- the proposed approach can carry sufficient discriminative power, since in the MCV4 vaccination case study the results were conclusive; and
- the suggested sensitivity analysis for minimum utility differences can provide additional insights. Specifically, the results suggested that alosetron’s benefit-risk balance could be reversed in the case of high risk-aversiveness for lethal outcomes.
Clearly, three case studies is a rather limited basis on which to draw more
general conclusions. Nevertheless, the results in V were judged promising
enough to recommend the developed framework to be used in the benefit-
risk assessment of methylprednisolone in MS relapses, presented in VII.
For a detailed discussion of the empirical value of IV and V in relation to
the overall thesis aim, see Section 3.3.1.
3.2.2 Risk quantification for rare adverse effects
3.2.2.1 Background and motivation
As discussed in Section 3.1.2, how much one knows about the frequency of
an effect relevant to a benefit-risk assessment depends on the type of the
effect and the age of the drug. Naturally, the rarer the effect, the more diffi-
cult to quantify its frequency: as we know from Section 2, many adverse
effects have not even been discovered when a drug is first marketed. Unfor-
tunately, some rare adverse effects of importance may not even be quantifi-
able in very large observational studies [44]. Also, the standard pharma-
coepidemiological study design for rare outcomes, the case-control study, is
not sufficient to inform an overall benefit-risk assessment (see Appendix A).
Nonetheless, adverse effects detected in the post-marketing setting, of
which most are likely to be rare, can alter the benefit-risk balance of a drug
quite dramatically: complete market withdrawals for safety reasons are
unfortunately not uncommon [17]. This clearly suggests that benefit-risk as-
sessments in the post-marketing setting must be able to account for signifi-
cant rare adverse effects, which inevitably entails consideration of their fre-
quency. However, this issue has received little attention in recent benefit-risk
assessment method development, which may be due to a strong focus on the
initial market licensing decision [12, 15].
The importance of individual case reports for the detection of new drug
risks is unquestionable, as discussed in Section 2. At the same time, it is
universally acknowledged in pharmacovigilance that relative reporting rates
in collections of individual case reports cannot be interpreted in the same
way as incidences in formal studies. In other words, reporting rates are not
risk estimates. Interestingly though, several empirical investigations show
that individual case reports are by far the most frequently used source of
evidence in safety-related regulatory actions, such as market withdrawals of
drugs [46-49]. This suggests that individual case reports are attributed
quantitative risk information through some unarticulated and implicit process. As far
as we are aware, no-one has suggested what this information may encompass
or how it could be explicitly formulated. However, vague ideas have been
presented on how reporting rates could be combined with information on
drug exposure to yield lower limits on the true risk [171].
In summary, some rare adverse effects clearly need to be considered in
post-marketing benefit-risk assessments, although their risks cannot always
be quantified with currently available methods. Further, individual case re-
ports are tailored to the study of adverse effects from marketed drugs, and
have a documented impact on benefit-risk assessments in practice. Conse-
quently, in direct response to the third objective stated in Section 1.3, VI
aims to explore the boundaries for individual case reports as a source of in-
formation on the risk of adverse effects under exposure to drugs.
3.2.2.2 Risk limits derived from collections of individual case reports
In recognition of the impossibility to estimate risks directly from individual
case reports, an attempt was made in VI to formalise the idea of a lower risk
limit, and to complement that with an upper risk limit. This was achieved
with the aid of a theoretical model that relates drug exposures and adverse
events in the real world to adverse events reported and deposited in a data-
base (see Figure VI:1). This model contains a novel entity labelled ‘adverse
episodes’ which is supposed to capture the real-world equivalent of an indi-
vidual case report in terms of adverse events that are clustered in a temporal
and clinical sense.
In the notation of Equation (1), this model is used to derive the reporting
ratio as an upper limit for the total risk of adverse effect y following
exposure to drug x. This limit is valid under two assumptions: (i) the aver-
age number of adverse episodes following exposure to x is one or less; and
(ii) adverse episodes that follow x and contain y are more frequently re-
ported than adverse episodes in general that follow x. In addition, a lower
risk limit is derived as the ratio between the number of reports on y together
with x and the total number of exposures to x, where the latter must be col-
lected in an external data source. This limit is valid under the assumption
that exposures to x that are followed by y generate on average at most one
report on y together with x.
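Numerically, the two limits are simple ratios. In the sketch below the reporting ratio is taken as the share of reports on the drug that also list the adverse effect, which is one possible reading of the upper limit; all counts are invented purely for illustration.

```python
# Entirely hypothetical counts: drug x appears on 5_000 reports in the
# collection, 50 of which also list adverse effect y, and an external
# source estimates 2_000_000 exposures to x.
reports_x = 5_000
reports_x_and_y = 50
exposures_x = 2_000_000

# Upper limit: the reporting ratio (share of reports on x that also list y),
# valid under assumptions (i) and (ii) above.
upper_limit = reports_x_and_y / reports_x     # 0.01

# Lower limit: reports on y together with x per exposure to x.
lower_limit = reports_x_and_y / exposures_x   # 2.5e-05
```

The resulting interval [2.5e-05, 0.01] spans several orders of magnitude, which is exactly the situation addressed in Section 3.2.2.3.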
In VI there is limited empirical validation of the proposed approach, es-
sentially restricted to the single example of narcolepsy following Pandemrix
vaccination [172]. While proper validation is important work that should
follow, it must be realised that the derived limits and their assumptions are
logical consequences of the proposed model. This means that if one can trust
the model and reasonably argue in a given situation that the assumptions tied
specifically to the limits hold, then one can quite confidently apply this
framework in a real assessment, such as in VII. Also, any validation, how-
ever rigorous, cannot overcome the fundamental issue that this approach is
primarily relevant for adverse effects whose risks cannot be quantified with
other methods, and for those adverse effects there apparently is no available
reference to validate against.
For these reasons, a large proportion of VI is spent discussing the model
itself and the limit-specific assumptions. The model unrealistically presup-
poses that each adverse episode is reported at most once, and that each report
corresponds to an adverse episode that has actually occurred. However, it is
concluded that the phenomena that can disturb these assumptions, e.g. dupli-
cate reporting and misrepresentation of adverse episodes on reports, are rela-
tively small and manageable in practice. As regards the lower limit, this
should always be valid so long as the adverse effect is rare, which is the situation of interest
here. For the upper limit, finally, assumption (ii) introduced above should
hold generally if the adverse effect is serious in nature, which again matches our interests:
rare adverse effects that are not serious are very unlikely to be of any signifi-
cance in a benefit-risk assessment. Assumption (i) is less straightforward,
but is more likely to hold the shorter the duration of treatment and the
healthier the patient population. If there is doubt regarding its validity, vari-
ous supportive modifications are detailed in VI. An additional level of reas-
surance is provided by the fact that any margin of violation of either of the
assumptions (i) and (ii) can be compensated for by the other. (For further
elaboration on these assumptions, see Validity of the underlying assump-
tions, in publication VI.)
3.2.2.3 Applicability to probabilistic benefit-risk assessment
It may superficially appear contradictory to first argue in Section 3.1.5 for
the superiority of a probabilistic treatment of uncertainty relative to one
based on intervals, and then proceed to propose no more than an interval for
the risk of rare adverse effects. However, this view fails to recognise that the
various beneficial and adverse effects of a benefit-risk assessment differ
greatly with respect to how well their respective frequencies can be quanti-
fied: to dismiss the richer body of information available for the frequencies
of other effects simply because an interval is the best we can get for some
rare adverse effects would certainly be wasteful. In contrast, it appears more
reasonable to view the framework developed in VI as an improvement com-
pared to the previous inability to handle some rare adverse effects.
Nonetheless, some thought is required with respect to the practical use of
these intervals in probabilistic benefit-risk assessment. The clearly most
straightforward scenario is when the assessment is insensitive to the value of
the risk in question so long as it is contained within the derived interval. In
other words, if setting the risk to the lower limit provides the same answer
for the benefit-risk assessment as a whole as setting it to the upper limit, then
there is little point in trying to provide a reasonable distribution over the
interval.
If one finds that the assessment is in fact affected by shifting the risk from
its lower to its upper limit, the most reasonable approach is to carry out a
sensitivity analysis with a number of different distributions. The analysis can
be made more effective by narrowing down the possibilities from an initial
state of ignorance. For example, assume that the drug has been tested on
1000 patients in randomised clinical trials without a single occurrence of the
adverse effect in question, and that the upper risk limit derived from post-
marketing individual case reports is 0.01. If the risk were in fact as high as
0.01, the outcome of zero events in 1000 patients is highly unlikely: the
probability is below 0.0001. Clearly this argument disregards the substantial
differences between the controlled trial setting and clinical reality, but still it
is difficult to believe that the risk would be that high. In this example, even
half the upper limit, i.e. a risk of 0.005, is highly unlikely in view of the
clinical trial results, with a probability below 0.01. Therefore it seems rea-
sonable to focus on distributions centred closer to the lower than the upper
limit, while still permitting values as high as the upper limit to be sampled
with very low probability. See Figure VI:4 for some examples variably
skewed towards the lower limit.
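The probability statements above follow from the binomial model with zero observed events: the probability of observing no events among n independent patients at true risk r is (1 - r)^n.

```python
# Probability of observing zero occurrences among n independent patients
# when the true per-patient risk is r: (1 - r)^n.
def prob_zero_events(risk, n_patients=1000):
    return (1.0 - risk) ** n_patients

p_at_upper = prob_zero_events(0.01)   # ~4.3e-05, below 0.0001
p_at_half = prob_zero_events(0.005)   # ~6.7e-03, below 0.01
```

These are the two probabilities cited in the example, and the same calculation can be repeated for any candidate risk value when shaping the sensitivity-analysis distributions.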
A final remark is that the lower limit can sometimes be difficult to derive,
since it requires an external estimate of drug exposure. One must then make
do with the natural lower risk limit of zero. However, the reasoning above is
still valid.
3.2.3 A clinically significant benefit-risk assessment case study
3.2.3.1 Overview
The various contributions described thus far all have their own merits and
their own specific relevance to some aspect of benefit-risk assessment.
While several of them have more general relevance and applicability, the
main focus of this thesis is to portray them as a continuum of tools that can
be applied to provide support in the specific application of drug benefit-risk
assessment. The benefit-risk assessment of methylprednisolone in MS re-
lapses presented in VII is very useful in this respect, since all contributions
in this thesis point to this benefit-risk assessment in some way. A walk-
through of VII is useful also in the sense that it becomes quite clear which
aspects of the benefit-risk assessment process are not subject to investi-
gation elsewhere in the thesis.
Probabilistic decision analysis is the general framework chosen for bene-
fit-risk assessment in this thesis, and consequently the backbone of the as-
sessment in VII is outlined in Section 3.1.3. However, another fundamental
aspect is the initiation, i.e. the realisation that there even is a decision prob-
lem to evaluate. In this thesis, initiation is handled in Section 2: the method
of LLR is devised in I, leading to the discovery of a possible link between
methylprednisolone and hepatotoxicity; then, taking into consideration the
discussions in II, this issue is further evaluated, in particular with respect to
causality, and is later communicated in III. However, the details of any
process leading from a discovery such as that of methylprednisolone-induced
hepatotoxicity presented in III to the benefit-risk assessment in VII are
likely to differ from case to case. In this example it was a combination of
primarily two factors: the abundance of patients using methylprednisolone
for MS relapses among those affected by hepatotoxicity in the individual
case reports, and the articulated need for a proper assessment of methylpred-
nisolone in this indication [90, 91]. This latter point is worth emphasising, to
underline the clinical importance of the issue investigated in VII: while sev-
eral million people worldwide are affected by MS, the all-dominating treat-
ment to manage relapses has not yet been evaluated with respect to both
benefit and risk.
As mentioned in Section 3.1.4, the decision problem in a benefit-risk as-
sessment can be defined as the selection of treatment, if any, by a hypotheti-
cal patient who matches the considered indication for use, in this case MS
relapses. This can be further refined by more clearly specifying the charac-
teristics of this patient; this is done in VII for example by including both the
relapsing-remitting and the progressive forms of MS, while excluding optic
neuritis. The objective of the analysis is to maximise the patient’s health,
considering the possible clinical outcomes that may follow from choosing
each of the different alternatives. In this example, alternatives were included
on the basis of what is being used and recommended and what has been dis-
cussed as feasible alternatives [90, 91], but also pragmatically based on data
availability.
There are few recommendations on structuring decision problems for
benefit-risk assessments, as mentioned in Section 3.1.4. Even the more basic
problem of selecting beneficial and adverse effects to consider in the as-
sessment is very open, though guidelines suggest it should not be an exhaus-
tive selection [20]. The real issue is often the selection of adverse effects,
since the potential benefit is typically quite obvious and previously defined.
There are no contributions in this thesis that deal specifically with this prob-
lem, and the literature provides very little guidance. Therefore an approach
was developed to facilitate the assessment in VII (see Section 3.2.3.2).
Three types of probability variables are considered in VII: effectiveness,
i.e. the probability of the beneficial effect, risk of non-serious adverse ef-
fects, and risk of serious adverse effects. The two former were estimated
from clinical trial data using standard methods from Bayesian statistics.
However, for the latter no data were available that permitted the use of
common methods, which motivated the development in VI. Admittedly, only
upper limits could be computed for the serious risks in VII, but those limits
are very likely to be valid: (i) treatment is short-term in a moderately ill
patient population; and (ii) a high threshold was employed for seriousness,
which should ensure lower under-reporting of these events than is generally
the case for methylprednisolone. (Cf. assumptions (i)
and (ii) discussed in Section 3.2.2.2.) The issue that there is no natural distri-
bution to employ for the risk up to the upper limit was handled as part of the
sensitivity analysis, as outlined in Section 3.2.2.3.
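The Bayesian estimation used for effectiveness and non-serious risks can be illustrated with a standard Beta-binomial model. The Beta(1, 1) prior and the trial counts below are invented for illustration; they are not the priors or data used in VII:

```python
import random

random.seed(1)

# Beta-binomial posterior for a binary effect probability (standard Bayesian
# model; the uniform prior and trial counts are illustrative assumptions).
a, b = 1.0, 1.0      # Beta(1, 1) prior
k, n = 34, 50        # hypothetical trial: 34 of 50 patients improved

# The posterior is Beta(a + k, b + n - k); summarise it by sampling.
draws = sorted(random.betavariate(a + k, b + n - k) for _ in range(10_000))
mean = sum(draws) / len(draws)
lo, hi = draws[249], draws[9_749]    # empirical 95% credible interval
print(f"posterior mean {mean:.3f}, 95% CrI ({lo:.3f}, {hi:.3f})")
```

The wide posterior intervals mentioned later in this section arise in exactly this way when the trial counts are small.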
Finally, the utility variables considered in VII were completely handled
within the modelling framework developed in IV and V. Reassuringly this
was feasible both computationally and clinically, despite the considerably
greater complexity of the assessment in VII than those considered previ-
ously in V.
3.2.3.2 Structural modelling
The task of selecting adverse effects to consider in benefit-risk assessments
is certainly more challenging for corticosteroids like methylprednisolone
than for many other drugs, given their breadth of pharmacological effects,
good and bad [89, 173]. The approach in VII contains some ideas that might
be of general applicability, which merits a brief description. To put this work
into perspective, it is worth stressing that the available literature contains
some very simplistic approaches. One example is the assessment of natali-
zumab and interferon ß-1a in MS, where all possible adverse effects from
both treatments, with the sole exception of PML, were considered to be one
and the same [35]. (See also Table 2.)
In VII, no differentiation was made between the various non-serious ad-
verse effects, e.g. insomnia, dyspepsia, and transient hyperglycaemia. This
was to reduce complexity of the model while retaining the primary influence
from these non-serious adverse effects on the benefit-risk balance, namely
their aggregated burden. Serious adverse effects, on the other hand, were
included individually, chosen separately for the two active treatment alterna-
tives, high- and low-dose methylprednisolone. The selection was performed
by a clinical expert who reviewed the list of ADR terms reported with
methylprednisolone in VigiBase, tabulated in decreasing reporting frequency
for low- and high-dose methylprednisolone separately. Terms were prelimi-
narily mapped to serious adverse effects that were generated on the fly, until
ten effects had been created for each alternative. The adverse effects were
then carefully defined as groups of ADR terms.
Selecting adverse effects for this purpose based on individual case report-
ing is, as far as we are aware, not a concept that has been formalised before.
While such reporting is fraught with biases, it does reflect the concerns and
complications that have arisen in clinical practice. The number of included
serious adverse effects is clearly arbitrary, and in this example it was set
high to reflect methylprednisolone’s variety of significant adverse effects. In
another example, a lower number would probably have been sufficient.
Three different types of seriousness were considered as potentially occur-
ring for each of the serious adverse effects chosen: lethal reactions, episodes
of persistent disability, and life-threatening reactions. In recognition of the
qualitative differences between these, each combination of adverse effect
and serious outcome that was reported sufficiently often with methylpredni-
solone was considered an individual entity of the model.
As recommended [144], a symmetrical decision tree was constructed, so
that all considered effects were taken into account in the same way for all of
the alternatives. The first level contained a single chance node corresponding
to the binary beneficial effect. The second level contained one chance node
with one edge for each serious adverse effect, and one edge for the scenario
where no such adversity occurs. Hence, the serious adverse effects were
considered mutually exclusive. This reduces complexity considerably and
should be quite reasonable given the rarity of these adverse effects. The third
and final level contained one chance node for each serious adverse effect,
reflecting their respective possible outcomes. Only in the scenario without
any serious adverse effect was the possible contraction of a non-serious ad-
verse effect considered; this too is a simplifying approximation.
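The tree evaluation just described can be sketched in a few lines. All probabilities, adverse effects, and the utility function are invented for illustration; only the branch structure (mutually exclusive serious adverse effects, non-serious adverse effects possible only in their absence) follows the simplifications above:

```python
# Sketch of expected-utility evaluation over a symmetric three-level
# decision tree of the kind described above (all numbers invented).

def expected_utility(p_benefit, serious, p_nonserious, u):
    """Expected utility of one alternative; u(benefit, ae, outcome) gives
    the utility of a complete branch of the tree."""
    p_no_serious = 1.0 - sum(p for p, _ in serious.values())
    eu = 0.0
    for benefit, pb in ((True, p_benefit), (False, 1.0 - p_benefit)):
        # levels 2-3: one branch per serious AE, then its possible outcomes
        for ae, (p_ae, outcomes) in serious.items():
            for outcome, p_out in outcomes.items():
                eu += pb * p_ae * p_out * u(benefit, ae, outcome)
        # no serious AE: a non-serious AE may or may not occur
        eu += pb * p_no_serious * p_nonserious * u(benefit, None, "non-serious")
        eu += pb * p_no_serious * (1.0 - p_nonserious) * u(benefit, None, "none")
    return eu

def u(benefit, ae, outcome):
    """Invented branch utilities on a 0-1 scale."""
    base = 0.8 if benefit else 0.3
    if ae is not None:
        return base * (0.1 if outcome == "lethal" else 0.5)
    return base * (0.9 if outcome == "non-serious" else 1.0)

serious = {"hepatotoxicity": (0.001, {"recovery": 0.9, "lethal": 0.1})}
print(round(expected_utility(0.65, serious, 0.30, u), 4))
```

Because the serious adverse effects are modelled as mutually exclusive, the branch probabilities sum to one by construction, which keeps the evaluation a single pass over the tree.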
The full decision tree is displayed in Figure VII:3. It is worth noticing
that the discovery of hepatotoxicity in relation to methylprednisolone use,
presented in III, was not merely an eye-opener. Rather, this adverse effect
was a proper component of the actual benefit-risk assessment.
3.2.3.3 Results and conclusions
While the benefit-risk assessment of methylprednisolone in MS relapses
presented in VII fails to provide definitive clinical recommendations, it does
bring valuable insights. In particular, based on the data that are available
today, high-dose methylprednisolone clearly cannot be ruled out as a useful
treatment of MS relapses, whereas low-dose methylprednisolone effectively
can, at least in comparative terms. At the same time, the analysis discloses the
fact that there really is insufficient data to properly evaluate the low-dose
alternative; for example, its risk of non-serious adverse effects could not be
estimated directly from data. The analysis also shows that there generally is
substantial uncertainty with respect to all probability variables: for effective-
ness and non-serious risks this is manifested as wide posterior distributions,
and for serious risks as wide intervals for which the choice of overlying dis-
tributions has great impact on the overall results. All of this suggests that
more research, both clinical and pharmacoepidemiological, is needed to be
able to confidently establish the benefit-risk profile of methylprednisolone in
MS relapse management. Finally, the assessment sensibly suggests that the
benefit-risk profile becomes increasingly favourable with increasing severity
of the relapse.
Although not reported in VII, it may be of some interest within the con-
text of this thesis to provide an account of the influence specifically from
hepatotoxicity on the overall benefit-risk assessment. The most straightfor-
ward approach to evaluating this influence is clearly to carry out the entire
assessment without including hepatotoxicity among the adverse effects. The
observed impact from doing so is modest: the biggest change in any alterna-
tive’s preference rate in any of the investigated sensitivity analysis scenarios
is merely 4 percentage points, and the fraction of all scenarios where the
most preferred alternative changes is less than 2%. Whereas these results are
not surprising considering how small this risk of hepatotoxicity is likely to
be, they do point to important challenges for pharmacovigilance. Because
most signals concern risks of about this magnitude, and therefore are very
unlikely to pose a major threat to the overall benefit-risk balance, communi-
cations concerning new risks must be tailored accordingly.
3.3 Empirical appraisals
In direct analogy with Section 2.3, the following discussion seeks to review
the available empirical basis as regards the contribution from the proposed
methods to fulfilling the overall thesis aim. At issue in this section is the
approach for qualitative utility modelling (see Section 3.2.1) and the frame-
work for risk quantification in collections of individual case reports (see
Section 3.2.2), proposed in response to objectives (ii) and (iii) in Section 1.3,
respectively.
3.3.1 Qualitative utility modelling
Objective (ii) is to enable qualitative preference information with respect to
relevant drug effects to be incorporated into quantitative benefit-risk as-
sessment, which is factually achieved: the utility modelling in VII relies
entirely on the methods devised in IV and V. It is much more difficult to
claim, on an empirical basis, that these methods have facilitated a more effi-
cient, more accurate, or more transparent benefit-risk assessment.
Publication IV does not require much discussion since it does not contain
any empirical studies. It does include a small fictional example that high-
lights the increased potential for discriminative evaluation that comes with
the use of a probabilistic framework rather than one based on intervals.
However, this example is not of relevance with respect to the aim, and has
nothing to do with benefit-risk assessment.
As mentioned in Section 3.2.1, publication V contains a comparison be-
tween the proposed approach for qualitative utility modelling and an ap-
proach where utilities are calculated as QALYs. The comparison is based on
three publicly available benefit-risk assessments. To enable as fair as possi-
ble a comparison, the structures of the decision problems and the distribu-
tions for all probability variables were identical for the two approaches.
Overall there was substantial agreement, although in one of the case studies
the approach presented here identified the possibility of a reversed treatment
recommendation for highly risk-averse patients.
There are multiple issues with this comparison. Perhaps most serious is
the absence of a proper benchmark, which makes it very difficult to interpret
the results. In particular, is it advantageous for the method proposed here to
conform to the already available method? This issue points to a more fun-
damental concern when trying to evaluate benefit-risk assessment frame-
works, which is the near impossibility to construct a benchmark. The reason
is that the desirability dimension is largely subjective, which suggests that
accuracy for a benefit-risk assessment framework cannot be easily measured
on the basis of correct or incorrect recommendations. Rather, accuracy
hinges on much more subtle properties, such as including relevant informa-
tion, and processing that information in a sound and balanced manner. It
may be possible to evaluate such properties qualitatively by querying ana-
lysts about their experiences with different frameworks, although it is likely
to be very challenging.
Another issue with the comparison in V is the fact that even though con-
siderable effort was put into isolating the effect of accommodating qualita-
tive information, there is one other significant difference between the two
approaches. Whereas the use of QALYs relies on aggregating health state
utilities over time, and therefore essentially puts a fixed weight on the time
aspect, the approach devised in V assigns utility variables to composite
clinical outcomes, i.e. entire decision tree branches. This dissimilarity is
likely to be more important for the observed discrepancy in one of the case
studies than the dissimilarity with respect to using qualitative information.
(For more elaboration, see the Discussion in V. The issue is also brought up
in Section 3.4.2.2.)
Further, by design the comparison in V is unable to demonstrate some of
the benefits with the presented framework for qualitative utility modelling
that could be hoped for, in view of the motivation behind objective (ii). More
specifically, by using already published assessments, it becomes impossible
to show that our proposed method can offer quicker and cheaper assess-
ments. To do so we would have to apply our method in parallel to other
methods in a prospective setting. In practice this is a major challenge, for
several reasons. For example, it would be impossible to disregard experi-
ences gained from one method when turning to another. Also, any method
could probably be made quick and efficient by using whatever assumptions
that may be available (cf. Table 2), without being properly penalised in the
appraisal: even though the increase in efficiency from using questionable
assumptions could be costly in terms of accuracy, the latter aspect is very
difficult to measure, as discussed above.
To put the above discussion into perspective, one must not forget the ba-
sic fact that V contains a method comparison in the first place. In our experi-
ence this is a rare feature in the benefit-risk assessment domain.
In addition to the method comparison presented in V, the benefit-risk as-
sessment of methylprednisolone in MS relapses undertaken in VII has some
empirical value. It cannot in itself demonstrate that the devised method for
utility modelling offers transparency, accuracy, or efficiency in the benefit-
risk assessment process, but it shows that the method works in a real exam-
ple. As explained in Section 3.2.1.2, to combine probabilistic analysis with
qualitative information is a technical challenge, and computational complex-
ity will always be a concern. The decision problem considered in VII is
much larger than any of the case studies in V, so it was not a given that the
method would work. In addition, lots of valuable insights from practical use
were gained from the assessment in VII. Particularly clear was the require-
ment of having clinical expertise at hand for modelling purposes in large and
complex assessments.
In conclusion, there are features inherent to the qualitative utility model-
ling framework proposed in V that suggest this framework is likely to con-
tribute to the thesis aim, but that are difficult to demonstrate empirically. In
particular, in those cases where quantitative estimates of the desirability of
relevant drug effects are unreliable or missing, the proposed method should
be more accurate than methods making use of substitute values or ad hoc
assumptions. Also, it should be more efficient than approaches that set up
elicitation studies. Finally, the graphical display of qualitative relations (e.g.
in Figure VII:6) and the visualisation of the resulting utility distributions
(e.g. in Figure VII:7) should promote transparency.
3.3.2 Risk quantification for rare adverse effects
Objective (iii) is to elucidate what type of quantitative risk information for
rare adverse effects that may be available from collections of individual case
reports, and to specify how it could be extracted. As with the previous objec-
tive, the method that is devised – in this case the derivation of risk limits
from collections of individual case reports – clearly meets the objective.
However, the efforts accounted for in this thesis are insufficient to show that
the proposed method would bring advantages in terms of efficiency, accu-
racy, or transparency to the benefit-risk assessment process.
Indeed there are only two empirical investigations of this method alto-
gether: the example of Pandemrix-induced narcolepsy presented in VI, and
the use of the proposed method for the rare adverse effects in VII. Whereas
the former investigation does entail a comparison to an established method,
the restriction to a single drug-effect combination essentially precludes any
type of general conclusions. The latter investigation does not contain a com-
parison to another method of risk quantification, which is very natural: no
other risk estimates could be found for the adverse effects of interest. Nota-
bly this was the very reason for embarking on the approach in publication VI
in the first place. The application of this approach in VII did however pro-
vide valuable insights concerning the challenges of its accommodation
within a probabilistic framework (cf. Section 3.2.2.3). Specifically, the as-
sessment was sensitive to how the risks were distributed within their respec-
tive intervals, which might reflect their relatively low informational content.
Compared to the situation for the qualitative utility modelling framework
discussed in the previous section, there are somewhat greater prospects of
empirically evaluating the merits of the currently discussed method. The
main reason is that frequency, unlike desirability, is an objective entity in
this application domain. Consequently it is at least theoretically possible to
compare this method to more traditional methods of risk quantification, in
particular to learn more about the mechanics of the derived assumptions
under which the risk limits are valid. However, such investigations are far
from trivial, and some of the challenges are discussed under Limitations and
future directions, in publication VI.
Even though comparisons to other methods are considerably easier for
this approach than for the qualitative utility modelling framework, it is in
principle impossible to demonstrate its superiority. This is because the main
virtue with this approach to deriving quantitative risk information is its wide
applicability, which is anticipated to be far greater than that of other meth-
ods. Consequently one cannot construct a benchmark for the class of risks
where this newly proposed method is likely to be most useful.
In conclusion, the novel approach to quantifying the risk of adverse ef-
fects proposed in VI should be able to contribute to the thesis aim by facili-
tating benefit-risk assessments that would otherwise either be postponed or
require subjective guesswork with respect to some risks. In a sense, there-
fore, it should be able to improve both efficiency and accuracy of the bene-
fit-risk assessment process. However, the empirical investigations to date are
merely sufficient as preliminary quality assurance of the theoretical deriva-
tions. Studies to further validate the approach and learn about its strengths
and weaknesses are possible and should be conducted in the future. In
particular, it should be investigated how informative these risk intervals are.
3.4 Related work
3.4.1 Work related to specific contributions
3.4.1.1 Qualitative information in probabilistic decision analysis
As mentioned in Section 3.2.1.1, very little work that we are aware of has
dealt with qualitative information in probabilistic decision analysis. One
exception is the work by Tervonen et al. [163], which is not quite
applicable for comparison with IV and V since it is based on MCDA rather
than tree-based models. IV solves the problem of combining interval and
ordering constraints, while V permits arbitrary sets of qualitative relations
rather than complete orderings; neither of these features is available in [163].
A more recent method proposed by Kadziński and Tervonen, also in the
MCDA setting, does allow for arbitrary qualitative relations [174]. However,
relations are not applied to the criteria weights, but directly to a set of refer-
ence alternatives.
3.4.1.2 Risk quantification in collections of individual case reports
To the best of our knowledge, there are no other attempts to infer quantita-
tive risk information from individual case reports by a theoretically moti-
vated approach, as in VI. However, a few examples can be found of quanti-
fication on an ad hoc basis.
In some examples the number of reports is divided by an estimated num-
ber of patients, which is how the lower risk limit is computed in VI. How-
ever, the fact that there is under-reporting is not handled appropriately in
these examples. When estimating the risk of Guillain-Barré syndrome from
MCV4 vaccination in this way, an arbitrary factor of three was used to com-
pensate for under-reporting [43]; when estimating the risk of serious adverse
effects from glatiramer acetate, no adjustment at all was applied [32].
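These ad hoc computations can be made concrete in a few lines. This is the naive approach criticised here, not the theoretically motivated derivation in VI, and all counts are invented:

```python
# Ad hoc risk quantification from individual case reports (illustrative).

n_reports  = 12          # case reports of the adverse effect
n_patients = 400_000     # estimated number of exposed patients

# Reports divided by patients give a lower risk limit: under-reporting can
# only make the true risk higher, never lower (assuming valid reports).
risk_lower = n_reports / n_patients

# The MCV4 example [43] compensated for under-reporting with an arbitrary
# factor of three; the factor itself has no theoretical justification.
risk_adjusted = 3 * risk_lower

print(f"lower limit {risk_lower:.1e}, ad hoc adjusted {risk_adjusted:.1e}")
```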
In other examples, collections of individual case reports are used to esti-
mate conditional probabilities of various outcomes after contraction of some
adverse effect [33, 34]. Because all reports on the adverse effect with the
drug are used as denominator, this method is likely to overestimate the con-
ditional probability of serious outcomes, in particular death.
3.4.1.3 Benefit-risk assessment of methylprednisolone in MS relapses
There are a few published quantitative benefit-risk assessments for drugs
used in MS [32, 35]. However, unlike VII none of them targets relapse man-
agement, which is very different from maintenance therapy in MS. There are
a number of systematic reviews that investigate methylprednisolone in MS
relapses [90, 175, 176], but insofar as they contain quantitative analysis, it is
restricted to effectiveness. It is interesting that Miller et al. [90] reached
widely different conclusions regarding the comparative effectiveness of low-
and high-dose methylprednisolone from those in VII. The analyses differ with
respect to the considered endpoint: Miller et al. studied the mean difference
in the expanded disability status scale (EDSS) rather than the fraction of
patients with an improvement of at least one EDSS point. However, it seems
likely that it is rather the difference in included clinical trials that has driven
the divergence in results. Miller et al. included merely two trials for the
comparison between low- and high-dose methylprednisolone, one of which
reported very poor results for high-dose methylprednisolone: only 34% of
patients improved one EDSS step or more [177]. In VII, there are eight
groups of patients treated with high-dose methylprednisolone from seven
different studies, and none of them except the study just mentioned reports a
fraction below 50%. (See also Related work in publication VII.)
3.4.2 Other benefit-risk assessment frameworks
This thesis adopts a general framework of probabilistic decision analysis for
benefit-risk assessment. More precisely, this framework relies on decision
trees where utilities are tied to clinical outcomes that correspond to complete
decision tree branches, each with a specific chain of events. The below dis-
cussion focuses on other frameworks of similar general applicability (cf. the
requirements proposed in Section 1.2), and some methods or metrics that are of
relevance to such frameworks. Clearly this excludes a lot of proposed ap-
proaches, e.g. qualitative frameworks [178]. Again, interested readers are
referred to more general method reviews [12, 13, 15].
3.4.2.1 Multi-criteria decision analysis
It is an interesting observation that most of the generally applicable frame-
works for benefit-risk assessment are based on, or are related to, decision
analysis in some wider sense. MCDA is a common and important example
[28-32], which is generally applied to benefit-risk assessment in the follow-
ing way: Criteria, criteria scores, and criteria weights in MCDA correspond
roughly to consequence nodes, terminal probabilities, and utilities in deci-
sion trees. An example from a published MCDA model for benefit-risk as-
sessment of antidepressants can be used for illustration [30]. In this model,
the adverse effect nausea is one out of several criteria in the decision to
choose one of the available drugs. Further, for a given drug, the score on this
criterion is a linearly decreasing function of the probability to experience
nausea. Finally, the weight for nausea reflects the decision maker’s view on
the importance of this criterion to the decision, relative to the other available
criteria.
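The linear additive MCDA evaluation just described can be sketched as follows. The drugs, probabilities, and weights are invented for illustration, and the linearly decreasing score function follows the antidepressant model's treatment of nausea [30]:

```python
# Linear additive MCDA evaluation (all numbers invented).

weights = {"efficacy": 0.6, "nausea": 0.25, "insomnia": 0.15}  # sum to 1

drugs = {
    "drug A": {"efficacy": 0.70, "nausea": 0.20, "insomnia": 0.10},
    "drug B": {"efficacy": 0.55, "nausea": 0.05, "insomnia": 0.02},
}

# Worst (highest) probability per adverse-effect criterion, used to scale
# scores so they decrease linearly from 1 (no risk) to 0 (worst risk).
worst = {c: max(d[c] for d in drugs.values()) for c in ("nausea", "insomnia")}

def overall(d):
    score = weights["efficacy"] * d["efficacy"]   # benefit scored directly
    for c in ("nausea", "insomnia"):
        score += weights[c] * (1.0 - d[c] / worst[c])
    return score

for name, d in drugs.items():
    print(name, round(overall(d), 4))
```

Note that the criteria are simply listed and weighted: nothing in the sum reflects which combinations of outcomes can actually co-occur in a patient, which is the structural rigidity discussed below.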
The suggested correspondence between criteria weights in MCDA and
utilities in tree-based decision analysis is clearly plausible. For example, a
serious adverse effect would probably be deemed important and would
therefore yield a high weight in MCDA. At the same time it would be assigned
a low utility in a tree-based analysis. However, this anti-correlation is
certainly not perfect. In particular, the utility of a given consequence is unre-
lated to the difference in probability between the alternatives most and least
likely to yield that consequence. By contrast, criteria weights in MCDA
take into account the difference in score between the best and the worst al-
ternative.
An important difference between tree-based evaluation and MCDA, as
used in this application, is that the latter framework appears structurally
more rigid. Sets of criteria corresponding to relevant effects are merely listed
and assigned weights without consideration of possible scenarios in clinical
reality. For example, a non-serious adverse effect like headache will be
given a single weight, which fails to recognise that the significance of a
headache depends to a large extent on what else happens to the patient. As a
very trivial example, the headache will not matter if the patient dies from her
disease or from some other adverse effect. It is true that criteria can be mod-
elled in hierarchies [28, 29], but this is primarily intended to aid in weight
elicitation. A potential benefit of MCDA is that criteria which do not directly
map to clinical outcomes can be included, e.g. abuse potential. However,
such mixing of criteria might also make interpretation more difficult.
Among the published benefit-risk assessments based on MCDA, only the
approach by Tervonen et al. [30, 31] has the capability of accommodating
uncertainty. As a matter of fact, their work is based on [163], the probabilis-
tic method discussed in relation to qualitative utility modelling in Section
3.4.1.1.
3.4.2.2 The paradigm of quality-adjusted life years
A major paradigm in medical decision analysis is to measure time spent in
various health states. QALYs have been used for a long time, in particular in
the area of health economics [179]. The QALY is today the preferred metric
among major governmental bodies such as NICE in the UK that make deci-
sions on what drugs should be reimbursed, based on the health they bring
relative to their cost [51]. Within this paradigm, value is measured as quality
of health integrated over time. Quality can be measured in many different
ways, but most relevant here is when quality is the same as utility in the de-
cision-theoretic sense. In this case, QALYs themselves represent utility,
provided that a series of conditions are fulfilled [122]. Most significantly, the
utility of any given health state is assumed to be additive over time and in-
dependent of the past.
The QALY has been proposed as a generic measure for benefit-risk as-
sessment [33, 121, 142], and QALY-based evaluations can be considered
standard due to their widespread use [33-37, 43, 180]. Uncertainty is typi-
cally accommodated probabilistically, for example within the incremental
net-benefit framework [36, 37].
Acknowledging the popularity of the QALY, it is argued in V that the
concept of multiplying health state utilities by time is questionable in the
benefit-risk assessment application. Sometimes time may be considered con-
siderably less important than utility as reflected by the characteristics of
some health state, for example a possible adverse effect. Patients may be
very reluctant to take a risk of, say, being hospitalised due to acute liver
failure, even if followed by recovery in just two weeks. In the QALY para-
digm, a drug carrying a risk of an adverse effect such as that could never be
seriously penalised: Even if the state of being hospitalised due to acute liver
failure were given utility close to zero, this would subtract at most two qual-
ity-adjusted weeks, which is negligible if the evaluation has a time horizon
of a year or more. Therefore, the approach in V and VII is to allow utilities
to take time into account as well, which means that the importance of the
time dimension can be decreased or even increased depending on the situa-
tion. This approach has been characterised as theoretically superior but prac-
tically too complex [122]. Nevertheless, for the models considered in this
thesis it does work, and it appears to make a difference. In particular, the
results in V suggest a negative benefit-risk profile for alosetron given a suf-
ficiently high degree of risk aversion for lethal outcomes. Previous as-
sessments based on adjusted time failed to recognise this possibility [34,
181], and it is likely that the discrepancy can be explained by the different
approaches to time: QALYs simply cannot capture the notion that weighing
the risk of losing time due to death may sometimes be less appropriate than
weighing the risk of death as such.
An important aspect of QALYs is the simplification that follows from ex-
cluding the time dimension from the actual utilities, and instead accounting
for time by integrating over it. This enables the use of more complex and
more flexible structural models than decision trees, e.g. Markov models [35,
36]. A recent advancement is the use of so called discrete event simulation
(DES), where individual patients are simulated separately and their progres-
sion over time may be updated as often as weekly [37, 181, 182]. Event
probabilities are allowed to vary with time and patient characteristics, and
each patient gains QALYs over time depending on what happens in the
simulation. In the end, the total number of QALYs gained for a cohort of
patients with the same treatment is summed up and compared to that for
other treatments.
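A toy sketch of that scheme, with invented states, weekly updates, a time-varying event probability, and QALYs accrued per week; real DES models [37, 181, 182] are of course far richer:

```python
import random

random.seed(0)

def simulate_patient(weeks=52):
    """Simulate one patient week by week, accruing QALYs (toy model)."""
    qalys, state = 0.0, "well"
    for t in range(weeks):
        if state == "well" and random.random() < 0.002 + 0.00005 * t:
            state = "ill"          # event risk allowed to grow with time
        elif state == "ill" and random.random() < 0.10:
            state = "well"         # weekly recovery probability
        qalys += (1.0 if state == "well" else 0.4) / 52
    return qalys

# Sum QALYs over a simulated cohort, as one would per treatment arm.
cohort = [simulate_patient() for _ in range(1000)]
print("mean QALYs over one year:", round(sum(cohort) / len(cohort), 3))
```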
Generally speaking there is an increase in structural flexibility from
MCDA, through decision trees with composite outcomes (i.e. the approach
here), through decision trees with utility aggregation over time (e.g. with
QALYs), to Markov models, and finally to DES. With this flexibility fol-
lows more and stronger assumptions, more parameters to be estimated, more
design choices to be made, and increased computational burden. What
framework is preferable in a given situation will depend on many factors,
including the desired accuracy and the urgency of the situation [183].
3.4.2.3 Direct preference elicitation
Stated-choice elicitation has been proposed as a useful quantitative method
to estimate the desirability of drug effects in the patient population relevant
to a particular benefit-risk assessment [38, 39]. This is a generic method
strongly connected to decision theory with wide applicability in the health-
care sector [184].
In this approach, one typically first identifies the important attributes of a
benefit-risk assessment, e.g. the seriousness or duration of the disease, or the
probability of a particular adverse effect. Then, plausible levels for these
attributes are specified, and hypothetical drugs are generated by combining
the various levels of the attributes with each other. Patients are then asked to
choose between pairs of these hypothetical drugs, and regression methods
are used to estimate the effects of the different attributes on the way that
patients choose between drugs. These estimates can then be used to compute
e.g. the maximum risk of a certain adverse effect that patients are willing to
take to achieve some defined level of improvement. The estimates can also
be used as quality weights for health states, and the resulting metric is then
called relative value-adjusted life years [181].
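The design stage of such a study can be sketched in a few lines: attribute levels are crossed into hypothetical drug profiles, which are then paired for choice tasks. The attributes and levels here are invented, and the regression step on the recorded choices is omitted:

```python
from itertools import combinations, product
import random

random.seed(0)

# Invented attributes and levels; a real stated-choice study would derive
# these from the benefit-risk assessment at hand.
attributes = {
    "benefit":      ["moderate improvement", "large improvement"],
    "serious risk": ["1 in 10 000", "1 in 1 000"],
    "duration":     ["1 week", "4 weeks"],
}

# Every combination of levels defines one hypothetical drug profile.
profiles = [dict(zip(attributes, levels))
            for levels in product(*attributes.values())]

# Respondents choose within pairs of profiles; show a random subset.
for left, right in random.sample(list(combinations(profiles, 2)), k=3):
    print("choose between:", left, "and", right)

print(len(profiles), "profiles,",
      len(list(combinations(profiles, 2))), "possible pairs")
```

Even this tiny design yields 8 profiles and 28 possible pairs, which illustrates why the experiments are time-consuming for respondents.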
Stated-choice elicitation is interesting since it appears to capture patients'
actual preferences. On the other hand, enrolling patients is time-consuming
and costly, not to mention the time required to perform the actual experi-
ments. As argued repeatedly in this thesis, time may be a crucial factor in
some benefit-risk assessment scenarios, particularly in the post-marketing
setting, which is why the qualitative utility modelling approach in V was
developed. However, direct preference elicitation may well become increasingly
utilised in the future, especially if it can be performed already as
part of the pre-marketing investigations.
4 Conclusions and future directions
This thesis set out to devise quantitative methods to support the drug benefit-
risk assessment process. In line with the notion that medicines require con-
stant monitoring, regulation, and re-assessment, this process was defined to
entail the detection of new drug risks. Three generic and desirable properties
– efficiency, accuracy, and transparency – were deemed appropriate for the
appraisal of newly proposed methods. Likewise, three specific objectives
were defined, each pointing out an opportunity for method development that
could potentially enhance the benefit-risk assessment process with respect to
one or more of the listed properties. The first objective called for regression
to be used for first-pass screening in individual case reports to highlight pos-
sible new drug risks; the second called for a method by which qualitative
preference information could be incorporated into quantitative benefit-risk
assessment; and the third called for a formal approach to extract, if possible,
quantitative risk information from individual case reports.
In publication I, lasso logistic regression (LLR) is proposed as a novel
method for screening collections of individual case reports in adverse drug
reaction surveillance. The basis for this proposal is the presumed ability of LLR to handle issues such as masking and innocent-bystander bias that afflict disproportionality analysis, the standard screening approach. A retrospective performance evaluation in I showed that LLR can unmask associations relative to disproportionality analysis, thereby enabling earlier detection of new adverse effects and potentially increasing the efficiency of the benefit-risk assessment process. However, the evaluation could not demonstrate overall better performance for either method, which led to recommending LLR as an interesting complement to, rather than a replacement for, existing methods. External to this thesis, subsequent evaluations of regression-based
screening in collections of individual case reports have shown variable re-
sults, and further studies are warranted. Future work should also formally
investigate the computational feasibility of regression-based screening, and
explore the possibility of including it as one component of the recently de-
veloped umbrella methods that combine several different aspects of strength
of evidence.
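The screening idea can be sketched schematically. The data below are simulated (VigiBase is obviously not reproduced here), the drug and reaction structure is invented, and the penalty strength is an ad hoc choice rather than the tuning used in publication I; scikit-learn's L1-penalised solver stands in for the large-scale implementation.

```python
# Schematic sketch of regression-based screening: rows are simulated
# individual case reports, columns are drug indicators, and the outcome
# is whether a given adverse reaction was reported. Only drugs 0 and 1
# are truly associated with the reaction in this simulation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_reports, n_drugs = 5000, 50
X = (rng.random((n_reports, n_drugs)) < 0.05).astype(float)

logit = -3.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1]
y = (rng.random(n_reports) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# L1 (lasso) penalty: most coefficients are shrunk exactly to zero,
# and the surviving positive ones are highlighted for clinical review.
llr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
highlighted = np.flatnonzero(llr.coef_[0] > 0)
print("drug columns highlighted for review:", highlighted)
```

Because all drugs enter the model jointly, a strong true association cannot mask a weaker one in the way it can under drug-by-drug disproportionality analysis; this is the unmasking property evaluated in I.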
Of lesser empirical weight, but of considerable practical significance, are the four published drug-adverse effect signals that have resulted from prospective screening of VigiBase with LLR, followed by causality evaluation as discussed in II. Within the context of this thesis, the link between high-dose methylprednisolone and hepatotoxicity presented in III is particularly
relevant. This signal triggered the benefit-risk assessment of methylpredniso-
lone in multiple sclerosis relapses in publication VII, which provided a use-
ful case study for the methods proposed in response to the second and third
thesis objectives.
As regards the second objective, a need was identified for a method that
would permit qualitative relations between utilities of relevant drug effects
to be modelled flexibly within a probabilistic framework. It is apparent that
real-world benefit-risk assessments regularly lack quantitative information
on the desirability of important drug effects, which can be either quickly and
unsatisfactorily circumvented by ad hoc substitutions and assumptions, or
alternatively slowly and accurately addressed by a tailored elicitation study.
The approach devised in V, with strong support from the algorithms pre-
sented in IV, has the potential for quick and accurate evaluation by using
readily available qualitative information. In addition, both input and output
can be graphically displayed to enhance transparency.
Unfortunately it is very difficult to empirically demonstrate the postulated
virtues of this approach, in particular in scenarios where other methods are
infeasible. Nevertheless, publication V contains a comparison with an estab-
lished method, on the basis of three published benefit-risk assessments. The
results show overall good agreement between the methods, which provides
at least a basic form of quality assurance for the utility modelling framework
presented here. This framework was also successfully applied in VII, which
shows that it has the potential to computationally manage large and complex
benefit-risk assessments. Still, further assessments should follow, to expose
the framework to additional constellations of beneficial and adverse effects.
To facilitate its use, and to reduce the risk of errors, a pilot modelling tool
should be developed. This would also increase its accessibility to others, and thus the chances that the method comes into use elsewhere. Finally, this framework should ideally be used in parallel with other methods to gauge its strengths and weaknesses. Such an investigation might have to be qualitative in nature, based on the practical experiences of end-users.
The third objective originates from the very practical problem that the
risks of serious adverse effects included in the benefit-risk assessment in VII
could not be quantified. However, as argued previously, this reflects a more
general issue. The observation that individual case reports are demonstrably
very important in regulatory benefit-risk assessments motivated the attempt
to harness quantitative risk information from this generally abundant data
source. The proposal presented in VI is a framework where, under certain
well-defined assumptions, one can use reporting ratios from collections of
individual case reports as upper risk limits, and report counts divided by
external estimates of drug exposure as corresponding lower limits.
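Numerically, the bounds work as follows. The counts and the exposure figure below are made up, and the formal assumptions under which the bounds hold (stated in VI) are not restated here.

```python
# Illustrative risk interval in the spirit of publication VI, with
# made-up numbers. Upper limit: reporting ratio within the drug's
# reports; lower limit: report count over externally estimated exposure.
n_event_reports = 12      # reports of the adverse effect for the drug
n_drug_reports = 400      # all reports for the drug
n_exposed = 50_000        # external estimate of treated patients

upper = n_event_reports / n_drug_reports   # reporting ratio: 12/400 = 0.03
lower = n_event_reports / n_exposed        # 12/50000 = 0.00024

print(f"risk of the adverse effect bounded within [{lower:.5f}, {upper:.5f}]")
```

The interval is wide, which is consistent with the observation below that the derived intervals may carry relatively little information, yet it can still be enough to separate treatment alternatives in a probabilistic assessment.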
Whereas this framework has the potential to facilitate benefit-risk assess-
ments that would otherwise either be postponed or require subjective guesswork with respect to some risks, there are no investigations in this thesis capable of demonstrating its contribution towards the overall aim. Publication VI
contains a single real-world example, which can only really serve to rule out
major errors in the theoretical derivations. The application of this framework
to the benefit-risk assessment in VII was a very useful exercise to learn more
about its properties. It could be accommodated within a probabilistic as-
sessment, although the overall results were clearly sensitive to the distribu-
tion of the risks within their respective intervals. This may be an indication
that the derived intervals are relatively low in terms of informational content.
At the same time – despite all uncertainty – the assessment was able to pro-
vide clinically valuable conclusions. In particular, low-dose methylpredniso-
lone was effectively dismissed as a viable alternative based on the currently
available information.
Clearly the framework presented in VI needs to be further validated and
tested. It is conceptually dissimilar to traditional methods in pharmacoepi-
demiology and pharmacovigilance, and considerable practical experience
will be required to fully appreciate its merits and limitations.
A common denominator of all methods devised in this thesis is their po-
tential to speed up the benefit-risk assessment process. As mentioned several
times, this is crucial in the post-marketing setting where patients are already
exposed, and decisions are required promptly. At the same time, no assess-
ment will be quicker than its slowest component. During the work with VII,
several tasks were identified that could have been performed much more
efficiently if proper tools had been in place. Significant examples include
identification of effects to include in the assessment, definition of those ef-
fects, external data collection, and data processing. Future work should investigate which parts of the process could be automated, and
what other support could be developed to help streamline the benefit-risk
assessment process. Human involvement is a must, but it should be maxi-
mally supported by appropriate methods and tools, and it should be restricted
to tasks where it is irreplaceable.
Another fundamental insight from the work with this thesis is the severe
challenges inherent to objectively testing methods and frameworks in the
benefit-risk assessment application. This concern has also been expressed by
others. A major issue is that, roughly speaking, one half of each assessment is susceptible to subjectivity; further, prospective test cases are few and far between. Nonetheless, this is a very important topic for the future, considering the breadth of approaches that now exist. Given that more or
less each benefit-risk assessment is unique, it can be expected that no single
approach will be universally most useful. In this long-term endeavour, col-
laboration between different method contributors and potential end-users
may be the only possible way forward.
5 Acknowledgements and afterword
In my opinion, acknowledgment sections in PhD theses often make quite a
dull read. However, when faced with the challenge of writing one myself, I
admittedly got off my high horse more or less instantly. Who is to thank for
this thick book that bears my name only? In the end I decided to make the
list quite short, so do not despair if you were left out: it does not mean I do
not love you or care for you, it just means you did not directly contribute to
this thesis.
I wish to thank Kristin for being my most faithful supporter. The way you
have chosen to simply accept the chaos I have created around us is nothing
but amazing. It is no exaggeration to say that I would not have been able to complete this thesis without your support. I love you.
My parents and parents-in-law have been crucial during this endeavour,
not least during the past year. You have freed up so much valuable time for
me and continuously cared for my well-being. I cannot thank you enough.
Ralph, you are nothing short of an oracle, and to embark on this journey
together with you has been an absolutely fantastic experience. Your combi-
nation of wisdom, integrity, and humour is both remarkable and admirable.
Thanks for everything.
Love, you are one of the most interesting persons I have ever come
across, and it has been a pleasure working with you. Thank you so much for
supporting me and believing in me. Your ability to see solutions rather than
obstacles has been a true asset during all of these years.
Niklas and Andrew, you have been instrumental to my professional de-
velopment, without which this thesis would have been impossible. Thank you both
for your support and encouragement.
Thank you David for helping me with publication I. Similarly, thanks to
Anita and Ermelinda for your help with III.
Thanks Mats for your support and guidance during this time.
Finally I wish to thank you, whoever you may be and whenever you may
be reading this. Your attention in this very moment is the best praise I could
ask for. I am proud of this thesis, and if you have been inspired or enlight-
ened – or even upset – while reading it, all the effort I have put into it was
definitely worthwhile. And if you feel absolutely nothing, well, at least I got
my degree.
Ola Caster in Uppsala, February 2014
6 Appendices
A Decision analysis and the need for absolute risk
Quantification of drug risks, whether in randomised clinical trials or obser-
vational pharmacoepidemiological studies, generally measures relative risk,
either as a risk ratio or an odds ratio [185]. Indeed, for case-control studies
this is the only option since groups are not defined by their exposure.
Whereas relative risk is clearly a valid metric to study cause and effect
[186], the point is sometimes made that absolute risk is more useful for deci-
sion making [187]. I wish to emphasise two aspects of this latter statement.
First, it can be supported by coarse arguments that hinge on common sense:
for example, a lethal risk of 2 in 100,000 with a drug versus a background
risk of 1 in 100,000 is not particularly worrisome; however, if the numbers
were 2 in 100 and 1 in 100, respectively, the drug would most likely not be
allowed to exist, even though the relative risk is the same. The ultimate im-
plication is that relative risk does not in general leverage sufficient informa-
tion for decision making as in the context of benefit-risk assessment. A sim-
ple example to show this is provided in Figure 3, where a risk of 0.4 with the
drug and 0.2 without yields a recommendation not to treat, whereas if the
risks were instead 0.1 and 0.05, respectively, treatment would be preferable.
Again, the relative risk is the same in both scenarios.
Secondly, whereas risk differences may in some contexts offer additional insight compared to relative risks, they too generally contain less information than is required to compute expected utility, and hence to make decisions coherent with one's preferences. The implication is certainly a challenge for all concerned with benefit-risk assessment: for all relevant outcomes, we must consider the probability under each and every included alternative. An analogous
example to that for relative risk is provided for risk differences in Figure 4:
when the risk with the drug is 0.1 compared to 0.0 without, the preferred
action is to abstain from treatment. However, when the risks are instead 0.4
and 0.3, respectively, the recommendation is reversed even though the risk
difference is unchanged. As has been pointed out earlier, this suggests that
metrics based on risk differences, like the number needed to harm (NNH) and the number needed to treat (NNT), are not appropriate as
general-purpose benefit-risk assessment approaches [12]. Extensions to in-
corporate utilities into such metrics (e.g. [188]) do not overcome the funda-
mental issue associated with risk differences.
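The reversal described for the relative-risk example can be verified with a short expected-utility calculation. The four utility values below are hypothetical, chosen so that the computation reproduces the expected utilities quoted in the Figure 3 caption (0.79 vs 0.86, and 0.94 vs 0.935); the risks 0.4/0.2 and 0.1/0.05 are those from the text, and the relative risk equals 2 in both scenarios.

```python
# Expected-utility check of the relative-risk example in Appendix A.
# The utilities are hypothetical; the risks come from the text.
def expected_utility(p_adr, u_no_adr, u_adr):
    """Two-outcome lottery: adverse reaction occurs with probability p_adr."""
    return (1 - p_adr) * u_no_adr + p_adr * u_adr

U_DRUG_OK, U_DRUG_ADR = 0.99, 0.49   # treated: without / with the reaction
U_NONE_OK, U_NONE_ADR = 0.96, 0.46   # untreated: without / with the reaction

for p_drug, p_none in [(0.4, 0.2), (0.1, 0.05)]:
    eu_drug = expected_utility(p_drug, U_DRUG_OK, U_DRUG_ADR)
    eu_none = expected_utility(p_none, U_NONE_OK, U_NONE_ADR)
    choice = "Drug" if eu_drug > eu_none else "No treatment"
    print(f"risks {p_drug} vs {p_none}: EU {eu_drug:.3f} vs {eu_none:.3f} -> {choice}")
```

The first scenario yields expected utilities 0.790 vs 0.860 (preferring 'No treatment'), the second 0.940 vs 0.935 (preferring 'Drug'), illustrating that the same relative risk can support opposite recommendations.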
Figure 3. Fictional benefit-risk decision problem where either alternative can be made the preferred one without changing the relative risk for the adverse reaction. For example, if the risk of the adverse reaction is 0.4 with the drug and 0.2 without treatment, then 'Drug' and 'No treatment' obtain expected utilities of 0.79 and 0.86, respectively. However, if the risks are instead 0.1 and 0.05, then the corresponding values are 0.94 and 0.935, respectively. Although the relative risk is 2 in both scenarios, 'No treatment' is preferred in the former scenario and 'Drug' in the latter.
Figure 4. Fictional benefit-risk decision problem where either alternative can be made the preferred one without changing the risk difference for the adverse reaction. For example, if the risk of the adverse reaction is 0.1 with the drug and 0.0 without treatment, then 'Drug' and 'No treatment' obtain expected utilities of 0.897 and 0.9, respectively. However, if the risks are instead 0.4 and 0.3, then the corresponding values are 0.768 and 0.765, respectively. Although the risk difference is 0.1 in both scenarios, 'No treatment' is preferred in the former scenario and 'Drug' in the latter.
B Monetary objectives in benefit-risk assessment
For policy-level benefit-risk assessments, an objective additional to maxi-
mising patients’ health may be to spend money as efficiently as possible
within the healthcare system. For example, assume that drug A has a slightly
better benefit-risk profile than drug B (for the same indication), and so
would yield a slightly higher level of health for patients overall; at the same
time, drug B is considerably cheaper. It could be argued in this example that,
at the societal level, the small loss in health for this patient group that would
result from using drug B instead of drug A is acceptable given the monetary
resources that are saved for alternative spending in other parts of the health-
care system. Issues of this type have been a matter for scientific investigation
for a long time [179]. The discipline is now usually referred to as health
technology assessment and is practised, for example, by NICE in the UK [51].
However, the scope of this thesis is benefit-risk assessment of drugs, and
monetary objectives are considered to be outside of this scope. This is in line
with regulatory guidelines for benefit-risk assessment [20]. Nonetheless,
there are obvious advantages to using a benefit-risk assessment framework that can be adapted for health technology assessment. Although it is not empirically investigated in this thesis, such adaptation should be possible within the currently discussed framework by including a monetary dimension in the utility, as mentioned in Section 3.1.3.
C Technical notes on qualitative utility modelling
Here follows a brief discussion of the proposed computational solutions for
qualitative utility modelling in probabilistic decision analysis, presented in
IV and V and summarised in Section 3.2.1.3. The focus is on alternative
computational approaches and possible extensions; the notation follows that
of Section 3.2.1.3.
A first natural question is whether it would be possible to extend the
framework to other types of distributions, ideally in a way so that, for exam-
ple, one variable could follow a uniform distribution and another a triangular
distribution. This is interesting at least from a general decision-analytical
perspective, though not necessarily for the analyses considered in this thesis,
where the uniform distribution is a most reasonable choice (see Section
3.2.1.4). Technically, the restriction to uniform distributions is largely a result of
the choice to use ITM, since this technique puts high algebraic requirements
on the joint density function of the utility variables. In particular, it must factor into
a series of conditional density functions, each of which must have a corre-
sponding distribution function that is analytically invertible. All proposed
solutions in this thesis are based on a core algorithm of equidistributed uniform variables and a strict ordering. In this case the joint density function
and its factorisation are well known from the area of order statistics, and the
invertibility requirement is fulfilled. (The core algorithm is described in IV
as Algorithm 2 with distribution functions specified in Equation (7), and in
V as Algorithm 1.)
However, it is possible that other sampling methods could offer an extension to permit more informative distributions than the uniform.
Markov chain Monte Carlo (MCMC) sampling is a general-purpose sam-
pling approach widely used throughout statistics [189], and therefore an
obvious candidate. MCMC is really a family of methods, and exactly how it
could be used in this application is a matter of further investigation. The
Metropolis-Hastings algorithm [190, 191] is one of the most frequently used,
but it requires simultaneous updating in all dimensions and may not be fea-
sible for problems with many utility variables and/or an arbitrary set of
qualitative relations. Multiple try Metropolis is a more recent algorithm with
better performance for high dimensionality [192], though the challenge with
arbitrary relations is still significant. Gibbs sampling [193] and slice sam-
pling [194] both permit updating one variable at a time, which may miti-
gate the dimensionality issue. However, both methods presuppose support on
the entire sample space [170], which seems to preclude use in this applica-
tion.
Given that the core algorithm mentioned above is based on equidistrib-
uted uniform variables and a strict ordering, another question is whether
ITM really is needed. It would be conceptually simpler to fill an N × n matrix (N samples of n variables) with random samples from the common uniform distribution, and then sort that matrix row-wise. While both approaches require sampling of the same number of uniform random numbers, the subsequent computational requirements differ. The sorting approach amounts to sorting N vectors of length n, which has a time complexity of O(Nn log n) unless it is possible to utilise parallel computing. Using ITM, the sampling step is followed by computing Nn roots and performing on the order of Nn subtractions, additions, and multiplications. (See Equation (7) of IV for details.) This has a time complexity of O(Nn), and ITM should therefore generally be computationally preferable.
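The two computational routes can be sketched as follows. The ITM function implements one standard sequential scheme for uniform order statistics (sample the maximum as an n-th root, then scale down by successive k-th roots); it is meant only as an illustration of the comparison, not as the exact algorithm of IV.

```python
# Two ways to sample vectors of uniform(0,1) order statistics:
# sequential inverse transform (no sorting) versus sample-then-sort.
import numpy as np

def ordered_uniforms_itm(N, n, seed=None):
    """Draw N vectors of n uniform(0,1) order statistics, ascending,
    via the inverse transform method: the maximum is V**(1/n), and
    each next-smaller value multiplies the previous one by V**(1/k)."""
    rng = np.random.default_rng(seed)
    V = rng.random((N, n))
    U = np.empty((N, n))
    U[:, n - 1] = V[:, n - 1] ** (1.0 / n)       # the row maximum
    for k in range(n - 1, 0, -1):                # O(Nn) roots and products
        U[:, k - 1] = U[:, k] * V[:, k - 1] ** (1.0 / k)
    return U

def ordered_uniforms_sort(N, n, seed=None):
    """The conceptually simpler alternative: sample, then sort each
    row, at O(Nn log n) instead of O(Nn)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.random((N, n)), axis=1)
```

Both functions return rows that are non-decreasing, and the mean of the largest element in each row should be close to n/(n + 1), the expectation of the maximum of n uniforms.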
7 Bibliography
1. Roush SW, Murphy TV, Basket MM, Iskander JK, Moran JS, Seward JF, Wasley A. Historical comparisons of morbidity and mortality for vaccine-preventable diseases in the United States. Journal of the American Medical Association, 2007. 298(18):2155-2163.
2. Daneman D. Type 1 diabetes. Lancet, 2006. 367(9513):847-858. 3. Sterne JAC, Hernán MA, Ledergerber B, Tilling K, Weber R, Sendi
P, Rickenbach M, Robins JM, Egger M, Battegay M, Bernasconi E, Böni J, Bucher H, Bürgisser P, Cattacin S, Cavassini M, Dubs R, Elzi L, Erb P, Fantelli K, Fischer M, Flepp M, Fontana A, Francioli P, Furrer H, Gorgievski M, Günthard H, Hirschel B, Kaiser L, Kind C, Klimkait T, Lauper U, Opravil M, Paccaud F, Pantaleo G, Perrin L, Piffaretti JC, Rudin C, Schmid P, Schüpbach J, Speck R, Telenti A, Trkola A, Vernazza P, Yerly S. Long-term effectiveness of potent antiretroviral therapy in preventing AIDS and death: A prospective cohort study. Lancet, 2005. 366(9483):378-384.
4. Chabner BA, Roberts Jr TG. Chemotherapy and the war on cancer. Nature Reviews Cancer, 2005. 5(1):65-72.
5. Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized patients - A meta-analysis of prospective studies. Journal of the American Medical Association, 1998. 279(15):1200-1205.
6. Pirmohamed M, James S, Meakin S, Green C, Scott AK, Walley TJ, Farrar K, Park BK, Breckenridge AM. Adverse drug reactions as cause of admission to hospital: prospective analysis of 18,820 patients. British Medical Journal, 2004. 329(7456):15-19.
7. Department for Transport. Reported Road Casualties Great Britain: 2010. 2011; Available from: http://assets.dft.gov.uk/statistics/releases/road-accidents-and-safety-annual-report-2010/rrcgb2010-complete.pdf.
8. Vogel D. The globalization of pharmaceutical regulation. Governance - An International Journal of Policy and Administration, 1998. 11(1):1-22.
9. Hansson SO. Weighing risks and benefits. Topoi - An International Review of Philosophy, 2004. 23(2):145-152.
10. Yuan Z, Levitan B, Berlin JA. Benefit-risk assessment: To quantify or not to quantify, that is the question. Pharmacoepidemiology and Drug Safety, 2011. 20(6):653-656.
11. Leong J, McAuslane N, Walker S, Salek S. Is there a need for a universal benefit-risk assessment framework for medicines? Regulatory and industry perspectives. Pharmacoepidemiology and Drug Safety, 2013. 22(9):1004-1012.
12. European Medicines Agency (EMA). Benefit-risk methodology project - Work package 2 report: Applicability of current tools and processes for regulatory benefit-risk assessment. 2010; Available from: http://www.ema.europa.eu/docs/en_GB/document_library/Report/2010/10/WC500097750.pdf.
13. Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium (PROTECT). Review of methodologies for benefit and risk assessment of medication. 2013; Available from: http://www.imi-protect.eu/documents/ShahruletalReviewofmethodologiesforbenefitandriskassessmentofmedicationMay2013.pdf.
14. Food and Drug Administration (FDA). Structured Approach to Benefit-Risk Assessment in Drug Regulatory Decision-Making. 2013; Available from: http://www.fda.gov/downloads/ForIndustry/UserFees/PrescriptionDrugUserFee/UCM329758.pdf.
15. Guo JJ, Pandey S, Doyle J, Bian BY, Lis Y, Raisch DW. A Review of Quantitative Risk-Benefit Methodologies for Assessing Drug Safety and Efficacy - Report of the ISPOR Risk-Benefit Management Working Group. Value in Health, 2010. 13(5):657-666.
16. Evans SJW. Pharmacovigilance: a science or fielding emergencies? Statistics in Medicine, 2000. 19(23):3199-3209.
17. Wysowski DK, Swartz L. Adverse drug event surveillance and drug withdrawals in the united states, 1969-2002: The importance of reporting suspected reactions. Archives of Internal Medicine, 2005. 165(12):1363-1369.
18. CIOMS Working Group XIII. Practical Aspects of Signal Detection in Pharmacovigilance. 2010. CIOMS: Geneva, Switzerland.
19. CIOMS Working Group IV. Benefit-Risk Balance for Marketed Drugs: Evaluating Safety Signals. 1998. CIOMS: Geneva, Switzerland.
20. International Conference on Harmonisation (ICH). ICH harmonised tripartite guideline: Periodic Benefit-Risk Evaluation Report (PBRER). 2012; Available from: http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E2C/E2C_R2_Step4.pdf.
21. Hauben M, Noren GN. A Decade of Data Mining and Still Counting. Drug Safety, 2010. 33(7):527-534.
22. Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A Bayesian neural network method for adverse drug
reaction signal generation. European Journal of Clinical Pharmacology, 1998. 54(4):315-321.
23. Evans SJW, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiology and Drug Safety, 2001. 10(6):483-486.
24. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. American Statistician, 1999. 53(3):177-190.
25. Egberts ACG, Meyboom RHB, van Puijenbroek EP. Use of measures of disproportionality in pharmacovigilance - Three Dutch examples. Drug Safety, 2002. 25(6):453-458.
26. Hauben M, Madigan D, Gerrits CM, Walsh L, Van Puijenbroek EP. The role of data mining in pharmacovigilance. Expert Opinion on Drug Safety, 2005. 4(5):929-948.
27. Genkin A, Lewis DD, Madigan D. Large-scale bayesian logistic regression for text categorization. Technometrics, 2007. 49(3):291-304.
28. Mussen F, Salek S, Walker S. A quantitative approach to benefit-risk assessment of medicines-part 1: The development of a new model using multi-criteria decision analysis. Pharmacoepidemiology and Drug Safety, 2007. 16:S2-S15.
29. Felli JC, Noel RA, Cavazzoni PA. A Multiattribute Model for Evaluating the Benefit-Risk Profiles of Treatment Alternatives. Medical Decision Making, 2009. 29(1):104-115.
30. Tervonen T, van Valkenhoef G, Buskens E, Hillege HL, Postmus D. A stochastic multicriteria model for evidence-based decision making in drug benefit-risk analysis. Statistics in Medicine, 2011. 30(12):1419-1428.
31. Van Valkenhoef G, Tervonen T, Zhao J, De Brock B, Hillege HL, Postmus D. Multicriteria benefit-risk assessment using network meta-analysis. Journal of Clinical Epidemiology, 2012. 65(4):394-403.
32. Qizilbash N, Mendez I, Sanchez-de La Rosa R. Benefit-Risk Analysis of Glatiramer Acetate for Relapsing-Remitting and Clinically Isolated Syndrome Multiple Sclerosis. Clinical Therapeutics, 2012. 34(1):159-176.e5.
33. Hughes DA, Bayoumi AM, Pirmohamed M. Current assessment of risk-benefit by regulators: Is it time to introduce decision analyses? Clinical Pharmacology & Therapeutics, 2007. 82(2):123-127.
34. Ladabaum U. Safety, efficacy and costs of pharmacotherapy for functional gastrointestinal disorders: the case of alosetron and its implications. Alimentary Pharmacology & Therapeutics, 2003. 17(8):1021-1030.
35. Thompson JP, Noyes K, Dorsey ER, Schwid SR, Holloway RG. Quantitative risk-benefit analysis of natalizumab. Neurology, 2008. 71(5):357-64.
36. Sadatsafavi M, Marra C, Marra F, Moran O, Fitzgerald JM, Lynd L. A quantitative benefit-risk analysis of isoniazid for treatment of latent tuberculosis infection using incremental benefit framework. Value in Health, 2013. 16(1):66-75.
37. Lynd LD, Marra CA, Najafzadeh M, Sadatsafavi M. A quantitative evaluation of the regulatory assessment of the benefits and risks of rofecoxib relative to naproxen: An application of the incremental net-benefit framework. Pharmacoepidemiology and Drug Safety, 2010. 19(11):1172-1180.
38. Johnson FR, Hauber AB, Ozdemir S, Lynd L. Quantifying Women's Stated Benefit-Risk Trade-Off Preferences for IBS Treatment Outcomes. Value in Health, 2010. 13(4):418-423.
39. Johnson FR, Ozdemir S, Mansfield C, Hass S, Miller DW, Siegel CA, Sands BE. Crohn's disease patients' risk-benefit preferences: Serious adverse event risks versus treatment efficacy. Gastroenterology, 2007. 133(3):769-779.
40. Tengs TO, Wallace A. One thousand health-related quality-of-life estimates. Medical Care, 2000. 38(6):583-637.
41. Fryback DG, Dasbach EJ, Klein R, Klein BEK, Dorn N, Peterson K, Martin PA. The Beaver Dam health outcomes study - Initial catalog of health-state quality factors. Medical Decision Making, 1993. 13(2):89-102.
42. Salomon JA, Murray CJL. A multi-method approach to measuring health-state valuations. Health Economics, 2004. 13(3):281-290.
43. Cho B-H, Clark TA, Messonnier NE, Ortega-Sanchez IR, Weintraub E, Messonnier ML. MCV vaccination in the presence of vaccine-associated Guillain-Barré Syndrome risk: A decision analysis approach. Vaccine, 2010. 28(3):817-822.
44. Coloma PM, Trifirò G, Schuemie MJ, Gini R, Herings R, Hippisley-Cox J, Mazzaglia G, Picelli G, Corrao G, Pedersen L, van der Lei J, Sturkenboom M. Electronic healthcare databases for active drug safety surveillance: Is there enough leverage? Pharmacoepidemiology and Drug Safety, 2012. 21(6):611-621.
45. Yousry TA, Major EO, Ryschkewitsch C, Fahle G, Fischer S, Hou J, Curfman B, Miszkiel K, Mueller-Lenke N, Sanchez E, Barkhof F, Radue E-W, Jäger HR, Clifford DB. Evaluation of Patients Treated with Natalizumab for Progressive Multifocal Leukoencephalopathy. New England Journal of Medicine, 2006. 354(9):924-933.
46. Arnaiz JA, Carne X, Codina C, Ribas J, Trilla A. The use of evidence in pharmacovigilance - Case reports as the reference source for drug withdrawals. European Journal of Clinical Pharmacology, 2001. 57(1):89-91.
47. Clarke A, Deeks JJ, Shakir SAW. An assessment of the publicly disseminated evidence of safety used in decisions to withdraw medicinal products from the UK and US markets. Drug Safety, 2006. 29(2):175-181.
48. Olivier P, Montastruc JL. The nature of the scientific evidence leading to drug withdrawals for pharmacovigilance reasons in France. Pharmacoepidemiology and Drug Safety, 2006. 15(11):808-812.
49. Alves C, Macedo A, Marques F. Sources of information used by regulatory agencies on the generation of drug safety alerts. European Journal of Clinical Pharmacology, 2013. 69(12):2083-2094.
50. Doubilet P, Begg CB, Weinstein MC, Braun P, McNeill BJ. Probabilistic sensitivity analysis using Monte Carlo simulation: A practical approach. Medical Decision Making, 1985. 5(2):157-177.
51. National Institute for Clinical Excellence (NICE). Guide to the methods of technology appraisal. 2008; Available from: http://www.nice.org.uk/media/b52/a7/tamethodsguideupdatedjune2008.pdf.
52. Claxton K, Sculpher M, McCabe C, Briggs A, Akehurst R, Buxton M, Brazier J, O'Hagan T. Probabilistic sensitivity analysis for NICE technology assessment: not an optional extra. Health Economics, 2005. 14(4):339-347.
53. Finney DJ. An International Drug Safety Program. The Journal of New Drugs., 1963. 3:262-265.
54. Rawlins MD. Spontaneous reporting of adverse drug reactions II: Uses. British Journal of Clinical Pharmacology, 1988. 26(1):7-11.
55. Brown JS, Kulldorff M, Chan KA, Davis RL, Graham D, Pettus PT, Andrade SE, Raebel MA, Herrinton L, Roblin D, Boudreau D, Smith D, Gurwitz JH, Gunter MJ, Platt R. Early detection of adverse drug events within population-based health networks: application of sequential testing methods. Pharmacoepidemiology and Drug Safety, 2007. 16(12):1275-1284.
56. Hocine MN, Musonda P, Andrews NJ, Paddy Farrington C. Sequential case series analysis for pharmacovigilance. Journal of the Royal Statistical Society Series a-Statistics in Society, 2009. 172:213-236.
57. Noren GN, Hopstadius J, Bate A, Star K, Edwards IR. Temporal pattern discovery in longitudinal electronic patient records. Data Mining and Knowledge Discovery, 2010. 20(3):361-387.
58. Coulter DM. The New Zealand Intensive Medicines Monitoring Programme. Pharmacoepidemiology and Drug Safety, 1998. 7(2):79-90.
59. Mann RD. Prescription-event monitoring - recent progress and future horizons. British Journal of Clinical Pharmacology, 1998. 46(3):195-201.
60. Wise L, Parkinson J, Raine J, Breckenridge A. New approaches to drug safety: a pharmacovigilance tool kit. Nature Reviews Drug Discovery, 2009. 8(10):779-82.
61. Noren GN, Edwards IR. Modern methods of pharmacovigilance: detecting adverse effects of drugs. Clinical Medicine, 2009. 9(5):486-489.
62. Edwards IR. Spontaneous reporting - of what? Clinical concerns about drugs. British Journal of Clinical Pharmacology, 1999. 48(2):138-141.
63. Wiholm BE, Olsson S, Moore N, Wood S. Spontaneous Reporting Systems Outside the United States. In Pharmacoepidemiology, edited by Strom B, 1994. Churchill Livingstone: New York. 139-157.
64. Rawlins MD. Spontaneous reporting of adverse drug reactions I: The data. British Journal of Clinical Pharmacology, 1988. 26(1):1-5.
65. Lindquist M. VigiBase, the WHO Global ICSR Database System: Basic facts. Drug Information Journal, 2008. 42(5):409-419.
66. Hauben M, Bate A. Decision support methods for the detection of adverse events in post-marketing data. Drug Discovery Today, 2009. 14(7-8):343-357.
67. Bate A, Lindquist M, Edwards IR. The application of knowledge discovery in databases to post-marketing drug safety: example of the WHO database. Fundamental & Clinical Pharmacology, 2008. 22(2):127-140.
68. Hand DJ, Bolton RJ. Pattern discovery and detection: A unified statistical methodology. Journal of Applied Statistics, 2004. 31(8):885-924.
69. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD international conference on Management of data, 1993. Washington, D.C., United States: ACM. 207-216.
70. Norén GN, Hopstadius J, Bate A. Shrinkage observed-to-expected ratios for robust and transparent large-scale pattern discovery. Statistical Methods in Medical Research, 2013. 22(1):57-69.
71. Meyboom RHB, Hekster YA, Egberts ACG, Gribnau FWJ, Edwards IR. Causal or casual? The role of causality assessment in pharmacovigilance. Drug Safety, 1997. 17(6):374-389.
72. Edwards IR, Lindquist M, Wiholm BE, Napke E. Quality criteria for early signals of possible adverse drug reactions. Lancet, 1990. 336(8708):156-158.
73. Meyboom RHB, Lindquist M, Egberts ACG, Edwards IR. Signal selection and follow-up in pharmacovigilance. Drug Safety, 2002. 25(6):459-465.
74. Das G, Ronkainen P. Similarity of attributes by external probes. In KDD '98: Proceedings of the fourth ACM SIGKDD international conference on Knowledge discovery and data mining, 1998. New York City, New York, USA: AAAI Press. 23-29.
75. Zaiane OR, Han J. Discovering web access patterns and trends by applying OLAP and data mining technology on web logs. In ADL '98: Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries, 1998. Santa Barbara, California. 19-29.
76. Melamed ID. Automatic construction of clean broad-coverage translation lexicons. In AMTA '96: Proceedings of the 1996 Conference of the Association for Machine Translation in the Americas, 1996. Montreal, Quebec, Canada.
77. Purcell P, Barty S. Statistical techniques for signal generation: The Australian experience. Drug Safety, 2002. 25(6):415-421.
78. Evans SJW. Statistics: analysis and presentation of safety data. In Stephens' Detection of New Adverse Drug Reactions, edited by Talbot J, Waller P, 2004. John Wiley & Sons: Chichester. 301-328.
79. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 1996. 58(1):267-288.
80. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 2010. 33(1):1-22.
81. Wilson AD, Howell C, Waring WS. Venlafaxine ingestion is associated with rhabdomyolysis in adults: A case series. Journal of Toxicological Sciences, 2007. 32(1):97-101.
82. Hill AB. Environment and disease - Association or causation? Proceedings of the Royal Society of Medicine, 1965. 58(5):295-300.
83. Caster O. The Impact of Uncertain Causality on Benefit-Risk Assessment: Insights from a Hierarchical Complexity Model. Drug Safety, 2012. 35(10):919-919.
84. DuMouchel W, Fram D, Yang X, Mahmoud RA, Grogg AL, Engelhart L, Ramaswamy K. Antipsychotics, glycemic disorders, and life-threatening diabetic events: A Bayesian data-mining analysis of the FDA adverse event reporting system (1968-2004). Annals of Clinical Psychiatry, 2008. 20(1):21-31.
85. Caster O, Norén GN, Madigan D, Bate A. Logistic Regression in Signal Detection: Another Piece Added to the Puzzle. Clinical Pharmacology & Therapeutics, 2013. 94(3):312-312.
86. Gutkowski K, Chwist A, Hartleb M. Liver injury induced by high-dose methylprednisolone therapy: A case report and brief review of the literature. Hepatitis Monthly, 2011. 11(8):656-661.
87. Hofstee HMA, Nanayakkara PWB, Stehouwer CDA. Acute hepatitis related to prednisolone. European Journal of Internal Medicine, 2005. 16(3):209-210.
88. Das D, Graham I, Rose J. Recurrent acute hepatitis in patient receiving pulsed methylprednisolone for multiple sclerosis. Indian Journal of Gastroenterology, 2006. 25(6):314-316.
89. Fardet L, Kassar A, Cabane J, Flahault A. Corticosteroid-induced adverse events in adults: Frequency, screening and prevention. Drug Safety, 2007. 30(10):861-881.
90. Miller DM, Weinstock-Guttman B, Bethoux F, Lee JC, Beck G, Block V, Durelli L, LaMantia L, Barnes D, Sellebjerg F, Rudick RA. A meta-analysis of methylprednisolone in recovery from multiple sclerosis exacerbations. Multiple Sclerosis, 2000. 6(4):267-273.
91. Sellebjerg F, Barnes D, Filippini G, Midgard R, Montalban X, Rieckmann P, Selmaj K, Visser LH, Sørensen PS. EFNS guideline on treatment of multiple sclerosis relapses: Report of an EFNS task force on treatment of multiple sclerosis relapses. European Journal of Neurology, 2005. 12(12):939-946.
92. Tatonetti NP, Ye PP, Daneshjou R, Altman RB. Data-driven prediction of drug effects and interactions. Science Translational Medicine, 2012. 4(125).
93. Lasser KE, Allen PD, Woolhandler SJ, Himmelstein DU, Wolfe SM, Bor DH. Timing of new black box warnings and withdrawals for prescription medications. Journal of the American Medical Association, 2002. 287(17):2215-2220.
94. Lindquist M, Stahl M, Bate A, Edwards IR, Meyboom RHB. A retrospective evaluation of a data mining approach to aid finding new adverse drug reaction signals in the WHO international database. Drug Safety, 2000. 23(6):533-542.
95. Alvarez Y, Hidalgo A, Maignen F, Slattery J. Validation of Statistical Signal Detection Procedures in EudraVigilance Post-Authorization Data. Drug Safety, 2010. 33(6):475-487.
96. Van Puijenbroek EP, Egberts ACG, Meyboom RHB, Leufkens HGM. Signalling possible drug-drug interactions in a spontaneous reporting system: Delay of withdrawal bleeding during concomitant use of oral contraceptives and itraconazole. British Journal of Clinical Pharmacology, 1999. 47(6):689-693.
97. DuMouchel W, Pregibon D. Empirical Bayes screening for multi-item associations. In KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001. San Francisco, California: ACM. 67-76.
98. Norén GN, Sundberg R, Bate A, Edwards IR. A statistical methodology for drug-drug interaction surveillance. Statistics in Medicine, 2008. 27(16):3057-3070.
99. An LH, Fung KY, Krewski D. Mining Pharmacovigilance Data Using Bayesian Logistic Regression with James-Stein Type Shrinkage Estimation. Journal of Biopharmaceutical Statistics, 2010. 20(5):998-1012.
100. Oracle Health Sciences. Empirica Signal. 2011; Available from: http://www.oracle.com/us/industries/life-sciences/empirica-signal-ds-396068.pdf.
101. Harpaz R, DuMouchel W, LePendu P, Bauer-Mehren A, Ryan P, Shah NH. Performance of pharmacovigilance signal-detection algorithms for the FDA adverse event reporting system. Clinical Pharmacology & Therapeutics, 2013. 93(6):539-546.
102. Berlin C, Blanch C, Lewis DJ, Maladorno DD, Michel C, Petrin M, Sarp S, Close P. Are all quantitative postmarketing signal detection methods equal? Performance characteristics of logistic regression and Multi-item Gamma Poisson Shrinker. Pharmacoepidemiology and Drug Safety, 2012. 21(6):622-630.
103. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 1959. 22(4):719-748.
104. Hopstadius J, Norén GN, Bate A, Edwards IR. Impact of stratification on adverse drug reaction surveillance. Drug Safety, 2008. 31(11):1035-1048.
105. Wang HW, Hochberg AM, Pearson RK, Hauben M. An Experimental Investigation of Masking in the US FDA Adverse Event Reporting System Database. Drug Safety, 2010. 33(12):1117-1133.
106. Pariente A, Didailler M, Avillach P, Miremont-Salame G, Fourrier-Reglat A, Haramburu F, Moore N. A potential competition bias in the detection of safety signals from spontaneous reporting databases. Pharmacoepidemiology and Drug Safety, 2010. 19(11):1166-1171.
107. Juhlin K, Ye X, Star K, Norén GN. Outlier removal to uncover patterns in adverse drug reaction surveillance - a simple unmasking strategy. Pharmacoepidemiology and Drug Safety, 2013. 22(10):1119-1129.
108. Maignen F, Hauben M, Hung E, Van Holle L, Dogne JM. Assessing the extent and impact of the masking effect of disproportionality analyses on two spontaneous reporting systems databases. Pharmacoepidemiology and Drug Safety, 2013.
109. Van Holle L, Zeinoun Z, Bauchau V, Verstraeten T. Using time-to-onset for detecting safety signals in spontaneous reports of adverse events following immunization: A proof of concept study. Pharmacoepidemiology and Drug Safety, 2012. 21(6):603-610.
110. Strandell J, Caster O, Hopstadius J, Edwards IR, Norén GN. The development and evaluation of triage algorithms for early discovery of adverse drug interactions. Drug Safety, 2013. 36(5):371-388.
111. Caster O, Juhlin K, Watson S, Norén GN. A Paradigm Shift for Screening Individual Case Reports: Accounting for Quality and Content. Drug Safety, 2013. 36(9):918-918.
112. Van Holle L, Bauchau V. How Logistic Regression Can Combine the Two Causality Criteria of Strength of Association and Temporality into a Signal Detection Method. Drug Safety, 2013. 36(9):820-820.
113. Naranjo CA, Busto U, Sellers EM. A method for estimating the probability of adverse drug reactions. Clinical Pharmacology & Therapeutics, 1981. 30(2):239-245.
114. Edwards IR. Considerations on causality in pharmacovigilance. International Journal of Risk and Safety in Medicine, 2012. 24(1):41-54.
115. Edwards IR, Body D. Forensic pharmacovigilance. International Journal of Risk and Safety in Medicine, 2012. 24(1):1-2.
116. Lane DA, Kramer MS, Hutchinson TA, Jones JK, Naranjo C. The causality assessment of adverse drug reactions using a Bayesian approach. Pharmaceutical Medicine, 1987. 2(3):265-283.
117. Edwards IR, Wiholm BE, Martinez C. Concepts in risk-benefit assessment - A simple merit analysis of a medicine? Drug Safety, 1996. 15(1):1-7.
118. Jones TC. Call for a new approach to the process of clinical trials and drug registration. British Medical Journal, 2001. 322(7291):920-923.
119. van Luijn JCF, Gribnau FWJ, Leufkens HGM. Availability of comparative trials for the assessment of new medicines in the European Union at the moment of market authorization. British Journal of Clinical Pharmacology, 2007. 63(2):159-162.
120. Schneeweiss S. Developments in post-marketing comparative effectiveness research. Clinical Pharmacology & Therapeutics, 2007. 82(2):143-156.
121. Garrison LP, Towse A, Bresnahan BW. Assessing a structured, quantitative health outcomes approach to drug risk-benefit analysis. Health Affairs, 2007. 26(3):684-695.
122. Weinstein MC, Torrance G, McGuire A. QALYs: the basics. Value in Health, 2009. 12 Suppl 1:S5-9.
123. van Staa TP, Smeeth L, Persson I, Parkinson J, Leufkens HGM. Evaluating drug toxicity signals: is a hierarchical classification of evidence useful or a hindrance? Pharmacoepidemiology and Drug Safety, 2008. 17(5):475-484.
124. Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. New England Journal of Medicine, 1988. 318(26):1728-1733.
125. Waller PC, Evans SJW. A model for the future conduct of pharmacovigilance. Pharmacoepidemiology and Drug Safety, 2003. 12(1):17-29.
126. Kobelt G, Berg J, Atherly D, Hadjimichael O. Costs and quality of life in multiple sclerosis: A cross-sectional study in the United States. Neurology, 2006. 66(11):1696-1702.
127. Prosser LA, Kuntz KM, Bar-Or A, Weinstein MC. Patient and community preferences for treatments and health states in multiple sclerosis. Multiple Sclerosis, 2003. 9(3):311-319.
128. Longworth L, Bryan S. An empirical comparison of EQ-5D and SF-6D in liver transplant patients. Health Economics, 2003. 12(12):1061-1067.
129. Freedman AN, Yu B, Gail MH, Costantino JP, Graubard BI, Vogel VG, Anderson GL, McCaskill-Stevens W. Benefit/risk assessment for breast cancer chemoprevention with raloxifene or tamoxifen for women age 50 years or older. Journal of Clinical Oncology, 2011. 29(17):2327-2333.
130. LoBue PA, Moser KS. Use of isoniazid for latent tuberculosis infection in a public health clinic. American Journal of Respiratory and Critical Care Medicine, 2003. 168(4):443-447.
131. Nolan CM, Goldberg SV, Buskin SE. Hepatotoxicity associated with isoniazid preventive therapy: A 7-year survey from a public health tuberculosis clinic. Journal of the American Medical Association, 1999. 281(11):1014-1018.
132. Howard RA. Decision Analysis: Applied Decision Theory. In Proceedings of the 4th International Conference on Operational Research, edited by Hertz DB, Melese J, 1966. Wiley Interscience. 55-77.
133. Keeney RL, Raiffa H. Decisions with Multiple Objectives, 1993. Cambridge, United Kingdom: Cambridge University Press.
134. Ramsey FP. Truth and Probability (1926). In The Foundations of Mathematics and other Logical Essays., edited by Braithwaite RB, 1931. Kegan, Paul, Trench, Trubner & Co.: London. 156-198.
135. von Neumann J, Morgenstern O. Theory of Games and Economic Behavior, 1944. Princeton, NJ: Princeton University Press.
136. Savage LJ. The Foundations of Statistics. 2nd revised edition, 1972 (1st edition published in 1954 by John Wiley & Sons). New York: Dover Publications.
137. Clemen RT. Making Hard Decisions: An Introduction to Decision Analysis, 1997: Duxbury Press.
138. Howard RA. Decision Analysis: Practice and Promise. Management Science, 1988. 34(6):679-695.
139. Danielson M, Ekenberg L, Johansson J, Larsson A. The DecideIT decision tool. In Proceedings of ISIPTA '03, 2003. Lugano, Italy.
140. Klein K, Pauker SG. Recurrent deep venous thrombosis in pregnancy. Analysis of the risks and benefits of anticoagulation. Medical Decision Making, 1981. 1(2):181-202.
141. Lilford RJ, Pauker SG, Braunholtz DA, Chard J. Getting research findings into practice - Decision analysis and the implementation of research findings. British Medical Journal, 1998. 317(7155):405-409.
142. Briggs AH, Levy AR. Pharmacoeconomics and pharmacoepidemiology - Curious bedfellows or a match made in heaven? Pharmacoeconomics, 2006. 24(11):1079-1086.
143. Edwards IR. What are the real lessons from Vioxx? Drug Safety, 2005. 28(8):651-658.
144. Detsky AS, Naglie G, Krahn MD, Redelmeier DA, Naimark D. Primer on medical decision analysis: Part 2 - Building a tree. Medical Decision Making, 1997. 17(2):126-135.
145. Choquet G. Theory of capacities. Annales de l'institut Fourier, 1954. 5:131-295.
146. Dempster AP. Upper and Lower Probabilities Induced by a Multivalued Mapping. The Annals of Mathematical Statistics, 1967. 38(2):325-339.
147. Shafer G. A Mathematical Theory of Evidence, 1976. Princeton: Princeton University Press.
148. Dubois D, Prade H. Possibility Theory: An Approach to Computerized Processing of Uncertainty, 1988. New York: Plenum Press.
149. Gärdenfors P, Sahlin N-E. Unreliable probabilities, risk taking, and decision making. Synthese, 1982. 53(3):361-386.
150. Smith CAB. Consistency in statistical inference and decision. Journal of the Royal Statistical Society Series B-Statistical Methodology, 1961. 23(1):1-37.
151. Walley P. Statistical reasoning with imprecise probabilities, 1991. London: Chapman and Hall.
152. Weichselberger K. The theory of interval-probability as a unifying concept for uncertainty. International Journal of Approximate Reasoning, 2000. 24(2-3):149-170.
153. Danielson M, Ekenberg L. A framework for analysing decisions under risk. European Journal of Operational Research, 1998. 104(3):474-484.
154. Jaffray JY. Rational decision making with imprecise probabilities. In Proceedings of ISIPTA '99, 1999. Ghent, Belgium.
155. Augustin T. On decision making under ambiguous prior and sampling information. In Proceedings of ISIPTA '01, 2001. Ithaca, New York.
156. Koerkamp BG, Weinstein MC, Stijnen T, Heijenbrok-Kal MH, Hunink MGM. Uncertainty and Patient Heterogeneity in Medical Decision Models. Medical Decision Making, 2010. 30(2):194-205.
157. Ades AE, Claxton K, Sculpher M. Evidence synthesis, parameter correlation and probabilistic sensitivity analysis. Health Economics, 2006. 15(4):373-381.
158. Pasta DJ, Taylor JL, Henning JM. Probabilistic sensitivity analysis incorporating the bootstrap: An example comparing treatments for the eradication of Helicobacter pylori. Medical Decision Making, 1999. 19(3):353-363.
159. Oakley JE, O'Hagan A. Probabilistic sensitivity analysis of complex models: a Bayesian approach. Journal of the Royal Statistical Society Series B-Statistical Methodology, 2004. 66:751-769.
160. Ekenberg L, Thorbiörnson J. Second-order decision analysis. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2001. 9(1):13-37.
161. Utkin LV, Augustin T. Decision Making with Imprecise Second-Order Probabilities. In Proceedings of ISIPTA '03, 2003. Lugano, Italy.
162. Ekenberg L, Andersson M, Danielson M, Larsson A. Distributions over Expected Utilities in Decision Analysis. In Proceedings of ISIPTA '07, 2007. Prague, Czech Republic.
163. Tervonen T, Lahdelma R. Implementing stochastic multicriteria acceptability analysis. European Journal of Operational Research, 2007. 178(2):500-513.
164. Zadeh LA. Fuzzy sets. Information and Control, 1965. 8(3):338-353.
165. Roubens M. Fuzzy sets and decision analysis. Fuzzy Sets and Systems, 1997. 90(2):199-206.
166. Anoop MB, Rao KB, Gopalakrishnan S. Conversion of probabilistic information into fuzzy sets for engineering decision analysis. Computers and Structures, 2006. 84(3-4):141-155.
167. Torres A, Nieto JJ. Fuzzy logic in medicine and bioinformatics. Journal of Biomedicine and Biotechnology, 2006.
168. Papageorgiou EI, Stylios CD, Groumpos PP. An Integrated Two-Level Hierarchical System for Decision Making in Radiation Therapy Based on Fuzzy Cognitive Maps. IEEE Transactions on Biomedical Engineering, 2003. 50(12):1326-1339.
169. Lahdelma R, Makkonen S, Salminen P. Two ways to handle dependent uncertainties in multi-criteria decision problems. Omega-International Journal of Management Science, 2009. 37(1):79-92.
170. Devroye L. Non-uniform random variate generation, 1986. New York, USA: Springer-Verlag.
171. De Boer A. When to publish measures of disproportionality derived from spontaneous reporting databases? British Journal of Clinical Pharmacology, 2011. 72(6):909-911.
172. Morris K. Implications of narcolepsy link with swine-influenza vaccine. The Lancet Infectious Diseases, 2013. 13(5):396-397.
173. Schäcke H, Döcke WD, Asadullah K. Mechanisms involved in the side effects of glucocorticoids. Pharmacology and Therapeutics, 2002. 96(1):23-43.
174. Kadziński M, Tervonen T. Robust multi-criteria ranking with additive value models and holistic pair-wise preference statements. European Journal of Operational Research, 2013. 228(1):169-180.
175. Filippini G, Brusaferri F, Sibley WA, Citterio A, Ciucci G, Midgard R, Candelise L. Corticosteroids or ACTH for acute exacerbations in multiple sclerosis. Cochrane Database of Systematic Reviews, 2000(4): Art. No. CD001331.
176. Burton JM, O'Connor PW, Hohol M, Beyene J. Oral versus intravenous steroids for treatment of relapses in multiple sclerosis. Cochrane Database of Systematic Reviews, 2012. 12.
177. Barnes D, Hughes RAC, Morris RW, Wade-Jones O, Brown P, Britton T, Francis DA, Perkin GD, Rudge P, Swash M, Katifi H, Farmer S, Frankel J. Randomised trial of oral and intravenous methylprednisolone in acute relapses of multiple sclerosis. Lancet, 1997. 349(9056):902-906.
178. Levitan BS, Andrews EB, Gilsenan A, Ferguson J, Noel RA, Coplan PM, Mussen F. Application of the BRAT framework to case studies: Observations and insights. Clinical Pharmacology & Therapeutics, 2011. 89(2):217-224.
179. Weinstein MC, Stason WB. Foundations of Cost-Effectiveness Analysis for Health and Medical Practices. New England Journal of Medicine, 1977. 296(13):716-721.
180. Cross JT, Veenstra DL, Gardner JS, Garrison LP. Can Modeling of Health Outcomes Facilitate Regulatory Decision Making?: The Benefit-Risk Tradeoff for Rosiglitazone in 1999 vs. 2007. Clinical Pharmacology & Therapeutics, 2011. 89(3):429-436.
181. Lynd LD, Najafzadeh M, Colley L, Byrne MF, Willan AR, Sculpher MJ, Johnson FR, Hauber AB. Using the Incremental Net Benefit Framework for Quantitative Benefit–Risk Analysis in Regulatory Decision-Making—A Case Study of Alosetron in Irritable Bowel Syndrome. Value in Health, 2010. 13(4):411-417.
182. Caro JJ. Pharmacoeconomic Analyses Using Discrete Event Simulation. Pharmacoeconomics, 2005. 23(4):323-332.
183. Cooper K, Brailsford SC, Davies R. Choice of modelling technique for evaluating health care interventions. Journal of the Operational Research Society, 2007. 58(2):168-176.
184. Bridges JF. Stated preference methods in health care evaluation: an emerging methodological paradigm in health economics. Applied Health Economics and Health Policy, 2003. 2(4):213-224.
185. Zhang J, Yu KF. What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. Journal of the American Medical Association, 1998. 280(19):1690-1691.
186. Maldonado G, Greenland S. Estimating causal effects. International Journal of Epidemiology, 2002. 31(2):422-429.
187. Egger M, Smith GD, Phillips AN. Meta-analysis: Principles and procedures. British Medical Journal, 1997. 315(7121):1533-1537.
188. Holden WL. Benefit-risk analysis - A brief review and proposed quantitative approaches. Drug Safety, 2003. 26(12):853-862.
189. Robert C, Casella G. Monte Carlo Statistical Methods, 2004: Springer.
190. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 1953. 21(6):1087-1092.
191. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 1970. 57(1):97-109.
192. Liu JS, Liang F, Wong WH. The Multiple-Try Method and Local Optimization in Metropolis Sampling. Journal of the American Statistical Association, 2000. 95(449):121-134.
193. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984. PAMI-6(6):721-741.
194. Swendsen RH, Wang JS. Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters, 1987. 58(2):86-88.