MARAS: Multi-Drug Adverse Reactions Analytics System
by
Tabassum Kakar
A Thesis
Submitted to the Faculty
of the
WORCESTER POLYTECHNIC INSTITUTE
In partial fulfillment of the requirements for the
Degree of Master of Science
in
Data Science
April 23, 2016
APPROVED:
Professor Elke A. Rundensteiner, Thesis Advisor
Professor Xiangnan Kong, Thesis Reader
Abstract
Adverse Drug Reactions (ADRs) are a major cause of morbidity and mortality
worldwide. Clinical trials, which are extremely costly, labor intensive and
restricted to controlled human subjects, cannot uncover all ADRs related
to a drug. There is thus a growing need for computational methods that
facilitate the automated detection of drug-related ADRs from large report data
sets, especially ADRs that were left undiscovered during clinical trials but
later arise due to drug-drug interactions or prolonged usage. For this purpose,
big data sets available through drug-surveillance programs and social media
provide a wealth of long-term information and thus a huge opportunity.
In this research, we thus design a system using machine learning techniques to
discover severe unknown ADRs triggered by a combination of drugs, also known
as drug-drug interactions. Our proposed Multi-drug Adverse Reaction Analytics
System (MARAS) adopts and adapts an association rule mining-based methodology
by incorporating contextual information to detect, highlight and visualize
interesting drug combinations that are strongly associated with a set of ADRs.
MARAS extracts non-spurious associations that are true representations of the
combination of drugs taken and reported by patients. We demonstrate the utility
of MARAS via case studies from the medical literature, and the usability of the
MARAS system via a user study using real world medical data extracted from the
FDA Adverse Event Reporting System (FAERS).
Acknowledgements
I would like to express my sincere gratitude to my advisor Professor Elke Run-
densteiner for giving me the opportunity to work on this research and making it a
pleasant experience for me. Without her support and continuous motivation this
research would not have been possible. I am grateful for her time on revising my
work again and again, to make it perfect. I really appreciate her patience, guidance,
encouragement, as well as immense knowledge that inspired me and helped me grow
and continuously improve my way of thinking in solving a problem.
I am very grateful to Professor Xiangnan Kong for his valuable time on advising
me and reading my thesis.
I am thankful to Xiao Qin, graduate student in the Computer Science Department
(WPI), for his close collaboration on this work. Without his continuous feedback
and joint effort this work would not have reached its present quality. His immense
knowledge of association rule mining and research has motivated me to continue
research.
I am also thankful to Susmitha Wunnava, graduate student at Data Science
Department (WPI) for her collaboration, support and motivation throughout the
course of this research.
I am deeply grateful to all the Data Science Program Professors whose teachings
equipped me with the knowledge to work on this research. I also thank Mary
Racicot, of the Data Science administrative staff, for her continued support.
Finally, I would like to extend special thanks to my family for their nonstop
support, motivation and belief in me.
Contents
1 Introduction
1.1 Background
1.2 Limitation of State-of-the-art
1.3 Research Challenges
1.4 The MARAS Methodology
1.5 Contributions
2 Preliminaries
2.1 Association Rule
3 Association Rule Model for Multi-Drug ADR Signal
3.1 Drug-ADR Association
3.2 Closed Drug-ADR Association
3.3 Type of Drug-ADR Association
3.4 Mining Supported Drug-ADR Rule Using Closed Itemset
3.5 Multi-level Contextual Association Cluster
3.6 Exclusiveness Score for Drug-Drug Interaction Signal
4 Visualizing Drug-ADR Association
4.1 MARAS Interface
5 Experimental Evaluation
5.1 Data Source
5.2 Mining Process
5.3 Result At-A-Glance
5.4 Case Study
5.4.1 User Study
6 Related Work
7 Conclusion
A User Study
A.1 Sample of interesting and non-interesting groups
A.2 User Study Questions
List of Figures
1.1 The MARAS Approach
4.1 A Contextual Glyph
4.2 Panoramagram of Glyphs
4.3 Zoom-in Glyph View
5.1 Reduction in number of rules
5.2 User study results
5.3 Bar-chart representing MCAC
A.1 Sample of two drugs
A.2 Sample of three drugs
A.3 Sample of four drugs
A.4 Question 1 with barchart
A.5 Question 1 with CG
A.6 Question 2 with barchart
A.7 Question 2 with CG
A.8 Question 3 with barchart
A.9 Question 3 with CG
A.10 Question 4 with barchart
A.11 Question 4 with CG
A.12 Question 5 with barchart
A.13 Question 5 with CG
List of Tables
3.1 Example of a MCAC
5.1 FAERS Data From 2014
5.2 Top 5 Multi-Drug Associations from 1st Quarter of 2014
Chapter 1
Introduction
1.1 Background
An Adverse Drug Reaction (ADR) is an unwanted and dangerous effect
caused by the administration of a drug. According to the U.S. Food and Drug
Administration (FDA), hundreds of thousands of people die every year because of
ADRs, while over two million serious ADRs are reported every year [1]. For
this reason, controlled clinical trials are mandated to thoroughly test
for the possible ADRs of any drug. In general, these clinical trials are expensive
and are restricted to a limited time-frame, population groups with specific diseases
and certain combinations of drugs. Thus these trials provide limited information on
ADRs caused by the prolonged usage of the drug or by interaction with other drugs
taken by certain patients. Once tested through clinical trials and approved by the
FDA, the drug is released into the market for public consumption.
In the post-marketing phase, the effectiveness and safety of drugs are monitored
by regulatory agencies, a process known as post-marketing surveillance. One such
surveillance system, the FDA Adverse Event Reporting System (FAERS) [5], collects
information on adverse events related to drugs reported by patients, health care
professionals and drug manufacturers in a database and makes it available to the
public via web services. Many drugs have been withdrawn from the market during
post-marketing surveillance because of their adverse effects, such as Posicor [24],
Troglitazone [15], Cerivastatin [14] and many more. Research [21] has shown that
computational methods applied to the data from post-marketing surveillance can help
address the limitations of clinical trials, i.e., by discovering potential severe
ADRs related to drugs.
Adverse reactions can be caused by the administration of a single drug or multiple
drugs, upon immediate use, prolonged use or even overdose. ADRs caused
by multiple drugs, also known as Drug-Drug Interactions, can vary from minor
to severe. Minor reactions might increase or decrease the effectiveness of one of
the drugs. On the other hand, a severe reaction can turn out to be potentially
fatal. For example, Aspirin taken together with Warfarin, a blood-thinning drug,
may lead to excessive bleeding [9]. Therefore, these drug-drug interactions should be
detected early on, with minimum patient exposure, to avoid further harmful incidents.
The data collected from drug-surveillance programs is an extremely useful resource
for tapping into information on drug-drug interactions obtained first-hand from
the patients. Manual scanning of all these reports is extremely costly and nearly
impossible, as thousands of reports are added on a daily basis [5] and hence the
database grows rapidly. Computational methods, especially data mining techniques,
can be used to automatically identify drug-drug interactions.
In order to find associations between a combination of drugs and ADRs using
data mining tools, it is vital to know how these drugs are individually associated
with these ADRs. That is, if the chances of a single drug triggering the ADRs are
high, then it is less likely that the combination of two or more drugs is triggering
these ADRs; hence the combination of drugs is not interesting. For example, if
the chances of Zometa and Prilosec taken together triggering a set of ADRs such
as Osteoarthritis, Neuropathy peripheral, Osteonecrosis of jaw and pain are very
high, while the possibility of either Zometa or Prilosec taken individually triggering
these ADRs is very low, then the combination of Zometa and Prilosec as
a potential cause is worth further investigation. This kind of ADR detection is
especially helpful for drug-safety evaluators, allowing them to focus on the
potentially important drug safety issues.
1.2 Limitation of State-of-the-art
Drug-drug interactions have been studied in the literature using statistical methods
such as relative reporting ratio and disproportionality analysis [26, 27, 28]. These
methods tend to be restricted to either a specific set of drugs or specific ADRs.
Hence they lack a general, data-driven methodology that considers all reports to
identify the most critical ADRs. Yet it is crucial for drug-safety
monitoring organizations such as the FDA to have plausible interaction information
for all reported drugs and ADRs, so that they can direct their limited resources
toward the critical cases that warrant further investigation.
In general, association rule mining is a popular technique used for identifying
relationships among items in large databases. Association rule mining has been used
previously in the medical domain to find ADRs possibly caused by drugs [12, 18].
There has been initial evidence that it can be used to mine associations between
combinations of drugs and ADRs [17]. However, current methods fail to take the context
of the rules into consideration, which can provide crucial information about the
drug-drug interactions. Contextual information is important because considering a
rule in isolation may lose vital information present in its sub-rules. When detecting
ADRs related to a combination of drugs, this contextual information can provide
insights into the relationship of the corresponding individual drugs to the ADRs.
Furthermore, some association rules can be misleading, i.e., not truly representative
of the real data, and need to be eliminated. Finally, an interestingness criterion is
needed to identify the most crucial drug-drug interactions.
1.3 Research Challenges
To develop a Multi-Drug Adverse Reaction Analytics System (MARAS), the
following research challenges must be addressed:
Amount of generated rules: Association rule mining applied to a set of
thousands of drugs and ADRs generates an extremely large number of rules, which is
impossible for an analyst to sift through. The number of rules may grow exponentially,
especially if the minimum support threshold is very low. In the context of detecting
drug-drug interactions, a low support is necessary to extract all possible combinations
of drugs and ADRs. Furthermore, some of these rules might be redundant, misleading
or inappropriate for the analysis of drug-drug interactions. Thus we must devise
a criterion for selecting the appropriate ones from the overwhelming pool of rules.
Avoiding misleading information related to Drug-ADR: Association rules
can sometimes be misleading as they may not be a true representation of the data.
The mining process finds all possible relationships among all items in a dataset,
which may generate some rules based on partial information. In the context of
detecting drug-drug interactions, rules depicting partial information tend to ignore
drugs that could be contributing factors to the interaction. One may argue
that shorter rules with partial information are more general and hence better.
However, when finding ADRs triggered by a combination of drugs, these partial rules
might not depict the actual combination of drugs reported by patients. Therefore,
we need a strategy to detect and eliminate such deceiving information.
Figure 1.1: The MARAS Approach (diagram labels: Reports, Adverse Event Reporting System, Patterns, MARAS Analytics, Medical Professions, Domain Knowledge, Aggregated Result, MARAS Visualization)
Defining interestingness with respect to drug-drug interactions: We
have a large number of rules showing ADRs associated with a combination of drugs;
however, it is very challenging to find the most interesting ones. Even the definition
of interesting is blurred. For example, the system might select a drug-drug interaction
as interesting, but it might not be interesting to the decision makers because it
is already known and they want to learn about unknown drug-drug interactions.
Advanced exploration of interesting drug-drug interactions: Although
a system generates potential drug combinations that trigger ADRs, a major challenge
is providing an interactive tool to the decision maker in order to explore the
interesting drug combinations. For example, integrating domain knowledge into the
system would be beneficial to highlight interactions that are not yet known or may
lead to particularly severe adverse reactions.
1.4 The MARAS Methodology
We design the Multi-Drug Adverse Reaction Analytics System (MARAS),
depicted in Fig. 1.1, to address the above challenges. MARAS models
the correlation between a combination of drugs and some possible adverse drug
reaction(s) by adopting and then adapting the association rule methodology.
First, MARAS extracts the critical information from the adverse drug reaction
reports collected by the adverse event reporting system. It then pre-processes this
abstracted information for the core mining phase.
Second, MARAS leverages relevant domain specific insights to efficiently derive
the associations that reflect the original reports while avoiding misleading and
redundant information. MARAS offers users the flexibility to control which ADRs are
reported according to their indicated preferences, such as an interest in unknown
ADRs versus unknown drug-drug interactions.
Third, we propose a Multi-level Contextual Association Clustering method to
evaluate the significance of the discovered multi-drug adverse drug reactions. This
contextual ranking strategy scores and ranks each drug-ADR association considering
the strength of the association as well as whether its strength is inversely
proportional to the strength of its contextual rules. The contrast in strength can
better measure the significance of multi-drug ADRs than traditional interestingness
measurements such as support, confidence and lift.
Fourth, we propose a visualization method that depicts the overall distribution
of the discovered drug-ADR associations over the ranking scores. For each contextual
group, it presents the contrast between the strength of the drug-ADR association and
the strength of its contextual rules. It provides a visual interpretation of the
discovered patterns to the user. MARAS also links the original reports to the returned
patterns so that the user can perform further investigation with richer information
available in the original reports.
1.5 Contributions
The key contributions of this work include:
Modeling Drug-ADR Associations: Mining non-spurious associations between
combinations of drugs and ADRs is a two-step process. First, we select only those
rules that have drugs as antecedents and ADRs as consequents to find the possible
drug interactions triggering some adverse reactions. Second, in order to dismiss
misleading rules, we introduce the notion of closed drug-ADR associations in Section
3.2, which avoids spurious rules while generating those rules that are true
representations of the original data reports.
Multi-level Contextual Association Clustering (MCAC) of Drugs and
ADRs: We propose a Multi-level Contextual Association Clustering model, known
as MCAC, that uses the contextual information related to the combination of drugs.
MCAC (Section 3.5) groups a rule composed of two or more drugs with its sub-rules
having a single drug or a subset of drugs to help understand the association of the
drugs and ADRs.
Ranking of Multi-level Contextual Groups: We propose (1) the exclusive-
ness measure that highlights the interesting drug-drug association groups, i.e. the
groups of rules having the contextual information. We score each group based on
the difference of the strength of association among drugs and ADRs of a rule and
its sub-rules within a group. (2) We define interestingness in the context of ADRs
related to drug-drug interactions. Intuitively, the higher the difference in strength
of association of a rule and its sub-rules within a group, the higher the group score
and the more interesting the rule.
Visualization of MARAS Rule Recommendation: We propose the MARAS
visual interactive tool that not only gives an analyst insights into the most
plausible drug-drug interactions but also provides the flexibility to select
the results based on customized criteria such as a specific drug or ADR (Section 4.1).
The user interface provides a visual exploratory tool that allows the user to (1) find
the interesting drug-drug interactions, (2) visualize the corresponding multi-level
contextual group of drugs and ADRs, and (3) extract the raw reports that support
the corresponding interesting drug-drug interaction.
Experimental Evaluation: We demonstrate the utility of MARAS via case
studies from the medical literature, and the usability of the MARAS system via
a user study using real world medical data extracted from the FDA Adverse Event
Reporting System (FAERS).
Chapter 2
Preliminaries
2.1 Association Rule
Symbolic data analysis techniques that aim to discover patterns or models in data
can be divided into two categories: predictive and descriptive induction. Unlike
predictive induction where models are induced from class labeled data, descriptive
induction aims to find comprehensible patterns typically induced from unlabeled
data. Association rule mining is a descriptive induction technique that is widely
used to detect relationships among the items in large databases.
Let I = {i1, i2, ..., in} represent a set of items. D = {d1, d2, ..., dm} is a collection
of subsets of I called the transaction database. Each transaction di in D is a
set of items such that di ⊆ I. Let S ⊆ I be a set of items, called itemset. If
S ⊆ di, di contains S. |S| denotes the number of transactions in D that contain S.
If the cardinality of S is k, S is called a k-itemset.
Definition. 2.1.1 An association rule is an expression of the form R ≡ A ⇒ B,
where A and B are itemsets and A ⊆ I, B ⊆ I \ A.
Many measurements [23] have been proposed to evaluate the interestingness of
associations. Among them, the most widely used are support, confidence and lift
defined as follows:
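\[ \mathrm{Support}(R \equiv A \Rightarrow B) = P(A \cup B) = \frac{|A \cup B|}{N} \tag{2.1} \]
\[ \mathrm{Confidence}(R \equiv A \Rightarrow B) = P(B \mid A) = \frac{|A \cup B|}{|A|} \tag{2.2} \]
\[ \mathrm{Lift}(R \equiv A \Rightarrow B) = \frac{P(B \mid A)}{P(B)} = \frac{P(A \mid B)}{P(A)} = \frac{|A \cup B| \times N}{|A| \times |B|} \tag{2.3} \]
Here N = |D| is the total number of transactions in the database.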
The support defined in Formula 2.1 describes the proportion of the transactions
that contain all items in the association. The confidence defined in Formula 2.2
describes the probability of finding the consequent B of the association in
transactions that also contain the antecedent A. It is a maximum
likelihood estimate of the conditional probability P (B|A). The lift defined in
Formula 2.3 measures how many times more often A and B occur together than
expected if they were statistically independent.
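As an illustration of Formulas 2.1 to 2.3, the following minimal Python sketch computes the three measures over a toy transaction database. The function names and the toy transactions are hypothetical and chosen only for illustration; support is computed here as a proportion of the N transactions, and a rule whose antecedent never occurs is assigned a confidence of 0 by assumption.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def support(a: FrozenSet[str], b: FrozenSet[str], db: List[Transaction]) -> float:
    """Formula 2.1: proportion of transactions containing all items of A and B."""
    return count(a | b, db) / len(db)


def confidence(a: FrozenSet[str], b: FrozenSet[str], db: List[Transaction]) -> float:
    """Formula 2.2: estimate of P(B|A); 0.0 by assumption if A never occurs."""
    covered = count(a, db)
    return count(a | b, db) / covered if covered else 0.0


def lift(a: FrozenSet[str], b: FrozenSet[str], db: List[Transaction]) -> float:
    """Formula 2.3: P(B|A) / P(B); 0.0 by assumption if B never occurs."""
    p_b = count(b, db) / len(db)
    return confidence(a, b, db) / p_b if p_b else 0.0


# Hypothetical toy database: d* are drugs, a* are ADRs.
TOY_DB = [frozenset(t) for t in (
    {"d1", "d2", "a1"},
    {"d1", "a1"},
    {"d2", "a2"},
    {"d1", "d2", "a1", "a2"},
)]
A, B = frozenset({"d1"}), frozenset({"a1"})
print(support(A, B, TOY_DB), confidence(A, B, TOY_DB), lift(A, B, TOY_DB))
```

A lift above 1 in this toy example indicates that d1 and a1 co-occur more often than independence would predict.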
Chapter 3
Association Rule Model for Multi-Drug ADR Signal
3.1 Drug-ADR Association
Let Idrug and Iade be the complete sets of drugs and ADRs respectively, where
Idrug ∩ Iade ≡ ∅ and Idrug ∪ Iade ≡ I. To measure the association between the drugs
and ADRs, only rules with drug items as antecedents and ADR items as consequents
are considered. Therefore, an association rule R ≡ A ⇒ B is considered a
drug-ADR association if A ⊆ Idrug and B ⊆ Iade.
3.2 Closed Drug-ADR Association
The traditional association rule model assumes that the correlations among items are
indicated by their co-occurrence in the database. Without pre-established dependency
constraints among items, existing rule mining techniques [29] consider every
possible combination of items that appears in a transaction as an itemset. For
example, let t = {i1, i2, ..., in} be a transaction. Without any threshold, e.g. minimum
support, the total number of possible itemsets that can be generated based on this
single transaction is:
\[ \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n - 1 \tag{3.1} \]
Because of this, the number of possible associations among items grows exponentially
w.r.t. the number of unique items. As many studies [30, 6] point out, a huge
amount of redundant information exists within the generated result. In our study,
we find that some of the patterns are not only redundant, but also misleading in
the context of finding drug-ADR associations from ADR reports.
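To make the growth in Formula 3.1 concrete, the short Python sketch below enumerates every non-empty itemset of a single transaction and counts them. The helper name nonempty_itemsets and the item labels are hypothetical; the enumeration is a brute-force illustration, not the mining procedure used by MARAS.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def nonempty_itemsets(transaction: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of one transaction, from 1-itemsets up to the n-itemset."""
    items = sorted(set(transaction))
    return [
        frozenset(combo)
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
    ]


# The count matches 2^n - 1 for each transaction size n.
for n in (2, 4, 10):
    transaction = [f"i{j}" for j in range(1, n + 1)]
    print(n, len(nonempty_itemsets(transaction)), 2 ** n - 1)
```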
3.3 Type of Drug-ADR Association
Let us consider an abstracted ADR report with a set of drugs taken A1 = {d1,d2} and
a set of observed ADRs B1 = {a1,a2}. This single ADR report explicitly establishes
the association between A1 and B1, expressed by the rule R1 ≡ d1, d2 ⇒ a1, a2.
Based upon this single report, the traditional association rule mining algorithms
generate 9 drug-ADR associations ((2^2 − 1) × (2^2 − 1)) including R1. All rules except
R1 are partial interpretations of the report since certain item(s), e.g. some drugs
or ADRs mentioned in the report, are absent from the rule. These rules capture
the associations among the partial drug and ADR sets implicitly indicated by the
report. In some scenarios, these rules could be misleading unless they are indicated
by other reports explicitly or implicitly.
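The count of 9 associations can be reproduced with a small Python sketch that pairs every non-empty subset of the reported drugs with every non-empty subset of the reported ADRs. The names candidate_rules and nonempty_subsets are hypothetical helpers introduced only for this illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

# A rule is represented as (antecedent drugs, consequent ADRs).
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def nonempty_subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of the given items."""
    pool = sorted(set(items))
    return [
        frozenset(combo)
        for k in range(1, len(pool) + 1)
        for combo in combinations(pool, k)
    ]


def candidate_rules(drugs: Iterable[str], adrs: Iterable[str]) -> List[Rule]:
    """Every drug-ADR rule a traditional miner could derive from one report."""
    return [(a, b) for a in nonempty_subsets(drugs) for b in nonempty_subsets(adrs)]


rules = candidate_rules({"d1", "d2"}, {"a1", "a2"})
print(len(rules))  # 9 = (2^2 - 1) x (2^2 - 1)
```

Only one of these nine rules, R1, uses all items of the report; the other eight are the partial interpretations discussed above.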
For example, R2 ≡ d1 ⇒ a2 states that taking d1 causes a2. This may not be true
at all, since this report never explicitly indicates this pairwise relationship.
However, if some other report exists, e.g. A2 = {d1,d5,d6} and B2 = {a2,a3,a7},
that also implicitly indicates this partial interpretation, then R2 can
be considered a legitimate association. Therefore, it can be more safely claimed as
a discovery of an association between drugs and ADRs.
We now briefly describe 3 types of drug-ADR associations, namely, (1)
associations that are explicitly indicated by a report, (2) associations that are
implicitly indicated by multiple reports and (3) partial associations. In the context
of discovering drug-ADR associations using ADR reports, types 1 and 2 can be safely
used as discoveries, whereas type 3 conveys misleading information and therefore
should not be considered. Next, we formally define the three types of association and
discuss how our system identifies each type.
Let t be an ADR report. Each report consists of a set of taken drugs and a set
of observed ADRs, denoted by t.D and t.A respectively. T = {t1,...,tn} is a set of
ADR reports in the database. Let R be a drug-ADR Association discovered from
T .
Definition. 3.3.1 Explicitly Supported Drug-ADR Association. R ≡ A ⇒
B is explicitly supported by T , if there exists at least one t ∈ T , such that A∪B ≡
t.D ∪ t.A.
If a drug-ADR association is explicitly supported, according to this definition, a
report exists that describes only the drugs and ADRs expressed in the association.
Other reports that contain these drugs and ADRs can also be used as evidence
to measure the significance of this association. For example, if 1 report
contains only the drugs and ADRs in the association and 99 other reports involve
them as well, then the support count of R is 100.
Definition. 3.3.2 Implicitly Supported Drug-ADR Association. R ≡ A ⇒
B is implicitly supported by T , if there exist at least two reports t1, t2 ∈ T , such
that A ∪ B ≡ (t1.D ∪ t1.A) ∩ (t2.D ∪ t2.A).
The implicitly supported drug-ADR association captures the partial association
that is derived from different reports. It makes sure that the partial association is
not randomly generated due to the nature of the traditional association rule mining
model. If a drug-ADR association is neither explicitly nor implicitly supported, then
it is unsupported and therefore ignored.
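Definitions 3.3.1 and 3.3.2 can be checked directly on a small set of reports. The following Python sketch is one possible reading of the two definitions, assuming that the two reports in Definition 3.3.2 are distinct entries of T; the Report class and the support_type function are hypothetical names, and the brute-force pairwise scan is for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List


@dataclass(frozen=True)
class Report:
    """An abstracted ADR report t with its drugs t.D and its ADRs t.A."""
    drugs: FrozenSet[str]
    adrs: FrozenSet[str]

    @property
    def items(self) -> FrozenSet[str]:
        return self.drugs | self.adrs


def support_type(antecedent: FrozenSet[str],
                 consequent: FrozenSet[str],
                 reports: List[Report]) -> str:
    """Classify A => B as 'explicit', 'implicit' or 'unsupported'."""
    target = antecedent | consequent
    # Definition 3.3.1: some report consists of exactly the items of the rule.
    if any(r.items == target for r in reports):
        return "explicit"
    # Definition 3.3.2 (assumed reading): two distinct reports intersect in exactly A u B.
    for r1, r2 in combinations(reports, 2):
        if r1.items & r2.items == target:
            return "implicit"
    return "unsupported"


# The two reports from the running example.
REPORTS = [
    Report(frozenset({"d1", "d2"}), frozenset({"a1", "a2"})),
    Report(frozenset({"d1", "d5", "d6"}), frozenset({"a2", "a3", "a7"})),
]
print(support_type(frozenset({"d1"}), frozenset({"a2"}), REPORTS))
```

With these two reports, R1 is explicitly supported, R2 ≡ d1 ⇒ a2 is implicitly supported, and a rule such as d1 ⇒ a1 remains unsupported.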
3.4 Mining Supported Drug-ADR Rule Using Closed Itemset
For a set of frequent itemsets mined from a particular dataset, closed itemsets [6]
are a subset of these regular itemsets that conveys the same amount of information.
By removing some of the redundant itemsets, it compactly represents the regular
itemsets without losing any information. The closed frequent itemset is defined as
below:
Definition. 3.4.1 An itemset S is a closed itemset if there exists no itemset S′
such that (1) S′ is a proper superset of S, and (2) every transaction containing S
also contains S′. A closed itemset S is frequent if its support passes the given
support threshold.
In our study, we find that if the complete itemset (A ∪ B) of a rule R ≡ A ⇒ B
is closed, then R must be a supported drug-ADR association. By complete itemset
we mean the union of all items in the rule. Furthermore, we postulate that for all
type 1 and 2 drug-ADR associations w.r.t. a specific dataset, the complete itemset
must be closed. In the context of finding drug-ADR associations, using a closed
itemset as the complete itemset of a rule not only compresses the ruleset but also
removes any semantically misleading information as explained in the previous section.
Lemma 3.4.2 R ≡ A ⇒ B is a supported drug-ADR association if the itemset
(A ∪ B) is a closed itemset.
Proof For a rule R ≡ (A ⇒ B), if A ∪ B is closed then there does not exist a
proper superset that has the same support as A ∪ B. With zero minimum support,
there are two possibilities causing such non-existence: (1) either no report containing
A ∪ B has more items than A ∪ B, which would mean R must be explicitly supported,
(2) or A ∪ B is not a randomly generated subset, meaning that it must be a subset
of at least two different reports, and therefore R is implicitly supported.
Using Lemma 3.4.2, we only generate and consider the associations with a closed
complete itemset to ensure the quality of the ruleset. Furthermore, since the goal
of this study is to discover ADRs that are associated with a combination of drugs, a
drug-ADR association is evaluated only if it has more than one drug.
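Definition 3.4.1 can be tested with a simple closure check: an itemset that occurs in at least one report is closed exactly when it equals the intersection of all reports that contain it. The Python sketch below illustrates this on the two reports of the running example. The function name is_closed is hypothetical, itemsets that occur in no report are treated as not closed by assumption, and practical closed-itemset miners are far more efficient than this scan.

```python
from typing import FrozenSet, List


def is_closed(itemset: FrozenSet[str], reports: List[FrozenSet[str]]) -> bool:
    """True if no proper superset of itemset occurs in every report containing it."""
    covering = [r for r in reports if itemset <= r]
    if not covering:
        # Assumption: an itemset that never occurs is of no interest here.
        return False
    # The closure of an itemset is the intersection of all reports that contain it.
    closure = frozenset.intersection(*covering)
    return closure == itemset


# Each report is flattened to its complete itemset (drugs and ADRs together).
REPORT_ITEMS = [
    frozenset({"d1", "d2", "a1", "a2"}),
    frozenset({"d1", "d5", "d6", "a2", "a3", "a7"}),
]
for candidate in ({"d1", "d2", "a1", "a2"}, {"d1", "a2"}, {"d1", "a1"}):
    print(sorted(candidate), is_closed(frozenset(candidate), REPORT_ITEMS))
```

In this example the complete itemsets of R1 and R2 are closed, while {d1, a1} is not, because every report containing it also contains d2 and a2.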
3.5 Multi-level Contextual Association Cluster
In this particular study, our goal is to capture the ADRs associated with multiple
drugs. The implication of this definition of interestingness is that an association
is interesting if the ADRs are highly associated only with the complete set of drugs
rather than with any of the individual drugs or subsets of drugs in this rule.
While the existing measures [17] are able to find that some drug combinations
are highly associated with particular ADRs, they fail to verify whether this strong
association is in fact already dominated by a subset of the same drugs. Such a
domination by a subset of the drugs may weaken the drug-drug interaction signal.
Table 3.1: Example of a MCAC
Level        Rule
R (target)   R ≡ [XOLAIR] [SINGULAIR] [PREDNISONE] ⇒ [Asthma]
R̃^2         R̃^2_1 ≡ [XOLAIR] [SINGULAIR] ⇒ [Asthma]
             R̃^2_2 ≡ [XOLAIR] [PREDNISONE] ⇒ [Asthma]
             R̃^2_3 ≡ [SINGULAIR] [PREDNISONE] ⇒ [Asthma]
R̃^1         R̃^1_1 ≡ [XOLAIR] ⇒ [Asthma]
             R̃^1_2 ≡ [SINGULAIR] ⇒ [Asthma]
             R̃^1_3 ≡ [PREDNISONE] ⇒ [Asthma]
For example, if the ADRs are highly associated with an individual drug in the
combination, it means that the ADRs are likely caused by this particular drug instead
of the drug-drug interaction.
To measure this notion of exclusiveness of the association between drugs and
ADRs, any rule that describes the association between a subset of drugs and the
ADRs needs to be considered as well. These related associations are henceforth
referred to as the contextual rules of the association that is being evaluated. In
particular, we now define the contextual rule as below.
Definition. 3.5.1 A drug-ADR association R̃ ≡ X ⇒ Y is a contextual rule of
a drug-ADR association R ≡ A ⇒ B if and only if X ⊂ A and Y ≡ B.
Definition. 3.5.2 The context of a Drug-ADR association R is a set of contextual
rules of R denoted by C ≡ {R̃1,...,R̃n} such that
\[ \bigcup_{i=1}^{n} \tilde{R}_i.\mathrm{antecedent} \equiv \mathcal{P}(R.\mathrm{antecedent}) - \{R.\mathrm{antecedent}, \emptyset\} \]
where P(X) is the power set of an itemset X.
A multi-level contextual association cluster refers to the combination of an
evaluated drug-ADR association and its context. The evaluated association is called
the target rule. The contextual rules are grouped according to the cardinality of
their antecedents. Table 3.1 displays a drug-ADR association and its entire context.
In the example, R̃^k_i denotes a contextual rule and k is the cardinality of its antecedent.
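The grouping shown in Table 3.1 can be generated mechanically from the target rule. The Python sketch below builds the context of a target rule by enumerating every non-empty proper subset of its drugs and grouping the resulting contextual rules by antecedent cardinality. The function name build_context and the dictionary layout are hypothetical choices for this illustration, and the drug list is assumed to contain no duplicates.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A rule is represented as (antecedent drugs, consequent ADRs).
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def build_context(drugs: Sequence[str], adrs: Sequence[str]) -> Dict[int, List[Rule]]:
    """Contextual rules of the target rule drugs => adrs, keyed by level k."""
    consequent = frozenset(adrs)
    return {
        k: [(frozenset(subset), consequent) for subset in combinations(drugs, k)]
        for k in range(1, len(drugs))  # proper, non-empty subsets only
    }


cluster = build_context(["XOLAIR", "SINGULAIR", "PREDNISONE"], ["Asthma"])
for k in sorted(cluster, reverse=True):
    for i, (antecedent, consequent) in enumerate(cluster[k], start=1):
        print(f"level {k}, rule {i}: {sorted(antecedent)} => {sorted(consequent)}")
```

For a target rule with m drugs, the context holds 2^m − 2 contextual rules, which gives the six rules of Table 3.1 for m = 3.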
3.6 Exclusiveness Score for Drug-Drug Interaction Signal
As we explained, if ADRs are caused by the interaction of a set of drugs, normally
any proper subset of these drugs is not, or only weakly, associated with the
particular ADRs. Inspired by this observation, we propose the exclusiveness measure,
which uses the context information to evaluate the interestingness of a drug-ADR
association in terms of indicating a drug-drug interaction.
The improvement measure proposed by [19] to evaluate the interestingness of the
“lengthy” rules derived from a dense dataset is defined in Formula 3.2.
\[ \mathrm{Improvement}(A \Rightarrow B) = \min\{\mathrm{conf}(A \Rightarrow B) - \mathrm{conf}(A_s \Rightarrow B) \mid A_s \subset A\} \tag{3.2} \]
A rule with negative improvement is typically undesirable because the rule can
be simplified to yield a proper sub-rule that is more predictive. Also it applies to
an equal or larger population due to the antecedent containment relationship. The
notion of “sub-rule” relates to our notion of contextual association. If a drug-ADR
association is evaluated using improvement, negative improvement means that there
exists an individual drug or a subset of drugs in the rule which is more likely to cause
the ADRs. Therefore, rules with a negative or low improvement value are not
interesting. The improvement measure captures a notion of interestingness similar
to ours. However, it only considers the single most significant sub-rule. Ignoring
the other sub-rules removes the opportunity to differentiate among several
interesting cases. For example, even if two drug-ADR associations share the same
improvement value, the one with a larger number of high-confidence sub-rules
may be less interesting than the other, because more subsets of its drugs seem
to cause the same ADRs. To utilize the entire context when evaluating a drug-ADR
association, we propose the exclusiveness measure.
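The limitation of improvement discussed above can be illustrated with a small sketch of Formula 3.2. It assumes that the sub-rules range over the proper non-empty subsets of the antecedent (matching the context of Definition 3.5.2) and that the antecedent has at least two drugs. The confidence values are hypothetical.

```python
from itertools import combinations


def improvement(antecedent, conf):
    """Formula 3.2: smallest confidence gain of the full rule over any
    sub-rule. `conf` maps an antecedent (frozenset) to the confidence of
    the rule antecedent => B for a fixed consequent B."""
    drugs = sorted(antecedent)
    target_conf = conf[frozenset(drugs)]
    gains = [
        target_conf - conf[frozenset(sub)]
        for k in range(1, len(drugs))  # proper, non-empty subsets only
        for sub in combinations(drugs, k)
    ]
    return min(gains)


# Hypothetical confidences for two rules with the same consequent.
rule_1 = {frozenset({"A", "B"}): 0.9, frozenset({"A"}): 0.5, frozenset({"B"}): 0.1}
rule_2 = {frozenset({"A", "B"}): 0.9, frozenset({"A"}): 0.5, frozenset({"B"}): 0.5}

# Both rules have the same improvement although rule_2 has two
# high-confidence sub-rules and rule_1 only one.
print(improvement({"A", "B"}, rule_1), improvement({"A", "B"}, rule_2))
```

Because only the minimum gain is kept, the two rules cannot be told apart, which is the motivation for a measure that looks at the whole context.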
Let V = {v1, ..., vn} be a set of confidence values w.r.t the context of a rule R
and p be the confidence value of R. To utilize the full context, our proposed score in
Formula 3.3 uses the average confidence of the context to compute the exclusiveness.
$$\mathrm{Exclusiveness}(R) = p - \frac{1}{n}\sum_{k=1}^{n} v_k \tag{3.3}$$
The deficiency of this method is that it understates the negative effect of a
contextual rule with high confidence, because the average confidence can be much
lower than the maximum confidence. To overcome this, we introduce the coefficient
of variation C_v into the measure. Computed as in Formula 3.4, a context containing
both extremely high and extremely low confidence values is penalized. θ (0 ≤ θ ≤ 1)
is a parameter that allows the user to control the effect of this penalty.
$$\mathrm{Exclusiveness}(R) = \left(p - \frac{1}{n}\sum_{k=1}^{n} v_k\right) \times \left(1 - \theta \cdot C_v(v_k)\right) \tag{3.4}$$
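The effect of the penalty in Formula 3.4 can be seen in a short sketch. It assumes that C_v is the population standard deviation divided by the mean of the contextual confidences (the text does not say whether the population or the sample estimate is used) and that C_v is taken as 0 when the mean is 0. Function names and values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumption)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def exclusiveness_avg(p, context_conf):
    """Formula 3.3: target confidence minus average contextual confidence."""
    return p - mean(context_conf)


def exclusiveness_cv(p, context_conf, theta=0.5):
    """Formula 3.4: Formula 3.3 scaled by the variation penalty."""
    penalty = 1 - theta * coefficient_of_variation(context_conf)
    return exclusiveness_avg(p, context_conf) * penalty


p = 0.9                       # hypothetical confidence of the target rule
uniform = [0.3, 0.3, 0.3]     # no contextual rule stands out
skewed = [0.8, 0.05, 0.05]    # one contextual rule is almost as strong as the target

# Both contexts have the same average, so Formula 3.3 cannot separate them;
# Formula 3.4 lowers the score of the skewed context.
print(exclusiveness_avg(p, uniform), exclusiveness_avg(p, skewed))
print(exclusiveness_cv(p, uniform), exclusiveness_cv(p, skewed))
```

Note that C_v can exceed 1 for very skewed contexts, so with θ close to 1 the penalty factor may become negative under this reading of the formula.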
Intuitively, contextual rules that describe the association between an
individual drug and the ADRs are very important for measuring the exclusiveness of
the association between the complete set of drugs and the ADRs. As the number of
drugs in a contextual rule increases, the importance of the corresponding
exclusiveness decreases. For this reason, we introduce a decay function f_d(k) that
decreases the importance of the contextual rules as the cardinality of their
antecedent increases.
$$\mathrm{Exclusiveness}(R) = \frac{1}{|V|}\sum_{k=1}^{|V|} (p - \bar{v}_k) \times f_d(k) \times \left(1 - \theta \cdot C_v(v_k)\right) \tag{3.5}$$
Here v_k is the set of confidence values of the contextual rules with k drugs,
v̄_k is the average of v_k, and V denotes the complete set of v_k for R. In our
experiments, we use a linear decay function: if n is the number of drugs in an
association, the weight for the average exclusiveness over the contextual rules
R̃^k is (1 − (k − 1)/n). As mentioned in [19], the confidence in this computation
could be replaced by other reasonable measures. For instance, we also experiment
with lift in our evaluation study.
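Putting the pieces together, the following sketch computes Formula 3.5 with the linear decay described above. As before, it assumes that C_v is the population standard deviation divided by the mean of each level; the function names, the default θ and all confidence values are hypothetical.

```python
from statistics import mean, pstdev


def linear_decay(k, n_drugs):
    """Weight of contextual rules with k drugs for a target rule with n drugs."""
    return 1 - (k - 1) / n_drugs


def exclusiveness(p, levels, n_drugs, theta=0.5):
    """Formula 3.5. `levels` maps cardinality k to the list v_k of
    confidences of the contextual rules with k drugs."""
    total = 0.0
    for k, v_k in levels.items():
        avg = mean(v_k)
        cv = pstdev(v_k) / avg if avg > 0 else 0.0  # assumed definition of C_v
        total += (p - avg) * linear_decay(k, n_drugs) * (1 - theta * cv)
    return total / len(levels)  # |V| = number of cardinality levels


# Hypothetical three-drug target rule with confidence 0.9.
weak_singles = {1: [0.1, 0.2, 0.3], 2: [0.4, 0.4, 0.4]}
strong_singles = {1: [0.8, 0.8, 0.8], 2: [0.4, 0.4, 0.4]}

print(exclusiveness(0.9, weak_singles, n_drugs=3))
print(exclusiveness(0.9, strong_singles, n_drugs=3))
```

In this toy example the cluster whose individual drugs already show high confidence receives the lower score, and because of the decay the single-drug level contributes more than the two-drug level.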
Chapter 4
Visualizing Drug-ADR Association
Glyph [7] is a popular technique to visualize multivariate data. A glyph is a display
object that consists of various attributes including shape, size, color and position.
Each of these attributes can be used to describe different variables of the displayed
data. Glyphs have been used before to visualize association rules [15]. However,
existing approaches only consider a single rule at a time. They do not work on a
cluster of rules that is composed of sub-rules as explained in Section 3.3.
Therefore, to provide a flexible methodology that helps analysts quickly comprehend
similarities and differences among various multi-level association clusters, we
propose the Contextual Glyph (CG) depicted in Fig. 4.1. Our user study
(Section 5.4.1) indicated that CGs are a better way to pinpoint interesting
drug interactions than bar-charts.
The inner circle represents the target rule. The diameter of the circle encodes
the confidence value of the target rule. The surrounding circular sectors represent
the contextual rules. For each circular sector, the distance from the arc to the inner
circle encodes the confidence value of its represented rule. Starting from 12 o’clock,
contextual rules are uniformly laid out ordered by the cardinality of their antecedent.
Rules with the same cardinality, identified by the same color (the darker the larger),
are ordered by their confidence values.
Figure 4.1: A Contextual Glyph. [Labels: Confidence of the Evaluated Rule; Confidence of the Contextual Rule; # of Drugs (1, 2, 3); Ordered by Confidence value]
In our contextual glyph, the larger the inner circle and the smaller the outer
sectors are, the higher the rank of the group is, indicating a strong association
between the ADRs and the drug combination. The interactive interface provides
options for drilling down into each glyph. Clicking on any glyph shows a zoomed
view that gives further insights about the rule cluster, and hovering over any
segment displays further information about the corresponding rule. These contextual
glyphs provide a flexible way of comparing interesting and non-interesting drug
interactions. An analyst can easily identify similarly ranked interactions and get
further insights by drilling down.
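The visual encoding described above can be sketched as a small layout computation. This is only an illustration of the encoding, not the MARAS implementation: the function name, the `scale` parameter and the choice to order rules of equal cardinality by descending confidence are assumptions.

```python
def glyph_layout(target_conf, contextual_rules, scale=100.0):
    """Compute the geometry of a contextual glyph.

    contextual_rules: list of (cardinality, confidence) pairs.
    Returns the inner-circle radius and one sector per contextual rule,
    with angles in degrees measured clockwise from 12 o'clock.
    """
    inner_radius = scale * target_conf / 2  # diameter encodes target confidence
    if not contextual_rules:
        return inner_radius, []
    # Order by cardinality, then by confidence (descending is an assumption).
    ordered = sorted(contextual_rules, key=lambda r: (r[0], -r[1]))
    step = 360.0 / len(ordered)  # uniform layout around the circle
    sectors = []
    for i, (cardinality, conf) in enumerate(ordered):
        sectors.append({
            "cardinality": cardinality,  # drives the sector color
            "start_deg": i * step,
            "end_deg": (i + 1) * step,
            "inner_r": inner_radius,
            "outer_r": inner_radius + scale * conf,  # arc distance encodes confidence
        })
    return inner_radius, sectors


# Hypothetical cluster of a three-drug rule with confidence 0.9.
rules = [(2, 0.4), (1, 0.1), (1, 0.3), (2, 0.2), (1, 0.2), (2, 0.3)]
radius, sectors = glyph_layout(0.9, rules)
for s in sectors:
    print(s)
```

A glyph with a large inner radius and short sectors then corresponds to a highly ranked cluster.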
4.1 MARAS Interface
MARAS provides a visual interactive interface (Fig. 4.2) with the following
functionalities to help an analyst explore drug-related information:
Figure 4.2: Panoramagram of Glyphs
Highlighting interesting drug-drug interactions: The system lets a user
select a few interesting drug-drug interactions from the pool of thousands
of rules by searching for a specific drug, a combination of drugs or a specific ADR.
Moreover, the user can also select drug interactions based on defined criteria
of interestingness, such as drug interactions that may lead to severe ADRs and
might therefore need immediate further investigation.
Visualization of ADRs associated with these interactions: Once a user
selects the interesting drug-drug interactions, they can get further insights about
an interaction by visualizing the corresponding multi-level contextual group of drugs
and ADRs, i.e., the association of each individual drug with the ADRs (Fig. 4.1).
Furthermore, the system can highlight drug-drug interactions that are similar to
each other based on the defined interestingness criteria.
Figure 4.3: Zoom-in Glyph View
Mapping the drug-drug interactions to actual reports: After the system
generates the plausible drug-drug interactions, they need to be further investigated
in order to find the relevant factors causing the interaction, such as the patient’s
age, health history, etc. It is therefore essential to analyze the original reports
submitted by patients that support the corresponding drug-drug interactions.
Chapter 5
Experimental Evaluation
5.1 Data Source
The FDA Adverse Event Reporting System (FAERS) is a database maintained
by the FDA as a part of its post-marketing safety surveillance program for drugs and
therapeutic biologic products. FAERS contains millions of records about adverse
events and medication errors and is publicly released on a quarterly basis. For the
purpose of this research we used the public version of the FAERS [3] dataset from
2014. We selected the mandatory reports submitted by manufacturers marked as
expedited (EXP), as these reports contain at least one severe adverse event. Table
5.1 provides the basic statistics of the dataset we selected from each quarter in 2014.
Table 5.1: FAERS Data From 2014
          Q1        Q2        Q3        Q4
Reports   126,755   138,278   121,725   121,490
Drugs     37,661    37,780    33,133    32,721
ADRs      9,079     9,324     9,418     9,234
5.2 Mining Process
The first step in the mining process is data preparation and cleaning. We extracted
the drugs and ADRs from FAERS reports and merged them for each single case. We
performed some preliminary cleaning on drug names and ADRs to remove duplicates
and correct misspellings. The second step is to apply association rule mining
to the pre-processed data. We use FP-Growth trees for closed itemset and rule
generation. The reason for using closed itemsets is to remove misleading rules as
described in Section 3.4.
The third step is to select only those rules that have drugs as antecedent and
ADRs as consequent, in line with our goal of discovering ADRs related to drugs.
The fourth step is to generate multi-level contextual association clusters and rank
them using the exclusiveness measure, as explained in Sections 3.5 and 3.6
respectively.
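The third step can be sketched as a simple filter over mined rules. The sketch assumes that each rule is available as a pair of item sets and that the vocabularies of drug names and ADR terms are known; the function name and the sample rules are hypothetical.

```python
def select_drug_adr_rules(rules, drugs, adrs):
    """Keep only rules whose antecedent contains drugs exclusively and
    whose consequent contains ADRs exclusively."""
    return [
        (antecedent, consequent)
        for antecedent, consequent in rules
        if antecedent and consequent
        and antecedent <= drugs      # every antecedent item is a drug
        and consequent <= adrs       # every consequent item is an ADR
    ]


# Hypothetical vocabularies and mined rules.
drugs = frozenset({"Prevacid", "Nexium"})
adrs = frozenset({"Osteoporosis", "Pain"})
rules = [
    (frozenset({"Prevacid", "Nexium"}), frozenset({"Osteoporosis"})),  # kept
    (frozenset({"Osteoporosis"}), frozenset({"Nexium"})),              # ADR => drug, dropped
    (frozenset({"Prevacid", "Pain"}), frozenset({"Osteoporosis"})),    # mixed antecedent, dropped
]
kept = select_drug_adr_rules(rules, drugs, adrs)
print(kept)
```

Only rules of the form drugs ⇒ ADRs survive this step and are passed on to cluster generation and ranking.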
5.3 Result At-A-Glance
Fig. 5.1 summarizes the number of associations generated by different methods.
Total rules refers to the associations generated by the traditional association rule
mining algorithm. Filtered rules refers to all the possible drug-ADR associations.
MCACs refers to the closed drug-ADR associations that are used to signal the
drug-drug interactions. As depicted in the figure, our proposed method significantly
reduces the rule space by removing the redundant and misleading associations.
Figure 5.1: Reduction in number of rules. [y-axis ticks from 1.E+00 to 1.E+07 in powers of ten; x-axis: Q1, Q2, Q3, Q4; series: Total Rules, Filtered Rules, MCACs]
Table 5.2 shows the top 5 multi-drug associations from 2014 Quarter 1 data ranked
by 4 different methods. The Confidence and Lift columns are the multi-drug
associations ordered by their confidence and lift values. These two methods do not
filter the rules using closed itemsets. As a result, there are many similar rules.
Based on our observation explained in Section 3.2, most of these similar rules are
redundant and misleading. Among these top ranked rules, some patterns are
uninteresting because an individual drug listed in the pattern is able to trigger the
same ADRs. Exclusiveness with Confidence and Exclusiveness with Lift select the
closed multi-drug associations and rank them using our proposed exclusiveness
measure with confidence and lift values respectively. The top ranked rules are more
diverse compared to the first two methods. Since lift considers the support of the
consequent, in our case, rules with relatively less frequent ADRs are ranked higher
by the methods that involve lift. To evaluate the quality of these results, we
conduct a case study.
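The remark about lift favoring less frequent ADRs can be checked on a toy example using the standard definitions of support, confidence and lift. The reports and item names below are hypothetical.

```python
def support(itemset, reports):
    """Fraction of reports that contain every item of `itemset`."""
    return sum(1 for r in reports if itemset <= r) / len(reports)


def confidence(antecedent, consequent, reports):
    return support(antecedent | consequent, reports) / support(antecedent, reports)


def lift(antecedent, consequent, reports):
    # Lift divides confidence by the support of the consequent,
    # so rarer consequents yield higher lift at equal confidence.
    return confidence(antecedent, consequent, reports) / support(consequent, reports)


# Hypothetical reports: one rare ADR and one common ADR.
reports = [
    frozenset({"D1", "D2", "RareADR"}),
    frozenset({"D3", "D4", "CommonADR"}),
    frozenset({"D5", "CommonADR"}),
    frozenset({"D6", "CommonADR"}),
]
rare = (frozenset({"D1", "D2"}), frozenset({"RareADR"}))
common = (frozenset({"D3", "D4"}), frozenset({"CommonADR"}))

print(confidence(*rare, reports), confidence(*common, reports))
print(lift(*rare, reports), lift(*common, reports))
```

Both rules have the same confidence, but the rule with the rare ADR has the higher lift, so lift-based rankings move it ahead.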
5.4 Case Study
MARAS identifies potential drug-drug interactions from the data and ranks the
interesting drug-drug interactions based on the exclusiveness measure explained in
Section 3.6. Since the exclusiveness measure is based on contextual information
that is rather subjective in nature, it is difficult to conduct a straightforward
experimental evaluation of the system's effectiveness. Hence, to evaluate the
MARAS system and its effectiveness, we conduct case studies using FAERS patient
reports. The goal is to validate the top ranked drug-drug interactions identified by
MARAS through existing biomedical literature and domain knowledge resources.
Table 5.2: Top 5 Multi-Drug Associations from 1st Quarter of 2014.
Pepcid Host Disease Acute Graft Versus Host Disease
Rank Confidence Lift Exclusiveness with Confidence Exclusiveness with Lift
1
Zantac
Osteoporosis
Methotrexate Chronic Graft Versus Host Disease Zometa Osteonecrosis Of
Jaw Prograf Chronic Graft Versus Host Disease Nexium Prograf
Tums Granulocyte Colony-Stimulating Factor Nos Drug Ineffective Ambien Anxiety Melphalan Drug Ineffective
Mylanta
2
Zantac
Osteoporosis
Prograf Chronic Graft Versus Host Disease Prograf
Drug Ineffective
Zometa Osteoarthritis
Tums Granulocyte Colony-
Stimulating Factor Nos Drug Ineffective Melphalan Prilosec Neuropathy Peripheral
Mylanta Osteonecrosis Of Jaw
Pain
3
Zantac
Osteoporosis
Methotrexate Chronic Graft Versus Host Disease Prograf
Drug Ineffective
Zometa Osteoarthritis
Nexium Granulocyte Colony-
Stimulating Factor Nos Drug Ineffective Granulocyte
Colony-Stimulating Factor Nos
Prilosec Anaemia
Mylanta Osteonecrosis Of Jaw
Pain
4 Zantac
Osteoporosis Prograf Chronic Graft Versus
Host Disease ZometaPain Osteonecrosis Of Jaw Prograf Drug Ineffective
Mylanta Melphalan Drug Ineffective Prilosec Pain Melphalan Acute Graft Versus Host Disease
5
Rolaids
Osteoporosis Zometa
Osteoarthritis Fludarabine
Drug Ineffective Prograf
Acute Graft Versus Host
Disease Zantac Neuropathy Peripheral
Nexium Prilosec
Osteonecrosis Of Jaw Prograf Granulocyte
Colony-Stimulating Drug Ineffective Tums Pain
Case I: Ibuprofen and Metamizole drug-drug interaction.
One of the top ranked drug-drug interactions identified by MARAS is Ibuprofen,
a nonsteroidal anti-inflammatory drug, and Metamizole, an analgesic, antipyretic and
anti-inflammatory agent. This interaction was identified by mining the FAERS 2014
second quarter reports and is ranked third by the MARAS system. We found that
the combination of these two drugs is highly associated with acute renal failure.
We validated this drug-drug interaction against the results of a study published in
the World Health Organization (WHO) Pharmaceuticals Newsletter, 2014 [22]. The
study was conducted on data from VigiBase [20], the WHO Global ICSR Database,
and found a statistically significant and valid drug interaction when the two
drugs are used in combination. Incidentally, Metamizole is an international drug
that is used as a pain killer and fever reducer in Mexico and many other countries.
Although it is prohibited in the United States, it is widely used among the Latino
and other immigrant populations [8].
Case II: Methotrexate and Prograf drug-drug interaction.
Our results indicate that a combination of Methotrexate and Prograf (Tacrolimus)
is associated with the drug being ineffective. Methotrexate is used to treat certain
types of cancers, rheumatoid arthritis and psoriasis. Prograf (Tacrolimus) is used in
post organ transplantation to prevent rejection by the human body. This interaction
is ranked second by MARAS. We validated this drug-drug interaction using two
sources: Drugs.com [2], an FDA-recommended resource for obtaining valuable and
reliable information on drug-drug interactions, and DrugBank.ca [11], a drug database
that contains comprehensive biochemical and pharmacological information providing
insights on drug-drug interactions. According to Drugs.com, Methotrexate may
cause kidney problems, and combining it with other medications that can also affect
the kidney such as Tacrolimus may increase that risk. According to DrugBank, the
risk or severity of adverse effects can be increased when Tacrolimus is combined with
Methotrexate. Therefore, when two drugs that have similar adverse reactions are
taken concomitantly, their adverse effects might add up and contribute even more
so towards the occurrence of the ADRs.
Case III: Prevacid and Nexium drug-drug interaction. Our results indicate
that a concomitant use of Prevacid and Nexium is associated with Osteoporosis.
This interaction is ranked fourth by the proposed system. Both Prevacid and Nex-
ium belong to a group of drugs called proton pump inhibitors (PPI) that are used to
treat gastroesophageal reflux disease (GERD) by suppressing the secretion of gastric
acid. Several studies [13, 25] have shown that patients taking PPI drugs are at an
increased risk of developing osteoporosis and related bone fractures. The FDA revised
the original drug labels for these PPI drugs to include safety information warning
of possible osteoporosis and fractures [4]. We validated this drug-drug interaction
using Drugs.com [2]. According to Drugs.com, the interaction is classified as a
Therapeutic Duplication, meaning that drugs from the same category are used to
treat the same condition, and the recommended maximum number of drugs in the
'acid suppressant agents' category to be taken concurrently is usually one. Such
duplication is either intentional (drugs combined for therapeutic benefit) or
unintentional (the patient self-prescribed, was treated by more than one doctor, or
had prescriptions filled at more than one pharmacy). Either way, combining these
drugs can potentially increase the risk of osteoporosis compared to when these
drugs are taken individually.
The above three case studies show that MARAS can detect and identify
ADRs associated with a combination of drugs. In the case studies, for validation
purposes, we intentionally selected already known and published drug-drug
interactions from our top-ranked results. We argue, however, that if MARAS can
detect known drug-drug interactions, it should also be capable of detecting
unlabeled or unknown drug-drug interactions.
5.4.1 User Study
We conducted a user study to evaluate the most effective way of presenting
contextual groups (MCACs) visually, either in the form of Contextual Glyphs (CG)
or bar-charts, displayed in Fig. 4.1 and Fig. 5.3 respectively. We invited 50
students from WPI to identify interesting drug-drug interactions containing two to
four drugs, using both bar-charts and CGs. For each question the user was given two
visuals, one bar-chart and one CG, both representing the same drug interactions,
and was asked to pick the interesting drug interactions. The details of the
questions are stated in Appendix A. Fig. 5.2 shows the percentage of users who were
able to quickly recognize an interesting pattern correctly using both visuals. Users
identified top-ranked interesting drug interactions faster with CGs than with
bar-charts. Using a CG, 71% of users were able to pinpoint the interesting
interactions for two-drug combinations, 57% for three-drug and 86% for four-drug
combinations. This indicates that CGs save users effort and time in pinpointing the
interesting drug-drug interactions. Therefore, we selected contextual glyphs to
represent the drug-drug interactions in our MARAS system.
Figure 5.2: User study results. [y-axis: Percentage (%), 0 to 100; x-axis: Number of Drugs (Two, Three, Four); series: Contextual Glyph, Barchart]
Figure 5.3: Bar-chart representing MCAC
Chapter 6
Related Work
Drug-drug interactions: Drug interactions leading to ADRs have been studied
previously. For example, Tatonetti et al. [26, 27, 28] used statistical methods
to find interactions among drug classes. However, these methods are designed for
a subset of drugs and ADRs only. Hence, they do not consider all reported drugs
and ADRs, which is crucial for drug surveillance. On the other hand, unsupervised
methods, and in particular association rule mining, have been used in the medical
domain to explore drug-related ADRs [12, 18, 16]. However, these methods have
only considered the identification of ADRs related to a single drug rather than a
combination of drugs, even though detecting drug-drug interactions is crucial, as
about 30% of adverse reactions occur due to such interactions.
Drug-drug interaction with association rule mining: Harpaz et al. [17] used
association rule mining with the relative reporting ratio to find drug interactions
triggering a set of ADRs. However, this approach neither removes spurious and
misleading rules nor highlights interesting drug-drug interactions based on
contextual information.
Interestingness in association rule mining: Various attempts have been
made in the literature to reduce the number of generated rules and rank the most
interesting ones [23, 30, 6]. However, the majority of these measures are either for
classification rules or are subjective measures that need domain-specific knowledge
to define interestingness. Sub-rule based interestingness has been studied by [10],
where interestingness is defined as an unexpected confidence among a neighborhood.
Interestingness based on the confidence of sub-rules, known as improvement,
has been proposed by [19] to ensure that for every rule none of its simplifications
offers any predictive advantage over it. None of these existing methods captures the
most interesting associations among multiple drugs and ADRs.
Chapter 7
Conclusion
We proposed the MARAS technology for detecting drug-drug interactions.
We defined the criteria of interestingness in the context of multi-drug adverse drug
reaction associations. Our visual mining technology helps an evaluator explore and
analyze these interactions in further detail. MARAS can effectively identify
drug-drug interactions while providing a new exploration experience.
Appendix A
User Study
A.1 Sample of interesting and non-interesting groups
First, samples of top ranked and bottom ranked drug interactions with both
contextual glyphs and bar-charts were shown to users (Fig. A.1, A.2, A.3).
A.2 User Study Questions
Question 1: The user had to select the top ranked (interesting) two-drug interaction
from Fig. A.4 and A.5 using bar-charts and contextual glyphs (CG).
Question 2: The user had to select the top three ranked (interesting) two-drug
interactions from Fig. A.6 and A.7 using bar-charts and contextual glyphs (CG).
Question 3: The user had to select the top ranked (interesting) three-drug interaction
from Fig. A.8 and A.9 using bar-charts and contextual glyphs (CG).
Question 4: The user had to select the top two ranked (interesting) three-drug
interactions from Fig. A.10 and A.11 using bar-charts and contextual glyphs (CG).
Question 5: The user had to select the top ranked (interesting) four-drug interaction
from Fig. A.12 and A.13 using bar-charts and contextual glyphs (CG).
Figure A.1: Sample of two drugs
Figure A.2: Sample of three drugs
Figure A.3: Sample of four drugs
Figure A.4: Question 1 with barchart
Figure A.5: Question 1 with CG
Figure A.6: Question 2 with barchart
Figure A.7: Question 2 with CG
Figure A.8: Question 3 with barchart
Figure A.9: Question 3 with CG
Figure A.10: Question 4 with barchart
Figure A.11: Question 4 with CG
Figure A.12: Question 5 with barchart
Figure A.13: Question 5 with CG
References
[1] Adverse drug reactions. http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DevelopmentResources/DrugInteractionsLabeling/ucm110632.htm. [Accessed 2016-03-21].
[2] Drugs.com. http://www.drugs.com. [Accessed 2016-04-20].
[3] Openfda. https://open.fda.gov/drug/event. [Accessed 2016-04-20].
[4] FDA. http://www.fda.gov/Drugs/DrugSafety/PostmarketDrugSafetyInformationforPatientsandProviders/ucm213206.htm. [Accessed 2016-04-20].
[5] FDA adverse event reporting system (FAERS). http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm. [Accessed 2016-03-11].
[6] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Computational Logic - CL 2000, pages 972–986. Springer, 2000.
[7] J. Beddow. Shape coding of multidimensional data on a microcomputer display. In IEEE Visualization, pages 238–246, 1990.
[8] J. L. Bonkowsky, J. K. Frazer, K. F. Buchi, and C. L. Byington. Metamizole use by latino immigrants: a common and potentially harmful home remedy. Pediatrics, 109(6):e98–e98, 2002.
[9] T. Chan. Adverse interactions between warfarin and nonsteroidal antiinflammatory drugs: mechanisms, clinical significance, and avoidance. The Annals of pharmacotherapy, 29(12):1274–1283, 1995.
[10] G. Dong and J. Li. Interestingness of discovered association rules in terms of neighborhood-based unexpectedness. In PAKDD, pages 72–86, 1998.
[11] V. L. et al. Drugbank 4.0: shedding new light on drug metabolism. Nucleic Acids Research, 42(Database-Issue):1091–1097, 2014.
[12] D. M. Fram, J. S. Almenoff, and W. DuMouchel. Empirical bayesian data mining for discovering patterns in post-marketing drug safety. In ACM SIGKDD, pages 359–368, 2003.
[13] L. Fraser, W. Leslie, L. Targownik, A. Papaioannou, J. Adachi, C. R. Group, et al. The effect of proton pump inhibitors on fracture risk: report from the canadian multicenter osteoporosis study. Osteoporosis International, 24(4):1161–1168, 2013.
[14] C. D. Furberg and B. Pitt. Withdrawal of cerivastatin from the world market. Curr Control Trials Cardiovasc Med, 2(5):205–207, 2001.
[15] E. A. Gale. Troglitazone: the lesson that nobody learned? Diabetologia, 49(1):1–6, 2006.
[16] M. R. Hacene, Y. Toussaint, and P. Valtchev. Mining safety signals in spontaneous reports database using concept analysis. In Artificial Intelligence in Medicine, pages 285–294, 2009.
[17] R. Harpaz, H. S. Chase, and C. Friedman. Mining multi-item drug adverse effect associations in spontaneous reporting systems. BMC Bioinformatics, 11(S-9):S7, 2010.
[18] H. Jin, J. Chen, H. He, G. J. Williams, C. Kelman, and C. M. O’Keefe. Mining unexpected temporal associations: Applications in detecting adverse drug reactions. IEEE Trans. Information Technology in Biomedicine, 12(4):488–500, 2008.
[19] R. J. B. Jr., R. Agrawal, and D. Gunopulos. Constraint-based rule mining in large, dense databases. In IEEE ICDE, pages 188–197, 1999.
[20] M. Lindquist. Vigibase, the who global icsr database system: basic facts. Drug Information Journal, 42(5):409–419, 2008.
[21] M. Liu, M. E. Matheny, Y. Hu, and H. Xu. Data mining methodologies for pharmacovigilance. ACM SIGKDD Explorations, 14(1):35–42, 2012.
[22] W. H. Organization et al. WHO pharmaceuticals newsletter. WHO Collaborating Centre for International Drug Monitoring, 2014.
[23] S. Sahar. Interestingness measures - on determining what is interesting. In Data Mining and Knowledge Discovery Handbook, 2nd ed., pages 603–612. 2010.
[24] R. SoRelle. Withdrawal of posicor from market. Circulation, 98(9):831–832, 1998.
[25] L. E. Targownik, L. M. Lix, C. J. Metge, H. J. Prior, S. Leung, and W. D. Leslie. Use of proton pump inhibitors and risk of osteoporosis-related fractures. Canadian Medical Association Journal, 179(4):319–326, 2008.
[26] N. P. Tatonetti, J. Denny, S. Murphy, G. Fernald, G. Krishnan, V. Castro, P. Yue, P. Tsau, I. Kohane, D. Roden, et al. Detecting drug interactions from adverse-event reports: interaction between paroxetine and pravastatin increases blood glucose levels. Clinical pharmacology and therapeutics, 90(1):133, 2011.
[27] N. P. Tatonetti, G. H. Fernald, and R. B. Altman. A novel signal detection algorithm for identifying hidden drug-drug interactions in adverse event reports. Journal of the American Medical Informatics Association, 19(1):79–85, 2012.
[28] N. P. Tatonetti, P. Y. Patrick, R. Daneshjou, and R. B. Altman. Data-driven prediction of drug effects and interactions. Science translational medicine, 4(125):125ra31–125ra31, 2012.
[29] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2005.
[30] M. J. Zaki. Generating non-redundant association rules. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 34–43. ACM, 2000.