Data Mining Methodologies for Pharmacovigilance
ABDELFATTAH AL ZAQQA
SCHOOL OF COMPUTER SCIENCE
PRINCESS SUMAYA UNIVERSITY FOR TECHNOLOGY
Abdelfattah Al Zaqqa, PSUT-Amman-Jordan
Agenda
Introduction
Examples
Some facts about ADRs and drugs
Pharmacovigilance
PhV methodologies
Data mining
Computational methodology-Pre-Marketing
Computational methodology-Post Marketing
Future perspectives
Introduction
Medicine is the applied science or practice of the diagnosis, treatment, and prevention of disease.
Most medicines have both good and bad effects.
Bad effects are called Adverse Drug Reactions (ADRs); they differ from side effects.
Side effects may be either therapeutic or adverse.
ADRs cause over 700,000 emergency department visits each year in the United States.
Example of ADRs and side effects
Desired and undesired effects of aspirin therapy:
Reduces headache or fever
Reduces the ability of the blood to clot
× Bleeding of the intestine
Facts
Developing a new drug may take 10 years and billions of dollars.
ADRs may lead to drug withdrawals.
Drug interactions may also increase the risk of ADRs.
ADRs may cause over 100,000 deaths among hospitalized patients each year.
ADRs are the fourth largest cause of death in the US.
The annual cost of ADRs in the US is $136 billion.
Pharmacovigilance (PhV)
Pharmacovigilance (PhV) is the science concerned with the detection, assessment, understanding, and prevention of ADRs.
Pharmacovigilance (PhV) = drug safety surveillance.
Surveillance covers both premarketing (i.e., data from preclinical and clinical trials) and post-marketing (i.e., throughout a drug's market life).
The trend in PhV is to link preclinical human safety information with information from post-marketing.
PhV methodologies
PhV has historically relied on biological experiments or manual review of case reports.
In vitro Safety Pharmacology Profiling (SPP) is one of the fundamental preclinical methods; it tests compounds with biochemical and cellular assays.
SPP is still not efficient in cost or time.
Computational methodologies for PhV
The data to be analyzed are vast in quantity and complexity.
Computational methods at both the pre-marketing and post-marketing stages are more efficient in time and cost (i.e., they can accurately detect ADRs in a timely fashion).
SPP is still not efficient (cost and time).
Datasets are available.
The EMA and NCA are examples of specialized organizations that maintain and develop databases of ADRs.
What is data mining?
Data mining is the process of extracting previously unknown, valid, and actionable information from large information sources or databases.
So what do we need for this process?
Project goals: detection and prevention of ADRs
Dataset acquisition: datasets are available
Data cleaning and preprocessing: organize the raw data obtained
Data mining: extract useful information
Data interpretation: analyze the results
Utilization: put the findings to use
Computational methodology-Pre-Marketing
Most existing research is devoted to developing computational methods.
This research can be categorized into:
I. Protein target-based approaches.
II. Chemical structure-based approaches.
III. Integrative approaches.
Computational methodology-Pre-Marketing-Protein target-based
Drugs typically work by activating or inhibiting the function of a protein, which in turn results in therapeutic benefits to a patient.
Fliri et al. observed that drugs with similar in vitro protein binding profiles tend to have similar side effects.
Fukuzaki et al. proposed a method to predict ADRs using "cooperative pathways" (sub-pathways that function together).
They developed an algorithm called CoopeRativE Pathway Enumerator (CREPE) to select combinations of sub-pathways.
CREPE depends on the availability of gene-expression data observed under identical conditions.
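The observation attributed to Fliri et al. on this slide presupposes some way of scoring how similar two binding profiles are. The slides do not say which measure was used; the sketch below assumes a Tanimoto (Jaccard) score over sets of bound targets, with made-up target names, purely as an illustration.

```python
def tanimoto(profile_a, profile_b):
    """Tanimoto (Jaccard) similarity between two sets of bound protein targets."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        # Convention chosen for this sketch: two empty profiles are not called similar.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical binding profiles; the target names are placeholders.
drug_x = {"P1", "P2", "P3"}
drug_y = {"P2", "P3", "P4"}
print(tanimoto(drug_x, drug_y))  # 2 shared targets out of 4 distinct ones: 0.5
```

Drug pairs with a high score would then be candidates for sharing side effects, which is the hypothesis the slide describes.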
CoopeRativE Pathway Enumerator (CREPE)
V: vertex; I: itemset (activation conditions)
Computational methodology-Pre-Marketing-Protein target-based
More recently, Brouwers et al. proposed that the side-effect similarity of drugs could be attributed to their target proteins being close in a molecular network.
They proposed a pathway neighborhood measure to assess the closest distance of drug pairs according to their target proteins in the human protein-protein interaction network, and found that network neighborhoods account for only 5.8% of the side-effect similarities, compared to 64% for shared drug targets.
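To make "closest distance of drug pairs according to their target proteins" concrete, here is a minimal sketch that assumes the measure is the shortest path, in interaction edges, between any target of one drug and any target of the other. That reading, the function name, and the toy network are assumptions for illustration, not the exact definition used by Brouwers et al.

```python
from collections import deque


def closest_target_distance(network, targets_a, targets_b):
    """Fewest interaction edges between any target of drug A and any target of drug B.

    network: dict mapping a protein to an iterable of its interaction partners
             (each undirected edge listed in both directions).
    Returns 0 if the drugs share a target, None if no path exists.
    """
    goal = set(targets_b)
    seen = set(targets_a)
    # Multi-source breadth-first search starting from every target of drug A.
    queue = deque((protein, 0) for protein in seen)
    while queue:
        protein, dist = queue.popleft()
        if protein in goal:
            return dist
        for neighbor in network.get(protein, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Toy interaction network: A - B - C - D, with E isolated.
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
print(closest_target_distance(ppi, {"A"}, {"D"}))       # 3
print(closest_target_distance(ppi, {"A", "C"}, {"D"}))  # 1
```

A distance of 0 corresponds to the shared-target case that the slide reports as explaining far more of the side-effect similarity than network neighborhood does.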
Computational methodology-Pre-Marketing-Protein target-based
Pouliot et al. applied logistic regression (LR) models to identify potential ADRs manifesting in 19 specific system organ classes (SOCs), as defined by the Medical Dictionary for Regulatory Activities, across 485 compounds in 508 BioAssays in the PubChem database.
The models were evaluated using leave-one-out cross-validation. The mean AUCs (area under the receiver operating characteristic curve) ranged from 0.60 to 0.92 across different SOCs.
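The AUC figures quoted here can be read as the probability that a randomly chosen compound with the ADR receives a higher score than a randomly chosen compound without it. The sketch below computes AUC directly from that definition; in a leave-one-out setting, each score would come from a model trained on all other compounds. The labels and scores shown are invented for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented held-out predictions: 1 = compound causes the ADR, 0 = it does not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(auc(labels, scores))  # 3 of 4 positive-negative pairs ranked correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking, which gives a reference point for the 0.60 to 0.92 range on the slide.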
Chemical Structure-based Approach-premarketing
This approach attempts to link ADRs to the chemical structure of drugs.
Bender et al. explored the correlation, but the positive predictive value was quite low (under 0.5); the work nevertheless served as a proof of concept.
Hammann et al. employed decision trees to determine the chemical, physical, and structural properties of compounds that predispose them to causing ADRs.
Hammann et al. focused on ADRs in the central nervous system (CNS), liver, and kidney.
Their decision tree models achieved positive predictive accuracies ranging from 78.9% to 90.2%.
Chemical Structure-based Approach-premarketing
Pauwels et al. developed a sparse canonical correlation analysis (SCCA) method to predict high-dimensional side-effect profiles of drug molecules based on their chemical structures.
SCCA examines the relationships of many variables of different types simultaneously.
They predicted 1385 side effects in the SIDER database from the chemical structures of 888 approved drugs.
Their best resulting AUC (area under the curve) was between 0.6088 and 0.8932.
Integrative Approach- premarketing
Huang et al. proposed a new computational framework to predict ADRs by integrating systems biology data that include protein targets, the protein-protein interaction network, gene ontology (GO) annotation, and reported side effects. They predicted heart-related ADRs (i.e., cardiotoxicity), which resulted in the highest AUC of 0.771.
Recently, Liu et al. investigated the use of phenotypic information, together with chemical and biological properties of drugs, to predict ADRs using five machine learning algorithms: LR, Naïve Bayes (NB), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM).
Integrative Approach
The integration of chemical, biological, and phenotypic properties outperforms the chemical structure-based method (improving from 0.9054 to 0.9524 with SVM) and has the potential to detect clinically important ADRs at both the preclinical and post-market phases of drug surveillance.
Post Marketing
Many ADRs may still be missed because clinical trials are often small, short, and biased by excluding patients with comorbid diseases.
Trials do not mirror actual clinical use situations for diverse populations (e.g., inpatients).
Thus it is important to continue surveillance after marketing.
Computational methodology-Post marketing-Data sources
Spontaneous reporting systems (SRSs) have been the core data-collection systems for post-marketing drug surveillance since 1960.
Such reports are maintained by the US FDA and in VigiBase, the database of the World Health Organization (WHO).
Post marketing-Spontaneous Reports
Disproportionality Analysis (DPA) involves frequency analyses of 2x2 contingency tables to quantify the degree to which a drug and an ADR co-occur "disproportionally" compared with what would be expected if there were no association.
            ADR          No ADR    Total
Drug        a            b         n = a + b
No Drug     c            d         c + d
Total       m = a + c    b + d     t = a + b + c + d
Post marketing-Spontaneous Reports
Many approaches have been applied; the most straightforward is the calculation of frequentist metrics.
Definitions of the frequentist measures of association:
Association measure                   Definition
Relative Reporting Ratio (RRR)        (t * a) / (m * n)
Proportional Reporting Ratio (PRR)    (a * (t - n)) / (c * n)
Reporting Odds Ratio (ROR)            (a * d) / (c * b)
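The three definitions can be checked with a few lines of code. The sketch below takes the four cell counts of the 2x2 table (a, b, c, d) and derives n = a + b, m = a + c, and t = a + b + c + d before applying the formulas as written on the slide. The counts are invented, and the function does no guarding against zero cells, which would raise ZeroDivisionError.

```python
def disproportionality(a, b, c, d):
    """Frequentist association measures from a 2x2 drug-by-ADR report table.

    a: reports with drug and ADR      b: reports with drug, without ADR
    c: reports with ADR, without drug d: reports with neither
    """
    n = a + b          # all reports mentioning the drug
    m = a + c          # all reports mentioning the ADR
    t = a + b + c + d  # all reports
    return {
        "RRR": (t * a) / (m * n),
        "PRR": (a * (t - n)) / (c * n),
        "ROR": (a * d) / (c * b),
    }


# Invented counts in which the ADR is reported far more often with the drug.
print(disproportionality(a=20, b=80, c=100, d=9800))
# Invented counts with no association: the ADR rate is 10% with or without the drug.
print(disproportionality(a=10, b=90, c=100, d=900))
```

In the second example all three measures equal 1.0, the value expected when drug and ADR are reported independently; values well above 1 flag a potential signal.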
Post marketing-Spontaneous Reports
Other, more complex algorithms have also been developed, such as the gamma-Poisson shrinker (GPS) and the multi-item gamma-Poisson shrinker (MGPS).
DPA methods are effective in detecting single drug-ADR associations.
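The idea behind gamma-Poisson shrinkage can be illustrated with a deliberately simplified model. The sketch below assumes a single Gamma prior with fixed, hypothetical hyperparameters (alpha = beta = 1); the actual GPS fits a mixture of two gamma distributions to the data by empirical Bayes, so this is not that algorithm, only the shrinkage effect it relies on.

```python
def expected_count(n, m, t):
    """Expected number of drug-ADR reports if drug and ADR were independent."""
    return n * m / t


def shrunk_ratio(a, expected, alpha=1.0, beta=1.0):
    """Posterior mean of the observed-to-expected ratio under a Gamma prior.

    Assumed model: a ~ Poisson(ratio * expected), ratio ~ Gamma(shape=alpha, rate=beta).
    The posterior is Gamma(alpha + a, beta + expected); its mean is returned.
    alpha and beta are hypothetical fixed values, not fitted hyperparameters.
    """
    return (a + alpha) / (expected + beta)


# Two invented drug-ADR pairs with the same raw ratio of 100.
rare = shrunk_ratio(a=1, expected=0.01)     # raw ratio 1 / 0.01
common = shrunk_ratio(a=100, expected=1.0)  # raw ratio 100 / 1
print(rare, common)
```

The pair supported by a single report is pulled strongly toward the prior mean of 1, while the pair supported by 100 reports keeps a large ratio, which is why shrinkage reduces false signals from sparse counts.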
Data Mining Algorithms
DPA methods are effective in detecting single drug-ADR associations.
Data mining can address multi-item ADR associations.
Harpaz et al. identified 1167 multi-item ADR associations using a set of 162,744 reports submitted to the FDA in 2008; 67% were validated by a domain expert.
Tatonetti et al. applied a biclustering algorithm to identify drug groups that share a common set of ADRs in SRS data.
They discovered ADRs between drugs that could not be discovered using DPA methods (e.g., pravastatin and paroxetine had an effect on blood glucose).
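A multi-item association links more than one drug to an ADR. The slides do not describe the algorithms used in the cited studies, so the sketch below only shows generic association-rule counting: for every pair of drugs that co-occur in a report, it counts how often each ADR appears with that pair (support) and what fraction of the pair's reports that represents (confidence). The report data, the threshold, and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def drug_pair_adr_rules(reports, min_support=2):
    """Count (drug pair -> ADR) rules in spontaneous reports.

    reports: list of (drugs, adrs) tuples, each a set of names.
    Returns {((drug1, drug2), adr): (support, confidence)} for rules seen
    in at least min_support reports.
    """
    pair_counts = Counter()
    rule_counts = Counter()
    for drugs, adrs in reports:
        for pair in combinations(sorted(drugs), 2):
            pair_counts[pair] += 1
            for adr in adrs:
                rule_counts[(pair, adr)] += 1
    return {
        rule: (count, count / pair_counts[rule[0]])
        for rule, count in rule_counts.items()
        if count >= min_support
    }


# Hypothetical reports with placeholder drug names.
reports = [
    ({"drugA", "drugB"}, {"hyperglycemia"}),
    ({"drugA", "drugB"}, {"hyperglycemia", "nausea"}),
    ({"drugA", "drugB", "drugC"}, {"nausea"}),
    ({"drugA"}, {"headache"}),
]
for rule, (support, confidence) in sorted(drug_pair_adr_rules(reports).items()):
    print(rule, support, round(confidence, 2))
```

Rules that pass the threshold would still need expert review, as in the validation step the slide mentions.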
Post marketing-Electronic Medical Records
An Electronic Medical Record (EMR) is a computerized medical record created in an organization that delivers care. EMRs contain not only detailed patient information but also copious longitudinal clinical data.
EMR databases consist of data in two formats:
(1) Structured (e.g., laboratory data). Several groups have employed computational methods on structured or coded data in EMRs to identify specific ADR signals.
(2) Unstructured (narrative clinical notes).
Structured & unstructured
Post marketing-Electronic Medical Records-structured data
Yoon et al. demonstrated that laboratory abnormalities are a valuable source for PhV by examining the odds ratio of laboratory abnormalities between a drug-exposed group and a matched unexposed group, using 10 years of EMR data.
Evaluation of their algorithm on 470 randomly selected drug-and-abnormal-lab-event pairs produced a positive predictive value of 0.837 and a negative predictive value of 0.659.
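For reference, positive and negative predictive values are computed from the counts of correct and incorrect calls. The sketch below uses invented counts, not the counts behind the 0.837 and 0.659 figures, which the slides do not give.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV: fraction of flagged drug-event pairs that are true ADRs.
    NPV: fraction of unflagged pairs that are truly not ADRs.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Invented counts: 10 pairs flagged (8 correct), 10 pairs not flagged (6 correct).
print(predictive_values(tp=8, fp=2, tn=6, fn=4))  # (0.8, 0.6)
```

A lower NPV than PPV, as on the slide, means that pairs the algorithm does not flag are less reliably free of an ADR than flagged pairs are reliably true.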
Post marketing-Electronic Medical Records-Unstructured Data
Natural language processing (NLP) techniques are required to extract the needed information from unstructured data.
Wang et al. were the first to employ NLP techniques to extract drug-ADR links.
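As a toy illustration of extracting drug-ADR candidates from narrative text, the sketch below pairs every drug name and ADR term that appear in the same sentence. The lexicons and the note are invented, and this is far simpler than a clinical NLP system: it ignores negation, abbreviations, misspellings, and temporal order, so it should not be read as the method of Wang et al.

```python
import re
from itertools import product

# Hypothetical lexicons; a real system would use curated drug and ADR vocabularies.
DRUGS = {"aspirin", "warfarin"}
ADRS = {"bleeding", "nausea", "rash"}


def drug_adr_pairs(note):
    """Return the set of (drug, adr) pairs that co-occur within a sentence."""
    pairs = set()
    for sentence in re.split(r"[.!?]+", note.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        pairs.update(product(sorted(words & DRUGS), sorted(words & ADRS)))
    return pairs


note = (
    "Patient started aspirin and developed bleeding. "
    "No rash noted. Warfarin was stopped after nausea."
)
print(sorted(drug_adr_pairs(note)))
# [('aspirin', 'bleeding'), ('warfarin', 'nausea')]
```

Candidate pairs found this way would then be counted across many notes and tested statistically before being treated as signals.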
Non-conventional Data Sources-Post marketing
1. Biomedical Literature
Shetty and Dalal retrieved articles (published between 1949 and 2009) for prioritizing drug-ADR associations.
DPA was applied to identify statistically significant pairs from the thousands of pairs in the remaining articles.
Evaluation showed that the method identified true associations with a precision of 0.41 and a recall of 0.71.
Non-conventional Data Sources
2. Health Forums
Data posted by users on health-related websites may also contain valuable drug safety information.
One approach is to mine drug-ADR mentions from health-related websites (e.g., DailyStrength, http://www.dailystrength.org/).
System evaluation was conducted on a manually annotated set of 3600 user posts corresponding to 6 drugs. The system was shown to achieve a precision of 0.78 and a recall of 0.70.
Non-conventional Data Sources
Chee et al. aggregated individuals' opinions and reviews of drugs and used NLP techniques to group drugs.
Some drugs were withdrawn from the market based on these messages.
Future perspectives
This presentation provides a general overview of the current computational methodologies applied to PhV, covering basic concepts and highlighting some representative work.
It is desirable to incorporate various data sources into one framework to understand ADRs.
Data mining algorithms are applicable and useful for detecting drug interactions.
EMR data for ADR prediction is not readily accessible for data mining; more sophisticated studies and NLP techniques are needed.
Identifying cause-and-effect relationships is an intrinsically hard problem in data mining and needs to be further investigated for the PhV application.
Useful links
https://www.mediguard.org
http://www.jmedicalcasereports.com
http://www-stat.stanford.edu/~tibs/Correlate/
http://www.iom.edu/
http://www.smartlogic.com/
http://blogs.sas.com/content/jmp/2012/06/04/disproportionality-analysis-is-coming-in-jmp-clinical-4-0/
References
Oxford English Dictionary, definition of "medicine".
The Importance of Pharmacovigilance. WHO, 2002.
Budnitz, D.S., Pollock, D.A., Weidenbach, K.N., Mendelsohn, A.B., Schroeder, T.J. and Annest, J.L. National surveillance of emergency department visits for outpatient adverse drug events. JAMA, 296, 15 (Oct 18 2006), 1858-1866.
Hopkins, A.L. Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4, 11 (Nov 2008), 682-690.
Helma, C., Gottmann, E. and Kramer, S. Knowledge discovery and data mining in toxicology. Stat Meth Med Res, 9 (2000), 329-358.
http://articles.mercola.com/sites/articles/archive/2012/02/11/leading-causes-of-death-cost-for-us-economy.aspx
Fukuzaki, M., Seki, M. Side Effect Prediction using Cooperative Pathways.
Thank you!