Repurpose terbutaline sulfate for amyotrophic lateral sclerosis using electronic medical records
Hyojung Paik, Ph.D.
UCSF, Institute for Computational Health Sciences
Yonsei University seminar, May 27, 2015

2 Repositioning, repurpose

3 Redefine same (=), similar, and different
Joel T. Dudley et al. Brief Bioinform 2011

Principle
1. HOW similar?
2. Between WHAT A and B? (Drug, Disease)
3. HOW different? Efficacy of a drug: inverse signature
4. In terms of WHAT (data)?

Technical support
1. Find robust signatures of drugs and diseases
2. At large scale
3. Validation of findings
4. Clinical applications

4 A drug will invert disease signatures
M. Sirota et al. Sci. Trans. Med. 2011

1. HOW similar?
2. Between WHAT A and B? Drug A vs. Disease B, with directionality.
   If gene X changes in opposite directions under Drug A and in Disease B, then Drug A is a candidate for Disease B.
   Find a robust signature (meta-analysis).
3. HOW different? Efficacy of a drug: inverse signature of the disease
4. In terms of WHAT (data)?
   - Conserved gene expression of diseases (GEO)
   - Conserved gene expression of drugs (Connectivity Map)

Experimental validation:
1. Cimetidine (gastric ulcer drug) for lung cancer
2. Topiramate (anticonvulsant drug) for Crohn's disease (J. Dudley et al. Sci. Trans. Med. 2011)

5 Similarity based (1)
Assaf et al. Mol. Sys. Biol. 2011

1. HOW similar?
2. Between WHAT A and B? Drug A is similar to Drug B; Disease X is similar to Disease Y.
   If "Drug A treats Disease X" is TRUE, then Drug B is a candidate for Disease X.
   If "Drug B treats Disease Y" is TRUE, then Drug A is a candidate for Disease X.
3. HOW different? Efficacy of a drug: inverse signature
4. In terms of WHAT (data)?
   - Gene expression
   - Sequence of associated genes
   - Gene Ontology
   - PPI network
   - Phenotype terms of drug and disease

Validation: cross validation, ClinicalTrials.gov

6 Similarity based (2)
A. Chiang et al. Clin. Pha. Thera. 2009

1. HOW similar?
2. Between WHAT A and B? Drug A and Drug B, compared by their use
3. HOW different? Efficacy of a drug: inverse signature
4. In terms of WHAT (data)?
   - Known drug indications (FDA approved)
   - Practice of drug use (off-label use)

Validation: literature, ClinicalTrials.gov

7 Similarity based (3)
Paik et al. Sci. Rep. 2015

1. HOW similar?
2. Between WHAT A and B? Drug A is similar to Drug B; Disease X is similar to Disease Y.
   If "Drug A treats Disease X" is TRUE, then Drug B is a candidate for Disease X.
   If "Drug B treats Disease Y" is TRUE, then Drug A is a candidate for Disease X.
3. HOW different? Efficacy of a drug: inverse signature
4. In terms of WHAT (data)?
   - Gene Ontology
   - PPI network similarity
   - Electronic medical records (EMRs) of disease patients
   - EMRs of drug-treated patients

Experimental validation:
1. Terbutaline sulfate (asthma drug) for ALS
2. Action mechanism of terbutaline sulfate

8 Background
Successful cases
i) Sildenafil (Viagra) for erectile dysfunction, led by reports of side effects based on phenotypic descriptions from human volunteers
ii) Thalidomide: a severe phenotypic side effect, caused by an antiangiogenic mechanism, was turned to regulating bone marrow vascularity in multiple myeloma therapy (NEJM, 1998)
iii) Arsenic trioxide: long-term observation of phenotypes revealed a therapeutic effect in acute promyelocytic leukemia (NEJM, 1999)

9 Problem definition
Goal: predict new edges in an incompletely known drug-disease bipartite network using information about the nodes.

Drug (source) and disease (target):
- A set of drugs (source): S = {s_1, s_2, ..., s_m}
- A set of diseases (target): T = {t_1, t_2, ..., t_n}
- Set of edges in the bipartite network: E = {e_11, e_12, ..., e_ij, ..., e_mn}, where e_ij = edge between drug s_i and disease t_j

Given these sets:
- Predict the bipartite graph by training on information about the nodes.
- Predict new edges linking drug-disease (source-target) nodes: N_s - N_t
- Label of an edge = +1 (known) or 0 (others)
- A set of labels for edges: LE = {le_11, le_12, ..., le_ij, ..., le_mn}
- Build a classification rule that discriminates the +1-labeled edges from the rest using the training set.
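To make the notation concrete, here is a minimal Python sketch of the labeled edge set. The node identifiers are placeholders, not data from the study; label 0 marks the unknown pairs that remain candidates for prediction.

```python
from itertools import product

# Placeholder nodes (hypothetical, for illustration only).
S = ["s1", "s2"]          # drugs (source)
T = ["t1", "t2", "t3"]    # diseases (target)

# Known drug-disease indications, i.e. edges labeled +1.
known = {("s1", "t1"), ("s2", "t3")}

# LE: one label per possible edge e_ij; +1 if known, 0 otherwise.
labels = {(s, t): (1 if (s, t) in known else 0) for s, t in product(S, T)}

# Unlabeled pairs are the edges the classifier is asked to score.
candidates = sorted(pair for pair, le in labels.items() if le == 0)
print(candidates)
```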

10
With the classification rule, predict the label le_ij of the edge between s_i and t_j: is edge e_ij present or absent?
i) For disease t_j, the labels of t_j and s_i are +1 or -1.
ii) Classification rule (C): score the edge e_ij of (s_i vs. t_j); if e_ij exceeds the cut-off, then le_ij = 1.

e_ij = max(g.m.(s_i-S similarity rank, t_j-T similarity rank)) * (S-T label / degree of S)

where g.m. = geometric mean.

[Figure: learning set and test set (unknown) of the source (drug) and target (disease) nodes, with similarity ranks of t_j vs. T and d_i vs. D marked high (H) or low (L).]
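The scoring rule can be sketched in Python as follows. This is one possible reading of the slide, not the published ClinDR implementation: it assumes the maximum runs over the known edges (s, t) of the learning set, that every known edge carries label +1, and that similarity ranks are already scaled to [0, 1]. The function name and the toy values are hypothetical.

```python
from math import sqrt

def edge_score(drug, disease, known_edges, drug_sim, disease_sim):
    """Score a candidate edge (drug, disease) against known edges.

    Assumed reading of the slide: for each known edge (s, t), take the
    geometric mean of the drug-drug and disease-disease similarity
    ranks, weight it by label / degree(s), and keep the maximum.
    """
    degree = {}
    for s, _ in known_edges:
        degree[s] = degree.get(s, 0) + 1

    best = 0.0
    for s, t in known_edges:
        if (s, t) == (drug, disease):
            continue  # never score an edge against itself
        gm = sqrt(drug_sim[drug][s] * disease_sim[disease][t])
        label = 1.0  # known edges are labeled +1
        best = max(best, gm * label / degree[s])
    return best

# Toy similarity ranks in [0, 1] (hypothetical values).
drug_sim = {"A": {"A": 1.0, "B": 0.81}, "B": {"A": 0.81, "B": 1.0}}
disease_sim = {"X": {"X": 1.0, "Y": 0.64}, "Y": {"X": 0.64, "Y": 1.0}}
known_edges = [("B", "X"), ("B", "Y")]

score = edge_score("A", "X", known_edges, drug_sim, disease_sim)
print(round(score, 2))
```

The candidate edge is then labeled +1 when its score exceeds the chosen cut-off.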

11 Cont.
ClinDR combines drug-drug and disease-disease similarities with the degree of the drug node, d(s).
- P_c(e_ij) = prediction value of e_ij using clinical physiomic signatures (clinical data driven)
- P_g(e_ij) = prediction value of e_ij using genomic signatures (genomic data driven)
- f(e_ij) = final prediction value of e_ij; the edge is predicted True when f(e_ij) exceeds the cut-off

The basic assumptions of ClinDR are:
i) Similar drugs can treat similar diseases.
ii) A drug prescribed for various diseases is a strong candidate for drug repositioning.
ClinDR synthesizes a clinical score (P_c(e_ij)) and a genomic score (P_g(e_ij)) to calculate the final edge score (f(e_ij)) between a disease and a drug.

[Figure: drug-disease network between drugs s_i and diseases t_j. Node color = node degree of the drug (high, medium, low); edge color = edge similarity (low to high).]

12 Learning features

Level | Task | Features | Data
Clinical | Disease-disease | Difference of lab value (0 time) | EMR
Clinical | Drug-drug | Distance of lab value | EMR
Molecular | Network-based disease-disease and drug-drug similarity | Drug- and disease-related genes; (s_i, s_j) = membership score of s_j for s_i | Public
Molecular | GO semantic similarity between drug-drug and disease-disease | GO term semantic similarity (Resnik et al, 1999), similarity score Sim(x, y) in [0, ∞); 1/(Sim(x, y) + 1) = similarity distance in [0, 1] (Ovaska et al, 2008) | Public
Molecular | Common gene | Jaccard score of disease- and drug-related genes | Public

A = 0.9, b = 1
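Two of the molecular features reduce to short formulas, sketched below in Python. The gene sets are made-up placeholders chosen only to show the arithmetic; the function names are hypothetical.

```python
def jaccard(genes_a, genes_b):
    """Jaccard score of two gene sets: |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def similarity_distance(sim):
    """Map an unbounded similarity score sim >= 0 to 1 / (sim + 1)."""
    return 1.0 / (sim + 1.0)

# Placeholder gene sets (hypothetical, not taken from the study).
drug_genes = {"G1", "G2", "G3"}
disease_genes = {"G1", "G3", "G4"}

print(jaccard(drug_genes, disease_genes))   # 2 shared / 4 total
print(similarity_distance(3.0))             # 1 / (3 + 1)
```

A similarity of 0 maps to distance 1, and the distance shrinks toward 0 as the similarity grows.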

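13 Feature detail (3)
Disease-disease similarity: difference of 0-time lab values
- The lab values (Lab A, Lab B, ...) at 0 time represent the physiological/clinical phenotype of a disease.
- Each lab is compared between two diseases with a rank-sum test (p-value), and the p-values are ranked (0~1).
- Similarity of disease X-Y = top rank of the p-values of the rank-sum tests on lab values.

[Figure: lab values at 0 time and t time, rank-sum test per lab, and the resulting p-value rank matrix.]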

14 Feature detail (4)
Drug-drug similarity: difference of lab value perturbation
- Min-max difference of a lab value = lab value perturbation by drug treatment.
- Only cases treated with a single drug are used.
- For each lab (Lab A, ...), the per-case min and max values under Drug X are compared with those under Drug Y by a rank-sum test (p-value).
- Similarity of drug X-Y = top rank of the p-values of the rank-sum tests on lab values.

[Figure: per-case min and max values of Lab A for Drug X and Drug Y, compared by rank-sum test.]
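The two steps above (min-max perturbation per case, then a rank-sum test between two drugs) can be sketched in plain Python. This is an illustrative sketch, not the study's code: the p-value uses the normal approximation without tie correction, and the lab readings are invented numbers.

```python
from math import erfc, sqrt

def perturbation(cases):
    """Min-max difference of a lab value for each treated case."""
    return [max(values) - min(values) for values in cases]

def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share one rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return erfc(abs(z) / sqrt(2))

# Invented Lab A readings per case (hypothetical values).
drug_x_cases = [[10, 11], [10, 12], [10, 13], [10, 14], [10, 15]]
drug_y_cases = [[10, 16], [10, 17], [10, 18], [10, 19], [10, 20]]

p = rank_sum_pvalue(perturbation(drug_x_cases), perturbation(drug_y_cases))
print(p < 0.05)
```

A p-value near 1 suggests the two drugs perturb the lab similarly, and a p-value near 0 suggests they differ. The same test applies to the 0-time lab values of two diseases.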

15 Overview Paik et al. Sci. Rep. 2015

18 A drug with promiscuous indications promises drug repositioning(s).

Learning set of ClinDR (selected drug-disease indications): Drugs = 562, Diseases = 291, Drug-disease indications (edges) = 17716
Nodes of drugs = 1114, Nodes of diseases = 5838, Drug-disease indications (edges) = 419177

A = approved drug-disease indications; B = drug-disease indications under clinical trials.

[Figure, panels A to F: existing and selected drug-disease indications after name unification; Venn diagram of drugs in A and B (numbers of drugs: 36, 888, 226); frequency [0,1] and ratio of frequency [0,1] against degree of drug nodes (also on a log10 scale), with degree weight [0,1]; Wilcoxon rank test p-value 2.27e-08. Labeled examples: phenobarbital, autoimmune diseases, cancers.]

19 Different clinical signatures present distinct landscapes of disease-disease or drug-drug similarities.

[Figure: similarity maps of disease pairs and drug pairs by laboratory test, on a p-value scale from 1 (similar) to 0 (dissimilar), Wilcoxon rank-sum test. Disease panels: erythrocyte sedimentation rate (disease class p-value 4.2e-26) and total cholesterol (disease class p-value 3.43e-39), hypergeometric test. Laboratory tests shown: platelet count, GOT, GPT, alkaline phosphatase, total cholesterol, AC glucose, chloride, ESR, total CO2, aPTT, sodium.]

Notes:
- Diagnosis code: ICD-10 code of the disease diagnosis
- Disease class: Goh et al (PNAS, 2007)
- Laboratory test results are taken at the point of diagnosis.

ICD-10 codes:
- C91.1 Chronic lymphocytic leukaemia of B-cell type
- D68.2 Congenital afibrinogenaemia
- D80.0 Hereditary hypogammaglobulinaemia
- E70.0 Classical phenylketonuria
- E83 Disorders of mineral metabolism
- E83.3 Acid phosphatase deficiency

Disease classes: Hem = hematological diseases; Car = cardiovascular diseases; Imm = immunological diseases; Met = metabolic diseases

Laboratory tests: GOT = glutamic oxaloacetic transaminase; GPT = glutamic pyruvic transaminase; AC glucose = ante cibum (before meal) glucose; ESR = erythrocyte sedimentation rate; aPTT = activated partial thromboplastin time

20 Performance evaluation
10-fold cross validation

[Figure: source drug and target disease edges split into a training set and a test set; disease-disease and drug-drug similarity rank matrices built from genomic signatures and from clinical physiomic signatures; each test edge is called True or False from its disease-disease and drug-drug similarity ranks.]

Methods compared: ClinDR, Genomic signature, GBA (by use of clinical signatures, genomic signatures, and drug-disease association)
Measures: Sensitivity, Specificity, AUC
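The evaluation measures named on this slide can be computed as in the following Python sketch. It shows a generic 10-fold split and the standard definitions of sensitivity, specificity, and AUC; the scores and labels are invented, and the fold assignment is an assumption since the slide does not say how edges were split.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle n item indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sensitivity_specificity(scores, labels, cutoff):
    """Call an edge positive when its score exceeds the cut-off."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s > cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s <= cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s <= cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """AUC as the chance a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented prediction scores and true labels for four test edges.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

folds = k_fold_indices(25, k=10)
sens, spec = sensitivity_specificity(scores, labels, cutoff=0.5)
print(sens, spec, auc(scores, labels))
```

In each round one fold serves as the test set and the remaining nine as the training set.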

21 Positive predictions of ClinDR are highly enriched with current clinical trial cases.

Hypergeometric test, p-value = 3.0e-07, where:
- N = total combinations of drug-disease associations
- m = number of putative drug indications
- n = number of predictions of ClinDR (the false-positive set of ClinDR with a common drug and disease in ClinicalTrials.gov, cut-off = 0.86)
- k = number of clinical trials in the predictions of ClinDR

Values shown in the diagram: 12,430 (Drugs 226, Diseases 55); 745; 3,891; 173 (Drugs 83, Diseases 35)

[Figure: -log10(p-value) by method and by the features used as learning data (clinical physiomic signatures, genomic signatures, degree of drug nodes), comparing ClinDR, ClinDR without degree, clinical signatures, and genomic signatures.]
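The enrichment p-value of a hypergeometric test is the upper tail P(X >= k). A minimal Python sketch follows; it uses small toy counts rather than the slide's values, because the transcript does not preserve which diagram value belongs to which symbol. The function name is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, m, n, k):
    """P(X >= k) when n items are drawn from N, of which m are hits."""
    total = comb(N, n)
    upper = min(m, n)
    tail = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy counts (hypothetical): 10 pairs, 4 in trials, 3 predicted, 2 overlap.
p_value = hypergeom_upper_tail(N=10, m=4, n=3, k=2)
print(p_value)
```

With these toy counts the tail is (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.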
