Yes, we can! Lessons from Using Linked Open Data (LOD) and Public Ontologies to Contextualize and Enrich Experimental Data. The talk slides cover the ideas behind “Linked Open Data” and the challenges met in practice, applied to a practical scenario of toxicity biomarkers; details on semantic mapping, harmonization, and resource alignment; the lessons learned and their socioeconomic consequences; and a future outlook.
YES, WE CAN! LESSONS FROM USING LINKED OPEN DATA (LOD) AND PUBLIC ONTOLOGIES TO CONTEXTUALIZE AND ENRICH EXPERIMENTAL DATA
Erich A. Gombocz, Andrea Splendiani, Mark A. Musen, Robert A. Stanley, Jason A. Eshleman
OUTLINE
• IDEAS BEHIND “LINKED OPEN DATA”
• REALITY CHALLENGES
• PRACTICAL SCENARIO: TOXICITY BIOMARKERS
• MAPPING, HARMONIZATION, RESOURCE ALIGNMENT
• LESSONS LEARNED: YES, WE CAN!
• SOCIOECONOMIC CONSEQUENCES
• FUTURE OUTLOOK
• ACKNOWLEDGEMENTS
• REFERENCES
IDEAS BEHIND “LINKED OPEN DATA”
IDEAL: ***** Linked (Open) Data = ***** Collaborative Usability
GREAT! LET’S USE THEM FOR ENRICHMENT
REALITY CHECK
• NOT SO FAST:
  • Inconsistent namespace policies
  • Use of internal, non-formal application ontologies
  • Misaligned public and experimental corporate standards
  • Versioning and provenance issues
  • Reliability from service-level to URI persistence
  • More and more “Open data” are closed for commercial use
  • Serious funding concerns about government-backed resources
HMMM.. CAN WE REALLY USE THEM?
YES, WE CAN! LET’S SEE HOW MUCH WE GAIN …
STUDY SCENARIO
• HEPATOTOXICITY STUDIES
  • Panel of hepatotoxicants, single oral dose (placebo, low, mid, high) in groups of 4 rats, at 6, 24 and 48 hrs.
  • Metabolic analysis of liver, serum and urine (1603 metabolic components; Bruker LC-MS/MS)
  • Microarray analysis of liver and whole blood (31096 transcript probes; Affymetrix)
  • Statistical biomarker pre-selection at p<0.005, absolute fold change (abs fc)>10 (genes) and p<0.005, abs fc>2.5 (metabolites)
• ALCOHOL STUDIES
  • High doses t.i.d. for four days, with and without 24h withdrawal.
  • Metabolic analysis of plasma, liver and brain (1620 metabolic components)
  • Microarray analysis of liver and brain (31096 transcript probes)
  • Statistical biomarker pre-selection at p<0.005, abs fc>5 (genes) and p<0.005, abs fc>2.5 (metabolites)
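The pre-selection thresholds above amount to a simple two-condition filter per marker class. A minimal Python sketch follows. The `Measurement` record, the `preselect` helper and the sample values are hypothetical and only illustrate the cut-offs; the slides do not say how the statistics were computed, and the 2.5 cut-off in the alcohol studies is assumed to apply to metabolites, as in the hepatotoxicity studies.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    name: str           # probe or metabolite identifier (hypothetical)
    kind: str           # "gene" or "metabolite"
    p_value: float      # significance of the change vs. control
    fold_change: float  # signed fold change; negative means down-regulated


# (max p-value, min absolute fold change) per marker class, from the slide.
HEPATOTOX = {"gene": (0.005, 10.0), "metabolite": (0.005, 2.5)}
# Assumption: the second alcohol-study cut-off refers to metabolites.
ALCOHOL = {"gene": (0.005, 5.0), "metabolite": (0.005, 2.5)}


def preselect(measurements, thresholds):
    """Keep measurements with p < max_p and |fold change| > min_abs_fc."""
    selected = []
    for m in measurements:
        max_p, min_abs_fc = thresholds[m.kind]
        if m.p_value < max_p and abs(m.fold_change) > min_abs_fc:
            selected.append(m)
    return selected


# Made-up example values, for illustration only.
example = [
    Measurement("probe_A", "gene", 0.001, -12.0),
    Measurement("probe_B", "gene", 0.001, 6.0),
    Measurement("met_C", "metabolite", 0.004, 3.0),
    Measurement("met_D", "metabolite", 0.02, 8.0),
]
candidates = preselect(example, HEPATOTOX)
```

Using the absolute value keeps strongly down-regulated markers in the candidate set alongside up-regulated ones.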
OBJECTIVES
INTERPRET EXPERIMENTAL FINDINGS IN CONTEXT OF BIOLOGICAL FUNCTIONS
• HOW?
  • Use public LODs to semantically enrich and annotate experiments, and qualify biological relevancy of putative biomarkers
• WHY?
  • Multi-modal experimental observations from the same perturbation can represent very different biological processes.
  • Pharmacodynamic correlations are not necessarily functionally linked biologically
APPROACH
RESOURCE ANALYSIS
• What do we need to accomplish objectives?
  • Basics (provenance, versioning, high interlink quality, persistence)
  • Generally applicable, quick & easy solution
• Focus on ‘’ resources
  • UniProt, …
• Use existing formal ontologies (or parts of) whenever possible
  • NCBO BioPortal
ROADMAP TO MAKE IT WORK
Map Experiments to RDF
  • Namespace, URI Policy
  • Entities vs. Literals, Data Types
  • Scripted Transformations
  • Provenance
Harmonize & Version
  • Concept alignment
  • Vocabularies, Thesauri
  • Ontology Merging
  • Versioning, Attribution
Refine Context
  • Import only what’s needed
  • Iterate Visual SPARQL Queries
  • Establish Classifier Patterns
Annotate & Add to Knowledge Base
Get Answers from Contextual Queries
Be aware of challenges, BUT: LOD will save you a lot of time in providing biological context for experimental findings!
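As an illustration of the first roadmap step (mapping experiments to RDF), the sketch below serializes one experimental observation as N-Triples using only the Python standard library. It is not the authors' transformation script. The `http://example.org/tox/` namespace, the property names under it, the record layout and the sample values are all assumptions; only the UniProt PURL pattern, the XSD datatypes and `prov:wasDerivedFrom` come from public vocabularies.

```python
EX = "http://example.org/tox/"                  # hypothetical project namespace
XSD = "http://www.w3.org/2001/XMLSchema#"
PROV = "http://www.w3.org/ns/prov#"
UNIPROT = "http://purl.uniprot.org/uniprot/"    # public UniProt entry URIs


def iri(value):
    """Wrap a full IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value, datatype):
    """Typed literal: numbers stay literals, they are not minted as entities."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def observation_to_ntriples(obs):
    """Turn one observation record (a dict) into a list of N-Triples lines."""
    subject = iri(f"{EX}observation/{obs['id']}")
    triples = [
        # Entity: link to the public UniProt resource instead of a string.
        (subject, iri(EX + "measuredEntity"), iri(UNIPROT + obs["uniprot_ac"])),
        # Literals with explicit data types.
        (subject, iri(EX + "foldChange"), literal(obs["fold_change"], "double")),
        (subject, iri(EX + "pValue"), literal(obs["p_value"], "double")),
        # Provenance: which versioned dataset the observation came from.
        (subject, iri(PROV + "wasDerivedFrom"), iri(f"{EX}dataset/{obs['dataset']}")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Made-up record, for illustration only.
record = {
    "id": "obs1",
    "uniprot_ac": "P11510",
    "fold_change": 12.5,
    "p_value": 0.001,
    "dataset": "hepatotox-v1",
}
lines = observation_to_ntriples(record)
```

Keeping the namespace in one constant and attaching a provenance triple to every observation follows the namespace, data type and provenance points in the roadmap.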
RESULTS: BIOLOGY CONFIRMED
| Marker Class | Instance | UniProt AC | Pathway Gene | Protein | Biology |
|---|---|---|---|---|---|
| genes | CYP2C40 | P11510 | cp2cc | Cytochrome P450 2C40 | heme binding, iron ion binding, aromatase activity |
| genes | AKR7A3 | P38918 | akr7a3 | Aflatoxin B1 aldehyde reductase member 3 | detoxification |
| genes | GPX2 | P83645 | gpx2 | Glutathione peroxidase 2 | response to oxidative stress, negative regulation of inflammatory response |
| genes | MYC | P09416 | myc | Myc proto-oncogene protein (Transcription factor p64) | regulation of gene transcription, non-specific DNA binding, activates transcription of growth-related genes |
| genes | MT1A | P02803, Q91ZP8 | mt1a | Metallothionein-1 | metal ion binding |
| genes | HMOX1 | P06762 | hmox1 | Heme oxygenase 1 | heme catabolic process, negative regulation of DNA binding |
| genes | FGF21 | Q8VI80 | fgf21 | Fibroblast growth factor 21 (Protein Fgf21) | positive regulation of ERK1 and ERK2 cascade, MAPKKK cascade and cell proliferation |
| genes | AKR1B8 | Q91W30 | akr1b8 | Aldose reductase-like protein | oxidoreductase activity |
| genes | TRIB3 | Q9WTQ6 | trib3 | Tribbles homolog 3 | disrupts insulin signaling by binding directly to Akt kinases, expression induced during programmed cell death |
| genes | YC2 | P46418 | gsta5 | Glutathione S-transferase alpha-5 (EC 2.5.1.18) | response to drug, xenobiotic catabolic process |
| genes | ABCB1, RGD:619951 | P43245 | abcb1 | Multidrug resistance protein 1 (EC 3.6.3.44) | response to organic cyclic compound, tumor necrosis factor, arsenic-containing substance or ionizing radiation |
| genes | RGD:1310991 | Q5U2P3 | Zfand2a | AN1-type zinc finger protein 2A | zinc ion binding |
| genes | GSTP1, GSTP2 | P04906 | gstp1 | Glutathione S-transferase P (EC 2.5.1.18) | response to toxin, xenobiotic metabolic process, response to reactive oxygen species, response to ethanol |
| genes | RGD:708417 | Q62789 | ugt2p7 | UDP-glucuronosyltransferase 2B7 (UDPGT 2B7) (EC 2.4.1.17) | major importance in conjugation and subsequent elimination of toxic xenobiotics and endogenous compounds |
| genes | GCLC | P19468 | gclc | Glutamate--cysteine ligase catalytic subunit (EC 6.3.2.2) | response to oxidative stress |
| genes | TXNRD1 | O89049 | txnrd1 | Thioredoxin reductase 1, cytoplasmic (EC 1.8.1.9) | benzene-containing compound metabolic process, cell redox homeostasis, response to drug |
| genes | NQO1 | P05982 | nqo1 | NAD(P)H dehydrogenase [quinone] 1 (EC 1.6.5.2) | response to oxidative stress, response to ethanol, superoxide dismutase activity |
| genes | DDIT4L | Q8VD50 | ddit4l | DNA damage-inducible transcript 4-like protein | negative regulation of signal transduction, inhibits cell growth by regulating TOR signaling pathway |
| metabolites | Pyroglutamic acid | Q9ER34 | aco2 | Aconitate hydratase, mitochondrial | citrate metabolism, isocitrate metabolism, tricarboxylic acid cycle |
| metabolites | Choline | Q64057 | aldh7a1 | Alpha-aminoadipic semialdehyde dehydrogenase (EC 1.2.1.31) | betaine biosynthesis via choline pathway, response to DNA damage stimulus |
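Annotations such as those listed above can in principle be retrieved from UniProt's linked data by accession. The Python sketch below only builds a SPARQL query string for a set of accessions and does not send it anywhere. The helper name `build_annotation_query` is hypothetical, and the properties `up:mnemonic` and `up:classifiedWith` are assumed from the UniProt core vocabulary; check them against the current schema before use. This is not the query the authors ran.

```python
import re

UNIPROT = "http://purl.uniprot.org/uniprot/"

# Accession pattern as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def build_annotation_query(accessions):
    """Build a SPARQL query asking for entry names and classification labels."""
    for ac in accessions:
        if not ACCESSION_RE.fullmatch(ac):
            raise ValueError(f"not a UniProt accession: {ac!r}")
    values = " ".join(f"<{UNIPROT}{ac}>" for ac in accessions)
    lines = [
        "PREFIX up: <http://purl.uniprot.org/core/>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT ?protein ?mnemonic ?termLabel",
        "WHERE {",
        f"  VALUES ?protein {{ {values} }}",
        "  ?protein up:mnemonic ?mnemonic .",
        # Assumed property: links an entry to its classification terms.
        "  OPTIONAL { ?protein up:classifiedWith ?term . ?term rdfs:label ?termLabel . }",
        "}",
    ]
    return "\n".join(lines)


# Three accessions taken from the results table.
query = build_annotation_query(["P11510", "P38918", "P83645"])
```

Validating accessions before building the query keeps gene symbols or RGD identifiers from being turned into UniProt URIs that do not exist.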
FROM COMPLEX TO ACTIONABLE
SUMMARY IN A NUTSHELL
• GENERAL
  • LOD resources can be used to confidently qualify statistical pharmacogenomic findings with systems biological responses.
  • Such relationships are key to better decode complex biological functions involved in toxicity.
  • We were able to qualify biomarker patterns for distinct categories of toxicity (Benzene-like, Halogenated compound-like, Alcohol-like). Confirmation of biological viability enables their use for toxicity screening.
• SPECIFIC INSIGHTS GAINED
  • NUCLEOTIDE SYNTHESIS AND REPAIR: One-Carbon metabolism changes are due to differential methylation.
  • LONG-TERM MEMORY: Signaling pathway involvement indicates influence on long-term memory storage in brain.
  • DEPENDENCY: Ketoacidosis in liver and depletion of biogenic amine precursors relate to alcohol dependency.
  • TISSUE SPECIFICITY: Major changes in purine metabolism suggest inhibition of xanthine oxidase through oxidative stress, while in plasma, changes in biogenic amine precursors, which rebound during withdrawal, were also indicated by the selective depletion of cytosine and cytidine vs. thymidine.
  • CHRONIC TOXICITY: Purine metabolism changes in liver explain observed processes in Krebs cycle and Tryptophan pathway indicative of chronic, long-term toxic effects.
TAKE HOME / FUTURE OUTLOOK
YES, ENRICHING EXPERIMENTS WITH LOD RESOURCES FACILITATES BETTER AND FASTER QUALIFICATION!
• In toxicity assessment at the pre-clinical stage, biologically validated system changes associated with common toxicity mechanisms provide better a priori determination of adverse effects of drug combinations.
• Models for classification of toxicity types (hepato-, nephro-, drug residue-based) were functionally qualified.
• THERE IS STILL ROOM FOR IMPROVEMENT
  • Permanent URLs, better inter-linking and provenance.
• NEED TO RECOGNIZE SOCIOECONOMIC BENEFITS
  • Time and money saved should lead to new business models to secure LOD resource funding.
ACKNOWLEDGMENTS
Icoria / Cogenics: Pat Hurban, Alan Higgins, Imran Shah, Hongkang Mei, Ed Lobenhofer
Bowles Center for Alcohol Studies / UNC: Fulton Crews
BMIR / NCBO Stanford: Mark Musen, Trish Whetzel
Bio2RDF II: Michel Dumontier
SIB / UniProt Consortium: Jerven Bolleman
Wikimedia Foundation: Anja Jentzsch
W3C HCLS / Pharmacogenomics SIG
IO Informatics: Andrea Splendiani, Jason Eshleman, Robert Stanley
Support for Toxicity Studies: NIST ATP #70NANB2H3009, NIAAA #HHSN281200510008C