1
DisGeNET: Applying Semantic Web and Network Analysis Approaches for the Integration and Analysis of Gene-Disease Associations for Translational Research and Drug Discovery Janet Piñero, Núria Queralt-Rosinach , Àlex Bravo, Ferran Sanz and Laura I. Furlong Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University Acknowledgements The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help. Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524), from the IMI-JU under grants agreements nº 115002 (eTOX), nº 115191 (Open PHACTS)], nº 115372 (EMIF) and nº 115735 (iPiE), resources of which are composed of financial con-tribution from the European Union's Seventh Framework Pro-gramme (FP7/2007-2013) and EFPIA companies’ in kind contribu-tion, and the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB). DISCOVERY PLATFORM 2 KNOWLEDGE BASE TOOLS EVIDENCE-BASED DISCOVERY CLINICIAN INTEROPERABILITY METADATA DATABASES & LITERATURE STANDARDS INTEGRATION OPEN http://www.disgenet.org/ RESEARCHER CURATOR BIOINFORMATICIAN & DEVELOPER DISCOVERABILITY COMMUNITY USE LARGE-SCALE EXTRACTION DIGITAL PUBLICATION, SHARING AND LINKING Usage stats (Ago2014-Ago2015): 12,040 users, 22,696 sessions 14,494 downloads DisGeNET used in +20 publications, cited in +60 articles Registered: biosharing OMICtools NeuroLex Datahub Integrated knowledge: Text mining extraction Integration with well-curated data Analysis Discovery and decision-making Present in the Semantic Web: URI/RDF/nanpublications Machine-processable Semantic integration Links to the Linked Open Data cloud Data analysis across domains IMPLEMENTATION STANDARDIZATION TRANSLATIONAL RESEARCH INTEROPERABILITY INTEGRATION EVIDENCE SEMANTIC WEB NETWORK ANALYSIS DRUG DISCOVERY WEB-BASED EXPLORATION KNOWLEDGE BASE TOOLS FOR EXPLORATION AND ANALYSIS REPRODUCIBILITY ACCESSIBILITY LHGDN S = W CURATED + W PREDICTED + W LITERATURE SYNTACTIC NORMALIZATION SEMANTIC Downloads Web Interface SPARQL endpoint Open Database License Programmatic access Metadata Digital objects Data item-level description Dataset-level description Transparency and validation Tab separated plain text SQLite RDF Trusty nanopublications http://www.disgenet.org/ http://rdf.disgenet.org/sparql/ http://opendatacommons.org/licenses/odbl/1.0/ Linked Data Browser http://rdf.disgenet.org/fct/ Automatic analysis Higher speed Reduce error Share results Embed in workflows DisGeNET association type ontology Semanticscience Integrated Ontology (SIO) 4 11 common ontologies in NCBO BioPortal RDF 1 Nanopublications 3 NCBI Gene ID UMLS Concept Unique Identifiers Normalized Identification Scheme http://rdf.disgenet.org/resource/gda/ + ID What are the diseases associated to melanocortin 4 receptor (MC4R)? What are the genes associated to Obesity? 429,111 Gene-Disease Associations What is the pattern of tissue expression of the genes associated to Obesity? References 1. Queralt-Rosinach, N., Piñero,J. , Bravo, À, Sanz, F. and Furlong, L.I. DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases, 2015 ( submitted). 2. Piñero, J., Queralt-Rosinach, N., Bravo, A., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015(0), bav028 bav028. 3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., and Furlong, L.I., Publishing DisGeNET as Nanopublications. Semantic Web Journal, ( to appear ), 1-10, 2015. 4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 2014. 5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., Williams, A. J. (2014, January 1). Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012- 0088 Several formats and models Provenance (PubMed ID, source) DisGeNET score (evidence) Context metadata for each G-D pair Sentence description 17,181 Genes PANTHER class 14,610 Diseases MeSH class Semantic Web platform to answer complex questions for the pharmacological field 5 What is the pattern of tissue protein expression of the genes associated to Obesity that are involved in the same pathway, and retrieve all bioactive small molecules hitting each target (filter minEx-pChembl=5)? Large-scale integration across domains DisGeNET Cytoscape plugin 60% complex, 36% rare/Mendelian, and 4% infectious diseases DO MSH OMIM NCI ORDO ICD9 19 58 38 33 13 12 Recent findings

DisGeNET: Applying Semantic Web and Network Analysis ... · DisGeNET: Applying Semantic Web and Network Analysis Approaches for the Integration and Analysis of Gene-Disease Associations

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DisGeNET: Applying Semantic Web and Network Analysis ... · DisGeNET: Applying Semantic Web and Network Analysis Approaches for the Integration and Analysis of Gene-Disease Associations

DisGeNET: Applying Semantic Web and Network Analysis Approaches for the Integration and Analysis of Gene-Disease Associations for Translational Research and Drug Discovery

Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University

Acknowledgements The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help. Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524), from the IMI-JU under grants agreements nº 115002 (eTOX), nº 115191 (Open PHACTS)], nº 115372 (EMIF) and nº 115735 (iPiE), resources of which are composed of financial con-tribution from the European Union's Seventh Framework Pro-gramme (FP7/2007-2013) and EFPIA companies’ in kind contribu-tion, and the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).

DISCOVERY PLATFORM2

KNOWLEDGE BASE TOOLS

EVIDENCE-BASED DISCOVERY

CLINICIAN INTEROPERABILITY

METADATA

DATABASES & LITERATURE STANDARDS

INTEGRATION

OPEN

http://www.disgenet.org/

RESEARCHER

CURATOR

BIOINFORMATICIAN & DEVELOPER

DISCOVERABILITY

COMMUNITY USE

LARGE-SCALE EXTRACTION

DIGITAL PUBLICATION, SHARING AND LINKING Usage stats (Ago2014-Ago2015):

• 12,040 users, 22,696 sessions • 14,494 downloads • DisGeNET used in +20 publications,

cited in +60 articles

Registered: • biosharing • OMICtools • NeuroLex • Datahub

Integrated knowledge: • Text mining extraction • Integration with well-curated data • Analysis • Discovery and decision-making

Present in the Semantic Web: • URI/RDF/nanpublications • Machine-processable • Semantic integration • Links to the Linked Open Data cloud • Data analysis across domains

IMPLEMENTATION

STANDARDIZATION

TRANSLATIONAL RESEARCH INTEROPERABILITY

INTEGRATION

EVIDENCE

SEMANTIC WEB

NETWORK ANALYSIS

DRUG DISCOVERY

WEB-BASED EXPLORATION

KNOWLEDGE BASE TOOLS FOR EXPLORATION AND ANALYSIS

REPRODUCIBILITY ACCESSIBILITY

LHGDN

S = WCURATED + WPREDICTED + WLITERATURE

SYNTACTIC

NORMALIZATION

SEMANTIC

Downloads

Web Interface

SPARQL endpoint

Open Database License

Programmatic access

Metadata

Digital objects

• Data item-level description • Dataset-level description Transparency and validation

• Tab separated plain text • SQLite • RDF • Trusty nanopublications

http://www.disgenet.org/

http://rdf.disgenet.org/sparql/

http://opendatacommons.org/licenses/odbl/1.0/

Linked Data Browser http://rdf.disgenet.org/fct/

• Automatic analysis • Higher speed • Reduce error • Share results • Embed in workflows

DisGeNET association type ontology

Semanticscience Integrated Ontology (SIO)4

• 11 common ontologies in NCBO BioPortal

• RDF1

• Nanopublications3

• NCBI Gene ID • UMLS Concept Unique Identifiers

• Normalized Identification Scheme http://rdf.disgenet.org/resource/gda/ + ID

What are the diseases associated to melanocortin 4 receptor (MC4R)?

What are the genes associated to Obesity?

429,111 Gene-Disease Associations

What is the pattern of tissue expression of the genes associated to Obesity?

References 1. Queralt-Rosinach, N., Piñero,J. , Bravo, À, Sanz, F. and Furlong, L.I. DisGeNET-RDF: harnessing the innovative power of the

Semantic Web to explore the genetic basis of diseases, 2015 (submitted). 2. Piñero, J., Queralt-Rosinach, N., Bravo, A., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery

platform for the dynamical exploration of human diseases and their genes. Database, 2015(0), bav028–bav028.

3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., and Furlong, L.I., Publishing DisGeNET as Nanopublications.

Semantic Web Journal, (to appear), 1-10, 2015. 4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience

Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 2014. 5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014, January 1). Applying linked

data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088

Several formats and models

• Provenance (PubMed ID, source)

• DisGeNET score (evidence)

Context metadata for each G-D pair

Sentence description

• 17,181 Genes • PANTHER class

• 14,610 Diseases • MeSH class

• Semantic Web platform to answer complex questions for the pharmacological field 5

What is the pattern of tissue protein expression of the genes associated to Obesity that are involved in the same pathway, and retrieve all bioactive

small molecules hitting each target (filter minEx-pChembl=5)?

• Large-scale integration across domains

• DisGeNET Cytoscape plugin

60% complex, 36% rare/Mendelian, and 4% infectious diseases

DO MSH OMIM NCI ORDO ICD9

19 58 38 33 13 12

Recent findings