Structuring what we know and use that to better understand your data

@Chris_Evelo: Department of Bioinformatics – BiGCaT,

WikiPathways team, ELIXIR Interoperability team, Open PHACTS

So many…

ELIXIR, EXCELERATE, CORBEL, GA4GH, EGA, dbNP, ENPADASI, DISH, Open PHACTS, BBMRI, DRE, EuroCAT, DTL, EATRIS, DiXa, UniProt, PDB, ChEBI, ChEMBL, HMDB, ISA, FAIR, RDF, VOID, Nanopubs, eNanomapper, KEGG, Reactome, Entrez, Parelsnoer, ArrayExpress, GEO, ENCODE, Recon2, SBML, SBGN, MIM

And that is just what I discussed yesterday…

The typical question we get about using big data

We can do things like this (diabetic liver)

Pihlajamäki et al. dataset is from Gene Expression Omnibus (accession number GSE15653)

Pihlajamäki et al. J Clin Endocrinol Metab. 2009, 94 (9): 3521-3529. DOI: 10.1210/jc.2009-0212.

Martina Kutmon et al. BMC Genomics 2014, 15:971. DOI: 10.1186/1471-2164-15-971

Data predators

Data: Wang et al. 2011, in Gene Expression Omnibus (GEO, http://ncbi.nlm.nih.gov/geo/), accession number: GSE17461.

Published paper: Effects of 1alpha,25 dihydroxyvitamin D3 and testosterone on miRNA and mRNA expression in LNCaP cells. WL Wang et al. Mol Cancer 2011. 10. doi:10.1186/1476-4598-10-58

Or: Vitamin D effects on prostate cancer cells

Integrative network-based analysis of mRNA and microRNA expression in vitamin D3-treated cancer cells

[Diagram: Integrative Systems Biology]
• Internal & external data repositories, e.g. dbNP, Sage, Atlas
• Knowledge resources & (semantic web) integration, e.g. Open PHACTS, WikiPathways
• Study capturing: ISA
• Models
• Study data processing, statistics, storage, e.g. arrayanalysis.org
• Ontologies
• Modeling & data integration, network biology (extension), supervised statistics
• Curation, simulation annotation & provenance
• Research applications
• Mapping: BridgeDb
• Extraction, SPARQLing, conversion

http://www.wikipathways.org/instance/WP430

http://www.wikipathways.org/index.php/Pathway:WP430

WikiPathways

• Public resource for biological pathways

• Anyone can contribute and curate

• More up-to-date representation of biological knowledge

WikiPathways: capturing the full diversity of pathway knowledge. M Kutmon et al. Nucleic Acids Res 2015: first published online: Oct 19.

Big data: Wikiomics. Mitch Waldrop. Nature 2008: 455, 22-25

We the curators. Allison Doerr. Nature Methods 2008: 5, 754–755

No rest for the bio-wikis. Ewen Callaway. Nature 2010: 468, 359-360

How to do interoperable data visualization?

Connect to Genome Databases

Backpages link to multiple databases

You could do this for gene lists

Don’t be afraid to reinvent wheels!

BridgeDb: Abstraction Layer

interface

IDMapper

IDMapperRdb

relational database

IDMapperFile

tab-delimited text

IDMapperBiomart

web service

The BridgeDb Framework: Standardized Access to Gene, Protein and Metabolite Identifier Mapping Services. Martijn P van Iersel, Alexander R Pico, Thomas Kelder, Jianjiong Gao, Isaac Ho, Kristina Hanspers, Bruce R Conklin, Chris T Evelo. BMC Bioinformatics 2010, 11: 5.
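The abstraction layer idea can be illustrated with a minimal Python sketch. This is not the BridgeDb API (BridgeDb itself is a Java framework); the class names `IDMapper`, `FileIDMapper`, the helper `translate_all`, and the database names and identifiers in the toy table are all hypothetical. The point is only that client code talks to one interface while the backend (here a tab-delimited text source) can be swapped.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Set, Tuple


class IDMapper(ABC):
    """Hypothetical interface: translate an identifier between two databases."""

    @abstractmethod
    def map_id(self, source_db: str, identifier: str, target_db: str) -> Set[str]:
        """Return all identifiers in target_db that correspond to the input."""


class FileIDMapper(IDMapper):
    """Backend reading tab-delimited rows: source_db, source_id, target_db, target_id."""

    def __init__(self, handle) -> None:
        self._links: Dict[Tuple[str, str, str], Set[str]] = {}
        for source_db, source_id, target_db, target_id in csv.reader(handle, delimiter="\t"):
            # Store each mapping in both directions so lookups are symmetric.
            self._links.setdefault((source_db, source_id, target_db), set()).add(target_id)
            self._links.setdefault((target_db, target_id, source_db), set()).add(source_id)

    def map_id(self, source_db: str, identifier: str, target_db: str) -> Set[str]:
        return set(self._links.get((source_db, identifier, target_db), set()))


def translate_all(mapper: IDMapper, source_db: str, identifiers, target_db: str) -> Dict[str, Set[str]]:
    """Client code depends only on the interface, not on the backend."""
    return {i: mapper.map_id(source_db, i, target_db) for i in identifiers}


# Toy mapping table with made-up database names and identifiers.
TOY_TABLE = "DB_A\ta1\tDB_B\tb1\nDB_A\ta1\tDB_B\tb2\nDB_A\ta2\tDB_B\tb3\n"
mapper = FileIDMapper(io.StringIO(TOY_TABLE))
result = translate_all(mapper, "DB_A", ["a1", "a2", "a9"], "DB_B")
print(result)
```

A relational-database or web-service backend would implement the same `map_id` method, so `translate_all` would not need to change.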

Combine: WikiPathways tissue analyzer

Work done by Jonathan Melius

WikiPathways, a house of webs?

Combine: adding miRNAs clutters

Combine: regulator Interaction in MiPaSt PathVisio plugin

Work done by Christian Oertlin.

Pathways in Cytoscape

Figure 2. The Cardiac Hypertrophic Response pathway loaded as a network.

Kutmon M, Lotia S, Evelo CT and Pico AR 2014 [v1; ref status: indexed, http://f1000r.es/3ij] F1000Research 2014, 3:152 (doi: 10.12688/f1000research.4254.1)

All pathways

Pathways with high z-score grouped together. Explains why there are relatively few significant genes, but many pathways with high z-score.

Cytoscape visualization used to group
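Why few significant genes can produce many high-scoring pathways can be shown with a small sketch. It assumes the over-representation z-score commonly used in pathway statistics (a hypergeometric-based z-score, as in MAPPFinder-style tools); the slides do not state which formula was used, and the gene names, set sizes, and pathway names below are invented for illustration.

```python
import math


def pathway_z_score(N: int, R: int, n: int, r: int) -> float:
    """Over-representation z-score (assumed formula).

    N: genes measured, R: significant genes measured,
    n: genes in the pathway, r: significant genes in the pathway.
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Toy data: 1000 measured genes, 20 of them significant.
measured = {f"g{i}" for i in range(1000)}
significant = {f"g{i}" for i in range(20)}

# Three made-up pathways that all share the same 5 significant genes.
core = {f"g{i}" for i in range(5)}
pathways = {
    "Pathway A": core | {f"g{i}" for i in range(100, 125)},
    "Pathway B": core | {f"g{i}" for i in range(200, 225)},
    "Pathway C": core | {f"g{i}" for i in range(300, 325)},
}

scores = {
    name: pathway_z_score(len(measured), len(significant), len(genes), len(genes & significant))
    for name, genes in pathways.items()
}
# Significant genes that drive any of the three scores.
drivers = set().union(*(genes & significant for genes in pathways.values()))
print(scores, len(drivers))
```

All three pathways score equally high, yet only the five shared genes are responsible, which is the kind of overlap that grouping pathways in a network view makes visible.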

Pathway interactions and what causes them

Thomas Kelder, Lars Eijssen, Robert Kleemann, Marjan van Erk, Teake Kooistra, Chris Evelo (2011) Exploring pathway interactions in insulin resistant mouse liver. BMC Systems Biology 5: 127 Aug. http://dx.doi.org/doi:10.1186/1752-0509-5-127

Pathway interactions and detailed network visualization for the interactions with three apoptosis related pathways for the comparison between HF and LF diet at t = 0. A: Subgraph of the pathway interaction network, based on incoming interactions to three stress response and apoptosis pathways with the highest in-degree. Pathway nodes with a thick border are significantly enriched (p < 0.05) with differentially expressed genes. B: The protein interactions that compose the interactions between the three apoptosis related pathways and their neighbors in the subgraph as shown in box A (see inset, included interactions are colored orange). Protein nodes have a thick border when their encoding genes are significantly differentially expressed (q < 0.05).

Regulation resources

human ErbB signaling pathway extended with validated microRNA regulation

If we don’t do the magic

Literature, PubChem, GenBank, Patents, Databases, Downloads

Data Analysis, Data Integration, Firewalled Databases

How do R&D companies use public data?

How do pharma companies use public data?

Pfizer

@gray_alasdair Big Data Integration

Semantic web grammar

[Architecture diagram]
• Nanopub
• Data Cache (Virtuoso Triple Store)
• Semantic Workflow Engine
• Linked Data API (RDF/XML, TTL, JSON)
• Domain Specific Services
• Identity Resolution Service
• Chemistry Registration, Normalisation
• Identifier Management Service
• Indexing
• Examples shown: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
• Sources: Nanopub, Public Content, Commercial, Public Ontologies, Annotations

Choose a standard

Link one resource to another

Or use both and map

Mapping tools are core tools: need funding and sustainability

Database identifier mapping tools we have:

• A software framework (BridgeDb)
– Application in WikiPathways, PathVisio, Cytoscape, R/Bioconductor
– An installable webservice
– Open source
– Community based
– Database based (small)
• A semantic web implementation (Open PHACTS IMS)
– With installable Docker image
– Linkset based (fast)
– Transitivity (and limits for that)
• gene -> protein -> has enzyme code
• Protein -> has enzyme code -> other proteins
• Identifiers.org for ID schemas and resolution
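Transitivity and its limits can be sketched as a bounded graph traversal over linksets. This is an illustrative toy, not how the Open PHACTS IMS is implemented; the identifiers, link-type names, and the `reachable` helper are made up.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Toy linksets with made-up identifiers: (node, link type, node).
LINKS = [
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "has_enzyme_code", "ec:E1"),
    ("protein:P2", "has_enzyme_code", "ec:E1"),
]


def build_graph(links: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Index links in both directions so mappings can be followed either way."""
    graph: Dict[str, Set[Tuple[str, str]]] = {}
    for a, kind, b in links:
        graph.setdefault(a, set()).add((kind, b))
        graph.setdefault(b, set()).add((kind, a))
    return graph


def reachable(graph, start: str, max_hops: int, allowed_kinds: Set[str]) -> Set[str]:
    """Breadth-first search that stops after max_hops and skips disallowed link types."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # the hop limit is the brake on transitivity
        for kind, neighbour in graph.get(node, ()):
            if kind in allowed_kinds and neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


graph = build_graph(LINKS)
all_kinds = {"encodes", "has_enzyme_code"}
two_hops = reachable(graph, "gene:G1", 2, all_kinds)    # gene -> protein -> enzyme code
three_hops = reachable(graph, "gene:G1", 3, all_kinds)  # also reaches an unrelated protein
strict = reachable(graph, "gene:G1", 3, {"encodes"})    # only follow "encodes" links
print(two_hops, three_hops, strict)
```

With three hops the gene ends up "mapped" to a second protein that merely shares an enzyme code, which is why a transitive mapper needs either a hop limit or a restriction on which link types may be chained.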

This is not just Open PHACTS

Federated SPARQL queries:

e.g. find all genes related to disease, then all pathways with these genes…

Used as hackathon (SWAT4LS) examples

Only works sometimes, by chance

Needs integrated ID mapping!
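Why such chained queries only work by chance can be shown without any endpoint at all. The sketch below stands in for two query results with plain Python sets; the database prefixes `X:` and `Y:`, the identifiers, the pathway names, and the mapping table are all invented for illustration.

```python
from typing import Dict, Set

# Hypothetical result of "find all genes related to a disease" (identifiers from database X).
disease_genes: Set[str] = {"X:101", "X:102"}

# Hypothetical pathway resource that annotates its genes with database Y identifiers.
pathway_genes: Dict[str, Set[str]] = {
    "Pathway A": {"Y:g7", "Y:g8"},
    "Pathway B": {"Y:g9"},
}

# Made-up mapping between the two identifier systems.
X_TO_Y: Dict[str, Set[str]] = {"X:101": {"Y:g7"}, "X:102": {"Y:g3"}}


def pathways_for(genes: Set[str], pathways: Dict[str, Set[str]]) -> Set[str]:
    """Return the pathways that contain at least one of the given genes."""
    return {name for name, members in pathways.items() if genes & members}


def map_ids(ids: Set[str], mapping: Dict[str, Set[str]]) -> Set[str]:
    """Translate identifiers; unmapped identifiers are silently dropped."""
    mapped: Set[str] = set()
    for identifier in ids:
        mapped |= mapping.get(identifier, set())
    return mapped


naive = pathways_for(disease_genes, pathway_genes)                       # joins on raw IDs
with_mapping = pathways_for(map_ids(disease_genes, X_TO_Y), pathway_genes)
print(naive, with_mapping)
```

The naive join returns nothing even though the gene is in the pathway; it would only succeed if both resources happened to use the same identifier system.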

Ontology mapping

• Many available, even as services

• Often integrated in data resources

– Make my own, slim, combine, map, extend

– Needs feedback to original!

Metabolite mapping needs

• More mappings! (plant products, drugs, xenobiotics)

• Ontology based mapping (ChEBI)

• Because:

– Palmitic acid is a fatty acid

– R,R,R-tocopherol is a form of Vitamin E

• And these should (sometimes) map
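The "(sometimes)" can be made concrete with a toy is-a hierarchy and a switchable matching rule. The two is-a statements come from the bullets above; the function names and the lens names `strict` and `class` are hypothetical, and a real implementation would read the hierarchy from an ontology such as ChEBI rather than a hard-coded dictionary.

```python
from typing import Dict, Set

# Toy is-a hierarchy: child term -> parent term.
IS_A: Dict[str, str] = {
    "palmitic acid": "fatty acid",
    "R,R,R-tocopherol": "vitamin E",
}


def ancestors(term: str) -> Set[str]:
    """Follow is-a links upward and collect every broader term."""
    found: Set[str] = set()
    while term in IS_A:
        term = IS_A[term]
        found.add(term)
    return found


def maps_to(a: str, b: str, lens: str = "strict") -> bool:
    """Decide whether two terms should be treated as equivalent under a lens.

    strict: only identical terms map.
    class:  a term also maps to any of its broader terms (and vice versa).
    """
    if a == b:
        return True
    if lens == "class":
        return b in ancestors(a) or a in ancestors(b)
    return False


print(maps_to("palmitic acid", "fatty acid", lens="strict"))
print(maps_to("palmitic acid", "fatty acid", lens="class"))
```

Which lens is appropriate depends on the question: a fatty-acid pathway analysis may want the class lens, while a study of a specific compound may not.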

Also applies to biology: scientific lenses

Chemistry mapping

• Structure not ID based

• Allow substructure searches

• Open PHACTS open source ???

• We need it, may have to redo

From reproducibility to reusability

Reuse problems

The age distribution in the experimental groups were not significantly different…

Can we reuse that data to find out age effects?

Yes, if that is actually captured

Needs:
• Ontologies (BioPortal)
• Principles/standards (FAIR, ISA)
• Capture tools (dbNP, Molgenis, OpenClinica, eNotebooks)
• Study repositories (BioSamples, BioStudies)
• Data repositories (EGA, GEO, ArrayExpress, MetaboLights, PRIDE)
