The logistics for the next hour
• ‘Materials’ section: presentation.pdf
• muted
• Take questions at the end: look for the chat box!
Chat box
Address chat to ‘Everyone’
Please enter your question into this box
Extracting gene-disease evidence from literature, genetics, genomics, and more...
Diversity of biomolecular data
Data generation
Therapeutic hypothesis
Public data
Data integration
Our Vision
A partnership to transform drug discovery through the systematic identification and
prioritisation of targets
https://www.opentargets.org
2014 2016 2017 2018
• Resource of integrated multiomics data
• Added value (e.g. score) and links to original sources
• Graphical web interface: easy to use
April 2018 release
21K targets
9.7K diseases
2.3 M associations
6.1 M evidence
Open Targets Platform
Evidence for our T-D associations
https://docs.targetvalidation.org/data-sources/data-sources
Europe PMC: a public database of life sciences research literature
• 33 million biomedical abstracts in Europe PMC
• 3 million genomic variants in dbVar
• 138 000 protein structures in PDBe
image by Jason D. Rowley
Literature as part of big data
What evidence does this contain?
Text-mining for discovery
Text-mining bioentities
Demo: Publication centric workflow
https://europepmc.org/
Annotation types
Open Targets demo
Data sources grouped into data types
Data types: Genetic Associations, Somatic Mutations, Drugs, Affected Pathways, Differential RNA expression, Animal Models, Text Mining
Data sources: EVA, GWAS Catalog, PheWAS, Cancer Gene Census, EVA, Expression Atlas, PhenoDigm, Europe PMC, G2P
Open Targets Platform
• Target centric and disease centric information
https://www.targetvalidation.org
• Associations between targets and diseases
• Ensembl Gene IDs e.g. ENSGXXXXXXXXXXX
• UniProt IDs e.g. P15056
• HGNC names e.g. DMD
• Also non-coding RNA genes
Targets → genes or proteins
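The identifier types listed above can be told apart by their shape. A minimal sketch, assuming the usual Ensembl gene ID layout (ENSG followed by 11 digits) and the accession pattern published in the UniProt help pages; the HGNC pattern and the function name `classify_target_id` are illustrative assumptions, not part of the Open Targets Platform:

```python
import re

# Ensembl human gene IDs: "ENSG" followed by 11 digits.
ENSEMBL_GENE = re.compile(r"ENSG\d{11}")
# UniProt accession pattern as given in the UniProt help pages.
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: HGNC symbols are matched loosely as upper-case letters,
# digits and hyphens. Real symbols are better checked against HGNC itself.
HGNC_SYMBOL = re.compile(r"[A-Z][A-Z0-9-]*")


def classify_target_id(identifier: str) -> str:
    """Return a coarse label for a target identifier (hypothetical helper)."""
    text = identifier.strip()
    if ENSEMBL_GENE.fullmatch(text):
        return "ensembl_gene"
    if UNIPROT_ACC.fullmatch(text):
        return "uniprot"
    if HGNC_SYMBOL.fullmatch(text):
        return "hgnc_symbol"
    return "unknown"


for example in ("ENSG00000141510", "P15056", "DMD"):
    print(example, "->", classify_target_id(example))
```

The order of the checks matters: the loose HGNC pattern would also accept Ensembl IDs and UniProt accessions, so it is tried last.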
• Experimental Factor Ontology (EFO)
• Controlled vocabulary (Alzheimers versus Alzheimer’s)
• Hierarchy (relationships)
Diseases → EFO terms
• Promotes consistency
• Increases the richness of annotation
• Allows for easier and automatic integration
Association score
Which targets have the most evidence for association with a disease?
What is the relative weight of the evidence for different targets associated with a disease?
Statistical integration, aggregation and scoring
A) per evidence (e.g. one SNP from a GWAS paper)
B) per data source (e.g. SNPs from the GWAS catalog)
C) per data type (e.g. Genetic associations)
D) overall
Four-tier scoring framework
Aggregating individual scores
Ranking target-disease association
Association score: the overall score across all data types
• Based on the data sources
• Different weights are applied: genetic associations = drugs = somatic mutations = pathways > RNA expression > animal models = text mining
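The weighting and ranking can be sketched as follows. The numeric weights (1.0, 0.5, 0.2) are the ones shown on the back-up scoring slide. Applying them at the data-type level, capping the result at 1, and the names `overall_score` and `rank_targets` are assumptions made for illustration; the platform's pipeline may combine the tiers differently.

```python
# Weights per data type, as shown on the back-up scoring slide.
DATA_TYPE_WEIGHTS = {
    "Genetic associations": 1.0,
    "Somatic mutations": 1.0,
    "Drugs": 1.0,
    "Affected pathways": 1.0,
    "RNA expression": 0.5,
    "Animal models": 0.2,
    "Text mining": 0.2,
}


def harmonic_sum(scores):
    """Sum scores sorted in descending order, dividing the i-th by i squared."""
    ordered = sorted(scores, reverse=True)
    return sum(s / i ** 2 for i, s in enumerate(ordered, start=1))


def overall_score(data_type_scores):
    """Weight each data-type score, aggregate, and cap at 1 (assumption)."""
    weighted = [DATA_TYPE_WEIGHTS[name] * score
                for name, score in data_type_scores.items()]
    return min(1.0, harmonic_sum(weighted))


def rank_targets(targets):
    """Order target names by overall score, highest first."""
    return sorted(targets, key=lambda t: overall_score(targets[t]), reverse=True)


# Hypothetical targets and scores, for illustration only.
targets = {
    "TARGET_A": {"Genetic associations": 0.8},
    "TARGET_B": {"Text mining": 1.0, "Animal models": 1.0},
}
print(rank_targets(targets))
```

In this toy example a single strong genetic association outranks two maximal scores from the lower-weighted data types.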
To find gene-disease associations, the text-mining algorithm searches for a gene and a disease mentioned in the same sentence.
• Only primary research, no reviews or commentaries.
• Only introduction, results, and discussion.
• Associations have to appear more than once.
• Defined vocabularies for genes and diseases (SwissProt, EFO).
• Short or ambiguous entries are filtered (gene “A”, protein “Large”).
• Term variations are included (“α” = “alpha”).
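A toy version of the sentence-level co-occurrence rules might look like the sketch below. The vocabularies, the length cut-off used as a stand-in for the ambiguity filter, and all function names are hypothetical; the real pipeline uses SwissProt and EFO dictionaries and also filters by article type, which is not modelled here.

```python
import re
from collections import Counter

# Hypothetical toy vocabularies; the real ones come from SwissProt and EFO.
GENES = {"BRAF", "TP53", "A"}
DISEASES = {"melanoma", "diabetes"}
ALLOWED_SECTIONS = {"introduction", "results", "discussion"}
MIN_TERM_LENGTH = 3  # assumption: crude stand-in for the short/ambiguous filter


def normalise(text):
    """Map a couple of term variations onto one spelling."""
    return text.replace("α", "alpha").replace("β", "beta")


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(sections):
    """Count gene-disease pairs mentioned in the same sentence."""
    counts = Counter()
    genes = {g for g in GENES if len(g) >= MIN_TERM_LENGTH}
    for name, text in sections.items():
        if name.lower() not in ALLOWED_SECTIONS:
            continue  # e.g. methods and references are skipped
        for sentence in split_sentences(normalise(text)):
            tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
            lowered = {t.lower() for t in tokens}
            for gene in genes & tokens:
                for disease in DISEASES & lowered:
                    counts[(gene, disease)] += 1
    # Keep only associations that appear more than once.
    return {pair: n for pair, n in counts.items() if n > 1}


article = {
    "introduction": "BRAF is mutated in melanoma. TP53 was studied.",
    "results": "We found BRAF variants in melanoma samples. TP53 is linked to diabetes.",
    "methods": "BRAF melanoma BRAF melanoma.",
}
print(cooccurrences(article))
```

Here the BRAF and melanoma pair is kept because it co-occurs in two sentences from allowed sections, while the single TP53 and diabetes mention is dropped.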
Text-mining evidence
Literature evidence in Open Targets - a target validation platform. Kafkas et al. 2017
For each association a confidence score is calculated. It takes into account where the association is found in the paper: title scores high, introduction scores low (known associations). 50-80% of other evidence types overlap with the literature mining data.
How accurate is text-mining?
APC:
• antigen-presenting cells?
• activated protein C?
• anaphase-promoting complex?
• argon plasma coagulation?
Annotation errors
Demo: reporting an annotation
Content in Europe PMC
Content for text-mining
Text-mining coverage
https://www.targetvalidation.org/
Demo: Disease centric workflow
What is the evidence for the association between a target and a disease?
Which targets are associated with a disease?
https://europepmc.org/
Demo: Publication centric workflow
Which articles on p53 cite clinical trials?
Which studies report gene mutations implicated in diabetes?
Programmatic access
https://europepmc.org/AnnotationsApi
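As a rough illustration of programmatic access, the sketch below only builds a request URL and makes no network call. The base URL, the parameter names (`articleIds`, `type`, `format`), the annotation type value and the article ID are assumptions to be checked against the Annotations API documentation linked above; `build_annotations_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names as understood from the public
# Annotations API documentation; verify before relying on them.
BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"


def build_annotations_url(article_ids, annotation_type=None, fmt="JSON"):
    """Build a query URL for the annotations of one or more articles."""
    params = {"articleIds": ",".join(article_ids), "format": fmt}
    if annotation_type:
        params["type"] = annotation_type
    return f"{BASE_URL}?{urlencode(params)}"


# "MED:12345678" is a placeholder ID, not a real article reference.
print(build_annotations_url(["MED:12345678"], annotation_type="Diseases"))
```

The resulting URL can be passed to any HTTP client; the response would then need to be parsed according to the format requested.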
Data citation
Conclusions
Help!
helpdesk@europepmc.org
http://bit.ly/EuropePMC-youtube
@EuropePMC_news
Back up
• Genetic associations: EVA (*1.0), UniProt (*1.0), Gene2Phenotype (*1.0), GWAS catalog (*1.0), Genomics England (*1.0), PheWAS catalog (*1.0)
• Somatic mutations: Cancer Gene Census (*1.0), EVA (somatic) (*1.0), IntOGen (*1.0)
• Drugs: ChEMBL (*1.0)
• Affected pathways: Reactome (*1.0)
• RNA expression: Expression Atlas (*0.5)
• Text mining: Europe PMC (*0.2)
• Animal models: PhenoDigm (*0.2)
Scores at each level are aggregated with the harmonic sum (ΣH) into the overall Association score:
$S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$
Four-tier scoring framework
Calculated at 4 levels: • Evidence • Data source • Data type • Overall
Score: 0 to 1 (max)
Weight factor
Aggregation with ΣH (harmonic sum)
Note: Each data set has its own scoring and ranking scheme
Factors affecting the relative strength of a piece of evidence
• f, relative occurrence of a target-disease evidence
• s, strength of the effect described by the evidence
• c, confidence of the observation for the target-disease evidence
e.g. GWAS Catalog: S = f * s * c
• f = sample size (cases versus controls)
• s = predicted functional consequence (VEP)
• c = p value reported in the paper
https://docs.targetvalidation.org/getting-started/scoring
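The per-evidence formula S = f * s * c can be written out as a short sketch. It assumes each factor has already been scaled to the range 0 to 1; how sample size, VEP consequence and p value are mapped onto that range is not shown on the slides and is left out here. The function name and the example values are hypothetical.

```python
def evidence_score(f, s, c):
    """Multiply the three factors; each is assumed to lie between 0 and 1."""
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return f * s * c


# Hypothetical, pre-scaled factors for one GWAS Catalog SNP.
print(evidence_score(f=0.5, s=1.0, c=0.8))
```

Because the factors multiply, a weak value in any one of them pulls the evidence score down, however strong the other two are.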
Aggregating scores across the data
• Using a mathematical function, the harmonic sum*:
$S = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \dots + \frac{S_i}{i^2}$
where $S_1, S_2, \dots, S_i$ are the individual evidence scores sorted in descending order
* PMID: 19107201, PMID: 20118918
• Advantages:
A) accounts for replication
B) deflates the effect of large amounts of data, e.g. text mining
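Both advantages can be checked numerically with a small sketch of the harmonic sum as defined above (scores sorted in descending order, the i-th divided by i squared). The example scores are made up for illustration.

```python
def harmonic_sum(scores):
    """Sort scores in descending order and divide the i-th by i squared."""
    ordered = sorted(scores, reverse=True)
    return sum(score / position ** 2
               for position, score in enumerate(ordered, start=1))


single = harmonic_sum([0.5])            # one piece of evidence
replicated = harmonic_sum([0.5, 0.5])   # the same finding reported twice
many_weak = harmonic_sum([0.1] * 1000)  # a large pile of weak evidence

print(single, replicated, many_weak)
```

A replicated finding scores higher than a single one (0.625 versus 0.5), while a thousand weak items stay below 0.165, because the sum of 1/i² converges to π²/6 (about 1.645) no matter how many terms are added.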
In addition to T-D associations
• Everything you wanted to know about…
… but were afraid to ask.
Disease profile page
Target profile page
Profile of a drug target
Protein interactions
Drugs
Pathways
RNA and protein baseline expression
Variants, isoforms and genomic context
Mouse phenotypes
Bibliography
Description
Synonyms
Gene Ontology
Protein Structure
Protein Interactions
Similar Targets
Expression Atlas
Library/LINK
Extra, extra, extra! Cancer hallmarks in our latest release!
Gene tree
http://www.targetvalidation.org/target/ENSG00000141510
Classification Drugs Similar diseases Bibliography
Open Targets Library/LINK
Profile of a disease
http://www.targetvalidation.org/disease/Orphanet_262
How to access all of this
Core bioinformatics pipelines
www.opentargets.org/projects
Experimental projects
Generate new evidence
CRISPR/Cas9
Organoids and IPS cells
(cellular models for disease)
Integration of available data
Web interface
Batch search tool
REST API
Data dumps
Main data store (Elasticsearch)
Web App* (AngularJS)
Public Access
REST API**
* UI: first released in December 2015
** API: first released in April 2016
https://www.targetvalidation.org
https://api.opentargets.io