
The logistics for the next hour

•  ‘Materials’ section: presentation.pdf

•  muted

•  Take questions at the end: look for the chat box!

Chat box

Address chat to ‘Everyone’

Please enter your question into this box

Extracting gene-disease evidence from literature, genetics, genomics, and more...

Diversity of biomolecular data

Data generation

Therapeutic hypothesis

Public data

Data integration

Our Vision

A partnership to transform drug discovery through the systematic identification and prioritisation of targets

https://www.opentargets.org

2014 2016 2017 2018

•  Resource of integrated multiomics data

•  Added value (e.g. score) and links to original sources

•  Graphical web interface: easy to use

April 2018 release

21K targets

9.7K diseases

2.3 M associations

6.1 M evidence

Open Targets Platform

Evidence for our target-disease (T-D) associations

https://docs.targetvalidation.org/data-sources/data-sources

Europe PMC: a public database of life sciences research literature

Literature as part of big data

•  33 million biomedical abstracts in Europe PMC
•  3 million genomic variants in dbVar
•  138 000 protein structures in PDBe

image by Jason D. Rowley

What evidence does this contain?

Text-mining for discovery

Text-mining bioentities

Demo: Publication centric workflow

https://europepmc.org/

Annotation types

Open Targets demo

Data sources grouped into data types

•  Genetic associations: EVA, GWAS Catalog, PheWAS, G2P
•  Somatic mutations: Cancer Gene Census, EVA
•  Drugs
•  Affected pathways
•  Differential RNA expression: Expression Atlas
•  Animal models: PhenoDigm
•  Text mining: Europe PMC

Open Targets Platform

•  Target centric and disease centric information

https://www.targetvalidation.org

•  Associations between targets and diseases

•  Ensembl Gene IDs e.g. ENSGXXXXXXXXXXX

•  UniProt IDs e.g. P15056

•  HGNC names e.g. DMD

•  Also non-coding RNA genes

Targets → genes or proteins
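
The three identifier schemes on this slide can be told apart by their shape. The sketch below is illustrative only: the function name `classify_target_id` is hypothetical, the Ensembl pattern assumes human gene IDs ("ENSG" plus 11 digits), and the UniProt pattern follows the accession format published in the UniProt help pages. Anything that matches neither is simply treated as a gene symbol and is not checked against HGNC.

```python
import re

# Ensembl human gene IDs: "ENSG" followed by 11 digits (assumed format).
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}$")

# UniProt accession format (6 or 10 characters), as described in UniProt help.
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_target_id(identifier: str) -> str:
    """Guess which identifier scheme a target string uses."""
    if ENSEMBL_GENE.match(identifier):
        return "ensembl_gene"
    if UNIPROT_ACCESSION.match(identifier):
        return "uniprot"
    # Fallback: treat as a gene symbol (e.g. an HGNC name); not validated here.
    return "symbol"


for example in ("ENSG00000141510", "P15056", "DMD"):
    print(example, "->", classify_target_id(example))
```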

•  Experimental Factor Ontology (EFO)

•  Controlled vocabulary (Alzheimers versus Alzheimer’s)

•  Hierarchy (relationships)

Diseases → EFO terms

•  Promotes consistency

•  Increases the richness of annotation

•  Allows for easier and automatic integration

Association score

Which targets have the most evidence for association with a disease?

What is the relative weight of the evidence for different targets associated with a disease?

Statistical integration, aggregation and scoring

A) per evidence (e.g. one SNP from a GWAS paper)

B) per data source (e.g. SNPs from the GWAS catalog)

C) per data type (e.g. Genetic associations)

D) overall

Four-tier scoring framework

Aggregating individual scores

Ranking target-disease association

Association score: the overall score across all data types

•  Based on the data sources

•  Different weight applied: genetic association = drugs = mutations = pathways > RNA expression > animal models = text mining

To find gene-disease associations the text-mining algorithm searches for a gene and a disease mentioned in the same sentence.

•  Only primary research, no reviews or commentaries.
•  Only introduction, results, and discussion.
•  Associations have to appear more than once.
•  Defined vocabularies for genes and diseases (SwissProt, EFO).
•  Short or ambiguous entries are filtered (gene “A”, protein “Large”).
•  Term variations are included (“α” = “alpha”).
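
The rules above can be sketched as a toy sentence-level co-occurrence search. This is not the Europe PMC pipeline: the vocabularies, the identifier labels, the length cut-off `MIN_TERM_LENGTH` and all function names are hypothetical, and the filter for primary research articles is left out. It only illustrates the section filter, the short-entry filter, the “α” = “alpha” variation and the "more than once" rule.

```python
import re
from collections import Counter

# Hypothetical toy vocabularies: surface form -> illustrative label.
GENES = {"BRAF": "gene:BRAF", "TNF-alpha": "gene:TNF", "A": "gene:A"}
DISEASES = {"melanoma": "disease:melanoma", "asthma": "disease:asthma"}

ALLOWED_SECTIONS = {"introduction", "results", "discussion"}
MIN_TERM_LENGTH = 3   # assumed cut-off for "short or ambiguous" entries
MIN_MENTIONS = 2      # an association has to appear more than once


def normalise(text):
    """Map a term variation onto one spelling (only alpha is handled here)."""
    return text.replace("α", "alpha")


def find_terms(sentence, vocabulary):
    """Return the labels of all vocabulary terms found as whole words."""
    found = set()
    for term, label in vocabulary.items():
        if len(term) < MIN_TERM_LENGTH:
            continue  # skip short entries such as gene "A"
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(label)
    return found


def mine_associations(sections):
    """Count gene-disease pairs that share a sentence in allowed sections."""
    counts = Counter()
    for name, text in sections.items():
        if name.lower() not in ALLOWED_SECTIONS:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", normalise(text)):
            for gene in find_terms(sentence, GENES):
                for disease in find_terms(sentence, DISEASES):
                    counts[(gene, disease)] += 1
    return {pair: n for pair, n in counts.items() if n >= MIN_MENTIONS}


paper = {
    "introduction": "BRAF is frequently mutated in melanoma.",
    "methods": "BRAF melanoma cell lines were cultured.",
    "results": "BRAF mutations were found in melanoma samples. "
               "TNF-α was raised in asthma.",
}
print(mine_associations(paper))
```

In this toy paper only the BRAF and melanoma pair survives: it is seen in two allowed sentences, the methods sentence is ignored, and the TNF and asthma pair appears only once.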

Text-mining evidence

Literature evidence in Open Targets - a target validation platform. Kafkas et al. 2017

For each association a confidence score is calculated. It takes into account where the association is found in the paper: title scores high, introduction scores low (known associations). 50-80% of other evidence types overlap with the literature mining data.

How accurate is text-mining?

APC

•  antigen-presenting cells
•  activated protein C
•  anaphase-promoting complex
•  argon plasma coagulation
•  ?

Annotation errors

Demo: reporting an annotation

Content in Europe PMC

Content for text-mining

Text-mining coverage

https://www.targetvalidation.org/

Demo: Disease centric workflow

What is the evidence for the association between a target and a disease?

Which targets are associated with a disease?

https://europepmc.org/

Demo: Publication centric workflow

Which articles on p53 cite clinical trials?

Which studies report gene mutations implicated in diabetes?

Programmatic access

https://europepmc.org/AnnotationsApi
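
As a rough illustration of programmatic access, the sketch below builds a request URL for the Annotations API and, optionally, fetches it. Treat the details as assumptions to check against the page linked above: the endpoint path `annotationsByArticleIds`, the parameter names `articleIds`, `type` and `format`, and the type value `Gene_Proteins` are written from memory of the public documentation, and the function names are hypothetical. The article ID reuses one of the PMIDs cited in these slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Europe PMC Annotations API; verify before use.
BASE = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"


def annotations_url(article_ids, annotation_type=None, fmt="JSON"):
    """Build a query URL for annotations of the given articles.

    article_ids: strings such as "MED:19107201" (source prefix plus ID).
    annotation_type: optional filter, e.g. "Gene_Proteins" (assumed value).
    """
    params = {"articleIds": ",".join(article_ids), "format": fmt}
    if annotation_type:
        params["type"] = annotation_type
    return BASE + "?" + urlencode(params)


def fetch_annotations(url):
    """Download and decode the JSON response (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


url = annotations_url(["MED:19107201"], annotation_type="Gene_Proteins")
print(url)
# annotations = fetch_annotations(url)  # uncomment to query the live service
```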

Data citation

Data citation

Conclusions

Help!

[email protected]

http://bit.ly/EuropePMC-youtube

@EuropePMC_news

Chat box

Please enter your question into this box

Address chat to ‘Everyone’

Back up

Four-tier scoring framework

Calculated at 4 levels:
•  Evidence
•  Data source
•  Data type
•  Overall

Score: 0 to 1 (max)

Data sources by data type, each with its weight factor:
•  Genetic associations: EVA (*1.0), UniProt (*1.0), Gene2Phenotype (*1.0), GWAS catalog (*1.0), Genomics England (*1.0), PheWAS catalog (*1.0)
•  Somatic mutations: Cancer Gene Census (*1.0), EVA (somatic) (*1.0), IntOGen (*1.0)
•  Drugs: ChEMBL (*1.0)
•  Affected pathways: Reactome (*1.0)
•  RNA expression: Expression Atlas (*0.5)
•  Text mining: Europe PMC (*0.2)
•  Animal models: PhenoDigm (*0.2)

Aggregation with the harmonic sum (ΣH) at each level, up to the overall Association score:

$$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$$

Note: Each data set has its own scoring and ranking scheme

Factors affecting the relative strength of a piece of evidence

S = f * s * c

•  f, relative occurrence of a target-disease evidence
•  s, strength of the effect described by the evidence
•  c, confidence of the observation for the target-disease evidence

e.g. GWAS Catalog:

•  f = sample size (cases versus controls)
•  s = predicted functional consequence (VEP)
•  c = p value reported in the paper

https://docs.targetvalidation.org/getting-started/scoring
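
The product S = f * s * c on this slide can be written as a one-line function. The sketch assumes that each factor has already been normalised to the range 0 to 1, which the slide does not spell out; how sample size, functional consequence and p value are mapped onto that range is specific to each data source and is not shown. The function name is hypothetical.

```python
def evidence_score(f, s, c):
    """Score one piece of evidence as S = f * s * c.

    f, s and c are assumed to be pre-normalised to the range [0, 1],
    so the resulting score also stays between 0 and 1.
    """
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return f * s * c


# A well-powered study (f = 1.0) with a moderate consequence and confidence.
print(evidence_score(1.0, 0.5, 0.5))
```

Because the factors are multiplied, a weak value for any one of them pulls the whole evidence score down.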

Aggregating scores across the data

•  Using a mathematical function, the harmonic sum*

$$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$$

where $S_1, S_2, \dots, S_i$ are the individual evidence scores sorted in descending order

* PMID: 19107201, PMID: 20118918

•  Advantages:
    A) accounts for replication
    B) deflates the effect of large amounts of data, e.g. text mining
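
A minimal sketch of the harmonic sum follows, to make the two advantages concrete. The function names are hypothetical. The weights are the ones shown on the scoring framework slide (1.0 by default, 0.5 for Expression Atlas, 0.2 for Europe PMC and PhenoDigm), and capping the result at 1 is an assumption based on the stated score range of 0 to 1.

```python
def harmonic_sum(scores):
    """S1 + S2/2^2 + S3/3^2 + ... with scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(score / position ** 2
               for position, score in enumerate(ordered, start=1))


# Weight factors per data source as listed on the scoring slide;
# sources not named here default to 1.0.
SOURCE_WEIGHTS = {"Expression Atlas": 0.5, "Europe PMC": 0.2, "PhenoDigm": 0.2}


def weighted_association_score(source_scores):
    """Combine per-source scores into one score (capped at 1: an assumption)."""
    weighted = [SOURCE_WEIGHTS.get(source, 1.0) * score
                for source, score in source_scores.items()]
    return min(1.0, harmonic_sum(weighted))


# Replication helps: a second strong score adds a quarter of its value.
print(harmonic_sum([1.0, 1.0]))
# Volume is deflated: 100 weak scores of 0.1 stay well below one strong score.
print(harmonic_sum([0.1] * 100))
print(weighted_association_score({"GWAS catalog": 0.4, "Europe PMC": 1.0}))
```

Since the sum of 1/i² converges (to about 1.645), no amount of low-scoring evidence can contribute more than roughly 1.645 times its highest score, which is what keeps large text-mining counts from dominating.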

In addition to T-D associations

•  Everything you wanted to know about…

… but were afraid to ask.

Disease profile page

Target profile page

Profile of a drug target

•  Description
•  Synonyms
•  Gene Ontology
•  Protein structure
•  Protein interactions
•  Drugs
•  Pathways
•  RNA and protein baseline expression
•  Expression Atlas
•  Variants, isoforms and genomic context
•  Mouse phenotypes
•  Bibliography
•  Library/LINK
•  Similar targets
•  Gene tree

Extra, extra, extra! Cancer hallmarks in our latest release!

http://www.targetvalidation.org/target/ENSG00000141510

Profile of a disease

•  Classification
•  Drugs
•  Similar diseases
•  Bibliography
•  Open Targets Library/LINK

http://www.targetvalidation.org/disease/Orphanet_262

How to access all of this

Experimental projects: generate new evidence
•  CRISPR/Cas9
•  Organoids and iPS cells (cellular models for disease)

Core bioinformatics pipelines: integration of available data

www.opentargets.org/projects

Public access:
•  Web interface
•  Batch search tool
•  REST API
•  Data dumps

Architecture:
•  Main data store: Elasticsearch
•  Web App*: AngularJS
•  REST API**

* UI first released in December 2015

** API first released in April 2016

https://www.targetvalidation.org

https://api.opentargets.io
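
For the REST API, a request URL for the associations of a disease or target might be built as below. This is a sketch under stated assumptions: the path `/v3/platform/public/association/filter` and the parameter names `disease`, `target` and `size` reflect the API documentation as it stood around this release and should be checked before use, and the function name is hypothetical. The identifiers are the ones used in the profile page URLs above.

```python
from urllib.parse import urlencode

# Assumed versioned root of the REST API; verify against the API docs.
API_ROOT = "https://api.opentargets.io/v3/platform"


def association_filter_url(disease=None, target=None, size=10):
    """Build a URL listing associations for a disease and/or a target.

    disease: a disease ID such as "Orphanet_262".
    target: an Ensembl gene ID such as "ENSG00000141510".
    size: how many associations to ask for (assumed parameter).
    """
    if disease is None and target is None:
        raise ValueError("give at least a disease or a target")
    params = {}
    if disease is not None:
        params["disease"] = disease
    if target is not None:
        params["target"] = target
    params["size"] = size
    return f"{API_ROOT}/public/association/filter?{urlencode(params)}"


print(association_filter_url(disease="Orphanet_262"))
print(association_filter_url(target="ENSG00000141510", size=5))
```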