The logistics for the next hour · Only introduction, results, and discussion. Associations have to...

The logistics for the next hour


•  ‘Materials’ section: presentation.pdf

•  Participants are muted

•  Take questions at the end: look for the chat box!

Chat box

Address chat to ‘Everyone’

Please enter your question into this box

Extracting gene-disease evidence from literature, genetics, genomics, and more...

Diversity of biomolecular data

Data generation

Therapeutic hypothesis

Public data

Data integration

Our Vision

A partnership to transform drug discovery through the systematic identification and prioritisation of targets

https://www.opentargets.org


•  Resource of integrated multiomics data

•  Added value (e.g. score) and links to original sources

•  Graphical web interface: easy to use

April 2018 release: 21K targets, 9.7K diseases, 2.3M associations, 6.1M pieces of evidence

Open Targets Platform

Evidence for our target-disease (T-D) associations

https://docs.targetvalidation.org/data-sources/data-sources

Europe PMC: a public database of life sciences research literature

33 million biomedical abstracts in Europe PMC

3 million genomic variants in dbVar

138 000 protein structures in PDBe

image by Jason D. Rowley

Literature as part of big data

What evidence does this contain?

Text-mining for discovery

Text-mining bioentities

Demo: Publication centric workflow

https://europepmc.org/

Annotation types

Open Targets demo

Data sources grouped into data types: Genetic associations, Somatic mutations, Drugs, Affected pathways, Differential RNA expression, Animal models, Text mining

Example data sources: GWAS Catalog, PheWAS, Cancer Gene Census, Expression Atlas, PhenoDigm, Europe PMC

Open Targets Platform

•  Target centric and disease centric information

https://www.targetvalidation.org

•  Associations between targets and diseases

•  Ensembl gene IDs, e.g. ENSGXXXXXXXXXXX

•  UniProt IDs, e.g. P15056

•  HGNC symbols, e.g. DMD

•  Also non-coding RNA genes

Targets → genes or proteins
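The three identifier types above look quite different, so a query can usually be routed by its pattern alone. A minimal Python sketch, assuming human Ensembl gene IDs (`ENSG` plus 11 digits) and the published UniProt accession format; the function name `identifier_type` is hypothetical, and the HGNC branch is only a fallback that a real tool would check against the HGNC list:

```python
import re

# Human Ensembl gene IDs: "ENSG" followed by 11 digits (e.g. ENSG00000141510).
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}$")
# UniProt accession format (6 or 10 characters), e.g. P15056.
UNIPROT_ACC = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def identifier_type(query: str) -> str:
    """Guess which kind of target identifier a query string is (heuristic)."""
    q = query.strip()
    if ENSEMBL_GENE.match(q):
        return "ensembl_gene"
    if UNIPROT_ACC.match(q):
        return "uniprot"
    # Fallback: treat anything else as a candidate HGNC symbol
    # (a real tool would validate it against the HGNC list).
    return "hgnc_symbol"
```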

•  Experimental Factor Ontology (EFO)

•  Controlled vocabulary (Alzheimers versus Alzheimer’s)

•  Hierarchy (relationships)

Diseases → EFO terms

•  Promotes consistency

•  Increases the richness of annotation

•  Allows easier, automatic integration
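A minimal sketch of what a controlled vocabulary buys: spelling variants collapse to one canonical term, and the hierarchy relates a term to broader ones. The mini-vocabulary, its parent links and the helper names `normalise` and `ancestors` are hypothetical illustrations, not taken from EFO:

```python
import unicodedata

# Hypothetical mini-vocabulary: spelling variants -> canonical label.
SYNONYMS = {
    "alzheimers disease": "Alzheimer's disease",
    "alzheimer disease": "Alzheimer's disease",
    "alzheimer's disease": "Alzheimer's disease",
}
# Hypothetical hierarchy: child label -> parent label.
PARENTS = {
    "Alzheimer's disease": "neurodegenerative disease",
    "neurodegenerative disease": "nervous system disease",
}


def normalise(term):
    """Map a free-text disease mention to its canonical label, or None."""
    key = unicodedata.normalize("NFKC", term).lower().strip()
    key = key.replace("\u2019", "'")  # typographic apostrophe -> plain one
    return SYNONYMS.get(key)


def ancestors(label):
    """Walk up the (hypothetical) hierarchy from a canonical label."""
    found = []
    while label in PARENTS:
        label = PARENTS[label]
        found.append(label)
    return found
```

Because "Alzheimers disease" and "Alzheimer’s Disease" both normalise to the same label, evidence recorded under either spelling ends up attached to a single term.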

Association score

Which targets have the most evidence for association with a disease?

What is the relative weight of the evidence for different targets associated with a disease?

Statistical integration, aggregation and scoring

A) per evidence (e.g. one SNP from a GWAS paper)

B) per data source (e.g. SNPs from the GWAS catalog)

C) per data type (e.g. Genetic associations)

D) overall

Four-tier scoring framework

Aggregating individual scores

Ranking target-disease association

Association score: the overall score across all data types

•  Based on the data sources

•  Different weights are applied: genetic associations = drugs = somatic mutations = affected pathways > RNA expression > animal models = text mining
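The four tiers and the weighting can be sketched together. This is a minimal illustration, assuming every tier is aggregated with the harmonic sum (as the backup slides suggest) and that each result is simply capped at 1 because scores run from 0 to 1; the capping is our simplification, and the numeric weights are hypothetical values chosen only to respect the ordering above:

```python
from collections import defaultdict

# Hypothetical weights that only respect the ordering on the slide:
# genetic = drugs = somatic = pathways > RNA expression > animal models = text mining.
DATATYPE_WEIGHT = {
    "genetic_association": 1.0,
    "known_drug": 1.0,
    "somatic_mutation": 1.0,
    "affected_pathway": 1.0,
    "rna_expression": 0.5,
    "animal_model": 0.2,
    "literature": 0.2,
}


def harmonic_sum(scores):
    """S1 + S2/2^2 + S3/3^2 + ... over scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(s / (i * i) for i, s in enumerate(ordered, start=1))


def association_score(evidence):
    """evidence: list of (data_type, data_source, score) tuples, scores in [0, 1].

    Tier A: individual evidence scores (given as input).
    Tier B: harmonic sum per data source, capped at 1 (assumption).
    Tier C: harmonic sum of source scores per data type, capped at 1.
    Tier D: harmonic sum of weighted data-type scores, capped at 1.
    """
    by_source = defaultdict(list)
    for dtype, source, score in evidence:
        by_source[(dtype, source)].append(score)

    by_type = defaultdict(list)
    for (dtype, _source), scores in by_source.items():
        by_type[dtype].append(min(1.0, harmonic_sum(scores)))

    weighted = [
        DATATYPE_WEIGHT[dtype] * min(1.0, harmonic_sum(scores))
        for dtype, scores in by_type.items()
    ]
    return min(1.0, harmonic_sum(weighted))
```

With these assumed weights, a single text-mining score of 0.5 contributes only 0.1 to the overall score, whereas the same score from a genetic association would contribute 0.5.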

To find gene-disease associations, the text-mining algorithm searches for a gene and a disease mentioned in the same sentence.

•  Only primary research, no reviews or commentaries.
•  Only the introduction, results, and discussion sections.
•  Associations have to appear more than once.
•  Defined vocabularies for genes and diseases (Swiss-Prot, EFO).
•  Short or ambiguous entries are filtered out (gene “A”, protein “Large”).
•  Term variations are included (“α” = “alpha”).
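The rules above can be illustrated with a toy sentence-level co-occurrence miner. This is a minimal sketch, not the Europe PMC pipeline: the tiny gene and disease dictionaries, the length cut-off, the ambiguity stop-list, the article dictionary layout and all function names are hypothetical stand-ins for the real Swiss-Prot and EFO vocabularies and filters:

```python
import re
from collections import Counter
from itertools import product

# Hypothetical dictionaries; the real pipeline uses Swiss-Prot and EFO.
GENES = {"BRAF", "TP53", "alpha-synuclein", "A", "Large"}
DISEASES = {"melanoma", "Parkinson's disease"}
ALLOWED_SECTIONS = {"introduction", "results", "discussion"}
MIN_LENGTH = 3           # hypothetical cut-off for "short" entries
AMBIGUOUS = {"Large"}    # hypothetical stop-list of ambiguous words


def usable_terms(terms):
    """Drop short or ambiguous dictionary entries (e.g. gene "A", protein "Large")."""
    return {t for t in terms if len(t) >= MIN_LENGTH and t not in AMBIGUOUS}


def normalise(text):
    """Fold term variations, e.g. Greek letters to spelled-out names."""
    return text.replace("α", "alpha").replace("β", "beta")


def mentions(term, sentence, ignore_case=False):
    """True if the term occurs as a whole word in the sentence."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(rf"\b{re.escape(term)}\b", sentence, flags) is not None


def find_associations(article):
    """article: dict with 'type' and 'sections' {section_name: text}."""
    if article["type"] != "research-article":  # primary research only
        return set()
    genes, diseases = usable_terms(GENES), usable_terms(DISEASES)
    counts = Counter()
    for section, text in article["sections"].items():
        if section not in ALLOWED_SECTIONS:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", normalise(text)):
            found_g = {g for g in genes if mentions(g, sentence)}
            found_d = {d for d in diseases if mentions(d, sentence, ignore_case=True)}
            counts.update(product(found_g, found_d))
    # Keep only gene-disease pairs that co-occur more than once.
    return {pair for pair, n in counts.items() if n > 1}
```

For an article whose results section says "BRAF mutations drive melanoma." and later "We saw BRAF in melanoma again.", only the pair (BRAF, melanoma) survives; a pair seen once, or only in the title, is dropped.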

Text-mining evidence

Literature evidence in Open Targets - a target validation platform. Kafkas et al., 2017

For each association, a confidence score is calculated that takes into account where in the paper the association is found: mentions in the title score high, mentions in the introduction score low (the introduction tends to restate known associations). Between 50% and 80% of the evidence from other data types overlaps with the literature-mining data.
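A section-weighted score of this kind can be sketched as follows. The slide states only that the title scores high and the introduction low; every weight and the normalisation cap below are hypothetical, and so are the function names:

```python
# Hypothetical section weights: only "title high, introduction low" comes
# from the slide; the actual numbers here are assumptions.
SECTION_WEIGHT = {
    "title": 10.0,
    "abstract": 2.0,
    "results": 1.0,
    "discussion": 0.5,
    "introduction": 0.2,
}


def confidence(mention_sections):
    """Sum the section weights over every sentence that mentions the pair."""
    return sum(SECTION_WEIGHT.get(s, 0.0) for s in mention_sections)


def normalised_confidence(mention_sections, cap=10.0):
    """Scale the raw confidence into [0, 1] with a hypothetical cap."""
    return min(1.0, confidence(mention_sections) / cap)
```

A pair named once in the title thus outranks the same pair named repeatedly in the introduction.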

How accurate is text-mining?

APC: antigen-presenting cells? activated protein C? anaphase-promoting complex? argon plasma coagulation?

Annotation errors

Demo: reporting an annotation

Content in Europe PMC

Content for text-mining

Text-mining coverage

https://www.targetvalidation.org/

Demo: Disease centric workflow

What is the evidence for the association between a target and a disease?

Which targets are associated with a disease?

https://europepmc.org/

Demo: Publication centric workflow

Which articles on p53 cite clinical trials?

Which studies report gene mutations implicated in diabetes?

Programmatic access

https://europepmc.org/AnnotationsApi
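A minimal sketch of querying the Annotations API from Python with only the standard library. The base URL, the `annotationsByArticleIds` endpoint and the `articleIds`, `type` and `format` parameters are our reading of the Annotations API documentation and should be checked there, as should the exact annotation type names; `MED:19107201` is just an example article ID, and the helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Europe PMC Annotations API; verify against the docs.
BASE = "https://www.ebi.ac.uk/europepmc/annotations_api"


def annotations_url(article_ids, annotation_type=None):
    """Build a query URL for annotations on articles such as 'MED:19107201'."""
    params = {"articleIds": ",".join(article_ids), "format": "JSON"}
    if annotation_type:
        params["type"] = annotation_type  # e.g. "Gene_Proteins" (check the docs)
    return f"{BASE}/annotationsByArticleIds?{urlencode(params)}"


def fetch_annotations(article_ids, annotation_type=None):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(annotations_url(article_ids, annotation_type)) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes the query easy to inspect or paste into a browser before running it.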

Data citation

Conclusions

helpdesk@europepmc.org

http://bit.ly/EuropePMC-youtube

@EuropePMC_news


Backup slides

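Figure: data sources (UniProt, Gene2Phenotype, GWAS catalog, Genomics England, PheWAS catalog, Cancer Gene Census, EVA (somatic), IntOGen, ChEMBL, Reactome, Expression Atlas, Europe PMC, PhenoDigm) feed the data types (genetic associations, somatic mutations, drugs, affected pathways, RNA expression, animal models, text mining), each multiplied by a weight factor (e.g. ×1.0) and combined into the association score with a harmonic sum:

$$H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$$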

Four-tier scoring framework

Calculated at 4 levels:
•  Evidence
•  Data source
•  Data type
•  Overall

Score: 0 to 1 (maximum)

Weight factor applied per data type

Aggregation with the harmonic sum

Note: Each data set has its own scoring and ranking scheme

Factors affecting the relative strength of a piece of evidence:

•  f, relative occurrence of a target-disease evidence
•  s, strength of the effect described by the evidence
•  c, confidence of the observation for the target-disease evidence

e.g. GWAS Catalog: S = f * s * c, where

•  f = sample size (cases versus controls)
•  s = predicted functional consequence of the variant (VEP)
•  c = p-value reported in the paper

https://docs.targetvalidation.org/getting-started/scoring
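The GWAS Catalog example S = f * s * c can be made concrete. This is a minimal sketch in which only the product itself comes from the slide; the sample-size cap, the VEP consequence-to-score table and the p-value scaling are hypothetical choices made for illustration, as is the function name:

```python
import math

# All scaling choices below are hypothetical; the slide only defines S = f * s * c.
MAX_SAMPLE_SIZE = 100_000   # assumed sample size at which f reaches 1
CONSEQUENCE_SCORE = {       # assumed VEP consequence -> s mapping
    "missense_variant": 0.7,
    "intron_variant": 0.3,
    "intergenic_variant": 0.1,
}
MAX_NEG_LOG10_P = 100.0     # assumed -log10(p) at which c reaches 1


def gwas_evidence_score(cases, controls, consequence, p_value):
    """Return S = f * s * c for one GWAS Catalog association (sketch)."""
    f = min(1.0, (cases + controls) / MAX_SAMPLE_SIZE)   # sample size
    s = CONSEQUENCE_SCORE.get(consequence, 0.0)           # functional consequence
    c = min(1.0, -math.log10(p_value) / MAX_NEG_LOG10_P)  # reported p-value
    return f * s * c
```

Because the factors multiply, a weak value in any one of them (a small study, a variant of unclear consequence, a marginal p-value) pulls the whole evidence score down.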

Aggregating scores across the data

•  Using a mathematical function, the harmonic sum*:

$$H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \dots + \frac{S_i}{i^2} = \sum_i \frac{S_i}{i^2}$$

where S1, S2, ..., Si are the individual evidence scores sorted in descending order

* PMID: 19107201, PMID: 20118918

•  Advantages: A) accounts for replication; B) deflates the effect of large amounts of data, e.g. text mining
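Both advantages follow directly from the formula, as this minimal Python sketch shows (the function and variable names are ours). Sorting in descending order and dividing the i-th score by i² means a second strong observation still adds noticeably, while a flood of weak observations converges: since the sum of 1/i² is π²/6, ten thousand scores of 0.1 can never exceed 0.1 · π²/6 ≈ 0.164:

```python
import math


def harmonic_sum(scores):
    """H = S1 + S2/2^2 + ... + Si/i^2, with scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(s / i**2 for i, s in enumerate(ordered, start=1))


one = harmonic_sum([0.8])                 # a single observation
replicated = harmonic_sum([0.8, 0.8])     # the same finding replicated
many_weak = harmonic_sum([0.1] * 10_000)  # e.g. a flood of text-mining hits
upper_bound = 0.1 * math.pi**2 / 6        # limit for infinitely many 0.1 scores
```

Here `one` is 0.8, `replicated` rises to 1.0, and `many_weak` stays below `upper_bound`. The raw sum can exceed 1, so a platform reporting scores on a 0 to 1 scale has to rescale or cap it; that step is not shown.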

In addition to T-D associations

•  Everything you wanted to know about…

… but were afraid to ask.

Disease profile page

Target profile page

Profile of a drug target

Protein interactions, Drugs, Pathways

RNA and protein baseline expression

Variants, isoforms and genomic context

Mouse phenotypes, Bibliography

Description, Synonyms, Gene Ontology, Protein structure

Protein interactions, Similar targets

Expression Atlas

Library/LINK

Extra, extra, extra! Cancer hallmarks in our latest release!

Gene tree

http://www.targetvalidation.org/target/ENSG00000141510

Classification, Drugs, Similar diseases, Bibliography

Open Targets Library/LINK

Profile of a disease

http://www.targetvalidation.org/disease/Orphanet_262

How to access all of this

Core bioinformatics pipelines

www.opentargets.org/projects

Experimental projects

Generate new evidence

CRISPR/Cas9

Organoids and iPS cells (cellular models for disease)

Integration of available data

Web interface

Batch search tool

REST API**

Data dumps

Main data store (Elasticsearch)

Web app* (AngularJS)

Public access

* UI first released in December 2015

** API first released in April 2016

https://www.targetvalidation.org

https://api.opentargets.io
