Data integration: The STITCH database of protein–small molecule interactions

Lars Juhl Jensen

guilt by association

functional associations

Kuhn et al., Nucleic Acids Research, 2010

parts lists

>2.5 million proteins

630 genomes

many databases

different formats

model organism databases

Ensembl

RefSeq

PubChem compounds

>74,000 small molecules

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

conserved neighborhood

operons

Korbel et al., Nature Biotechnology, 2004

bidirectional promoters

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004

interaction data

protein–small molecule

in vitro binding assays

protein–protein

yeast two-hybrid

affinity purification

fragment complementation

Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

gene coexpression

many databases

BindingDB

CTD (Comparative Toxicogenomics Database)

DrugBank

GLIDA (GPCR-Ligand Database)

PDSP Ki (Psychoactive Drug Screening Program)

PharmGKB (Pharmacogenomics Knowledge Base)

BIND (Biomolecular Interaction Network Database)

BioGRID (General Repository for Interaction Datasets)

DIP (Database of Interacting Proteins)

IntAct

MINT (Molecular Interactions Database)

HPRD (Human Protein Reference Database)

PDB (Protein Data Bank)

GEO (Gene Expression Omnibus)

different formats

different identifiers

partially redundant

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

high confidence

many databases

MIPS (Munich Information Center for Protein Sequences)

Gene Ontology

KEGG (Kyoto Encyclopedia of Genes and Genomes)

MetaCyc

PID (NCI-Nature Pathway Interaction Database)

Reactome

different formats

different identifiers

partially redundant

text mining

>10 km

human readable

not computer readable

different names

Reflect

dictionary

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

text corpus

MEDLINE

SGD (Saccharomyces Genome Database)

The Interactive Fly

OMIM (Online Mendelian Inheritance in Man)

co-mentioning

NLP (Natural Language Processing)
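
The co-mentioning step can be illustrated with a minimal sketch: a dictionary maps the different names of an entity to one identifier, and pairs of entities found in the same abstract are counted. The dictionary entries, identifiers, and abstracts below are hypothetical toy data, and whitespace tokenization is an assumption made for brevity; this is not the tagger used for STITCH.

```python
from collections import Counter
from itertools import combinations


def comention_counts(abstracts, dictionary):
    """Count how often two dictionary entities occur in the same abstract."""
    counts = Counter()
    for text in abstracts:
        tokens = set(text.lower().split())
        # Resolve synonyms to canonical identifiers, one entry per entity.
        found = sorted({entity for name, entity in dictionary.items() if name in tokens})
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Hypothetical dictionary: several names can point to the same entity.
dictionary = {
    "aspirin": "CHEM:aspirin",
    "cox1": "PROT:PTGS1",
    "ptgs1": "PROT:PTGS1",
    "cox2": "PROT:PTGS2",
}

# Hypothetical, already lower-cased and punctuation-free abstracts.
abstracts = [
    "aspirin inhibits cox1 and cox2",
    "ptgs1 is acetylated by aspirin",
    "cox2 expression in inflammation",
]

counts = comention_counts(abstracts, dictionary)
for pair, n in counts.most_common():
    print(pair, n)
```

Because "cox1" and "ptgs1" resolve to the same identifier, the two abstracts that use different names for the protein both count toward the same aspirin pair.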

integration

many data types

not comparable

variable quality

spread over 630 genomes

quality scores

reproducibility

von Mering et al., Nucleic Acids Research, 2005

intergenic distances

Korbel et al., Nature Biotechnology, 2004

benchmarking

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005

raw quality scores

probabilistic scores
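
One way to picture the step from raw quality scores to probabilistic scores is to bin the raw scores and measure, per bin, the fraction of pairs confirmed by the gold standard. The sketch below assumes that simple binning scheme; the function name, bin edges, and data are hypothetical, and a real benchmark would also need a careful definition of which pairs count as negatives.

```python
def calibrate(scored_pairs, gold_standard, bin_edges):
    """Return, for each raw-score bin, the fraction of pairs in the gold standard."""
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_standard
                break
    # Bins without any pairs have no estimate.
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical raw scores for candidate associations.
scored_pairs = [
    (("a", "b"), 0.90),
    (("a", "c"), 0.80),
    (("b", "d"), 0.85),
    (("b", "c"), 0.30),
    (("c", "d"), 0.20),
    (("a", "d"), 0.10),
]

# Hypothetical gold standard of known associations.
gold_standard = {("a", "b"), ("b", "d"), ("c", "d")}

calibrated = calibrate(scored_pairs, gold_standard, bin_edges=[0.0, 0.5, 1.0])
print(calibrated)
```

The per-bin fractions can then stand in for the raw scores, which puts evidence types with very different raw scales onto one comparable scale.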

orthology transfer

von Mering et al., Nucleic Acids Research, 2005

combine all evidence
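
Once every evidence type is expressed as a probability, the scores can be combined. A common simplification, sketched here, treats the evidence channels as independent, so the combined score is one minus the probability that every channel is wrong. The function name and example values are illustrative, and the sketch deliberately omits any correction for the prior probability of an association.

```python
from math import prod


def combine_scores(scores):
    """Combine per-channel probabilities, assuming the channels are independent."""
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores for one protein-chemical pair from three channels,
# e.g. experiments, curated databases, and text mining.
channel_scores = [0.6, 0.4, 0.3]
combined = combine_scores(channel_scores)
print(round(combined, 3))
```

Under this assumption, several moderately confident channels can support a higher combined confidence than any one of them alone.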

Acknowledgments

Damian Szklarczyk

Andrea Franceschini

Michael Kuhn

Sune Frankild

Heiko Horn

Evangelos Pafilis

Milan Simonovic

Alexander Roth

Pablo Minguez

Tobias Doerks

Jean Muller

Manuel Stark

Samuel Chaffron

Chris Creevey

Philippe Julien

Jan Korbel

Berend Snel

Martijn Huynen

Reinhardt Schneider

Sean O’Donoghue

Christian von Mering


Peer Bork

Predicting novel targets for existing drugs using side effect information

Lars Juhl Jensen

the problem

new uses for old drugs

drug–drug network

shared target(s)

chemical similarity

Campillos & Kuhn et al., Science, 2008

similar drugs share targets

only trivial predictions

the idea

chemical perturbations

phenotypic readouts

drug treatment

side effects

the hard work

information on side effects

no database

package inserts

Campillos & Kuhn et al., Science, 2008

text mining

side-effect ontology

backtracking

Campillos & Kuhn et al., Science, 2008

manual validation

SIDER

Kuhn et al., Molecular Systems Biology, 2010

side-effect correlations

Campillos & Kuhn et al., Science, 2008

GSC weighting

side-effect frequencies

Campillos & Kuhn et al., Science, 2008

raw similarity score

Campillos & Kuhn et al., Science, 2008

p-values

Campillos & Kuhn et al., Science, 2008
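
The chain from side-effect frequencies to a raw similarity score and then to p-values can be sketched as follows. This is an illustration under simplifying assumptions, not the published scoring scheme: it weights side effects by rarity only (it does not implement the GSC correlation weighting), uses a weighted Jaccard overlap as the raw score, and estimates an empirical p-value against a background of all drug pairs. The drug names and side-effect sets are hypothetical.

```python
import math
from itertools import combinations


def side_effect_weights(drug_effects):
    """Weight each side effect by rarity: -log(fraction of drugs that show it)."""
    n = len(drug_effects)
    freq = {}
    for effects in drug_effects.values():
        for effect in effects:
            freq[effect] = freq.get(effect, 0) + 1
    return {effect: -math.log(count / n) for effect, count in freq.items()}


def raw_similarity(a, b, weights):
    """Weighted Jaccard overlap of two side-effect sets."""
    shared = sum(weights[e] for e in a & b)
    union = sum(weights[e] for e in a | b)
    return shared / union if union else 0.0


def empirical_p_value(score, background_scores):
    """Fraction of background scores at least as high as the observed score."""
    return sum(s >= score for s in background_scores) / len(background_scores)


# Hypothetical side-effect annotations.
drug_effects = {
    "drugA": {"nausea", "headache", "dry mouth"},
    "drugB": {"nausea", "dry mouth", "tremor"},
    "drugC": {"nausea", "rash"},
    "drugD": {"nausea", "headache"},
}

weights = side_effect_weights(drug_effects)
background = [
    raw_similarity(drug_effects[x], drug_effects[y], weights)
    for x, y in combinations(sorted(drug_effects), 2)
]
score_ab = raw_similarity(drug_effects["drugA"], drug_effects["drugB"], weights)
p_ab = empirical_p_value(score_ab, background)
print(score_ab, p_ab)
```

In this toy data, nausea is listed for every drug and therefore gets zero weight, so sharing it contributes nothing to the similarity; only the rarer shared side effects do.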

side-effect similarity

chemical similarity

Campillos & Kuhn et al., Science, 2008

confidence scores

reference set

incomplete databases

text mining

manual validation

MATADOR

Günther et al., Nucleic Acids Research, 2008

Campillos & Kuhn et al., Science, 2008

the results

drug–drug network

Campillos & Kuhn et al., Science, 2008

categorization

Campillos & Kuhn et al., Science, 2008

20 drug–drug pairs

in vitro binding assays

Ki < 10 µM for 11 of 20

cell assays

9 of 9 showed activity

the future

link side-effects to targets

direct target prediction

Acknowledgments

Monica Campillos

Michael Kuhn

Anne-Claude Gavin

Peer Bork
