Exploring proteins, chemicals and their interactions with STRING and STITCH

Michael Kuhn, EMBL Heidelberg

my usual diet work: drugs, proteins, side effects

today

intro: interactions

small examples

the databases: STRING and STITCH

larger example: NetworKIN

STRING: version 7

interactions of proteins

STITCH: version 1

interactions of proteins and chemicals

example

Tryptophan synthase beta chain, E. coli K12

example

aspirin, Homo sapiens

content

(STRING 7)

373 genomes

(only completely sequenced genomes)

1.5 million genes

(not proteins)

68,000 chemicals

(including 2200 drugs)

many sources of interactions

genomic context

gene neighborhood

gene fusion

phylogenetic profiles
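A minimal sketch of the phylogenetic-profile idea: two genes that are present or absent in the same set of genomes are candidates for a functional link. The profiles below are made up, and the Jaccard similarity is an assumption chosen for illustration; it is not claimed to be the measure STRING uses.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (1 = gene present in genome)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# Hypothetical profiles over six genomes
gene_a = [1, 1, 0, 1, 0, 0]
gene_b = [1, 1, 0, 1, 0, 1]   # nearly the same pattern as gene_a
gene_c = [0, 0, 1, 0, 1, 0]   # complementary pattern

similar = profile_similarity(gene_a, gene_b)
dissimilar = profile_similarity(gene_a, gene_c)
```

Genes with similar profiles (here gene_a and gene_b) would receive a higher raw score than genes with unrelated profiles.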

curated knowledge

experimental evidence

GEO: Gene Expression Omnibus

co-expression

experimental databases

literature

variable quality

different “raw scores”

benchmarking

calibrate against “gold standard” (KEGG)

probabilistic scores

e.g. “70% chance for an association”
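A rough sketch of what calibrating against a gold standard could look like: group raw scores into bins and, for each bin, measure the fraction of protein pairs that the gold standard confirms (for example, pairs in the same KEGG pathway). The data, the bin edges and the function name are hypothetical; the actual benchmarking procedure may differ.

```python
from bisect import bisect_right

def calibrate(scored_pairs, bin_edges):
    """Return, per raw-score bin, the fraction of pairs confirmed by the gold standard.

    scored_pairs: iterable of (raw_score, confirmed) tuples
    bin_edges:    sorted thresholds separating the bins
    """
    counts = [[0, 0] for _ in range(len(bin_edges) + 1)]  # [confirmed, total] per bin
    for raw_score, confirmed in scored_pairs:
        i = bisect_right(bin_edges, raw_score)
        counts[i][0] += 1 if confirmed else 0
        counts[i][1] += 1
    return [pos / total if total else None for pos, total in counts]

# Made-up raw scores with gold-standard labels
pairs = [(0.2, False), (0.3, False), (0.4, True),
         (1.5, True), (1.7, False), (1.9, True),
         (3.2, True), (3.5, True)]
table = calibrate(pairs, bin_edges=[1.0, 3.0])
```

Each entry of `table` can then be read as a probabilistic score for raw scores falling into that bin.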

combine all evidence

Bayesian scoring scheme

e.g.: two scores of 0.7, combined probability: ?

e.g.: two scores of 0.7, combined probability: 0.91

1 - (1 - 0.7)^2 = 0.91
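The arithmetic on this slide generalizes to any number of evidence channels: assuming the channels are independent, the combined score is one minus the probability that all of them are wrong. The function name is an assumption for illustration, and the full scoring scheme may include corrections not shown here.

```python
from math import prod  # Python 3.8+

def combine_scores(scores):
    """Combine independent probabilistic scores: 1 - product of (1 - s_i)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities between 0 and 1")
    return 1.0 - prod(1.0 - s for s in scores)

combined = combine_scores([0.7, 0.7])  # the example from the slide, about 0.91
```

A single score is returned unchanged, and adding further evidence can only raise the combined score.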

evidence spread over many species

evidence transfer

transfer by orthology

(or “fuzzy orthology”)

von Mering et al., Nucleic Acids Research, 2005

two modes

proteins mode

maximum specificity

lower coverage

information will be relevant for selected species

COG mode

“clusters of orthologous groups”

higher coverage

lower specificity

includes all available evidence

some orthologous groups are too large to be meaningful

a real application

Resource

Systematic Discovery of In Vivo Phosphorylation Networks

Rune Linding,1,2,7,* Lars Juhl Jensen,3,7 Gerard J. Ostheimer,2,4,7 Marcel A.T.M. van Vugt,2,5 Claus Jørgensen,1 Ioana M. Miron,1 Francesca Diella,3 Karen Colwill,1 Lorne Taylor,1 Kelly Elder,1 Pavel Metalnikov,1 Vivian Nguyen,1 Adrian Pasculescu,1 Jing Jin,1 Jin Gyoon Park,1 Leona D. Samson,4 James R. Woodgett,1 Robert B. Russell,3 Peer Bork,3,6,* Michael B. Yaffe,2,* and Tony Pawson1,*

1 Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada

2 Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA

3 European Molecular Biology Laboratory, Heidelberg, Germany

4 Center for Environmental Health Sciences, Massachusetts Institute of Technology, Cambridge, USA

5 Department of Cell Biology and Genetics, Erasmus University, Rotterdam, The Netherlands

6 Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany

7 These authors contributed equally to this work.

* Correspondence: linding@mshri.on.ca (R.L.), bork@embl.de (P.B.), myaffe@mit.edu (M.B.Y.), pawson@mshri.on.ca (T.P.)

DOI 10.1016/j.cell.2007.05.052

SUMMARY

Protein kinases control cellular decision processes by phosphorylating specific substrates. Thousands of in vivo phosphorylation sites have been identified, mostly by proteome-wide mapping. However, systematically matching these sites to specific kinases is presently infeasible, due to limited specificity of consensus motifs, and the influence of contextual factors, such as protein scaffolds, localization, and expression, on cellular substrate specificity. We have developed an approach (NetworKIN) that augments motif-based predictions with the network context of kinases and phosphoproteins. The latter provides 60%–80% of the computational capability to assign in vivo substrate specificity. NetworKIN pinpoints kinases responsible for specific phosphorylations and yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed. Applying this approach to DNA damage signaling, we show that 53BP1 and Rad50 are phosphorylated by CDK1 and ATM, respectively. We describe a scalable strategy to evaluate predictions, which suggests that BCLAF1 is a GSK-3 substrate.

INTRODUCTION

The dynamic behavior and decision processes of eukaryotic cells are controlled by posttranslational modifications such as protein phosphorylation. These, in turn, can modify protein function by inducing conformational changes or by creating binding sites for protein interaction domains (for example, SH2 or BRCT) that selectively recognize phosphorylated linear motifs (Seet et al., 2006).

Decades of targeted biochemical studies and recent experiments employing mass spectrometry (MS) techniques have identified thousands of in vivo phosphorylation sites (Aebersold and Mann, 2003). These are collected in the Phospho.ELM database, which currently contains 7207 phosphorylation sites in 2540 human proteins (Diella et al., 2004). However, which of the approximately 518 human protein kinases (Manning et al., 2002) is responsible for each of these phosphorylation events is only known for just over a third of sites identified thus far (35% [Diella et al., 2004]), and this fraction is decreasing in the wake of additional proteome-wide studies. As a consequence, there is an ever-widening gap in our understanding of in vivo phosphorylation networks, which is difficult to close in a systematic way by current experimental methods, despite advances in high-throughput in vitro assays (Ptacek et al., 2005) and selective kinase inhibitors (Bain et al., 2003). Our understanding of phosphorylation-dependent signaling networks is therefore still fragmentary.

The desire to map phosphorylation networks has motivated the development of computational methods to predict the substrate specificities of protein kinases, based on experimental identification of the consensus sequence motifs recognized by the active site of kinase catalytic domains (Hjerrild et al., 2004; Obenauer et al., 2003; Puntervoll et al., 2003). However, these motifs often lack sufficient information to uniquely identify the physiological substrates of specific kinases. For example, the sites phosphorylated by different kinases from the CDK or Src families cannot be distinguished by their sequences, although consensus motifs of these kinases have been determined by in vitro experiments (Manke et al., 2005). Thus, the recognition properties of the active site alone are typically insufficient to reproduce the

Cell 129, 1415–1426, June 29, 2007 ©2007 Elsevier Inc.

phosphoproteomics

in vivo phosphorylation sites

kinases are unknown

computational methods

overprediction

context

scaffolders

Alberts, Molecular Biology of the Cell

interaction networks

NetworKIN

benchmarking

DNA damage response

experimental validation

take home message

STRING and STITCH integrate information and predict interactions

you can always go to the sources

it’s useful!

Acknowledgements

The STRING/STITCH team

Lars Juhl Jensen

Peer Bork

Christian von Mering & group in Zurich

NetworKIN

Lars Juhl Jensen

Rune Linding

(and many other people)

Thank you for your attention

string.embl.de

von Mering et al., NAR Database Issue 2007

stitch.embl.de

Kuhn et al., NAR Database Issue 2008
