Exploring proteins, chemicals and their interactions
with STRING and STITCH
Michael Kuhn
EMBL Heidelberg
my usual diet work: drugs, proteins, side effects
today
intro: interactions
small examples
the databases: STRING and STITCH
larger example: NetworKIN
STRING: version 7
interactions of proteins
STITCH: version 1
interactions of proteins and chemicals
example
Tryptophan synthase beta chain
E. coli K12
example
aspirin
Homo sapiens
content
(STRING 7)
373 genomes
(only completely sequenced genomes)
1.5 million genes
(not proteins)
68,000 chemicals
(including 2200 drugs)
many sources of interactions
genomic context
gene neighborhood
gene fusion
phylogenetic profiles
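To make the phylogenetic-profile idea concrete: two genes that are present or absent in the same set of genomes are candidates for a functional association. The slides do not say which similarity measure STRING uses, so the sketch below swaps in plain Jaccard similarity as an illustrative stand-in; the gene names, genome ids and the `jaccard` helper are all hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sets of genome ids)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical toy data: the genomes in which each gene has a homolog.
profiles = {
    "geneA": {"g1", "g2", "g3", "g5"},
    "geneB": {"g1", "g2", "g3", "g5"},
    "geneC": {"g2", "g4"},
}

# Identical profiles give the highest possible similarity.
sim_ab = jaccard(profiles["geneA"], profiles["geneB"])
# Mostly disjoint profiles give a low similarity.
sim_ac = jaccard(profiles["geneA"], profiles["geneC"])
```

A value like this would be one of the "raw scores" that still has to be calibrated before it can be compared with other evidence types.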
curated knowledge
experimental evidence
GEO: Gene Expression Omnibus
co-expression
experimental databases
literature
variable quality
different “raw scores”
benchmarking
calibrate against “gold standard” (KEGG)
probabilistic scores
e.g. “70% chance for an association”
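One way to picture the calibration step: group predicted pairs by raw score, then ask what fraction of each group is confirmed by the gold standard. The sketch below is a simplified illustration, not STRING's actual procedure; the pairs, raw scores, bin width and the `calibrate` helper are hypothetical, and it treats every pair missing from the gold standard as a negative.

```python
from collections import defaultdict


def calibrate(scored_pairs, gold_positive, bin_width):
    """Map each raw-score bin to the fraction of its pairs found in the gold standard."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pair, raw in scored_pairs:
        bin_index = int(raw // bin_width)
        totals[bin_index] += 1
        if pair in gold_positive:
            hits[bin_index] += 1
    return {b: hits[b] / totals[b] for b in totals}


# Hypothetical raw scores from a single evidence channel.
scored_pairs = [
    (("a", "b"), 1.2), (("a", "c"), 1.8), (("a", "e"), 0.4),
    (("b", "c"), 5.5), (("c", "d"), 6.1), (("d", "e"), 7.9),
]
# Hypothetical gold standard: pairs annotated to the same pathway.
gold_positive = {("a", "b"), ("b", "c"), ("c", "d")}

table = calibrate(scored_pairs, gold_positive, bin_width=5.0)
```

Here the low-score bin is confirmed for one pair in three and the high-score bin for two in three, so a raw score can be read as a probability that is comparable across evidence types.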
combine all evidence
Bayesian scoring scheme
e.g.: two scores of 0.7
combined probability: 1 - (1 - 0.7)^2 = 0.91
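The arithmetic on this slide generalizes to any number of evidence channels: an association is missed only if every channel misses it. The sketch below assumes the channels are independent, as the slide's example does; the `combine` helper is a hypothetical name, and the production scoring scheme may apply further corrections not shown here.

```python
def combine(scores):
    """Combine independent probabilistic scores: 1 - product of (1 - s)."""
    remaining = 1.0  # probability that no channel supports the association
    for s in scores:
        remaining *= 1.0 - s
    return 1.0 - remaining


# Two independent scores of 0.7, as in the slide's example.
combined = combine([0.7, 0.7])
```

Adding further evidence can only raise the combined score, which is why many weak signals can add up to a confident prediction.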
evidence spread over many species
evidence transfer
transfer by orthology
(or “fuzzy orthology”)
von Mering et al., Nucleic Acids Research, 2005
two modes
proteins mode
maximum specificity
lower coverage
information will be relevant for selected species
COG mode
“clusters of orthologous groups”
higher coverage
lower specificity
includes all available evidence
some orthologous groups are too large to be meaningful
a real application
Resource
Systematic Discovery of In Vivo Phosphorylation Networks
Rune Linding,1,2,7,* Lars Juhl Jensen,3,7 Gerard J. Ostheimer,2,4,7 Marcel A.T.M. van Vugt,2,5 Claus Jørgensen,1 Ioana M. Miron,1 Francesca Diella,3 Karen Colwill,1 Lorne Taylor,1 Kelly Elder,1 Pavel Metalnikov,1 Vivian Nguyen,1 Adrian Pasculescu,1 Jing Jin,1 Jin Gyoon Park,1 Leona D. Samson,4 James R. Woodgett,1 Robert B. Russell,3 Peer Bork,3,6,* Michael B. Yaffe,2,* and Tony Pawson1,*
1 Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
2 Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA
3 European Molecular Biology Laboratory, Heidelberg, Germany
4 Center for Environmental Health Sciences, Massachusetts Institute of Technology, Cambridge, USA
5 Department of Cell Biology and Genetics, Erasmus University, Rotterdam, The Netherlands
6 Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany
7 These authors contributed equally to this work.
* Correspondence: R.L., P.B., M.B.Y., T.P.
DOI 10.1016/j.cell.2007.05.052
SUMMARY
Protein kinases control cellular decision processes by phosphorylating specific substrates. Thousands of in vivo phosphorylation sites have been identified, mostly by proteome-wide mapping. However, systematically matching these sites to specific kinases is presently infeasible, due to limited specificity of consensus motifs, and the influence of contextual factors, such as protein scaffolds, localization, and expression, on cellular substrate specificity. We have developed an approach (NetworKIN) that augments motif-based predictions with the network context of kinases and phosphoproteins. The latter provides 60%–80% of the computational capability to assign in vivo substrate specificity. NetworKIN pinpoints kinases responsible for specific phosphorylations and yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed. Applying this approach to DNA damage signaling, we show that 53BP1 and Rad50 are phosphorylated by CDK1 and ATM, respectively. We describe a scalable strategy to evaluate predictions, which suggests that BCLAF1 is a GSK-3 substrate.
INTRODUCTION
The dynamic behavior and decision processes of eukaryotic cells are controlled by posttranslational modifications such as protein phosphorylation. These, in turn, can modify protein function by inducing conformational changes or by creating binding sites for protein interaction domains (for example, SH2 or BRCT) that selectively recognize phosphorylated linear motifs (Seet et al., 2006).
Decades of targeted biochemical studies and recent experiments employing mass spectrometry (MS) techniques have identified thousands of in vivo phosphorylation sites (Aebersold and Mann, 2003). These are collected in the Phospho.ELM database, which currently contains 7207 phosphorylation sites in 2540 human proteins (Diella et al., 2004). However, which of the approximately 518 human protein kinases (Manning et al., 2002) is responsible for each of these phosphorylation events is only known for just over a third of sites identified thus far (35% [Diella et al., 2004]), and this fraction is decreasing in the wake of additional proteome-wide studies. As a consequence, there is an ever-widening gap in our understanding of in vivo phosphorylation networks, which is difficult to close in a systematic way by current experimental methods, despite advances in high-throughput in vitro assays (Ptacek et al., 2005) and selective kinase inhibitors (Bain et al., 2003). Our understanding of phosphorylation-dependent signaling networks is therefore still fragmentary.
The desire to map phosphorylation networks has motivated the development of computational methods to predict the substrate specificities of protein kinases, based on experimental identification of the consensus sequence motifs recognized by the active site of kinase catalytic domains (Hjerrild et al., 2004; Obenauer et al., 2003; Puntervoll et al., 2003). However, these motifs often lack sufficient information to uniquely identify the physiological substrates of specific kinases. For example, the sites phosphorylated by different kinases from the CDK or Src families cannot be distinguished by their sequences, although consensus motifs of these kinases have been determined by in vitro experiments (Manke et al., 2005). Thus, the recognition properties of the active site alone are typically insufficient to reproduce the
Cell 129, 1415–1426, June 29, 2007 ©2007 Elsevier Inc.
phosphoproteomics
in vivo phosphorylation sites
kinases are unknown
computational methods
overprediction
context
scaffolders
Alberts, Molecular Biology of the Cell
interaction networks
NetworKIN
benchmarking
DNA damage response
experimental validation
take home message
STRING and STITCH integrate information and predict interactions
you can always go to the sources
it’s useful!
Acknowledgements
The STRING/STITCH team
Lars Juhl Jensen
Peer Bork
Christian von Mering & group in Zurich
NetworKIN
Lars Juhl Jensen
Rune Linding
(and many other people)
Thank you for your attention
string.embl.de
von Mering et al., NAR Database Issue 2007
stitch.embl.de
Kuhn et al., NAR Database Issue 2008