
Presentation for Shamir group meeting

Interactome under construction: protein-protein interaction and pathway databases

5/1/2011

Based on the papers:

Protein-protein interactions: Interactome under construction. Bonetta L. Nature. (PMID: 21150998, Dec 2010)

Protein-protein interaction and pathway databases, a graphical review. Klingström T, Plewczynski D. Brief Bioinform. (PMID: 20851835, Sep 2010)


Protein-protein Interaction (PPI) and Biological Pathways (BP) databases

There are two kinds of databases, each concentrating on one of the two aspects of the biochemical/biological data:

(i) Protein-protein interaction (PPI) databases gather data on the physical interactions between proteins.

(ii) Biological pathway (BP) databases, including metabolic and transport pathways, signaling cascades, and regulation networks, gather data on the biological meaning of PPIs and other possible interactions between gene products.

Most databases of both kinds enable visualization and the production of maps showing a selected group of interactions.

In this presentation I will concentrate mainly on PPI databases, and on BP databases that use PPIs.


STRING: ATM, 73 interactors (confidence score > 0.700)
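The 0.700 cutoff in this slide title can be read as a threshold on a per-interaction score. A minimal Python sketch of that filtering step follows, assuming each interaction carries a score between 0 and 1. The class name, partner names and score values are made up for illustration and are not STRING data.

```python
from typing import NamedTuple


class ScoredInteraction(NamedTuple):
    protein_a: str
    protein_b: str
    score: float  # assumed to lie in [0, 1]


def filter_by_score(interactions, min_score=0.700):
    """Keep only interactions whose score is strictly above min_score."""
    return [i for i in interactions if i.score > min_score]


# Hypothetical example data (illustrative names and scores only).
example = [
    ScoredInteraction("ATM", "PARTNER_1", 0.95),
    ScoredInteraction("ATM", "PARTNER_2", 0.70),
    ScoredInteraction("ATM", "PARTNER_3", 0.42),
]

kept = filter_by_score(example)
print([i.protein_b for i in kept])
```

Raising the threshold shrinks the displayed network, so the number of interactors shown for a protein depends on the cutoff chosen.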


STRING ATM- 100


ATM signaling network


MAPK Cascade


KEGG map for the development of melanoma


Methods for detecting protein-protein interactions

There are two main approaches for detecting interacting proteins: techniques that measure direct physical interactions between protein pairs — binary approaches — and those that measure interactions among groups of proteins that may not form physical contacts — co-complex methods.

The two main binary methods for measuring direct physical interactions between protein pairs are:

Yeast two-hybrid (Y2H)
Luminescence-based mammalian interactome mapping (LUMIER)

The most common co-complex method is co-immunoprecipitation (co-IP) coupled with mass spectrometry (MS).

In addition to these empirical methods, researchers have used computational techniques to predict interactions on the basis of factors such as amino-acid sequence and structural information.


The most frequently used binary method is the yeast two-hybrid (Y2H) system. It has variations involving different reagents, and has been adapted to high-throughput screening. The strategy interrogates two proteins, called bait and prey, coupled to two halves of a transcription factor and expressed in yeast. If the proteins make contact, they reconstitute a transcription factor that activates a reporter gene.

LUMIER (luminescence-based mammalian interactome mapping) is a method for identifying binary interactions. This strategy fuses the Renilla luciferase (RL) enzyme, which catalyses light-emitting reactions, to a bait protein, which is expressed in a mammalian cell along with candidate protein partners tagged with a polypeptide called Flag. Researchers use a Flag antibody to immunoprecipitate all proteins with the Flag tag, along with any that interact with them. Interactions between the RL-fused bait and the Flag-tagged prey are detected when light is emitted.

The most common co-complex method is co-immunoprecipitation (co-IP) coupled with mass spectrometry (MS). In this approach, a protein bait is tagged with a molecular marker. There are techniques to recognize the tag and fish the bait protein out of the cell lysate, bringing with it any interacting proteins. These proteins are then identified by mass spectrometry (MS).
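Because a co-IP/MS experiment reports a bait together with everything pulled down with it, the result has to be expanded into pairs before it can be stored as binary interactions. Two common conventions, which are not named in the slides, are the "spoke" expansion (the bait paired with each co-purified protein) and the "matrix" expansion (every pair within the pull-down). A minimal Python sketch with placeholder protein names:

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Pair the bait with each co-purified protein."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_pairs(bait, preys):
    """Pair every two members of the pull-down, bait included."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical pull-down: one bait and three co-purified proteins.
bait = "BAIT"
preys = ["P1", "P2", "P3"]
spoke = spoke_pairs(bait, preys)
matrix = matrix_pairs(bait, preys)
print(len(spoke), len(matrix))
```

With one bait and three co-purified proteins, the spoke expansion yields 3 pairs and the matrix expansion 6, which illustrates how co-complex data can add pairs that are not direct physical contacts.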


Binary methods for measuring direct physical interactions:

The yeast two-hybrid system

A plasmid is inserted into a-mating-type yeast cells. It contains the DNA encoding the DNA-binding domain of a transcription factor needed to turn on expression of a "reporter gene", such as the lacZ gene (which encodes the enzyme β-galactosidase), coupled to the DNA encoding the "bait" protein (the protein whose possible partners we wish to identify). Into a second population of yeast cells, of α mating type, a plasmid is inserted that carries the DNA encoding the activation domain of the transcription factor coupled to the DNA encoding a possible partner ("prey") protein. The α cells are then mated with the a cells. If the fusion protein produced from a prey-containing plasmid can bind to the fusion protein containing the bait, the two domains of the transcription factor are brought together and turn on expression of the reporter gene (lacZ in our case). Grown on an indicator substrate, these colonies turn blue. The DNA in these colonies can then be isolated and sequenced. The result: identification of the proteins that can associate with the bait protein.


LUMIER (luminescence-based mammalian interactome mapping)

LUMIER (luminescence-based mammalian interactome mapping) is a high-throughput method for identifying binary interactions. This strategy fuses the Renilla luciferase (RL) enzyme, which catalyses light-emitting reactions, to a bait protein (A in the picture), which is expressed in a mammalian cell along with candidate protein partners tagged with a polypeptide called Flag. Researchers use a Flag antibody to immunoprecipitate all proteins with the Flag tag, along with any that interact with them. Interactions between the RL-fused bait and the Flag-tagged prey are detected when light is emitted.


The problem of false-positive PPI reports

The reliability of Y2H results is relatively low. The reliability of co-IP experiments is also low, partly because some non-specific partners are included in a reported PPI, but mainly because proteins are identified as members of a complex rather than as direct partners.

Possible solutions:
1) Using at least two different methods when analyzing specific PPIs.
2) Combining the interaction data obtained in an experiment with data available in public databases, thus providing a more complete picture (for example using known PPI networks of other organisms, co-expression data, and bioinformatics tools that identify sequences in the proteins that promote specific interactions between proteins).
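The first solution, requiring support from at least two different methods, can be sketched as a simple evidence count per protein pair. The Python below is an illustrative sketch only: the function name, the tuple layout and the example reports are assumptions made for this example, not data from the papers.

```python
from collections import defaultdict


def supported_pairs(evidence, min_methods=2):
    """Return pairs reported by at least min_methods distinct methods.

    evidence: iterable of (protein_a, protein_b, method) tuples.
    """
    methods_by_pair = defaultdict(set)
    for a, b, method in evidence:
        # frozenset makes the pair unordered, so A-B equals B-A.
        methods_by_pair[frozenset((a, b))].add(method)
    return {pair for pair, methods in methods_by_pair.items()
            if len(methods) >= min_methods}


# Hypothetical reports (placeholder protein names).
evidence = [
    ("A", "B", "Y2H"),
    ("B", "A", "co-IP/MS"),   # same pair, reversed order
    ("A", "C", "Y2H"),
    ("A", "C", "Y2H"),        # repeated report, same method
    ("C", "D", "LUMIER"),
]
confirmed = supported_pairs(evidence)
print(sorted(sorted(p) for p in confirmed))
```

Only the A-B pair is kept here, because a repeated report by the same method does not count as independent support.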


The false negative problem

One challenge in defining protein–protein interaction networks is that unlike the genome, the interactome is dynamic. Many interactions are transient, and others occur only in certain cellular contexts or at particular times in development. Interactions vary depending on the type of cell and the cellular environment.

In the paper “Interactome under construction”, the protein–protein interaction network for TGF-β, a growth factor that regulates cell functions, is given as an example of this complexity. Two proteins that pass on the signals from the factor inside the cell, Smad2 and Smad4, interact with one another only when the cells are stimulated with TGF-β. If the cells are not stimulated, these two proteins do not come into contact. It seems that following stimulation the contact can form because of a change in the environment of the proteins, and/or through specific post-translational modifications of these proteins.


New methods developed to identify interactome changes in disease

AQUA (multiplex absolute quantification) is a new method that aims to look at dynamic changes in protein interaction networks. AQUA uses synthetic peptides that contain stable isotopes as internal standards for the native peptides that are produced when proteins from a cell lysate are digested. Using tandem MS, researchers can compare the levels of native and synthetic peptides in a cell to obtain a measure of the amount of native proteins present. Synthetic peptides can also be prepared with modifications. This method can provide an accurate and sensitive measure of how the stoichiometry of components within the complexes that make up a network is altered in response to a stimulus.
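The arithmetic behind quantification with an isotope-labeled internal standard can be sketched in a few lines of Python. This is a simplified illustration: it assumes the native peptide and its labeled standard give the same MS response, so the ratio of their signals equals the ratio of their amounts. The function names, signal values and spiked amounts are hypothetical.

```python
def native_amount(native_signal, standard_signal, standard_amount_fmol):
    """Estimate the native peptide amount from the signal ratio.

    Assumes equal MS response for the native and the labeled peptide.
    """
    if standard_signal <= 0:
        raise ValueError("standard_signal must be positive")
    return native_signal / standard_signal * standard_amount_fmol


def stoichiometry(amount_x, amount_y):
    """Molar ratio of component X to component Y."""
    return amount_x / amount_y


# Hypothetical readings for two components of one complex, before and
# after a stimulus, with 100 fmol of each labeled standard spiked in.
x_before = native_amount(2.0e6, 1.0e6, 100.0)
y_before = native_amount(1.0e6, 1.0e6, 100.0)
x_after = native_amount(1.0e6, 1.0e6, 100.0)
y_after = native_amount(1.0e6, 1.0e6, 100.0)

print(stoichiometry(x_before, y_before), stoichiometry(x_after, y_after))
```

In this made-up case the X:Y ratio drops from 2 to 1 after the stimulus, which is the kind of stoichiometry change the method is meant to expose.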

KAYAK (kinase activity assay for kinome profiling) is another approach to developing diagnostic tools for cancer on the basis of the functional consequences of the interaction between a protein, in this case a kinase, and its substrate. In this method, up to 90 peptide substrates for kinases are used to simultaneously measure the addition of phosphate groups to proteins in a cell lysate, in essence providing a ‘phosphorylation signature’ for that particular cell.


Some examples of PPI and biological pathway databases

PPI databases
BIOGRID (http://thebiogrid.org)
STRING (http://stringdb.org)
DIP (http://dip.doe-mbi.ucla.edu/)
MINT (http://mint.bio.uniroma2.it/mint/Welcome.do)
IntAct (http://www.ebi.ac.uk/intact/main.xhtml)
HPRD (http://hprd.org/)
BIND (http://bind.ca/)

Biological pathway databases
SPIKE (http://www.cs.tau.ac.il/~spike/ and http://spike.cs.tau.ac.il/spike2/)
REACTOME (http://www.reactome.org/)
KEGG (http://www.genome.jp/kegg/)
GeneMANIA (http://genemania.org/)

CYTOSCAPE (http://www.cytoscape.org/) is open-source software for network visualization. It is the most important tool for visualizing data from PPI and biological pathway databases.

NCBI_GENE (http://www.ncbi.nlm.nih.gov/gene/) is the data source for human genes, but it also gathers relevant data on the interactions and regulation of genes and gene products.

SPIKE imported data on PPIs from the IntAct PPI database and from the REACTOME and KEGG biological pathway databases.


PPI databases categorization and qualifications

Stand-alone databases: BIND, DIP, HPRD, IntAct and MINT do not incorporate data from other databases. BioGRID imported the HPRD and FlyBase databases in 2006, but has not added data from other databases since then.

Topical databases: DroID (PPIs in Drosophila melanogaster), MatrixDB (extracellular PPIs), InnateDB (PPIs in the immune system) and MPIDB (PPIs in microbes) combine data mining from other source databases with their own curation efforts.

Metamining databases: APID, MiMI and UniHI aim to unify the source databases into a single comprehensive meta-database.

Predictive interaction databases: HAPPI, STRING, STITCH and Scansite. STRING combines known interaction data from the interaction databases BIND, BioGRID, DIP, IntAct, MINT and HPRD with interactions from the pathway databases PID, Reactome, KEGG and EcoCyc.


Inconsistencies in the definition of protein “interaction”

Three different classes of protein interactions are used by databases, sometimes even without separation: binary physical interactions, same-complex membership (non-direct interactions), and non-physical functional interactions.

Because of these inconsistencies in the definition of “interaction”, there is confusion regarding the size of the human interactome: Venkatesan et al. estimate it at 130,000 interactions, Hart et al. at 154,000–369,000 interactions and Stumpf et al. at 650,000 interactions.

Closer inspection reveals that each team has defined its own search space as the human interactome. Venkatesan et al. use the most restrictive definition and include only binary physical interactions. Hart et al. use in-house experimental data obtained by IP-MS to create their source networks, which means that proteins belonging to the same protein complex are also considered to be interacting, thus increasing the size of their defined interactome. Stumpf et al. rely on a combination of yeast two-hybrid (Y2H) derived data sets and literature-curated data from DIP and IntAct. Some literature-curated databases use a more flexible definition of “interaction”: some of the papers also consider non-physical functional interactions to be a form of interaction. This definition significantly enlarges the number of interactions.

With current technologies, about 35,000 human PPIs are known, only about 1/4 of even the lowest estimate above (130,000). The central problem in constructing the interactome is therefore the false negative problem: the known interactions are just the tip of the iceberg, and a huge number of PPIs still remain to be identified.
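The coverage figure depends on which estimate of the interactome size is used as the denominator. The short Python calculation below uses only the numbers quoted on this slide. The variable and function names are illustrative.

```python
known_interactions = 35_000  # approximate figure from the slide

# (low, high) bounds of each published estimate quoted above.
estimates = {
    "Venkatesan et al.": (130_000, 130_000),
    "Hart et al.": (154_000, 369_000),
    "Stumpf et al.": (650_000, 650_000),
}


def coverage(known, low, high):
    """Return (min, max) fraction of the estimated interactome that is known."""
    return known / high, known / low


for name, (low, high) in estimates.items():
    min_frac, max_frac = coverage(known_interactions, low, high)
    print(f"{name}: {min_frac:.1%} to {max_frac:.1%}")
```

The known interactions cover roughly 27% of the lowest estimate but only about 5% of the highest, so the "1/4" figure is the most optimistic reading.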


Some examples for the organisms and volume of PPI and BP databases

BIOGRID (http://thebiogrid.org/)
50 model organism species. The online interaction repository, with data compiled in version 3.1.71, includes 362,355 raw protein and genetic interactions from major model organism species.

STRING (http://stringdb.org)
STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations. STRING quantitatively integrates interaction data from several sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 2,590,259 proteins from 630 organisms.

GeneMANIA (http://genemania.org/)
Indexes 817 association networks containing 185,324,281 interactions mapped to 135,148 genes from 6 organisms.

HPRD (http://hprd.org/)
39,194 protein-protein interactions (human)


DIP (http://dip.doe-mbi.ucla.edu/)
More than 80 genomes, but for human only 2,529 proteins and 3,376 interactions.

IntAct (http://www.ebi.ac.uk/intact/main.xhtml)
Contains 234,147 binary interactions and 69,669 proteins.

MINT (http://mint.bio.uniroma2.it/mint/Welcome.do)
30 organisms, 90,503 interactions (21,938 human)

SPIKE (http://www.cs.tau.ac.il/~spike/ and http://spike.cs.tau.ac.il/spike2/)
34,266 interactions (human only)


STRING_ATM_Interactors


Interactions databases


Metamining and predictive PPIs databases

The PPI community has been characterized by a wide and open distribution of proteomic data through the collection of PPI and pathway databases. The ability to distribute and share data between various research groups has resulted in a large number of different source databases. However, the general overlap between PPI databases is limited, which means that a common procedure for researchers is to unify these diverse data sets to support their own work. Several metamining databases have been created that perform such unification. This has led to the spontaneous development of a network of data exchange between literature-curated databases, metamining databases and databases generating predicted PPIs.
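The limited overlap and the unification step can be illustrated with plain set operations. The following Python sketch is a simplification under stated assumptions: both databases are taken to use the same protein identifiers (real unification also requires identifier mapping), and the database contents and names are hypothetical.

```python
def normalize(pairs):
    """Turn (a, b) tuples into unordered pairs so that A-B equals B-A."""
    return {frozenset(p) for p in pairs}


def jaccard(set_a, set_b):
    """Shared pairs divided by all distinct pairs in either set."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical contents of two source databases.
db_one = normalize([("A", "B"), ("A", "C"), ("B", "D")])
db_two = normalize([("B", "A"), ("C", "E"), ("D", "F")])

unified = db_one | db_two
print(len(unified), jaccard(db_one, db_two))
```

Here the two sources share one pair out of five distinct pairs, so the unified set is much larger than either source alone, which is the motivation for metamining.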

The exchange of information is supported by three major data exchange formats: BioPAX, PSI-MI and SBML.
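As an illustration of what such an exchange format looks like in practice, the sketch below reads one record in the tab-delimited flavour of PSI-MI (MITAB 2.5). The column order reflects my understanding of that format and should be checked against the specification. The short column labels are my own, and the record is a made-up placeholder rather than a real database entry.

```python
# Assumed column order for MITAB 2.5 (15 tab-separated fields);
# the short labels are illustrative, not official names.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication",
    "taxid_a", "taxid_b", "interaction_type",
    "source_db", "interaction_id", "confidence",
]


def parse_mitab25_line(line):
    """Split one tab-separated record into a dict keyed by column label."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(
            f"expected {len(MITAB25_COLUMNS)} columns, got {len(fields)}"
        )
    return dict(zip(MITAB25_COLUMNS, fields))


# Hypothetical record; "-" marks an empty field.
record = "\t".join([
    "uniprotkb:ACC_A", "uniprotkb:ACC_B", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "-",
    "taxid:9606", "taxid:9606",
    'psi-mi:"MI:0915"(physical association)', "-", "-", "-",
])

row = parse_mitab25_line(record)
print(row["id_a"], row["id_b"], row["detection_method"])
```

Keeping the detection method and interaction type alongside each pair is what allows downstream users to separate binary, co-complex and functional evidence when merging sources.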


Predictive interaction databases


Metamining databases


Pathway databases