
Technology R&D Theme 1: Differential Networks


TRD 1: DIFFERENTIAL NETWORKS – PROJECT SUMMARY A major limitation of most network mapping and analysis efforts is that they implicitly consider the system under static conditions, while real biological systems are under constant change. The dynamics of these biological systems are a reflection of context specificity (e.g., cell type), responses to environmental perturbations (e.g., chemical perturbations or viral infections), and genetic alterations (e.g., somatic mutations). Ultimately, we must understand how these dynamics affect – or are affected by – the underlying physical and genetic networks active at a particular time. Differential analysis of biological systems under multiple conditions (or in multiple systems) allows us to gain fundamental understanding of these biological responses and how biological networks are re-wired in response to perturbations and alterations. In this project, we will develop a series of tools and methodologies for conducting differential analyses of biological networks altered under multiple conditions. We will pursue novel algorithmic methods that allow us to make use of high-throughput, proteomic-level data to recover biological networks under specific biological perturbations. The software tools developed in this project allow researchers to further predict, analyze, and visualize the effects of these perturbations and alterations, while aggregating additional information regarding the known roles of the dynamic interactions and their participants.

TRD 1: DIFFERENTIAL NETWORKS – PROJECT NARRATIVE Network models are frequently used to integrate molecular data with prior biological knowledge, with the goal of elucidating disease pathways and identifying potential drug targets. We will develop novel bioinformatic tools that allow researchers to use high-throughput proteomic and genomic data to model the effects of dynamic network perturbations and gain an understanding of how these perturbations re-wire biological networks. These tools will help make clinically relevant diagnoses and predictions about an individual and their potential response to therapeutic interventions.

TRD 1: DIFFERENTIAL NETWORKS – SPECIFIC AIMS Network Biology has achieved significant advances in biological data integration, by enabling experimental data to be ‘explained’ via mapping to large networks of known biological interactions. Such techniques are enabled by a series of infrastructure developments, including the development of visualization and analysis tools (e.g., Cytoscape) and development of large-scale databases (e.g., BioGRID) that describe molecular interactions from literature and high-throughput experimentation. However, much of this analysis and infrastructure has been developed without considering the dynamics of biological networks across diverse physiological, environmental, disease and evolutionary contexts. Networks active in different contexts are re-wired in different ways and may possess different properties. Our premise is that understanding these differential networks will aid the ability to predict cellular responses to network perturbations. This TRD focuses on tools for modeling the differential states of biological networks under various perturbations and genomic alterations. It has the following Specific Aims: Aim 1: Tools for inference of differential networks from dynamic protein states. We will develop algorithms and tools for inference and analysis of dynamic protein networks. The proposed approach combines perturbation response data (e.g., cell growth in response to targeted drugs or growth factors) and pathway information (e.g., information from signaling pathway databases). The algorithms will generate network models that provide information on collective molecular changes in response to perturbations. We will also extend existing network visualization software to support the results of differential network analysis. Aim 2: Protein network alignment algorithm and viewer technology. 
Similar to sequence alignment algorithms and viewers, network alignment algorithms and viewers will enable the study of networks altered over evolution and via genetic and other disruptions within a population. Network alignment algorithms are under development, or have already been developed, by multiple groups, but very little work on network alignment visualization has taken place. We will 1) implement and improve network alignment algorithms, including novel algorithms that we have developed for multi-scale protein interface interaction networks (IIN), 2) develop a network alignment viewer in 2D and 3D in Cytoscape and 3) aggregate and make IIN data available to Cytoscape via standard PSICQUIC web services. Aim 3: Facilitating the interpretation of affinity purification mass spectrometry (AP-MS) data as interaction networks. We will develop tools to streamline mass-spectrometry analyses of protein interactions. The developed tools will enable importing, augmenting and clustering, as well as visualization of MS-derived data within Cytoscape. Additional tools will be developed that will enable users to access public repositories of AP-MS data for the purposes of data aggregation and annotation. These tools will support the quantitative analysis and visualization of AP-MS experiments, as well as allow these results to be viewed in the context of other ‘omics data.

TRD 1: DIFFERENTIAL NETWORKS – RESEARCH STRATEGY A cell is a collection of molecules that evolved to robustly achieve particular functions while under constant influence of environmental (e.g., changes in levels of nutrients and growth factors) and internal (e.g., mutations) perturbations. In the last decade, cellular systems have been frequently modeled as networks of interacting molecules, where functional information flows through pathways of interactions between signaling molecules. However, much of this analysis and infrastructure has been developed without considering the dynamics of biological networks across diverse physiological, environmental, disease, temporal, and evolutionary contexts. The focus of this TRD is to understand how biological systems, modeled as networks, respond to various forms of perturbation and alteration. In particular, we will develop algorithmic technology and supportive software tools for differentially perturbed network analysis and visualization. These capabilities will enable new comparative analyses looking at how protein-protein interaction networks change across contexts, such as over time, between uninfected and infected states, and in normal vs. cancerous tissues. We will also develop inference methods to predict network level responses to perturbations such as drug treatment. A major goal of this work is its application to human health, where despite the success of network models for describing cellular systems, predicting cell-type specific responses to perturbations still remains a grand challenge. For instance, it remains extremely difficult to predict individual patient response to targeted cancer therapies as diverse oncogenic alterations may exist in various combinations in each tumor and vary substantially among different patients. Cancer cells form heterogeneous systems that adapt rapidly to perturbations (e.g., targeted cancer drugs) through diverse mechanisms. 
Understanding the variations in molecular response to perturbations in different conditions, genomic backgrounds and time points will enable us to better predict the response of cellular systems to perturbations.

1.1 TOOLS FOR INFERENCE OF DIFFERENTIAL NETWORKS FROM PROTEIN STATES AND ABUNDANCES OVER TIME

Project Leader: Chris Sander (MSKCC)

Overview. In this project, we will develop methods for the statistical analysis of molecular response data (e.g., total and phosphorylated protein abundances) to targeted perturbations using two different methods: 1) the perturbation biology method and its extensions and 2) a causal pathway analysis using differential network response. In the first method, we propose improvements to the perturbation biology method developed in the Sander lab. Perturbation biology involves inferring predictive models of cell signaling based on response to rich perturbations (i.e., hundreds of combinations of targeted drugs). Cellular response (e.g., cellular growth) to untested perturbations is predicted using these models and then experimentally tested, an approach used in previous work on cancer drug resistance by the Sander lab. In the second method, we propose developing a methodology for pathway analysis for use when perturbation data is only available for low numbers of targeted agents. This new methodology will map the statistically significant differential responses between varying conditions to biological networks and identify the responsive modules. Further analysis of these differential networks will reveal potentially causal links between drug perturbations and module activities. Finally, we will extend existing tools to support the visualization of the results for the above methods.

Objectives. Prediction of cellular response (e.g., cellular growth, apoptosis, migration) to targeted perturbations is a central challenge in biology for the development of mechanistic explanations and the design of effective therapeutic interventions.

Task 1: Improve perturbation biology models using cell-type specific prior information. In order to improve the perturbation biology method (described below), we developed the Pathway Extraction and Reduction Algorithm (PERA) to build a cellular signaling network model based on a list of proteins

and post-translationally modified proteins. PERA-derived models serve as prior information in the inference of models based on the perturbation biology method. Use of this prior information significantly improves the power of generated network models to predict cellular response to untested perturbations. We will improve the PERA to make use of a novel confidence score to generate cell-type specific prior information models using text-mining approaches to collect cell-type specific information for PERA-generated interactions. This will result in more accurate models with improved predictions of cellular responses1-4. Task 2: Develop a method for causal pathway analysis using differential networks. We will develop a method for pathway analysis to uncover causal relationships between perturbations and changes in protein abundances. This method will be used to analyze response data from projects with a limited number of perturbation experiments. Differential responses to perturbations will be statistically filtered (e.g., correlation-based filtering) to select proteins with a dose dependent response and significant differential response over multiple conditions. These selected proteins will be used to extract a biological sub-network from the Pathway Commons database and map the differential responses to this sub-network. Finally, responsive modules will be identified using a modified version of module detection software developed by the Sander lab known as NetBox. Increases in module activities in response to perturbations can be associated with adaptive responses and their causality tested. These modules may potentially be targeted therapeutically to overcome resistance in cancer cells. Task 3: Develop software tools for the visualization of differential networks models. 
To support Tasks 1 and 2, we will develop software tools that allow users to visualize the differences between multiple biological perturbations, enabling the visualization of absolute values, differences and differences of differences using a configurable interface. For projects involving time series or multi-condition data, this software will be able to handle changes in network topology due to response differences of the biological entities within the network. Lastly, this software will be able to visualize significantly varying sub-networks as a mechanism for managing network complexity.

Background and Significance. Task 1: Perturbation biology method. Quantitative characterization and prediction of the response landscape to perturbations is a central challenge in biology for the mechanistic understanding of biological systems and design of effective therapies. Quantitative models (e.g., differential equation models) that link perturbations to cellular response can capture predictive details that qualitative interrogation (e.g., Boolean network models) of cellular processes cannot. However, building such models is hard because of the astronomically large number of possible models that could represent the underlying biological processes and the frequent lack of experimentally tested model parameters; modeling formalisms in systems biology were recently reviewed5. To address this challenge, the Sander lab recently developed a computational and experimental method, termed perturbation biology. The perturbation biology method involves construction of quantitative network models using proteomic response data to large numbers of perturbations and network inference algorithms. These models are then used to predict system response (i.e., phospho-proteomic and cellular response such as cell viability) to novel perturbations through quantitative simulations (Figure 1). 
Proteomic-level data are used because the primary targets of most targeted therapeutic inhibitors (used by this method as perturbations) are phospho-proteins, and the effects of many drugs with a known mechanism may not correlate with the gene expression of their targets6. The significance of this work lies in the first application of a statistical physics method known as belief propagation to deal effectively with this complexity and to accurately predict phenotypic responses to untested perturbations1,3. In the belief propagation inference method, we first calculate probabilistically the most likely interactions in the vast space of all possible solutions and then derive a set of individual, highly probable solutions in the form of executable models (Figure 1). In the generated network models, the nodes represent measured levels of proteins, phospho-proteins, or cellular phenotypes, and the edge weights represent the influence of upstream nodes on the time derivatives of their downstream effectors in a manner similar to ODE-based mathematical descriptions of models. Using the resulting

models, it is possible to perform quantitative simulations of in silico perturbations and predict system responses (e.g., cell viability or changes to protein abundance) to novel perturbations of interest; the most promising predictions are then experimentally tested. The perturbation biology method is particularly powerful for nominating synergistic and effective drug combinations to overcome drug resistance in cancer1. For example, work in the Sander lab using perturbation biology models has accurately predicted 1) the dependence of the synergistic response to CDK4 and IGF1R inhibition on AKT pathway activity and 2) synergistic effects of combined BRAF and MYC inhibition in RAF inhibitor resistant melanoma1,2. In the work on combined BRAF and MYC inhibition, use of prior information to narrow the search space in inference significantly improved the predictive power of the network models, as shown by cross-validation studies. Prior data was gathered using the Pathway Extraction and Reduction Algorithm (PERA)1,7,8, which uses a list of proteins and phospho-proteins to build a molecular interaction network using information from the Pathway Commons resource (developed by the Sander and Bader labs). Currently, the perturbation biology method does not support all available detail in the network captured from Pathway Commons. Therefore, a network reduction step is used to simplify the network and map Pathway Commons nodes one-to-one to experimental observables used in the perturbation biology method.
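
As a rough illustration of how such an executable model can be simulated, the following minimal Python sketch integrates a three-node toy network under an in silico perturbation. The functional form (a saturating tanh response with linear decay), the parameters `alpha` and `eps`, the toy weights, and the reading of node values as changes relative to the unperturbed state are all assumptions made for illustration; none of them is taken from this proposal, and the inference of the weights by belief propagation is not shown.

```python
import math

def simulate(weights, perturbation, steps=2000, dt=0.01, alpha=1.0, eps=1.0):
    """Euler-integrate dx_i/dt = eps*tanh(sum_j w_ij*x_j + u_i) - alpha*x_i.

    weights[i][j] is the (assumed) influence of node j on node i;
    perturbation[i] is the external input u_i applied to node i.
    Node values start at 0, i.e. the unperturbed reference state.
    """
    n = len(weights)
    x = [0.0] * n
    for _ in range(steps):
        dx = []
        for i in range(n):
            drive = perturbation[i] + sum(weights[i][j] * x[j] for j in range(n))
            dx.append(eps * math.tanh(drive) - alpha * x[i])
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Hypothetical toy cascade: node 0 (drug target) -> node 1 (phospho-protein)
# -> node 2 (cell viability readout). The weights are illustrative only.
weights = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

baseline = simulate(weights, [0.0, 0.0, 0.0])    # no perturbation
perturbed = simulate(weights, [-1.0, 0.0, 0.0])  # inhibit node 0

print("baseline:", baseline)
print("perturbed:", [round(v, 3) for v in perturbed])
```

In this toy setting, inhibiting the target node propagates down the cascade and lowers the viability readout, which is the kind of prediction that would then be tested experimentally.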

Task 2: Causal pathway analysis. In this task, we introduce a causal pathway analysis for perturbation studies when the perturbation biology method is not applicable due to a limited number of experimental perturbing agents. Often, available proteomic datasets, such as those collected by the White lab (DBP), contain measurements of a limited number of perturbations; the community has seen a surge of these datasets, and more are expected through programs such as NIH LINCS9. To determine causes of molecular response to drug perturbations for these datasets, we propose an alternative analysis method that makes use of some of the developments from the perturbation biology method, such as the PERA. Development of methods for causal pathway analysis has gained attention within the community as the number of large-scale datasets increases10-14. Our proposed methodology will differ from these previous efforts in that we will focus on the analysis of drug perturbations 1) using proteomic and phospho-proteomic data (proteomic-level measurements provide a better reflection of drug activity), unlike several studies that have focused on gene expression11-13, and 2) using module-based pathway analysis with NetBox as a less biased form of pathway analysis, unlike work that makes use of canonical pathways, which differ in definition between resources11,15.

Figure 1. The Perturbation Biology method involves systematic perturbations of cells with targeted drug combinations (Boxes 1-2), high-throughput measurements of response profiles (Box 2), automated extraction of prior signaling information from databases (Boxes 3-4), construction of ordinary differential equation (ODE) based signaling models (Box 5) with inference algorithms (Box 6) and prediction of system response to novel perturbations (Box 7)1

As part of the collaboration between the Forest White (MIT) and Chris Sander (MSKCC) labs, we aim to determine, using our proposed methodology, the proteomic and pathway module level predictors of response and mechanisms of drug resistance to CDK4 inhibition in dedifferentiated liposarcoma. Dedifferentiated liposarcoma is a rare but aggressive cancer with high recurrence and low response rates to targeted therapies. We recently showed that CDK4, targeted either singly or in combination with other kinases, may be a strong candidate for targeted therapy in liposarcoma2. Cyclin-dependent kinase 4 (CDK4) is important for cell cycle G1 phase progression. CDK4 is up-regulated in >40% of all soft-tissue sarcomas and nearly all cases of dedifferentiated liposarcomas either at the copy number or mRNA expression level (based on genomic profiling data from 207 soft tissue sarcoma patients, of which 50 are dedifferentiated liposarcoma patients)16. In our ongoing collaboration, a liposarcoma cell line (DDLS8817) is perturbed using different doses of CDK4 inhibitor and proteomic response data is collected at different time points (30 min, 8 h, 24 h, 72 h) using mass spectrometry. In initial experiments, phospho-peptide enrichment, performed using a cocktail of anti-phospho-tyrosine antibodies, yielded readouts for 190 peptides corresponding to 170 proteins. These data are used as input to our methods.

Task 3: Develop software tools for the visualization of differential networks models. One focus of the Sander and Bader labs has been the development of standardized formats for community use in systems biology, including the use of the Biological Pathway Exchange (BioPAX) format in Pathway Commons and the Systems Biology Graphical Notation (SBGN) for the visual representation of molecular interaction networks; usage of these standards simplifies pathway data reuse and improves the readability of this data for users8,17. 
Of the nearly 30 software tools that support the SBGN notation (http://www.sbgn.org/SBGN_Software), only two provide a direct interface for Pathway Commons (Cytoscape and ChiBE), and of these two only ChiBE natively supports the SBGN notation18,19. We will focus on the extension of these tools for the visualization of results from the analyses conducted in this project. Both Cytoscape and ChiBE natively support the visualization of user data and comparison of changes within a network across two different contexts. Visualization of these differential networks is a powerful method for understanding how processes are modified. Examples include (i) changes in proteomic response to drug perturbations between two cell lines, (ii) the emergence of genomic alterations in different tumors in response to treatment, (iii) genetic conservation of enzymes in metabolic networks between evolutionary branches, (iv) changes in gene expression during development in different lineages, (v) different sets of host-pathogen interactions, and (vi) longitudinal change over time in the Framingham study network (Table 1).

Table 1. Types of networks and corresponding perturbations

Network Type | Context | Perturbation/Change | Observation | DBP
Kinase | Cell lines | Drug concentration | Protein levels | NCI-60
Kinase Signaling | Tumor subtypes | Before/After treatment | Genetic alterations | Vidal, Sage, ICGC
Gene Regulation | Cell lineages | Developmental stage | Gene expression | Sage, ICGC
Metabolic | Evolutionary branch | Evolutionary time | Genetic conservation | ICGC, NCI-60
Host-pathogen | Cell lines | Virus type | Protein interactions | Krogan
Social | Friend network | Time | E.g., obesity, heart disease | Social

We have recently prototyped support for phospho-proteomic data visualization within ChiBE (Figure 2). The prototype uses the background color of the nodes to visualize differences of values between two proteomic measurements. It can also visualize total abundance versus phospho-protein levels using a node decoration. Currently, this visualization mechanism is not available within Cytoscape using CySBGN, the Cytoscape App that provides SBGN support, because there is no simple method for mapping SBGN elements to biological entities within SBGN files. This useful feature is missing from SBGN-ML, the SBGN file format20,21; thus, we will add it to the SBGN programming interface, libSBGN, to support NRNB tools that use SBGN (Cytoscape, ChiBE, and PathVisio) and the analyses described here. This will also support NCI-60 analysis results from the Pommier DBP (see TRD2).

Secondly, to support the analyses described in this project, there is a need for a mechanism for moving between detailed network views and more abstract representations. This multi-scale representation functionality allows researchers to address a range of biological questions (from qualitative to quantitative) that make use of the same underlying data. Through apps, Cytoscape (and independent tools, such as VisANT) supports related features for simple graphs, but none of these apps supports this functionality in the detailed SBGN notation22,23. There is some previous work on this topic by Vogt et al. and by members of the Sander and Bader labs on the reduction of detailed BioPAX; however, neither of these methods is supported by libSBGN24,25.

Methods. Task 1: Perturbation biology method. Perturbation biology is a powerful network modeling method based on a belief propagation inference algorithm that predicts system response to experimentally untested perturbations of interest. The method has been enhanced by the use of prior information network models extracted from the Pathway Commons database, which the Sander and Bader labs develop. We will develop a new version of the PERA tool to generate confidence scores related to each interaction in the prior model. The confidence score (i.e., a measure of the existence of an interaction across various conditions) for each interaction will be estimated based on co-citation results under particular conditions using literature mining approaches, co-expression data for input samples, and evidence code information (indicating how particular interactions are supported). For example, the co-citation count for RAF and MEK phosphorylation events (e.g., pMEK and pRAF, as measured by the White DBP lab at MIT) in melanoma may be higher than the count for the same phosphorylation events

Figure 2. A prototype application that uses background colors to visualize changes in a network based on proteomic data, which includes phosphorylation state data.

Figure 3. A flowchart of the integrated pathway and statistical tools for analysis of response to perturbations in biological systems.

in breast cancer. This literature-curated interaction information will also be used in network models for melanoma samples and the perturbation biology method. Methods for extraction of co-citation information from literature have been previously developed in projects, such as iHOP and we will use this as a resource26. In parallel, distance metric learning algorithms from machine learning will be used to further improve the prior confidence scores27. The goal of distance metric learning algorithms is to generate distance functions for particular tasks. In this scenario, we will use them in combination with text-mining approaches to identify textual fragments that are likely to contain novel interaction or cell-type specific information. This will be determined by the similarity of the new textual data and text identified in Pathway Commons (in the form of annotations) as containing interaction information. These innovations will improve the prior data used in the PERA algorithm and the perturbation biology method. Task 2: Causal pathway analysis. In cases where it is not feasible to use our perturbation biology method due to a low number of perturbation experiments, we will employ an analysis pipeline, which we call causal pathway analysis (Figure 3). The first step in this pipeline is the statistical filtering out of biological entities that do not respond significantly to perturbations. For instance, in the collaboration between the Sander and White labs (DBP), a combination of 190 peptides (i.e., total protein levels and phospho-protein levels) were collected across four different time points in the liposarcoma cell line DDLS8817 where the cell line was treated with a CDK4 inhibitor. For each individual time point, we will develop network models by first applying a correlation-based (e.g., Spearman correlation) filter of peptide levels in relation to cell viability measurements in different doses. 
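
The correlation-based filtering step can be sketched as follows. This is a minimal, standard-library Python illustration, assuming one measurement per dose for each peptide and for cell viability at a single time point; the peptide names, the numeric values, and the 0.8 cutoff are hypothetical and are not taken from the DDLS8817 dataset. With only a handful of doses, a rank correlation gives a coarse screen rather than a formal significance test.

```python
import math

def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def filter_peptides(levels, viability, threshold=0.8):
    """Keep peptides whose |Spearman rho| against viability meets the cutoff."""
    kept = {}
    for name, values in levels.items():
        rho = spearman(values, viability)
        if abs(rho) >= threshold:
            kept[name] = rho
    return kept

# Hypothetical measurements at four increasing doses of inhibitor.
viability = [1.0, 0.8, 0.5, 0.2]
levels = {
    "peptide_A": [2.0, 1.5, 1.1, 0.4],   # falls with viability
    "peptide_B": [0.9, 1.1, 0.95, 1.0],  # no clear dose trend
    "peptide_C": [0.3, 0.6, 1.2, 1.9],   # rises as viability falls
}

selected = filter_peptides(levels, viability)
print(selected)
```

Peptides that pass the filter, whether positively or negatively correlated with viability, would then be carried forward to the sub-network extraction step.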
In the second step, we will extract biological sub-networks from the Pathway Commons database using established PERA methodology and map response data values onto these network models. The third step will apply the NetBox module detection software we developed to the resulting network models. NetBox uses the edge betweenness algorithm by Girvan and Newman for modularity detection15,28. One advantage of the NetBox software is its ability to consider linked proteins that are outside of the context of the experimentally measured proteins, with additional filtering for proteins with many interaction partners. This capacity will potentially enable us to observe affected modules that may not otherwise be seen with available peptide measurements. The final step of this analysis will be to examine changes in pathway modules, for example across time (e.g., between early and late time points), and determine 1) whether peptides within a given module are significantly altered between time points and 2) whether a given module has significant relevance, as assessed through a gene set enrichment analysis of the kind originally developed for use with the NetBox algorithm15,29. Examination of the affected modules may reveal additional therapeutic intervention points in liposarcoma; known drug targets among these will be identified using the PiHelper tool developed by the Sander lab30.

Task 3: Develop software tools for the visualization of differential networks models. As discussed above, we will develop differential network result visualization methods. The first will be the development of a mapping mechanism for the SBGN notation of the biological pathway models used above and the second will be development of software to support multi-scale representation in SBGN (also linking to TRD3). For mapping, we will extend the SBGN-ML format and the libSBGN programming interface to support a simplified mechanism for annotating nodes to external database references for biological entities. 
To do this, we will make use of the extension mechanism available for SBGN and build support for Identifiers.org and MIRIAM-based identifiers (community projects that support unique and perennial identifiers for data)31 that we already use in Pathway Commons. This will enable users to map the results of differential response analyses onto annotated SBGN-based network views; use of this feature in SBGN-compatible tools (such as Cytoscape and PathVisio) will be necessary for widespread use. For multi-scale representation, we will implement support for network reduction rules provided by Paxtools (a programming interface for BioPAX used by the PERA tool) and export these to SBGN-ML files, which Paxtools supports. We will support a detailed representation (i.e., the SBGN Process Description language) and a simplified representation known as the SBGN Activity Flow (AF) language; this work will involve developing a mapping between the network reduced representation from Paxtools and the SBGN AF notation; several of the members of the Sander lab are involved in both SBGN and BioPAX efforts and are experts in both formats.
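
The module detection step of the causal pathway analysis in Task 2 relies on the Girvan and Newman edge betweenness algorithm. The following standard-library Python sketch shows one splitting round of that algorithm on a toy graph: it repeatedly removes the edge with the highest shortest-path betweenness until the number of connected components increases. It is not the NetBox implementation; the node names `P1` to `P6` and the edges are hypothetical, and the sketch assumes an undirected graph without self-loops.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style accumulation)."""
    score = defaultdict(float)
    for source in adj:
        dist = {source: 0}
        n_paths = defaultdict(float)
        n_paths[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = n_paths[v] / n_paths[w] * (1.0 + dependency[w])
                score[frozenset((v, w))] += share
                dependency[v] += share
    return score

def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        result.append(comp)
    return result

def split_once(edges):
    """Remove highest-betweenness edges until one more component appears."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    start_count = len(components(adj))
    while len(components(adj)) == start_count:
        score = edge_betweenness(adj)
        if not score:  # no edges left to remove
            break
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)

# Hypothetical network: two tightly linked triangles joined by one bridge.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("P3", "P4"),
         ("P4", "P5"), ("P4", "P6"), ("P5", "P6")]

modules = sorted(sorted(m) for m in split_once(edges))
print(modules)
```

Here the bridge edge carries every shortest path between the two triangles, so it is removed first and the two triangles emerge as separate modules.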

1.2 DEVELOP PROTEIN NETWORK ALIGNMENT ALGORITHM AND VIEWER TECHNOLOGY

Project Leaders: Gary Bader (University of Toronto) and Trey Ideker (UCSD) Overview. In this project, we will develop technology to align networks across contexts, with a particular focus on multi-scale protein-protein interface interaction networks (IINs). Similar to sequence alignment algorithms and viewers, network alignment algorithms and viewers will enable the study of networks altered over evolution and via genetic and other disruptions within a population leading to a better understanding of what network elements are shared and thus generally important, and which are different and thus important for specific contexts. This work will be driven by the need to study network changes in the context of protein sequence changes over evolution and to analyze edgetic interaction networks that identify interactions gained or lost in response to protein sequence mutation, mapped at Marc Vidal’s Center for Cancer Systems Biology at the Dana Farber Cancer Institute in Boston. Objectives. The major goal of this project is to develop new algorithmic and visualization technology for comparing networks. We will focus on protein-protein interaction networks and the binding site interfaces that mediate the interactions and can be affected by mutations across diseases or evolution to rewire the network, as mapped in our DBP. Specifically, we will:

• Implement and improve network alignment algorithms for IINs. We will implement the GreedyPlus IIN alignment algorithm and a selected set of state-of-the-art network alignment algorithms as a Cytoscape App, called NetworkAligner.

• Develop a network alignment viewer. Protein and DNA sequence alignment viewers are at the core of the bioinformatics toolset. As the need to compare networks across contexts grows, we envision increased need for network alignment viewers, which currently don’t exist as a class of software. We will develop a Cytoscape App, called NetworkAlignmentViewer, for this purpose.

• Aggregate and make IIN data available to Cytoscape via PSICQUIC server. No central source of IIN data exists, thus we will collect this data from multiple sources, including our DBP, normalize it using the PSI-MI standard format and make it available to Cytoscape via a web service.

Background and Significance. The increasing ease and accuracy of experimental methods to detect protein-protein interactions has recently made it possible to map the interactomes of multiple species and experimental contexts. We now have an opportunity to compare these interactomes to better understand how they evolve over time – to identify which regions are conserved, and thus globally important, and which are not conserved, and thus likely important for species- or context-specific traits. Comparison tools such as network alignment algorithms are essential for our ability to exploit and extract information from the many protein interaction-mapping efforts currently underway. The alignment of protein-protein interaction networks (PPINs) serves as a systems-level analog to biological sequence alignments. Such alignments enable inferences to be drawn between proteins of different species, which may agree or disagree with conclusions drawn from sequence-based alignments. Examining aligned networks or sub-networks will help answer how these interactomes evolved, just as sequence alignment has done with genes, proteins and genomes. Network alignment will also reveal interaction network “mutations” just as sequence alignments have done for genetic mutations.

A number of network alignment methods have been developed for protein-protein interaction networks. These are broadly categorized into local and global alignment algorithms. Local alignment algorithms seek small subnetworks that are topologically similar, emphasizing regions of high-confidence alignment between the two networks. The first network alignment algorithms, developed by the Ideker group, were local (e.g., PathBLAST32 and NetworkBLAST33). Others have been developed, such as NetAligner34 and MaWISH35. Global network alignment methods attempt to align all or most of the proteins in two or more PPINs and include IsoRank36, IsoRankN37, GRAAL38, H-GRAAL39, MI-GRAAL40, C-GRAAL41, Graemlin42, Graemlin 2.043 and others. Metabolic pathway alignment algorithms have also been developed44-47.

While network alignment is well-studied, existing methods do not consider important aspects of PPINs, such as the binding interfaces that mediate each interaction. Interface-interaction networks (IINs) are a refinement of PPINs where proteins are subdivided into their separate interaction interfaces48. In an IIN, a vertex represents a binding site and an edge represents a direct physical interaction between two binding sites on their respective proteins. The higher resolution of IINs supports new biological insights that cannot be derived from standard PPINs. For example, IINs can distinguish between “date hubs” – proteins that interact with many partners, but at different times or in different locations – and “party hubs” – proteins that interact with many partners simultaneously49; these distinct hub types appear identical in a PPIN. The study of IINs will also help interpret how domain and binding site gain and loss affect the PPIN and how binding site sequence mutations over evolution and in disease cause PPI gain and loss50-52. This latter point is the goal of our DBP with Marc Vidal’s Center for Cancer Systems Biology (CCSB) at the Dana Farber Cancer Institute. The Vidal group is leading a project to identify human disease mutations in binding sites that affect some interactions of a given protein but not others. Given a large number of these “edgotypic” interactions, the network alignment technology that we develop will be valuable for automatically identifying and visualizing the gain, loss or swap of protein interactions and the responsible changes to mapped or known binding sites.
Such applications will be relevant to a broader range of our DBPs in which networks are perturbed across conditions or over time, including Nevan Krogan’s AP-MS-derived host-pathogen network comparisons, ICGC, Sage, NCI-60 and social networks. In preliminary work, the Bader group has shown that traditional network alignment algorithms do not function well with IINs because of the different topology of IINs. To address this, we have developed a novel alignment algorithm for IINs, called GreedyPlus (Figure 4). In this project, we will develop technology to implement select network alignment algorithms, including GreedyPlus, and develop a network alignment viewer within Cytoscape.
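To make the IIN representation described above concrete, the following is a minimal Python sketch; it is not part of the proposed software. It assumes that each vertex is a (protein, site) pair and that each edge joins two binding sites, and all protein and site names are invented for illustration. Collapsing sites onto their proteins recovers the ordinary PPIN, in which the two hubs below look the same, while counting interfaces per protein separates a hub that reuses one interface from a hub that offers several. Reading a single-interface hub as a date-hub candidate and a multi-interface hub as a party-hub candidate is an assumed heuristic for this example only.

```python
from collections import defaultdict

# A binding site is identified by (protein, site name); all names are hypothetical.
Site = tuple[str, str]

# IIN edges: each edge is a direct physical contact between two binding sites.
iin_edges: list[tuple[Site, Site]] = [
    (("HubA", "s1"), ("P1", "s1")),
    (("HubA", "s1"), ("P2", "s1")),
    (("HubA", "s1"), ("P3", "s1")),
    (("HubB", "s1"), ("Q1", "s1")),
    (("HubB", "s2"), ("Q2", "s1")),
    (("HubB", "s3"), ("Q3", "s1")),
]

def collapse_to_ppin(edges):
    """Drop the site level and return the set of protein-protein edges."""
    return {frozenset((a[0], b[0])) for a, b in edges if a[0] != b[0]}

def partners_per_interface(edges):
    """Map protein -> site -> set of partner proteins bound at that site."""
    table = defaultdict(lambda: defaultdict(set))
    for a, b in edges:
        table[a[0]][a[1]].add(b[0])
        table[b[0]][b[1]].add(a[0])
    return table

def classify_hub(protein, edges, min_partners=3):
    """Assumed heuristic: partners sharing one interface cannot bind at once."""
    sites = partners_per_interface(edges)[protein]
    partners = set().union(*sites.values()) if sites else set()
    if len(partners) < min_partners:
        return "not a hub"
    return "single-interface hub" if len(sites) == 1 else "multi-interface hub"
```

In the collapsed PPIN both HubA and HubB have three partners; only the site-level view shows that HubA binds all three through one interface while HubB uses three different interfaces.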

Figure 4. Example network alignment using GreedyPlus.

Methods

Task 1: Implement and improve network alignment algorithms for IINs. We will implement the GreedyPlus IIN alignment algorithm and a selected set of state-of-the-art network alignment algorithms as a Cytoscape App, called NetworkAligner. Implementation will be similar to our popular clusterMaker app, which makes a range of network clustering algorithms available within Cytoscape53. clusterMaker was developed in collaboration with the authors of various network clustering algorithms and thus benefited from crowdsourced development, motivated by co-authorship on a paper. We will follow the same model to build NetworkAligner. To make this easier, we will define an application programming interface (API) for network alignment algorithms, so that multiple developers can code to the API. We will select algorithms to cover the major classes of local and global aligners, and will also include those already available as open source Java code or that can be ported to Java easily by their original authors. We have already implemented our own Java versions of GreedyPlus, IsoRank36 and GRAAL54 that will form the basis of the initial NetworkAligner app.

Task 2: Develop network alignment viewer. Protein and DNA sequence alignment viewers, such as Jalview55, are at the core of the bioinformatics toolset. As cross-species network information grows, we envision increased need for network alignment viewers, which currently don’t exist as a class of software. Cytoscape can be used to show a network alignment result, and a rudimentary viewer is available in the recently developed GASOLINE app56 (Figure 5).
However, a general network alignment viewer requires additional features, including: 1) a standard network alignment file format for importing alignments from third-party alignment tools, 2) an aligned node viewer showing information supporting the alignment, such as a protein sequence alignment or functional similarity score, 3) support for multi-scale network alignment results, as determined by, e.g., GreedyPlus, and 4) highlighting of missing and gained nodes and interactions. We will develop these features in a Cytoscape App, called NetworkAlignmentViewer, with interoperability with third-party alignment tools as a key feature. We will develop the viewer in 2D, but will also take advantage of a 3D rendering system the Bader lab has developed (beta release at http://wiki.cytoscape.org/Cytoscape_3/3D_Renderer) to develop a 3D visualization mode, where networks are shown as planes with alignment links connecting the planes (similar to what is simulated in 2D by the GASOLINE app, Figure 5). This will enable larger networks to be visualized. We will work with the network alignment community to support additional features.

Task 3: Aggregate and make IIN data available to Cytoscape via PSICQUIC server. While protein interaction network information is widely available, interface-interaction network (IIN) data is currently difficult to collect because it is spread across multiple heterogeneous sources, such as DOMINO57, atomic-level molecular interaction structures in the PDB58, multiple PDB-derived databases, such as BioLiP59 and Interactome3D60, and experimental data such as that generated by the Bader and Vidal labs50,61,62. We will collect protein binding site level information from multiple sources, including those cited above, and make it available in the standard PSI-MITAB version 2.7 format, which supports binding site features. We will focus on supporting organisms with the largest amount of this type of data, such as worm, yeast and human.
This data will be made available as a PSICQUIC web service63. The Bader lab already maintains three public PSICQUIC web services – BIND64, GeneMANIA65, and InteroPorc66 – thus it will be straightforward to set up additional servers. Once a PSICQUIC server is set up and registered with the central registry, it automatically becomes available for querying within Cytoscape based on previously developed PSICQUIC import functionality. However, Cytoscape currently does not recognize binding site information returned from a PSICQUIC server, thus we will implement that feature. This will enable a workflow where a user downloads IIN data for alignment using GreedyPlus and visualization with the NetworkAlignmentViewer app. IIN data will also be available for other Cytoscape apps that consider binding sites and will connect with our goals for multi-scale network analysis and visualization in the Multi-scale Representations TRD.

Figure 5. NetworkAlignmentViewer

Links with other TRDs. GreedyPlus represents the first multi-scale network alignment algorithm, and thus relates to work in the Multi-scale Representations TRD. The NetworkAlignmentViewer app will benefit from multi-scale network visualization knowledge and technology gained in this TRD, which will support the development of multi-scale network alignment visualization options.
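As an illustration of two ideas from the Methods above, a common interface that several aligners can implement and the highlighting of missing and gained interactions, here is a minimal Python sketch. The proposed apps target Java within Cytoscape; Python is used only for brevity. The class and function names are hypothetical, and the name-matching aligner is a trivial stand-in, not GreedyPlus or any published algorithm. The sketch assumes undirected networks without self-loops and a one-to-one node mapping.

```python
from abc import ABC, abstractmethod

def edge(u, v):
    """An undirected edge is stored as a frozenset of its two node names."""
    return frozenset((u, v))

class NetworkAligner(ABC):
    """Hypothetical plug-in contract: every algorithm returns a node mapping."""

    @abstractmethod
    def align(self, edges_a: set, edges_b: set) -> dict:
        """Return a mapping from nodes of network A to nodes of network B."""

class NameMatchAligner(NetworkAligner):
    """Trivial stand-in aligner: pairs nodes that share an identifier."""

    def align(self, edges_a, edges_b):
        nodes_a = set().union(*edges_a) if edges_a else set()
        nodes_b = set().union(*edges_b) if edges_b else set()
        return {n: n for n in nodes_a & nodes_b}

def differential_edges(edges_a, edges_b, mapping):
    """Classify edges as conserved, lost (A only), or gained (B only)."""
    projected = {}  # image of an A edge in B's node names -> the A edge
    lost = set()
    for e in edges_a:
        u, v = tuple(e)
        if u in mapping and v in mapping:
            projected[edge(mapping[u], mapping[v])] = e
        else:
            lost.add(e)  # an endpoint has no counterpart in B
    conserved = {e for image, e in projected.items() if image in edges_b}
    lost |= {e for image, e in projected.items() if image not in edges_b}
    gained = {e for e in edges_b if e not in projected}
    return conserved, lost, gained

# Toy example: the mutant network loses B-C and gains B-D.
wild_type = {edge("A", "B"), edge("B", "C"), edge("C", "D")}
mutant = {edge("A", "B"), edge("B", "D"), edge("C", "D")}
mapping = NameMatchAligner().align(wild_type, mutant)
conserved, lost, gained = differential_edges(wild_type, mutant, mapping)
```

Because the edge classification depends only on the node mapping, any aligner that honors the same contract could be swapped in without changing the viewer-side logic.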

1.3 FACILITATING THE INTERPRETATION OF AP-MS DATA AS INTERACTION NETWORKS

Project Leaders: Alexander Pico (Gladstone Institutes) and John H. Morris (UCSF)

Overview. Affinity purification mass spectrometry (AP-MS) is a proven technique for determining large-scale, high-quality protein-protein interaction networks (cf. 67-69). AP-MS is now being used to map networks across biological contexts, e.g., species, viruses, cell lines, conditions, and host states. Recent advances in AP-MS techniques and instrumentation promise a significant increase in the scale of protein-protein interaction networks derived from AP-MS and in the variety of use cases for AP-MS (cf. 70,71). In addition, new public data repositories are becoming available to support mass spectrometry proteomics experiments. Combined with efforts to determine the entire HEK293T human interactome through AP-MS72, these repositories will enable new differential analyses using targeted AP-MS to explore network changes in response to disease, infection, or other perturbations. The networks resulting from AP-MS have two significant advantages: they are quantitative, giving a measure of the abundance of each association, and they provide additional biological information, such as the state of post-translational modifications, which are critical to understanding molecular function. However, the quantitative nature of AP-MS data also makes the data challenging to interpret, requiring both computational and visualization advances over traditional protein-protein interaction networks (e.g., Y2H-derived)73. The goal of this project is to support the quantitative analysis and visualization of AP-MS experiments and to enable the broader community to access quantitative AP-MS networks and visualize those results within the context of other -omics data.

Objectives. We will develop tools to make specialized methods more accessible and to bridge gaps in frequently used workflows.
The project has two specific tasks:

• Support researchers using MS-derived data by augmenting Cytoscape with tools to streamline the typical MS analysis pipeline. These tools will enable data import, filtering, scoring and clustering, as well as visualization.

• Support the broader research community by augmenting Cytoscape with tools to access public repositories of quantitative AP-MS data and analyze and visualize that data in context with other -omics data already supported by Cytoscape. These tools will focus on annotation, data integration, network augmentation and network comparison, taking advantage of differential network analysis technology developed in this TRD.

Background and Significance. MS proteomics experiments have been used to explore the interactome of yeast67-69, host-pathogen interactions74-77, signaling networks78,79, network rewiring in cancer80, and even as a component of protein complex structural understanding81,82. In all of these biological applications, a major result is an explicit or implicit network model capturing the interactions of the proteins or, in the case of structural analysis using crosslinking, the interactions of individual amino acids within those proteins. This creates a natural affinity between AP-MS derived data and network visualization and analysis tools such as Cytoscape.

Dr. Krogan, who leads our primary DBP for this work, has pioneered strategies for large-scale protein-protein interaction analysis and is applying these methods to the study of host-pathogen interactions. A comprehensive and unbiased survey of host-pathogen interactions is revealing critical biology underlying virus protein homeostasis and evolution. Such an approach will yield global insight into chaperone networks, quality control networks, and how these modulate virus replication efficiency, adaptation and pathogenesis. The differential network analysis of HIV protein interactions with two human cell lines, for example, served as the driving biomedical project behind the recent Nature Protocols article co-authored by Drs. Pico and Morris83, and drives the technology development project described here to facilitate the network analysis of AP-MS data. Expressing AP-MS data as a network and providing users access to quantitative AP-MS data is challenging, as we learned while preparing the 30-page protocol and 16-page supplemental tutorial. There is much room for improvement in streamlining this workflow, which will save substantial AP-MS analysis time. The standard scoring protocol for raw AP-MS data can be completed in 2-3 hours, resulting in a table of scored interactions that is then imported, augmented, analyzed and visualized in Cytoscape. The import step alone can take up to 2 hours, and network augmentation can take another 4 hours, which has been highly frustrating for Krogan lab members and creates a substantial barrier to entry for AP-MS practitioners in general. Providing tools to streamline these two steps 10-fold would significantly expedite the availability and utility of AP-MS networks and increase the throughput of the Krogan lab and other AP-MS labs. This would also enable researchers using AP-MS to use the wealth of Cytoscape apps and publicly available data to visualize and analyze their data in a biologically meaningful context.
In addition to challenges associated with direct analysis of AP-MS data, there are also challenges associated with providing AP-MS data in a form that can be used by other researchers. Currently, most AP-MS data is deposited as binary interactions between proteins in public repositories, such as IMEx84. Most journals do not yet require deposition of raw data, though repositories such as PRIDE85 exist. A wealth of information present in AP-MS data is lost when the quantitative information on abundance, reproducibility and specificity is reduced to a simple binary interaction73. The results of scoring protocols can provide more subtle information relevant to distinguishing indirect from direct associations, or to identifying weak associations that might indicate transient interactions. Furthermore, while existing interaction repositories provide information about proteins, AP-MS data can include information about post-translational modification (PTM) state, providing another source of biologically meaningful information (see TRD1.1 and the White liposarcoma and Pommier NCI-60 DBPs). Providing appropriate tools to access AP-MS repositories that include quantitative information, and to support appropriate visualizations of scoring results, abundances, and PTM data, will give researchers additional biologically meaningful information that is currently not considered in traditional analysis workflows. The PTM data and quantitative results from our DBPs will generate networks in various states. The quantitative information attached to the associations will be used, for example, to find associations that are transitory or weak in the base network but much more tightly bound in the perturbed network. This may indicate a pathological condition due to a change in PTM state or a protein mutation.
To support differential network analysis that uses this quantitative data, we will work with TRD1.2 to incorporate quantitative information as edge weights into network alignment algorithms, through modification of the alignment-scoring step.

Methods

We will develop a set of tools for mass spectrometry practitioners to seamlessly transition into network-based visualizations and analyses, without needlessly reducing the wealth of information contained in AP-MS data. As discussed above, and directed by our collaborators and DBPs, these tools will address the challenges of visualizing and interpreting AP-MS data from scoring and annotation to network augmentation and comparison. Building on the Cytoscape platform, these tools will be implemented as a set of interoperable apps, maximizing accessibility and ease-of-use.
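The comparison described in the Background, finding associations that are weak or transitory in the base network but tightly bound in the perturbed network, can be sketched as follows. This is a minimal Python illustration under stated assumptions: each network is a dictionary from a (bait, prey) pair to a confidence score between 0 and 1, an unobserved pair is treated as score 0, and the thresholds, protein names and scores are all invented. It does not implement any particular AP-MS scoring protocol.

```python
def strengthened_associations(base, perturbed, weak_max=0.3, strong_min=0.7):
    """Return bait-prey pairs that are weak or absent in the base network
    but strong in the perturbed network, sorted by score increase."""
    hits = []
    for pair, score_p in perturbed.items():
        score_b = base.get(pair, 0.0)  # an unobserved pair is treated as score 0
        if score_b <= weak_max and score_p >= strong_min:
            hits.append((pair, score_b, score_p, score_p - score_b))
    return sorted(hits, key=lambda h: h[3], reverse=True)

# Hypothetical scored interactions for one bait under two conditions.
base = {
    ("BaitX", "Prey1"): 0.90,
    ("BaitX", "Prey2"): 0.20,
    ("BaitX", "Prey3"): 0.25,
}
perturbed = {
    ("BaitX", "Prey1"): 0.88,
    ("BaitX", "Prey2"): 0.85,
    ("BaitX", "Prey3"): 0.30,
    ("BaitX", "Prey4"): 0.75,
}

hits = strengthened_associations(base, perturbed)
for pair, score_b, score_p, delta in hits:
    print(pair, f"base={score_b:.2f}", f"perturbed={score_p:.2f}", f"delta={delta:.2f}")
```

A binary representation of the same data would record Prey2 as present in both conditions and so miss its change entirely, which is the information loss the quantitative edge weights are meant to avoid.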

Task 1: Tools to support the use of Cytoscape by AP-MS practitioners. These tools will streamline the formatting and import of a typical AP-MS data set, based on what we have learned by working closely with Krogan lab members on their analysis. As captured in the Morris et al. protocol (in press), even after the scoring procedure is completed, importing AP-MS results into Cytoscape requires an unnecessarily complicated set of steps. This initial barrier effectively excludes all but expert Cytoscape users from working with AP-MS data sets in Cytoscape. The first tool will thus be a specialized file importer that takes an AP-MS data file and performs network import, edge attribute import, prey attribute import and bait attribute import, all in a single step. With protein identities and interactions properly associated with quantitative information in Cytoscape's network and table model, these data can then be used with relevant existing apps and tools in Cytoscape, such as the clustering and visualization tool clusterMaker86, which is used to identify modules and protein complexes in AP-MS data. We will also develop AP-MS specific apps, such as one for viewing abundances, MS spectra, and peptide lists associated with proteins in the network, and one for accessing public AP-MS data from repositories such as MassIVE (http://massive.ucsd.edu), including information about scores and, if available, abundance counts.

Task 2: Tools for augmenting AP-MS interaction networks. Once AP-MS data is more easily accessed, imported and assessed from within Cytoscape, a host of methods become more applicable. In our work with the Krogan DBP and other collaborators, one of the most frequently faced challenges relates to network augmentation, which involves loading additional data onto a network to support integrative analysis. Augmentation is often a prerequisite for other types of network analysis: it remedies sparse interaction data and leverages context from orthogonal and curated sources. Relevant data types include:

• other protein-oriented data (e.g., protein interactions from GeneMANIA and other AP-MS data)

• human genetic information, including disease-linked genes (e.g., GWAS; see TRD2)

• known pathways (e.g., KEGG, WikiPathways, Pathway Commons)

• gene function information (e.g., GO; see TRD3)

• gene expression datasets (e.g., ArrayExpress, GEO)

While many of these are already accessible via apps in Cytoscape, users face a convoluted set of steps – once again requiring expertise – to distinguish bait proteins from prey proteins when forming a database query or when filtering returned results. In host-pathogen networks, for example, this distinction determines the organism context in the augmentation step. We will therefore develop a dedicated interface for augmenting AP-MS interaction networks that considers these factors (based on the information provided by our importer tool) and provides lists of, or calls to, the corresponding Cytoscape apps and web services. Analysis and visualization of the differential networks being mapped using AP-MS methods is crucial to gaining insight into changes to pathways and complexes during infection, disease and other perturbations. By providing tools in the main areas described above, we will enable the Krogan lab and others to increase the throughput and depth of their analyses of AP-MS data sets by streamlining both the processing of AP-MS data and its contextualization with other biological data sets. Furthermore, the broader research community will have better access to quantitative AP-MS data to use as a platform for viewing their own data in context or as a baseline for comparing against perturbed (e.g., diseased) sub-networks.
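The single-step import described under Task 1, and the bait/prey distinction needed for augmentation under Task 2, can be sketched as below. This is a minimal Python illustration, not the proposed Cytoscape importer. It assumes a tab-delimited table with the hypothetical column names Bait and Prey, treats every other column as a numeric edge attribute, and uses invented protein names and values.

```python
import csv
import io

def import_apms_table(handle):
    """Read a scored bait-prey table once and build nodes and edges together."""
    nodes, edges = {}, []
    for row in csv.DictReader(handle, delimiter="\t"):
        bait, prey = row["Bait"], row["Prey"]
        # A protein used as a bait anywhere in the table keeps the 'bait' role.
        nodes.setdefault(bait, {})["role"] = "bait"
        nodes.setdefault(prey, {}).setdefault("role", "prey")
        # Every remaining column becomes a numeric edge attribute.
        attrs = {k: float(v) for k, v in row.items() if k not in ("Bait", "Prey")}
        edges.append({"source": bait, "target": prey, **attrs})
    return nodes, edges

def proteins_with_role(nodes, role):
    """List proteins by role, e.g. to restrict a host database query to preys."""
    return sorted(name for name, attrs in nodes.items() if attrs["role"] == role)

# Hypothetical host-pathogen table: viral baits, mostly host preys.
example = (
    "Bait\tPrey\tScore\tAbundance\n"
    "VirX\tHostA\t0.92\t14\n"
    "VirX\tHostB\t0.41\t3\n"
    "VirY\tHostA\t0.77\t9\n"
    "VirY\tVirX\t0.60\t5\n"
)
nodes, edges = import_apms_table(io.StringIO(example))
baits = proteins_with_role(nodes, "bait")
preys = proteins_with_role(nodes, "prey")
```

Recording the role at import time is what lets a later augmentation step choose the right organism for each query without asking the user to separate baits from preys by hand.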

TRD 1: DIFFERENTIAL NETWORKS – BIBLIOGRAPHY AND REFERENCES CITED

1. A., K. et al. Perturbation biology models predict c-Myc as an effective co-target in RAF inhibitor resistant melanoma cells. Biorxiv (2014).

2. Miller, M.L. et al. Drug synergy screen and network modeling in dedifferentiated liposarcoma identifies CDK4 and IGF1R as synergistic drug targets. Sci Signal 6, ra85 (2013).

3. Molinelli, E.J. et al. Perturbation biology: inferring signaling networks in cellular systems. PLoS Comput Biol 9, e1003290 (2013).

4. Nelander, S. et al. Models from experiments: combinatorial drug perturbations of cancer cells. Mol Syst Biol 4, 216 (2008).

5. Machado, D. et al. Modeling formalisms in Systems Biology. AMB Express 1, 45 (2011).

6. Li, K.C. & Yuan, S. A functional genomic study on NCI's anticancer drug screen. Pharmacogenomics J 4, 127-35 (2004).

7. Cerami, E.G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 39, D685-90 (2011).

8. Demir, E. et al. The BioPAX community standard for pathway data sharing. Nat Biotechnol 28, 935-42 (2010).

9. Schurer, S.C. & Muskal, S.M. Kinome-wide activity modeling from diverse public high-quality data sets. J Chem Inf Model 53, 27-38 (2013).

10. Alekseyenko, A.V. et al. Causal graph-based analysis of genome-wide association data in rheumatoid arthritis. Biol Direct 6, 25 (2011).

11. Kim, Y.A., Wuchty, S. & Przytycka, T.M. Identifying causal genes and dysregulated pathways in complex diseases. PLoS Comput Biol 7, e1001095 (2011).

12. Kramer, A., Green, J., Pollard, J., Jr. & Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523-30 (2014).

13. Li, J. & Lu, Z. Pathway-based drug repositioning using causal inference. BMC Bioinformatics 14 Suppl 16, S3 (2013).

14. Shin, S.Y. et al. Interrogating causal pathways linking genetic variants, small molecule metabolites, and circulating lipids. Genome Med 6, 25 (2014).

15. Cerami, E., Demir, E., Schultz, N., Taylor, B.S. & Sander, C. Automated network analysis identifies core pathways in glioblastoma. PLoS ONE 5, e8918 (2010).

16. Barretina, J. et al. Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nat Genet 42, 715-21 (2010).

17. Le Novere, N. et al. The Systems Biology Graphical Notation. Nat Biotechnol 27, 735-41 (2009).

18. Babur, O. et al. Integrating biological pathways and genomic profiles with ChiBE 2. BMC Genomics 15, 642 (2014).

19. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504 (2003).

20. Goncalves, E., van Iersel, M. & Saez-Rodriguez, J. CySBGN: a Cytoscape plug-in to integrate SBGN maps. BMC Bioinformatics 14, 17 (2013).

21. van Iersel, M.P. et al. Software support for SBGN maps: SBGN-ML and LibSBGN. Bioinformatics 28, 2016-21 (2012).

22. Hu, Z. et al. VisANT 4.0: Integrative network platform to connect genes, drugs, diseases and therapies. Nucleic Acids Res 41, W225-31 (2013).

23. Royer, L., Reimann, M., Andreopoulos, B. & Schroeder, M. Unraveling protein networks with power graph analysis. PLoS Comput Biol 4, e1000108 (2008).

24. Demir, E. et al. Using biological pathway data with paxtools. PLoS Comput Biol 9, e1003194 (2013).

25. Vogt, T., Czauderna, T. & Schreiber, F. Translation of SBGN maps: Process Description to Activity Flow. BMC Syst Biol 7, 115 (2013).

26. Hoffmann, R. & Valencia, A. A gene network for navigating the literature. Nat Genet 36, 664 (2004).

27. Xing, E.P., Jordan, M.I., Russell, S. & Ng, A.Y. Distance metric learning with application to clustering with side-information. Advances in neural information processing systems 15, 505-512 (2002).

28. Newman, M.E. & Girvan, M. Finding and evaluating community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys 69, 026113 (2004).

29. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-50 (2005).

30. Aksoy, B.A. et al. PiHelper: an open source framework for drug-target and antibody-target data. Bioinformatics 29, 2071-2 (2013).

31. Juty, N., Le Novere, N. & Laibe, C. Identifiers.org and MIRIAM Registry: community resources to provide persistent identification. Nucleic Acids Res 40, D580-6 (2012).

32. Kelley, B.P. et al. PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res 32, W83-8 (2004).

33. Kalaev, M., Smoot, M., Ideker, T. & Sharan, R. NetworkBLAST: comparative analysis of protein networks. Bioinformatics 24, 594-6 (2008).

34. Pache, R.A. & Aloy, P. A novel framework for the comparative analysis of biological networks. PLoS One 7, e31220 (2012).

35. Koyuturk, M. et al. Pairwise alignment of protein interaction networks. J Comput Biol 13, 182-99 (2006).

36. Singh, R., Xu, J. & Berger, B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci U S A 105, 12763-8 (2008).

37. Liao, C.S., Lu, K., Baym, M., Singh, R. & Berger, B. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25, i253-8 (2009).

38. Kuchaiev, O., Milenkovic, T., Memisevic, V., Hayes, W. & Przulj, N. Topological network alignment uncovers biological function and phylogeny. J R Soc Interface 7, 1341-54 (2010).

39. Milenkovic, T., Ng, W.L., Hayes, W. & Przulj, N. Optimal network alignment with graphlet degree vectors. Cancer Inform 9, 121-37 (2010).

40. Kuchaiev, O. & Przulj, N. Integrative network alignment reveals large regions of global network similarity in yeast and human. Bioinformatics 27, 1390-6 (2011).

41. Memisevic, V. & Przulj, N. C-GRAAL: common-neighbors-based global GRAph ALignment of biological networks. Integr Biol (Camb) 4, 734-43 (2012).

42. Flannick, J., Novak, A., Srinivasan, B.S., McAdams, H.H. & Batzoglou, S. Graemlin: general and robust alignment of multiple large interaction networks. Genome Res 16, 1169-81 (2006).

43. Flannick, J., Novak, A., Do, C.B., Srinivasan, B.S. & Batzoglou, S. Automatic parameter learning for multiple local network alignment. J Comput Biol 16, 1001-22 (2009).

44. Pinter, R.Y., Rokhlenko, O., Yeger-Lotem, E. & Ziv-Ukelson, M. Alignment of metabolic pathways. Bioinformatics 21, 3401-8 (2005).

45. Cheng, Q., Harrison, R. & Zelikovsky, A. MetNetAligner: a web service tool for metabolic network alignments. Bioinformatics 25, 1989-90 (2009).

46. Wernicke, S. & Rasche, F. Simple and fast alignment of metabolic pathways by exploiting local diversity. Bioinformatics 23, 1978-85 (2007).

47. Cakmak, A. & Ozsoyoglu, G. Mining biological networks for unknown pathways. Bioinformatics 23, 2775-83 (2007).

48. Johnson, M.E. & Hummer, G. Interface-Resolved Network of Protein-Protein Interactions. PLoS Comput Biol 9, e1003065 (2013).

49. Han, J.-D.J. et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88-93 (2004).

50. Xin, X. et al. SH3 interactome conserves general function over specific form. Mol Syst Biol 9, 652 (2013).

51. Reimand, J., Hui, S., Jain, S., Law, B. & Bader, G.D. Domain-mediated protein interaction prediction: From genome to network. FEBS Lett 586, 2751-63 (2012).

52. Reimand, J., Wagih, O. & Bader, G.D. The mutational landscape of phosphorylation signaling in cancer. Sci Rep 3 (2013).

53. Morris, J.H. et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12, 436 (2011).

54. Kuchaiev, O., Stevanovic, A., Hayes, W. & Przulj, N. GraphCrunch 2: Software tool for network modeling, alignment and clustering. BMC Bioinformatics 12, 24 (2011).

55. Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M. & Barton, G.J. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189-91 (2009).

56. Micale, G., Pulvirenti, A., Giugno, R. & Ferro, A. GASOLINE: a Greedy And Stochastic algorithm for optimal Local multiple alignment of Interaction NEtworks. PLoS ONE 9, e98750 (2014).

57. Ceol, A. et al. DOMINO: a database of domain-peptide interactions. Nucleic Acids Res 35, D557-60 (2007).

58. Berman, H.M. et al. The Protein Data Bank. Nucleic Acids Res 28, 235-242 (2000).

59. Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res 41, D1096-103 (2013).

60. Mosca, R., Ceol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat Methods 10, 47-53 (2013).

61. Tonikian, R. et al. Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins. PLoS Biol 7, e1000218 (2009).

62. Zhong, Q. et al. Edgetic perturbation models of human inherited disorders. Mol Syst Biol 5, 321 (2009).

63. Aranda, B. et al. PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat Methods 8, 528-9 (2011).

64. Isserlin, R., El-Badrawi, R.A. & Bader, G.D. The Biomolecular Interaction Network Database in PSI-MI 2.5. Database (Oxford) 2011, baq037 (2011).

65. Zuberi, K. et al. GeneMANIA prediction server 2013 update. Nucleic Acids Res 41, W115-22 (2013).

66. Michaut, M. et al. InteroPORC: automated inference of highly conserved protein interaction networks. Bioinformatics 24, 1625-31 (2008).

67. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180-3 (2002).

68. Gavin, A.C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141-7 (2002).

69. Krogan, N.J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637-43 (2006).

70. Hebert, A.S. et al. Neutron-encoded mass signatures for multiplexed proteome quantification. Nat Methods 10, 332-4 (2013).

71. Hebert, A.S. et al. NeuCode Mouse and One Hour Proteomes. in Eleventh International Symposium on Mass Spectrometry in the Health and Life Sciences: Molecular & Cellular Proteomics (ed. Burlingame, A.L.) (The American Society for Biochemistry and Molecular Biology, Inc., San Francisco, CA USA, 2014).

72. Huttlin, E. et al. High-Throughput Proteomic Mapping of Protein Interaction Networks: Toward a Global View of the Human Interactome. in Eleventh International Symposium on Mass Spectrometry in the Health and Life Sciences: Molecular & Cellular Proteomics (ed. Burlingame, A.L.) (The American Society for Biochemistry and Molecular Biology, Inc., San Francisco, CA USA, 2014).

73. Gingras, A.C. & Raught, B. Beyond hairballs: The use of quantitative mass spectrometry data to understand protein-protein interactions. FEBS Lett 586, 2723-31 (2012).

74. Jager, S. et al. Global landscape of HIV-human protein complexes. Nature 481, 365-70 (2012).

75. White, E.A. et al. Systematic identification of interactions between host cell proteins and E7 oncoproteins from diverse human papillomaviruses. Proc Natl Acad Sci U S A 109, E260-7 (2012).

76. Pichlmair, A. et al. Viral immune modulators perturb the human molecular network by common and unique strategies. Nature 487, 486-90 (2012).

77. Munday, D.C. et al. Using SILAC and quantitative proteomics to investigate the interactions between viral and host proteomes. Proteomics 12, 666-72 (2012).

78. Bisson, N. et al. Selected reaction monitoring mass spectrometry reveals the dynamics of signaling through the GRB2 adaptor. Nat Biotechnol 29, 653-8 (2011).

79. Song, J., Wang, Z. & Ewing, R.M. Integrated analysis of the Wnt responsive proteome in human cells reveals diverse and cell-type specific networks. Mol Biosyst 10, 45-53 (2014).

80. Song, J., Hao, Y., Du, Z., Wang, Z. & Ewing, R.M. Identifying novel protein complexes in cancer cells using epitope-tagging of endogenous human genes and affinity-purification mass spectrometry. J Proteome Res 11, 5630-41 (2012).

81. Stengel, F., Aebersold, R. & Robinson, C.V. Joining forces: integrating proteomics and cross-linking with the mass spectrometry of intact complexes. Mol Cell Proteomics 11, R111.014027 (2012).

82. Walzthoeni, T., Leitner, A., Stengel, F. & Aebersold, R. Mass spectrometry supported determination of protein complex structure. Curr Opin Struct Biol 23, 252-60 (2013).

83. Morris, J.H., Knudsen, G.M., Verschueren, E., Johnson, J.R., Cimermancic, P., Greninger, A.L. & Pico, A.R. Affinity Purification-Mass Spectrometry and Network Analysis to Understand Protein-Protein Interactions (accepted, pending publication). Nat Protoc (2014).

84. Orchard, S. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat Methods 9, 345-50 (2012).

85. Vizcaino, J.A. et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res 41, D1063-9 (2013).

86. Morris, J.H. et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12, 436 (2011).