Cytoscape and networks
David Amar
http://tau.ac.il/~davidama/bioinfo_tutorials

Workshop overview (and disclaimers)
  Cytoscape is the most used tool for network analysis
  We shall first cover the basics
    Our goal is to get a (good) taste, and to see how easy it is to get and analyze data
  We will then move to advanced analyses
    Cytoscape Apps and external tools
  Slides cover >2 hours; take what we will not cover as exercises
  The slides show analysis in Cytoscape 3.1.1

Network biology
  Overview: systems biology
    Represent molecular entities
    Represent interactions
  Two main data types
    Pathways
    Interaction networks

Biological interaction networks
  Nodes: genes or other molecules
  Edges: evidence for some interaction; can contain weights and directions
  (Magtanong et al. 2011 Nature)

Biological interaction networks
  Nodes: genes/proteins or other molecules
  Edges: based on evidence for interaction
    Gene co-expression
    Protein-protein interaction
    Genetic interaction
  (Voineagu et al. 2011 Nature; Breker and Schuldiner 2009)

Cytoscape
  Cytoscape is open source software for integrating, visualizing, and analyzing networks.

Outline
  Basics
    Load and visualize data
    Customize
  Applications
    Clustering
    Enrichment analysis
    GeneMANIA
    Integrative analysis: ModMap and DICER
    Gene expression analysis (including jActiveModules)
    Creating co-expression matrices

Cytoscape Basics

Initial window
  The toolbar contains command buttons; a button's name is shown when the mouse pointer hovers over it
  Main Network View: initially blank
  Control Panel: lists the available networks by name
  Network Overview Pane
  Table Panel: can be used to display node, edge, and network table data

Load data: import from databases
  The initial window enables searching in the big public databases
  Search example: by gene name
  Choose databases

Import result
  The imported networks by name
  Basic statistics

Look at a network
  The toolbar contains command buttons; a button's name is shown when the mouse pointer hovers over it
  Main Network View
  Control Panel: lists the available networks by name
  Network Overview Pane: move around!
  Table Panel: displays node, edge, and network table data

Search for a gene
  Information about the marked nodes

Load data: import all interactions

Import result
  The new network

Load data: from files
  We sometimes have our own data
    From papers
    A special search in a database
    Our experiment (e.g., correlation between genes)
  Common formats
    SIF
    A table
    OWL: for pathways; complex text, but easy to get and very informative once uploaded

Load from files
  The file contains an interaction network of 331 genes from Ideker et al. 2001 Science

Load data: from SIF files
  Text: name1 interaction_type name2

Load data: from a table
  From Excel files or tab-delimited text tables
  Set where to look for the nodes and the interaction type
  OPTIONAL: click on the columns that you want to keep as attributes

Result
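
To make the two text formats above concrete, here is a minimal Python sketch that reads SIF lines (name1 interaction_type name2) and a tab-delimited table into a list of edges. The function names and the column names in the usage below are hypothetical. The sketch assumes that a SIF line may list several targets after the interaction type, and that tabs are the delimiter whenever the file contains any tab. It is an illustration, not Cytoscape's importer.

```python
import csv
import io


def parse_sif(text):
    """Parse SIF text into (source, interaction_type, target) tuples."""
    # Assumption: if the file contains any tab, tabs are the delimiter;
    # otherwise any run of whitespace separates the fields.
    use_tabs = "\t" in text
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t") if use_tabs else line.split()
        if len(fields) < 3:
            continue  # a node listed alone, without edges
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


def parse_table(text, source_col, target_col, type_col):
    """Read a tab-delimited table with a header row into edge tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row[source_col], row[type_col], row[target_col]) for row in reader]


if __name__ == "__main__":
    print(parse_sif("A pp B\nC pd D E\n"))
    table = "src\tkind\tdst\tscore\nA\tpp\tB\t0.9\n"
    print(parse_table(table, "src", "dst", "kind"))
```

Columns that are not named as source, target, or type (score in the usage above) correspond to the optional attribute columns of the table import dialog; the sketch simply ignores them.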

Load data: OWL
  Good for looking at pathways
  This example: data from the Reactome database

Load data: result
  Directed edges: signaling

Zoom
  Focus on a selected region (nodes in yellow)

Zoom: result
  Move around

Get a sub-network
  The sub-network was created below the original network

Save the session
  We imported six networks
  Before we start modifying them, let's save the session
  File -> Save
  Sanity check: close Cytoscape and load the session!

Remarks
  At this point we know how to load data from databases and files
  We can perform simple navigation, zoom, and save
  We saved different networks, each with its own visualization rules
  A good habit that saves trouble: save a session for each visualization type
    Multiple networks, but keep a consistent visualization

Modifying and saving a visualization
  Cytoscape supports many visualization options
    Layouts
    Node size, color, label
    Edge width, line type
  To save the graph as a high quality image:
Change the layout
Organic layout
Circular layout
  Places all of the nodes in a circular arrangement
  Very quick
  Partitions the network into disconnected parts and independently lays out those parts

Force-directed
  Tries to position the nodes so that there are as few crossing edges as possible (and so that the edges are of more or less equal length, if possible)

Change layout scale
Change the scale
Before: scale is 1
Scale is 8
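
As a rough illustration of the circular layout and the scale setting described above, the following Python sketch places nodes evenly on a circle and then multiplies all coordinates by a factor. The function names are hypothetical, and scaling about the origin is an assumption; this is not the layout code that Cytoscape runs.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle of the given radius around the origin."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


def scale_layout(positions, factor):
    """Multiply every coordinate by factor (scale 1 leaves the layout as is)."""
    return {node: (x * factor, y * factor) for node, (x, y) in positions.items()}


if __name__ == "__main__":
    positions = circular_layout(["a", "b", "c", "d"])
    spread = scale_layout(positions, 8)  # same shape, nodes 8 times further apart
    for node, (x, y) in spread.items():
        print(node, round(x, 2), round(y, 2))
```

Scaling keeps the shape of the layout and only changes the distances between nodes, which is why a larger scale helps when labels overlap.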
Style
  Open and modify

The IntAct network: node color
  Node color
  Each column represents some information that we have (this is a column in the node table data)
  Discrete: set a value for each type of information

Applications

Apps
  Cytoscape has many tools called Apps
  Install by going to Apps -> App Manager
  Applications support
    Advanced analysis
    Biological analysis
    Integrating data
    Import special data

I) Find and annotate dense areas
  Use an app that clusters the network
  Biological assumption
    We look for protein communities
    Many interactions within
    Probably share function
  Gene function prediction

Step 1: remove duplicated edges
  Sometimes nodes are linked by more than one edge
    Multiple evidence for interaction
  Remove them for clustering and simpler visualization
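
The effect of this step can be sketched in a few lines of Python. The function name is hypothetical, edges are treated as undirected (an assumption), and self loops are dropped as well by default. This is an illustration of the idea, not the Cytoscape menu command.

```python
def simplify_edges(edges, drop_self_loops=True):
    """Keep one undirected edge per node pair, optionally dropping self loops."""
    seen = set()
    simple = []
    for a, b in edges:
        if a == b and drop_self_loops:
            continue
        key = frozenset((a, b))  # (A, B) and (B, A) count as the same edge
        if key not in seen:
            seen.add(key)
            simple.append((a, b))
    return simple


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "A"), ("A", "A"), ("A", "C")]
    print(simplify_edges(edges))
```

The first occurrence of each pair is kept, so any evidence type attached to later duplicates is lost; that is acceptable for clustering but not for an analysis that needs the evidence.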
Step 2: use ClusterViz
Step 3: look at the results
  All clusters
    Sorted by size
  Select a cluster

Step 4: biological function?
  We discovered a cluster
    A set of highly connected proteins
  What biological processes/functions are enriched in this cluster?
  Discover significantly over-represented biological functions
    Compared to creating random clusters
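
Over-representation compared to random clusters is commonly scored with a hypergeometric test, followed by a multiple-testing correction across all tested functions. The Python sketch below shows both steps with hypothetical function names. Treat it as an assumption about the general approach, not a description of what any particular app computes internally.

```python
from math import comb


def hypergeom_pvalue(cluster_size, hits_in_cluster, annotated_total, population_size):
    """P(at least hits_in_cluster annotated genes) in a random cluster of the same size."""
    max_hits = min(cluster_size, annotated_total)
    total = comb(population_size, cluster_size)
    tail = sum(
        comb(annotated_total, k)
        * comb(population_size - annotated_total, cluster_size - k)
        for k in range(hits_in_cluster, max_hits + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical numbers: 20 genes in the cluster, 8 carry the function,
    # 100 of 6000 genes in the background carry it.
    p = hypergeom_pvalue(20, 8, 100, 6000)
    print(p, benjamini_hochberg([p, 0.04, 0.5]))
```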

Step 4: BiNGO
  Select all nodes (Ctrl+A)
  Give the cluster a name (Cluster 1)
  Select human

Step 4: Results
  Summary table
  GO graph
  Only corrected p-values matter!!!
  Mark in the network

II) Analyze a gene set
  We have a set of genes we want to interpret
    From papers
    From data analysis
  We want to discover
    Functional enrichments
    How they interact among themselves and with similar genes
  Use GeneMANIA

Resources and installation
  Installing GeneMANIA may take >30 minutes
  Steps
    Apps -> App Manager
    Install GeneMANIA
    Open GeneMANIA (Apps -> GeneMANIA)
    Confirm data download
    A new window will open: select human for this tutorial

GeneMANIA
  Our input: a set of genes from Hauser et al. 2005 (http://archneur.ama-assn.org/cgi/pmidlookup?view=long&pmid=15956162)
  HSPA1B, HSPA1A, DNAJC6, DNAJB2, UBE1, PARK5, SLC25A5, COX5B, COX6C, NDUFA3, ATP5I, HK1, COX4I1, ATP1B1, COX6B, SLC25A3, NDUFS5, ATP5O, UQCRH, ATP5C1, NDUFB8, ATP5G3, ATP5C1, VDAC3, COX4I1, COX7B, NDUFA9, ATP1B1, ATP6V0A1, ATP6V0D1, ATP6V0C, ATP6V1B2, SLC9A6, ATP61P1, ATP6V1D, ATP6V0B, ATP6V1A1, ATP6V1E1, GDI1, STXBP1, SYT1, VAMP1

GeneMANIA: input window
  Paste the gene names (or IDs) here, separated by spaces (no commas)
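
Since the gene list is given with commas and the input box expects spaces, a small Python helper can do the conversion. The function name is hypothetical. Dropping repeated names (the list above contains a few) is a convenience added here, not something the input window is known to require.

```python
def to_space_separated(gene_text):
    """Turn a comma-separated gene list into a space-separated one without repeats."""
    seen = set()
    genes = []
    for token in gene_text.replace(",", " ").split():
        if token not in seen:  # keep the first occurrence only
            seen.add(token)
            genes.append(token)
    return " ".join(genes)


if __name__ == "__main__":
    print(to_space_separated("HSPA1B, HSPA1A, ATP5C1, NDUFB8, ATP5G3, ATP5C1"))
```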

GeneMANIA: input window
  The recognized genes and their full names
  The types of supported networks
  For each interaction type there is a list of networks that can be marked
  Use physical interactions, pathways, and co-expression for our example

Results
  Information tables, for example the detected functions
  The output network: grey nodes are new genes that were added to improve the connectivity
  Mark a function: automatically marks the relevant nodes
  Layout was modified to organic for better visualization

Highlight specific interactions

III) Analyze different interaction types
  Positive expected within families
  Negative expected between families
  Some networks contain both
  Members of protein complex vs. members of parallel pathways

Analysis of network pairs
  Interaction types can differ: within (positive) vs. between (negative) functional units
  Input: networks H, G with the same vertex set
  Goal: summarize both networks in a module map
    Node: a module, a gene set highly connected in H
    Link: connects two modules that are highly interconnected in G
  Between-pathway models
    Kelley and Ideker 2005
    Ulitsky et al. 2008
    Kelley and Kingsford 2011
    Leiserson et al. 2011

Solution: ModMap
  Cytoscape app: under construction
  Currently: run the command line tool and load the solution into Cytoscape

Problem example
  Combined analysis of yeast PPI and GI data
    Find GIs among complexes
  Our data: yeast networks
    PPIs (yeast_ppi.txt)
    GIs after treatment with MMS (yeast_gi.txt)
  Load the network: set the interaction types
  Load the association of nodes to modules
  Color the results and set the layout

Get the ModMap solution
  Go to http://acgt.cs.tau.ac.il/modmap/
  Download the jar file for unweighted networks
  Open the command line (Run -> cmd)
  In the command line, navigate to the directory with the data and the jar file
    Use the cd command
    Required only in Windows (to move between drives)
  Now we want to run ModMap
    Type: java -jar ModMap_graphs.jar
    We get the options of the program
  For our example use: java -Xmx2000m -jar ModMap_graphs.jar yeast_ppi.txt yeast_gi.txt 1 3 0.005 0 1
    Command line arguments are separated by spaces
    -Xmx2000m: Java can use more memory (a maximum heap of 2000 MB)
    -jar: tells Java that we are running a jar file
    ModMap_graphs.jar: our software
    yeast_ppi.txt, yeast_gi.txt: the networks as txt files
    The trailing numeric parameters are internal to the algorithms; use these values by default
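
For repeated runs, the command above can be assembled in a script instead of typed by hand. The Python sketch below only builds and prints the argument list from the slide; the function name and its keyword arguments are hypothetical, and the meaning of the trailing numbers is not documented here, so they are passed through unchanged.

```python
import shlex


def modmap_command(ppi_file, gi_file,
                   params=("1", "3", "0.005", "0", "1"),
                   jar="ModMap_graphs.jar", max_heap="2000m"):
    """Build the argument list for the ModMap run shown in the slides."""
    # params are the trailing values from the slide, kept as opaque strings.
    return ["java", f"-Xmx{max_heap}", "-jar", jar, ppi_file, gi_file, *params]


if __name__ == "__main__":
    cmd = modmap_command("yeast_ppi.txt", "yeast_gi.txt")
    print(shlex.join(cmd))
```

To actually launch the tool, the list can be handed to subprocess.run(cmd, check=True) from the directory that holds the jar file and the two network files.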
  New files will now appear in the directory

Back to Cytoscape: load the data
  Load the YeastData.xlsx file
  Important: we have several types

Load the network
  Load the YeastData.xlsx file
  Remove self loops and duplicated edges
  The network is large, so we tell Cytoscape to generate it

Load a clustering solution
  Modmap_modules.txt file format (text file): Node module_name
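
The two-column module file can also be inspected outside Cytoscape. The Python sketch below reads node and module_name pairs and counts how many edges of a second network run between each pair of modules, which is the kind of link a module map summarizes. It assumes whitespace-separated columns and no header line; the function names and the gene and module names in the usage are made up, and this is not ModMap's algorithm.

```python
from collections import defaultdict


def read_modules(text):
    """Parse 'node module_name' lines into a node -> module dictionary."""
    node_to_module = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            node_to_module[parts[0]] = parts[1]
    return node_to_module


def count_links_between_modules(edges, node_to_module):
    """Count edges whose two endpoints lie in different modules."""
    counts = defaultdict(int)
    for a, b in edges:
        module_a, module_b = node_to_module.get(a), node_to_module.get(b)
        if module_a is None or module_b is None or module_a == module_b:
            continue  # unclustered node or within-module edge
        counts[tuple(sorted((module_a, module_b)))] += 1
    return dict(counts)


if __name__ == "__main__":
    modules = read_modules("g1 M1\ng2 M1\ng3 M2\ng4 M2\n")
    gi_edges = [("g1", "g3"), ("g2", "g4"), ("g1", "g2"), ("g1", "g9")]
    print(count_links_between_modules(gi_edges, modules))
```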
  Import Table: a way to add external information about the nodes
  Right click and give it a name

Layout a clustering solution

Layout a clustering solution: results
  Unclustered nodes
  A circle for each cluster

Remove unclustered nodes
  Mark the selected nodes and create a sub-network
  Remove self loops and duplicated edges

Zoom in on a part of the solution
  Not informative enough: we cannot see edge types

Change the visualization style

Getting some insights
  Below we see a combined visualization of some of the modules
  Marked modules have many PPIs between them
  But their GI relations are not the same
  Let's run BiNGO on each one!

Run BiNGO

Results
  Both modules are related to translation
  Metabolic vs. ribosomal genes

Additional material and exercises

I) Overlay gene expression data
  Data in the exp_data directory
  Load the human PPI network (SIF file)
  Load gene p-value or fold-change results of a gene expression experiment
    As before, using the Import -> Table option
  To filter genes with a score, use Group Attribute Layout and select the genes with the scores
    Genes with a score
  Set node color and size by the fold change
    Continuous mapping (not discrete)
  Play with the layout
    For example, group attribute layout
  Run BiNGO on a selected sub-network
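
A continuous mapping turns a numeric column such as the fold change into a color by interpolation, instead of assigning one color per category as a discrete mapping does. The Python sketch below shows the idea with a linear blue-to-red scale; the function name, the colors, and the value range are arbitrary choices for illustration, not Cytoscape defaults.

```python
def continuous_color(value, low, high, low_rgb=(0, 0, 255), high_rgb=(255, 0, 0)):
    """Linearly interpolate an RGB color for a value clamped to [low, high]."""
    if high <= low:
        raise ValueError("high must be larger than low")
    clamped = min(max(value, low), high)
    t = (clamped - low) / (high - low)  # 0 at low, 1 at high
    return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))


if __name__ == "__main__":
    # Hypothetical log fold changes for three genes.
    for gene, fold_change in [("g1", -2.0), ("g2", 0.0), ("g3", 3.5)]:
        print(gene, continuous_color(fold_change, -2.0, 2.0))
```

Node size can be mapped the same way by interpolating between a minimum and a maximum diameter.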

jActiveModules
  The previous analyses integrate results from non-network experiments
    But they are hard to understand when the networks are large
  Install jActiveModules
    Searches for subnetworks that are well connected and have high node scores
  After installation, open the App; on the left side you will see:
  We select the logP feature for analysis
  Need to reverse the order (here, the higher the score, the more significant the gene)
  Press Search

Results
  BiNGO results on module 5

(II) Differential co-expression (DC)
  Detect gene groups with altered correlation patterns when moving from one group to another
  Can detect better groups compared to standard differential expression
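
The basic quantity behind differential co-expression can be shown for a single gene pair: compute the correlation of the two expression profiles separately within each group of samples and compare. The Python sketch below does this with Pearson correlation. The function names, the class labels, and the numbers are made up for illustration, and tools such as DICER build considerably more on top of this idea.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def correlation_by_class(gene_a, gene_b, classes):
    """Correlation of two genes computed separately within each sample class."""
    result = {}
    for label in sorted(set(classes)):
        idx = [i for i, c in enumerate(classes) if c == label]
        result[label] = pearson([gene_a[i] for i in idx], [gene_b[i] for i in idx])
    return result


if __name__ == "__main__":
    classes = ["case"] * 4 + ["control"] * 4
    gene_a = [1, 2, 3, 4, 1, 2, 3, 4]
    gene_b = [2, 4, 6, 8, 8, 6, 4, 2]
    # The pair is positively correlated in one class and negatively in the other.
    print(correlation_by_class(gene_a, gene_b, classes))
```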

(II) Differential co-expression (DC)
  DC patterns (figure)
  Go to the ModMap page
    Also DICER (http://acgt.cs.tau.ac.il/dicer/)
  Download the sample data
  Running from the command line is very similar to the ModMap examples in previous slides
  The results can be loaded for analysis in Expander (http://acgt.cs.tau.ac.il/expander/)

Running DICER (ModMap is similar)
  In the diff_coexpression file:
    Top3000.txt: gene expression data
    classesFile: partition of the columns into conditions
  Command (files with the results will appear in the folder): java -Xmx2000m -jar dicer.jar Top3000.txt classesFile.txt 0 dicer_output

Output files
  _Modules.txt: modules that exhibit DC between
  _metaGraph.sif: (unweighted) module links
  _metaGraph.txt: module link weight (can be loaded into Cytoscape as an edge attribute)
  dicer_output: a file that contains all gene groups, including modules and clusters

Analysis in Cytoscape: ExpressionCorrelation
  Calculate co-expression networks from gene expression data
    Both for the genes and the conditions
    Based on a predefined threshold
    Can be done for a subset of the conditions

Step 1: load the expression matrix
  File -> Import -> Table -> From file
  Make sure:
    Change the first column type to ab
    Where to Import Table Data: keep as unassigned table

Step 2: correlation matrix threshold
  Advanced options -> preview gene histogram
  Current bug: the dataset needs to be loaded twice under different names to get the following window; call them AD1 and AD2
  Advanced option: select the WGAAD columns only
  Create Histogram and change the cutoffs to:

Create the network
  We now have condition-specific correlation matrices
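
What a thresholded correlation network amounts to can be sketched in plain Python: compute the correlation for every gene pair over the chosen columns and keep the pairs beyond a cutoff. The function names, the low and high cutoff parameters, and the toy matrix are assumptions for illustration; the app's own defaults and implementation may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, low_cutoff, high_cutoff):
    """Return (gene_a, gene_b, r) for pairs with r <= low_cutoff or r >= high_cutoff."""
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if r <= low_cutoff or r >= high_cutoff:
            edges.append((a, b, r))
    return edges


if __name__ == "__main__":
    # Toy matrix: gene -> expression values over the selected columns.
    expression = {
        "g1": [1, 2, 3, 4],
        "g2": [2, 4, 6, 8],
        "g3": [4, 3, 2, 1],
        "g4": [1, -1, -1, 1],
    }
    for a, b, r in coexpression_edges(expression, -0.95, 0.95):
        print(a, b, round(r, 3))
```

Running the same function on two different column subsets gives one network per condition, which is what the comparison of condition-specific correlation matrices relies on.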

Do it yourself
  Create the same network using the WGACON columns of the expression data
    WGAAD was built using AD1; use AD2 here
  Load the _Modules.txt file as node attributes
    In the load table window, select To selected networks only and make sure both AD1 and AD2 are selected
  Create the subnetworks of down modules 0 and 1
    Use group attribute layout
    Select the modules
    Create the subnetwork

Result
  Correlations in AD
  Correlations in Control

(III) CyToStruct
  An App that allows using Structural Biology tools directly from your session
  Nir Ben-Tal's group
  See https://bitbucket.org/sergeyn/cytostruct/wiki/Demos%20(case%20studies) for a tutorial