Cytoscape and networks

David Amar
http://tau.ac.il/~davidama/bioinfo_tutorials

Workshop overview (and disclaimers)
- Cytoscape is one of the most widely used tools for network analysis
- We shall first cover the basics
- Our goal is to get a (good) taste, and to see how easy it is to get and analyze data
- We will then move to advanced analyses
- Cytoscape Apps and external tools
- The slides cover more than 2 hours; treat whatever we do not cover as exercises
- The slides show analysis in Cytoscape 3.1.1

Network biology

Overview: systems biology
- Represent molecular entities
- Represent interactions
- Two main data types: pathways and interaction networks

Biological interaction networks
- Nodes: genes or other molecules
- Edges: evidence for some interaction; can contain weights and directions
- Figure source: Magtanong et al. 2011, Nature

Biological interaction networks
- Nodes: genes/proteins or other molecules
- Edges: based on evidence for interaction
- Network types shown: gene co-expression, protein-protein interaction, genetic interaction
- Figure sources: Voineagu et al. 2011, Nature; Breker and Schuldiner 2009

Cytoscape
- Cytoscape is open-source software for integrating, visualizing, and analyzing networks.

Outline
- Basics
  - Load and visualize data
  - Customize
- Applications
  - Clustering
  - Enrichment analysis
  - GeneMANIA
  - Integrative analysis: ModMap and DICER
  - Gene expression analysis (including jActiveModules)
  - Creating co-expression matrices

Cytoscape Basics

Initial window

- The toolbar contains command buttons; the name of each button is shown when the mouse pointer hovers over it.
- Main Network View: initially blank.
- Control Panel: lists the available networks by name.
- Network Overview Pane
- Table Panel: can be used to display node, edge, and network table data.

Load data: import from databases

The initial window enables searching in the large public databases

Load data: import from databases
- Search example: by gene name

Choose databases

Import result
- The imported networks, listed by name
- Basic statistics

Look at a network
- The toolbar contains command buttons; the name of each button is shown when the mouse pointer hovers over it.
- Main Network View
- Control Panel: lists the available networks by name
- Network Overview Pane: move around!
- Table Panel: displays node, edge, and network table data

Search for a gene
- Information about the marked nodes

Load data: import all interactions

Import result

The new network

Load data: from files
- We sometimes have our own data
  - From papers
  - A special search in a database
  - Our experiment (e.g., correlation between genes)
- Common formats
  - SIF
  - A table
  - OWL for pathways: complex text, but easy to get and very informative once uploaded

Load from files

Contains an interaction network of 331 genes from Ideker et al. 2001, Science

Load data: from SIF files
- Text: name1 interaction_type name2

Load data: from a table
- From Excel files or tab-delimited text tables
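The SIF line layout above (name1, interaction type, name2) can be produced from a tab-delimited table with a few lines of Python. This is only a sketch: the column names `geneA`, `type`, `geneB` and the gene names are hypothetical placeholders, not taken from the workshop files.

```python
import csv
import io


def table_to_sif(table_text, source_col, type_col, target_col):
    """Convert a tab-delimited table (with a header row) into SIF lines."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        f"{row[source_col]}\t{row[type_col]}\t{row[target_col]}"
        for row in reader
    ]


# Hypothetical table: two interactions plus an extra score column,
# which SIF cannot hold and which is therefore dropped.
example = (
    "geneA\ttype\tgeneB\tscore\n"
    "YFG1\tpp\tYFG2\t0.9\n"
    "YFG2\tpd\tYFG3\t0.4\n"
)
sif_lines = table_to_sif(example, "geneA", "type", "geneB")
print("\n".join(sif_lines))
```

Extra columns such as the score are lost in SIF, which is one reason to import the table directly when edge attributes matter.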

- Set which columns hold the nodes and the interaction type
- OPTIONAL: click on the columns that you want to keep as attributes

Result

Load data: OWL
- Good for looking at pathways
- This example: data from the Reactome database

Load data: result

Directed edges: signaling

Zoom
- Focus on a selected region (nodes in yellow)

Zoom: result
- Move around

Get a sub-network
- The sub-network was created below the original network

Save the session
- We imported six networks
- Before we start modifying them, let's save the session
- File -> Save

Sanity check: close Cytoscape and load the session!

Remarks
- At this point we know how to load data from databases and files
- We can perform simple navigation, zoom, and save
- We saved different networks, and each had its own visualization rules
- A good habit that saves trouble: save a session for each visualization type
- Multiple networks, but keep a consistent visualization

Modifying and saving a visualization
- Cytoscape supports many visualization options
  - Layouts
  - Node size, color, label
  - Edge width, line type
- To save the graph as a high-quality image:

Change the layout

Organic layout

Circular layout

- Places all of the nodes in a circular arrangement.
- Very quick.
- Partitions the network into disconnected parts and independently lays out those parts.

Force-directed
- Tries to position the nodes so that there are as few crossing edges as possible (and so that the edges are of more or less equal length, if possible)

Change layout scale

Change the scale

Before: scale is 1

Scale is 8
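To make the effect of the scale factor concrete, here is a small sketch of a circular arrangement followed by a uniform scaling. It is an illustration only, not Cytoscape's layout code; the function names are made up for this example.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle of the given radius."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


def scale_layout(positions, factor):
    """Multiply every coordinate by the same factor (scale 1 changes nothing)."""
    return {node: (x * factor, y * factor) for node, (x, y) in positions.items()}


positions = circular_layout(["a", "b", "c", "d"])
spread = scale_layout(positions, 8)
for node in positions:
    print(node, positions[node], "->", spread[node])
```

Scaling keeps the arrangement and only increases the distances between nodes, which is why labels become readable at a larger scale.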

Style

Open and modify

The IntAct network: node color

Node color
- Each column represents some information that we have (this is a column in the node table data)
- Discrete: set a value for each type of information

Applications

Apps
- Cytoscape has many tools called Apps
- Install by going to Apps -> App Manager
- Applications support
  - Advanced analysis
  - Biological analysis
  - Integrating data
  - Import special data

I) Find and annotate dense areas
- Use an app that clusters the network
- Biological assumption
  - We look for protein communities
  - Many interactions within
  - Probably share function
- Gene function prediction

Step 1: remove duplicated edges
- Sometimes nodes are linked by more than one edge
- Multiple evidence for interaction
- Remove them for clustering and simpler visualization
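Outside Cytoscape, the same clean-up can be sketched in a few lines. The example assumes undirected edges, so A-B and B-A count as the same edge; that assumption is ours and would not hold for directed signaling networks.

```python
def remove_duplicate_edges(edges):
    """Keep the first occurrence of each undirected edge, in input order."""
    seen = set()
    unique = []
    for a, b in edges:
        key = frozenset((a, b))  # A-B and B-A map to the same key
        if key not in seen:
            seen.add(key)
            unique.append((a, b))
    return unique


edges = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "C")]
print(remove_duplicate_edges(edges))
```

The multiple evidence for an interaction is lost by this step, so it is best applied to a copy of the network.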

Step 2: use ClusterViz

Step 3: look at the results

- All clusters, sorted by size
- Select a cluster

Step 4: biological function?
- We discovered a cluster: a set of highly connected proteins
- What biological processes/functions are enriched in this cluster?
- Discover significantly over-represented biological functions, compared to creating random clusters

Step 4: BINGO

Select all nodes (Ctrl+A)

- Give the cluster a name (Cluster 1)
- Select human

Step 4: Results
- Summary table
- GO graph
- Only corrected p-values matter!
- Mark in the network

II) Analyze a gene set
- We have a set of genes we want to interpret
  - From papers
  - From data analysis
- We want to discover
  - Functional enrichments
  - How they interact among themselves and with similar genes
- Use GeneMANIA

Resources and installation
- Installing GeneMANIA may take >30 minutes
- Steps
  - Apps -> App Manager
  - Install GeneMANIA
  - Open GeneMANIA (Apps -> GeneMANIA)
  - Confirm data download
  - A new window will open: select human for this tutorial

GeneMANIA
- Our input: a set of genes from Hauser et al. 2005 (http://archneur.ama-assn.org/cgi/pmidlookup?view=long&pmid=15956162)
- HSPA1B, HSPA1A, DNAJC6, DNAJB2, UBE1, PARK5, SLC25A5, COX5B, COX6C, NDUFA3, ATP5I, HK1, COX4I1, ATP1B1, COX6B, SLC25A3, NDUFS5, ATP5O, UQCRH, ATP5C1, NDUFB8, ATP5G3, ATP5C1, VDAC3, COX4I1, COX7B, NDUFA9, ATP1B1, ATP6V0A1, ATP6V0D1, ATP6V0C, ATP6V1B2, SLC9A6, ATP61P1, ATP6V1D, ATP6V0B, ATP6V1A1, ATP6V1E1, GDI1, STXBP1, SYT1, VAMP1

GeneMANIA: input window

- Paste the gene names (or IDs) here, separated by spaces (no commas)
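Because the gene list on the previous slide is comma-separated, it has to be converted before pasting. A minimal sketch, shown on a short excerpt of the list; dropping repeated names is our own addition, not something the input window is known to require.

```python
def to_space_separated(raw):
    """Replace commas with spaces and drop repeated gene names, keeping order."""
    seen = set()
    genes = []
    for token in raw.replace(",", " ").split():
        if token not in seen:
            seen.add(token)
            genes.append(token)
    return " ".join(genes)


excerpt = "HSPA1B, HSPA1A, DNAJC6, ATP5C1, NDUFB8, ATP5G3, ATP5C1"
print(to_space_separated(excerpt))
# prints: HSPA1B HSPA1A DNAJC6 ATP5C1 NDUFB8 ATP5G3
```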

GeneMANIA: input window
- The recognized genes and their full names
- The types of supported networks
- For each interaction type there is a list of networks that can be marked
- Use physical interactions, pathways, and co-expression for our example

Results
- Information tables. For example: the detected functions
- The output network. Grey nodes are new genes that were added to improve the connectivity
- Mark a function: this automatically marks the relevant nodes
- The layout was changed to organic for better visualization

Highlight specific interactions

III) Analyze different interaction types
- Positive interactions are expected within families
- Negative interactions are expected between families
- Some networks contain both

- Members of a protein complex vs. members of parallel pathways

Analysis of network pairs
- Interaction types can differ: within (positive) vs. between (negative) functional units
- Input: networks H and G with the same vertex set
- Goal: summarize both networks in a module map
  - Node: a module, i.e., a gene set highly connected in H
  - Link: two modules highly interconnected in G
- Between-pathway models
  - Kelley and Ideker 2005
  - Ulitsky et al. 2008
  - Kelley and Kingsford 2011
  - Leiserson et al. 2011
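The module-map idea can be illustrated with plain edge densities: how connected a gene set is in H, and how interconnected two gene sets are in G. This toy sketch is not the scoring used by ModMap or by the cited methods; the networks and modules are invented.

```python
from itertools import combinations


def edge_set(edges):
    """Store undirected edges as frozensets, ignoring self loops."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def within_density(module, edges_h):
    """Fraction of gene pairs inside the module that are linked in H."""
    pairs = [frozenset(p) for p in combinations(sorted(module), 2)]
    if not pairs:
        return 0.0
    return sum(p in edges_h for p in pairs) / len(pairs)


def between_density(module_a, module_b, edges_g):
    """Fraction of cross-module gene pairs that are linked in G."""
    pairs = [frozenset((a, b)) for a in module_a for b in module_b]
    if not pairs:
        return 0.0
    return sum(p in edges_g for p in pairs) / len(pairs)


# Invented toy networks on the same vertex set {A, B, C, D, E}.
H = edge_set([("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")])
G = edge_set([("A", "D"), ("B", "E"), ("C", "D")])
m1, m2 = {"A", "B", "C"}, {"D", "E"}

print(within_density(m1, H), within_density(m2, H))  # density of each module in H
print(between_density(m1, m2, G))                    # density of the link in G
```

A real method would also ask whether these densities are higher than expected by chance, which plain fractions do not capture.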

Solution: ModMap
- Cytoscape app: under construction
- Currently: run the command line tool and upload the solution to Cytoscape

Problem example
- Combined analysis of yeast PPI and GI data
- Find GIs among complexes
- Our data: yeast networks
  - PPIs (yeast_ppi.txt)
  - GIs after treatment with MMS (yeast_gi.txt)
- Load the network: type interaction types
- Load the association of nodes to modules
- Color the results and set the layout

Get ModMap's solution
- Go to http://acgt.cs.tau.ac.il/modmap/
- Download the jar file for unweighted networks
- Open the command line (Run -> cmd)
- In the command line, navigate to the directory with the data and the jar file
- Use the cd command
- Required only in Windows (to move between drives)
- Now we want to run ModMap
- Type: java -jar ModMap_graphs.jar
- We get the options of the program
- For our example use: java -Xmx2000m -jar ModMap_graphs.jar yeast_ppi.txt yeast_gi.txt 1 3 0.005 0 1
- Command line arguments are separated by spaces
  - -Xmx2000m: Java can use more memory
  - -jar: tells Java that we are running a jar file
  - ModMap_graphs.jar: our software
  - yeast_ppi.txt, yeast_gi.txt: the networks as txt files
- Last four parameters are internal and used by the algorithms; use these by default
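If the run needs to be repeated, the same command can be assembled from a script. The sketch below only builds the argument list shown on the slide; the function names are ours, and actually launching it assumes Java and the jar file are available in the working directory.

```python
import subprocess


def modmap_command(ppi_file, gi_file,
                   params=("1", "3", "0.005", "0", "1"),
                   max_heap="2000m", jar="ModMap_graphs.jar"):
    """Build the argument list for the command shown above."""
    return ["java", f"-Xmx{max_heap}", "-jar", jar, ppi_file, gi_file, *params]


def run_modmap(command):
    """Launch the tool; raises CalledProcessError if Java reports a failure."""
    return subprocess.run(command, check=True)


command = modmap_command("yeast_ppi.txt", "yeast_gi.txt")
print(" ".join(command))
# run_modmap(command)  # uncomment once the jar and input files are in place
```

Passing the arguments as a list avoids quoting problems when file names contain spaces.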

New files will now appear in the directory

Back to Cytoscape: load the data
- Load the YeastData.xlsx file
- Important: we have several types

Load the network
- Load the YeastData.xlsx file
- Remove self loops and duplicated edges
- The network is large, so we tell Cytoscape to generate it

Load a clustering solution

Modmap_modules.txt file format (text file):
Node module_name
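Before importing, the two-column layout can be checked with a short script that groups nodes by module. The sketch assumes whitespace-separated columns and no header line, which should be verified against the real file; the node and module names in the example are invented.

```python
from collections import defaultdict


def parse_modules(text):
    """Group node names by module from lines of the form: node module_name."""
    modules = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        node, module = fields[0], fields[1]
        modules[module].append(node)
    return dict(modules)


# Invented sample content in the assumed format.
sample = "YAL001C\tmodule_1\nYAL002W\tmodule_1\n\nYAL003W\tmodule_2\n"
for module, nodes in parse_modules(sample).items():
    print(module, len(nodes), nodes)
```

The module sizes printed here are a quick sanity check against what the group attribute layout later shows in Cytoscape.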

Import Table: a way to add external information about the nodes

Right click and give it a name

Layout a clustering solution

Layout a clustering solution: results

- Unclustered nodes
- A circle for each cluster

Remove unclustered nodes
- Mark the selected nodes and create a sub-network

Remove self loops and duplicated edges

Zoom in on a part of the solution

Not informative enough: we cannot see edge types

Change the visualization style

Getting some insights
- Below we see a combined visualization of some of the modules
- Marked modules have many PPIs between them
- But their GI relations are not the same
- Let's run BINGO on each one!

Run BINGO

Results
- Both modules are related to translation
- Metabolic vs. ribosomal genes

Additional material and exercises

I) Overlay gene expression data
- Data in the exp_data directory
- Load the human PPI network (SIF file)
- Load gene p-value or fold-change results of a gene expression experiment
  - As before, using the Import -> Table option
- To filter genes with a score, use Group Attribute Layout and select the genes with the scores
- Genes with a score
- Set node color and size by the fold change
  - Continuous mapping (not discrete)
- Play with the layout
  - For example, group attribute layout
- Run BINGO on a selected sub-network

jActiveModules
- The previous analyses integrate results from non-network experiments
- But they are hard to understand when the networks are large
- Install jActiveModules
- Search for subnetworks that are well connected and have high node scores
- After installation open the App; on the left side you will see:
- We select to analyze the logP feature
- Need to reverse the order (here, the higher the score, the more significant the gene)
- Press Search

Results

BINGO results on module 5

(II) Differential co-expression (DC)
- Detect gene groups with altered correlation patterns when moving from one group to another
- Can detect better groups compared to standard differential expression
- DC patterns:
- Go to the ModMap page
- Also DICER (http://acgt.cs.tau.ac.il/dicer/)
- Download the sample data
- Running from the command line is very similar to the ModMap examples in previous slides
- The results can be loaded for analysis in Expander (http://acgt.cs.tau.ac.il/expander/)

Running DICER (ModMap is similar)
- In the diff_coexpression file:
  - Top3000.txt: gene expression data
  - classesFile: partition of the columns into conditions
- Command (files with the results will appear in the folder): java -Xmx2000m -jar dicer.jar Top3000.txt classesFile.txt 0 dicer_output

Output files
- _Modules.txt: modules that exhibit DC between
- _metaGraph.sif: (unweighted) module links
- _metaGraph.txt: module link weights (can be loaded into Cytoscape as an edge attribute)
- dicer_output: a file that contains all gene groups, including modules and clusters

Analysis in Cytoscape: ExpressionCorrelation
- Calculate co-expression networks from gene expression data
  - Both for the genes and the conditions
  - Based on a predefined threshold
  - Can be done for a subset of the conditions

Step 1: load the expression matrix
- File -> Import -> Table -> From file
- Make sure: change the first column type to ab
- Where to Import Table Data: keep as unassigned table

Step 2: correlation matrix threshold
- Advanced options -> preview gene histogram
- Current bug: you need to load the dataset twice with different names to get the following window; call them AD1 and AD2
- Advanced option: select columns WGAAD only
- Create Histogram and change the cutoffs to:

Create the network
- We now have condition-specific correlation matrices
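What the app computes can be sketched as follows: a Pearson correlation for every gene pair over a chosen subset of columns, keeping the pairs beyond a cutoff. This is an illustration under our own assumptions, not the app's code; the expression values, gene names, and the cutoffs of -0.9 and 0.9 are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expression, columns, low, high):
    """Return (gene1, gene2, r) for pairs with r <= low or r >= high,
    using only the given column indices (one condition group)."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        x = [expression[g1][c] for c in columns]
        y = [expression[g2][c] for c in columns]
        r = pearson(x, y)
        if r <= low or r >= high:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Invented matrix: columns 0-3 are one condition group, columns 4-7 another.
expression = {
    "g1": [1, 2, 3, 4, 1, 2, 3, 4],
    "g2": [2, 4, 6, 8, 8, 6, 4, 2],
    "g3": [1, -1, 1, -1, 1, -1, 1, -1],
}
edges_first = coexpression_edges(expression, [0, 1, 2, 3], -0.9, 0.9)
edges_second = coexpression_edges(expression, [4, 5, 6, 7], -0.9, 0.9)
print(edges_first)
print(edges_second)
```

In this toy matrix the same gene pair is positively correlated in one group and negatively correlated in the other, which is the kind of change that differential co-expression analysis looks for.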

Do it yourself
- Create the same network using the WGACON columns of the expression data
  - WGAAD was built using AD1; use AD2 here
- Load the _Modules.txt file as node attributes
  - In the load table window select "To selected networks only" and make sure both AD1 and AD2 were selected
- Create the subnetworks of down modules 0 and 1
  - Use group attribute layout
  - Select the modules
  - Create the subnetwork

Result

- Correlations in AD
- Correlations in Control

(III) CyToStruct
- An App that allows using structural biology tools directly from your session
- Nir Ben-Tal's group
- See https://bitbucket.org/sergeyn/cytostruct/wiki/Demos%20(case%20studies) for a tutorial
