Joint analysis of regulatory networks and expression profiles

Ron Shamir
School of Computer Science, Tel Aviv University
April 2013

Sources:
Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics Vol. 25 no. 9, 1158-1164 (2009).

Outline
- Background
- Joint network and expression profiles
- Matisse
- Cezanne

Background

DNA -> RNA -> protein (transcription, then translation)

The hard disk (DNA), one program (RNA), its output (protein)

DNA Microarrays / RNA-seq
- Simultaneous measurement of expression levels of all genes / transcripts.
- Perform $10^5$-$10^9$ measurements in one experiment.
- Allow a global view of cellular processes.
- Among the most important biotechnological breakthroughs of the last/current decade.

The Raw Data (figure: http://www.biomedcentral.com/1471-2105/12/323/figure/F2)
- Matrix of genes x experiments
- Entries of the Raw Data matrix: expression levels. Ratios/absolute values/
- Expression pattern for each gene
- Profile for each experiment/condition/sample/chip
- Needs normalization!

EXPANDER: EXPression ANalyzer and DisplayER (http://acgt.cs.tau.ac.il/expander)
- Clustering: identify clusters of co-expressed genes (CLICK, KMeans, SOM, hierarchical)
- Functional enrichment: GO, TANGO
- Visualization
- Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
- Biclustering: identify homogeneous submatrices (SAMBA)
- microRNA function inference: FAME
References: A. Maron, R. Sharan, Bioinformatics 03; A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon, BMC Bioinformatics 05; Ulitsky et al., Nature Protocols 10.

Networks of protein-protein interactions (PPIs)
- Large, readily available resource
- Representation: network with nodes = proteins/genes, edges = interactions
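The network representation above (nodes = proteins/genes, edges = interactions) can be sketched as an adjacency map. This is a minimal illustration, not code from any of the tools named in these slides; the helper names `build_network` and `is_connected` and the toy protein names are assumptions made for the example.

```python
from collections import deque

def build_network(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected(adj, nodes):
    """Return True if `nodes` induce a connected subnetwork of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            # Only walk along edges that stay inside the candidate node set.
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

# Toy interactions between hypothetical proteins A..E.
ppi = build_network([("A", "B"), ("B", "C"), ("D", "E")])
```

A connectivity check of this kind is what "connected subnetwork" refers to later in the deck: a candidate gene set only qualifies if every member can be reached from every other through interactions inside the set.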


Analysis methods:
- Global properties
- Motif content analysis
- Complex extraction
- Cross-species comparison

The hairball syndrome

Potential inroad into pathways and function
Can the network help to improve the analysis?

Analysis of gene expression profiles + a network

Goal
Challenge: detect active functional modules: connected subnetworks of proteins whose genes are co-expressed.
Where is the action in the network in a particular experiment?

Ron Shamir, RNA Antalia, April 08


MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts
Ulitsky & Shamir, BMC Systems Biology 07
http://acgt.cs.tau.ac.il/matisse
- Input: expression data and a PPI network
- Output: a collection of modules
  - Connected PPI subnetworks
  - Correlated expression profiles
Figure legend: interaction; high expression similarity
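"Correlated expression profiles" requires a pairwise similarity between genes. The sketch below uses the Pearson correlation as one common choice; that choice, the function names, and the toy profiles are assumptions for illustration, since the slides only state that any similarity data can be used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

def similarity_matrix(profiles):
    """Map each sorted gene pair (g, h) to the similarity of their profiles."""
    genes = sorted(profiles)
    sim = {}
    for idx, g in enumerate(genes):
        for h in genes[idx + 1:]:
            sim[(g, h)] = pearson(profiles[g], profiles[h])
    return sim

# Hypothetical profiles: three genes measured under four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
sim = similarity_matrix(profiles)
```

Here `g1` and `g2` rise together and are maximally similar, while `g3` falls as `g1` rises and is maximally dissimilar.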

Probabilistic model
- $S_{ij}$: expression similarity of genes $i$ and $j$
- Event $M_{ij}$: $i, j$ are mates = highly co-expressed
- $P(S_{ij} \mid M_{ij}) \sim N(\mu_m, \sigma_m^2)$
- $P(S_{ij} \mid \neg M_{ij}) \sim N(\mu_n, \sigma_n^2)$
- $H_0$: $U$ is a set of unrelated genes
- $H_1$: $U$ is a module = connected subnetwork with high internal similarity
- $R_i$: gene $i$ is transcriptionally regulated
- $\beta_m$: fraction of mates out of module gene pairs that are transcriptionally regulated, $\beta_m = P(M_{ij} \mid R_i \wedge R_j, H_1)$
- $p_m$: fraction of mates out of all gene pairs that are transcriptionally regulated

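Probabilistic model (2)
Is a connected gene set $U$ a module? Assuming pair independence:
- Define $\beta^m_{ij} = \beta_m P(R_i) P(R_j)$
- Define $\beta^n_{ij} = p_m P(R_i) P(R_j)$
- Likelihood ratio: $\Pr(\mathrm{Data} \mid H_1) / \Pr(\mathrm{Data} \mid H_0)$
- Taking log: a sum of one term per gene pair $i, j$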
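The mixture form implied by these definitions can be written out explicitly. The following is a reconstruction under the stated pair-independence assumption, not a formula quoted from the slides. The symbols $f_m$ and $f_n$ are introduced here for the Gaussian densities $N(\mu_m, \sigma_m^2)$ and $N(\mu_n, \sigma_n^2)$, $w_{ij}$ names the per-pair log term, and $\beta^m_{ij}$, $\beta^n_{ij}$ denote the mate probabilities of the pair $i, j$ under $H_1$ and $H_0$ as defined above.

```latex
\frac{\Pr(\mathrm{Data} \mid H_1)}{\Pr(\mathrm{Data} \mid H_0)}
  = \prod_{\substack{i, j \in U \\ i < j}}
    \frac{\beta^m_{ij}\, f_m(S_{ij}) + (1 - \beta^m_{ij})\, f_n(S_{ij})}
         {\beta^n_{ij}\, f_m(S_{ij}) + (1 - \beta^n_{ij})\, f_n(S_{ij})}

\log \frac{\Pr(\mathrm{Data} \mid H_1)}{\Pr(\mathrm{Data} \mid H_0)}
  = \sum_{\substack{i, j \in U \\ i < j}} w_{ij},
\qquad
w_{ij} = \log
    \frac{\beta^m_{ij}\, f_m(S_{ij}) + (1 - \beta^m_{ij})\, f_n(S_{ij})}
         {\beta^n_{ij}\, f_m(S_{ij}) + (1 - \beta^n_{ij})\, f_n(S_{ij})}
```

Each pair contributes a positive term when its similarity is better explained by the module mixture than by the background mixture, and a negative term otherwise.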

Probabilistic model - summary
- Similarities: mixture of two Gaussians
- For a candidate group $U$, compute the likelihood ratio of originating from a module versus from the background
- Module score $W_U$ = gene group log-likelihood ratio = sum over all the gene pairs in $U$
- Find connected subgraphs $U$ with high $W_U$
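A minimal sketch of how such a score could be computed, assuming for simplicity that $P(R_i) = 1$ for every gene, so that the per-pair mate probabilities reduce to $\beta_m$ and $p_m$. The function names, the parameter values in `params`, and the toy similarities are all hypothetical; this is not the MATISSE implementation.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def pair_weight(s, params):
    """Log-likelihood ratio of similarity s: module mixture vs. background mixture."""
    f_m = normal_pdf(s, params["mu_m"], params["sd_m"])  # mates
    f_n = normal_pdf(s, params["mu_n"], params["sd_n"])  # non-mates
    module = params["beta_m"] * f_m + (1.0 - params["beta_m"]) * f_n
    background = params["p_m"] * f_m + (1.0 - params["p_m"]) * f_n
    return math.log(module / background)

def module_score(genes, sim, params):
    """Sum pair weights over all gene pairs in the group that have a similarity."""
    genes = sorted(genes)
    total = 0.0
    for idx, g in enumerate(genes):
        for h in genes[idx + 1:]:
            if (g, h) in sim:  # pairs without a similarity value contribute 0
                total += pair_weight(sim[(g, h)], params)
    return total

# Hypothetical mixture parameters (assumed, not estimated from data).
params = {"mu_m": 0.7, "sd_m": 0.2, "mu_n": 0.0, "sd_n": 0.3,
          "beta_m": 0.9, "p_m": 0.1}

# Toy similarities keyed by sorted gene pairs.
sim = {("a", "b"): 0.8, ("a", "c"): 0.75, ("b", "c"): 0.85, ("a", "d"): 0.0}

score_abc = module_score({"a", "b", "c"}, sim, params)
score_abcd = module_score({"a", "b", "c", "d"}, sim, params)
```

In this toy setting the tightly co-expressed group `a, b, c` scores higher than the same group with the uncorrelated gene `d` added, because the pair `a, d` contributes a negative weight.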

Complexity
Finding the heaviest connected subgraph: NP-hard even without connectivity constraints (+/- edge weights)

Devised a heuristic for the problem.

MATISSE workflow
- Seed generation
- Greedy optimization
- Significance filtering

Finding seeds
- Three seeding alternatives tested
- All alternatives build a seed and delete it from the network
- Building small seeds around single nodes:
  - Best neighbors
  - All neighbors
- Approximating the heaviest subgraph:
  - Delete low-degree nodes and record the heaviest subnetwork found

Greedy optimization
- Simultaneous optimization of all the seeds
- The following steps are considered:
  - Node addition
  - Node removal
  - Assignment change
  - Module merge

Front vs. Back nodes
- Only a fraction of the genes (front nodes) have meaningful similarity values
- MATISSE can link them using other genes (back nodes)
- Back nodes correspond to:
  - Unmeasured transcripts
  - Post-translational regulation
  - Partially regulated pathways

Advantages of MATISSE
- No p-values needed for measurements
- Works when only a fraction of the genes' expression patterns are informative
- Can handle any similarity data
- No prespecified number of modules

Test case: Yeast osmotic shock
- Network: 65,990 PPIs & protein-DNA interactions among 6,246 genes
- Expression: 133 experimental conditions, response of perturbed strains to osmotic shock (O'Rourke & Herskowitz 04)
- Front nodes: 2,000 genes with the highest variance
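Returning to the greedy optimization step of the workflow, the sketch below illustrates node addition and node removal for a single module while keeping it connected. It is a simplified illustration under stated assumptions, not the MATISSE algorithm: it handles one module only, omits assignment change and module merge, and adds a node only when that single addition increases the score, so it never pulls in a zero-weight back node as a connector. The function names, network, and pair weights are hypothetical.

```python
def gain(node, module, weight):
    """Sum of pair weights between `node` and the other module members.
    Pairs with no weight (e.g. involving back nodes) count as 0."""
    return sum(weight.get(frozenset((node, m)), 0.0) for m in module if m != node)

def connected(adj, nodes):
    """Return True if `nodes` induce a connected subnetwork of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    stack = [next(iter(nodes))]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def greedy_optimize(adj, weight, seed, max_steps=1000):
    """Repeatedly apply the single best score-improving addition or removal."""
    module = set(seed)
    for _ in range(max_steps):
        best_delta, best_move = 0.0, None
        # Node addition: only network neighbors keep the module connected.
        frontier = {v for u in module for v in adj.get(u, ())} - module
        for v in sorted(frontier):
            delta = gain(v, module, weight)
            if delta > best_delta:
                best_delta, best_move = delta, ("add", v)
        # Node removal: allowed only if the remainder stays connected.
        if len(module) > 1:
            for v in sorted(module):
                delta = -gain(v, module, weight)
                if delta > best_delta and connected(adj, module - {v}):
                    best_delta, best_move = delta, ("remove", v)
        if best_move is None:
            break  # no move improves the score
        kind, v = best_move
        if kind == "add":
            module.add(v)
        else:
            module.discard(v)
    return module

# Toy network and pair weights (hypothetical values).
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
weight = {
    frozenset(("a", "b")): 2.0, frozenset(("a", "c")): 1.5,
    frozenset(("b", "c")): 1.0, frozenset(("c", "d")): -3.0,
    frozenset(("b", "d")): -1.0, frozenset(("a", "d")): -1.0,
}
grown = greedy_optimize(adj, weight, {"a"})
pruned = greedy_optimize(adj, weight, {"a", "b", "c", "d"})
```

Starting from the seed `a`, the sketch grows to `a, b, c` and declines to add `d`; starting from all four nodes, it removes `d`. Every accepted move strictly increases the score, so the loop terminates; `max_steps` is only a safety bound.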


Pheromone response subnetwork

Figure legend: Back, Front

Performance comparison

% of modules with category enrichment at $p < 10^{-3}$

% annotations enriched at p
