Joint analysis of regulatory networks and expression profiles
Ron Shamir
School of Computer Science, Tel Aviv University
April 2013

Sources:
- Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
- Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 25(9):1158-1164 (2009).

Outline
- Background
- Joint network and expression profiles
- Matisse
- Cezanne

Background

DNA → RNA → protein
- DNA → RNA: transcription
- RNA → protein: translation
- DNA: the hard disk
- RNA: one program
- Protein: its output

DNA Microarrays / RNA-seq
- Simultaneous measurement of expression levels of all genes / transcripts
- Perform 10^5-10^9 measurements in one experiment
- Allow a global view of cellular processes
- Among the most important biotechnological breakthroughs of the last / current decade

The Raw Data
http://www.biomedcentral.com/1471-2105/12/323/figure/F2
- Raw data matrix: genes × experiments
- Entries of the raw data matrix: expression levels (ratios/absolute values/)
- Expression pattern for each gene
- Profile for each experiment / condition / sample / chip
- Needs normalization!

EXPANDER: EXPression ANalyzer and DisplayER
http://acgt.cs.tau.ac.il/expander
- Clustering: identify clusters of co-expressed genes (CLICK, KMeans, SOM, hierarchical)
- Functional enrichment: GO, TANGO
- Visualization
- Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
- Biclustering: identify homogeneous submatrices (SAMBA)
- microRNA function inference: FAME
A. Maron, R. Sharan, Bioinformatics 03
A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon, BMC Bioinformatics 05
Ulitsky et al., Nature Protocols 10

Networks of protein-protein interactions (PPIs)
- Large, readily available resource
- Representation: network with nodes = proteins/genes, edges = interactions
- Analysis methods:
  - Global properties
  - Motif content analysis
  - Complex extraction
  - Cross-species comparison
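
The network representation described above (nodes = proteins/genes, edges = interactions) can be sketched as an adjacency map. This is a minimal illustration, not part of the original slides: the function names `build_network` and `induces_connected_subnetwork` and the toy protein names are assumptions. The connectivity check is included because connected subnetworks are the objects of interest later in the talk.

```python
from collections import deque


def build_network(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def induces_connected_subnetwork(adj, nodes):
    """Return True if `nodes` are connected using only edges among themselves."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


# Toy interactions; the protein names are placeholders, not real data.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
ppi = build_network(edges)
degrees = {protein: len(partners) for protein, partners in ppi.items()}
```

Here `{"A", "B", "C"}` induces a connected subnetwork, while `{"A", "C"}` does not, because the only path between A and C passes through B.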

The hairball syndrome

Potential inroad into pathways and function
Can the network help to improve the analysis?

Analysis of gene expression profiles + a network

Goal
- Challenge: detect active functional modules, i.e. connected subnetworks of proteins whose genes are co-expressed
- Where is the action in the network in a particular experiment?

Ron Shamir, RNA Antalia, April 08

MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts
Ulitsky & Shamir, BMC Systems Biology 07
http://acgt.cs.tau.ac.il/matisse
- Input: expression data and a PPI network
- Output: a collection of modules
  - Connected PPI subnetworks
  - Correlated expression profiles
- Legend: interaction; high expression similarity

Probabilistic model
- Event $M_{ij}$: $i, j$ are mates = highly co-expressed
- $P(S_{ij} \mid M_{ij}) \sim N(\mu_m, \sigma_m^2)$
- $P(S_{ij} \mid \neg M_{ij}) \sim N(\mu_n, \sigma_n^2)$
- $H_0$: $U$ is a set of unrelated genes
- $H_1$: $U$ is a module = connected subnetwork with high internal similarity
- $R_i$: gene $i$ is transcriptionally regulated
- $\beta_m$: fraction of mates out of module gene pairs that are transcriptionally regulated, $\beta_m = P(M_{ij} \mid R_i \wedge R_j, H_1)$
- $p_m$: fraction of mates out of all gene pairs that are transcriptionally regulated

Probabilistic model (2)
- Is a connected gene set $U$ a module?
- Assuming pair independence:
  - Define $\beta^m_{ij} = \beta_m P(R_i) P(R_j)$
  - Define $\beta^n_{ij} = p_m P(R_i) P(R_j)$
- Likelihood ratio: $\Pr(\text{Data} \mid H_1) / \Pr(\text{Data} \mid H_0)$
- Taking log: a sum of terms, one per gene pair $(i, j)$
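
The likelihood ratio can be written out from the definitions above. What follows is a reconstruction under the stated pair-independence assumption, not the slide's own formula. The symbol names are assumptions: $\beta^m_{ij} = \beta_m P(R_i) P(R_j)$ and $\beta^n_{ij} = p_m P(R_i) P(R_j)$ are the probabilities that the pair $(i, j)$ is mates under $H_1$ and $H_0$ respectively, $f_m$ and $f_n$ are the Gaussian densities of the similarity for mates and non-mates, and $w_{ij}$ is the resulting per-pair term.

```latex
\begin{align*}
\frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  &= \prod_{\substack{i, j \in U \\ i < j}}
     \frac{\beta^m_{ij}\, f_m(S_{ij}) + (1 - \beta^m_{ij})\, f_n(S_{ij})}
          {\beta^n_{ij}\, f_m(S_{ij}) + (1 - \beta^n_{ij})\, f_n(S_{ij})} \\[4pt]
\log \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  &= \sum_{\substack{i, j \in U \\ i < j}} w_{ij},
  \qquad
  w_{ij} = \log
     \frac{\beta^m_{ij}\, f_m(S_{ij}) + (1 - \beta^m_{ij})\, f_n(S_{ij})}
          {\beta^n_{ij}\, f_m(S_{ij}) + (1 - \beta^n_{ij})\, f_n(S_{ij})}
\end{align*}
```

Each pair contributes a positive term when its similarity is better explained by the higher mate probability under $H_1$, and a negative term otherwise.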

Probabilistic model - summary
- Similarities: mixture of two Gaussians
- For a candidate group $U$, the likelihood ratio of originating from a module versus from the background is a product over gene pairs, so its logarithm is a sum of pairwise terms
- Module score $W_U$ = gene group (log) likelihood ratio = sum over all the gene pairs
- Find connected subgraphs $U$ with high $W_U$
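
A minimal sketch of how such a module score could be computed from pairwise similarities. It assumes, for simplicity, that the mate probabilities under $H_1$ and $H_0$ are the same for every pair (that is, $P(R_i) = 1$ for all genes). The function names, parameter values and similarity values are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pair_weight(s, beta_m, beta_n, mates, non_mates):
    """Log-likelihood ratio contributed by one similarity value s.

    mates / non_mates: (mean, sd) of the two Gaussian components.
    beta_m / beta_n: probability that the pair is mates under H1 / H0.
    """
    f_m = normal_pdf(s, *mates)
    f_n = normal_pdf(s, *non_mates)
    h1 = beta_m * f_m + (1.0 - beta_m) * f_n
    h0 = beta_n * f_m + (1.0 - beta_n) * f_n
    return math.log(h1 / h0)


def module_score(genes, similarity, **params):
    """W_U: sum of pair weights over all gene pairs in the candidate set."""
    return sum(
        pair_weight(similarity[frozenset(pair)], **params)
        for pair in combinations(genes, 2)
    )


# Hypothetical parameters and similarities, chosen only for illustration.
PARAMS = dict(beta_m=0.8, beta_n=0.1, mates=(0.6, 0.2), non_mates=(0.0, 0.2))
similarity = {
    frozenset(("a", "b")): 0.7,
    frozenset(("a", "c")): 0.6,
    frozenset(("b", "c")): 0.5,
    frozenset(("a", "d")): 0.0,
    frozenset(("b", "d")): -0.1,
    frozenset(("c", "d")): 0.1,
}
tight = module_score(["a", "b", "c"], similarity, **PARAMS)
loose = module_score(["a", "b", "c", "d"], similarity, **PARAMS)
```

With these toy values the three mutually similar genes score positively, and adding the dissimilar gene `d` lowers the score. This sketch ignores the connectivity requirement, which has to be checked separately on the network.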

Complexity
- Finding the heaviest connected subgraph: NP-hard, even without the connectivity constraint (+/- edge weights)
- Devised a heuristic for the problem

MATISSE workflow
- Seed generation
- Greedy optimization
- Significance filtering

Finding seeds
- Three seeding alternatives tested
- All alternatives build a seed and delete it from the network
- Building small seeds around single nodes:
  - Best neighbors
  - All neighbors
- Approximating the heaviest subgraph:
  - Delete low-degree nodes and record the heaviest subnetwork found

Greedy optimization
- Simultaneous optimization of all the seeds
- The following steps are considered:
  - Node addition
  - Node removal
  - Assignment change
  - Module merge

Front vs. Back nodes
- Only a fraction of the genes (front nodes) have meaningful similarity values
- MATISSE can link them using other genes (back nodes)
- Back nodes correspond to:
  - Unmeasured transcripts
  - Post-translational regulation
  - Partially regulated pathways

Advantages of MATISSE
- No p-values needed for measurements
- Works when only a fraction of the genes' expression patterns are informative
- Can handle any similarity data
- No prespecified number of modules

Test case: Yeast osmotic shock
- Network: 65,990 PPIs & protein-DNA interactions among 6,246 genes
- Expression: 133 experimental conditions, response of perturbed strains to osmotic shock (O'Rourke & Herskowitz 04)
- Front nodes: 2,000 genes with the highest variance
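
Choosing front nodes by expression variance, as in the test case above, can be sketched as follows. The function name `select_front_nodes` and the toy profiles are assumptions for illustration; the test case itself uses the 2,000 most variable genes.

```python
from statistics import pvariance


def select_front_nodes(expression, k):
    """Return the k gene ids whose expression profiles have the highest variance."""
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]


# Toy profiles (gene id -> expression values across conditions), not real data.
profiles = {
    "g1": [0.0, 0.1, -0.1, 0.0],
    "g2": [2.0, -2.0, 1.5, -1.5],
    "g3": [0.5, -0.5, 0.5, -0.5],
}
front = select_front_nodes(profiles, 2)
back = [g for g in profiles if g not in front]
```

Genes left out of the front set remain in the network as back nodes, so they can still link front nodes within a module.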
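
The node addition and node removal moves of the greedy optimization stage can be illustrated for a single module. This is a simplified sketch under stated assumptions, not the MATISSE implementation: it optimizes one seed rather than all seeds simultaneously, it omits the assignment change and module merge moves, and the names `score`, `is_connected` and `greedy_improve` as well as the toy weights are hypothetical.

```python
from itertools import combinations


def score(module, w):
    """Sum of pair weights inside the module (pairs missing from w count as 0)."""
    return sum(w.get(frozenset(p), 0.0) for p in combinations(sorted(module), 2))


def is_connected(module, adj):
    """True if the module's nodes are connected using only edges among themselves."""
    module = set(module)
    if not module:
        return False
    stack = [next(iter(module))]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in module and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == module


def greedy_improve(seed, adj, w):
    """Repeatedly apply the best score-improving node addition or removal."""
    module = set(seed)
    while True:
        current = score(module, w)
        best_gain, best_module = 0.0, None
        # Node addition: adding a network neighbor keeps the module connected.
        neighbors = {v for u in module for v in adj.get(u, ())} - module
        for v in neighbors:
            candidate = module | {v}
            gain = score(candidate, w) - current
            if gain > best_gain:
                best_gain, best_module = gain, candidate
        # Node removal: allowed only if the remaining nodes stay connected.
        for v in module:
            candidate = module - {v}
            if candidate and is_connected(candidate, adj):
                gain = score(candidate, w) - current
                if gain > best_gain:
                    best_gain, best_module = gain, candidate
        if best_module is None:
            return module  # no move improves the score
        module = best_module


# Toy network (a path A-B-C-D) and toy pair weights, for illustration only.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
w = {
    frozenset(("A", "B")): 2.0,
    frozenset(("B", "C")): 1.5,
    frozenset(("A", "C")): 1.0,
    frozenset(("C", "D")): -1.0,
    frozenset(("A", "D")): -2.0,
    frozenset(("B", "D")): -1.0,
}
result = greedy_improve({"A", "B"}, adj, w)
```

Starting from the seed `{A, B}`, the sketch adds C (which raises the score) and then stops, because adding D or removing any node would lower the score or break connectivity. Note that the score sums over all pairs in the module, not only over pairs joined by an interaction.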

Pheromone response subnetwork
Legend: Back / Front

Performance comparison
- % of modules with category enrichment at p < 10^-3
- % annotations enriched at p