

Vertex labels swapping

Edges swapping

Pathway activity levels with ratio

Abstract

Metabolic pathway activity estimation from RNA-Seq data

Yvette Temate-Tiagueu, Qiong Cheng, Meril Mathew, Igor Mandric, Olga Glebova, Nicole Beth Lopanik, Ion Mandoiu and Alex Zelikovsky

Department of Computer Science, Department of Biology, Georgia State University

Computer Science and Engineering, University of Connecticut

Our Contribution

Using KEGG, a database resource for understanding high-level functions and utilities of the biological system from molecular-level information [Kanehisa M. and Goto S., 2000]:

(1) A novel graph-based approach to analyzing pathway significance

(2) Representing a pathway as a set and inferring its activity from the information extracted from that set

(3) Validating the two approaches through differential expression analysis at the transcript and gene levels, and through qPCR experiments

Objectives

Methods

Results


Our experimental studies on Bugula neritina RNA-Seq data (samples with and without the mutualistic symbiont) show that, by analyzing metabolic pathways with our tool XPathway, we can effectively locate pathways whose activity levels differ significantly. This result is being validated through qPCR.

This project is supported in part by the Molecular Basis of Disease fellowship of GSU.

Conclusions and Future Work

The application of RNA-Seq has enabled a variety of differential analysis studies, including differential expression analysis of pathways. A standard approach to studying metabolic differences between species is metabolic pathway analysis. In this study, we introduce a novel approach to characterizing the pathway activity levels of two samples. We present XPathway, a set of pathway activity analysis tools based on the KEGG KAAS (KEGG Automatic Annotation Server) mapping of proteins to pathways. We applied the proposed methods to Bugula neritina RNA-Seq metagenomics data and identified several pathways with differential activity levels using the computational approaches implemented in XPathway. Further validation of these initial results is being conducted through qPCR.

- Develop efficient algorithms for reliable estimation of pathway activity levels

- Identify pathways whose activities differ significantly between two conditions

Validation

Experimental studies: Bugula neritina

In the United States, three sibling species:

1. Deep-water (West coast of United States)

2. Shallow-water (West and Southern East coasts)

3. Northern Atlantic (Northern East coast)

Illumina paired-end sequencing reads:

Sample 1: Bugula with symbiont

Sample 2: Bugula without symbiont

- 50 bp paired-end reads

- 200 bp mean fragment length

- Assembly into contigs by Trinity

- BLAST against the Swiss-Prot database


Topology-based estimation of pathway significance

EM-based estimation of pathway activity

Selected pathways for qPCR validation

qPCR

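[Figure: Model 1, permutation of labels. Vertex labels are swapped on an example graph with nodes a to e.]

[Figure: Model 2, permutation of edges. Edges are swapped on an example graph with nodes a to d.]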

[Figure: analysis workflow. Box labels: RNA-seq reads (2 samples); Trinity; Contigs; Proteins (e.g. MAFSAEDVLK, EYDRRMEAL); BLAST; KEGG, SEED ortholog groups (K00161, K00162, K00163); Graph-based pathway significance; Binary EM pathway activity; IsoDE contigs validation; Differentially expressed pathways; Experimental validation.]

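Notation: $w$ = pathway; $T_w$ = threshold of $w$; $g_w$ = term summed over each $g \in w$ [its definition is not recoverable from the source].

Activity level of pathway $w$:

\[ f_w = \sum_{g \in w} g_w \]

Binary activity status of $w$:

\[ \delta(w) = \begin{cases} 1, & \text{if } f_w \ge T_w \\ 0, & \text{if } f_w < T_w \end{cases} \]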
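The thresholding rule above can be illustrated with a minimal sketch. It assumes the per-element terms g_w are already available as numbers; the function names, the dictionary layout and the example values are hypothetical and are not taken from XPathway.

```python
from typing import Dict, Iterable


def activity_level(pathway: Iterable[str], term: Dict[str, float]) -> float:
    """f_w: sum of the per-element terms g_w over all elements g of pathway w.

    Elements with no recorded term contribute 0 (an assumption of this sketch).
    """
    return sum(term.get(g, 0.0) for g in pathway)


def binary_activity(pathway: Iterable[str], term: Dict[str, float],
                    threshold: float) -> int:
    """delta(w): 1 if f_w >= T_w, otherwise 0."""
    return 1 if activity_level(pathway, term) >= threshold else 0


# Hypothetical pathway made of three ortholog groups, with made-up terms.
w = ["K00161", "K00162", "K00163"]
g_w = {"K00161": 0.5, "K00162": 0.25}

print(activity_level(w, g_w))                  # 0.75
print(binary_activity(w, g_w, threshold=0.5))  # 1
print(binary_activity(w, g_w, threshold=1.0))  # 0
```

How the threshold T_w is chosen is not stated in the recoverable text, so it is left as a plain parameter here.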

Bootstrapping:

- Repeat 1000 times:

  1. Randomly switch edges

  2. Compute the density of the largest component

- Sort with respect to density

- Find the rank of the observed induced subgraph
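The bootstrapping procedure can be sketched as follows. This is an illustration under stated assumptions, not the XPathway implementation: the pathway is taken to be a directed edge list, "switching edges" is assumed to be a degree-preserving swap of the targets of two random edges, components are taken as weakly connected, and the density of a component is M/(N-1). All function names and the toy graph are hypothetical.

```python
import bisect
import random
from collections import defaultdict


def largest_component_density(edges, nodes):
    """Density M/(N-1) of the largest weakly connected component of the
    subgraph induced by `nodes` (0.0 if that component has fewer than 2 nodes)."""
    nodes = set(nodes)
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    adj = defaultdict(set)
    for u, v in induced:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in sorted(nodes):          # sorted for deterministic tie-breaking
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:                     # depth-first search over undirected view
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    if len(best) < 2:
        return 0.0
    m = sum(1 for u, v in induced if u in best and v in best)
    return m / (len(best) - 1)


def switch_edges(edges, n_swaps, rng):
    """Rewire (a, b), (c, d) to (a, d), (c, b), skipping swaps that would create
    self-loops or duplicate edges. In- and out-degrees are preserved."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def bootstrap_rank(edges, nodes, n_rep=1000, seed=0):
    """Return (observed density, rank among randomized graphs, rank / (n_rep + 1))."""
    rng = random.Random(seed)
    observed = largest_component_density(edges, nodes)
    null = sorted(
        largest_component_density(switch_edges(edges, 10 * len(edges), rng), nodes)
        for _ in range(n_rep)
    )
    # Rank 1 means no randomized graph had a strictly higher density.
    rank = 1 + (n_rep - bisect.bisect_right(null, observed))
    return observed, rank, rank / (n_rep + 1)


# Toy pathway graph; the mapped ("observed") nodes are a, b and c.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c"), ("b", "d")]
density, rank, p = bootstrap_rank(toy_edges, {"a", "b", "c"})
print(density, rank, p)
```

The ratio rank / (n_rep + 1) is one common way to turn such a rank into an empirical p-value; whether the reported values were derived this way is not stated in the source.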

Pathway   L1    L2    Prob_Diff_Significance
ko04146   99%   5%    0.94
ko03008   99%   5%    0.94
ko03013   99%   5%    0.94
ko00983   99%   5%    0.94
ko04530   99%   5%    0.94
ko00062   1%    75%   0.74
ko00400   1%    99%   0.98
ko00071   99%   1%    0.98
ko00100   99%   1%    0.98
ko00910   4%    99%   0.95
ko04122   99%   3%    0.97
ko04713   99%   1%    0.99

Model 1: p-value

Pathway   L1    L2    Prob_Diff_Significance
ko04146   99%   5%    0.94
ko03008   99%   5%    0.94
ko03013   99%   5%    0.94
ko00983   99%   5%    0.94
ko04530   99%   5%    0.94
ko00130   99%   2%    0.97
ko00120   4%    58%   0.55
ko00072   1%    99%   0.98
ko00120   4%    58%   0.55
ko00400   1%    99%   0.98
ko00230   99%   5%    0.94
ko00627   1%    99%   0.99
ko00770   3%    99%   0.97
ko00980   99%   1%    0.99
ko04122   99%   1%    0.98
ko04630   99%   4%    0.96
ko04713   99%   4%    0.96

Model 2: p-value

Highest_Diff_Activity_Level   Expression1   Expression2   Diff_Express
ko04068                       23.83         19.77         1.21
ko04145                       17.35         25.78         0.67
ko04610                       9.83          6.83          1.44
ko00051                       13.06         9.34          1.40
ko00740                       7.83          5.83          1.34
ko01230                       30.38         23.81         1.28
ko04020                       17.75         23.72         0.75
ko05012                       25.71         20.07         1.28
ko00983                       8.63          12.20         0.71
ko05034                       17.83         14.30         1.25
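The Diff_Express column is consistent with the ratio Expression1 / Expression2 rounded to two decimals. The short check below recomputes it from the values in the table; the variable names are illustrative only.

```python
# (pathway, Expression1, Expression2, reported Diff_Express) copied from the table
rows = [
    ("ko04068", 23.83, 19.77, 1.21),
    ("ko04145", 17.35, 25.78, 0.67),
    ("ko04610", 9.83, 6.83, 1.44),
    ("ko00051", 13.06, 9.34, 1.40),
    ("ko00740", 7.83, 5.83, 1.34),
    ("ko01230", 30.38, 23.81, 1.28),
    ("ko04020", 17.75, 23.72, 0.75),
    ("ko05012", 25.71, 20.07, 1.28),
    ("ko00983", 8.63, 12.20, 0.71),
    ("ko05034", 17.83, 14.30, 1.25),
]

# Recompute the ratio and compare it with the reported value.
ratios = {pathway: round(e1 / e2, 2) for pathway, e1, e2, _ in rows}
for pathway, e1, e2, reported in rows:
    print(f"{pathway}  {e1:6.2f} / {e2:6.2f} = {ratios[pathway]:.2f}  (reported {reported:.2f})")
```

A ratio above 1 means the pathway has the higher activity level in the first sample, and a ratio below 1 means it is higher in the second.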

For gene expression analyses:

- Select pathways with significantly different activity

- Select DE transcripts from these pathways

- Select the genes from these transcripts

- Primers are created to test genes per condition

Preliminary results

More primers ordered

Pathway   #Mapped contigs   DE contigs   Ratio of DE   Pathway name
ko00062   14                3            21.43%        Fatty acid elongation
ko00100   8                 1            12.50%        Steroid biosynthesis
ko00250   39                4            10.26%        Alanine, aspartate and glutamate metabolism
ko04146   98                15           15.31%        Peroxisome
ko03008   67                10           14.93%        Ribosome biogenesis in eukaryotes
ko03013   148               22           14.86%        RNA transport
ko00983   28                4            14.29%        Drug metabolism - other enzymes
ko04530   237               15           6.33%         Tight junction


In induced graph:

- # nodes N

- # edges M

- # green connected components

- # 0 in- & out-degrees

- Density of the induced graph: M/(N-1)
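The statistics listed for the induced graph can be computed with a short sketch like the one below. It assumes a directed edge list, takes "green" nodes to be the mapped nodes that induce the subgraph, counts weakly connected components, and reports zero in-degree and zero out-degree nodes separately because the source does not say how they are combined. The function name and the example graph are hypothetical.

```python
def induced_graph_stats(edges, nodes):
    """Summary statistics of the subgraph induced by `nodes` in a directed graph."""
    nodes = set(nodes)
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    n, m = len(nodes), len(induced)

    has_out = {u for u, _ in induced}   # nodes with at least one outgoing edge
    has_in = {v for _, v in induced}    # nodes with at least one incoming edge

    # Union-find over the induced edges gives the weakly connected components.
    parent = {x: x for x in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in induced:
        parent[find(u)] = find(v)

    return {
        "nodes": n,
        "edges": m,
        "components": len({find(x) for x in nodes}),
        "zero_in_degree": sum(1 for x in nodes if x not in has_in),
        "zero_out_degree": sum(1 for x in nodes if x not in has_out),
        "density": m / (n - 1) if n > 1 else 0.0,
    }


# Toy pathway a -> b -> c -> d -> e with mapped nodes a, b, c and e.
example_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
stats = induced_graph_stats(example_edges, {"a", "b", "c", "e"})
print(stats)
```

In this toy case the induced graph keeps only the edges a -> b and b -> c, so node e forms a component of its own.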