RNA-seq analysis for Prokaryotes

Anne de Jong, 2015



Measuring gene expression

• What can we do with RNA-seq analysis?
  • Transcription start sites (TSS)
  • Transcription termination (TT)
  • Operon structures (Transcription Active Regions, TARs)
  • tRNAs
  • rRNAs
  • Discovery of ncRNAs
  • Gene expression

• Here we focus on the last item: “Gene Expression”


Measuring gene expression

• What to do
  • Grow cells and freeze them (liquid nitrogen) at point X
  • Isolate total RNA
  • Optional: rRNA depletion
  • Library prep (cDNA)
  • Sequencing (Illumina, Ion Proton)
  • Filter, trim, and map the sequence reads to a reference genome
  • Gene expression calling

All steps above can be standardized; just follow the protocols.


Gene expression values

Starting point: Excel file with gene expression values (RPKM/FPKM/TPM/Counts)

Rows are the features (genes).
Columns are the experiments (samples).
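As a reminder of what these units mean, the sketch below computes RPKM and TPM from raw read counts using their standard definitions. The gene names, counts, and gene lengths are hypothetical illustration values, not data from the example set, and the function names are chosen here for illustration only.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase of gene per Million mapped reads."""
    total_reads = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total_reads) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts Per Million: normalize by length first, then scale to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rate.values())
    return {g: rate[g] * 1e6 / scale for g in counts}


# Hypothetical counts and gene lengths (bp) for a single sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

geneA and geneB get the same RPKM here because geneB has three times the reads but is also three times as long. TPM values always sum to one million within a sample, which is why they are easier to compare across samples than RPKM.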

Tutorial

Step 1: Go to http://genome2d.molgenrug.nl. In the menu RNA-seq analysis, download the “example data set”. Open the file RPKM.txt in Excel. What do the numbers represent?

key           WT1    WT2  F71Y1  F71Y2  R61K1  R61K2  R61H1  R61H2  null1  null2
BSU00010     6376   8682   7756   9785   4676   8684   4161   9004   4384   6756
BSU00020     4033   6400   5470   6470   3006   6108   2598   5902   2659   4535
BSU00030     1075   1816   1561   1662    750   1688    793   1808    757   1459
BSU00040     5400   8661   7443   7906   4159   8573   4379   8247   4412   7312
BSU00050      641   1045    814   1117    425    970    437    969    422    891
BSU00060    11391  12344  11106  11956   8576  11860   8368  12462   8715  10157
BSU00070    17374  23051  20981  22922  13153  21162  12294  22524  12245  17628
BSU00080       12     20     14     24     10     26      7     20     17     15
BSU00090    32990  42404  29072  31289  14675  28162  10290  18064   7936  10371
BSU00100    13208  28177  22074  25142  10919  23351   9161  24268   9958  20054
BSU00110     9353   9554  12377  12362   9881  13691   9320  13896  10307  12115
BSU00120     5885   7064   7990   9209   5836   9562   5496   9534   6200   8691
BSU00130     5107   7408   6571   6810   3728   6851   3449   7262   3900   5950
BSU00140      992   1941   1682   1766    812   1679    752   1709    712   1342
BSU00150     1282   2180   1624   1805    916   1897    839   1721    831   1439
..             ..     ..     ..     ..     ..     ..     ..     ..     ..     ..
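Outside Excel, a table like this can be read with a few lines of Python. The sketch below assumes the file is tab-delimited with a first column named key, as in the example; the helper name read_expression is hypothetical. Three rows from the table are embedded so the snippet runs on its own.

```python
import csv
import io

# Three rows of the example table, tab-separated (assumed delimiter).
RPKM_TEXT = (
    "key\tWT1\tWT2\tF71Y1\tF71Y2\tR61K1\tR61K2\tR61H1\tR61H2\tnull1\tnull2\n"
    "BSU00010\t6376\t8682\t7756\t9785\t4676\t8684\t4161\t9004\t4384\t6756\n"
    "BSU00080\t12\t20\t14\t24\t10\t26\t7\t20\t17\t15\n"
    "BSU00090\t32990\t42404\t29072\t31289\t14675\t28162\t10290\t18064\t7936\t10371\n"
)


def read_expression(handle):
    """Return {gene: {sample: value}} from a tab-delimited table with a 'key' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        gene = row.pop("key")
        table[gene] = {sample: float(value) for sample, value in row.items()}
    return table


expression = read_expression(io.StringIO(RPKM_TEXT))
samples = list(next(iter(expression.values())))  # column order of the header
```

For the real file, replace `io.StringIO(RPKM_TEXT)` with `open("RPKM.txt", newline="")`.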


The factors

• The factors describe the experiment
  • What are the replicates?
  • What is the biological meaning?
  • Multiple factors are possible

            Factor-1  Factor-2
experiment  strain    type
WT1         WT        WT
WT2         WT        WT
F71Y1       A_F71Y    Mutant
F71Y2       A_F71Y    Mutant
R61K1       B_R61K    Mutant
R61K2       B_R61K    Mutant
R61H1       C_R61H    Mutant
R61H2       C_R61H    Mutant
null1       Null      knock-out
null2       Null      knock-out

Tutorial

Step 2: In this example we only use Factor-1. Open Factors.txt in Excel. What do these factors mean?
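One way to see what a factor does is to group the experiments by its levels: experiments that share a level are treated as replicates. The sketch below assumes a tab-delimited factors file with the header shown on the slide; the function name replicates_by_level is hypothetical.

```python
from collections import defaultdict

# Factors table from the slide, tab-separated (assumed delimiter).
FACTORS_TEXT = """experiment\tstrain\ttype
WT1\tWT\tWT
WT2\tWT\tWT
F71Y1\tA_F71Y\tMutant
F71Y2\tA_F71Y\tMutant
R61K1\tB_R61K\tMutant
R61K2\tB_R61K\tMutant
R61H1\tC_R61H\tMutant
R61H2\tC_R61H\tMutant
null1\tNull\tknock-out
null2\tNull\tknock-out
"""


def replicates_by_level(text, factor):
    """Map each level of the chosen factor column to its list of experiments."""
    lines = text.strip().splitlines()
    column = lines[0].split("\t").index(factor)
    groups = defaultdict(list)
    for line in lines[1:]:
        fields = line.split("\t")
        groups[fields[column]].append(fields[0])
    return dict(groups)


by_strain = replicates_by_level(FACTORS_TEXT, "strain")  # Factor-1
by_type = replicates_by_level(FACTORS_TEXT, "type")      # Factor-2
```

With Factor-1 every strain has two replicates; with Factor-2 the three mutant strains are pooled into one group of six experiments.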


Contrasts

• The factors describe the data; the next step is to ask questions
  • Which genes are differentially expressed between WT and one or more mutants?
  • Is there a global effect?
  • Which mutants are highly correlated?

• To answer these questions, the contrasts need to be defined:

A_F71Y-WT
B_R61K-WT
C_R61H-WT
null-WT

• In this example all samples are compared to the WT


Tutorial

Step 3: Open the file Contrasts.txt in Excel. Make a Contrasts file for the case where you use Factor-2 (type) instead of Factor-1 [see previous slide].
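To make the idea of a contrast concrete, the sketch below computes a plain log2 fold change for one gene: the mean of the target replicates divided by the mean of the reference replicates. This is only an illustration of what "A_F71Y-WT" compares, not the statistical test used by the pipeline, which the slides do not specify. The values for BSU00090 are copied from the example table; the function name is hypothetical.

```python
import math

# RPKM values for BSU00090 from the example table (WT and F71Y samples only).
EXPRESSION = {"BSU00090": {"WT1": 32990, "WT2": 42404, "F71Y1": 29072, "F71Y2": 31289}}
REPLICATES = {"WT": ["WT1", "WT2"], "A_F71Y": ["F71Y1", "F71Y2"]}


def log2_fold_change(gene_values, contrast, replicates):
    """log2(mean(target) / mean(reference)) for a contrast written 'target-reference'."""
    # Split on the LAST hyphen so a target such as 'knock-out' stays intact.
    target, reference = contrast.rsplit("-", 1)

    def mean(level):
        names = replicates[level]
        return sum(gene_values[name] for name in names) / len(names)

    return math.log2(mean(target) / mean(reference))


lfc = log2_fold_change(EXPRESSION["BSU00090"], "A_F71Y-WT", REPLICATES)
```

A negative value means the gene is lower in the mutant than in the WT. Splitting on the last hyphen is a design choice for this sketch; it would fail if the reference level itself contained a hyphen.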


Classes

• Adding literature data to the analyses
  • One way is to define groups of genes/proteins that have a biological relation
    • Metabolic pathway; KEGG
    • Related protein domains; e.g. ABC transporters
    • Regulons
    • Related processes; e.g. sporulation
    • Any defined group of genes is possible

• These groups of genes are called Classes

Tutorial

Step 4: Open the file Classes.txt in Excel. Define your own class for at least 20 genes, e.g. the best hits found by Brinsbane.

Class file

BSU00490  green  CodY
BSU01650  green  CodY
BSU01660  green  CodY
BSU01670  green  CodY
BSU01680  green  CodY
etc…      …      …
BSU03981  red    CcpA
BSU03982  red    CcpA
BSU03990  red    CcpA
BSU04160  red    CcpA
BSU04470  red    CcpA
etc…      ..     ..
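The class file can be turned into a simple lookup from class name to member genes. The sketch below assumes three tab-separated columns in the order locus tag, color, class name; that reading is inferred from the example rows and is not stated on the slide. The function name read_classes is hypothetical.

```python
from collections import defaultdict

# A few rows of the example class file, tab-separated (assumed delimiter).
CLASS_TEXT = """BSU00490\tgreen\tCodY
BSU01650\tgreen\tCodY
BSU01660\tgreen\tCodY
BSU03981\tred\tCcpA
BSU03982\tred\tCcpA
"""


def read_classes(text):
    """Return ({class_name: [locus tags]}, {class_name: color})."""
    members = defaultdict(list)
    colors = {}
    for line in text.strip().splitlines():
        locus_tag, color, class_name = line.split("\t")
        members[class_name].append(locus_tag)
        colors[class_name] = color
    return dict(members), colors


members, colors = read_classes(CLASS_TEXT)
```

Adding your own class then comes down to appending rows with a new class name and a color of your choice.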


Overview

• Now we have 4 files
  • Gene expression file
  • Factors file
  • Contrasts file
  • Class file

Tutorial

Step 5: Open the file Classes.txt in Excel. Define two or more classes for at least 10 genes in total.
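Before uploading, it can help to check that the four files agree with each other. The sketch below is a strict, case-sensitive comparison written for illustration; whether the Genome2D pipeline matches names case-sensitively is not stated in the slides, so that strictness is an assumption. The sample names, factor levels, and contrasts are taken from the slides. The set of expression genes is an illustrative subset, on the assumption that the full RPKM.txt contains these locus tags.

```python
def check_inputs(samples, factors, contrasts, class_genes, expression_genes):
    """Return a list of problems; an empty list means the four inputs agree."""
    problems = []
    levels = set(factors.values())
    # Every RPKM column must be described in the factors file, and vice versa.
    for name in sorted(set(samples) ^ set(factors)):
        problems.append(f"experiment {name!r} is not in both RPKM header and factors")
    # Both sides of every contrast must be a known factor level.
    for contrast in contrasts:
        target, reference = contrast.rsplit("-", 1)
        for level in (target, reference):
            if level not in levels:
                problems.append(f"contrast {contrast!r}: level {level!r} not in factors")
    # Every class member needs an expression row.
    for gene in class_genes:
        if gene not in expression_genes:
            problems.append(f"class gene {gene!r} has no expression row")
    return problems


samples = ["WT1", "WT2", "F71Y1", "F71Y2", "R61K1", "R61K2",
           "R61H1", "R61H2", "null1", "null2"]
factors = {"WT1": "WT", "WT2": "WT", "F71Y1": "A_F71Y", "F71Y2": "A_F71Y",
           "R61K1": "B_R61K", "R61K2": "B_R61K", "R61H1": "C_R61H",
           "R61H2": "C_R61H", "null1": "Null", "null2": "Null"}
contrasts = ["A_F71Y-WT", "B_R61K-WT", "C_R61H-WT", "null-WT"]
class_genes = ["BSU00490", "BSU01650"]
expression_genes = {"BSU00490", "BSU01650"}  # illustrative subset (assumption)

problems = check_inputs(samples, factors, contrasts, class_genes, expression_genes)
```

With the names exactly as printed on the slides, this strict comparison flags null-WT, because the factors table spells the level Null. Treat that as a prompt to double-check spelling in your own files rather than as a known failure of the webserver.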


Flow chart of the Analysis

User input
• RPKMs
• Factors
• Contrasts
• Class
• Project name

RNA-seq Analysis Pipeline (Genome2D webserver or R-script)
• Global Analysis
  • Normalization
  • Library Sizes
  • PCA/MDS
• Contrasts
  • Differential Expression
  • Volcano Plots
  • MA Plots
  • Heatmaps
• Experiment Analysis
  • Correlation Matrix
  • Heatmap of Experiments
  • K-means Clustering
• Class Analysis
  • Correlation Matrices
  • Mean Signal Plots
  • Heatmaps of Top Hits, Signals, and Class Groups
  • K-means Clustering
• Tables
  • Tab delimited
  • HTML formatted

RESULTS

Downstream Analysis
• Functional Analysis on the Genome2D webserver
• TIGR Multi Experiment Viewer
• Etc.


Performing an RNA-seq analysis

• The pipeline is available as an R-script or as a webserver
• The R-script allows modification of settings and parameters
• The webserver is parameter free
  • Parameters are predefined, or are calculated or estimated on the fly

Tutorial

Step 6:
1. Open the webserver http://genome2d.molgenrug.nl
2. Go to RNA-seq analysis and download the example data set
3. Subsequently, upload these four files for analysis
4. Give the project a logical (short) name
5. Press start run and wait 1-2 min for the results


Mining the results

• The results are divided in 6 sections
  • Global analysis
  • Contrasts analysis
  • Experiment analysis
  • Class analysis
  • Data tables
  • Functional analysis

Tutorial

Step 7: Global analysis
1. For this RNA-seq experiment we asked for at least 4M (million) reads per experiment. Did all samples pass this criterion?
2. Which sample duplicates showed the lowest dispersion?


Tutorial

Step 8: Contrasts analysis
1. Which CodY mutant showed the lowest number of significantly changed genes?
2. What is the highest fold change of a gene when the wild type was compared to the knock-out?
3. Volcano plots are used to visualize fold changes and their cognate p-values. Open a volcano plot and write a good legend for this figure.
4. On the left side of Heatmaps of TopHits, you see a dendrogram. What is the meaning of the length of the lines in a dendrogram?
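When writing the volcano plot legend, it helps to know how each point is placed: the x-axis is the log2 fold change and the y-axis is usually -log10 of the p-value, so small p-values end up high in the plot. The sketch below computes these coordinates for a few hypothetical genes. The cutoffs (|log2FC| ≥ 1, p ≤ 0.05) are common conventions used here as assumptions, not the pipeline's documented thresholds.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (gene, x, y, significant) tuples for a volcano plot."""
    points = []
    for gene, (log2fc, p_value) in results.items():
        y = -math.log10(p_value)  # small p-value -> large y
        significant = abs(log2fc) >= lfc_cutoff and p_value <= p_cutoff
        points.append((gene, log2fc, y, significant))
    return points


# Hypothetical results: gene -> (log2 fold change, p-value).
results = {
    "geneA": (2.5, 0.0001),   # strongly up, very significant
    "geneB": (-0.3, 0.6),     # essentially unchanged
    "geneC": (-1.8, 0.01),    # down, significant
    "geneD": (1.2, 0.2),      # up, but not significant
}
points = volcano_points(results)
top_hits = {gene for gene, _, _, significant in points if significant}
```

Genes in the upper left and upper right corners of the plot are the ones that pass both cutoffs.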


Tutorial

Step 9: Experiment analysis
1. The correlation matrix of experiments is a visualization method to show the overall Pearson’s correlation between experiments. Write a legend for this figure and include a description of what the shades of blue represent.
2. K-means clustering groups genes that correlate well over multiple experiments. The threshold for separating groups is always arbitrary; which k-means groups could optionally be merged into one group?
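As an aid for the first question, the sketch below computes a small Pearson correlation matrix between experiments. It uses the first five genes (BSU00010 to BSU00050) of three samples from the example table. Whether the pipeline correlates raw or log-transformed values is not stated in the slides; raw values are used here as an assumption, and they let highly expressed genes dominate the result.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Genes BSU00010..BSU00050 for three samples, from the example table.
columns = {
    "WT1": [6376, 4033, 1075, 5400, 641],
    "WT2": [8682, 6400, 1816, 8661, 1045],
    "null1": [4384, 2659, 757, 4412, 422],
}
names = list(columns)
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Each cell of the matrix is one shade in the figure: the diagonal is always 1, and replicates of the same strain are expected to sit close to 1.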


Tutorial

Step 10: Class analysis
1. ‘Correlation matrix of Classes’ gives a quick view of the behavior of Class members (genes) over the various experiments. What do the colors in these matrices mean?


Tutorial

Step 11: Data tables
1. The data that is produced and used by the pipeline to draw graphs can be used for further analysis, for example in the popular freeware programs TMEV and Cytoscape.
2. The file ‘Edge list for a gene network of Contrasts’ is compatible with Cytoscape but will not be discussed further.
3. Save the file ‘TIGR MEV TopHits log2FC’ for TMEV.
4. Download MeV: http://www.tm4.org/mev.html


MeV; Multi experiment Viewer

Tutorial

Step 11: Using MeV
1. Start MeV and load the file ‘TIGR MEV TopHits log2FC’ as dual channel data (because this is ratio data)
2. Deselect “Load Annotation”
3. Press load; the data is now imported and ready to analyze using MeV
4. Optional: try a k-means clustering; here you have to estimate the number of clusters yourself
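To get a feeling for what k-means does with log2FC profiles and why the number of clusters has to be chosen up front, here is a minimal sketch of the basic algorithm (Lloyd's iteration). It is not MeV's implementation. The gene names and profiles are hypothetical, and using the first k profiles as starting centroids is a simplification made so that the result is reproducible.

```python
def kmeans(profiles, k, iterations=20):
    """Plain k-means; the first k profiles serve as the initial centroids."""
    genes = list(profiles)
    centroids = [list(profiles[g]) for g in genes[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each gene to its nearest centroid (squared Euclidean distance).
        new_assignment = {}
        for g in genes:
            distances = [sum((a - b) ** 2 for a, b in zip(profiles[g], c))
                         for c in centroids]
            new_assignment[g] = distances.index(min(distances))
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Move each centroid to the mean profile of its members.
        for i in range(k):
            members = [profiles[g] for g in genes if assignment[g] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical log2FC profiles over four contrasts.
profiles = {
    "geneA": [2.0, 2.2, 1.9, 2.5],
    "geneB": [-1.5, -1.8, -1.6, -2.0],
    "geneC": [1.8, 2.1, 2.0, 2.4],
    "geneD": [-1.4, -1.9, -1.7, -2.2],
    "geneE": [0.1, -0.1, 0.0, 0.2],
}
clusters = kmeans(profiles, k=2)
```

With k=2 the unchanged geneE is forced into one of the two groups; rerunning with k=3 gives it room for a cluster of its own, which illustrates why the choice of k changes the biological interpretation.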


Functional Analysis

Tutorial

Step 12: Perform a functional analysis on the TopHits of one or multiple Contrasts
1. Change the ‘Current active genome’ to your genome of interest
2. Upload a list of locus tags to analyze
3. Examine the results and briefly describe your findings/conclusions