54
Introduction to RNA-Seq & Transcriptome Analysis Jessica Holmes PowerPoint by Pei-Chen Peng RNA-Seq Lab | Jessica Holmes | 2017 1

Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

  • Upload
    lamminh

  • View
    226

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Introduction to RNA-Seq & Transcriptome Analysis

Jessica Holmes

PowerPoint by Pei-Chen Peng

RNA-Seq Lab | Jessica Holmes | 2017 1

Page 2: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Exercise

Use the Tuxedo Suite to:

1.  Align RNA-Seq reads using TopHat (splice-aware aligner).

2.  Perform reference-based transcriptome assembly with

CuffLinks.

3.  Obtain a new transcriptome using CuffLinks & CuffMerge.

4.  Use CuffDiff to obtain a list of differentially expressed genes.

5.  Report a list of significantly expressed genes.

RNA-Seq Lab | Jessica Holmes | 2017 2

Page 3: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Trapnell et al., Nature Protocols, March 2012

Tuxedo Suite Bowtie and Bowtie use Burrows-Wheeler indexing for aligning reads. With bowtie2 there is no upper limit on the read length

Tophat uses either Bowtie or Bowtie2 to align reads in a splice-aware manner and aids the discovery of new splice junctions

The Cufflinks package has 4 components, the 2 major ones are listed below –

Cufflinks does reference-based transcriptome assembly

Cuffdiff does statistical analysis and identifies differentially expressed transcripts in a simple pairwise comparison, and a series of pairwise comparisons in a time-course experiment

RNA-Seq Lab | Jessica Holmes | 2017 3

Page 4: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Pipeline Overview

v

RNA-Seq Lab | Jessica Holmes | 2017 4

Page 5: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Premise

1. Procedure:

Run 1A: Allow TopHat to select splice junctions de novo and proceed through

the steps without giving the software known genes/gene models.

Run 1B: Force TopHat to use only known splice junctions (i.e. known genes/

gene models) and proceed through the steps making sure we are doing our

analysis in the context of these gene models.

2. Evaluation:

a. 2 metrics: # of mapped reads and # of significantly different identified

genes

b. Compare new transcriptome to known genes.

RNA-Seq Lab | Jessica Holmes | 2017 5

Question: Is there a difference in our results if the Tuxedo Suit is run two different ways?

Page 6: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

sample replicate # fastq name # reads

control Replicate 1 thrombin_control.txt 10,953

experimental Replicate 1 thrombin_expt.txt 12,027

name description

chr22.fa Fasta file with the sequence of chromosome 22 from the human genome (hg19 – UCSC)

genes-chr22.gtf GTF file with gene annotation, known genes (hg19 – UCSC)

RNA-Seq: 100 bp, single end data

Genome & gene information

Input Data

RNA-Seq Lab | Jessica Holmes | 2017 6

Page 7: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 1A: Logging into Galaxy

Go to: galaxy.knowhub.org   Click Enter Click Login Input your login credentials. Click Login.

Regulatory Genomics | Saurabh Sinha | 2017 7

Page 8: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 1B: Galaxy Start Screen The resulting screen should look like the figure below:

Chip-Seq Peak Calling in Galaxy | Lisa Stubbs | 2017 8

Page 9: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 2A: Accessing Input Files

At the top of the page, click Shared Data. Then click Histories.

RNA-Seq Lab | Jessica Holmes | 2017 9

Page 10: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 2B: Accessing Input Files

Click RNA-Seq_Chr_22 Data         You should see this page. Click Import History.

RNA-Seq Lab | Jessica Holmes | 2017 10

Page 11: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 2C: Accessing Input Files

Click Import You should see an imported history like the following.

RNA-Seq Lab | Jessica Holmes | 2017 11

Page 12: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

In this exercise, we will be aligning RNA-Seq reads to a reference genome in the absence of gene models. Splice junctions will be found de novo.

Remember, we are not going to provide any genic structure information.

.

Run 1A: de novo Alignment

RNA-Seq Lab | Jessica Holmes | 2017 12

Page 13: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3A: Align Reads de novo Using TopHat

At the top right of the page, click the search box : Type TopHat Select TopHat under NGS: RNA Analysis

RNA-Seq Lab | Jessica Holmes | 2017 13

Page 14: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3B: Align Reads de novo Using TopHat You should a page similar to the one below. We will run TopHat first on the thrombin experimental data. Make sure your inputs match the screenshot below:

RNA-Seq Lab | Jessica Holmes | 2017 14

Page 15: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3C: Align Reads de novo Using TopHat

The rest of the page contains parameters.

We will change the following parameters: 1.  Library Type: FR Unstranded

2.  Minimum Intron Length: 70

3.  Maximum Intron Length: 500000

4.  Maximum number of alignment to be allowed: 20

RNA-Seq Lab | Jessica Holmes | 2017 15

Page 16: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3C: Align Reads de novo Using TopHat

The rest of the page contains parameters.

We will change the following parameters: 5.  Number of mismatches allowed in each segment alignments

for reads mapped independently : 2

6.  Do you want to supply your own junction data: No

7.  Use Coverage Search: Yes

8.  Maximum intron length that may be found during coverage search: 500000

RNA-Seq Lab | Jessica Holmes | 2017 16

Page 17: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3E: Align Reads de novo Using TopHat

The rest of the page contains parameters.

We will change the following parameters: 9.  Use Microexon Search: No 10. Do Fusion Search: No 11.  Set Bowtie2 settings: No 12. Specify read group: No Click Execute when you have set the parameters.

RNA-Seq Lab | Jessica Holmes | 2017 17

Page 18: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3F: Align Reads de novo Using TopHat

You will see confirmation in the Main Pane denoting which tracks have been added to run.

RNA-Seq Lab | Jessica Holmes | 2017 18

You should see the tracks at the top of the History Pane A gray track means the job isn't running. A yellow track means the job is running. A green track means the job is finished.

Page 19: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3G: Align Reads de novo Using TopHat We want to run TopHat for the control dataset now. Navigate to the TopHat page again. This time use 1: thrombin_control.fastq for RNA-Seq FASTQ file.

RNA-Seq Lab | Jessica Holmes | 2017 19

Page 20: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 3H: Align Reads de novo Using TopHat2 Configure the parameters as before (below) and click execute: 1.  Library Type: FR Unstranded 2.  Minimum Intron Length: 70 3.  Maximum Intron Length: 500000 4.  Maximum number of alignment to be allowed: 20 9.  Number of mismatches allowed in each segment alignments

for reads mapped independently : 2 10.  Do you want to supply your own junction data: No 11.  Use Coverage Search: Yes 12.  Maximum intron length that may be found during coverage

search: 500000 17.  Use Microexon Search: No 18.  Do Fusion Search: No 19.  Set Bowtie2 settings: No 20.  Specify read group: No

RNA-Seq Lab | Jessica Holmes | 2017 20

Page 21: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 4A: Renaming Files

In galaxy, it is important to rename output files to something meaningful. For example, to rename 9: Tophat_on_data4_and data2:accepted_hits Click the pencil icon

RNA-Seq Lab | Jessica Holmes | 2017 21

Page 22: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 4B: Renaming Files

On the next page, enter expt_accepted_hits for the Name: field. Click Save. Track 9 show have the name change:

RNA-Seq Lab | Jessica Holmes | 2017 22

Page 23: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 4C: Renaming Files

In this manner, rename the following tracks with the respective names: 5.  expt_align_summary 6.  expt_insertions 7.  expt_deletions 8.  expt_splice_junctions 10.  ctrl_align_summary 11.  ctrl_insertions 12.  ctrl_deletions 13.  ctrl_splice_junctions 14.  ctrl_accepted_hits

RNA-Seq Lab | Jessica Holmes | 2017 23

Page 24: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 5A: Evaluating de novo Alignment

Click the eye icon 5: expt_align_summary You should see the results on the screen, like below : In the experimental group, 147 reads were not aligned.

RNA-Seq Lab | Jessica Holmes | 2017 24

Page 25: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 5B: Evaluating de novo Alignment

Click the eye icon 10: ctrl_align_summary You should see the results on the screen, like below : In the control group, 101 reads were not aligned.

RNA-Seq Lab | Jessica Holmes | 2017 25

Page 26: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

In this exercise, we will be aligning RNA-Seq reads to a reference genome in the presence of gene information. This obviates the need for TopHat to find splice junctions de novo.

.

Run 1B: Informed Alignment

RNA-Seq Lab | Jessica Holmes | 2017 26

Page 27: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 6A: Informed Align Reads Using TopHat

We want to re-run the analysis for the experimental group, but using a gene-model annotation this time. Instead of repeating the previous steps, we can save some time by clicking on the update icon on track 9: expt_accepted_hits. Click on track 9. Click the update icon.

RNA-Seq Lab | Jessica Holmes | 2017 27

Page 28: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 6B: Informed Align Reads Using TopHat

Keep the same parameters as before, but change the following: 1.  Do you want to supply your

own junction data: Yes 2.  Use Gene Annotation Model: Yes 3.  Gene Model Annotations:

3: genes-chr22.gtf 4.  Use Raw Junctions: No 5.  Only look for supplied junctions: No

Click Execute.

RNA-Seq Lab | Jessica Holmes | 2017 28

Page 29: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 6C: Informed Align Reads Using TopHat

This should generate tracks 15 through 19. Rename the tracks the following: 15.  expt-genes_align_summary 16.  expt-genes_insertions 17.  expt-genes_deletions 18.  expt-genes_splice_junctions 19.  expt-genes_accepted_hits

RNA-Seq Lab | Jessica Holmes | 2017 29

Page 30: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 6D: Informed Align Reads Using TopHat

We want to re-run the analysis for the control group, but using a gene-model annotation this time. Instead of repeating the previous steps, we can save some time by clicking on the update icon on track 14: ctrl_accepted_hits. Click on track 14. Click the update icon.

RNA-Seq Lab | Jessica Holmes | 2017 30

Page 31: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 6E: Informed Align Reads Using TopHat

Keep the same parameters as before, but change the following: 1.  Do you want to supply your

own junction data: Yes 2.  Use Gene Annotation Model: Yes 3.  Gene Model Annotations:

3: genes-chr22.gtf 4.  Use Raw Junctions: No 5.  Only look for supplied junctions: No

Click Execute.

RNA-Seq Lab | Jessica Holmes | 2017 31

Page 32: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 6F: Informed Align Reads Using TopHat

This should generate tracks 15 through 19. Rename the tracks the following: 20.  ctrl-genes_align_summary 21.  ctrl-genes_insertions 22.  ctrl-genes_deletions 23.  ctrl-genes_splice_junctions 24.  ctrl-genes_accepted_hits

RNA-Seq Lab | Jessica Holmes | 2017 32

Page 33: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 7A: Evaluating Informed Alignment

Click the eye icon 15: expt-genes_align_summary You should see the results on the screen, like below : In the experimental group, 39 reads were not aligned.

RNA-Seq Lab | Jessica Holmes | 2017 33

Page 34: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 7B: Evaluating Informed Alignment

Click the eye icon 20: ctrl-genes_align_summary You should see the results on the screen, like below : In the control group, 27 reads were not aligned.

RNA-Seq Lab | Jessica Holmes | 2017 34

Page 35: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

sample # fastq name # reads Unmapped Reads

de novo Informed

experimental thrombin_expt.txt 11,679 147 39

control thrombin_control.txt 10,619 101 27

Step 8: Comparison of Alignments

There are fewer unmapped reads with the informed alignment, or Run 1B (i.e. when we use the known genes, and known splice sites)!

TopHat’s prediction of splice junctions de novo is not working very well for this dataset. (This is likely due to the low number of reads in our dataset.)

Conclusions

RNA-Seq Lab | Jessica Holmes | 2017 35

Page 36: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Next, we will utilize our RNA-Seq alignments to assembly gene transcripts, thereby permitting us to get relative gene abundances between the two samples (control and experimental).

Finding Differentially Expressed Genes

RNA-Seq Lab | Jessica Holmes | 2017 36

Page 37: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Trapnell et al., Nature Protocols, March 2012

Reminder: Cufflinks The Cufflinks package has 4 components, the 2 major ones are listed below –

Cufflinks does reference-based transcriptome assembly Cuffdiff does statistical analysis and identifies differentially expressed transcripts in a simple pairwise comparison, and a series of pairwise comparisons in a time-course experiment

RNA-Seq Lab | Jessica Holmes | 2017 37

Page 38: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 9A: Assemble Transcripts using Cufflinks

For the de-novo alignment (Run 1A) , we will run the program Cufflinks in order to obtain gene transcripts from our aligned RNA-Seq reads .

There is no need to conduct this step for the informed alignment because we have the locations of known genes already

Type Cufflinks into the search box.

Click on Cufflinks under NGS: RNA Analysis.

RNA-Seq Lab | Jessica Holmes | 2017 38

Page 39: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 9B: Assemble Transcripts using Cufflinks Choose 9: expt_accepted_hits for the BAM file. Use the default parameters for everything except change the following: 1.  Apply length correction: No Length

Correction at all (use raw counts)

Ensure your parameters match up with the figure on the right. Click Execute.

RNA-Seq Lab | Jessica Holmes | 2017 39

Page 40: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 9C: Assemble Transcripts using Cufflinks Go back to Cufflinks. This time choose 14: ctrl_accepted_hits for the BAM file. Use the default parameters for everything except change the following: 1.  Apply length correction: No Length Correction

at all (use raw counts)

Ensure your parameters match up with the figure on the right. Click Execute.

RNA-Seq Lab | Jessica Holmes | 2017 40

Page 41: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 9D: Assemble Transcripts using Cufflinks

Tracks 25 – 29 are the results of the experimental Cufflinks run.

RNA-Seq Lab | Jessica Holmes | 2017 41

Tracks 30 – 34 are the results of the control Cufflinks run.

We will merge the assembled transcripts from the control and experimental samples next using Cuffmerge.

Page 42: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 10A: Merge Transcripts Using CuffMerge

In the search box, type Cuffmerge Click Cuffmerge under NGS: RNA Analysis.

RNA-Seq Lab | Jessica Holmes | 2017 42

Page 43: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 10B: Merge Transcripts Using CuffMerge

RNA-Seq Lab | Jessica Holmes | 2017 43

For GTF file, choose track 27: Cufflinks on data 9, which are the assembled transcripts run on the experimental accepted hits (track 9) of the de novo assembly. Press Ctrl and choose track 32: Cufflinks on data 14, which are the assembled transcripts run on the control accepted hits (track 14) of the de novo assembly. Choose No for the other parameters and click Execute.

Page 44: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 11A: Differential Gene Expression

For the de novo assembly, lets find out how many differentially expressed (DE) genes are present. We will use Cuffdiff to do this. To do this, we need a GTF file and a BAM file for both the control and experimental assemblies. We could use Cuffdiff on the informed alignments, as well, but we normally recommend using htseqcount and edgeR instead. Type Cuffdiff into the search and click its link:

RNA-Seq Lab | Jessica Holmes | 2017 44

Page 45: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 11B: Differential Gene Expression

Choose track 35 for the Transcripts. Under Condition 1:

Name: Control Add replicate: 14:

ctrl_accepted_hits Under Condition 2:

Name: Experimental Add replicate: 9:

expt_accepted_hits Accept the default parameters and click Execute.

RNA-Seq Lab | Jessica Holmes | 2017 45

Page 46: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 11C: Differential Gene Expression

When done, click the eye icon on track 45: You should see output like the following: Count the number of "yes" answers in the significant column (column 14) as you scroll down. There should be 2. These are the DE genes. RNA-Seq Lab | Jessica Holmes | 2017 46

Page 47: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Conclusion

We did the following today

Use the Tuxedo Suite to:

1.  Align RNA-Seq reads using TopHat (splice-aware aligner).

2.  Perform reference-based transcriptome assembly with

CuffLinks.

3.  Obtain a new transcriptome using CuffLinks & CuffMerge.

4.  Use CuffDiff to obtain a list of differentially expressed genes.

5.  Report a list of significantly expressed genes.

RNA-Seq Lab | Jessica Holmes | 2017 47

Page 48: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Useful links Online resources for RNA-Seq analysis questions – ²  http://www.biostars.org/ - Biostar (Bioinformatics explained)

²  http://seqanswers.com/ - SEQanswers (the next generation sequencing community)

²  Most tools have a dedicated lists

Information about the various parts of the Tuxedo suite is available here - http://ccb.jhu.edu/software.shtml

Genome Browsers tutorials –

²  http://www.broadinstitute.org/igv/QuickStart/ - IGV tutorials

²  http://www.openhelix.com/ucsc/ - UCSC browser tutorials

(openhelix is a great place for tutorials, UIUC has a campus-wide subscription)

48

Contact us at: [email protected]

[email protected]

RNA-Seq Lab | Jessica Holmes | 2017

Page 49: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Extra Material IGV

RNA-Seq Lab | Jessica Holmes | 2017 49

Page 50: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

The Integrative Genomics Viewer (IGV) is a tool that supports the visualization of mapped reads to a reference genome, among other functionalities. We will use it to observe where hits were called for the de-novo alignment (Run 1A) for the two samples (control and experimental), the new transcriptome generated by CuffMerge, and the differentially expressed genes.

.

Visualization Using IGV

RNA-Seq Lab | Jessica Holmes | 2017 50

Page 51: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

In this step, we will start IGV and load the chr22.fa file, the known genes file (genes-chr22.gtf), the hits for both sample groups, and the merged transcriptome. These files are located in [course_directory]/04_Transcriptomics/results  

Step 9: Start IGV

RNA-Seq Lab | Jessica Holmes | 2017 51

Graphical Instruction: Load Genome 1. Within IGV, click the ‘Genomes’ tab on the menu bar. 2. Click the the ‘Load Genome from File’ option. 3. In the browser window, select chr22.fa (genome).

Graphical Instruction: Load Other Files

1. Within IGV, click the FILE tab on the menu bar. 2. Click the ‘Load from File’ option. 3. Select the genes-chr22.gtf file (known genes file). 4. Perform Steps 1-3 for the files to the right.

Files to Load genes-chr22.f ctrl_accepted_hits.bam expt_accepted_hits.bam merged.gtf

Page 52: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 10A: Visualization With IGV

RNA-Seq Lab | Jessica Holmes | 2017 52

Your browser window should look similar to the picture below:

Page 53: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 10B: Visualization With IGV

RNA-Seq Lab | Jessica Holmes | 2017 53

Click here and type the following location of a differentially expressed gene:

 chr22:19960675-­‐19963235  

 Move to the left and right of the gene. What do you see?

Page 54: Introduction to RNA-Seq & Transcriptome Analysisveda.cs.uiuc.edu/CompGen2017/labs/04_Transcriptomics_2017.pdf · Exercise Use the Tuxedo Suite to: 1. Align RNA-Seq reads using TopHat

Step 10C: Visualization with IGV

Looks like the new transcriptome (merged.gtf) compares poorly to the known gene models. This is very likely due to the very low number of

reads in our dataset.

We can see that there are many more reads for one dataset compared

to the other. Hence, it makes sense that the gene was called as being

differentially expressed.

Note the intron spanning reads.

RNA-Seq Lab | Jessica Holmes | 2017 54