RNA-Seq in Galaxy. Igor Makunin, DI/TRI, March 9, 2015. Research Computing Centre @ UQ


Page 1: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

RNA-Seq in Galaxy

Igor Makunin

DI/TRI, March 9, 2015

Research Computing Centre @ UQ

Page 2: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

Genomics Virtual Lab
GVL site: https://genome.edu.au
The main aim: to facilitate genomics research in Australia

Galaxy:
• Tutorials and protocols (nextGen sequencing)
• Galaxy for tutorials: galaxy-tut.genome.edu.au
• Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
• “Roll your own” GVL platform on the Australian government-funded computer infrastructure (NeCTAR cloud):
- virtual computer cluster
- Galaxy
- IPython Notebook
- RStudio

Mirror of UCSC Genome Browser
RStudio

Learn / Use / Get

Page 3: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

Plan

Our goals for today:

Introduction to Galaxy platform
- FASTQ quality score encoding in Galaxy

Analysis of differential gene expression using nextGen sequencing data

Workflows in Galaxy

Sites:
Galaxy-tut: http://galaxy-tut.genome.edu.au
Galaxy-qld: http://galaxy-qld.genome.edu.au
Genomics Virtual Lab: https://genome.edu.au

All GVL resources are public

Page 4: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

Galaxy: what does it look like?

Interface panels: Tools, Working window, Data

Page 5: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

Good user practice for Galaxy-qld
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au

Register with your UQ email and get a bigger disk allocation.

Use ftp for big datasets – it is faster. Galaxy recognises .gz compression.

Do not store unneeded datasets. Delete temporary files such as SAM. Purge deleted datasets.

Do not start many big jobs in parallel (BWA, bowtie, bowtie2, tophat, tophat2, velvet, trinity).

Create and use workflows for multi-step analysis.

Specify the quality score encoding for nextGen sequencing data (FASTQ files).

Page 6: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

FASTQ quality score

@ILLUMINA-96BC32_0028_FC:3:1:8035:1092/1
TAGCAGCACATCATGGTTTACATCGTATGCCGTCTT
+
IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG

Qual. = 39
Offset = 33
ASCII(72): H (39 + 33 = 72)
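
The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `phred_scores` is made up for illustration, and the quality string is the one from the example read above.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of integer quality scores.

    Each character encodes one score: score = ASCII code - offset.
    """
    scores = [ord(ch) - offset for ch in quality_line]
    if scores and min(scores) < 0:
        # A negative score means the chosen offset cannot be right.
        raise ValueError("negative score: the offset is probably wrong")
    return scores


# Quality line of the example read on this slide (offset 33).
quality = "IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG"
scores = phred_scores(quality)

print(scores[:5])                 # first five scores of the read
print(min(scores), max(scores))   # range of scores in the read
print(chr(39 + 33))               # the character that encodes quality 39
```

Decoding the same line with offset 64 would give much lower scores, which is why the offset has to be known before the data are analysed.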

Page 7: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

FASTQ quality score in Galaxy

Many old Illumina datasets use a proprietary quality score encoding (offset 64).
Currently most NGS datasets use Sanger encoding (offset 33).

Galaxy

By default, Galaxy assigns the ‘fastq’ data type to uploaded FASTQ files.
In this case the offset is not specified, and many tools do not recognize the data.

fastqillumina – old Illumina quality score encoding (offset 64)
fastqsanger – new Illumina / Sanger quality score encoding
Nearly all modern NGS data use Sanger encoding (fastqsanger in Galaxy).

Solution:
- specify the proper format, e.g. fastqsanger or fastqillumina, during the data upload
- change the format via Attributes > Datatype
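
If the encoding of a file is unknown, the range of characters in its quality lines gives a hint. The sketch below is a rough heuristic and not what Galaxy itself does: the function name `guess_fastq_datatype` and the two thresholds (ASCII 59 and 74) are assumptions based on the usual character ranges of the two encodings.

```python
def guess_fastq_datatype(quality_lines):
    """Guess the Galaxy datatype from the characters used in quality lines.

    Returns "fastqsanger", "fastqillumina", or None when the characters
    are compatible with both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters this low cannot occur with offset 64.
        return "fastqsanger"
    if max(codes) > 74:
        # Assumed cut-off: offset-33 data rarely go above 'J' (ASCII 74).
        return "fastqillumina"
    return None  # ambiguous: the format has to be set by hand


print(guess_fastq_datatype(["!!#5AIII"]))   # low characters present
print(guess_fastq_datatype(["hhhhgggf"]))   # only high characters
print(guess_fastq_datatype(["IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG"]))
```

The last call uses the read from the previous slide. Its characters fall in the overlap of both ranges, so the heuristic cannot decide, and the datatype still has to be specified explicitly.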

Page 8: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

Differential gene expression

Basic GVL Galaxy tutorial
based on Trapnell et al. (2012) Nature Protocols.

Import data

Align to a reference genome (tophat)

Find differentially expressed genes (Cuffdiff)

https://genome.edu.au/wiki/Learn

Figure labels: mRNA, Library, Reads

Number of reads correlates with gene expression level.
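
Raw read counts also depend on gene length and on the total number of reads in the library, so they are usually normalised before samples are compared. The sketch below computes reads per kilobase per million mapped reads (RPKM) as one common normalisation. The function name and all numbers are hypothetical; Cuffdiff reports a related measure (FPKM) and uses its own statistical model, which this sketch does not reproduce.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: one gene, 2,000 bp long, in two samples.
control = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
treated = rpkm(read_count=1500, gene_length_bp=2000, total_mapped_reads=15_000_000)

print(control, treated)
print(treated / control)  # fold change after normalisation
```

In this made-up case the raw count triples, but the normalised value only doubles, because the second library is 1.5 times larger.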

Page 9: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015
Page 10: RNA-Seq in Galaxy Igor Makunin DI/TRI, March 9, 2015

Thank you!
GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au

Contributors and participants: