Genomics Virtual Lab
GVL site: https://genome.edu.au
The main aim: to facilitate genomics research in Australia
Galaxy:
• Tutorials and protocols (nextGen sequencing)
• Galaxy for tutorials: galaxy-tut.genome.edu.au
• Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
• “Roll your own” GVL platform on the Australian government-funded computer infrastructure (NeCTAR cloud):
- virtual computer cluster
- Galaxy
- IPython Notebook
- RStudio
Mirror of UCSC Genome Browser
RStudio
Learn / Use / Get
Plan
Our goals for today:
Introduction to Galaxy platform
- FASTQ quality score encoding in Galaxy
Analysis of differential gene expression using nextGen sequencing data
Workflows in Galaxy
Sites:
Galaxy-tut: http://galaxy-tut.genome.edu.au
Galaxy-qld: http://galaxy-qld.genome.edu.au
Genomics Virtual Lab: https://genome.edu.au
All GVL resources are public
Galaxy: what does it look like
Screenshot labels: Tools, Working window, Data
Good user practice for Galaxy-qld
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
Register with your UQ email and get a bigger disk allocation.
Use ftp for big datasets – it is faster. Galaxy recognises .gz compression.
Do not store unneeded datasets. Delete temporary files such as SAM. Purge deleted datasets.
Do not start many big jobs in parallel (BWA, bowtie, bowtie2, tophat, tophat2, velvet, trinity).
Create and use workflows for multi-step analysis.
Specify the quality score encoding for nextGen sequencing data (FASTQ files).
FASTQ quality score encoding
@… ILLUMINA-96BC32_0028_FC:3:1:8035:1092/1
TAGCAGCACATCATGGTTTACATCGTATGCCGTCTT
+
IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG
Qual. = 39, Offset = 33, so the character is ASCII(39 + 33) = ASCII(72): H
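The arithmetic on this slide (quality 39, offset 33, ASCII 72 = H) can be reproduced in a few lines of Python. This is only an illustrative sketch; the function names `decode_qualities` and `encode_quality` are made up for this example and are not part of Galaxy.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of integer quality scores.

    offset=33 corresponds to Sanger encoding, offset=64 to old Illumina encoding.
    """
    return [ord(ch) - offset for ch in quality_string]


def encode_quality(score, offset=33):
    """Convert one integer quality score into its FASTQ character."""
    return chr(score + offset)


# Quality line of the read shown on the slide
quality = "IIHIDIIIIIIIIIIIIIHIHIIIIIDGIBGGGGGG"
scores = decode_qualities(quality)

print(scores[:5])          # [40, 40, 39, 40, 35]
print(encode_quality(39))  # H
```

Decoding the same string with `offset=64` would give scores between 2 and 9, which is why the offset has to be known before the data are analysed.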
FASTQ quality score in Galaxy
Many old Illumina datasets have a proprietary data encoding (offset 64).
Currently most NGS datasets use Sanger encoding (offset 33).
Galaxy
By default Galaxy assigns the ‘fastq’ data type to uploaded FASTQ files.
In this case the offset is not specified, and many tools do not recognize the data.
fastqillumina: old Illumina quality score encoding (offset 64)
fastqsanger: new Illumina / Sanger quality score encoding (offset 33)
Nearly all modern NGS data use Sanger encoding (fastqsanger in Galaxy).
Solution:
- specify a proper format, e.g. fastqsanger or fastqillumina, during the data upload
- change the format via Attributes > Datatype
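If it is unclear which datatype to pick, the range of characters in the quality lines often gives a hint. The sketch below is a rough heuristic, not something Galaxy does itself: the function name `guess_fastq_datatype` and the thresholds are assumptions for illustration. It assumes that Sanger-encoded Illumina data do not exceed quality 41 (character 'J') and that offset-64 data never contain characters below ';'.

```python
def guess_fastq_datatype(quality_strings):
    """Guess a Galaxy datatype name from FASTQ quality lines.

    Returns "fastqsanger", "fastqillumina", or None when the characters
    seen are compatible with both encodings (heuristic, assumption-based).
    """
    codes = [ord(ch) for line in quality_strings for ch in line]
    if not codes:
        return None
    # Characters below ';' (ASCII 59) cannot occur with offset 64.
    if min(codes) < 59:
        return "fastqsanger"
    # Characters above 'J' (ASCII 74) would mean quality > 41 with offset 33,
    # which we assume does not happen, so the offset is taken to be 64.
    if max(codes) > 74:
        return "fastqillumina"
    return None  # ambiguous: inspect more reads or check the data source


print(guess_fastq_datatype(["!''*((((***+"]))   # fastqsanger
print(guess_fastq_datatype(["hhhhhhhhgfe"]))    # fastqillumina
print(guess_fastq_datatype(["IIHIDIIII"]))      # None
```

A single read with uniformly high qualities, like the third call, cannot be classified on its own, so in practice the check should run over many reads.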
Differential gene expression
Basic GVL Galaxy tutorial
based on Trapnell et al. (2012) Nature Protocols.
Import data
Align to a reference genome (tophat)
Find differentially expressed genes (Cuffdiff)
https://genome.edu.au/wiki/Learn
mRNA → Library → Reads
Number of reads correlates with gene expression level.
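Because the total number of reads differs between libraries, read counts are usually scaled before samples are compared. The sketch below shows the simplest such scaling, counts per million reads. The gene names and counts are invented for illustration, and this is not the calculation Cuffdiff performs (it reports FPKM and applies its own statistical model).

```python
def counts_per_million(counts):
    """Scale raw read counts per gene to counts per million reads in the library."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical read counts for one library (made-up numbers)
sample = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(sample)

for gene, value in cpm.items():
    print(gene, value)
# geneA 50000.0
# geneB 150000.0
# geneC 800000.0
```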
Thank you!
GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au
Contributors and participants: