This session introduces second-generation sequencing (2GS), focusing on basic production informatics: how raw data is converted and how quality control is performed.
Introduction to 2GS data analysis
Drink faster!
[MIT]
The Queensland Brain Institute | April 13, 2023
Product        Time
fastq          5 days
bam, vcf, …    3 weeks
paper          >6 months
(Per one-flowcell project)
Production Informatics and Bioinformatics
• Basic Production Informatics: Produce raw sequence reads
• Advanced Production Informatics: Map to genome and generate raw genomic features (e.g. SNPs)
• Bioinformatics Research: Analyze the data; uncover the biological meaning
Brief history of sequencing
• First Generation: Sanger sequencing
• Second Generation: amplified molecule sequencing
• Third Generation: single molecule sequencing
* Discussion about category
What steps are involved in sequencing?
• Sequencing by synthesis (SBS) technology
  – Fragmentation
  – Library generation
  – Amplification
  – Sequencing
  – Analysis
Illumina Marketing: “3h 10 minutes wet-lab, 30 minutes dry lab”
Illumina sequencing: Library + Amplification
“Illumina Sequencing Technology” booklet
Illumina Sequencing: Synthesis + Imaging
“Illumina Sequencing Technology” booklet
Output: 1.5 Terabyte of data
Inspired by anzska information booklet
Sequencer Output Conversion: Production Informatics
• 1.5 TB data : 6 billion clusters with 100 bp reads = 600 billion data points
For HiSeq: images are converted to flat files (*.bcl or *.cif)
[Figure: HiSeq → flat files (… × read length) → CASAVA; icons from visualpharm.com and Maysoft]
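The arithmetic on this slide can be checked in a few lines. This is only a back-of-the-envelope sketch: the cluster count, read length and run size are the slide's figures, and treating 1.5 TB as decimal terabytes is an assumption.

```python
# Back-of-the-envelope check of the run size quoted on the slide.
clusters = 6_000_000_000   # 6 billion clusters (slide figure)
read_length = 100          # bp per read (slide figure)
run_bytes = 1.5e12         # 1.5 TB, ASSUMING decimal terabytes

# One data point (base call) per cluster per cycle.
data_points = clusters * read_length

# Average storage spent on each base call across the whole run.
bytes_per_point = run_bytes / data_points

print(f"{data_points:,} base calls")
print(f"{bytes_per_point:.1f} bytes per base call on average")
```

Under these assumptions the run stores roughly 2.5 bytes per base call, which is why the conversion step is dominated by I/O rather than by computation.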
Multiplexing
• 6 billion reads:
  – 750 million reads per lane
• Currently 12-plex (soon 96-plex):
  – One run
Oliver Twardowski
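To see what the plex level means for each sample, the per-lane figure can be divided by the number of samples pooled in a lane. A minimal sketch, assuming one pool per lane and perfectly even representation of every sample (real pools are never this even); the function name is made up for illustration.

```python
def reads_per_sample(reads_per_lane: int, plex: int) -> float:
    """Expected reads per sample, ASSUMING the samples in a lane are evenly represented."""
    if plex < 1:
        raise ValueError("plex must be at least 1")
    return reads_per_lane / plex


reads_per_lane = 750_000_000  # slide figure

for plex in (12, 96):
    expected = reads_per_sample(reads_per_lane, plex)
    print(f"{plex}-plex: about {expected / 1e6:.1f} million reads per sample")
```

Moving from 12-plex to 96-plex cuts the expected yield per sample by a factor of eight, so the plex level has to be chosen against the coverage each sample needs.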
Demultiplexing
[Figure: CASAVA splits the flat files (… × read length) into per-sample output (… × samples); icons from visualpharm.com]
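The idea behind demultiplexing can be sketched as follows: each read carries an index (barcode) sequence, and the read is assigned to the sample whose barcode it matches. This is an illustration of the principle, not CASAVA's implementation; the sample names, the barcodes in the usage example, and the one-mismatch tolerance are assumptions.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode matches the index read, or None.

    A read is assigned only if exactly one barcode is within the tolerance.
    """
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (name, index_read, sequence) tuples by sample."""
    bins = defaultdict(list)
    for name, index_read, sequence in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins[sample or "Undetermined"].append((name, sequence))
    return dict(bins)


# Hypothetical sample sheet and reads, for illustration only.
barcodes = {"ATCACG": "sampleA", "CGATGT": "sampleB"}
reads = [
    ("r1", "ATCACG", "ACGT"),  # exact match to sampleA
    ("r2", "CGATGA", "TTTT"),  # one mismatch to sampleB
    ("r3", "GGGGGG", "CCCC"),  # matches nothing
]
result = demultiplex(reads, barcodes)
for sample, members in sorted(result.items()):
    print(sample, [name for name, _ in members])
```

Reads that match no barcode, or more than one, end up in an "Undetermined" bin rather than being guessed at.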
CASAVA 1.8.0 program call
configureBclToFastq.pl \
    --input-dir Data/Intensities/BaseCalls/ \
    --output-dir Data/Unaligned \
    --sample-sheet SampleSheet.csv \
    --use-bases-mask y100,I6nn,Y100 > file.log 2>&1
cd Data/Unaligned
qsub -pe make 16 -j y -v $MYPATH -o qsub.out -cwd -N fastq -b y \
    make -j 16
Runtime: ~6h
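The --use-bases-mask string in the call above describes how each sequencing cycle is used. The sketch below expands such a mask so the cycle counts can be checked by eye. It assumes the usual reading of the mask (Y = use the cycle, I = index cycle, N = ignore the cycle, a number repeats the preceding letter, letters are case-insensitive); the function name is hypothetical and this is not part of CASAVA.

```python
import re


def expand_bases_mask(mask: str) -> list[str]:
    """Expand e.g. 'y100,I6nn,Y100' into one string per read, one letter per cycle.

    ASSUMED meaning: Y = use, I = index, N = ignore; a number repeats the
    preceding letter; letters are treated case-insensitively.
    """
    expanded = []
    for read_mask in mask.split(","):
        cycles = ""
        for symbol, count in re.findall(r"([YyNnIi])(\d*)", read_mask):
            cycles += symbol.upper() * (int(count) if count else 1)
        expanded.append(cycles)
    return expanded


mask_reads = expand_bases_mask("y100,I6nn,Y100")
for number, cycles in enumerate(mask_reads, start=1):
    print(
        f"read {number}: {len(cycles)} cycles, "
        f"{cycles.count('Y')} used, {cycles.count('I')} index, "
        f"{cycles.count('N')} ignored"
    )
```

Read this way, the mask on the slide describes two 100-cycle reads with an 8-cycle index read in between, of which the first six cycles carry the barcode.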
Fastq files
@HWI-ST301_0112:1:1:1169:2044#0/1
CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT
+HWI-ST301_0112:1:1:1169:2044#0/1
dddc\dd^dd`acacdacd`ecdedabdcdddcc\``\`bTa\
36 36 36 35 28 …
ASCII    @ .. ~
DEC      64 .. 126
PHRED    0 .. 62
Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. PMID:20015970
Phred scores are estimates only!
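The quality line can be decoded by hand: each character's ASCII code minus the offset gives the Phred score, and the standard Phred definition Q = -10·log10(P) turns a score into an estimated error probability. A minimal sketch, assuming the offset of 64 shown in the table on this slide (Sanger-style FASTQ files use an offset of 33 instead, see Cock et al.); the function names are made up for illustration.

```python
def decode_phred64(quality: str) -> list[int]:
    """Convert a quality string to Phred scores, ASSUMING an ASCII offset of 64."""
    return [ord(ch) - 64 for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


# First five quality characters of the record on the slide: d d d c \
scores = decode_phred64("dddc\\")
print(scores)  # [36, 36, 36, 35, 28]

for q in scores:
    print(f"Q{q}: estimated error probability {error_probability(q):.5f}")
```

The decoded values match the "36 36 36 35 28" shown on the slide, which is a quick way to confirm that a file really uses the offset you think it does.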
Fastq – PHRED quality
• Pathological
Fastq: Quality control
• Base-pair quality score
• Adapter contamination
• Uneven Amplification
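All three checks can be approximated directly from a fastq file. The sketch below is a toy illustration, not a replacement for a dedicated QC tool: it assumes four-line records, a quality offset of 64, exact adapter matching, and it uses the fraction of repeated sequences as a crude proxy for uneven amplification. The adapter string and the two toy reads are made up for the example.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record fastq handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_quality_per_position(records, offset=64):
    """Mean Phred score at each read position (offset 64 is an ASSUMPTION)."""
    totals, counts = [], []
    for _, _, quality in records:
        for i, ch in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def adapter_fraction(records, adapter):
    """Fraction of reads containing the adapter sequence (exact match only)."""
    if not records:
        return 0.0
    return sum(adapter in seq for _, seq, _ in records) / len(records)


def duplicate_fraction(records):
    """Fraction of reads whose sequence was already seen (crude duplication proxy)."""
    if not records:
        return 0.0
    unique = len({seq for _, seq, _ in records})
    return 1 - unique / len(records)


# Two made-up reads; the adapter string is illustrative, not taken from the slides.
adapter = "AGATCGGAAGAGC"
toy = (
    "@r1\nACGT" + adapter + "\n+\n" + "h" * 17 + "\n"
    "@r2\nACGTACGTACGTACGTA\n+\n" + "d" * 17 + "\n"
)
records = list(read_fastq(StringIO(toy)))

per_position = mean_quality_per_position(records)
print("mean quality at position 1:", per_position[0])
print("adapter fraction:", adapter_fraction(records, adapter))
print("duplicate fraction:", duplicate_fraction(records))
```

On real data the per-position means are usually plotted, since a drop towards the end of the read is easier to spot in a graph than in a list of numbers.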
Three things to remember
1. Don’t be fooled by marketing
2. Fastq files are not directly usable
3. Basic-run QC can be made from fastq file
“All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production”
Ewan Birney European Bioinformatics Institute
Wellcome Trust
David S. Roos Bioinformatics--Trying to Swim in a Sea of Data; Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261 DOI: 10.1126/science.291.5507.1260
Next Week:
Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.
Walk-in-clinic
Brief history of sequencing
• First Generation: Sanger sequencing
• Second Generation: amplified molecule sequencing
• Third Generation: single molecule sequencing
* Discussion about category
Helicos
• true Single Molecule Sequencing (tSMS)™ technology
  – Sequencing by synthesis, but sensitive enough that no amplification is needed
Life Technologies - Ion Torrent
• A hydrogen ion is released when a nucleotide is incorporated, and a semiconductor sensor measures it
• The base is identified by which nucleotide wash cycle the signal coincides with
PacBio
• Immobilized polymerase at the bottom of a well
• Fluorescent nucleotides float around and if they are incorporated they are held still for tens of milliseconds, which is the signal that is recorded
• No upper limit on the length
http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4
Nanopore
• The molecule is pulled through a pore, and the change in the membrane charge caused by the different nucleotides is recorded.
http://www.nanoporetech.com/sections/index/82