In Silico Primer Design and Simulation for Targeted High-Throughput Sequencing
I519 – Fall 2010
Adam Thomas, Kanishka Jain, Tulip Nandu
BACKGROUND
- Major milestones
  - Molecular structure of DNA
  - Human Genome Project
  - High-throughput sequencing (HTS)
- HTS extended common experiments from single genes to entire genomes
  - Low cost
  - Multiple samples in every run (e.g. a 454 sequencer can sequence 400-600 Mb)
- Primers are short strands of nucleotides that serve as the starting point of DNA synthesis.
  - Approximately 20-25 nucleotides long.
  - Determine which region of the DNA strand is amplified.
  - Complementary to the target DNA strand.
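To make "complement of the DNA strand" concrete, here is a minimal Python sketch of how a forward and a reverse primer relate to a template. The function names, the naive "take the ends of the template" strategy and the default length of 20 nt are assumptions for illustration only; real primer design (e.g. with Primer3) also weighs melting temperature, GC content and secondary structure.

```python
from typing import Tuple

# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def naive_primer_pair(template: str, length: int = 20) -> Tuple[str, str]:
    """Pick illustrative forward/reverse primers from the ends of a template.

    The forward primer equals the 5' end of the given strand, so it anneals
    to the opposite strand. The reverse primer is the reverse complement of
    the 3' end, so it anneals to the given strand.
    """
    if len(template) < 2 * length:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

Both primers are written 5'->3', which is why the reverse primer has to be reverse-complemented rather than simply complemented.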
PCR
- Polymerase chain reaction (PCR): a technique to amplify a small region of DNA.
- Three-step process:
  - Denaturation: heat (approx. 90°C) separates the double strand into two single strands.
  - Annealing: primers bind to the individual strands (occurs at 45 to 60°C).
  - Extension: the temperature is raised to 72°C and the Taq DNA polymerase enzyme replicates the DNA strands.
- At the end of the first cycle the process is repeated, for approximately 30 to 40 cycles in total.
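The reason 30 to 40 cycles are enough is that each cycle can at most double the number of target copies. The sketch below computes this idealized yield in Python; the `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling), and the model ignores reagent depletion and the plateau phase of a real reaction.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    Each cycle multiplies the copy number by (1 + efficiency), where
    efficiency is the fraction of templates copied per cycle (0.0 to 1.0).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles
```

With perfect doubling, a single template yields 2^30 (about 10^9) copies after 30 cycles.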
CURRENT PROCESS
- Primer3 is used to design primers for PCR.
- The primers then need to be validated. Validation is performed by simulation, alignment and re-assembly.
- MetaSim is used to simulate reads from the expected PCR amplicons.
- CAP3 is used to re-assemble the simulated sequences.
- BLASTing the simulated sequences against the original sequence gives a fairly accurate measure of how well the primers will perform.
ISSUES FACED WITH CURRENT PROCESS
- Each tool uses different input and output file formats.
- Users have to convert file formats manually to move data from one tool to the next.
- No existing tool integrates all of these functions and supports high-throughput analysis.
GOAL
Integrate the whole process involved in a high-throughput sequencing experiment and keep track of the parameters that are entered or changed.
OBJECTIVES
- Provide a way to visualize the primers and amplicons in relation to the genome, edit the primers manually, and see how the edits affect the simulation.
- Optimize the high-throughput process by minimizing the number of reads needed by the ‘454 process’ while still being able to assemble the sequence.
- Validate the simulated amplicon reads to check whether the predicted simulation is in order, and rectify any problems found.
PROPOSED SOLUTION
VISUALIZATION TOOL
- GBrowse
  - Popular and open source.
  - Well-defined plugin architecture.
  - A plugin to design primers using Primer3 is already available.
PRIMER DESIGN
- The PrimerDesign.pm plugin already exists for GBrowse. It designs primers using Primer3.
- Designed to amplify only one specific region of DNA, with as few primers as possible and no overlapping amplicons.
- Tweaked to take two additional input parameters: Amplicon Overlap and Max Amplicon Length.
- Once primers are created using GBrowse, they are output in Feature File Format (FFF).
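The two added parameters, Amplicon Overlap and Max Amplicon Length, suggest a tiling of the target region. The Python sketch below shows one way such windows could be laid out; it is not the logic of PrimerDesign.pm, the function name and 1-based inclusive coordinates are assumptions, and a real design would still need Primer3 to find acceptable primers near each window boundary.

```python
from typing import List, Tuple


def tile_amplicons(start: int, end: int, max_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Cover [start, end] (1-based, inclusive) with overlapping windows.

    Every window is at most max_len bases long and shares exactly `overlap`
    bases with the window before it.
    """
    if max_len <= 0 or overlap < 0 or overlap >= max_len:
        raise ValueError("need max_len > overlap >= 0")
    if end < start:
        raise ValueError("end must not be smaller than start")
    windows = []
    pos = start
    while True:
        win_end = min(pos + max_len - 1, end)
        windows.append((pos, win_end))
        if win_end == end:
            break
        # Step back by `overlap` bases so consecutive amplicons overlap.
        pos = win_end - overlap + 1
    return windows
```

For a 1000 bp region with a maximum amplicon length of 400 and an overlap of 50, this yields three amplicons, each sharing 50 bases with its neighbour.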
PRIMER VALIDATION - SIMULATION
- Simulation is performed using MetaSim.
- MetaSim:
  - Generates sets of synthetic reads or mate-pairs based on adaptable sequencing error models (e.g. for Sanger chemistry, Roche's 454 and Illumina, formerly Solexa).
  - Can be controlled via a graphical user interface or in command-line mode.
SIMULATION
- A function written in Perl invokes MetaSim in command-line mode.
- Algorithm:
  - Read the FFF file and extract the primer coordinates.
  - Extract the corresponding sequence from the original sequence.
  - Run the MetaSim simulation using command-line options.
  - Each extracted sequence yields its own FASTA file containing multiple simulated sequences.
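The algorithm above can be sketched in Python as follows (the project itself used Perl). Several things here are assumptions: the input is a simplified, hypothetical tab-separated layout (`name`, `start`, `end`, 1-based inclusive) rather than the real FFF syntax, and the simulator command line is supplied by the caller, because the exact MetaSim options are not given in the slides and should be taken from the MetaSim manual.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple

Amplicon = Tuple[str, int, int]


def parse_amplicons(lines: Iterable[str]) -> List[Amplicon]:
    """Parse 'name<TAB>start<TAB>end' lines (hypothetical simplified layout)."""
    amplicons = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, start, end = line.split("\t")[:3]
        amplicons.append((name, int(start), int(end)))
    return amplicons


def extract(reference: str, start: int, end: int) -> str:
    """Return reference[start..end] using 1-based inclusive coordinates."""
    return reference[start - 1:end]


def write_amplicon_fastas(reference: str, amplicons: Iterable[Amplicon], out_dir: str) -> List[Path]:
    """Write one single-record FASTA file per amplicon and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, start, end in amplicons:
        path = out / f"{name}.fasta"
        path.write_text(f">{name} {start}..{end}\n{extract(reference, start, end)}\n")
        paths.append(path)
    return paths


def run_simulator(command_prefix: Sequence[str], fasta_path: Path) -> None:
    """Run an external read simulator on one FASTA file.

    command_prefix is the caller-supplied program name plus options
    (for MetaSim, consult its manual); the FASTA path is appended last.
    """
    subprocess.run(list(command_prefix) + [str(fasta_path)], check=True)
```

Keeping one FASTA file per amplicon mirrors the last step of the algorithm, where each extracted sequence produces its own file of simulated reads.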
ASSEMBLY
- A Perl function invokes CAP3 through its command-line interface.
- Each file generated by the MetaSim simulation is input into CAP3, which then assembles the contigs.
- CAP3:
  - A sequence assembly program that allows users to assemble a set of sequence reads into contigs.
  - Takes as input a file of sequence reads in FASTA format (here, the simulated sequences).
  - If a header contains a dot (‘.’), CAP3 requires that the names of reads sequenced from the same subclone contain the same substring up to the first dot.
  - Can be invoked using a command-line interface.
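A small Python sketch of the two CAP3 points above: checking how read names group by the substring before the first dot, and invoking CAP3 on one FASTA file. It assumes a `cap3` executable on the PATH and relies on CAP3 writing its contigs to `<input>.cap.contigs`; the helper names and the `.cap.log` file used to capture standard output are hypothetical choices made for this example.

```python
import subprocess
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def subclone_key(read_name: str) -> str:
    """Substring of a read name up to (not including) the first dot."""
    return read_name.split(".", 1)[0]


def group_by_subclone(read_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group read names the way the dot naming rule implies."""
    groups = defaultdict(list)
    for name in read_names:
        groups[subclone_key(name)].append(name)
    return dict(groups)


def run_cap3(fasta_path: str, cap3: str = "cap3") -> Path:
    """Assemble one FASTA file with CAP3 and return the contigs file path.

    CAP3 prints the assembly report to standard output, captured here in a
    log file (hypothetical name), and writes contigs next to the input.
    """
    with open(fasta_path + ".cap.log", "w") as log:
        subprocess.run([cap3, fasta_path], stdout=log, check=True)
    return Path(fasta_path + ".cap.contigs")
```

Running `group_by_subclone` over the simulated read names before assembly is a cheap way to spot names that would accidentally be treated as coming from the same subclone.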
BLAST
- Assembled contigs are then BLASTed against the original sequence for validation.
- GBrowse accepts the assembled sequence and BLASTs it against the original sequence.
- This plugin requires 4 steps:
  - Exporting the assembled contigs and the original sequence from GBrowse.
  - Creating a BLAST database.
  - BLASTing the contigs against the sequence.
  - Importing the results back into GBrowse.
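The two middle steps (creating a database and BLASTing the contigs) can be sketched in Python as below. This assumes the NCBI BLAST+ command-line tools (`makeblastdb`, `blastn`) are installed; the slides do not say which BLAST distribution the project used, and the file names and helper functions are placeholders. Exporting from and importing into GBrowse is not covered.

```python
import subprocess
from typing import Dict, List, Union

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blast_commands(reference_fasta: str, contigs_fasta: str, db_name: str, out_path: str) -> List[List[str]]:
    """Build the makeblastdb and blastn command lines (BLAST+ assumed)."""
    make_db = ["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl", "-out", db_name]
    search = ["blastn", "-query", contigs_fasta, "-db", db_name, "-outfmt", "6", "-out", out_path]
    return [make_db, search]


def run_blast(reference_fasta: str, contigs_fasta: str, db_name: str, out_path: str) -> None:
    """Create the database, then BLAST the contigs against it."""
    for command in blast_commands(reference_fasta, contigs_fasta, db_name, out_path):
        subprocess.run(command, check=True)


def parse_hit(line: str) -> Dict[str, Union[str, int, float]]:
    """Parse one tab-separated -outfmt 6 line into a typed dictionary."""
    hit: Dict[str, Union[str, int, float]] = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
    for key in ("pident", "evalue", "bitscore"):
        hit[key] = float(hit[key])
    for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
        hit[key] = int(hit[key])
    return hit
```

The percent identity (`pident`) and aligned length of each contig's best hit give a simple measure of how faithfully the simulated amplicons reassemble into the original sequence.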
DEMO
QUESTIONS