in Silico Primer Design and Simulation for Targeted High Throughput Sequencing I519 – FALL 2010 Adam Thomas, Kanishka Jain, Tulip Nandu


Page 1: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

In Silico Primer Design and Simulation for Targeted High Throughput Sequencing

I519 – Fall 2010
Adam Thomas, Kanishka Jain, Tulip Nandu

Page 2: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

BACKGROUND

• Major milestones
  • Molecular structure of DNA
  • Human Genome Project
  • High-Throughput Sequencing (HTS)
• HTS transformed common experiments on single genes into experiments on entire genomes
  • Low cost
  • Multiple samples in every run (e.g., a 454 sequencer can sequence 400-600 Mb)

Page 3: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

BACKGROUND

• A primer is a short strand of nucleotides that serves as the starting point of DNA synthesis.
• Approximately 20-25 nucleotides long.
• Determines which region of the DNA strand is amplified.
• Complementary to the target DNA strand.
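As a small illustration of what "complementary" means here, the following Python sketch derives a naive primer pair for a target region. The function names, the default primer length of 20, and the rule of simply taking the two ends of the target are assumptions made for illustration; a real design tool such as Primer3 also weighs melting temperature, GC content, and other criteria.

```python
# Illustrative sketch only: names, primer length, and selection rule are assumptions.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template: str, start: int, end: int, length: int = 20):
    """Pick naive forward/reverse primers for template[start:end] (0-based, end exclusive).

    The forward primer copies the 5' end of the target on the given strand.
    The reverse primer is the reverse complement of the 3' end of the target,
    so it anneals to the given strand and is extended back toward the forward primer.
    """
    target = template[start:end]
    if len(target) < 2 * length:
        raise ValueError("target region is too short for two primers")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse
```

For example, `primer_pair("ACGTACGTAA", 0, 10, length=3)` takes `ACG` as the forward primer and the reverse complement of `TAA` as the reverse primer.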

Page 4: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PCR

• Polymerase Chain Reaction
• Technique to amplify a small region of DNA
• 3-step process:
  • Denaturation
  • Annealing
  • Extension
• Process repeated for approximately 30 to 40 cycles.

Page 5: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PCR

• Denaturation

Heat (approx. 90°C) separates the double strand into two single strands.

Page 6: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PCR

• Annealing

Primers bind to the individual strands (occurs at 45 to 60°C).

Page 7: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PCR

• Extension

The temperature is raised to 72°C and the Taq DNA polymerase enzyme replicates the DNA strands.

Page 8: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PCR

• End of first cycle

Process repeated for approximately 30 to 40 cycles.
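To give a feel for why 30 to 40 cycles are enough, the sketch below computes the idealized yield, assuming every cycle multiplies the number of copies by (1 + efficiency). The function name and the efficiency parameter are assumptions for illustration; real reactions fall short of perfect doubling and eventually plateau.

```python
def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy count by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; lower values model incomplete cycles.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles
```

Under perfect doubling, a single template molecule gives `2 ** 30` (about one billion) copies after 30 cycles.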

Page 9: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

CURRENT PROCESS

Page 10: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

CURRENT PROCESS

• Primer3 is used to design primers for PCR.
• The primers then need to be validated. Validation is performed by simulation, alignment and re-assembly.
• MetaSim is used to simulate PCR and create the expected amplicons.
• CAP3 is used for re-assembly of the simulated sequences.
• BLASTing the simulated sequences against the original sequence gives a fairly accurate measure of how well the primers will perform.

Page 11: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

ISSUES FACED WITH CURRENT PROCESS

• Each tool uses different input and output file formats.
• Users have to convert file formats manually to move from one tool to the next.
• No existing tool integrates all of these functions and supports high-throughput analysis.

Page 12: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

GOAL

Integrate the whole process involved in a high-throughput sequencing experiment and keep track of the parameters that are entered or changed.

Page 13: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

OBJECTIVES

• Visualize the primers and amplicons in relation to the genome, allow the primers to be edited manually, and show how the edits affect the simulation.
• Optimize the high-throughput process by minimizing the number of reads needed by the 454 process while still being able to assemble the sequence.
• Validate the simulated amplicon reads to check whether the simulation behaves as predicted, and rectify any problems found.

Page 14: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PROPOSED SOLUTION

Page 15: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

VISUALIZATION TOOL

• GBrowse
  • Popular and open source.
  • Well-defined plugin architecture.
  • A plugin to design primers using Primer3 is already available.

Page 16: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PRIMER DESIGN

• The PrimerDesign.pm plugin already exists for GBrowse. It designs primers using Primer3.
• It was designed to amplify only one specific region of DNA, with as few primers as possible and no overlapping amplicons.
• It was tweaked to take two additional input parameters: Amplicon Overlap and Max Amplicon Length.
• Once primers are created using GBrowse, they are output in Feature File Format (FFF).
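One way the two added parameters could interact is sketched below in Python: the target region is covered by windows no longer than the maximum amplicon length, and each window starts a fixed number of bases before the previous one ends. This is an assumption about the general idea, not the plugin's actual logic; the function name and the 1-based inclusive coordinates are illustrative, and in practice Primer3 decides where the primers can really be placed.

```python
def tile_amplicons(region_start, region_end, max_amplicon_length, amplicon_overlap):
    """Cover [region_start, region_end] (1-based, inclusive) with overlapping windows.

    Consecutive windows share exactly `amplicon_overlap` bases, and no window is
    longer than `max_amplicon_length`. Returns a list of (start, end) tuples.
    """
    if region_end < region_start:
        raise ValueError("region_end must not be smaller than region_start")
    if max_amplicon_length <= 0:
        raise ValueError("max_amplicon_length must be positive")
    if not 0 <= amplicon_overlap < max_amplicon_length:
        raise ValueError("amplicon_overlap must be in [0, max_amplicon_length)")

    amplicons = []
    start = region_start
    while True:
        end = min(start + max_amplicon_length - 1, region_end)
        amplicons.append((start, end))
        if end == region_end:
            break
        # Step back by the overlap so adjacent amplicons share that many bases.
        start = end - amplicon_overlap + 1
    return amplicons
```

With a 1,000-base region, a maximum amplicon length of 400 and an overlap of 50, this yields the windows 1-400, 351-750 and 701-1000.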

Page 17: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

PRIMER VALIDATION - SIMULATION

• Simulation is performed using MetaSim.
• MetaSim:
  • Generates sets of synthetic reads or mate-pairs based on adaptable sequencing error models (e.g., for Sanger chemistry, Roche's 454 and Illumina, formerly Solexa).
  • Can be controlled via a graphical user interface or in command-line mode.

Page 18: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

SIMULATION

• A function written in Perl invokes MetaSim through its command-line mode.
• Algorithm:
  • Read the FFF file and extract the primer coordinates.
  • Extract the corresponding amplicon sequence from the original sequence.
  • Run the MetaSim simulation using command-line options.
  • Each amplicon sequence generates its own FASTA file containing multiple simulated reads.
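The first two steps of the algorithm above can be sketched as follows. The original function was written in Perl; Python is used here for brevity. The sketch assumes a simplified, hypothetical FFF-like layout in which each data line holds a type, a name and a `start..end` location separated by tabs; the file actually written by the plugin may differ. The MetaSim call itself is left out because the slides do not state which command-line options were used.

```python
import re

# Matches a "start..end" location such as "101..500" (assumed layout).
LOCATION = re.compile(r"(\d+)\.\.(\d+)")


def parse_amplicon_coordinates(fff_text):
    """Return [(name, start, end), ...] from simplified FFF-like text."""
    coords = []
    for line in fff_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and [section] headers.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        match = LOCATION.search(fields[2])
        if match:
            coords.append((fields[1], int(match.group(1)), int(match.group(2))))
    return coords


def extract_amplicons(reference, coords):
    """Slice each amplicon out of the reference (1-based, inclusive coordinates)."""
    return {name: reference[start - 1:end] for name, start, end in coords}


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record, ready to hand to a simulator."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

Each record produced by `to_fasta` would then be written to its own file and passed to the simulator, matching the one-file-per-amplicon step described above.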

Page 19: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

ASSEMBLY

• A Perl function invokes CAP3 through its command-line interface.
• Each file generated by the MetaSim simulation is input into CAP3, which then assembles the contigs.
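A minimal Python sketch of such a wrapper is shown below (the original was written in Perl). It assumes that the `cap3` executable is on the PATH, that it takes the reads file as its first argument, and that it writes the contigs to `<input>.cap.contigs`, which is CAP3's usual behaviour. The function names and the `.cap.out` report file are illustrative choices.

```python
import subprocess
from pathlib import Path


def cap3_command(reads_fasta, executable="cap3"):
    """Build the argument list for one CAP3 run (no extra options assumed)."""
    return [executable, str(reads_fasta)]


def expected_contigs_path(reads_fasta):
    """Path where CAP3 is assumed to write the assembled contigs."""
    return Path(str(reads_fasta) + ".cap.contigs")


def assemble_all(read_files, executable="cap3"):
    """Run CAP3 once per simulated reads file and return the contig file paths."""
    contig_files = []
    for reads in read_files:
        # CAP3 prints its alignment report to stdout; keep it next to the reads.
        with open(str(reads) + ".cap.out", "w") as report:
            subprocess.run(cap3_command(reads, executable), stdout=report, check=True)
        contig_files.append(expected_contigs_path(reads))
    return contig_files
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.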

Page 20: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

ASSEMBLY

• CAP3
  • CAP3 is a sequence assembly program that assembles a set of short reads into contigs.
  • Takes as input a file of sequence reads in FASTA format (here, the simulated sequences).
  • If a header contains a dot ('.'), CAP3 requires that the names of reads sequenced from the same subclone contain the same substring up to the first dot.
  • Can be invoked using a command-line interface.
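The naming rule above matters when read names are generated by a simulator, because a stray dot can make unrelated reads look like they come from the same subclone. The Python sketch below shows one way to check this; the function names and the underscore-based naming scheme are assumptions for illustration.

```python
from collections import defaultdict


def subclone_key(read_name):
    """Return the substring up to the first dot, or None if the name has no dot."""
    head, dot, _ = read_name.partition(".")
    return head if dot else None


def group_reads_by_subclone(read_names):
    """Group dotted read names by the prefix CAP3 would treat as the subclone."""
    groups = defaultdict(list)
    for name in read_names:
        key = subclone_key(name)
        if key is not None:
            groups[key].append(name)
    return dict(groups)


def safe_read_name(amplicon, index):
    """Build a dot-free read name so that no subclone pairing is implied."""
    return f"{amplicon}_read{index}"
```

Listing the groups before assembly makes it easy to spot reads that would be tied together only by accident.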

Page 21: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

BLAST

• Assembled contigs are then BLASTed against the original sequence for validation.
• GBrowse accepts the assembled sequence and BLASTs it against the original sequence.
• This plugin requires 4 steps:
  • Exporting the assembled contigs and the original sequence from GBrowse.
  • Creating a BLAST database.
  • BLASTing the contigs against the sequence.
  • Importing the result back into GBrowse.
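The two middle steps can be sketched in Python as follows. The sketch assumes the BLAST+ programs `makeblastdb` and `blastn` with tabular output (`-outfmt 6`); the slides do not say which BLAST release or options the plugin uses, so the commands, file names and the coverage measure are illustrative assumptions. Export from and import into GBrowse are not shown.

```python
def makeblastdb_command(reference_fasta, db_name):
    """Command that builds a nucleotide BLAST database from the original sequence."""
    return ["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_command(contigs_fasta, db_name, out_path):
    """Command that aligns the assembled contigs against that database."""
    return ["blastn", "-query", contigs_fasta, "-db", db_name,
            "-outfmt", "6", "-out", out_path]


def parse_tabular_hits(text):
    """Parse 12-column tabular BLAST output into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append({
            "query": f[0],
            "subject": f[1],
            "identity": float(f[2]),
            "length": int(f[3]),
            "s_start": int(f[8]),
            "s_end": int(f[9]),
        })
    return hits


def covered_fraction(hits, reference_length):
    """Fraction of reference positions covered by at least one hit."""
    covered = set()
    for hit in hits:
        low, high = sorted((hit["s_start"], hit["s_end"]))
        covered.update(range(low, high + 1))
    return len(covered) / reference_length
```

The command lists can be passed to `subprocess.run(..., check=True)`; the covered fraction is one simple way to summarize how much of the original sequence the assembled contigs recover.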

Page 22: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

DEMO

Page 23: in Silico  Primer Design and Simulation for Targeted High Throughput Sequencing

QUESTIONS