Integrated DNA Technologies
Use of NCBI Databases in qPCR Assay Design
Elisabeth Wagner, PhD
Scientific Applications Specialist
2
Session Outcomes
You will:
  Learn which NCBI tools are useful for designing qPCR assays
  Become proficient using tools for qPCR design in the IDT SciTools® suite
  Navigate the features and tools available on the NCBI website
    Obtain sequence information for your gene of interest
    Perform a BLAST search for assay specificity
    Search for SNPs
  Understand how to proceed with a basic qPCR design
3
qPCR Design Covers A Lot of Ground
There are many uses for quantitative PCR. Some examples:
  Gene expression
  Copy number variation
  Genotyping
  Multi-species analysis
  Splice variant specific (or common) expression
We will address the general considerations for design in this session, and cover more specific examples later this afternoon.
4
SciTools® Overview
http://www.idtdna.com/pages/scitools
Several tools are available in the IDT SciTools® suite to assist with qPCR design:
1. RealTime PCR Tool
2. PrimerQuest® Tool
3. OligoAnalyzer® Tool
4. PrimeTime® Predesigned qPCR Assay Database
5
NCBI Databases Overview:
1. Obtain sequence information for your gene of interest: NCBI Nucleotide or Gene
2. Perform a BLAST search for assay specificity: NCBI BLAST
3. Search for SNPs: NCBI dbSNP
NCBI lets you access all of the information needed for design in one location.
6
Using NCBI Databases for Custom qPCR Assay Design
NCBI Overview (National Center for Biotechnology Information)
Founded in 1988 as part of the United States National Library of Medicine
Houses a series of databases relevant to biotechnology and biomedicine
  Curates GenBank, a database of over 1 × 10^12 bp of DNA sequences
  Gene database, which integrates gene-specific information from numerous species
  dbSNP, which is a database of reported single nucleotide polymorphisms (SNPs)
Contains the BLAST sequence similarity search program
Maintains PubMed, a journal database for biomedical literature
Much, much more information!
7
NCBI Database Search: Sequence Information for qPCR Assay Design
http://www.ncbi.nlm.nih.gov/
8
NCBI Sequence Files
Files:
  Can be entered by anyone
  May or may not be checked for accuracy
  May contain contaminated sequence (plasmid or other)
  May contain annotation errors
Accession numbers: Letters at the beginning indicate the type of file
Nucleotide sequences start with 1 or 2 letters:
9
The RefSeq Database
  non-redundant
  explicitly linked nucleotide and protein sequences
  ongoing curation by NCBI staff and collaborators, with reviewed records indicated
  includes data validation and format consistency
  distinct accession numbers
    all accessions include an underscore '_' character
  different versions are tracked
10
RefSeq Accession Numbers
mRNAs and Proteins
  NM_123456  Curated mRNA
  NP_123456  Curated protein
  NR_123456  Curated non-coding RNA
  XM_123456  Predicted mRNA
  XP_123456  Predicted protein
  XR_123456  Predicted non-coding RNA
Gene Records
  NG_123456  Reference genomic sequence
Chromosome
  NC_123455  Microbial replicons, organelle genomes, human chromosomes
  AC_123455  Alternate assemblies
Assemblies
  NT_123456  Contig
  NW_123456  WGS supercontig
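Because the accession prefix encodes the record type, a short lookup can tell you whether a record is curated or predicted before you design against it. The sketch below is illustrative only: the function name is hypothetical, and the table covers just the prefixes listed above, not every RefSeq prefix.

```python
# Prefix meanings copied from the accession list above (not a complete RefSeq list).
REFSEQ_PREFIXES = {
    "NM": "Curated mRNA",
    "NP": "Curated protein",
    "NR": "Curated non-coding RNA",
    "XM": "Predicted mRNA",
    "XP": "Predicted protein",
    "XR": "Predicted non-coding RNA",
    "NG": "Reference genomic sequence",
    "NC": "Chromosome (microbial replicons, organelle genomes, human chromosomes)",
    "AC": "Chromosome (alternate assemblies)",
    "NT": "Contig",
    "NW": "WGS supercontig",
}


def describe_refseq(accession):
    """Return the record type for a RefSeq-style accession, or None if unknown."""
    prefix, separator, _ = accession.strip().upper().partition("_")
    if not separator:
        return None  # no underscore, so this is not a RefSeq accession
    return REFSEQ_PREFIXES.get(prefix)


print(describe_refseq("NM_123456"))
print(describe_refseq("XM_123456.2"))
```

An accession without an underscore (for example a plain GenBank submission) returns None, which is a cue to look for a curated RefSeq record instead.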
11
NCBI Gene Database Information: Gene Search
13
Sequence Data Searches Using Nucleotide
Sequence Files
  mRNA and genomic
  Transcript variants
http://www.ncbi.nlm.nih.gov/nuccore
14
GenBank information
15
Data Retrieval: Graphics View
16
Data Retrieval: FASTA Sequence Format
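If you save a record in FASTA format, the header line starts with '>' and the sequence follows on one or more lines. A minimal parser, assuming well-formed FASTA text, might look like the sketch below. The record name and the function name are hypothetical; the sequence is only a placeholder.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of header (without '>') -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in records.items()}


example = """>NM_123456.1 hypothetical example record
CGATCGGGCATCACACAAAG
TTATGTAGTAGAAAT
"""
sequences = parse_fasta(example)
for name, sequence in sequences.items():
    print(name, len(sequence))
```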
17
18
Using PrimerQuest® Tool for Custom qPCR Designs
19
PrimerQuest® Tool for Generating Custom qPCR Designs
Highly customizable tool
20
You Can Use NCBI Accession Number or FASTA Sequence
21
Once Sequence Entered, 3 Defaults Become Available
Often you will need to adjust the parameters of the tool to meet experimental design requirements
22
PrimerQuest® Tool Assay Output
23
Parameter Changes Depend on the Assay Required
Before changing anything, make sure you have selected the correct assay
Sometimes you simply need to increase the number of designs returned
It is unlikely that you will need to change these parameters
24
Directing the Design to a Specific Region
Target a particular “junction”
25
Examples
Excluded region 260-280
Excluded region-probe 260-280
Target region 260-280
26
Changing Primer/Probe Parameters
If the target is particularly biased (AT or GC rich), you may need to change primer/probe parameters (e.g., length)
27
Once Initial Design Completed, Back to NCBI
Use NCBI tools to:
  Check whether the assay is specific (BLAST)
  Ensure there are no SNPs to worry about (dbSNP)
Use the IDT OligoAnalyzer® Tool to:
  Check primers (and probe) for secondary structure and dimer formation
28
Using NCBI BLAST to Check for Primer Specificity
29
What is BLAST?—Getting to BLAST
http://www.ncbi.nlm.nih.gov/
Or http://blast.ncbi.nlm.nih.gov/Blast.cgi
30
What is BLAST (Basic Local Alignment Search Tool)?
Provided by the National Center for Biotechnology Information (NCBI)
Aligns a user-defined query (sequence) to a wide variety of databases
Can translate the query or the database to align sequences
Can align 2 or more sequences together
Uses a heuristic algorithm to create alignments very quickly
  Breaks sequences into “words” and searches the database for matches
  Reassembles these matches based on the criteria entered
31
What is BLAST?—Basic BLAST
32
How BLAST Works—Words
BLAST divides the query sequence into subsets called “words”, which the algorithm uses to perform the alignment
Example (35 nt sequence): CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT
All possible words that can be generated from the sequence are used for the alignment
The max number of words for this sequence is 29
7-letter word
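The word count follows from sliding a window of the word size along the query: a sequence of length L yields L - W + 1 overlapping words of size W. The sketch below only illustrates that enumeration for the example sequence; it is not BLAST's actual implementation, and the function name is hypothetical.

```python
def blast_words(sequence, word_size=7):
    """List every overlapping word of length word_size in the sequence."""
    last_start = len(sequence) - word_size
    return [sequence[i:i + word_size] for i in range(last_start + 1)]


query = "CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT"  # the 35 nt example above
words = blast_words(query)
print(len(words))  # 35 - 7 + 1 = 29
print(words[:3])   # ['CGATCGG', 'GATCGGG', 'ATCGGGC']
```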
33
Overview—Definitions
Hit: A sequence to which the query is aligned and is returned in the results of BLAST
Identity: the extent of exact matches between 2 sequences (e.g., ACGT and ACGG have 75% identity)
Similarity = Positives (in BLAST scoring)
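The identity figure is simply matching positions divided by aligned length. A minimal sketch for two ungapped, equal-length sequences follows; the function name is hypothetical, and real BLAST alignments may also contain gaps, which this does not handle.

```python
def percent_identity(first, second):
    """Percent of positions that match exactly between two aligned sequences."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first.upper(), second.upper()))
    return 100.0 * matches / len(first)


print(percent_identity("ACGT", "ACGG"))  # 75.0
```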
34
How BLAST Works—Scores
The BLAST raw score is converted to a bit score for each alignment using parameters based on statistics described in Karlin and Altschul (1990) (www.ncbi.nlm.nih.gov/pmc/articles/PMC53667/pdf/pnas01031-0226.pdf).
A high score does not necessarily indicate that the query is unique
  The score is only dependent on the alignment, length of the sequence, and the length of the database
E-value is the expected number of random sequences that have an equivalent sequence alignment
  Calculated using the max bit score and the length of the query and database
  Tells you the relative strength of the alignment
  Shorter sequences have higher E-values because the probability of finding that sequence is higher
A low E-value does not mean you have a unique match!
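To see how the E-value depends on the bit score and the search space, the sketch below uses the standard relation E = m × n × 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. BLAST itself works with effective (edge-corrected) lengths, so reported values will differ somewhat. The function name and the numbers are illustrative assumptions.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers: a 20 nt primer searched against two database sizes.
small_db = expect_value(bit_score=40.0, query_length=20, database_length=1e8)
large_db = expect_value(bit_score=40.0, query_length=20, database_length=1e10)
print(small_db, large_db)  # same alignment, larger database, larger E-value
```

A short primer can only reach a modest bit score even when it matches perfectly, which is why primer-length queries return relatively high E-values and why a low E-value alone does not prove uniqueness.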
35
BLAST Assessment for qPCR Primers
Go to the BLAST server: http://blast.ncbi.nlm.nih.gov/Blast.cgi
Enter primer sequences separated by 7+ N’s
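Joining both primers with a run of Ns lets one search report where each primer lands. A minimal sketch for building that query follows. The primer sequences are hypothetical placeholders, the function name is made up, and the default of 10 Ns is an arbitrary choice that satisfies the "7 or more" guidance.

```python
def build_primer_query(forward, reverse, spacer_length=10):
    """Concatenate two primers with a spacer of Ns for a single BLAST query."""
    if spacer_length < 7:
        raise ValueError("use at least 7 Ns between the primers")
    return forward.upper() + "N" * spacer_length + reverse.upper()


# Hypothetical primers, for illustration only.
query = build_primer_query("CGATCGGGCATCACACAAAG", "ATTTCTACTACATAACTTTG")
print(query)
```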
36
Select the Correct Database
“Others” is the most general but contains a lot of sequences. If possible, use the human- or mouse-specific databases
For species with completed genome projects, consider using “NCBI Genomes” to limit BLAST results
37
Change the parameters of the BLAST scoring
Select less rigorous algorithm
Change Word size to “7”
38
Looking at the Results
The Graphic Summary can immediately give you a sense of what the overall results are
Hover over each result in the graphic to identify the sequence name
39
Then Look at Results List
Look at E-value and Query Coverage. Look for jumps in either/both.
Looks like the assay is specific to a single gene, judging by the transcript hits
Ignore the “alternate” chromosome assemblies
40
Investigate details of alignment
Check distance between primer binding if looking at mRNA
Open Graphics result in a new tab/window
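The distance between primer binding sites can also be checked offline once you have the target sequence. The sketch below finds the forward primer and the reverse complement of the reverse primer in a template and returns the implied amplicon length. It assumes exact matches on a single strand; the names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Amplicon size implied by exact primer matches, or None if either is missing."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    # Look for the reverse primer site downstream of the forward primer.
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


template = "CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT"
print(amplicon_length(template, "CGATCGGGCA", "ATTTCTACTAC"))  # 35
```

Running the same check against the mRNA and the genomic sequence shows whether the primers sit on different exons: the genomic distance will be larger when an intron lies between them.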
41
BLAST Shows Primer Aligned to Sequence
Zoom out with “-” sign
You can grab within window and drag sequence side to side
42
The Target Gene is on Chromosome 6
This looks promising with primers on different exons.
43
But We Had Other Chromosomal Hits……
“Real” transcript
Pseudogene: doesn’t look transcribed
Primers (red bar indicates mismatch)
44
And Another One……
Another pseudogene.
But what’s this?
Intron of a transcribed gene. So potentially in RNA samples. Recommend avoiding if possible
45
Using NCBI to Check for SNPs
46
While Assessing BLAST Results, Also Assess for SNPs
47
Investigate SNPs in Primer Binding Sites
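Once you have SNP coordinates from dbSNP, checking them against a primer or probe binding site is a simple interval test. The sketch below assumes all positions use the same 1-based coordinate system on the same reference sequence; the function name and coordinates are hypothetical.

```python
def snps_in_site(site_start, site_end, snp_positions):
    """Return the SNP positions that fall inside a binding site (inclusive)."""
    return [pos for pos in snp_positions if site_start <= pos <= site_end]


# Hypothetical coordinates: a primer binding at 260-280 and three reported SNPs.
hits = snps_in_site(260, 280, [120, 265, 281])
print(hits)  # [265]
```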
48
Assessing SNP Data
Tells you it’s a single base substitution
Indicates alternate forms (here recorded on opposite strand)
Indicates allele frequency if known
Sometimes more frequency data at bottom of page
49
SNP Data Roughly Divided by Risk
Trusted source
Very low frequency
No data, likely not going to be problematic
Significant risk. Look to redesign if possible
50
Using OligoAnalyzer® Tool to Check Primers and Probes
51
Checking Primers with OligoAnalyzer® Tool
PrimerQuest® design tools give you the “best” assays for the region specified
They check for self- and hetero-dimers, but this is only part of the scoring system used
An assay may be “better” even with dimer issues if it scores well on other parameters
Go to the OligoAnalyzer Tool
  Perform self-dimer checks for primers and probe
  Perform heterodimer checks on all primer/probe combinations (especially important to include all combinations when multiplexing)
  Check hairpin structures
  Look for stability of < -9 kcal/mol
  Or multiple hairpins forming with < -4 kcal/mol
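The thresholds above can be applied to ΔG values copied from the OligoAnalyzer output. The sketch below is one possible reading of them: flag any dimer or hairpin below -9 kcal/mol, and flag two or more hairpins below -4 kcal/mol. That interpretation, the function name, and the example values are assumptions, not IDT's scoring.

```python
STABLE_DG = -9.0         # kcal/mol, threshold for a single stable structure
MULTI_HAIRPIN_DG = -4.0  # kcal/mol, threshold when several hairpins form


def flag_structures(dimer_dgs, hairpin_dgs):
    """Return a list of warnings for the given ΔG values (kcal/mol)."""
    flags = []
    if any(dg < STABLE_DG for dg in dimer_dgs):
        flags.append("stable dimer")
    if any(dg < STABLE_DG for dg in hairpin_dgs):
        flags.append("stable hairpin")
    if sum(dg < MULTI_HAIRPIN_DG for dg in hairpin_dgs) >= 2:
        flags.append("multiple hairpins")
    return flags


# Hypothetical ΔG values, for illustration only.
print(flag_structures([-9.5, -3.0], [-1.2]))
print(flag_structures([-5.0], [-4.5, -6.0]))
```

A flag is only a prompt to look at the structure itself: whether a dimer leaves an extendable 3' end matters as much as its stability.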
52
Assessing Dimer Data
Looks stable < -9kcal/mol
But this is not “dangerous”, avoid if possible but ok
Looks stable < -9kcal/mol
Not extendable, not a problem
Doesn’t look stable > -9kcal/mol
Danger of extension, exponential amplification!
53
Assessing Hairpin Structures
Based on UNAfold predictions
54
IDT PrimeTime® Predesigned qPCR Database
55
Primer and Probe Design Criteria for PrimeTime® Assays
Primers
  equal Tm (60–63 °C)
  15–30 bases in length
  no runs of 4 or more Gs
  amplicon size 50–150 bp (max 400 bp)
Probe
  length no longer than 30–35 bases
  Tm value 4–10 °C higher than primers
  no runs of 4 or more consecutive Gs
  G+C content 30–80%
  no G at the 5′ end
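These criteria are easy to screen for in a custom design. The sketch below checks the sequence-based rules and takes Tm values as inputs, since Tm should come from a proper calculator such as the OligoAnalyzer Tool. The function names and example oligos are hypothetical, and 35 bases is used as the upper probe length.

```python
def check_primer(sequence, tm):
    """Return a list of criteria the primer fails (empty list means it passes)."""
    sequence = sequence.upper()
    issues = []
    if not 15 <= len(sequence) <= 30:
        issues.append("length outside 15-30 bases")
    if "GGGG" in sequence:
        issues.append("run of 4 or more Gs")
    if not 60.0 <= tm <= 63.0:
        issues.append("Tm outside 60-63 C")
    return issues


def check_probe(sequence, tm, primer_tm):
    """Return a list of criteria the probe fails (empty list means it passes)."""
    sequence = sequence.upper()
    if not sequence:
        return ["empty sequence"]
    issues = []
    if len(sequence) > 35:
        issues.append("longer than 35 bases")
    if not 4.0 <= tm - primer_tm <= 10.0:
        issues.append("Tm not 4-10 C above primers")
    if "GGGG" in sequence:
        issues.append("run of 4 or more Gs")
    gc_percent = 100.0 * sum(base in "GC" for base in sequence) / len(sequence)
    if not 30.0 <= gc_percent <= 80.0:
        issues.append("G+C content outside 30-80%")
    if sequence.startswith("G"):
        issues.append("G at the 5' end")
    return issues


# Hypothetical oligos and Tm values, for illustration only.
print(check_primer("CGATCGGGCATCACACAAAG", tm=61.0))
print(check_probe("GGGGATATATATATATATAT", tm=62.0, primer_tm=61.0))
```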
56
PrimeTime Results
57
Questions?