Use of NCBI Databases in qPCR Assay Design

Integrated DNA Technologies

Elisabeth Wagner, PhD, Scientific Applications Specialist

Session Outcomes

You will:
• Learn which NCBI tools are useful for designing qPCR assays
• Become proficient using tools for qPCR design in the IDT SciTools® suite
• Navigate the features and tools available on the NCBI website
  • Obtain sequence information for your gene of interest
  • Perform a BLAST search for assay specificity
  • Search for SNPs
• Understand how to proceed with a basic qPCR design

qPCR Design Covers A Lot of Ground

There are many uses for quantitative PCR. Some examples:

• Gene expression
• Copy number variation
• Genotyping
• Multi-species analysis
• Splice variant specific (or common) expression

We will address the general considerations for design in this session, and cover more specific examples later this afternoon.

SciTools® Overview

http://www.idtdna.com/pages/scitools

Several tools are available in the IDT SciTools® suite to assist with qPCR design:

1. RealTime PCR Tool
2. PrimerQuest® Tool
3. OligoAnalyzer® Tool
4. PrimeTime® Predesigned qPCR Assay Database

NCBI Databases Overview:

1. Obtain sequence information for your gene of interest: NCBI Nucleotide or Gene

2. Perform a BLAST search for assay specificity: NCBI BLAST

3. Search for SNPs: NCBI dbSNP

NCBI gives you access to all of the information necessary for design in one location.

Using NCBI Databases for Custom qPCR Assay Design

NCBI Overview (National Center for Biotechnology Information)

• Founded in 1988 as part of the United States National Library of Medicine
• Houses a series of databases relevant to biotechnology and biomedicine
  • Curates GenBank, a database of over 1 × 10^12 bp of DNA sequences
  • Gene database, which integrates gene-specific information from numerous species
  • dbSNP, which is a database of reported Single Nucleotide Polymorphisms (SNPs)
• Contains the BLAST sequence similarity search program
• Maintains PubMed, a journal database for biomedical literature
• Much, much more information!

NCBI Database Search: Sequence Information for qPCR Assay Design

http://www.ncbi.nlm.nih.gov/

NCBI Sequence Files

Files:
• Can be entered by anyone
• May or may not be checked for accuracy
• May contain contaminated sequence (plasmid or other)
• May contain annotation errors

Accession numbers: Letters at the beginning indicate the type of file

Nucleotide sequences start with 1 or 2 letters:

The RefSeq Database

• non-redundant
• explicitly linked nucleotide and protein sequences
• ongoing curation by NCBI staff and collaborators, with reviewed records indicated
• includes data validation and format consistency
• distinct accession numbers
  • all accessions include an underscore '_' character
• different versions are tracked

RefSeq Accession Numbers

mRNAs and Proteins
  NM_123456   Curated mRNA
  NP_123456   Curated Protein
  NR_123456   Curated non-coding RNA
  XM_123456   Predicted mRNA
  XP_123456   Predicted Protein
  XR_123456   Predicted non-coding RNA

Gene Records
  NG_123456   Reference Genomic Sequence

Chromosome
  NC_123455   Microbial replicons, organelle genomes, human chromosomes
  AC_123455   Alternate assemblies

Assemblies
  NT_123456   Contig
  NW_123456   WGS Supercontig
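
Because the RefSeq prefix identifies the record type, a small lookup can label an accession before you use it for design. This is a minimal sketch: the prefix meanings are copied from the list above, and the dictionary and function names are hypothetical, not part of any NCBI or IDT tool.

```python
# Prefix meanings copied from the RefSeq accession list above.
# REFSEQ_PREFIXES and describe_refseq_accession are hypothetical names.
REFSEQ_PREFIXES = {
    "NM": "Curated mRNA",
    "NP": "Curated protein",
    "NR": "Curated non-coding RNA",
    "XM": "Predicted mRNA",
    "XP": "Predicted protein",
    "XR": "Predicted non-coding RNA",
    "NG": "Reference genomic sequence",
    "NC": "Microbial replicons, organelle genomes, human chromosomes",
    "AC": "Alternate assemblies",
    "NT": "Contig",
    "NW": "WGS supercontig",
}


def describe_refseq_accession(accession):
    """Return the record type for a RefSeq-style accession, or None if unknown."""
    prefix, underscore, _rest = accession.strip().upper().partition("_")
    if not underscore:
        return None  # no underscore, so not a RefSeq accession
    return REFSEQ_PREFIXES.get(prefix)


print(describe_refseq_accession("NM_123456"))  # Curated mRNA
print(describe_refseq_accession("XM_123456"))  # Predicted mRNA
```

For a gene expression assay you would typically start from an NM_ (curated mRNA) record rather than an XM_ (predicted) one.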

Accessing Sequence Information in NCBI

NCBI

NCBI Gene Database Information: Gene Search

Sequence Data Searches Using Nucleotide

Sequence Files
• mRNA and genomic
• Transcript variants

http://www.ncbi.nlm.nih.gov/nuccore

GenBank information

Data Retrieval: Graphics View

Data Retrieval: FASTA Sequence Format
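
A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. The sketch below shows one way to read such text into a Python dictionary before pasting a sequence into a design tool. The function name and the example record are hypothetical placeholders, not real NCBI data.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up record for illustration only.
example = """>example_record hypothetical placeholder header
ACGTACGTAC
GGTTAACC
"""
parsed = parse_fasta(example)
print(parsed)
```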

Using PrimerQuest® Tool for Custom qPCR Designs

PrimerQuest® Tool for Generating Custom qPCR Designs

Highly customizable tool

You Can Use NCBI Accession Number or FASTA Sequence

Once Sequence Entered, 3 Defaults Become Available

Often you will need to adjust the parameters of the tool to meet experimental design requirements

PrimerQuest® Tool Assay Output

Parameter Changes Depend on the Assay Required

Before changing anything, make sure you have selected the correct assay

Sometimes you simply need to increase the number of designs returned

It is unlikely that you will need to change these parameters

Directing the Design to a Specific Region

Target a particular “junction”

Examples

Excluded region 260-280

Excluded region-probe 260-280

Target region 260-280

Changing Primer/Probe Parameters

If the target is particularly biased (AT or GC rich), you may need to change primer/probe parameters (e.g., length)

Once Initial Design Completed, Back to NCBI

Use NCBI tools to:
• Check whether assay is specific (BLAST)
• Ensure there are no SNPs to worry about (dbSNP)

Use IDT OligoAnalyzer® Tool
• Check primers (and probe) for secondary structure and dimer formation

Using NCBI BLAST to Check for Primer Specificity

What is BLAST?—Getting to BLAST

http://www.ncbi.nlm.nih.gov/

Or http://blast.ncbi.nlm.nih.gov/Blast.cgi

What is BLAST (Basic Local Alignment Search Tool)?

• BLAST stands for Basic Local Alignment Search Tool and is provided by the National Center for Biotechnology Information (NCBI)
• Aligns a user-defined query (sequence) to a wide variety of databases
• Can translate the query or the database to align sequences
• Can align 2 or more sequences together
• Uses a heuristic algorithm to create alignments very quickly
  • Breaks sequences into “words” and searches the database for matches
  • Reassembles these matches based on the criteria entered

What is BLAST?—Basic BLAST

How BLAST Works—Words

BLAST divides the query sequence into subsets called “words”, which the algorithm uses to perform the alignment

Example (35 nt sequence): CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT

All possible words that can be generated from the sequence are used for the alignment

With a 7-letter word size, the maximum number of words for this 35 nt sequence is 29 (35 − 7 + 1)
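
The word count can be reproduced with a sliding window over the example sequence. This is only an illustration of how overlapping words are enumerated, not BLAST's actual implementation, and the function name is hypothetical.

```python
def blast_words(sequence, word_size=7):
    """Return every overlapping word of length word_size, in order of position."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


query = "CGATCGGGCATCACACAAAGTTATGTAGTAGAAAT"  # 35 nt example from the slide
words = blast_words(query)

print(len(words))  # 29
print(words[0])    # CGATCGG
print(words[-1])   # TAGAAAT
```

A smaller word size gives more words per query (35 − word size + 1), which is why lowering it makes the search more sensitive for short sequences such as primers.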

Overview—Definitions

• Hit: a sequence to which the query is aligned and that is returned in the BLAST results
• Identity: the extent of exact matches between 2 sequences (e.g., ACGT and ACGG have 75% identity)
• Similarity = Positives (in BLAST scoring)
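
The identity example can be checked directly. The sketch below assumes two sequences that are already aligned and of equal length (no gaps); the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical bases in two pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


print(percent_identity("ACGT", "ACGG"))  # 75.0
```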

How BLAST Works—Scores

The BLAST raw score is converted to a bit score for each alignment using parameters based on statistics described in Karlin and Altschul (1990) (www.ncbi.nlm.nih.gov/pmc/articles/PMC53667/pdf/pnas01031-0226.pdf).

• A high score does not necessarily indicate that the query is unique
  • The score is only dependent on the alignment, length of the sequence, and the length of the database
• E-value is the expected number of random sequences that would give an equivalent alignment by chance
  • Calculated using the Max bit score and the length of the query and database
  • Tells you the relative strength of the alignment
  • Shorter sequences have higher E-values because the probability of finding that sequence by chance is higher
  • A low E-value does not mean you have a unique match!
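
The relationship between raw score, bit score, and E-value follows the standard Karlin-Altschul formulas: S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'), where m is the query length and n is the database length. The sketch below applies them to show why a short primer gives a higher E-value than a longer query. The values of λ, K, and the database size are placeholders chosen for illustration only. They are not the parameters BLAST uses for any particular search, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score: S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`: E = m*n*2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Placeholder statistical parameters and database size (assumptions for illustration).
LAM, K, DB_LEN = 1.3, 0.5, 1_000_000_000

# Perfect matches scored +1 per base: a 20 nt primer versus a 100 nt amplicon.
e_short = expect_value(bit_score(20, LAM, K), 20, DB_LEN)
e_long = expect_value(bit_score(100, LAM, K), 100, DB_LEN)

print(e_short, e_long)
print(e_short > e_long)  # True
```

Even a perfect 20 nt match has a far higher E-value than a perfect 100 nt match against the same database, so primer-length queries need a relaxed E-value threshold to return any hits.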

Select the Correct Database

“Others” is the most general but contains a lot of sequences. If possible use Human or Mouse specific databases

For species with completed genome projects, consider using “NCBI Genomes” to limit BLAST results

Change the parameters of the BLAST scoring

Select less rigorous algorithm

Change Word size to “7”

Looking at the Results

The Graphic Summary can immediately give you a sense of what the overall results are

Hover over each result in the graphic to identify the sequence name

Then Look at Results List

Look at E-value and Query Coverage. Look for jumps in either/both.

The assay appears specific to a single gene at the transcript level

Ignore the “alternate” chromosome assemblies

Investigate details of alignment

Check distance between primer binding if looking at mRNA

Open Graphics result in a new tab/window

BLAST Shows Primer Aligned to Sequence

Zoom out with “-” sign

You can grab within window and drag sequence side to side

The Target Gene is on Chromosome 6

This looks promising with primers on different exons.

But We Had Other Chromosomal Hits……

“real” transcript

Pseudogene: doesn’t look transcribed

Primers (red bar indicates mismatch)

And Another One……

Another pseudogene. But what’s this?

Intron of a transcribed gene, so the sequence is potentially present in RNA samples. Recommend avoiding if possible.

Using NCBI to Check for SNPs

While Assessing BLAST Results, Also Assess for SNPs

Investigate SNPs in Primer Binding Sites

Assessing SNP Data

Tells you it’s a single base substitution

Indicates alternate forms (here recorded on opposite strand)

Indicates allele frequency if known

Sometimes more frequency data at bottom of page

SNP Data Roughly Divided by Risk

Trusted source, very low frequency

No data, likely not going to be problematic

Significant risk. Look to redesign if possible

Using OligoAnalyzer® Tool to Check Primers and Probes

Checking Primers with OligoAnalyzer® Tool

PrimerQuest® design tools give you the “best” assays for the region specified

They check for self- and hetero-dimers, but this is only part of the scoring system used

An assay may be “better” even with dimer issues if it scores well on other parameters

Go to the OligoAnalyzer Tool
• Perform self-dimer checks for primers and probe
• Perform heterodimer checks on all primer/probe combinations (especially important to include all combinations when multiplexing)
• Check hairpin structures
• Look for stability of < -9 kcal/mol, or multiple hairpins forming with < -4 kcal/mol
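
If you record the ΔG values reported by the tool, the two thresholds above can be applied as a simple screen. This sketch makes two assumptions: the -9 kcal/mol limit is applied to any dimer or hairpin, and the -4 kcal/mol limit is applied only when more than one hairpin falls below it. The ΔG values are supplied by you, not calculated here, and the names are hypothetical.

```python
STABLE_DG = -9.0         # kcal/mol, threshold from the slide above
MULTI_HAIRPIN_DG = -4.0  # kcal/mol, threshold from the slide above


def flag_structures(dimer_dgs, hairpin_dgs):
    """Return warnings for user-supplied delta G values (kcal/mol)."""
    dimer_dgs, hairpin_dgs = list(dimer_dgs), list(hairpin_dgs)
    warnings = []
    # Any single structure more stable (more negative) than -9 kcal/mol.
    for dg in dimer_dgs + hairpin_dgs:
        if dg < STABLE_DG:
            warnings.append(f"structure at {dg} kcal/mol is below {STABLE_DG}")
    # More than one hairpin below -4 kcal/mol.
    moderate = [dg for dg in hairpin_dgs if dg < MULTI_HAIRPIN_DG]
    if len(moderate) > 1:
        warnings.append(f"{len(moderate)} hairpins below {MULTI_HAIRPIN_DG} kcal/mol")
    return warnings


print(flag_structures(dimer_dgs=[-10.2, -3.0], hairpin_dgs=[-1.5]))
```

A flagged structure is a prompt to inspect it in the tool, not an automatic rejection of the assay.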

Assessing Dimer Data

Looks stable (< -9 kcal/mol), but this is not “dangerous”: avoid if possible, but OK

Looks stable (< -9 kcal/mol): not extendable, not a problem

Doesn’t look stable (> -9 kcal/mol): danger of extension, exponential amplification!

Assessing Hairpin Structures

Based on UNAFold predictions

IDT PrimeTime® Predesigned qPCR Database

Primer and Probe Design Criteria for PrimeTime® Assays

Primers
• equal Tm (60–63 °C)
• 15–30 bases in length
• no runs of 4 or more Gs
• amplicon size 50–150 bp (max 400 bp)

Probe
• length no longer than 30–35 bases
• Tm value 4–10 °C higher than primers
• no runs of 4 or more consecutive Gs
• G+C content 30–80%
• no G at the 5′ end
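
The sequence-only criteria above (length, G runs, G+C content, 5′ base) are easy to screen in code. The sketch below is a rough pre-check, not the PrimeTime design algorithm: it does not calculate Tm or amplicon size, it uses 35 bases as the probe length limit (the upper end of the stated range), and the function names and test sequences are hypothetical.

```python
import re


def check_primer(seq):
    """Return a list of violated primer criteria (sequence-only checks)."""
    seq = seq.upper()
    problems = []
    if not 15 <= len(seq) <= 30:
        problems.append("length outside 15-30 bases")
    if re.search(r"G{4,}", seq):
        problems.append("run of 4 or more Gs")
    return problems


def check_probe(seq, max_length=35):
    """Return a list of violated probe criteria (sequence-only checks)."""
    seq = seq.upper()
    if not seq:
        return ["empty sequence"]
    problems = []
    if len(seq) > max_length:
        problems.append(f"longer than {max_length} bases")
    if re.search(r"G{4,}", seq):
        problems.append("run of 4 or more consecutive Gs")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    if not 30 <= gc_percent <= 80:
        problems.append("G+C content outside 30-80%")
    if seq.startswith("G"):
        problems.append("G at the 5' end")
    return problems


# Made-up sequences for illustration only.
print(check_primer("ACGTACGTACGTACGTAC"))   # []
print(check_probe("GACGTACGTACGTACGTACG"))  # ["G at the 5' end"]
```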

PrimeTime Results

Questions?
