wang-mcdonald
Using BLAST options to refine a search
1) Address the question “how many of the Phytophthora/tomato interaction ESTs are tomato?”
A: It depends on the thresholds. With E-value < 1 × 10^-8, match length > 200 bp, identities > 95% and match overlap > 50%, ~2100 (54%) of the ESTs show a match, together hitting 1622 unique ESTs in the database.
2) Can the question be more easily addressed by refining BLAST search?
3) Other BLAST options.
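The thresholds in answer 1 can be applied mechanically once results are saved in tabular form (`-m 8`, shown later in this deck). The sketch below is a hypothetical helper, not part of BLAST: `filter_hits` and `count_unique` are made-up names, and it assumes the twelve tab-separated `-m 8` columns. The "% match overlap > 50%" criterion needs each query's length, which `-m 8` does not report, so it is left out here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for blastall -m 8 (tab-separated) output.
# Assumed columns: 1 query, 2 subject, 3 % identity, 4 alignment length,
# 5 mismatches, 6 gap openings, 7-8 query start/end, 9-10 subject start/end,
# 11 E-value, 12 bit score.
# The "% match overlap" test is NOT applied: it needs query lengths.

# filter_hits FILE: print rows with E < 1e-8, alignment length > 200, identity > 95
filter_hits() {
  awk -F'\t' '$11 + 0 < 1e-8 && $4 + 0 > 200 && $3 + 0 > 95' "$1"
}

# count_unique FILE COLUMN: number of distinct values in COLUMN among passing rows
# (column 1 = query ESTs with a qualifying hit, column 2 = distinct database ESTs hit)
count_unique() {
  filter_hits "$1" | cut -f "$2" | sort -u | wc -l | tr -d ' '
}
```

For example, `count_unique PhytOUT.tab 1` would count interaction ESTs with a qualifying hit, and `count_unique PhytOUT.tab 2` the distinct database ESTs they hit (`PhytOUT.tab` is a hypothetical `-m 8` output file).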
$ ./blastall.exe
-e Expectation value <E> [Real] default = 10.0
$ ./blastall.exe
-m alignment view options:
  0 = pairwise
  1 = query-anchored showing identities
  ...
  7 = XML Blast output
  8 = tabular
  9 = tabular with comment lines
Run nucleotide BLAST (blastn)
$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -e 0.01
$ grep -c "Strand =" OUTE2.txt
3 (with the default E-value of 10, this was 82…)
$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./PhytophSeq1.txt -o PhytOUTE1.txt -e 1e-8
$ grep -c "Strand =" PhytOUTE1.txt
108,787 (with the default E-value of 10, this was 292,568…)
NOTE: this BLAST run, which compares 3,921 query sequences against a database of 116,711 sequences, takes some time (15 minutes on my laptop).
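The `grep -c "Strand ="` counts above count HSPs (one per aligned segment), not ESTs: one query can align to many database sequences, and to one sequence more than once. A minimal sketch with hypothetical function names that also counts query reports in the same pairwise output, assuming legacy blastall's text layout (each query report starts with `Query=`, and a query without hits prints `***** No hits found ******`):

```bash
#!/usr/bin/env bash
# Hypothetical summaries of legacy blastall pairwise (-m 0) blastn output.
# "|| true" keeps a zero count from returning a failing exit status.

# count_hsps FILE: one "Strand =" line per blastn HSP (same count as the slides)
count_hsps()    { grep -c "Strand =" "$1" || true; }

# count_queries FILE: each query report begins with "Query=" at line start
# (alignment rows start with "Query:" and are not counted)
count_queries() { grep -c "^Query=" "$1" || true; }

# count_no_hits FILE: queries whose report says "No hits found"
count_no_hits() { grep -c "No hits found" "$1" || true; }

# summarize FILE: one-line summary of HSPs, queries and queries with hits
summarize() {
  local q n
  q=$(count_queries "$1")
  n=$(count_no_hits "$1")
  echo "HSPs: $(count_hsps "$1"); queries: $q; queries with hits: $((q - n))"
}
```

Running `summarize PhytOUTE1.txt` would then show how many of the interaction ESTs produced at least one hit, which is closer to the question on the first slide than the raw HSP count.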
Searching..................................................done
                                                                     Score    E
Sequences producing significant alignments:                          (bits) Value

gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds, ...   1237   0.0
gi|16248018|gb|BI933546.1|BI933546 EST553435 tomato flower, anth...   1017   0.0
gi|4384985|gb|AI489614.1|AI489614 EST247953 tomato ovary, TAMU S...    908   0.0

>gi|9292199|gb|BE354223.1|BE354223 EST355566 tomato flower buds, anthesis, Cornell University Solanum lycopersicum cDNA clone cTOD9L3, mRNA sequence
          Length = 632

 Score = 1237 bits (624), Expect = 0.0
 Identities = 630/632 (99%)
 Strand = Plus / Plus

Query: 1504 gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac 1563
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1    gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac 60

Query: 1564 ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa 1623
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61   ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa 120
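The Identities line gives matched positions over alignment length: 630/632 is 99.68%, the same value that appears as % identity in the tabular output on the next slide. A small hypothetical helper (the name `identity_pct` is made up) that pulls these fractions out of a pairwise report and prints them as percentages:

```bash
#!/usr/bin/env bash
# identity_pct FILE: percent identity of every HSP, computed from the
# "Identities = matched/length" fields of blastall pairwise output.
identity_pct() {
  # grep -o keeps only "Identities = a/b"; awk splits on runs of space, "=" or "/"
  grep -o 'Identities = [0-9]*/[0-9]*' "$1" |
    awk -F'[ =/]+' '{ printf "%.2f\n", 100 * $2 / $3 }'
}
```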
Run nucleotide BLAST (blastn)
$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -m 8
(-m = alignment view options; 8 = tabular format)
Slycopersicum.sequence gi|9292199|gb|BE354223.1|BE354223 99.68 632 2 0 1504 2135 1 632 0.0 1237
Slycopersicum.sequence gi|16248018|gb|BI933546.1|BI933546 99.62 521 2 0 1668 2188 1 521 0.0 1017
Slycopersicum.sequence gi|4384985|gb|AI489614.1|AI489614 99.57 466 2 0 1818 2283 1 466 0.0 908
Columns, left to right: query ID, subject ID, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score.
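Because `-m 8` output is one hit per line, ordinary text tools can summarize it. A sketch with hypothetical function names: `best_hits` keeps the first line for each query ID, which assumes (as blastall does by default) that each query's hits are written together, best first; `describe_hits` labels the columns listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for tab-separated blastall -m 8 output.

# best_hits FILE: first (highest-scoring) row for each query ID in column 1
best_hits() {
  awk -F'\t' '!seen[$1]++' "$1"
}

# describe_hits FILE: human-readable line per hit using the column meanings
describe_hits() {
  awk -F'\t' '{ printf "%s -> %s: %.2f%% identity over %d bp, E = %s, %s bits\n",
                $1, $2, $3, $4, $11, $12 }' "$1"
}
```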
tblastn
Running BLAST with a protein or peptide query against nucleotide data (tblastn translates the nucleotide database in all six reading frames)
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13.txt -o PEPTIDEOUT.txt (optionally add -e #)
Try:
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13-Pep4A.txt -o PEPTIDEOUT.txt
Then try a more permissive E-value:
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB7-15-13-Pep4A.txt -o PEPTIDEOUT.txt -e 50
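The retry with -e 50 is consistent with how BLAST computes E-values. In the simplest form of the Karlin-Altschul statistics (BLAST actually substitutes length-corrected "effective" values for m and n), the number of chance alignments expected with score at least S is:

```latex
E = K\,m\,n\,e^{-\lambda S} = m\,n\,2^{-S'},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
```

where m is the query length, n the total database length, S the raw alignment score, S' the bit score, and K and λ constants of the scoring system. A peptide of only a few residues can build only a small S', even with a perfect match, so its E-value may not drop below the default cutoff of 10, let alone a strict one; a permissive value such as 50 lets such hits be reported. Because E also grows linearly with n, E-values from databases of different sizes are not directly comparable.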
From Xiaodong
Other useful BLAST options
(1) "-b integer": number of database sequences to show alignments for. The default is 250. A smaller value reduces the size of the output file and makes BLAST searches faster.
(2) "-v integer": number of database sequences to show one-line descriptions for. The default is 500. A smaller value for -v has a similar effect to -b.
(3) "-a integer": number of processors to use. Most laptops have only one processor, but on a Linux workstation with multiple processors, using all of them will drastically reduce the execution time.
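Options (1) to (3) can be combined in one call. A hypothetical wrapper, assuming the blastall path used earlier in these slides and GNU `nproc` (available on Linux and in Cygwin) to count processors; the values `-b 5 -v 5` are arbitrary examples, and `DRY_RUN=1` prints the command instead of running it:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around blastall combining -b, -v and -a.
# BLASTALL defaults to the path used in the slides; override it if needed.
BLASTALL=${BLASTALL:-/cygdrive/c/Blast/bin/blastall}

# cpu_count: processors reported by nproc, or 1 if nproc is unavailable
cpu_count() { nproc 2>/dev/null || echo 1; }

# run_blast DB QUERY OUT: blastn with E < 1e-8, 5 alignments, 5 descriptions,
# and one thread per processor
run_blast() {
  local cmd=("$BLASTALL" -p blastn -d "$1" -i "$2" -o "$3"
             -e 1e-8 -b 5 -v 5 -a "$(cpu_count)")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"      # show what would run
  else
    "${cmd[@]}"           # run it, preserving argument boundaries
  fi
}
```

Usage: `DRY_RUN=1 run_blast ./TA496Seq1.txt ./PhytophSeq1.txt PhytOUT.txt` to check the command, then the same call without `DRY_RUN=1` to run it.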
From Xiaodong
Other useful BLAST options
(4) "-m 7" produces XML output, which is useful if you plan to import the BLAST results into Blast2GO for GO term assignment and metabolic pathway prediction.
(5) "-l string": restricts the database search to a list of GIs (GenInfo Identifiers), the unique numeric identifier assigned to each sequence in GenBank. The string is the name of a file containing the GIs of the subset of sequences you want to search against. Use this option to search subsets of a large database without creating multiple databases. The advantage is that E-values from searches against different subsets are comparable; if each subset were its own database, the databases would differ in size, and E-values would not be comparable between searches.
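Options (4) and (5) can be prepared with the same text tools. A sketch with hypothetical function names, assuming NCBI's usual layouts: `-m 7` XML written one element per line (for example `<Hit_id>gi|...</Hit_id>`), and `-m 8` subject IDs of the form `gi|NNN|...`. It also assumes a plain-text file with one GI per line is acceptable to `-l`; check your BLAST version's documentation, and prefer a real XML parser for anything beyond a quick check.

```bash
#!/usr/bin/env bash
# count_xml FILE TAG: number of lines opening <TAG> (e.g. Hit, Hsp) in -m 7 output
count_xml() { grep -c "<$2>" "$1" || true; }

# hit_ids FILE: text of every <Hit_id> element, one per line
hit_ids() { sed -n 's:.*<Hit_id>\(.*\)</Hit_id>.*:\1:p' "$1"; }

# gi_list TABFILE: unique GI numbers of the subjects (column 2) of -m 8 output
gi_list() { cut -f 2 "$1" | awk -F'|' '$1 == "gi" { print $2 }' | sort -u; }
```

For example, `gi_list PhytOUT.tab > subset.gi` would write the GIs of every database EST hit, and `-l subset.gi` would then restrict a later search to that subset (`PhytOUT.tab` and `subset.gi` are hypothetical file names).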