Overview

Page 1: Overview

Overview

• Repeats

• …

Page 2: Overview

BLAST - Basic Local Alignment Search Tool

• BLAST programs use a heuristic search algorithm. The programs use the statistical methods of Karlin and Altschul (1990, 1993).

• BLAST programs were designed for fast database searching, with minimal sacrifice of sensitivity to distantly related sequences.

Page 3: Overview

BLAST - Basic Local Alignment Search Tool

• BLAST programs search databases in a special compressed format. To use your own private database with BLAST, you need to convert it to the BLAST format.

Page 4: Overview

BLAST Programs

• BLAST is actually a family of programs

– BLASTN - Nucleotide query searching a nucleotide database.

– BLASTP - Protein query searching a protein database.

– BLASTX - Translated nucleotide query sequence (6 frames) searching a protein database.

– TBLASTN - Protein query searching a translated nucleotide (6 frames) database.

– TBLASTX - Translated nucleotide query (6 frames) searching a translated nucleotide (6 frames) database.
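The "6 frames" used by BLASTX, TBLASTN and TBLASTX can be illustrated with a minimal Python sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names are hypothetical and are not part of any BLAST program.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame number (+1..+3 forward, -1..-3 reverse strand) to protein."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames
```

For example, `six_frames("ATGGCC")[1]` gives `"MA"`, and frame -1 is read from the reverse complement `GGCCAT`.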

Page 5: Overview

Where to find the BLAST programs?

• BLAST searches can be done on the WWW BLAST server at NIH: http://www.ncbi.nlm.nih.gov/BLAST/

• On a standalone computer such as dapsas1 at the Weizmann Institute.

• From the GCG software package.

Page 6: Overview

Blast method

• Compare query to each sequence in database

• Use heuristic to speed pairwise comparison

• Create 'sequence abstraction' by listing exact and similar words

– on the fly for the query

– in advance for the database

• Find similar words between query and each database sequence

• Extend such words to obtain high-scoring segment pairs (HSPs)

• Calculate statistics analytically
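The word-and-extend idea can be sketched in a few lines of Python. This is a simplified illustration, not the BLAST implementation: it uses exact words only (no "similar" words), a hypothetical match/mismatch scoring scheme, and a simple drop-off rule to stop the extension. All function names and default values are assumptions made for the example.

```python
from collections import defaultdict


def word_index(seq, w):
    """Index every word of length w in seq by its start position."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend_hit(query, subject, qi, si, w, match=1, mismatch=-1, xdrop=3):
    """Extend a word hit without gaps in both directions.

    Extension stops once the running score falls xdrop below the best score.
    Returns (score, query_start, query_end, subject_start).
    """
    score = best = w * match
    q_start, q_end = qi, qi + w
    # Extend to the right of the word.
    i, j = qi + w, si + w
    while i < len(query) and j < len(subject) and best - score < xdrop:
        score += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, q_end = score, i
    # Extend to the left, starting from the best right-extended score.
    score = best
    i, j = qi - 1, si - 1
    while i >= 0 and j >= 0 and best - score < xdrop:
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, q_start = score, i
        i -= 1
        j -= 1
    return best, q_start, q_end, q_start + (si - qi)


def find_hsps(query, subject, w=4):
    """Seed with exact words, extend each seed, return HSPs best first."""
    index = word_index(subject, w)
    hsps = set()
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], []):
            hsps.add(extend_hit(query, subject, qi, si, w))
    return sorted(hsps, reverse=True)
```

Several seeds on the same diagonal extend to the same segment, which is why the results are collected in a set. In a real search the subject index would be built once for the whole database, as the slide notes.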

Page 7: Overview

Gapped BLAST

• BLAST 2.0 is a new version with new capabilities such as Gapped BLAST and PSI-BLAST.

• The Gapped BLAST algorithm allows gaps to be introduced into the alignments. That means that similar regions are not broken into several segments (as in the older versions).

• This method reflects biological relationships much better.

Page 8: Overview

PSI-BLAST

• PSI (Position-Specific Iterated) BLAST provides a new automatic “profile-like” search.

• The program first performs a gapped BLAST search of the database. The information from the significant alignments is then used by the program to construct a “position-specific” score matrix. This matrix replaces the query sequence in the next round of database searching.

• The program may be iterated until no new significant alignments are found.
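A position-specific score matrix can be sketched as per-column log-odds scores. The Python below is a toy illustration under strong assumptions: the significant alignments are given as ungapped, equal-length segments, the background frequencies are uniform, and a fixed pseudocount is used. PSI-BLAST itself is more elaborate (for example, it weights the sequences), so the names and numbers here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build one {residue: log-odds score} dict per alignment column.

    Assumes ungapped segments of equal length and a uniform background.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned segments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Score a segment of the same length against the matrix."""
    return sum(col[aa] for col, aa in zip(pssm, segment))
```

In the next search round, database segments would be scored with `score_segment` against the matrix instead of against the original query residues.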

Page 9: Overview

Blast output

• The list of hits

• Database accession codes, name, description, general information about the hit

• Score in bits, the alignment score expressed in units of information. Usually 30 bits are required for significance

• Expectation value E(), how many hits we expect to find by chance with this score, when comparing this query to the database. It is important to keep in mind that the E() value does not represent a measure of similarity between the two sequences.
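The relation between the bit score and E() can be written out using the standard Karlin-Altschul formulas: the bit score is S' = (λS − ln K) / ln 2 and the expected number of chance hits is E = m · n · 2^(−S'). The Python sketch below only illustrates these two formulas; the parameter names are hypothetical, and it ignores the effective-length corrections that BLAST applies to m and n.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)
```

Because E() grows with the size of the search space, the same bit score is less significant against a larger database, which is one reason E() is not a measure of similarity between the two sequences.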

Page 10: Overview

Blast output

• The information for each hit

• A header including hit name, description, length

• The same for all additional entries removed due to redundancy

• Composite expectation value

• Each hit may contain several HSPs

• score and expectation value

– how many identical residues

– how many residues contributing positively to the score

• The local alignment itself
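The per-HSP counts can be illustrated with a short Python sketch. It assumes an ungapped pair of aligned segments and a toy scoring function (identical residues +4, residues in the same hypothetical similarity group +1, everything else -1). This is not a real substitution matrix such as BLOSUM62; it only shows how "identical" and "positive" positions are counted.

```python
# Hypothetical similarity groups, used only for this illustration.
SIMILAR_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("ST"), set("FYW")]


def toy_score(a, b):
    """Toy residue-pair score: identity +4, same group +1, otherwise -1."""
    if a == b:
        return 4
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return -1


def summarize_hsp(query_seg, subject_seg, score=toy_score):
    """Count identities and positively scoring positions in an ungapped HSP."""
    if len(query_seg) != len(subject_seg):
        raise ValueError("aligned segments must have the same length")
    pair_scores = [score(a, b) for a, b in zip(query_seg, subject_seg)]
    return {
        "length": len(query_seg),
        "identities": sum(a == b for a, b in zip(query_seg, subject_seg)),
        "positives": sum(s > 0 for s in pair_scores),
        "score": sum(pair_scores),
    }
```

Note that identical residues also score positively, so the positives count includes the identities.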

Page 11: Overview

The Smith-Waterman Tools

• Smith-Waterman searching method:

• Compare query to each sequence in database

• Do full Smith-Waterman pairwise comparisons

• Use search results to generate statistics
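A full Smith-Waterman comparison fills a dynamic-programming table for every query/database pair, which is why it is exhaustive and slow. The Python sketch below computes only the best local alignment score, with a linear gap penalty and hypothetical match/mismatch values chosen for illustration; production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Score the query against every entry; return (score, name), best first."""
    return sorted(
        ((smith_waterman(query, seq), name) for name, seq in database.items()),
        reverse=True,
    )
```

The work per pair is proportional to the product of the two sequence lengths, with no word-based shortcut.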

Page 12: Overview

Where to find the SW programs?

• Since SW searching is exhaustive, it is the slowest method. We use special hardware and software (Bioccelerator) to run the programs.

• Bioccelerator is available here in TAU at the

• at the Weizmann Institute http://dapsas1.weizmann.ac.il/bcd/bcd_parent/bcd_bioccel/bioccel.html

• The Bioccelerator can also be run from the command line on dapsas1 or life2.

Page 13: Overview

Comparison of programs

• Concept:

• SW and BLAST: local alignments

• FASTA: global alignments

• BLAST can report more than one HSP per database entry; FASTA reports only one segment (match).

• Speed: BLAST > FASTA >> SW

• Sensitivity: SW > FASTA > BLAST (old version!)

Page 14: Overview

Comparison of programs

• Sensitivity:

• FASTA is more sensitive and misses fewer homologues (the opposite can also happen if no identical residues are conserved, but this is infrequent).

• FASTA gives a better separation between true homologues and random hits.

• Usually when FASTA gives an unexpected hit, it is an even more distant homologue.

Page 15: Overview

Comparison of programs

• Statistics:

• BLAST calculates probabilities

– sometimes fails entirely if some assumptions are invalid

• FASTA calculates significance 'on the fly' from the given dataset

– more relevant

– problematic if the dataset is small

Page 16: Overview

Tips for DB searches

• Use the latest database version.

• Run BLAST first, then depending on your results run a finer tool (fasta, ssearch, SW, blocks, etc.).

• Where possible use translated sequence.

• E() < 0.05 is statistically significant and usually biologically interesting. Also check 0.05 < E() < 10, because you might find interesting hits.

• Pay attention to abnormal composition of the query sequence; it usually causes biased scoring.
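One simple way to spot abnormal composition before searching is to look at residue frequencies in the query. The Python sketch below is only an illustration: the function names are hypothetical, and the cutoff passed as `threshold` is an arbitrary assumption that should be chosen per alphabet (a fraction that is unremarkable for DNA can be extreme for protein).

```python
from collections import Counter


def composition(seq):
    """Return the fraction of each residue in the sequence."""
    seq = seq.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {residue: n / len(seq) for residue, n in counts.items()}


def overrepresented(seq, threshold):
    """List residues whose fraction exceeds the (user-chosen) threshold."""
    return sorted(r for r, f in composition(seq).items() if f > threshold)
```

A query that returns a non-empty list here is a candidate for closer inspection before its hits are trusted.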

Page 17: Overview

Tips for DB searches

• Split large query sequences (if >1000 for DNA, >200 for protein).

• If the query has repeated segments, remove them and repeat the search.
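Splitting a long query can be sketched as cutting it into fixed-size pieces. The overlap between neighbouring pieces in the Python below is an assumption added for the example (so that a match spanning a cut point is not lost); the slide itself only gives the size limits. The function name and parameters are hypothetical.

```python
def split_query(seq, size, overlap=0):
    """Cut seq into pieces of at most `size`, each overlapping the previous one.

    Returns a list of (start_offset, piece) so hits can be mapped back
    to coordinates in the original query.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    pieces = []
    start = 0
    while True:
        pieces.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break
        start += step
    return pieces
```

For instance, `split_query("ABCDEFGHIJ", 4, 1)` yields the pieces `ABCD`, `DEFG` and `GHIJ` at offsets 0, 3 and 6.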

Page 18: Overview
