
Chapter 6 - Profiles

Assume we have a family of sequences. To search for other sequences in the family we can

• Search with a sequence from the family
• Search with more sequences from the family together
    – Consensus sequences (regular expressions)
        • Regular expression, e.g. A-[FR]-X(2,3)-M
        • GARCCMH LCAFARLMLMA
    – Weight matrices or position-specific scoring matrices
        • Not considering gaps
    – Profiles
    – Profiles as Hidden Markov Models
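The consensus pattern on this slide can be tried out directly. The sketch below translates A-[FR]-X(2,3)-M into a Python regular expression and searches the example string. It assumes that the slide's example is one sequence once the space is removed, and the names PATTERN, OVERLAPPING and SEQUENCE are illustrative only.

```python
import re

# A-[FR]-X(2,3)-M: A, then F or R, then any 2 or 3 residues, then M.
PATTERN = re.compile(r"A[FR].{2,3}M")

# Same pattern inside a lookahead, so that overlapping matches are also reported.
OVERLAPPING = re.compile(r"(?=(A[FR].{2,3}M))")

# Assumption: the slide's example read as a single sequence (space removed).
SEQUENCE = "GARCCMHLCAFARLMLMA"

# Non-overlapping matches as (1-based start, matched text).
matches = [(m.start() + 1, m.group()) for m in PATTERN.finditer(SEQUENCE)]

# All match start positions, including overlapping ones.
all_matches = [(m.start() + 1, m.group(1)) for m in OVERLAPPING.finditer(SEQUENCE)]

print(matches)      # [(2, 'ARCCM'), (10, 'AFARLM')]
print(all_matches)  # [(2, 'ARCCM'), (10, 'AFARLM'), (12, 'ARLMLM')]
```

The plain search skips ARLMLM because it starts inside the previous match; the lookahead variant reports it as well.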


Search with a family of sequences

1. Align the sequences (multiple)
2. Make a profile from part of the alignment
3. Search in the database with the profile
4. As an option, revise the profile, and search again (iteratively)
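The four steps can be sketched as a loop. This is a toy illustration under strong assumptions, not the method of any particular program: the family is already aligned without gaps, the profile is a simple log-odds matrix with a uniform background and a pseudocount, and the names build_profile, best_window and iterative_search are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned, pseudocount=1.0):
    """Step 2: one log-odds score table per column (uniform background assumed)."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({a: math.log((counts[a] + pseudocount) / total / background)
                        for a in ALPHABET})
    return profile

def best_window(profile, sequence):
    """Step 3: best gapless window of one database sequence as (score, window)."""
    width = len(profile)
    best = (float("-inf"), "")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[a] for col, a in zip(profile, window))
        best = max(best, (score, window))
    return best

def iterative_search(family, database, threshold, max_rounds=5):
    """Step 4: add new hits to the family, rebuild the profile, search again."""
    members = list(family)
    for _ in range(max_rounds):
        profile = build_profile(members)
        hits = []
        for sequence in database:
            score, window = best_window(profile, sequence)
            if score >= threshold and window not in members:
                hits.append(window)
        if not hits:
            break  # nothing new was found, so the profile would not change
        members.extend(hits)
    return members

result = iterative_search(["ACDE", "ACDE", "ACDF"],
                          ["GGACDQGG", "WWWWWWWW"], threshold=2.0)
print(result)
```

In this toy run the window ACDQ is found in the first round and added to the family; the second round finds nothing new and the loop stops.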


Multiple alignments and profiles

What weight does amino acid a have in position r in the profile?


Example

Clustal X (1.64b) multiple sequence alignment

XENLA1  ALVSGPQD------NELDG--MQL
XENLA2  AQVNGPQD------NELDG--MQF
MOUSE1  PQVEQLEL------GGSP---GDL
RAT1    PQVPQLEL------GGGPEA-GDL
MOUSE2  PQVAQLEL------GGGPGA-GDL
RAT2    PQVAQLEL------GGGPGA-GDL  Removed
CRILO   PQVAQLEL------GGGPGA-DDL
RABIT   LQVGQAEL------GGGPGA-GGL
BOVIN   PQVGALEL------AGGPG-----
SHEEP   PQVGALEL------AGGPG-----  Removed
PIG     PQAGAVEL------GGGLGG---L
CANFA   LQVRDVEL------AGAPGE-GGL
HUMAN   LQVGQVEL------GGGPGA-GSL
CHICK   P-LVSSPL------RGEAGV-LPF
ORENI   LLGFLPPKAGGAVVQGGEN---EV
VERMO   LLGFLPAKSGGAAAGG-ENEVAEF
        12345678******567890*234   * means removed

   Cons   A   B   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   X   Y   Z  Gap  Le
 1 P      1   0 -18 -17 -12 -14 -21 -13  -3 -10   1  -2 -15  26  -6 -12  -3  -2  -1 -32   0 -18   0  100 100
 2 q     -4   0 -18  -5   2 -10 -17   2  -3   3   0   1  -3  -7  11   3  -4  -3  -4 -17   0 -10   0   50 100
 3 V      1   0  -5 -23 -17  -6 -15 -17  15 -15   9   7 -17 -16 -13 -17  -7  -3  18 -26   0 -14   0  100 100
 4 G      0   0 -12  -8  -7 -14   0  -5 -13  -6 -14 -10  -2  -9  -5  -6  -1  -3  -8 -22   0 -11   0  100 100
 5 Q      2   0 -15   1   1 -25   4  -3 -17  -1 -15 -11   1  -7   3  -2   3  -1 -12 -30   0 -20   0  100 100
 6 P      1   0 -13 -17 -11 -14 -21 -13   0 -10   0  -1 -13  18  -7 -13  -1   0   3 -32   0 -17   0  100 100
 7 E      0   0 -29  12  19 -36 -10   0 -25   7 -24 -19   3  20  13   2   2   0 -17 -41   0 -26   0  100 100
 8 L     -8   0 -20 -15 -10  -1 -29 -10   7  -7  14   9 -13 -17  -6 -10 -12  -8   3 -20   0  -8   0  100 100
 5 g      3   0 -16   5   2 -36  21   0 -28   3 -28 -21  10  -8   4   5   4  -2 -20 -32   0 -25   0   34  34
 6 G      4   0 -21   6   0 -49  51 -10 -41  -6 -40 -32   4 -13  -4  -7   3  -9 -30 -40   0 -37   0  100 100
 7 G      3   0 -16  -3  -4 -31  23 -11 -22  -8 -20 -16  -2 -12  -5  -9   0  -6 -16 -33   0 -27   0  100 100
 8 P      3   0 -24   7   6 -32 -10  -5 -21  -1 -20 -17   0  27   2  -6   2   0 -14 -43   0 -25   0  100 100
 9 g      3   0 -19   5  -2 -45  49  -8 -39  -6 -38 -30   9 -13  -5  -6   4  -7 -28 -37   0 -33   0   50  78
 0 a      5   0  -3  -2   0 -12   0  -5  -3  -3  -6  -3  -2  -3  -1  -4   1   0   0 -19   0 -12   0   50  78
 2 g     -1   0 -11  -9  -9 -12   7  -9  -6  -9  -4   0  -6 -13  -7 -10  -4  -6  -6 -18   0 -14   0   50  78
 3 q      0   0 -22  13  11 -33   4   0 -26   3 -25 -19   6   6   7   0   3   0 -19 -36   0 -23   0   50  78
 4 L    -12   0 -10 -37 -28  28 -42 -13  22 -22  29  21 -27 -24 -17 -23 -20 -12  15   1   0  10   0  100 100
 *       17   0   0  10  17   3  52   0   0   1  36   2   4  22  21   2   5   0  16   0   0   0   0
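To make the use of such a table concrete, the sketch below scores a gapless window against the first three profile rows, using the values and the column order from the slide. Gap penalties are ignored, and the names COLUMNS, ROWS, score_window and best_window are illustrative, not part of any profile software.

```python
# Column order as in the profile header on the slide.
COLUMNS = "A B C D E F G H I K L M N P Q R S T V W X Y Z".split()

# Profile rows for positions 1-3 (consensus P, q, V), copied from the slide.
ROWS = [
    [1, 0, -18, -17, -12, -14, -21, -13, -3, -10, 1, -2, -15, 26, -6, -12, -3, -2, -1, -32, 0, -18, 0],
    [-4, 0, -18, -5, 2, -10, -17, 2, -3, 3, 0, 1, -3, -7, 11, 3, -4, -3, -4, -17, 0, -10, 0],
    [1, 0, -5, -23, -17, -6, -15, -17, 15, -15, 9, 7, -17, -16, -13, -17, -7, -3, 18, -26, 0, -14, 0],
]

def score_window(rows, window):
    """Sum, over positions, the profile score of the residue at that position."""
    if len(window) != len(rows):
        raise ValueError("window and profile must have the same length")
    return sum(row[COLUMNS.index(residue)] for row, residue in zip(rows, window))

def best_window(rows, sequence):
    """Slide the profile along the sequence; return (best score, 0-based start)."""
    width = len(rows)
    return max((score_window(rows, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

print(score_window(ROWS, "PQV"))     # 26 + 11 + 18 = 55
print(score_window(ROWS, "LLG"))     # 1 + 0 - 15 = -14
print(best_window(ROWS, "LLGPQVA"))  # (55, 3)
```

The window PQV, which matches the consensus of the first three columns, scores far above LLG, the start of the two most divergent sequences in the alignment.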


What to take into account when creating a profile?

1. The observed amino acids in position r in the alignment.

2. The number of independent ‘observations’ that have been used for constructing the alignment of position r (for example, the number of different a.a. in the column).

3. The similarity of a to the amino acids observed in column r, to allow for not yet observed amino acids. Amino acid a is more likely to occur in unknown family members if there are many amino acids similar to a in the known sequences. Thus a ‘background’ scoring matrix should be used.

4. The background (a priori) distribution of the amino acids.

5. The diversity and similarity of the sequences, resulting in the importance (or weight) of each sequence. The known sequences are normally not uniformly distributed in the ‘family space’, and should have different weights in the calculation.

6. The number of gaps over column r and the neighbouring columns.

These points are not independent. How these aspects are treated varies with the different methods for profile construction.
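Points 1, 2, 3 and 6 can be illustrated on a single column. The sketch below summarises column 2 of the example alignment (all 16 sequences) and then scores an amino acid a by averaging a background similarity over the observed residues. The similarity values in TOY_SIMILARITY are invented for illustration and are not taken from any real scoring matrix; the frequency-weighted average is only one possible choice, and all function names are hypothetical.

```python
from collections import Counter

def column_summary(column):
    """Observed residues, number of different residues, and gaps in one column."""
    residues = [c for c in column if c != "-"]
    counts = Counter(residues)
    return {"counts": dict(counts),
            "distinct": len(counts),
            "gaps": len(column) - len(residues)}

# Hypothetical toy similarity scores (NOT a real substitution matrix).
TOY_SIMILARITY = {
    ("L", "L"): 4, ("Q", "Q"): 5, ("I", "I"): 4,
    ("L", "Q"): -2, ("L", "I"): 2, ("Q", "I"): -3,
}

def similarity(a, b):
    """Symmetric lookup in the toy table."""
    key = (a, b) if (a, b) in TOY_SIMILARITY else (b, a)
    return TOY_SIMILARITY[key]

def averaged_score(a, column):
    """Frequency-weighted average similarity of a to the residues in the column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    counts = Counter(residues)
    return sum(n * similarity(a, b) for b, n in counts.items()) / len(residues)

# Column 2 of the example alignment, read top to bottom.
COLUMN_2 = "LQQQQQQQQQQQQ-LL"

print(column_summary(COLUMN_2))
for amino_acid in "QLI":
    print(amino_acid, averaged_score(amino_acid, COLUMN_2))
```

Isoleucine never occurs in the column, yet it still receives a score through its similarity to the observed residues, which is the purpose of the background scoring matrix in point 3.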


Database search with a profile


Notations


Position weight

[Formulas 1 to 3 for the position weight were lost in extraction; only the symbols r, rb, m, T, V and ln survive.]

No sequence weights are considered at this stage.

1. All a.a. in the column count equally
2. A.a. occurring many times are favored
3. A.a. occurring many times are ‘punished’


PSI-BLAST


Hidden Markov Model