
Manual alignment


Page 1: Manual alignment
Page 2: Manual alignment

Manual alignment

•Difficult for remote homologies

•Slow

•Low reproducibility

•Allows one to take into account information not contained in the sequence

Page 3: Manual alignment

“Equivalent” amino acids

Hydrophobic: Ala (A), Val (V), Met (M), Leu (L), Ile (I), Phe (F), Trp (W), Tyr (Y)

Small: Gly (G), Ala (A), Ser (S)

Polar: Ser (S), Thr (T), Asn (N), Gln (Q), Tyr (Y)

Charged: Asp (D), Glu (E) / Lys (K), Arg (R)

On the surface, polar and charged residues are equivalent

Hard to replace (special function): Gly (G), Pro (P), Cys (C), His (H)
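The groups above can be encoded as sets to decide whether a substitution seen in an alignment is "equivalent". This is a minimal sketch under stated assumptions: the names GROUPS, HARD_TO_REPLACE and are_equivalent are hypothetical, the charged group is split into negative (D, E) and positive (K, R), and the hard-to-replace residues are treated as equivalent only to themselves.

```python
# Hypothetical encoding of the "equivalent" amino acid groups listed above.
GROUPS = {
    "hydrophobic": set("AVMLIFWY"),
    "small": set("GAS"),
    "polar": set("STNQY"),
    "negative": set("DE"),  # Asp, Glu
    "positive": set("KR"),  # Lys, Arg
}
# Residues with a special function: assumed replaceable only by themselves.
HARD_TO_REPLACE = set("GPCH")


def are_equivalent(a: str, b: str, surface: bool = False) -> bool:
    """Return True if residues a and b share an equivalence group.

    With surface=True, polar and charged residues are pooled into one group,
    following the note that they are equivalent on the protein surface.
    """
    if a == b:
        return True
    if a in HARD_TO_REPLACE or b in HARD_TO_REPLACE:
        return False
    groups = list(GROUPS.values())
    if surface:
        groups.append(GROUPS["polar"] | GROUPS["negative"] | GROUPS["positive"])
    return any(a in g and b in g for g in groups)


print(are_equivalent("I", "L"))                # True (both hydrophobic)
print(are_equivalent("D", "K"))                # False
print(are_equivalent("D", "K", surface=True))  # True (surface rule)
```

Whether Gly should also count as "small" (and so match Ala or Ser) is a judgment call; here the special-function rule wins.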

Page 5: Manual alignment

BCL2 Human vs BCL2 Mouse

Highlighted regions: BH4, BH3, BH1, PhosphoSer

>sp|P10417|BCL2_MOUSE Apoptosis regulator Bcl-2
Length=236

Score = 429 bits (1103), Expect = 6e-119, Method: Composition-based stats.
Identities = 214/239 (89%), Positives = 217/239 (90%), Gaps = 3/239 (1%)

Query  1    MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPA  60
            MA AGRTGYDNREIVMKYIHYKLSQRGYEWDAGD  AAP GAAP PGIFS QP   P PA
Sbjct  1    MAQAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDADAAPLGAAPTPGIFSFQPESNPMPA  60

Query  61   ASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLH  120
              R+  ARTSPL+   A    AGPALSPVPP VHLTLR+AGDDFSRRYRRDFAEMSSQLH
Sbjct  61   VHREMAARTSPLRPLVA---TAGPALSPVPPCVHLTLRRAGDDFSRRYRRDFAEMSSQLH  117

Query  121  LTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEY  180
            LTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEY
Sbjct  118  LTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEY  177

Query  181  LNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK  239
            LNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Sbjct  178  LNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK  236
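The Identities and Gaps figures in a BLAST header can be recomputed directly from the two aligned rows. A minimal sketch, assuming "-" marks a gap and that percentages are floored (which reproduces the header values such as 214/239 giving 89%). The function name alignment_stats is hypothetical, and Positives are left out because they require a substitution matrix.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identities and gap columns in a pairwise alignment ('-' = gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    length = len(query)
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if "-" in (q, s))
    return {
        "length": length,
        "identities": identities,
        "identity_pct": 100 * identities // length,  # floor, e.g. 214/239 -> 89
        "gaps": gaps,
        "gap_pct": 100 * gaps // length,
    }


# Second block of the human/mouse BCL2 alignment
q = "ASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLH"
s = "VHREMAARTSPLRPLVA---TAGPALSPVPPCVHLTLRRAGDDFSRRYRRDFAEMSSQLH"
print(alignment_stats(q, s))
# -> {'length': 60, 'identities': 45, 'identity_pct': 75, 'gaps': 3, 'gap_pct': 5}
```

Run on all four blocks concatenated, the same function returns 214 identities and 3 gaps over 239 columns, matching the header.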

Page 6: Manual alignment

BCL-2 vs BCL-X

>emb|CAA57886.1| bcl-x [Rattus norvegicus]
Length=233

Score = 172 bits (435), Expect = 2e-41, Method: Composition-based stats.
Identities = 93/199 (46%), Positives = 116/199 (58%), Gaps = 13/199 (6%)

Query  11   NREIVMKYIHYKLSQRGYEW----DAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPV  66
            N+E+V+ ++ YKLSQ+GY W    D  +     P          S     P    +  P
Sbjct  5    NQELVVDFLSYKLSQKGYSWSQFSDVEENRTEAPEETEPERETPSAINGNPSWHLADSPA  64

Query  67   ARTSPLQTPAAPGAAAGPALSPVPPV--VHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPF  124
                      A G ++      V P+  V   LR+AGD+F  RYRR F++++SQLH+TP
Sbjct  65   VN-------GATGHSSSLDAREVIPMAAVKQALREAGDEFELRYRRAFSDLTSQLHITPG  117

Query  125  TARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRH  184
            TA   F  VV ELFRDGVNWGRIVAFF FGG +CVESV++EM  LV  IA WM  YLN H
Sbjct  118  TAYQSFEQVVNELFRDGVNWGRIVAFFSFGGALCVESVDKEMQVLVSRIASWMATYLNDH  177

Query  185  LHTWIQDNGGWDAFVELYG  203
            L  WIQ+NGGWD FV+LYG
Sbjct  178  LEPWIQENGGWDTFVDLYG  196

Page 7: Manual alignment


Same BCL-2 vs BCL-X alignment, with highlighted regions: BH4, BH3, BH1, PhosphoSer

Page 9: Manual alignment

Multiple alignment & functional prediction

Aim: Predict function from sequence

Page 10: Manual alignment

General strategies

Group sequences in a “family” (BLAST, PFAM)

Recognize short sequence fragments related to functional or structural features

Page 11: Manual alignment

Multiple alignment

Global alignment of more than two sequences

Allows the importance of each position in the alignment to be taken into account

Allows the identification of:
• Conserved residues (residues with functional or structural importance)
• Tree determinants
• Correlated mutations
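Finding conserved residues amounts to scanning the alignment column by column. A minimal sketch, assuming equal-length aligned sequences with "-" for gaps; conserved_columns and the min_fraction threshold are hypothetical names, and the toy alignment is invented for illustration.

```python
from collections import Counter


def conserved_columns(msa: list[str], min_fraction: float = 1.0) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for columns whose most common residue
    occupies at least min_fraction of the sequences (gaps never count)."""
    result = []
    for j in range(len(msa[0])):
        column = [seq[j] for seq in msa if seq[j] != "-"]
        if not column:
            continue
        residue, count = Counter(column).most_common(1)[0]
        if count / len(msa) >= min_fraction:
            result.append((j + 1, residue))
    return result


toy = ["ATHD", "ATHE", "GTHD"]  # hypothetical toy alignment
print(conserved_columns(toy))                    # [(2, 'T'), (3, 'H')]
print(conserved_columns(toy, min_fraction=0.6))  # [(1, 'A'), (2, 'T'), (3, 'H'), (4, 'D')]
```

A strict threshold of 1.0 keeps only invariant columns; lowering it admits columns that are conserved in most, but not all, sequences.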

Page 12: Manual alignment

Multiple alignments

• Low similarity, only two sequences:
AVTTGLNMWTTAKRPGMDDFYTILLPGLMNCI
GLFTAIDMHFFGRKPACEEYFTLVVDGLCNCI

• Low similarity, multiple sequences:
GIFTTDIDMMHFYVKKPPGLDEFFTTLVLRTLLCMAA
ALTTGIDMMWTTAKRPPDMDDYYTIIIPGLLMNCI
AVTTGLNMMWTTAKRPPGMDDFYTTILLPGLLMNCI
GVTTTGLNMMYFTARRPPGLDEFYTTLVLRTLLCMCL
GIFTTDIDMMHFYVKKPPGLDEFFTTLVLRTLLCMAA
AVTTGLNMMWTTAKRPPGMDDFYTTILLPGLLMNCI
GLFTTALNMMHFFGRKPPACEEYFTTLVVDGLLCNCI

Page 13: Manual alignment

Tree determinants

Define subfamilies; relevant to phylogeny

S1 GIFTTDIDMMHFYVKKPPGLDEFFTTLVLRTLLCMAA
S2 ALTTGIDMMWTTAKRPPDMDDYYTIIIPGLLMNCI
S3 AVTTGLNMMWTTAKRPPGMDDFYTTILLPGLLMNCI
S4 GVTTTGLNLLYFTARRPP--DEFYSS-VLRTLLCMCL
S5 GIFTTDIDLLHFYVKKPP--DEFFSSLVLRTLLCMAA
S6 AVTTGLNLLWTTAKRPP--DDFYSSILLPGLLMNCI
S7 GLFTTALNLLHFFGRKPP--EEYFSSLVVDGLLCNCI
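Tree determinants are positions that are conserved inside each subfamily but differ between subfamilies (like the MM/LL and GL/-- columns separating S1-S3 from S4-S7). A minimal sketch, assuming the subfamilies are already known and aligned; tree_determinants is a hypothetical name, gaps are treated as an ordinary state, and the fragments are invented to echo the example.

```python
def tree_determinants(subfamily_a: list[str], subfamily_b: list[str]) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in A, residue in B) for columns that are
    fully conserved inside each subfamily but differ between the two."""
    result = []
    for j in range(len(subfamily_a[0])):
        states_a = {seq[j] for seq in subfamily_a}
        states_b = {seq[j] for seq in subfamily_b}
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            result.append((j + 1, next(iter(states_a)), next(iter(states_b))))
    return result


# Hypothetical fragments echoing the MM -> LL and GL -> -- changes in S1-S7
a = ["DMMHGL", "DMMWGL", "NMMYGM"]
b = ["DLLH--", "NLLW--", "DLLF--"]
print(tree_determinants(a, b))  # [(2, 'M', 'L'), (3, 'M', 'L'), (5, 'G', '-')]
```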

Page 14: Manual alignment

Correlated mutations

Concerted changes in two or more conserved positions; they reveal positions involved in structural interactions

GIFTTDIDMMHFYVKKPPGLDEFFTTLVLRTLLCMAA
ALTTGIDMMWTTAKRPPDMDDYYTIIIRGLLMNCI
AVTTGLDMMWTTAKRPPGMDDFYTTILLRGLLMNCI
GVTTTGLDMMYFTARRPPGLDEFYTTLVLKTLLCMCL
GIFTTDIRMMHFYVKKPPGLDEFFTTLVLDTLLCMAA
AVTTGLRMMWTTAKRPPGMDDFYTTILLDGLLMNCI
GLFTTALRMMHFFGRKPPACEEYFTTLVVEGLLCNCI

Correlated pair: D-R in the first four sequences, swapped to R-D in the last three
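One common way to score correlated mutations is the mutual information between two alignment columns: it is high when the residue at one position predicts the residue at the other. This is a swapped-in technique offered as a sketch, not necessarily the method behind the slide; mutual_information is a hypothetical name and the two-column toy alignments are invented.

```python
import math
from collections import Counter


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (bits) between 0-based columns i and j of an alignment."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical toy: the D/R pair swaps together (correlated) ...
print(mutual_information(["DR", "DR", "RD", "RD"], 0, 1))  # 1.0
# ... versus two columns that vary independently
print(mutual_information(["DR", "DD", "RR", "RD"], 0, 1))  # 0.0
```

In real alignments, mutual information is also raised by shared ancestry, so it is usually corrected before being read as a structural contact signal.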

Page 15: Manual alignment

Software

ClustalW: makes global pairwise alignments, building “clusters” of similar sequences

T-Coffee: slower than ClustalW but more accurate for low-similarity sequences; combines global and local alignments
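The global pairwise step can be illustrated with the Needleman-Wunsch algorithm. This is a minimal sketch with toy scoring (match +1, mismatch -1, linear gap -2), all of which are assumptions; real tools use substitution matrices and more elaborate gap penalties, and ClustalW then uses the pairwise results to build a guide tree, a step not shown here.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GATTACA", "GATACA"))  # score 4: six matches, one gap
```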

Page 16: Manual alignment

Profiles

Also known as Position-specific score matrix (PSSM).

Gives scores for amino acids or gaps specific to each sequence position

A quantitative way to take the role of each position into account

Page 17: Manual alignment

Alignment (7 sequences, 10 positions):

FKLLSHCLLV
FKAFGQTMFQ
YPIVGQELLG
FPVVKEAILK
FKVLAAVIAD
LEFISECIIQ
FKLLGNVLVC

Number of occurrences of each amino acid at each position:

   1  2  3  4  5  6  7  8  9 10
A  0  0  1  0  1  1  1  0  1  0
C  0  0  0  0  0  0  2  0  0  1
D  0  0  0  0  0  0  0  0  0  1
E  0  1  0  0  0  2  1  0  0  0
F  5  0  1  1  0  0  0  0  1  0
G  0  0  0  0  3  0  0  0  0  1
H  0  0  0  0  0  1  0  0  0  0
I  0  0  1  1  0  0  0  3  1  0
K  0  4  0  0  1  0  0  0  0  1
L  1  0  2  3  0  0  0  3  3  0
M  0  0  0  0  0  0  0  1  0  0
N  0  0  0  0  0  1  0  0  0  0
P  0  2  0  0  0  0  0  0  0  0
Q  0  0  0  0  0  2  0  0  0  2
R  0  0  0  0  0  0  0  0  0  0
S  0  0  0  0  2  0  0  0  0  0
T  0  0  0  0  0  0  1  0  0  0
V  0  0  2  2  0  0  2  0  1  1
W  0  0  0  0  0  0  0  0  0  0
Y  1  0  0  0  0  0  0  0  0  0

Page 18: Manual alignment

Profiles, simplest calculation

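$$M_{ij} = \log \frac{N_{ij}/N_S}{f_i}$$

$N_{ij}/N_S$: frequency of amino acid $i$ at position $j$ ($N_{ij}$ occurrences among $N_S$ sequences)

$f_i$: standard (background) frequency of amino acid $i$

$M_{ij} = 0$: normal

$M_{ij} > 0$: position $j$ is rich in amino acid $i$

$M_{ij} < 0$: position $j$ is poor in amino acid $i$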
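The simplest profile formula translates almost literally into code. A minimal sketch, assuming a natural logarithm and a uniform background frequency of 1/20 for every amino acid (neither is fixed by the formula); profile is a hypothetical name. Amino acids that never occur at a position get minus infinity, which is exactly the weakness the improvements below address.

```python
import math
from collections import Counter


def profile(msa: list[str], background: dict[str, float]) -> list[dict[str, float]]:
    """M[j][i] = log((N_ij / N_S) / f_i) for every position j and amino acid i."""
    n_s = len(msa)
    matrix = []
    for j in range(len(msa[0])):
        counts = Counter(seq[j] for seq in msa)
        column = {}
        for aa, f_i in background.items():
            freq = counts[aa] / n_s
            column[aa] = math.log(freq / f_i) if freq > 0 else float("-inf")
        matrix.append(column)
    return matrix


MSA = ["FKLLSHCLLV", "FKAFGQTMFQ", "YPIVGQELLG", "FPVVKEAILK",
       "FKVLAAVIAD", "LEFISECIIQ", "FKLLGNVLVC"]
UNIFORM = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # assumed background
M = profile(MSA, UNIFORM)
print(M[0]["F"])  # log((5/7) / 0.05) > 0: position 1 is rich in F
print(M[0]["W"])  # -inf: W never occurs at position 1
```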

Page 19: Manual alignment

Profiles, improvements

Include amino acid frequencies typical of the protein family

Also include classical similarity matrices: equivalent amino acids that do not appear in the available alignment can then still score favorably
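One way to fold a similarity matrix into a profile is to score each amino acid against a column as the frequency-weighted average of its similarity to the residues observed there. A minimal sketch under loud assumptions: the similarity function below is a hypothetical toy (+2 identical, +1 same group, -1 otherwise) standing in for a real matrix such as BLOSUM or PAM, and matrix_profile_column is a hypothetical name.

```python
from collections import Counter

# Hypothetical toy similarity standing in for a classical matrix (BLOSUM, PAM, ...)
GROUPS = [set("AVMLIFWY"), set("GAS"), set("STNQY"), set("DE"), set("KR")]


def similarity(a: str, b: str) -> int:
    if a == b:
        return 2
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -1


def matrix_profile_column(column: str, alphabet: str = "ACDEFGHIKLMNPQRSTVWY") -> dict[str, float]:
    """Score every amino acid against one alignment column as the frequency-weighted
    average of its similarity to the residues observed in that column."""
    counts = Counter(column)
    n = len(column)
    return {aa: sum(similarity(aa, b) * c for b, c in counts.items()) / n for aa in alphabet}


col = matrix_profile_column("ILL")  # hypothetical column: I once, L twice
print(col["V"])  # 1.0: V never occurs, but scores positively as a similar residue
print(col["D"])  # -1.0
```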

Page 20: Manual alignment

Position-specific scores for the same 7-sequence alignment:

    1   2   3   4   5   6   7   8   9  10
A -18 -10  -1  -8   8  -3   3 -10  -2  -8
C -22 -33 -18 -18 -22 -26  22 -24 -19  -7
D -35   0 -32 -33  -7   6 -17 -34 -31   0
E -27  15 -25 -26  -9  23  -9 -24 -23  -1
F  60 -30  12  14 -26 -29 -15   4  12 -29
G -30 -20 -28 -32  28 -14 -23 -33 -27  -5
H -13 -12 -25 -25 -16  14 -22 -22 -23 -10
I   3 -27  21  25 -29 -23  -8  33  19 -23
K -26  25 -25 -27  -6   4 -15 -27 -26   0
L  14 -28  19  27 -27 -20  -9  33  26 -21
M   3 -15  10  14 -17 -10  -9  25  12 -11
N -22  -6 -24 -27   1   8 -15 -24 -24  -4
P -30  24 -26 -28 -14 -10 -22 -24 -26 -18
Q -32   5 -25 -26  -9  24 -16 -17 -23   7
R -18   9 -22 -22 -10   0 -18 -23 -22  -4
S -22  -8 -16 -21  11   2  -1 -24 -19  -4
T -10 -10  -6  -7  -5  -8   2 -10  -7 -11
V   0 -25  22  25 -19 -26   6  19  16 -16
W   9 -25 -18 -19 -25 -27 -34 -20 -17 -28
Y  34 -18  -1   1 -23 -12 -19   0   0 -18
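A score table like this is used by adding up, position by position, the scores of the residues in a candidate fragment, and sliding the profile along a longer sequence to find the best-scoring window. A minimal sketch, assuming an ungapped 10-residue window; PSSM_TEXT holds the values from the table above, and score_window and best_window are hypothetical names.

```python
# Scores from the table above (rows: amino acid, columns: positions 1-10).
PSSM_TEXT = """
A -18 -10 -1 -8 8 -3 3 -10 -2 -8
C -22 -33 -18 -18 -22 -26 22 -24 -19 -7
D -35 0 -32 -33 -7 6 -17 -34 -31 0
E -27 15 -25 -26 -9 23 -9 -24 -23 -1
F 60 -30 12 14 -26 -29 -15 4 12 -29
G -30 -20 -28 -32 28 -14 -23 -33 -27 -5
H -13 -12 -25 -25 -16 14 -22 -22 -23 -10
I 3 -27 21 25 -29 -23 -8 33 19 -23
K -26 25 -25 -27 -6 4 -15 -27 -26 0
L 14 -28 19 27 -27 -20 -9 33 26 -21
M 3 -15 10 14 -17 -10 -9 25 12 -11
N -22 -6 -24 -27 1 8 -15 -24 -24 -4
P -30 24 -26 -28 -14 -10 -22 -24 -26 -18
Q -32 5 -25 -26 -9 24 -16 -17 -23 7
R -18 9 -22 -22 -10 0 -18 -23 -22 -4
S -22 -8 -16 -21 11 2 -1 -24 -19 -4
T -10 -10 -6 -7 -5 -8 2 -10 -7 -11
V 0 -25 22 25 -19 -26 6 19 16 -16
W 9 -25 -18 -19 -25 -27 -34 -20 -17 -28
Y 34 -18 -1 1 -23 -12 -19 0 0 -18
"""
PSSM = {}
for line in PSSM_TEXT.strip().splitlines():
    aa, *scores = line.split()
    PSSM[aa] = [int(x) for x in scores]
WIDTH = 10


def score_window(window: str) -> int:
    """Sum of position-specific scores for one 10-residue window."""
    if len(window) != WIDTH:
        raise ValueError(f"window must have {WIDTH} residues")
    return sum(PSSM[aa][j] for j, aa in enumerate(window))


def best_window(seq: str) -> tuple[int, int]:
    """Highest-scoring window as (score, 0-based start), sliding the profile along seq."""
    return max((score_window(seq[i:i + WIDTH]), i) for i in range(len(seq) - WIDTH + 1))


print(score_window("FKLLSHCLLV"))  # 221 for the first sequence of the alignment
```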


Page 23: Manual alignment

Profiles

Profiles increase the information available and make it possible to extract “family” features, as opposed to features of individual sequences

Page 24: Manual alignment

HMM profiles

Statistical models (Hidden Markov Models) to build profiles.

The model is “trained” on multiple alignments to estimate evolutionary probabilities.

They provide a “theoretical machinery” for describing sequence relationships on a quantitative basis.
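The core quantitative operation of an HMM is computing the probability of a sequence by summing over all hidden state paths, which the forward algorithm does efficiently. A minimal sketch: the two-state model below (buried/exposed states emitting "h" for hydrophobic or "p" for polar) is entirely hypothetical, and real profile HMMs use match, insert and delete states per alignment column instead.

```python
from itertools import product


def forward(seq, states, start_p, trans_p, emit_p):
    """Probability of seq under an HMM, summed over all hidden state paths."""
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: sum(alpha[s] * trans_p[s][t] for s in states) * emit_p[t][symbol]
            for t in states
        }
    return sum(alpha.values())


# Hypothetical two-state toy model over a two-letter alphabet
STATES = ("buried", "exposed")
START = {"buried": 0.5, "exposed": 0.5}
TRANS = {"buried": {"buried": 0.8, "exposed": 0.2},
         "exposed": {"buried": 0.3, "exposed": 0.7}}
EMIT = {"buried": {"h": 0.9, "p": 0.1},
        "exposed": {"h": 0.2, "p": 0.8}}

print(forward("hhp", STATES, START, TRANS, EMIT))
# Sanity check: the probabilities of all length-3 sequences sum to 1
total = sum(forward("".join(s), STATES, START, TRANS, EMIT) for s in product("hp", repeat=3))
print(total)
```

Training ("Baum-Welch") adjusts the transition and emission tables to maximize these probabilities over the aligned family.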

Page 30: Manual alignment

The twilight zone

Sequence identity below 25%

Structural similarity can exist, but it is difficult to detect with standard methods

Page 32: Manual alignment

PSI-BLAST

BLAST search based on profiles built dynamically:

• Standard BLAST search
• Build a position-specific score matrix (PSSM) from the resulting alignment
• Search again using the profile
• Repeat until self-consistency (no new sequences are included)
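The iterative loop can be mimicked with a toy, ungapped, equal-length analogue; this is not PSI-BLAST itself. A minimal sketch under stated assumptions: the first round scores with a profile built from the query alone (real PSI-BLAST starts with a substitution matrix), the profile uses a uniform background with a pseudocount, inclusion uses a raw score threshold instead of an E-value, and build_pssm, score, iterate and the database are hypothetical.

```python
import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(seqs, pseudo=1.0):
    """Log-odds PSSM from equal-length sequences (uniform background, pseudocounts)."""
    bg = 1 / len(AMINO)
    pssm = []
    for j in range(len(seqs[0])):
        counts = Counter(s[j] for s in seqs)
        pssm.append({a: math.log(((counts[a] + pseudo * bg) / (len(seqs) + pseudo)) / bg)
                     for a in AMINO})
    return pssm


def score(pssm, seq):
    return sum(column[a] for column, a in zip(pssm, seq))


def iterate(query, database, threshold, max_iter=5):
    """Search, rebuild the profile from the hits, and repeat until the set of
    included sequences stops changing (self-consistency) or max_iter is reached."""
    included = {query}
    for _ in range(max_iter):
        pssm = build_pssm(sorted(included))
        hits = {s for s in database if score(pssm, s) >= threshold} | {query}
        if hits == included:  # converged
            break
        included = hits
    return included


db = ["FKLLSHCLLV", "FKLLSHCLLI", "WWWWWWWWWW"]  # hypothetical database
print(sorted(iterate("FKLLSHCLLV", db, threshold=10.0)))  # ['FKLLSHCLLI', 'FKLLSHCLLV']
```

The max_iter cap reflects the practice of keeping the number of iterations small.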

Page 33: Manual alignment

PSI-BLAST

PSI-BLAST cannot find what BLAST cannot detect

The E-value threshold for inclusion/exclusion must be chosen carefully (e.g. 0.01)

Sometimes the iterations drift toward nonsensical results

The number of iterations is usually kept small

Page 34: Manual alignment

Motifs

Motif: a short, highly conserved sequence fragment related to some structural or functional feature

Conserved in distant homologues because of functional or structural constraints

Help to predict function and to detect remote homology

Page 35: Manual alignment

C-x(2,4)-C-x(12)-H-x(3,5)-H

Element by element: C, any 2 to 4 residues (x(2,4)), C, any 12 residues (x(12)), H, any 3 to 5 residues (x(3,5)), H

Page 36: Manual alignment

Motifs

How to express motifs?

Regular expressions, patterns

Profiles (PSSM)

HMMs (Hidden Markov Models) (PFAM)

Page 37: Manual alignment

Regular expressions

ALRDFATHDDFSMTAEATHDSIECDQAATHEAS

A-T-H-[DE]
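The pattern A-T-H-[DE] maps directly onto a Python regular expression (dashes dropped, the bracket kept), so the occurrences in the example sequence can be listed with the standard re module. A minimal sketch; the variable names are illustrative.

```python
import re

sequence = "ALRDFATHDDFSMTAEATHDSIECDQAATHEAS"
pattern = re.compile(r"ATH[DE]")  # A-T-H-[DE] written as a Python regex
hits = [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]  # 1-based starts
print(hits)  # [(6, 'ATHD'), (17, 'ATHD'), (28, 'ATHE')]
```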

Page 38: Manual alignment

Regular expressions

[AC]-x-V-x(4)-{ED}

[AC]: A or C
x: any 1 aa
V: only V
x(4): any 4 aa
{ED}: never E or D

Page 39: Manual alignment

Regular expressions

<A-x-[ST](2)-x(0,1)-V

<: N-terminus
[ST](2): two residues, each S or T
x(0,1): one residue or none
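The notation shown in these examples can be translated mechanically into Python regular expressions. A minimal sketch covering only the elements used here (x, [..], {..}, (n), (n,m), and < or > on the first or last element); prosite_to_regex is a hypothetical name, a trailing period as written in PROSITE entries is ignored, and rarer forms such as > inside brackets are not handled.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # anchored at the N-terminus
        at_end = element.endswith(">")      # anchored at the C-terminus
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(parts)


print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))     # [AC].V.{4}[^ED]
print(prosite_to_regex("<A-x-[ST](2)-x(0,1)-V"))  # ^A.[ST]{2}.{0,1}V
```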

Page 40: Manual alignment

Building regular expressions

From multiple alignments:

• Manual
• Pattern-building software: eMotif, PRATT, Teiresias,…
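A crude version of manual pattern building can be automated column by column: an invariant column gives a single residue, a column with few alternatives gives a bracket, and anything more variable becomes x. A minimal sketch of that heuristic only, not the algorithm of eMotif, PRATT or Teiresias; pattern_from_alignment and max_choices are hypothetical, and the alignment is assumed to contain no gaps.

```python
def pattern_from_alignment(msa: list[str], max_choices: int = 2) -> str:
    """Derive a PROSITE-style pattern from a gap-free alignment, column by column."""
    elements = []
    for j in range(len(msa[0])):
        residues = sorted({seq[j] for seq in msa})
        if len(residues) == 1:
            elements.append(residues[0])                     # invariant column
        elif len(residues) <= max_choices:
            elements.append("[" + "".join(residues) + "]")   # few alternatives
        else:
            elements.append("x")                             # too variable
    # Collapse runs of x into x(n)
    collapsed = []
    for element in elements:
        if element == "x" and collapsed and collapsed[-1][0] == "x":
            collapsed[-1] = ("x", collapsed[-1][1] + 1)
        else:
            collapsed.append((element, 1))
    return "-".join(e if n == 1 else f"{e}({n})" for e, n in collapsed)


print(pattern_from_alignment(["ATHD", "ATHE", "GTHD"]))  # [AG]-T-H-[DE]
print(pattern_from_alignment(["ACDB", "AEFB", "AGHB"]))  # A-x(2)-B
```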

Page 41: Manual alignment

Manual building

Page 45: Manual alignment

Pattern databases

PROSITE: known sequence motifs

• Active site signatures
• Interaction sites
• Modification sites (glycosylation, phosphorylation, ...)

Page 50: Manual alignment

[GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]

Matching fragment, one residue group per pattern element: G GQ D L Y V P V C R L C Y

Page 52: Manual alignment

Reverse search

Which proteins contain a given motif? This allows us to:

• Test the statistical validity of a hit
• Check new motifs
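A reverse search scans many proteins for one motif, and a rough validity check compares the hits with the number expected by chance. A minimal sketch under stated assumptions: reverse_search and expected_random_hits are hypothetical names, the protein set is invented, and the chance estimate assumes a fixed-length pattern and a uniform amino acid composition, which real tools refine.

```python
import re


def reverse_search(regex: str, proteins: dict[str, str]) -> dict[str, list[int]]:
    """Return {name: [1-based start positions]} for every protein containing the motif.
    The lookahead lets overlapping occurrences count too."""
    finder = re.compile(f"(?=(?:{regex}))")
    hits = {}
    for name, seq in proteins.items():
        positions = [m.start() + 1 for m in finder.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


def expected_random_hits(choices_per_element: list[int], seq_length: int, alphabet: int = 20) -> float:
    """Expected chance matches of a fixed-length pattern in a random sequence,
    given how many residues each pattern element allows."""
    p = 1.0
    for k in choices_per_element:
        p *= k / alphabet
    windows = max(seq_length - len(choices_per_element) + 1, 0)
    return windows * p


proteins = {"seq1": "ALRDFATHDDFSMTAEATHDSIECDQAATHEAS", "seq2": "MKVLAAGLSW"}  # hypothetical
print(reverse_search("ATH[DE]", proteins))  # {'seq1': [6, 17, 28]}
# A-T-H-[DE] allows 1, 1, 1 and 2 residues per element; seq1 has 33 residues
print(expected_random_hits([1, 1, 1, 2], 33))
```

Three hits where far fewer than one is expected by chance suggests the matches are not random.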
