56
Moreover, the circular logic How do we know what is the right distance without a “good” alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT --GCGGCGGCAGTT ATGCGT--GCAAGT- --GCGGCGGC-AGTT Less gaps but 7matches More gaps but 8 matches Which is better?

How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Moreover, the circular logic

How do we know what is the right distance without a “good” alignment?And how do we construct a good alignment without knowing what substitutions were made previously?

ATGCGT--GCAAGT

--GCGGCGGCAGTT

ATGCGT--GCAAGT-

--GCGGCGGC-AGTT

Less gaps but 7matches More gaps but 8 matches

Which is better?

Page 2: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Substitution matrices for proteins

We need to compare, DNA or protein sequences, estimate their distances etc (e.g. Helpful for inferring molecular function by finding similarity to a sequence with known function).

For that purpose we need: “Good alignment” of sequences.

For that purpose we need: A measure for judging the quality of an alignment in relation to other possible alignments, a scoring system.

Page 3: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 4: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 5: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

A practical aspect as well, since we don’t study very diverged DNA sequences

Page 7: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 8: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 9: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 11: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 12: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 13: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 14: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 15: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 16: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 17: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 18: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 19: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 20: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 21: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 22: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 23: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 24: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 25: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Commonly multiplied by a power of 10 to deal with decimals

Page 26: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 27: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case
Page 28: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

A sample substitution matrix

C 9

S -1 4

T -1 1 5

P -3 -1 -1 7

A 0 1 0 -1 4

G -3 0 -2 -2 0 6

N -3 1 0 -2 -2 0 6

D -3 0 -1 -1 -2 -1 1 6

E -4 0 -1 -1 -1 -2 0 2 5

Q -3 0 -1 -1 -1 -2 0 0 2 5

H -3 -1 -2 -2 -2 -2 1 -1 0 0 8

R -3 -1 -1 -2 -1 -2 0 -2 0 1 0 5

K -3 0 -1 -1 -1 -2 0 -1 1 1 -1 2 5

M -1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5

I -1 -2 -1 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4

L -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2 2 2 4

V -1 -2 0 -2 0 -3 -3 -3 -2 -2 -3 -3 -2 1 3 1 4

F -2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3 0 0 0 -1 6

Y -2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7

W -2 -3 -2 -4 -3 -2 -4 -4 -3 -2 -2 -3 -3 -1 -3 -2 -3 1 2 11

C S T P A G N D E Q H R K M I L V F Y W

134 LQQGELDLVMTSDILPRSELHYSPMFDFEVRLVLAPDHPLASKTQITPEDLASETLLI| ||| | | |||||| | || ||

137 LDSNSVDLVLMGVPPRNVEVEAEAFMDNPLVVIAPPDHPLAGERAISLARLAEETFVM

D:D = +6

D:R = -2

BLOSUM62

Page 29: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

BLOSUM – Brief OverviewBased purely on counts and alignmentBLOSUM65 < BLOSUM45 in evolutionary distance

1 k i i1 in

ij i

20

j i j i1 1 1

a set of sequences S = S ...S , where S = s ...s , s is the j-th amino acid in sequence S .

The probability of observing each amino acid X =p(X)

p(X) ( , S ) / ( , S )

where ( ,

k k

i j i

j

c X c X

c X= = =

= ∑ ∑∑

i

2

) is the count of amino acid in sequence S

( , | ) 2 ( ). ( ) ( ) , i j

l m l m l

S X

P X X random p X p X or p X if l m= =

Page 30: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

In general …log-odds for alignment

( , | ) # of , combinations/all possible pairwise combinations finally take the log2-odds likelihood ratio and multiply by 2 for easy representationNote: all possible pairwise combinations=k

l m l mP X X data X X=

{ }

{ }l m

l m

l m

. ( 1) / 2 does use pseudo count to accomadate for combinations not observed:

add 1 in numerator, add 210 (20*19+20) in denominator(X , X )ie (X , X | )

k. ( 1) / 2 210(X ,X | ).log

(X

n nBlosum

countP datan n

P dataP

λ

=− +

l m

. , X | )

being a scaling factor for representation

subs matrixrandom

λ

=

Page 31: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

BLOSUM vs PAM vs others?

BLOSUM:Based on a range of evolutionary periods -- Each matrix constructed separately

Indirectly accounts for interdependence of residues

Range of sequences, range of replacements

PAMBased on extrapolation from a short evolutionary period -- Errors in PAM1 are magnified through PAM250

Assumes Markov process – too much extrapolation

Rare replacements too infrequent to be represented accurately

Others

Incorporation of secondary structure/structural alignments

Use of structural alignments

Transmemberane protein-specific matrices

Page 32: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

How do we use the matrices for sequence alignment?

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

DefinitionGiven two strings x = x1x2...xM, y = y1y2…yN,

An alignment of two sequences x and y is an arrangement of x and y by position, where a and b can be padded with gap symbols to achieve the same length.

AGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGAC

Page 33: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Why do we bother aligning?

Finding important molecular regions that are conserved across speciesGiven a new sequence, infer its function based on similarity to another Seq.Determining the evolutionary constraints at workMutations in a population or a family of genesFind similar looking sequences in a databaseInferring the secondary or tertiary structure of a sequence of interest -Molecular modeling using a template (homology modeling)

Page 34: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

X: AGCACAC-AY: A-CACACTA

X: ACACACTAY: AGCACACA

Alignment is a path in the alignment matrix

! 2 Alignments ! !

possible!

m nm nm n

++≈

Page 35: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Dynamic programming for global alignment – simple case

Consider two sequences: x[1 .. M], and y[1 .. N]for a new position to be aligned we have ONLY three choices

1. xi aligns to yjx1……xi-1 xiy1……yj-1 yj

2. xi aligns to a gapx1……xi-1 xiy1……yj -

3. yj aligns to a gapx1……xi -y1……yj-1 yj

Align x[1 .. i] with y[1 .. j]

x[1 ..(i-1)] is already aligned with y[1 ..(j)], so align x[i] with a gap in y

x[1 .. i] is already aligned with y[1 .. (j-1)], so align a gap in x to y[j]

Page 36: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

1. xi aligns to yjx1……xi-1 xiy1……yj-1 yj

2. xi aligns to a gapx1……xi-1 xiy1……yj -

3. yj aligns to a gapx1……xi -y1……yj-1 yj

F (i,j) = F (i-1, j-1) + s(i,j)

F (i,j) = F (i-1, j) – gap_open_penalty (go)

F (i,j) = F (i, j-1) – gap_open_penalty (go)

Page 37: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

If we could make F (i, j-1), F (i-1, j), F (i-1, j-1) optimal, then we can make the next ones optimal as well

F (i-1, j-1) + s (xi, yj)F (i, j) = max F (i-1, j) – go

F (i, j-1) – go

Where s (xi, yj) = Score for a match, if xi = yj;score for a mismatch, if xi ≠ yj;

Page 38: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

1. Initialization.a. F(0, 0) = 0b. F(0, j) = - j × goc. F(i, 0) = - i × go

2. Main Iteration. Filling-in partial alignmentsFor each i = 1……M

For each j = 1……NF(i-1,j-1) + s(xi, yj) [case 1]

F(i, j) = max F(i-1, j) – go [case 2]F(i, j-1) – go [case 3]

DIAG, if [case 1]Ptr(i,j) = LEFT, if [case 2]

UP, if [case 3]

3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignment

The Needleman-Wunsch Algorithm– pioneering application of DP to biological sequences

Page 39: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

T C G C A

T

C

C

A

0

-2

-4 -1 2 0 -2 -4

-6 -3 0 1 1 -1

-8 -5 -2 -1 0 2

-2 -4 -6 -8 -10

1 -1 -3 -5 -7

Trace back arrows from lower right to upper left and Maximize the scores

T C G C AT C - C A1 1 -2 1 1

Alignment score = 2

F (i-1, j-1) + s (xi, yj)F (i, j) = max F (i-1, j) – go

F (i, j-1) – go

s (xi, yj)=1 for match,-1 for mismatch

go=2

An example

Page 40: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Current model:Gap of length nincurs penalty n.go

Bunch of gaps are better than individual gapsConvex (saturating) gap penalty function:γ(n):for all n, γ(n + 1) - γ(n) ≤ γ(n) - γ(n – 1)

A common function is Affine gap penalty

γ(n) = go + (n – 1)×ge| |

gap gapopen extend

Are gaps that bad?

γ(n)

γ(n)

Page 41: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Convex gap dynamic programming

Initialization: same as before

Iteration:F(i-1, j-1) + s(xi, yj)

F(i, j) = max maxk=0…i-1F(k,j) – γ(i-k) maxk=0…j-1F(i,k) – γ(j-k)

Termination: same

Running Time: O(N2M) (assume N>M)Space: O(NM)

Page 42: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Fast implementation of affine gap penalty

We need three matrices for tracking scores

Matrix-1: a[i,j]= to store maximum score of an alignment thatends in x[i] matched to y[j]

Matrix-2: b[i,j]= to store maximum score of an alignment thatends in gap matched to y[j]

Matrix-3: c[i,j]= to store maximum score of an alignment thatends in gap matched to x[i]

…xi…yj

…-…yj

….xi… -

Page 43: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Implementation – Cont’d

[ 1, 1][ , ] max [ 1, 1] ( , )

[ 1, 1]

a i ja i j b i j s i j

c i j

⎧⎪⎨⎪⎩

− −= − − +

− −

[ , 1][ , ] max [ , 1]

[ , 1]

a i j gob i j b i j ge

c i j go

⎧⎪⎨⎪⎩

− += − +

− +

[ 1, ][ , ] max [ 1, ]

[ 1, ]

a i j goc i j b i j go

c i j ge

⎧⎪⎨⎪⎩

− += − +

− +

Pointer-matrices: Three matrices to figure out which state within each score matrix maximization was used to obtain the optimal alignment of position i,j. Of course you need to know which matrix yielded the best score in the end (m,n) as well –Covered by assignment!

Page 44: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

http://www.ebi.ac.uk/Tools/emboss/align/index.html

http://www.ch.embnet.org/software/LALIGN_form.html

For checking results of your code

Page 45: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Pair HMMs for alignment

HMM for sequence alignment, which incorporates affine gap scores.

“Hidden” StatesMatch (M)Insertion in x (X)insertion in y (Y)

Observation SymbolsMatch (M): {(a,b)| a,b in ∑ }.Insertion in x (X): {(a,-)| a in ∑ }.Insertion in y (Y): {(-,a)| a in ∑ }.

Page 46: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Simple representation

Xqxi

Mpxiyj

Yqyj

ε

ε1-ε

1-ε

δ

δ

1-δ

ε01- ε

0ε1- ε

δδ1-2δM X

X

M

Y

Y

Page 47: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Xqxi

Mpxiyj

Yqyj

ε

ε

1-ε-τ

δ

δ1-2δ-τ1-ε-τ

δ

δ

τ

τ

τ

τ

Begin End

1-2δ-τ

Full representation

Page 48: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Initialization:• vM(0, 0) = 1. vX(0, 0) = vY(0, 0) = 0 v*(-1, j) = v*(i, -1) = 0.

Recurrence: i = 0,…,n, j = 0,…,m, except for(0,0);

Termination:

⎩⎨⎧

−−

=

⎩⎨⎧

−−

=

⎪⎩

⎪⎨

−−−−−−−−−−−−

=

)1,()1,(

max),(

),1(),1(

max),(

)1,1()1()1,1()1()1,1()21(

max),(

X

MY

X

MX

Y

X

M

M

jivjiv

qjiv

jivjiv

qjiv

jivjivjiv

pjiv

j

i

ji

y

x

yx

εδ

εδ

τετετδ

)),(),,(),,(max( YXME mnvmnvmnvv τ=

Algorithm: Viterbi algorithm for pair HMMs

Page 49: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Viterbi Pair HMM is equivalent to global dynamic programming with affine gap!

Proof (sketch)Add the following missing details to DEKM’s summaryTake the viterbi matrices and divide by the terms for random sequence probability, and take the log of resulting transformation i.e,

M 2

M X 2

Y 2

MX

X

(1 2 ) ( 1, 1). (1 )( , ) max (1 ) ( 1, 1). (1 )

(1 ) ( 1, 1). (1 )

( 1, ). (1 )( , ) max

( 1, ). (1 )i

xiyj xi yi

xiyj xi yi

xiyj xi yi

xi xix

xi xi

v i j p q qv i j v i j p q q

v i j p q q

v i j q qv i j q

v i j q q

δ τ ηε τ ηε τ η

δ ηε η

⎧ − − − − −⎪= − − − − −⎨⎪ − − − − −⎩

⎧ − −= ⎨

− −⎩

Page 50: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Multiple Sequence Alignment (MSA)

How to score? and How to align? Scoring, common approach: Sum of pairs or its variants for each column

Consider L aligned at all N positions, LL score =5total score, ( ) = 5 ( -1) 2Now consider one position being G, rest being L GL score=-4

( ) 5( 1) 4( 1) 9( 1) 9( ) ( ) 2.5 ( 1) 2.5

Relative

S i N N

S i N N NS i S i N N N

×

Δ − − + − − − − −= = =

− evidence decreases with N, opposed to having increased

confidence in L being the "correct" residue at that position

all possible pairs

( ) ( , )i iS i s x y= ∑

Page 51: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Clustal W

Pairwise alignment: calculation of distance matrix

Rooted nJ tree (guide tree) and calculation of sequence weights

Progressive alignment following the guide tree

Page 52: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Step 1-Calculation of Distance Matrix using pairwise alignment

Use the Distance Matrix to create a Guide Tree todetermine the “order” of the sequences.

I =D = 1 – (I) D = Difference score

# of identical aa’s in pairwise global alignmenttotal number of aa’s in shortest sequence

Hbb-Hu 1 -

Hbb-Ho 2 .17 -

Hba-Hu 3 .59 .60 -

Hba-Ho 4 .59 .59 .13 -

Myg-Ph 5 .77 .77 .75 .75 -

Gib-Pe 6 .81 .82 .73 .74 .80 -

Lgb-Lu 7 .87 .86 .86 .88 .93 .90 -

1 2 3 4 5 6 7

Page 53: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Step 2-Create Rooted Tree and calculate weights

Weight

AlignmentOrder of alignment:1 Hba-Hu vs Hba-Ho2 Hbb-Hu vs Hbb-Ho3 A vs B4 Myg-Ph vs C5 Gib-Pe vs D6 Lgh-Lu vs E

Neighbor joining algorithm – simple, to be discussed later

Page 54: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Step 3-Progressive alignment

Page 55: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Step 3-Progressive alignmentScoring during progressive alignment

Page 56: How do we know what is the right distance without a “good ...epxing/Class/10810/lecture/lecture11.pdf · F(i-1,j-1) + s(x. i, y. j) [case 1] F(i, j) = max F(i-1, j) – go [case

Recommended MSA Programs

MUSCLE (fast and accurate)MAVID (genome-scale alignment)SAM ( hidden markov, powerful and wide range of options)