35
Pairwise Sequence Alignment

PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

PairwiseSequence Alignment

Page 2: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Outline

�Introduction

�Global Alignment

�The Basic Algorithm

�Local Alignment

�SemiglobalAlignment

Page 3: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Introduction

�Sequence similarity is an indicator of

homology

�There are other uses for sequence

�There are other uses for sequence

similarity

�Database queries

�Comparative genomics

�…

Page 4: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Introduction

�Example:

GACGGATTAG

GATCGGAATAG

GA-CGGATTAG

GATCGGAATAG

GA-CGGA-TTAG

9*1 + 3*(-2) = 3

9*1 -2 -1= 6

�Scoring

�Match +1

�mismatch -1

�Gap penalty -2

GA-CGGA-TTAG

GATCGGAA-TAG

Page 5: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Introduction

�The sequences may have different sizes.

�We define an alignment as the insertion of

spacesin arbitrary locations along the

spacesin arbitrary locations along the

sequences so that they end up with the

same size.

�In general , there may be many alignments

with maximum score.

Page 6: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Different alignment

�Global Alignment

�Local Alignment

�Local Alignment

�Semiglobal Alignment

Page 7: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

�Using Dynamic Programming

�Reuses the results of previous computations

�Example (two sequence x , y):

�Example (two sequence x , y):

x:AGC

y:AAAC

�Scoring function:

�Match +1

�mismatch -1

�Gap penalty -2

Page 8: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

-A

AA

C

yj

Step 1 : forming a matrix F(i,j)

-0

-2-4

-6-8

A-2

G-4

C-6

xi

Page 9: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

F(i-1,j-1)

F(i-1,j)

Move ahead in both

xialigned to gap

F(i,j-1)

F(i,j)

match(xi,yj)

gap penalty

gap penalty

yjaligned to

gap

While building the table, keep track of where

optimal score came from, reverse arrows

Page 10: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

F(i-1,j-1) + match(x

i,yj)

�F( i , j ) =max of F(i-1,j) + gap penalty

F(i,j-1) + gap penalty

F(i,j-1) + gap penalty

Page 11: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

�Seq1:AAAC

�Seq2:AGC

-A

AA

C

-0

-2-4

-6-8

A

-0

-2-4

-6-8

A-2

G-4

C-6

A -

A-

-A

- A

-A

A -

A A

Page 12: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

-A

AA

C

yj

F(1,1) :

↘: 0 + 1 = 1

↓ : -2 + (-2) = -4

→ : -2 + (-2) = -4

-0

-2-4

-6-8

A-2

G-4

C-6

xi

→ : -2 + (-2) = -4

1

Page 13: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

-A

AA

C

yj

F(1,2) :

↘: -2 + 1 = -1

↓ : -4 + (-2) = -6

→ : 1 + (-2) = -1

-0

-2-4

-6-8

A-2

1

G-4

C-6

xi

→ : 1 + (-2) = -1

-1

Page 14: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

-A

AA

C

yj

-0

-2-4

-6-8

A-2

1-1

-3-5

G-4

-10

-2-4

C-6

-3-2

-1-1

xi

Page 15: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

-A

AA

C

yj

Step 2 : trace back

Aligned Sequences :

X :

CG

A-

-0

-2-4

-6-8

A-2

1-1

-3-5

G-4

-10

-2-4

C-6

-3-2

-1-1

xi

X :

Y :

C C

G A

A AA-

Page 16: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

-A

AA

C

yj

Step 2 : trace back

Aligned Sequences :

X :

C-

GA

-0

-2-4

-6-8

A-2

1-1

-3-5

G-4

-10

-2-4

C-6

-3-2

-1-1

xi

X :

Y :

C C

- A

G AAA

Page 17: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

�Arrow preference by

�For instance, when aligning x=ATAT with

y=TATA , we get

–A T A Trather than

A T A T –

T A T A –

–T A T A

Page 18: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Global Alignment

�Summary

�Uses recursion to fill in intermediate

results table

results table

�Uses O(nm) space and time

�O(n2) algorithm

�Feasible for moderate sized sequences, but

not for aligning whole genomes.

Page 19: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

�A local alignment between xand y is an

alignment between a substring of xand a

substring of y.

00 F(i-1,j-1) + match(x

i,yj)

�F( i , j ) =max of F(i-1,j) + gap penalty

F(i,j-1) + gap penalty

Page 20: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

G0

G0

A0

C0

G0

T0

C0

Page 21: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

G0

01

00

20

01

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 22: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

G0

01

00

20

01

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 23: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

G0

01

00

20

01

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 24: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

G0

01

00

20

01

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 25: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

G0

01

00

20

01

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 26: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

Local alignment :

G A A G T

G A C G T

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 27: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Local Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

A0

00

11

00

00

G0

01

00

20

01

A0

00

21

01

00

C0

10

01

00

00

G0

02

00

20

01

T0

00

10

03

10

C0

10

00

01

20

Page 28: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

�we score alignments ignoring some of the

end gapsin the sequences.

�Example:

CAGCA-CTTGGATTCTCGG

CAGCA-CTTGGATTCTCGG

---CAGCCTGG--------

Observe that this is not the best alignment.

Below we present another alignment with a

higher score(-12 against -19).

CAGCACTTGGATTCTCGG

CAGC–––––G–T-–––GG

Page 29: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

G-2

G-2

A-4

C-6

G-8

Page 30: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

G-2

-1

1-1

-1

1-1

-1

1G

-2

-1

1-1

-1

1-1

-1

1

A-4

-3

-1

20

-1

0-2

-1

C-6

-3

-3

01

-1

-2

-1

-3

G-8

-5

-2

-2

-1

2-2

-3

0

Page 31: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

G-2

-1

1-1

-1

1-1

-1

1G

-2

-1

1-1

-1

1-1

-1

1

A-4

-3

-1

20

-1

0-2

-1

C-6

-3

-3

01

-1

-2

-1

-3

G-8

-5

-2

-2

-1

2-2

-3

0

Page 32: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

G-2

-1

1-1

-1

1-1

-1

1G

-2

-1

1-1

-1

1-1

-1

1

A-4

-3

-1

20

-1

0-2

-1

C-6

-3

-3

01

-1

-2

-1

-3

G-8

-5

-2

-2

-1

2-2

-3

0

Page 33: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

G-2

-1

1-1

-1

1-1

-1

1G

-2

-1

1-1

-1

1-1

-1

1

A-4

-3

-1

20

-1

0-2

-1

C-6

-3

-3

01

-1

-2

-1

-3

G-8

-5

-2

-2

-1

2-2

-3

0

Page 34: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Semiglobal Alignment

-C

GA

AG

TT

G

-0

00

00

00

00

G-2

-1

1-1

-1

1-1

-1

1G

-2

-1

1-1

-1

1-1

-1

1

A-4

-3

-1

20

-1

0-2

-1

C-6

-3

-3

01

-1

-2

-1

-3

G-8

-5

-2

-2

-1

2-2

-3

0

CGAAGTTG

-GACG---

Page 35: PairwiseSequence Alignmentkowon.dongseo.ac.kr/~dkkang/ALGO2009Spring/7-1 Sequence Align… · Introduction Sequence similarity is an indicator of ... CGGA T TAG GA T CGGA A TAG GA-CGGA-T

Summary

�Global Alignment

�Local Alignment

�Local Alignment

�Semiglobal Alignment