1-month Practical Course Genome Analysis 2008
Lecture 3: Profiles: representing sequence alignment
Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands
ibivu.nl [email protected]



Alignment input parameters: scoring alignments

Amino acid exchange matrix (20 × 20)

Gap penalties (open, extension), e.g. 10 and 1

A number of different schemes have been developed to compile residue exchange matrices.

However, there are no formal concepts for calculating the corresponding gap penalties.

Empirically determined values are recommended for PAM250, BLOSUM62, etc.
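To make the two input parameters concrete, here is a minimal Python sketch that scores an existing pairwise alignment with an exchange matrix and affine gap penalties. It assumes the convention that the first position of a gap costs `gap_open` and each further position in the same gap costs `gap_ext`; the defaults 10 and 1 are example settings only, and `SUB` holds just a few BLOSUM62 entries so the sketch stays short (a real scorer would load the full 20 × 20 matrix).

```python
# A few BLOSUM62 entries (enough for the examples below); a real scorer
# would load the full 20 x 20 exchange matrix.
SUB = {
    ("A", "A"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("L", "V"): 1, ("A", "V"): 0, ("A", "L"): -1,
}


def s(x, y):
    """Symmetric lookup of the exchange score for residues x and y."""
    return SUB[(x, y)] if (x, y) in SUB else SUB[(y, x)]


def score_alignment(a, b, gap_open=10, gap_ext=1):
    """Score two aligned (equal-length, '-'-gapped) sequences.

    Assumed affine convention: the first gap position costs gap_open,
    every following position in the same gap costs gap_ext.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in_a = gap_in_b = False  # currently inside a gap in a / in b?
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap-only column: ignored
        if x == "-":
            score -= gap_ext if gap_in_a else gap_open
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score -= gap_ext if gap_in_b else gap_open
            gap_in_a, gap_in_b = False, True
        else:
            score += s(x, y)
            gap_in_a = gap_in_b = False
    return score


print(score_alignment("AL--V", "AVLAV"))  # one gap of length 2
print(score_alignment("A-L-V", "AVLAV"))  # two gaps of length 1
```

With these settings one gap of length 2 costs 11, while two separate gaps of length 1 cost 20, which is why affine penalties favour fewer, longer gaps.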


But how can we align blocks of sequences?

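[Figure: aligned blocks AB and CD, the combined block ABCD, and a further sequence E; how should they be aligned?]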

The dynamic programming algorithm performs well for pairwise alignment (two axes).

So we should try to treat the blocks as a “single” sequence …
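The idea can be sketched in Python: a standard global (Needleman-Wunsch) dynamic-programming recurrence only needs a score for matching two positions and a gap penalty, so it works unchanged whether a "position" is a single residue or a whole alignment column. The linear gap penalty, the toy +1/-1 residue score `toy` and the pair-averaging `column_score` are assumptions chosen for illustration, not the lecture's scoring scheme.

```python
def global_align_score(x, y, match, gap):
    """Needleman-Wunsch alignment score with a linear gap penalty.

    x, y  : sequences of positions (residues, or alignment columns)
    match : function scoring one position of x against one position of y
    gap   : penalty subtracted for every position aligned to a gap
    """
    n, m = len(x), len(y)
    # dp[i][j] = best score for aligning x[:i] with y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + match(x[i - 1], y[j - 1]),  # align the two positions
                dp[i - 1][j] - gap,                              # x position against a gap
                dp[i][j - 1] - gap,                              # y position against a gap
            )
    return dp[n][m]


def toy(a, b):
    """Toy residue score: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


def column_score(col1, col2):
    """Average toy score over all residue pairs drawn from two columns."""
    pairs = [(a, b) for a in col1 for b in col2]
    return sum(toy(a, b) for a, b in pairs) / len(pairs)


# Two single sequences ...
print(global_align_score("ACGT", "AGT", toy, gap=1))
# ... and two blocks of two aligned sequences, each turned into a list of columns.
block_ab = list(zip("ACGT", "ACGA"))
block_cd = list(zip("AGT", "AGA"))
print(global_align_score(block_ab, block_cd, column_score, gap=1))
```

Only the `match` function changes between the two calls; the two-axis recurrence is identical, which is what "treating a block as a single sequence" buys.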


How to represent a block of sequences

Historically: the consensus sequence, a single sequence that best represents the amino acids observed at each alignment position.

Modern methods: the alignment profile, a representation that retains information about the frequencies of the amino acids observed at each alignment position.


Consensus sequence

Problem: loss of information

For larger blocks of sequences, it “punishes” the more distant members.

Sequence 1  F A T N M G T S D P P T H T R L R K L V S Q
Sequence 2  F V T N M N N S D G P T H T K L R K L V S T
Consensus   F * T N M * * S D * P T H T * L R K L V S *
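The consensus in this example keeps a residue only where both sequences agree and writes `*` everywhere else. A minimal Python sketch of that rule (the function name `consensus` is hypothetical; other consensus rules, such as a majority vote, are also in use):

```python
def consensus(seqs, mismatch="*"):
    """Keep a residue where all aligned sequences agree, else write `mismatch`."""
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    return "".join(col[0] if len(set(col)) == 1 else mismatch
                   for col in zip(*seqs))


seq1 = "FATNMGTSDPPTHTRLRKLVSQ"
seq2 = "FVTNMNNSDGPTHTKLRKLVST"
print(consensus([seq1, seq2]))  # F*TNM**SD*PTHT*LRKLVS*
```

The output no longer records which residues were seen at the `*` positions (A or V, G or N, and so on), which is exactly the information a profile keeps.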


Alignment profiles

Advantage: full representation of the sequence alignment (more information is retained).

Profiles are used not only in alignment methods but also in sequence database searching (to detect distant homologues).

In BLAST, a profile is also called a PSSM (position-specific scoring matrix).


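Multiple alignment profiles

[Figure: a profile drawn with one row per amino acid (A, C, D, …, W, Y) plus a gap row (-), and one column per alignment position i. Each column stores the residue frequencies fA, fC, fD, …, fW, fY and its own gap-opening and gap-extension penalties (Gapo, gapx). The profile contains core regions and a gapped region, so the gap penalties are position dependent.]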
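Position-dependent gap penalties can be sketched as two arrays indexed by profile position. In this Python sketch the penalty values, the six-position profile and the convention that a gap pays the opening penalty of its first position plus the extension penalty of each further position are all assumptions for illustration; the slide gives no numbers.

```python
def gap_cost(gap_open, gap_ext, start, length):
    """Cost of a gap covering profile positions start .. start+length-1.

    Assumed convention: the first gapped position pays its own opening
    penalty; every further position pays its own extension penalty.
    """
    if length <= 0:
        return 0.0
    if start < 0 or start + length > len(gap_open):
        raise ValueError("gap runs outside the profile")
    cost = gap_open[start]
    for pos in range(start + 1, start + length):
        cost += gap_ext[pos]
    return cost


# Hypothetical 6-position profile: core (0-1), gapped region (2-3), core (4-5).
gap_open = [10.0, 10.0, 2.0, 2.0, 10.0, 10.0]
gap_ext = [1.0, 1.0, 0.2, 0.2, 1.0, 1.0]

print(gap_cost(gap_open, gap_ext, 2, 2))  # gap inside the gapped region
print(gap_cost(gap_open, gap_ext, 0, 2))  # same length, inside a core region
```

Lower penalties in the gapped region make the aligner prefer to place insertions and deletions where the aligned family already shows them, rather than in conserved core regions.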


Profile building

Example: each amino acid (aa) is represented as a frequency, and gap penalties as weights.

Position  A    C    D    W    Y    Gap penalty weight
i         0.3  0.1  0    0.3  0.3  0.5
i+1       0.5  0    0    0    0.5  1.0
i+2       0    0.5  0.2  0.1  0.2  1.0

The gap penalties are position dependent.
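A profile like this can be derived from an alignment by counting residues per column. A minimal Python sketch, assuming gaps are written as `-` and excluded from the residue frequencies. The slide does not say how its gap weights are obtained, so the sketch only records the fraction of sequences with a gap at each position, as one possible input for a position-dependent gap penalty. The first column (A, A, V, V, L) reproduces the frequencies used in the sequence-to-profile example.

```python
from collections import Counter


def column_frequencies(column, gap="-"):
    """Relative frequency of each residue in one alignment column (gaps excluded)."""
    residues = [c for c in column if c != gap]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in sorted(counts.items())}


def build_profile(alignment, gap="-"):
    """Per-column residue frequencies plus the fraction of sequences with a gap."""
    profile = []
    for column in zip(*alignment):
        profile.append({
            "freqs": column_frequencies(column, gap),
            "gap_fraction": column.count(gap) / len(column),
        })
    return profile


alignment = ["AW", "A-", "VW", "VY", "L-"]  # five aligned sequences, two columns
for position, entry in enumerate(build_profile(alignment)):
    print(position, entry)
```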


Profile-sequence alignment

[Figure: a profile (rows A, C, D, …, V, W, Y) aligned against a sequence]


Sequence-to-profile alignment

Example profile position (alignment column) with the residues A, A, V, V, L, giving the frequencies 0.4 A, 0.2 L, 0.4 V.

Score of amino acid L in a sequence that is aligned against this profile position:

Score = 0.4 * s(L, A) + 0.2 * s(L, L) + 0.4 * s(L, V)
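The same calculation as a minimal Python sketch. `SUB` holds only the three BLOSUM62 entries the example needs (an assumption; any exchange matrix could be plugged in), and `residue_vs_column` is a hypothetical name.

```python
# BLOSUM62 entries for the pairs in this example (assumed excerpt).
SUB = {("L", "A"): -1, ("L", "L"): 4, ("L", "V"): 1}


def s(x, y):
    """Symmetric exchange-score lookup."""
    return SUB[(x, y)] if (x, y) in SUB else SUB[(y, x)]


def residue_vs_column(residue, freqs):
    """Frequency-weighted exchange score of one residue against one profile column."""
    return sum(f * s(residue, aa) for aa, f in freqs.items())


column = {"A": 0.4, "L": 0.2, "V": 0.4}  # the column A, A, V, V, L
print(residue_vs_column("L", column))
```

With BLOSUM62 this gives 0.4 × (-1) + 0.2 × 4 + 0.4 × 1 ≈ 0.8.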


Profile-profile alignment

[Figure: a profile (rows A, C, D, …, Y) aligned against a second profile (rows A, C, D, …, V, W, Y)]


General function for profile-profile scoring

At each position (column) we have a different frequency for each amino acid (rows).

Instead of the pairwise-alignment score S = s(aa1, aa2), we compare two profile positions with:

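$$S = \sum_{i=1}^{20} \sum_{j=1}^{20} f_1(aa_i)\, f_2(aa_j)\, s(aa_i, aa_j)$$

where $f_1(aa_i)$ is the frequency of amino acid $aa_i$ at the position in Profile 1, $f_2(aa_j)$ is the frequency of $aa_j$ at the position in Profile 2, and $s(aa_i, aa_j)$ is the exchange-matrix score.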
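The double sum can be written as a short Python function over two frequency dictionaries; amino acids with frequency 0 simply drop out of the sum. As an assumption for illustration, `SUB` holds only the BLOSUM62 entries needed for the lecture's two-column example and for a single residue L. Note that a sequence residue is just a column with frequency 1.0, so sequence-to-profile scoring is a special case of this function.

```python
from itertools import product

# BLOSUM62 entries for the residue pairs used below (assumed excerpt).
SUB = {
    ("A", "G"): 0, ("L", "G"): -4, ("V", "G"): -3,
    ("A", "S"): 1, ("L", "S"): -2, ("V", "S"): -2,
    ("L", "A"): -1, ("L", "L"): 4, ("L", "V"): 1,
}


def s(x, y):
    """Symmetric exchange-score lookup."""
    return SUB[(x, y)] if (x, y) in SUB else SUB[(y, x)]


def column_vs_column(f1, f2):
    """Sum over all residue pairs (a, b) of f1(a) * f2(b) * s(a, b)."""
    return sum(p * q * s(a, b)
               for (a, p), (b, q) in product(f1.items(), f2.items()))


col1 = {"A": 0.4, "L": 0.2, "V": 0.4}
col2 = {"G": 0.75, "S": 0.25}
print(column_vs_column(col1, col2))        # the two-column example
print(column_vs_column(col1, {"L": 1.0}))  # a single residue L as a column
```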


Profile-to-profile alignment

Example: one profile position has the frequencies 0.4 A, 0.2 L, 0.4 V; the other has 0.75 G, 0.25 S.

Match score of these two alignment columns, using the amino acid frequencies at the corresponding profile positions:

Score = 0.4*0.75*s(A,G) + 0.2*0.75*s(L,G) + 0.4*0.75*s(V,G)
      + 0.4*0.25*s(A,S) + 0.2*0.25*s(L,S) + 0.4*0.25*s(V,S)

s(x,y) is the value in the amino acid exchange matrix (e.g. PAM250, BLOSUM62) for the amino acid pair (x,y).
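As a worked example, assume BLOSUM62 as the exchange matrix, with s(A,G) = 0, s(L,G) = -4, s(V,G) = -3, s(A,S) = 1, s(L,S) = -2 and s(V,S) = -2. Plugging these into the score term by term:

```latex
\begin{aligned}
S &= 0.4 \cdot 0.75 \cdot 0 + 0.2 \cdot 0.75 \cdot (-4) + 0.4 \cdot 0.75 \cdot (-3) \\
  &\quad + 0.4 \cdot 0.25 \cdot 1 + 0.2 \cdot 0.25 \cdot (-2) + 0.4 \cdot 0.25 \cdot (-2) \\
  &= 0 - 0.6 - 0.9 + 0.1 - 0.1 - 0.2 = -1.7
\end{aligned}
```

The negative value indicates that, under BLOSUM62, these two columns match poorly.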