31
Benchmarking and Extension of Protein3DFit Nasir Mahmood

Protein Structure Alignment

Embed Size (px)

DESCRIPTION

MS bioinformatics - Project talk held on 01.04.2006

Citation preview

Page 1: Protein Structure Alignment

Benchmarking and Extension of Protein3DFit

Nasir Mahmood

Page 2: Protein Structure Alignment

Benchmarking and Extension of Protein3DFit

o Introductiono Why Align Structures?o Structure Alignment: Issues & problem

o Methodo Algorithmo Pitfallso Extension

o Resultso Outlook

Introduction Method Results Outlook

Page 3: Protein Structure Alignment

Why Align Structures?

Fundamental step in:• Analysis & Visualization• Comparison• Design

• Useful for:• Structure classification

• To predict protein folds.• Gold standard for sequence alignment.

• Structure prediction: • Identifying “structural core”.

• Function prediction• Function of protein is determined by its structure.

• Drug discovery• To pinpoint active sites accurately.

Introduction Method Results Outlook

Page 4: Protein Structure Alignment

Structure Alignment: Issues

Theoretical Issues•NP-hard geometric problem•Heuristics needed•No unique solution

Methodological Issues Choices:

• Structure Representation• Scoring function• Search algorithm

Goals Desirable•Automatic•Discriminating•Fast

Introduction Method Results Outlook

Page 5: Protein Structure Alignment

T

Simple case – two closely related proteins with the same number of amino acids.

Structure alignment Problem

Find a transformationto achieve the best superposition

A B

Introduction Method Results Outlook

Page 6: Protein Structure Alignment

Structure Alignment Problem

Two sets of Points in 3D space: A = (a1, a2, …, an) B = (b1, b2, …, bm)

Optimal rigid body transformation(calculated by Diamond (1998) method)

Distance

{ d( A(P) - T(B(Q)) ) }minT

A(P), B(Q): Optimal subsets with |A(P)|=|B(Q)|,

Introduction Method Results Outlook

Page 7: Protein Structure Alignment

Two Subproblems

1. Find correspondence set2. Find optimal transformation

(protein superposition problem)

Introduction Method Results Outlook

Page 8: Protein Structure Alignment

Correspondence set

•Correspondence is a key problem.

•Exhaustive search intractable.

•Heuristics are applied.

Introduction Method Results Outlook

Page 9: Protein Structure Alignment

Superposition

Two sets of points A= {a1,…,an} and B = {b1,…,bn}

Correspondence pairs (ai, bi) Find T

min { RMSD(A,T(B)) } T

Introduction Method Results Outlook

Page 10: Protein Structure Alignment

Aim of the project

To make algorithm more efficient debuggingoptimizationextension imporve quality of alignmentsorganization of code

Introduction Method Results Outlook

Page 11: Protein Structure Alignment

Benchmarking and Extension of Protein3DFit

o Introductiono Why Align Structures?o Structure Alignment: Issues & problem

o Methodo Algorithmo Pitfallso Extension

o Resultso Outlook

Introduction Method Results Outlook

Page 12: Protein Structure Alignment

Structure

s

Intra-Molecular distanceSeed Alignments

(Fragments & Quartets)

Next

Best FragQuartet 1Quartet 2

..............

..............

Alignment

Dynamic Programming

i<NNo

Stop

Superimpose

Cα Match?increased

YesNo

Sup. Archive

Yes

Optimization

Best Supp.

Display Best Alignment

Introduction Method Results Outlook

Algorithm

Page 13: Protein Structure Alignment

Intra-molecular Distance Matrices

1 2 3 4 5.....N

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

0.0

Structure A Structure B

1.4

3.0

0.7

1.9

0.0

4.7

0.0

2.9

0.0

0.0

2.1

0.3

0.0

3.1

0.6

2.9

0.0

0.0

0.0

0.4

0.0

5.9

0.0

2.3

1.4

3.0

0.7

1.9

0.0

4.7

0.0

2.9

2.1

0.3

0.0

3.1

0.6

2.9

0.0

0.0

0.4

0.0

5.9

0.0

2.3

0.4

0.0

5.9

0.0

2.3

3.1

0.6

2.9

0.0

0.6

2.9

0.0

1 2 3 4 5 ........................... N0.0

0.0

0.0

1.4

3.0

0.0

1.9

0.0

4.7

0.0

2.9

0.0

0.0

2.1

0.3

0.0

3.1

0.6

2.9

0.0

0.0

0.0

0.4

0.0

5.9

0.0

2.3

1.4 1.4

1.4

1 2 3 4 5 ............................ M1

2 3 4 5.....M

i i+1

i+2

i+3

i+4i+5

i+6i+7

i i+1

i+2

i+3

i+4i+5i+6

i+7

Introduction Method Results Outlook

Page 14: Protein Structure Alignment

Seed Alignments

Extend FragmentMake Fragment

Quartet

Introduction Method Results Outlook

Page 15: Protein Structure Alignment

Structure

s

Intra-Molecular distanceSeed Alignments

(Fragments & Quartets)

Next

Best FragQuartet 1Quartet 2

..............

..............

Alignment

Dynamic Programming

i<NNo

Stop

Superimpose

Cα Match?increased

YesNo

Sup. Archive

Yes

Optimization

Best Supp.

Display Best Alignment

Introduction Method Results Outlook

Algorithm

Page 16: Protein Structure Alignment

Algorithm- Pitfalls

•Simple Similarity measure 1: Match 0: No match

Introduction Method Results Outlook

Page 17: Protein Structure Alignment

•Problems in aligning two structures low similarity.•Alignments sometimes fragmented.•Single matching residues not in sequential order (letters in the alignment).•No optimization after dynamic programming.

Introduction Method Results Outlook

Algorithm- Pitfalls

Page 18: Protein Structure Alignment

).(... o

A

AdsmrN

).(... o

A

AdsmrN

).(... o

A

AdsmrN

No Chain 1

Chain 2

Protein3DFit

VAST DALI CE

1 1FXI:A 1UBQ:_ 39/1.0 48/2.1 _ _

2 1TEN:_ 3HHR:B 71/1.0 78/1.6 86/1.9 87/1.9

3 3HLA:B 2RHE:_ 42/1.1 _ 63/2.5 85/3.5

4 2AZA:A 1PAZ:_ 52/1.2 74/2.2 _ 85/2.9

5 1CEW:I 1MOL:A 54/1.2 71/1.9 81/2.3 69/1.9

6 1CID:_ 2RHE:_ 59/1.1 85/2.2 95/3.3 94/2.7

7 1CRL:_ 1EDE:_ 100/1.2 211/3.4 187/3.2

8 2SIM:_ 1NSB:A 129/1.2 284/3.8 286/3.8 264/3.0

9 1BGE:B

2GMF:A 52/1.1 74/2.5 98/3.5 94/4.1

10 1TIE:_ 4FGF:_ 70/1.1 82/1.7 108/2.0 116/2.9

).(... o

A

AdsmrN

Reference: 10 ‘difficult‘ structures from (Fischer et al.) obtained by Dali (Holm & Sander, 1993) , VAST (Madej et al., 1995) & CE (Shindyalov & Bourne, 1998)

Algorithm- Pitfalls

Page 19: Protein Structure Alignment

Introduction Method Results Outlook

Algorithm - Extension

Page 20: Protein Structure Alignment

Superimpose

Cα Matchincreased

Yes

Structures

Intra-Mol distance basedSeed Alignments

(Fragments & Quartets)

Next

No

Best FragQuartet 1Quartet 2

..............

..............

Alignment

Dynamic Programming

Op

tim

izati

on

Superimpose

CαMatches ?Increase

YesNo

Improved?

i<N

Sup. Archive

Yes

No

i<MYes

Yes

No

OptimizationStop

Get Supp.

Display Best Alignment

Next

No

Page 21: Protein Structure Alignment

Similarity Score

2

0

ijij

)d

d(1

MS

ijS

ijd

od

:similarity score used in DP

:distance

:distance resulting in half-maximal similarity

M :maximum score for a match

Source: Gerstein Levitt, 1998

Introduction Method Results Outlook

Page 22: Protein Structure Alignment

Quality Function

alignN

21 N,N

)N)N)(RMSD/R((1

NQ

212

0

2align

Q score, ratio of no. of aligned residues & RMSD

0R

:number of matches (equivalent residues)

:length of structure 1 & 2 respectively

:empirical parameter measuringRelative significance of & RMSDalignN

21 N,N

Source: Krissinel & Henrich (2004)

Introduction Method Results Outlook

Page 23: Protein Structure Alignment

Benchmarking and Extension of Protein3DFit

o Introductiono Why Align Structures?o Structure Alignment: Issues & problem

o Methodo Algorithmo Pitfallso Extension

o Resultso Outlook

Introduction Method Results Outlook

Page 24: Protein Structure Alignment

Results

).(... o

A

AdsmrN

).(... o

A

AdsmrN

).(... o

A

AdsmrN

No Chain 1

Chain 2

Protein3DFit VAST DALI CE

1 1FXI:A 1UBQ:_ 59/2.9 48/2.1 _ _

2 1TEN:_ 3HHR:B 82/1.7 78/1.6 86/1.9 87/1.9

3 3HLA:B 2RHE:_ 69/3.4 _ 63/2.5 85/3.5

4 2AZA:A 1PAZ:_ 80/2.3 74/2.2 _ 85/2.9

5 1CEW:I 1MOL:A 76/2.1 71/1.9 81/2.3 69/1.9

6 1CID:_ 2RHE:_ 91/2.6 85/2.2 95/3.3 94/2.7

7 1CRL:_ 1EDE:_ 181/3.2 211/3.4 187/3.2

8 2SIM:_ 1NSB:A 265/3.2 284/3.8 286/3.8 264/3.0

9 1BGE:B 2GMF:A 81/3.0 74/2.5 98/3.5 94/4.1

10 1TIE:_ 4FGF:_ 105/2.6 82/1.7 108/2.0 116/2.9

).(... o

A

AdsmrN

source: 10 ‘difficult‘ structures from (Fischer et al.) obtained by Dali (Holm & Sander, 1993) , VAST (Madej et al., 1995) & CE (Shindyalov & Bourne, 1998).

39/1.0

71/1.0

42/1.1

52/1.2

54/1.2

59/1.1

100/1.2

129/1.2

52/1.1

70/1.1

Introduction Method Results Outlook

Page 25: Protein Structure Alignment

Results

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8 9 10

Protein3DFit

VAST

Dali

CE

Introduction Method Results Outlook

Page 26: Protein Structure Alignment

CE

Dali3Dfit-old

3Dfit-ext

Introduction Method Results Outlook

Page 27: Protein Structure Alignment

CE

Dali3Dfit-old

3Dfit-ext

Page 28: Protein Structure Alignment

3Dfit-ext 3Dfit-old

Page 29: Protein Structure Alignment

Outlook / Possible Extensions

• Use of secondary structure information in alignment step.

• Update Protein3DFit webserver• Multiple structure superposition• Alignment of a protein structure against

PDB• ......

Page 30: Protein Structure Alignment

Acknowlegdments

Prof. Dietmar SchomburgPascal BenkertStructural Biology Group & CUBIC

Page 31: Protein Structure Alignment

Questions?

Thanks a lot for your patience!