TMpro & Comparison of Algorithms for “Protein Stability Prediction Upon Mutations”


1

TMpro &

Comparison of Algorithms for “Protein Stability Prediction Upon Mutations”

Madhavi Ganapathiraju, Graduate student

Carnegie Mellon University

2

Overview

• TMpro evaluations on PDBTM, TMPDB and MPTOPO are complete

• Additional inputs to TMpro are being studied

– Yule values (not successful)

– Evolutionary profile (promising)

• TMpro website has been completed

• Evaluation of algorithms to predict protein stability changes upon mutations

3

Part 1: TMpro

4

TMpro Evaluations

                 Segment level                                                Residue level
     Method      Qok   Segment F score   Segment recall   Segment precision   Q2    Misclassified as soluble

MPtopo (101 TM proteins)
2a   TMHMM       66    91                89               94                  84    5
2b   TMpro NN    60    93                92               94                  79    0

PDBTM (191 TM proteins)
3a   TMHMM       68    90                89               90                  84    13
3b   TMpro NN    57    93                93               93                  81    2
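The slide does not define the segment F score. A minimal sketch, assuming it is the usual harmonic mean of segment recall and segment precision (an assumption that matches the tabulated values to within rounding); the function name f_score is illustrative only:

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent).

    Assumption: this is the standard F1 definition; the slides do not
    state which formula was used.
    """
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# Segment recall and precision of TMpro NN on MPtopo, taken from the table
print(round(f_score(92, 94)))  # 93
```

Because recall and precision in the table are already rounded, recomputed F scores can differ from the tabulated ones by one unit.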

5

TMpro web server

is fully functional!

Competition for TMpro logo

Prize: See your logo on the web!

6

Attempts to overcome confusion with globular soluble helices (1)

• Yule value features to be added

– Yule value features that discriminate amino acid neighbor propensities between TM and non-TM helices were computed earlier

– Tried to add these features as input to the NN predictor, but could not achieve quantitative improvement

– I will discuss this in the future when I have results to present

7

Attempts to overcome confusion with globular soluble helices (2)

• Evolutionary profile information

– It is known that knowledge of the evolutionary profile of a protein can improve prediction accuracy to a great extent

• TMpro is capable of predicting TMs without requiring knowledge of the profile

– Useful when you cannot extract sequence alignments from known proteins

• But where the profile is known, we would like to use that additional information

8

Profile generation

• Get multiple sequence alignments

• Compute position specific scoring matrix for each protein

– 21 rows (20 amino acids, and 1 row for gaps)

• Profile is generated for each protein in the training and test sets

Those of you who have worked with evolutionary analysis before, please give feedback

PSSM(i,j) = log( C(i,j) / total counts at position j )
            log( C(i,j) / unigram count of i in the protein )
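A minimal sketch of the profile computation, covering only the first expression, log(C(i,j) / total counts at position j), with 21 rows (20 amino acids plus a gap row). The pseudocount, the function name column_log_frequencies and the toy alignment are assumptions for illustration and are not taken from the slides:

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus one gap symbol


def column_log_frequencies(alignment, pseudocount=1.0):
    """Return a 21 x L matrix of log(C(i,j) / total counts at position j).

    alignment: list of equal-length aligned sequences.
    pseudocount: added to every count so that log(0) never occurs
    (an assumption; the slides do not say how zero counts are handled).
    Symbols outside ALPHABET are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    matrix = [[0.0] * length for _ in ALPHABET]
    for j in range(length):
        counts = Counter(seq[j] for seq in alignment)
        total = sum(counts[s] + pseudocount for s in ALPHABET)
        for i, symbol in enumerate(ALPHABET):
            matrix[i][j] = math.log((counts[symbol] + pseudocount) / total)
    return matrix


toy_alignment = ["MA-L", "MAVL", "MS-L"]  # hypothetical toy alignment
profile = column_log_frequencies(toy_alignment)
print(len(profile), len(profile[0]))  # 21 4
```

Each column of the result is a log-probability distribution over the 21 symbols, so it can be fed to a predictor one residue position at a time.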

9

Doubts

• We have labels for training sequences

– But when the original sequence has gaps when aligned, how to interpret the labels of the gaps?

(labels)        --n------n----n------nnn-----n------n-----------------M-----
2a65       369  --D------E----L------KLS-----R------K-----------------H-----  377
2A65_A     369  --.------.----.------...-----.------.-----------------.-----  377
AAC07817   369  --.------.----.------...-----.------.-----------------.-----  377
YP_001956  364  --E------S----F------G.K-----.------.-----------------T-----  372

(labels)        -M------M------M------M-------M----------M---------MM-------
2a65       378  -A------V------L------W-------T----------A---------AI-------  385
2A65_A     378  -.------.------.------.-------.----------.---------..-------  385
AAC07817   378  -.------.------.------.-------.----------.---------..-------  385
YP_001956  373  -S------C------.-----------------------------------IL-------  377

Even TM regions have gaps, as shown above

What labels should be assigned to gaps?

10

Doubts

• When nothing is shown (gap/alignment) for some sequences, I am counting those as gaps

XP_659910   47  L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT   86
AAW43619   100  .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST  136
CAB59195    59  ----.N.RP.-A..VIGSARFAYMAWTRVA   83
XP_466001  107  SKRA.-A.FVLSGGRFIYASLLRLL  130
AAA20832   103  SKRA.-A.FVLTGGRFVYASLVRLL  126

What to do with missing segment information for some sequences?

11

Using profile for prediction

Studied independent of TMpro

Neural network with 21 input, 21 hidden and 1 output neurons
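A minimal sketch of a 21-21-1 feed-forward network of the size stated above. It assumes that the 21 inputs are the 21 profile values of a single residue position, that sigmoid units are used throughout, and that the weights shown (random, untrained) stand in for trained ones; none of these details are given on the slide, and all names are illustrative:

```python
import math
import random

N_INPUT, N_HIDDEN = 21, 21

random.seed(0)
# Untrained, randomly initialised weights: placeholders for trained values
W1 = [[random.gauss(0.0, 0.1) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
B1 = [0.0] * N_HIDDEN
W2 = [random.gauss(0.0, 0.1) for _ in range(N_HIDDEN)]
B2 = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_residue(profile_column):
    """Map one 21-value profile column to a score between 0 and 1.

    Scores near 0 are read as non-membrane, scores near 1 as membrane.
    """
    if len(profile_column) != N_INPUT:
        raise ValueError("expected 21 profile values per residue")
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, profile_column)) + b)
        for row, b in zip(W1, B1)
    ]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + B2)


def predict_protein(profile_columns):
    """Apply the network independently at every residue position."""
    return [predict_residue(column) for column in profile_columns]


# Hypothetical input: 5 residue positions with random profile values
columns = [[random.gauss(0.0, 1.0) for _ in range(N_INPUT)] for _ in range(5)]
print(len(predict_protein(columns)))  # 5
```

Because each residue is scored independently in this sketch, the raw output can be noisy along the sequence, which is one motivation for smoothing or post-processing it.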

[Figure: predicted output (nonmembrane = 0, membrane = 1) versus residue number, with the experimentally observed locations of TM helices marked]

12

Another output

13

NN architecture needs to be modified

But instead I did post-processing of the neural network output

Computed wavelet transform: Mexican hat wavelet, scale = 10
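A minimal sketch of this post-processing step, assuming a single-scale continuous wavelet transform computed by direct summation with zero padding at the sequence ends. The normalisation, the kernel half-width and the toy input are assumptions, and the function names are illustrative:

```python
import math


def mexican_hat(t, scale):
    """Mexican hat (Ricker) wavelet at offset t for the given scale."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet_transform(signal, scale=10, half_width=None):
    """Correlate the signal with the wavelet at one scale.

    Positions outside the signal are treated as zero.
    half_width: kernel extent on each side (default 5 * scale, an assumption).
    """
    if half_width is None:
        half_width = 5 * scale
    kernel = [mexican_hat(k, scale) for k in range(-half_width, half_width + 1)]
    coefficients = []
    for n in range(len(signal)):
        total = 0.0
        for k in range(-half_width, half_width + 1):
            idx = n + k
            if 0 <= idx < len(signal):
                total += signal[idx] * kernel[k + half_width]
        coefficients.append(total)
    return coefficients


# Hypothetical NN output: a 20-residue plateau at 1 in a 200-residue protein
nn_output = [0.0] * 200
for i in range(90, 110):
    nn_output[i] = 1.0

coeffs = wavelet_transform(nn_output, scale=10)
peak = max(range(len(coeffs)), key=coeffs.__getitem__)
print(peak)  # index at the centre of the plateau
```

The positive lobe of the wavelet at scale 10 spans about 20 residues, which is close to the length of a typical TM helix, so a helix-sized plateau in the NN output produces a single peak flanked by negative side lobes.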

14

Some more wavelet outputs

Note that these are from the training data itself. Overall performance is yet to be checked.

15

Part 2: Stability upon Mutations

16

Evaluation of predictions of protein stability changes upon mutations

• Data on the effects of mutations on 2 TM proteins are available in our group

– The two proteins are rhodopsin and bacteriorhodopsin

– Data available for how much mis-folding occurs

– How stability of the protein is affected

• There are algorithms that can also predict these changes

• We evaluated how accurate and reliable the prediction methods are by comparing their results with our experimental data

17

3 Prediction algorithms

• I-Mutant 2.0

– Support vector machine

– Features: amino acid neighbors in a 9 Å sphere, temperature, pH, relative solvent accessible surface area

– http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi

• DFIRE

– Knowledge based statistical potentials

– http://phyyz4.med.buffalo.edu/hzhou/mutation.html

• FOLDX

– Statistical mechanics; accounts for various energy terms

– http://fold-x.embl-heidelberg.de:1100/

18

Authors’ claims in 3 papers

19

Our results

           Number of known mutations   I-Mutant   DFIRE   FOLD-X
Folding    52                          54.7       57.7    50
Meta 2     32                          78.1       73.3    46.9
Both       84                          64.3       63.0    50.6

           Number of known mutations   I-Mutant   DFIRE   FOLD-X
Folding    147                         35.4       37.1    55.7
Meta 2     159                         56.0       47.5    67.2
Both       279                         55.3       38.7    52.7

Rhodopsin (PDB: 1U19)

Bacteriorhodopsin (PDB: 1QM8)

20

Bias in # of mutations that increase/decrease stability

Database bias affects apparent accuracies of algorithms

I-Mutant, for example, predicts a decrease in stability for a majority of the mutations.

Whether the experimentally studied mutations preserve the natural bias toward stability-decreasing mutations affects the apparent accuracy of the prediction algorithms

                    Experimental   I-Mutant   DFIRE   FOLDX
Rhodopsin           63             75         46      66
Bacteriorhodopsin   81             97         81      65
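A small worked example of why such a bias matters, using a purely hypothetical data set (81 destabilising and 19 stabilising mutations, not the experimental data above): a predictor that always answers "decrease" reaches an accuracy equal to the fraction of destabilising mutations, without using any information about the protein.

```python
def accuracy(predicted, observed):
    """Percentage of mutations whose predicted direction matches the observed one."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


# Hypothetical data set: 81 destabilising and 19 stabilising mutations
observed = ["decrease"] * 81 + ["increase"] * 19

# Trivial predictor that always predicts a decrease in stability
always_decrease = ["decrease"] * len(observed)

print(accuracy(always_decrease, observed))  # 81.0
```

On a balanced data set the same trivial predictor would score 50%, so accuracies are only comparable between data sets with a similar proportion of destabilising mutations.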

21

Correlation with known data

                    I-Mutant   DFIRE   FOLDX
Rhodopsin           0.11       0.16    0.24
Bacteriorhodopsin   -0.09      0.18    -0.18

Reported correlations for these methods are quite large (>0.7)

On the data compared here, the correlations are quite low
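The slides do not say which correlation measure was used. A minimal sketch, assuming the Pearson correlation coefficient between predicted and experimentally measured stability changes; the function name and the example values are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical predicted and measured stability changes for five mutations
predicted = [-1.2, 0.3, -0.8, -2.1, 0.5]
measured = [-0.4, -0.9, 0.2, -1.5, -0.1]
print(pearson(predicted, measured))
```

A value near 1 means the predictions rank and scale the mutations like the experiment does, a value near 0 means no linear relationship, and a negative value means the predicted direction tends to be opposite to the measured one.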

22

Notes

• Local installations of blast and netblast are on Cologne:

– /usr1/blast-2.2.13/

– /usr1/netblast-2.2.13/

• Java SDK on Cologne

– /usr1/j2sdk1.4.2_11/

23

Acknowledgements

Judith Klein-Seetharaman

Christopher Jon Jursa, Pitt Information Sciences (for developing web interface)
