TM PRO & Comparison of Algorithms for “Protein Stability Prediction Upon Mutations”


Madhavi Ganapathiraju, Graduate student

Carnegie Mellon University

Overview

• TMpro evaluations on PDBTM, TMPDB and MPTOPO are complete

• Additional inputs to TMpro are being studied
– Yule values (not successful)
– Evolutionary profile (promising)
• TMpro website has been completed
• Evaluation of algorithms to predict protein stability changes upon mutations

Part 1: TMpro

TMpro Evaluations

Segment-level measures: Qok, F score, recall, precision. Residue-level measure: Q2.

Method          Qok   Segment F score   Segment recall   Segment precision   Q2   Misclassified as soluble

MPtopo (101 TM proteins)
2a  TMHMM       66    91                89               94                  84   5
2b  TMpro NN    60    93                92               94                  79   0

PDBTM (191 TM proteins)
3a  TMHMM       68    90                89               90                  84   13
3b  TMpro NN    57    93                93               93                  81   2

TMpro web server is fully functional!

Competition for TMpro

Prize: See your logo on the web!

Attempts to overcome confusion with globular soluble helices (1)

• Yule value features to be added
– Yule value features that discriminate amino acid neighbor propensities between TM and non-TM helices were computed earlier
– Tried to add these features as input to the NN predictor, but could not achieve quantitative improvement
– I will discuss this in the future when I have results to present
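As a rough illustration of what a neighbor-propensity feature could look like, the sketch below computes Yule's Q statistic for one ordered amino acid pair at a fixed separation. The slides do not give the exact definition used, so the 2x2 contingency-table form, the function name `yule_q` and the `distance` parameter are assumptions.

```python
def yule_q(sequences, first, second, distance=1):
    """Yule's Q for 'first' at position i and 'second' at position i + distance.

    Q = (n11*n00 - n10*n01) / (n11*n00 + n10*n01), ranging from -1 to +1.
    Returns None when the denominator is zero (association undefined).
    """
    n11 = n10 = n01 = n00 = 0
    for seq in sequences:
        for i in range(len(seq) - distance):
            has_first = seq[i] == first
            has_second = seq[i + distance] == second
            if has_first and has_second:
                n11 += 1
            elif has_first:
                n10 += 1
            elif has_second:
                n01 += 1
            else:
                n00 += 1
    denominator = n11 * n00 + n10 * n01
    if denominator == 0:
        return None
    return (n11 * n00 - n10 * n01) / denominator


# Toy sequence (hypothetical): A is always followed by L.
print(yule_q(["ALALAL"], "A", "L"))  # 1.0
```

Computing Q separately on TM helices and on soluble helices and comparing the two values per pair is one way such features could be made discriminative.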

Attempts to overcome confusion with globular soluble helices (2)

• Evolutionary profile information
– It is known that knowledge of the evolutionary profile of a protein can improve prediction accuracy to a great extent
• TMpro is capable of predicting TM segments without requiring knowledge of the profile
– Useful when you cannot extract sequence alignments from known proteins
• But where the profile is known, we would like to use that additional information

Profile generation

• Get multiple sequence alignments
• Compute position-specific scoring matrix for each protein
– 21 rows (20 amino acids, and 1 row for gaps)
• Profile is generated for each protein in the training and test sets

Those of you who have worked with evolutionary analysis before, please give feedback

PSSM(i,j) = log( C(i,j) / total counts at position j )
            log( C(i,j) / unigram count of i in the protein )
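A minimal sketch of the first log term above, log(C(i,j) / total counts at position j), computed from a multiple sequence alignment with 21 rows (20 amino acids plus gap). The pseudocount, the alphabet ordering, the choice to count any non-standard symbol as a gap, and the function name are assumptions made for illustration. The second term (normalizing by the unigram count of i) is left out because the slide does not make clear how the two terms combine. If dots in the alignment mark identity with the query, as in query-anchored BLAST output, they would need to be expanded to residues before counting.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus one gap symbol


def column_log_frequencies(alignment, pseudocount=1.0):
    """Return a 21 x L matrix of log(C(i,j) / total counts at position j)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    index = {symbol: k for k, symbol in enumerate(ALPHABET)}
    # The pseudocount keeps every count positive so the log is defined.
    counts = [[pseudocount] * length for _ in ALPHABET]
    for row in alignment:
        for j, symbol in enumerate(row.upper()):
            # Assumption: anything outside the 20 standard residues is a gap.
            counts[index.get(symbol, index["-"])][j] += 1.0
    totals = [sum(counts[i][j] for i in range(len(ALPHABET)))
              for j in range(length)]
    return [[math.log(counts[i][j] / totals[j]) for j in range(length)]
            for i in range(len(ALPHABET))]


profile = column_log_frequencies(["AC-", "AC-", "AD-"])  # toy alignment
print(len(profile), len(profile[0]))  # 21 3
```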

Doubts

• We have labels for training sequences
– But when the original sequence has gaps in the alignment, how should the labels of the gap positions be interpreted?

labels        --n------n----n------nnn-----n------n-----------------M-----
2a65      369 --D------E----L------KLS-----R------K-----------------H----- 377
2A65_A    369 --.------.----.------...-----.------.-----------------.----- 377
AAC07817  369 --.------.----.------...-----.------.-----------------.----- 377
YP_001956 364 --E------S----F------G.K-----.------.-----------------T----- 372

labels        -M------M------M------M-------M----------M---------MM-------
2a65      378 -A------V------L------W-------T----------A---------AI------- 385
2A65_A    378 -.------.------.------.-------.----------.---------..------- 385
AAC07817  378 -.------.------.------.-------.----------.---------..------- 385
YP_001956 373 -S------C------.-----------------------------------IL------- 377

Even TM regions contain gaps, as shown above.

What labels should be assigned to gaps?
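One possible convention, offered only as an assumption and not as something the slides settle on, is to drop every alignment column in which the labeled sequence has a gap. Each remaining column then corresponds to exactly one labeled residue, and no gap position needs a label. The helper name `project_onto_query` is hypothetical; the example strings are the first 15 columns of three rows shown above.

```python
def project_onto_query(aligned_query, aligned_others):
    """Keep only the alignment columns in which the query has a residue."""
    keep = [j for j, symbol in enumerate(aligned_query) if symbol != "-"]
    query = "".join(aligned_query[j] for j in keep)
    others = ["".join(row[j] for j in keep) for row in aligned_others]
    return query, others


query, others = project_onto_query(
    "--D------E----L",
    ["--.------.----.", "--E------S----F"],
)
print(query)   # DEL
print(others)  # ['...', 'ESF']
```

The cost of this choice is that insertions relative to the labeled sequence are discarded, so the gap row of the profile only records deletions in the other sequences.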

Doubts

• When nothing is shown (no gap symbol and no aligned residue) for some sequences, I am counting those positions as gaps

XP_659910  47 L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT 86
AAW43619  100    .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST 136
CAB59195   59                           ----.N.RP.-A..VIGSARFAYMAWTRVA 83
XP_466001 107                                SKRA.-A.FVLSGGRFIYASLLRLL 130
AAA20832  103                                SKRA.-A.FVLTGGRFVYASLVRLL 126

What should be done with missing segment information for some sequences?

Using profile for prediction

Studied independent of TMpro

Neural network with 21 input, 21 hidden and 1 output neurons
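A minimal sketch of the forward pass of a 21-21-1 feed-forward network. The slides give only the layer sizes, so the sigmoid activations, the random untrained weights, and the idea of feeding one 21-value profile column per residue are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_network(n_in=21, n_hidden=21, seed=0):
    """Create small random weights; a real predictor would train these."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(network, column):
    """Map one 21-value input vector to a single output in (0, 1)."""
    w1, b1, w2, b2 = network
    hidden = [sigmoid(sum(w * x for w, x in zip(row, column)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


network = make_network()
score = forward(network, [0.0] * 21)  # one output value per residue position
```

Applying `forward` to every column of a protein's profile gives one score per residue, which is the kind of per-residue trace that can then be post-processed.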

[Figure: plot over residue number, with the experimentally observed locations of TM helices marked]

Another output

NN architecture needs to be modified

But instead I did post-processing of the neural network output

Computed wavelet transform

Mexican hat wavelet, scale = 10
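A minimal sketch of a Mexican hat wavelet transform at a single scale, applied to a per-residue signal such as a network output. The normalization constant, the 1/sqrt(scale) factor, the truncation of the wavelet at five scale units, and the zero padding at the sequence ends are assumptions; the slides specify only the wavelet and scale = 10.

```python
import math


def mexican_hat(t):
    """Mexican hat (Ricker) wavelet with unit L2 norm."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def wavelet_transform(signal, scale=10.0, support=5.0):
    """Wavelet coefficients of `signal` at one scale, one value per position."""
    half = int(math.ceil(support * scale))
    offsets = range(-half, half + 1)
    kernel = [mexican_hat(k / scale) / math.sqrt(scale) for k in offsets]
    n = len(signal)
    coefficients = []
    for b in range(n):
        total = 0.0
        for offset, weight in zip(offsets, kernel):
            idx = b + offset
            if 0 <= idx < n:  # positions outside the sequence count as zero
                total += signal[idx] * weight
        coefficients.append(total)
    return coefficients


# Hypothetical signal: a 20-residue plateau, roughly the length of a TM helix.
signal = [0.0] * 100 + [1.0] * 20 + [0.0] * 100
coefficients = wavelet_transform(signal)
peak = max(range(len(coefficients)), key=coefficients.__getitem__)
```

At scale 10 the positive lobe of the wavelet spans about 20 positions, so a plateau of helix-like length gives a single peak near its center with negative side lobes on either side.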

Some more wavelet outputs

Note that these are from the training data itself. Yet to check how it performs overall.

Part 2: Stability upon Mutations

Evaluation of predictions of protein stability changes upon mutations

• Effects of mutations on 2 TM proteins are available in our group
– The two proteins are rhodopsin and bacteriorhodopsin
– Data available for how much misfolding occurs
– How stability of the protein is affected
• There are algorithms that can also predict these changes
• We compared how accurate or reliable the prediction methods are, by comparing their results with our experimental data

3 Prediction algorithms

• I-Mutant 2.0
– Support vector machine
– Features: amino acid neighbors in a 9 Å sphere, temperature, pH, relative solvent-accessible surface area
– http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi
• DFIRE
– Knowledge-based statistical potentials
– http://phyyz4.med.buffalo.edu/hzhou/mutation.html
• FOLDX
– Statistical mechanics; accounts for various energy terms
– http://fold-x.embl-heidelberg.de:1100/

Authors’ claims in 3 papers

Our results

           Number of known mutations   I-Mutant   DFIRE   FOLD-X
Folding    52                          54.7       57.7    50
Meta 2     32                          78.1       73.3    46.9
Both       84                          64.3       63.0    50.6

           Number of known mutations   I-Mutant   DFIRE   FOLD-X
Folding    147                         35.4       37.1    55.7
Meta 2     159                         56.0       47.5    67.2
Both       279                         55.3       38.7    52.7

Rhodopsin (PDB: 1U19)

Bacteriorhodopsin (PDB: 1QM8)

Bias in # of mutations that increase/decrease stability

Database bias affects apparent accuracies of algorithms

I-Mutant, for example, predicts a decrease in stability for a majority of the mutations.

Whether the experimentally studied mutations preserve the natural bias toward stability-decreasing mutations affects the apparent accuracy of the prediction algorithms.

                    Experimental   I-Mutant   DFIRE   FOLDX
Rhodopsin           63             75         46      66
Bacteriorhodopsin   81             97         81      65
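The effect of database bias on apparent accuracy can be seen with a trivial predictor that always answers "decrease". The label lists below are hypothetical toy data, not the rhodopsin or bacteriorhodopsin measurements, and the function names are illustrative.

```python
def accuracy(predicted, observed):
    """Fraction of mutations whose predicted direction matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def always_decrease(n):
    """A predictor that ignores its input and always predicts a stability decrease."""
    return ["decrease"] * n


# Hypothetical data sets with different proportions of destabilizing mutations.
observed_biased = ["decrease"] * 8 + ["increase"] * 2
observed_balanced = ["decrease"] * 5 + ["increase"] * 5

print(accuracy(always_decrease(10), observed_biased))    # 0.8
print(accuracy(always_decrease(10), observed_balanced))  # 0.5
```

The same uninformative predictor scores 0.8 on one set and 0.5 on the other, so an accuracy figure is only interpretable alongside the proportion of destabilizing mutations in the data it was measured on.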

Correlation with known data

                    I-Mutant   DFIRE   FOLDX
Rhodopsin           0.11       0.16    0.24
Bacteriorhodopsin   -0.09      0.18    -0.18

Reported correlations for these methods are quite large (>0.7)

On the data compared here, the correlations are quite low.
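A correlation between predicted and measured stability changes can be computed as sketched below. The slides do not say which correlation coefficient was used; Pearson's r is assumed here, and the input values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted and measured values for four mutations.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.0, -1.0, -1.0, 1.0]
print(pearson(predicted, measured))  # 0.0
```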

Notes

• Local installations of blast and netblast are on cologne:
– /usr1/blast-2.2.13/
– /usr1/netblast-2.2.13/
• Java SDK on cologne
– /usr1/j2sdk1.4.2_11/

Acknowledgements

Judith Klein-Seetharaman

Christopher Jon Jursa, Pitt Information Sciences (for developing web interface)
