Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS


Page 1: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of structural and functional features in proteins

starting from the residue sequence

INTRODUCTION TO NEURAL NETWORKS

Page 2: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

MAPPING PROBLEMS: Secondary structure

Covalent structure (Nt to Ct): TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

Secondary structure: EEEE..HHHHHHHHHHHH....HHHHHHHH.EEEE...........

[Figure: 3D structure]

Page 3: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

MAPPING PROBLEMS: Topology of transmembrane proteins

Topography: position of transmembrane segments along the sequence

ALALMLCMLTYRHKELKLKLKK ALALMLCMLTYRHKELKLKLKK ALALMLCMLTYRHKELKLKLKK

[Figure: lipid bilayer with two examples. Outer membrane, β-barrel: porin (Rhodobacter capsulatus). Inner membrane, α-helices: bacteriorhodopsin (Halobacterium salinarum).]

Page 4: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

First generation methods: single residue statistics

Propensity scales

For each residue:

• The association between the residue and the different features is statistically evaluated

• Physical and chemical features of residues

A propensity value for any structure can be associated with any residue.

HOW?

Page 5: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: Chou-Fasman propensity scale

Given a set of known structures, we can count how many times a residue is associated with a structure.

Example:

ALAKSLAKPSDTLAKSDFREKWEWLKLLKALACCKLSAAL
hhhhhhhhccccccccccccchhhhhhhhhhhhhhhhhhh

N(A,h) = 7, N(A,c) = 1, N = 40

P(A,h) = 7/40, P(A,c) = 1/40

Is that enough for estimating a propensity?

Page 6: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: Chou-Fasman propensity scale (continued)

For the same example (N(A,h) = 7, N(A,c) = 1, N = 40, so P(A,h) = 7/40 and P(A,c) = 1/40), we need to estimate how far the residue-to-structure association departs from independence. This requires the overall frequency of each structure:

P(h) = 27/40, P(c) = 13/40

Page 7: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: Chou-Fasman propensity scale (continued)

For the same example: P(A,h) = 7/40, P(A,c) = 1/40, P(h) = 27/40, P(c) = 13/40.

If the structure is independent of the residue: P(A,h) = P(A) P(h)

The ratio P(A,h) / (P(A) P(h)) is the propensity. In the example alanine occurs 8 times, so P(A) = 8/40 and the helix propensity of alanine is (7/40) / ((8/40)(27/40)) ≈ 1.30.
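The counting above can be reproduced in a few lines. This is a minimal sketch on the single example chain from the slide; the function name `propensity` is an assumption made for illustration, and a real scale would be estimated on a large set of structures.

```python
# Example chain and its secondary structure, as given on the slide.
sequence = "ALAKSLAKPSDTLAKSDFREKWEWLKLLKALACCKLSAAL"
structure = "h" * 8 + "c" * 13 + "h" * 19


def propensity(seq, ss, residue, state):
    """Return P(residue, state) / (P(residue) * P(state)) from paired strings."""
    assert len(seq) == len(ss)
    n = len(seq)
    joint = sum(1 for a, s in zip(seq, ss) if a == residue and s == state) / n
    p_residue = seq.count(residue) / n
    p_state = ss.count(state) / n
    return joint / (p_residue * p_state)


print(round(propensity(sequence, structure, "A", "h"), 3))  # 1.296
print(round(propensity(sequence, structure, "A", "c"), 3))  # 0.385
```

A value above 1 means the residue is found in that structure more often than independence would predict; a value below 1 means less often.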

Page 8: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: Chou-Fasman propensity scale

Given a LARGE set of examples, a propensity value can be computed for each residue and each structure type.

Name             P(H)   P(E)
Alanine          1.42   0.83
Arginine         0.98   0.93
Aspartic Acid    1.01   0.54
Asparagine       0.67   0.89
Cysteine         0.70   1.19
Glutamic Acid    1.51   0.37
Glutamine        1.11   1.10
Glycine          0.57   0.75
Histidine        1.00   0.87
Isoleucine       1.08   1.60
Leucine          1.21   1.30
Lysine           1.14   0.74
Methionine       1.45   1.05
Phenylalanine    1.13   1.38
Proline          0.57   0.55
Serine           0.77   0.75
Threonine        0.83   1.19
Tryptophan       1.08   1.37
Tyrosine         0.69   1.47
Valine           1.06   1.70

Page 9: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: Chou-Fasman propensity scale

Given a new sequence, a secondary structure prediction can be obtained by plotting the propensity values for each structure, residue by residue (values multiplied by 100; T is threonine, so its values are taken from the threonine row of the scale):

         T    S    P    T    A    E    L    M    R    S    T    G
P(H)    83   77   57   83  142  151  121  145   98   77   83   57
P(E)   119   75   55  119   83   37  130  105   93   75  119   75

Considering three secondary structures (H, E, C), the overall accuracy, evaluated on an uncorrelated set of sequences with known structure, is very low: Q3 = 50-60%.
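A propensity-based prediction can be sketched as follows. This is a deliberately simplified rule, not the full Chou-Fasman procedure: propensities are averaged over a short window and the structure with the highest average above 1 is assigned. The window size, the threshold of 1.0 and the function name `predict` are assumptions made for illustration; the propensity values are those of the scale given earlier.

```python
# Helix and strand propensities from the scale above (one-letter codes).
P_H = {"A": 1.42, "R": 0.98, "D": 1.01, "N": 0.67, "C": 0.70, "E": 1.51,
       "Q": 1.11, "G": 0.57, "H": 1.00, "I": 1.08, "L": 1.21, "K": 1.14,
       "M": 1.45, "F": 1.13, "P": 0.57, "S": 0.77, "T": 0.83, "W": 1.08,
       "Y": 0.69, "V": 1.06}
P_E = {"A": 0.83, "R": 0.93, "D": 0.54, "N": 0.89, "C": 1.19, "E": 0.37,
       "Q": 1.10, "G": 0.75, "H": 0.87, "I": 1.60, "L": 1.30, "K": 0.74,
       "M": 1.05, "F": 1.38, "P": 0.55, "S": 0.75, "T": 1.19, "W": 1.37,
       "Y": 1.47, "V": 1.70}


def predict(seq, half_window=1):
    """Assign H, E or C to each residue from window-averaged propensities."""
    prediction = []
    for i in range(len(seq)):
        window = seq[max(0, i - half_window): i + half_window + 1]
        helix = sum(P_H[a] for a in window) / len(window)
        strand = sum(P_E[a] for a in window) / len(window)
        if helix > 1.0 and helix >= strand:
            prediction.append("H")
        elif strand > 1.0 and strand > helix:
            prediction.append("E")
        else:
            prediction.append("C")
    return "".join(prediction)


print(predict("TSPTAELMRSTG"))
```

Because each residue is scored almost independently of its neighbours, a rule of this kind cannot capture context, which is consistent with the low Q3 reported for first generation methods.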

Page 10: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: Chou-Fasman propensity scale

http://www.expasy.ch/cgi-bin/protscale.pl

Page 11: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Transmembrane alpha-helices: Kyte-Doolittle scale

It is computed taking into consideration the octanol-water partition coefficient, combined with the propensity of the residues to be found in known transmembrane helices.

Ala:  1.800    Arg: -4.500    Asn: -3.500    Asp: -3.500
Cys:  2.500    Gln: -3.500    Glu: -3.500    Gly: -0.400
His: -3.200    Ile:  4.500    Leu:  3.800    Lys: -3.900
Met:  1.900    Phe:  2.800    Pro: -1.600    Ser: -0.800
Thr: -0.700    Trp: -0.900    Tyr: -1.300    Val:  4.200
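A scale of this kind is normally used through a sliding-window average. The sketch below assumes a 19-residue window and a threshold of 1.6 for candidate transmembrane segments; both values are commonly quoted defaults rather than something stated on the slide, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values from the table above (one-letter codes).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window; entry k is centred on k + window // 2."""
    half = window // 2
    return [sum(KD[a] for a in seq[i - half: i + half + 1]) / window
            for i in range(half, len(seq) - half)]


def candidate_tm_centres(seq, window=19, threshold=1.6):
    """0-based positions whose window average exceeds the threshold."""
    half = window // 2
    return [k + half for k, value in enumerate(hydropathy_profile(seq, window))
            if value > threshold]


# Toy sequence: a 20-residue hydrophobic stretch flanked by lysines.
toy = "K" * 10 + "L" * 20 + "K" * 10
print(candidate_tm_centres(toy))
```

Consecutive positions above the threshold would then be merged into one predicted segment.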

Page 12: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Second generation methods: GOR

The structure of a residue in a protein strongly depends on the sequence context.

It is possible to estimate the influence of a residue on the structure of another residue nearby along the sequence. Windows extending from 8 to 13 residues on each side of the central residue are usually considered.

Coefficients P(A,s,i) estimate the contribution of residue A to structure s of a residue that is i positions away along the sequence.
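The idea of position-dependent coefficients can be sketched as below. This is a simplified log-odds variant written for illustration, not the published GOR parameters: each coefficient is estimated as log(P(residue at offset d | state) / P(residue)) with pseudocounts, and the contributions of all window positions are summed independently. The function names and the toy training pair are assumptions.

```python
import math
from collections import defaultdict


def train(pairs, half_window=8, pseudo=1.0):
    """Estimate window coefficients from (sequence, structure) pairs."""
    counts = defaultdict(float)          # (state, offset, residue) -> count
    state_totals = defaultdict(float)    # (state, offset) -> count
    residue_counts = defaultdict(float)  # residue -> count
    total = 0
    for seq, ss in pairs:
        for a in seq:
            residue_counts[a] += 1
            total += 1
        for i, s in enumerate(ss):
            for d in range(-half_window, half_window + 1):
                if 0 <= i + d < len(seq):
                    counts[(s, d, seq[i + d])] += 1
                    state_totals[(s, d)] += 1
    k = len(residue_counts)
    states = sorted({s for _, ss in pairs for s in ss})

    def info(s, d, a):
        p_cond = (counts[(s, d, a)] + pseudo) / (state_totals[(s, d)] + pseudo * k)
        p_a = (residue_counts[a] + pseudo) / (total + pseudo * k)
        return math.log(p_cond / p_a)

    return info, states


def predict(seq, info, states, half_window=8):
    """For each residue, pick the state with the largest summed coefficient."""
    out = []
    for i in range(len(seq)):
        scores = {s: sum(info(s, d, seq[i + d])
                         for d in range(-half_window, half_window + 1)
                         if 0 <= i + d < len(seq))
                  for s in states}
        out.append(max(scores, key=scores.get))
    return "".join(out)


# Toy training pair, invented for illustration only.
info, states = train([("AAAAGGGG", "hhhhcccc")], half_window=1)
print(predict("AAAAGGGG", info, states, half_window=1))  # hhhhcccc
```

Summing the per-position terms is exactly the independence assumption noted for GOR: no correlation among window positions is modelled.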

Page 13: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary structure: GOR method

Q3 = 65% (considering three secondary structures (H, E, C), and evaluating the overall accuracy on an uncorrelated set of sequences with known structure)

The contribution of each position in the window is independent of the others. No correlation among the positions in the window is taken into account.

Page 14: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A more efficient method: Neural Networks

An alternative computing algorithm, built on analogies with computation in the nervous system:

1) The nervous system is made up of elementary computing units: neurons

2) The electric signal flows in a determined direction (dendrites -> axon) (Principle of dynamic polarization)

3) There is no cytoplasmic continuity among neurons. Each neuron communicates specifically with some neighboring neurons by means of synapses (Principle of connective specificity)

Page 15: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Neural Networks can learn the mapping from sequence to secondary structure

Tools out of machine learning approaches

[Figure: a database subset with known mapping (sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN, structure EEEE..HHHHHHHHHHHH....HHHHHHHH.EEEE) is used for training; the general rules extracted are then applied to a new sequence to obtain a prediction.]

Page 16: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Neural network for secondary structure prediction

Sequence: M P I L K Q K P I H Y H P N H G E A K G

Input (window of 9 residues, one column per window position, one row per residue type):

     K  P  I  H  Y  H  P  N  H
A    0  0  0  0  0  0  0  0  0
C    0  0  0  0  0  0  0  0  0
D    0  0  0  0  0  0  0  0  0
E    0  0  0  0  0  0  0  0  0
F    0  0  0  0  0  0  0  0  0
G    0  0  0  0  0  0  0  0  0
H    0  0  0  1  0  1  0  0  1
I    0  0  1  0  0  0  0  0  0
K    1  0  0  0  0  0  0  0  0
L    0  0  0  0  0  0  0  0  0
M    0  0  0  0  0  0  0  0  0
N    0  0  0  0  0  0  0  1  0
P    0  1  0  0  0  0  1  0  0
Q    0  0  0  0  0  0  0  0  0
R    0  0  0  0  0  0  0  0  0
S    0  0  0  0  0  0  0  0  0
T    0  0  0  0  0  0  0  0  0
V    0  0  0  0  0  0  0  0  0
W    0  0  0  0  0  0  0  0  0
Y    0  0  0  0  1  0  0  0  0

Output: C (structure of the central residue)

Usually:
Input: 17-23 residues
Hidden neurons: 4-15
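The input coding above can be sketched as follows: every window position becomes a 20-valued vector with a single 1 at the row of the residue found there. The function name `encode_window` and the choice to leave positions beyond the chain ends as all zeros are assumptions made for illustration.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(seq, centre, half_window=4):
    """Flatten a window of 2*half_window+1 residues into a 0/1 input vector."""
    vector = []
    for pos in range(centre - half_window, centre + half_window + 1):
        column = [0] * len(ALPHABET)
        if 0 <= pos < len(seq):          # positions outside the chain stay all-zero
            column[ALPHABET.index(seq[pos])] = 1
        vector.extend(column)
    return vector


seq = "MPILKQKPIHYHPNHGEAKG"
x = encode_window(seq, seq.index("Y"))   # window KPIHYHPNH centred on Y
print(len(x), sum(x))  # 180 9
```

With a 9-residue window the network therefore has 9 x 20 = 180 input nodes; windows of 17-23 residues give 340-460.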

Page 17: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

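[Figure: network whose input alphabet is ACDEFGHIKLMNPQRSTVWY plus a spacer symbol (.), with three outputs H, E and L. Residues shown with their structure labels: D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E).]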

Page 18: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Third generation methods: evolutionary information

Multiple sequence alignment:

 1  Y K D Y H S - D K K K G E L - -
 2  Y R D Y Q T - D Q K K G D L - -
 3  Y R D Y Q S - D H K K G E L - -
 4  Y R D Y V S - D H K K G E L - -
 5  Y R D Y Q F - D Q K K G S L - -
 6  Y K D Y N T - H Q K K N E S - -
 7  Y R D Y Q T - D H K K A D L - -
 8  G Y G F G - - L I K N T E T T K
 9  T K G Y G F G L I K N T E T T K
10  T K G Y G F G L I K N T E T T K

Sequence profile (one column per alignment position, values in %):

A    0   0   0   0   0   0   0   0   0   0   0  10   0   0   0   0
C    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
D    0   0  70   0   0   0   0  60   0   0   0   0  20   0   0   0
E    0   0   0   0   0   0   0   0   0   0   0   0  70   0   0   0
F    0   0   0  10   0  33   0   0   0   0   0   0   0   0   0   0
G   10   0  30   0  30   0 100   0   0   0   0  50   0   0   0   0
H    0   0   0   0  10   0   0  10  30   0   0   0   0   0   0   0
K    0  40   0   0   0   0   0   0  10 100  70   0   0   0   0 100
I    0   0   0   0   0   0   0   0  30   0   0   0   0   0   0   0
L    0   0   0   0   0   0   0  30   0   0   0   0   0   0   0   0
M    0   0   0   0   0   0   0   0   0   0   0   0   0  60   0   0
N    0   0   0   0  10   0   0   0   0   0  30  10   0   0   0   0
P    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Q    0   0   0   0  40   0   0   0  30   0   0   0   0   0   0   0
R    0  50   0   0   0   0   0   0   0   0   0   0   0   0   0   0
S    0   0   0   0   0  33   0   0   0   0   0   0  10  10   0   0
T   20   0   0   0   0  33   0   0   0   0   0  30   0  30 100   0
V    0   0   0   0  10   0   0   0   0   0   0   0   0   0   0   0
W    0  10   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Y   70   0   0  90   0   0   0   0   0   0   0   0   0   0   0   0
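A profile of this kind can be derived from an alignment by counting residues column by column. The sketch below assumes that gaps are excluded from the column total (this reproduces, for example, the 33% values at position 6); the function name `sequence_profile` is hypothetical.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# The ten aligned sequences from the slide ("-" marks a gap).
alignment = [
    "YKDYHS-DKKKGEL--",
    "YRDYQT-DQKKGDL--",
    "YRDYQS-DHKKGEL--",
    "YRDYVS-DHKKGEL--",
    "YRDYQF-DQKKGSL--",
    "YKDYNT-HQKKNES--",
    "YRDYQT-DHKKADL--",
    "GYGFG--LIKNTETTK",
    "TKGYGFGLIKNTETTK",
    "TKGYGFGLIKNTETTK",
]


def sequence_profile(rows):
    """Per-column residue composition in percent, ignoring gaps."""
    profile = []
    for col in range(len(rows[0])):
        residues = [row[col] for row in rows if row[col] != "-"]
        counts = Counter(residues)
        n = len(residues)
        profile.append({a: (100.0 * counts[a] / n if n else 0.0) for a in ALPHABET})
    return profile


profile = sequence_profile(alignment)
print(profile[0]["Y"], round(profile[5]["F"]), profile[6]["G"])  # 70.0 33 100.0
```

Each column of the profile replaces the single 1 of the one-residue coding, so the network input keeps the same size while carrying the variability observed across the protein family.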

Page 19: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Network Architecture for Secondary Structure Prediction

The First Network (Sequence to Structure)

Input (sequence profile, values in %):

SeqNo No    V   L   I   M   F   W   Y   G   A   P   S   T   C   H   R   K   Q   E   N   D
    1  1   80   0  20   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    2  2    0   0   0   0   0   0   0   0   0  20   0   0   0   0   0   0   0   0   0  80
    3  3   50   0   0   0   0   0   0   0  33   0   0   0   0   0   0   0   0  17   0   0
    4  4    0   0   0   0   0   0   0   0  13  63  13   0   0   0   0   0   0  13   0   0
    5  5   13   0   0   0   0   0   0  13  75   0   0   0   0   0   0   0   0   0   0   0
    6  6    0   0   0  13   0   0   0   0   0  13   0  13   0   0   0   0   0   0   0  63
    7  7    0   0   0  38   0   0   0  38   0   0   0   0   0   0   0  25   0   0   0   0
    8  8   25  13   0   0   0   0   0   0  50   0  13   0   0   0   0   0   0   0   0   0
    9  9    0  13  13   0   0   0   0   0   0  25   0   0   0   0   0  50   0   0   0   0
   10 10    0   0  25  13   0   0   0   0  13  13   0   0   0   0   0  38   0   0   0   0
   11 11    0   0   0   0   0   0   0   0  25   0   0   0   0   0   0  13  13   0   0  50
   12 12    0   0   0   0  43   0   0  29   0  29   0   0   0   0   0   0   0   0   0   0
   13 13    0  14  29   0   0   0   0   0  29   0   0   0   0   0   0   0   0  14   0  14
   14 14    0   0   0   0   0   0   0  43  29   0   0   0   0   0   0  29   0   0   0   0

Outputs: H, E, C

Resulting prediction: CCHHEHHHHCHHCCEECCEEEEHHHCC

Page 20: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Network Architecture for Secondary Structure Prediction

The Second Network (Structure to Structure)

[Figure: the sequence profile (SeqNo, No, V L I M F W Y G A P S T C H R K Q E N D; same 14-row table as for the first network) feeds the first network; its H, E, C outputs along the chain (CCHHEHHHHCHHCCEECCEEEEHHHCC) are the input of the second network, which again has outputs H, E, C.]

Page 21: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Performance on the Task of Secondary Structure Prediction

The cross validation procedure

[Figure: the protein set is split into a training set (Training set 1) and a testing set (Testing set 1).]

Page 22: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Efficiency of the Neural Network-Based Predictors on the 822 Proteins of the Testing Set

Input                           Q3(%)  SOV    Q[H]  Q[E]  Q[C]   P[H]  P[E]  P[C]   C[H]  C[E]  C[C]
Single sequence                 66.3   0.62   0.69  0.61  0.66   0.70  0.54  0.71   0.54  0.44  0.45
Multiple sequence (MaxHom)      72.4   0.69   0.75  0.65  0.75   0.77  0.64  0.73   0.64  0.54  0.53
Multiple sequence (PSI-BLAST)   73.4   0.70   0.75  0.70  0.73   0.80  0.63  0.75   0.67  0.56  0.53

Combining different networks: Q3 = 76-78%
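For reference, the per-residue indices reported here can be computed as sketched below. The definitions are the usual ones and are stated as assumptions, since the slides do not spell them out: Q3 is the fraction of correctly predicted residues, Q[s] the fraction of observed residues of class s that are predicted correctly, P[s] the fraction of predictions of class s that are correct, and C[s] the Matthews correlation coefficient. SOV (segment overlap) is not covered. The toy strings are invented.

```python
import math


def evaluate(observed, predicted, classes="HEC"):
    """Return Q3 and, for each class, the indices Q, P and C."""
    pairs = list(zip(observed, predicted))
    q3 = sum(o == p for o, p in pairs) / len(pairs)
    per_class = {}
    for s in classes:
        tp = sum(o == s and p == s for o, p in pairs)
        fn = sum(o == s and p != s for o, p in pairs)
        fp = sum(o != s and p == s for o, p in pairs)
        tn = len(pairs) - tp - fn - fp
        q = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        c = (tp * tn - fp * fn) / denom if denom else 0.0
        per_class[s] = {"Q": q, "P": prec, "C": c}
    return q3, per_class


q3, per_class = evaluate("HHHHEEEECC", "HHHCEEEECC")
print(q3, per_class["H"]["Q"], per_class["H"]["P"])  # 0.9 0.75 1.0
```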

Page 23: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Secondary Structure Prediction

From sequence:

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

To secondary structure:

EEEE..HHHHHHHHHHHH....HHHHHHHH.EEEE...........

And to the reliability of the prediction:

7997688899999988776886778999887679956889999999

Page 24: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

SERVERS

PredictProtein: Burkhard Rost (Columbia Univ.) http://cubic.bioc.columbia.edu/predictprotein/

PsiPRED: David Jones (UCL) http://bioinf.cs.ucl.ac.uk/psipred/

JPred: Geoff Barton (Dundee Univ.)

SecPRED: http://www.biocomp.unibo.it

Page 25: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Chameleon sequences

QEALEIA

1TIF: Translation Initiation Factor 3, Bacillus stearothermophilus
……GIKSKQEALEIAARRN……

1WTUA: Transcription Factor 1, Bacteriophage Spo1
……FNPQTQEALEIAPSVGV……

Page 26: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

From a set of 822 non-homologous proteins (174,192 residues) we extract:

2,452 5-mer chameleons
107 6-mer chameleons
16 7-mer chameleons
1 8-mer chameleon

2,576 couples in total

The total number of residues in chameleons is 26,044 out of 755 protein chains (~15%)
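An extraction of this kind can be sketched as a k-mer scan. The definition used here is an assumption made for illustration (the slides do not give the exact criterion): a k-mer counts as a chameleon when it occurs with one uniform secondary structure in one place and a different uniform structure in another. The chain names and structure strings in the toy data are invented.

```python
from collections import defaultdict


def find_chameleons(chains, k=5):
    """Map each chameleon k-mer to its occurrences (chain, start, state)."""
    hits = defaultdict(list)
    for name, (seq, ss) in chains.items():
        for i in range(len(seq) - k + 1):
            fragment = ss[i:i + k]
            if len(set(fragment)) == 1:           # uniform structure only
                hits[seq[i:i + k]].append((name, i, fragment[0]))
    return {kmer: occ for kmer, occ in hits.items()
            if len({state for _, _, state in occ}) > 1}


# Toy data: sequence fragments from the slides, structure strings invented.
chains = {
    "chain_1": ("GIKSKQEALEIAARRN", "CCCCCHHHHHHHCCCC"),
    "chain_2": ("FNPQTQEALEIAPSVGV", "CCCCCCCCCCCCCCCCC"),
}
print(find_chameleons(chains, k=7))
```

On a real data set the same scan would be repeated for k = 5, 6, 7 and 8.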

Page 27: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of the Secondary Structure of Chameleon sequences with Neural Networks

NGDQLGIKSKQEALEIAARRNLDLVLVAP

ARKGFNPQTQEALEIAPSVGVSVKPG

QEALEIA
HHHHHHH

QEALEIA
CCCCCCC

[Figure: the same network applied to the two sequence contexts of QEALEIA; the output shown for the central window is C.]

Page 28: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Prediction of Chameleons with Neural Networks

Page 29: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Other neural network-based predictors

• Secondary structure

• Topology of transmembrane proteins

• Cysteine bonding state

• Contact maps of proteins

• Interaction sites on protein surface

Page 30: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of the cysteine bonding state

Tryparedoxin-I from Crithidia fasciculata (1QK8)

Cysteines: Cys40, Cys43, Cys68

[Figure: 3D structure with free cysteines and disulphide bonded cysteines highlighted]

MSGLDKYLPGIEKLRRGDGEVEVKSLAGKLVFFYFSASWCPPCRGFTPQLIEFYDKFHES KNFEVVFCTWDEEEDGFAGYFAKMPWLAVPFAQSEAVQKLSKHFNVESIPTLIGVDADSG DVVTTRARATLVKDPEGEQFPWKDAP

Page 31: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A neural network-based method for predicting the disulfide connectivity in proteins

Page 32: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Protein Folding

T T C C P S I V A R S N F N V C R L P G T P E A L C A T Y T G C I I I P G A T C P G D Y A N

Page 33: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Protein Folding

RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA

Page 34: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Disulfide bonds

2 -SH -> -SS- + 2 H+ + 2 e-

S-S distance: 2.2 Å

Torsion angle C-S-S-C: 90°

Bond energy: 3 kcal/mol

[Figure: geometry of the C-S-S-C bridge]

Page 35: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Intra-chain disulfide bonds in proteins

Of 1259 proteins (a non-redundant PDB subset):

• 23% of the chains have disulfide bonds (SS)

• SS distribution (between secondary structures), in %:

       H    E    C
H      7    9   14
E          17   27
C               26

Page 36: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Intra-chain disulfide bonds in proteins

• Distribution of disulfide bonds in the SCOP domains:

Type             %
All-α           13
All-β           31
α/β             11
α+β             13
Small domains   29
Others           3

• 99% of the disulfide bonds are intra-domain

Page 37: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of the disulfide-bonding state of cysteines in proteins

Problem no. 1:

Starting from the protein sequence, can we discriminate whether a cysteine residue is disulfide-bonded?

Page 38: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Perceptron (input: sequence profile)

NGDQLGIKSKQEALCIAARRNLDLVLVAP

Outputs: bonded, non bonded

Page 39: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Plotting the trained weights

[Figure: Hinton's plots of the trained weights, one for the bonding state and one for the non-bonding state. Axes: residue (V L I M F W Y G A P S T C H R K Q E N D 0 & #) against window position (-5 to 5).]

Page 40: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Is it possible to add a syntax?

[Figure: state graph with Begin and End nodes and four states, numbered 1 to 4, grouped into bonded states and free states.]

Page 41: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A path

Residue   State   Bonding state
C40
C43
C68

[Figure: state graph (Begin, states 1 to 4, End)]

Page 42: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A path

Residue   State   Bonding state
C40       1       F
C43
C68

P(seq) = P(1 | Begin) P(C40 | 1) ...

[Figure: path traced on the state graph (Begin, states 1 to 4, End)]

Page 43: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A path

Residue   State   Bonding state
C40       1       F
C43       2       B
C68

P(seq) = P(1 | Begin) P(C40 | 1) ... P(2 | 1) P(C43 | 2) ...

[Figure: path traced on the state graph (Begin, states 1 to 4, End)]

Page 44: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A path

Residue   State   Bonding state
C40       1       F
C43       2       B
C68       4       B

P(seq) = P(1 | Begin) P(C40 | 1) ... P(2 | 1) P(C43 | 2) ... P(4 | 2) P(C68 | 4) ...

[Figure: path traced on the state graph (Begin, states 1 to 4, End)]

Page 45: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A path

Residue   State   Bonding state
C40       1       F
C43       2       B
C68       4       B

P(seq) = P(1 | Begin) P(C40 | 1) ... P(2 | 1) P(C43 | 2) ... P(4 | 2) P(C68 | 4) ... P(End | 4)

[Figure: complete path traced on the state graph (Begin, states 1 to 4, End)]

Page 46: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

4 possible paths

For the three cysteines, the state graph (Begin, states 1 to 4, End) allows four paths:

Path (states)    C40    C43    C68
1 - 1 - 1        F      F      F
1 - 2 - 4        F      B      B
2 - 4 - 1        B      B      F
2 - 3 - 4        B      F      B

[Figure: the four paths traced on the state graph]

Page 47: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

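Hybrid system: prediction of bonding state of cysteines

MYSFPNSFRFGWSQAGFQCEMSTPGSEDPNTDWYKWVHDPENMAAGLCSGDLPENGPGYWGNYKTFHDNAQKMCLKIARLNVEWSRIFPNP...

[Figure: for each cysteine-centred window W1, W2, W3 the neural network gives P(B|W1), P(F|W1); P(B|W2), P(F|W2); P(B|W3), P(F|W3). These values feed a hidden Markov model with Begin, End, free-Cys and bonded-Cys states, and the Viterbi path gives the prediction.]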

Page 48: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction for Tryparedoxin

Residue: C40, C43, C68

Page 49: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction for Tryparedoxin

           NN output
Residue    B     F     NN pred
C40        99    1     B
C43        82    18    B
C68        61    39    B

Page 50: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction for Tryparedoxin

           NN output               HMM
Residue    B     F     NN pred     Viterbi path    HMM pred
C40        99    1     B           2               B
C43        82    18    B           4               B
C68        61    39    B           1               F

[Figure: Viterbi path traced on the state graph (Begin, states 1 to 4, End)]
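The effect of the state graph on this example can be reproduced with a small Viterbi search. This is a minimal sketch under stated assumptions: the allowed transitions are inferred from the example paths shown on the slides, every allowed transition is given the same (unnormalised) weight of 1, and the network outputs are used directly as emission scores. The trained hidden neural network uses learned transition probabilities, so only the qualitative behaviour is meant to match.

```python
# States 1 and 3 label a cysteine as free (F), states 2 and 4 as bonded (B).
# Only states 1 and 4 connect to End, so every complete path contains an
# even number of bonded cysteines. (Transitions are an assumption.)
TRANSITIONS = {"Begin": [1, 2], 1: [1, 2, "End"], 2: [3, 4],
               3: [3, 4], 4: [1, 2, "End"]}
LABEL = {1: "F", 2: "B", 3: "F", 4: "B"}


def viterbi(p_bonded):
    """Best-scoring state path for a list of P(bonded) values, one per cysteine."""
    best = {"Begin": (1.0, [])}            # state -> (score, path ending there)
    for p in p_bonded:
        emit = {"B": p, "F": 1.0 - p}
        following = {}
        for state, (score, path) in best.items():
            for target in TRANSITIONS[state]:
                if target == "End":
                    continue
                candidate = score * emit[LABEL[target]]
                if target not in following or candidate > following[target][0]:
                    following[target] = (candidate, path + [target])
        best = following
    finals = [value for state, value in best.items() if "End" in TRANSITIONS[state]]
    score, path = max(finals, key=lambda value: value[0])
    return path, "".join(LABEL[s] for s in path)


# Network outputs for C40, C43, C68 from the table above.
print(viterbi([0.99, 0.82, 0.61]))  # ([2, 4, 1], 'BBF')
```

The network alone labels all three cysteines as bonded, which is impossible for intra-chain bonds; the path constraint flips the least confident one (C68) to free.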

Page 51: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Performance

Table I. Performance of the NN predictor (neural network, 20-fold cross validation)

Set   Q2     C      Q(B)   Q(F)   P(B)   P(F)   Q2prot
WD    80.4   0.56   67.2   87.5   74.3   83.2   56.9
RD    80.1   0.56   67.2   87.6   75.7   82.2   49.7

Table II. Performance of the Hidden NN predictor (hybrid system, 20-fold cross validation)

Set   Q2     C      Q(B)   Q(F)   P(B)   P(F)   Q2prot
WD    88.0   0.73   78.1   93.3   86.3   88.8   84.0
RD    87.4   0.73   78.1   92.8   86.3   88.0   80.2

B = cysteine bonding state, F = cysteine free state. WD = whole database (969 proteins, 4136 cysteines). RD = reduced database, in which the chains containing only one cysteine are removed (782 proteins, 3949 cysteines).

Martelli PL, Fariselli P, Malaguti L, Casadio R. Prediction of the disulfide bonding state of cysteines in proteins with hidden neural networks. Protein Eng. 15:951-953 (2002)

Page 52: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of the connectivity of disulfide bonds in proteins

Problem no. 2:

When the bonding state of cysteines is known, can we predict the connectivity pattern of disulfide bonds?

Page 53: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

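Prediction of disulfide connectivity in proteins

Bovine trypsin inhibitor (6PTI)

Cysteines along the sequence (N to C): 5, 14, 30, 38, 51, 55

[Figure: connectivity pattern drawn over the sequence, next to the 3D structure with the cysteines labelled 5, 14, 30, 38, 51 and 55]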

Page 54: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of disulfide connectivity in proteins as a problem of maximum-weight perfect matching

Representation: an undirected weighted graph with V = 2B vertices (the cysteines) and E = 2B(2B-1)/2 undirected edges (strength of the interaction, W)

[Figure: protein sequence (N to C) with four cysteines Cys1 to Cys4, and the complete graph on them with edge weights W12, W13, W14, W23, W24, W34]

Page 55: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

From graph theory:

• It is not necessary to compute all the possible connectivity patterns, whose number is $\prod_{i \le B} (2i-1)$

• Given a complete graph G = (2B, E), the matching with the maximum weight can be computed in $O((2B)^3)$ time with the Edmonds-Gabow algorithm*

* Gabow, H.N. (1975). Technical Report CU-CS-075-75, Dept. of Comp. Sci., Colorado University
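To make the matching formulation concrete, the sketch below enumerates every perfect matching and keeps the heaviest one. This brute-force search is only a stand-in for illustration and is not the Edmonds-Gabow algorithm; it is practical only for small B, since the number of patterns grows as 1, 3, 15, 105, 945, ... The edge weights in the example are invented.

```python
def perfect_matchings(items):
    """Yield every way of pairing up an even-sized list of items."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def best_connectivity(cysteines, weight):
    """Return the perfect matching with the largest total edge weight."""
    return max(perfect_matchings(list(cysteines)),
               key=lambda m: sum(weight[frozenset(pair)] for pair in m))


# Six bonded cysteines (B = 3), with invented weights: 0.1 everywhere
# except three favoured pairs.
cysteines = [5, 14, 30, 38, 51, 55]
weight = {frozenset((a, b)): 0.1 for a in cysteines for b in cysteines if a < b}
for pair in [(5, 55), (14, 38), (30, 51)]:
    weight[frozenset(pair)] = 0.9

print(len(list(perfect_matchings(cysteines))))  # 15
print(best_connectivity(cysteines, weight))     # [(5, 55), (14, 38), (30, 51)]
```

The count of 15 for B = 3 also gives the accuracy of a random guess, 1/15 ≈ 0.067.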

Page 56: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

How to assign the costs (W) of the edges in the graph

[Figure: the complete graph on Cys1 to Cys4 with edge weights W12, W13, W14, W23, W24, W34, shown for protein sequences (N to C) with alternative connectivity patterns]

Page 57: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Assumption: for each cysteine, all its sequence nearest neighbours make contacts

[Figure: chain (N to C) with Cys i and Cys j and their neighbours (Ni and Nj); all possible interactions between Cys i and Cys j using 1 nearest neighbour]

Page 58: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

[Figure: Frequency distribution of disulfide bonds with respect to sequence separation (726 proteins). x axis: sequence separation (0 to 450); y axis: frequency (%), 0 to 16.]

Page 59: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Neural Networks for predicting the edge values

Output (1 node): disulfide pair propensity (output = wij)

Hidden nodes: 6

Input (212 nodes = 210 + 2): each pair in the neighbours of 4 residues, plus sequence separation and number of SS bonds

Page 60: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Accuracy (Qp) of EG vs NN

Chains   B   Random   EG     NN
158      2   0.333    0.46   0.68
153      3   0.067    0.17   0.21
103      4   0.009    0.11   0.20
44       5   0.001    0.00   0.02

Page 61: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The state of the art:

•Prediction of bonding states is quite satisfactory

•Prediction of connectivity needs to be improved

Page 62: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of Foldons

Piero Fariselli

Page 63: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The Folding Problem as a Mapping Problem

Covalent structure (Nt to Ct): TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

Secondary structure: EEEE..HHHHHHHHHHHH....HHHHHHHH.EEEE...........

[Figure: 3D structure]

Page 64: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

From the PDB database we can collect some 1500 chains of known structure, from which to derive non-redundant information relating sequence to:

• secondary structure

• structural and functional motifs

• 3D structure

Page 65: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Evolutionary information

• Multiple Sequence Alignment (MSA) of similar sequences

• Sequence profile: for each position, a 20-valued vector contains the amino acid composition of the aligned sequences.

MSA:

 1  Y K D Y H S - D K K K G E L - -
 2  Y R D Y Q T - D Q K K G D L - -
 3  Y R D Y Q S - D H K K G E L - -
 4  Y R D Y V S - D H K K G E L - -
 5  Y R D Y Q F - D Q K K G S L - -
 6  Y K D Y N T - H Q K K N E S - -
 7  Y R D Y Q T - D H K K A D L - -
 8  G Y G F G - - L I K N T E T T K
 9  T K G Y G F G L I K N T E T T K
10  T K G Y G F G L I K N T E T T K

Sequence profile (one column per sequence position, values in %):

A    0   0   0   0   0   0   0   0   0   0   0  10   0   0   0   0
C    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
D    0   0  70   0   0   0   0  60   0   0   0   0  20   0   0   0
E    0   0   0   0   0   0   0   0   0   0   0   0  70   0   0   0
F    0   0   0  10   0  33   0   0   0   0   0   0   0   0   0   0
G   10   0  30   0  30   0 100   0   0   0   0  50   0   0   0   0
H    0   0   0   0  10   0   0  10  30   0   0   0   0   0   0   0
K    0  40   0   0   0   0   0   0  10 100  70   0   0   0   0 100
I    0   0   0   0   0   0   0   0  30   0   0   0   0   0   0   0
L    0   0   0   0   0   0   0  30   0   0   0   0   0   0   0   0
M    0   0   0   0   0   0   0   0   0   0   0   0   0  60   0   0
N    0   0   0   0  10   0   0   0   0   0  30  10   0   0   0   0
P    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Q    0   0   0   0  40   0   0   0  30   0   0   0   0   0   0   0
R    0  50   0   0   0   0   0   0   0   0   0   0   0   0   0   0
S    0   0   0   0   0  33   0   0   0   0   0   0  10  10   0   0
T   20   0   0   0   0  33   0   0   0   0   0  30   0  30 100   0
V    0   0   0   0  10   0   0   0   0   0   0   0   0   0   0   0
W    0  10   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Y   70   0   0  90   0   0   0   0   0   0   0   0   0   0   0   0

Page 66: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of Initiation Sites of Protein Folding

The Folding Process

[Figure: from the unfolded chain, through the early stages of folding (initiation sites), to the folded protein]

Page 67: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Frustration in proteins

• The simultaneous minimisation of all the interaction energies is impossible


Page 68: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

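The network architecture

[Figure: feed-forward network with input, hidden and output layers. The input window slides along the sequence ..ALS.......QGFLLIARQPPFTYFTV......HW..; the two outputs are α and non-α.]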

Page 69: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The prediction efficiency of the network

Q2 = 0.85    Q(H) = 0.67     Q(nonH) = 0.93     Sovpred = 0.85

C = 0.63     Pc(H) = 0.80    Pc(nonH) = 0.86    Sovobs = 0.76

Page 70: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Theoretical background

The conformation of residue R depends both on local (window W) and non-local (context C) interactions.

The convergence theorem ensures that: $O_i = \mathrm{Probability}(\mathrm{Structure}_R = i \mid W)$

If, for any i, $O_i \approx 1$, then the structure of residue R depends mainly on W and only slightly on C.

[Figure: neural network reading the window W around residue R, embedded in the context C, with outputs $O_{\alpha}$ and $O_{non\text{-}\alpha}$]

Page 71: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

[Figure: residue R, window W and context C]

• Anfinsen's hypothesis:

$P(\alpha_i \mid W, C) = \delta(\alpha_i, \alpha_{nat}(W, C))$

• Averaging over all the contexts (performed by NN):

$P(\alpha_i \mid W) = \sum_C P(\alpha_i \mid W, C)\, P(C)$

• When the pattern is self-stabilising (W dependent):

$P(\alpha_i \mid W, C) = P(\alpha_i \mid W)$

• Then Anfinsen's hypothesis can be cast in a local form:

$P(\alpha_i \mid W) = \sum_C P(\alpha_i \mid W, C)\, P(C) = \delta(\alpha_i, \alpha_{nat}(W))$

Page 72: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

[Figure: reliability index (vertical axis, 0 to 1) plotted against entropy S5 (horizontal axis, 0 to 0.7)]

Relationship between the reliability index and the Shannon entropy

Page 73: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

$S = -\sum_i O_i \log O_i$

[Figure: the network reads an INPUT window sliding along the sequence MAS.....QLMLKDFLNRTPL.........GHI and returns the outputs O_α and O_non-α]
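The entropy of the network outputs can be computed residue by residue and then smoothed along the chain. The sketch below is a minimal illustration: it assumes the two outputs are normalised to sum to 1 before the entropy is taken, and it assumes that S5 denotes a running mean over 5 residues (the deck only calls it the smoothed entropy). The function names and the output values are hypothetical.

```python
import math

def shannon_entropy(outputs):
    """S = -sum_i O_i ln O_i for one residue; outputs are normalised to sum to 1 first."""
    total = sum(outputs)
    probs = [o / total for o in outputs]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def smooth(values, width=5):
    """Running mean over `width` residues, truncated at the chain ends (assumed meaning of S5)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical (O_alpha, O_non-alpha) pairs for six consecutive residues.
network_outputs = [(0.5, 0.5), (0.7, 0.3), (0.9, 0.1), (0.99, 0.01), (0.9, 0.1), (0.6, 0.4)]
entropy = [shannon_entropy(o) for o in network_outputs]
s5 = smooth(entropy)
```

The entropy is largest (ln 2 for two classes) when the outputs are equal and approaches 0 when one output approaches 1, so low entropy marks residues whose structure the network assigns confidently from the window alone.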

Page 74: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Protein segments correctly predicted in α-helical structure

[Figure: NC / NT (%) (vertical axis, 0 to 100) as a function of entropy (1 to 9) and segment length (1 to 29)]

Entropy = Shannon entropy in (ln 2)/10 units ($S = -\sum_i o_i \ln o_i$)
NC = number of protein segments correctly predicted in α-helix
NT = total number of protein segments predicted in α-helix

Page 75: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Profile of the smoothed entropy (S5) for the hen egg lysozyme (132L)

[Figure: S5 (vertical axis, 0 to 0.7) along the protein chain (residues 1 to 121 on the horizontal axis); legend: entropy, predicted helices, extracted fragments]

Page 76: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Hen egg lysozyme (132L)

C-terminus

N-terminus

Page 77: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Frequency distribution of predicted helical segments as a function of their entropy value

[Figure: frequency (vertical axis, -0.1 to 0.2) against entropy S5 (horizontal axis, 0.0 to 0.7) for correct predictions, wrong predictions and their differences; a threshold value is marked]
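The threshold value in this plot suggests a simple selection rule: keep the stretches that are predicted as helix and whose smoothed entropy stays below the threshold. The sketch below shows one way such a rule could be written. The threshold of 0.3, the minimum length, the function name and all input values are hypothetical; the deck does not state the actual extraction procedure.

```python
def extract_low_entropy_segments(sequence, predicted_helix, s5, threshold, min_length=4):
    """Return (start, fragment) pairs where every residue is predicted helical
    and has smoothed entropy below `threshold`."""
    segments = []
    start = None
    # Iterate one step past the end so that a segment reaching the C-terminus is closed.
    for i in range(len(sequence) + 1):
        inside = i < len(sequence) and predicted_helix[i] and s5[i] < threshold
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start >= min_length:
                segments.append((start, sequence[start:i]))
            start = None
    return segments

# Hypothetical inputs for illustration.
sequence = "ACDEFGHIKLMNPQ"
predicted_helix = [False] * 4 + [True] * 8 + [False] * 2
s5 = [0.6, 0.6, 0.5, 0.4, 0.25, 0.2, 0.2, 0.2, 0.25, 0.28, 0.4, 0.5, 0.6, 0.6]
segments = extract_low_entropy_segments(sequence, predicted_helix, s5, threshold=0.3)
```

With these inputs, residues 10 and 11 are predicted helical but lie above the threshold, so they are left out of the extracted fragment.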

Page 78: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

An example of the database of minimally frustrated protein fragments

http://www.biocomp.unibo.it/DB/

Page 79: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Training set from PDB

Number of proteins   Number of amino acids   Number of α-helices   Average length
822                  174191                  4783                  11.6

Database of minimally frustrated α-helical segments

Number of proteins   Number of amino acids   Number of α-helical segments   Average length
626                  21553                   3000                           7.2

Page 80: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Comparison of minimally frustrated segments with putative folding initiation sites experimentally determined

*Not yet experimentally detected

Page 81: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Comparison of minimally frustrated segments with peptides extracted from proteins

Code*   Peptides*                    % Helix in solution*  Entropy (S5)  Extracted segment
3FXC    TYKVTELINEAEGINETIDCDD       1                     ####          ####
3LZM    GFTNSLRMLQQKRWDEAVNLAKS      10                    0.262         WDEAVNL
"                                    10                    0.329         LRMLQQK
3LZM-2  GVAGFTNSLRMLQQKRWDEAAVNLAKS  12                    0.203         SLRMLQ
"                                    12                    0.210         DEAAVNL
CIII    ESLLERITRKLRDGWKRLIDIL       8                     0.171         LLERIT
"                                    8                     0.260         WKRLID
CIII-L  ESLLERITRKL                  15                    0.171         LLERIT
CIII-R  RDGWKRLIDIL                  4                     0.260         WKRLID
CIII-M  RITRKLRDGWK                  2                     ####          ####
Sigma   KVATTKAQRKLFFNLRKTKQRL       9                     0.218         TKAQRK
COMA1   DHPAVMEGTKTILETDSNLS         4                     ####          ####
COMA2   EPSEQFIKQHDFSSY              3                     ####          ####
COMA3   VNGMELSKQILQENPH             6                     0.189         LSKQILQ
COMA4   EVEDYFEEAIRAGLH              20                    0.020         YFEEAIR
COMA5   KEKITQYIYHVLNGEIL            3                     ####          ####
ARA1    AVGKSNLLSRYARNEFSA           2                     ####          ####
ARA2    RFRAVTSAYYRGAVG              3                     ####          ####
ARA3    TRRTTFESVGRWLDELKIHSD        7.5                   0.194         SVGRWL
ARA4    AVSVEEGKALAEEEGLF            4                     ####          ####
ARA5    STNVKTAFEMVILDIYNNV          3                     ####          ####
G1      DTYKLILNGKTLKGETTTEA         2                     ####          ####
G2      GDAATAEKVFKKIANDNGVD         4                     ####          ####
G3      GEWTYDDATKTFTVTE             2                     ####          ####

* Muñoz and Serrano, 1994.

Page 82: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Minimally frustrated α-helical segments are useful for determining:

• Folding initiation sites

• α-helix stability

• de novo design of α-helices

Page 83: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Structure prediction of membrane proteins

Page 84: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS
Page 85: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Inner membrane proteins (all-α transmembrane proteins)

Outer membrane proteins (all-β transmembrane proteins)

Page 86: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Outer membrane: porin (Rhodobacter capsulatus), β-barrel

Inner membrane: bacteriorhodopsin (Halobacterium salinarum), α-helices

[Figure: the two structures shown within the bilayer]

Page 87: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Predictors of the Topology of Membrane Proteins

Topography: position of transmembrane segments along the sequence

Topology: position of N and C termini with respect to the bilayer

[Figure: a membrane protein crossing the lipid bilayer, with the N and C termini, the Out and In sides and "+" marks indicated]

Page 88: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of transmembrane segments

Page 89: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Neural network for the prediction of TMS in β-barrel membrane proteins (Jacoboni et al., 2001)

[Figure: a sequence profile (rows of percentage values) is presented to the network input]

Window: 9 residues

5 hidden neurons

2 output neurons: TM, nonTM
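The architecture on this slide (a 9-residue window, 5 hidden neurons, 2 output neurons) can be written down as a small forward pass. The sketch below assumes a fully connected feed-forward network with sigmoid units, 20 profile values per window position scaled to the range 0 to 1, and zero biases. The weights are random placeholders, not the trained parameters of the published predictor, and all names are hypothetical.

```python
import math
import random

WINDOW, ALPHABET, HIDDEN, OUTPUTS = 9, 20, 5, 2

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
# Placeholder weights; a real predictor would learn these from annotated proteins.
w_hidden = random_matrix(HIDDEN, WINDOW * ALPHABET, rng)
b_hidden = [0.0] * HIDDEN
w_output = random_matrix(OUTPUTS, HIDDEN, rng)
b_output = [0.0] * OUTPUTS

def predict(profile_window):
    """profile_window: 9 rows of 20 percentages (0-100), centred on the residue to classify."""
    inputs = [value / 100.0 for row in profile_window for value in row]
    hidden = layer(inputs, w_hidden, b_hidden)
    o_tm, o_non_tm = layer(hidden, w_output, b_output)
    return o_tm, o_non_tm

# Hypothetical window: every position fully conserved for the first residue type.
window = [[100] + [0] * 19 for _ in range(WINDOW)]
o_tm, o_non_tm = predict(window)
```

Sliding the window one residue at a time along the profile gives one (TM, nonTM) output pair per residue.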

Page 90: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A generic model for membrane proteins (TMHMM)

[Figure: HMM state diagram with Begin, Inner Side, Transmembrane, Outer Side and End states]

Page 91: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Sequence-profile-based HMM

Sequence of characters $c_t$:

..A C L P R P E T ...

Sequence of A-dimensional vectors $s_t$ (for proteins A = 20), one row per position t:

 0 85  0  0  5  0  0  0  0  2  0  8  0  0  0  0  0  0  0  0
 0  0  0  0  4  0 13  0  4  0  5  0  6  0  0 23  0  1 44  0
 0  0 22  0 23  0  0  5  0 23  0  3  0 11  0  0  2  0 11  0
 0 34  0  0  0 24  0  0  0  0  0  2  0 22  0 18  0  0  0  0
 8  0  0  0  0  0  0  0  0  0  0  0 92  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 77  0 23
 3  0  2  7  4  0  8  6  1  3  6  5  5 12  5  6 17  2  2  6

Example vector $s_t$:

90  0  0  0  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0

Constraints (S = 100):

$$0 \le s_t(n) \le S \quad \forall\, t, n$$

$$\sum_{n=1}^{A} s_t(n) = S \quad \forall\, t$$

Martelli et al., Bioinformatics 18, S46-53, 2002
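A profile that satisfies these constraints can be obtained from a multiple sequence alignment by counting residues column by column. The sketch below is a minimal illustration under simplifying assumptions: plain frequency counting, gaps ignored, no sequence weighting or pseudocounts. The residue ordering, the function name and the toy alignment are hypothetical and are not taken from the cited paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # A = 20 symbols (ordering is an assumption)
S = 100

def profile_from_alignment(aligned_sequences):
    """Build one A-dimensional vector s_t per alignment column; gaps ('-') are ignored."""
    profile = []
    for column in zip(*aligned_sequences):
        residues = [c for c in column if c in AMINO_ACIDS]
        if not residues:
            raise ValueError("column contains no residues")
        vector = [S * residues.count(a) / len(residues) for a in AMINO_ACIDS]
        profile.append(vector)
    return profile

# Hypothetical toy alignment of four sequences.
alignment = ["ACLP", "ACIP", "SCLP", "A-LP"]
profile = profile_from_alignment(alignment)
```

Each vector has non-negative entries that sum to S, so a single unaligned sequence is the special case in which one entry per position equals S and all the others are 0.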

Page 92: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The new algorithms make it possible:

• to feed HMMs with sequence profiles

• to eventually couple NNs and HMMs (Hidden Neural Networks)

Advantages:

• Higher performance than standard HMMs

• Increased discrimination capability of a given class

Martelli et al., Bioinformatics, 2002
Martelli et al., Protein Eng., 2002

Page 93: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of the Topology of α-Transmembrane Proteins

Topography: position of transmembrane helices along the sequence

Topology: position of N and C termini with respect to the bilayer

[Figure: a helical membrane protein crossing the bilayer, with the N and C termini, the Out and In sides and "+" marks indicated]

The prediction accuracy of topography is 92 %

The prediction accuracy of topology is 81 %

Page 94: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Prediction of the Topology of β-Transmembrane Proteins

Topography: position of transmembrane strands along the sequence

Topology: position of N and C termini with respect to the bilayer

[Figure: a β-barrel membrane protein crossing the bilayer, with the N and C termini, the LPS (Out) and periplasmic (In) sides and "+" marks indicated]

The prediction accuracy of topography is 73 %

The prediction accuracy of topology is 73 %

Page 95: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

The discriminative capability of the HMM model

$I(s \mid M) = -\frac{1}{L} \log P(s \mid M)$

[Figure: percentage (vertical axis, 0 to 100) against horizontal-axis values from 2.75 to 2.95, with separate curves for outer membrane, globular and inner membrane proteins]
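The score I(s | M) is the negative log-likelihood of a sequence s of length L under the model M, divided by L. The sketch below computes it with the scaled forward algorithm for a deliberately tiny HMM. The two-state model, the two-letter alphabet, all probabilities and the use of the natural logarithm are assumptions made for illustration; they are not the model described in the deck.

```python
import math

def log_likelihood(sequence, states, start, transition, emission):
    """log P(s | M) by the scaled forward algorithm."""
    forward = {k: start[k] * emission[k][sequence[0]] for k in states}
    log_p = 0.0
    for t in range(len(sequence)):
        if t > 0:
            forward = {
                k: emission[k][sequence[t]] * sum(forward[j] * transition[j][k] for j in states)
                for k in states
            }
        # Rescale at every step to avoid underflow; the scale factors multiply to P(s | M).
        scale = sum(forward.values())
        log_p += math.log(scale)
        forward = {k: v / scale for k, v in forward.items()}
    return log_p

def per_residue_score(sequence, *model):
    """I(s | M) = -(1/L) log P(s | M)."""
    return -log_likelihood(sequence, *model) / len(sequence)

# Hypothetical two-state toy model over a two-letter alphabet (h = hydrophobic, p = polar).
states = ("membrane", "loop")
start = {"membrane": 0.5, "loop": 0.5}
transition = {"membrane": {"membrane": 0.9, "loop": 0.1},
              "loop": {"membrane": 0.1, "loop": 0.9}}
emission = {"membrane": {"h": 0.8, "p": 0.2},
            "loop": {"h": 0.3, "p": 0.7}}
model = (states, start, transition, emission)

score_blocky = per_residue_score("hhhhhhpppppp", *model)
score_mixed = per_residue_score("hphphphphphp", *model)
```

A sequence that the model explains well receives a lower I(s | M). In this toy case the sequence with one hydrophobic block followed by one polar block scores lower than the alternating one, because the model favours staying in the same state.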

Page 96: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

An application: modeling the 3D structure of eukaryotic barrel proteins

Page 97: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

3D structure prediction of proteins

[Figure: prediction methods arranged along a Homology (%) axis from 0 to 100: ab initio prediction (new folds), threading/fold recognition and building by homology (existing folds); the region occupied by membrane proteins is marked]

Page 98: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

2omf_.seq/ AEIYNKDGNK VDLYGKAVGL HYFSKGNGEN SYGGNGDMTY ARLGFKGETQ
2omf_.str/ CCCCCCCCEE EEEEEEEEEE EEECCCCCCC CCCCCCCCCE EEEEEEEEEE
protx.str/ *******CCC CCCCEEEEEE EEEC****** ********CE EEEEEEEECC
protx.seq/ *******KGY NFGLWKLDLK TKTS****** ********SG IEFNTAGHSN

2omf_.seq/ I*NSDLTGYG QWEYNFQGNN SEGADAQTGN KTRLAFAGLK YADVGSFDYG
2omf_.str/ C*CCCEEEEE EEEEEEECCC CCCCCCCCCC EEEEEEEEEE ECCCEEEEEE
protx.str/ CCCCCEEEEE EEEEEEC*** ********** EEEEEEEEEC CCCCCEEEEE
protx.seq/ QESGKVFGSL ETKYKVK*** ********** DYGLTLTEKW NTDNTLFTEV

2omf_.seq/ RNYGVVYDAL GYTDMLPEFG GDTAYSDDFF VGRVGGVATY RNSNFFGLVD
2omf_.str/ ECCCCCCCCC CCCCCCCCCC CCCCCCCCCC CCCCCCEEEE EECCCCCCCC
protx.str/ EEEECC**** ********** ********** **CCEEEEEE EEECCCCCCC
protx.seq/ AVQDQL**** ********** ********** **LEGLKLSL EGNFAPQSGN

2omf_.seq/ GLNFAVQYLG KNER****** *********D TARRSNGDGV GGSISYEYE*
2omf_.str/ CEEEEEEEEC CCCC****** *********C CCCCCCCCEE EEEEEEEEC*
protx.str/ EEEEEEEEEE EEEECCCCCC CCCCCCCEEE EEEEEEEEEE EEEEEEECCC
protx.seq/ KNGKFKVAYG HENVKADSDV NIDLKGPLIN ASAVLGYQGW LAGYQTAFDT

2omf_.seq/ **GFGIVGAY GAADRTNLQE AQPLGNGKKA EQWATGLKYD ANNIYLAANY
2omf_.str/ **CEEEEEEE EEEECCCCCC CCCCCCCCEE EEEEEEEEEE ECCEEEEEEE
protx.str/ CCEEEEEEEE EEEEEEEEEE EEECCCCCCC EEEEEEEEEE CEEEEEEEEE
protx.seq/ QQSKLTTNNF ALGYTTKDFV LHTAVNDGQE FSGSIFQRTS DKLDVGVQLS

2omf_.seq/ GETRNATPIT NKFTNTSGFA NKTQDVLLVA QYQFDFGLRP SIAYTKSKAK
2omf_.str/ EEEECCCCCC CCCCCCCCCC CEEEEEEEEE EEECCCCEEE EEEEEEEEEE
protx.str/ EEECC***** ********** *CCCEEEEEE EEECCCCEEE EEEEEEC***
protx.seq/ WASGT***** ********** *SNTKFAIGA KYQLDDDARV RAKVNNA***

2omf_.seq/ DVEGIGDVDL VNYFEVGATY YFNKNMSTYV DYIINQIDSD NKLGVGSDDT
2omf_.str/ CCCCCCCEEE EEEEEEEEEE ECCCCEEEEE EEEEECCCCC CCCCCCCCCE
protx.str/ *********E EEEEEEEEEE EC***EEEEE EEEEECCC** *****CCCCE
protx.seq/ *********S QVGLGYQQKL RT***GVTLT LSTLVDGK** *****NFNAG

2omf_.seq/ VAVGIVYQF* ***
2omf_.str/ EEEEEEEEE* ***
protx.str/ EEEEEEEEEE EC*
protx.seq/ GHKIGVGLEL EA*

Structural alignment of VDAC with the template

Page 99: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A low-resolution 3D model of VDAC (the sequence from Neurospora crassa)

Casa

Page 100: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

A low-resolution 3D model of VDAC: location of mutated residues

Casadio et al., FEBS Lett 520:1-7 (2002)

Page 101: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Predictors of membrane protein structures can be used to filter genomes and find new membrane proteins without sequence homologues

Page 102: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

FISHING NEW OUTER MEMBRANE PROTEINS IN GRAM-NEGATIVE BACTERIA

Page 103: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

MRAKLLGIVLTTPIAISSFASTETLSFTPDNINADISLGTLSGKTKERVYLAEEGGRKVSQLDWKFNNAAIIKGAINWDLMPQISIGAAGWTTLGSRGGNMVDQDWMDSSNPGTWTDESRHPDTQLNYANEFDLNIKGWLLNEPNYRLGLMAGYQESRYSFTARGGSYIYSSEEGFRDDIGSFPNGERAIGYKQRFKMPYIGLTGSYRYEDFELGGTFKYSGWVESSDNDEHYDPGKRITYRSKVKDQNYYSVAVNAGYYVTPNAKVYVEGAWNRVTNKKGNTSLYDHNNNTSDYSKNGAGIENYNFITTAGLKYTF

Signal peptides in protein sequences:

Sequences of outer membrane proteins have signal peptides:

the secretion marker is also a marker of outer membrane proteins

Proteins have intrinsic signals that govern their transport and localization in the cell: a hydrophobic secretion marker (or signal peptide)

Page 104: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Signal Peptide prediction

MKLLQRGVALALLTTFTLASETALAYEQDKTYKITVLHTNDHHGHF

[Figure: the sequence is divided at the cleavage site into the signal peptide and the mature protein]

Page 105: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

MKLLQRGVALALLTTFTLASETALAYEQDKTYKITVLHTNDHHGHF

2 Neural Networks

SignalNet: predicts if a given residue position belongs to the signal peptide

CleavageNet: predicts if a given residue position is the cleavage site

Page 106: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

SignalNet Accuracy

Organism        Window    C      Q2
Eukaryotes      15-1-15   0.83   0.95
Gram positive   15-1-15   0.79   0.92
Gram negative   11-1-11   0.78   0.92

Page 107: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

CleavageNet Accuracy

Organism        Window    C      Q2
Eukaryotes      15-1-2    0.61   0.97
Gram positive   20-1-3    0.56   0.96
Gram negative   11-1-2    0.62   0.96
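The Window column can be read as "residues before, central residue, residues after", so that 15-1-2 would mean 15 residues upstream of the tested position and 2 downstream. That reading is an assumption, as is the padding of the chain ends; the function name is hypothetical. Under those assumptions, the input windows could be cut out as follows.

```python
def residue_windows(sequence, before, after, pad="-"):
    """Yield (position, window) with `before` residues upstream of the central
    residue and `after` residues downstream; chain ends are padded with `pad`."""
    padded = pad * before + sequence + pad * after
    for i in range(len(sequence)):
        yield i, padded[i: i + before + 1 + after]

sequence = "MKLLQRGVALALLTTFTLASETALAYEQDKTYKITVLHTNDHHGHF"
symmetric = dict(residue_windows(sequence, before=15, after=15))   # read as 15-1-15
asymmetric = dict(residue_windows(sequence, before=15, after=2))   # read as 15-1-2
```

In every window the tested residue sits at index `before`, so an asymmetric window shows the network more of the region upstream of a candidate cleavage site than of the region downstream.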

Page 108: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Comparison with SignalP

Organism                 SignalP   SPEP
Eukaryotes (+)           0.99      0.97
Eukaryotes (-)           0.85      0.94
Prokaryotes (+)          0.99      0.97
Prokaryotes (-)          0.93      0.96
Escherichia coli (+/-)   0.95      0.96

Page 109: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Performance of SignalNN on 2160 annotated proteins

                             Prediction
Annotation        With signal   Without signal   Total
With signal       205           45               250
Without signal    55            1855             1910
Total             260           1900             2160

Correct predictions: 205 and 1855
Wrong predictions: 55 and 45

Q2 = 96 %

Qsignal = 82 %    Qnon-signal = 97 %

Psignal = 78 %    Pnon-signal = 98 %

Page 110: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

Predictors of Membrane Topography: Rate of false positives

The predictors are tested on 809 globular proteins with sequence identity ≤ 25 %:

0.5 % have at least 1 α-TM helix predicted

5.6 % have at least 2 β-TM strands predicted

Page 111: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

HUNTER

[Flowchart: each protein of the PROTEOME is first tested for a signal peptide (Yes/No) and then by the all-β TM and all-α TM predictors (Yes/No); the possible outcomes are all-β TM, all-α TM and Globular]
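The decision flow can be sketched as a short function. The branch order is only partly recoverable from the slide, so the version below assumes the following: a protein with a signal peptide is tested first for all-β TM topology (outer membrane proteins carry signal peptides), any protein not classified that way is tested for all-α TM topology, and the rest are called globular. The function names, arguments and example entries are hypothetical and do not reflect the actual Hunter implementation.

```python
def hunter_class(has_signal_peptide, predicted_all_beta_tm, predicted_all_alpha_tm):
    """Assign one protein to a class from three yes/no predictor outcomes (assumed branch order)."""
    if has_signal_peptide and predicted_all_beta_tm:
        return "all-beta TM"
    if predicted_all_alpha_tm:
        return "all-alpha TM"
    return "globular"

def classify_proteome(predictions):
    """Count proteins per class; `predictions` maps a protein name to its three outcomes."""
    counts = {}
    for flags in predictions.values():
        label = hunter_class(*flags)
        counts[label] = counts.get(label, 0) + 1
    return counts

# Hypothetical predictor outcomes: (signal peptide, all-beta TM, all-alpha TM).
proteome = {
    "protein_1": (True, True, False),
    "protein_2": (True, False, True),
    "protein_3": (False, False, True),
    "protein_4": (False, True, False),   # beta prediction without signal peptide is not accepted
    "protein_5": (True, False, False),
}
counts = classify_proteome(proteome)
```

Requiring the signal peptide before accepting an all-β TM prediction acts as a filter on the β predictor's false positives among globular proteins.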

Page 112: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS

* the number of new proteins predicted in the class with Hunter, out of the non-annotated region

Predicting globular, inner and outer membrane proteins in genomes of Gram-negative bacteria with Hunter

Page 113: Prediction of structural and functional features in proteins starting from the residue sequence INTRODUCTION TO NEURAL NETWORKS