Artificial Neural Networks
Thomas Nordahl Petersen & Morten Nielsen
Use of artificial neural networks
• A data-driven method to predict a feature, given a set of training data
• In biology, input features could be amino acid or nucleotide sequences
• Secondary structure prediction
• Signal peptide prediction
• Surface accessibility
• Propeptide prediction
[Figure: protein precursor from N- to C-terminus: signal peptide, propeptide, mature/active protein]
Neural network prediction methods: http://www.cbs.dtu.dk/services/
Pattern recognition
Biological Neural network
Biological neuron structure
[Figure labels: synapse, neuron, terminal, several connections]
Diversity of interactions in a network enables complex calculations
• Similar in biological and artificial systems
• Excitatory (+) and inhibitory (-) relations between compute units
[Figure: unit output ("fire"), axis from 0 to 1]
Transfer of biological principles to artificial neural network algorithms
• Non-linear relation between input and output
• Massively parallel information processing
• Data-driven construction of algorithms
• Ability to generalize to new data items
Sparse encoding of amino acid sequence windows
Sparse encoding
AAcid \ Inp Neuron   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
A                    1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R                    0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
N                    0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D                    0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C                    0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q                    0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E                    0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
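The sparse encoding in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the residue order beyond A, R, N, D, C, Q, E is assumed to follow the usual BLOSUM column order (G, H, I, L, K, M, F, P, S, T, W, Y, V).

```python
# Assumed residue order: the first seven letters come from the table,
# the rest follow the usual BLOSUM column order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def sparse_encode(residue):
    """Return a 20-element vector with a single 1 at the residue's position."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1
    return vec


def sparse_encode_window(window):
    """Concatenate the sparse vectors of all residues in a sequence window."""
    encoded = []
    for residue in window:
        encoded.extend(sparse_encode(residue))
    return encoded


print(sparse_encode("A"))
print(len(sparse_encode_window("IKEEHVI")))
```

A window of 7 residues therefore becomes 7 x 20 = 140 input neurons, each set to 0 or 1.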
BLOSUM encoding (Blosum50 matrix)
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
Sequence encoding (continued)
• Sparse encoding
– V: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
– L: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
– V.L = 0 (unrelated)
• Blosum encoding
– V: 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
– L:-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1
– V.L = 0.88 (highly related)
– V.R = -0.08 (close to unrelated)
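The comparison above can be checked with a short script. This is a sketch under one assumption: that "V.L" denotes a normalized dot product (cosine similarity) of the two encoding vectors; the slides do not state the normalization. With the matrix rows as printed here, the script gives roughly 0.87 for V and L and roughly -0.10 for V and R, close to the quoted values.

```python
import math

# Rows for V, L and R copied from the substitution matrix shown above.
V = [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4]
L = [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1]
R = [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product normalized by the vector lengths (an assumed normalization)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Sparse vectors for V (position 20) and L (position 11) never overlap.
sparse_V = [0] * 19 + [1]
sparse_L = [0] * 10 + [1] + [0] * 9

print("sparse V.L =", dot(sparse_V, sparse_L))
print("Blosum V.L =", round(cosine(V, L), 2))
print("Blosum V.R =", round(cosine(V, R), 2))
```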
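[Figure: feed-forward network with input units I1, I2, I3, hidden units h1, h2 and output unit O1; weights w1,1, w1,2, w3,1 connect input units to hidden units, and weights v1,1, v2,1 connect hidden units to the output]
Sigmoidal or logistic function: $\sigma(x) = 1 / (1 + e^{-x})$
$o = H_1 v_{1,1} + H_2 v_{2,1}$, $O_1 = \sigma(o)$
$\mathrm{Error} = O - \mathrm{True}$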
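The forward pass through the small network above (three inputs, two hidden units, one output) can be sketched as follows. The weight values are made up for illustration; applying the logistic function in the hidden layer and leaving out bias terms are assumptions, since the slides only give the output formula o = H1*v1,1 + H2*v2,1.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w, v):
    """Forward pass; w[i][j] links input i to hidden unit j, v[j] links hidden unit j to the output."""
    hidden = [
        sigmoid(sum(inputs[i] * w[i][j] for i in range(len(inputs))))
        for j in range(len(v))
    ]
    o = sum(h * vj for h, vj in zip(hidden, v))  # o = H1*v1,1 + H2*v2,1
    return sigmoid(o), hidden


# Illustrative values only (not taken from the presentation).
inputs = [1.0, 0.0, 1.0]
w = [[0.5, -0.5], [0.3, 0.8], [-0.2, 0.1]]
v = [1.0, -1.0]

output, hidden = forward(inputs, w, v)
error = output - 1.0  # Error = O - True, with a true value of 1
print(output, error)
```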
Training and error reduction
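One common way to reduce the error during training is gradient descent with back-propagation. The slides do not spell out the training procedure, so the sketch below is an assumption: it uses a squared-error function, a 3-2-1 network without bias terms, a made-up three-example data set and an arbitrary learning rate.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w, v):
    """Return hidden activations and network output for one example."""
    hidden = [sigmoid(sum(x[i] * w[i][j] for i in range(len(x)))) for j in range(len(v))]
    out = sigmoid(sum(h * vj for h, vj in zip(hidden, v)))
    return hidden, out


def total_error(data, w, v):
    """Sum of squared errors 0.5 * (O - True)^2 over the data set."""
    return sum(0.5 * (forward(x, w, v)[1] - t) ** 2 for x, t in data)


def train(data, w, v, rate=0.5, epochs=2000):
    """Update the weights in place after each example (online gradient descent)."""
    for _ in range(epochs):
        for x, t in data:
            hidden, out = forward(x, w, v)
            delta_out = (out - t) * out * (1.0 - out)
            # Hidden-layer deltas are computed before v is changed.
            delta_hid = [delta_out * v[j] * hidden[j] * (1.0 - hidden[j]) for j in range(len(v))]
            for j in range(len(v)):
                v[j] -= rate * delta_out * hidden[j]
                for i in range(len(x)):
                    w[i][j] -= rate * delta_hid[j] * x[i]


# Toy data: three sparse inputs with target values 1, 0, 1 (illustrative only).
data = [([1, 0, 0], 1.0), ([0, 1, 0], 0.0), ([0, 0, 1], 1.0)]

random.seed(1)
w = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
v = [random.uniform(-0.5, 0.5) for _ in range(2)]

error_before = total_error(data, w, v)
train(data, w, v)
error_after = total_error(data, w, v)
print(error_before, error_after)
```

The printed error after training should be lower than before; how low it gets depends on the learning rate, the number of epochs and the starting weights.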
Size matters
Secondary Structure Elements
[Figure labels: β-strand, helix, turn, bend]
Neural Network Architecture
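[Figure: a window slides along the sequence IKEEHVIIQAEFYLNPDQSGEF…..; the residues in the window feed the input layer, weights connect the input layer to the hidden layer and the hidden layer to the output layer, and the output layer has the units H, E and C]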
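The window idea in this architecture can be sketched as below: one window is cut out per residue, and the network predicts the class (H, E or C) of the central residue. The window size of 7 and the padding symbol "X" for the sequence ends are assumptions made for this example, and the function name is hypothetical.

```python
def sequence_windows(sequence, window_size=7, pad="X"):
    """Return one window per residue, centered on that residue.

    The window size must be odd; positions beyond the sequence ends
    are filled with the (assumed) padding symbol.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


windows = sequence_windows("IKEEHVIIQAEFYLNPDQSGEF")
for center, window in zip("IKEEHVIIQAEFYLNPDQSGEF", windows[:5]):
    print(center, window)
```

Each window would then be encoded (sparse or Blosum) and presented to the input layer.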
Predictions and reliability of a prediction
• Normally the best prediction is obtained by averaging results from several predictions - “wisdom of the crowd”
• Two types of neural networks
– Prediction of features in classes/bins, e.g. H, E or C (1,0,0): values close to 1 or 0 are more reliable than values close to 1/2
– Prediction of real values, e.g. surface accessibility (0.43): reliability of a prediction is more difficult to estimate
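Averaging several predictions and judging the reliability of a class output can be sketched as follows. The output values are invented, and the reliability measure (distance of the output from 1/2, scaled to the range 0 to 1) is only one simple choice assumed here, not necessarily the one used by the authors.

```python
def average_predictions(predictions):
    """Average the output vectors of several networks, position by position."""
    n = len(predictions)
    return [sum(p[k] for p in predictions) / n for k in range(len(predictions[0]))]


def predicted_class(scores, labels="HEC"):
    """Return the label of the output unit with the highest score."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return labels[best]


def reliability(score):
    """Assumed measure: 0 for an output of 1/2, 1 for an output of exactly 0 or 1."""
    return abs(score - 0.5) * 2.0


# Made-up outputs of three networks for one residue, in the order H, E, C.
network_outputs = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
]

mean = average_predictions(network_outputs)
print(predicted_class(mean), [round(s, 2) for s in mean])
print(reliability(0.95), reliability(0.55))
```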
Eukaryotic SP & TM
Signal peptide cleavage: 1523 seq
C-terminal end of TM-regions: 669 seq
Signal peptide prediction
[Figure panels: signal peptide likeness, cleavage site, combined information]
Propeptide prediction
Many secretory proteins and peptides are synthesized as inactive precursors that in addition to signal peptide cleavage undergo post-translational processing to become biologically active polypeptides. Precursors are usually cleaved at sites composed of single or paired basic amino acid residues by members of the subtilisin/kexin-like proprotein convertase (PC) family. In mammals, seven members have been identified, with furin being the one first discovered and best characterized. Recently, the involvement of furin in diseases ranging from Alzheimer's disease and cancer to anthrax and Ebola fever has created additional focus on proprotein processing. We have developed a method for prediction of cleavage sites for PCs based on artificial neural networks. Two different types of neural networks have been constructed: a furin-specific network based on experimental results derived from the literature, and a general PC-specific network trained on data from the Swiss-Prot protein database. The method predicts cleavage sites in independent sequences with a sensitivity of 95% for the furin neural network and 62% for the general PC network.
Protein Engineering, Design and Selection: 17: 107-112, 2004.
General cleavage: R/K-Xn-R/K, n = 0, 2, 4, 6
Furin cleavage: R-X-R/K-R
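The two motifs above can be located in a sequence with a simple scan. This only finds motif matches and is not the neural network method described in the abstract; the function names and test sequence are hypothetical, and reporting the cut directly after the last basic residue of the motif is an assumption of this sketch.

```python
import re

BASIC = "RK"
FURIN_MOTIF = re.compile(r"(?=R.[RK]R)")  # lookahead so overlapping motifs are all found


def general_pc_sites(sequence):
    """Positions after an R/K-Xn-R/K motif with n = 0, 2, 4 or 6.

    A position p means the cut falls after the first p residues.
    """
    sites = set()
    for i, residue in enumerate(sequence):
        if residue not in BASIC:
            continue
        for n in (0, 2, 4, 6):
            j = i + n + 1  # index of the second basic residue
            if j < len(sequence) and sequence[j] in BASIC:
                sites.add(j + 1)
    return sorted(sites)


def furin_sites(sequence):
    """Positions after an R-X-R/K-R motif (cut after the first p residues)."""
    return [m.start() + 4 for m in FURIN_MOTIF.finditer(sequence)]


seq = "AARAKRSS"  # made-up sequence containing R-A-K-R
print(general_pc_sites(seq))
print(furin_sites(seq))
```

Every furin motif is also a general motif with n = 0, since it ends in two basic residues.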