7/29/2019 NeuralNetworksForSecStructure (1)
Artificial Neural Networks for Secondary Structure Prediction
Artificial Neural Networks
A problem-solving paradigm modeled after the
physiological functioning of the human brain.
Neurons in the brain are modeled by computational nodes, and synapses by the weighted connections between them.
The firing of a neuron is modeled by input, output, and threshold functions.
The network learns based on problems to which
answers are known (in supervised learning).
The network can then produce answers to entirely
new problems of the same type.
Applications of
Artificial Neural Networks
speech recognition
medical diagnosis
image compression
financial prediction
Existing Neural Network Systems for
Secondary Structure Prediction
First systems were about 62% accurate.
Newer ones are about 70% accurate when they take advantage of information from multiple sequence alignment.
PHD
NNPREDICT
Applications in Bioinformatics
Translational initiation sites and promoter sites in E. coli
Splice junctions
Specific structural features in proteins, such as α-helical transmembrane domains
Neural Networks Applied to
Secondary Structure Prediction
Create a neural network (a computer program)
Train it using proteins with known secondary structure.
Then give it new proteins of unknown structure and use the neural network to predict their structure.
Check whether the prediction for a series of residues makes sense from a biological point of view; e.g., you need at least 4 amino acids in a row for an α-helix.
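As a sketch of such a biological sanity check, the hypothetical function below relabels any predicted helix run shorter than four residues as coil. It assumes predictions are strings of one-letter labels (H = helix, E = strand, C = coil); both the label letters and the choice to turn short helices into coil are assumptions for illustration, not part of the method described here.

```python
from itertools import groupby


def enforce_min_helix(pred: str, min_len: int = 4) -> str:
    """Relabel helix runs ('H') shorter than min_len as coil ('C').

    Hypothetical post-processing step; the H/E/C labels are an assumed convention.
    """
    out = []
    for label, run in groupby(pred):        # consecutive identical labels form one run
        length = len(list(run))
        if label == "H" and length < min_len:
            label = "C"                       # too short to be a plausible alpha-helix
        out.append(label * length)
    return "".join(out)


print(enforce_min_helix("CHHCHHHHC"))  # CCCCHHHHC
```

The two-residue helix is discarded while the four-residue one is kept.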
Example Neural Network
[Figure from Bioinformatics by David W. Mount, p. 453. Labels: "Training pattern"; "One of n inputs, each with 21 bits".]
Inputs to the Network
Both the residues and target classes are encoded in unary format, for example:
Alanine: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cysteine: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Helix: 1 0 0
Each pattern presented to the network requires n 21-bit inputs for a window of size n. (One bit per residue is needed to indicate when the window overlaps the end of the chain.)
The advantage of this sparse encoding scheme is that it does not impose an artificial ordering on the amino acids.
The main disadvantage is that it requires a lot of input.
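A minimal sketch of this unary encoding follows. The bit order (alphabetical one-letter codes, which happens to put alanine first and cysteine second as in the example), the use of the 21st bit for "past the end of the chain", and the strand/coil target codes are assumptions; only "Helix: 1 0 0" comes from the slide.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed bit order: alphabetical one-letter codes (A = Ala, C = Cys, ...)
N_BITS = 21                           # 20 amino-acid bits + 1 "beyond the chain end" bit
TARGETS = {"H": (1, 0, 0), "E": (0, 1, 0), "C": (0, 0, 1)}  # helix, strand, coil (strand/coil order assumed)


def encode_residue(aa):
    """Unary 21-bit code for one window position; aa=None means past a chain end."""
    bits = [0] * N_BITS
    index = N_BITS - 1 if aa is None else AMINO_ACIDS.index(aa)  # ValueError for non-standard codes
    bits[index] = 1
    return bits


def encode_window(seq, center, n):
    """Concatenate n 21-bit codes for the odd-sized window centred on seq[center]."""
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    half = n // 2
    pattern = []
    for i in range(center - half, center + half + 1):
        pattern.extend(encode_residue(seq[i] if 0 <= i < len(seq) else None))
    return pattern


print(len(encode_window("ACDEFGHIK", 4, 17)))  # 357, i.e. 17 * 21
```

Only one bit in each 21-bit group is ever set, which is why the input is both sparse and large.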
Weights
Input values at each layer are multiplied by weights.
Weights are initially random.
Weights are adjusted after the output is computed, based on how close the output is to the right answer.
When the full training session is completed, the
weights have settled on certain values.
These weights are then used to compute output for new problems that weren't part of the training set.
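To make this weight life cycle concrete, here is a minimal sketch, assuming a single threshold node trained with the classic perceptron learning rule on the logical OR function (a toy problem, not protein data). The weights start random, are nudged by the output error, and stop changing once every training pattern is answered correctly; the settled weights are then reused for prediction. The learning rate, seed, and OR data are illustrative assumptions.

```python
import random


def step(s: float) -> int:
    """Threshold trigger: 1 if the weighted sum is positive, otherwise 0."""
    return 1 if s > 0 else 0


def predict(w, b, x) -> int:
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, lr: float = 0.1, max_epochs: int = 1000, seed: int = 0):
    """Perceptron learning rule: random initial weights, adjusted by the output error."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]   # weights are initially random
    b = rng.uniform(-0.5, 0.5)                        # bias (threshold) weight
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(w, b, x)          # 0 when the answer is right
            if err:
                errors += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:                               # weights have settled
            break
    return w, b


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_DATA)
print([predict(w, b, x) for x, _ in OR_DATA])  # [0, 1, 1, 1]
```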
Neural Network Training Set
A typical training set contains over 100 non-homologous protein chains comprising more than 15,000 training patterns.
The number of training patterns is equal to the total
number of residues in the 100 proteins.
For example, if there are 100 proteins with 150 residues each, there would be 100 × 150 = 15,000 training patterns.
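The counting rule (one pattern per residue, with the window centred on that residue) can be sketched as a generator. The `(sequence, structure)` string-pair input format is a hypothetical convention, and the toy chains below are placeholders rather than real proteins.

```python
def training_patterns(chains):
    """Yield one (sequence, centre index, target label) pattern per residue.

    `chains` is a list of (sequence, structure) string pairs of equal length;
    this input format is a hypothetical convention for the sketch.
    """
    for seq, structure in chains:
        if len(seq) != len(structure):
            raise ValueError("sequence and structure lengths differ")
        for i in range(len(seq)):
            yield seq, i, structure[i]


# 100 placeholder chains of 150 residues each
toy_chains = [("A" * 150, "C" * 150) for _ in range(100)]
print(sum(1 for _ in training_patterns(toy_chains)))  # 15000
```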
Neural Network Architecture
A typical architecture has a window size of n and 5 hidden-layer nodes.*
For n = 17, a fully connected network would be a 17(21)-5-3 network, i.e. a net with an input window of 17, five hidden nodes in a single hidden layer, and three outputs.
Such a network has 357 (17 × 21) input nodes and 1,808 weights.
(17 × 21) × 5 + 5 × 3 + 5 + 3 = 1,808: the input-to-hidden weights, the hidden-to-output weights, and one bias weight for each of the 5 hidden and 3 output nodes.
*This information is adapted from Protein Secondary Structure Prediction with Neural Networks: A Tutorial by Adrian Shepherd (UCL), http://www.biochem.ucl.ac.uk/~shepherd/sspred_tutorial/ss-index.html.
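The 1,808 figure for the 17(21)-5-3 network can be checked with a small helper. It assumes, as the arithmetic on this slide does, that every non-input node has one bias weight in addition to its incoming connections; `count_weights` is a name introduced here for illustration.

```python
def count_weights(layer_sizes):
    """Weights in a fully connected feed-forward net, including one bias per non-input node."""
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return connections + biases


window, bits_per_residue, hidden, outputs = 17, 21, 5, 3
n_inputs = window * bits_per_residue
print(n_inputs)                                    # 357
print(count_weights([n_inputs, hidden, outputs]))  # 1808
```

Changing `hidden` shows how quickly the weight count grows with each extra hidden node (about 360 weights per node at this window size).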
Window
The n-residue window is moved across the
protein, one residue at a time.
Each time the window is moved, the center
residue becomes the focus.
The neural network learns what secondary
structure that residue is a part of. It keeps
adjusting weights until it gets the right answer within a certain tolerance. Then the window
is moved to the right.
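The window movement itself can be sketched as a generator that shifts one residue at a time and reports the centre residue in focus. Filling positions beyond the chain ends with a `-` pad character is an assumption of this sketch; in the unary encoding those positions would set the extra end-of-chain bit instead.

```python
def sliding_windows(seq: str, n: int, pad: str = "-"):
    """Yield (centre residue, window) pairs, shifting the window one residue at a time.

    Positions beyond either end of the chain are filled with `pad`
    (a placeholder character chosen for this sketch).
    """
    if n % 2 == 0:
        raise ValueError("window size n must be odd so there is a centre residue")
    half = n // 2
    padded = pad * half + seq + pad * half
    for i in range(len(seq)):
        yield seq[i], padded[i:i + n]   # seq[i] sits at the centre of the window


for centre, window in sliding_windows("ACDE", 3):
    print(centre, window)
```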
Artificial Neuron (aka node)
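[Figure: an artificial neuron (node). Input links carry activations a_j with weights W_{i,j} into the input function; the trigger function g then produces the output.]
$$\mathrm{in}_i = \sum_j W_{i,j}\, a_j, \qquad a_i = g(\mathrm{in}_i)$$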
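An artificial neuron computes its input function in_i = Σ_j W_{i,j} a_j and then applies the trigger function, a_i = g(in_i). A minimal sketch with the trigger function passed in as a parameter follows; the inputs and weights are arbitrary illustrative values, and, like the formula, it has no separate bias term.

```python
from typing import Callable, Sequence


def neuron_output(a: Sequence[float], w: Sequence[float],
                  g: Callable[[float], float]) -> float:
    """Compute a_i = g(in_i), where in_i = sum over j of W_ij * a_j."""
    in_i = sum(w_j * a_j for w_j, a_j in zip(w, a))   # input function: weighted sum
    return g(in_i)                                      # trigger function


def threshold(s: float) -> float:
    """Fire 1 for a positive weighted sum, otherwise 0."""
    return 1.0 if s > 0 else 0.0


print(neuron_output([1.0, 0.0, 1.0], [0.5, -1.0, 0.25], threshold))  # 1.0
```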
Trigger Function
Each hidden-layer node sums its weighted inputs and fires an output accordingly.
A simple trigger function (called a threshold function): send 1 to the output if the inputs sum to a positive number; otherwise, send 0.
The sigmoid function is used more often:
$$g(s_j) = \frac{1}{1 + e^{-k s_j}}$$
where s_j is the sum of the weighted inputs. As k increases, discrimination between weak and strong inputs increases.
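The effect of k can be seen numerically with a sketch of the sigmoid trigger g(s_j) = 1/(1 + e^(-k·s_j)). The branch on the sign of k·s is an implementation choice made here so that `math.exp` never overflows for large inputs.

```python
import math


def sigmoid(s: float, k: float = 1.0) -> float:
    """g(s) = 1 / (1 + exp(-k*s)), written to avoid overflow for large |k*s|."""
    z = k * s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)           # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


for k in (1, 5, 20):
    # a weak negative and a weak positive input, pushed further apart as k grows
    print(k, round(sigmoid(-0.5, k), 3), round(sigmoid(0.5, k), 3))
```

With k = 1 the two inputs give outputs close to 0.5; with k = 20 they are driven almost to 0 and 1, approaching the threshold function.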
Adjusting Weights With
Back Propagation
The inputs are propagated through the
system as described above.
The outputs are examined and compared to
the right answer.
Each weight is adjusted according to its
contribution to the error.
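A minimal back-propagation sketch for one hidden layer follows, assuming sigmoid nodes with k = 1, squared error E = ½ Σ (t − y)², bias weights on every non-input node, and an update after every pattern. This is textbook gradient-descent back-propagation, not necessarily the exact variant used by any particular prediction server; the class name, toy task, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Fully connected n_in-n_hid-n_out network of sigmoid nodes (illustrative sketch)."""

    def __init__(self, n_in: int, n_hid: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)

        def rand() -> float:
            return rng.uniform(-0.5, 0.5)          # small random initial weights

        self.w1 = [[rand() for _ in range(n_in)] for _ in range(n_hid)]
        self.b1 = [rand() for _ in range(n_hid)]
        self.w2 = [[rand() for _ in range(n_hid)] for _ in range(n_out)]
        self.b2 = [rand() for _ in range(n_out)]

    def forward(self, x):
        """Propagate inputs through the hidden layer to the outputs."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr: float = 0.5) -> float:
        """One back-propagation update; returns E = 1/2 * sum((t - y)^2) before the update."""
        h, y = self.forward(x)
        # Output-node error terms: dE/d(net input) = (y - t) * y * (1 - y)
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
        # Hidden-node error terms: output errors passed back through the (old) w2 weights
        d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        # Each weight moves against its contribution to the error (gradient descent)
        for k, dk in enumerate(d_out):
            self.w2[k] = [w - lr * dk * hj for w, hj in zip(self.w2[k], h)]
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * dj
        return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(target, y))


# Toy task (an assumption): map each one-hot input to the same one-hot output.
data = [([1, 0, 0], [1, 0, 0]), ([0, 1, 0], [0, 1, 0]), ([0, 0, 1], [0, 0, 1])]
net = TinyNet(3, 5, 3)
initial = sum(net.train_step(x, t) for x, t in data)   # error summed over the first epoch
for _ in range(2000):
    final = sum(net.train_step(x, t) for x, t in data)
print(f"first-epoch error {initial:.3f}, last-epoch error {final:.3f}")
```

The hidden-node error terms are computed before any weight changes, so every weight is adjusted using the same forward pass.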
Refinements or Variations of
Method
Use more biological information
See http://www.biochem.ucl.ac.uk/~shepherd/sspred_tutorial/ss-pred-new.html#beyond_bioinf
Predictions Based on Output
Predictions are made on a winner-takes-all
basis.
That is, the prediction is determined by the
strongest of the three outputs. For example,
the output (0.3, 0.1, 0.1) is interpreted as a
helix prediction.
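Winner-takes-all amounts to taking the index of the largest output. This sketch assumes the outputs are ordered helix, strand, coil (helix first, consistent with the target code "Helix: 1 0 0"); ties go to the earlier class because Python's `max` returns the first maximal element.

```python
from typing import Sequence

CLASSES = ("helix", "strand", "coil")   # assumed output order; helix first as in "Helix: 1 0 0"


def winner_takes_all(outputs: Sequence[float]) -> str:
    """Return the class whose output node is strongest (ties go to the earlier class)."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best]


print(winner_takes_all((0.3, 0.1, 0.1)))  # helix
```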
Performance Measurements
How do you know if your neural network performs
well?
Test it on proteins that are not included in the training set
but whose structure is known. Determine how often it gets the right answer.
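"How often it gets the right answer" is usually measured per residue; the fraction of residues whose three-state label is predicted correctly is commonly called Q3. A minimal sketch, assuming predictions and known structures are equal-length label strings and pooling residues over all test proteins; the toy pairs below are placeholders, not real results.

```python
from typing import Iterable, Tuple


def q3_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of residues, pooled over all test proteins, whose predicted
    three-state label (e.g. H/E/C) matches the known structure."""
    correct = total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("prediction and known structure differ in length")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    return correct / total


test_set = [("HHHCC", "HHECC"), ("EEC", "EEC")]   # toy (predicted, known) pairs
print(q3_accuracy(test_set))  # 0.875
```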
What differentiates one neural network from
another?
Its architecture: whether or not it has hidden layers, and how many nodes are used.
Its mathematical functions: the trigger function and the back-propagation algorithm.
Balancing Act in
Neural Network Training
The network should NOT just memorize the
training set.
The network should be able to generalize
from the training set so that it can solve
similar but not identical problems.
It's a matter of balancing the number of training patterns, the number of network weights, the number of hidden nodes, and the number of training iterations.
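One common way to control the number of training iterations (a general technique, not one this material prescribes) is early stopping: hold out some proteins of known structure as a validation set and stop once the error on them stops improving, since further training mostly memorizes the training set. The `train_epoch` and `validation_error` callables below are hypothetical placeholders; in practice you would also save the weights from the best epoch.

```python
from typing import Callable, Tuple


def train_with_early_stopping(train_epoch: Callable[[], None],
                              validation_error: Callable[[], float],
                              max_epochs: int = 100,
                              patience: int = 5) -> Tuple[int, float]:
    """Stop when validation error has not improved for `patience` epochs.

    Returns (best epoch, best validation error)."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()                       # one pass over the training patterns
        err = validation_error()            # error on held-out proteins
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                           # no improvement for `patience` epochs
    return best_epoch, best_err


errors = iter([0.50, 0.40, 0.35, 0.36, 0.37, 0.38])   # made-up validation errors
print(train_with_early_stopping(lambda: None, lambda: next(errors), patience=3))  # (3, 0.35)
```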
Disadvantages to
Neural Networks
They are black boxes. They cannot explain
why a given pattern has been classified as x rather than y. Unless we associate other methods with them, they don't tell us anything about underlying principles.
Summary
Perceptrons (single-layer neural networks) can be used to predict protein secondary structure, but multilayer feed-forward networks are used more often. Two frequently used web sites for neural-network-based secondary structure prediction are PHD (http://www.embl-heidelberg.de/predictprotein/predictprotein.html) and NNPREDICT (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html).