
Prediction of Coordination Number and Relative Solvent Accessibility in Proteins

Computational Aspects

Yacov Lifshits
13.04.2005




Introduction

There is no known algorithm for predicting solvent accessibility or coordination number.

Many different approaches were tried, and most of them utilized the concept of neural networks.

We shall discuss what these networks are, how they work, and how we use them for our purpose.


Neurons

The human brain has about 100 billion (10^11) neurons and 100 trillion (10^14) connections (synapses) between them.

Many highly specialized types of neurons exist, and these differ widely in appearance. Characteristically, neurons are highly asymmetric in shape.


Neurons – cont.

Here is what a typical neuron looks like:

[figure of a typical neuron omitted]


Neural Networks

Like the brain, a neural network is a massively parallel collection of small and simple processing units, where the interconnections form a large part of the network's intelligence.


Neural Networks – cont.

Artificial neural networks, however, are quite different from the brain in terms of structure.

For example, a neural network is much smaller than the brain. Also, the units used in a neural network are typically far simpler than neurons.

Nevertheless, certain functions that seem exclusive to the brain, such as learning, have been replicated on a simpler scale with neural networks.


Neural Networks Features

• Large number of very simple units

• Connections through weighted links

• There is no “program”. The “program” is the architecture of the network.

• There is no central control. If a portion of the network is damaged, the network is still functional. Hopefully.

• A human observer can’t understand what is going on inside the network. It is a sort of “black box”.


Neural Networks – intuition

Neural networks are our way to model a brain.

The good thing about them is that you do not program them, but instead teach them, like children.

For example, if you show a kid a couple of trees, he will be able to identify other trees. Few humans can actually explain how to identify a tree as such, and even fewer (if any) could write a program to do that.

So when you don’t know an algorithm to do something, neural networks are your second-best solution.


Neural Networks uses

• Speech recognition
• Speech synthesis
• Image recognition
• Pattern recognition
• Stock market prediction
• Robot control and navigation


Artificial Neurons

The neuron is basically a simple calculator.

It calculates a weighted sum of the inputs (plus a bias input), and applies an activation function to the result.
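
As a minimal sketch (illustrative Python, not from the original slides), such a unit can be written as:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a
    bias, passed through a sigmoid activation function."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))
```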


Artificial Neurons – example

[worked example figure omitted]

The example illustrates that neural networks are universal – with enough units and layers, you can implement essentially any function with them.
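
The slide's worked example is not reproduced here; as a plausible stand-in, here is a single neuron whose hand-picked weights compute the Boolean AND function:

```python
import math

def neuron(inputs, weights, bias):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

# The weights are chosen so the output is close to 1 only when
# both inputs are 1 (a threshold unit computing AND).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(neuron([a, b], [10.0, 10.0], bias=-15.0)))
```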


Training

How do we train a neural network?

The task is similar to teaching a student. How do we do that?

• First, we show him some examples.

• After that, we ask him to solve some problems.

• Finally, we correct him, and start the whole process again.

Hopefully, he’ll get it right after a couple of rounds…


Training – Simple case

Training the network means adjusting its weights so that it gives correct output.

It’s rather easy to train a network that has no hidden layers (called a perceptron).

This is done by the “gradient descent” algorithm.


Training – Gradient descent

The idea behind this algorithm is to adjust the weights to minimize some measure of the error. The classical measure is the sum of squared errors:

E = 1/2 · Σ (y − h(x))^2

(h(x) is the actual output, and y is the true (“needed”) value)

This is an optimization search in weight space.


Training – Gradient descent

First, we compute the gradient of the error with respect to each weight:

∂E/∂wj

And then we update the weight as follows:

wj ← wj − α · ∂E/∂wj

(α is the learning rate)
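
A minimal sketch of this update rule in action, training a single sigmoid unit on the (linearly separable) OR function; all names and settings are illustrative:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training set for the OR function: (inputs, target) pairs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights = [random.uniform(-0.5, 0.5) for _ in range(2)]
bias = random.uniform(-0.5, 0.5)
alpha = 0.5  # the learning rate

for epoch in range(5000):
    for x, y in data:
        h = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        # For E = 1/2 (y - h)^2 with a sigmoid unit,
        # dE/dw_j = -(y - h) * h * (1 - h) * x_j,
        # so stepping against the gradient gives:
        delta = alpha * (y - h) * h * (1 - h)
        weights = [w + delta * xi for w, xi in zip(weights, x)]
        bias += delta

for x, y in data:
    h = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    print(x, y, round(h))  # predictions match the targets
```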


Training – Back Propagation

In most real-life problems we need to add a “hidden layer”.

This turns out to be a problem – the training data does not specify what the hidden layer’s output should be.

It turns out that we can back-propagate the error from the output layer to the hidden ones.

It works like a “generalized” gradient descent algorithm.
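
A compact, generic sketch of the idea for one hidden layer, reusing the squared-error measure from before (a textbook-style implementation, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic task that a perceptron cannot solve but a
# network with one hidden layer can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.uniform(-1, 1, (2, 8)); b1 = np.zeros(8)  # input -> hidden
W2 = rng.uniform(-1, 1, (8, 1)); b2 = np.zeros(1)  # hidden -> output
alpha = 0.5

for epoch in range(20000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)   # hidden-layer activations
    O = sigmoid(H @ W2 + b2)   # network outputs

    # Backward pass: the output error is propagated back through
    # W2 to obtain an error signal for the hidden layer.
    d_out = (O - Y) * O * (1 - O)
    d_hid = (d_out @ W2.T) * H * (1 - H)

    # Gradient-descent updates (batch mode). As the next slide notes,
    # this can be slow or get stuck in a local minimum.
    W2 -= alpha * H.T @ d_out;  b2 -= alpha * d_out.sum(axis=0)
    W1 -= alpha * X.T @ d_hid;  b1 -= alpha * d_hid.sum(axis=0)

print(np.round(O).ravel())  # typically converges to [0 1 1 0]
```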


Training – Back Propagation

Problems of back propagation:

• The algorithm is slow, if it converges at all.

• It may get stuck in a local minimum.

• It is sensitive to initial conditions.

• It may start oscillating.


Training – OBD

A different kind of training algorithm is “Optimal Brain Damage” (OBD), which tries to remove “unneeded” connections. One of its uses is to prune the network after training.

The algorithm begins with a fully connected network and removes connections from it.

After the network is trained, the algorithm identifies an optimal selection of connections that can be dropped.

The network is retrained, and if the performance has not decreased, the process is repeated.
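
A rough sketch of that pruning loop. Note the hedge: OBD's real saliency estimate uses second derivatives of the error, while here weight magnitude stands in as a crude saliency measure, and all names are illustrative:

```python
import numpy as np

def prune_lowest(weights, mask, fraction=0.1):
    """Permanently zero out the lowest-saliency fraction of the
    still-active connections. OBD estimates saliency from second
    derivatives of the error; |w| is only a crude stand-in."""
    saliency = np.abs(weights)
    saliency[mask == 0] = np.inf          # skip already-pruned weights
    k = int(fraction * mask.sum())
    drop = np.argsort(saliency, axis=None)[:k]
    mask.flat[drop] = 0
    return weights * mask, mask

W = np.random.default_rng(1).normal(size=(4, 3))
mask = np.ones_like(W)
# Outer loop (schematic): train the network, prune, retrain, and
# repeat while the performance has not decreased.
W, mask = prune_lowest(W, mask, fraction=0.25)
print(int(mask.sum()), "of 12 connections remain")
```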


Training – cont.

Two main training modes are:

• Batch: update weights as a sum over all the training set.

• Incremental: update weights after each pattern presented.

Incremental mode requires less space, and is less likely to get stuck in a local minimum.

Each cycle through the training set is called an epoch.
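
A sketch of the two modes for a single linear unit (illustrative code, not from the paper):

```python
def grad(w, x, y):
    """Squared-error gradient for one example and a linear unit."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [(h - y) * xi for xi in x]

def batch_epoch(w, data, alpha):
    # Batch mode: sum the gradients over the whole training set,
    # then apply one weight update per epoch.
    total = [0.0] * len(w)
    for x, y in data:
        total = [t + g for t, g in zip(total, grad(w, x, y))]
    return [wi - alpha * g for wi, g in zip(w, total)]

def incremental_epoch(w, data, alpha):
    # Incremental (online) mode: update after each pattern presented.
    for x, y in data:
        w = [wi - alpha * g for wi, g in zip(w, grad(w, x, y))]
    return w

data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
w = [0.0, 0.0]
for _ in range(100):
    w = incremental_epoch(w, data, alpha=0.1)
print([round(wi, 2) for wi in w])  # approaches [1.0, 2.0]
```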


Training – General problems

Overfitting: If we give the network too many unnecessary parameters, or train it for too long, it may “get lost”.

An extreme case of overfitting is a network that has a 100% success rate on the training set, but behaves randomly on other values.

Poor data set: To train a network well, we must include all possible cases (but not too many). The network learns the easiest features first!


Back to the problem

Until now, we discussed neural networks where all connections go “forward” only (called feed-forward networks).

They are not suitable for the problem we are trying to solve, however.

The major weakness of FF networks is the use of a local input window of fixed size, which cannot provide any access to long-range information.


Back to the problem – cont.

Networks for contact prediction, for instance, have windows of size 1–15 amino acids.

Larger windows usually do not work, in part because the increase in the number of parameters leads to overfitting. There are ways to prevent that, but it is not the main problem.

The main problem is that long-range signals are very weak compared to the additional “noise” introduced by a larger window. Thus, larger windows tend to dilute sparse information present in the input that is relevant for the prediction.
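
To make the fixed-window limitation concrete, here is a sketch of how a 15-residue window around position t might be encoded for a feed-forward network using orthogonal (one-hot) encoding; the details are illustrative:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(aa):
    """Orthogonal encoding: a 20-dimensional indicator vector."""
    v = [0.0] * len(AMINO_ACIDS)
    v[AMINO_ACIDS.index(aa)] = 1.0
    return v

def window_input(seq, t, half=7):
    """Concatenate one-hot vectors for a window of 15 residues
    centred on position t, zero-padding past the sequence ends.
    Anything outside the window is invisible to the network."""
    vec = []
    for i in range(t - half, t + half + 1):
        if 0 <= i < len(seq):
            vec.extend(one_hot(seq[i]))
        else:
            vec.extend([0.0] * len(AMINO_ACIDS))
    return vec

x = window_input("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", t=10)
print(len(x))  # 15 * 20 = 300 input units
```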


Recurrent Neural Networks

One of the methods used to try to overcome these limitations consists of using bidirectional recurrent neural networks (BRNNs).

An RNN is a neural network that allows “backward” connections. This means that it can re-process its own output.

Our brain, as you can surely guess, is a recurrent network.

Sadly, the issues of RNN training and architecture are too complicated for this lecture. An intuitive explanation will be presented, however.


The Network used

[diagram of the BRNN architecture omitted]


The BRNN

The input, vector It, encodes the external input at time t. In the simplest case, it encodes one amino acid, using orthogonal encoding.

The output prediction has the functional form

Ot = η(Ft, Bt, It)

and depends on the forward (upstream) context Ft, the backward (downstream) context Bt, and the input It at time t.


The Past and the Future

Ft and Bt store information about the “past” and the “future” of the sequence. They make the whole difference, because now we can utilize global information.

The functions satisfy the recurrent bidirectional equations:

Ft = φ(Ft-1, It)
Bt = β(Bt+1, It)

where φ() and β() are learnable nonlinear state transition functions, implemented by two NNs (the left and right subnetworks in the picture).
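
A schematic sketch of the forward pass these equations describe. In the real BRNN, φ, β, and η are trained neural networks; here random linear maps merely show the data flow:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_STATE = 20, 8          # input and state sizes (illustrative)

# Placeholder "networks": random linear maps plus tanh, standing in
# for the learnable functions phi, beta, and eta.
Wf = rng.normal(0, 0.1, (D_STATE, D_STATE + D_IN))
Wb = rng.normal(0, 0.1, (D_STATE, D_STATE + D_IN))
Wo = rng.normal(0, 0.1, (1, 2 * D_STATE + D_IN))

phi  = lambda F, I: np.tanh(Wf @ np.concatenate([F, I]))
beta = lambda B, I: np.tanh(Wb @ np.concatenate([B, I]))
eta  = lambda F, B, I: np.tanh(Wo @ np.concatenate([F, B, I]))

def brnn_outputs(inputs):
    """inputs: list of T input vectors I_t. Returns one output per t,
    each depending on the whole sequence through F_t and B_t."""
    T = len(inputs)
    F = [np.zeros(D_STATE)] * (T + 1)    # F built left to right
    B = [np.zeros(D_STATE)] * (T + 1)    # B built right to left
    for t in range(T):                   # F_t = phi(F_{t-1}, I_t)
        F[t + 1] = phi(F[t], inputs[t])
    for t in reversed(range(T)):         # B_t = beta(B_{t+1}, I_t)
        B[t] = beta(B[t + 1], inputs[t])
    return [eta(F[t + 1], B[t], inputs[t]) for t in range(T)]

seq = [rng.normal(size=D_IN) for _ in range(30)]  # stand-in protein
outs = brnn_outputs(seq)
print(len(outs))  # 30: one prediction per residue
```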


The Past and the Future

Intuitively, we can think of Ft and Bt as “wheels” that can be rolled along the protein.

To predict the class at position t, we roll the wheels in opposite directions from the N- and C-terminus up to position t and then combine what is read on the wheels with It to calculate the proper output using η.


The difference it makes

Hence the probabilistic output Ot depends on the input It and on contextual information from the entire protein, summarized by the pair (Ft, Bt)!

In contrast, in a conventional NN approach, the probability distribution depends only on a relatively short subsequence of amino acids.


Training the BRNNs

The BRNNs are trained by back-propagation on the relative entropy error between the output and target distributions.

The authors use a hybrid between online and batch training, with 300 batch blocks (2–3 proteins each) per training set. Weights are thus updated after each block, 300 times per epoch.

When the error does not decrease for 50 consecutive epochs, the learning rate is divided by 2, and training is restarted. Training stops after 8 or more reductions, which usually happens after 1,500–2,500 epochs.
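
A sketch of that schedule as a loop (all names are placeholders; the real per-epoch step runs back-propagation on the relative-entropy error over the 300 blocks):

```python
import random

def train_one_epoch(lr):
    """Placeholder for one epoch of training; returns the error
    measured after the epoch."""
    return random.random()

lr = 0.1                 # initial learning rate (illustrative)
best = float("inf")
stale = 0                # epochs since the error last decreased
reductions = 0

while reductions < 8:    # stop after 8 learning-rate reductions
    err = train_one_epoch(lr)
    if err < best:
        best, stale = err, 0
    else:
        stale += 1
    if stale >= 50:      # no improvement for 50 consecutive epochs:
        lr /= 2          # halve the rate and restart training
        stale = 0
        reductions += 1
```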


The experiment

The authors used a network of 16 Sun Microsystems UltraSparc workstations for training and testing, roughly equivalent to 2 years of single-CPU time.

Seven different BRNNs were trained, and the final result was the average of their outputs.
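
A sketch of that ensemble step, assuming each trained network can be called to produce a probabilistic output:

```python
def ensemble_predict(networks, x):
    """Average the probabilistic outputs of several independently
    trained predictors; the mean is the final prediction."""
    outputs = [net(x) for net in networks]
    return sum(outputs) / len(outputs)

# e.g., with seven trained BRNNs, each a callable on the encoded protein:
# final = ensemble_predict(brnns, protein_input)
```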


The results

The predictors achieve performances within the 71–74% range for contact numbers, depending on radius, and greater than 77% for accessibility in the most interesting range.

These improve on previous results by 3–7%.


The results – cont.

The improvements are believed to be due both to an increase in the size of the training sets and to the architectures developed.

The most important thing about the architectures is their ability to capture long-range interactions that are beyond the reach of conventional feed-forward neural networks, with their relatively small and fixed input window sizes.


References

• Baldi et al., “Prediction of Coordination Number and Relative Solvent Accessibility in Proteins.”

• Baldi et al., “Exploiting the Past and the Future in Protein Secondary Structure Prediction.”

• Russell and Norvig, Artificial Intelligence: A Modern Approach.