
Biological Neurons 

The brain is principally composed of about 10 billion neurons, each connected to about 10,000 other neurons. Each of the yellow blobs in the picture above is a neuronal cell body (soma), and the lines are the input and output channels (dendrites and axons) which connect them. Each neuron receives electrochemical inputs from other neurons at its dendrites. If the sum of these electrical inputs is sufficiently powerful to activate the neuron, it transmits an electrochemical signal along the axon and passes this signal to the other neurons whose dendrites are attached at any of the axon terminals. These attached neurons may then fire.

It is important to note that a neuron fires only if the total signal received at the cell body exceeds a certain level. The neuron either fires or it doesn't; there are no intermediate grades of firing.

So, our entire brain is composed of these interconnected, electrochemically transmitting neurons. From a very large number of extremely simple processing units (each performing a weighted sum of its inputs, and then firing a binary signal if the total input exceeds a certain level), the brain manages to perform extremely complex tasks.

This is the model on which artificial neural networks are based. Thus far, artificial neural networks haven't come close to modeling the complexity of the brain, but they have proven to be good at problems which are easy for a human but difficult for a traditional computer, such as image recognition and predictions based on past knowledge.

History: The 1940s to the 1970s

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work. To describe how neurons in the brain might work, they modeled a simple neural network using electrical circuits.

In 1949, Donald Hebb wrote The Organization of Behavior, a work which pointed out that neural pathways are strengthened each time they are used, a concept fundamental to the way humans learn. If two nerves fire at the same time, he argued, the connection between them is enhanced.

As computers became more advanced in the 1950s, it was finally possible to simulate a hypothetical neural network. The first step towards this was made by Nathaniel Rochester from the IBM research laboratories. Unfortunately for him, the first attempt to do so failed.

In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models called "ADALINE" and "MADALINE." In a typical display of Stanford's love for acronyms, the names come from their use of Multiple ADAptive LINear Elements. ADALINE was developed to recognize binary patterns so that, if it was reading streaming bits from a phone line, it could predict the next bit. MADALINE was the first neural network applied to a real-world problem, using an adaptive filter that eliminates echoes on phone lines. Although the system is as old as air traffic control systems, like them it is still in commercial use.

In 1962, Widrow and Hoff developed a learning procedure that examines the value on the line feeding each weight (i.e. 0 or 1) and adjusts that weight according to the rule: weight change = (pre-weight line value) x (error / number of inputs). It is based on the idea that while one active perceptron may have a large error, the weight values can be adjusted to distribute it across the network, or at least to adjacent perceptrons. Applying this rule still leaves an error if the line before a weight is 0, although this will eventually correct itself. If the error is conserved so that all of it is distributed across all of the weights, then the error is eliminated.
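As a rough illustration, here is a minimal sketch of this style of update applied to a single linear unit. The function name, training data, and number of epochs are invented for the example; this is not the historical ADALINE implementation.

# Minimal sketch of the update rule described above (illustrative only):
# each weight is nudged by (value on its input line) * (error / number of inputs).

def train_unit(samples, targets, epochs=20):
    n_inputs = len(samples[0])
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            output = sum(w * xi for w, xi in zip(weights, x))  # weighted sum
            error = target - output                            # remaining error
            for i in range(n_inputs):
                weights[i] += x[i] * (error / n_inputs)        # the rule from the text
    return weights

# Hypothetical data: the target simply copies the first input line.
samples = [(0, 1), (1, 0), (1, 1), (0, 0)]
targets = [0, 1, 1, 0]
print(train_unit(samples, targets))  # weights approach (1, 0)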

Despite the later success of the neural network, traditional von Neumann architecture took over the computing scene, and neural research was left behind. Ironically, John von Neumann himself suggested the imitation of neural functions by using telegraph relays or vacuum tubes.

In the same time period, a paper was written suggesting that there could be no extension from the single-layer neural network to a multi-layer neural network. In addition, many people in the field were using a learning function that was fundamentally flawed because it was not differentiable across the entire line. As a result, research and funding dropped drastically.

This was coupled with the fact that the early successes of some neural networks led to an exaggeration of their potential, especially considering the practical technology at the time. Promises went unfulfilled, and at times greater philosophical questions led to fear. Writers pondered the effect that so-called "thinking machines" would have on humans, ideas which are still around today.

The idea of a computer which programs itself is very appealing. If Microsoft's Windows 2000 could reprogram itself, it might be able to repair the thousands of bugs that the programming staff made. Such ideas were appealing but very difficult to implement. In addition, von Neumann architecture was gaining in popularity. There were a few advances in the field, but for the most part they were few and far between.

In 1972, Kohonen and Anderson independently developed a similar network, which we will say more about later. They both used matrix mathematics to describe their ideas, but did not realize that what they were doing was creating an array of analog ADALINE circuits. The neurons are supposed to activate a set of outputs instead of just one.


The first multilayered network, an unsupervised network, was developed in 1975.

History: The 1980s to the present

In 1982, interest in the field was renewed. John Hopfield of Caltech presented a paper to the National Academy of Sciences. His approach was to create more useful machines by using bidirectional lines; previously, the connections between neurons were only one-way.

That same year, Reilly and Cooper used a "Hybrid network" with multiple layers, each layer using a different problem-solving strategy.

Also in 1982, there was a joint US-Japan conference on Cooperative/Competitive Neural Networks. Japan announced a new Fifth Generation effort on neural networks, and US papers generated worry that the US could be left behind in the field. (Fifth-generation computing involves artificial intelligence. The first generation used switches and wires, the second generation used the transistor, the third generation used solid-state technology such as integrated circuits and higher-level programming languages, and the fourth generation is code generators.) As a result, there was more funding and thus more research in the field.

In 1986, with multi-layer neural networks in the news, the problem was how to extend the Widrow-Hoff rule to multiple layers. Three independent groups of researchers, one of which included David Rumelhart, a former member of Stanford's psychology department, came up with similar ideas which are now called back-propagation networks, because they distribute pattern recognition errors throughout the network. Whereas hybrid networks used just two layers, these back-propagation networks use many. The result is that back-propagation networks are "slow learners," needing possibly thousands of iterations to learn.

Now, neural networks are used in several applications, some of which we will describe later in our presentation. The fundamental idea behind neural networks is that if it works in nature, it must be able to work in computers. The future of neural networks, though, lies in the development of hardware. Much like advanced chess-playing machines such as Deep Blue, fast, efficient neural networks depend on hardware designed for their eventual use.

Research that concentrates on developing neural networks is relatively slow. Due to the limitations of processors, neural networks take weeks to learn. Some companies are trying to create what is called a "silicon compiler" to generate a specific type of integrated circuit that is optimized for neural network applications. Digital, analog, and optical chips are the different types of chips being developed. One might immediately discount analog signals as a thing of the past. However, neurons in the brain actually work more like analog signals than digital signals: while digital signals have two distinct states (1 or 0, on or off), analog signals vary between minimum and maximum values. It may be a while, though, before optical chips can be used in commercial applications.

Conventional computing versus artificial neural networks

There are fundamental differences between conventional computing and the use of neural networks. In order to best illustrate these differences, one must examine two different types of learning: the top-down approach and the bottom-up approach. Then we'll look at what it means to learn, and finally compare conventional computing with artificial neural networks.

Some specific details of neural networks:

Although the possibilities for solving problems using a single perceptron are limited, by arranging many perceptrons in various configurations and applying training mechanisms one can actually perform tasks that are hard to implement using conventional von Neumann machines.

We are going to describe four different uses of neural networks that are of great significance:

1. Classification. In a mathematical sense, this involves dividing an n-dimensional space into various regions; given a point in the space, the task is to determine which region it belongs to. This idea is used in many real-world applications, for instance in various pattern recognition programs. Each pattern is transformed into a multi-dimensional point and classified into a certain group, each of which represents a known pattern.
Type of network used: Feed-forward networks

2. Prediction. A neural network can be trained to produce the outputs that are expected given a particular input. If we have a network that fits well in modeling a known sequence of values, we can use it to predict future results. An obvious example is stock market prediction.
Type of network used: Feed-forward networks

3. Clustering. Sometimes we have to analyze data that are so complicated there is no obvious way to classify them into different categories. Neural networks can be used to identify special features of these data and classify them into different categories without prior knowledge of the data. This technique is useful in data mining for both commercial and scientific uses.
Type of network used: Simple Competitive Networks, Adaptive Resonance Theory (ART) networks, Kohonen Self-Organizing Maps (SOM)

4. Association. A neural network can be trained to "remember" a number of patterns, so that when a distorted version of a particular pattern is presented, the network associates it with the closest one in its memory and returns the original version of that pattern. This is useful for restoring noisy data; a minimal sketch of this idea follows the list.
Type of network used: Hopfield networks
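As a rough illustration of the association idea above, here is a minimal sketch of a Hopfield-style network that stores two binary patterns with a Hebbian rule and pulls a noisy probe back to the nearest stored pattern. The patterns, function names, and sizes are invented for the example.

import numpy as np

# Minimal sketch of a Hopfield-style associative memory (illustrative only).
# Patterns are stored as +1/-1 vectors via a Hebbian outer-product rule, then a
# noisy probe is repeatedly thresholded toward the nearest stored pattern.

def store(patterns):
    n = patterns.shape[1]
    weights = np.zeros((n, n))
    for p in patterns:
        weights += np.outer(p, p)   # strengthen connections between co-active units
    np.fill_diagonal(weights, 0)    # no self-connections
    return weights

def recall(weights, probe, steps=10):
    state = probe.copy()
    for _ in range(steps):
        state = np.where(weights @ state >= 0, 1, -1)  # threshold update
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
weights = store(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # last element flipped from the first pattern
print(recall(weights, noisy))            # recovers the first stored pattern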


The above is just a general picture of what neural networks can do in real life. There are many creative uses of neural networks that arise from these general applications. One example is image compression using association networks; another is solving the Travelling Salesman Problem using clustering networks.

The perceptron

The perceptron is a mathematical model of a biological neuron. While in actual neurons the dendrite receives electrical signals from the axons of other neurons, in the perceptron these electrical signals are represented as numerical values. At the synapses between the dendrites and axons, electrical signals are modulated in various amounts. This is also modeled in the perceptron by multiplying each input value by a value called the weight. An actual neuron fires an output signal only when the total strength of the input signals exceeds a certain threshold. We model this phenomenon in a perceptron by calculating the weighted sum of the inputs to represent the total strength of the input signals, and applying a step function to the sum to determine its output. As in biological neural networks, this output is fed to other perceptrons.


(Fig. 1) A biological neuron

(Fig. 2) An artificial neuron (perceptron)

A number of terms are commonly used for describing neural networks. They are listed in the table below:

The input vector: All the input values of each perceptron are collectively called the input vector of that perceptron.

The weight vector: Similarly, all the weight values of each perceptron are collectively called the weight vector of that perceptron.

What can a perceptron do?

As mentioned above, a perceptron calculates the weighted sum of the input values. For simplicity, let us assume that there are two input values, x and y, for a certain perceptron P. Let the weights for x and y be A and B respectively; the weighted sum can then be represented as Ax + By.

Since the perceptron outputs a non-zero value only when the weighted sum exceeds a certain threshold C, we can write down the output of this perceptron as follows:

Output of P = 1  if Ax + By > C
Output of P = 0  if Ax + By <= C

Recall that Ax + By > C and Ax + By < C describe the two regions of the xy plane separated by the line Ax + By = C. If we consider the input (x, y) as a point on a plane, then the perceptron tells us which of these regions the point belongs to. Such regions, since they are separated by a single line, are called linearly separable regions.
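To make this concrete, here is a minimal sketch of the two-input perceptron described above; the particular values of A, B and C are arbitrary choices for the example.

def perceptron(x, y, A, B, C):
    # Weighted sum of the inputs, followed by a step function at threshold C.
    weighted_sum = A * x + B * y
    return 1 if weighted_sum > C else 0

# Arbitrary example: the separating line is 2x + 3y = 6.
for point in [(0, 0), (1, 1), (3, 1), (4, 0)]:
    print(point, "->", perceptron(*point, A=2, B=3, C=6))
# Points on one side of the line give 1, points on the other side give 0.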


This result is useful because it turns out that some logic functions, such as the boolean AND, OR and NOT operators, are linearly separable, i.e. they can be performed using a single perceptron. We can illustrate (for the 2D case) why they are linearly separable by plotting each of them on a graph:

(Fig. 3) Graphs showing linearly separable logic functions

In the above graphs, the two axes are the inputs which can take the value of either 0 or 1, and the numbers on the graph are the expected output for a particular input. Using an appropriate weight vector for each case, a single perceptron can perform all of these functions.

However, not all logic operators are linearly separable. For instance, the XOR operator is not linearly separable and cannot be achieved by a single perceptron. Yet this problem can be overcome by using more than one perceptron arranged in a feed-forward network, as sketched after the figure below.


(Fig. 4) Since it is impossible to draw a line to divide the regions containing either 1 or 0, the XOR function is not linearly separable.
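As a rough check of the claims above, here is a minimal sketch showing single perceptrons realizing AND and OR with hand-picked weights and thresholds, and XOR built from a small feed-forward arrangement of such perceptrons; all numbers are chosen purely for illustration.

def step(weighted_sum, threshold):
    return 1 if weighted_sum > threshold else 0

def AND(x, y):
    return step(1 * x + 1 * y, 1.5)  # fires only when both inputs are 1

def OR(x, y):
    return step(1 * x + 1 * y, 0.5)  # fires when at least one input is 1

def XOR(x, y):
    # Not achievable with a single perceptron; use two layers instead:
    # "x or y, but not both" = (x OR y) AND NOT (x AND y).
    return step(1 * OR(x, y) - 1 * AND(x, y), 0.5)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "AND:", AND(x, y), "OR:", OR(x, y), "XOR:", XOR(x, y))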