Neocognitron

Dept. of Information and Communication Engineering The University of Electro-Communications 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan E-mail: [email protected]

Kunihiko Fukushima is a Full Professor in the Department of Information and Communication Engineering, the University of Electro-Communications, Tokyo, Japan. He received a B.Eng. degree in electronics in 1958 and a PhD degree in electrical engineering in 1966 from Kyoto University, Japan. He was a professor at Osaka University from 1989 to March 1999. Prior to his professorship, he was a Senior Research Scientist at the NHK Science and Technical Research Laboratories. He is one of the pioneers in the field of neural networks and has been engaged in modeling neural networks of the brain since 1965. His special interests lie in modeling neural networks of the higher brain functions, especially the mechanism of the visual system. He invented the "Neocognitron" for deformation-invariant pattern recognition, and the "Selective Attention Model", which can recognize and segment overlapping objects in the visual field. One of his recent research interests is modeling neural networks for active vision in the brain. He is the author of many books on neural networks, including "Neural Networks and Information Processing", "Neural Networks and Self-Organization", and "Physiology and Bionics of the Visual System". Prof. Fukushima is the founding President of JNNS (the Japanese Neural Network Society) and a founding member of the Board of Governors of INNS (the International Neural Network Society).

1. What is the neocognitron?

The neocognitron is a hierarchical multilayered neural network proposed by Professor Kunihiko Fukushima for handwritten character recognition. Many different versions of the neocognitron now exist. The two original basic versions proposed by Professor Fukushima differ mainly in the learning principle they use:

•learning without a teacher

•learning with a teacher

The first version of the neocognitron was based on learning without a teacher; this version is often called the self-organized neocognitron. In this tutorial, however, we focus on the version based on learning with a teacher, because we believe it is better suited to presenting the basic principle of the neocognitron.

The main advantage of the neocognitron is its ability to correctly recognize not only the learned patterns but also patterns derived from them by partial shift, rotation or another type of distortion.

We will demonstrate the abilities of the neocognitron with the following simple example.

2. Example - Abilities of the neocognitron

In this simple example we demonstrate the neocognitron's ability to recognize presented patterns. The black box in this example contains a neocognitron network that can distinguish between two types of patterns (the digit zero and the digit one). To train the network we used the patterns shown in figure 2.1.

Fig. 2.1 - Patterns 0 and 1 used for learning

By training the neocognitron to distinguish between these two types of patterns we have created two categories in the network. From now on the network responds to every presented pattern with a pair of values. Each value measures how strongly the presented pattern belongs to one of the two categories.

Click on one of the prepared patterns. The network processes it and assigns it to one of the categories. Notice that the network also assigns patterns that were not presented during learning to the correct category. These patterns were produced by distorting the training patterns shown in figure 2.1.

4. Network structure - Stages

The structure of the neocognitron follows from the hierarchy of extracted features: one stage of the neocognitron is created for each stage of that hierarchy. The network also contains one additional stage, labelled stage 0, which, unlike the higher stages, is not used for feature extraction. All the stages of the neocognitron, together with some of the features they extract (corresponding to the hierarchy in figure 3.1), are shown in figure 4.1.

The total number of stages depends on the complexity of the patterns to be recognized. The more complex the patterns, the more stages the feature hierarchy needs, and the more stages the neocognitron has.

Fig. 4.1 - Network structure - Stages

5. Network structure - Layers

Fig. 5.1 - Network structure - Layers

Figure 5.1 shows that the neocognitron contains four types of layers. Stage 0 always consists of a single input layer. Every higher stage consists of one S-layer, one V-layer and one C-layer.

Figure 5.1 also introduces the notation commonly used for layers in the neocognitron. We use this notation, described in table 5.1, throughout the following text.

Symbol   Denotes
U0       input layer
USl      S-layer in the l-th stage of the network
UVl      V-layer in the l-th stage of the network
UCl      C-layer in the l-th stage of the network

Tab. 5.1 - Notation used for layers in the neocognitron

6. Network structure - Cell planes

Each layer in the neocognitron consists of a certain number of cell planes of the same type. The input layer is the exception: the term cell plane is not used for it. The number of cell planes in each S-layer and C-layer depends on the number of features extracted in the corresponding stage of the network. Each V-layer always consists of exactly one cell plane. Figure 6.1 shows the network from figure 5.1 with the cell planes that make up each layer drawn in.

Fig. 6.1 - Network structure - Cell planes

7. Network structure - Cells

We have now reached the basic building block of the neocognitron: the cell. The neocognitron is made of a large number of cells of several distinct types, organized in cell planes, layers and stages. All cells, regardless of type, process and generate analog values. Figure 7.1 shows that each S-plane, V-plane, C-plane and the input layer consists of an array of cells of one type. The size of the cell arrays is the same for all cell planes in a layer, and it decreases as the stage number increases. Each C-plane in the highest stage of the network contains only one cell, whose output value measures how strongly the presented pattern belongs to the category that the cell represents. The cell array of each V-plane has the same size as the cell arrays of the S-planes in the same stage.

Fig. 7.1 - Network structure - Cells

Figure 7.1 also shows that the neocognitron contains four types of cells: receptor cells, S-cells, V-cells and C-cells.
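The organization into stages, layers, cell planes and cell arrays described so far can be summarized in a small data structure. The following Python sketch is only an illustration: the class `Layer`, the function `build_layers` and all sizes and plane counts in the usage example are hypothetical and are not taken from the tutorial's network.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str    # layer symbol from table 5.1, e.g. "US1"
    kind: str    # "input", "V", "S" or "C"
    planes: int  # number of cell planes (1 for the input layer's single array)
    size: int    # every plane in the layer is a size x size array of cells


def build_layers(input_size, stages):
    """Build the layer list of a network.

    stages: one tuple (s_planes, s_size, c_planes, c_size) per stage 1, 2, ...
    All numbers are chosen by whoever constructs the network.
    """
    layers = [Layer("U0", "input", 1, input_size)]  # stage 0: input layer only
    for l, (s_planes, s_size, c_planes, c_size) in enumerate(stages, start=1):
        # A V-layer has exactly one plane, the same size as the S-planes.
        layers.append(Layer(f"UV{l}", "V", 1, s_size))
        layers.append(Layer(f"US{l}", "S", s_planes, s_size))
        layers.append(Layer(f"UC{l}", "C", c_planes, c_size))
    return layers


# Hypothetical two-stage network for two categories (digits 0 and 1).
network = build_layers(9, [(4, 9, 4, 5), (2, 5, 2, 1)])
for layer in network:
    print(layer.name, layer.kind, layer.planes, f"{layer.size}x{layer.size}")
```

In this made-up configuration the last C-layer has two planes of a single cell each, one per category, which matches the description of the highest stage given above.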

On the following pages we explain the functions of the V-cell, S-cell and C-cell in detail.

8. V-cell function

Each V-cell in the neocognitron evaluates the outputs of the C-cells (or receptor cells) in certain connection areas of the preceding C-layer (or input layer). The size of the connection areas is the same for all V-cells and S-cells in one stage of the network and is fixed when the network is constructed. Figure 8.1 shows the connection areas of one V-cell, each 3 by 3 cells in size.

Fig. 8.1 - Connection areas of the V-cell

The V-cell output value represents the average activity of the cells in its connection areas and is used to inhibit the activity of the corresponding S-cell.

The exact specification of the V-cell function is given in the mathematical description of its behaviour.
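As a rough illustration of "average activity", the sketch below computes a V-cell output as a weighted root-mean-square of the outputs in its connection areas. This form follows the formulation commonly given in Fukushima's papers and is an assumption here, since the tutorial's mathematical description is not reproduced in this text. The function name `v_cell_output` and the uniform weights are hypothetical.

```python
import math


def v_cell_output(inputs, c_weights):
    """Weighted root-mean-square of the cell outputs in the connection areas.

    inputs    -- outputs of the C-cells (or receptor cells), flattened to a list
    c_weights -- fixed c-weights, one per input (assumed non-negative)
    """
    return math.sqrt(sum(c * u * u for c, u in zip(c_weights, inputs)))


# One 3 by 3 connection area with uniform weights (illustrative values only).
c = [1.0 / 9.0] * 9
vertical_line = [0, 1, 0,
                 0, 1, 0,
                 0, 1, 0]
print(v_cell_output(vertical_line, c))  # about 0.577, i.e. sqrt(3/9)
```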

9. S-cell function

The function of each S-cell is to extract a certain feature at a certain position in the input layer (i.e. in its receptive field). To extract this feature, an S-cell uses only the information obtained from its connection areas and the information about the average activity in these areas obtained from the corresponding V-cell. All S-cells in one S-plane always extract the same feature. The feature extracted by an S-cell is determined by the cell's weights. The weights and the method for adjusting them are described in detail later. For now, to get a better idea of what the weights do, we can compare them to a mask that is used to decide whether the feature is present.

The meaning of the weights is clearest for cells in layer US1. Each S-cell in this layer has only one connection area, and this area is also the S-cell's receptive field, so the weights (the mask, if you like) directly represent the feature. In higher S-layers the correspondence between the extracted feature and its representation by the weights is no longer so obvious. Figure 9.2 shows a cell plane of S-cells designed to extract a feature corresponding to a vertical line. An S-cell is activated only if this feature is present in its receptive field (which is identical to its connection area here). When a different feature is presented, the cell remains inactive.

Fig. 9.2 - S-cell function

The S-cell output value is determined exactly by the equation given in the mathematical description. To understand the S-cell function, however, a simplified equation is sufficient:

The symbols used in this equation have the following meaning:

Symbol   Denotes
us       S-cell output value
φ        non-linear function
E        excitatory part
a        a-weights
uc       output values of C-cells from connection areas
I        inhibitory part
r        selectivity
b        b-weight
uv       V-cell output value
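The symbols above can be put together in a short Python sketch. The form used here, us = φ((1 + E) / (1 + I) - 1) with E = Σ a·uc, I = r/(r + 1)·b·uv and φ(x) = max(x, 0), is an assumption based on the formulation commonly given in Fukushima's papers (without its leading scaling factor); it is not copied from this tutorial. It does agree with the rule that an S-cell is inactive whenever I is greater than or equal to E. The function names are hypothetical.

```python
def phi(x):
    """Non-linear function: passes positive values, cuts off negative ones."""
    return max(x, 0.0)


def s_cell_output(a, u_c, b, u_v, r):
    """Simplified S-cell output (assumed form, see the text above).

    a   -- a-weights, one per cell in the connection areas
    u_c -- output values of the C-cells in the connection areas
    b   -- b-weight of the connection from the V-cell
    u_v -- V-cell output value
    r   -- selectivity
    """
    excitatory = sum(ai * ui for ai, ui in zip(a, u_c))  # E
    inhibitory = r / (r + 1.0) * b * u_v                 # I
    return phi((1.0 + excitatory) / (1.0 + inhibitory) - 1.0)


# Illustrative numbers: E = 2.0 and I = 1.0, so the output is 3/2 - 1 = 0.5.
print(s_cell_output(a=[1.0, 1.0], u_c=[1.0, 1.0], b=2.0, u_v=1.0, r=1.0))
```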

10. Example - Selectivity

The process of feature extraction is strongly influenced by selectivity. A different selectivity can be set for each S-layer of the neocognitron when the network is constructed. Changing the selectivity changes the effect of the inhibitory part on the S-cell output value. Decreasing the selectivity decreases the effect of the inhibitory part, which reduces the S-cell's ability to distinguish the learned feature exactly. In other words, the S-cell then also accepts more strongly deformed features as correct.

The example shows one S-cell from layer US1 and its connection area, which here is also its receptive field. This S-cell extracts a feature corresponding to a vertical line in the centre of the receptive field.

We remind the reader that the S-cell output value is determined exactly by the equation given in the mathematical description of its behaviour, but a simplified form of this equation will suffice for us.

In this example, the excitatory part E is influenced only by the cells that correspond to the mask, marked in gray. The inhibitory part I is influenced by all cells in the connection area and by the selectivity. The S-cell becomes inactive if the inhibitory part is greater than or equal to the excitatory part.

Select one of the prepared patterns with the mouse, set the desired selectivity and observe the effect on the excitatory part E, the inhibitory part I and the S-cell output value.
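The effect of selectivity can also be reproduced numerically. The sketch below assumes the simplified form us = φ((1 + E) / (1 + I) - 1) with I = r/(r + 1)·b·uv, as commonly given in Fukushima's papers. The function name and the values of E and b·uv are hypothetical and stand for a slightly deformed feature.

```python
def s_cell_response(excitatory, b_times_uv, r):
    """Return (I, output) for a given excitatory part E and selectivity r."""
    inhibitory = r / (r + 1.0) * b_times_uv
    output = max((1.0 + excitatory) / (1.0 + inhibitory) - 1.0, 0.0)
    return inhibitory, output


# Hypothetical deformed pattern: E = 2.0, b * uv = 3.0.
for r in (0.5, 1.0, 4.0, 9.0):
    inhibitory, output = s_cell_response(2.0, 3.0, r)
    state = "active" if output > 0.0 else "inactive"
    print(f"r = {r}: I = {inhibitory:.2f}, output = {output:.3f} ({state})")
```

With these numbers the inhibitory part reaches the excitatory part at r = 2, so the cell responds for the two lower selectivities and is silent for the two higher ones.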

11. C-cell function

Each C-cell in the neocognitron evaluates the outputs of the S-cells in a certain connection area of one S-plane in the preceding S-layer. In some cases, however, a C-cell is connected to more than one S-plane.

The size of the connection areas is the same for all C-cells in one C-layer and is fixed when the network is constructed. Figure 11.1 shows one C-cell connection area, 5 by 5 cells in size.

Fig. 11.1 - Connection area of the C-cell

The C-cell output value depends on the activity of the S-cells in its connection area: the more S-cells are active, or the greater their activities, the greater the C-cell output value. The C-cell function is described exactly in the mathematical description.

For a C-cell to be active, it is sufficient that at least one active S-cell is present in its connection area. Because the connection areas of neighbouring C-cells overlap, the activity of one S-cell affects several C-cells. As a consequence, a C-plane contains a blurred representation of the content of the S-plane. This can be seen in figure 11.2, where one active S-cell and all the C-cells it influences are marked.

Fig. 11.2 - C-cell function

A further consequence of the C-cell function is that a C-cell compresses the content of its connection area to some extent. In some cases we can therefore reduce the density of cells in a C-layer to half the density of cells in the preceding S-layer.
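Both consequences, blurring and reduced density, can be seen in a small one-dimensional sketch. It assumes that a C-cell applies a saturating function ψ(x) = x / (1 + x) to the d-weighted sum of S-cell outputs, which is the form commonly given in Fukushima's papers rather than something stated in this tutorial. The names `psi` and `c_plane` and the d-weight values are hypothetical.

```python
def psi(x):
    """Saturating non-linearity: 0 for x <= 0, approaches 1 for large x."""
    x = max(x, 0.0)
    return x / (1.0 + x)


def c_plane(s_plane, d_weights, step=1):
    """Compute a 1-D C-plane from a 1-D S-plane.

    Each C-cell sums the d-weighted S-cell outputs in a connection area
    centred on its position; step=2 halves the density of C-cells.
    """
    half = len(d_weights) // 2
    outputs = []
    for centre in range(0, len(s_plane), step):
        total = 0.0
        for k, d in enumerate(d_weights):
            j = centre + k - half
            if 0 <= j < len(s_plane):  # cells outside the plane contribute nothing
                total += d * s_plane[j]
        outputs.append(psi(total))
    return outputs


s_plane = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # one active S-cell
d = [0.5, 1.0, 0.5]                                  # illustrative d-weights
print(c_plane(s_plane, d))          # three neighbouring C-cells respond (blur)
print(c_plane(s_plane, d, step=2))  # half as many C-cells, feature still visible
```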

In the following example we show the last, and probably the most important, consequence of the C-cell function: it ensures the neocognitron's tolerance of feature shifts.

12. Example - Tolerance of feature shifts

This important property of the neocognitron is provided by the C-cells. The connection area of one C-cell is marked in the S-plane in figure 12.1. This C-cell is active only if there is an active S-cell in its connection area, which corresponds to the presence of the correct feature at a certain position in the input layer. When the feature is shifted to another position, a different S-cell is activated. If that S-cell also lies in the marked connection area, our C-cell remains active.

Fig. 12.1 - Tolerance of feature shifts

The receptive field of the observed C-cell is marked in the input layer U0. The C-cell is activated only if some S-cell detects the correct feature within this field.
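The shift tolerance described in this section can be checked with a few lines of Python. The sketch reduces the C-cell to the rule stated above (active if any S-cell in its connection area is active) and ignores the actual output value. The function names, the 9 by 9 plane and the tested positions are hypothetical.

```python
def plane_with_active_cell(size, row, col):
    """Return a size x size S-plane in which only cell (row, col) is active."""
    plane = [[0.0] * size for _ in range(size)]
    plane[row][col] = 1.0
    return plane


def c_cell_is_active(s_plane, row, col, radius=2):
    """True if any S-cell in the (2*radius+1)-square connection area is active."""
    rows = range(max(0, row - radius), min(len(s_plane), row + radius + 1))
    cols = range(max(0, col - radius), min(len(s_plane[0]), col + radius + 1))
    return any(s_plane[i][j] > 0.0 for i in rows for j in cols)


# Observe the C-cell at (4, 4) with a 5 by 5 connection area in a 9 by 9 S-plane.
for position in [(4, 4), (3, 5), (6, 6), (7, 4)]:
    s_plane = plane_with_active_cell(9, *position)
    print(position, c_cell_is_active(s_plane, 4, 4))
```

The first three positions lie inside the connection area, so the C-cell stays active although a different S-cell responds each time; the last position lies outside the area and the C-cell falls silent.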

13. Weights and connections

The neocognitron is characterized not only by a large number of cells but also by a large number of connections. These connections transfer information between cells in adjoining layers. Through its connections, a cell obtains information from all the cells located in its connection areas. Each connection has a weight, which controls the amount of information transferred. If we imagine a connection as a pipeline with a valve, the weight corresponds to how far the valve is open. There are four types of weights in the neocognitron: a-weights, b-weights, c-weights and d-weights. Each type is used for connections between two layers of different types, as shown schematically in figure 13.1.

Fig. 13.1 - Weights in the neocognitron

Weight sharing is another term connected with weights. It denotes the fact that all cells in one cell plane use the same weights for the connections leading from the cells in their connection areas. Weight sharing guarantees that all cells in one cell plane always extract the same feature.

Fig. 13.2 - Weight sharing
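Weight sharing means that a single set of weights is applied at every position of a cell plane. The following Python sketch illustrates this for the weighted sum only (inhibition and the non-linear function are left out). The function names and the vertical-line mask are hypothetical examples, not the weights of the tutorial's network.

```python
def shared_weight_response(input_plane, mask):
    """Apply one shared mask (a-weights) at every position of a square plane.

    Returns the weighted sum for each cell; inputs outside the plane count as 0.
    """
    n, m = len(input_plane), len(mask)
    half = m // 2
    response = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for di in range(m):
                for dj in range(m):
                    y, x = i + di - half, j + dj - half
                    if 0 <= y < n and 0 <= x < n:
                        total += mask[di][dj] * input_plane[y][x]
            response[i][j] = total
    return response


vertical_mask = [[0, 1, 0],
                 [0, 1, 0],
                 [0, 1, 0]]


def vertical_line(size, col):
    """Return a size x size plane containing a vertical line in column col."""
    return [[1.0 if j == col else 0.0 for j in range(size)] for _ in range(size)]


print(shared_weight_response(vertical_line(5, 2), vertical_mask)[2])
print(shared_weight_response(vertical_line(5, 3), vertical_mask)[2])
```

Because every cell uses the same mask, moving the line by one column simply moves the strongest response by one column.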

We can divide the weights shown in figure 13.1 according to the way in which they are set:

•weights modified by learning

  •a-weights

  •b-weights

•fixed weights

  •c-weights

  •d-weights

a-weights

The a-weights are the first type of weights modified by learning. They are used for the connections between S-cells and the C-cells in their connection areas. The features extracted by S-cells are encoded in the a-weights. The a-weights are adjusted during learning according to the presented training patterns.

Fig. 13.3 - a-weights

b-weights

The b-weights are the second type of weights modified by learning. They are used for the connections between S-cells and their corresponding V-cells. Like the a-weights, the b-weights are adjusted during learning according to the presented training patterns.

Fig. 13.4 - b-weights

c-weights

Fixed c-weights are used for the connections between V-cells and the C-cells in their connection areas. The values of the c-weights are set when the network is constructed. They are most often chosen so that the transfer of information is reduced most at the periphery of the connection area, with the reduction decreasing towards the centre of the area.

Fig. 13.5 - c-weights

d-weights

Fixed d-weights are used for the connections between C-cells and the S-cells in their connection areas. Like the c-weights, the d-weights are set when the network is constructed, again so that the transfer of information is reduced most at the periphery of the connection areas.

Fig. 13.6 - d-weights
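The text says only that the fixed weights pass the most information at the centre of a connection area and the least at its periphery. One possible way to generate such weights is a geometric decay with distance from the centre, sketched below. The decay rule, the parameter `gamma` and the normalisation to a sum of 1 are assumptions made for this illustration, not values taken from the tutorial.

```python
import math


def fixed_weights(size, gamma=0.7):
    """Return a size x size array of weights that fall off away from the centre.

    gamma (0 < gamma < 1) is a hypothetical decay factor; the weights are
    normalised so that they sum to 1.
    """
    half = size // 2
    raw = [[gamma ** math.hypot(i - half, j - half) for j in range(size)]
           for i in range(size)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


for row in fixed_weights(3):
    print(" ".join(f"{w:.3f}" for w in row))
```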

14. Learning

This tutorial deals only with the version of the neocognitron that uses learning with a teacher, so we describe only this learning principle here. Learning in this version of the network is controlled by a teacher, whose task is to decide which features are to be extracted in each stage of the network and to prepare the corresponding training patterns before learning begins. Learning proceeds stage by stage, starting from the lowest stage of the network, and consists of adjusting the modifiable weights (i.e. the a-weights and b-weights) according to the response of the already trained parts of the network to the presented training patterns. Usually one training pattern is used for each S-plane in the network, and it usually needs to be presented to the network only once.

At the beginning of learning, the teacher must set all a-weights and b-weights in the network to zero. The teacher then selects an S-plane from layer US1 and, within this cell plane, one cell, the so-called seed cell. The next step is to present the training pattern for this S-plane to the input layer U0. Finally, the teacher adjusts the weights of the seed cell according to the equations given in the mathematical description of learning. Because the neocognitron uses weight sharing, the weights of all the other S-cells in the cell plane are adjusted at the same time. If there are further training patterns for the selected S-plane, they are presented one after another and the process repeats. Otherwise we move on to the next S-plane.
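A single seed-cell update can be sketched as follows. The update rule used here (each a-weight grows by q·c·u for its input u, and the b-weight grows by q·uv, where uv is the V-cell output) follows the formulation commonly given in Fukushima's papers and is an assumption, because the tutorial's mathematical description of learning is not reproduced in this text. The function name, the constant `q` and the uniform c-weights are hypothetical.

```python
import math


def train_seed_cell(a, b, inputs, c_weights, q=1.0):
    """Return the updated (a-weights, b-weight) of a seed cell.

    a, inputs and c_weights are flattened lists over the connection areas;
    q is a hypothetical learning-rate constant.
    """
    u_v = math.sqrt(sum(c * u * u for c, u in zip(c_weights, inputs)))
    new_a = [ai + q * c * u for ai, c, u in zip(a, c_weights, inputs)]
    new_b = b + q * u_v
    return new_a, new_b


# All weights start at zero; the training pattern is a vertical line (3 by 3).
pattern = [0, 1, 0,
           0, 1, 0,
           0, 1, 0]
c = [1.0 / 9.0] * 9
a, b = train_seed_cell([0.0] * 9, 0.0, pattern, c, q=9.0)
print(a)  # non-zero only where the training pattern is active
print(b)
```

Because of weight sharing, the returned weights would be used by every S-cell of the selected S-plane, not only by the seed cell.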

The learning process of the neocognitron is demonstrated in detail in the following example.

15. Example - Learning

In this example we demonstrate learning in a simple version of the neocognitron. We want our network to extract the features shown in figure 15.1.

Fig. 15.1 - Hierarchy of extracted features

We prepare the corresponding set of training patterns (shown at the far left of the example) and start learning with the Start button.

The b-weights are not shown in our demonstration, because the way they are modified is not important for us at this point. Remember, however, that both a-weights and b-weights are adjusted during learning. Notice that when a seed cell is selected, its connection areas and receptive field are selected as well.

16. Recall

Recall in the neocognitron consists of evaluating the output values of all cells, stage by stage. The result is a decision about which of the learned categories the presented pattern belongs to. Recall begins with the presentation of the pattern to be recognized to the input layer U0. The output values of the V-cells in layer UV1 are evaluated next. The S-cells of layer US1 then extract the simplest features, and the C-cells of layer UC1 reduce the effect of shifts of the extracted features. The whole process is repeated analogously for all the following layers of the network. When recall is complete, the output values of the C-cells in the highest layer of the network measure how strongly the presented pattern belongs to the categories that the individual C-cells represent. Figure 16.1 shows the process of recall schematically.

Fig. 16.1 - Process of recall in the neocognitron
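The order of evaluation during recall can be written as a short loop. The Python sketch below shows only this control flow: the layer functions are passed in as parameters, and the `toy_stage` used in the example is a deliberately trivial stand-in, not a real neocognitron stage. All names are hypothetical.

```python
def recall(pattern, stages):
    """Evaluate the network stage by stage and return the top C-layer outputs.

    stages -- list of (v_layer, s_layer, c_layer) functions, lowest stage first
    """
    u_c = pattern  # the input layer U0 plays the role of a C-layer for stage 1
    for v_layer, s_layer, c_layer in stages:
        u_v = v_layer(u_c)       # V-cells first
        u_s = s_layer(u_c, u_v)  # then S-cells, inhibited by the V-cells
        u_c = c_layer(u_s)       # then C-cells
    return u_c


def best_category(outputs):
    """Index of the C-cell with the largest output value."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


# Toy stand-ins for one stage (not real neocognitron layers).
toy_stage = (
    lambda u: sum(u) / len(u),                  # "V-layer": mean activity
    lambda u, v: [max(x - v, 0.0) for x in u],  # "S-layer": inhibited input
    lambda s: [max(s[:2]), max(s[2:])],         # "C-layer": one cell per half
)
outputs = recall([0.0, 0.0, 1.0, 1.0], [toy_stage])
print(outputs, best_category(outputs))
```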

In the following example we examine recall in detail on the simulator of the neocognitron.

17. Example - Simulator of the neocognitron

To conclude the tutorial, we have prepared a simulator of the neocognitron network mentioned in the first example.

Each quadrangle in the simulator represents one cell plane of the network. For simplicity, V-planes are not shown, because their content is not important for us here. The output values of the cells are expressed by color intensity: the higher the output value of a cell, the darker its color. Clicking on any cell in any cell plane marks all its connection areas and its receptive field. Select one of the patterns in the control panel and observe the state of the network after it is presented. Examine which features are extracted in the individual S-planes and, in the detailed view, how these features are encoded in the a-weights.
