
[IEEE 2005 International Conference on Neural Networks and Brain - Beijing, China (13-15 Oct. 2005)] 2005 International Conference on Neural Networks and Brain - Application of Two-Stage




Application of Two-Stage Learning on Brain-like System

Liming Zhang
Department of Electronics Engineering, Fudan University, Shanghai 200433, China
E-mail: [email protected]

Abstract- This paper proposes a two-stage learning strategy, constructed from two kinds of neural networks, to simulate functions of the human brain. In the first stage, sequences of images from the environment are input to a HOSM neural network. By unsupervised learning, the weights are fixed so that they can extract local features like the receptive fields of vision. In the second stage, an improved HDR neural network is built by supervised learning. The proposed structure has been implemented on a brain-like robot. Experimental results show that the learning strategy is effective.

I. INTRODUCTION

Why is the recognition ability of human and animal vision so good? How is information processed in human and animal vision? These questions have interested experts and scientists for a long time. Many scientists have been exploring the mechanisms of vision and applying these biological methods to image processing and recognition in engineering. In 1996, Olshausen and Field [1] found that basis functions obtained by unsupervised learning from natural images are similar to the responses of receptive fields of simple cells in the visual cortex. Many scientists have repeated this result. In 2003, we simulated a biological experiment on the growth of young animals' visual cortex in a special visual environment using unsupervised learning, which confirmed Blakemore and Cooper's 1970 results [2] on the growth of simple cells in the cat's V1 area, and we proposed a conjecture of a two-stage learning strategy [3]: In the infant stage, unsupervised learning may be the major mode for human beings, because infants cannot receive information from their parents and teachers except through their eyes. In this stage, each single neuron's connection weights with its receptive field are updated by unsupervised learning. When the receptive fields of the simple cells of a baby's vision have matured, supervised learning may become the main mode. In the supervised learning stage, neural networks at high levels of the brain accumulate knowledge and children develop greater intelligence.

According to the above conjecture, we propose two neural networks in this paper. One model is Hierarchical Overlapping Sensory Mapping (HOSM), motivated by the structure of receptive fields in biological vision. To obtain the connection weights of these receptive fields from the environment, an unsupervised learning method called Candid Covariance-free Incremental Principal Component Analysis (CCIPCA) [4] is used to automatically develop a set of orthogonal filters. This unsupervised learning simulates the infant stage. The other model is an improved HDR tree, in which knowledge can be accumulated by supervised learning. The two models are combined into a brain-like system applied to an intelligent robot with a development function. Experimental results show that the brain-like system is efficient.

The rest of this paper is organized as follows. Section II presents the HOSM structure and its learning procedure. The improved HDR is introduced in Section III. The proposed brain-like system is presented in Section IV. Results of the application on the intelligent robot and conclusions are given at the end.

II. HIERARCHICAL OVERLAPPING SENSORY MAPPING (HOSM)

In the 1960s, biologists found that each neuron on the biological visual pathway connects only to a neighboring region called its receptive field, from which it extracts features. The receptive fields of adjacent neurons overlap, and together these neurons cover the whole visual field. The visual pathway consists of multiple layers of neurons. Receptive fields within the same layer have the same size; from lower layers to higher layers, the receptive fields become larger [5]. This structure can detect an object of any size at any location.

We propose a sensory mapping network in which the input image is divided into many sub-blocks of identical size. Each block is transformed into a vector X_l ∈ R^m, l = 1, 2, …, n. As shown in Fig. 1, the block marked at the upper left consists of 5 overlapping receptive fields (sub-blocks). The size of the receptive fields and the overlap distance are determined by the resolution of the camera and the scale of the input images. The neurons of a higher layer should cover a bigger range than those of the previous one, so the

0-7803-9422-4/05/$20.00 ©2005 IEEE    1580


input image should be scaled down for the higher layer, in which the overlapping sub-blocks still have the same size. The sub-blocks at the second level form vectors X'_l ∈ R^m, l = 1, 2, …, n_1. The same is done for the third level, the fourth level, and so on, until the input image is scaled down to almost the same size as the receptive field.
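To make the level-by-level sampling concrete, here is a minimal Python sketch of the overlapping sub-block extraction. The block size, stride, and 3/4 scaling factor below are illustrative assumptions, not the paper's camera settings (those are only fixed in Section V), and the nearest-neighbour down-scaling is our choice to keep the sketch dependency-free.

```python
import numpy as np

def extract_blocks(image, block=8, stride=4):
    """Slide a block x block window with the given stride and return
    each overlapping sub-block flattened to a vector X_l in R^m."""
    h, w = image.shape
    vecs = []
    for r in range(0, h - block + 1, stride):
        for c in range(0, w - block + 1, stride):
            vecs.append(image[r:r + block, c:c + block].ravel())
    return np.array(vecs)          # shape: (n_blocks, block*block)

def hosm_pyramid(image, block=8, stride=4, scale=0.75):
    """Collect sub-blocks level by level, shrinking the image until it
    is roughly the size of one receptive field."""
    levels = []
    while min(image.shape) >= block:
        levels.append(extract_blocks(image, block, stride))
        new_h = int(image.shape[0] * scale)
        new_w = int(image.shape[1] * scale)
        if min(new_h, new_w) < block:
            break
        # crude nearest-neighbour down-scaling of the image
        rows = (np.arange(new_h) / scale).astype(int)
        cols = (np.arange(new_w) / scale).astype(int)
        image = image[np.ix_(rows, cols)]
    return levels
```

On a 32 × 32 image this produces five levels, the first with (32−8)/4+1 = 7 window positions per axis, i.e. 49 sub-blocks of dimension 64, and the last level covered by a single sub-block.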


Fig. 1. The overlapping receptive fields

A sub-block of the input image, transformed into a column vector, has its mean subtracted; the result is denoted u(j), j = 1, 2, …, where j is the index of the sub-block. When the p-th block appears, the covariance matrix at that time is

A(p) = (1/p) Σ_{j=1}^{p} u(j)u(j)^T        (1)

The i-th eigenvector of A(p) satisfies λ_i y_i(p) = A(p) y_i(p).

Here y_i(p) is the i-th eigenvector of A(p) and λ_i is the corresponding eigenvalue. The quantity updated in CCIPCA is v_i = λ_i y_i. When the p-th input block appears, we have

v_i(p) = λ_i y_i(p) = A(p) y_i(p)        (2)

From (1) and (2), it could be deduced that

v_i(p) = (1/p) Σ_{j=1}^{p} u(j)u(j)^T y_i(p)        (3)

Fig. 2. Neural networks for weights' learning

Fig. 2 shows one such neural network. Its inputs are sub-blocks of the input image with dimension m, and its output has k neurons. Fig. 3 shows all the possible sub-blocks over four levels: their positions are kept organized, and their receptive fields grow over each level until the entire image is approximately covered by one sub-block.

Suppose the overlapping sub-blocks of each level are organized as part of the training set {X^1, X^2, …, X^n, …, X^q, …}. For an image sequence, the training set is {X^1(t), X^2(t), …, X^n(t), …}, t = 1, 2, …, where t is the serial number of the frame. These are input to the neural network of Fig. 2 sequentially in the training phase. We can obtain k eigenvectors of the covariance matrix of these training samples by the unsupervised learning algorithm CCIPCA proposed in [4], which can extract eigenvectors in real time. We introduce it as follows.


Fig. 3. The neural network of the Hierarchical Overlapping Sensory Mapping


Since all the eigenvectors are normalized, if we can estimate v_i, then λ_i and y_i can be calculated as λ_i = ||v_i|| and y_i = v_i / ||v_i||.

To change Eq. (3) into iterative form, we choose y_i(p) as v_i(p−1)/||v_i(p−1)||, which leads to the following incremental expression, the basic iteration formula of CCIPCA:

v_i(p) = ((p−1)/p) v_i(p−1) + (1/p) u(p)u(p)^T v_i(p−1)/||v_i(p−1)||        (4)

where (p−1)/p is the weight for the last estimate and 1/p is the weight for the new datum u(p). It was proven in [4] that, with the algorithm given by Eq. (4), v_i → λ_i y_i as p → ∞, where λ_i is the i-th largest eigenvalue of the covariance matrix of {u(j)} and y_i is the corresponding eigenvector. Eq. (4) can be applied to all the eigenvectors.
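Eq. (4) can be turned into a short routine. The following is a sketch of the basic CCIPCA iteration of [4], without the amnesic parameter of the full algorithm; the function name, the initialization of v_i with the first residual sample, and the deflation step that lets later components be learned on the residual are our reading of the algorithm, not code from the paper.

```python
import numpy as np

def ccipca(samples, k=3):
    """Candid covariance-free incremental PCA (basic form of Eq. (4)).
    Each v_i converges to lambda_i * y_i; component i+1 is learned on
    the residual left after removing components 1..i from the sample."""
    m = samples.shape[1]
    v = np.zeros((k, m))
    for p, u in enumerate(samples, start=1):
        u = u.copy()
        for i in range(min(k, p)):
            if p == i + 1:
                v[i] = u                       # initialize with the sample
            else:
                y = v[i] / np.linalg.norm(v[i])   # y_i(p-1) estimate
                # Eq. (4): blend the old estimate with the new datum
                v[i] = (p - 1) / p * v[i] + (1.0 / p) * (u @ y) * u
            # deflation: remove component i before estimating i+1
            y = v[i] / np.linalg.norm(v[i])
            u = u - (u @ y) * y
    eigvals = np.linalg.norm(v, axis=1)        # lambda_i = ||v_i||
    eigvecs = v / eigvals[:, None]             # y_i = v_i / ||v_i||
    return eigvals, eigvecs
```

On zero-mean data whose first principal direction dominates, the first recovered eigenvector aligns closely with that direction after a few thousand samples.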

In the testing period, all the sub-blocks shown in Fig. 3 are input to their corresponding networks, which share the same weights. After projection through their individual networks, we obtain N·k features corresponding to the outputs of the N neural networks, where N is the total number of sub-blocks. When the image sequence is input to HOSM, the N·k responses



of the organized sub-blocks can be used to identify any target of interest, whatever the position or size of the target. For an arbitrary input region at any position and of any size, these networks can find a receptive field that approximately covers the region and produces the same k responses.

III. MODIFIED HIERARCHICAL DISCRIMINANT REGRESSION (MHDR)

HDR is a supervised learning method with a tree structure that uses the natural information and the class labels at the same time and projects high-dimensional data onto a low-dimensional subspace [6]. The structure of the original HDR tree after training is shown in Fig. 5. Suppose that the training set includes s image frames with their corresponding required outputs. Each frame can be represented as a vector with N·k feature components from HOSM. The training set is denoted L = {(x_i, y_i)}, i = 1, 2, …, s, where x_i is an input vector and y_i is the corresponding output vector. The training steps are as follows.

Fig. 4. Principle of original HDR: a. clustering in teacher space y; b. mapping to input space; c. clustering in subspace

Fig. 5. Original HDR model

Step 1: Clustering. Divide L into q clusters according to the required output y: L = {l_1, l_2, …, l_q}; for the robot, the required y is the robot's actions. Fig. 4a shows the case q = 3 in a 2-D output space. The corresponding input space is shown in Fig. 4b, from which we see that the q clusters overlap in the input space.
Step 2: Mapping input vectors to a low-dimensional space. Compute the mean vector m_i of x for every cluster l_i, i = 1, 2, …, q; compute the K-L transform matrix W from the m_i; and project all samples x_ij, i = 1, 2, …, q, j = 1, 2, …, n_i, onto the K-L subspace: x'_ij = W x_ij, m'_i = W m_i.

Step 3: Clustering x'_ij in the subspace. For simplicity of description, we again use a 2-D subspace; Fig. 4c shows the results of clustering. Each cluster of Fig. 4c is a node in the HDR tree of Fig. 5. It can be seen that a new cluster may include different y. In that case the node is split into sub-nodes following the procedures of Steps 1-2. This procedure is repeated until every sample in a node has the same y; such a node is called a leaf node. Otherwise, repeat Steps 1 to 3 until all new sub-nodes are leaf nodes. The structure of the HDR tree is shown in Fig. 5.
The modified HDR is proposed in this paper. The changes are: (1) Replace the K-L transform with the Fisher transform. The K-L subspace is obtained only from the mean value of every cluster, which may lose much classification information, so the separability in the K-L subspace is worse than in the original data; the Fisher transform performs better than the K-L subspace. (2) Use a two-layer neural network based on the Fisher subspace to realize the node split in HDR, as shown in Figs. 6 and 7.

Fig. 6. Two-layer neural network based on Fisher subspace

Fig. 7 shows the node-splitting network. The input neurons receive the samples x; the middle-layer neurons u_i, i = 1, 2, …, d, perform the d-dimensional projection onto the Fisher subspace; and the output neurons represent the q clusters in the sample set L. The weight W = [w_1, w_2, …, w_q] corresponds to the Fisher transform matrix F, and in the weight M = [m_1, m_2, …, m_q], m_i corresponds to the mean vector of cluster i.
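The recursive split of Steps 1-3 can be sketched as a small routine. This is a simplified stand-in, not the paper's implementation: it uses the cluster means themselves as a crude projection basis instead of a true K-L or Fisher transform, assigns samples to the nearest projected mean, and recurses until each node is pure; all names are hypothetical.

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=10):
    """Simplified HDR-style split: group samples by required output y,
    project onto a subspace derived from the cluster means (a stand-in
    for the K-L/Fisher transform), re-cluster by nearest projected
    mean, and recurse until every node holds a single output."""
    labels = np.unique(y)
    if len(labels) == 1 or depth >= max_depth:
        return {"leaf": True, "y": labels[0]}
    # Steps 1-2: per-class means in input space define the projection W
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    W = means - means.mean(axis=0)                 # crude subspace basis
    Xp = X @ W.T                                   # projected samples
    mp = means @ W.T                               # projected means
    # Step 3: re-cluster in the subspace by nearest projected mean
    assign = np.argmin(((Xp[:, None, :] - mp[None]) ** 2).sum(-1), axis=1)
    children = {}
    for ci in range(len(labels)):
        mask = assign == ci
        if mask.any():
            children[ci] = build_tree(X[mask], y[mask], depth + 1, max_depth)
    return {"leaf": False, "W": W, "means": mp, "children": children}

def predict(tree, x):
    """Descend the tree, at each node following the nearest projected mean."""
    while not tree["leaf"]:
        xp = tree["W"] @ x
        ci = min(tree["children"],
                 key=lambda c: ((xp - tree["means"][c]) ** 2).sum())
        tree = tree["children"][ci]
    return tree["y"]
```

For two well-separated classes the tree splits once into two pure leaves, and `predict` routes new samples to the correct output.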



The learning algorithm of Fig. 6 applies the CCIPCA introduced in the previous section twice, but it is more complex. For reasons of space we omit it here; the detailed description will be given in another paper.

IV. BRAIN-LIKE SYSTEM

The diagram in Fig. 8a shows the combined system of HOSM and modified HDR, which has been applied to the mobile robot shown in Fig. 8b. Two cameras with rotational motion simulate human eyes, and two microphones serve as ears. The sensory mapping is done by HOSM; the modified HDR builds a cognition network for classification and recognition, and also feeds back to the sensory mapping as attention selection. The HDR tree is learned autonomously by the robot. As the tree matures, the robot becomes more intelligent, like a human.

At the first level, the receptive field, of size m = 80 × 80, shifts 20 pixels each time, and the total number of sub-blocks amounts to 609 for one image. To cover a larger range at the next level, we scale down the input image to 3/4 on each side, that is, 480 × 360, and then use the same window size m to cover the transformed image as input. The same is done for the third and fourth levels. Finally there are 4 levels and a total of N = 1056 sub-blocks for one input image. In the training phase, the N sub-blocks are fed to the network shown in Fig. 2, producing k visual filters. The learning algorithm is CCIPCA in the infant stage, which is fast enough for online development. After the first-stage learning, the weights of HOSM are fixed. Feature vectors with k·N components for each image are input to MHDR to generate the HDR tree by the supervised learning described in Section III. In the supervised learning stage, the positions of the targets of interest serve as the teacher to train HDR. Through the two-stage learning, the robot recognizes all the targets it should notice. In the testing phase, the receptive fields of the entire input image are submitted to HOSM and the resulting responses are input to MHDR. The output from MHDR controls the eye movement. Fig. 9 shows the results for attention to a single interesting target.
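The sub-block count quoted above can be checked with a one-line computation: an 80 × 80 window stepped 20 pixels across a 640 × 480 image has 29 × 21 = 609 positions. The total of 1056 over four levels depends on how the 3/4 down-scaling is rounded at each level, which the paper does not spell out, so only the per-level counts are verified here.

```python
def n_blocks(h, w, block=80, stride=20):
    # number of window positions along each axis, multiplied together
    return ((h - block) // stride + 1) * ((w - block) // stride + 1)

# level 1: 640 x 480  ->  29 * 21 = 609 sub-blocks
print(n_blocks(480, 640))   # 609
# level 2: 480 x 360  ->  21 * 15 = 315 sub-blocks
print(n_blocks(360, 480))   # 315
```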


Fig. 8a. The block diagram of the brain-like system

Fig. 9. Attention to an interesting target after two-stage learning

Fig. 8b. The proposed model of the brain-like robot

V. EXPERIMENTAL RESULTS ON THE MOVING ROBOT

Three experiments were carried out on the brain-like robot to test the proposed two-stage learning strategy and the two neural models.

A. Attention Selection Experiment to Test the HOSM Model

The input image, of size 640 × 480, comes from the eyes of the robot.

Table 1 shows the correct ratio for four targets in the case of a single target in the scene. Here k = 30, and the test runs at 1 frame/s.

Table 1. Single target (1 frame/s)

target             correct rate    No. of frames
CD ROM             97%             200
personal stereo    95%             200
hand               85.5%           200
package            95%             200

Table 2 shows the correct ratio for four targets in one scene, where the robot stares only at the one target it is interested in.



Table 2. Many targets in the scene (1 frame/s)

target             attention rate    correct rate of position    No. of frames
CD ROM             99%               99%                         100
package            95%               100%                        100
personal stereo    68%               97%                         100
cup                98%               100%                        100

This experiment shows that the proposed HOSM model can extract local features like the receptive fields of vision, so the model can stare at a target almost in real time, whatever position the target is in.

B. Robot Automatically Avoids Persons: Comparing MHDR with HDR

The robot collects pictures by video camera. We divide these pictures into three classes according to the scenes in front of the robot and use them to train the trees (HDR and our proposed MHDR) for robot movement, with the same HOSM in both cases. The scenes include two different persons standing in front of the robot and no person in front of the robot. When the robot sees a scene it has seen before, it can search the tree and find the correct moving direction: with no person it goes forward; otherwise it stops or goes back. The robot can run normally and judge whether there is someone in front of it. The simulation results on video obtained from the robot are shown in Table 3.

Table 3. The results of mobile robot visual navigation

These results show that the performance of our MHDR is better than that of HDR.

C. Automatic Navigation

When we take the robot to any new environment, by two-stage learning (one stage for HOSM and the other for MHDR) the robot can walk in the correct direction by itself. In this experiment, the input image size is 160 × 120, only one eye is used, one layer of HOSM is considered, and the number of feature components is 30; that means the input dimension of MHDR is 30. The required output is four classes: go ahead, go back, go right, and go left. The training speed is 10 frames/s for HOSM and 60 frames/s for MHDR, so the system can work in real time. Testing is faster than training, and the correct rate can reach 99% if there is no noise or machine error. In the real world the robot also walks correctly.

VI. CONCLUSION

Based on our two-stage learning strategy proposed in 2003, this paper proposes two kinds of neural network models to implement the strategy. One is HOSM, which simulates the receptive field structure of the V1 area of the human brain by unsupervised learning. The other is MHDR, which modifies the HDR tree of [6]. The two models are combined into a brain-like system, which has been used on a mobile robot. The eyes of the robot can track its interesting target by the two-stage learning strategy; the robot can go along a path it has experienced before and avoid a person if someone stands in front of it. These models and learning algorithms can work in real time.

The next step is to add the ears' information to train the robot and to perform more complex tasks, letting the robot become cleverer.

ACKNOWLEDGMENT

This research is supported by the National Natural Science Foundation of China (NSF 60171036) and the Shanghai Science and Technology Committee (No. 045115020).

REFERENCES

[1] Olshausen, B.A. and Field, D.J., "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-609, 1996.
[2] Blakemore, C. and Cooper, G.F., "Development of the brain depends on the visual environment," Nature, vol. 228, pp. 477-478, 1970.
[3] Zhang, L.M. and Mei, J.F., "Shaping up simple cell's receptive field of animal vision by ICA and its application in navigation," Neural Networks, vol. 16, pp. 609-615, 2003.
[4] Weng, J., Zhang, Y. and Hwang, W.S., "Candid covariance-free incremental principal component analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, pp. 1034-1040, 2003.
[5] Hubel, D. and Wiesel, T., "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," J. of Physiology, vol. 160, pp. 106-154, 1962.
[6] Hwang, W.-S. and Weng, J., "Hierarchical discriminant regression," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 1277-1293, 2000.


No. of training vs. test    Average error rate: our method vs. HDR    Time cost of training/test: MHDR    HDR
1061 vs. 200                2.62% vs. 4.88%                           5.741 s / 1.450 ms                  5.092 s / 1.750 ms
961 vs. 300                 2.50% vs. 3.25%                           5.537 s / 1.500 ms                  5.077 s / 1.667 ms
861 vs. 400                 3.19% vs. 4.50%                           4.977 s / 1.600 ms                  4.813 s / 1.500 ms
761 vs. 500                 3.50% vs. 5.20%                           5.037 s / 1.700 ms                  4.873 s / 1.800 ms