
    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

0-7803-7508-4/02/$17.00 © 2002 IEEE


KNOWLEDGE-INCREASABLE LEARNING BEHAVIORS RESEARCH OF NEURAL FIELD

    SI-WEI LUO, JIN-WEI WEN, HUA HUANG

    Institute of Computer Science, Beijing Northern Jiaotong University, Beijing, China, 100044

    E-MAIL: [email protected]

Abstract: In a hierarchical set of systems, a lower-order system is included in the parameter space of a larger one as a subset. Such a parameter space has rich geometrical structures that are responsible for the dynamic behaviors of learning. Based on a theoretical analysis using information geometry and differential manifolds, this paper studies the knowledge-increasable learning behaviors of the neural field and presents a layered Knowledge-Increasable Artificial Neural Network (KIANN) model, which is knowledge-increasable and structure-extendible. The method helps to explain the transformation mechanism of the human recognition system and to understand the theory of the global architecture of neural networks.

Keywords: Information geometry; Artificial neural network (ANN); Neural field; Parallel processing; Knowledge-increasable

    1 Introduction

Currently, the main problems affecting the development of Artificial Neural Networks (ANNs) are the following. On one hand, the learning model currently used in ANNs requires that the learning procedure be accomplished all at once, which differs from the step-by-step learning of humans. Thus the scale of an ANN increases as the pattern set grows, which in turn increases time consumption and hardware requirements. On the other hand, previously learned patterns are destroyed when an ANN learns new patterns, so the network cannot inherit and accumulate knowledge.

The aim of applying ANNs in practice is to process large-scale pattern sets simply and rapidly. The research objective of neural field theory is to understand the transformation mechanism, dynamical behavior, capability and limitations of neural networks through the study of their global structure. By analyzing the dynamic learning behaviors of the neural field, we can explain the transformation mechanism of the human recognition system and understand the theory of the global architecture of neural networks.

Information geometry, which originated from the geometric study of the manifold of probability distributions (Amari [1][2]), has been successfully applied to many fields. It studies the intrinsic geometrical structure of the manifold formed by a family of probability distributions. Amari developed a theory of dual structures and unified these theories in a dual differential-geometrical framework. Information geometry has so far been used not only for the mathematical foundations of statistical inference but also in information theory, neural networks, system theory, mathematical programming and other areas. It is important to study the properties that a family of probability distributions possesses as a whole, since a parameterized statistical model constitutes such a family. In many cases the family forms a geometrical manifold with intrinsic structures such as a Riemannian metric, affine connections, and divergences between pairs of points. These intrinsic geometrical structures represent important properties of the family, which are not mere aggregations of the properties of the individual distributions but arise from the mutual relations among them. Using information geometry, the intrinsic geometrical properties of hierarchical structures and their invariant decomposition can be elucidated; it also explains the approximation process of neural network learning. As a useful mathematical tool, information geometry plays an important role in the analysis of neural field theory.
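To make the metric and divergence just mentioned concrete, here is a minimal numeric sketch (our illustration, not from the paper) using the one-parameter Bernoulli family:

```python
import numpy as np

# Bernoulli family p(x; mu) = mu^x (1-mu)^(1-x), 0 < mu < 1.
# Its Fisher information g(mu) is the Riemannian metric on the manifold,
# and the KL divergence separates two member distributions (two points).

def fisher_info(mu):
    """Fisher information of the Bernoulli family at parameter mu."""
    return 1.0 / (mu * (1.0 - mu))

def kl_divergence(mu1, mu2):
    """D(p_mu1 || p_mu2) between two Bernoulli distributions."""
    return (mu1 * np.log(mu1 / mu2)
            + (1.0 - mu1) * np.log((1.0 - mu1) / (1.0 - mu2)))

mu1, mu2 = 0.30, 0.31
# Locally, D(p||q) ~ 0.5 * g(mu1) * (mu2 - mu1)^2: the divergence induces
# the Fisher metric, which is the sense in which the geometry is intrinsic.
print(kl_divergence(mu1, mu2), 0.5 * fisher_info(mu1) * (mu2 - mu1) ** 2)
```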

A family of neural networks forms a neuromanifold (Amari [3][4]), and the structure of a layered ANN can be described by a dually flat manifold structure. By analyzing the knowledge-increasable learning ability of the neural field, we present a layered ANN model that uses the knowledge-increasable and knowledge-inheritable approach of Siwei Luo [5] to implement the increase and accumulation of acquired knowledge and to solve the problems of system extension and complex data processing. We put forward a method to construct a large-scale ANN (LSANN) without destroying the functionality of the existing one, implemented by embedding independent ANNs that have different functionality. Such ANNs have shown the ability to inherit the patterns already learned, and we can use this method to construct LSANNs appropriate for large-scale pattern set processing. This solves the difficulties in constructing LSANNs.

2 Neural Field Knowledge-increasable Learning Behaviors

  • 8/3/2019 Si-Wei Luo et al- Knowledge-increasable learning behaviors research of neural field

    2/5

    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

    587

2.1 Information Geometry Theory of Neural Field Learning

According to Amari [3][4], a set of systems with modifiable parameters forms a parameter space in which the parameters play the role of a coordinate system. A statistical model corresponds to a geometrical figure, a distribution corresponds to a point, and parameters correspond to coordinates. A geometrical theory of statistics has the property that, whatever coordinates are used, the statements about relations among the points and figures remain the same [7]. A family of neural network systems forms a neuromanifold. Amari [3][4][6] has proved that most neural network systems can be expressed by a probability distribution that can be included in a space with a large number of parameters, as shown in Fig. 1. In the following section, we give explanations.

Fig.1. Probability distribution description in a manifold

In the theory of neural field learning, the learning rule is usually a mapping from some data (sample) z ∈ Z to some parameter w, where z = (x, y) consists of an input and an output. Usually, in the statistical sense, x has a fixed distribution, which means that the distribution of z is restricted to a submanifold. Training data are assumed to be sampled from an environmental (true) distribution p ∈ P. The space P consists of all probability distributions over the sample space Z and forms a differentiable manifold. A trained network represents an estimated distribution q(z | w). The set Q of all the distributions representable by a network is the computation model and is usually a smooth submanifold of P; the weights are of course coordinates. The manifold Q is equipped with the Riemannian metric (g_ij) given by the Fisher information matrix and, moreover, with dually flat affine connections (Amari [1]). There are two fundamental coordinate systems, θ and η, in it. The neural network structure can be expressed by a family of exponential probability distributions [2][3][4].
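As a concrete instance of the two dual coordinate systems θ and η, here is a minimal sketch under the assumption of a Bernoulli model (the simplest exponential family; our choice of example, not the paper's network):

```python
import numpy as np

# In a dually flat exponential family, the natural parameter theta (e-flat
# coordinates) and the expectation parameter eta (m-flat coordinates) are
# Legendre duals through the log-partition function psi.

def psi(theta):
    """Log-partition function of the Bernoulli family."""
    return np.log(1.0 + np.exp(theta))

def theta_to_eta(theta):
    """eta = psi'(theta): natural -> expectation coordinates."""
    return 1.0 / (1.0 + np.exp(-theta))

def eta_to_theta(eta):
    """Inverse Legendre map: expectation -> natural coordinates."""
    return np.log(eta / (1.0 - eta))

theta = 1.5
eta = theta_to_eta(theta)
assert np.isclose(eta_to_theta(eta), theta)  # the two charts are consistent
# The Fisher metric is the Hessian of psi: g(theta) = eta * (1 - eta).
print(theta, eta, eta * (1.0 - eta))
```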

Fig.2. Neural field structure map space

2.2 Neural Field Knowledge-increasable Learning Behaviors Analysis

The aim of applying ANNs in practice is to handle learning problems with great amounts of data. The main method is to divide a task into small tasks by the principle of divide-and-rule, or to build a network composed of several known artificial neural networks that cooperate on the learning problem. The Hierarchical Mixture-of-Experts (HME [8][9]) is a classical model of this kind: it can yield solutions to complex problems and improve on the performance of a single network. But in some cases the number of parameters in HME models may be too large, resulting in potentially overly complex functions, while in other cases there may be too few parameters, resulting in under-fitting of the data. Indeed, given an arbitrarily complex function, there is no reason to expect that increasing the number of samples from the function should require more parameters to model it, and such a rule is therefore also likely to produce models that over-fit the data. Another disadvantage is that when the system is extended (pattern classes or training samples are added), all experts, including the gating function, need retraining, which destroys acquired knowledge. If there are large quantities of training data, retraining takes a long time and wastes the knowledge already acquired. To overcome these disadvantages, knowledge-increasable and knowledge-inheritable ability is necessary, which means the system can be extended dynamically. We can construct a modular parallel ANN to implement the increase and accumulation of acquired knowledge and to solve the problems of system extension and complex data processing. Learning as the human brain does, the system can be extended dynamically. In the following section, we analyze the neural field knowledge-increasable learning behaviors of a function-modular ANN system.

  • 8/3/2019 Si-Wei Luo et al- Knowledge-increasable learning behaviors research of neural field

    3/5

    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

    588

    Fig.3. Sketch map of modular NN system

Fig.4. Neural field learning structure map

Fig. 3 is the sketch map of the function-modular NN system. It consists of two NN modules, NN1 and NN2. CM is the control component of the system; it can be an NN module or another component with decision-making ability. According to neural field learning theory, the structure map of Fig. 3 is shown in Fig. 4, a sketch of function-modular neural field learning. S is the statistical model manifold, S = {P(z | θ)} with z = (x, y). {(x_i, y_i), i = 1, 2, ..., N} is the training sample set, which is divided into A and A′: A is the learning process of NN1 on its training subset, and A′ is the learning process of NN2 on its subset. X and X′ are the mapping operations that project NN1 and NN2 into the manifold S, yielding the points f_0 and f_0′. f stands for the projection point in S of the real system, which is too complex to obtain directly; normally it is approximated by f_0 cooperating with f_0′ under the control of the CM component.

The function-modular system has the ability of knowledge-increasable learning. In Fig. 3, after the structures of f_0 and f_0′ are determined, the system can be extended dynamically in two situations: an existing component changes, or new function-modular components are embedded.
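A minimal sketch of this cooperation (assumed interfaces, not the authors' implementation): each module holds an estimated distribution, here a 1-D Gaussian fitted to its own training subset, playing the role of f_0 and f_0′, and CM routes each input to the module under which it is most probable, so the pair jointly approximates the full system f.

```python
import numpy as np

class GaussModule:
    """One function module: a toy density model standing in for an NN."""
    def __init__(self, label):
        self.label = label

    def fit(self, samples):                      # training subset A or A'
        self.mu, self.sigma = np.mean(samples), np.std(samples) + 1e-9

    def log_likelihood(self, x):                 # this module's confidence on x
        return -0.5 * ((x - self.mu) / self.sigma) ** 2 - np.log(self.sigma)

class ControlModule:
    """CM: the decision-making component that selects among modules."""
    def __init__(self, modules):
        self.modules = modules

    def classify(self, x):
        return max(self.modules, key=lambda m: m.log_likelihood(x)).label

nn1, nn2 = GaussModule("class-1"), GaussModule("class-2")
nn1.fit(np.random.normal(0.0, 1.0, 200))         # subset A  -> point f_0
nn2.fit(np.random.normal(5.0, 1.0, 200))         # subset A' -> point f_0'
cm = ControlModule([nn1, nn2])
print(cm.classify(0.2), cm.classify(4.8))        # inputs routed by CM
```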

Fig.5. Sketch map of modular NN system: existing component changes

Comparing Fig. 5 with Fig. 3, f_0′ has changed into f_0″ while f_0 remains unchanged. This is the first situation: new training samples are added to NN2, so NN2 needs retraining, while no change is needed for NN1. In Fig. 6, f is approximated by f_0 cooperating with f_0″ under the control of the CM component. Because only training samples were added, the position of the projection point of f in S is unchanged.

Fig.6. Neural field learning structure map: existing component changes

Fig.7. Sketch map of modular NN: new function module embedded

Fig.8. Neural field learning structure map: new function module embedded

The other situation of system extension is shown in Fig. 7 and Fig. 8. Comparing Fig. 7 with Fig. 3, a new function module NN3 has been embedded, so the system now has three function modules. In Fig. 8, f_0 and f_0′ remain unchanged, meaning that no change is needed for NN1 and NN2. f_B stands for the projection point in S of the newly embedded module NN3. The projection point of the real system in the manifold S is now f′, which is approximated by f_0, f_0′ and f_B under the control of the CM component.
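Continuing the sketch above: embedding a new module NN3 (projection point f_B) only appends one entry to CM's module list; NN1 and NN2 are untouched, which is exactly the knowledge-inheriting extension described here.

```python
nn3 = GaussModule("class-3")
nn3.fit(np.random.normal(10.0, 1.0, 200))        # new knowledge only
cm.modules.append(nn3)                           # system extended dynamically
print(cm.classify(9.7))                          # the new class is recognized
```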

From the analysis of neural field knowledge-increasable learning behaviors above, we can see that the key to knowledge-increasable learning is the CM component. As long as the CM component can adapt to changes of situation, the system can deal with complex problems by embedding new function modules. That is, learning as the human brain does, the system achieves knowledge-increasable and knowledge-inheritable function by extension, which amounts to embedding new function modules (knowledge) into the system. The following section gives an example of a system with knowledge-increasable ability.

    3 Knowledge-increasable ANN system

Fig.9. Large-scale Knowledge-increasable ANN

The Large-Scale Knowledge-Increasable ANN (LSKIANN) is similar to a composite neural network in terms of network structure. An LSKIANN consists of several sub-networks and a cooperation module called CM. The main idea of LSKIANN for multi-class classification is to divide the training sample set into several sub-sets, where the intersection of any two sub-sets is empty and the union of all sub-sets is the whole training set. Each sub-network has the same structure and learning parameters, but a different training set. The sub-networks compute on a sample simultaneously, but only the one that was trained with that sample's class can recognize it correctly. The function of the CM module is to select an optimized result as the system output according to a similarity rule; here we use the maximal similarity degree as the selection standard.

In the implementation of the LSKIANN, the sub-networks are implemented with the BP algorithm, while the CM module uses similarity comparison. The input of the CM module is connected to the output of each sub-network, which is where the source data for the CM module's judgment comes from.
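A minimal sketch of this data flow (a toy similarity measure and per-class prototypes standing in for the BP sub-networks' outputs; assumed shapes, not the authors' code):

```python
import numpy as np

def partition(labels, n_groups):
    """Disjoint split of sample indices: the union of all sub-sets is the
    full set and the intersection of any two sub-sets is empty."""
    order = np.argsort(labels)               # keep same-class samples together
    return np.array_split(order, n_groups)   # no index appears twice

def cm_select(x, candidate_classes, prototypes):
    """CM rule: among the classes proposed by the sub-networks, keep the one
    whose standard pattern is most similar to the input; here negative
    Euclidean distance plays the role of the similarity degree."""
    sims = [-np.linalg.norm(x - prototypes[c]) for c in candidate_classes]
    return candidate_classes[int(np.argmax(sims))]

prototypes = {0: np.zeros(4), 1: np.ones(4)}
print(cm_select(np.full(4, 0.8), [0, 1], prototypes))   # -> 1
```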

The following are the merits of LSKIANN in solving the multi-class classification problem of a giant pattern set.

1. By the method of grouping, each sub-network needs to be trained only on a sub-set rather than on the whole set. By increasing the number of sub-networks we can reduce the scale of the training set of each sub-network. This not only accelerates computation and improves the recognition rate, but also makes the system easy to extend.

2. We can train the sub-networks with parallel processing technologies (see the sketch after this list), which reduces the training time of the whole system as much as possible.

3. The network is flexible and can be expanded easily. When the number of problem classes increases, the only thing to be done is to add a new sub-network, and only the newly added sub-network needs to be trained with the added training samples; the others remain unchanged. The system can learn as the human brain does, achieving knowledge accumulation by embedding new function modules.
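Merit 2 can be realized with standard multiprocessing. A minimal sketch, with a toy stand-in trainer that fits per-class mean prototypes instead of running BP, just to keep the example self-contained:

```python
from multiprocessing import Pool
import numpy as np

def train_subnet(subset):
    """Train one sub-network on its own disjoint subset and return it."""
    patterns, labels = subset                    # this sub-network's data only
    return {c: patterns[labels == c].mean(axis=0) for c in np.unique(labels)}

def train_all(subsets, workers=4):
    with Pool(workers) as pool:                  # one process per sub-network
        return pool.map(train_subnet, subsets)

if __name__ == "__main__":
    data = [(np.random.rand(60, 8), np.repeat(np.arange(3), 20))
            for _ in range(4)]                   # four disjoint training subsets
    nets = train_all(data)
    print(len(nets), sorted(nets[0]))
```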

    4 Experiment and Conclusion

We use Chinese character recognition to test our system. The main idea is to divide the whole Chinese character set Q into several sub-sets Q_i, i ∈ [1, n], where the intersection of any two sub-sets is empty and the union of all sub-sets is the whole set. Recognition network NN_i is able to recognize any character of sub-set Q_i, and the control network CM determines which result is the most probable: it enables the sub-network NN_i that outputs the result of maximal similarity and lets it send the result to the output system; the outputs of the other sub-networks are discarded.

The pattern set consists of two parts. One is the second library of the Chinese character set, which includes 6763 Chinese characters. According to our method [10][11], they were divided into 59 groups. The following table shows the group number (GN) and the character number (CN) in every group. A dynamic grouping method was used in order to take the similarity of characters into account.

Table 1 Chinese characters grouping table

GN  1    2    3    4    5    6    7    8    9    10
CN  100  110  140  115  115  115  115  120  125  140
GN  11   12   13   14   15   16   17   18   19   20
CN  100  110  100  100  145  100  115  135  105  105
GN  21   22   23   24   25   26   27   28   29   30
CN  85   115  105  100  105  120  115  115  130  110
GN  31   32   33   34   35   36   37   38   39   40
CN  115  100  115  145  125  120  225  135  110  100
GN  41   42   43   44   45   46   47   48   49   50
CN  110  120  125  125  115  130  120  90   120  130
GN  51   52   53   54   55   56   57   58   59
CN  95   80   135  80   125  95   100  85   83

After the sub-networks were built and the grouped training finished, test samples were input for testing. According to the decision of the cooperation module, here the maximal similarity degree, we obtain the final recognition result.


  • 8/3/2019 Si-Wei Luo et al- Knowledge-increasable learning behaviors research of neural field

    5/5

    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

    590

Fig. 10 shows a standard character recognition result, and Fig. 11 shows the recognition result for a noisy test sample.

Fig.10. Standard character recognition result

Fig.11. Recognition result of a noisy test sample

We applied the LSKIANN structure in our experiments. In this structure we discard the control network of the PNN architecture, so parallel technologies can be employed in the training procedure of each sub-network, which is constructed with the BP algorithm. In recognition, once a BP network has finished its computation, it calculates the similarity degree between the standard pattern represented by its result and the input pattern, and then reports the similarity to the CM module for comparison. We believe this architecture fully develops the parallelism of the computation and overcomes the structural faults of the PNN architecture. Obviously, when a new sub-network is added to the system, the original networks need not change. By this method, the system can be extended dynamically with knowledge-inheritable and knowledge-accumulating ability.

Based on a theoretical analysis using information geometry and differential manifolds, this paper has studied the knowledge-increasable learning behaviors of the neural field and presented a layered Artificial Neural Network (ANN) model that is knowledge-increasable and structure-extendible. The method helps to explain the transformation mechanism of the human recognition system and to understand the theory of the global architecture of neural networks.

    Acknowledgements

This work is supported by the National Natural Science Foundation of China under contract 69973002.

    References

[1] Amari S., Differential-Geometrical Methods in Statistics, second edition, Lecture Notes in Statistics No. 28, Springer-Verlag, Berlin, 1990.

[2] Amari S., Dualistic geometry of the manifold of higher-order neurons, Neural Networks, 4(4): 443-451, 1991.

[3] Amari S., Kurata K. and Nagaoka H., Information geometry of Boltzmann machines, IEEE Transactions on Neural Networks, 3(2): 260-271, 1992.

[4] Amari S., Information geometry of the EM and em algorithms for neural networks, Neural Networks, 8(9): 1379-1408, 1995.

[5] Luo Siwei, Large pattern set processing of artificial neural network, Signal and Image Processing, ACTA Press, USA, Nov 2000, 423-429.

[6] Amari S., Ozeki T. and Park H., Information geometry of adaptive systems, Symposium 2000 on Adaptive Systems for Signal Processing, Communications and Control, Oct 2000.

[7] Zhu Huaiyu, On the mathematical foundation of learning algorithms, submitted to Machine Learning, 1996.

[8] Jordan M.I. and Jacobs R.A., Hierarchical mixtures of experts and the EM algorithm, Neural Computation, 6(2): 181-214, 1994.

[9] Jacobs R.A., Jordan M.I., Nowlan S.J. and Hinton G.E., Adaptive mixtures of local experts, Neural Computation, 3(1): 79-87, 1991.

[10] Luo Siwei and Han Zhen, Large scale neural networks and its application on recognition of Chinese characters, Fourth International Conference on Signal Processing Proceedings (ICSP'98), 1998, 1273-1277.

[11] Luo Siwei, Han Zhen and Zhang Aijun, Microcomputer cluster parallel processing system and its application on large scale artificial neural network, Second Sino-German Workshop on Advanced Parallel Processing Technologies (APPT'97), 1997, 83-87.