Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002
0-7803-7508-4/02/$17.00 2002 IEEE
KNOWLEDGE-INCREASABLE LEARNING BEHAVIORS RESEARCH OF
NEURAL FIELD
SI-WEI LUO, JIN-WEI WEN, HUA HUANG
Institute of Computer Science, Beijing Northern Jiaotong University, Beijing, China, 100044
E-MAIL: [email protected]
Abstract: In a hierarchical set of systems, a lower-order system is included in the parameter space of a larger one as a subset. Such a parameter space has rich geometrical structures that are responsible for the dynamic behaviors of learning. Based on a theoretical analysis using information geometry and differential manifolds, this paper studies the knowledge-increasable learning behaviors of the neural field and presents a layered knowledge-increasable Artificial Neural Network (KIANN) model, which is knowledge-increasable and structure-extendible. The method helps to explain the transformation mechanism of the human recognition system and to understand the theory of the global architecture of neural networks.
Keywords: Information geometry; Artificial neural network (ANN); Neural field; Parallel processing; Knowledge-increasable
1 Introduction
Currently, the main problems affecting the development of Artificial Neural Networks (ANNs) are the following. On one hand, the learning model currently used in ANNs requires that the learning procedure be accomplished all at once, which differs from the step-by-step learning of humans. Thus the scale of an ANN increases as the pattern set grows, which in turn results in growing time consumption and high demands on the computing device. On the other hand, previously learned patterns are destroyed when an ANN learns new patterns, so the aim of inheriting and accumulating knowledge cannot be realized. The aim of practical ANNs is to process large-scale pattern sets simply and rapidly. The research object of neural field theory is to understand the transformation mechanism, dynamical behavior, capability, and limitations of neural networks through the study of their global structure. By analyzing the dynamic learning behaviors of the neural field, we can explain the transformation mechanism of the human recognition system and understand the theory of the global architecture of neural networks.
Information geometry, which originated from the geometric study of the manifold of probability distributions (Amari [1][2]), has been applied successfully to many fields. It studies the intrinsic geometrical structure of the manifold formed by a family of probability distributions. Amari developed a theory of dual structures and unified these theories in the dual differential-geometrical framework. Information geometry has so far been used not only for the mathematical foundations of statistical inference but has also been applied to information theory, neural networks, system theory, mathematical programming, and other areas. It is important to study the properties that a family of probability distributions possesses as a whole, since a parameterized statistical model constitutes a family of distributions. In many cases such a family forms a geometrical manifold, carrying geometric structures such as the Riemannian metric, affine connections, and divergences between two points. These intrinsic geometrical structures represent important properties of the family; they are not mere aggregations of the properties of the individual distributions but arise from the mutual relations among the distributions. Using information geometry, the intrinsic geometrical structure of hierarchical systems and their invariant decomposition can be elucidated. It also provides an explanation of the approximation process of neural network learning. As a useful mathematical tool, information geometry plays an important role in the analysis and research of neural field theory.
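As a small numerical illustration of the Fisher information underlying these geometric structures (our own example, not part of the paper's system), the sketch below computes the Fisher information of the Bernoulli family as the variance of the score function and checks it against the well-known closed form 1/(p(1-p)):

```python
import numpy as np

def fisher_information_bernoulli(p):
    # Score function d/dp log P(x; p) for x in {0, 1}:
    #   x = 1: d/dp log p       =  1 / p
    #   x = 0: d/dp log (1 - p) = -1 / (1 - p)
    # Fisher information is the variance of the score under P(x; p)
    # (the score has zero mean, so it equals the second moment).
    scores = np.array([-1.0 / (1.0 - p), 1.0 / p])
    probs = np.array([1.0 - p, p])
    return float(np.sum(probs * scores**2))

# Agrees with the analytic value 1 / (p (1 - p)) for the Bernoulli family.
p = 0.3
assert abs(fisher_information_bernoulli(p) - 1.0 / (p * (1.0 - p))) < 1e-12
```

The same variance-of-score computation generalizes entrywise to the Fisher information matrix g_ij of a multi-parameter family.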
A family of neural networks forms a neuromanifold (Amari [3,4]). The structure of a layered ANN can be described by a dually flat manifold structure. By analyzing the knowledge-increasable learning ability of the neural field, we present a layered ANN model, following the knowledge-increasable and knowledge-inheritable approach of Siwei Luo [5], to implement the increase and accumulation of acquired knowledge and to solve the problems of system extension and complex data processing. We put forward a method to construct a large-scale ANN (LSANN) without destroying the functionality of the existing one. This is implemented by embedding independent ANNs that have different functionality. Such ANNs have shown the ability to inherit learned patterns, and we can use this method to construct LSANNs that are appropriate for large-scale pattern set processing, solving the difficulties in constructing LSANNs.
2 Neural Field Knowledge-increasable Learning Behaviors
2.1 Information Geometry Theory of Neural Field Learning
According to Amari [3,4], the set of systems with modifiable parameters forms a parameter space in which the parameters play the role of a coordinate system. A statistical model corresponds to a geometrical figure, a distribution corresponds to a point, and parameters correspond to coordinates. A geometrical theory of statistics has the property that, whatever coordinates are used, statements about the relations among points and figures remain the same [7]. A family of neural network systems forms a neuromanifold. Amari [3,4,6] has proved that most neural network systems can be expressed by a probability distribution that can be included in a space with a large number of parameters, as shown in Fig. 1. Explanations are given in the following section.
Fig. 1. Probability distribution description in a manifold
In the theory of neural field learning, the learning rule is usually a mapping from some data (sample) z in Z to some parameter w. A sample z = (x, y) is composed of an input and an output. Usually, in the statistical sense, x has a fixed distribution, which means that the distribution of z is restricted to a submanifold. Training data are assumed to be sampled from an environmental, or true, distribution p in P. The space P is composed of all probability distributions over the sample space Z and forms a differentiable manifold. The trained network represents an estimated distribution q(z | w). The set Q of all distributions representable by a network is the computation model and is usually a smooth submanifold of P, with the weights serving as coordinates. The manifold Q is equipped with the Riemannian metric (g_ij) given by the Fisher information matrix and, moreover, with dually flat affine connections (Amari [1]); it carries two fundamental coordinate systems, θ and η. A neural network structure can be expressed by a family of exponential probability distributions [2,3,4].
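For a concrete instance of the metric (g_ij), consider an illustrative "network" (our own example, not from the paper) whose output is y = w·x plus Gaussian noise; it defines the conditional model q(y | x, w), and for a fixed input distribution its Fisher information matrix reduces to E[x xᵀ]/σ². The sketch below estimates the score covariance by Monte Carlo and compares it with that expression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative network: y = w . x + noise, noise ~ N(0, sigma^2).
# This defines q(y | x, w); for a fixed input distribution the
# Fisher information matrix is G = E[x x^T] / sigma^2.
sigma = 0.5
w = np.array([1.0, -2.0])
X = rng.normal(size=(100_000, 2))                  # inputs from a fixed distribution
y = X @ w + rng.normal(scale=sigma, size=len(X))   # outputs sampled from the model

# Score of log q(y | x, w) with respect to w: (y - w.x) x / sigma^2.
scores = ((y - X @ w) / sigma**2)[:, None] * X

# Monte Carlo estimate of the score covariance vs. the closed form.
G_mc = scores.T @ scores / len(X)
G_closed = X.T @ X / len(X) / sigma**2

assert np.allclose(G_mc, G_closed, atol=0.2)
```

The matrix G plays the role of the Riemannian metric (g_ij) on the weight coordinates of the neuromanifold Q.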
Fig. 2. Neural field structure map space
2.2 Neural Field Knowledge-increasable Learning Behaviors Analysis
The aim of practical ANNs is to process large amounts of data. The main method is to divide a task into small tasks by the principle of divide-and-rule, or to build a network composed of several known artificial neural networks that cooperate to solve a learning problem over a great amount of data. The Hierarchical Mixture-of-Experts (HME [8,9]) is a classical model of this kind: it can yield a solution to a complex problem and improve on the performance of a single network. But in some cases the number of parameters in HME models may be too large, resulting in potentially overly complex functions, whilst in other cases there may be too few parameters, resulting in under-fitting of the data. Indeed, given an arbitrarily complex function, there is no reason to expect that increasing the number of samples from the function should require more parameters to model it, and such a rule is therefore also likely to produce models that tend to over-fit the data. Another disadvantage is that when the system is extended (by adding pattern classes or training samples), all experts, including the gating function, need retraining, which destroys acquired knowledge. If there are large quantities of training data, retraining takes a long time and wastes acquired knowledge. To overcome these disadvantages, knowledge-increasable and knowledge-inheritable ability is necessary, which means the system can be extended dynamically. We can construct a modular parallel ANN to implement the increase and accumulation of acquired knowledge and to solve the problems of system extension and complex data processing. Learning as the human brain does, the system can be extended dynamically. In the following section, we analyze the neural field knowledge-increasable learning behaviors of a function-modular ANN system.
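The function-modular arrangement can be sketched in code. In this toy version (our own illustration; the nearest-centroid "sub-networks" and all names are stand-ins, not the paper's BP networks), each module answers only for its own pattern subset, and a control component CM keeps whichever module reports the highest similarity:

```python
import numpy as np

class Module:
    """A hypothetical sub-network: nearest-centroid matcher over its own classes."""
    def __init__(self, patterns):
        self.patterns = patterns              # {label: prototype vector}

    def respond(self, x):
        # Return (best label, similarity) over this module's classes only;
        # similarity is negative Euclidean distance, so larger is better.
        sims = {lab: -np.linalg.norm(x - p) for lab, p in self.patterns.items()}
        lab = max(sims, key=sims.get)
        return lab, sims[lab]

def cm_decide(modules, x):
    # CM component: compare the modules' similarity reports, keep the best.
    answers = [m.respond(x) for m in modules]
    return max(answers, key=lambda a: a[1])[0]

nn1 = Module({"A": np.array([0.0, 0.0]), "B": np.array([1.0, 0.0])})
nn2 = Module({"C": np.array([0.0, 1.0]), "D": np.array([1.0, 1.0])})

print(cm_decide([nn1, nn2], np.array([0.9, 1.1])))   # nearest prototype is "D"
```

Note that extending the system amounts to appending a new Module to the list handed to cm_decide; the existing modules are untouched, which is the knowledge-inheritable behavior analyzed below.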
Fig. 3. Sketch map of the modular NN system

Fig. 4. Neural field learning structure map
Fig. 3 is the sketch map of a function-modular NN system. It consists of two NN modules, NN1 and NN2. CM is the control component of the system; it can be an NN module or another component with decision-making ability. According to neural field learning theory, the structure map of Fig. 3 is shown in Fig. 4, a function-modular neural field learning sketch. S is the statistical model manifold, S = {P((y, z) | x; θ)}. The training sample set {(x_i, y_i), i = 1, 2, ..., N} is divided into A and A': A is the training sample set of the learning process of NN1, and A' is that of NN2. X and X' are the map operations that project NN1 and NN2 into the manifold S, yielding the points f0 and f0'. f stands for the projection point in S of the real system, which is too complex to obtain directly; normally f is approximated by f0 cooperating with f0' under the control of the CM component.

The function-modular system has the ability of knowledge-increasable learning. In Fig. 3, after the structures f0 and f0' are determined, the system can be extended dynamically in two situations: an existing component changes, or a new function-modular component is embedded.
Fig. 5. Sketch map of the modular NN system: existing components change
Comparing Fig. 5 with Fig. 3, f0' has changed into f0'', but f0 remains unchanged. This is the first situation: new training samples are added to NN2, so NN2 needs retraining, while no change is needed for NN1. In Fig. 6, f is approximated by f0 cooperating with f0'' under the control of the CM component. Because only training samples were added, the position of the projection point of f in S is unchanged.
Fig. 6. Neural field learning structure map: existing components change

Fig. 7. Sketch map of the modular NN: new function module embedded

Fig. 8. Neural field learning structure map: new function module embedded
The other situation of system extension is shown in Fig. 7 and Fig. 8. Comparing Fig. 7 with Fig. 3, a new function module NN3 has been embedded, so the system has three function modules. In Fig. 8, f0 and f0' remain unchanged; that means no change is needed for NN1 and NN2. fB stands for the projection point in S of the newly embedded module NN3. The projection point of the real system in the manifold S is f', and f' is approximated by f0, f0', and fB under the control of the CM component.
From the above analysis of neural field knowledge-increasable learning behaviors, we can see that the key point of system knowledge-increasable learning is the CM component. As long as the CM component can adapt to the change of situation, the system can deal with complex problems by embedding new function modules. That is, learning as the human brain does, the system can achieve knowledge-increasable and knowledge-inheritable function through an extension that amounts to embedding a new function module (knowledge) into the system. The following section gives an example of a system with knowledge-increasable ability.
3 Knowledge-increasable ANN System
Fig. 9. Large-scale knowledge-increasable ANN
A large-scale knowledge-increasable ANN (LSKIANN) is similar to a composite neural network as far as network structure is concerned. An LSKIANN consists of several sub-networks and a cooperation module called CM. The main idea of LSKIANN for multi-class classification is to divide the training sample set into several sub-sets, where the intersection of any two sub-sets is empty and the union of all sub-sets is the whole training set. Each sub-network has the same structure and learning parameters, but a different training set. The sub-networks compute on a training sample simultaneously, but only the one that has been trained with the sample can recognize it correctly. The function of the CM module is to select an optimized result as the system result according to a similarity rule; here, we use the maximum similarity degree as the selection standard.

In the implementation of the LSANN, the sub-networks are implemented with the BP algorithm, while the CM module uses similarity comparison. The input of the CM module is connected with the output of each sub-network, from which the source data for the CM module's judgment comes.
The merits of LSKIANN for the multi-class classification of giant pattern sets are the following.

1. By the grouping method, each sub-network needs to be trained only with a sub-set rather than the whole set. By increasing the number of sub-networks we can reduce the scale of each sub-network's training set, which not only accelerates calculation and improves the recognition rate but also allows the system to be extended easily and freely.

2. We can train the sub-networks with parallel processing technologies, which lessens the training time of the whole system as much as possible.

3. The network is flexible and can be expanded easily. When the number of problem classes increases, the only thing to be done is to add a new sub-network, and only the newly added sub-network needs to be trained with the added training samples; the others remain unchanged. The system can learn as the human brain does, achieving knowledge accumulation by embedding new function modules.
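Because the sub-networks share nothing (their training sub-sets are disjoint), their training runs are independent and can proceed in parallel, as merit 2 states. A minimal sketch with a thread pool follows; train_subnetwork is a hypothetical stand-in for the BP training of one sub-network, not the paper's actual training routine:

```python
from concurrent.futures import ThreadPoolExecutor

def train_subnetwork(subset):
    # Placeholder "training": record which labels this sub-network covers.
    return sorted({label for label, _ in subset})

subsets = [
    [("A", [0.0]), ("B", [1.0])],   # training sub-set for NN1
    [("C", [2.0]), ("D", [3.0])],   # training sub-set for NN2
    [("E", [4.0])],                 # a newly added sub-set for NN3
]

# Each sub-set is handed to its own worker; no shared state is touched.
with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
    trained = list(pool.map(train_subnetwork, subsets))

print(trained)   # -> [['A', 'B'], ['C', 'D'], ['E']]
```

Adding the third sub-set above extends the system without re-running the first two training jobs, mirroring merit 3.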
4 Experiment and Conclusion
We use Chinese character recognition to test our system. The main idea is to divide the whole Chinese character set Q into several sub-sets Q_i, i in [1, n], where the intersection of any two sub-sets is empty while the union of all sub-sets is the whole set. Recognition network NN_i is able to recognize any character of sub-set Q_i, and the control network CM is used to determine which result is the most probable one: it enables the sub-network NN_i that outputs the maximum recognition result and lets it send the result to the output system, while the results of the other sub-networks are discarded.

The pattern set consists of two parts. One is the second-level library of the Chinese character set, which includes 6763 Chinese characters. According to our method [10,11], they were divided into 59 groups. The following table shows the group number (GN) and the character number (CN) of every group. A dynamic grouping method was used in order to take the similarity of characters into account.
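The defining property of the grouping, disjoint sub-sets whose union is the whole set, can be sketched as follows. This is our own illustration; group_patterns and its similarity_key parameter are hypothetical stand-ins for the paper's dynamic, similarity-aware grouping:

```python
def group_patterns(patterns, n_groups, similarity_key=None):
    # Sort so that similar items end up adjacent (similarity_key is a
    # hypothetical ordering criterion), then deal them round-robin into
    # n_groups disjoint groups whose union is the whole pattern set.
    ordered = sorted(patterns, key=similarity_key)
    return [ordered[i::n_groups] for i in range(n_groups)]

chars = list("abcdefghij")          # tiny stand-in for the 6763-character set
groups = group_patterns(chars, 3)

# Pairwise intersections are empty and the union is the whole set,
# exactly the property required of Q = Q_1 ∪ ... ∪ Q_n.
assert all(not (set(g1) & set(g2))
           for i, g1 in enumerate(groups) for g2 in groups[i + 1:])
assert set().union(*groups) == set(chars)
```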
Table 1. Chinese characters grouping table

GN:   1    2    3    4    5    6    7    8    9   10
CN:  100  110  140  115  115  115  115  120  125  140
GN:  11   12   13   14   15   16   17   18   19   20
CN:  100  110  100  100  145  100  115  135  105  105
GN:  21   22   23   24   25   26   27   28   29   30
CN:   85  115  105  100  105  120  115  115  130  110
GN:  31   32   33   34   35   36   37   38   39   40
CN:  115  100  115  145  125  120  225  135  110  100
GN:  41   42   43   44   45   46   47   48   49   50
CN:  110  120  125  125  115  130  120   90  120  130
GN:  51   52   53   54   55   56   57   58   59
CN:   95   80  135   80  125   95  100   85   83
After the sub-networks were built and the grouped training finished, test samples were input for testing. According to the decision of the cooperation module, here the maximum similarity degree, we obtain the final recognition result.
Fig. 10 shows a standard character recognition result, and Fig. 11 shows the recognition result for a noisy test sample.

Fig. 10. Standard character recognition result

Fig. 11. Recognition result for a noisy test sample
We applied the LSKIANN structure in our experiments. In this structure, we discard the control network of the PNN architecture, so parallel technologies can be employed in the training procedure of each sub-network, which is constructed with the BP algorithm. During recognition, once a BP network has finished its computation, it calculates the similarity degree between the standard pattern represented by its result and the input pattern, then reports the similarity result to the CM module for comparison. We believe that this architecture fully develops the parallelism of computation and overcomes the structural faults of the PNN architecture. Obviously, when a new sub-network is added to the system, the original networks need not change. By this method, the system can be extended dynamically with knowledge-inheritable and knowledge-accumulating ability.
Based on a theoretical analysis using information geometry and differential manifolds, this paper has studied the knowledge-increasable learning behaviors of the neural field and presented a layered Artificial Neural Network (ANN) model that is knowledge-increasable and structure-extendible. The method helps to explain the transformation mechanism of the human recognition system and to understand the theory of the global architecture of neural networks.
Acknowledgements
This work was supported by the National Natural Science Foundation under contract 69973002.
References
[1] Amari S., Differential-Geometrical Methods in Statistics, second edition, Lecture Notes in Statistics No. 28, Springer-Verlag, Berlin, 1990.
[2] Amari S., "Dualistic geometry of the manifold of higher-order neurons," Neural Networks, 4(4): 443-451, 1991.
[3] Amari S., Kurata K. and Nagaoka H., "Differential geometry of Boltzmann machines," IEEE Transactions on Neural Networks, 3(2): 260-271, 1992.
[4] Amari S., "Information geometry of the EM and em algorithm for neural networks," Neural Networks, 8(9): 1379-1408, 1995.
[5] Luo Siwei, "Large pattern set processing of artificial neural network," USA Signal and Image Processing ACTA2000, Nov. 2000, 423-429.
[6] Amari S., Ozeki T. and Park H., "Information geometry of adaptive systems," Symposium 2000 on Adaptive Systems for Signal Processing, Communication and Control, Oct. 2000.
[7] Zhu Huaiyu, "On the mathematical foundation of learning algorithms," submitted to Machine Learning, 1996.
[8] Jordan M. I. and Jacobs R. A., "Hierarchical mixtures of experts and the EM algorithm," Neural Computation, 6(2): 181-214, 1994.
[9] Jacobs R. A. and Jordan M. I., "Adaptive mixtures of local experts," Neural Computation, 3(1): 79-87, 1991.
[10] Luo Siwei and Han Zhen, "Large scale neural networks and its application on recognition of Chinese characters," Fourth International Conference on Signal Processing Proceedings (ICSP'98), 1998, 1273-1277.
[11] Luo Siwei, Han Zhen and Zhang Aijun, "Microcomputer cluster parallel processing system and its application on large scale artificial neural network," Second Sino-German Workshop on Advanced Parallel Processing Technologies (APPT'97), 1997, 83-87.