
    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

0-7803-7508-4/02/$17.00 © 2002 IEEE


KNOWLEDGE-INCREASABLE LEARNING BEHAVIORS RESEARCH OF NEURAL FIELD

    SI-WEI LUO, JIN-WEI WEN, HUA HUANG

    Institute of Computer Science, Beijing Northern Jiaotong University, Beijing, China, 100044

    E-MAIL: [email protected]

Abstract: In a hierarchical set of systems, a lower-order system is included in the parameter space of a larger one as a subset. Such a parameter space has rich geometrical structures that are responsible for the dynamic behaviors of learning. Based on a theoretical analysis using information geometry and differential manifolds, this paper studies the knowledge-increasable learning behaviors of the neural field and presents a layered Knowledge-Increasable Artificial Neural Network (KIANN) model, which is knowledge-increasable and structure-extendible. The method helps to explain the transformation mechanism of the human recognition system and to understand the theory of the global architecture of neural networks.

Keywords: Information geometry; Artificial neural network (ANN); Neural field; Parallel processing; Knowledge-increasable

    1 Introduction

Currently, the main problems affecting the development of Artificial Neural Networks (ANNs) are the following. On one hand, the learning model currently used in ANNs requires that the learning procedure be accomplished all at once, which differs from the step-by-step learning of humans. Thus the scale of an ANN increases as the pattern set grows, which in turn increases time consumption and hardware requirements. On the other hand, previously learned patterns are destroyed when an ANN learns new patterns, so the network cannot inherit and accumulate knowledge.

The aim of applying ANNs in practice is to process large-scale pattern sets simply and rapidly. The research objective of neural field theory is to understand the transformation mechanism, dynamical behavior, capability and limitations of neural networks through the study of their global structure. By analyzing the dynamic learning behaviors of the neural field, we can explain the transformation mechanism of the human recognition system and understand the theory of the global architecture of neural networks.

Information geometry, which originated from the geometric study of the manifold of probability distributions (Amari [1][2]), has been successfully applied to many fields. It studies the intrinsic geometrical structure of the manifold formed by a family of probability distributions. Amari developed a theory of dual structures and unified these theories in a dual differential-geometrical framework. Information geometry has so far been used not only for the mathematical foundations of statistical inference but also in information theory, neural networks, system theory, mathematical programming and other areas. It is important to study the properties that a family of probability distributions possesses as a whole, since a parameterized statistical model constitutes such a family. In many cases the family forms a geometrical manifold with intrinsic structures such as a Riemannian metric, affine connections, and divergences between pairs of points. These intrinsic geometrical structures represent important properties of the family, which are not mere aggregations of the properties of the individual distributions but arise from the mutual relations among them. Using information geometry, the intrinsic geometrical properties of hierarchical structures and their invariant decomposition can be elucidated; it also explains the approximation process of neural network learning. As a useful mathematical tool, information geometry plays an important role in the analysis of neural field theory.
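To make the metric and divergence just mentioned concrete, here is a minimal numeric sketch (our illustration, not from the paper) using the one-parameter Bernoulli family:

```python
import numpy as np

# Bernoulli family p(x; mu) = mu^x (1-mu)^(1-x), 0 < mu < 1.
# Its Fisher information g(mu) is the Riemannian metric on the manifold,
# and the KL divergence separates two member distributions (two points).

def fisher_info(mu):
    """Fisher information of the Bernoulli family at parameter mu."""
    return 1.0 / (mu * (1.0 - mu))

def kl_divergence(mu1, mu2):
    """D(p_mu1 || p_mu2) between two Bernoulli distributions."""
    return (mu1 * np.log(mu1 / mu2)
            + (1.0 - mu1) * np.log((1.0 - mu1) / (1.0 - mu2)))

mu1, mu2 = 0.30, 0.31
# Locally, D(p||q) ~ 0.5 * g(mu1) * (mu2 - mu1)^2: the divergence induces
# the Fisher metric, which is the sense in which the geometry is intrinsic.
print(kl_divergence(mu1, mu2), 0.5 * fisher_info(mu1) * (mu2 - mu1) ** 2)
```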

A family of neural networks forms a neuromanifold (Amari [3][4]), and the structure of a layered ANN can be described by a dually flat manifold structure. By analyzing the knowledge-increasable learning ability of the neural field, we present a layered ANN model that uses the knowledge-increasable and knowledge-inheritable approach of Siwei Luo [5] to implement the increase and accumulation of acquired knowledge and to solve the problems of system extension and complex data processing. We put forward a method to construct a large-scale ANN (LSANN) without destroying the functionality of the existing one, implemented by embedding independent ANNs that have different functionality. Such ANNs have shown the ability to inherit the patterns already learned, and we can use this method to construct LSANNs appropriate for large-scale pattern set processing. This solves the difficulties in constructing LSANNs.

2 Neural Field Knowledge-increasable Learning Behaviors

  • 8/3/2019 Si-Wei Luo et al- Knowledge-increasable learning behaviors research of neural field

    2/5

    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

    587

2.1 Information Geometry Theory of Neural Field Learning

According to Amari [3][4], a set of systems with modifiable parameters forms a parameter space in which the parameters play the role of a coordinate system. A statistical model corresponds to a geometrical figure, a distribution corresponds to a point, and parameters correspond to coordinates. A geometrical theory of statistics has the property that, whatever coordinates are used, the statements about relations among the points and figures remain the same [7]. A family of neural network systems forms a neuromanifold. Amari [3][4][6] has proved that most neural network systems can be expressed by a probability distribution that can be included in a space with a large number of parameters, as shown in Fig. 1. In the following section, we give explanations.

Fig.1. Probability distribution description in a manifold

In the theory of neural field learning, the learning rule is usually a mapping from some data (sample) z ∈ Z to some parameter w, where z = (x, y) consists of an input and an output. Usually, in the statistical sense, x has a fixed distribution, which means that the distribution of z is restricted to a submanifold. Training data are assumed to be sampled from an environmental (true) distribution p ∈ P. The space P consists of all probability distributions over the sample space Z and forms a differentiable manifold. A trained network represents an estimated distribution q(z | w). The set Q of all the distributions representable by a network is the computation model and is usually a smooth submanifold of P; the weights are of course coordinates. The manifold Q is equipped with the Riemannian metric (g_ij) given by the Fisher information matrix and, moreover, with dually flat affine connections (Amari [1]). There are two fundamental coordinate systems, θ and η, in it. The neural network structure can be expressed by a family of exponential probability distributions [2][3][4].
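As a concrete instance of the two dual coordinate systems θ and η, here is a minimal sketch under the assumption of a Bernoulli model (the simplest exponential family; our choice of example, not the paper's network):

```python
import numpy as np

# In a dually flat exponential family, the natural parameter theta (e-flat
# coordinates) and the expectation parameter eta (m-flat coordinates) are
# Legendre duals through the log-partition function psi.

def psi(theta):
    """Log-partition function of the Bernoulli family."""
    return np.log(1.0 + np.exp(theta))

def theta_to_eta(theta):
    """eta = psi'(theta): natural -> expectation coordinates."""
    return 1.0 / (1.0 + np.exp(-theta))

def eta_to_theta(eta):
    """Inverse Legendre map: expectation -> natural coordinates."""
    return np.log(eta / (1.0 - eta))

theta = 1.5
eta = theta_to_eta(theta)
assert np.isclose(eta_to_theta(eta), theta)  # the two charts are consistent
# The Fisher metric is the Hessian of psi: g(theta) = eta * (1 - eta).
print(theta, eta, eta * (1.0 - eta))
```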

Fig.2. Neural field structure map space

2.2 Neural Field Knowledge-increasable Learning Behaviors Analysis

The aim of applying ANNs in practice is to handle learning problems with great amounts of data. The main method is to divide a task into small tasks by the principle of divide-and-rule, or to build a network composed of several known artificial neural networks that cooperate on the learning problem. The Hierarchical Mixture-of-Experts (HME [8][9]) is a classical model of this kind: it can yield solutions to complex problems and improve on the performance of a single network. But in some cases the number of parameters in HME models may be too large, resulting in potentially overly complex functions, while in other cases there may be too few parameters, resulting in under-fitting of the data. Indeed, given an arbitrarily complex function, there is no reason to expect that increasing the number of samples from the function should require more parameters to model it, and such a rule is therefore also likely to produce models that over-fit the data. Another disadvantage is that when the system is extended (pattern classes or training samples are added), all experts, including the gating function, need retraining, which destroys acquired knowledge. If there are large quantities of training data, retraining takes a long time and wastes the knowledge already acquired. To overcome these disadvantages, knowledge-increasable and knowledge-inheritable ability is necessary, which means the system can be extended dynamically. We can construct a modular parallel ANN to implement the increase and accumulation of acquired knowledge and to solve the problems of system extension and complex data processing. Learning as the human brain does, the system can be extended dynamically. In the following section, we analyze the neural field knowledge-increasable learning behaviors of a function-modular ANN system.

  • 8/3/2019 Si-Wei Luo et al- Knowledge-increasable learning behaviors research of neural field

    3/5

    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

    588

    Fig.3. Sketch map of modular NN system

Fig.4. Neural field learning structure map

Fig. 3 is the sketch map of the function-modular NN system. It consists of two NN modules, NN1 and NN2. CM is the control component of the system; it can be an NN module or another component with decision-making ability. According to neural field learning theory, the structure map of Fig. 3 is shown in Fig. 4, a sketch of function-modular neural field learning. S is the statistical model manifold, S = {P(z | θ)} with z = (x, y). {(x_i, y_i), i = 1, 2, ..., N} is the training sample set, which is divided into A and A′: A is the learning process of NN1 on its training subset, and A′ is the learning process of NN2 on its subset. X and X′ are the mapping operations that project NN1 and NN2 into the manifold S, yielding the points f_0 and f_0′. f stands for the projection point in S of the real system, which is too complex to obtain directly; normally it is approximated by f_0 cooperating with f_0′ under the control of the CM component.

The function-modular system has the ability of knowledge-increasable learning. In Fig. 3, after the structures of f_0 and f_0′ are determined, the system can be extended dynamically in two situations: an existing component changes, or new function-modular components are embedded.
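A minimal sketch of this cooperation (assumed interfaces, not the authors' implementation): each module holds an estimated distribution, here a 1-D Gaussian fitted to its own training subset, playing the role of f_0 and f_0′, and CM routes each input to the module under which it is most probable, so the pair jointly approximates the full system f.

```python
import numpy as np

class GaussModule:
    """One function module: a toy density model standing in for an NN."""
    def __init__(self, label):
        self.label = label

    def fit(self, samples):                      # training subset A or A'
        self.mu, self.sigma = np.mean(samples), np.std(samples) + 1e-9

    def log_likelihood(self, x):                 # this module's confidence on x
        return -0.5 * ((x - self.mu) / self.sigma) ** 2 - np.log(self.sigma)

class ControlModule:
    """CM: the decision-making component that selects among modules."""
    def __init__(self, modules):
        self.modules = modules

    def classify(self, x):
        return max(self.modules, key=lambda m: m.log_likelihood(x)).label

nn1, nn2 = GaussModule("class-1"), GaussModule("class-2")
nn1.fit(np.random.normal(0.0, 1.0, 200))         # subset A  -> point f_0
nn2.fit(np.random.normal(5.0, 1.0, 200))         # subset A' -> point f_0'
cm = ControlModule([nn1, nn2])
print(cm.classify(0.2), cm.classify(4.8))        # inputs routed by CM
```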

Fig.5. Sketch map of modular NN system: existing component changes

Comparing Fig. 5 with Fig. 3, f_0′ has changed into f_0″ while f_0 remains unchanged. This is the first situation: new training samples are added to NN2, so NN2 needs retraining, while no change is needed for NN1. In Fig. 6, f is approximated by f_0 cooperating with f_0″ under the control of the CM component. Because only training samples were added, the position of the projection point of f in S is unchanged.

Fig.6. Neural field learning structure map: existing component changes

Fig.7. Sketch map of modular NN: new function module embedded

Fig.8. Neural field learning structure map: new function module embedded

The other situation of system extension is shown in Fig. 7 and Fig. 8. Comparing Fig. 7 with Fig. 3, a new function module NN3 has been embedded, so the system now has three function modules. In Fig. 8, f_0 and f_0′ remain unchanged, meaning that no change is needed for NN1 and NN2. f_B stands for the projection point in S of the newly embedded module NN3. The projection point of the real system in the manifold S is now f′, which is approximated by f_0, f_0′ and f_B under the control of the CM component.
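Continuing the sketch above: embedding a new module NN3 (projection point f_B) only appends one entry to CM's module list; NN1 and NN2 are untouched, which is exactly the knowledge-inheriting extension described here.

```python
nn3 = GaussModule("class-3")
nn3.fit(np.random.normal(10.0, 1.0, 200))        # new knowledge only
cm.modules.append(nn3)                           # system extended dynamically
print(cm.classify(9.7))                          # the new class is recognized
```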

From the analysis of neural field knowledge-increasable learning behaviors above, we can see that the key to knowledge-increasable learning is the CM component. As long as the CM component can adapt to changes of situation, the system can deal with complex problems by embedding new function modules. That is, learning as the human brain does, the system achieves knowledge-increasable and knowledge-inheritable function by extension, which amounts to embedding new function modules (knowledge) into the system. The following section gives an example of a system with knowledge-increasable ability.

    3 Knowledge-increasable ANN system

Fig.9. Large-scale Knowledge-increasable ANN

The Large-Scale Knowledge-Increasable ANN (LSKIANN) is similar to a composite neural network in terms of network structure. An LSKIANN consists of several sub-networks and a cooperation module called CM. The main idea of LSKIANN for multi-class classification is to divide the training sample set into several sub-sets, where the intersection of any two sub-sets is empty and the union of all sub-sets is the whole training set. Each sub-network has the same structure and learning parameters, but a different training set. The sub-networks compute on a sample simultaneously, but only the one that was trained with that sample's class can recognize it correctly. The function of the CM module is to select an optimized result as the system output according to a similarity rule; here we use the maximal similarity degree as the selection standard.

In the implementation of the LSKIANN, the sub-networks are implemented with the BP algorithm, while the CM module uses similarity comparison. The input of the CM module is connected to the output of each sub-network, which is where the source data for the CM module's judgment comes from.
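A minimal sketch of this data flow (a toy similarity measure and per-class prototypes standing in for the BP sub-networks' outputs; assumed shapes, not the authors' code):

```python
import numpy as np

def partition(labels, n_groups):
    """Disjoint split of sample indices: the union of all sub-sets is the
    full set and the intersection of any two sub-sets is empty."""
    order = np.argsort(labels)               # keep same-class samples together
    return np.array_split(order, n_groups)   # no index appears twice

def cm_select(x, candidate_classes, prototypes):
    """CM rule: among the classes proposed by the sub-networks, keep the one
    whose standard pattern is most similar to the input; here negative
    Euclidean distance plays the role of the similarity degree."""
    sims = [-np.linalg.norm(x - prototypes[c]) for c in candidate_classes]
    return candidate_classes[int(np.argmax(sims))]

prototypes = {0: np.zeros(4), 1: np.ones(4)}
print(cm_select(np.full(4, 0.8), [0, 1], prototypes))   # -> 1
```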

The following are the merits of LSKIANN in solving the multi-class classification problem of a giant pattern set.

1. By the method of grouping, each sub-network needs to be trained only on a sub-set rather than on the whole set. By increasing the number of sub-networks we can reduce the scale of the training set of each sub-network. This not only accelerates computation and improves the recognition rate, but also makes the system easy to extend.

2. We can train the sub-networks with parallel processing technologies (see the sketch after this list), which reduces the training time of the whole system as much as possible.

3. The network is flexible and can be expanded easily. When the number of problem classes increases, the only thing to be done is to add a new sub-network, and only the newly added sub-network needs to be trained with the added training samples; the others remain unchanged. The system can learn as the human brain does, achieving knowledge accumulation by embedding new function modules.
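Merit 2 can be realized with standard multiprocessing. A minimal sketch, with a toy stand-in trainer that fits per-class mean prototypes instead of running BP, just to keep the example self-contained:

```python
from multiprocessing import Pool
import numpy as np

def train_subnet(subset):
    """Train one sub-network on its own disjoint subset and return it."""
    patterns, labels = subset                    # this sub-network's data only
    return {c: patterns[labels == c].mean(axis=0) for c in np.unique(labels)}

def train_all(subsets, workers=4):
    with Pool(workers) as pool:                  # one process per sub-network
        return pool.map(train_subnet, subsets)

if __name__ == "__main__":
    data = [(np.random.rand(60, 8), np.repeat(np.arange(3), 20))
            for _ in range(4)]                   # four disjoint training subsets
    nets = train_all(data)
    print(len(nets), sorted(nets[0]))
```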

    4 Experiment and Conclusion

We use Chinese character recognition to test our system. The main idea is to divide the whole Chinese character set Q into several sub-sets Q_i, i ∈ [1, n], where the intersection of any two sub-sets is empty and the union of all sub-sets is the whole set. Recognition network NN_i is able to recognize any character of sub-set Q_i, and the control network CM determines which result is the most probable: it enables the sub-network NN_i that outputs the result of maximal similarity and lets it send the result to the output system; the outputs of the other sub-networks are discarded.

The pattern set consists of two parts. One is the second library of the Chinese character set, which includes 6763 Chinese characters. According to our method [10][11], they were divided into 59 groups. The following table shows the group number (GN) and the character number (CN) in every group. A dynamic grouping method was used in order to take the similarity of characters into account.

Table 1 Chinese characters grouping table

GN  1    2    3    4    5    6    7    8    9    10
CN  100  110  140  115  115  115  115  120  125  140
GN  11   12   13   14   15   16   17   18   19   20
CN  100  110  100  100  145  100  115  135  105  105
GN  21   22   23   24   25   26   27   28   29   30
CN  85   115  105  100  105  120  115  115  130  110
GN  31   32   33   34   35   36   37   38   39   40
CN  115  100  115  145  125  120  225  135  110  100
GN  41   42   43   44   45   46   47   48   49   50
CN  110  120  125  125  115  130  120  90   120  130
GN  51   52   53   54   55   56   57   58   59
CN  95   80   135  80   125  95   100  85   83

After the sub-networks were built and the grouped training finished, test samples were input for testing. According to the decision of the cooperation module, here the maximal similarity degree, we obtain the final recognition result.


  • 8/3/2019 Si-Wei Luo et al- Knowledge-increasable learning behaviors research of neural field

    5/5

    Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

    590

Fig. 10 shows a standard character recognition result, and Fig. 11 shows the recognition result for a noisy test sample.

Fig.10. Standard character recognition result

Fig.11. Recognition result of a noisy test sample

We applied the LSKIANN structure in our experiments. In this structure we discard the control network of the PNN architecture, so parallel technologies can be employed in the training procedure of each sub-network, which is constructed with the BP algorithm. In recognition, once a BP network has finished its computation, it calculates the similarity degree between the standard pattern represented by its result and the input pattern, and then reports the similarity to the CM module for comparison. We believe this architecture fully develops the parallelism of the computation and overcomes the structural faults of the PNN architecture. Obviously, when a new sub-network is added to the system, the original networks need not change. By this method, the system can be extended dynamically with knowledge-inheritable and knowledge-accumulating ability.

Based on a theoretical analysis using information geometry and differential manifolds, this paper has studied the knowledge-increasable learning behaviors of the neural field and presented a layered Artificial Neural Network (ANN) model that is knowledge-increasable and structure-extendible. The method helps to explain the transformation mechanism of the human recognition system and to understand the theory of the global architecture of neural networks.

    Acknowledgements

This work is supported by the National Natural Science Foundation of China under contract 69973002.

    References

[1] Amari S., Differential-Geometrical Methods in Statistics, second edition, Lecture Notes in Statistics No. 28, Springer-Verlag, Berlin, 1990.

[2] Amari S., Dualistic geometry of the manifold of higher-order neurons, Neural Networks, 4(4): 443-451, 1991.

[3] Amari S., Kurata K. and Nagaoka H., Information geometry of Boltzmann machines, IEEE Transactions on Neural Networks, 3(2): 260-271, 1992.

[4] Amari S., Information geometry of the EM and em algorithms for neural networks, Neural Networks, 8(9): 1379-1408, 1995.

[5] Luo Siwei, Large pattern set processing of artificial neural network, Signal and Image Processing, ACTA Press, USA, Nov 2000, 423-429.

[6] Amari S., Ozeki T. and Park H., Information geometry of adaptive systems, Symposium 2000 on Adaptive Systems for Signal Processing, Communications and Control, Oct 2000.

[7] Zhu Huaiyu, On the mathematical foundation of learning algorithms, submitted to Machine Learning, 1996.

[8] Jordan M.I. and Jacobs R.A., Hierarchical mixtures of experts and the EM algorithm, Neural Computation, 6(2): 181-214, 1994.

[9] Jacobs R.A., Jordan M.I., Nowlan S.J. and Hinton G.E., Adaptive mixtures of local experts, Neural Computation, 3(1): 79-87, 1991.

[10] Luo Siwei and Han Zhen, Large scale neural networks and its application on recognition of Chinese characters, Fourth International Conference on Signal Processing Proceedings (ICSP'98), 1998, 1273-1277.

[11] Luo Siwei, Han Zhen and Zhang Aijun, Microcomputer cluster parallel processing system and its application on large scale artificial neural network, Second Sino-German Workshop on Advanced Parallel Processing Technologies (APPT'97), 1997, 83-87.