

* Corresponding author. Tel.: +81-78-969-2184; fax: +81-78-969-2189. E-mail addresses: qma@crl.go.jp (Q. Ma), isahara@crl.go.jp (H. Isahara).

Neurocomputing 34 (2000) 207–225

Semantic networks represented by adaptive associative memories

Qing Ma*, Hitoshi Isahara

Communications Research Laboratory, Ministry of Posts and Telecommunications, 588-2, Iwaoka, Nishi-ku, Kobe 651-2492, Japan

Received 9 November 1997; accepted 13 April 2000

Abstract

This paper demonstrates that semantic networks can be well represented by the adaptive associative memories proposed by Ma. The new model of semantic networks using the adaptive associative memories can do inference extremely fast, in time that does not increase with the size of the knowledge base. This performance cannot be realized by any previous system. In addition, the new model has an easily understandable architecture and is flexible in the sense that modifying knowledge can easily be done using one-shot relearning and the generalization of knowledge is a basic system property. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Associative memories; Semantic networks; Computational effectiveness

1. Introduction

Human performance indicates that human knowledge representation is computationally effective: human agents take but a few hundred milliseconds to perform a broad range of cognitive tasks in spite of operating with a large knowledge base. The question of computational effectiveness must be tackled at the very outset in understanding cognitive systems [19,20]. The traditional AI approaches (i.e., symbolic methods), however, cannot attain such computational effectiveness. To address this important issue in knowledge representation, we think that artificial neural networks, and in particular associative memories [7], are the appropriate paradigm.


0925-2312/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0925-2312(00)00289-7

The basic attribute of conventional associative memories proposed so far is a one-to-one associative function in heteroassociative types or a pattern-recovery function in autoassociative types. Several studies have shown that when an associative memory is equipped with a one-to-many associative function (or a pattern-segmentation function in autoassociative types), it becomes a powerful model for addressing various cognitive processes [8–10,13]. These studies have indicated in particular that procedural knowledge, the kind of knowledge consisting of the skills of perceptual motion or cognition, can be well represented by such an associative memory. This paper further demonstrates that semantic knowledge, the other kind of knowledge used for understanding spoken and written text, making inferences, and so on, can also be well represented by such an associative memory.

In this paper we begin by describing semantic networks, noting the shortcomings of previous models of semantic networks, and pointing out that these shortcomings can be avoided by using the adaptive associative memories (AAM) proposed by Ma [16]. We then describe in detail a new semantic network model that uses AAM and is called the associative semantic memory (ASM). We show how it performs inference based on semantic networks in a way that is computationally effective: inference time does not increase with the size of the semantic network. We also explain why knowledge can be easily modified and why the generalization of knowledge is a basic system property. We then describe several computer simulations confirming that ASM behaves as expected and evaluating how fast the ASM performs inference. We conclude by briefly summarizing the paper.

2. Semantic networks

The primitives of semantic knowledge are concepts [12,18,21], which can be characterized by a set of features that are themselves instances of concepts. For example, the concept 'dog' can be characterized by a set of features (mammal, fur, teeth, four legs, ...). Semantic knowledge can therefore be represented by a set of such concepts that are related to each other via their properties, e.g., is-a, part-of, and so on. If we use nodes to express concepts and labeled links to express properties (the relationships between concepts), semantic knowledge can be represented by a graph consisting of nodes and links. Such graphs are called semantic networks.
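As a concrete, purely illustrative picture of such a graph, the sketch below writes a small hierarchy in the spirit of Fig. 1 as Python data: each concept carries an is-a link to its parent plus the (property, value) pairs attached locally to its node. The dictionary layout, helper name, and the exact set of entries are assumptions of this sketch, loosely reconstructed from the examples quoted in the text, not the paper's own representation.

```python
# Minimal sketch (not the paper's code): a toy conceptual hierarchy in the
# spirit of Fig. 1, with is-a links and locally attached (property, value) pairs.
HIERARCHY = {
    "Animal":   {"is_a": None,     "props": {}},
    "Mammal":   {"is_a": "Animal", "props": {"motility": "running",
                                             "legs": "four legs",
                                             "respiration": "pulmonary respiration"}},
    "Bird":     {"is_a": "Animal", "props": {"motility": "flying",
                                             "respiration": "pulmonary respiration"}},
    "Fish":     {"is_a": "Animal", "props": {"respiration": "branchial respiration"}},
    "Elephant": {"is_a": "Mammal", "props": {"feeding habits": "herbivorous"}},
    "Tiger":    {"is_a": "Mammal", "props": {"speed": "fast"}},
    "Penguin":  {"is_a": "Bird",   "props": {"motility": "not flying"}},
    "Swallow":  {"is_a": "Bird",   "props": {"speed": "fast"}},
}

def triples(hierarchy):
    """Flatten the hierarchy into (concept, property, value) triples."""
    return [(c, p, v) for c, node in hierarchy.items()
            for p, v in node["props"].items()]
```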

Semantic networks viewed especially from the conceptual-hierarchy point of view are called conceptual hierarchies. Fig. 1 shows an example of a conceptual hierarchy. As pointed out by Shastri [19], the characterization of conceptual hierarchies is broad enough to capture the basic organizational principles underlying frame-based representation languages such as KRL [1] and KL-ONE [2]. In the real world we can find many examples of conceptual hierarchies. Roget's thesaurus [3], constructed from the viewpoint of linguistics and cognition, for example, is a very large word (conceptual) hierarchy in which more than 325,000 words and phrases are divided into six fields, and the words in each field are further classified into more than 1000 meaning categories. Dictionaries of this kind have been constructed in several languages and are widely used in explaining human language activities (phenomena) and in constructing natural language processing systems.

Fig. 1. Real-world example of conceptual hierarchies.

2.1. A class of inference: inheritance and recognition

A basic characteristic of conceptual hierarchies is property inheritance: the properties of superordinate concepts are inherited by their subordinate ones. Property inheritance leads to efficient storage of knowledge. However, a class of basic inference over conceptual hierarchies, inheritance and recognition, is required to obtain the needed knowledge.

Inheritance inference enables us to infer the properties of concepts based on those of their ancestors. Suppose, for example, we know elephants are mammals and want to know how many legs elephants have. Symbolically, we find the node for Elephant. If there is no information about legs attached directly to this node, then the node for Elephant's ancestor, Mammal, is checked. If the answer is again not attached to that node, then its ancestor, Animal, is examined, and so on. In Fig. 1, the answer can be found in the Mammal node. Consequently, to answer a seemingly simple query with a symbolic approach, we might have to visit many nodes in the database to retrieve the desired information.
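The ancestor-climbing procedure just described can be sketched as follows, reusing the toy HIERARCHY dictionary above; this is an illustration of the symbolic baseline under the assumptions of that sketch, not the paper's implementation. Note that the number of nodes visited grows with the depth of the hierarchy, which is exactly the cost the paper is concerned with.

```python
def inherit(hierarchy, concept, prop):
    """Symbolic inheritance inference: climb the is-a chain until the property
    is found.  The nearer (more specific) value wins, so exceptions such as
    (Penguin, motility, not flying) override the ancestor's value."""
    node = concept
    while node is not None:
        value = hierarchy[node]["props"].get(prop)
        if value is not None:
            return value
        node = hierarchy[node]["is_a"]   # climb to the ancestor
    return None                          # unknown problem: nothing stored

# e.g. inherit(HIERARCHY, "Elephant", "legs") -> "four legs" (found at Mammal)
```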

Recognition inference enables us to find individual concepts that have multiple specific properties. When we are told that something has four legs, runs fast, and so on, for example, we can use the conceptual hierarchy shown in Fig. 1 to infer that it is a tiger. Symbolically, the sets {Mammal, Elephant, Tiger} with four legs and {Tiger, Swallow} with runs fast are found first. Then the answer is obtained by examining the intersection of these sets; in this example, Tiger is obtained. Although Tiger has both four legs and runs fast, these properties are not necessarily available locally at the node for Tiger; they may have to be determined via inheritance inference. Doing set-intersection operations symbolically takes a much longer time compared to doing inheritance inference.
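The set-intersection procedure can likewise be sketched on the same toy data; note that building each clue's candidate set already requires inheritance over every concept, which is one reason the symbolic cost grows with the size of the knowledge base. Again, this is an illustrative sketch under the assumptions of the earlier snippets, not the paper's code.

```python
def recognize(hierarchy, clues):
    """Symbolic recognition inference: for each (property, value) clue, collect
    every concept that has it (directly or by inheritance), then intersect."""
    candidate_sets = []
    for prop, value in clues:
        matching = {c for c in hierarchy if inherit(hierarchy, c, prop) == value}
        candidate_sets.append(matching)
    return set.intersection(*candidate_sets) if candidate_sets else set()

# e.g. recognize(HIERARCHY, [("motility", "running"), ("speed", "fast")])
# -> {"Tiger"}   (intersection of {Mammal, Elephant, Tiger} and {Tiger, Swallow})
```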

In the real world there are exceptions to property inheritance. For example, birds are thought to fly, but some, such as penguins, do not. Exceptions in conceptual hierarchies give rise to nonmonotonicity and ambiguity, neither of which can be handled using first-order logic.

2.2. Shortcomings of the previous models

Since their introduction by Quillian [17], semantic networks have played a significant role in knowledge representation and form the backbone of many cognitive and AI systems [6]. To date, several connectionist models of semantic networks (e.g. [4,5,11,19,22]) have been proposed to attain computational effectiveness: they could do inference in time that is logarithmic in the number of concepts in the knowledge base. As exemplified by Shastri's system [19], most of them used controlled spreading activation as a computational primitive and have very complex architectures. A few models have been proposed that use typical artificial neural networks (e.g., associative memory and the Boltzmann machine). These models usually have simple and easily understandable architectures, but lack sufficient ability to handle general problem cases.

Hinton's network [11], the one most closely related to our model, for example, uses a kind of associative memory in which higher-order problems, inheritance and recognition inference based on conceptual hierarchies, are attempted to be solved by reducing them to lower-level processing, i.e., pattern recovery. More specifically, the network encodes conceptual hierarchies by a set of triples of the form (concept, property, value). Given two components of a triple, the network can determine the third component. However, because the network performs inference only by pattern recovery, there are inherent problems. First, if there are exceptions in the conceptual hierarchy (e.g., birds fly, but some, such as penguins, do not, as shown in Fig. 1), interference may be caused by these exceptions. As a result, inference takes longer and the accuracy drops. Evidently, if conceptual hierarchies are large (so that there are more exceptions), the interference may become so strong that inference cannot be performed correctly. Furthermore, it is almost impossible to obtain correct answers for complex recognition inference problems that are given by a large number of clues. Besides these inherent problems, the network also suffers heavily from the shortcomings of the perceptron algorithm that it uses. For example, in large conceptual hierarchies, learning takes a very long time, and convergence cannot be guaranteed. Moreover, new knowledge can only be added by relearning the entire conceptual hierarchy, and unnecessary and/or incorrect knowledge cannot be deleted.

2.3. A solution by means of AAM

The problems in Hinton's network mentioned above can be resolved by using AAM, which can not only do pattern recovery as Hinton's network can, but can also do pattern segmentation.


Conceptual hierarchies include two kinds of knowledge: hierarchical relationships between concepts (e.g., elephants are mammals), which can be represented using a special encoding procedure that will be described in Section 3.3, and knowledge about each concept, which can be represented by triples (concept, property, value) (e.g. (Elephant, feeding habits, herbivorous)).

By using triple expressions, any inheritance inference problem can be formalized as (concept: c, property: p, value: ?), where concept: c and property: p are clues and value: ? is the solution (value: v) that should be found to fill the triple so that the knowledge represented by this triple is true (i.e., in agreement with the conceptual hierarchy). Note that the triple (concept: c, property: p, value: v) is not necessarily found locally at the representation of concept: c. For example, for the triple (Elephant, respiration, pulmonary respiration), the property respiration and its value pulmonary respiration are not directly attached to the representation of Elephant shown in Fig. 1. In the connectionist representation, each element of a triple is encoded as a spatial pattern. The inheritance problem can thus be rewritten as (P(c), P(p), P(∅)) or, more simply, P(c, p, ∅), where P(∅) is the deficient part to be recalled using pattern recovery (and pattern segmentation as the case may be; see Section 4 for more detail).

Also, any recognition inference problem can be formalized as (concept: ?, property: p_1, value: v_1) & (concept: ?, property: p_2, value: v_2) & ... & (concept: ?, property: p_m, value: v_m), where property: p_i and value: v_i (i = 1, ..., m) are clues and concept: ? is the solution (concept: c) that should be found to fill these triples so that all the knowledge represented by these triples is true. Note that (concept: c, property: p_i, value: v_i) (i = 1, ..., m) also need not necessarily be found locally at the representation of concept c. In the connectionist representation, the recognition problem can further be rewritten as (P(∅), Σ_{i=1}^{m} P(p_i), Σ_{i=1}^{m} P(v_i)) or, more simply, P(∅, Σ_{i=1}^{m} p_i, Σ_{i=1}^{m} v_i), where Σ_{i=1}^{m} P(x_i) (= P(Σ_{i=1}^{m} x_i)) is a composite (superimposed) pattern composed of P(x_i) (i = 1, ..., m). The recognition problem can therefore be regarded as a composite pattern with a deficient part. To recall the deficient part, the composite pattern must be separated and recovered into a single pattern like P(c, p_1, v_1).

In this way, a class of inference is transformed into pattern recovery and segmentation processing, which can be performed by the AAM. A new semantic network model (ASM) can therefore be constructed using the AAM. The ASM resolves the previous models' dilemma of obtaining high performance while at the same time maintaining a simple and easily understandable architecture. As shown later, ASM performs inference much faster than existing models: the computation time tends not to increase with the size of the conceptual hierarchy. By using pattern segmentation, ASM resolves the problems in Hinton's network: it solves inference problems with exceptions and complex recognition problems flawlessly. Because one-shot learning is used in ASM, there are none of the learning problems found in Hinton's network, and the modification (addition and deletion) of knowledge is very easy. As in Hinton's network, generalization is a basic property of ASM in the sense that concepts automatically acquire any properties shared by all of their subordinate concepts.


Fig. 2. Structure of ASM.

3. Associative semantic memory (ASM)

3.1. Construction

Our proposed ASM (Fig. 2) comprises an encoder, a decoder, and a single adaptive associative memory (AAM), which is divided into three parts, CON, PROP, and VAL, and is described in Section 3.2. The states of these three parts are used to represent the concept, property, and value of triples. Within CON, we assume that there are no connections between units. Apart from this constraint, the units in the three parts are connected with each other. The output (recalled pattern) of AAM, P_OPT = P(c_0, p_0, v_0), is fed back to the encoder; P(c_0), P(p_0), and P(v_0) are the outputs of CON, PROP, and VAL, respectively.

The encoder initially transforms a symbolic input like (c, p, ?) into the spatial pattern P_I = P(c, p, ∅). Then, it further combines P_I with the output of AAM, if it exists, to form an input to AAM, i.e., P_IPT = P_I + P_OPT = P(c + c_0, p + p_0, v_0). Thus, P(c + c_0), P(p + p_0), and P(v_0) are given to CON, PROP, and VAL, respectively. In the same way, for a recognition problem (?, p_1, v_1) & (?, p_2, v_2) & ... & (?, p_m, v_m), the input to AAM will be P_IPT = P(c_0, Σ_{i=1}^{m} p_i + p_0, Σ_{i=1}^{m} v_i + v_0). On the other hand, the decoder extracts the subpattern P(v_0) or P(c_0), corresponding to the kind of problem (inheritance or recognition inference), from the output P_OPT of AAM and transforms it into symbols representing the solution. In this paper, the encoder and decoder are both assumed to exist, and the functions described above are realized simply in computer code.

Fig. 3. Example of recall from a deficient composite input.

3.2. The kernel: AAM

Adaptive associative memory (AAM) [16] can separate and recover composite patterns that might be degraded by deficiency or noise. An example in which AAM separates and recovers a deficient composite input is shown in Fig. 3, where the black dots indicate bits that have positive values and the others indicate bits that have value zero. Since such patterns can easily be transformed into binary patterns, the bits shown by the black dots are called '1-bits' and the others '0-bits' when there is no risk of confusion.

Fig. 4 shows a simplified structure of AAM composed of two units. The units are mutually inhibited via inhibitory connections w_ij and w_ji and have positive self-loops via auto-connections w_ii and w_jj. AAM consists of a fully interconnected set of such units; its dynamics can be expressed as

s_i(k+1) = u( w_ii(k) s_i(k) − Σ_{j≠i} w_ji s_j(k) + c P_i + b ),   (1)

where s_i(k) denotes the activity of unit i at time k, P_i is the external input (0 or 1), c and b are, respectively, the connecting coefficient from P_i to unit i and the bias of unit i, and u(x) is an analog thresholding function:

u(x) = x if x > 0;  0 if x ≤ 0.   (2)

Fig. 4. Simplified structure of AAM composed of two units.

Note that the bias b in (1) is introduced into AAM to enable pattern recovery (in addition to pattern segmentation). If bias b is set to a positive value, all bits in the input pattern become non-zero, so the deficient part of the input can be recovered; if bias b is set to zero, the bits in the deficient part remain zero and cannot be recovered. The auto-connection weight w_ii(k) changes adaptively to reduce the recall of spurious memories and to speed up the recall time (see details in [16]). Specifically, it changes according to the activities of the other units j (j ≠ i), weighted with the corresponding inhibitory connections:

w_ii(k) = a Σ_{j≠i} w_ji ψ(s_j(k)),   (3)

where a is a positive coefficient called the adaptive weight and ψ(x) is a binary thresholding function:

ψ(x) = 1 if x > 0;  0 if x ≤ 0.   (4)

The weight of inhibitory connection w_ji is decided in the learning phase according to the following one-shot learning algorithm:

(i) Initially, w_ji = W for all i and j (i ≠ j), where W > 0.
(ii) When a learning pattern is presented,

w_ji = 0 if P_i · P_j > 0;  w_ji unchanged if P_i · P_j = 0.   (L-1)

From (L-1) we can see that the inhibitory connections are symmetric, i.e., w_ij = w_ji. After the learning phase, the autocorrelations of the learned patterns are formed by the inhibitory connections that have disappeared. In other words, the inhibitions between the 1-bits within the same learned pattern are removed, but those between 1-bits in different learned patterns remain intact. Thus, if a deficient composite pattern (see, for example, Fig. 3) is input, the patterns constituting the input ('0' and '1' in this example) compete for survival via the intact inhibitory connections w_ji. One pattern ('1' in this example) finally wins and is recalled.
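To make the dynamics and the learning rule concrete, the following NumPy sketch implements Eqs. (1)-(4) and the one-shot rule (L-1) as literally as possible. The class name, the starting state, the step count, and the saturation handling are assumptions of this sketch (not the authors' code), and the readout control units used for sequential recall are omitted.

```python
import numpy as np

class AAM:
    """Illustrative sketch of the adaptive associative memory of Eqs. (1)-(4)
    and the one-shot learning rule (L-1).  Readout control is omitted."""

    def __init__(self, n_units, W=20.0, a=1.2, c=19.0, b=1.0):
        self.n = n_units
        self.W = np.full((n_units, n_units), W)   # inhibitory connections w_ji
        np.fill_diagonal(self.W, 0.0)             # self-loops are handled by Eq. (3)
        self.a, self.c, self.b = a, c, b

    def learn(self, pattern):
        """One-shot rule (L-1): remove the inhibition between every pair of
        1-bits that are active together in the presented pattern."""
        p = np.asarray(pattern, dtype=float)
        self.W[np.outer(p, p) > 0] = 0.0
        np.fill_diagonal(self.W, 0.0)

    def recall(self, pattern, bias=None, n_steps=50, saturation=10.0):
        """Iterate the dynamics of Eq. (1); the initial state and the fixed
        number of steps are assumptions of this sketch."""
        b = self.b if bias is None else bias      # b > 0: recovery + segmentation; b = 0: segmentation only
        P = np.asarray(pattern, dtype=float)
        s = P.copy()
        for _ in range(n_steps):
            active = (s > 0).astype(float)        # psi(s_j) of Eq. (4)
            w_ii = self.a * (self.W @ active)     # adaptive self-loop, Eq. (3)
            net = w_ii * s - self.W @ s + self.c * P + b
            s = np.maximum(net, 0.0)              # analog threshold u(x), Eq. (2)
            s = np.minimum(s, saturation)         # bounded activities (cf. Section 6)
        return (s > 0).astype(int)
```

Storage is one-shot: learning a pattern simply zeroes the inhibitions among its co-active 1-bits. Recall with b > 0 lets deficient bits be recovered, while b = 0 restricts the memory to pattern segmentation, exactly as described above.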


Although the AAM itself rarely recalls spurious memories, to construct a model of semantic networks that performs inference perfectly, the probability that no spurious memories are recalled must be 1. This is guaranteed by the following corollary, derived in [16].

Corollary 1. When M patterns P(X_1), P(X_2), ..., P(X_M) are stored, for any composite input a sufficient condition to guarantee that the desired patterns are recalled (i.e., no spurious memories are recalled) is as follows:

for every P(A) ∈ {P(X_1), P(X_2), ..., P(X_M)}, ∃ i ∈ P(A) such that ∏_{j ∈ P(B)} w_ij ≠ 0,   (C-1)

where P(B) = ∪_{i=1}^{M} P(X_i) − P(A). Condition (C-1) means that a 1-bit i should exist in each pattern P(A) such that all the inhibitory connections between the 1-bit i and the 1-bits of any other stored pattern are non-zero after learning.

By suppressing the competitive strength of the currently recalled pattern ('1' in the example shown in Fig. 3), the next pattern ('0' in this example) can be recalled. Suppression is done by a readout control unit, RCU_i, attached to each unit of AAM (not shown in Fig. 4, but shown in [16]). Note that once a pattern is recalled, the same pattern will not be recalled again.

The AAM used in the ASM has been designed to give a final answer through at most two stages (i.e., the AAM will do at most two sequential recalls). At the first stage, bias b of the AAM in Eq. (1) is set to a positive value so that both pattern recovery and pattern segmentation are possible. The input to AAM is simply the pattern transformed from the symbolic input, i.e., P_IPT = P_I (Fig. 2). For this input, AAM produces output P_OPT1 after several steps. If P_OPT1 ⊇ P_IPT, ASM does not proceed to the second stage and decodes P_OPT1 to obtain a symbolic answer. Otherwise, ASM proceeds to the second stage by suppressing the competitive strength of the currently recalled pattern P_OPT1. The input to AAM is then the pattern composed of the pattern transformed from the symbolic input and the output of AAM obtained at the first stage, i.e., P_IPT = P_I + P_OPT1. Bias b of the AAM in Eq. (1) is set to 0 so that only pattern segmentation is possible, i.e., only a subpattern of P_IPT can be recalled. Thus, at the second stage, AAM produces another output, P_OPT2, which is decoded to obtain a final answer. Note that P_OPT1 and P_OPT2 are sequentially recalled patterns (like '1' and '0' shown in Fig. 3), and P_OPT2 is therefore different from P_OPT1 (see details in [16]).

In the competitive process, because each unit has a positive self-loop (the first terms in (1) and (3)), the more inhibitions an AAM unit receives, the stronger the unit becomes. We can thus easily obtain the following corollary, which is closely related to the present work.

Corollary 2. Supposing that all the stored patterns satisfy (C-1), among the stored patterns in competition, the one that has a 1-bit receiving the most inhibitions from other patterns will win the competition (be recalled first).


3.3. Encoding and storing

To make ASM behave as expected, special encoding of conceptual hierarchies is needed. To represent hierarchical relationships between concepts, similarly to Hinton's network, the concepts in hierarchical relationships are encoded as patterns with inclusion relations. The concepts at the same level (i.e., the concepts rooted in one ancestor) are encoded into patterns that satisfy (C-1) and have the same number of 1-bits, so that which pattern wins the competition first is not affected by the structure of the pattern itself. In the conceptual hierarchy of Fig. 1, for example, Elephant may be encoded as P(Elephant) = (1,0,0,0), Tiger may be encoded as P(Tiger) = (0,1,0,0), and their ancestor, Mammal, may therefore be encoded as P(Mammal) = (1,1,0,0), which contains both P(Elephant) and P(Tiger). All properties and values are also encoded into patterns that satisfy (C-1), and the numbers of 1-bits in these patterns are set to be the same. Note that because the concepts, properties, and values can be encoded independently, it is not difficult to encode a large-scale conceptual hierarchy. Needless to say, these special encodings depart from a purely connectionist representation. By introducing such symbolic encoding into a connectionist system, however, inference can be performed perfectly and the time required does not increase with the size of the conceptual hierarchy. This cannot be achieved by purely symbolic methods or by previous connectionist systems.
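A minimal sketch of this inclusion-relation encoding is given below. It reuses the toy HIERARCHY dictionary from Section 2 and assigns each leaf concept its own block of 1-bits, so every concept automatically keeps a private bit in the spirit of (C-1); an ancestor's pattern is the union of its descendants' patterns. Properties and values would be encoded in the same way in their own, independent blocks. The function names and block layout are assumptions of this sketch, not the paper's procedure.

```python
import numpy as np

def is_descendant(hierarchy, node, ancestor):
    """True if `node` equals `ancestor` or lies below it in the is-a chain."""
    while node is not None:
        if node == ancestor:
            return True
        node = hierarchy[node]["is_a"]
    return False

def encode_concepts(hierarchy, bits_per_leaf=1):
    """Inclusion-relation encoding: each leaf concept gets a disjoint block of
    1-bits; an ancestor's pattern is the union (OR) of the patterns of all
    concepts rooted in it, so e.g. P(Mammal) contains P(Elephant) and P(Tiger)."""
    leaves = [c for c in hierarchy
              if not any(n["is_a"] == c for n in hierarchy.values())]
    dim = bits_per_leaf * len(leaves)
    codes = {}
    for k, leaf in enumerate(leaves):                       # disjoint leaf codes
        p = np.zeros(dim, dtype=int)
        p[k * bits_per_leaf:(k + 1) * bits_per_leaf] = 1
        codes[leaf] = p
    for concept in hierarchy:                               # ancestors: union of descendants
        if concept not in codes:
            members = [codes[leaf] for leaf in leaves
                       if is_descendant(hierarchy, leaf, concept)]
            codes[concept] = np.clip(np.sum(members, axis=0), 0, 1)
    return codes
```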

By using such encoding methods, a conceptual hierarchy can be stored in ASM by simply learning a set of patterns for triples (c_i, p_ij, v_ij), in which (p_ij, v_ij) is the j-th pair of a property and its value attached to the node for the i-th concept c_i (e.g., (Elephant, feeding habits, herbivorous) in Fig. 1). By assuming no connections among units in CON and the special encoding described above, it is easily seen that for any stored pattern P(c, p, v), the 1-bits in P(p) and P(v) always have a larger, or at least the same, number of inhibitory connections (with the 1-bits in other stored patterns) than the 1-bits in P(c). Thus, from Corollary 2 we can further obtain the following corollary.

Corollary 3. When l patterns P(i) = P(c_i, p_i, v_i) (1 ≤ i ≤ l) stored in ASM are in competition, pattern P(h) = P(c_h, p_h, v_h) (1 ≤ h ≤ l) will be recalled first if and only if its subpattern P(p_h) or P(v_h) has a 1-bit that receives the most inhibitions compared with the other l−1 subpatterns, P(p_i) and P(v_i) (i ≠ h), without taking account of P(c_i) (1 ≤ i ≤ l).

4. Inference by ASM

In this section, we show how ASM can correctly deal with inheritance and recognition inference, as well as unknown, clashing, and ambiguous problems.

4.1. Inheritance inference

For a solvable inheritance problem (c, p, ?), we suppose v to be the correct solution. At the first stage, the pattern input to AAM is P_IPT = P_I = P(c, p, ∅). When solution v is not an exception to property inheritance, there is one pattern [shown in (5)] stored in ASM that includes input pattern P_IPT:

P(c, p, v) or P(c_0, p, v),   (5)

where c_0 is c's ancestor, i.e., P(c_0) ⊇ P(c). On the other hand, when solution v is an exception to property inheritance, there are two patterns [shown in (6)] stored in ASM that include input pattern P_IPT:

P(c, p, v) and P(c_0, p, v_0),   (6)

where c_0 is c's ancestor and v ≠ v_0. In general, we can assume that in the conceptual hierarchy there are m further concepts, c_1, ..., c_m (m ≥ 0), that satisfy the following conditions:

• they are c_0's ancestors, i.e., P(c_i) ⊇ P(c_0) for i = 1, ..., m;
• they are in hierarchical relationships, i.e., P(c_i) ⊆ P(c_j) for i < j and i, j = 1, ..., m;
• they have different values v_i for property p, i.e., all v_i (i = 0, ..., m−1) are exceptions to property inheritance.

Thus, there are an additional m patterns [shown in (7)] also stored in ASM that include input pattern P_IPT:

P(c_i, p, v_i),  i = 1, ..., m.   (7)

The patterns in (5) and (7), or in (6) and (7), compete with each other when P(c, p, ∅) is given.

Lemma 1. When the patterns in (5) and (7), or in (6) and (7), compete with each other, in both cases the pattern that has value v, i.e., P(c, p, v) or P(c_0, p, v) in (5), or P(c, p, v) in (6), is recalled first.

Proof. See Appendix A.

The correct solution v can thus be obtained. Because the recalled pattern of the triple includes the input pattern, ASM stops processing at the first stage.

Let us assume the conceptual hierarchy shown in Fig. 1 is stored in ASM. For the problem (Tiger, motility, ?), for example, because the only pattern stored in ASM that includes the input pattern is P(Mammal, motility, running), this pattern can be regarded as P(c_0, p, v) in (5). Thus, v (= running) can be obtained as the solution. On the other hand, for the problem (Penguin, motility, ?), where not flying is the exception solution, only two patterns stored in ASM, P(Penguin, motility, not flying) and P(Bird, motility, flying), include the input pattern. These two patterns can therefore be regarded, respectively, as P(c, p, v) and P(c_0, p, v_0) in (6), where c = Penguin, c_0 = Bird, p = motility, v = not flying, and v_0 = flying. Thus, v (= not flying) can be obtained as the solution by Lemma 1.


4.2. Recognition inference

For a solvable recognition inference problem (?, p_1, v_1) & (?, p_2, v_2) & ... & (?, p_m, v_m) (m ≥ 1), we assume c to be the correct solution that has all the given (p_i, v_i) for i = 1, ..., m. At the first stage, the input to AAM is P_IPT = P_I = P(∅, Σ_{i=1}^{m} p_i, Σ_{i=1}^{m} v_i). In general, the patterns for triples stored in ASM that respectively include P(p_i, v_i) (i = 1, ..., m) can be written as follows:

P({c, S_i}, p_i, v_i),  i = 1, ..., m,   (8)

where {c, S_i} is the set of concepts that have (p_i, v_i), S_i is the subset excluding c, and P({c, S_i}) = P(c) + P(S_i). All the patterns in (8) compete with each other when the problem is given, and one of them will be recalled via pattern segmentation and pattern recovery. Without loss of generality, we can assume this pattern is P({c, S_1}, p_1, v_1) (= P_OPT1). Because P_OPT1 does not include the input pattern P_IPT, ASM proceeds to the second stage, where the input to AAM is P_IPT = P_I + P_OPT1 = P({c, S_1}, Σ_{i=1}^{m} p_i, Σ_{i=1}^{m} v_i) and only pattern segmentation is possible, i.e., only a subpattern of P_IPT can be recalled.

Lemma 2. For P_IPT = P({c, S_1}, Σ_{i=1}^{m} p_i, Σ_{i=1}^{m} v_i) at the second stage (i.e., only pattern segmentation is possible and ({c, S_1}, p_1, v_1) was recalled at the first stage), a pattern that has the single concept c, i.e., P(c, p_x, v_x) (x ≠ 1), is recalled first.

Proof. See Appendix B.

The correct solution c can thus be obtained.

Let us assume the conceptual hierarchy shown in Fig. 1 is stored in ASM. For the recognition problem (?, motility, running) & (?, speed, fast), for example, only two patterns, P({Mammal, Elephant, Tiger}, motility, running) and P({Tiger, Swallow}, speed, fast), in ASM include the pattern for each clue, respectively. By matching these two patterns with those shown in (8), they can be rewritten, respectively, as P({c, S_1}, p_1, v_1) and P({c, S_2}, p_2, v_2), where c = Tiger, S_1 = {Mammal, Elephant}, S_2 = {Swallow}, (p_1, v_1) = (motility, running), and (p_2, v_2) = (speed, fast). Thus, c (= Tiger) can be obtained as the solution.

4.3. Unknown, clashing, and ambiguous problems

Besides the solvable problems described in Sections 4.1 and 4.2, there are also unknown problems that cannot be solved using existing knowledge, e.g. (Penguin, speed, ?); ambiguous problems in which the given clues are insufficient to provide a unique answer, e.g. (?, speed, fast); and clashing problems in which the given clues conflict with each other, e.g. (?, respiration, pulmonary respiration) & (?, respiration, branchial respiration). Using analyses similar to those in Sections 4.1 and 4.2, we can see that the ASM rejects unknown and clashing problems by outputting a null pattern from the VAL part for inheritance problems or from the CON part for recognition problems, and gives ambiguous answers to ambiguous problems by outputting a pattern consisting of several solutions. For the problem (?, speed, fast), for example, a pattern consisting of the solutions Tiger and Swallow is output from the CON part.

4.4. Computation time for inference

According to Ref. [14], the computation time for pattern segmentation and recovery in AAM tends to be sub-linearly proportional to the number of patterns in competition. Because ASM makes inferences by pattern segmentation and recovery, the computation time for inference therefore also tends to be sub-linearly proportional to the number of patterns in competition. For most inheritance inference problems, because the patterns shown in (7) do not exist and at most two patterns are in competition, as shown in (5) and (6), the computation time does not vary with the size of the conceptual hierarchy. Similarly, for recognition inference problems, the number of patterns in competition depends only on the input complexity (i.e., the number of given clues, or the number of given pairs of properties and values), as shown in (8). Therefore, the computation time for recognition problems does not vary with the size of the conceptual hierarchy and is sub-linearly proportional to the input complexity. Even for those rare inheritance inference problems in which the patterns shown in (7) do exist, because m + 2 ≤ D, where D is the depth of (i.e., the number of levels in) the conceptual hierarchy, the computation time for such problems is at most sub-linearly proportional to the depth of the conceptual hierarchy.

5. Flexibility of ASM

In ASM, modifying the knowledge is very easy. Adding a new triple, such as (Elephant, body, large), is done using one-shot relearning according to (L-1), without the need to relearn the entire conceptual hierarchy. Deleting a triple (c, p, v) is also done using one-shot relearning, by restoring the inhibitory connections between P(c) and P(v).
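Reusing the AAM sketch from Section 3.2, both operations reduce to touching a handful of weights; the helper names below are hypothetical and only illustrate the one-shot character of the modification.

```python
def add_triple(aam, pattern):
    """Adding knowledge: one-shot relearning of the new triple's pattern via
    (L-1); the rest of the stored hierarchy is left untouched."""
    aam.learn(pattern)

def delete_triple(aam, concept_bits, value_bits, W_init=20.0):
    """Deleting a triple (c, p, v): restore the inhibitory connections between
    the 1-bits of P(c) and the 1-bits of P(v).  `concept_bits`/`value_bits`
    are the indices of those 1-bits; W_init is the initial inhibition W."""
    for i in concept_bits:
        for j in value_bits:
            if i != j:
                aam.W[i, j] = W_init
                aam.W[j, i] = W_init
```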

ASM also has a generalizing ability. Assuming c_1, ..., c_n to be the concepts rooted in c', we have P(c') = Σ_{i=1}^{n} P(c_i). If a new property (p_new, v_new) is added to c_1, ..., c_n, by (L-1) the inhibitory connections between (p_new, v_new) and P(c_i) for i = 1, ..., n are removed, which means that those between (p_new, v_new) and P(c') are also removed. As a result, ancestor c' will acquire the new property (p_new, v_new) automatically. If a new property, (nutritive source, organic matter), is added to Mammal, Fish, and Bird in Fig. 1, for example, then their ancestor, Animal, will automatically obtain this new property.

6. Computer simulations

To confirm whether ASM behaves as expected in solving inference problems and to evaluate how fast it performs inference, we conducted two kinds of computer experiments. The parameters we used were as follows. Adaptive weight a was set to 1.2, and bias b (at the first stage) was set to 1.0. Connecting coefficient c was set to 19, to which a random value between 0 and 0.00001 was added; the random value was kept as small as possible so that it would not bias the competitive power of the patterns in competition. Initial weight W was set to 20, and the gain of the readout control unit, W_RI, was set to the same value as W. To prevent the unit activities of AAM from becoming too large before AAM reached a stable state, units s_i were assumed to have a saturation value of 10; all units s_i were normalized by this value at each step.
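For concreteness, the sketch below collects these settings in one place and applies them to the AAM sketch from Section 3.2; the builder function and the per-unit randomisation of c are assumptions of this illustration, not the authors' setup.

```python
import numpy as np

def build_simulation_aam(n_units, seed=0):
    """Apply the parameter values listed above to the AAM sketch from
    Section 3.2 (an illustrative reconstruction, not the authors' code)."""
    rng = np.random.default_rng(seed)
    aam = AAM(n_units, W=20.0, a=1.2, c=19.0, b=1.0)
    # Tiny offsets on the connecting coefficient c break ties without biasing
    # the competition (a per-unit offset is assumed here).
    aam.c = 19.0 + rng.uniform(0.0, 1e-5, size=n_units)
    # In the reported experiments the readout-control gain equals W, and unit
    # activities saturate at 10 and are normalised by that value at each step;
    # neither detail is implemented in this minimal sketch.
    return aam
```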

6.1. Behavior of ASM

This experiment was conducted to confirm whether ASM behaves as expected, using the conceptual hierarchy shown in Fig. 1 as an example. The results showed that for all solvable inheritance and recognition problems, whether or not exceptions were involved, correct answers were always obtained. For the problem (Swallow, motility, ?) the answer flying, and for the problem (Penguin, motility, ?) the answer not flying, for example, were obtained. Also, for the problem (?, respiration, pulmonary respiration) & (?, motility, running) & (?, speed, fast), for example, the answer Tiger was obtained. On the other hand, for unknown or clashing problems, the obtained solutions were always null. When an unknown problem, e.g. (Penguin, speed, ?), or a clashing problem, e.g. (?, respiration, pulmonary respiration) & (?, respiration, branchial respiration), was given, the obtained pattern for the solution (value or concept) was null and the symbolic answer was thus null. For ambiguous problems, the answers were always ambiguous. For the ambiguous problem (?, respiration, pulmonary respiration) & (?, speed, fast), for example, a composite pattern composed of the patterns for Tiger and for Swallow was obtained, and a symbolic answer that included both Tiger and Swallow was thus obtained. The number of computation steps needed to obtain the answer depended on both the type of problem and its complexity; it was smaller than 20 for all problems we tested. Our experiments also confirmed the ease of modification (i.e., adding and deleting knowledge using one-shot relearning) and the generalizing ability (i.e., if a new property is added to all concepts rooted in an ancestor, then this ancestor acquires the new property automatically).

6.2. Computation time of ASM

This experiment was conducted to evaluate how quickly the ASM performs inference. It consisted of two sub-experiments: Experiment I was performed to examine the relationship between the inference time and the size of the conceptual hierarchy, and Experiment II was performed to examine the relationship between the inference time and the input complexity.

In Experiment I, a conceptual hierarchy with five layers, in which the number of concepts was variable, was used. As an initial structure, the root concept (the concept at the top layer) had two branches, and the concepts from the second through the fourth layer had one branch each. Each concept had only one pair of property and value. Thus, there were a total of nine concepts and therefore nine triples to be learned in this initial structure. The conceptual hierarchy was then expanded gradually. First, the number of branches for each concept in the fourth, third, and second layer was increased to two, in that order; the total number of concepts was thus increased from 9 to 11, 17, and 31 in turn. The number of branches for each concept in the fourth layer was then increased from two to eight in increments of two, the number of branches for each concept in the third layer was increased from two to six in increments of two, and the number of branches for each concept in the second layer was increased to four in increments of one, in that order; the total number of concepts was thus further increased from 31 to 47, 63, 79, 151, 223, 333, and 443 in turn. For each of these cases, five problems (c, p, ?), where c was a randomly selected concept from the fifth layer and p was a property located at the root concept, were solved to obtain an average computation time for inheritance inference. Also, for each of these cases, five problems (?, p_1, v_1) & (?, p_2, v_2), where (p_1, v_1) was randomly selected from those in the concepts located in the fifth layer and (p_2, v_2) was randomly selected from those in the concepts located in the fourth layer, were solved to obtain an average computation time for recognition inference. As shown in Fig. 5, the time required for both inheritance and recognition inference was nearly unaffected by the size of the conceptual hierarchy. Because recognition inference requires two processing stages, it took longer than inheritance inference.

Fig. 5. Relationship between inference time and size of conceptual hierarchy.

In Experiment II, a conceptual hierarchy with the same structure as that used in Experiment I, with the maximum of 443 concepts, was used. However, the number and location of the properties and their values were different. In this experiment, properties and values were located only on concepts selected as follows: (1) for each layer only one concept was selected, and (2) all the selected concepts were in one chain, i.e., in a hierarchical relationship. For each of the selected concepts, 20 different properties and values were located. Recognition problems that had m pairs of properties and values were solved; m was initially set to 1 and then increased until it reached 100, in increments of 1 when m ≤ 5, in increments of 5 when 5 < m ≤ 20, and in increments of 10 when m > 20. The properties and values were selected from the stored conceptual hierarchy as follows: those located at the root concept were used first when m ≤ 20, and then those located at the concepts of the second, third, and fourth layers were used in turn with increasing m. As shown in Fig. 6, the computation time for inference tended to be sub-linearly proportional to the input complexity.

Fig. 6. Relationship between inference time and input complexity.

7. Conclusion

We have demonstrated that semantic networks can be well represented by adaptive associative memories. The main feature of the proposed model is its computational effectiveness: it can do inference based on a large knowledge base extremely fast, in time that does not increase with the size of the knowledge base. This performance cannot be realized by any previous model. The ASM is also flexible in the sense that modifying knowledge can easily be performed by one-shot relearning and the generalization of knowledge is a basic system property.

8. For further reading

The following reference is also of interest to the reader: [15].

Appendix A

Proof of Lemma 1. (a) Because all the patterns in (5)-(7) have the same property p and are stored ones, there are no inhibitory connections between the 1-bits in P(p) and the 1-bits in P(x), where x = c, c_i, v, v_i for i = 0, ..., m, by (L-1). Thus, P(p) does not affect the competition.

(b) We can show that the 1-bits in P(v) receive more inhibitions than the 1-bits in P(v_i) for i = 0, ..., m as follows.

When the patterns in (5) and (7) compete with each other, because all of them are stored patterns, the 1-bits in P(v_i) (i = 1, ..., m) do not receive inhibitions from the 1-bits in P(c_i) (i = 1, ..., m), by (L-1). Further, because P(c_i) ⊇ P(c_0) ⊇ P(c) (i = 1, ..., m), the 1-bits in P(v_i) (i = 1, ..., m) also do not receive inhibitions from the 1-bits in P(c_0) and P(c). Therefore, the 1-bits in P(v_i) (i = 1, ..., m) only receive inhibitions from the 1-bits in P(c_j) for j = 1, ..., m and j ≠ i. On the other hand, the 1-bits in P(v) receive at least the inhibitions from the 1-bits in P(c_i) for i = 1, ..., m. Therefore, the 1-bits in P(v) receive more inhibitions than the 1-bits in P(v_i) for i = 1, ..., m.

Similarly, when the patterns in (6) and (7) compete with each other, the 1-bits in P(v) receive inhibitions from P(c_i) for i = 0, ..., m, but the 1-bits in P(v_i) (i = 0, ..., m) only receive inhibitions from P(c_j) for j = 0, ..., m and j ≠ i. Therefore, the 1-bits in P(v) also receive more inhibitions than the 1-bits in P(v_i) for i = 0, ..., m.

By (a), (b), and Corollary 3, Lemma 1 is proved. □

Appendix B

Proof of Lemma 2. (a) If S_1 in P_IPT is null, P(c, p_i, v_i) for i = 2, ..., m compete with each other and P(c, p_x, v_x) (2 ≤ x ≤ m) is recalled. Note that (c, p_1, v_1), which was recalled at the first stage, cannot be recalled again.

(b) If S_1 in P_IPT is not null, because the answer exists uniquely, among S_2, ..., S_m in (8) there is at least one S-set none of whose concepts belongs to S_1. Without loss of generality, we can assume S_m is such a set. Thus, the inhibitory connections between the 1-bits in P(p_m, v_m) and the 1-bits in the patterns for the concepts of S_1 are intact after learning by (L-1). The following m−1 stored patterns therefore compete with each other:

P(c, p_m, v_m),   (B.1)

P({c, S_i^(1)}, p_i, v_i),  i = 2, ..., m−1.   (B.2)

Here S_i^(1) is a subset of S_1 and its elements have (p_i, v_i). Note that S_i^(1) may be null. Among these patterns, the 1-bits in P(p_m, v_m) [in (B.1)] receive inhibitions from all the patterns for the concepts in S_1, as noted above, whereas the 1-bits in P(p_i, v_i) (i = 2, ..., m−1) [in (B.2)] at least do not receive inhibitions from the concepts in S_i^(1). Thus, if no S_i^(1) (i = 2, ..., m−1) is null, the 1-bits in P(p_m, v_m) receive more inhibitions, and P(c, p_m, v_m) is recalled by Corollary 3. If there are some S-sets S_l^(1) (2 ≤ l ≤ m−1) that are null, either the 1-bits in P(p_m, v_m) or the 1-bits in P(p_l, v_l) (2 ≤ l ≤ m−1) receive more inhibitions, and therefore either P(c, p_m, v_m) or P(c, p_l, v_l) (2 ≤ l ≤ m−1), i.e., P(c, p_x, v_x) (x ≠ 1), is recalled first by Corollary 3.

Lemma 2 is thus proved. □

References

[1] D.G. Bobrow, T. Winograd, An overview of KRL: a knowledge representation language, Cognitive Sci. 1 (1) (1977) 3–46.
[2] R.J. Brachman, J. Schmolze, An overview of the KL-ONE knowledge representation system, Cognitive Sci. 9 (2) (1985) 171–216.
[3] R.L. Chapman, Roget's International Thesaurus, 5th Edition, HarperPerennial, A division of HarperCollins Publishers, 1992.
[4] M. Derthick, A connectionist architecture for representation and reasoning about structured knowledge, Proceedings of the Cognitive Science Conference, 1987.
[5] S.E. Fahlman, Representing implicit knowledge, in: G.E. Hinton, J.A. Anderson (Eds.), Parallel Models of Associative Memory, Lawrence Erlbaum, London, 1989.
[6] J. Geller, Advanced update operations in massively parallel knowledge representation, in: H. Kitano, J.A. Hendler (Eds.), Massively Parallel Artificial Intelligence, AAAI Press/MIT Press, Cambridge, MA, 1994.
[7] M.H. Hassoun, Associative Neural Memories: Theory and Implementation, Oxford University Press, Oxford, 1993.
[8] Y. Hirai, Representation of procedural knowledge by a model of human associative processor, HASP, Trans. IECE Japan J69-D (1986) 1743–1753 (in Japanese).
[9] Y. Hirai, Representation of control structures by a model of associative processor, HASP, Proceedings of the First Annual International Conference on Neural Networks, 1987, pp. II-447–454.
[10] Y. Hirai, Q. Ma, Modeling the process of problem-solving by associative networks capable of improving the performance, Biol. Cybernet. 59 (1988) 353–365.
[11] G.E. Hinton, Implementing semantic networks in parallel hardware, in: G.E. Hinton, J.A. Anderson (Eds.), Parallel Models of Associative Memory, Lawrence Erlbaum, London, 1989.
[12] J.J. Katz, J.A. Fodor, The structure of a semantic theory, Language 39 (1963) 170–210.
[13] Q. Ma, Y. Hirai, Modeling the acquisition of counting with an associative network, Biol. Cybernet. 61 (1989) 271–278.
[14] Q. Ma, An adaptive type of competitive network for pattern segmentation in associative memory, IEICE of Japan, Tech. Rep. NC93-64, 1993, pp. 75–82 (in Japanese).
[15] Q. Ma, Connectionist realization of semantic networks using an adaptive associative memory, AAM, 1995 IEEE International Conference on Neural Networks Proceedings (ICNN'95), pp. 1328–1333.
[16] Q. Ma, Adaptive associative memories capable of pattern segmentation, IEEE Trans. Neural Networks 7 (6) (1996) 1439–1449.
[17] M.R. Quillian, Semantic memory, in: M. Minsky (Ed.), Semantic Information Processing, MIT Press, Cambridge, MA, 1968.
[18] P.G. Schyns, A modular neural network model of concept acquisition, Cognitive Sci. 15 (1991) 461–508.
[19] L. Shastri, A connectionist approach to knowledge representation and limited inference, Cognitive Sci. 12 (1988) 331–392.
[20] L. Shastri, Why semantic networks?, in: J.F. Sowa (Ed.), Principles of Semantic Networks, Morgan Kaufmann Publishers, Los Altos, CA, 1991.
[21] E.E. Smith, D.L. Medin, Categories and Concepts, Harvard University Press, Cambridge, MA, 1981.
[22] M. Zeidenberg, Neural Networks in Artificial Intelligence, Ellis Horwood, Chichester, UK, 1990.


Qing Ma received the B.S. degree in electrical engineering from Beijing University of Aeronautics and Astronautics, China, in 1983, and the M.S. and Dr. Eng. degrees in computer science from the University of Tsukuba, Japan, in 1987 and 1990. He worked at Ono Sokki Co., Ltd., Japan, from 1990 to 1993. Since 1993 he has been working at the Communications Research Laboratory, Ministry of Posts and Telecommunications, Japan, currently as a Senior Researcher. His research interests include neural networks and their applications in knowledge representation and natural language processing.

Hitoshi Isahara received the B.E. and M.E. degrees in electrical engineering from Kyoto University, Japan, in 1978 and 1980, respectively, and the Dr. Eng. degree in electrical engineering from Kyoto University in 1995. He worked at the Electrotechnical Laboratory, Ministry of International Trade and Industry, Japan, from 1980 to 1995. In 1995, he moved to the Communications Research Laboratory, Ministry of Posts and Telecommunications, Japan, where he is now the Chief of the Intelligent Processing Section. His research interests include natural language processing, AI, and knowledge representation.
