10
Rev. Neural. (Paris), 1994, 150, 9, 570-579. c) Masson. Paris. 1994. THE ORGANIZATION OF MEMORY A PARALLEL DISTRIBUTED PROCESSING PERSPECTIVE James L. McCLELLAND 1:' SuMMARy Parallel distributed processing (PDP) provides a contemporary framework for thinking about the nature and organization of perception, memory, language, and thought. In this talk I describe the overall framework briefly and discuss its implications of procedural, semantic, and episodic memory. According to the PDP approach, the processing of information takes place through the interaction of a large number of simple processing units organized into modules. Storage occurs through the modification of connection weights based on the system response to its input, that provides an opponunity for incremental storage. I will describe how connection modification may give rise through the course of experience to procedural learning and to the formation of semantic memories, structured by their semantic content I will argue that the discovery of semantic structure requires gradual learning, with repeated exposure to representative samples of the structure to be learned will then describe two neuropsychological implications of the PDP approach. First, I will consider the possible modular organization of semantic information in the brain. Then, I will examine the role of the hippocampus in learning and memory. In the first case, we will see how the PDP approach leads us to see how brain damage might produce apparent dissociations between categories, when infact the underlying organization is not by category but by modality. In the second case, we will see that the PDP approach gives us a new way to understand why it is important that unique, arbitrary associations not be stored all at once in the same memorys systems used for semantic information. This leads to a specific theory of complementary roles of cortex and hippocampus, and to an explanation for the phenomenon of temporally graded retrogade amnesill. L ' orgllnistltion de III memoire. Interet du f( PllNllel Distributed Processing L. McCLELLAND. Rev. Neurol (Paris), 1994, 150: 9, 570- 579. RtsUMt La methode dite . parallel distributed processing" (PDP) ofl're un nouveau cadre a la retlexion sur la nature et I'organisation de perception de la memoire, du langage et de la pensee. L' auteur decrit d' abord brievement I'ensemble de ce cadre et discute ses applications ala memoire procedurale. semantique et episodique. Selon la methode DPD, Ie traitement de I'information passe par I'interaction d' grand nombre d' unites de traitement simple organisees en modules. Le stockage a lieu par modification des poids de connexions, fondee sur la reponse du systeme a ses atrerences, ce qui permet un stockage de plus en plus important. L'auteur decrit ensuite comment la modification des coMexions peut aboutir, au tOurs de I'experience, a I'apprentissage procedural et a la formation de memoires semantiques structurees par leur contenu semantique. II demontre que la decouverte d' une structure semantique exige un apprentissage graduel comportant ime exposition repetee a des echantillons representatifs de la structure a apprendre. L'auteur dCcrit alors deux applications du PDP a la neuropsychologie. II considere d' abord I' organisation modulaire possible (i' une information semantique dans I'encephale, puis examine Ie role joue par I' hippocampe dans I' apprentissage et la memoire. Dans Ie premier cas, on voit comment la methode PDP amene a envisager comment une lesion cerebrale peut provoquer des dissociations entre categories, alors que I' organisation sous-jacente n est pas fondee sur la categoric mais sur la modalite. Dans Ie second cas, I'on voit que la methode PDP procure une nouvelle fa~on de comprendre pourquoi il est important que des associations uniques et arbitraires ne soient pas toutes stockees en meme temps dans les memes systemes mnesiques que ceux utilises pour I' information semantique. Tout tela aboutit a une theorie spCcifique des roles complementaires que jouent Ie cortex et I'hippocampe ainsi qu a une explication du phenomene d' amnesie retrograde temporellement graduee. How is . memory organized? This question actually sub- sumes many more specific questions. To list only two, how many different kinds of memory are there? What is the substructure of each type ? These kinds of questions have motivated a great deal of research, both psychological and neuropsychological. Behind them lies another set of ques. tions : Why is memory organized as it is ? What leads to its organization? Are there fundamental computational issues that dictate the form of the human memory system, or is it simply a conglomeration of evolutionary artifacts ? I will address some of these questions in this talk, drawing on insights ftom the behavior of connectionist, Tires Q pan: lames L. McCLELLAND. Department of Psychology Carnegie Mellon University Pittsburgh, PA 15213, USA.

THE ORGANIZATION OF MEMORY - Stanford Universityjlmcc/papers/McClelland94b.pdf · mE ORGANIZATION OF MEMORY 571 parallel-distributed processing models of human memory. FlI'St, I will

  • Upload
    lytram

  • View
    234

  • Download
    0

Embed Size (px)

Citation preview

Rev. Neural. (Paris), 1994, 150, 9, 570-579. c) Masson. Paris. 1994.

THE ORGANIZATION OF MEMORY

A PARALLEL DISTRIBUTED PROCESSING PERSPECTIVE

James L. McCLELLAND 1:'

SuMMARy

Parallel distributed processing (PDP) provides a contemporary framework for thinking about the nature and organization of perception,memory, language, and thought. In this talk I describe the overall framework briefly and discuss its implications of procedural, semantic, andepisodic memory. According to the PDP approach, the processing of information takes place through the interaction of a large number of simpleprocessing units organized into modules. Storage occurs through the modification of connection weights based on the system response to itsinput, that provides an opponunity for incremental storage. I will describe how connection modification may give rise through the course ofexperience to procedural learning and to the formation of semantic memories, structured by their semantic content I will argue that the

discovery of semantic structure requires gradual learning, with repeated exposure to representative samples of the structure to be learned will then describe two neuropsychological implications of the PDP approach. First, I will consider the possible modular organization ofsemantic information in the brain. Then, I will examine the role of the hippocampus in learning and memory. In the first case, we will seehow the PDP approach leads us to see how brain damage might produce apparent dissociations between categories, when infact the underlyingorganization is not by category but by modality. In the second case, we will see that the PDP approach gives us a new way to understand whyit is important that unique, arbitrary associations not be stored all at once in the same memorys systems used for semantic information. Thisleads to a specific theory of complementary roles of cortex and hippocampus, and to an explanation for the phenomenon of temporally gradedretrogade amnesill.

L 'orgllnistltion de III memoire. Interet du f( PllNllel Distributed Processing

L. McCLELLAND. Rev. Neurol (Paris), 1994, 150: 9, 570-579.

RtsUMt

La methode dite . parallel distributed processing" (PDP) ofl're un nouveau cadre a la retlexion sur la nature et I'organisation de perception de la memoire, du langage et de la pensee. L' auteur decrit d' abord brievement I'ensemble de ce cadre et discute ses applicationsala memoire procedurale. semantique et episodique. Selon la methode DPD, Ie traitement de I'information passe par I'interaction d'grand nombre d'unites de traitement simple organisees en modules. Le stockage a lieu par modification des poids de connexions, fondeesur la reponse du systeme a ses atrerences, ce qui permet un stockage de plus en plus important. L'auteur decrit ensuite comment lamodification des coMexions peut aboutir, au tOurs de I'experience, a I'apprentissage procedural et a la formation de memoires semantiquesstructurees par leur contenu semantique. II demontre que la decouverte d'une structure semantique exige un apprentissage graduelcomportant ime exposition repetee a des echantillons representatifs de la structure a apprendre. L'auteur dCcrit alors deux applications duPDP a la neuropsychologie. II considere d'abord I'organisation modulaire possible (i'une information semantique dans I'encephale, puisexamine Ie role joue par I'hippocampe dans I'apprentissage et la memoire. Dans Ie premier cas, on voit comment la methode PDP amenea envisager comment une lesion cerebrale peut provoquer des dissociations entre categories, alors que I'organisation sous-jacente n est pasfondee sur la categoric mais sur la modalite. Dans Ie second cas, I'on voit que la methode PDP procure une nouvelle fa~on de comprendre

pourquoi il est important que des associations uniques et arbitraires ne soient pas toutes stockees en meme temps dans les memes systemesmnesiques que ceux utilises pour I' information semantique. Tout tela aboutit a une theorie spCcifique des roles complementaires que jouentIe cortex et I'hippocampe ainsi qu a une explication du phenomene d' amnesie retrograde temporellement graduee.

How is . memory organized? This question actually sub-sumes many more specific questions. To list only two, howmany different kinds of memory are there? What is thesubstructure of each type ? These kinds of questions havemotivated a great deal of research, both psychological andneuropsychological. Behind them lies another set of ques.

tions : Why is memory organized as it is ? What leads to itsorganization? Are there fundamental computational issues

that dictate the form of the human memory system, or is itsimply a conglomeration of evolutionary artifacts ?

I will address some of these questions in this talk,drawing on insights ftom the behavior of connectionist,

Tires Q pan: lames L. McCLELLAND. Department of Psychology Carnegie Mellon University Pittsburgh, PA 15213, USA.

common
Pencil
common
Pencil
common
Pencil
common
Pencil

mE ORGANIZATION OF MEMORY 571

parallel-distributed processing models of human memory.FlI'St, I will briefly review the properties of such models.Then, I will describe how structured procedural and seman-tic knowledge may arise ftom learning in such systems. Iwill argue that both kinds of learning depend on gradualconnection weight adjustment through repeated and inter-leaved presentation to representative samples of the struc-ture of the domains to which they apply.

I will then build upon these descriptions to address twospecific questions ftom the range of issues raised earlier.Each question is at once neuropsychological and psycholo-

gical. The questions are :How is semantic memory organized in the brain? Is it

by category or by modality?Why is there is a special system - namely the hippocam-

pus - dedicated to the rapid acquisition of episodic

memories?The investigation of these questions relates closely to the

legacy of Charcot, since they depend on the results of thestudy of neurological patients, and they draw on a conside-ration of the effects of brain damage on memory. Theybring this kind of study together with explicit computatio-nal modeling based on the principles of parallel distributedprocessing (POP). In both cases, I will argue that theproperties of POP models allow us to supplement thecareful analysis of patient behavior with theoretical observa.tions that can dramatically advance our understanding of

the implications of the behavioral evidence for fundamentalquestions like the ones posed above.

Let us begin with the parallel distributed processing

ftamework (Mc Clelland & Rumelhart, 1986; Rumelhart&: Mc Clelland, 1986). It is a ftamework for buildingexplicit computational models of human cognitive function.In this ftamework, we begin by assuming that cognitiontakes place via the interactions of a large number of simplebut highly interconnected computational elements organi-zed into groups or modules (fig. I). Each element issomething like a neuron, in that et aggregates inputs thatit receives ftom each other element via excitatory (positive)and inhibitory (negative) connections. The sum of all these influences in turn determines the activation of theunit, according to a simple non-linear function like the oneshown.

In systems of this type, what we might call the activerepresentation of information takes the form of the patternof activation over the units in the network. So, for example,in the network illustrated in the figure, the presentation of

. a word would give rise to a pattern of activation over severalpools of units, one representing perhaps the alphabetic

content of the word, another its semantic content, and athird its phonological content.

We think of the processing of information as the propa-gation of activation among the units. In the brain, weassume that this is an iterative process, and we simulate thisby dividing time into many small steps, and updating theactivation of each unit once in each timestep, based on theactivations of other units at the previous step. In the case

!nouts Out out

Processing Unit

-5 -A -3 -2 -I 0 I ::I

Total Input

Fla. 1.- The parallel-distributed processing fuunewon.

Le cadre du tI parallel-distributed processing JI.

Middle and bottom sections after Sejnowski & Rosenberg. 1987.

of networks with symmetrical connectivity, this causes themto settle, after a pattern is presented, into a stable oranraClor, in which the relevant patterns are active in allparts of the network. In general the ftamework allows forthe possibility that cognitive states are represented astrajectories or patterns of activation that change throughtime. For the present our interest focuses on the simpler,but still interesting case of attractor networks.

Now, given these conceptions of representation andprocessing, let us consider knowledge and learning. Whereis the knowledge in a parallel distributed processing sys-tem ? The answer is, it is stored in the strengths of theexcitatory and inhibitory connections among the proces-sing units. So, for example, the knowledge that allows usto translate an alphabetic pattern into a phonological

pattern is stored in the strengths of the connections in thepathway leading ftom the orthographic to the phonologicalmodules. These weights, as we shall see, are capable ofstoring the complex pattern of regular and exceptionalrelations between the spellings of English words and theirsounds. Other sets of weights can capture the more arbi-trary relations between spelling ~d meaning or betweenmeaning and sound.

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

~ f

572 L. McCLELLAND

Let us briefly consider explicit memories - for examplethe knowledge that' Charcot died 100 years ago ' or . Anoak is a tree '. According to the connectionist approach,even such memories as these are stored in connectionweights. Of course, when we think explicitly about one ofthese propositions, we hold it in mind as a pattern ofactivation. But the knowledge that allows us to reinstate this

" proposition ftom memory when appropriately cued is to befound. we believe, in connection weights. We will examinea model that encOdes propositional knowledge in this wayin a moment Also, when we think about specific episodicmemories, we reinstate a pattern of activation representingvisual aspects of the event and its context and of the feelingsand emotions of ourselves and others. Again such memo-ries are patterns of activation when we recall them but theknowledge that allows this recall is, we claim, stored inconnection weights.

If, as we have suggested, procedural, semantic, andepisodic knowledge are all stored in connection weights,then clearly, all three types of learning occur throughconnection strength adjustment. This fact links the psy-chological study oflearning and memory with the physiolo-gical investigation of synaptic modification. For our purpo-ses, we will not consider in great detail exactly what thesephysiological mechanisms are. Rather, we will rely on theuse of a procedure called back propagation that allows usto adjust the strengths of the connections among simpleprocessing units according to a very simple principle callederror correction. Basically, this principle says:

Adjust the strength of each connection in proportion to theextent that its adjustment will reduce the discrepancy betweenthe response of the network and external teaching signals.

For example, we can imagine a child learning to translateftom spelling to sound, seing a word and then hearing itscorrect pronunciation. We would use the pattern producedby the visual input to produce a pattern of activation overthe alphabetic units: then we would use the existingconnection weights to produce a pattern corresponding tothe networks representation of the sound. We would com-pare this to the sound supplied by the teacher, and thenadjust each connection weight so as to reduce the differencebetween the pattern produced by the network and thepattern specified by the teacher.

We now have a ftamework for thinking about representa.tion and processing; and about knowledge and knowledgeacquisition. I will now briefly describe the results of twosimulations demonstrating the power of this approach forthe acquisition of both procedural and semantic knowledge.

FIrst we consider procedural knowledge. The networkshown in fig. 2 was developed by Plaut, McClelland, andSeidenberg (1992; Plaut & McClelland, 1993) to learnto translate alphabetic patterns representing the spellings ofwords into phonological patterns representing their sounds.The network consists of input, hidden, and output units ;as well as connections as illustrated in the figure. Thetraining corpus consisted of a set of 3000 monosyllabic

100 hidden units

108 grapheme units

FIG. 2.- The network used by Plaut and Me CleUand (1993).

JUseau utilise par Piaut tI Me Cleiland (1993).

words. Training proceeded very gradually, capturing thegradual nature of the acquistion ofreading skill ; in fact thenetwork was exposed to the entire corpus of trainingexamples 3200 times, with presentations of more ftequent

. words given greater weight than presentations of less fte.quent words. After training, the network could read99.7 p. 100 of the words in the corpus correctly, missingonly 10 words that were low in ftequency and very excep-

tional either in their spelling or in their pronunciation giventhe spelling - two examples are the French words ' sioux 'and . bas " which are very rare and completely inconsistentwith English pronunciation. For the vast majority of words,including most exceptions like those shown on (fig. 3), thenetwork was correct. The network was also tested withmany pronounceable nonwords, and scored as well asnormals in providing plausible pronunciations - as high as98 p. 100 correct on some sets of items. Based on compa-risons of the networks performance with typical adultEnglish speakers, we can say that the network has graduallylearned both the regular mapping between spelling andsound as well as the commonly occurring exceptions to itin a way that corresponds to human capabilities in thisregard. It has mastered, in short, the procedural skill oftranslating ftom spelling to sound through gradual,incremental learning.

Let us now consider a model that learns semantic infor-mation. This model was constructed by Rumelhart (1990)to demonstrate how structured semantic representationscan arise in PDP systems ftom experience with proposi-tions about objects and their properties. The network wastrained to capture the information standardly stored in the

Example exception words mastered by the network

of Plaut and McClelland (1993):

...

HAVE ONE

PINT AISLE

FIG. 3.- Some examples of exception words learned by the network ofPlaut & Me Clelland (1993).

Quelques exemples de mots d'eJa:eption appris par Ie reseau de Plaut et Clelland (/993).

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

TIlE ORGANIZATION OF MEMORY

--

fro.living

~~G-

..~

;:t",~

~!\ -

1X\..i'," pin. 10" "sisi

f/nary robin \ ../mon sunfish

fl8 J,.. "8

", ~ \" ,

big If..n ,.11_: sin, ,.110. ).": 1." ,.110.

....: ....

~ftOI."Yaparro. os/1k:t(

fiG. 4.- A semantic network of the type used in symbolic models ofcognition. From Rumelhart and Todd. 1993.

Riseau semanlique du rype uliiise dons ies modeles symboliques de cogni.tion. D'apres RumelJuur el Todd (/993).

old-fashioned semantic networks of the kind illustrated inthe fig. 4. The goal is to allow the network to storeinformation about concepts in such a way as to permit togeneralize ftom what it has learned about some concepts toother related concepts. In the old approach, this was doneby storing information true of a class or subclass ofconcepts at the highest possible level of the tree. In the newapproach, this is done by gradually learning to assign eachconcept a distributed representation, capturing its similarityrelations to other concepts.

Rumelhart s network is shown in fig. 5. It consists of aset of input units, one for each concept and one for eachof several relations: 4( IS A~ 4( HAS~ 4( IS~, and4( CAN~. At the output it consists of a set of units forvarious completions of simple propositions, such as 4( RO-BIN IS A BIRD ~ , 4( ROBIN CAN FLY It, etc. In betweenthere are two layers of hidden units, one to represent thesemantic relations of the concepts and another to combinethem with the relations to activate the correct outputs.

Rumelhart trained this network with a set ofpropositionsinvolving the concepts shown in the previous figure. Thetraining was one input pattern for concept-relation pair,and the task was to turn on all the output units representingcorrect completions of the proposition. For example, whenthe input is ' ROBIN IS A ' the correct output is BIRD,ANIMAL, LMNG THING. The entire set of patterns waspresented to the network many times, making small ad-justments to the strengths of the connections after thepresentation of each pattern.

After the network has mastered the training set, it ispossible to examine the representations that the networkhas learned to assign to the words in the concepts. We canlook either at the patterns themselves, as well as their

similarity relations. I have repeated the simulation myself,

AnimalPlantBirdFishTreeFlowerCanaryRobin

is ahascan

573

AnimalPlantBirdFishTreeFlowerCanaryRobin

&kinleaveswings

growflyswim

bigsmallyellow

FIG. The connectionist netWork used by Rumelhart to learn the factsstored in the previous figure.

Le reseau #I connexioniste. ulilise par Rumel/um pour apprrndrr ies faitsstockes dans ID figurr precUente.

and the results that I obtained are presented in fig. 6. The

figure demonstrates that the network has assigned siIriiIarpatterns to similar concepts - so for example, the patternfor oak is similar to the pattern for pine, the pattern fordaisy is similar to the pattern for rose, etc.

Now we may consider how we may achieve generaliza-tion in this network. We may do so if we can assign to anew pattern a representation that is similar to the represen-tation of other similar concepts. To illustrate this, Rumel-hart trained the network to produce the correct classinclusion output for the concept sparrow. He simplypresented' Sparrow is a ' as input, trained the network toproduce the corrrect response - . Bird, Animal, LivingThing ' . This caused the network to assign a pattern tosparrow very close to the patterns for robin and canary. Asa result the network was able to answer reasonably whenprobed with other propositions involving sparrows. It saidthe sparrow can grow and can fly, it has wings, features andskin, and it is small. It was unsure about whether thesparrow could sing and about its color. Thus the networklearned to inherit knowledge ftom what it had learnedabout other birds without direct instruction.

The point here is that PDP models, trained slowly viainterleaved presentation on a representative sample of anentire domain of knowledge, can gradually acquire kno-

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

574 n. McCLELLAND

1.2 1.8

....

0Ik

pOle

Almon

_ISh

fiG. 6.- Resuhs ofa cluster analysis on the representations of each of thespecific concepts learned by Rumelhart's network (replication fromMe Clelland, Me Naughton, and O'Reilly, 1993).

RisultDls dune analyse en grappe portlJnt sur les representations de chacundes concepts spiciflques appris por Ie reseau de Rumelhart (d'apres McClelland. Mc Naughton et O'Reilly, /993).

wledge of structured procedures such as spelling-soundcorrespondence and structured semantic domains such asthe domain of living things.

In summary, we have seen how POP models provide away not only of representing procedural and semanticknowledge but of indicating how this knowledge may be

. acquired through gradual experience. In the next section ofthe talk, I will show how parallel distributed processingmodels can help us understand two very stilking neuropsy-

. chological phenomena that have implications fof questionsabout the nature of the organization of human memory.

FJrStwe consider a question that arises within semanticmemory: How is semantic memory organized in thebrain ? Is it organized by category, for example with repre-sentations of living things stored in one part of the brainand knowledge of man-made artifacts in another? Or is itorganized perhaps by modality, with representations ofvisual properties stored in one part of the brain andknowledge about other types of properties, such as functionand use in other parts ?

This question has arisen within the domain of neurologydue to some fascinating observations reported in a series ofstudies by Warrington and her colleagues (Warrington &Mc Carthy, 1983, 1987; Warrington and Shallice, 1984).Warrington and Shallice (1984) reported two patients whoshowed a dramatic impairment in semantic knowledge ofliving things coupled with only slight deficits in knowledgeof man-made objects such as a briefcase or an umbrella. Inone test, patients were asked to define concepts when giventhe word for the concept Fig. 7 shows the data, along withsome example responses.

Clearly, the patients retain more information aboutman-made objects than they retain about living things, Thesame results are obtained when the input is a picture so the

2.0

Perfomltlllce of TII'O PIIIie"ts WitJI Im",;,ed K"owledge of Livi'"Tltiqs 0" J'1II'io1lS Se_tk Memory T.b

Case Living thing Nonliving thing

Picture identificationJBR 90%SBY 7S%

Spoken word definitionJBR 79%SBY

JBR Parrot : dont ' know Tent: temporary outhouse, livinghome

DaJrodil : plant Briefcase: small case used by stu-dents to carry papers

Snail : an insect animal Compasse: tools for tellling direc-tion you are going

Eel : not well Torch: hand-held lightOstrich: unusual Dustbin: bin for putting rubbish in

SBY Duck: an animal Wheelbarrow: object used be peopleto take material about

Wasp: bird that fiies Towel: material used to dry peopleCrocus: rubbish material Pram: used to carry people, with

wheels and a thing to sit onHolly: what you drink Submarine: ship that goes under-

neath the sea

Spider a person looking Umbrella: object used to protect youfor things, he was a spider trom WIther that comesfor his nation or country

fiG. 7.- Penormance of two patients with impaired knowledge of livingthings in various semantic memory taw. From Farah &: Me Clelland(1991) based on data in Warrington &. Shallice (1984).

Performance de deux malades ayant une ma/lllaise connaissance des etresvivants dans dijJerentes uiches de memoire semantique. D'apres Farah

Mc Clelland (/991),fonde sur les donnees de Warrington Shallice(/984).

problem appears to be semantic, not simply one of acces-sing meaning from sound. Other patients, tested by War-rington and Mc Carthy (1983), show a reverse pattern,although it is not as dramatic.

Taken together the results suggest that localized lesionsto the brain can selectively interfere with knowledge ofliving things on the one hand and with knowledge ofman-made objects on the other. It is tempting to concludefrom this in fact knowledge of living things is stored in a

- different part of the cognitive system than knowledge ofman-made objects. In fact, though, Warrington and hercolleagues (Warrington & Mc Carthy, 1983; Warringtonand Shallice, 1984) proposed at first a very different inter-pretation. On this view, these selective deficits reflectorganiza~on by type of information rather than by the typeof object. These authors noted that living things seem to bedistinguished predominantly by their sensory properties,while man~made objects such as umbrellas are distinguis-hed primarily by their function. They suggested that per-haps to access a concept is to access its distinguishingfeatures. If the living things have more visual distinguishingfeatures. then access to concepts about living things would

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

-""' ~'. ---, ._. --- _.~"' "-'--' .,. .

TIlE ORGANIZATION OF MEMORY 575

be impaired if the portion of the brain containing visualfeature information was damaged. This view makes a greatdeal of sense for a number of reasons. We know that thebrain is organized in part by modality; furthermore, in laterresearch it appeared that the dissociation could best bedescribed, not in terms of living things vs. man-madeartifacts, but in terms of concepts that are distinguished bytheir appearance (including jewelry, for example, which isman-made) vs. concepts that are distinguished by the filetthat they refer to objects that are easy to manipulate. YetWarrington later abandoned the idea, because of whatseemed to many people to be a devastating critique. Thecritique rests on the observation that Warrington andShallice s patients were impaired, not only for questionsabout the visual properties of living things, but also forquestions abOut the functional properties of living things. Ifvisual and functional properties are stored separately (per-haps in modality specific stQres related to perception andaction), it is difficult to see why a lesion that affects visualknowledge more than functional knowledge would affectfunctional knowledge about living things more than func-tional knowledge about man-made artifacts. Therefore thetheory that semantic memory is organized by type ofinformation rather than by category of concepts was aban-doned. Martha Farah and I have used a connectionistmodel to reinvestigate this issue. In particular, we examinedwhether a PDP model that adheres to Warrington andShallice s assumptions could simulate the basic findingsftom the patient data. In our simulation, the representationof an object is a distributed pattern of activation over$everal modules (fig. 8). We have a visual input module forrepresenting the appearance of the object and anothermodule for representing the name. We imagine there arealso semantic modules, perhaps several for different typesof information. In our simulation we include two semanticmodules, one for visual aspects of the representation of aconcept and one for functional information about thepurpose and use of the object. The model contains recipro-cal connections between both input modules to both se-mantic modules, allowing bi-directional associations bet-

..,.

The Semantic Memory network of Farah and Me Clelland 11991)

FIG. 8.- The network model used by Farah and Me Clelland (1991).

Modele de reseau ulilise par Farah el Me Clelland (1991).

ween both types of input and both aspects of meaning.There are also recurrent connections within the visual andphonological input representations, allowing each patternto function as an attractor. Fmally, we also see the conceptsas consisting of semantic attractors. These involve connec-tions both within and between the different semanticmodules.

Before designing representations of living things andman.made objects, we first tested Warrington and Shalliceclaim that living things are distinguished primarily by theirvisual properties and artifacts are distinguished primarily bytheir functions. To do this, we presented students withdictionary definitions of the living things and artifacts usedin their experiments. One group of students underlinedwords referring to visual properties, and another underlinedwords referring to function. The results of this were a littledifferent than Warringhton and Shallice had claimed, butvery similar: There were many more visual than functionalaspects of living things, but about 7 : 1. For artifacts, therewere about equal numbers of visual and functional aspects.We designed our representations accrodingly. Eachconcept was represented with a random feature pattern, butfor living things, 7 : 8 of the features were visual, only 1/8were functional; for artifacts, 50 p. 100 of the features werevisual and 50 p. 100 were functional. In this way we created10 . living thing' concepts and 10 random ' artifact 'concepts. For each concept, the visual and phonologicalinput representations were simply random patterns. Foreach concept, the model was trained to use either the nameor the picture representation as input, and ftom this tocomplete the rest of the pattern - this includes both visualand functional semantics, as well as the other input repre-sentation. Note that there are no direct connections bet-ween input representations, so in order to perform theassociation between visual and phonological patterns thenetwork had to make use of the connections via thesemantic units.

After training, the network is able to complete all of theconcepts accurately, both producing the correct semanticrepresentation and reproducing the correct phonologicalpattern when the picture is presented and vice-versa. Ourquestion, though, is whether the organization em~dded inthis network is compatible with the neuropsychologicalevidence of selective deficits of living things after somelesions, but on artifacts after others. There are no modulescorresponding to these categories in the model, yet in factit does allow us to capture the selective deficits seen inpatient behavior, by assuming that their lesions selectivelyaffect visual or functional semantics. To demonstrate this,we carried out a series oflesion experiments on the model.We simulated brain damage of different degrees by des-troying different random &actions of the semantic units either visual semantic units or functional semantic units and then testing the network to see how well it can performan. analog of picture naming. Specifically, we determinedwhether the network can activate the correct name patternwhen given the picture pattern as input. The results for the

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

576 L. McCLELLAND

effects of lesions 10 "yisual semantics are shown in the fig.9. Here, in the first panel, we see a gradual degradation offunction that is much greater for living things that forartifacts. There is very . little effect on artifacts unless thelesion is quite severe. The magnitude of the effect is quiteconsistent with the patient data. Now consider the effectsof lesions to functional semantics

(fig. 9 second pane!).Here we see that there is very little at all on living things,but a moderate effect on ' artifacts. The model largely repro-duces, then, the double dissociations found in the patients.It does so, essentially, by implementing the Warrington andShallice account of the data. To activate the correct name,one must activate the distinctive semantics of each concept.But for living things, the bulk of the semantic informationis visual, and so by destroying the visual semantic units weprevent the activation of enough of the semantics toproduce the correct response. For artifacts, the semanticinformation is distributed about evenly over the twomodules so a lesion to either module produces about thesame size effect.

We may now consider the objection that caused Warring-ton to abandon her model. The argument, was that a lesionthat affected visual semantics should leave knowledge of thenon-visual aspects of concepts unaffected, yet patients whoshowed a deficit for living things showed a deficit forfunctional properties of living things, not just their visualproperties. But in our model, we see in fact just this verykind of d~ficit. As the third panel of the figure shows, wesee an increasing deficit in the activation of functional

. semantic representations as we increase damage to visualsemantics, and the deficit is more dramatic for living thingsthan for artifacts. The reason is this. Although the semanticfeatures are segregated as two types the knowledge thatallows for their activation is distributed throughout thenetwork, and includes knowledge in connections betweenvisual and functional semantics. Thus when the visual

% co"ect

1.0

0.4

D.2

Basic model % co"ec.

Nonliving

~ ~

c~~~~-c

tJo,~

living

"Oo_~~ oo.----

living

1.0

c...

Q..

..

living

0.&

0.0 0.080 100 % damaga 10 ..sual.emamic memory

semantic units are destroyed, the network is less able toactivate the correct functional semantics. The effect ispresent for both types of concepts, but much greater forliving things than for artifacts. Of course, the deficit is notas great as the deficit in visual knowledge of living things.This is in fact exactly the pattern seen in patients. We showthe data ftom one patient tested by Farah et al. (1989). Thepatient showed relatively subtle deficits compared to theoriginal patients. But the largest deficit was in visualknowledge ofliving things, consistent with a lesion in visualsemantics. The patient also showed smaller deficits in visualknowledge of artifacts and in functional knowledge of livingthings. There was no significant deficit in functional kno-wledge of living things. This is just the sort of pattern thatthe model will show after partial damage to visual seman-tics.

In summary, Farah and I have been able to show withthis model that some of the key data relevant to theorganization of seinantic memory in the brain is entirelyconsistent with the idea that the brain is organized by thetype of features rather than by the type of concept. On thisview concepts are in fact highly distributed, and crucially,the activation ora part of the concept depends on the abilityto activate other parts. In this way we are able to see howapparent category specificity of knowledge may emergeftom a network that is organized by modality or featuretype.

I turn now to a very different kind of neuropsychologicaldissociation, namely the pattern of deficits and sparedcapacities seen in hippocampal amnesia. My work on thisproblem with Mc Naughton and O' Reilly is currently inprogress (Mc Clelland, Mc Naughton & O' Reilly, 1993).I am sure everyone is aware of the dramatic pattern ofdeficits seen in patient HM - a pattern that is now apparentin the data of a large number of other patients with selectivebut extensive lesions to the hippocampus and related

Basic modelNonliving

Duality of func601lf1.emen6c memory Basic model

1.0

0.&

80 100% damagOlo vi.ua'somantic memory

0.0 80 100

% damaga to visualsomantic memo,,/

fiG. 9. Simulation of dissociations in semantic memory from Farah & Me Clelland (1991). (a) Effects ofJesions to visual semantics on naming livingthings and artifacts (b) Effects oflesions to functional semantics on the same. (c) Effects oflesions to visual semantics on activation of functional semantics.

Simulation de dissociation en memoire semantique, dapres Farah et Mc Clelland (1991). (a) EjJets de lesions sur la semantique visuelle concernant les nomsderres vivants et danefacts: (b) EjJets de lesions sur la semantique fonctionnelle concernant Ie mime sujet; (c) EjJets de lesions sur la semantique I'isllelleconcernant f'activation de .fa semantique fonctionnelle.

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

1HE ORGANIZATION OF MEMORY

structures in the medial temporal lobes. These patientsshow a dramatic deficit in the ability 10 acquire new explicitmemories of the contents. of specific episodes and events.However, at the same time they are completely normal inthe use of their existing semantic and procedural kno-wledge, and in fact their acquisition of new skills appears10 be completely intact. They also show normal repetitionpriming effects in a wide range of tasks. For example, theyshow a normal amount of perceptual facilitation in theidentification of visually presented words due to a priorpresentation of the word. They also show a severe tempo.rally graded retrograde amnesia for episodic information.Figure 10 shows data on this ftom the laboratory of Squire(Mac Kinnon and Squire, 1989; Squire et al. 1989). Herewe see a temporally graded retrograde amnesia that appearsto encompass 10 years or more. Memories of episodesftom childhood and adolescence appear to be independentof the hippocampus, but more recent memories do appear10 depend on the hippocampal formation.

Our view of the organization of mammalian memory thatis consistent with these facts is illustrated in

fig. I 1. FlI'St, weimagine that ultimately all kinds of knowledge can ultima-

100 Percent correct

CON IN - 8)

AMN IN . 4)

... .-'""....,

Number of20 memories

AMN

\ -... ,

before Decade1950

FIG. 10.- Temporally graded retrograde amnesia in humans. Reca11perfo~ce is shown as a function of the date of the event remembered.(a) Personal episodes (Mac Kinnan and Squire, 1989). (b) Public events(Squire, Haist, and Sbimamura, 1989).

Amnesie retrograde temporellement grat/uk chez /'homme. Nombre memoires rappelees par des sujets normaux et amnesiques. (a) tpisodespersonnels (Mac Kinnon et Squire. 1989) (b) tvenements publics (Squire.Haist et Shimamura. 1989).

577

FIG. 11.- Sketch of the human neoconica1 and hippocampal systems(from Me Clelland. Me Naughton. and O' Reilly, 1993).

Schema des systimes neoconical et hippocampique chez /'homme (d'apresMc Cieelland, Mc Naughton et O'Reilly, 1993).

tely be stored in the connections among neurons in thecortex and other non-hippocampal structures such as thebasal ganglia (for brevity we will refer to this as the corticalsystem). We imagine that this cortical system learns gra-dually, with each experience producing very small changes10 connection weights. These small changes, we assume,are the basis of repetition priming effects, and they gra.dually give rise to cognitive skills, such as the ability totranslate ftom spelling to sound, and semantic knowledge,such as that exhibited by Rumelhart s net. However, inaddition to this cortical system, we assume that there is alsorapid storage of traces of specific episodes within thehippocampus. On this view, when an event or experienceproduces a pattern of activation in the neocortex, it produ-ces . as the same time a pattern of activation at the input tothe hippocampus. So for example, if I am introduced tosomeone, I form a pattern of activation including theappearance of the person, the name, and other aspects ofthe situation. Synaptic modifications within the hippocam-pus itself then auto-associate the parts of the pattern. Later,when a retrieval cue is presented (let us say the personreappears and I wish to recall his name), this then producesa partial reinstatement on the hippocampal input pattern.This is then completed with the aid of the modified synap-ses in the hippocampus, and then reinstated in the neocor-tex via return projections. The assumption is that thesechanges are large enough so that with one or a few trialsthey may sustain accurate recall, while the changes that takeplace in the cortex are initially too small for that. Gradually,though, through repeated reinstatement of the same trace,the cortex may receive enough trials with the same associa-tion to learn it in the neocortical connections. This reinsta-tement can occur, I propose, either through repetition ofthe association over many different events; or throughrepeated reinstatement ftom the hippocampus. Thus on thisview the hippocampus can serve both as the initial cite ofstorage but also as teacher to the neocortex.

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

578 L. McCLELLMID

The above description provides an account of the data,and although not everyone accepts it, it is not really totallynew. Many investigators have suggested some or most these ideas with varying degrees of completeness. What I donot know of, however, is any very clear story about why thesystem might be organized in this way. Let me phrase thequestions as follows :

FU'St, why is it that we need a special system for rapidstorage of episodic memories, if knowledge of all types iseventually stored in connectionist within the neocortex?

Second, why is consolidation of information into theneocortex so slow? Why does it require many repeatedreinstatements apparently spanning years in some cases?

The final substantive point of my talk will be to suggestan answer to these questions. The answer requires us toreturn to the demonstration of the acquisition of proceduraland semantic representations that I presented earlier in thelecture. There I argued that the effective discovery of theshared structure that underlies an entire domain such asspelling-to-sound correspondence or the semantic organi-zation of knowledge of living things specifically requirCsgradual learning, in which the learning of anyone associa-tion is interleaved with learning about other associations.

Let me illustrate and amplify this point by comparingwhat happens in Rumelhart s semantic network ifwe try toteach it some new information in either of two differentways. The first way, which I call focussed learning, involvesteaching the network the new information all at once,without interleaving it with ongoing exposure to the struc-ture of the entire domain. The second way, which I callinterleaved learning, involves simply introducing the newinformation into the mix of experiences that characterizethe entire domain. As our example, I consider the case ofthe Penguin. Now as we all know, the penguin is a bird, butit can swim, and it cannot fly. I took Rumelhart s semanticnetwork and taught it these two things in the two differentways just described: The focussed case involved simply

presenting the two items repeatedly and watching thenetwork learn. The interleaved case involved adding thetwo new items to the same training set used previously andcontinuing training as before. The results of these experi-ments, for the case of . penguin can grow-swim-fly " areshown on the fig. 12. Here if we look at the number ofpresentations required for acquisition, in the left panel ofthe figure, we see that focused learning is better, sinceacquisition is considerably faster with focussed learningthan it is with interleaved learning. But it turns out that thisslight advantage has been purchased at a considerable costFor it turns out that with focussed learning, the training onthe case of the penguin has strongly corrupted the modelsunderstanding. of other concepts. Indeed if we consider thenetworkS knowledge of the concept robin, in the lowerpanel, we see that focussed learning has produced a drama-tic interference. In fact, the model now thinks that allanimals can swim, and even some plants. But in the caseof interleaved learning, we see very little interference withwhat is already known. To be sure there is a slight effect,

1JI

Learning that a Penguin can Swim but not FlySummedAbsoluteError

I ,I ,

: ,

I ~I ,I ,I ,

! ~

: 'b."'00 o"OoooOo()ooO-o-o Focussed

o()oo o()oo()o 00-0

1.2

Interleaved

1.0

0.8

0.6

0.2

20 40 &II 80 100

Presentations of penguin-can-swim-move-grow

Interference with Knowledge about Robins

SummedAbsoluteError

1.0

~.o-o-O.0-0o..oc-O' Focussed

.c-p D'"

. InterleavedT....10-

0.9

0.8

0.6

0.3

0.2 40 80 100

Presentations of penguin-can-swim-move-grow

FIG. 12. Effects of focussed and interleaved learning- (a) Learning thata penguin can swim but not fly. (b) Interference of/earning about thepenguin with knowledge about the robin (Me Clelland, Me Naughton& O'Reilly, 1993).

EjJets de l'apprentissage centre et inter-patine. (a) Connaissance du faitque Ie pingouin peut nager mais non voler. (b) Interfirence de laconnaissance concernant ie pingouin et de celie concernant ie rouge-gorge (Mc Clelland. Mc Naughton et O'Reilly, 1993).

but as interleaved learning proceeds this is even reducedgradually over trials. Thus with focussed learning newknowledge corrupts the structured database in the net-work; with interleaved learning new knowledge is graduallyadded with no disruption.

This demonstration recapitulates a phenomenon that haspreviously been reported by researchers who have tried touse networks of the type used in Rumelhart s semantic

network to model episodic memory (Mc Closkey and

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil

579TIlE ORGANIZATION OF MEMORY

Cohen, 1989). They termed the phenomenon catastrophicinterference, and rejected networks of the type that gra-dually discover structure through incremental learning asmodels of episodic memory, since storage in episodicmemory clearly involves what I have called focussed lear-ning. I agree that back propagation networks are poormodels for this kind of learning. However, the kind learning that they are good at is just the kind of learning wethink the corte~ is specialized for - gradual discovery of

skills ftom the overall structure of experience.For the discovery of overall structure, gradual, interlea-

ved learning is necessary, so that the connection weights inthe network come to reflect influences of all of the exam-ples in the domain. In fact, it can be shown mathematicallythat in order to converge on a set of connection weights

that best captures the structure of a particular domain, it isnecessary in the limit to reduce the learning rate asymptoti-cally to 0; the closer it gets to zero, the better we willapproximate correct connection weights. This point leadsto the observation that if the brain is to be able to extractthe structure, it is going to have to learn slowly.

With these ideas in mind, we can now return to the twoquestions asked above.

FU'St, why is it that we need a special system for rapidlearning of the contents of specific episodes and events, ifknowledge of all types is eventually stored in connectionistwithin the neocortex?

The answer begins with the idea that the cortical systemis specialized for the extraction of shared structure of eventsand experiences. In order to do this, and to avoid catastro-phic interference with that structure, it is necessary for thecortical system to use a very small learning rate. In thiscontext, the role of the hippocampus is to provide a systemin which the contents of specific episodes and events canbe stored rapidly, without at the same time interfering withthe structure that has been extracted by the cortical system.

Second, why is consolidation of information into theneocortex so slow? Why does it require many repeatedreinstatements apparently spanning years in some cases?

My answer is that consolidation is slow precisely becauseit would be disruptive if information stored in the hippo-campus were incorporated in the neocortex all at once. Itis only when new information is gradually interleaved inthis way that we are able to incorporate new knowledge intothe neo~rtical system without interfering with what wealready know.

Let me conclude by summarizing briefly. FlI'St I reviewedthe parallel distributed processing ftamework, and showedhow it provided a mechanism for th~ gradual acquisition ofcognitive skills and for the gradual discovery of semanticstructure. Then I considered two issues that have arisenwithin the field of cognitive neuropsychology: FU'St, issemantic memory organized by conceptual category, or bymodality of information about each concept stored inmemory? Second, why do we have a hippocampus, whenultimately everything that we know is stored in the neocor-tex ? I argued that parallel distributed processing models

provide new ways of studying these questions, and indeedI suggested answers to these questions that make sense ofthe existing neuropsychological facts in terms of mecha-nisms that are consistent with the principles of paralleldistributed processing. In closing I would like simply to saythat the particular ideas I have presented are not the mainpoint of this talk. Rather, the main point is to indicate thatcomputational models. based on principles of paralleldistributed pr;ocessing, can help us gain further insight intothe neural mechanisms underlying cognition, as these arerevealed through the striking dissociations we" see in thebehavior of neurological patients.

REFERENCES

FAJWf MJ., HAMMOND K.H., MEHTA Z. " RATCLIFF G. (1989).Category-specificity and modality-specificity in semantic memory.Neuropsychoiogia, 27: 193-200.

FAJWf MJ. " McCLEU.AND 1.L. (1991). A compUUltional model of

semantic memory impairment: Modality specificity and emergentcategory specificity. Joumal of Experimental Psychoiogy: General,120: 339.357.

MeCLELLANDl.L., Me NAUGK1"ON B. " O'REILLY RC. (1993). Whywe have complementary lcamina systems in the hippocampus andneocortex: Insights from the successes and failures of connectionistmodels of learning and memory. Manuscript in preparation.

McCLELLAND 1.L., RUMELHAJlT D.E. and the PDP research group.(1986). Parallel distributed processing: Explorations in the micros.

tructure of cognition. Vol. II Cambridge, MA : MIT Press.Me CLOSKEY M. " COHEN NJ. (1989). Catastrophic interference in

connectionist networks: The sequential learning problem. IN:

Bower, (Ed. ), The Psychology of Learning and Motivation: Advancesin Research and Theory, 24. San Diego: Academic Press. '

MAe KiNNON D. " SQuIRE L.R (1989). Autobiographical memory inamnesia. Psychobiology, 17: 247-256.

PLAur D.C. " Me CLELLAND 1.L; (1993). Generalization with compo-nential attractors : Word and nonword reading in an attractor network.To appear In: Proceedings of the 15th Annual Conference fo theCognitive Science Society. Hillsdale, N1 : ErIbaum.

PLAur D.C., Me CLELLAND 1.L., " SEIDENBERG M.S. (1992). Readingwords and pseudowords: Are . two routes really necessary? Talkpresented at the Annual Mee/ing of the Psychonomic Society, SL Louis,

MO, November.RUMELHAJlT D.E. (1990). Brain style computation: Learning and genera-

lization. In: F. Zometzer, 1.L. Davis. and C. Lau (Eels.). Anintroduction to neural and electronic networks (pp. 408.420). Aca.demic Press: San Diego, CA.

RUMELHAJlT D.E., Me CLELLAND L. and the PDP research group.(1986). Parallel distributed processing: Explorations in the micros-tructure of cognition. Vol. 1. Cambridge. MA : MIT Press.

RUMELHAJlT D.E. &. TODD P.M. (1993). Learning and connectionistrepresentations. In: E. Meyer &. S. Kornblum. (Eds. ), Attentionand Perfonnance XIV: Synergies in experimental psychology. artifi-cial intelligence, and cognitive neuroscience (pp 3.30). Cambridge.MA : MIT Press.

SEJNOWSKl TJ. &. ROSENBERG C.R (1987). Parallel networks that learnto pronounce English taL Complex Systems, I, 145- 168.

SQUIRE L.R, IWST F. " SHiMAMlJRA AP. (1989). The neurology ofmemory: Quantitative assessment of retrograde amnesia in two groupsof amnesic patients. The Journal of Neuro:.science, 9 (3) ,828.839.

WARRINGTON E.K. " Me CAlmlY R (1983). Category specific accessdysphasia. Brain, 106: 859-878.

WARRINGTON E.K. &. Me CARmY R (1987). Categories of knowledge:Further ftactionation and an attempted integration. Brain, 110

1273-1296.WARRINGTON E.K. " SHALLICE T. (1984). Category specific semantic

impainnents. Brain, 107: 829-854.

common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil
common
Pencil