Psychological Review

Volume 88, Number 5, September 1981

An Interactive Activation Model of Context Effects in Letter Perception: Part 1. An Account of Basic Findings

James L. McClelland and David E. Rumelhart

University of California, San Diego

A model of context effects in perception is applied to the perception of letters in various contexts. In the model, perception results from excitatory and inhibitory interactions of detectors for visual features, letters, and words. A visual input excites detectors for visual features in the display. These excite detectors for letters consistent with the active features. The letter detectors in turn excite detectors for consistent words. Active word detectors mutually inhibit each other and send feedback to the letter level, strengthening activation and hence perceptibility of their constituent letters. Computer simulation of the model exhibits the perceptual advantage for letters in words over unrelated contexts and is consistent with the basic facts about the word advantage. Most importantly, the model produces facilitation for letters in pronounceable pseudowords as well as words. Pseudowords activate detectors for words that are consistent with most of the active letters, and feedback from the activated words strengthens the activations of the letters in the pseudoword. The model thus accounts for apparently rule-governed performance without any actual rules.

As we perceive, we are continually extracting sensory information to guide our attempts to determine what is before us. In addition, we bring to perception a wealth of knowledge about the objects we might see or hear and the larger units in which these objects co-occur. As one of us has argued for the case of reading (Rumelhart, 1977), our knowledge of the objects we might be perceiving works together with the sensory information in the perceptual process. Exactly how does the knowledge that we have interact with the input? And how does this interaction facilitate perception?

Preparation of this article was supported by National Science Foundation Grants BNS-76-14830 and BNS-79-24062 to J. L. McClelland and Grant BNS-76-15024 to D. E. Rumelhart, and by the Office of Naval Research under contract N00014-79-C-0323. We would like to thank Don Norman, James Johnston, and members of the LNR research group for helpful discussions of much of the material covered in this article.

Requests for reprints may be sent to James L. McClelland or David E. Rumelhart at Department of Psychology, C-009, University of California, San Diego, La Jolla, California 92093.

In this two-part article we have attempted to take a few steps toward answering these questions. We consider one specific example of the interaction of knowledge and perception: the perception of letters in words and other contexts. In Part 1 we examine the main findings in the literature on perception of letters in context and develop a model called the interactive activation model to account for these effects. In Part 2 (Rumelhart & McClelland, in press) we extend the model in several ways. We present a set of studies introducing a new technique for studying the perception of letters in context, independently varying the duration and timing of the context and target letters. We show how the model fares in accounting for the results of these experiments and discuss how the model may be extended to account for a variety of phenomena. We also present an experiment that tests, and supports, a counterintuitive prediction of the model. Finally, we consider how the mechanisms developed in the course of exploring our model of word perception might be extended to perception of other sorts of stimuli.

Copyright 1981 by the American Psychological Association, Inc. 0033-295X/81/8805-0375$00.75

Basic Findings on the Role of Context in Perception of Letters

The notion that knowledge and familiarity play a role in perception has often been supported by experiments on the perception of letters in words (Bruner, 1957; Neisser, 1967). It has been known for nearly 100 years that it is possible to identify letters in words more accurately than letters in random letter sequences under tachistoscopic presentation conditions (Cattell, 1886; see Huey, 1908, and Neisser, 1967, for reviews). However, until recently such effects were obtained using whole reports of all of the letters presented. These reports are subject to guessing biases, so that it was possible to imagine that familiarity did not determine how much was seen but only how much could be inferred from a fragmentary percept. In addition, for longer stimuli, full reports are subject to forgetting. We may see more letters than we can actually report in the case of nonwords, but when the letters form a word, we may be able to retain as a single unit the item whose spelling may simply be read out from long-term memory. Thus, despite strong arguments to the contrary by proponents of the view that familiar context really does influence perception, it has been possible until recently to imagine that the context in which a letter was presented influences only the accuracy of postperceptual processes and not the process of perception itself.

The perceptual advantage of letters in words. The seminal experiment of Reicher (1969) suggests that context does actually influence perceptual processing. Reicher presented target letters in words, unpronounceable nonwords, and alone, following the presentation of the target display with a presentation of a patterned mask. The subject was then tested on a single letter in the display, using a forced choice between two alternative letters. Both alternatives fit the context to form an item of the type presented, so that, for example, in the case of a word presentation, the alternative would also form a word in the context.

Forced-choice performance was more accurate for letters in words than for letters in nonwords or even for single letters. Since both alternatives made a word with the context, it is not possible to argue that the effect is due to postperceptual guessing based on equivalent information extracted about the target letter in the different conditions. It appears that subjects actually come away with more information relevant to a choice between the alternatives when the target letter is a part of a word. And, since one of the control conditions was a single letter, it is not reasonable to argue that the effect is due to forgetting letters that have been perceived. It is hard to see how a single letter, once perceived, could be subject to greater forgetting than a letter in a word.

Reicher's (1969) finding seems to suggest that perception of a letter can be facilitated by presenting it in the context of a word. It appears, then, that our knowledge about words can influence the process of perception. Our model presents a way of bringing such knowledge to bear. The basic idea is that the presentation of a string of letters begins the process of activating detectors for letters that are consistent with the visual input. As these activations grow stronger, they begin to activate detectors for words that are consistent with the letters, if there are any. The active word detectors then produce feedback, which reinforces the activations of the detectors for the letters in the word. Letters in words are more perceptible, because they receive more activation than representations of either single letters or letters in an unrelated context.

Reicher's basic finding has been investigated and extended in a large number of studies, and there now appears to be a set of important related findings that must also be explained.

Irrelevance of word shape. The effect seems to be independent of the familiarity of the word as a visual configuration. The word advantage over nonwords is obtained for words in lowercase type, words in uppercase type, or words in a mixture of upper- and lowercase (Adams, 1979; McClelland, 1976).

Role of patterned masking. The word advantage over single letters and nonwords appears to depend upon the visual masking conditions used (Johnston & McClelland, 1973; Massaro & Klitzke, 1979; see also Juola, Leavitt, & Choe, 1974; Taylor & Chabot, 1978). The word advantage is quite large when the target appears in a distinct, high-contrast display followed by a patterned mask of similar characteristics. However, the word advantage over single letters is actually reversed, and the word advantage over nonwords becomes quite small when the target is indistinct, low in contrast, and/or followed by a blank, nonpatterned field.

Extension to pronounceable pseudowords. The word advantage also applies to pronounceable nonwords, such as REET or MAVE. A large number of studies (e.g., Aderman & Smith, 1971; Baron & Thurston, 1973; Spoehr & Smith, 1975) have shown that letters in pronounceable nonwords (also called pseudowords) have a large advantage over letters in unpronounceable nonwords (also called unrelated letter strings), and three studies (Carr, Davidson, & Hawkins, 1978; Massaro & Klitzke, 1979; McClelland & Johnston, 1977) have obtained an advantage for letters in pseudowords over single letters.

Absence of effects of contextual constraint under patterned-mask conditions. One important finding, which rules out several of the models that have been proposed previously, is that letters in highly constraining word contexts have little or no advantage over letters in weakly constraining word contexts under the distinct-target/patterned-mask conditions that produce a large word advantage (Johnston, 1978; see also Estes, 1975). For example, if the set of possible stimuli contains only words, the context -HIP constrains the first letter to be either an S, a C, or a W; whereas the context -INK is compatible with 12 to 14 letters (the exact number depends on what counts as a word). We might expect that the former, more strongly constraining context would produce superior detection of a target letter. But in a very carefully controlled and executed study, Johnston (1978) found no such effect. Although constraints do influence performance under other conditions (e.g., Broadbent & Gregory, 1968), they do not appear to make a difference under the distinct-target/patterned-mask conditions of the Johnston study.

To be successful, any model of word perception must provide an account not only for Reicher's (1969) basic effect but for these related findings as well. Our model accounts for all of these effects. We begin by presenting the model in abstract form. We then focus on the specific version of the model implemented in our simulation program and consider some of the details. Subsequently, we turn to detailed considerations of the findings we have discussed in this section.

The Interactive Activation Model

We approach the phenomena of word perception with a number of basic assumptions that we want to incorporate into the model. First, we assume that perceptual processing takes place within a system in which there are several levels of processing, each concerned with forming a representation of the input at a different level of abstraction. For visual word perception, we assume that there is a visual feature level, a letter level, and a word level, as well as higher levels of processing that provide "top-down" input to the word level.

Second, we assume that visual perception involves parallel processing. There are two different senses in which we view perception as parallel. We assume that visual perception is spatially parallel. That is, we assume that information covering a region in space at least large enough to contain a four-letter word is processed simultaneously. In addition, we assume that visual processing occurs at several levels at the same time. Thus, our model of word perception is spatially parallel (i.e., capable of processing several letters of a word at one time) and involves processes that operate simultaneously at several different levels. For example, processing at the letter level presumably occurs simultaneously with processing at the word level and with processing at the feature level.

Third, we assume that perception is fundamentally an interactive process. That is, we assume that "top-down" or "conceptually driven" processing works simultaneously and in conjunction with "bottom-up" or "data driven" processing to provide a sort of multiplicity of constraints that jointly determine what we perceive. Thus, for example, we assume that knowledge about the words of the language interacts with the incoming featural information in codetermining the nature and time course of the perception of the letters in the word.

Figure 1. A sketch of some of the processing levels involved in visual and auditory word perception, with interconnections. (Figure labels: HIGHER LEVEL INPUT, VISUAL INPUT, ACOUSTIC INPUT.)

Finally, we wish to implement these assumptions by using a relatively simple method of interaction between sources of knowledge whose only "currency" is simple excitatory and inhibitory activations of a neural type.

Figure 1 shows the general conception of the model. Perceptual processing is assumed to occur in a set of interacting levels, each communicating with several others. Communication proceeds through a spreading activation mechanism in which activation at one level spreads to neighboring levels. The communication can consist of both excitatory and inhibitory messages. Excitatory messages increase the activation level of their recipients. Inhibitory messages decrease the activation level of their recipients. The arrows in the diagram represent excitatory connections, and the circular ends of the connections represent inhibitory connections. The intralevel inhibitory loop represents a kind of lateral inhibition in which incompatible units at the same level compete. For example, since a string of four letters can be interpreted as at most one four-letter word, the various possible words mutually inhibit one another and in that way compete as possible interpretations of the string.

It is clear that many levels are important in reading and perception in general, and the interactions among these levels are important for many phenomena. However, a theoretical analysis of all of these interactions introduces an order of complexity that obscures comprehension. For this reason, we have restricted the present analysis to an examination of the interaction between a single pair of levels, the word and letter levels. We have found that we can account for the phenomena reviewed above by considering only the interactions between letter level and word level elements. Therefore, for the present we have elaborated the model only on these two levels, as illustrated in Figure 2. We have delayed consideration of the effects of higher level processes and phonological processes, and we have ignored the reciprocity of activation that may occur between word and letter levels and any other levels of the system. We consider aspects of the fuller model including these influences in Part 2 (Rumelhart & McClelland, in press).

Specific Assumptions

Representation assumptions. For every relevant unit in the system we assume there is an entity called a node. We assume that there is a node for each word we know, and that there is a node for each letter in each letter position within a four-letter string.

The nodes are organized into levels. There are word level nodes and letter level nodes. Each node has connections to a number of other nodes. The nodes to which a node connects are called its neighbors. Each connection is two-way. There are two kinds of connections: excitatory and inhibitory. If two nodes suggest each other's existence (in the way that the node for the word the suggests the node for an initial t and vice versa), then the connections are excitatory. If two nodes are inconsistent with one another (in the way that the node for the word the and the node for the word boy are inconsistent), then the relationship is inhibitory. Note that we identify nodes according to the units they detect, printing them in italics; stimuli presented to the system are in uppercase letters.

Figure 2. The simplified processing system. (Figure labels: LETTER LEVEL, VISUAL INPUT.)

Connections may occur within levels or between adjacent levels. There are no connections between nonadjacent levels. Connections within the word level are mutually inhibitory, since only one word can occur at any one place at any one time. Connections between the word level and letter level may be either inhibitory or excitatory (depending on whether the letter is a part of the word in the appropriate letter position). We call the set of nodes with excitatory connections to a given node its excitatory neighbors and the set of nodes with inhibitory connections to a given node its inhibitory neighbors.
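The connection scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-word lexicon, the function names, and the dictionary layout are hypothetical and are not taken from the simulation program. A letter node is identified by its letter and position; its excitatory word neighbors are the words that contain that letter in that position, and all other words are inhibitory neighbors.

```python
from itertools import product

# Hypothetical toy lexicon, chosen only for illustration.
LEXICON = ["work", "word", "weak", "wear"]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_word_connections(lexicon):
    """Map each (letter, position) node to its word-level neighbors."""
    connections = {}
    for position, letter in product(range(4), ALPHABET):
        # A word is an excitatory neighbor if it has this letter in this position.
        excitatory = {w for w in lexicon if w[position] == letter}
        connections[(letter, position)] = {
            "excitatory": excitatory,
            "inhibitory": set(lexicon) - excitatory,
        }
    return connections


def word_word_inhibition(lexicon):
    """Within the word level, every word inhibits every other word."""
    return {w: set(lexicon) - {w} for w in lexicon}


conns = letter_word_connections(LEXICON)
print(sorted(conns[("k", 3)]["excitatory"]))  # words ending in k
print(sorted(conns[("k", 3)]["inhibitory"]))  # words not ending in k
```

Because each connection is two-way, the same table can be read in the other direction to find the letter-level neighbors of a word.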

A subset of the neighbors of the letter t is illustrated in Figure 3. Again, excitatory connections are represented by the arrows ending with points, and inhibitory connections are represented by the arrows ending with dots. We emphasize that this is a small subset of the neighborhood of the initial t. The picture of the whole neighborhood, including all the connections among neighbors and their connections to their neighbors, is much too complicated to present in a two-dimensional figure.

Activation assumptions. There is associated with each node a momentary activation value. This value is a real number, and for node $i$ we will represent it by $a_i(t)$. Any node with a positive activation value is said to be active. In the absence of inputs from its neighbors, all nodes are assumed to decay back to an inactive state, that is, to an activation value at or below zero. This resting level may differ from node to node and corresponds to a kind of a priori bias (Broadbent, 1967) determined by frequency of activation of the node over the long term. Thus, for example, the nodes for high-frequency words have resting levels higher than those for low-frequency words. In any case, the resting level for node $i$ is represented by $r_i$. For units not at rest, decay back to the resting level occurs at some rate $\theta_i$.
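As a worked illustration of decay, write the resting level of node $i$ as $r_i$ and its decay rate as $\theta_i$, and assume (our reading, not a statement from the text above) that in discrete time the decay on each step is proportional to the node's distance from rest. With no input from neighbors, the distance from rest then shrinks geometrically:

$$a_i(t + \Delta t) - r_i = (1 - \theta_i)\,\bigl(a_i(t) - r_i\bigr) \quad\Longrightarrow\quad a_i(t + k\,\Delta t) = r_i + (1 - \theta_i)^k\,\bigl(a_i(t) - r_i\bigr).$$

For a purely hypothetical rate of $\theta_i = 0.1$, a node would retain $(0.9)^{10} \approx 0.35$ of its initial distance from rest after 10 steps.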

When the neighbors of a node are active, they influence the activation of the node by either excitation or inhibition, depending on their relation to the node. These excitatory and inhibitory influences combine by a simple weighted average to yield a net input to the unit, which may be either excitatory (greater than zero) or inhibitory. In mathematical notation, if we let $n_i(t)$ represent the net input to the unit, we can write the equation for its value as

$$n_i(t) = \sum_j \alpha_{ij}\, e_j(t) - \sum_k \gamma_{ik}\, i_k(t), \tag{1}$$

where each $e_j(t)$ is the activation of an active excitatory neighbor of the node, each $i_k(t)$ is the activation of an active inhibitory neighbor of the node, and $\alpha_{ij}$ and $\gamma_{ik}$ are associated weight constants. Inactive nodes have no influence on their neighbors. Only nodes in an active state have any effects, either excitatory or inhibitory.
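The net input computation can be sketched as follows. The function name, the representation of neighbors as (weight, activation) pairs, and the numeric weights are assumptions made for illustration only. The sketch encodes the rule that only active neighbors (activation above zero) contribute.

```python
def net_input(excitatory, inhibitory):
    """Weighted excitation minus weighted inhibition, as in Equation 1.

    Each argument is a list of (weight, activation) pairs for the
    node's neighbors. Neighbors with activation <= 0 are inactive
    and are skipped.
    """
    excite = sum(w * a for w, a in excitatory if a > 0)
    inhibit = sum(w * a for w, a in inhibitory if a > 0)
    return excite - inhibit


# Hypothetical weights and activations: one active and one inactive
# excitatory neighbor, and one active inhibitory neighbor.
n = net_input(
    excitatory=[(0.3, 0.5), (0.3, -0.1)],
    inhibitory=[(0.2, 0.4)],
)
print(n)
```

Here the inactive excitatory neighbor (activation -0.1) contributes nothing, so the net input is 0.3 × 0.5 − 0.2 × 0.4.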

The net input to a node drives the activation of the node up or down, depending on whether it is positive or negative. The degree of the effect of the input on the node is modulated by the node's current activity level to keep the input to the node from driving it beyond some maximum and minimum values (Grossberg, 1978). When the net input is excitatory, $n_i(t) > 0$, the effect on the node, $\epsilon_i(t)$, is given by

$$\epsilon_i(t) = n_i(t)\,\bigl(M - a_i(t)\bigr), \tag{2}$$

where $M$ is the maximum activation level of the unit. The modulation has the desired effect, because as the activation of the unit approaches the maximum, the effect of the input is reduced to zero. $M$ can be thought of as a basic scale factor of the model, and we have set its value to 1.0.

Figure 3. A few of the neighbors of the node for the letter T in the first position in a word, and their interconnections.

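In the case where the input is inhibitory, $n_i(t) < 0$, the effect of the input on the node, $\epsilon_i(t)$, is given by

$$\epsilon_i(t) = n_i(t)\,\bigl(a_i(t) - m\bigr), \tag{3}$$

where $m$ is the minimum activation of the unit.

The new value of the activation of a node at time $t + \Delta t$ is equal to the value at time $t$, minus the decay, plus the influence of its neighbors at time $t$:

$$a_i(t + \Delta t) = a_i(t) - \theta_i\,\bigl(a_i(t) - r_i\bigr) + \epsilon_i(t). \tag{4}$$

Here $\theta_i\,(a_i(t) - r_i)$ is the decay toward the resting level $r_i$ at rate $\theta_i$.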
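A minimal sketch of one update step follows. It assumes that the effect of the input is the net input scaled by the distance to the maximum $M$ (excitatory case) or to the minimum $m$ (inhibitory case), and that decay is proportional to the distance from rest. The function names are hypothetical, the value `m=-0.2` is an assumption suggested by the near-minimum level of about -.20 mentioned later in the text, and the decay rate and net input in the usage lines are arbitrary.

```python
def input_effect(n, a, M=1.0, m=-0.2):
    """Scale the net input n by the room left before the bound is reached."""
    if n > 0:
        return n * (M - a)  # excitatory input shrinks to zero as a nears M
    return n * (a - m)      # inhibitory input shrinks to zero as a nears m


def update_activation(a, n, rest, decay, M=1.0, m=-0.2):
    """New activation = old value, minus decay toward rest, plus input effect."""
    return a - decay * (a - rest) + input_effect(n, a, M, m)


# Hypothetical run: constant excitatory net input of 0.2, decay rate 0.1.
a = 0.0
for _ in range(50):
    a = update_activation(a, n=0.2, rest=0.0, decay=0.1)
print(round(a, 4))
```

With a constant input the activation settles where decay and input effect balance; for these illustrative values that balance point is 0.2 / (0.1 + 0.2), well inside the bounds.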

Input assumptions. Upon presentation of a stimulus, a set of featural inputs is made available to the system. Each feature in the display will be detected with some probability p. For simplicity it is assumed that feature detection occurs, if it is to occur at all, immediately after onset of the stimulus. The probability that any given feature will be detected is assumed to vary with the visual quality of the display. Features that are detected begin sending activation to all letter nodes that contain that feature. All letter level nodes that do not contain the extracted feature are inhibited.

It is assumed that features are binary and that we can extract either the presence or absence of a particular feature. So, for example, when viewing the letter R we can extract, among other features, the presence of a diagonal line segment in the lower right corner and the absence of a horizontal line across the bottom. In this way the model honors the conceptual distinction between knowing that a feature is absent and not knowing whether a feature is present.

Presentation of a new display following an old one results in the probabilistic extraction of the set of features present in the new display. These features, when extracted, replace the old ones in corresponding positions. Thus, the presentation of an E following the R described above would result in the replacement of the two features described above with their opposites.
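The input assumptions can be sketched as below. The feature names, the two-feature descriptions of R and E, and the function names are hypothetical simplifications; the actual font uses a larger feature set. A detected feature is stored as `True` (present) or `False` (absent), and an undetected feature is simply missing from the dictionary, which keeps "known to be absent" distinct from "not known".

```python
import random

# Hypothetical two-feature descriptions; True = present, False = absent.
LETTER_R = {"diagonal_lower_right": True, "horizontal_bottom": False}
LETTER_E = {"diagonal_lower_right": False, "horizontal_bottom": True}


def extract_features(display, p, rng):
    """Detect each feature of the display independently with probability p."""
    return {name: value for name, value in display.items() if rng.random() < p}


def present_display(known, display, p, rng):
    """Features extracted from a new display replace the old ones in place."""
    updated = dict(known)
    updated.update(extract_features(display, p, rng))
    return updated


rng = random.Random(0)
after_r = present_display({}, LETTER_R, p=1.0, rng=rng)       # all features of R detected
after_e = present_display(after_r, LETTER_E, p=1.0, rng=rng)  # E replaces both features
unchanged = present_display(after_r, LETTER_E, p=0.0, rng=rng)  # nothing detected
print(after_r, after_e, unchanged)
```

With intermediate values of p, some features of the new display would replace their predecessors while others would keep the old values.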

On making responses. One of the more problematic aspects of a model such as this one is a specification of how these relatively complex patterns of activity might be related to the content of percepts and the sorts of response probabilities we observe in experiments. We assume that responses and perhaps the contents of perceptual experience depend on the temporal integration of the pattern of activation over all of the nodes. The integration process is assumed to occur slowly enough that brief activations may come and go without necessarily becoming accessible for purposes of responding or entering perceptual experience. However, as the activation lasts longer and longer, the probability that it will be reportable increases. Specifically, we think of the integration process as taking a running average of the activation of the node over previous time:

$$\bar{a}_i(t) = \int_{-\infty}^{t} a_i(x)\, e^{-(t - x)\,r}\, dx. \tag{5}$$

In this equation, the variable $x$ represents preceding time, varying between $-\infty$ and time $t$. The exponential portion of the expression weights the contribution of the activation of the node in previous time intervals: Essentially, its effect is to reduce the contribution of prior activations as they recede further back in time. The parameter $r$ represents the relative weighting given to old and new information and determines how quickly the output values change in response to changes in the activations of the underlying nodes. The larger the value of $r$, the more quickly the output values change. Response strength, in the sense of Luce's choice model (Luce, 1959), is an exponential function of the running average activation $\bar{a}_i(t)$:

$$s_i(t) = e^{\mu\, \bar{a}_i(t)}. \tag{6}$$

The parameter $\mu$ determines how rapidly response strength grows with increases in activation. Following Luce's formulation, we assume that the probability of making a response based on node $i$ is given by

$$p(R_i, t) = \frac{s_i(t)}{\sum_{j \in L} s_j(t)}, \tag{7}$$

where $L$ represents the set of nodes competing at the same level with node $i$.
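The readout rule can be sketched in discrete time as follows. The sketch assumes a simple one-step approximation of the exponentially weighted integral (each tick, the old average is discounted by `exp(-r * dt)` and the new activation is added), an exponential response strength, and Luce's ratio rule over the competing nodes. The function names, the activation histories, and the values of `r` and `mu` are hypothetical.

```python
import math


def running_average(history, r, dt=1.0):
    """Discrete approximation of the exponentially weighted running average."""
    avg = 0.0
    for a in history:
        # Older contributions are discounted; the newest activation is added.
        avg = avg * math.exp(-r * dt) + a * dt
    return avg


def response_strength(avg, mu):
    """Response strength grows exponentially with the running average."""
    return math.exp(mu * avg)


def response_probabilities(histories, r, mu):
    """Luce choice rule over all nodes competing at the same level."""
    strengths = {
        name: response_strength(running_average(h, r), mu)
        for name, h in histories.items()
    }
    total = sum(strengths.values())
    return {name: s / total for name, s in strengths.items()}


# Hypothetical activation histories for three letter nodes in one position.
probs = response_probabilities(
    {"k": [0.1, 0.3, 0.5], "r": [0.1, 0.2, 0.2], "d": [-0.1, -0.2, -0.2]},
    r=0.5,
    mu=2.0,
)
print(probs)
```

Larger values of `mu` make the choice more sharply favor the node with the highest running average; larger values of `r` make the average track recent activation more closely.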


Most of the experiments we will be considering test subjects' performance on one of the letters in a word or other type of display. In accounting for these results, we have adopted the assumption that responding is always based on the output of the letter level, rather than the output of the word level or some combination of the two. The forced choice is assumed to be based only on this letter-level information. The subject compares the letter selected for the appropriate position against the forced-choice alternatives. If the letter selected is one of the alternatives, then that alternative is chosen in the forced choice. If it is not one of the alternatives, then the model assumes that one of the alternatives would simply be chosen at random.
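The forced-choice rule described above can be sketched as follows. The function names are hypothetical, and the closed-form accuracy assumes exactly two alternatives with an unbiased random guess between them.

```python
import random


def forced_choice(selected, alternatives, rng):
    """Choose the selected letter if it is an alternative, else guess."""
    if selected in alternatives:
        return selected
    return rng.choice(alternatives)


def expected_accuracy(p_correct, p_other):
    """Probability of a correct two-alternative forced choice.

    p_correct: probability that the correct letter is read out.
    p_other:   probability that the incorrect alternative is read out.
    Any other readout leads to a guess that is right half the time.
    """
    p_neither = 1.0 - p_correct - p_other
    return p_correct + 0.5 * p_neither


rng = random.Random(0)
print(forced_choice("k", ["k", "d"], rng))  # readout matches an alternative
print(expected_accuracy(0.6, 0.1))          # hypothetical readout probabilities
```

Under these assumptions, performance is above chance only to the extent that the correct letter is read out more often than the incorrect alternative.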

One somewhat problematical issue involves deciding when to read out the results of processing and select response letters for each letter position. When a target display is simply turned on and left on until the subject responds, and when there is no pressure to respond quickly, we assume that the subject simply waits until the output strengths have reached their asymptotic values. However, when a target display is presented briefly followed by a patterned mask, the activations produced by the target are transient, as we shall see. Under these conditions, we assume that the subject learns through experience in the practice phase of the experiment to read out the results of processing at a time that allows the subject to optimize performance. For simplicity, we have assumed that readout occurs in parallel for all four letter positions.

The Operation of the Model

Now, consider what happens when an input reaches the system. Assume that at time $t_0$ all prior inputs have had an opportunity to decay, so that the entire system is in its quiescent state, and each node is at its resting level. The presentation of a stimulus initiates a process in which certain features are extracted and excitatory and inhibitory pressures begin to act upon the letter-level nodes. The activation levels of certain letter nodes are pushed above their resting levels. Others receive predominantly inhibitory inputs and are pushed below their resting levels. These letter nodes, in turn, begin to send activation to those word-level nodes they are consistent with and inhibit those word nodes they are not consistent with. In addition, within a given letter position channel, the various letter nodes attempt to suppress each other, with the strongest ones getting the upper hand. As word-level nodes become active, they in turn compete with one another and send feedback down to the letter-level nodes. If the input features were close to those for one particular set of letters and those letters were consistent with those forming a particular word, the positive feedback in the system will work to rapidly converge on the appropriate set of letters and the appropriate word. If not, they will compete with each other, and perhaps no single set of letters or single word will get enough activation to dominate the others. In this case the various active units might strangle each other through mutual inhibition.

At any point during processing, the results of perceptual processing may be read out from the pattern of activations at the letter level into a buffer, where they may be kept through rehearsal or used as the basis for overt reports. The accuracy of this process depends on a running average of the activations of the correct node and of other competing nodes.

Simulations

Although the model is in essence quite simple, the interactions among the various nodes can become complex, so that the model is not susceptible to a simple intuitive or even mathematical analysis. Instead, we have relied on computer simulations to study the behavior of the model and to see if it is consistent with the empirical data. A description of the actual computer program is given in the Appendix.

For purposes of these simulations, we have made a number of simplifying assumptions. These additional assumptions fall into three classes: (a) discrete rather than continuous time, (b) simplified feature analysis of the input font, and (c) a limited lexicon.

The simulation operates in discrete time slices, or ticks, updating the activations of all of the nodes in the system once each cycle on the basis of the values on the previous cycle. Obviously, this is simply a matter of computational convenience and not a fundamental assumption. We have endeavored to keep the time slices "thin" enough so that the model's behavior is continuous for all intents and purposes.
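Updating every node "on the basis of the values on the previous cycle" amounts to a synchronous update: all new values are computed from a snapshot of the old ones before any is overwritten. The sketch below illustrates only this bookkeeping; the function names and the two-node example are hypothetical, and the step function is a deliberately artificial stand-in for the model's actual update rule.

```python
def tick(activations, step):
    """Advance one cycle; every new value is computed from the old snapshot."""
    previous = dict(activations)  # freeze the values of the previous cycle
    return {name: step(name, previous) for name in previous}


def swap_step(name, previous):
    """Artificial rule: each node takes on the other node's previous value."""
    other = "b" if name == "a" else "a"
    return previous[other]


state = tick({"a": 1.0, "b": 0.0}, swap_step)
print(state)
```

If the nodes were instead updated one at a time in place, the second node would see the first node's new value rather than its value on the previous cycle, and the result would depend on the order of updating.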

Any simulation of the model involves making explicit assumptions about the appropriate featural analysis of the input font. We have, for simplicity, chosen the font and featural analysis employed by Rumelhart (1970) and by Rumelhart and Siple (1974), illustrated in Figure 4. Although the experiments we have simulated employed different type fonts, we assume that the basic results do not depend on the particular font used. The simplicity of the present analysis recommends it for the simulations, though it obviously skirts several fundamental issues about the lower levels of processing.

Finally, our simulations have been restricted to four-letter words. We have equipped our program with knowledge of 1,179 four-letter words occurring at least two times per million in the Kucera and Francis (1967) word count. Plurals, inflected forms, first names, proper names, acronyms, abbreviations, and occasional unfamiliar entries arising from apparent sampling flukes have been excluded. This sample appears to be sufficient to reflect the essential characteristics of the language and to show how the statistical properties of the language can affect the process of perceiving letters in words.

Figure 4. The features used to construct the letters in the font assumed by the simulation program, and the letters themselves. (From "Process of Recognizing Tachistoscopically Presented Words" by David E. Rumelhart and Patricia Siple, Psychological Review, 1974, 81, 99-118. Copyright 1974 by the American Psychological Association. Reprinted by permission.)

Figure 5. A hypothetical set of features that might be extracted on a trial in an experiment on word perception.

An Example

Let us now consider a sample run of our simulation model. The parameter values employed in the example are those used to simulate all the experiments discussed in the remainder of Part 1. These values are described in detail in the following section. For the purposes of this example, imagine that the word WORK has been presented to the subject and that the subject has extracted those features shown in Figure 5. In the first three letter positions, the features of the letters W, O, and R have been completely extracted. In the final position a set of features consistent with the letters K and R has been extracted, with the features that would disambiguate the letter unavailable. We wish now to chart the activity of the system resulting from this presentation. Figure 6 shows the time course of the activations for selected nodes at the word and letter levels, respectively.

At the word level, we have charted the activity levels of the nodes for the words work, word, wear, and weak. Note first that work is the only word in the lexicon consistent with all the presented information. As a result, its activation level is the highest and reaches a value of .8 through the first 40 time cycles. The word word is consistent with the bulk of the information presented and therefore first rises and later is pushed back down below its resting level, as a result of competition with work. The words wear and weak are consistent with the only letter active in the first letter position, and one of the two active in the fourth letter position. They are also inconsistent with the letters active in Positions 2 and 3. Thus, the activation they receive from the letter level is quite weak, and they are easily driven down well below zero, as a result of competition from the other word units. The activations of these units do not drop quite as low, of course, as the activation level of words such as gill, which contain nothing in common with the presented information. Although not shown in Figure 6, these words attain near-minimum activation levels of about -.20 and stay there as the stimulus stays on. Returning to wear and weak, we note that these words are equally consistent with the presented information and thus drop together for about the first 9 time units. At this point, however, the word work has clearly taken the upper hand at the word level, and produces feedback that reinforces the activation of the final k and not the final r. As a result, the word weak receives more activation from the letter level than the word wear and begins to gain a slight advantage over wear. The strengthened k continues to feed activation into the word level and strengthen consistent words. The words that contain an R continue to receive activation from the r node also, but they receive stronger inhibition from the words consistent with a K and are therefore ultimately weakened, as illustrated in the lower panel of Figure 6.

Figure 6. The time course of activations of selected nodes at the word and letter levels after extraction of the features shown in Figure 5. (Panels: word activations; letter activations.)

The strong feature-letter inhibition ensures that when a feature inconsistent with a particular letter is detected, that letter will receive relatively strong net bottom-up inhibition. Thus in our example, the information extracted clearly disconfirms the possibility that the letter D has been presented in the fourth position, and thus the activation level of the d node decreases quickly to near its minimum value. However, the bottom-up information from the feature level supports either a K or an R in the fourth position. Thus, the activation of each of these nodes rises slowly. These activations, along with those for W, O, and R, push the activation of work above zero, and it begins to feed back; by about Time Cycle 4, it is beginning to push the k above the r (because WORR is not a word). Note that this separation occurs just before the words weak and wear separate. It is the strengthening of k due to feedback from work that causes them to separate.

Ultimately, the r reaches a level well below that of k where it remains, and the k pushes toward a .8 activation level. As discussed below, the word-to-letter inhibition and the letter-to-letter inhibition have both been set to 0. Thus, k and r both co-exist at moderately high levels, the r fed only from the bottom up, and the k fed from both bottom up and top down.

Finally, consider the output values for the letter nodes r, k, and d. Figure 7 shows the output values for the simulation. The output value is the probability that if a response was selected at time t, the letter in question would be selected as the output or response from the system. As intended, these output values grow somewhat more slowly than the values of the letter activations themselves but eventually, as they reach and hold their asymptotic values, come to reflect the activations of the letter nodes. Since in the absence of masking subjects can afford to wait to read out a response until the output values have had a chance to stabilize, they would be highly likely to choose the letter K as the response.

[Figure 7: letter output values plotted against TIME (axis ticks 0 to 30).]

Figure 7. Output values for the letters r, k, and d after presentation of the display shown in Figure 5.

Although this example is not very general in that we assumed that only partial information was available in the input for the fourth letter position, whereas full information was available at the other letter positions, it does illustrate many of the important characteristics of the model. It shows how ambiguous sensory information can be disambiguated by top-down processes. Here we have a very simple mechanism capable of applying knowledge of words in the perception of their component letters.

Parameter Selection

Once the basic simulation model was constructed, we began a lengthy process of attempting to simulate the results of several representative experiments in the literature. Only two parameters of the model were allowed to vary from experiment to experiment: (a) the probability of feature extraction and (b) the timing of the presentation of the masking stimulus if one was used.

The probability of feature extraction is assumed to depend on the visual characteristics of the display. In most of the experiments we will consider, a bright, high-contrast target was used. Such a target would produce perfect performance if not followed by a patterned mask. In these cases probability of feature extraction was fixed at 1.0 and the timing of the target offset and coincident mask onset typically was adjusted to achieve 75% correct performance over the different experimental conditions of interest. In simulating the results of these experiments, we likewise varied the timing of the target offset/mask onset to achieve the right average correct performance from the model.

In some experiments no patterned mask was used, and performance was kept below perfect levels by using a dim or otherwise degraded target display. In these cases the probability of feature extraction was set to a value less than 1.0, which produces about the right overall performance level.

The process of exploring the behavior of the model amounted to an extended search for a set of values for all the other parameters that would permit the model to simulate, as closely as possible, the results of all of the experiments to be discussed later in Part 1, as well as those to be considered in Part 2 (Rumelhart & McClelland, in press). To constrain the search, we adopted various restrictive simplifications. First, we assumed that all nodes have the same maximum activation value. In fact, the maximum was set to 1.0, and served to scale all activations within the model. The minimum activation value for all nodes was set at -.20, a value that permits rapid reactivation of strongly inhibited nodes. The decay rate of all nodes was set to the value of .07. This parameter effectively serves as a scale factor that determines how quickly things are allowed to change in a single time slice. The .07 value was picked after some exploration, since it seemed to permit us to run our simulations with the minimum number of time slices per trial, at the same time as it minimized a kind of reverberatory oscillation that sets in when things are allowed to change too much on any given time cycle. We also assigned the resting value of zero to all of the letter nodes. The resting value of nodes at the word level was set to a value between -.05 and 0, depending on word frequency.
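The role of the maximum, minimum, decay, and resting values can be illustrated with a minimal sketch of a single node updated once per time slice. The form of the update rule used here (decay toward the resting level, plus a net input scaled by the distance to the ceiling or floor) is an assumption made for illustration; this section gives the parameter values but not the rule itself. The function names are hypothetical.

```python
# Illustrative sketch only: the update rule's form is assumed, not quoted
# from this section. Parameter values are the ones given in the text.
MAX_ACT = 1.0     # maximum activation for all nodes
MIN_ACT = -0.20   # minimum activation for all nodes
DECAY = 0.07      # decay rate for all nodes


def update_activation(a, net_input, resting=0.0):
    """Return a node's activation after one time slice."""
    if net_input > 0:
        # Excitation is scaled by the remaining distance to the ceiling.
        effect = net_input * (MAX_ACT - a)
    else:
        # Inhibition is scaled by the remaining distance to the floor.
        effect = net_input * (a - MIN_ACT)
    new_a = a - DECAY * (a - resting) + effect
    # Clamp as a safeguard against overshoot with large inputs.
    return min(MAX_ACT, max(MIN_ACT, new_a))


def run(net_input, cycles, a=0.0, resting=0.0):
    """Apply a constant net input for a number of time slices."""
    for _ in range(cycles):
        a = update_activation(a, net_input, resting)
    return a
```

Under these assumptions, a node with resting level 0 and a constant positive net input n settles at n / (n + .07), so the decay rate sets both the speed of change and the asymptote for a given input.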

We have assumed that the weight parameters, $\alpha_{ij}$ and $\gamma_{ij}$, depend only on the processing levels of nodes i and j and on no other characteristics of their identity. This means, among other things, that the excitatory connections between all letter nodes and all of the relevant word nodes are equally strong, independent of the identity of the words. Thus, for example, the degree to which the node for an initial t excites the node for the word tock is exactly the same as the degree to which it excites the node for the word this, in spite of a substantial difference in frequency of usage. To further simplify matters, the word-to-letter inhibition was also set to zero. This means that feedback from the word level can strengthen activations at the letter level but cannot weaken them.

The output from the detector network has essentially two parameters. The value .05 was used for the parameter r, which determines how quickly the output values change in response to changes in the activations of the underlying nodes. This value is small enough that the output values change relatively slowly, so that transient activations can come and go without much effect on the output. The value 10 was given to the parameter n in Equation 6 above. The parameter is essentially a scale factor relating activations in the model to response strengths in the Luce formulation.
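The two output parameters can be illustrated with a rough sketch. It assumes that the slowly changing output is a normalized exponential running average of activation with rate .05, and that response strength is an exponential function of that average with scale factor 10, converted to a probability by the Luce ratio. Both forms are assumptions made here for illustration (the defining equations lie outside this section), and the function names are hypothetical.

```python
import math

RATE = 0.05    # parameter r: how quickly output follows activation
SCALE = 10.0   # scale factor relating activation to response strength


def running_average(activations, rate=RATE):
    """Exponentially weighted running average of an activation trace.

    Assumption: a normalized moving average, so that a constant
    activation is eventually reproduced by the average.
    """
    avg = 0.0
    trace = []
    for a in activations:
        avg += rate * (a - avg)
        trace.append(avg)
    return trace


def response_probabilities(averaged, scale=SCALE):
    """Luce-ratio choice probabilities from averaged activations.

    `averaged` maps each candidate letter in one position to its
    running-average activation.
    """
    strengths = {k: math.exp(scale * v) for k, v in averaged.items()}
    total = sum(strengths.values())
    return {k: s / total for k, s in strengths.items()}
```

With a small rate, a brief burst of activation barely moves the average, which is the sluggishness the text describes; a sustained activation is eventually reflected in full.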

The values of the remaining parameters were fixed at the values given in Table 1. It is worth noting the differences between the feature-letter influences and the letter-word influences. The feature-letter inhibition is 30 times as strong as the feature-letter excitation. This means that all of the features detected must be compatible with a particular letter before that letter will receive net excitation (since there are only 14 possible features, there can only be a maximum of 13 excitatory inputs whenever there is a single inhibitory input). The main reason for choosing this value was to permit the presentation of a mask to clear the previous pattern of activation. On the other hand, the letter-word inhibition is actually somewhat less than the letter-word excitation. When only one letter is active in each letter position, this means that the letter level will produce net excitation of all words that share two or more letters with the target word. Because of these multiple activations, strong word-word inhibition is necessary to "sharpen" the response of the word level, as we will see. In contrast, no such inhibition is necessary at the letter level. For these reasons, the letter-letter inhibition has been set to 0, whereas the word-word inhibition has been set to .21.

Table 1
Parameter Values Used in the Simulations

Parameter                    Value
Feature-letter excitation    .005
Feature-letter inhibition    .15
Letter-word excitation       .07
Letter-word inhibition       .04
Word-word inhibition         .21
Letter-letter inhibition     0
Word-letter excitation       .30
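The two claims about net input can be checked with simple arithmetic on the Table 1 values. The sketch below sums raw connection weights, assuming each sending node contributes one unit of input; it ignores the activation levels of the sending nodes, so it shows only the sign and relative size of the net input, not the model's actual dynamics. The function names are hypothetical.

```python
FEATURE_LETTER_EXC = 0.005
FEATURE_LETTER_INH = 0.15
LETTER_WORD_EXC = 0.07
LETTER_WORD_INH = 0.04


def letter_net_input(n_consistent, n_inconsistent):
    """Summed feature-to-letter weights for one letter node."""
    return (n_consistent * FEATURE_LETTER_EXC
            - n_inconsistent * FEATURE_LETTER_INH)


def word_net_input(n_shared, word_length=4):
    """Summed letter-to-word weights for a word sharing `n_shared`
    letters with the display, one active letter per position."""
    n_mismatched = word_length - n_shared
    return n_shared * LETTER_WORD_EXC - n_mismatched * LETTER_WORD_INH


if __name__ == "__main__":
    # One inconsistent feature outweighs the 13 remaining consistent ones.
    print(round(letter_net_input(13, 1), 3))
    # Net letter-to-word input for 4, 3, 2, 1, and 0 shared letters.
    print([round(word_net_input(k), 2) for k in (4, 3, 2, 1, 0)])
```

Thirteen excitatory inputs contribute 13 x .005 = .065, less than the .15 from a single inhibitory input. For four-letter words the summed letter-to-word input is positive for two or more shared letters (.06, .17, .28) and negative for one or none.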

Comments on Related Formulations

Before turning to the application of the model to the experimental literature, some comments on the relationship of this model to other models extant in the literature are in order. We have tried to be synthetic. We have taken ideas from our own previous work and from the work of others in the literature. In what follows, we attempt to identify the sources of most of the assumptions of the model and to show in what ways our model differs from the models we have drawn on.

First of all, we have adopted the approach of formulating the model in terms similar to the way in which such a process might actually be carried out in a neural or neural-like system. We do not mean to imply that the nodes in our system are necessarily related to the behavior of individual neurons. We will, however, argue that we have kept the kinds of processing involved well within the bounds of capability for simple neural circuits. The approach of modeling information processing in a neural-like system has recently been advocated by Szentagothai and Arbib (1975) and is represented in many of the articles presented in the volume by Hinton and Anderson (1981) as well as many of the specific models mentioned below.

One case in point is the work of Levin (1976). He proposed a parallel computational system capable of interactive processing that employs only excitation and inhibition as its currency. Although our model could not be implemented exactly in the format of his system (called Proteus), it is clearly in the spirit of his model and could readily be implemented within a variant of the Proteus system.

In a recent article McClelland (1979) has proposed a cascade model of perceptual processing in which activations on each level of the system drive those at the next higher level. This model has the properties that partial outputs are continuously available for processing and that every level of the system processes the input simultaneously. The present model certainly adopts these assumptions. It also generalizes them, permitting information to flow in both directions simultaneously.

Hinton (Note 1) has developed a relaxation model for visual perception in which multiple constraints interact by means of incrementing and decrementing real numbered strengths associated with various interpretations of a portion of the visual scene in an attempt to attain a maximally consistent interpretation of the scene. Our model can be considered a relaxation system in which activation levels are manipulated to get an optimal interpretation of an input word.

James Anderson and his colleagues (Anderson, 1977; Anderson, Silverstein, Ritz, & Jones, 1977) and Kohonen and his colleagues (Kohonen, 1977) have developed a pattern recognition system which they call an associative memory system. Their system shares a number of commonalities with ours. One feature the models share is the scheme of adding and subtracting weighted excitation values to generate output patterns that represent cleaned-up versions of the input patterns. In particular, our $\alpha_{ij}$ and $\gamma_{ij}$ correspond to the matrix elements of the associative memory models. Our model differs in that it has multiple levels and employs a nonlinear cumulation function similar to one suggested by Grossberg (1978), as mentioned above.

Our model also draws on earlier work in the area of word perception. There is, of course, a strong similarity between this model and the logogen model of Morton (1969). What we have implemented might be called a hierarchical, nonlinear, logogen model with feedback between levels and inhibitory interactions among logogens at the same level. We have also added dynamic assumptions that are lacking from the logogen model.

The notion that word perception takes place in a hierarchical information-processing system has, of course, been advocated by several researchers interested in word perception (Adams, 1979; Estes, 1975; Johnston & McClelland, 1980; LaBerge & Samuels, 1974; McClelland, 1976). Our model differs from those proposed in many of these papers in that processing at different levels is explicitly assumed to take place in parallel. Many of the models are not terribly explicit on this topic, although the notion that partial information could be passed along from one level to the next so that processing could go on at the higher level while it was continuing at the lower level had been suggested by McClelland (1976). Our model also differs from all of these others, except that of Adams (1979), in assuming that there is feedback from the word level to the letter level. The general formulation suggested by Adams (1979) is quite similar to our own, although she postulates a different sort of mechanism for handling pseudowords (excitatory connections among letter nodes) and does not present a detailed account.

Our mechanism for accounting for the perceptual facilitation of pseudowords involves, as we will see below, the integration of feedback from partial activation of a number of different words. The idea that pseudoword perception could be accounted for in this way was inspired by Glushko (1979), who suggested that partial activation and synthesis of word pronunciations could account for the process of constructing a pronunciation for a novel pseudoword.

The feature-extraction assumptions and the bottom-up portion of the word recognition model are nearly the same as those employed by Rumelhart (1970, Note 2) and Rumelhart and Siple (1974). The interactive feedback portion of the model is clearly one of the class of models discussed by Rumelhart (1977) and could be considered a simplified control structure for expressing the model proposed in that paper.

Application of the Simulation Model to Several Basic Findings

We are finally ready to see how well our model fares in accounting for the findings of several representative experiments in the literature. In discussing each account, we will try to explain not only how well the simulation works but why it behaves as it does. As we proceed through the discussion, we will have occasion to describe several interesting synergistic properties of the model that we did not anticipate but discovered as we explored the behavior of the system. As mentioned previously, the actual parameters used in both the examples that we will discuss and in the simulation results we will report are those summarized in Table 1. We will consider the robustness of the model, and the effects of changes in these parameters, in the discussion section at the end of Part 1.

The Word Advantage and the Effects of Visual Conditions

As we noted previously, word perception has been studied under a variety of different visual conditions, and it is apparent that different conditions produce different results. The advantage of words over nonwords appears to be greatest under conditions in which a bright, high-contrast target is followed by a patterned mask with similar characteristics. The word advantage appears to be considerably less when the target presentation is dimmer or otherwise degraded and is followed by a blank white field.

Typical data demonstrating these points (from Johnston & McClelland, 1973) are presented in Table 2. Forced-choice performance on letters in words is compared to performance on letters embedded in a row of number signs (e.g., READ vs. #E##). The number signs serve as a control for lateral facilitation or inhibition. This factor appears to be important under dim-target/blank-mask conditions.

Table 2
Effect of Display Conditions on Proportion of Correct Forced Choices in Word and Letter Perception (From Johnston & McClelland, 1973)

                                 Display type
Visual condition                 Word    Letter with number signs
Bright target/patterned mask     .80     .65
Dim target/blank mask            .78     .73

Target durations were adjusted separately for each condition, so that it is only the pattern of differences within display conditions that is meaningful. The data show that a 15% word advantage was obtained in the bright-target/patterned-mask condition and only a 5% word advantage in the dim-target/blank-mask condition. Massaro and Klitzke (1979) obtained about the same size effects. Various aspects of these results have also been corroborated in two other studies (Juola et al., 1974; Taylor & Chabot, 1978).

To understand the difference between these two conditions it is important to note that in order to get about 75% correct performance in the no-mask condition, the stimulus must be highly degraded. Since there is no patterned mask, the iconic trace presumably persists considerably beyond the offset of the target. It is our assumption that the effect of the blank mask is simply to reduce the contrast of the icon by summating with it. Thus, the limit on performance is not so much the amount of time available in which to process the information as it is the quality of the information made available to the system. In contrast, when a patterned mask is employed, the mask produces spurious inputs, which can interfere with the processing of the target. Thus, in the bright-target/patterned-mask conditions, the primary limitation on performance is the amount of time that the information is available to the system in relatively legible form rather than the quality of the information presented. This distinction between the way in which blank masks and patterned masks interfere with performance has previously been made by a number of investigators, including Rumelhart (1970) and Turvey (1973). We now consider each of these sorts of conditions in turn.

Word perception under patterned-mask conditions. When a high-quality display is followed by a patterned mask, we assume that the bottleneck in performance does not come in the extraction of feature information from the target display. Thus, in our simulation of these conditions, we assume that all of the features presented can be extracted on every trial. The limitation on performance comes from the fact that the activations produced by the target are subject to disruption and replacement by the mask before they can be translated into a permanent form suitable for overt report. This general idea was suggested by Johnston and McClelland (1973) and considered by a number of other investigators, including Carr et al. (1978), Massaro and Klitzke (1979), and others. On the basis of this idea, a number of possible reasons for the advantage for letters in words have been suggested. One is that letters in words are for some reason translated more quickly into a nonmaskable form (Johnston & McClelland, 1973; Massaro & Klitzke, 1979). Another is that words activate representations removed from the direct effects of visual patterned masking (Carr et al., 1978; Johnston & McClelland, 1973, 1980; McClelland, 1976). In the interactive activation model, the reason letters in words fare better than letters in nonwords is that they benefit from feedback that can drive them to higher activation levels. As a result, the probability that the activated letter representation will be correctly encoded is increased.

To understand in detail how this account works, consider the following example. Figure 8 shows the operation of our model for the letter E both in an unrelated (#) context and in the context of the word READ for a visual display of moderately high quality. We assume that display conditions are sufficient for complete feature extraction, so that only the letters actually contained in the target receive net excitatory input on the basis of feature information. After some number of cycles have gone by, the mask is presented with the same parameters as the target. The mask simply replaces the target display at the feature level, resulting in a completely new input to the letter level. This input, because it contains features incompatible with the letter shown in all four positions, immediately begins to drive down the activations at the letter level. After only a few more cycles, these activations drop below resting level in both cases. Note that the correct letter was activated briefly, and no competing letter was activated. However, because of the sluggishness of the output process, these activations do not necessarily result in a high probability of correct report. As shown in the bottom half of Figure 8, the probability of correct report reaches a maximum after 16 cycles at a performance level far below the ceiling.

[Figure 8: two panels, letter level activations (top) and output values (bottom); one plotted curve is labeled "E in READ".]

Figure 8. Activation functions (top) and output values (bottom) for the letter E, in unrelated context and in the context of the word READ.

When the letter is part of the word (in this case, READ), the activation of the letters results in rapid activation of one or more words. These words, in turn, feed back to the letter level. This results in a higher net activation level for the letter embedded in the word.

Our simulation of the word advantage under patterned-mask conditions used the stimulus list that was used for simulating the blank-mask results. Since the internal workings of the model are completely deterministic as long as probability of feature extraction is 1.0, it was only necessary to run each item through the model once to obtain the expected probability that the critical letter would be encoded correctly for each item under each variation of parameters tried.

As described previously, we have assumed that readout of the results of processing occurs in parallel for all four letter positions and that the subject learns through practice to choose a time to read out in order to optimize performance. We have assumed that readout time may be set at a different point in different conditions, as long as they are blocked so that the subject knows in advance what type of material will be presented on each trial in the experiment. Thus, in simulating the Johnston and McClelland (1973) results, we allowed for different readout times for letters in words and letters in unrelated contexts, with the different times selected on the basis of practice to optimize performance on each type of material.

A final feature of the simulation is the duration of the target display. This was varied to produce an average performance on both letters embedded in number signs and letters in words that was as close as possible to the average performance on these two conditions in the 1973 experiment of Johnston and McClelland. The value used for the run reported below was 15 cycles. As in the Johnston and McClelland study, the mask followed the target immediately.

The simulation replicated the experimental data shown in Table 2 quite closely. Accuracy on the forced choice was 81% correct for the letters embedded in words and 66% correct for letters in an unrelated (#) context.

It turns out that it is not necessary to allow for different readout times for different material types. A repetition of the simulation produced a 15% word advantage when the same readout time was chosen for both single letters and letters in words, based on optimal performance averaged over the two material types. Thus, the model is consistent with the fact that the word advantage does not depend on separating the different stimulus types into separate blocks (Massaro & Klitzke, 1979).

Perception of letters in words under conditions of degraded input. In conditions of degraded (but not abbreviated) input, the role of the word level is to selectively reinforce possible letters that are consistent with the visual information extracted and that are also consistent with the words in the subject's vocabulary. Recall that the task requires the subject to choose between two letters, both of which (on word trials) make a word with the rest of the context. There are two distinct cases to consider. Either the featural information extracted from the to-be-probed letter is sufficient to distinguish between the alternatives, or it is not. Whenever the featural information is consistent with both of the forced-choice alternatives, any feedback will selectively enhance both alternatives and will not permit the subject to distinguish between them. When the information extracted is inconsistent with one of the alternatives, the model produces a word advantage. The reason is that we assume forced-choice responses are based not on the feature information itself but on the subject's best guess about what letter was actually shown. Feedback from the word level increases the probability of correct choice in those cases where the subject extracts information that is inconsistent with the incorrect alternative but consistent with the correct alternative and a number of others. Thus, feedback would have the effect of helping the subject select the actual letter shown from several possibilities consistent with the set of extracted features. Consider again, for example, the case of the presentation of WORK discussed above. In this case, the subject extracted incomplete information about the final letter consistent with both R and K. Assume that the forced choice the subject was to face on this trial was between a D and a K. The account supposes that the subject encodes a single letter for each letter position before facing the forced choice. Thus, if the features of the final letter had been extracted in the absence of any context, the subject would encode R or K equally often, since both are equally compatible with the features extracted. This would leave the subject with the correct response some of the time. But if R were chosen instead, the subject would enter the forced choice between D and K without knowing the correct answer directly. When the whole word display is shown, the feedback generated by the processing of all of the letters greatly strengthens the K, increasing the probability that it will be chosen over the R and thus increasing the probability that the subject will proceed to the forced choice with the correct response in mind.

Our interpretation of the small word advantage in blank-mask conditions is a specific version of the early accounts of the word advantage offered by Wheeler (1970) and Thompson and Massaro (1973) before it was known that the effect depends on masking. Johnston (1978) has argued that this type of account does not apply under patterned-mask conditions. We are suggesting that it does apply to the small word advantage obtained under blank-mask conditions like those of the Johnston and McClelland (1973) experiment. We will see below that the model offers a different account of performance under patterned-mask conditions.

We simulated our interpretation of the small word advantage obtained in blank-mask conditions in the following way. A set of 40 pairs of four-letter words that differed by a single letter was prepared. The differing letters occurred in each position equally often. From these words corresponding control pairs were generated in which the critical letters from the word pairs were presented in nonletter contexts (#s). Because they were presented in nonletter contexts, we assumed that these letters did not engage the word processing system at all.

Each member of each pair of items was presented to the model four times, yielding a total of 320 stimulus presentations of word stimuli and 320 presentations of single letters. On each presentation, the simulation sampled a random subset of the possible features to be detected by the system. The probability of detection of each feature was set at .45. As noted previously, the feature-letter excitation and inhibition values (Table 1) are in a ratio of 1 to 30, so that if any one of the 14 features extracted is inconsistent with a particular letter, that letter receives net inhibition from the features and is rapidly driven into an inactive state.
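The sampling step can be sketched as follows, assuming each of the 14 feature positions is detected independently with probability .45 on every presentation. The independence assumption, the representation of features as indices 0 to 13, and the function names are illustrative choices, not details given in the text.

```python
import random

N_FEATURES = 14   # possible features per letter position (from the text)
P_DETECT = 0.45   # probability of detecting each feature (from the text)


def sample_detected_features(p_detect=P_DETECT, n_features=N_FEATURES,
                             rng=None):
    """Return the indices of the features detected on one presentation.

    Assumption: each feature is detected independently of the others.
    """
    rng = rng or random.Random()
    return [i for i in range(n_features) if rng.random() < p_detect]


def presentations(n_pairs=40, members_per_pair=2, repetitions=4):
    """Number of stimulus presentations per display type."""
    return n_pairs * members_per_pair * repetitions
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Under the independence assumption, an average of 14 x .45 = 6.3 features is detected per letter position.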

For simplicity, the features were treated as a constant input, which remained on while letter and word activations (if any) were allowed to take place. At the end of 50 processing cycles, which is virtually asymptotic, output was sampled. Sampling results in the selection of one letter to fill each position; the selected letter is assumed to be all the subject takes away from the target display. As described previously, the forced choice is assumed to be based only on this letter identity information. The subject compares the letter selected for the appropriate position against the forced-choice alternatives. If the letter selected is one of the alternatives, then that alternative is selected. If it is not one of the alternatives, then one of the two alternatives is simply picked at random.
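The decision rule just described implies a simple expression for forced-choice accuracy: the probability that the correct letter was encoded, plus half the probability that neither alternative was encoded. The sketch below states that rule as a function. The function name and the readout probabilities in the usage example are hypothetical, apart from the equal R/K split that the text gives for the no-context case.

```python
def prob_correct_forced_choice(letter_probs, correct, incorrect):
    """Probability of a correct forced choice under the stated rule.

    `letter_probs` maps each letter to the probability that it is the
    one read out for the probed position.
    """
    p_correct = letter_probs.get(correct, 0.0)
    p_incorrect = letter_probs.get(incorrect, 0.0)
    p_neither = 1.0 - p_correct - p_incorrect
    # Encoding the correct letter gives a correct choice; encoding
    # neither alternative leads to a random guess between the two.
    return p_correct + 0.5 * p_neither


if __name__ == "__main__":
    # No context: R and K encoded equally often; choice is between D and K.
    print(prob_correct_forced_choice({"K": 0.5, "R": 0.5}, "K", "D"))
    # Hypothetical readout after word-level feedback has strengthened K.
    print(prob_correct_forced_choice({"K": 0.9, "R": 0.1}, "K", "D"))
```

In the no-context case the rule gives .75. Any feedback that shifts readout from R toward K raises accuracy, even though R is not one of the forced-choice alternatives.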

The simulation produced a 10% advantage for letters in words over letters embedded in number signs. Forced-choice accuracy was 78% for letters embedded in words and 68% for letters in number signs.

The simulated results for the no-mask condition clearly show a smaller word advantage than for the patterned-mask case. However, the model produces a larger word advantage than is observed in the experiment (Table 2). As Johnston (1978) has pointed out, there are a number of reasons why an account such as the one we have offered would overestimate the size of the word advantage. First, subjects may occasionally be able to retain an impression of the actual visual information they have been able to extract. On such occasions, feedback from the word level will be of no further benefit. Second, even if subjects only retain a letter identity code, they may tend to choose the forced-choice alternative that is most similar to the letter encoded, instead of simply guessing, when the letter encoded is not one of the two choices. This would tend to result in a greater probability of correct choices and less of a chance for feedback to increase accuracy of performance. It is hard to know exactly how much these factors should be expected to reduce the size of the word advantage under these conditions, but they would certainly bring it more closely in line with the results.

Perception of Letters in Regular Nonwords

One of the most important findings in the literature on word perception is that an item need not be a word in order to produce facilitation with respect to unrelated letter or single letter stimuli. The advantage for pseudowords over unrelated letters has been obtained in a very large number of studies (Aderman & Smith, 1971; Baron & Thurston, 1973; Carr et al., 1978; McClelland, 1976; Spoehr & Smith, 1975). The pseudoword advantage over single letters has been obtained in three studies (Carr et al., 1978; Massaro & Klitzke, 1979; McClelland & Johnston, 1977).

Our model produces the facilitation for pseudowords by allowing them to activate nodes for words that share more than one letter in common with the display. When they occur, these activations produce feedback which strengthens the letters that gave rise to them just as in the case of words. These activations occur in the model if the strength of letter-to-word inhibition is reasonably small compared to the strength of letter-to-word excitation.

To see how this takes place in detail, consider a brief presentation of the pseudoword MAVE followed by a patterned mask. (The pseudoword is one used by Glushko, 1979, in developing the idea that partial activations of words are combined to derive pronunciations of pseudowords.) As illustrated in Figure 9, presentation of MAVE results in the initial activation of 16 different words. Most of these words, like have and gave, share three letters with MAVE. By and large, these words steadily gain in strength while the target is on and produce feedback to the letter level, sustaining the letters that supported them.

[Figure 9: word-level activations plotted against TIME for the display MAVE.]

Figure 9. Activation at the word level upon presentation of the nonword MAVE.

Some of the words are weakly activated for a brief period of time before they fall back below zero. These typically are words like more and many, which share only two letters with the target but are very high in frequency, so they need little excitation before they exceed threshold. But soon after they exceed threshold, the total activation at the word level becomes strong enough to overcome the weak excitatory input, causing them to drop down just after they begin to rise. Less frequent words sharing two letters with the word displayed have a worse fate still. Since they start out initially at a lower value, they generally fail to receive enough excitation to reach threshold. Thus, when there are several words that have three letters in common with the target, words that share only two letters with the target tend to exert little or no influence. In general then, with pronounceable pseudoword stimuli, the amount of feedback, and hence the amount of facilitation, depends primarily on the activation of nodes for words that share three letters with a displayed pseudoword. It is the nodes for these words that primarily interact with the activations generated by the presentation of the actual target display. In what follows we will call the words that have three letters in common with the target letter string the neighbors of that string.
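The definition of a neighbor can be sketched in a few lines. The word list below is a small hypothetical lexicon assembled for illustration; it is not the word list used in the simulations, and the function names are likewise assumptions.

```python
# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = [
    "have", "gave", "save", "cave", "wave", "pave",
    "male", "make", "mate", "mare", "maze", "mace",
    "move", "more", "many", "time",
]


def shared_letters(target, word):
    """Count the positions at which two strings contain the same letter."""
    return sum(t == w for t, w in zip(target, word))


def neighbors(target, lexicon):
    """Words of the same length that share all but one letter with the target."""
    return [
        word for word in lexicon
        if len(word) == len(target)
        and shared_letters(target, word) == len(target) - 1
    ]


if __name__ == "__main__":
    print(neighbors("mave", TOY_LEXICON))
```

In this toy lexicon, more and many share only two letters with MAVE and so fall outside its neighborhood, in line with the limited influence attributed to such words above.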

The amount of feedback a particular letter in a nonword receives depends, in the model, on two primary factors and two secondary factors. The two primary factors are the number of words in the neighborhood that contain the target letter and the number of words that do not. In the case of the M in MAVE, for example, there are seven words in the neighborhood of MAVE that begin with M, so the m node gets excitatory feedback from all of these. These words are called the "friends" of the m node in this case. Because of competition at the word level, the amount of activation that these words receive depends on the total number of words that have three letters in common with the target. Those that share three letters with the target but are inconsistent with the m node (e.g., have) produce inhibition that tends to limit the activation of the friends of the m node, and can thus be considered its "enemies." These words also produce feedback that tends to activate letters that were not actually presented. For example, activation from have produces excitatory input to the h node, thereby producing some competition with the m node. These activations, however, are usually not terribly strong. No one word gets very active, and so letters not in the actual display tend to get fairly weak excitatory feedback. This weak excitation is usually insufficient to overcome the bottom-up inhibition acting on nonpresented letters. Thus, in most cases, the harm done by top-down activation of letters that were not shown is minimal.
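The partition of a neighborhood into friends and enemies of a given letter can be sketched as follows. The lexicon is again a hypothetical toy list, built so that seven neighbors of MAVE begin with m; the actual words in the simulation list may differ, and the function name is an assumption.

```python
# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = [
    "have", "gave", "save", "cave", "wave", "pave",
    "male", "make", "mate", "mare", "maze", "mace",
    "move", "more", "many", "time",
]


def friends_and_enemies(target, position, lexicon):
    """Split the neighbors of `target` by whether they keep its letter at `position`.

    A neighbor shares all but one letter with the target. It is a friend of
    the letter at `position` if it contains that letter there, and an enemy
    if that is the one position at which it differs.
    """
    friends, enemies = [], []
    for word in lexicon:
        if len(word) != len(target):
            continue
        matches = sum(t == w for t, w in zip(target, word))
        if matches != len(target) - 1:
            continue  # not a neighbor
        if word[position] == target[position]:
            friends.append(word)
        else:
            enemies.append(word)
    return friends, enemies


if __name__ == "__main__":
    for pos, letter in enumerate("mave"):
        friends, enemies = friends_and_enemies("mave", pos, TOY_LEXICON)
        print(letter, len(friends), "friends,", len(enemies), "enemies")
```

Because every neighbor is either a friend or an enemy of each letter, the two counts sum to the size of the neighborhood at every position.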

A part of the effect we have been describing is illustrated in Figure 10. Here, we compare the activations of the nodes for the letters in MAVE. Without feedback, the four curves would be identical to the one single-letter curve included for comparison. So although there is facilitation for all four letters, there are definitely differences in the amount, depending on the number of friends and enemies of each letter. Note that within a given pseudoword, the total number of friends and enemies (i.e., the total number of words with three letters in common) is the same for all the letters.

Figure 10. Activation functions for the letters a and v on presentation of MAVE. (Activation function for e is indistinguishable from function for a, and that for m is similar to that for v. The activation function for a single letter (sl), or a letter in an unrelated context, is included for comparison.)

There are two other factors that affect the extent to which a particular word will become active at the word level when a particular pseudoword is shown. Although the effects of these factors are only weakly reflected in the activations at the letter level, they are nevertheless interesting to note, since they indicate some synergistic effects that emerge from the interplay of simple excitatory and inhibitory influences in the neighborhood. These are the rich-get-richer effect and the gang effect. The rich-get-richer effect is illustrated in Figure 11, which compares the activation curves for the nodes for have, gave, and save under presentation of MAVE. The words differ in frequency, which gives the words slight differences in baseline activation. What is interesting is that the difference gets magnified, so that at the point of peak activation, there is a much larger difference. The reason for the amplification can be seen by considering a system containing only two nodes, a and b, starting at different initial positive activation levels, a and b at time t. Let us suppose that a is stronger than b at t. Then at t + 1, a will exert more of an inhibitory influence on b, since inhibition of a given node is determined by the sum of the activations of all nodes other than itself. This advantage for the initially more active nodes is compounded further in the case of the effect of word frequency by the fact that more frequent words creep above threshold first, thereby exerting an inhibitory effect on the lower frequency words when the latter are still too weak to fight back at all.

Figure 11. The rich-get-richer effect. (Activation functions for the nodes for have, gave, and save under presentation of MAVE.)
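The two-node argument can be illustrated with a minimal sketch. It uses a simplified linear update, not the model's full activation equations: each node receives the same constant excitation, is inhibited in proportion to the positive activation of the other node, and decays toward a resting level of zero. All parameter values and function names are assumptions made for illustration.

```python
def step(activations, excitation=0.05, inhibition=0.2, decay=0.07,
         floor=-0.2, ceiling=1.0):
    """Advance all nodes one time step under mutual inhibition.

    Simplified linear update (an assumption, not the full model): inhibition
    of a node is proportional to the summed positive activation of all
    other nodes, and activations are clipped to [floor, ceiling].
    """
    positive = [max(a, 0.0) for a in activations]
    total = sum(positive)
    updated = []
    for a, pos in zip(activations, positive):
        from_others = inhibition * (total - pos)
        new_a = a + excitation - from_others - decay * a
        updated.append(min(ceiling, max(floor, new_a)))
    return updated


def gap_history(a, b, n_steps=10):
    """Track the activation difference a - b over successive time steps."""
    acts = [a, b]
    gaps = [acts[0] - acts[1]]
    for _ in range(n_steps):
        acts = step(acts)
        gaps.append(acts[0] - acts[1])
    return gaps


if __name__ == "__main__":
    for t, gap in enumerate(gap_history(0.05, 0.02)):
        print(t, round(gap, 4))
```

In this linear sketch the difference between the two nodes is multiplied by (1 + inhibition - decay) on every step, so a small initial advantage grows whenever mutual inhibition exceeds decay, even though both nodes receive identical input.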

Even more interesting is the gang effect, which depends on the coordinated action of a related set of word nodes. This effect is depicted in Figure 12. Here, the activation curves for the move, male, and save nodes are compared. In the language, move and male are of approximately equal frequency, so their activations start out at about the same level. But they soon pull apart. Similarly, save starts out below move but soon reaches a higher activation. The reason for these effects is that male and save are both members of gangs with several members, whereas move is not. Consider first the difference between male and move. The reason for the difference is that there are several words that share the same three letters with MAVE as male does. In the list of words used in our simulations, there are six. These words all work together to reinforce the m, the a, and the e nodes, thereby producing much stronger reinforcement for themselves. Thus, these words make up a gang called the ma_e gang. In this example, there is also an _ave gang consisting of 6 other words, of which save is one. All of these work together to reinforce the a, v, and e. Thus, the a and e are reinforced by two gangs, whereas the letters v and m are reinforced by only one each. Now consider the word move. This word is a loner; there are no other words in its gang, the m_ve gang. Although two of the letters in move receive support from one gang each, and one receives support from both other gangs, the letters of move are less strongly enhanced by feedback than the letters of the members of the other two gangs. Since continued activation of one word in the face of the competition generated by all of the other partially activated words depends on the activations of the component letter nodes, the words in the other two gangs eventually gain the upper hand and drive move back below the activation threshold.
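The grouping of neighbors into gangs can be sketched by keying each neighbor on the three letters it shares with the target. The lexicon is a hypothetical toy list constructed so that the ma_e and _ave gangs each have six members; the actual members in the simulation list are not given in the text, and the function name is an assumption.

```python
from collections import defaultdict

# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = [
    "have", "gave", "save", "cave", "wave", "pave",
    "male", "make", "mate", "mare", "maze", "mace",
    "move", "more", "many", "time",
]


def gangs(target, lexicon):
    """Group the neighbors of `target` by the letters they share with it.

    The key is the target with the single differing position replaced by an
    underscore, e.g. 'ma_e' for words that match MAVE in positions 1, 2, 4.
    """
    groups = defaultdict(list)
    for word in lexicon:
        if len(word) != len(target):
            continue
        mismatches = [i for i, (t, w) in enumerate(zip(target, word)) if t != w]
        if len(mismatches) == 1:
            i = mismatches[0]
            groups[target[:i] + "_" + target[i + 1:]].append(word)
    return dict(groups)


if __name__ == "__main__":
    for pattern, members in gangs("mave", TOY_LEXICON).items():
        print(pattern, members)
```

Counting the gangs whose pattern retains a given letter reproduces the observation above: a and e appear in both multi-member patterns, whereas m and v each appear in only one.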

As our study of the MAVE example illustrates, the pattern of activation produced by a particular pseudoword is complex and idiosyncratic. In addition to the basic friends and enemies effects, there are also the rich-get-richer and the gang effects. These effects are primarily reflected in the pattern of activation at the word level, but they also exert subtle influences on the activations at the letter level. In general though, the main result is that when the letter-to-word inhibition is low, all four letters in the pseudoword receive some feedback reinforcement. The result, of course, is greater accuracy of reporting letters in pseudowords compared to single letters.

Comparison of performance on words and pseudowords. Let us now consider the fact that the word advantage over pseudowords is generally rather small in experiments where the subject knows that the stimuli include pseudowords. Some fairly representative results, from the study of McClelland and Johnston (1977), are illustrated in Table 3. The visual conditions of the study were the same as those used in the patterned-mask condition in Johnston and McClelland (1973). Trials were blocked, so subjects could adopt the optimum strategy for each type of material. The slight word-pseudoword difference, though representative, is not actually statistically reliable in this study.

Figure 12. The gang effect. (Activation functions for move, male, and save under presentation of MAVE.)

Table 3
Actual and Simulated Results of the McClelland & Johnston (1977) Experiments (Proportion of Correct Forced Choice)

                            Target type
Result class      Word    Pseudoword    Single letter
Actual data
  High BF         .81        .79            .67
  Low BF          .78        .77            .64
  Average         .80        .78            .66
Simulation
  High BF         .81        .79            .67
  Low BF          .79        .77            .67
  Average         .80        .78            .67

Note. BF = bigram frequency.

Figure 13. Activity at the word level upon presentation of CAVE, with weak letter-to-word inhibition.

Figure 14. Activation functions for the letter a, under presentation of CAVE and MAVE and alone.

Words differ from pseudowords in that a word strongly activates one node at the word level, whereas a pseudoword does not. While we would tend to think of this as increasing the amount of feedback for words as opposed to pseudowords, there is the word-level inhibition that must be taken into account. This inhibition tends to equalize the total amount of activation at the word level between words and pseudowords. With words, the word shown tends to dominate the pattern of activity, thereby keeping all the words that have three letters in common with it from achieving the activation level they would reach in the absence of a node activated by all four letters. This situation is illustrated for the word CAVE in Figure 13. The result is that the sum of the activations of all the active units at the word level is not much different between the two cases. Thus, CAVE produces only slightly more facilitation for its constituent letters than MAVE, as illustrated in Figure 14.

In addition to the leveling effect of competition at the word level, it turned out that in our model, one of the common design features of studies comparing performance on words and pseudowords would operate to keep performance relatively good on pseudowords. In general, the stimulus materials used in most of these studies are designed by beginning with a list of pairs of words that differ by one letter (e.g., PEEL-PEEP). From each pair of words, a pair of nonwords is generated, differing from the original word pair by just one of the context letters and thereby keeping the actual target letters (and as much of the context as possible) the same between word and pseudoword items (e.g., TEEL-TEEP). A previously unnoticed side effect of this matching procedure is that it ensures that the critical letter in each pseudoword has at least one friend, namely the word from the matching pair that differs from it by one context letter. In fact, most of the critical letters in the pseudowords used by McClelland and Johnston (1977) tended to have relatively few enemies, compared to the number of friends. In general, a particular letter should be expected to have three times as many friends as enemies. In the McClelland and Johnston stimuli, the great majority of the stimuli had much larger differentials. Indeed, more than half of the critical letters had no enemies at all.

The puzzling absence of cluster frequency effects. In the account we have just described, facilitation of performance on letters in pseudowords was explained by the fact that pseudowords tend to activate a large number of words, and these words tend to work together to reinforce the activations of letters. This account might seem to suggest that pseudowords that have common letter clusters, and therefore have several letters in common with many words, would tend to produce the greatest facilitation. However, this factor has been manipulated in a number of studies, and little has been found in the way of an effect. The McClelland and Johnston (1977) study is one case in point. As Table 3 illustrates, there is only a slight tendency for superior performance on high cluster frequency words. This slight tendency is also observed in single letter control stimuli, suggesting that the difference may be due to differences in perceptibility of the target letters in the different positions, rather than cluster frequency per se. In any case, the effect is very small. Other studies have likewise failed to find any effect of cluster frequency (Spoehr & Smith, 1975; Manelis, 1974). The lack of an effect is most striking in the McClelland and Johnston study, since the high and low cluster frequency items differed widely in cluster frequency as measured in a number of ways.

In our model, the lack of a cluster frequency effect is due to the effect of mutual inhibition at the word level. As we have seen, this mutual inhibition tends to keep the total activity at the word level roughly constant over a variety of different input patterns, thereby greatly reducing the advantage for high cluster frequency items. Items containing infrequent clusters tend to activate few words, but there is less competition at the word level, so that the words that do become active reach higher activation levels.
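Why mutual inhibition holds total activity roughly constant can be seen in a deliberately simplified case. The sketch assumes n identical word nodes, each receiving the same excitation e, decaying at rate d, and inhibited by the summed activation of the other n - 1 nodes with strength g. Setting the change in activation to zero, 0 = e - g(n - 1)a - da, gives a = e / (d + g(n - 1)) per node. The linear form, the parameter values, and the function name are all assumptions for illustration and are not the model's actual equations.

```python
def steady_state(n_words, excitation=0.05, inhibition=0.2, decay=0.07):
    """Equilibrium of n identical, mutually inhibiting nodes (linear sketch).

    Solves 0 = excitation - inhibition * (n - 1) * a - decay * a for the
    per-node activation a, and returns (a, n * a).
    """
    per_word = excitation / (decay + inhibition * (n_words - 1))
    return per_word, n_words * per_word


if __name__ == "__main__":
    for n in (4, 8, 16):
        per_word, total = steady_state(n)
        print(n, round(per_word, 4), round(total, 4))
```

Under these assumptions, quadrupling the number of active words from 4 to 16 cuts the activation of each word to less than a quarter, so the summed activation changes only slightly. This mirrors the trade-off described above between the number of words activated and the level each one reaches.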

The situation is illustrated for the nonwords TEEL and HOET in Figure 15. Although TEEL activates many more words, the total activation is not much different in the two cases.

The total activation is not, of course, the whole story. The ratio of friends to enemies is also important. And it turns out that this ratio is working against the high cluster items more than the low cluster items. In McClelland and Johnston's stimuli, only one of the low cluster frequency nonword pairs had critical letters with any enemies at all! For 23 out of 24 pairs, there was at least one friend (by virtue of the method of stimulus construction) and no enemies. In contrast, for the high cluster frequency pairs, there was a wide range, with some items having several more enemies than friends.

To simulate the McClelland and Johnston (1977) results, we had to select a subset of their stimuli, since some of the words they used were not in our word list. The stimuli had been constructed in sets containing a word pair, a pseudoword pair, and a single letter pair that differed by the same letters in the same position (e.g., PEEL-PEEP; TEEL-TEEP; and the single letters L and P). We simply selected all those sets in which both words in the pair appeared in our list. This resulted in a sample of 10 high cluster frequency sets and 10 low cluster frequency sets. The single letter stimuli derived from the high and low cluster frequency pairs were also run through the simulation. Both members of each pair were tested.

Since the stimuli were presented in the actual experiment blocked by material type, we separately selected an optimal time for readout for words, pseudowords, and single letters. Readout time was the same for high and low cluster frequency items of the same type, since these were presented in a mixed list in the actual experiment. As in the simulation of the Johnston and McClelland (1973) results, the display was presented for a duration of 15 cycles.

The simulation results, shown in Table 3, reveal the same general pattern as the actual data. The magnitude of the pseudoword advantage over single letters is just slightly smaller than the word advantage, and the effect of cluster frequency is very slight.

We have yet to consider how the model deals with unrelated letter strings. This depends a little on the exact characteristics of the strings. First let us consider truly randomly generated consonant strings. Such items typically produce some activation at the word level in our model, since they tend to share two letters with several words (one letter out of four is insufficient to activate a word, since three inhibitory inputs outweigh one excitatory input). These strings rarely have three letters in common with any one word. Thus, they only tend to activate a few words very weakly, and because of the weakness of the bottom-up excitation, competition among partially activated words keeps any one word from getting very active. So, little benefit results. When we ran our simulation on randomly generated consonant strings, there was only a 1% advantage over single letters.

Figure 15. The number of words activated (top) and the total activation at the word level (bottom) upon presentation of the nonwords TEEL and HOET.

Some items which have been used as unpronounceable nonwords or unrelated letter strings do produce a weak facilitation. We ran the nonwords used by McClelland and Johnston (1977) in their Experiment 2. These items contain a large number of vowels in positions that vowels typically occupy in words, and they therefore activate more words than, for example, random strings of consonants. The simulation was run under the same conditions as the one reported above for McClelland and Johnston's Experiment 1. The simulation produced a slight advantage for letters in these nonwords, compared to single letters, as did the experiment. In both the simulation and the actual experiment, forced-choice performance was 4% more accurate for letters in these unrelated letter strings than in single letter stimuli.

On the basis of this characteristic of our model, the results of one experiment on the importance of vowels in reading may be reinterpreted. Spoehr and Smith (1975) found that subjects were more accurate when reporting letters in unpronounceable nonwords that contained vowels than in those composed of all consonants. They interpreted the results as supporting the view that subjects parse letter strings into "vocalic center groups." However, an alternative possible account is that the strings containing vowels had more letters in common with actual words than the all consonant strings.

In summary, the model provides a good account of the perceptual advantage for letters in pronounceable nonwords, and for the lack of such an advantage in unrelated letter strings. In addition, it accounts for the small difference between performance on words and pseudowords and for the absence of any really noticeable cluster frequency effect in the McClelland and Johnston (1977) experiment.

The Role of Lexical Constraints

The Johnston (1978) experiment. Several models that have been proposed to account for the word advantage rely on the idea that the context letters in a word facilitate performance by constraining the set of possible letters that might have been presented in the critical letter position. According to models of this class, contexts that strongly constrain what the target letter should be result in greater accuracy of perception than more weakly constraining contexts. For example, the context _HIP should facilitate the perception of an initial S more than the context _INK. The reason is that _HIP is more strongly constraining, since only three letters (S, C, and W) fit in the context to make a word, compared to _INK, where nine letters (D, F, K, L, M, P, R, S, and W) fit in the context to make a word. In a test of such models, Johnston (1978) compared accuracy of perception of letters occurring in high- and low-constraint contexts. The same target letters were tested in the same positions in both cases. For example, the letters S and W were tested in the high-constraint _HIP context and the low-constraint _INK context. Using bright-target/patterned-mask conditions, Johnston found no difference in accuracy of perception between letters in the high- and low-constraint contexts. The results of this experiment are shown in Table 4. Johnston measured letter perception in two ways. He not only asked the subjects to decide which of two letters had been presented (the forced-choice measure), but he also asked subjects to report the whole word and recorded how often they got the critical letter correct. No significant difference was observed in either case. In the forced choice there was a slight difference favoring low-constraint items, but in the free report there was no difference at all.

Table 4
Actual and Simulated Results (Probability Correct) From Johnston (1978) Experiments

                       Constraint
Result class         High     Low
Actual data
  Forced choice      .77      .79
  Free report        .54      .54
Simulation
  Forced choice      .77      .76
  Free report        .56      .54

Although our model does use contextual constraints (as they are embodied in specific lexical items), it turns out that it does not predict that highly constraining contexts will facilitate perception of letters much more than weakly constraining contexts under bright-target/patterned-mask conditions. Under such conditions, the role of the word level is not to help the subject select among alternatives left open by an incomplete feature analysis process, as most constraint-based models have assumed, but rather to help strengthen the activation of the nodes for the letters presented. Contextual constraints, at least as manipulated by Johnston, do not have much effect on the magnitude of this strengthening effect.

In detail, what happens in the model when a word is shown is that the presentation results in weak activation of the words that share three letters with the target. Some of these words are friends of the critical letter in that they contain the actual critical letter shown, as well as two of the letters from the context (e.g., shop is a friend of the initial S in SHIP). Some of the words, however, are enemies of the critical letter in that they contain the three context letters of the word but a different letter in the critical letter position (e.g., chip and whip are enemies of the S in SHIP). From our point of view, Johnston's (1978) constraint manipulation is essentially a manipulation of the number of enemies the critical letter has in the given context. Johnston's high- and low-constraint stimuli have equal numbers of friends, on the average, but (by design) the high-constraint items have fewer enemies, as shown in Table 5.

In the simulation, the friends and enemies of the target word receive some activation. The greater number of enemies in the low-constraint condition is responsible for the small effect of constraint that the model produces. What happens is that the enemies of the critical letter tend to keep nodes for the presented word and for the friends of the critical letter from being quite as strongly activated as they would otherwise be. The effect is quite small for two reasons. First, the node for the word presented receives four excitatory inputs from the letter level, and all other words can only receive at most three excitatory inputs and at least one inhibitory input. As we saw in the case of the word CAVE, the node for the correct word dominates the activations at the word level and is predominantly responsible for any feedback to the letter level. Second, while the high-constraint items have fewer enemies, by more than a two-to-one margin, both high- and low-constraint items have, on the average, more friends than enemies. The friends of the target letter work with the actual word shown to keep the activations of the enemies in check, thereby reducing the extent of their inhibitory effect still further. The ratio of the number of friends over the total number of neighbors is not very different in the two conditions, except in the first serial position.
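The ratio referred to here is friends divided by friends plus enemies. As a check on the arithmetic, the following sketch recomputes it from the mean friend and enemy counts reported in Table 5 for each serial position; the dictionary layout and function names are assumptions made for illustration.

```python
# Mean (friends, enemies) per serial position, as reported in Table 5.
TABLE_5 = {
    "high": {1: (3.33, 2.22), 2: (9.17, 1.00), 3: (6.30, 1.70), 4: (4.96, 1.67)},
    "low": {1: (3.61, 6.44), 2: (6.63, 2.88), 3: (7.75, 4.30), 4: (6.67, 3.50)},
}


def friend_ratio(friends, enemies):
    """Friends as a proportion of all neighbors (friends plus enemies)."""
    return friends / (friends + enemies)


def ratios(table):
    """Friend ratio for every constraint condition and serial position."""
    return {
        condition: {
            position: round(friend_ratio(f, e), 2)
            for position, (f, e) in rows.items()
        }
        for condition, rows in table.items()
    }


if __name__ == "__main__":
    for condition, by_position in ratios(TABLE_5).items():
        print(condition, by_position)
```

The recomputed values agree with the Ratio columns of Table 5, and the largest gap between conditions falls at the first serial position.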

This discussion may give the impression that contextual constraint is not an important variable in our model. In fact, it is quite powerful. But its effects are obscured in the Johnston (1978) experiment because of the strong dominance of the target word when all the features are extracted and the fact that we are concerned with the likelihood of perceiving a particular letter rather than performance in identifying correctly what whole word was shown. We will now consider an experiment in which contextual constraints played a strong role, because the characteristics just mentioned were absent.

The Broadbent and Gregory (1968) experiment. Up to now we have found no evidence that either bigram frequency or lexical constraints have any effect on performance. However, in experiments using the traditional whole report method, these variables have been shown to have substantial effects. Various studies have shown that recognition thresholds are lower, or recognition accuracy at threshold higher, when relatively unusual words are used (Bouwhuis, 1979; Havens & Foote, 1963; Newbigging, 1961). Such items tend to be low in bigram frequency and at the same time high in lexical constraint.

Table 5
Friends and Enemies of the Critical Letters in the Stimuli Used by Johnston (1978)

                  High constraint                Low constraint
Position      Friends  Enemies  Ratio       Friends  Enemies  Ratio
1              3.33     2.22     .60         3.61     6.44     .36
2              9.17     1.00     .90         6.63     2.88     .70
3              6.30     1.70     .79         7.75     4.30     .64
4              4.96     1.67     .75         6.67     3.50     .66
Average        5.93     1.65                 6.17     4.27

In one experiment, Broadbent and Gregory (1968) investigated the role of bigram frequency at two different levels of word frequency and found an interesting interaction. We now consider how our model can account for their results. To begin, it is important to note that the visual conditions of their experiment were quite different from those of McClelland and Johnston (1977), in which the data and our model failed to show a bigram frequency effect, and of Johnston (1978), in which the data and the model showed little or no constraint effect. The conditions were like the dim-target/blank-mask conditions discussed above, in that the target was shown briefly against an illuminated background, without being followed by any kind of mask. The dependent measure was the probability of correctly reporting the whole word. The results are indicated in Table 6. A slight advantage for high bigram frequency items over low bigram frequency was obtained for frequent words, although it was not consistent over different subsets of items tested. The main finding was that words of low bigram frequency had an advantage among infrequent words. For these stimuli, higher bigram frequency actually resulted in a lower percent correct.

Unfortunately, Broadbent and Gregory used five-letter words, so we were unable to run a simulation on their actual stimuli. However, we were able to select a subset of the stimuli used in the McClelland and Johnston (1977) experiment that fit the requirements of the Broadbent and Gregory design. We therefore presented these stimuli to our model, under the presentation parameters used in simulating the blank-mask condition of the Johnston and McClelland (1973) experiment above. The only difference was that the output was taken, not from the letter level, as in all of our other simulations, but directly from the word level. The results of the simulation, shown in Table 6, replicate the obtained pattern very nicely. The simulation produced a large advantage for the low bigram items, among the infrequent words, and produced a slight advantage for high bigram items among the frequent words.

In our model, low-frequency words of high bigram frequency are most poorly recognized, because these are the words that have the largest number of neighbors. Under conditions of incomplete feature extraction, which we expect to prevail under these visual conditions, the more neighbors a word has the more likely it is to be confused with some other word. This becomes particularly important for lower frequency words. As we have seen, if both a low-frequency word and a high-frequency word are equally compatible with the detected portion of the input, the higher frequency word will tend to dominate. When incomplete feature information is extracted, the relative activation of the target and the neighbors is much lower than when all the features have been seen. Indeed, some neighbors may turn out to be just as compatible with the features extracted as the target itself. Under these circumstances, the word of the highest frequency will tend to gain the upper hand. The probability of correctly reporting a low-frequency word will therefore be much more strongly influenced by the presence of a high-frequency neighbor compatible with the input than the other way around.

Table 6
Actual and Simulated Results of the Broadbent and Gregory (1968) Experiment (Proportion of Correct Whole Report)

                    Word frequency
Result class      High     Low
Actual data
  High BF         .64      .43
  Low BF          .64      .58
Simulation
  High BF         .41      .21
  Low BF          .39      .37

Note. BF = bigram frequency.

But why does the model actually produce a slight reversal with high-frequency words? Even here, it would seem that the presence of numerous neighbors would tend to hurt instead of facilitate performance. However, we have forgotten the fact that the activation of neighbors can be beneficial as well as harmful. The active neighbors produce feedback that strengthens most or all of the letters, and these in turn increase the activation of the node for the word shown. As it happens, there turns out to be a delicate balance for high-frequency words between the negative and positive effects of neighbors, which only slightly favors the words with more neighbors. Indeed, the effect only holds for some of these items. We have not yet had the opportunity to explore all the factors that determine whether the effect of neighbors in individual cases will on balance be positive or negative.

Different effects in different experiments. This discussion of the Broadbent and Gregory (1968) experiment indicates once again that our model is something of a chameleon. The model produces no effect of constraint or bigram frequency under the visual conditions and testing procedures used in the Johnston (1978) and McClelland and Johnston (1977) experiments but does produce such effects under the conditions of the Broadbent and Gregory (1968) experiment. This flexibility of the model, of course, is fully required by the data. While there are other models of word perception that can account for one or the other type of result, to our knowledge the model presented here is the only scheme that has been worked out to account for both.

Discussion

The interactive activation model does a good job of accounting for the results in the literature on the perception of letters in words and nonwords. The model provides a unified explanation of the results of a variety of experiments and provides a framework in which the effects of manipulations of the visual display characteristics used may be analyzed. In addition, as we shall see in Part 2 (Rumelhart & McClelland, in press), the model readily accounts for a variety of additional phenomena. Moreover, as we shall also show, it can be extended beyond its current domain of applicability with substantial success. In Part 2 we will report a number of experiments demonstrating what we call "context enhancement effects" and show how the model can explain the major findings in the experiments.

One issue that deserves some consideration is the robustness of the model. To what extent do the simulations depend upon particular parameter values? What are the effects of changes of the parameter values? These are extremely complex questions, and we do not have complete answers. However, we have made some observations. First, the basic Reicher (1969) effect can be obtained under a very wide range of different parameters, though of course its exact size will depend on the ensemble of parameter values. However, one thing that seems to be important is the overpowering effect of one incompatible feature in suppressing activations at the letter level. Without this strong bottom-up inhibition, the mask would not effectively drive out the activations previously established by the stimulus. Second, performance on pronounceable nonwords depends on the relative strength of letter-word excitation compared to inhibition and on the strength of the competition among word units. Parameter values can be found which produce no advantage for any multiletter strings except words, whereas other values can be found that produce large advantages for words, pseudowords, and even many nonword strings. The effects (or rather the lack of effects) of letter-cluster frequency and constraints likewise depend on these parameters.

It thus appears that relatively strong feature-letter inhibition is necessary, but at the same time, relatively weak letter-word inhibition is necessary. This discrepancy is a bit puzzling, since we would have thought that the same general principles of operation would have applied to both the letter and the word levels. A possible way to resolve the discrepancy might be to introduce a more sophisticated account of the way masking works. It is quite possible that new inputs act as position-specific "clear signals," disrupting activations created by previous patterns in corresponding locations. Some possible physiological mechanisms that would produce such effects at lower processing levels have been described by Weisstein, Ozog, and Szoc (1975) and by Breitmeyer and Ganz (1976), among others. If we used such a mechanism to account for the basic effect of masking, it might well be possible to lower the feature-letter inhibition considerably. Lowering feature-letter inhibition would then necessitate strong letter-letter inhibition, so that letters that exactly match the input would be able to dominate those with only partial matches. With these changes the letter and word levels would indeed operate by the same principles.

Perhaps it is a bit premature to discuss such issues as robustness, since there are a number of problems that we have not yet resolved. First, we have ignored the fact that there is a high degree of positional uncertainty in reports of letters, particularly letters in unrelated strings, but occasionally also in reports of letters in words and pseudowords (Estes, 1975; McClelland, 1976; McClelland & Johnston, 1977). Another thing that we have not considered very fully is the serial position curve. In general, it appears that performance is more accurate on the end letters in multiletter strings, particularly the first letter. In Part 2 we consider ways of extending the model to account for both of these aspects of perceptual performance.

Third, there are some effects of set on word perception that we have not considered. Johnston and McClelland (1974) found that perception of letters in words was actually hurt if subjects focused their attention on a single letter position in the word (see also Holender, 1979, and Johnston, 1974). In addition, Aderman and Smith (1971) found that the advantage for pseudowords over unrelated letters only occurs if the subject expects that pseudowords will be shown; and more recently, Carr et al. (1978) have replicated this finding, while at the same time showing that it is apparently not necessary to be prepared for presentations of actual words. Part 2 considers how our model is compatible with this effect also. We will also consider how our model might be extended to account for some recent findings demonstrating effects of letter and word masking on perception of letters in words and other contexts.

In all but one of the experiments we have simulated, the primary (if not the only) data for the experiments were obtained from forced choices between pairs of letters, or strings differing by a single letter. In these cases, it seemed to us most natural to rely on the output of the letter level as the basis for responding. However, it may well be that subjects often base their responses on the output of the word level. Indeed, we have assumed that they do in experiments like the Broadbent and Gregory (1968) study, in which subjects were told to report what word they thought they had seen. This may also have happened in the McClelland and Johnston (1977) and Johnston (1978) studies, in which subjects were instructed to report all four letters before the forced choice on some trials. Indeed, both studies found that the probability of reporting all four letters correctly for letters in words was greater than we would expect given independent processing of each letter position. It seems natural to account for these completely correct reports by assuming that they often occurred on occasions where the subject encoded the item as a word. Even in experiments where only a forced choice is obtained, on many occasions subjects may still come away with a word, rather than a sequence of letters.

In the early phases of the development of our model, we explicitly included the possibility of output from the word level as well as the letter level. We assumed that the subject would either encode a word, with some probability dependent on the activations at the word level or, failing that, would encode some letter for each letter position dependent on the activations at the letter level. However, we found that simply relying on the letter level permitted us to account equally well for the results. In essence, the reason is that the word-level information is incorporated into the activations at the letter level because of the feedback, so that the word level is largely redundant. In addition, of course, readout from the letter level is necessary to the model's account of performance with nonwords. Since it is adequate to account for all of the forced-choice data, and since it is difficult to know exactly how much of the details of free-report data should be attributed to perceptual processes and how much to such things as possible biases in the readout processes and so forth, we have stuck for the present with readout from the letter level.

Another decision that we adopted in order to keep the model within bounds was to exclude the possibility of processing interactions between the visual and phonological systems. However, in the model as sketched at the outset (Figure 1), activations at the letter level interacted with a phonological level as well as the word level. Perhaps the most interesting feature of our model is its ability to account for performance on letters in pronounceable nonwords without assuming any such interactions. We will also see in Part 2 (Rumelhart & McClelland, in press) that certain carefully selected unpronounceable consonant strings produce quite large contextual facilitation effects, compared to other sequences of consonants, which supports our basic position that pronounceability per se is not an important feature of the perceptual facilitation effects we have accounted for.

Another simplification we have adopted in Part 1 has been to consider only cases in which individual letters or strings of letters were presented in the absence of a linguistic context. In Part 2 we will consider the effects of introducing contextual inputs to the word level, and we will explore how the model might work in processing spoken words in context as well.

Reference Notes

1. Hinton, G. E. Relaxation and its role in vision. Unpublished doctoral dissertation, University of Edinburgh, Scotland, 1977.

2. Rumelhart, D. E. A multicomponent theory of confusion among briefly exposed alphabetic characters (Tech. Rep. 22). San Diego: University of California, San Diego, Center for Human Information Processing, 1971.

References

Adams, M. J. Models of word recognition. Cognitive Psychology, 1979, 11, 133-176.

Aderman, D., & Smith, E. E. Expectancy as a determinant of functional units in perceptual recognition. Cognitive Psychology, 1971, 2, 117-129.

Anderson, J. A. Neural models with cognitive implications. In D. LaBerge & S. J. Samuels (Eds.), Basic processes in reading: Perception and comprehension. Hillsdale, N.J.: Erlbaum, 1977.

Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. Distinctive features, categorical perception, and probability learning: Some applications of a neural model. Psychological Review, 1977, 84, 413-451.

Baron, J., & Thurston, I. An analysis of the word-superiority effect. Cognitive Psychology, 1973, 4, 207-228.

Bouwhuis, D. G. Visual recognition of words. Eindhoven, The Netherlands: Greve Offset B. V., 1979.

Breitmeyer, B. G., & Ganz, L. Implications of sustained and transient channels for theories of visual pattern masking, saccadic suppression, and information processing. Psychological Review, 1976, 83, 1-36.

Broadbent, D. E. Word-frequency effect and response bias. Psychological Review, 1967, 74, 1-15.

Broadbent, D. E., & Gregory, M. Visual perception of words differing in letter digram frequency. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 569-571.

Bruner, J. S. On perceptual readiness. Psychological Review, 1957, 64, 123-152.

Carr, T. H., Davidson, B. J., & Hawkins, H. L. Perceptual flexibility in word recognition: Strategies affect orthographic computation but not lexical access. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 674-690.

Cattell, J. M. The time taken up by cerebral operations. Mind, 1886, 11, 220-242.

Estes, W. K. The locus of inferential and perceptual processes in letter identification. Journal of Experimental Psychology: General, 1975, 104, 122-145.

Glushko, R. J. The organization and activation of orthographic knowledge in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 674-691.

Grossberg, S. A theory of visual coding, memory, and development. In E. L. J. Leeuwenberg & H. F. J. M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley, 1978.

Havens, L. L., & Foote, W. E. The effect of competition on visual duration threshold and its independence of stimulus frequency. Journal of Experimental Psychology, 1963, 65, 5-11.

Hinton, G. E., & Anderson, J. A. (Eds.), Parallel models of associative memory. Hillsdale, N.J.: Erlbaum, 1981.

Holender, D. Identification of letters in words and of single letters with pre- and postknowledge vs. postknowledge of the alternatives. Perception & Psychophysics, 1979, 25, 313-318.

Huey, E. B. The psychology and pedagogy of reading. New York: Macmillan, 1908.

Johnston, J. C. The role of contextual constraint in the perception of letters in words. Unpublished doctoral dissertation, University of Pennsylvania, 1974.

Johnston, J. C. A test of the sophisticated guessing theory of word perception. Cognitive Psychology, 1978, 10, 123-154.

Johnston, J. C., & McClelland, J. L. Visual factors in word perception. Perception & Psychophysics, 1973, 14, 365-370.

Johnston, J. C., & McClelland, J. L. Perception of letters in words: Seek not and ye shall find. Science, 1974, 184, 1192-1194.

Johnston, J. C., & McClelland, J. L. Experimental tests of a hierarchical model of word identification. Journal of Verbal Learning and Verbal Behavior, 1980, 19, 503-524.

Juola, J. F., Leavitt, D. D., & Choe, C. S. Letter identification in word, nonword, and single letter displays. Bulletin of the Psychonomic Society, 1974, 4, 278-280.

Kohonen, T. Associative memory: A system-theoretic approach. West Berlin: Springer-Verlag, 1977.

Kucera, H., & Francis, W. Computational analysis of present-day American English. Providence, R.I.: Brown University Press, 1967.

LaBerge, D., & Samuels, S. Toward a theory of automatic information processing in reading. Cognitive Psychology, 1974, 6, 293-323.

Levin, J. A. Proteus: An activation framework for cognitive process models (ISI/WP-2). Marina del Rey, Calif.: Information Sciences Institute, 1976.

Luce, R. D. Individual choice behavior. New York: Wiley, 1959.

Manelis, L. The effect of meaningfulness in tachistoscopic word perception. Perception & Psychophysics, 1974, 16, 182-192.

Massaro, D. W., & Klitzke, D. The role of lateral masking and orthographic structure in letter and word recognition. Acta Psychologica, 1979, 43, 413-426.

McClelland, J. L. Preliminary letter identification in the perception of words and nonwords. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 80-91.

McClelland, J. L. On the time relations of mental processes: An examination of systems of processes in cascade. Psychological Review, 1979, 86, 287-330.

McClelland, J. L., & Johnston, J. C. The role of familiar units in perception of words and nonwords. Perception & Psychophysics, 1977, 22, 249-261.

Morton, J. Interaction of information in word recognition. Psychological Review, 1969, 76, 165-178.

Neisser, U. Cognitive psychology. New York: Appleton-Century-Crofts, 1967.

Newbigging, P. L. The perceptual reintegration of frequent and infrequent words. Canadian Journal of Psychology, 1961, 15, 123-132.

Reicher, G. M. Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 1969, 81, 274-280.

Rumelhart, D. E. A multicomponent theory of the perception of briefly exposed visual displays. Journal of Mathematical Psychology, 1970, 7, 191-218.

Rumelhart, D. E. Toward an interactive model of reading. In S. Dornic (Ed.), Attention and performance IV. Hillsdale, N.J.: Erlbaum, 1977.

Rumelhart, D. E., & McClelland, J. L. An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, in press.

Rumelhart, D. E., & Siple, P. The process of recognizing tachistoscopically presented words. Psychological Review, 1974, 81, 99-118.

Spoehr, K., & Smith, E. The role of orthographic and phonotactic rules in perceiving letter patterns. Journal of Experimental Psychology: Human Perception and Performance, 1975, 1, 21-34.

Szentagothai, J., & Arbib, M. A. Conceptual models of neural organization. Cambridge, Mass.: MIT Press, 1975.

Taylor, G. A., & Chabot, R. J. Differential backward masking of words and letters by masks of varying orthographic structure. Memory & Cognition, 1978, 6, 629-635.

Thompson, M. C., & Massaro, D. W. Visual information and redundancy in reading. Journal of Experimental Psychology, 1973, 98, 49-54.

Turvey, M. On peripheral and central processes in vision: Inferences from an information-processing analysis of masking with patterned stimuli. Psychological Review, 1973, 80, 1-52.

Weisstein, N., Ozog, G., & Szoc, R. A comparison and elaboration of two models of metacontrast. Psychological Review, 1975, 82, 325-343.

Wheeler, D. Processes in word recognition. Cognitive Psychology, 1970, 1, 59-85.

Appendix

Computer Simulation of the Model

The computer program for simulating the interactive activation model was written in the C programming language to run on a Digital PDP 11/45 computer under the UNIX (Trade Mark of Bell Laboratories) operating system. There is now a second version, also in C, which runs under UNIX on a VAX 11/780. When no other jobs are running on the VAX, a simulation of a single experimental trial takes approximately 15-30 sec.

Data Structures

The simulation relies on several arrays for each of the processing levels in the model. The input is held in an array that contains slots for each of the line segments in the Rumelhart-Siple font in each position. Segments can be present or absent, or their status can be indeterminate (as when the input is made deliberately incomplete). There is another array that holds the information the model has detected about the display. Each element of this array represents a detector for the presence or absence of a feature. When the corresponding feature is detected, the detector's value is set to 1 (remember that both absence and presence must be detected).

At the letter level, one array (the activation array) stores the current activation of each node. A second array (the excitatory buffer) is used to sum all of the excitatory influences reaching each node on a given tick of the clock, and a third array (the inhibitory buffer) is used to sum all of the inhibitory influences reaching each node. In addition there is an output array, containing the current output strength of each letter level node. At the word level, there is an activation array for the current activation of each node, as well as an excitatory buffer and an inhibitory buffer.
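The arrays described above might be laid out as in the following sketch. It is only an illustration in Python, not the original C program: the names (`Level`, `make_state`), the four-position display, and the count of 14 line segments per position are assumptions made for the example.

```python
from dataclasses import dataclass, field

N_POSITIONS = 4   # assumption: four-letter displays
N_SEGMENTS = 14   # assumption: line segments per position in the font
LETTERS = "abcdefghijklmnopqrstuvwxyz"


@dataclass
class Level:
    """One processing level: activations plus the two per-cycle buffers."""
    n_nodes: int
    activation: list = field(default_factory=list)
    excitatory: list = field(default_factory=list)
    inhibitory: list = field(default_factory=list)

    def __post_init__(self):
        self.activation = [0.0] * self.n_nodes
        self.excitatory = [0.0] * self.n_nodes
        self.inhibitory = [0.0] * self.n_nodes


def make_state(n_words):
    """Allocate the arrays named in the text (this layout is hypothetical)."""
    n_letter_nodes = N_POSITIONS * len(LETTERS)
    return {
        # None = indeterminate, True = segment present, False = absent
        "input": [[None] * N_SEGMENTS for _ in range(N_POSITIONS)],
        # Two detectors per segment: index 0 = "absent", index 1 = "present"
        "detectors": [[[0, 0] for _ in range(N_SEGMENTS)]
                      for _ in range(N_POSITIONS)],
        "letters": Level(n_letter_nodes),
        "letter_output": [0.0] * n_letter_nodes,
        "words": Level(n_words),
    }
```

Calling `make_state(1179)` would allocate one word node per word in the vocabulary size mentioned in the next section; only the letter level carries an output array, as in the description.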

Knowledge of Letters and Words

The links among the nodes in the model are stored in a set of tables. There is a table in the program that lists which features are present in each letter and which are absent. Another table contains the spellings of each of the 1,179 words known to the program.

Input

Simulated visual input is entered from a computer terminal or from a text file. Several successive displays within a single "trial" may be specified. Each display is characterized by an onset time (tick number from the start of the trial; see below) and some array of visual information. Each lowercase letter stands for the array of features making up the corresponding letter. Other characters stand for particular mask characters, blanks, and so forth. As examples, "_" stands for a blank, and "0" stands for the & mask character. Thus the specification:

    0  mav_
    12 mave
    24 0000

instructs the program to present the visual array consisting of the letters M, A, and V in the first, second, and third letter positions, respectively, at Cycle 0; to present the letter E in the fourth position at Cycle 12; and to present an & mask at Cycle 24. It is also possible to specify any arbitrary feature array to occur in any letter position.
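A reader wishing to reproduce this input format could parse it along the following lines. The sketch assumes one display per line, written as an onset tick followed by a field of characters; the function name `parse_displays` and the tuple encoding of cells are hypothetical, and arbitrary feature arrays are not handled.

```python
def parse_displays(spec):
    """Parse lines of the form '<onset tick> <field>' into sorted displays.

    Each field character becomes a (kind, letter) cell:
    lowercase letter -> ("letter", ch), "_" -> ("blank", None),
    "0" -> ("mask", None).
    """
    displays = []
    for line in spec.strip().splitlines():
        onset_text, field_text = line.split()
        cells = []
        for ch in field_text:
            if ch == "_":
                cells.append(("blank", None))
            elif ch == "0":
                cells.append(("mask", None))
            elif ch.isalpha() and ch.islower():
                cells.append(("letter", ch))
            else:
                raise ValueError("unsupported display character: %r" % ch)
        displays.append((int(onset_text), cells))
    # Displays take effect in order of onset time.
    displays.sort(key=lambda d: d[0])
    return displays


trial = parse_displays("0 mav_\n12 mave\n24 0000")
```

Here `trial` holds three displays with onsets 0, 12, and 24; the fourth cell is a blank in the first display and the letter "e" in the second.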

Processing Occurring During Each Cycle

During each cycle, the values of all of the nodes are updated. The activations of letter and word nodes, which were determined on Cycle t - 1, are used to determine the activations of these nodes on Cycle t. Activations of feature nodes are updated first, so that they begin to influence letter nodes right away.

The first thing the program does on each cycle is update the input array to reflect any new changes in the display. On cycles when a new display is presented, detectors for features in letter positions in which there has been a change in the input are subject to resetting. A random number generator is used to determine whether each new feature is detected or not. When the new value of a particular feature (present or absent) is detected, the old value is erased. Probability of detection can be set to any probability (in many cases it is simply set to 1.0, so that all of the features are detected).

For each letter in each position, the program then checks the current activation value (i.e., the value computed on the previous cycle) in the activation array. If the node is active (i.e., if its activation is above threshold), its excitatory and inhibitory effects on each node at the word level are computed. To determine whether the letter in question excites or inhibits a particular word node, the program simply examines the spelling of each word to see if the letter is in the word in the appropriate position. If so, excitation is added to the word's excitatory buffer. If not, inhibition is added to the word's inhibitory buffer. The magnitudes of these effects are the product of the driving letter's activation and the appropriate rate parameters. Word-to-letter influences are computed in a similar fashion.
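The letter-to-word step just described can be sketched as follows. This is an illustrative Python rendering rather than the original C code; the function name, the dictionary representation of letter activations, and the default rate values are assumptions chosen for the example.

```python
def letter_to_word(letter_act, words, word_exc, word_inh,
                   exc_rate=0.07, inh_rate=0.04, threshold=0.0):
    """Add letter-level influences into the word-level buffers.

    letter_act[pos] maps a letter to its activation from the previous
    cycle. The rate values are placeholders, not confirmed settings.
    """
    for pos, acts in enumerate(letter_act):
        for letter, a in acts.items():
            if a <= threshold:
                continue  # only active letter nodes send output
            for w, spelling in enumerate(words):
                if spelling[pos] == letter:
                    word_exc[w] += exc_rate * a   # letter fits this word
                else:
                    word_inh[w] += inh_rate * a   # letter conflicts


words = ["work", "word", "cork"]
exc, inh = [0.0] * 3, [0.0] * 3
letter_act = [{"w": 1.0, "c": -0.1}, {}, {}, {"k": 0.5}]
letter_to_word(letter_act, words, exc, inh)
```

In this toy run the active "w" in the first position excites WORK and WORD and inhibits CORK, while the subthreshold "c" sends nothing at all.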

The next step is the computation of the word-word inhibition and the determination of the new word activation values. First, the activations of all the active word nodes are summed. The inhibitory buffer of each word node is incremented by an amount proportional to the summed activation of all other word nodes (i.e., by the product of the total word level activation minus its own activation, if it is active, times the word-word inhibition rate parameter). This completes the influences acting on the word nodes. The value in the inhibitory buffer is subtracted from the value in the excitatory buffer. The result is then subjected to floor and ceiling effects, as described in the article, to determine the net effect of the excitatory and inhibitory input. This net effect is then added to the current activation of the node, and the decay of the current value is subtracted to give a new current value, which is stored in the activation array. Finally, the excitatory and inhibitory buffers are cleared for new input on the next cycle.
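The word-level update can be sketched as below. The sketch assumes that the floor and ceiling effects scale the net input by the distance remaining to the maximum (for positive input) or to the minimum (for negative input), and that decay is proportional to the distance from a single resting level; these forms, the function name, and all default parameter values are assumptions for illustration.

```python
def update_level(act, exc, inh, within_inh=0.21, decay=0.07,
                 rest=0.0, max_act=1.0, min_act=-0.2, threshold=0.0):
    """One cycle of within-level inhibition plus activation update, in place."""
    # Sum the activations of all active nodes once.
    active_total = sum(a for a in act if a > threshold)
    new_act = []
    for i, a in enumerate(act):
        # Inhibition from all *other* active nodes at this level.
        others = active_total - (a if a > threshold else 0.0)
        inh[i] += within_inh * others
        net = exc[i] - inh[i]
        # Assumed floor/ceiling scaling of the net input.
        if net > 0:
            effect = net * (max_act - a)
        else:
            effect = net * (a - min_act)
        new_act.append(a + effect - decay * (a - rest))
    for i, value in enumerate(new_act):
        act[i] = value
        exc[i] = 0.0   # clear buffers for the next cycle
        inh[i] = 0.0


act = [0.5, 0.2, -0.1]
exc = [0.1, 0.0, 0.0]
inh = [0.0, 0.0, 0.05]
update_level(act, exc, inh)
```

New activations are collected in a separate list before being written back, so every node is updated from the values of the previous cycle.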

Next is the computation of the feature-to-letter influences. For each feature in each letter position, if that feature has been detected, the program checks each letter to see if it contains the feature. If it does, the excitatory buffer for that letter in that position is incremented. If not, the corresponding inhibitory buffer is incremented. After this, the letter-letter inhibition is added into the inhibitory buffers following a similar procedure as was used in computing the word-word inhibitory effects. (Actually, this step is skipped in the reported simulations, since the value of letter-letter inhibition has been set to zero.)
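For a single letter position, the feature-to-letter step might look like the following sketch. The three-segment "font", the function name, and the rate values are hypothetical; a detected feature is represented as a segment index paired with whether the segment was detected as present or absent.

```python
def feature_to_letter(detected, letter_segments, exc, inh,
                      exc_rate=0.005, inh_rate=0.15):
    """Add feature-level influences into one position's letter buffers.

    detected maps a segment index to True (detected present) or False
    (detected absent); undetected segments are simply missing.
    letter_segments maps each letter to the set of segments it contains.
    """
    for segment, present in detected.items():
        for letter, segments in letter_segments.items():
            # A letter "contains the feature" when its own value for
            # this segment matches the detected value.
            if (segment in segments) == present:
                exc[letter] += exc_rate
            else:
                inh[letter] += inh_rate


toy_font = {"a": {0, 1}, "b": {0, 2}}   # hypothetical 3-segment font
exc = {"a": 0.0, "b": 0.0}
inh = {"a": 0.0, "b": 0.0}
feature_to_letter({0: True, 1: True, 2: False}, toy_font, exc, inh)
```

With these inputs "a" matches all three detected features and receives only excitation, whereas "b" mismatches on two of them and accumulates inhibition from each.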

Next is the computation of the new activation values at the letter level. These are computed in just the same way as the new activation values at the word level. Finally, the effect of the current activation is added into the letter's output strength, and the excitatory and inhibitory buffers are cleared for the next cycle.
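One way to fold each cycle's activation into an output value is an exponentially weighted running average, sketched below. Whether the original program used exactly this update rule is not stated in this appendix, so the rule, the function names, and the rate value should all be read as assumptions.

```python
def update_output(avg, activation, rate=0.05):
    """Blend the current activation into a running average (assumed rule)."""
    return (1.0 - rate) * avg + rate * activation


def run_average(trace, rate=0.05):
    """Apply update_output over a sequence of per-cycle activations."""
    avg = 0.0
    history = []
    for a in trace:
        avg = update_output(avg, a, rate)
        history.append(avg)
    return history


history = run_average([1.0, 1.0, 1.0], rate=0.5)
```

A smaller `rate` makes the output value lag further behind the activation, so a brief spike in activation counts for less at readout than a sustained one.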

The order of some of the preceding steps is arbitrary. What is important is that at the end of each cycle, the activations of all the word nodes have been updated to reflect letter activations of the previous cycle and vice versa. The fact that newly detected input influences the letter detectors immediately is not meaningful, since waiting until the next cycle would just add a fixed delay to all of the activations in the system.

Output

To simulate forced-choice performance, the program must be told when to read out the results of processing at the letter level, what position is being tested, and what the two alternatives are. In fact the user actually gives the program the full target display and the full alternative display (e.g., LEAD-LOAD), and the program compares them to figure out the critical letter position and the two choice alternatives. Various options are available for monitoring readout performance of the simulation. First, it is possible to have the program print out what the result of readout would be at each time cycle. Second, the user may specify a particular cycle for readout. Third, the user may tell the program to figure out the optimal time for readout and to print both the time and the resulting percent correct performance. This option is used in preliminary runs to determine what readout time to use in the final simulation runs for each experiment.

On each cycle for which output is requested,the program computes the probability that thecorrect alternative is read out and the probabilitythat the incorrect alternative is read out, basedon their response strengths as described in thetext. Probability-correct forced choice is then sim-ply the probability that the correct alternativewas read out, plus .5 times the probability thatneither the correct nor the incorrect alternativewas read out.
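The forced-choice rule stated above can be written out as a short sketch. It assumes that response strengths are an exponential function of each letter's output value and that readout probabilities follow a Luce (1959) choice ratio over the letters in the tested position; the function names and the scaling constant `mu` are assumptions for the example.

```python
import math


def readout_probabilities(output_values, mu=10.0):
    """Convert per-letter output values into readout probabilities."""
    strengths = {l: math.exp(mu * v) for l, v in output_values.items()}
    total = sum(strengths.values())
    return {l: s / total for l, s in strengths.items()}


def forced_choice(output_values, correct, incorrect, mu=10.0):
    """P(correct) = P(correct read out) + .5 * P(neither read out)."""
    p = readout_probabilities(output_values, mu)
    p_neither = 1.0 - p[correct] - p[incorrect]
    return p[correct] + 0.5 * p_neither


# Four equally strong candidates: the choice is at chance.
chance = forced_choice({"e": 0.0, "o": 0.0, "x": 0.0, "q": 0.0}, "e", "o")
```

When some third letter is read out, the simulated subject is treated as guessing between the two alternatives, which is where the factor of .5 comes from.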

Observation and Manipulation

It is possible to examine the activation of any node at the end of each cycle. A few useful summaries are also available, such as the number of active word nodes and the sum of their activations, the number of active letter nodes in each position, and so on. It is also possible to alter any of the parameters of the model between cycles or to change a parameter and then start again at Time 0 in order to compare the response of the model under different parameter values.

Running a Simulation

When simulating an experiment with a number of different trials (i.e., a number of different stimulus items in each experimental condition), the information the computer needs about the input and the forced-choice alternatives can be specified in a file, with one line containing all of the necessary information for each trial of the simulation. Typically a few test runs are carried out to choose an optimal exposure duration and readout time. Then the simulation is run with a single specified readout time for each display condition (when different display types are mixed within the same block of trials in the experiment being simulated, a single readout time is used for all display conditions). Note that when the probability of feature detection is set to 1.0, the model is completely deterministic. That is, it computes readout and forced-choice probabilities on the basis of response strengths. These are determined completely by the knowledge stored in the system (e.g., what the system knows about the appearance of the letters and the spellings of the words), by the set of features extracted, and by the values of the various parameters.

Received March 11, 1981
