Understanding Normal and Impaired Word Reading: Computational Principles in Quasi-Regular Domains

David C. Plaut, Carnegie Mellon University and the Center for the Neural Basis of Cognition
James L. McClelland, Carnegie Mellon University and the Center for the Neural Basis of Cognition
Mark S. Seidenberg, Neuroscience Program, University of Southern California
Karalyn Patterson, MRC Applied Psychology Unit

To appear in Psychological Review, 1996.

Abstract

We develop a connectionist approach to processing in quasi-regular domains, as exemplified by English word reading. A consideration of the shortcomings of a previous implementation (Seidenberg & McClelland, 1989, Psych. Rev.) in reading nonwords leads to the development of orthographic and phonological representations that capture better the relevant structure among the written and spoken forms of words. In a number of simulation experiments, networks using the new representations learn to read both regular and exception words, including low-frequency exception words, and yet are still able to read pronounceable nonwords as well as skilled readers. A mathematical analysis of the effects of word frequency and spelling-sound consistency in a related but simpler system serves to clarify the close relationship of these factors in influencing naming latencies. These insights are verified in subsequent simulations, including an attractor network that reproduces the naming latency data directly in its time to settle on a response. Further analyses of the network’s ability to reproduce data on impaired reading in surface dyslexia support a view of the reading system that incorporates a graded division-of-labor between semantic and phonological processes. Such a view is consistent with the more general Seidenberg and McClelland framework and has some similarities with—but also important differences from—the standard dual-route account.
Many aspects of language can be characterized as quasi-regular—the relationship between inputs and outputs is systematic but admits many exceptions. One such task is the mapping between the written and spoken forms of English words. Most words are regular (e.g., GAVE, MINT) in that their pronunciations adhere to standard spelling-sound correspondences. There are, however, many irregular or exception words (e.g., HAVE, PINT) whose pronunciations violate the standard correspondences. To make matters worse, some spelling patterns have a range of pronunciations with none clearly predominating (e.g., OWN in DOWN, TOWN, BROWN, CROWN vs. KNOWN, SHOWN, GROWN, THROWN, or OUGH in COUGH, ROUGH, BOUGH, THOUGH, THROUGH). Nonetheless, in the face of this complexity, skilled readers pronounce written words quickly and accurately, and can also use their knowledge of spelling-sound correspondences to read pronounceable nonwords (e.g., MAVE, RINT). An important debate within cognitive psychology is how best to characterize knowledge and processing in quasi-regular domains in order to account for human language performance.

This research was supported financially by the National Institute of Mental Health (Grants MH47566 and MH00385), the National Institute on Aging (Grant AG10109), the National Science Foundation (Grant ASC-9109215), and the McDonnell-Pew Program in Cognitive Neuroscience (Grant T89-01245-016). We thank Marlene Behrmann, Derek Besner, Max Coltheart, Joe Devlin, Geoff Hinton, and Eamon Strain for helpful discussions and comments. We also acknowledge Derek Besner, Max Coltheart, and Michael McCloskey for directing attention to many of the issues addressed in this paper. Correspondence concerning this paper should be sent to Dr. David C. Plaut, Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213-3890, [email protected].
One view (e.g., Pinker, 1984, 1991) is that the systematic as- pects of language are represented and processed in the form of an explicit set of rules. A rule-based approach has considerable intuitive appeal because much of human language behavior can be characterized at a broad scale in terms of rules. It also pro- vides a straightforward account of how language knowledge can be applied productively to novel items (Fodor & Pylyshyn, 1988). However, as illustrated above, most domains are only partially systematic; accordingly, a separate mechanism is re- quired to handle the exceptions. This distinction between a rule-based mechanism and an exception mechanism, each op- erating according to fundamentally different principles, forms the central tenet of so-called “dual-route” theories of language. An alternative view comes out of research on connectionist or parallel distributed processing networks, in which computa- tion takes the form of cooperative and competitive interactions among large numbers of simple, neuron-like processing units (McClelland, Rumelhart, & the PDP research group, 1986; Rumelhart, McClelland, & the PDP research group, 1986). Such systems learn by adjusting weights on connections be- tween units in a way that is sensitive to how the statistical structure of the environment influences the behavior of the net- work. As a result, there is no sharp dichotomy between the items that obey the rules and the items that do not. Rather, all items coexist within a single system whose representations and processing reflect the relative degree of consistency in the mappings for different items. The connectionist approach is particularly appropriate for capturing the rapid, online nature of language use, as well as for specifying how such processes might be learned and implemented in the brain (although still at


a somewhat abstract level; see Sejnowski, Koch, & Churchland, 1989, for discussion). Perhaps more fundamentally, connectionist modeling provides a rich set of general computational principles that can lead to new and useful ways of thinking about human performance in quasi-regular domains.
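The claim above that such systems learn by adjusting connection weights in response to the statistical structure of the environment can be illustrated with a minimal delta-rule update. This is a toy sketch of one standard connectionist learning rule, not the training procedure used in the simulations reported here:

```python
# Toy sketch of error-driven weight adjustment (the delta rule):
# each weight is nudged in the direction that reduces the mismatch
# between the unit's output and its target, so frequently trained,
# consistent patterns come to dominate the weights.

def delta_rule(weights, inputs, target, rate=0.5):
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + rate * error * x for w, x in zip(weights, inputs)]

w = [0.0, 0.0]
for _ in range(20):                       # repeated exposure to one pattern
    w = delta_rule(w, [1.0, 0.0], target=1.0)
# w[0] approaches 1.0, the weight needed to map this input to its target;
# w[1] stays at 0.0 because its input unit is never active.
```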

Much of the initial debate between these two views of the language system focused on the relatively constrained domain of English inflectional morphology—specifically, forming the past tense of verbs. Past-tense formation is a rather simple quasi-regular task: there is a single regular “rule” (add -ed; e.g., WALK → “walked”) and only about 100 exceptions, grouped into several clusters of similar items that undergo a similar change (e.g., SING → “sang”, DRINK → “drank”) along with a very small number of very high-frequency, arbitrary forms (e.g., GO → “went”; Bybee & Slobin, 1982). Rumelhart and McClelland (1986) attempted to reformulate the issue away from a sharp dichotomy between explicit rules and exceptions, and toward a view that emphasizes the graded structure relating verbs and their inflections. They developed a connectionist model that learned a direct association between the phonology of all types of verb stems and the phonology of their past-tense forms. Pinker and Prince (1988) and Lachter and Bever (1988), however, pointed out numerous deficiencies in the model’s actual performance and in some of its specific assumptions, and argued more generally that the applicability of connectionist mechanisms in language is fundamentally limited (also see Fodor & Pylyshyn, 1988). However, many of the specific limitations of the Rumelhart and McClelland model have been addressed in subsequent simulation work (Cottrell & Plunkett, 1991; Daugherty & Seidenberg, 1992; Hoeffner, 1992; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991, 1993). Thus, the possibility remains strong that a connectionist model could provide a full account of past-tense inflection. Furthermore, some recent applications to aspects of language disorders (Hoeffner & McClelland, 1993; Marchman, 1993) and language change (Hare & Elman, 1992, in press) demonstrate the ongoing extension of the approach to account for a wider range of language phenomena.
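The rule-plus-exceptions view that frames this debate can be caricatured in a few lines. This is a deliberately simplified sketch of the dual-mechanism idea; the exception list and the bare "add -ed" rule are only illustrative:

```python
# Caricature of a dual-mechanism account of past-tense formation:
# a lookup table handles stored exceptions, and a single rule
# (add -ed) applies to everything else. (Illustrative only; a real
# account would also handle spelling changes such as "bake" -> "baked".)

EXCEPTIONS = {"sing": "sang", "drink": "drank", "go": "went"}

def past_tense(stem):
    # Exception mechanism: consult stored irregular forms first.
    if stem in EXCEPTIONS:
        return EXCEPTIONS[stem]
    # Rule mechanism: apply the regular inflection.
    return stem + "ed"

past_tense("walk")   # "walked"
past_tense("sing")   # "sang"
```

The connectionist alternative described above replaces exactly this dichotomy of a table plus a rule with a single set of weights.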

Very similar issues arise in the domain of oral reading, where there is a much richer empirical database with which to make contact. As in the domain of inflectional morphology, many researchers assume that accounting for the wealth of existing data on both normal and impaired word reading requires postulating multiple mechanisms. In particular, dual-route theorists (e.g., Besner & Smith, 1992; Coltheart, 1978, 1985; Coltheart, Curtis, Atkins, & Haller, 1993; Coltheart & Rastle, 1994; Marshall & Newcombe, 1973; Meyer, Schvaneveldt, & Ruddy, 1974; Morton & Patterson, 1980; Paap & Noel, 1991) have claimed that pronouncing exception words requires a lexical lookup mechanism that is separate from the sublexical spelling-sound correspondence rules that apply to regular words and nonwords (also see Humphreys & Evett, 1985, and the accompanying commentaries for discussion of the properties of dual-route theories). The separation of lexical and sublexical procedures is motivated primarily by evidence that they can be independently impaired, either by abnormal reading acquisition (developmental dyslexia) or by brain damage in a previously literate adult (acquired dyslexia). Thus, phonological dyslexics, who can read words but not nonwords, appear to have a selective impairment of the sublexical procedure, whereas surface dyslexics, who can read nonwords but who “regularize” exception words (e.g., SEW → “sue”), appear to have a selective impairment of the lexical procedure.

Seidenberg and McClelland (1989, hereafter SM89) challenged the central claim of dual-route theories by developing a connectionist simulation that learned to map representations of the written forms of words (orthography) to representations of their spoken forms (phonology). The network successfully pronounces both regular and exception words and yet is not an implementation of two separate mechanisms (see Seidenberg & McClelland, 1992, for a demonstration of this last point). The simulation was put forward in support of a more general framework for lexical processing in which orthographic, phonological, and semantic information interact in gradually settling on the best representations for a given input (see Stone & Van Orden, 1989, 1994; Van Orden & Goldinger, 1994; Van Orden, Pennington, & Stone, 1990, for a similar perspective on word reading). A major strength of the approach is that it provides a natural account of the graded effects of spelling-sound consistency among words (Glushko, 1979; Jared, McRae, & Seidenberg, 1990) and how this consistency interacts with word frequency (Andrews, 1982; Seidenberg, 1985; Seidenberg, Waters, Barnes, & Tanenhaus, 1984; Taraban & McClelland, 1987; Waters & Seidenberg, 1985).1

Furthermore, SM89 demonstrated that undertrained versions of the model exhibit some aspects of developmental surface dyslexia, and Patterson (1990; Patterson, Seidenberg, & McClelland, 1989) showed how damaging the normal model can reproduce some aspects of acquired surface dyslexia. The SM89 model also contributes to the broader enterprise of connectionist modeling of cognitive processes, in which a common set of general computational principles are being applied successfully across a wide range of cognitive domains.

However, the SM89 work has a serious empirical limitation that undermines its role in establishing a viable connectionist alternative to dual-route theories of word reading in particular, and in providing a satisfactory formulation of the nature of knowledge and processing in quasi-regular domains more generally. Specifically, the implemented model is significantly worse than skilled readers at pronouncing nonwords (Besner, Twilley, McCann, & Seergobin, 1990). This limitation has broad implications for the range of empirical phenomena that can be accounted for by the model (Coltheart et al., 1993). Poor nonword reading is exactly what would be predicted from the dual-route claim that no single system—connectionist or otherwise—can read both exception words and pronounceable nonwords adequately. Under this interpretation, the model had simply approximated a lexical look-up procedure: it could read both regular and exception words, but had not separately mastered the sublexical rules necessary to read nonwords. An alternative interpretation, however, is that the empirical shortcomings of the SM89 simulation stem from specific aspects of its design and not from inherent limitations on the abilities of connectionist networks in quasi-regular domains. In particular, Seidenberg and McClelland (1990) suggested that the model’s nonword reading might be improved—without adversely affecting its other properties—by using either a larger training corpus or different orthographic and phonological representations.

1 The findings of these studies have often been cast as effects of regularity rather than consistency—we will address this distinction in the next section.

A second limitation of the SM89 work is that it did not provide a very extensive examination of underlying theoretical issues. SM89’s main emphasis was on demonstrating that a network which operated according to fairly general connectionist principles could account for a wide range of empirical findings on normal and developmentally-impaired reading. Relatively little attention was paid in that paper to articulating the general principles themselves or to evaluating their relative importance. Thus, much of the underlying theoretical foundation of the work remained implicit. Despite subsequent efforts in explicating these principles (Seidenberg, 1993), there remains considerable confusion with regard to the role of connectionist modeling in contributing to a theory of word reading (or of any other cognitive process). Thus, some researchers (e.g., Forster, 1994; McCloskey, 1991) have claimed that the SM89 demonstration, while impressive in its own right, has not extended our understanding of word reading because the operation of the model itself—and of connectionist networks more generally—is too complex to understand. Consequently, “connectionist networks should not be viewed as theories of human cognitive functions, or as simulations of theories, or even as demonstrations of specific theoretical points” (McCloskey, 1991, p. 387; also see Massaro, 1988; Olsen & Caramazza, 1991). Although we reject the claim that connectionist modeling is atheoretical (see Seidenberg, 1993), and that there are no bases for analyzing and understanding networks (see, e.g., Hanson & Burr, 1990), we agree that the theoretical principles and constructs for developing connectionist explanations of empirical phenomena are in need of further elaboration.

The current work develops a connectionist account of knowledge representation and cognitive processing in quasi-regular domains, in the specific context of normal and impaired word reading. The work draws on an analysis of the strengths and weaknesses of the SM89 work, with the dual aim of providing a more adequate account of the relevant empirical phenomena, and of articulating in a more explicit and formal manner the theoretical principles that underlie the approach. We explore the use of alternative representations that make the regularities between written and spoken words more explicit. In the first simulation experiment, a network using the new representations learns to read both regular and exception words, including low-frequency exception words, and yet is still able to read pronounceable nonwords as well as skilled readers. The results open up the range of possible architectures that might plausibly underlie human word reading. A mathematical analysis of the effects of word frequency and spelling-sound consistency in a simpler but related system serves to clarify the close relationship of these factors in influencing naming latencies. These insights are verified in a second simulation. Simulation 3 develops an attractor network that reproduces the naming latency data directly in its time to settle on a response, obviating the need to use error as a proxy for reaction time. The implication of the semantic contribution to reading is considered in the fourth and final simulation, in the context of accounting for the impaired reading behavior of acquired surface dyslexic patients with brain damage. Damage to the attractor network provides only a limited account of the relevant phenomena; a better account is provided by the performance of a network that learns to map orthography to phonology in the context of support from semantics. The findings lead to a view of the reading system that incorporates a graded division-of-labor between semantic and phonological processes. Such a view is consistent with the more general SM89 framework and has some similarities with—but also important differences from—the standard dual-route account. The General Discussion articulates these differences, and clarifies the implications of the current work for a broader range of empirical findings, including those raised by Coltheart et al. (1993) as challenges to the connectionist approach.
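The idea of reading naming latency directly off a network's settling time can be sketched as follows. This is our own minimal illustration with arbitrary weights and a plain sigmoid update, not the attractor architecture of Simulation 3:

```python
import numpy as np

# Sketch of settling time as a latency proxy: update unit activations
# until they stop changing, and take the number of updates as the
# model's "naming latency". (Toy illustration only.)

def settle(weights, inputs, tol=1e-4, max_steps=1000):
    act = np.zeros(len(inputs))
    for step in range(1, max_steps + 1):
        # Each unit applies a sigmoid to its net input from the other
        # units plus the external input.
        new_act = 1.0 / (1.0 + np.exp(-(weights @ act + inputs)))
        if np.max(np.abs(new_act - act)) < tol:   # network has settled
            return new_act, step
        act = new_act
    return act, max_steps
```

Easy items would reach a stable pattern in few updates; inconsistent or low-frequency items, with weaker or conflicting weights, would take longer.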

We begin with a brief critique of the SM89 model, in which we try to distinguish its central computational properties from less central aspects of its design. An analysis of its representations leads to the design of new representations that are employed in a series of simulations analogous to the SM89 simulation.

The Seidenberg and McClelland Model

The General Framework

Seidenberg and McClelland’s (1989) general framework for lexical processing is shown in Figure 1. Orthographic, phonological, and semantic information is represented in terms of distributed patterns of activity over separate groups of simple neuron-like processing units. Within each domain, similar words are represented by similar patterns of activity. Lexical tasks involve transformations between these representations—for example, oral reading requires the orthographic pattern for a word to generate the appropriate phonological pattern. Such transformations are accomplished via the cooperative and competitive interactions among units, including additional hidden units that mediate between the orthographic, phonological, and semantic units. Unit interactions are governed by weighted connections between them, which collectively encode the system’s knowledge about how the different types of information are related. The specific values of the weights are derived by an automatic learning procedure on the basis of the system’s exposure to written words, spoken words, and their meanings.

[Figure 1. Seidenberg and McClelland’s (1989) general framework for lexical processing, with groups of units for Orthography, Phonology, Meaning, and Context, and MAKE → /mAk/ as an example orthography-to-phonology mapping. Each oval represents a group of units and each arrow represents a group of connections. The implemented model is shown in bold. (Adapted from Seidenberg & McClelland, 1989, p. 526)]
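As a toy illustration of the kind of computation such weighted connections support, a forward pass from an orthographic pattern through hidden units to a phonological pattern might look like this. The unit counts and random weights are arbitrary placeholders; the actual model's representations and trained weights are described later in the paper:

```python
import numpy as np

# Toy sketch of a hidden-unit-mediated mapping from a distributed
# orthographic pattern to a distributed phonological pattern.
# Sizes and weights here are arbitrary, untrained placeholders.

rng = np.random.default_rng(0)
n_orth, n_hidden, n_phon = 8, 4, 6
W_oh = rng.normal(0.0, 0.1, (n_hidden, n_orth))   # orthography -> hidden
W_hp = rng.normal(0.0, 0.1, (n_phon, n_hidden))   # hidden -> phonology

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def read_aloud(orthography):
    hidden = sigmoid(W_oh @ orthography)   # intermediate representation
    return sigmoid(W_hp @ hidden)          # phonological activity pattern

phon = read_aloud(np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=float))
```

All of the network's spelling-sound knowledge lives in W_oh and W_hp; there is no separate word list or rule table.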

The SM89 framework is broadly consistent with a more general view of information processing that has been articulated by McClelland (1991, 1993) in the context of GRAIN networks. These networks embody the following general computational principles:

• Graded: Propagation of activation is not all-or-none but rather builds up gradually over time.

• Random: Unit activations are subject to intrinsic stochastic variability.

• Adaptive: The system gradually improves its performance by adjusting weights on connections between units.

• Interactive: Information flows in a bidirectional manner between groups of units, allowing their activity levels to constrain each other and be mutually consistent.

• Nonlinear: Unit outputs are smooth, nonlinear functions of their total inputs, significantly extending the computational power of the entire network beyond that of purely linear networks.

The acronym GRAIN is also intended to convey the notion that cognitive processes are expressed at a finer grain of analysis, in terms of interacting groups of neuron-like units, than is typical of most “box-and-arrow” information processing models. Further computational principles that are central to the SM89 framework but not captured by the acronym are:

• Distributed Representations: Items in the domain are represented by patterns of activity over groups of units that participate in representing many other items.

• Distributed Knowledge: Knowledge about the relationship between items is encoded across large numbers of connection weights that also encode many other mappings.
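The "graded" and "nonlinear" principles can be illustrated together with a single-unit update in which activation builds gradually toward a smooth, sigmoidal asymptote. This is an illustrative toy with an arbitrary time constant, not a unit from any of the simulations:

```python
import math

# One unit obeying the "graded" and "nonlinear" principles:
# on each time step, activation moves only a fraction of the way
# toward the value its net input supports, rather than switching
# all-or-none. (Toy illustration; rate is arbitrary.)

def update(act, net_input, rate=0.2):
    target = 1.0 / (1.0 + math.exp(-net_input))  # nonlinear asymptote
    return act + rate * (target - act)           # graded approach to it

act = 0.0
trajectory = []
for _ in range(10):
    act = update(act, net_input=2.0)
    trajectory.append(act)
# activation rises monotonically toward sigmoid(2.0), about 0.88
```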

Much of the controversy surrounding the SM89 framework, and the associated implementation, stems from the fact that it breaks with traditional accounts of lexical processing (e.g., Coltheart, 1985; Morton & Patterson, 1980) in two fundamental ways. The first is in the representational status of words. Traditional accounts assume that words are represented in the structure of the reading system—in its architecture. Morton’s (1969) “logogens” are well-known instances of this type of word representation. By contrast, within the SM89 framework the lexical status of a string of letters or phonemes is not reflected in the structure of the reading system. Rather, words are distinguished from nonwords only by functional properties of the system—the way in which particular orthographic, phonological, and semantic patterns of activity interact (also see Van Orden et al., 1990).

The SM89 framework’s second major break with tradition concerns the degree of uniformity in the mechanism(s) by which orthographic, phonological, and semantic representations interact. Traditional accounts assume that pronouncing exception words and nonwords requires separate lexical and sublexical mechanisms, respectively. By contrast, the SM89 framework employs far more homogeneous processes in oral reading. In particular, it eschews separate mechanisms for pronouncing nonwords and exception words. Rather, all of the system’s knowledge of spelling-sound correspondences is brought to bear in pronouncing all types of letter strings. Conflicts among possible alternative pronunciations of a letter string are resolved, not by structurally distinct mechanisms, but by cooperative and competitive interactions based on how the letter string relates to all known words and their pronunciations. Furthermore, the semantic representation of a word participates in oral reading in exactly the same manner as do its orthographic and phonological representations, although the framework leaves open the issue of how important these semantic influences are in skilled oral reading.

Regularity versus Consistency. An issue that is intimately related to the tension between the SM89 framework and traditional dual-route theories concerns the distinction between regularity and consistency. Broadly speaking, a word is regular if its pronunciation can be generated “by rule” and it is consistent if its pronunciation agrees with those of similarly spelt words.

Of course, to be useful these definitions must be operationalized in more specific terms. The most commonly proposed pronunciation rules are based on the most frequent grapheme-phoneme correspondences in the language, although such GPC rules must be augmented with considerable context-sensitivity to operate adequately (see Coltheart et al., 1993; Seidenberg, Plaut, Petersen, McClelland, & McRae, 1994, for discussion). Consistency, on the other hand, has typically been defined with respect to the orthographic body and the phonological rime (i.e., the vowel plus any following consonants). This choice can be partly justified on the grounds of empirical data: for example, Treiman, Mullennix, Bijeljac-Babic, and Richmond-Welty (in press) have recently demonstrated that, in naming data for all 1329 monosyllabic words in English with a CVC pronunciation, the consistency of the body (VC) accounts for significantly more variance in naming latency than the consistency of the onset plus vowel (CV). There are also pragmatic reasons for restricting consideration to body-level consistency—bodies constitute a manageable manipulation in designing experimental lists. If experimenters had to consider consistency across orthographic neighborhoods at all possible levels, from individual graphemes up to the largest sub-word sized chunks, their selection of stimulus words would be an even more agonizing process than it already is. Nonetheless, the general notion of consistency is broader than a specific instantiation in terms of body consistency, just as the general notion of regularity is broader than that defined by any particular set of spelling-sound correspondence rules.
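The onset/body decomposition just described can be approximated with a simple heuristic: the orthographic body is the first vowel letter plus everything after it. This is for illustration only; it ignores complications such as vocalic Y and multi-letter graphemes:

```python
# Rough heuristic for splitting a monosyllable into its onset and
# orthographic body (first vowel letter onward). Illustrative only:
# real stimuli require a more careful treatment of English spelling.

VOWELS = set("aeiou")

def split_onset_body(word):
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[:i], word[i:]   # (onset, body)
    return word, ""                     # no vowel letter found

split_onset_body("mint")   # ("m", "int")
split_onset_body("brown")  # ("br", "own")
```

On this decomposition, MINT and PINT share the body INT but disagree on its pronunciation, which is exactly the sense in which they are inconsistent neighbors.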

Based on the frequent observation (e.g., Coltheart, 1978; Parkin, 1982; Waters & Seidenberg, 1985) that words with regular or typical spelling-sound correspondences (such as MINT) produce shorter naming latencies and lower error rates than words with exceptional correspondences (such as PINT), regularity was originally considered to be the critical variable. In 1979, however, Glushko argued that consistency provided a better account of empirical results. Although MINT may be a regular word according to GPC rules, its spelling-sound relationship is inconsistent with that of its orthographic neighbor, PINT. To the extent that the process of computing phonology from orthography is sensitive to the characteristics of the neighborhood, performance on a regular but inconsistent word like MINT may also be adversely affected. Glushko (1979) did indeed demonstrate longer naming latencies for regular inconsistent words than for regular words from consistent body neighborhoods, though this result was not always obtained in subsequent experiments (e.g., Stanhope & Parkin, 1987).

In 1990, Jared, McRae, and Seidenberg offered a more sophisticated hypothesis that captures aspects of results not handled by previous accounts referring solely to either regularity or consistency. According to Jared and colleagues, the magnitude of the consistency effect for a given word depends on the summed frequency of that word’s friends (words with a similar spelling pattern and similar pronunciation) and of its enemies (words with a similar spelling pattern but a discrepant pronunciation). For example, an inconsistent word like MINT has a number of friends (e.g., LINT, TINT, PRINT, etc.) and just a single enemy, PINT. Against the strength of friends, the single enemy cannot exert a marked influence (especially when, as is true of PINT, the enemy is of relatively low frequency); its negative impact on computing the pronunciation of MINT will thus be small and perhaps undetectable. By contrast, an inconsistent word like GOWN, with many enemies (e.g., BLOWN, SHOWN, GROWN, etc.) as well as friends (e.g., DOWN, BROWN, TOWN), gives rise to a more substantial effect. Such words, with roughly balanced support from friends and enemies, have been termed ambiguous (with respect to the pronunciation of their body; Backman, Bruck, Hebert, & Seidenberg, 1984; Seidenberg et al., 1984).
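Jared et al.'s frequency-weighted notion of consistency can be sketched as a simple ratio of summed friend frequency to total neighborhood frequency. The word list, frequencies, and pronunciation labels below are invented purely for illustration:

```python
# Sketch of frequency-weighted body consistency: compare the summed
# frequency of friends (neighbors sharing the body's pronunciation)
# against that of enemies (neighbors with a discrepant pronunciation).
# All frequencies and pronunciation labels below are made up.

def consistency_ratio(target_pron, neighbors):
    # neighbors: {word: (pronunciation_of_body, frequency)}
    friends = sum(f for p, f in neighbors.values() if p == target_pron)
    enemies = sum(f for p, f in neighbors.values() if p != target_pron)
    return friends / (friends + enemies)

mint_neighbors = {"lint": ("/int/", 10), "tint": ("/int/", 12),
                  "print": ("/int/", 90), "pint": ("/Int/", 5)}
consistency_ratio("/int/", mint_neighbors)   # high: lone low-frequency enemy
```

A word like GOWN, with high-frequency enemies as well as friends, would yield a ratio near 0.5, matching the intuition that such "ambiguous" items show the largest consistency effects.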

The commonly observed effect of regularity also finds a natural explanation within Jared et al.’s (1990) account, because most regular words (as defined by GPC rules) have many friends and few if any enemies, whereas words with irregular spelling-sound correspondences (such as PINT or SEW) typically have many enemies and few if any friends. Given this correspondence, and following Glushko (1979) and Taraban and McClelland (1987), we will refer to words with many enemies and few if any friends as exception words, acknowledging that this definition excludes many words that would be considered exceptional according to GPC rules (e.g., many ambiguous words). Jared et al.’s hypothesis and supporting data also mesh well with other results demonstrating the inadequacy of a simple regular/irregular dichotomy, such as the “degrees of regularity” effect observed in acquired surface dyslexia (Shallice, Warrington, & McCarthy, 1983; also see Plaut, Behrmann, Patterson, & McClelland, 1993, for more direct evidence of consistency effects in surface dyslexia).

It must be kept in mind, however, that a definition of consistency based solely on body neighborhoods, even if frequency-weighted, can provide only a partial account of the consistency effects that would be expected to operate over the full range of spelling-sound correspondences. Thus, for example, the word CHEF could not be considered inconsistent on a body-level analysis, as all the words in English with the body EF (i.e., CLEF, REF) agree with its pronunciation. On a broader definition of consistency, however, CHEF is certainly inconsistent, since the overwhelmingly most common pronunciation of CH in English is the one appropriate to CHIEF, not CHEF. This broad view of consistency is also important when considering what might be called irregular consistent words—that is, words such as KIND, BOLD, and TOOK that have highly consistent body neighborhoods but that are nonetheless irregular according to GPC rules such as those of Coltheart et al. (1993). The processing of such items would be expected to be sensitive to the conflict between consistency at the body-rime level and inconsistency at the grapheme-phoneme level. In all of what follows, therefore, although we will adopt the standard practice of using body-level manipulations for empirical tests, this should be interpreted as providing only an approximation of the true range of consistency effects.

Relationship to Other Approaches. A cursory inspection of Figure 1 might suggest that the SM89 framework is, in fact, a dual-route system: orthography can influence phonology either directly or via semantics. To clarify this possible source of confusion, we must be more explicit about typical assumptions in dual-route theories concerning the structure and operation of the different procedures. As described earlier, the central distinction in such theories is between lexical and sublexical procedures. The sublexical procedure applies GPC rules to produce correct pronunciations for regular words, reasonable pronunciations for nonwords, and incorrect, "regularized" pronunciations for exception words. The lexical procedure produces correct pronunciations for all words, and no response for nonwords. When the outputs of the two procedures conflict, as they do for exception words, some models (e.g., Paap & Noel, 1991) assume a "horse race" with the faster (typically lexical) procedure generating the actual response. Others (e.g., Monsell, Patterson, Graham, Hughes, & Milroy, 1992) suggest that output from the two procedures is pooled until a phonological representation sufficient to drive articulation is achieved (although the specific means by which this pooling occurs is rarely made explicit). The lexical procedure is often subdivided into a direct route that maps orthographic word representations directly onto phonological word representations, and an indirect route that maps via semantics. In these formulations, the "dual-route" model is in a sense a three-route model, although researchers typically assume that the indirect, semantic route would be too slow to influence skilled word pronunciation (Coltheart, 1985; Patterson & Morton, 1985).

By contrast, the nonsemantic portion of the SM89 framework does not operate by applying GPC rules, but by the simultaneous interaction of units. It is also capable of pronouncing all types of input, including exception words, although the time it takes to do so depends on the type of input. Furthermore, the semantic portion of the framework does not operate in terms of whole-word representations, but rather in terms of interacting units, each of which participates in the processing of many words. In addition, nonwords may engage semantics to some degree, although the extent to which this occurs is likely to be minimal (see the discussion of lexical decision in the General Discussion). Thus, the structure and operation of the SM89 framework are fundamentally different from existing dual-route theories.

It may also help to clarify the relationship between the SM89 framework and approaches to word reading other than dual-route theories. The two main alternatives are lexical-analogy theories and multiple-levels theories. Lexical-analogy theories (Henderson, 1982; Marcel, 1980) dispense with the sublexical procedure, and propose that the lexical procedure can pronounce nonwords by synthesizing the pronunciations of orthographically similar words. Unfortunately, the way in which these pronunciations are generated and synthesized is rarely fully specified. Multiple-levels theories (Shallice & McCarthy, 1985; Shallice et al., 1983) dispense with the (direct) lexical route (or rather, incorporate it into the sublexical route) by assuming that spelling-sound correspondences are represented for segments of all sizes, ranging from single graphemes and phonemes to word bodies and entire morphemes.

In a way, the SM89 framework can be thought of as an integration and more detailed specification of lexical-analogy and multiple-levels theories (also see Norris, 1994, for a connectionist implementation of the latter). The pronunciations of nonwords are generated on the basis of the combined influence of all known word pronunciations, with those most similar to the nonword having the strongest effect. In order for the system to pronounce exception words as well as nonwords, the hidden units must learn to be sensitive to spelling-sound correspondences of a range of sizes. The framework is also broadly consistent with Van Orden et al.'s (1990) proposal that orthography and phonology are strongly associated via covariant learning, although the SM89 framework incorporates direct interaction between orthography and semantics, which Van Orden and colleagues dispute.

The Implemented Model

The SM89 framework clearly represents a radical departure from widely held assumptions about lexical processing, but is it plausible as an account of human word reading? In the service of establishing the framework's plausibility, SM89 implemented a specific connectionist network that, they implicitly claimed, embodies the central theoretical tenets of the framework.

The network, highlighted in bold in Figure 1, contains three groups of units: 400 orthographic units, 200 hidden units, and 460 phonological units. The hidden units receive connections from all of the orthographic units and, in turn, send connections to all of the phonological units as well as back to all of the orthographic units. The network contains no semantic or context information.

Orthographic and phonological forms are represented as patterns of activity over the orthographic and phonological units, respectively. These patterns are defined in terms of context-sensitive triples of letters and phonemes (Wickelgren, 1969). It was computationally infeasible for SM89 to include a unit for each possible triple, so they used representations that require fewer units but preserve the relative similarities among patterns. In orthography, the letter triples to which each unit responds are defined by a table of 10 randomly selected letters (or a blank) in each of three positions. In the representation of a letter string, an orthographic unit is active if the string contains one of the letter triples that can be generated by sampling from each of the three positions of that unit's table. For example, GAVE would activate all orthographic units capable of generating _GA, GAV, AVE, or VE_ (where "_" denotes a blank at a word boundary).

Phonological representations are derived in an analogous fashion, except that a phonological unit's table entries at each position are not randomly selected phonemes, but rather all phonemes containing a particular phonemic feature (as defined by Rumelhart & McClelland, 1986). A further constraint is that the features for the first and third positions must come from the same phonetic dimension (e.g., place of articulation). Thus, each unit in phonology represents a particular ordered triple of phonemic features, termed a Wickelfeature. For example, the pronunciation /gAv/ would activate phonological units representing the Wickelfeatures [back, vowel, front], [stop, long, fricative], and many others (given that /g/ has back and stop among its features, /A/ has vowel and long, and /v/ has front and fricative). On average, a word activates 81 (20.3%) of the 400 orthographic units, and 54 (11.7%) of the 460 phonological units. We will return to an analysis of the properties of these representations after summarizing the SM89 simulation results.
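The Wickelfeature construction can be sketched in a few lines. The feature assignments below are a tiny made-up subset of the Rumelhart and McClelland (1986) feature set, chosen only to reproduce the /gAv/ example from the text.

```python
# Sketch of Wickelfeature generation. FEATURES is an illustrative
# assumption: each phoneme gets one feature per phonetic dimension.
from itertools import product

FEATURES = {  # phoneme -> {dimension: feature}
    "g": {"place": "back",     "manner": "stop"},
    "A": {"place": "vowel",    "manner": "long"},
    "v": {"place": "front",    "manner": "fricative"},
    "_": {"place": "boundary", "manner": "boundary"},
}

def wickelfeatures(pron):
    """All (f1, f2, f3) triples for each phoneme triple, where f1 and f3
    are drawn from the same phonetic dimension (the SM89 constraint)."""
    padded = "_" + pron + "_"
    out = set()
    for i in range(len(padded) - 2):
        p1, p2, p3 = padded[i], padded[i+1], padded[i+2]
        for dim_outer, dim_mid in product(FEATURES[p1], FEATURES[p2]):
            out.add((FEATURES[p1][dim_outer],
                     FEATURES[p2][dim_mid],
                     FEATURES[p3][dim_outer]))
    return out

wf = wickelfeatures("gAv")
```

With these toy features, /gAv/ activates both [back, vowel, front] and [stop, long, fricative], matching the examples in the text.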

The weights on connections between units were initialized to small random values. The network then was repeatedly presented with the orthography of each of 2897 monosyllabic words, and trained both to generate the phonology of the word and to regenerate its orthography (see Seidenberg & McClelland, 1989, for details). During each sweep through the training set, the probability that a word was presented to the network was proportional to a logarithmic function of its frequency (Kucera & Francis, 1967). Processing a word involved setting the states of the orthographic units (as defined above), computing hidden unit states based on states of the orthographic units and the weights on connections from them, and then computing states of the phonological and orthographic units based on those of the hidden units. Back-propagation (Rumelhart, Hinton, & Williams, 1986a, 1986b) was used to calculate how to adjust the weights to reduce the differences between the correct phonological and orthographic representations of the word and those generated by the network. These weight changes were accumulated during each sweep through the training set; at the end, the changes were carried out and the process was repeated.
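The effect of the logarithmic frequency compression can be made concrete with a small sketch. SM89's exact compression function is not specified here; the normalization below (log of frequency-plus-one, scaled by the maximum) is an assumption chosen only to show how log scaling narrows the frequency range.

```python
# Sketch of log-compressed presentation probabilities. The frequency
# counts are Kucera & Francis-style illustrations; the normalization
# is an assumption, not SM89's exact function.
import math

freqs = {"THE": 69971, "GAVE": 285, "PINT": 6}

def presentation_prob(freq, max_freq):
    """Presentation probability proportional to log frequency."""
    return math.log(freq + 1) / math.log(max_freq + 1)

max_f = max(freqs.values())
probs = {w: presentation_prob(f, max_f) for w, f in freqs.items()}
```

The raw counts span four orders of magnitude, but the compressed probabilities differ by less than a factor of six, so low-frequency words like PINT still receive substantial training.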

The network was considered to have named a word correctly when the generated phonological activity was closer to the representation of the correct pronunciation of the word than to that of any pronunciation which differed from the correct one by a single phoneme. For the example GAVE → /gAv/, the competing pronunciations are all those among /*Av/, /g*v/, or /gA*/, where /*/ is any phoneme. After 250 training sweeps through the corpus, amounting to about 150,000 word presentations, the network correctly named all but 77 words (97.3% correct), most of which were low-frequency exception words.
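This scoring criterion can be sketched directly. The sketch simplifies in two ways that are assumptions, not the paper's procedure: pronunciations are represented as sets of phoneme triples rather than actual Wickelfeature activities, and distance is the symmetric difference between sets.

```python
# Sketch of the SM89 naming criterion: the generated pattern must be
# closer to the correct pronunciation than to any pronunciation that
# differs from it by one phoneme. Patterns and distances are toy
# simplifications of the real Wickelfeature representation.

PHONEMES = "abdefgihklmnoprstuvwzAEIOUW"  # illustrative inventory

def pattern(pron):
    padded = "_" + pron + "_"
    return {padded[i:i+3] for i in range(len(padded) - 2)}

def distance(p1, p2):
    return len(p1 ^ p2)  # symmetric difference of "active units"

def competitors(pron):
    """All pronunciations differing from pron by one phoneme substitution."""
    out = set()
    for i in range(len(pron)):
        for ph in PHONEMES:
            alt = pron[:i] + ph + pron[i+1:]
            if alt != pron:
                out.add(alt)
    return out

def named_correctly(generated, target):
    d_target = distance(generated, pattern(target))
    return all(d_target < distance(generated, pattern(alt))
               for alt in competitors(target))
```

For GAVE, the competitors enumerated are exactly the /*Av/, /g*v/, and /gA*/ neighbors described in the text; output matching /gAv/ passes, while output matching the neighbor /gIv/ fails.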

A considerable amount of empirical data on oral reading concerns the time it takes to name words of various types. A natural analogue in a model to naming latency in subjects would be the amount of computing time required to produce an output. SM89 could not use this measure because their network takes exactly the same amount of time—one update of each unit—to compute phonological output for any letter string. Instead, they approximated naming latency with a measure of the accuracy of the phonological activity produced by the

[Figure 2 appears here: sum squared error (roughly 3 to 7) plotted for low- versus high-frequency words in four conditions: Exception, Ambiguous, Regular Inconsistent, and Regular Consistent.]

Figure 2. Mean phonological error scores produced by the Seidenberg and McClelland (1989) network for words with various degrees of spelling-sound consistency (listed in Appendix 1) as a function of frequency. Regenerated from Figure 16 of Seidenberg and McClelland (1989, p. 542).

network—the phonological error score. SM89 showed that the network's distribution of phonological error scores for various words replicates the effects of frequency and consistency in naming latencies found in a wide variety of empirical studies using the same words. Figure 2 presents particularly illustrative results in this regard, using high- and low-frequency words at four levels of consistency (listed in Appendix 1 and used in the current simulations):

- Exception words from Experiments 1 and 2 of Taraban and McClelland (1987); they have an average of 0.73 friends in the SM89 corpus (not counting the word itself), and 9.2 enemies;
- Ambiguous words generated by SM89 to be matched in Kucera and Francis (1967) frequency with the exception words; they average 8.6 friends and 8.0 enemies;
- Regular inconsistent words, also from Taraban and McClelland (1987), which average 7.8 friends and only 2.1 enemies;
- Regular consistent words, which are the control items for the exception words in the Taraban and McClelland study; they have an average of 10.7 friends and 0.04 enemies (the foreign word COUP for the item GROUP, and one of the pronunciations of BASS for the item CLASS).

The relevant empirical effects in naming latency exhibited by the SM89 model are, specifically:

1. High-frequency words are named faster than low-frequency words (e.g., Forster & Chambers, 1973; Frederiksen & Kroll, 1976).


2. Consistent words are named faster than inconsistent words (Glushko, 1979), and latencies increase monotonically with increasing spelling-sound inconsistency (as approximated by the relative proportion of friends vs. enemies; Jared et al., 1990). Thus, regular inconsistent words like MOTH (cf. BOTH) are slower to be named than regular consistent words like MUST (Glushko, 1979), and exception words like PINT and SEW are the slowest to be named (Seidenberg et al., 1984). Performance on ambiguous words like GOWN (cf. GROWN) falls between that on regular inconsistent words and that on exception words, although this has been investigated directly only with respect to reading acquisition (Backman et al., 1984).

3. Frequency interacts with consistency (Seidenberg, 1985; Seidenberg et al., 1984; Waters & Seidenberg, 1985), such that the consistency effect is much greater among low-frequency words than among high-frequency words (where it may even be absent; see, e.g., Seidenberg, 1985), or equivalently, the frequency effect decreases with increasing consistency (perhaps being absent among regular words; see, e.g., Waters & Seidenberg, 1985).

In considering these empirical and simulation results, it is important to keep in mind that the use of a four-way classification of consistency is not intended to imply the existence of four distinct subtypes of words; rather, it is intended to help illustrate the effects of what is actually an underlying continuum of consistency (Jared et al., 1990).2

The model also shows analogous effects of consistency in nonword naming latency. In particular, nonwords derived from regular consistent words (e.g., NUST from MUST) are faster to name than nonwords derived from exception words (e.g., MAVE from HAVE; Glushko, 1979; Taraban & McClelland, 1987). As mentioned in the Introduction, however, the model's nonword naming accuracy is much worse than that of skilled readers. Besner et al. (1990) reported that, on nonword lists from Glushko (1979) and McCann and Besner (1987), the model is only 59% and 51% correct, whereas skilled readers are 94% and 89% correct, respectively. Seidenberg and McClelland (1990) pointed out that the scoring criterion used for the network was more strict than that used for the subjects. We will return to the issue of scoring nonword reading performance—for the present purposes, it suffices to acknowledge that, even taking differences in scoring into account, the performance of the SM89 model on nonwords is inadequate.

2 This is particularly true with respect to the distinction between regular inconsistent words and ambiguous words, which differ only in the degree of balance between friends and enemies. In fact, a number of previous studies, including Taraban and McClelland (1987), failed to make this distinction. As a result, some of the Taraban and McClelland regular inconsistent words contain bodies that we categorize as ambiguous (e.g., DEAR, GROW). This has the unfortunate consequence that, occasionally, words with identical bodies are assigned into different consistency classes. However, in the current context, we are not concerned with individual items but solely with using the pattern of means across classes to illustrate overall consistency effects. In this regard, the word classes differ in the appropriate manner in their average relative numbers of friends and enemies. Thus, for continuity with earlier work, we will continue to use the Taraban and McClelland stimuli.

The SM89 model replicates the effects of frequency and consistency in lexical decision (Waters & Seidenberg, 1985) when responses are based on orthographic error scores, which measure the degree to which the network succeeds at recreating the orthography of each input string. Again, however, the model is not as accurate at lexical decision under some conditions as are normal subjects (Besner et al., 1990; Fera & Besner, 1992).

Consistency also influences the ease with which word naming skills are acquired. Thus, less skilled readers—whether younger or developmentally dyslexic—show larger consistency effects than more skilled readers (Backman et al., 1984; Vellutino, 1979). The model shows similar effects both early in the course of learning and when trained with limited resources (e.g., too few hidden units).

Finally, damaging the model by removing units or connections results in a pattern of errors that is somewhat similar to that of brain-injured patients with one form of surface dyslexia (Patterson, 1990; Patterson et al., 1989). Specifically, low-frequency exception words become particularly prone to being regularized (see Patterson, Coltheart, & Marshall, 1985). Overall, however, attempts to model surface dyslexia by lesioning the SM89 model have been less than satisfactory (see Behrmann & Bub, 1992; Coltheart et al., 1993, for criticism). We will consider this and other types of developmental and acquired dyslexia in more detail after presenting new simulation results on normal skilled reading.

Evaluation of the Model

In evaluating the SM89 results, it is important to bear in mind the relationship between the implemented model and the more general framework for lexical processing from which it was derived. In many ways, the implemented network is a poor approximation to the general framework: it contains no semantic representations or knowledge, it was trained on a limited vocabulary, and its feedforward architecture severely restricts the way in which information can interact within the system. In addition, as a working implementation, the network inevitably embodies specific representational and processing details that are not central to the overall theoretical framework. Such details include the specific orthographic and phonological representation schemes, the logarithmic frequency compression used in training, the use of error scores to model naming latencies, and the use of a supervised, error-correcting training procedure (but see Jordan & Rumelhart, 1992). Nonetheless, the implemented network is faithful to most of the central theoretical tenets of the general framework (see also Seidenberg, 1993): (a) the network employs distributed orthographic and phonological representations that reflect the similarities of words within each domain, (b) the computation of orthography and phonology involves nonlinear cooperative and competitive influences governed by weighted connections between units, (c) these weights encode all of the network's knowledge about how orthography and phonology are related, and (d) this knowledge is acquired gradually on the basis of the network's exposure to written words and their pronunciations. It is important to note that two central principles are lacking in the implemented network: interactivity and intrinsic variability. We consider the implications of these principles later.

Before we focus on the limitations of SM89's work, it is important to be clear about its strengths. First and foremost, the general framework is supported by an explicit computational model that actually implements the mapping from orthography to phonology. Of course, implementing a model does not make it any more correct, but it does, among other things, allow it to be more thoroughly and adequately evaluated (Seidenberg, 1993). Many models of reading are no more explicit than "box-and-arrow" diagrams accompanied by descriptive text on how processing would occur in each component (a notable recent exception to this is the implementation of Coltheart et al., 1993; Coltheart & Rastle, 1994, which is compared in detail with the current approach by Seidenberg et al., 1994). In fact, the SM89 general framework amounts to such a description. By taking the further step of implementing a portion of the framework and testing it on the identical stimuli used in empirical studies, SM89 enabled the entire approach to be evaluated in much greater detail than has been possible with previous, less explicit models.

Furthermore, it should not be overlooked that the implemented model succeeds in accounting for a considerable amount of data on normal and impaired word reading. The model reproduces the quantitative effects found in over 20 empirical studies on normal reading, as well as some basic findings on developmental and acquired dyslexia. No other existing implementation covers anything close to the same range of results.

Finally, it is important to bear in mind that the basic computational properties of the SM89 framework and implementation were not developed specifically for word reading. Rather, they derive from the much broader enterprise of connectionist modeling in cognitive domains. The same principles of distributed representations, interactivity, distributed knowledge, and gradient-descent learning are also being applied successfully to problems in high-level vision, learning and memory, speech and language, reasoning and problem solving, and motor planning and control (see Hinton, 1991; McClelland, Rumelhart, & the PDP research group, 1986; Quinlan, 1991, for examples). Two distinctive aspects of the connectionist approach are its strong emphasis on general learning principles, and its attempt to make contact with neurobiological as well as cognitive phenomena. Neurally plausible learning is particularly critical to understanding reading, as it is unlikely that the brain has developed innate, dedicated circuitry for such an evolutionarily recent skill. Thus, the SM89 work not only makes specific contributions to the study of reading, but also fits within a general computational approach for understanding how cognitive processes are learned and implemented in the brain.

The SM89 implementation does, however, have serious limitations in accounting for some empirical data. Some of these limitations no doubt stem from the portions of the framework that remain unimplemented—most importantly, the involvement of semantic representations, but also perhaps visual and articulatory procedures. A full consideration of the range of relevant empirical findings will be better undertaken in the General Discussion in the context of the new simulation results. Consideration of the poor nonword reading performance of the SM89 network, however, cannot be postponed. This limitation is fundamental, as nonword reading is unlikely to be improved by the addition of semantics. Furthermore, Coltheart et al. (1993) have argued that, primarily as a result of its poor processing of nonwords, the model is incapable of accounting for five of six central issues in normal and impaired word reading. More fundamentally, by not reading nonwords adequately, the model fails to refute the claim of dual-route theorists that reading nonwords and reading exception words require separate mechanisms.

Seidenberg and McClelland (1990) argued that the model's poor nonword reading was not a fundamental problem with the general framework, but rather was the result of two specific limitations in the implementation. The first is the limited size of the training corpus. The model was exposed to only about 3000 words, whereas the skilled readers with whom it is compared know approximately ten times that number. Given that the only knowledge that the model has available for reading nonwords is what it has derived from words, a limited training corpus is a serious handicap.

Coltheart et al. (1993) have argued that limitations of the SM89 training corpus cannot explain the model's poor nonword reading because a system that learns GPC rules using the same corpus performs much better. This argument is fallacious, however, because the effectiveness of a training corpus depends critically on other assumptions built into the training procedure. In fact, Coltheart and colleagues' procedure for learning GPC rules has built into it a considerable amount of knowledge that is specific to reading, concerning the possible relationships between graphemes and phonemes in various contexts. In contrast, SM89 applied a general learning procedure to representations that encode only ordered triples of letters and phonemic features, but nothing of their correspondences. A demonstration that the SM89 training corpus is sufficient to support good nonword reading in the context of strong, domain-specific assumptions does not invalidate the claim that the corpus may be insufficient in the context of much weaker assumptions.

The second aspect of the SM89 simulation that contributed to its poor nonword reading was the use of Wickelfeatures to represent phonology. This representational scheme has known limitations, many of which are related to how well the scheme could be extended to more realistic vocabularies (see Lachter & Bever, 1988; Pinker & Prince, 1988, for detailed criticism). In the current context, Seidenberg and McClelland (1990) pointed out that the representations do not adequately capture phonemic structure. Specifically, the features of a phoneme are not bound with each other, but only with features of neighboring phonemes. As a result, the surrounding context can too easily introduce inappropriate features, producing many single-feature errors in nonword pronunciations (e.g., TIFE → /tIv/).

Neither the specific training corpus nor the Wickelfeature representation is central to the SM89 general framework for lexical processing. If Seidenberg and McClelland (1990) are correct in suggesting that it is these aspects of the simulation that are responsible for its poor nonword reading, their more general framework remains viable. On the other hand, the actual performance of an implementation is the main source of evidence that SM89 put forward in support of their view of the reading system. As McCloskey (1991) has recently pointed out, it is notoriously difficult both to determine whether an implementation's failings are due to fundamental or incidental properties of its design, and to predict how changes to its design would affect its behavior. Thus, to support the SM89 connectionist framework as a viable alternative to rule-based, dual-route accounts, it is critical to develop further simulations that account for the same range of findings as the original implementation and yet also pronounce nonwords as well as skilled readers. This paper presents such simulations.

Orthographic and Phonological Representations

Wickelfeatures and the Dispersion Problem

For the purposes of supporting good nonword reading, the Wickelfeature phonological representation has a more fundamental drawback. The problem stems from the general issue of how to represent structured objects, such as words composed of ordered strings of letters and phonemes, in connectionist networks. Connectionist researchers would like their networks to have three properties (Hinton, 1990):

1. All the knowledge in a network should be in connection weights between units.

2. To support good generalization, the network's knowledge should capture the important regularities in the domain.

3. For processing to be fast, the major constituents of an item should be processed in parallel.

The problem is that these three properties are difficult to reconcile with each other.

Consider first the standard technique of using position-specific units, sometimes called a slot-based representation (e.g., McClelland & Rumelhart, 1981). The first letter goes in the first slot, the second letter in the second slot, etc. Similarly

Table 1
The Dispersion Problem

Slot-based representations

Left-justified
     1  2  3  4  5
     L  O  G
     G  L  A  D
     S  P  L  I  T

Vowel-centered
    -3 -2 -1  0  1
           S  U  N
        S  W  A  M
     S  P  L  I  T

Context-sensitive triples ("Wickelgraphs")
  LOG:   _LO LOG OG_
  GLAD:  _GL GLA LAD AD_
  SPLIT: _SP SPL PLI LIT IT_

for the output, the first phoneme goes in the first slot, and so on. With enough slots, words up to any desired length can be represented.

This scheme satisfies properties (1) and (3) but at a cost to property (2). That is, processing can be done in parallel across letters and phonemes using weighted connections, but at a cost of dispersing the regularities of how letters and phonemes are related. The reason is that there must be a separate copy of each letter (and phoneme) for each slot, and because the relevant knowledge is embedded in connections that are specific to these units, this knowledge must be replicated in the connections to and from each slot. To some extent this is useful in the domain of oral reading because the pronunciation of a letter may depend on whether it occurs at the beginning, middle, or end of a word. However, the slot-based approach carries this to an extreme, with unfortunate consequences. Consider the words LOG, GLAD, and SPLIT. The fact that the letter L corresponds to the phoneme /l/ in these words must be learned and stored three separate times in the system. There is no generalization of what is learned about letters in one position to the same letter in other positions. The problem can be alleviated to some degree by aligning the slots in various ways (e.g., centered around the vowel; Daugherty & Seidenberg, 1992) but it is not eliminated completely (see Table 1). Adequate generalization still requires learning the regularities separately across several slots.
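The dispersion in slot-based coding can be shown in a few lines. This is a minimal sketch of left-justified slot coding, not any particular model's implementation.

```python
# Sketch of why slot-based coding disperses regularities: the unit for
# letter L in slot 0 is a different unit from L in slot 1 or slot 2,
# so nothing learned about one transfers to the others.

def slot_units(word):
    """Left-justified slot-based encoding: one unit per (slot, letter)."""
    return {(slot, letter) for slot, letter in enumerate(word)}

log_units = slot_units("LOG")      # L occupies slot 0
glad_units = slot_units("GLAD")    # L occupies slot 1
split_units = slot_units("SPLIT")  # L occupies slot 2

# All three words contain L -> /l/, yet no (slot, 'L') unit is shared.
l_units = [{u for u in units if u[1] == "L"}
           for units in (log_units, glad_units, split_units)]
```

Because the active L-units are disjoint across the three words, any weights learned for L → /l/ in one word contribute nothing to the others.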

An alternative scheme is to apply the network to a single letter at a time, as in Sejnowski and Rosenberg's (1987) NETtalk model.3 Here, the same knowledge is applied to pronouncing a letter regardless of where it occurs in a word, and words of arbitrary length can be processed. Unfortunately, properties (1) and (2) are now being traded off against property (3). Processing becomes slow and sequential, which may be satisfactory in many domains but not in word reading. Note that the common finding of small but significant effects of word length on naming latency (e.g., Butler & Hains, 1979; Frederiksen & Kroll, 1976; Richardson, 1976) does not imply that the computation from orthography to phonology operates sequentially over letters; a parallel implementation of this mapping may also exhibit small length effects (as will be demonstrated in Simulation 3).

3 Bullinaria (1995) has recently developed a series of networks of this form that exhibit impressive performance in reading nonwords, although only very weak effects of word frequency. Coltheart et al. (1993) also take a sequential approach to solving the dispersion problem, in that a correspondence learned from a position is applied to all positions unless a different correspondence is learned elsewhere.

The representations used by SM89 were an attempt to avoid the specific limitations of the slot-based approach, but in the end turn out to have a version of the same problem. Elements such as letters and phonemes are represented, not in terms of their absolute spatial position, or relative position within the word, but in terms of the adjacent elements to the left and right. This approach, which originated with Wickelgren (1969), makes the representation of each element context sensitive without being rigidly tied to position. Unfortunately, however, the knowledge of spelling-sound correspondences is still dispersed across a large number of different contexts, and adequate generalization still requires that the training effectively covers them all. Returning to Table 1, although the words LOG, GLAD, and SPLIT share the correspondence L → /l/, they have no triples of letters in common. A similar property holds in phonology among triples of phonemes or phonemic features. Thus, as in the slot-based approach, although the same correspondence is present in these three cases, different units are activated. As a result, the knowledge that is learned in one context—encoded as connection weights—does not apply in other contexts, thereby hindering generalization.
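The same point can be verified directly by computing the letter triples (here with "_" as an assumed boundary marker) for the three words and checking that no triple is shared.

```python
# Sketch: LOG, GLAD, and SPLIT all contain the correspondence L -> /l/,
# yet share no context-sensitive letter triples, so no Wickelgraph
# units (and hence no learned weights) are common to them.

def wickelgraphs(word):
    padded = "_" + word + "_"
    return {padded[i:i+3] for i in range(len(padded) - 2)}

graphs = {w: wickelgraphs(w) for w in ["LOG", "GLAD", "SPLIT"]}
# e.g., graphs["LOG"] == {"_LO", "LOG", "OG_"}
```

Every pairwise intersection of the three triple sets is empty, which is exactly the dispersion described in the text.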

Notice that the effect of dispersing regularities is much like the effect of limiting the size of the training corpus. The contribution that an element makes to the representation of the word is specific to the context in which it occurs. As a result, the knowledge learned from one item is beneficial only to other items which share that specific context. When representations disperse the regularities in the domain, the number of trained mappings that support a given pronunciation is effectively reduced. As a result, generalization to novel stimuli, as in the pronunciation of nonwords, is based on less knowledge and suffers accordingly. In a way, Seidenberg and McClelland's (1990) two suggestions for improving their model's nonword reading performance—enlarge the training corpus and improve the representations—amount to the same thing. By using improved representations that minimize the dispersion problem, the effective size of the training corpus for a given pronunciation is increased.

Condensing Spelling-Sound Regularities

The hypothesis guiding the current work was the idea that the dispersion problem prevented the SM89 network from exploiting the structure of the English spelling-to-sound system as fully as human readers do. We set out, therefore, to design representations that minimize this dispersion.

The limiting case of our approach would be to have a single set of letter units, one for each letter in the alphabet, and a single set of phoneme units, one for each phoneme. Such a scheme satisfies all three of Hinton's (1990) desired properties: All of the letters in a word map to all of its phonemes simultaneously via weighted connections (and presumably hidden units), and the spelling-sound regularities are condensed because the same units and connections are involved whenever a particular letter or phoneme is present. Unfortunately, this approach has a fatal flaw: it does not preserve the relative order of letters and phonemes. Thus, it cannot distinguish TOP from POT or SALT from SLAT.

It turns out, however, that a scheme involving only a small amount of replication is sufficient to provide a unique representation of virtually every uninflected monosyllabic word. By definition, a monosyllable contains only a single vowel, so only one set of vowel units is needed. A monosyllable may contain both an initial and a final consonant cluster, and almost every consonant can occur in either cluster, so separate sets of consonant units are required for each of these clusters. The remarkable thing is that this is nearly all that is necessary. The reason is that, within an initial or final consonant cluster, there are strong phonotactic constraints that arise in large part from the structure of the articulatory system. At both ends of the syllable, each phoneme can occur only once, and the order of phonemes is strongly constrained. For example, if the phonemes /s/, /t/ and /r/ all occur in the onset cluster, they must be in that order, /str/. Given this, all that is required to specify a pronunciation is which phonemes are present in each cluster—the phonotactic constraints uniquely determine the order in which these phonemes occur.

The necessary phonotactic constraints can be expressed simply by grouping phonemes into mutually exclusive sets, and ordering these sets from left to right in accordance with the left-to-right ordering constraints within consonant clusters. Once this is done, reading out a pronunciation involves simply concatenating the phonemes that are active in sequence from left to right, including at most one phoneme per mutually exclusive set (see Table 2).

There are a few cases in which two phonemes can occur in either order within a consonant cluster (e.g., /p/ and /s/ in CLASP and LAPSE). To handle such cases, it is necessary to add units to disambiguate the order (e.g., /ps/). The convention is that, if /s/ and /p/ are both active, they are taken in that order unless the /ps/ unit is active, in which case the order is reversed. To cover the pronunciations in the SM89 corpus, only three such units are required: /ps/, /ks/ and /ts/. Interestingly, these combinations are sometimes written with single letters (e.g., English X, German Z) and are closely related to other stop-fricative combinations, like /C/ (/tS/) and /j/ (/dZ/), that are typically considered to be single phonemes called affricates. In fact, /ts/ is often treated as an affricate and, across languages, is among the most common (see Maddieson, 1984), and post-vocalic /ps/ and /ks/ behave similarly to affricates (Lass, 1984).

Understanding Normal and Impaired Word Reading 12

Table 2
Phonological and Orthographic Representations Used in the Simulations

Phonology
  onset: s S C z Z j f v T D p b t d k g m n h l r w y
  vowel: a e i o u @ ^ A E I O U W Y
  coda:  r l m n N b g d ps ks ts s z f v p k t S Z T D C j

Orthography
  onset: Y S P T K Q C B D G F V J Z L M N R W H CH GH GN PH PS RH SH TH TS WH
  vowel: E I O U A Y AI AU AW AY EA EE EI EU EW EY IE OA OE OI OO OU OW OY UE UI UY
  coda:  H R L M N B D G C X F V J S Z P T K Q BB CH CK DD DG FF GG GH GN KS LL NG NN PH PP PS RR SH SL SS TCH TH TS TT ZZ U E ES ED

Note: /a/ in POT, /@/ in CAT, /e/ in BED, /i/ in HIT, /o/ in DOG, /u/ in GOOD, /A/ in MAKE, /E/ in KEEP, /I/ in BIKE, /O/ in HOPE, /U/ in BOOT, /W/ in NOW, /Y/ in BOY, /^/ in CUP, /N/ in RING, /S/ in SHE, /C/ in CHIN, /Z/ in BEIGE, /T/ in THIN, /D/ in THIS. All other phonemes are represented in the conventional way (e.g., /b/ in BAT). The groupings indicate sets of mutually exclusive phonemes. The notation for vowels is slightly different from that used by Seidenberg and McClelland (1989). Also, the representations differ slightly from those used by Plaut and McClelland (1993; Seidenberg et al., 1994). In particular, /C/ and /j/ have been added for /tS/ and /dZ/, the ordering of phonemes is somewhat different, the mutually exclusive phoneme sets have been added, and the consonantal graphemes U, GU and QU have been eliminated. These changes better capture the relevant phonotactic constraints and simplify the encoding procedure for converting letter strings into activity patterns over grapheme units.

This representational scheme applies almost as well to orthography as it does to phonology because English is an alphabetic language (i.e., parts of the written form of a word correspond to parts of its spoken form). However, the spelling units that correspond to phonemes are not necessarily single letters. Rather, they are what Venezky (1970) termed relational units, sometimes called graphemes, that can consist of from one to four letters (e.g., L, TH, TCH, EIGH). As the spelling-sound regularities of English are primarily grapheme-phoneme correspondences, the regularities in the system are most elegantly captured if the orthographic units represent the graphemes present in the string rather than simply the letters that make up the word.

Unfortunately, it is not always clear what graphemes are present in a word. Consider the word SHEPHERD. In this case, there is a P next to an H, so we might suppose that the word contains a PH grapheme, but in fact it does not; if it did it would be pronounced "she-ferd." It is apparent that the input is ambiguous in such cases. Because of this, there is no simple procedure for translating letter strings into the correct sequence of graphemes. It is, however, completely straightforward to translate a letter sequence into a pattern of activity representing all possible graphemes in the string. Thus, whenever a multiletter grapheme is present, its components are also activated. This procedure is also consistent with the treatment of /ps/, /ks/, and /ts/ in phonology.

To this point, the orthographic and phonological representations have been motivated purely by computational considerations: to condense spelling-sound regularities in order to improve generalization. Before turning to the simulations, however, it is important to be clear about the empirical assumptions that are implicit in the use of these representations.

Certainly, a full account of reading behavior would have to include a specification of how the representations themselves develop prior to and during the course of reading acquisition. Such a demonstration is beyond the scope of the current work. In fact, unless we are to model everything from the eye to the mouth, we cannot avoid making assumptions about the reading system's inputs and outputs, even though, in actuality, these are learned, internal representations. The best we can do is to ensure that these representations are at least broadly consistent with the relevant developmental and behavioral data.

The relevant assumptions about the phonological representations are that they are segmental (i.e., they are composed of phonemes) and that they are strongly constrained by phonotactics. We presume that this phonological structure is learned, for the most part, prior to reading acquisition, on the basis of speech comprehension and production. This is not to deny that phonological representations may become further refined over the course of reading acquisition, particularly under the influence of explicit phoneme-based instruction (see, e.g., Morais, Cary, Alegria, & Bertelson, 1979; Morais, Bertelson, Cary, & Alegria, 1986). For simplicity, however, our modeling work uses fully developed phonological representations from the outset of training.

Analogous assumptions apply with regard to the orthographic representations. We assume that they are based on letters and letter combinations, and that the ordering of these obeys graphotactic constraints (although in English such constraints are generally weaker than those in phonology). While these properties are not particularly controversial per se, orthographic representations must develop concurrently with reading acquisition. Thus, the use of fully-articulated orthographic representations from the outset of reading acquisition is certainly suspect.

Again, a complete account of how orthographic representations develop from more primitive visual representations is beyond the scope of the current work. Here we provide only a general characterization of such an account. We suppose that children first learn visual representations for individual letters, perhaps much like those of other visual objects. In learning to read, they are exposed to words that consist of these familiar letters in various combinations. Explicit representations gradually develop for letter combinations that occur often or have unusual consequences (see Mozer, 1990). In the context of oral reading, many of these combinations are precisely those whose pronunciations are not predicted by their components (e.g., TH, PH), corresponding to Venezky's (1970) relational units. Of course, explicit representations may develop for other, regularly-pronounced letter combinations. In the limit, the orthographic representation might contain all the letter combinations that occur in the language. Expanding our orthographic representation with multiletter units for all of these additional combinations would have little consequence because there would be little pressure for the network to learn anything about them, given that the correspondences of their components are already learned. In this way, the particular set of multiletter graphemes we employ can be viewed as an efficient simplification of a more general orthographic representation that would develop through exposure to letter combinations in words.

To be clear, we do not claim that the orthographic and phonological representations we use are fully general. Some of their idiosyncrasies stem from the fact that their design took into account specific aspects of the SM89 corpus. Nonetheless, we do claim that the principles on which the representations were derived—in particular, the use of phonotactic and graphotactic constraints to condense spelling-sound regularities—are general.

Simulation 1: Feedforward Network

The first simulation is intended to test the hypothesis that the use of representations which condense the regularities between orthography and phonology would improve the nonword reading performance of a network trained on the SM89 corpus of monosyllabic words. Specifically, the issue is whether a single mechanism, in the form of a connectionist network, can learn to read a reasonably large corpus of words, including many exception words, and yet also read pronounceable nonwords as well as skilled readers. If such a network can be developed, it would undermine the claims of dual-route theorists that skilled word reading requires the separation of lexical and sublexical procedures for mapping print to sound.

[Figure 3: 105 grapheme units feed 100 hidden units, which feed 61 phoneme units.]

Figure 3. The architecture of the feedforward network. Ovals represent groups of units, and arrows represent complete connectivity from one group to another.

Method

Network Architecture. The architecture of the network, shown in Figure 3, consists of three layers of units. The input layer of the network contains 105 grapheme units, one for each grapheme in Table 2. Similarly, the output layer contains 61 phoneme units. Between these two layers is an intermediate layer of 100 hidden units. Each unit j has a real-valued activity level or state, s_j, that ranges between 0 and 1, and is a smooth, nonlinear (logistic) function of the unit's total input, x_j:

$$x_j = \sum_i s_i w_{ij} + b_j \qquad (1)$$

$$s_j = \sigma(x_j) = \frac{1}{1 + \exp(-x_j)} \qquad (2)$$

where w_{ij} is the weight from unit i to unit j, b_j is the real-valued bias of unit j, and exp(·) is the exponential function.

Each hidden unit receives a connection from each grapheme unit, and in turn sends a connection to each phoneme unit. In contrast to the SM89 network, the grapheme units do not receive connections back from the hidden units. Thus, the network only maps from orthography to phonology, not also from orthography to orthography (also see Phillips, Hay, & Smith, 1993). Weights on connections are initialized to small, random values, uniformly distributed between ±0.1. The bias terms for the hidden and phoneme units can be thought of as the weight on an additional connection from a unit whose state is always 1.0 (and so can be learned in the same way as other connection weights). Including biases, the network has a total of 17,061 connections.
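Equations 1 and 2 amount to a standard two-layer logistic forward pass. The sketch below uses the layer sizes and weight-initialization range from the text; the NumPy formulation and variable names are our own illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GRAPHEMES, N_HIDDEN, N_PHONEMES = 105, 100, 61

# Weights start uniform in [-0.1, 0.1]; biases act like weights on a connection
# from a unit whose state is always 1.0.
W_gh = rng.uniform(-0.1, 0.1, (N_GRAPHEMES, N_HIDDEN))
b_h = rng.uniform(-0.1, 0.1, N_HIDDEN)
W_hp = rng.uniform(-0.1, 0.1, (N_HIDDEN, N_PHONEMES))
b_p = rng.uniform(-0.1, 0.1, N_PHONEMES)

def logistic(x):
    # Equation 2: s_j = 1 / (1 + exp(-x_j))
    return 1.0 / (1.0 + np.exp(-x))

def forward(grapheme_states):
    # Equation 1 applied at each layer: x_j = sum_i s_i w_ij + b_j
    hidden = logistic(grapheme_states @ W_gh + b_h)
    phonemes = logistic(hidden @ W_hp + b_p)
    return hidden, phonemes
```

Because the logistic squashes any total input into (0, 1), every phoneme unit's state can later be read as a graded hypothesis about that phoneme's presence.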

Training Procedure. The training corpus consists of the 2897 monosyllabic words in the SM89 corpus, augmented by 101 monosyllabic words missing from that corpus but used as word stimuli in various empirical studies, for a total of 2998 words.4 Among these are 13 sets of homographs (e.g., READ → /rEd/ and READ → /red/)—for these, both pronunciations are included in the corpus. Most of the words are uninflected, although there are a few inflected forms that have been used in some empirical studies (e.g., ROLLED, DAYS). Although the orthographic and phonological representations are not intended to handle inflected monosyllables, they happen to be capable of representing those in the training corpus and so these were left in. It should be kept in mind, however, that the network's exposure to inflected forms is extremely impoverished relative to that of skilled readers.

4 The Plaut and McClelland (1993; Seidenberg et al., 1994) network was also trained on 103 isolated grapheme-phoneme correspondences, as an approximation to the explicit instruction many children receive in learning to read. These correspondences were not included in the training of any of the networks reported in this paper.

A letter string is presented to the network by clamping the states of the grapheme units representing graphemes contained in the string to 1.0, and the states of all other grapheme units to 0. In processing the input, hidden units compute their states based on those of the grapheme units and the weights on connections from them (according to Equations 1 and 2) and then phoneme units compute their states based on those of the hidden units. The resulting pattern of activity over the phoneme units represents the network's pronunciation of the input letter string.

After each word is processed by the network during training, back-propagation (Rumelhart, Hinton, & Williams, 1986a, 1986b) is used to calculate how to change the connection weights so as to reduce the discrepancy between the pattern of phoneme activity generated by the network and the correct pattern for the word (i.e., the derivative of the error with respect to each weight). A standard measure of this discrepancy, and the one used by SM89, is the summed squared error, E, between the generated and correct output (phoneme) states:

$$E = \sum_j (s_j - t_j)^2 \qquad (3)$$

where s_j is the state of phoneme unit j and t_j is its correct (target) value. However, in the new representation of phonology, each unit can be interpreted as an independent hypothesis that a particular phoneme is present in the output pronunciation.5 In this case, a more appropriate error measure is the cross-entropy, C, between the generated and correct activity patterns (see Hinton, 1989; Rumelhart, Durbin, Golden, & Chauvin, in press), also termed the asymmetric divergence or Kullback-Leibler distance (Kullback & Leibler, 1951):

$$C = -\sum_j \left[ t_j \log_2(s_j) + (1 - t_j) \log_2(1 - s_j) \right] \qquad (4)$$

Notice that the contribution to cross-entropy of a given unit j is simply −log₂(s_j) if its target is 1, and −log₂(1 − s_j) if its target is 0. From a practical point of view, cross-entropy has an advantage over summed squared error when it comes to correcting output units that are completely incorrect (i.e., on the opposite flat portion of the logistic function). This is a particular concern in tasks in which output units are off for most inputs—the network can eliminate almost all of its error on the task by turning all of the output units off regardless of the input, including those few that should be on for this input. The problem is that, when a unit's state falls on a flat portion of the logistic function, very large weight changes are required to change its state substantially. As a unit's state diverges from its target, the change in cross-entropy increases much faster than that of summed squared error (exponentially vs. linearly) so that cross-entropy is better able to generate sufficiently large weight changes.6

5 This is not precisely true because the procedure for determining the pronunciation based on phoneme unit activities, soon to be described, does not consider these units independently, and their states are not determined independently but are based on the same set of hidden unit states. Nonetheless, the approximation is sufficient to make cross-entropy a more appropriate error measure than summed squared error.
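The practical difference between the two error measures shows up directly in their gradients. The sketch below (our own illustration) compares the input gradients for a unit stranded on the wrong flat tail of the logistic, using the natural-log form of cross-entropy so that its gradient is exactly s − t:

```python
def squared_error_grad(s, t):
    # dE/dx through the logistic for E = (s - t)^2: 2(s - t) * s(1 - s)
    return 2.0 * (s - t) * s * (1.0 - s)

def cross_entropy_grad(s, t):
    # dC/dx for natural-log cross-entropy is simply s - t (cf. Footnote 6)
    return s - t

# A unit that should be on (target 1) but sits near the wrong flat tail:
s, t = 0.001, 1.0
print(squared_error_grad(s, t))   # ~ -0.002: almost no pressure to change
print(cross_entropy_grad(s, t))   # ~ -0.999: still a large corrective signal
```

The s(1 − s) factor that squashes the squared-error gradient near 0 and 1 cancels out of the cross-entropy gradient, which is why cross-entropy can still drive large weight changes for completely incorrect units.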

During training, weights were also given a slight tendency to decay towards zero. This was accomplished by augmenting the cross-entropy error function with a term proportional (with a constant of 0.0001 in the current simulation) to the sum of the squares of each weight, $\sum_{ij} w_{ij}^2$. Although not critical, weight decay tends to aid generalization by constraining weights to grow only to the extent that they are needed to reduce the error on the task (Hinton, 1989).

In the SM89 simulation, the probability that a word was presented to the network for training during an epoch was a logarithmic function of its written frequency (Kucera & Francis, 1967). In the current simulation, the same compressed frequency values are used instead to scale the error derivatives calculated by back-propagation. This manipulation has essentially the same effect: more frequent words have a stronger impact than less frequent words on the knowledge learned by the system. In fact, using frequencies in this manner is exactly equivalent to updating the weights after each sweep through an expanded training corpus in which the number of times a word is presented is proportional to its (compressed) frequency. The new procedure was adopted for two reasons. First, by presenting the entire training corpus every epoch, learning rates on each connection could be adapted independently (Jacobs, 1988; but see Sutton, 1992, for a recently developed on-line version).7 Second, by implementing frequencies with multiplication rather than sampling, any range of frequencies can be used; later we will investigate the effects of using the actual Kucera and Francis (1967) frequencies in simulations. SM89 were constrained to use a logarithmic compression because less-severe compressions would have meant that the lowest frequency words might never have been presented to their network.

6 The derivative of cross-entropy with respect to an output unit's total input is simply the difference between the unit's state and its target:

$$\frac{\partial C}{\partial x_j} = \frac{\partial C}{\partial s_j}\,\frac{\partial s_j}{\partial x_j} = \left( \frac{1 - t_j}{1 - s_j} - \frac{t_j}{s_j} \right) s_j (1 - s_j) = s_j - t_j$$

7 The procedure for adjusting the connection-specific learning rates, called delta-bar-delta (Jacobs, 1988), works as follows. Each connection's learning rate is initialized to 1.0. At the end of each epoch, the error derivative for that connection calculated by back-propagation is compared with its previous weight change. If they are both in the same direction (i.e., have the same sign), the connection's learning rate is incremented (by 0.1 in the current simulation); otherwise, it is decreased multiplicatively (by 0.9 in the current simulation).

The actual weight changes administered at the end of an epoch are a combination of the accumulated frequency-weighted error derivatives and a proportion of the previous weight changes:

$$\Delta w_{ij}^{[n]} = -\epsilon\,\epsilon_{ij}\,\frac{\partial C}{\partial w_{ij}} + \alpha\,\Delta w_{ij}^{[n-1]} \qquad (5)$$

where n is the epoch number, ε is the global learning rate (0.001 in the current simulation), ε_{ij} is the connection-specific learning rate, C is the cross-entropy error function with weight decay, and α is the contribution of past weight changes, sometimes termed momentum (0.9 after the first 10 epochs in the current simulation). Momentum is introduced only after the first few initial epochs to avoid magnifying the effects of the initial weight gradients, which are very large because, for each word, any activity of all but a few phoneme units—those that should be active—produces a large amount of error (Plaut & Hinton, 1987).
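The batch update of Equation 5, with frequency-weighted gradient accumulation, delta-bar-delta learning rates (Footnote 7), and delayed momentum, might be sketched as below. This is our own reconstruction of the described procedure, not the authors' code; the weight-decay term is omitted, and "same direction" is read as agreement between the implied step and the previous weight change.

```python
import numpy as np

class EpochUpdater:
    """Sketch of the per-epoch batch update for one weight array (Equation 5):
    delta_w = -eps * eps_ij * dC/dw + alpha * prev_delta."""

    def __init__(self, shape, epsilon=0.001):
        self.epsilon = epsilon                # global learning rate
        self.local_rates = np.ones(shape)     # connection-specific rates, init 1.0
        self.prev_delta = np.zeros(shape)
        self.grad = np.zeros(shape)           # accumulated error derivatives

    def accumulate(self, word_grad, frequency):
        # Compressed written frequency scales each word's error derivatives,
        # instead of sampling words in proportion to frequency as SM89 did.
        self.grad += frequency * word_grad

    def end_epoch(self, epoch, momentum=0.9):
        alpha = momentum if epoch > 10 else 0.0   # momentum only after 10 epochs
        delta = -self.epsilon * self.local_rates * self.grad \
                + alpha * self.prev_delta
        # Delta-bar-delta: if the new step agrees in sign with the previous
        # change, add 0.1 to that connection's rate; otherwise multiply by 0.9.
        same_dir = (-self.grad * self.prev_delta) > 0
        self.local_rates = np.where(same_dir, self.local_rates + 0.1,
                                    self.local_rates * 0.9)
        self.prev_delta = delta
        self.grad = np.zeros_like(self.grad)
        return delta
```

One such updater per weight matrix (and bias vector) would be driven once per epoch, after all 2998 words have contributed their frequency-weighted derivatives.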

Testing Procedure. The network, as described above, learns to take activity patterns over the grapheme units and produce corresponding activity patterns over the phoneme units. The behavior of human subjects in oral reading, however, is better described in terms of producing phoneme strings in response to letter strings. Accordingly, for a direct comparison of the network's behavior with that of subjects, we need a procedure for encoding letter strings as activity patterns over the grapheme units, and another procedure for decoding activity patterns over the phoneme units into phoneme strings.

The encoding procedure is the one used to generate the input to the network for each word in the training corpus. To convert a letter string into an activity pattern over the grapheme units, the string is parsed into onset consonant cluster, vowel, and final (coda) consonant cluster. This involves simply locating in the string the leftmost contiguous block composed of the letters A, E, I, O, U, or (non-initial) Y. This block of letters is encoded using the vowel graphemes listed in Table 2—any grapheme contained in the vowel substring is activated; all others are left inactive. The substrings to the left and right of the vowel substring are encoded similarly using the onset and coda consonant graphemes, respectively. For example, the word SCHOOL activates the onset units S, C, H, and CH, the vowel units O and OO, and the coda unit L. Notice that, in words like GUEST, QUEEN, and SUEDE, the U is parsed as a vowel although it functions as a consonant (cf. GUST, QUEUE, and SUE; Venezky, 1970). This is much like the issue with PH in SHEPHERD—such ambiguity is left for the network to cope with. The analogous encoding procedure for phonemes used to generate the training patterns for words is even simpler as monosyllabic pronunciations must contain exactly one vowel.
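The parse-and-activate steps above can be sketched as follows. The grapheme inventories here are abbreviated illustrations, not the full Table 2 sets, and the function names are ours.

```python
# Sketch of the encoding procedure. The inventories below are an illustrative
# subset of the Table 2 grapheme sets, not the full 105-unit inventory.
ONSET = {"S", "C", "H", "CH", "SH", "TH", "G", "Q", "B", "D", "P", "T"}
VOWEL = {"A", "E", "I", "O", "U", "OO", "EA", "OU", "UE", "AI"}
CODA = {"L", "N", "D", "K", "T", "ST", "CK", "S"}

def parse(word):
    """Split at the leftmost contiguous block of A, E, I, O, U, or non-initial Y."""
    vowels = set("AEIOU")
    start = next(i for i, c in enumerate(word)
                 if c in vowels or (c == "Y" and i > 0))
    end = start
    while end < len(word) and (word[end] in vowels or word[end] == "Y"):
        end += 1
    return word[:start], word[start:end], word[end:]

def active_graphemes(substring, inventory):
    # Every grapheme contained in the substring is activated, so a multiletter
    # grapheme and its component letters come on together.
    return {g for g in inventory if g in substring}

onset, vowel, coda = parse("SCHOOL")
print(sorted(active_graphemes(onset, ONSET)))  # ['C', 'CH', 'H', 'S']
print(sorted(active_graphemes(vowel, VOWEL)))  # ['O', 'OO']
print(sorted(active_graphemes(coda, CODA)))    # ['L']
```

Note that the "all contained graphemes" rule is exactly what leaves ambiguous cases like the PH in SHEPHERD, or the U in GUEST, for the network itself to resolve.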

The decoding procedure for producing pronunciations from phoneme activities generated by the network is likewise straightforward. As shown in Table 2, phonemes are grouped into mutually exclusive sets, and these sets are ordered left to right (and top to bottom in the Table). This grouping and ordering encode the phonotactic constraints that are necessary to disambiguate pronunciations. The response of the network is simply the ordered concatenation of all active phonemes (i.e., with state above 0.5) that are the most active in their set. There are only two exceptions to this rule. The first is that, as monosyllabic pronunciations must contain a vowel, the most active vowel is included in the network's response regardless of its activity level. The second exception relates to the affricate-like units, /ps/, /ks/ and /ts/. As described earlier, if one of these units is active along with its components, the order of those components in the response is reversed.
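The decoding rule and its two exceptions can be sketched as below. The phoneme sets are an abbreviated illustration of the Table 2 groupings (in the full coda ordering, /s/ precedes /k/, which is what makes the /ks/ reversal necessary); the function names and threshold handling are ours.

```python
# Sketch of the decoding procedure, abbreviated to a coda containing /s/ and /k/.
ONSET_SETS = [["s", "S"], ["t"], ["r"]]
VOWEL_SET = ["a", "@", "e", "i", "o", "u", "A", "E", "I", "O", "U"]
CODA_SETS = [["s", "z"], ["k", "t"]]   # /s/ precedes /k/ in the coda ordering

def pick(phoneme_set, act, threshold=0.5):
    """Most active phoneme in a mutually exclusive set, if above threshold."""
    best = max(phoneme_set, key=lambda p: act.get(p, 0.0))
    return best if act.get(best, 0.0) > threshold else None

def decode(onset_act, vowel_act, coda_act):
    onset = [p for s in ONSET_SETS if (p := pick(s, onset_act)) is not None]
    # Exception 1: the most active vowel is always included, whatever its level.
    vowel = max(VOWEL_SET, key=lambda p: vowel_act.get(p, 0.0))
    coda = [p for s in CODA_SETS if (p := pick(s, coda_act)) is not None]
    # Exception 2: an active affricate-like unit (/ks/ here) reverses its
    # components, so an /s/.../k/ coda is read out as "ks".
    if coda_act.get("ks", 0.0) > 0.5 and "s" in coda and "k" in coda:
        i, j = coda.index("s"), coda.index("k")
        coda[i], coda[j] = "k", "s"
    return "".join(onset) + vowel + "".join(coda)

# A TAX-like pattern, /t@ks/: /s/ and /k/ both active plus the /ks/ unit.
print(decode({"t": 0.9}, {"@": 0.8}, {"s": 0.9, "k": 0.9, "ks": 0.8}))  # t@ks
```

Without the /ks/ unit the same pattern would be read out as "t@sk", following the default set ordering.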

The simplicity of these encoding and decoding procedures is a significant advantage of the current representations over those used by SM89. In the latter case, reconstructing a unique string of phonemes corresponding to a pattern of activity over triples of phonemic features is exceedingly difficult, and sometimes impossible (also see Rumelhart & McClelland, 1986; Mozer, 1991). In fact, SM89 did not confront this problem—rather, they simply selected the best among a set of alternative pronunciations based on their error scores. In a sense, the SM89 model does not produce explicit pronunciations; it enables another procedure to select among alternatives. In contrast, the current decoding procedure does not require externally generated alternatives; every possible pattern of activity over the phoneme units corresponds directly and unambiguously to a particular string of phonemes. Nonetheless, it should be kept in mind that the encoding and decoding procedures are external to the network and, hence, constitute additional assumptions about the nature of the knowledge and processing involved in skilled reading, as discussed earlier.

Results

Word Reading. After 300 epochs of training, the network correctly pronounces all of the 2972 nonhomographic words in the training corpus. For each of the 13 homographs, the network produces one of the correct pronunciations, although typically the competing phonemes for the alternatives are about equally active. For example, the network pronounces LEAD as /lEd/; the activation of the /E/ is 0.56 while the activation of /e/ is 0.44. These differences reflect the relative consistency of the alternatives with the pronunciations of other words.

Given the nature of the network, this level of performance on the training corpus is optimal. As the network is deterministic, it always produces the same output for a given input. Thus, in fact, it is impossible for the network to learn to produce both pronunciations of any of the homographs. Note that this determinacy is not an intrinsic limitation of connectionist networks (see, e.g., Movellan & McClelland, 1993). It merely reflects the fact that the general principle of intrinsic variability was not included in the present simulation for practical reasons—to keep the computational demands of the simulation reasonable.


For the present purposes, the important finding is that the trained network reads both regular and exception words correctly. We are also interested in how well the network replicates the effects of frequency and consistency on naming latency. However, we will return to this issue after we consider the more pressing issue of the network's performance in reading nonwords.

Nonword Reading. We tested the network on three lists of nonwords from two empirical studies. The first two lists come from an experiment by Glushko (1979), in which he compared subjects' reading of 43 nonwords derived from regular words (e.g., HEAN from DEAN) with their reading of 43 nonwords derived from exception words (e.g., HEAF from DEAF). Although Glushko originally termed these regular nonwords and exception nonwords, respectively, they are more appropriately characterized in terms of whether their body neighborhood is consistent or not, and hence we will refer to them as consistent or inconsistent nonwords. The third nonword list comes from a study by McCann and Besner (1987), in which they compared performance on a set of 80 pseudohomophones (e.g., BRANE) with a set of 80 control nonwords (e.g., FRANE). We used only their control nonwords in the present investigation as we believe pseudohomophone effects are mediated by aspects of the reading system, such as semantics and the articulatory system, that are not implemented in our simulation (see the General Discussion).

As nonwords are, by definition, novel stimuli, exactly what counts as the "correct" pronunciation of a nonword is a matter of considerable debate (see, e.g., Masterson, 1985; Seidenberg et al., 1994). The complexity of this issue will become apparent momentarily. For the purposes of an initial comparison, we will consider the pronunciation of a nonword to be correct if it is regular, as defined by adhering to the GPC rules outlined by Venezky (1970).

Table 3 presents the correct performance of skilled readers reported by Glushko (1979) and by McCann and Besner (1987) on their nonword lists, and the corresponding performance of the network. Table 4 lists the errors made by the network on these lists.

FirstconsiderGlushko’sconsistentnonwords.Thenetworkmakesonly a singleminor mistakeon theseitems,just failingto introducethe transitional/y/ in MUNE. In fact, this inclu-sionvariesacrossdialectsof English(e.g.,DUNE

� /dUn/ vs./dyUn/). In thetrainingcorpus,thefour wordsendingin UNE

(DUNE, JUNE, PRUNE, TUNE) areall codedwithout the /y/. Inany case,overallboththenetworkandsubjecthavenodifficulton theserelatively easynonwords.

The situation is rather different for the inconsistent nonwords. Both the network and subjects produce non-regular pronunciations for a significant subset of these items, with the network being slightly more prone to do so. However, a closer examination of the responses in these cases reveals why. Consider the nonword GROOK. The grapheme OO most frequently corresponds to /U/, as in BOOT, and so the correct (regular) pronunciation of GROOK is /grUk/. However, the body OOK is almost always pronounced /u/, as in TOOK. The only exception to this among the 12 words ending in OOK in the training corpus is SPOOK → /spUk/. This suggests that /gruk/ should be the correct pronunciation.

Actually, the issue of whether the network's pronunciation is correct or not is less relevant than the issue of whether the network behaves similarly to subjects. In fact, both the subjects and the network are sensitive to the context in which vowels occur, as evidenced by their much greater tendency to produce non-regular pronunciations for inconsistent nonwords as compared with consistent nonwords. Glushko (1979) found that 80% of subjects' non-regular responses to inconsistent nonwords were consistent with some other pronunciation of the nonword's body that occurs in the Kucera and Francis (1967) corpus, leaving only 4.1% of all responses as actual errors. In the network, all of the non-regular responses to inconsistent nonwords match some other pronunciation in the training corpus for the same body, with half of these being the most frequent pronunciation of the body. None of the network's responses to inconsistent nonwords are actual errors. Overall, the network performs as well if not slightly better than subjects on the Glushko nonword lists. Appendix 2 lists all of the pronunciations accepted as correct for each of the Glushko nonwords.

Both the subjects and the network find McCann and Besner's (1987) control nonwords more difficult to pronounce, which is not surprising as the lists contain a number of orthographically unusual nonwords (e.g., JINJE, VAWX). Overall, the network's performance is slightly worse than that of subjects. However, many of the network's errors can be understood in terms of specific properties of the training corpus and network design. First, although there is no word in the training corpus with the body OWT, medial OW is often pronounced /O/ (e.g., BOWL → /bOl/) and so KOWT → /kOt/ should be considered a reasonable response. Second, two of the errors are on inflected forms, SNOCKS and LOKES, and as previously acknowledged, the network has minimal experience with inflections and is not intended to apply to them. Finally, there are no instances in the training corpus of words containing the grapheme J in the coda, and so the network cannot possibly have learned to map it to /j/ in phonology. In a way, for a nonword like JINJE, the effective input to the network is JINE, to which the network's response /jIn/ is correct. This also applies to the nonword FAIJE. Excluding these and the inflected forms from the scoring, and considering KOWT → /kOt/ correct, the network performs correctly on 69/76 (90.8%) of the remaining control nonwords, which is slightly better than the subjects. Most of the remaining errors of the network involve correspondences that are infrequent or variable in the training corpus (e.g., PH → /f/, U → /yU/).

It must be acknowledged that the failure of the model on inflected forms and on those with J in the coda are real shortcomings that would have to be addressed in a completely adequate account of word reading. Our purpose in separating out these items in the above analysis simply acknowledges that the model's limitations are easily understood in terms of specific properties of the training corpus.

Table 3
Percent of Regular Pronunciations of Nonwords

                 Glushko (1979)                              McCann and Besner (1987)
           Consistent Nonwords   Inconsistent Nonwords       Control Nonwords
Subjects         93.8                  78.3                        88.6
Network          97.7                  72.1                        85.0

Table 4
Errors by the Feedforward Network in Pronouncing Nonwords

Glushko (1979)
Nonword    Correct    Response

Consistent Nonwords (1/43)
MUNE       /myUn/     /m(y 0.43)Un/

Inconsistent Nonwords (12/43)
BILD       /bild/     /bIld/
BOST       /bost/     /bOst/
COSE       /kOz/      /kOs/
GROOK      /grUk/     /gruk/
LOME       /lOm/      /l^m/
MONE       /mOn/      /m^n/
PILD       /pild/     /pIld/
PLOVE      /plOv/     /pl^v/
POOT       /pUt/      /put/
SOOD       /sUd/      /sud/
SOST       /sost/     /s^st/
WEAD       /wEd/      /wed/

McCann and Besner (1987)
Nonword    Correct    Response

Control Nonwords (12/80)
*PHOYCE    /fYs/      /(f 0.42)Y(s 0.00)/
*TOLPH     /tolf/     /tOl(f 0.12)/
*ZUPE      /zUp/      /zyUp/
SNOCKS     /snaks/    /snask(ks 0.31)/
LOKES      /lOks/     /lOsk(ks 0.02)/
*YOWND     /yWnd/     /(y 0.47)and/
KOWT       /kWt/      /kOt/
FAIJE      /fAj/      /fA(j 0.00)/
*ZUTE      /zUt/      /zyUt/
*VEEZE     /vEz/      /(v 0.40)Ez/
*PRAX      /pr@ks/    /pr@sk(ks 0.33)/
JINJE      /jinj/     /jIn(j 0.00)/

Note: /a/ in POT, /@/ in CAT, /e/ in BED, /i/ in HIT, /o/ in DOG, /u/ in GOOD, /A/ in MAKE, /E/ in KEEP, /I/ in BIKE, /O/ in HOPE, /U/ in BOOT, /W/ in NOW, /Y/ in BOY, /^/ in CUP, /N/ in RING, /S/ in SHE, /C/ in CHIN, /Z/ in BEIGE, /T/ in THIN, /D/ in THIS. The activity levels of correct but missing phonemes are listed in parentheses. In these cases, the actual response is what falls outside the parentheses. Words marked with "*" remain errors after considering properties of the training corpus (as explained in the text).

Is it a Dual-Route Model? One possibility, consistent with dual-route theories, is that the network has partitioned itself into two sub-networks, one that reads regular words, and another that reads exception words. If this were the case, some hidden units would contribute to exception words but not to nonwords, while others would contribute to nonwords but not to exception words. To test this possibility, we measured the contribution a hidden unit makes to pronouncing a letter string by the amount of increase in cross-entropy error when the unit is removed from the network. If the network had partitioned itself, there would be a negative correlation across hidden units between the number of exception words and the number of nonwords to which each hidden unit makes a substantial contribution (defined as greater than 0.2). In fact, for the Taraban and McClelland (1987) exception words and a set of orthographically matched nonwords (listed in Appendix 1), there is a moderate positive correlation between the numbers of exception words and nonwords to which hidden units contribute (r = .25, t(98) = 2.59, p = .011; see Figure 4). Thus, some units are more important for the overall task and some are less important, but the network has not partitioned itself into one system that learns the rules and another system that learns the exceptions.
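The contribution measure (increase in cross-entropy error when a hidden unit is removed) can be sketched as a lesioning loop over hidden units. The network details below are placeholders, and the function names are ours:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(s, t, eps=1e-12):
    # Equation 4 (base-2 logs), clipped away from 0 and 1 for numerical safety.
    s = np.clip(s, eps, 1 - eps)
    return -np.sum(t * np.log2(s) + (1 - t) * np.log2(1 - s))

def output_error(inp, target, W1, b1, W2, b2, removed=None):
    hidden = logistic(inp @ W1 + b1)
    if removed is not None:
        hidden = hidden.copy()
        hidden[removed] = 0.0          # "remove" the unit by silencing it
    return cross_entropy(logistic(hidden @ W2 + b2), target)

def contributions(inp, target, W1, b1, W2, b2):
    """Increase in cross-entropy error from removing each hidden unit in turn."""
    base = output_error(inp, target, W1, b1, W2, b2)
    return np.array([
        output_error(inp, target, W1, b1, W2, b2, removed=h) - base
        for h in range(W1.shape[1])
    ])
```

A unit would then count as contributing substantially to an item when its entry exceeds 0.2, and correlating these counts across the exception-word and nonword lists is what tests the partitioning hypothesis.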

Frequency and Consistency Effects. It is important to verify that, in addition to producing good nonword reading, the new model replicates the basic effects of frequency and consistency in naming latency. Like the SM89 network, the current network takes the same amount of time to compute the pronunciation of any letter string. Hence, we must also resort to using an error score as an analogue of naming latency. In particular, we will use the cross-entropy between the network's generated pronunciation of a word and its correct pronunciation, as this is the measure that the network was trained to minimize. Later we will examine the effects of frequency and consistency directly in the settling time of an equivalently trained recurrent network when pronouncing various types of words.

Figure 5 shows the mean cross-entropy error of the network in pronouncing words of varying degrees of spelling-sound consistency as a function of frequency. Overall, high-frequency words produce less error than low-frequency words (F(1,184) = 17.1, p < .001). However, frequency interacts significantly with consistency (F(3,184) = 5.65, p = .001). Post-hoc comparisons within each word type separately reveal that the effect of frequency reaches significance at the 0.05 level only for exception words (although the effect for regular inconsistent words is significant at 0.053). The effect of frequency among all regular words (consistent and inconsistent) just fails to reach significance (F(1,94) = 3.14, p = .08).

There is also a main effect of consistency in the error made


Figure 4. The numbers of exception words and nonwords (n = 48 for each, listed in Appendix 1) to which each hidden unit makes a significant contribution, as indicated by an increase in cross-entropy error of at least 0.2 when the unit is removed from the network. Each circle represents one or more hidden units, with the size of the circle proportional to the number of hidden units making significant contributions to the indicated numbers of exception words and nonwords.


Figure 5. Mean cross-entropy error produced by the feedforward network for words with various degrees of spelling-sound consistency (listed in Appendix 1) as a function of frequency.


by the network in pronouncing words (F(3,184) = 24.1, p < .001). Furthermore, collapsed across frequency, all post-hoc pairwise comparisons of word types are significant. Specifically, regular consistent words produce less error than regular inconsistent words, which in turn produce less error than ambiguous words, which in turn produce less error than exception words. Interestingly, the effect of consistency is significant considering only high-frequency words (F(3,92) = 12.3, p < .001). All pairwise comparisons are also significant except between exception words and ambiguous words. This contrasts with the performance of normal subjects, who typically show little or no effect of consistency among high-frequency words (e.g., Seidenberg, 1985; Seidenberg et al., 1984).

Summary

A feedforward connectionist network was trained on an extended version of the SM89 corpus of monosyllabic words, using orthographic and phonological representations that condense the regularities between these domains. After training, the network reads regular and exception words flawlessly and yet also reads pronounceable nonwords (Glushko, 1979; McCann & Besner, 1987) essentially as well as skilled readers. Minor discrepancies in performance can be ascribed to nonessential aspects of the simulation. Critically, the network had not segregated itself over the course of training into separate mechanisms for pronouncing exception words and nonwords. Thus, the network directly refutes the claims of dual-route theorists that skilled word reading requires the separation of lexical and sublexical procedures for mapping print to sound.

Furthermore, the error produced by the network on various types of words, as measured by the cross entropy between the generated and correct pronunciations, replicates the standard findings of frequency, consistency, and their interaction in the naming latencies of subjects (Andrews, 1982; Seidenberg, 1985; Seidenberg et al., 1984; Taraban & McClelland, 1987; Waters & Seidenberg, 1985). A notable exception, however, is that, unlike subjects and the SM89 network, the current network exhibits a significant effect of consistency among high-frequency words.

Analytic Account of Frequency and Consistency Effects

The empirical finding that naming latencies for exception words are slower and far more sensitive to frequency than those for regular words has often been interpreted as requiring explicit lexical representations and grapheme-phoneme correspondence rules. By recasting regularity effects in terms of spelling-sound consistency (Glushko, 1979; Jared et al., 1990), the SM89 network and the one presented in the previous section reproduce the empirical phenomena without these properties. What, then, are the properties of these networks (and of the human language system, on our account) that give rise to the observed pattern of frequency and consistency effects?

The relevant empirical pattern of results can be described in the following way. In general, high-frequency words are named faster than low-frequency words, and words with greater spelling-sound consistency are named faster than words with less consistency. However, the effect of frequency diminishes as consistency is increased, and the effect of consistency diminishes as frequency is increased. A natural interpretation of this pattern is that frequency and consistency contribute independently to naming latency, but that the system as a whole is subject to what might be termed a gradual ceiling effect: the magnitude of increments in performance decreases as performance improves. Thus, if either the frequency or the consistency of a set of words is sufficiently high on its own to produce fast naming latencies, increasing the other factor will yield little further improvement.

A close analysis of the operation of connectionist networks reveals that these effects are a direct consequence of properties of the processing and learning in these networks—specifically, the principles of Nonlinearity, Adaptivity, and Distributed Representations and Knowledge referred to earlier. In a connectionist network, the weight changes induced by a word during training serve to reduce the error on that word (and hence, by definition, its naming latency). The frequency of a word is reflected in how often it is presented to the network (or, as in the previous simulation, in the explicit scaling of the weight changes it induces). Thus, word frequency directly amplifies weight changes that are helpful to the word itself.

The consistency of the spelling-sound correspondences of two words is reflected in the similarity of the orthographic and phonological units that they activate. Furthermore, two words will induce similar weight changes to the extent that they activate similar units. Given that the weight changes induced by a word are superimposed on the weight changes for all other words, a word will tend to be helped by the weight changes for words whose spelling-sound correspondences are consistent with its own (and, conversely, hindered by the weight changes for inconsistent words). Thus, frequency and consistency effects contribute independently to naming latency because they both arise from similar weight changes that are simply added together during training.

Over the course of training, the magnitudes of the weights in the network increase in proportion to the accumulated weight changes. These weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the nonlinearity of the input-output function of units, these changes do not translate directly into proportional reductions in error. Rather, as the magnitude of the summed inputs to output units increases, their states gradually asymptote towards 0 or 1. As a result, a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although



Figure 6. A simple network for analyzing frequency and consistency effects and the sigmoidal input-output function of its units.

frequency and consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
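The gradual ceiling effect can be illustrated numerically: an identical additive contribution to the summed input produces a smaller output change the further the unit already sits into the sigmoid's tail. The function tanh(x/2) and the particular input values below are illustrative stand-ins.

```python
import math

# The gradual ceiling effect: identical increments to the summed input yield
# smaller output changes the further the unit is into the sigmoid's tail.
# tanh(x/2) serves here as a stand-in for the (-1, 1) logistic of Figure 6.

def sigmoid(x):
    return math.tanh(x / 2.0)

consistency = 1.0               # identical contribution in both cases
low_freq, high_freq = 1.0, 3.0  # hypothetical summed-input contributions

gain_low = sigmoid(low_freq + consistency) - sigmoid(low_freq)
gain_high = sigmoid(high_freq + consistency) - sigmoid(high_freq)
```

The same consistency increment moves a low-frequency unit's state substantially but a high-frequency unit's state only slightly, which is the interaction depicted in Figure 7.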

The Frequency-Consistency Equation

To see the effects of frequency and consistency in connectionist networks more directly, it will help to consider a network that embodies some of the same general principles as the SM89 and feedforward networks, but which is simple enough to permit a closed-form analysis (following Anderson, Silverstein, Ritz, & Jones, 1977; also see Stone, 1986). In particular, consider a nonlinear network without hidden units and trained with a correlational (Hebbian) rather than error-correcting learning rule (see Figure 6). Such a network is a specific instantiation of Van Orden et al.'s (1990) covariant learning hypothesis. To simplify the presentation, we will assume that input patterns are composed of 1's and 0's, output patterns are specified in terms of +1's and −1's, connection weights are all initialized to zero, and units have no bias terms. We will derive an equation that expresses in concise form the effects of frequency and consistency in this network on its response to any given input.

A learning trial involves setting the states of the input units to the input pattern (e.g., orthography) for a word, setting the output units to the desired output pattern (e.g., phonology) for the word, and adjusting the weight from each input unit to each output unit according to

Δw_ij = ε s_i s_j    (6)

where ε is a learning rate constant, s_j is the state of input unit j, s_i is the state of output unit i, and w_ij is the weight on the connection between them. After each input-output training pattern is presented once in this manner, the value of each connection weight is simply the sum of the weight changes for each individual pattern:

w_ij = ε Σ_p s_i^[p] s_j^[p]    (7)

where p indexes individual training patterns.

After training, the network's performance on a given test pattern is determined by setting the states of the input units to the appropriate input pattern and having the network compute the states of the output units. In this computation, the state of each output unit is assumed to be a nonlinear, monotonically increasing function of the sum, over input units, of the state of the input unit times the weight on the connection from it:

s_i^[t] = σ(Σ_j s_j^[t] w_ij)    (8)

where t is the test pattern and σ(·) is the nonlinear input-output function. An example of such a function, the standard logistic function commonly used in connectionist networks, is shown in Figure 6. The input-output function of the output units need not be this particular function, but it must have certain of its properties: it must vary monotonically with input, and it must approach its extremal values (here, ±1) at a diminishing rate as the magnitude of the summed input increases (positively or negatively). We call such functions sigmoid functions.

We can substitute the derived expression for each weight w_ij from Equation 7 into Equation 8, and pull the constant term ε out of the summation over j to obtain

s_i^[t] = σ(ε Σ_j s_j^[t] Σ_p s_i^[p] s_j^[p])    (9)

This equation indicates that the activation of each output unit reflects a sigmoid function of the learning rate constant ε times a sum of terms, each consisting of the activation of one of the input units in the test pattern times the sum, over all training patterns, of the activation of the input unit times the activation of the output unit. In our present formulation, where the input unit's activation is 1 or 0, this sum reflects the extent to which the output unit's activation tends to be equal to 1 when the input unit's activation is equal to 1. Specifically, it will be exactly equal to the number of times the output unit is equal to 1 when the input unit is equal to 1, minus the number of times the output unit is equal to −1 when the input unit is equal to 1. We can see from Equation 9 that if, over an entire ensemble of training patterns, there is a consistent value of the activation of an output unit when an input unit is active, then the connection weights between them will come to reflect this. If the training patterns come from a completely regular environment, such that each output's activation depends on only one input unit and is completely uncorrelated with the


activation of every other input unit, then all the weights to each output unit will equal 0 except the weight from the particular input unit on which it depends. (If the training patterns are sampled randomly from a larger space of patterns, the sample will not reflect the true correlations exactly, but will be scattered approximately normally around the true value.) Thus, the learning procedure discovers which output units depend on which input units, and sets the weights accordingly. For our purposes in understanding quasi-regular domains, in which the dependencies are not so discrete in character, the weights will come to reflect the degree of consistency between each input unit and each output unit, over the entire ensemble of training patterns.

Equation 9 can be written a different way to reflect a relationship that is particularly relevant to the word reading literature, in which the frequency of a particular word and the consistency of its pronunciation with the pronunciations of other, similar words are known to influence the accuracy and latency of pronunciation. The rearrangement expresses a very revealing relationship between the output at test and the similarity of the test pattern to each input pattern:

s_i^[t] = σ(ε Σ_p s_i^[p] Σ_j s_j^[p] s_j^[t])    (10)

This expression shows the relationship between the state of an output unit at test as a function of its states during training and the similarity between the test input pattern and each training input pattern, measured in terms of their dot product, Σ_j s_j^[p] s_j^[t]. For input patterns consisting of 1's and 0's, this measure amounts to the number of 1's the two patterns have in common, which we refer to as the overlap of training pattern p and test pattern t and designate O^[pt]. Substituting into the previous expression, we find that the state of an output unit at test reflects the sum over all training patterns of the unit's output for that pattern times the overlap of the pattern with the test pattern.

s_i^[t] = σ(ε Σ_p s_i^[p] O^[pt])    (11)

Notice that the product s_i^[p] O^[pt] is a measure of the input-output consistency of the training and test patterns. To see this, suppose that the inputs for the training and testing patterns have considerable overlap. Then the contribution of the training pattern depends on the sign of the output unit's state for that pattern. If this sign agrees with that of the appropriate state for the test pattern (i.e., the two patterns are consistent) the training pattern will help to move the state of the output unit towards the appropriate extremal value for the test pattern. However, if the signs of the states for the training and test patterns disagree (i.e., the patterns are inconsistent), performance on the test pattern is worse for having learned the training pattern. As the input for the training pattern becomes less similar to that of the test pattern, reducing O^[pt], the impact of their consistency on test performance diminishes.

To clarify the implications of the above equation, it will help to consider some simple cases. First, suppose that the network is trained on only one pattern, and tested with a variety of patterns. Then the state of each output unit during testing will be a monotonic function of its value in the training pattern times the overlap of the training and test input patterns. As long as there is any overlap in these patterns, the test output will have the same sign as the training output, and its magnitude will increase with the overlap between the test pattern and training pattern. Thus, the response of each output unit varies with the similarity of the test pattern to the pattern used in training.

As a second example, suppose we test only on the training pattern itself, but vary the number of training trials on the pattern. In this case, the summation over the training patterns in the above equation reduces to a count of the number of training presentations of the pattern. Thus, the state of the output unit on this pattern will approach its correct asymptotic value of ±1 as the number of training presentations increases.
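This second case can be read directly off Equation 11: with a single training pattern tested on itself, the summed input is just ε times the number of presentations times the self-overlap. A minimal sketch, with assumed values for ε, the self-overlap, and the sigmoid:

```python
import math

# Equation 11 when testing on the training pattern itself: the summed input
# grows linearly with the number of presentations N, so the output state
# approaches its +1 asymptote at a diminishing rate. eps, the self-overlap,
# and the tanh(x/2) sigmoid are illustrative assumptions.

def sigmoid(x):
    return math.tanh(x / 2.0)   # stand-in for the (-1, 1) logistic

eps, self_overlap, target = 0.1, 4, 1
states = [sigmoid(eps * n * target * self_overlap) for n in (1, 5, 20)]
```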

Finally, consider the more general case in which several different input-output patterns are presented during training, with each one presented some number of times. Then, elaborating Equation 11, the state of an output unit at test can be written as

s_i^[t] = σ(ε Σ_p F^[p] s_i^[p] O^[pt])    (12)

where F^[p] is the number (frequency) of training presentations of pattern p.

We will refer to Equation 12 as the frequency-consistency equation. Relating this equation to word and nonword reading simply involves identifying the input to the network with a representation of the spelling of a word, and the output of the network with a representation of its pronunciation. Given the assumption that stronger activations correspond to faster naming latencies, we can use the frequency-consistency equation to derive predictions about the relative naming latencies of different types of words. In particular, the equation provides a basis for understanding why naming latency depends on the frequency of a word, F^[p], and the consistency of its spelling-sound correspondences with those of other words, s_i^[p] O^[pt]. It also accounts for the fact that the effect of consistency diminishes as the frequency of the word increases (and vice versa), since high-frequency words push the value of the sum out into the tail of the input-output function, where influences of other factors are reduced (see Figure 7).

Quantitative Results with a Simple Corpus

To make the implications of the frequency-consistency equation more concrete, suppose a given output unit should have a value of +1 if a word's pronunciation contains the vowel /I/ (as in DIVE) and −1 if it contains the vowel /i/ (as in GIVE). Suppose further that we have trained the network on a set of words



Figure 7. A frequency-by-consistency interaction arising out of applying an asymptoting output activation function to the additive input contributions of frequency (solid arrows) and consistency (dashed arrows). Notice in particular that the identical contribution from consistency has a much weaker effect on high-frequency words than on low-frequency words. Only the top half of the logistic activation function is shown. HF = high frequency; LF = low frequency; RC = regular consistent; E = exception.

ending in IVE which all contain either /I/ or /i/ as the vowel. Then the frequency-consistency equation tells us immediately that the response to a given test input should reflect the influence of every one of these words to some degree. Holding all else constant, the higher the frequency of the word, the more closely the output will approach the desired value. Holding the frequency of the word itself constant, the more other similar words agree with its pronunciation (and the higher their frequency), the more closely the output will approach the correct extremal value. The distance from the desired value will vary continuously with the difference between the total influence of the neighbors that agree with the word and the neighbors that disagree, with the contribution of each neighbor weighted by its similarity to the word and its frequency. When the word itself has a high frequency, it will tend to push the activation close to the correct extreme. Near the extremes, the slope of the function relating the summed input to the state of the output unit becomes relatively shallow, so the influence of the neighbors is diminished.

To illustrate these effects, Figure 8 shows the cross-entropy error for a particular output unit as we vary the frequency of the word being tested and its consistency with 10 other, overlapping words (also see Van Orden, 1987). For simplicity, we assume that all ten words have a frequency of 1.0 and an overlap of 0.75 with the test word—this would be true, for example, if input units represented letters and words differed in a single


Figure 8. The effects of frequency and consistency in a network without hidden units trained with correlational (Hebbian) learning (ε = 0.2 in Equation 12).

letter out of four. Four degrees of consistency are examined: (a) exception words (e.g., GIVE), for which all but one of the ten neighbors disagree with the test word on the value of the output unit; (b) ambiguous words (e.g., PLOW), for which the neighbors are split evenly between those that agree and those that disagree; (c) regular inconsistent words (e.g., DIVE), for which most neighbors agree but two disagree (namely GIVE and LIVE); and (d) regular consistent words (e.g., DUST), for which all neighbors agree on the value of the output unit. In the present analysis, these different cases are completely characterized in terms of a single variable: the consistency of the pronunciation of the vowel in the test word with its pronunciation in other words with overlapping spellings. The analysis clearly reveals a graded effect of consistency that diminishes with increasing frequency.
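Under the stated assumptions (ten neighbors of frequency 1 and overlap 0.75), the four conditions follow directly from the frequency-consistency equation. The tanh(x/2) sigmoid and the rescaling used to form a cross-entropy error are assumptions about details left implicit here; the sketch is meant to reproduce the qualitative pattern of Figure 8, not its exact values.

```python
import math

# The four consistency conditions behind Figure 8, computed from the
# frequency-consistency equation with eps = 0.2: ten neighbors of frequency 1
# and overlap 0.75, varying the test word's own frequency. The (-1, 1) sigmoid
# and the (0, 1) rescaling for cross-entropy are assumptions.

def error(test_freq, n_agree, eps=0.2, n_neighbors=10, overlap=0.75):
    neighbor_sum = (n_agree - (n_neighbors - n_agree)) * overlap
    s = math.tanh(eps * (test_freq + neighbor_sum) / 2.0)  # output state
    p = (1.0 + s) / 2.0       # rescale the (-1, 1) state to (0, 1)
    return -math.log(p)       # cross-entropy against a +1 target

conditions = {"exception": 1, "ambiguous": 5,
              "regular inconsistent": 8, "regular consistent": 10}
errors_f5 = {name: error(5, k) for name, k in conditions.items()}
errors_f20 = {name: error(20, k) for name, k in conditions.items()}
```

The error ordering (exception > ambiguous > regular inconsistent > regular consistent) holds at every frequency, and the exception-versus-consistent gap shrinks as frequency grows.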

Error Correction and Hidden Units

It should be noted that the Hebbian approach described here does not, in fact, provide an adequate mechanism for learning the spelling-sound correspondences in English. For this, we require networks with hidden units trained using an error-correcting learning rule such as back-propagation. In this section we take some steps in the direction of extending the analyses to these more complex cases.

First we consider the implications of using an error-correcting learning rule rather than Hebbian learning, still within a network with no hidden units. Back-propagation is a generalization of one such rule, known as the delta rule (Widrow & Hoff, 1960). The first observation is that, when using the delta rule, the change in weight w_ij due to training on pattern p is proportional to the state of the input unit, s_j^[p], times the partial derivative of the error on pattern p with respect to


the summed input to the output unit i, δ_i^[p], rather than simply times the correct state of unit i, s_i^[p] (cf. Equation 6). As a result, Equation 12 becomes

s_i^[t] = σ(ε Σ_p F^[p] δ_i^[p] O^[pt])    (13)

Matters are more complex here because δ_i^[p] depends on the actual performance of the network on each trial. However, δ_i^[p] will always have the same sign as s_i^[p], because an output unit's error always has the same sign as its target as long as the target is an extremal value of the activation function (±1 here), and because only unit i is affected by a change to its input. Thus, as in the Hebbian case, training on a word that is consistent with the test word will always help unit i to be correct, and training on an inconsistent word will always hurt, thereby giving rise to the consistency effect.

The main difference between the Hebb rule and the delta rule is that, with the latter, if a set of weights exists that allows the network to produce the correct output for each training pattern, the learning procedure will eventually converge to it.8

This is generally not the case with Hebbian learning, which often results in responses for some cases that are incorrect. To illustrate this, we consider applying the two learning rules to a training set for which a solution does exist. The solution is found by the delta rule and not by the Hebb rule.

The problem is posed within the framework we have already been examining. The specific network consists of 11 input units (with values of 0 and 1) representing letters of a word. The input units send direct connections to a single output unit that should be +1 if the pronunciation of the word contains the vowel /I/ but −1 if it contains the vowel /i/. Table 5 shows the input patterns and the target output for each case, as well as the net inputs and activations that result from training with each learning rule. There are 10 items in the training set, six with the body INT and four with the body INE. The INE words all take the vowel /I/, so for these the vowel has a target activation of +1; five of the INT words take /i/, so the vowel has a target of −1. The INT words also include the exception word PINT that takes the vowel /I/. For this analysis, each word is given an equal frequency of 1.

Table 6 lists the weights from each input unit to the output unit that are acquired after training with each learning rule. For the Hebb rule, this involved 5 epochs of training using a learning rate ε = 0.1. The resulting weights are equal to

8 Actually, given the use of extremal targets and an asymptoting activation function, no set of finite weights will reduce the error to zero. In this case, a "solution" consists of a set of weights that produces outputs that are within some specified tolerance (e.g., 0.1) of the target value for every output unit in every training pattern. If a solution exists that produces outputs that all have the correct sign (i.e., tolerance of 1.0, given targets of ±1), then a solution also exists for any smaller tolerance because multiplying all the weights by a large enough constant will push the output of the sigmoid arbitrarily close to its extreme values without affecting its sign.

0.5 (the number of epochs times the learning rate) times the number of training items in which the letter is present and the vowel is /I/, minus the number of items in which the letter is present and the vowel is /i/. Specifically, the letters L and M occur once with /I/ and once with /i/, so their weight is 0; the letters I and N occur five times with /I/ and five times with /i/, so their weights are also 0. Final E and final T have the largest magnitude weights; E is strongly positive because it occurs four times with /I/ and never with /i/, and T is strongly negative because it occurs five times with /i/ and only once with /I/. F is weakly positive since it occurs once with /I/, and D, H and onset T are weakly negative since each occurs once with /i/. P is moderately positive, since it occurs twice with /I/—once in PINE and once in PINT. Thus, these weights directly reflect the co-occurrences of letters and phonemes.

The outputs of the network when using the weights produced by the Hebb rule, shown in Table 5, illustrate the consistency effect, both in net inputs and in activations. For example, the net input for FINE is stronger than for LINE, because LINE is more similar to the inconsistent LINT; and the net input for PINE is stronger than for LINE, since PINE benefits from its similarity with PINT, which has the same correspondence. However, the weights do not completely solve the task: For the word PINT, the net input is −1.0 (1.0 from the P minus 2.0 from the T), and passing this through the logistic function results in an activation of −0.46, which is quite different from the target value of +1. What has happened is that PINT's neighbors have cast slightly more votes for /i/ than for /I/.

Now consider the results obtained using the delta rule. In this case, we trained the network for 20 epochs, again with a learning rate of 0.1. The overall magnitude of the weights is comparable to the Hebb rule case with only 5 epochs because, with the delta rule, the weight changes get smaller as the error gets smaller, and so the cumulative effect generally tends to be less. More importantly, though, when the delta rule is used, the same general effects of consistency are observed, but now the response to PINT, though weaker than other responses, has the right sign. The reason for this is that the cumulative weight changes caused by PINT are actually larger than those caused by other items, because after the first epoch, the error is larger for PINT than for other items. Error-correcting learning eventually compensates for this but, before learning has completely converged, the effects of consistency are still apparent.
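The two learning rules can be run on the Table 5 corpus in a few lines. The tanh(x/2) activation, the exact form of the delta-rule error term, and the larger delta-rule epoch count (used here to ensure convergence, in place of the 20 epochs above) are assumptions, so the sketch reproduces the qualitative outcome, Hebbian learning leaving PINT with the wrong sign while error correction recovers it, rather than the exact Table 6 weights.

```python
import math

# Hebbian versus error-correcting learning on the ten-word INT/INE corpus of
# Table 5: eleven letter units feeding one vowel unit (+1 = /I/, -1 = /i/).
# tanh(x/2) stands in for the logistic; the delta-rule error term and the
# epoch counts are illustrative assumptions.

#          D  F  H  L  M  P  T  I  N  E  T(final)
CORPUS = {"DINT": ([1,0,0,0,0,0,0,1,1,0,1], -1),
          "HINT": ([0,0,1,0,0,0,0,1,1,0,1], -1),
          "LINT": ([0,0,0,1,0,0,0,1,1,0,1], -1),
          "MINT": ([0,0,0,0,1,0,0,1,1,0,1], -1),
          "PINT": ([0,0,0,0,0,1,0,1,1,0,1], +1),
          "TINT": ([0,0,0,0,0,0,1,1,1,0,1], -1),
          "FINE": ([0,1,0,0,0,0,0,1,1,1,0], +1),
          "LINE": ([0,0,0,1,0,0,0,1,1,1,0], +1),
          "MINE": ([0,0,0,0,1,0,0,1,1,1,0], +1),
          "PINE": ([0,0,0,0,0,1,0,1,1,1,0], +1)}

def act(x, w):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) / 2.0)

def train(rule, epochs, eps=0.1):
    w = [0.0] * 11
    for _ in range(epochs):
        for x, t in CORPUS.values():
            err = t if rule == "hebb" else (t - act(x, w))
            for j in range(11):
                w[j] += eps * x[j] * err
    return w

w_hebb = train("hebb", epochs=5)     # matches the 5 Hebbian epochs above
w_delta = train("delta", epochs=500)
```

After Hebbian training the weight on final E is 2.0 and on final T is −2.0, as in Table 6, and PINT's activation has the wrong sign; the error-correcting rule drives all ten items, PINT included, to the correct sign.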

The error-correcting learning process causes an alteration in the relative weighting of the effects of neighbors, by assigning greater relative weight to those aspects of each input pattern that differentiate it from inconsistent patterns (see Table 6). This is why the weight tends to accumulate on P, which distinguishes PINT from the inconsistent neighbors DINT, HINT, LINT, MINT, and TINT. Correspondingly, the weights for D, H, and T are slightly more negative (relative to the Hebb weights) to accentuate the differentiation of DINT, HINT, and TINT from PINT. The effect of consistency, then, is still present when the delta rule is used but, precisely because it makes the biggest


Table 5
Input Patterns, Targets, and Activations after Training with Hebb Rule and Delta Rule

                 Letter Inputs                    Hebb Rule       Delta Rule
Word    D F H L M P T I N E T    Target          Net     Act     Net     Act
DINT    1 0 0 0 0 0 0 1 1 0 1      -1           -2.5   -0.85   -2.35   -0.82
HINT    0 0 1 0 0 0 0 1 1 0 1      -1           -2.5   -0.85   -2.29   -0.82
LINT    0 0 0 1 0 0 0 1 1 0 1      -1           -2.0   -0.76   -1.70   -0.69
MINT    0 0 0 0 1 0 0 1 1 0 1      -1           -2.0   -0.76   -1.70   -0.69
PINT    0 0 0 0 0 1 0 1 1 0 1      +1           -1.0   -0.46    0.86    0.41
TINT    0 0 0 0 0 0 1 1 1 0 1      -1           -2.5   -0.85   -2.25   -0.81
FINE    0 1 0 0 0 0 0 1 1 1 0      +1            2.5    0.85    3.31    0.93
LINE    0 0 0 1 0 0 0 1 1 1 0      +1            2.0    0.76    2.52    0.85
MINE    0 0 0 0 1 0 0 1 1 1 0      +1            2.0    0.76    2.52    0.85
PINE    0 0 0 0 0 1 0 1 1 1 0      +1            3.0    0.91    5.09    0.98
Note: "Net" is the net input of the output unit; "Act" is its activation.

Table 6
Weights from Letter Units to Output Unit after Training with Hebb Rule and Delta Rule

              Letter Units
              D      F      H      L      M      P      T      I      N      E      T
Hebb Rule   -0.50   0.50  -0.50   0.00   0.00   1.00  -0.50   0.00   0.00   2.00  -2.00
Delta Rule  -0.84   0.59  -0.77  -0.19  -0.18   2.37  -0.73   0.24   0.24   2.23  -1.99


changes where the errors are greatest, the delta rule tends to counteract the consistency effect.

A related implication of using error-correcting learning concerns the degree to which an output unit comes to depend on different parts of the input. If a particular input-output correspondence is perfectly consistent (e.g., onset B → /b/), so that the state of a given output unit is predicted perfectly by the states of particular input units, the delta rule will set the weights from all other input units to 0, even if they are partially correlated with the output unit. By contrast, when a correspondence is variable (e.g., vowel I → /i/ vs. /I/), so that no input unit on its own can predict the state of the output unit, the delta rule will develop significant weights from the other parts of the input (e.g., consonants) that disambiguate the correspondence. Thus, if there is a componential correspondence, as for most consonants, other partial correspondences will not be exploited; however, when componentiality breaks down, as it often does with vowels, there will be a greater reliance on context and therefore a greater consistency effect.

For some tasks, including English word reading, no set of weights in a two-layer network that maps letters to phonemes will work for all of the training patterns (see Minsky & Papert, 1969). In such cases, hidden units that mediate between the input and output units are needed to achieve adequate performance.9 Things are considerably more complex in networks with hidden units, but Equation 13 still provides some guidance. The complexity comes from the fact that, for an output unit, O^[pt] reflects the similarities of the patterns of activation for training pattern p and test pattern t over the hidden units rather than over the input units. Even so, hidden units have the same tendency as output units to give similar output to similar inputs, as they use the same activation function. In fact, Equation 13 applies to them as well if δ_i^[p] is interpreted as the partial derivative of the error over all output units with respect to the summed input to the hidden unit i. The values of particular weights and the nonlinearity of the activation function can make hidden units relatively sensitive to some dimensions of similarity and relatively insensitive to others, and can even allow hidden units to respond to particular combinations of inputs and not to other, similar combinations. Thus, from the perspective of the output units, hidden units re-represent the input patterns so as to alter their relative similarities. This is critical for learning complex mappings like those in the English spelling-to-sound system. Phoneme units respond on the basis of hidden-layer similarity, and they must respond quite differently to exception words than to their inconsistent neighbors in order for all of them to be pronounced correctly. Thus, by altering the effective similarities among input patterns, a network with hidden units can overcome the limitations of one with only input and output units. The process of learning to be sensitive to relevant input combinations occurs relatively slowly, however, because it goes against the network's inherent tendency toward making similar responses to similar inputs.

9 An alternative strategy for increasing the range of tasks that can be solved by a two-layer network is to add additional input units that explicitly code relevant combinations of the original input units (see Gluck & Bower, 1988; Marr, 1969; Rumelhart, Hinton, & Williams, 1986a, for examples). In the domain of word reading, such higher-order units have been hand-specified by the experimenter as input units (Norris, 1994), hand-specified but activated from the input units as a separate pathway (Reggia, Marsland, & Berndt, 1988), or learned as hidden units in a separate pathway (Zorzi, Houghton, & Butterworth, 1995).

The fact that hidden units can be sensitive to higher-order combinations of input units has important implications for understanding body-level consistency effects. In a one-layer network without hidden units, the contribution of an input unit to the total signal received by an output unit summed over all its input is unconditional; that is, the contribution of each input unit is independent of the state of the other input units. As mentioned earlier, however, the pronunciations of vowels cannot typically be predicted from individual letters or graphemes. Rather, the correlations between vowel graphemes and phonemes are highly conditional on the presence of particular consonant graphemes. For example, the mapping from I to /i/ is inconsistent, but the mapping from I to /i/ is perfectly reliable in the context of a coda consisting only of the letter N (e.g., PIN, WIN, THIN, etc.). In English, the predictiveness of vowels conditional on codas is generally greater than that of vowels conditional on onsets (Treiman et al., in press). Consequently, a multi-layer network will be aided in generating appropriate vowel pronunciations by developing hidden units that respond to particular combinations of orthographic vowels and codas (i.e., word bodies). Even when the coda is taken into account, however, its correlation with the vowel pronunciation may be less than perfect (e.g., I in the context of _NT in MINT vs. PINT). In this case, the choice of vowel must be conditioned by both the onset and coda for the correspondence to be reliable. Because of the fact that hidden units tend to make similar responses to similar inputs, hidden units that respond to an entire input pattern and contribute to a nonstandard vowel pronunciation (e.g., I → /I/ in the context of P_NT) will tend to be partially active when similar words are presented (e.g., MINT). These will tend to produce interference at the phoneme level, giving rise to a consistency effect. It is important to note, however, that a multi-layer network will exhibit consistency effects only when trained on tasks that are at least partially inconsistent—that is, quasi-regular; as in one-layer networks using the delta rule, if the training environment involves only componential correspondences, hidden units will learn to ignore irrelevant aspects of the input.

In summary, a broad range of connectionist networks, when trained in a quasi-regular environment, exhibit the general trends that have been observed in human experimental data: robust consistency effects that tend to diminish with experience, both with specific items (i.e., frequency) and with the entire ensemble of patterns (i.e., practice). These factors are among the most important determinants of the speed and accuracy with which people read words aloud.

Understanding Normal and Impaired Word Reading 26

Balancing Frequency and Consistency

The results of these analyses concur with the findings in empirical studies and in the SM89 and feedforward network simulations: there is an effect of consistency that diminishes with increasing frequency. Furthermore, details of the analytic results are also revealing. In particular, the extent to which the effect of consistency is eliminated in high-frequency words depends on just how frequent they are relative to words of lower frequency. In fact, this effect may help to explain the discrepancy between the findings in the feedforward network and those in the SM89 network—namely, the existence of consistency effects among high-frequency words in the former but not in the latter (and not generally in empirical studies). At first glance, it would appear that the pattern observed in the feedforward network matches one in which the high-frequency words are of lower frequency relative to the low-frequency words (e.g., a frequency of 10 in Figure 8) than in the SM89 network (e.g., a frequency of 20). This is not literally true, however, because the same (logarithmically compressed) word frequencies were used in the two simulations.

A better interpretation is that, in the feedforward network, the effect of consistency is stronger than in the SM89 network and, relative to this, the effect of frequency appears weaker. As described earlier, the orthographic and phonological representations used by SM89, based on context-sensitive triples of letters and phonemes, disperse the regularities between the written and spoken forms of words. This has two relevant effects in the current context. The first is to reduce the extent to which the training on a given word improves performance on other words that share the same spelling-sound correspondences, and impairs performance on words that violate those correspondences. As illustrated earlier with the words LOG, GLAD, and SPLIT, even though a correspondence may be the same in a set of words, they may activate different orthographic and phonological units. As mentioned above, the weight changes induced by one word will help another only to the extent that they activate similar units (i.e., as a function of their overlap). This effect is particularly important for low-frequency regular words, for which performance depends primarily on support from higher frequency words rather than from training on the word itself. In contrast, the new representations condense the regularities between orthography and phonology, so that weight changes for high-frequency words also improve performance on low-frequency words with the same spelling-sound correspondences to a greater extent. Thus, there is an effect of frequency among regular words in the SM89 network but not in the feedforward network. For the same reason, in the SM89 network, performance on an exception word is less hindered by training on regular words that are inconsistent with it. It is almost as if regular words in the SM89 network behave like regular inconsistent words in the feedforward network, and exception words behave like ambiguous words: the support or interference they receive from similar words is somewhat reduced (see Figure 9).

The SM89 representations also reduce the effect of consistency in an indirect manner, by improving performance on exception words. This arises because the orthographic representations contain units that explicitly indicate the presence of context-sensitive triples of letters. Some of these triples correspond to onset-vowel combinations and to word bodies (e.g., PIN, INT) that can directly contribute to the pronunciation of exception words (PINT). In contrast, although the new orthographic representations contain multiletter graphemes, none of them include both consonants and vowels, or consonants from both the onset and coda. Thus, for example, the orthographic units for P, I, N, and T contribute independently to the hidden representations. It is only at the hidden layer that the network can develop context-sensitive representations in order to pronounce exception words correctly, and it must learn to do this only on the basis of its exposure to words of varying frequency.

Nonetheless, it remains true that the pattern of frequency and consistency effects in the SM89 network better replicates the findings in empirical studies than does the pattern in the feedforward network. Yet the same skilled readers exhibit a high level of proficiency at reading nonwords that is not matched in the SM89 network, but only in one using alternative representations that better capture the spelling-sound regularities. How can the effects of frequency and consistency be reconciled with good nonword reading?

The answer may lie in the fact that both the SM89 and the feedforward networks were trained using word frequency values that are logarithmically compressed from their true frequencies of occurrence in the language. Thus, the SM89 network replicates the empirical naming latency pattern because it achieves the appropriate balance between the influence of frequency and that of consistency, although both are suppressed relative to the effects in subjects. This suppression is revealed when nonword reading is examined, because on this task it is primarily the network's sensitivity to consistency that dictates performance. In contrast, by virtue of the new representations, the feedforward network exhibits a sensitivity to consistency that is comparable to that of subjects, as evidenced by its good nonword reading. But now, using logarithmic frequencies, the effects of frequency and consistency are unbalanced in the network and it fails to replicate the precise pattern of naming latencies of subjects.

This interpretation leads to the prediction that the feedforward network should exhibit both good nonword reading and the appropriate frequency and consistency effects if it is trained on words using their actual frequencies of occurrence. The next simulation tests this prediction.

Simulation 2: Feedforward Network with Actual Frequencies

The most frequent word in the Kucera and Francis (1967) list, THE, has a frequency of 69971 per million, while the least


Figure 9. Data from the frequency-consistency equation (Equation 12 and Figure 8) for test words of frequencies 1 and 10, plotted separately for regular inconsistent and ambiguous words (upper graph) and regular consistent and exception words (lower graph). The upper pattern is similar to that found for regular and exception words in the SM89 network (see Figure 2) while the lower one is similar to the pattern for the feedforward network (see Figure 5). The correspondences are only approximate due to the simplifying assumptions of the frequency-consistency equation.

frequent words have a frequency of 1 per million. In the training procedure used by SM89, the probability that a word was presented to the network for training was proportional to the logarithm of its frequency rather than its actual frequency. This compresses the effective frequency range from about 70,000:1 to about 16:1. Thus, the network experiences much less variation in the frequency of occurrence of words than do normal readers.

SM89 put forward a number of arguments in favor of using logarithmically compressed frequencies rather than actual frequencies in training their network. Beginning readers have yet to experience enough words to approximate the actual frequency range in the language. Also, low-frequency words disproportionately suffer from the lack of inflectional and derivational forms in the training corpus. However, the main reason for compressing the frequency range was a practical consideration based on limitations of the available computational resources. If the highest frequency word was presented every epoch, the lowest frequency words would be presented on average only about once every 70,000 epochs. Thus, if actual frequencies were used, SM89 could not have trained their network long enough for it to have had sufficient exposure to low-frequency words.

To compound matters, as SM89 point out, basic properties of the network and training procedure already serve to progressively weaken the impact of frequency over the course of training. In an error-correcting training procedure like back-propagation, weights are changed only to the extent that doing so reduces the mismatch between the generated and correct output. As high-frequency words become mastered, they produce less mismatch and so induce progressively smaller weight changes. This effect is magnified by the fact that, due to the asymptotic nature of the unit input-output function, weight changes have smaller and smaller impact as units approach their correct extremal values. As a result, learning becomes dominated mostly by lower frequency words that are still inaccurate, effectively compressing the range of frequencies driving learning in the network.
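This compounding of error correction and saturation can be sketched numerically. The sketch below assumes, purely for illustration, a sum-squared error delta rule for a single logistic output unit with target 1 (the simulations themselves use cross-entropy error); the weight change scales with (t − y)·y·(1 − y):

```python
# Under sum-squared error, the delta for a logistic unit is
# (target - y) * y * (1 - y): both the shrinking error term and the
# saturating slope y * (1 - y) drive the weight changes produced by
# well-learned, high-frequency words toward zero.
target = 1.0
for y in (0.5, 0.9, 0.99):
    delta = (target - y) * y * (1.0 - y)
    print(round(delta, 5))   # 0.125, then 0.009, then 0.0001
```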

Thus, SM89 considered it important to verify that their results did not depend critically on the use of such a severe frequency compression. They trained a version of the network in which the probability that a word is presented during an epoch is based on the square-root of its frequency rather than the logarithm (resulting in a frequency range of about 265:1 rather than 16:1). They found the same basic pattern of frequency and consistency effects in naming latency for the Taraban and McClelland (1987) words, although there was a larger effect of frequency among regular words, and virtually no effect of consistency among high-frequency words even early in training. This shift corresponds predictably to a pattern in which the influence of frequency is stronger relative to the influence of consistency. However, SM89 presented no data on the network's accuracy in reading words or nonwords.
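The three compression regimes can be compared directly. The exact compression function SM89 applied is not reproduced here; taking the natural logarithm of (frequency + 1) is one assumption that recovers the quoted 16:1 figure:

```python
import math

f_max, f_min = 69971, 1   # THE vs. the least frequent listed words

print(f_max / f_min)                                     # actual range: ~70000:1
print(round(math.sqrt(f_max) / math.sqrt(f_min)))        # square-root: ~265:1
print(round(math.log(f_max + 1) / math.log(f_min + 1)))  # logarithmic: ~16:1
```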

In the current simulation, we train a version of the feedforward network (with the new representations) using the actual frequencies of occurrence of words. The training procedure in the current work avoids the problem of sampling low-frequency words by using frequency directly to scale the magnitude of the weight changes induced by a word—this is equivalent to sampling in the limit of a small learning rate, and it allows any range of frequencies to be employed. The goal is to test the hypothesis that, by balancing the strong influence of consistency that arises from the use of representations that better capture spelling-sound regularities with a realistically strong influence of frequency, the network should exhibit the appropriate pattern of frequency and consistency effects in naming latency while also producing accurate performance on word and nonword pronunciation.

Method

Network Architecture. The architecture of the network is the same as in Simulation 1 (see Figure 3).

Training Procedure. The only major change in the training procedure from Simulation 1 is that, as described above, the values used to scale the error derivatives computed by back-propagation are proportional to the actual frequencies of occurrence of the words (Kucera & Francis, 1967) rather than to a logarithmic compression of their frequencies. Following SM89, the 82 words in the training corpus that are not listed in Kucera and Francis (1967) were assigned a frequency of 2, and all others were assigned their listed frequency plus 2. These values were then divided by the highest value in the corpus (69973 for THE) to generate the scaling values used during training. Thus, the weight changes produced by the word THE are unscaled (i.e., a scaling value of 1.0). For comparison, AND, the word with the next highest frequency (28860 occurrences per million), has a value of 0.412. By contrast, the relative frequencies of most other words are extremely low. The mean scaling value across the entire training corpus is 0.0020, while the median value is 0.00015. Taraban and McClelland's (1987) high-frequency exception words have an average value of 0.014 while the low-frequency exception words average 0.00036. Words not in the Kucera and Francis (1967) list have a value just under 3×10^-5.
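These scaling values follow directly from the stated recipe; a minimal check, assuming only the Kucera and Francis counts quoted in the text:

```python
# Scaling value = (frequency + 2) / (frequency of THE + 2), per the text.
kf = {"THE": 69971, "AND": 28860}
denom = kf["THE"] + 2   # 69973, the highest value in the corpus

def scale(freq):
    return (freq + 2) / denom

print(round(scale(kf["AND"]), 3))   # 0.412
print(scale(0))                     # words absent from the list: just under 3e-05
```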

In addition, two parameters of the training procedure were modified to compensate for the changes in word frequencies. First, the global learning rate, ε in Equation 5, was increased from 0.001 to 0.05, to compensate for the fact that the summed frequency for the entire training corpus is reduced from 683.4 to 6.05 when using actual rather than logarithmic frequencies. Second, the slight tendency for weights to decay toward zero was removed, to prevent the very small weight changes induced by low-frequency words (due to their very small scaling factors) from being overcome by the tendency of weights to shrink toward zero.

Other than for these modifications, the network was trained in exactly the same way as in Simulation 1.

Testing Procedure. The procedure for testing the network's performance on words and nonwords is the same as in Simulation 1.

Results

Word Reading. As the weight changes caused by low-frequency words are so small, considerably more training is required to reach approximately the same level of performance as when using logarithmically compressed frequencies. After 1300 epochs of training, the network mispronounces only 7 words in the corpus: BAS, BEAU, CACHE, CYST, GENT, TSAR, and YEAH (99.8% correct, where homographs were considered correct if they elicited either correct pronunciation). These words have rather inconsistent spelling-sound correspondences and have very low frequencies (i.e., an average scaling value of 9.0×10^-5). Thus, the network has mastered all of the exception words except a few of the very lowest in frequency.

Nonword Reading. Table 7 lists the errors made by the network in pronouncing the lists of nonwords from Glushko (1979) and from McCann and Besner (1987). The network produces "regular" responses to 42/43 (97.7%) of Glushko's consistent nonwords, 29/43 (67.4%) of the inconsistent nonwords, and 66/80 (82.5%) of McCann and Besner's control nonwords. Using a criterion that more closely corresponds to that used with subjects—considering a response correct if it is consistent with the pronunciation of a word in the training corpus (and not considering inflected nonwords or those with J in the coda)—the network achieves 42/43 (97.7%) correct on both the consistent and inconsistent nonwords, and 68/76 (89.5%) correct on the control nonwords. Thus, the network's performance on these sets of nonwords is comparable to that of subjects and to that of the network trained on logarithmic frequencies.
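The percent-correct figures follow from the error counts (e.g., the 14 inconsistent-nonword errors listed in Table 7 leave 29 of 43 correct):

```python
# Recomputing the quoted percentages from the underlying counts.
for correct, total in ((42, 43), (29, 43), (66, 80), (68, 76)):
    print(f"{correct}/{total} = {100 * correct / total:.1f}%")
# 97.7%, 67.4%, 82.5%, 89.5%
```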

Frequency and Consistency Effects. Figure 10 shows the mean cross entropy error of the network in pronouncing words of varying degrees of spelling-sound consistency as a function of frequency. There is a main effect of frequency (F(1,184)=22.1, p<.001), a main effect of consistency (F(3,184)=6.49, p<.001), and an interaction of frequency and consistency (F(1,184)=5.99, p<.001). Post hoc comparisons show that the effect of frequency is significant at the 0.05 level among words of each level of consistency when considered separately.

The effect of consistency is significant among low-frequency words (F(3,92)=6.25, p=.001) but not among high-frequency words (F(3,92)=2.48, p=.066). Post hoc comparisons among low-frequency words revealed that the difference in error between exception words and ambiguous words is significant (F(1,46)=4.09, p=.049), the difference between regular consistent and inconsistent words is marginally significant (F(1,46)=3.73, p=.060), but the difference between ambiguous words and regular inconsistent words fails to reach significance (F(1,46)=2.31, p=.135).

Overall, this pattern of results matches the one found in empirical studies fairly well. Thus, with a training regime


Table 7
Errors by the Feedforward Network Trained with Actual Frequencies in Pronouncing Nonwords

Glushko (1979)

Consistent Nonwords (1/43)
Nonword   Correct   Response
*WOSH     /waS/     /woS/

Inconsistent Nonwords (14/43)
Nonword   Correct   Response
BLEAD     /blEd/    /bled/
BOST      /bost/    /bOst/
COSE      /kOz/     /kOs/
GROOK     /grUk/    /gruk/
*HEAF     /hEf/     /h@f/
HOVE      /hOv/     /h^v/
LOME      /lOm/     /l^m/
PILD      /pild/    /pIld/
PLOVE     /plOv/    /pl^v/
POOT      /pUt/     /put/
POVE      /pOv/     /p^v/
SOOD      /sUd/     /sud/
WEAD      /wEd/     /wed/
WONE      /wOn/     /w^n/

McCann and Besner (1987)

Control Nonwords (14/80)
Nonword   Correct   Response
TUNCE     /t^ns/    /tUns/
*TOLPH    /tolf/    /tOl(f 0.13)/
*ZUPE     /zUp/     /(z 0.09)yUp/
SNOCKS    /snaks/   /snask(ks 0.31)/
*GOPH     /gaf/     /gaT/
*VIRCK    /vurk/    /(v 0.13)urk/
LOKES     /lOks/    /lOsk(ks 0.00)/
*YOWND    /yWnd/    /(y 0.04)and/
KOWT      /kWt/     /kOt/
*FUES     /fyUz/    /fyU(z 0.45)/
*HANE     /hAn/     /h@n/
FAIJE     /fAj/     /fA(j 0.00)/
*ZUTE     /zUt/     /(z 0.01)yUt/
JINJE     /jinj/    /jIn(j 0.00)/

Note: /a/ in POT, /@/ in CAT, /e/ in BED, /i/ in HIT, /o/ in DOG, /u/ in GOOD, /A/ in MAKE, /E/ in KEEP, /I/ in BIKE, /O/ in HOPE, /U/ in BOOT, /W/ in NOW, /Y/ in BOY, /^/ in CUP, /N/ in RING, /S/ in SHE, /C/ in CHIN, /Z/ in BEIGE, /T/ in THIN, /D/ in THIS. The activity levels of correct but missing phonemes are listed in parentheses. In these cases, the actual response is what falls outside the parentheses. Words marked with "*" remain errors after considering properties of the training corpus (as explained in the text).


Figure 10. Mean cross-entropy error produced by the feedforward network trained on actual frequencies for words with various degrees of spelling-sound consistency (listed in Appendix 1) as a function of frequency.

that balances the influence of frequency and consistency, the network replicates the pattern of interaction of these variables on naming latency while also reading words and nonwords as accurately as skilled readers.

Training with a Moderate Frequency Compression

As SM89 argued, training with the actual frequencies of monosyllabic words might not provide the best approximation to the experience of readers. For example, since many multisyllabic words have consistent spelling-sound correspondences—both in their base forms and with their various inflections and derivations—training with only monosyllabic words will underestimate a reader's exposure to spelling-sound regularities. Training with a compressed frequency range compensates for this bias because exception words tend to be of higher frequency than regular words and, thus, are disproportionately affected by the compression.

We have seen that a very severe (logarithmic) compression reduces the effect of frequency to such an extent that a network using representations that amplify consistency effects fails to exhibit the exact pattern of naming latencies found in empirical studies. Nonetheless, it would seem appropriate to test whether a less severe compression results in a better match to the empirical findings. As mentioned earlier, SM89 found that presenting words during training with a probability proportional to the square-root of their frequency replicates the basic frequency and consistency effects in their network, but they presented no data on the accuracy of the network's performance. Accordingly, it seemed worthwhile for comparison purposes to train a network with the new representations also using a square-root compression of word frequencies.

Analogous to the use of actual frequencies, the scaling value for each word was the square-root of its Kucera and Francis (1967) frequency plus 2, divided by the square-root of the frequency of THE plus 2 (264.5). The value for AND is 0.642. The mean for the corpus is 0.023 and the median is 0.012. Taraban and McClelland's (1987) high-frequency exception words average 0.097 while the low-frequency exception words average 0.017. Words not in the Kucera and Francis (1967) list have a value of 0.0053. Thus, the compression of frequency is much less severe than when using logarithms but it is still substantial.
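A quick check of these square-root scaling values, again assuming only the Kucera and Francis counts quoted earlier:

```python
import math

# Square-root scaling: sqrt(frequency + 2) / sqrt(frequency of THE + 2).
kf = {"THE": 69971, "AND": 28860}
denom = math.sqrt(kf["THE"] + 2)   # 264.5

def scale(freq):
    return math.sqrt(freq + 2) / denom

print(round(scale(kf["AND"]), 3))   # 0.642
print(round(scale(0), 4))           # words absent from the list: 0.0053
```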

The summed frequency of the training corpus is 69.8; accordingly, the global learning rate, ε, was adjusted to 0.01. The training procedure is otherwise identical to that used when training on the actual word frequencies.

Word Reading. After 400 epochs, the network pronounces correctly all words in the training corpus except for the homograph HOUSE, for which the states of both the final /s/ and the final /z/ just fail to be active (/s/: 0.48, /z/: 0.47). Thus, the network's word reading is essentially perfect.

Nonword Reading. The network makes no errors on Glushko's (1979) consistent nonwords. On the inconsistent nonwords, 14 of the network's responses are non-regular, but all but one of these (POVE → /pav/) are consistent with some word in the training corpus (97.7% correct). The network mispronounces 13 of McCann and Besner's (1987) control nonwords. However, only 7 of these remain as errors when using the same scoring criterion as was used with subjects and ignoring inflected forms and those with J in the coda (90.8% correct). Thus, the network trained with square-root frequencies pronounces nonwords as well as, if not slightly better than, the network trained with actual frequencies.

Frequency and Consistency Effects. Figure 11 shows the mean cross entropy error of the network in pronouncing words of varying degrees of spelling-sound consistency as a function of frequency. Overall, there is a significant effect of frequency (F(1,184)=47.7, p<.001), a significant effect of consistency (F(1,184)=14.9, p<.001), and an interaction of frequency and consistency (F(3,184)=8.409, p<.001). The effect of frequency is also significant at the 0.05 level among words of each level of consistency when considered separately. Among high-frequency words, regular inconsistent, ambiguous, and exception words are significantly different from regular consistent words but not from each other. Among low-frequency words, the difference between regular inconsistent words and ambiguous words is not significant (F(1,46)=1.18, p=.283) but all other pairwise comparisons are. Thus, this network also replicates the basic empirical findings of the effects of frequency and consistency on naming latency.

Summary

The SM89 simulation replicates the empirical pattern of frequency and consistency effects by appropriately balancing the relative influences of these two factors. Unfortunately, both are reduced relative to their strength in skilled readers. The fact


Figure 11. Mean cross-entropy error produced by the feedforward network trained on square-root frequencies for words with various degrees of spelling-sound consistency (listed in Appendix 1) as a function of frequency.

that the orthographic and phonological representations disperse the regularities between spelling and sound serves to diminish the relative impact of consistency. Likewise, the use of a logarithmic compression of the probability of word presentations serves to diminish the impact of frequency. As a result of the reduced effectiveness of consistency, nonword reading suffers.

The current work uses representations that better capture spelling-sound regularities, thereby increasing the relative influence of consistency. One effect of this is to improve nonword reading to a level comparable to that of skilled readers. However, if a logarithmic frequency compression continues to be used, the relative impact of frequency is too weak and the network exhibits consistency effects among high-frequency words not found in empirical studies.

The appropriate relative balance of frequency and consistency can be restored, while maintaining good nonword reading, by using the actual frequencies of words during training. In fact, a square-root frequency compression that is much more moderate than a logarithmic one also replicates the empirical naming latency pattern, although a consistency effect among high-frequency words begins to emerge. In this way, the three networks presented thus far—trained on logarithmic frequencies, square-root frequencies, or actual frequencies—provide clear points of comparison of the relative influences of word frequency and spelling-sound consistency on naming latency. Together with the analytical results from the previous section, these findings suggest that the central empirical phenomena in word and nonword reading can be interpreted naturally in terms of the basic principles of operation of connectionist networks that are exposed to an appropriately structured training corpus.

Simulation 3: Interactivity, Componential Attractors, and Generalization

As outlined earlier, the current approach to lexical processing is based on a number of general principles of information processing, loosely expressed by the acronym GRAIN (for Graded, Random, Adaptive, Interactive, and Nonlinear). Together with the principles of distributed representations and knowledge, the approach constitutes a substantial departure from traditional assumptions about the nature of language knowledge and processing (e.g., Pinker, 1991). It must be noted, however, that the simulations presented so far involve only deterministic, feedforward networks, and thus fail to incorporate two important principles: interactivity and randomness (intrinsic variability). In part, this simplification has been necessary for practical reasons; interactive, stochastic simulations are far more demanding of computational resources. More importantly, including only some of the relevant principles in a given simulation enables more detailed analysis of the specific contribution that each makes to the overall behavior of the system. This has been illustrated most clearly in the current work with regard to the nature of the distributed representations used for orthography and phonology, and the relative influences of frequency and consistency on network learning (adaptivity). Nonetheless, each such network constitutes only an approximation or abstraction of a more complete simulation that would incorporate all of the principles. The methodology of considering sets of principles separately relies on the assumption that there are no unforeseen, problematic interactions among the principles, such that the findings with simplified simulations would not generalize to more comprehensive ones.

The current simulation investigates the implications of interactivity for the process of pronouncing written words and nonwords. Interactivity plays an important role in connectionist explanations of a number of cognitive phenomena (McClelland & Elman, 1986; McClelland & Rumelhart, 1981; McClelland, 1987), and constitutes a major point of contention with alternative theoretical formulations (Massaro, 1988, 1989). Processing in a network is interactive when units can mutually constrain each other in settling on the most consistent interpretation of the input. For this to be possible, the architecture of the network must be generalized to allow feedback or recurrent connections among units. For example, in the Interactive Activation model of letter and word perception (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982), letter units and word units are bidirectionally connected so that the partial activation of a word unit can feed back to support the activation of letter units with which it is consistent.

A common way in which interactivity has been employed in networks is in making particular patterns of activity into stable attractors. In an attractor network, units interact and update their states repeatedly in such a way that the initial pattern of


activity generated by an input gradually settles to the nearest attractor pattern. A useful way of conceptualizing this process is in terms of a multidimensional state space in which the activity of each unit is plotted along a separate dimension. At any instant in time, the pattern of activity over all of the units corresponds to a single point in this space. As units change their states in response to a given input, this point moves in state space, eventually arriving at the (attractor) point corresponding to the network's interpretation. The set of initial patterns that settle to this same final pattern corresponds to a region around the attractor, called its basin of attraction. To solve a task, the network must learn connection weights that cause units to interact in such a way that the appropriate interpretation of each input is an attractor whose basin contains the initial pattern of activity for that input.
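The settling process can be sketched with a toy attractor network: a Hopfield-style net with Hebbian weights, offered as a stand-in for (not a reproduction of) the trained attractor network described in this section. Two stored patterns act as attractors; a corrupted input falls within the first pattern's basin and is cleaned up:

```python
# Two stored +/-1 patterns over eight units.
p1 = [1, -1, 1, -1, 1, -1, 1, -1]
p2 = [1, 1, -1, -1, 1, 1, -1, -1]
n = len(p1)

# Hebbian outer-product weights, with self-connections zeroed.
W = [[0 if i == j else p1[i] * p1[j] + p2[i] * p2[j] for j in range(n)]
     for i in range(n)]

def settle(state, steps=5):
    """Repeatedly update every unit from its summed input until stable."""
    for _ in range(steps):
        net = [sum(W[i][j] * state[j] for j in range(n)) for i in range(n)]
        state = [1 if x >= 0 else -1 for x in net]
    return state

# First stored pattern with its last unit flipped: the corrupted input
# still lies within that pattern's basin of attraction.
noisy = [1, -1, 1, -1, 1, -1, 1, 1]
print(settle(noisy) == p1)   # True
```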

In the domain of word reading, attractors have played a critical role in connectionist accounts of the nature of normal and impaired reading via meaning (Hinton & Sejnowski, 1986; Hinton & Shallice, 1991; Plaut & Shallice, 1993). According to these accounts, the meanings of words are represented in terms of patterns of activity over a large number of semantic features. These features can support structured, frame-like representations (e.g., Minsky, 1975) if units represent conjunctions of roles and properties of role-fillers (Hinton, 1981; Derthick, 1990). As only a small fraction of the possible combinations of features correspond to the meanings of actual words, it is natural for a network to learn to make these semantic patterns into attractors. Then, in deriving the meaning of a word from its orthography, the network need only generate an initial pattern of activity that falls somewhere within the appropriate semantic attractor basin; the settling process will clean up this pattern into the exact meaning of the word.10 If, however, the system is damaged, the initial activity for a word may fall within a neighboring attractor basin, typically corresponding to a semantically-related word. The damaged network will then settle to the exact meaning of that word, resulting in a semantic error (e.g., CAT read as "dog"). In fact, the occurrence of such errors is the hallmark symptom of a type of acquired reading disorder known as deep dyslexia (see Coltheart, Patterson, & Marshall, 1980, for more details on the full range of symptoms of deep dyslexia, and Plaut & Shallice, 1993, for connectionist simulations replicating these symptoms). In this way, attractors obviate the need for word-specific units in mediating between orthography and semantics (see Hinton, McClelland, & Rumelhart, 1986, for discussion).

When applied to the mapping from orthography to phonology, however, the use of interactivity to form attractors would

10This characterization of deriving word meanings is necessarily oversimplified. Words with multiple, distinct meanings would map to one of a number of separate semantic attractors. Shades of meaning across contexts could be expressed by semantic attractors that are regions in semantic space instead of single points. Notice that these two conditions can be seen as ends of a continuum involving various degrees of similarity and variability among the semantic patterns generated by a word across contexts (also see McClelland, St. John, & Taraban, 1989).

appear problematic. In particular, the correct pronunciation of a nonword typically does not correspond to the pronunciation of some word. If the network develops attractors for word pronunciations, one might expect that the input for a nonword would often be captured within the attractor basin for a similar word, resulting in many incorrect lexicalizations. More generally, attractors would seem to be appropriate only for tasks, such as semantic categorization or object recognition, in which the correct response to a novel input is a familiar output. By contrast, in oral reading, the correct response to a novel input is typically a novel output. If it is true that attractors cannot support this latter sort of generalization, their applicability in reading specifically, and cognitive science more generally, would be fundamentally limited.

The current simulation demonstrates that these concerns are ill-founded, and that, with appropriately structured representations, the principle of interactivity can operate effectively in the phonological pathway as well as in the semantic pathway (see Figure 1). The reason is that, in learning to map orthography to phonology, the network develops attractors that are componential—they have substructure that reflects common sublexical correspondences between orthography and phonology. This substructure applies not only to most words but also to nonwords, enabling them to be pronounced correctly. At the same time, the network develops attractors for exception words that are far less componential. Thus, rather than being a hindrance, attractors are a particularly effective style of computation for quasi-regular tasks such as word reading.

A further advantage of an attractor network over a feedforward network in modeling word reading is that the former provides a more direct analogue of naming latency. Thus far, we have followed SM89 in using an error measure in a feedforward network to account for naming latency data from subjects. SM89 offer two justifications for this approach. The first is based on the assumption that the accuracy of the phonological representation of a word would directly influence the execution speed of the corresponding articulatory motor program (see Lacouture, 1989; Zorzi et al., 1995, for simulations embodying this assumption). This assumption is consistent with the view that the time required by the orthography-to-phonology computation itself does not vary systematically with word frequency or spelling-sound consistency. If this were the case, a feedforward network of the sort SM89 and we have used, which takes the same amount of time to process any input, would be a reasonable rendition of the nature of the phonological pathway in subjects.

An alternative justification for the use of error scores to model naming latencies, mentioned only briefly by SM89, is based on the view that the actual computation from orthography to phonology involves interactive processing, such that the time to settle on an appropriate phonological representation does vary systematically with word type. The naming latencies exhibited by subjects are a function of this settling time, perhaps in conjunction with articulatory effects. Accordingly, a


Figure 12. The architecture of the attractor network (105 grapheme units, 100 hidden units, and 61 phoneme units). Ovals represent groups of units, and arrows represent complete connectivity from one group to another.

feedforward implementation of the mapping from orthography to phonology should be viewed as an abstraction of a recurrent implementation that would more accurately approximate the actual word reading system. Studying the feedforward implementation is still informative because many of its properties, including its sensitivity to frequency and consistency, depend on computational principles of operation that would also apply to a recurrent implementation—namely, adaptivity, distributed representations and knowledge, and nonlinearity. These principles merely manifest themselves differently: influences that reduce error in a feedforward network serve to accelerate settling in a recurrent network. Thus, error in a feedforward network is a valid approximation of settling time in a recurrent network because they both arise from the same underlying causes: additive frequency and consistency effects in the context of a nonlinear gradual ceiling effect. Nonetheless, even given these arguments, it is important to verify that a recurrent implementation that reads words and nonwords as accurately as skilled readers also reproduces the relevant empirical pattern of naming latencies directly in the time it takes to settle in pronouncing words.

Method

Network Architecture. The architecture of the attractor network is shown in Figure 12. The numbers of grapheme, hidden, and phoneme units are the same as in the feedforward networks, but the attractor network has some additional sets of connections. Each input unit is still connected to each hidden unit which, in turn, is connected to each phoneme unit. In addition, each phoneme unit is connected to each other phoneme unit (including itself), and each phoneme unit sends a connection back to each hidden unit. The weights on the two connections between a pair of units (e.g., a hidden unit and a phoneme unit) are trained separately and need not have identical values. Including the biases of the hidden and phoneme units, the network has a total of 26,582 connections.
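The connection total follows directly from the layer sizes in Figure 12; a quick sketch of the arithmetic:

```python
# Connection counts for the attractor network of Figure 12.
graphemes, hidden, phonemes = 105, 100, 61

total = (
    graphemes * hidden     # grapheme -> hidden
    + hidden * phonemes    # hidden -> phoneme
    + phonemes * phonemes  # phoneme -> phoneme (including self-connections)
    + phonemes * hidden    # phoneme -> hidden (feedback)
    + hidden + phonemes    # biases on hidden and phoneme units
)
print(total)  # 26582
```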

The states of units in the network change smoothly over time in response to influences from other units. In particular, the

Figure 13. The state over time of a continuous unit, initialized to 0.5 and governed by Equation 14, when presented with fixed external input from other units of varying magnitude (1, 2, 4, 8, and 16; unit state is plotted from time 0.0 to 2.0). The curves of state values for negative external input are the exact mirror images of these curves, approaching 0 instead of 1.

instantaneous change over time t of the input x_j to unit j is proportional to the difference between its current input and the summed contribution from other units:

dx_j/dt = Σ_i s_i w_ij + b_j − x_j        (14)

The state s_j of unit j is σ(x_j), the standard logistic function of its integrated input, which ranges between 0 and 1 (see Equation 2). For clarity, we will call the summed input from other units (plus the bias) the external input to each unit, to distinguish it from the integrated input that governs the unit's state.

According to Equation 14, when a unit's integrated input is perfectly consistent with its external input (i.e., x_j = Σ_i s_i w_ij + b_j), the derivative is zero and the unit's integrated input, and hence its state, ceases to change. Notice that its activity at this point, σ(Σ_i s_i w_ij + b_j), is exactly the same as it would be if it were a standard unit that computes its state from the external input instantaneously (as in a feedforward network; see Equations 1 and 2). To illustrate this, and to provide some sense of the temporal dynamics of units in the network, Figure 13 shows the activity over time of a single unit, initialized to 0.5 and governed by Equation 14, in response to external input of varying magnitude. Notice that, over time, the unit state gradually approaches an asymptotic value equal to the logistic function applied to its external input.

For the purposes of simulation on a digital computer, it is convenient to approximate continuous units with finite difference equations, in which time is discretized into ticks of some


duration τ:

Δx_j = τ (Σ_i s_i w_ij + b_j − x_j)

where

Δx_j = x_j^[t] − x_j^[t−1]. Using explicit superscripts for discrete time, this can be rewritten as

x_j^[t] = τ (Σ_i s_i w_ij + b_j) + (1 − τ) x_j^[t−1]        (15)

According to this equation, a unit's input at each time tick is a weighted average of its current input and that dictated by other units, where τ is the weighting proportion.11 Notice that, in the limit (as τ → 0), this discrete computation becomes identical to the continuous one. Thus, adjustments to τ affect the accuracy with which the discrete system approximates the continuous one, but do not alter the underlying computation being performed. This is of considerable practical importance, as the computational time required to simulate the system is inversely proportional to τ. A relatively larger τ can be used during the extensive training period (0.2 in the current simulation), when minimizing computation time is critical, whereas a much smaller τ can be used during testing (e.g., 0.01), when a very accurate approximation is desired. As long as τ remains sufficiently small for the approximations to be adequate, these manipulations do not fundamentally alter the behavior of the system.
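A minimal sketch (our own illustration, not the authors' code) of this finite-difference update for a single unit shows both properties: the state approaches the logistic of the external input, and shrinking τ changes only the accuracy of the approximation, not the outcome:

```python
import math

def settle(external, tau, total_time=2.0):
    """Iterate Equation 15 for one unit with fixed external input:
    x[t] = tau * external + (1 - tau) * x[t-1]; state s = logistic(x)."""
    x = 0.0  # integrated input; initial state is logistic(0) = 0.5
    for _ in range(int(total_time / tau)):
        x = tau * external + (1 - tau) * x
    return 1.0 / (1.0 + math.exp(-x))

coarse = settle(4.0, tau=0.2)   # 10 updates over 2.0 time units, as in training
fine = settle(4.0, tau=0.01)    # 200 updates over the same interval, as in testing
# Both climb from 0.5 toward logistic(4), and differ only slightly from each other.
```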

Training Procedure. The training corpus for the network is the same as used with the feedforward network trained on actual word frequencies. As in that simulation, the frequency value of each word is used to scale the weight changes induced by the word.

The network is trained with a version of back-propagation designed for recurrent networks, known as back-propagation through time (Rumelhart, Hinton, & Williams, 1986a, 1986b; Williams & Peng, 1990), further adapted for continuous units (Pearlmutter, 1989). In understanding back-propagation through time, it may help to think of the computation in standard back-propagation in a three-layer feedforward network as occurring over time. In the forward pass, the states of input units are clamped at time t = 0. Hidden unit states are computed at t = 1 from these input unit states, and then output unit states are computed at t = 2 from the hidden unit states. In the backward pass, error is calculated for the output units based on their states (t = 2). Error for the hidden units and weight changes for the hidden-to-output connections are calculated based on the error of the output units (t = 2) and the states of hidden units (t = 1). Finally, the weight changes for

11 These temporal dynamics are somewhat different from those of the Plaut and McClelland (1993; Seidenberg et al., 1994) network. In that network, each unit's input was set instantaneously to the summed external input from other units; the unit's state was a weighted average of its current state and the one dictated by its instantaneous input.

the input-to-hidden connections are calculated based on the hidden unit error (t = 1) and the input unit states (t = 0). Thus, feedforward back-propagation can be interpreted as involving a pass forward in time to compute unit states, followed by a pass backward in time to compute unit error and weight changes.

Back-propagation through time has exactly the same form, except that, because a recurrent network can have arbitrary connectivity, each unit can receive contributions from any unit at any time, not just from those in earlier layers (for the forward pass) or later layers (for the backward pass). This means that each unit must store its state and error at each time tick, so that these values are available to other units when needed. In addition, the states of non-input units affect those of other units immediately, so they need to be initialized to some neutral value (0.5 in the current simulation). In all other respects, back-propagation through time is computationally equivalent to feedforward back-propagation. In fact, back-propagation through time can be interpreted as "unfolding" a recurrent network into a much larger feedforward network with a layer for each time tick composed of a separate copy of all the units in the recurrent network (see Minsky & Papert, 1969; Rumelhart, Hinton, & Williams, 1986a, 1986b).

In order to apply back-propagation through time to continuous units, the propagation of error in the backward pass must be made continuous as well (Pearlmutter, 1989). If we use δ_j to designate the derivative of the error with respect to the input of unit j, then, in feedforward back-propagation:

δ_j = (∂C/∂s_j) σ′(x_j)

where C is the cross-entropy error function and σ′(·) is the derivative of the logistic function. In the discrete approximation to back-propagation through time with continuous units, this becomes

δ_j^[t] = τ (∂C/∂s_j^[t+1]) σ′(x_j^[t+1]) + (1 − τ) δ_j^[t+1]

Thus, δ_j is a weighted average backwards in time of its current value and the contribution from the current error of the unit. In this way, as in standard back-propagation, δ_j in the backward pass is analogous to x_j in the forward pass (cf. Equation 15).
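The backward recursion mirrors the forward one; a toy illustration for a single unit (our own sketch, with made-up gradient values, not the published model):

```python
def backward_deltas(dC_ds, sig_prime, tau=0.2):
    """Discrete backward pass for one continuous unit:
    delta[t] = tau * (dC/ds[t+1]) * sigma'(x[t+1]) + (1 - tau) * delta[t+1].
    dC_ds and sig_prime hold values stored tick-by-tick from the forward pass."""
    T = len(dC_ds) - 1
    delta = [0.0] * (T + 1)  # delta[T] = 0: no error beyond the final tick
    for t in range(T - 1, -1, -1):
        delta[t] = tau * dC_ds[t + 1] * sig_prime[t + 1] + (1 - tau) * delta[t + 1]
    return delta

# Error injected only late in processing (made-up values for illustration):
dC_ds = [0.0] * 5 + [1.0] * 6
sig_prime = [0.25] * 11
deltas = backward_deltas(dC_ds, sig_prime)
# delta is carried backwards in time, so even early ticks receive some error.
```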

As output units can interact with other units over the course of processing a stimulus, they can indirectly affect the error for other output units. As a result, the error for an output unit becomes the sum of two terms: the error due to the discrepancy between its own state and its target, and the error back-propagated to it from other units. The first term is often referred to as error that is injected into the network by the training environment, while the second term might be thought of as error that is internal to the network.

Given that the states of output units vary over time, they can have targets that specify what states they should be in at


particular points in time. Thus, in back-propagation through time, error can be injected at any or all time ticks, not just at the last one as in feedforward back-propagation. Targets that vary over time define a trajectory that the output states will attempt to follow (see Pearlmutter, 1989, for a demonstration of this type of learning). If the targets remain constant over time, however, the output units will attempt to reach their targets as quickly as possible and remain there. In the current simulation, we use this technique to train the network to form stable attractors for the pronunciations of words in the training corpus.

It is possible for the states of units to change quickly if they receive a very large summed input from other units (see Figure 13). However, even for rather large summed input, units typically require some amount of time to approach an extremal value, and may never completely reach it. As a result, it is practically impossible for units to achieve targets of 0 or 1 immediately after a stimulus has been presented. For this reason, in the current simulation, a less stringent training regime is adopted. Although the network is run for 2.0 units of time, error is injected only for the second unit of time; units receive no direct pressure to be correct for the first unit of time (although back-propagated internal error causes weight changes that encourage units to move towards the appropriate states as early as possible). In addition, output units are trained to targets of 0.1 and 0.9 rather than 0 and 1, although no error is injected if a unit exceeds its target (e.g., reaches a state of 0.95 for a target of 0.9). This training criterion can be achieved by units with only moderately large summed input (see the curve for input = 4 in Figure 13).

As with the feedforward network using actual frequencies, the attractor network was trained with a global learning rate ε = 0.05 (with adaptive connection-specific rates) and momentum α = 0.9. Furthermore, as mentioned above, the network was trained using a discretization τ = 0.2. Thus, units update their states 10 times (2.0/0.2) in the forward pass, and they back-propagate error 10 times in the backward pass. As a result, the computational demands of the simulation are about 10 times that of one of the feedforward simulations. In an attempt to reduce the training time, momentum was increased to 0.98 after 200 epochs. To improve the accuracy of the network's approximation to a continuous system near the end of training, τ was reduced from 0.2 to 0.05 at epoch 1800, and reduced further to 0.01 at epoch 1850 for an additional 50 epochs of training. During this final stage of training, each unit updated its state 200 times over the course of processing each input.

Testing Procedure. A fully adequate characterization of response generation in distributed connectionist networks would involve stochastic processing (see McClelland, 1991) and, thus, is beyond the scope of the present work. As an approximation in a deterministic attractor network, we use a measure of the time it takes the network to compute a stable output in response to a given input. Specifically, the network responds when the average change in the states of the phoneme

units falls below some criterion (0.00005 with τ = 0.01 for the results below).12 At this point, the network's naming latency is the amount of continuous time that has passed in processing the input, and its naming response is generated on the basis of the current phoneme states using the same procedure as for the feedforward networks.
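For a single unit under the discrete dynamics, the settling measure can be sketched as follows (a simplification we constructed for illustration; the actual model averages the state change over all 61 phoneme units):

```python
import math

def settling_time(external, tau=0.01, criterion=0.00005):
    """Run one unit until its per-tick state change falls below the
    criterion; return the elapsed continuous time (ticks * tau)."""
    x, state, ticks = 0.0, 0.5, 0
    while True:
        x = tau * external + (1 - tau) * x
        new_state = 1.0 / (1.0 + math.exp(-x))
        ticks += 1
        if abs(new_state - state) < criterion:
            return ticks * tau
        state = new_state

# Larger external input drives the unit into the flat tail of the logistic
# sooner, so its state stops changing, and the unit "settles", earlier:
# settling_time(16.0) < settling_time(4.0) < settling_time(1.0)
```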

Results

Word Reading. After 1900 epochs of training, the network pronounces correctly all but 25 of the 2998 words in the training corpus (99.2% correct). About half of these errors are regularizations of low-frequency exception words (e.g., SIEVE → /sEv/, SUEDE → /swEd/, TOW → /tW/). Most of the remaining errors would be classified as visual errors (e.g., FALL → /folt/, GORGE → /grOrj/, HASP → /h@ps/) although four merely have consonants that failed to reach threshold (ACHE → /A/, BEIGE → /bA/, TZAR → /ar/, WOUND → /Und/). All in all, the network has come close to mastering the training corpus, although its performance is slightly worse than that of the equivalent feedforward network.

Even though the network settles to a representation of the phonemes of a word in parallel, the time it takes to do so increases with the length of the word. To demonstrate this, we entered the naming latencies of the network for the 2973 words it pronounces correctly into a multiple linear regression, using as predictors (a) orthographic length (i.e., number of letters), (b) phonological length (i.e., number of phonemes), (c) logarithmic word frequency, and (d) a measure of spelling-sound consistency equal to the number of friends (including the word itself) divided by the total number of friends and enemies; thus, highly consistent words have values near 1 and exception words have values near 0. Collectively, the four factors account for 15.9% of the variance in the latency values (F(4,2968)=139.92; p<.001). More importantly, all four factors account for significant unique variance after factoring out the other three (9.9%, 5.6%, 0.8%, and 0.1% for consistency, log-frequency, orthographic length, and phonological length, respectively; p<.05 for each). In particular, orthographic length is positively correlated with naming latency (semipartial r=.089) and accounts uniquely for 0.8% of its variance (F(1,2968)=40.0, p<.001). To convert this correlation into an increase in RT per letter, the network's mean RTs for the Taraban and McClelland (1987) high- and low-frequency exception words and their regular consistent controls were regressed against the subject means reported by Taraban and McClelland, resulting in a scaling of 188.5 msec per unit of simulation time (with an intercept of 257 msec). Given this scaling, the effect of orthographic length in the network is 4.56 msec/letter based on its semipartial correlation with RT (after factoring out the other predictors), and 7.67 msec/letter based on its direct correlation with RT (r=.139). Length effects of this magnitude are at the low end

12 This specific criterion was chosen because it gives rise to mean response times that are within the 2.0 units of time over which the network was trained; other criteria produce qualitatively equivalent results.


of the range found in empirical studies, although such effects can vary greatly with subjects' reading skill (Butler & Hains, 1979) and with the specific stimuli and testing conditions used (see Henderson, 1982).
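The scaling used above is a simple linear map from simulation time to subject RT; applying it is a one-line arithmetic check (our own illustration, using only the figures given in the text):

```python
def to_msec(settling_time):
    """Linear scaling fit to the Taraban & McClelland (1987) subject means:
    188.5 msec per unit of simulation time, with an intercept of 257 msec."""
    return 188.5 * settling_time + 257.0

# The 4.56 msec/letter length effect corresponds to a semipartial slope of
# 4.56 / 188.5, i.e., about 0.024 units of simulation time per letter.
per_letter_sim_time = 4.56 / 188.5
```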

Nonword Reading. Table 8 lists the errors made by the network in pronouncing the lists of nonwords from Glushko (1979) and from McCann and Besner (1987). The network produces "regular" pronunciations to 40/43 (93.0%) of Glushko's consistent nonwords, 27/43 (62.8%) of the inconsistent nonwords, and 69/80 (86.3%) of McCann and Besner's control nonwords. If we accept as correct any pronunciation that is consistent with that of a word in the training corpus with the same body (and ignore inflected words and those with J in the coda), the network pronounces correctly 42/43 (97.7%) of the inconsistent nonwords, and 68/76 (89.5%) of the control nonwords. Although the performance of the network on the consistent nonwords is somewhat worse than that of the feedforward networks, it is about equal to the level of performance Glushko (1979) reported for subjects (93.8%; see Table 3). Thus, overall, the ability of the attractor network to pronounce nonwords is comparable to that of skilled readers.

Frequency and Consistency Effects. Figure 14 shows the mean latencies of the network in pronouncing words of varying degrees of spelling-sound consistency as a function of frequency. One of the low-frequency exception words from the Taraban and McClelland (1987) list was withheld from this analysis as it is pronounced incorrectly by the network (SPOOK → /spuk/). Among the remaining words, there are significant main effects of frequency (F(1,183)=25.0, p<.001) and consistency (F(3,183)=8.21, p<.001), and a significant interaction of frequency and consistency (F(3,183)=3.49, p=.017). These effects also obtain in a comparison of only regular and exception words (frequency: F(1,91)=10.2, p=.002; consistency: F(1,91)=22.0, p<.001; frequency-by-consistency: F(1,91)=9.31, p=.003). Considering each level of consistency separately, the effect of frequency is significant for exception words (F(1,45)=11.9, p=.001) and for ambiguous words (F(1,46)=19.8, p=.001) and marginally significant for regular inconsistent words (F(1,46)=3.51, p=.067). There is no effect of frequency among regular words (F<1).

The naming latencies of the network show a significant effect of consistency for low-frequency words (F(3,91)=6.65, p<.001) but not for high-frequency words (F(3,91)=1.71, p=.170). Among low-frequency words, regular consistent words are significantly different from each of the other three types at p<.05, but regular inconsistent, ambiguous, and exception words are not significantly different from each other (although the comparison between regular inconsistent and exception words is significant at p=.075). Among high-frequency words, none of the pairwise comparisons is significant except between regular and exception words (F(1,46)=4.87, p=.032). Thus, overall, the naming latencies of the network replicate the standard effects of frequency and consistency as found in empirical studies.

Figure 14. Naming latency of the attractor network trained on actual frequencies for words with various degrees of spelling-sound consistency (listed in Appendix 1) as a function of frequency. (The figure plots mean naming latency, ranging from about 1.65 to 1.95 units of simulation time, for low- versus high-frequency exception, ambiguous, regular inconsistent, and regular consistent words.)

Network Analyses

The network's success at word reading demonstrates that, through training, it has developed attractors for the pronunciations of words. How then is it capable of reading nonwords with novel pronunciations? Why isn't the input for a nonword (e.g., MAVE) captured by the attractor for an orthographically similar word (e.g., GAVE, MOVE, MAKE)? We carried out a number of analyses of the network to gain a better understanding of its ability to read nonwords. Because nonword reading involves recombining knowledge derived from word pronunciation, we were primarily concerned with how separate parts of the input contribute to (a) the correctness of parts of the output, and (b) the hidden representation for the word. As with naming latency, the item SPOOK was withheld from these analyses as it is mispronounced by the network.

Componential Attractors. The first analysis measures the extent to which each phonological cluster (onset, vowel, coda) depends on the input from each orthographic cluster. Specifically, for each word, the activity of the active grapheme units in a particular orthographic cluster was gradually reduced until, when the network was rerun, the phonemes in a particular phonological cluster were no longer correct.13 This boundary activity level measures how important input from a particular orthographic cluster is to the correctness of a particular phonological cluster; a value of 1 means that the graphemes in that cluster must be completely active; a value of 0 means that the phonemes are completely insensitive to the graphemes in that cluster. In state space, the boundary level corresponds to the radius of the word's attractor basin along a particular direction (assuming state space includes dimensions for the grapheme

13 Final E was considered to be part of the orthographic vowel cluster.


Table 8
Errors by the Attractor Network in Pronouncing Nonwords

Glushko (1979)
  Nonword   Correct   Response
  Consistent Nonwords (3/43)
  *HODE     /hOd/     /hOdz/
  *SWEAL    /swEl/    /swel/
  *WOSH     /waS/     /wuS/
  Inconsistent Nonwords (16/43)
  BLEAD     /blEd/    /bled/
  BOST      /bost/    /bOst/
  COSE      /kOz/     /kOs/
  COTH      /koT/     /kOT/
  GROOK     /grUk/    /gruk/
  LOME      /lOm/     /l^m/
  MONE      /mOn/     /m^n/
  PLOVE     /plOv/    /plUv/
  POOT      /pUt/     /put/
  *POVE     /pOv/     /pav/
  SOOD      /sUd/     /sud/
  SOST      /sost/    /sOst/
  SULL      /s^l/     /sul/
  WEAD      /wEd/     /wed/
  WONE      /wOn/     /w^n/
  WUSH      /w^S/     /wuS/

McCann and Besner (1987)
  Nonword   Correct   Response
  Control Nonwords (11/80)
  *KAIZE    /kAz/     /skwAz/
  *ZUPE     /zUp/     /zyUp/
  *JAUL     /jol/     /jOl/
  *VOLE     /vOl/     /vOln/
  *YOWND    /yWnd/    /(y 0.04)Ond/
  KOWT      /kWt/     /kOt/
  *VAWX     /voks/    /voNks/
  FAIJE     /fAj/     /fA(j 0.00)/
  *ZUTE     /zUt/     /zyUt/
  *YOME     /yOm/     /yam/
  JINJE     /jinj/    /jIn(j 0.00)/

Note: /a/ in POT, /@/ in CAT, /e/ in BED, /i/ in HIT, /o/ in DOG, /u/ in GOOD, /A/ in MAKE, /E/ in KEEP, /I/ in BIKE, /O/ in HOPE, /U/ in BOOT, /W/ in NOW, /Y/ in BOY, /^/ in CUP, /N/ in RING, /S/ in SHE, /C/ in CHIN, /Z/ in BEIGE, /T/ in THIN, /D/ in THIS. The activity levels of correct but missing phonemes are listed in parentheses. In these cases, the actual response is what falls outside the parentheses. Words marked with "*" remain errors after considering properties of the training corpus (as explained in the text).


units). This procedure was applied to the Taraban and McClelland (1987) regular consistent, regular inconsistent, and exception words, as well as to the corresponding set of ambiguous words (see Appendix 1). Words were excluded from the analysis if they lacked an orthographic onset or coda (e.g., ARE, DO). The resulting boundary values for each combination of orthographic and phonological clusters were subjected to an ANOVA with frequency and consistency as between-item factors and orthographic cluster and phonological cluster as within-item factors.
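The boundary-activity search described above can be sketched as a simple loop (a hypothetical rendering; `run_network` and `cluster_correct` stand in for the trained model and the scoring of one phonological cluster, and are not part of the published simulation):

```python
def boundary_activity(graphemes, ortho_cluster, run_network,
                      cluster_correct, step=0.01):
    """Scale down the active graphemes in ortho_cluster until the target
    phonological cluster is no longer produced correctly; return the
    lowest activity level at which it was still correct."""
    level = 1.0
    while level > 0.0:
        scaled = {g: (a * level if g in ortho_cluster else a)
                  for g, a in graphemes.items()}
        if not cluster_correct(run_network(scaled)):
            return min(level + step, 1.0)  # last level that still worked
        level -= step
    return 0.0  # the phonemes are insensitive to this orthographic cluster
```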

With regard to frequency, high-frequency words have lower boundary values than low-frequency words (0.188 vs. 0.201, respectively; F(1,162)=6.48, p=.012). However, frequency does not interact with consistency (F(3,162)=2.10, p=.102) nor with orthographic or phonological cluster (F(2,324)=1.49, p=.227; and F(2,324)=2.46, p=.087, respectively). Thus, we will consider high- and low-frequency words together in the remainder of the analysis.

There is a strong effect of consistency on the boundary values (F(3,162)=14.5, p<.001), and this effect interacts both with orthographic cluster (F(6,324)=16.1, p<.001) and with phonological cluster (F(6,324)=20.3, p<.001). Figure 15 presents the average boundary values of each orthographic cluster as a function of phonological cluster, separately for words of each level of consistency. Thus, for each type of word, the set of bars for each phonological cluster indicates how sensitive that cluster is to input from each orthographic cluster. Considering regular consistent words first, the figure shows that each phonological cluster depends almost entirely on the corresponding orthographic cluster, and little if at all on the other clusters. For instance, the vowel and coda graphemes can be completely removed without affecting the network's pronunciation of the onset. There is a slight interdependence among the vowel and coda, consistent with the fact that word bodies capture important information in pronunciation (see, e.g., Treiman & Chafetz, 1987; Treiman et al., in press). Nonetheless, neither the phonological vowel nor coda cluster depends on the orthographic onset cluster. Thus, for a regular word like MUST, an alternative onset (e.g., N) can be substituted and pronounced without depending on or affecting the pronunciation of the body (producing the correct pronunciation of the nonword NUST).

Similarly, for regular inconsistent, ambiguous, and exception words, the correctness of the phonological onset and coda is relatively independent of non-corresponding parts of the orthographic input. The pronunciation of the vowel, however, is increasingly dependent on the orthographic consonants as consistency decreases (main effect of consistency: F(3,166)=47.7, p<.001; p<.05 for all pairwise comparisons). In fact, most spelling-sound inconsistency in English involves unusual vowel pronunciations. Interestingly, for exception words, the vowel pronunciation is less sensitive to the orthographic vowel itself than it is to the surrounding (consonant) context (orthographic onset vs. vowel: F(1,41)=8.39, p=.006;

Figure 15. The degree of activity in each orthographic cluster required to activate each phonological cluster correctly, for words of various spelling-sound consistency (listed in Appendix 1): regular consistent (n=46), regular inconsistent (n=42), ambiguous (n=38), and exception (n=42) words. (Each panel plots the orthographic activity boundary, ranging from 0.0 to 0.6, of the orthographic onset, vowel, and coda clusters for each phonological cluster.) Words lacking either an onset or coda consonant cluster in orthography were excluded from the analysis.


coda vs. vowel: F(1,41)=6.97, p=.012). This makes sense, as the orthographic vowel in an exception word is a misleading indicator of the phonological vowel. Thus, in contrast to regular consistent words, words with ambiguous or exceptional vowel pronunciations depend on the entire orthographic input to be pronounced correctly.

These effects can be understood in terms of the nature of the attractors that develop when training on different types of words. The relative independence of the onset, vowel, and coda correspondences indicates that the attractor basins for regular words consist of three separate, orthogonal sub-basins (one for each cluster). When a word is presented, the network settles into the region in state space where these three sub-basins overlap, corresponding to the word's pronunciation. However, each sub-basin can apply independently, so that "spurious" attractor basins exist where the sub-basins for parts of words overlap (see Figure 16). Each of these combinations corresponds to a pronounceable nonword that the network will pronounce correctly if presented with the appropriate orthographic input. This componentiality arises directly out of the degree to which the network's representations make explicit the structure of the task. By minimizing the extent to which information is replicated, the representations condense the regularities between orthography and phonology. Only small portions of the input and output are relevant to a particular regularity, allowing it to operate independently of other regularities.

The attractor basins for exception words, by contrast, are far less componential than those for regular words (unfortunately, this cannot be depicted adequately in a two-dimensional diagram such as Figure 16). In this way, the network can pronounce exception words and yet still generalize well to nonwords. It is important to note, however, that the attractors for exception words are noncomponential only in their exceptional aspects—not in a monolithic way. In particular, while the consonant clusters in (most) exception words combine componentially, the correct vowel phoneme depends on the entire orthographic input. Thus, a word like PINT is in some sense three-quarters regular, in that its consonant correspondences contribute to the pronunciations of regular words and nonwords just like those of other items. The traditional dual-route characterization of a lexical "look-up" procedure for exception words fails to do justice to this distinction.

The Development of Componentiality in Learning. We can gain insight into the development of this componentiality by returning to the simple, two-layer Hebbian network that formed the basis for the frequency-consistency equation (see Figure 6; also see Van Orden et al., 1990, for related discussion). As expressed by Equation 7, the value of each weight w_ij in the network is equal to the sum over training patterns, weighted by the learning rate, of the product of the state of input unit i and the state of output unit j. Patterns for which the input state is 0 do not contribute to the sum, and those for which it is 1 contribute the value of the output state, which is either +1 or −1 in this formulation. Thus, the value of the

Figure 16. A depiction of how componential attractors for words can recombine to support pronunciations of nonwords. (Only two clusters, onset and vowel, are depicted; onsets N, B, and D and vowels O and Y yield trained attractors for the words "no" and "by" and spurious attractors for the nonwords "ny", "bo", and "dy".) The attractor basins for words consist of orthogonal sub-basins for each of its clusters. Spurious attractors for nonwords exist where the sub-basins for parts of words overlap. To support the noncomponential aspects of attractors for exception words (e.g., DO), the sub-basins for vowels in the region of the relevant consonant clusters must be distorted somewhat (into dimensions in state space other than the ones depicted).


weight can be re-expressed in terms of two counts: the number of consistent patterns, N(+,+), in which the states of units i and j are both positive, and the number of inconsistent patterns, N(+,−), in which i is positive but j is negative:

w_ij = ε (N(+,+) − N(+,−))

If patterns differ in their frequency of occurrence, these counts simply become cumulative frequencies (see Equation 12); for clarity of presentation, we leave this out here (see Reggia et al., 1988, for a simulation based directly on these frequencies).

Now consider a word like PINT → /pInt/. Over the entire set of words, the onset P and /p/ typically co-occur (but not always; cf. PHONE), so that N(+,+) is large and N(+,−) is small, and the weight between these two units becomes strongly positive. By contrast, /p/ never co-occurs with, for example, an onset K

(i.e., N(+,+) = 0 and N(+,−) is large), leading to a strongly negative weight between them. For onset letters that can co-occur with /p/ and P, such as L, N(+,+) is positive and the resulting weight is thus less negative. Going a step further, onset /p/ can co-occur with virtually any orthographic vowel and coda, so N(+,+) for each relevant connection is larger and the weight is closer to zero. Actually, given that each phoneme is inactive for most words, its weights from graphemes in non-corresponding clusters will tend to become moderately negative when using Hebbian learning. With error-correcting learning, however, these weights remain near zero because the weights between corresponding clusters are sufficient—and more effective, due to the higher unit correlations—for eliminating the error. These same properties hold for /n/ and /t/ in the coda. Thus, the unit correlations across the entire corpus give rise to a componential pattern of weights for consonant phonemes, with significant values only on connections from units in the corresponding orthographic cluster (see Brousse & Smolensky, 1989, for additional relevant simulations).
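These counts are easy to illustrate on a toy corpus (made up here for illustration; units are simplified to whole letters and phoneme characters rather than the model's grapheme and phoneme units):

```python
def hebbian_weight(corpus, grapheme, phoneme, epsilon=1.0):
    """w = epsilon * (N(+,+) - N(+,-)): patterns in which the grapheme is
    present and the phoneme is, versus is not, in the pronunciation."""
    n_plus = sum(1 for spelling, pron in corpus
                 if grapheme in spelling and phoneme in pron)
    n_minus = sum(1 for spelling, pron in corpus
                  if grapheme in spelling and phoneme not in pron)
    return epsilon * (n_plus - n_minus)

# Toy corpus of (spelling, pronunciation) pairs:
corpus = [("PINT", "pInt"), ("PUNT", "p^nt"), ("PANT", "p@nt"),
          ("PHONE", "fOn"), ("KIT", "kit"), ("KIN", "kin")]
w_P_p = hebbian_weight(corpus, "P", "p")  # 3 - 1 = 2: strongly positive
w_K_p = hebbian_weight(corpus, "K", "p")  # 0 - 2 = -2: strongly negative
```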

The situation is a bit more complicated for vowels. First of all, there is far more variability across words in the pronunciation of vowels as compared with consonants (see Venezky, 1970). Consequently, for connections between vowel graphemes and phonemes, generally N(+,+) is smaller and N(+,−) is larger than for the corresponding onset and coda connections. The more critical issue concerns exceptional vowel pronunciations in words like PINT. Here, for the I—/I/ correspondence, the small N(+,+) is overwhelmed by the large N(+,−) that comes from the much more common I—/i/ correspondence (in which /I/ has a state of −1). Furthermore, with Hebbian learning, the correlations of /I/ with the consonants P, N, and T are too weak to help. Error-correcting learning can compensate to some degree, by allowing the weights from these consonant units to grow larger than dictated by correlation under the pressure to eliminate error. Note that this reduces the componentiality of the vowel phoneme weights. Such cross-cluster weights cannot provide a general solution to pronouncing exception words, however, because, in a diverse corpus, the consonants must be able to co-exist with

many other vowel pronunciations (e.g., PUNT, PANT). In order for a network to achieve correct pronunciations of exception words while still maintaining the componentiality for regular words (and nonwords), error-correction must be combined with the use of hidden units in order to re-represent the similarities among the words in a way that reduces the interference from inconsistent neighbors (as discussed earlier).

Internal Representations. The first analysis established the componentiality of the attractors for regular words behaviorally, and the second showed how it arises from the nature of learning in a simpler, related system. We know that simultaneously supporting the less componential aspects of word reading in the same system requires hidden units and error correction, but we have yet to characterize how this is accomplished. The most obvious possibility would be the one raised for the feedforward networks—that the network has partitioned itself into two sub-networks: a fully componential one for regular words (and nonwords), and a much less componential one for exception words. As before, however, this does not seem to be the case. If we apply the criterion that a hidden unit is important for pronouncing an item if its removal increases the total error on the item by more than 0.1, then there is a significant positive correlation between the numbers of exception words and the numbers of orthographically-matched nonwords (listed in Appendix 1) for which hidden units are important (r = .71, t(98) = 9.98, p < .001). Thus, the hidden units have not become specialized for processing particular types of items.
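The importance criterion can be made concrete in a sketch. The network here is a toy with random weights; only the removal-based criterion and the 0.1 error threshold come from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(10, 6))   # input -> hidden (toy random weights)
W2 = rng.normal(size=(6, 4))    # hidden -> output

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def total_error(x, target, removed=None):
    """Summed squared error on one item, optionally with one hidden
    unit lesioned (its activation forced to zero)."""
    h = sigmoid(x @ W1)
    if removed is not None:
        h = h.copy()
        h[removed] = 0.0
    out = sigmoid(h @ W2)
    return float(np.sum((out - target) ** 2))

def important_units(x, target, threshold=0.1):
    """Hidden units whose removal raises the item's error by > threshold."""
    base = total_error(x, target)
    return [j for j in range(W1.shape[1])
            if total_error(x, target, removed=j) - base > threshold]
```

Tallying, for each hidden unit, how many exception words versus matched nonwords it is important for, and correlating the two counts across units, yields the statistic reported above.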

The question remains, then, as to how the attractor network—as a single mechanism—implements componential attractors for regular words (and nonwords) and less componential attractors for exception words. A second analysis attempts to characterize the degree to which hidden representations for regular versus exception words reflect the differences in the componentiality of their attractors. Specifically, we attempted to determine the extent to which the contribution that an orthographic cluster makes to the hidden representation depends on the context in which it occurs—this should be less for words with more componential representations. For example, consider the onset P in an exception word like PINT. When presented by itself, the onset need only generate its own pronunciation. When presented in the context of INT, the P

must also contribute to altering the vowel from /i/ to /I/. By contrast, in a regular word like PINE, the onset P plays the same role in the context of INE as when presented in isolation. Thus, if the hidden representations of regular words are more componential than those of exception words, the contribution of an onset (P) should be more greatly affected by the presence of an exception context (INT) than by a regular context (INE).

We measured the contribution of an orthographic cluster in a particular context by first computing the hidden representation generated by the cluster with the context (e.g., PINT), and subtracting from this (unit by unit), as a baseline condition, the hidden representation generated by the context alone (e.g., INT). The contribution of a cluster in isolation was computed

Understanding Normal and Impaired Word Reading 41

[Figure 17 about here: bar graph, contribution correlation (0.5–1.0) by orthographic cluster (onset, vowel, coda), for regular words (n = 46) vs. exception words (n = 43).]

Figure 17. The similarity (correlations) of the contribution that each orthographic cluster makes to the hidden representation in the context of the remaining clusters versus in isolation, for the Taraban and McClelland (1987) exception words and their regular consistent control words.

similarly, except that the baseline condition in this case is the representation generated by the network when presented with no input (i.e., all grapheme units are set to 0). The correlation between these two vector differences was used as a measure of the similarity of the contribution of the cluster in the two conditions. A high correlation indicates that the contribution of a cluster to the hidden representation is independent of the presence of other clusters, and hence, reflects a high degree of componentiality.
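The contribution measure can be sketched as follows, with an arbitrary fixed nonlinear map `h` standing in for the trained network's orthography-to-hidden mapping (the weights here are random, so the numerical result is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(26, 40))   # toy grapheme -> hidden weights

def h(letters):
    """Hidden representation for a set of active grapheme units
    (letter-based coding is a simplification of the model's graphemes)."""
    x = np.zeros(26)
    for c in letters:
        x[ord(c) - ord("A")] = 1.0
    return np.tanh(x @ W)

def contribution_correlation(cluster, context):
    """Correlate a cluster's contribution in context with its
    contribution in isolation."""
    in_context = h(cluster + context) - h(context)   # e.g., PINT minus INT
    in_isolation = h(cluster) - h("")                # cluster alone minus no input
    return float(np.corrcoef(in_context, in_isolation)[0, 1])

# High correlation => context-independent (componential) contribution.
r_pint = contribution_correlation("P", "INT")
```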

These contribution correlations were computed separately for the onset, vowel, and coda clusters of the Taraban and McClelland (1987) exception words and their frequency-matched regular consistent control words. Words lacking either an onset or a coda were withheld from the analysis. The correlations for the remaining words were subjected to an ANOVA with frequency and consistency as between-item factors and orthographic cluster as a within-item factor. There was no main effect of frequency (F(1,85), p = .143) nor any significant interaction of frequency with consistency or orthographic cluster (F < 1 for both), so this factor is not considered further. Figure 17 shows the average correlations for regular and exception words as a function of orthographic cluster.

There is no significant interaction of consistency with orthographic cluster (F < 1). There is, however, a significant main effect of cluster (F(2,170) = 16.1, p < .001), with the vowel cluster producing lower correlations than either consonant cluster (vowel vs. onset: F(1,88) = 26.8, p < .001; vowel vs. coda: F(1,88) = 21.0, p < .001). More importantly, regular words have higher correlations than exception words [means (standard deviations): 0.828 (0.0506) vs. 0.795 (0.0507), respectively; F(1,85) = 20.7, p < .001]. Thus, the contributions of orthographic

clusters to the hidden representations are more independent of context in regular words than in exception words. In this sense, the representations of regular words are more componential. What is surprising, however, is that the average correlations for exception words, though lower than those of regular words, are still quite high, and there is considerable overlap between the distributions. Furthermore, the representations for regular words are not completely componential, given that their correlations are significantly less than 1.0.

Apparently, the hidden representations of words only slightly reflect their spelling-sound consistency. An alternative possibility is that these representations capture predominantly orthographic information across a range of levels of structure (from individual graphemes to combinations of clusters; cf. Shallice & McCarthy, 1985). If this were the case, the low-order orthographic structure about individual graphemes and clusters could support componential attractors for regular words. The presence of higher-order structure would make the representation of clusters in both regular and exception words somewhat sensitive to the context in which they occur. More importantly, this higher-order structure would be particularly useful for pronouncing exception words, by overriding at the phonological layer the standard spelling-sound correspondences of individual clusters. In this way, noncomponential aspects of the attractors for exception words could co-exist with componential attractors for regular words.

To provide evidence bearing on this explanation, a final analysis was carried out to determine the extent to which the hidden representations are organized on the basis of orthographic (as opposed to phonological) similarity. The hidden representations for a set of items are organized orthographically (or phonologically) to the extent that pairs of items with similar hidden representations have similar orthographic (or phonological) representations. Put more generally, the sets of representations over two groups of units have the same structure to the extent that they induce the same relative similarities among items.

To control for the contribution of orthography as much as possible, the analysis involved 48 triples, each consisting of a nonword, a regular inconsistent word, and an exception word that all share the same body (e.g., PHINT, MINT, PINT; listed in Appendix 1). For each item in a triple, we computed the similarity of its hidden representation with the hidden representations of all of the other items of the same type (measuring similarity by the correlation of unit activities). The similarities among orthographic representations and among phonological representations were computed analogously. The orthographic, hidden, and phonological similarity values for each item were then correlated in a pairwise fashion (i.e., orthographic-phonological, hidden-orthographic, and hidden-phonological). Figure 18 presents the means of these correlation values for nonwords, regular words, and exception words, as a function of each pair of representation types.
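This analysis amounts to a second-order similarity comparison: compute each item's similarity profile within a representation space, then correlate those profiles across spaces. A sketch with hypothetical representation matrices (the construction of `hidden` from `orth` is contrived purely to make the expected pattern visible):

```python
import numpy as np

def similarity_profiles(reps):
    """reps: items x units. For each item, return its correlations with
    every other item (the item's similarity profile in this space)."""
    sim = np.corrcoef(reps)                 # item-by-item correlation matrix
    return [np.delete(sim[i], i) for i in range(reps.shape[0])]

def structure_correlation(reps_a, reps_b):
    """Degree to which two representation spaces induce the same relative
    similarities: mean correlation of per-item similarity profiles."""
    pa, pb = similarity_profiles(reps_a), similarity_profiles(reps_b)
    return float(np.mean([np.corrcoef(a, b)[0, 1] for a, b in zip(pa, pb)]))

rng = np.random.default_rng(2)
orth = rng.normal(size=(20, 30))                    # hypothetical orthographic codes
hidden = np.tanh(orth @ rng.normal(size=(30, 20)))  # hidden built largely from orthography
phon = rng.normal(size=(20, 25))                    # unrelated phonological codes

# Here, hidden similarity structure tracks orthography far better than phonology.
assert structure_correlation(hidden, orth) > structure_correlation(hidden, phon)
```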

First consider the correlation between the orthographic and


[Figure 18 about here: bar graph, correlation (0.5–1.0) for each representation pair (orthographic-phonological, hidden-orthographic, hidden-phonological), for nonwords, regular words, and exception words.]

Figure 18. The correlations among orthographic, hidden, and phonological similarities for body-matched nonwords, regular inconsistent words, and exception words (listed in Appendix 1).

phonological similarities. These values reflect the relative amounts of structure in the spelling-sound mappings for different types of items. All of the values are relatively high because of the systematicity of English word pronunciations; even within exception words, the consonant clusters tend to map consistently. Nonetheless, the mappings for exception words are less structured than for nonwords or regular words (paired t(47) = 5.48, p < .001, and t(47) = 5.77, p < .001, respectively). In other words, orthographic similarity is less related to phonological similarity for exception words than for the other items. In a sense, this is the defining characteristic of exception words and, thus, the finding simply verifies that the representations used in the simulations have the appropriate similarity structure.

The more interesting comparisons are those that involve the hidden representations. As Figure 18 shows, the similarities among the hidden representations of all types of items are much more highly correlated with their orthographic similarities than with their phonological similarities (p < .001 for all pairwise comparisons). The representations of nonwords and regular words behave equivalently in this regard. The representations of exception words show the effect even more strongly, having significantly less phonological structure than the other two item types (exception vs. nonword: paired t(47) = 2.81, p = .007; exception vs. regular: paired t(47) = 3.22, p = .002). This may be due to the reliance of these words on high-order orthographic structure to override standard spelling-sound correspondences. Overall, consistent with the explanation offered above, the hidden representations are organized more orthographically than phonologically.

Summary

Interactivity, and its use in implementing attractors, is an important computational principle in connectionist accounts of a wide range of cognitive phenomena. Although the tendency of attractors to capture similar patterns might appear to make them inappropriate for tasks in which novel inputs require novel responses, such as pronouncing nonwords in oral reading, the current simulation shows that using appropriately structured representations leads to the development of attractors with componential structure that supports effective generalization to nonwords. At the same time, the network also develops less componential attractors for exception words that violate the regularities in the task. A series of analyses suggests that both the componential and noncomponential aspects of attractors are supported by hidden representations that reflect orthographic information at a range of levels of structure. In this way, attractors provide an effective means of capturing both the regularities and the exceptions in a quasi-regular task.

A further advantage of an attractor network in this domain is that its temporal dynamics in settling to a response provide a more direct analogue of subjects' naming latencies than error in a feedforward network. In fact, the time it takes the network to settle to a stable pronunciation in response to words of varying frequency and consistency replicates the standard pattern found in empirical studies.

Simulation 4: Surface Dyslexia and the Division of Labor Between the Phonological and Semantic Pathways

A central theme of the current work is that the processing of words and nonwords can co-exist within connectionist networks that employ appropriately structured orthographic and phonological representations and operate according to certain computational principles. It must be kept in mind, however, that SM89's general lexical framework—on which the current work is based—contains two pathways by which orthographic information can influence phonological information: a phonological pathway and a semantic pathway (see Figure 1). Thus far, we have ignored the semantic pathway in order to focus on the principles that govern the operation of the phonological pathway. However, on our view, the phonological and semantic pathways must work together to support normal skilled reading. For example, semantic involvement is clearly necessary for correct pronunciation of homographs like WIND and READ. Furthermore, a semantic variable—imageability—influences the strength of the frequency-by-consistency interaction in the naming latencies and errors of skilled readers (Strain, Patterson, & Seidenberg, in press). Even in traditional dual-route theories (see, e.g., Coltheart et al., 1993; Coltheart & Rastle, 1994), the lexical procedure must influence the output of the sublexical procedure to account for consistency effects among regular words and nonwords (Glushko, 1979).


The SM89 framework (and the implied computational principles) provides a natural formulation of how contributions from both the semantic and phonological pathways might be integrated in determining the pronunciation of a written word. Critically, when formulated in connectionist terms, this integration has important implications for the nature of learning in the two pathways. In most connectionist systems, learning is driven by some measure of the discrepancy or error between the correct response and the one generated by the system. To the extent that the contribution of one pathway reduces the overall error, the other pathway will experience less pressure to learn. As a result, on its own, it may master only those items it finds easiest to learn. Specifically, if the semantic pathway contributes significantly to the pronunciation of words, then the phonological pathway need not master all of the words by itself. Rather, it will tend to learn best those words high in frequency and/or consistency; low-frequency exception words may never be learned completely. This is especially true if there is some intrinsic pressure within the network to prevent overlearning—for example, if weights have a slight bias toward staying small. Of course, the combination of the semantic and phonological pathways will be fully competent. But readers of equivalent overt skill may differ in their division of labor between the two pathways (see, e.g., Baron & Strawson, 1976). In fact, if the semantic pathway continues to improve with additional reading experience, the phonological pathway would become increasingly specialized for consistent spelling-sound mappings at the expense of even higher-frequency exception words. At any point, brain damage that reduced or eliminated the semantic pathway would lay bare the latent inadequacies of the phonological pathway. In this way, a detailed consideration of the division of labor between the phonological and semantic pathways is critical to understanding the specific patterns of impaired and preserved abilities of brain-damaged patients with acquired dyslexia.

Of particular relevance in this context is the finding that brain damage can selectively impair either nonword reading or exception word reading while leaving the other (relatively) intact. Thus, phonological dyslexic patients (Beauvois & Derouesne, 1979) read words (both regular and exception) much better than nonwords, whereas surface dyslexic patients (Marshall & Newcombe, 1973; Patterson et al., 1985) read nonwords much better than (exception) words.

Phonological dyslexia has a natural interpretation within the SM89 framework in terms of selective damage to the phonological pathway (or perhaps within phonology itself; see Patterson & Marcel, 1992), so that reading is accomplished primarily (perhaps even exclusively in some patients) by the semantic pathway. This pathway can pronounce words but is unlikely to provide much useful support in pronouncing nonwords as, by definition, these items have no semantics. Along these lines, as mentioned in the previous section, Plaut and Shallice (1993; also see Hinton & Shallice, 1991) used a series of implementations of the semantic route to provide a comprehensive account of deep dyslexia (Coltheart et al., 1980; Marshall & Newcombe, 1966), a form of acquired dyslexia similar to phonological dyslexia but also involving the production of semantic errors (see Friedman, in press; Glosser & Friedman, 1990, for arguments that deep dyslexia is simply the most severe form of phonological dyslexia). The question of the exact nature of the impairment that gives rise to reading via semantics in phonological dyslexia, and whether this interpretation can account for all of the relevant findings, is taken up in the General Discussion.

Surface dyslexia, on the other hand, would seem to involve reading primarily via the phonological pathway due to an impairment of the semantic pathway. In its purest, fluent form (e.g., MP, Behrmann & Bub, 1992; Bub, Cancelliere, & Kertesz, 1985; KT, McCarthy & Warrington, 1986; HTR, Shallice et al., 1983), patients exhibit normal accuracy and latency in reading words with consistent spelling-sound correspondences and in reading nonwords, but often misread exception words, particularly those of low frequency, by giving a pronunciation consistent with more standard correspondences (e.g., SEW → "sue"). Although we ascribe such errors to influences of consistency, they are conventionally termed regularizations (Coltheart, 1981) and we have retained this terminology. Thus, there is a frequency-by-consistency interaction in accuracy that mirrors the interaction in latency exhibited by normal skilled readers (Andrews, 1982; Seidenberg, 1985; Seidenberg et al., 1984; Taraban & McClelland, 1987; Waters & Seidenberg, 1985). The relevance of the semantic impairment in surface dyslexia is supported by the finding that, in some cases of semantic dementia (Graham, Hodges, & Patterson, 1994; Patterson & Hodges, 1992; Schwartz, Marin, & Saffran, 1979) and of Alzheimer's-type dementia (Patterson, Graham, & Hodges, 1994), the surface dyslexic reading pattern emerges as lexical semantic knowledge progressively deteriorates.

The previous simulations of the phonological pathway, along with that of SM89, are similar to surface dyslexic patients in that they read without the aid of semantics. The simulations do not provide a direct account of surface dyslexia, however, as they all read exception words as well as skilled readers. One possibility is that surface dyslexia arises from partial impairment of the phonological pathway in addition to severe impairment of the semantic pathway. A more interesting possibility, based on the division-of-labor ideas above, is that the development and operation of the phonological pathway is shaped in an important way by the concurrent development of the semantic pathway, and that surface dyslexia arises when the intact phonological pathway operates in isolation due to an impairment of the semantic pathway.

Two sets of simulations are employed to test the adequacy of these two accounts of surface dyslexia. The first set investigates the effects of damage in the attractor network developed in the previous simulation. The second involves a new network trained in the context of support from semantics.


Phonological Pathway Lesions

Patterson et al. (1989) investigated the possibility that surface dyslexia might arise from damage to an isolated phonological pathway. They lesioned the SM89 model, by removing different proportions of units or connections, and measured its performance on regular and exception words of various frequencies. The damaged network's pronunciation of a given word was compared with the correct pronunciation and with a plausible alternative—for exception words, this was the regularized pronunciation. Patterson and colleagues found that, after damage, regular and exception words produce about equal amounts of error, and there was no effect of frequency in reading exception words. Exception words were much more likely than regular words to produce the alternative pronunciation, but a comparison of the phonemic features in errors revealed that the network showed no greater tendency to produce regularizations than other errors that differ from the correct pronunciation by the same number of features. Thus, the damaged network failed to show the frequency-by-consistency interaction and the high proportion of regularization errors on exception words characteristic of surface dyslexia.

Using a more detailed procedure for analyzing responses, Patterson (1990) found that removing 20% of the hidden units produced better performance on regular versus exception words and a (nonsignificant) trend towards a frequency-by-consistency interaction. Figure 19 shows analogous data from 100 instances of lesions to a replication of the SM89 network, in which each hidden unit had a probability p of either 0.2 or 0.4 of being removed. Plotted for each severity of damage is the percent correct on Taraban and McClelland's (1987) high- and low-frequency exception words and their regular consistent control words, the percent of errors on the exception words that are regularizations, and the percent correct on Glushko's (1979) nonwords, counting as correct any pronunciation consistent with that of some word with the same body in the training corpus. Also shown in the figure are the corresponding data for two surface dyslexic patients: MP (Behrmann & Bub, 1992; Bub et al., 1985) and KT (McCarthy & Warrington, 1986).

The milder lesions (p = 0.2) produce a good match to MP's performance on the Taraban and McClelland words. However, the more severe lesions (p = 0.4) fail to simulate the more dramatic effects shown by KT. Instead, while the damaged network and KT perform about equally well on the high-frequency exception words, the network is not as impaired on the low-frequency exception words and is much more impaired on both high- and low-frequency regular words. In addition, with the less severe damage, only about a third of the network's errors to exception words are regularizations and only just above half of the nonwords are pronounced correctly; for more severe damage, these figures are even lower. By contrast, both MP and KT produce regularization rates around 85–90% and are near perfect at nonword reading. Overall, the attempts

[Figure 19 about here: percent correct (0–100) on high- and low-frequency regular and exception words, regularization rates, and nonwords, for patients MP and KT and for the lesioned SM89 replication (p = 0.2 and p = 0.4).]

Figure 19. Performance of two surface dyslexic patients (MP, Behrmann & Bub, 1992; Bub et al., 1985; and KT, McCarthy & Warrington, 1986) and of a replication of the SM89 model when lesioned by removing each hidden unit with probability p = 0.2 or 0.4 (results are averaged over 100 such lesions). Correct performance is given for Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and for Glushko's (1979) nonwords. "Reg's" is the approximate percentage of errors on the exception words that are regularizations.

to account for surface dyslexia by damaging the SM89 model have been less than satisfactory (see Behrmann & Bub, 1992; Coltheart et al., 1993, for further criticisms).

One possible explanation of this failing parallels our explanation of the SM89 model's poor nonword reading: it is due to the use of representations that do not make the relevant structure between orthography and phonology sufficiently explicit. In essence, the influence of spelling-sound consistency in the model is too weak. This weakness also seems to be contributing to its inability to simulate surface dyslexia after severe damage: regular word reading, nonword reading, and regularization rates are all too low. This interpretation leads to the possibility that a network trained with more appropriately structured representations would, when damaged, successfully replicate the surface dyslexic reading pattern.

Method. The attractor network was lesioned either by removing each hidden unit or each connection between two groups of units with some probability p, or by adding normally-distributed noise to the weights on connections between two groups of units. In the latter case, the severity of the damage depends on the standard deviation sd of the noise—a higher sd constitutes a more severe impairment. This form of damage has the advantage over the permanent removal of units or connections of reducing the possibility of idiosyncratic effects from lesions to particular units/connections. As Shallice (1988) has pointed out, such effects in a network simulation are of little


interest to the study of the cognitive effects of damage to the brain given the vast difference in scale between the two systems (also see Plaut, in press). In general, simulation studies comparing the effects of adding noise to weights with the effects of removing units or connections (e.g., Hinton & Shallice, 1991) have found that the two procedures yield qualitatively equivalent results.14

Fifty instances of each type of lesion of a range of severities were administered to each of the main sets of connections in the attractor network (graphemes-to-hidden, hidden-to-phonemes, phonemes-to-hidden, and phonemes-to-phonemes), and to the hidden units. After a given lesion, the operation of the network when presented with an input and the procedure for determining its response are exactly the same as in Simulation 3.
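The three forms of damage can be sketched as operations on weight matrices (a generic sketch, not the simulator's actual code; `p` and `sd` are the severity parameters defined in the text):

```python
import numpy as np

rng = np.random.default_rng(3)

def lesion_remove_connections(W, p):
    """Remove each connection independently with probability p."""
    mask = rng.random(W.shape) >= p
    return W * mask

def lesion_remove_units(W_in, W_out, p):
    """Remove each hidden unit with probability p, severing both its
    incoming (W_in: inputs x hidden) and outgoing (W_out: hidden x
    outputs) connections."""
    keep = rng.random(W_in.shape[1]) >= p
    return W_in * keep, W_out * keep[:, None]

def lesion_add_noise(W, sd):
    """Corrupt each weight with zero-mean Gaussian noise of std sd."""
    return W + rng.normal(0.0, sd, size=W.shape)
```

In the simulations, one of these operations is applied to a single set of connections (or to the hidden units) per lesion instance, and performance is averaged over many such instances.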

To evaluate the effects of lesions, the network was tested on Taraban and McClelland's (1987) high- and low-frequency regular consistent words and exception words and on Glushko's (1979) nonwords. For the words, in addition to measuring correct performance, we calculated the percentage of errors on the exception words that correspond to a regularized pronunciation. The full list of responses that were accepted as regularizations is given in Appendix 3. As the undamaged network mispronounces the word SPOOK, this item was not included in the calculation of regularization rates. For the nonwords, a pronunciation was accepted as correct if it was consistent with the pronunciation of some word in the training corpus with the same body (see Appendix 2).
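The nonword scoring rule (accept any pronunciation consistent with some training word sharing the nonword's body) can be sketched with a hypothetical mini-lexicon; the real corpus and phonological coding are those of Simulation 3:

```python
# Hypothetical mapping from orthographic bodies to the set of body
# pronunciations attested among training words (invented coding).
body_pronunciations = {
    "AVE": {"/Av/", "/av/"},    # GAVE-type and HAVE-type pronunciations
    "INT": {"/int/", "/Int/"},  # MINT-type and PINT-type pronunciations
}

def nonword_correct(body, produced_body):
    """A nonword response counts as correct if its body pronunciation
    matches that of some training word with the same orthographic body."""
    return produced_body in body_pronunciations.get(body, set())

assert nonword_correct("AVE", "/av/")       # MAVE rhyming with HAVE: accepted
assert nonword_correct("AVE", "/Av/")       # MAVE rhyming with GAVE: accepted
assert not nonword_correct("INT", "/ant/")  # not any attested pronunciation
```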

Results and Discussion. Figure 20 shows the data from the attractor network after the weights of each of the four main sets of connections were corrupted by noise of varying severities. The milder lesions to the graphemes-to-hidden connections (on the top left of the figure) produce clear interactions of frequency and consistency in correct performance on word reading. For instance, after adding noise with sd = 0.4, the network pronounces correctly over 96% of regular words and 93% of high-frequency exception words, but only 77% of low-frequency exception words. In addition, for these lesions, 68% of errors on exception words are regularizations, and 89% of the nonwords are pronounced correctly. Compared with the results from lesions of 20% of the hidden units in the SM89 network, these show a stronger effect of consistency and are a better match to the performance of MP (although the regularization rate is somewhat low; see Figure 19). Thus, as predicted, the use of representations that better capture spelling-sound structure produces a stronger frequency-by-consistency interaction, more regularizations, and better nonword reading.

14. To see why this should be the case, imagine a much larger network in which the role of each weight in a smaller network is accomplished by the collective influence of a large set of weights. For instance, we might replace each connection in the small network by a set of connections whose weights are both positive and negative and sum to the weight of the original connection. Randomly removing some proportion of the connections in the large network will shift the mean of each set of weights; this will have the same effect as adding a random amount of noise to the value of the corresponding weight in the small network.

As found for the SM89 network, however, more severe lesions do not replicate the pattern shown by KT. Lesions that reduce correct performance on high-frequency exception words to equivalent levels (sd = 1.0; network: 46%; KT: 47%) do not impair low-frequency exception words sufficiently (network: 38%; KT: 26%) and, unlike KT, impair both high- and low-frequency regular words (network: 65% and 60%; KT: 100% and 89%, respectively). Furthermore, and even more unlike KT, there is a substantial drop in both the regularization rate (network: 32%; KT: 85%) and in performance on nonwords (network: 60%; KT: 100%).

Lesions to the other sets of connections produce broadly similar but even weaker results: the frequency-by-consistency interactions are weaker (especially for severe lesions), the impairment of regular words is more severe (except for phonemes-to-hidden lesions), and the regularization rates are much lower (note that a different range of lesion severities was used for the hidden-to-phonemes connections as they are much more sensitive to noise). Thus, in summary, mild graphemes-to-hidden lesions in the attractor network can account for MP's behavior, but more severe lesions cannot reproduce KT's behavior.

These negative findings are not specific to the use of noise in lesioning the network; removing units or connections produces qualitatively equivalent results, except that the regularization rates are even lower. To illustrate this, Table 9 presents data for the two patients and for the attractor network after either mild or severe lesions of the graphemes-to-hidden connections, the hidden units, or the hidden-to-phonemes connections. The levels of severity were chosen to approximate the performance of MP and KT on low-frequency exception words.

In summary, some types of lesion to a network implementation of the phonological pathway may be able to approximate the less-impaired pattern of performance shown by MP, but are unable to account for the more dramatic pattern of results shown by KT. These findings suggest that impairment to the phonological pathway may play a role in the behavior of some surface dyslexic patients, but seems unlikely to provide a complete explanation of some patients—particularly those with normal nonword reading and severely impaired exception word reading.

Phonological and Semantic Division of Labor

We now consider an alternative view of surface dyslexia: that it reflects the behavior of an undamaged but isolated phonological pathway that had learned to depend on support from semantics in normal reading. All of the previous simulations of the phonological pathway have been trained to be fully competent on their own. Thus, if this explanation for surface dyslexia holds, it entails a reappraisal of the relationship between those simulations and the normal skilled word reading system.

The current simulation involves training a new network in the context of an approximation to the contribution of semantics. Including a full implementation of the semantic pathway


[Figure 20 about here: four panels (graphemes-to-hidden, hidden-to-phonemes, phonemes-to-hidden, and phonemes-to-phonemes lesions), each plotting percent correct (0–100) on HF/LF regular and exception words, regularization rates, and nonwords, as a function of lesion severity (sd = 0.2 to 1.0; sd = 0.1 to 0.4 for hidden-to-phonemes).]

Figure 20. Performance of the attractor network after lesions of various severities to each of the main sets of connections, in which weights are corrupted by noise with mean zero and standard deviation "sd" as indicated. Correct performance is given for Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and for Glushko's (1979) nonwords. "Reg's" is the percentage of errors on the exception words that are regularizations.


Table 9
Performance of the Attractor Network after Lesions of Units or Connections

                            Correct Performance
                     HF Reg  LF Reg  HF Exc  LF Exc   Reg's   Nonwords
Patient MP (a)         95      98      93      73     90 (c)    95.5
Patient KT (b)        100      89      47      26     85 (c)   100

Attractor Network Lesions
Graphemes-to-Hidden
  p = .05              95.8    94.4    88.9    75.8    65.6     89.6
  p = .3               49.0    42.8    37.8    27.9    26.0     45.3
Hidden Units
  p = .075             93.9    93.5    85.6    75.8    51.4     85.6
  p = .3               54.5    49.4    45.3    31.7    18.0     48.4
Hidden-to-Phonemes
  p = .02              89.0    89.2    81.0    70.0    48.3     82.4
  p = .1               36.3    31.8    26.4    24.8    13.3     35.5

Note: p is the probability that each of the specified units or connections is removed from the network for a lesion; results are averaged over 50 instances of such lesions. Correct performance is given for Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and for Glushko's (1979) nonwords. "Reg's" is the percentage of errors on the exception words that are regularizations.
(a) From Bub et al. (1985; see also Behrmann & Bub, 1992).
(b) From Patterson (1990, based on McCarthy & Warrington, 1986).
(c) Approximate (from Patterson, 1990).

is, of course, beyond the scope of the present work. Rather, we will characterize the operation of this pathway solely in terms of its influence on the phoneme units within the phonological pathway. Specifically, to the extent that the semantic pathway has learned to derive the meaning and pronunciation of a word, it provides additional input to the phoneme units, pushing them toward their correct activations. Accordingly, we can approximate the influence of the semantic pathway on the development of the phonological pathway by training the latter in the presence of some amount of appropriate external input to the phoneme units.
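This approximation can be sketched as an external term, applied during training, that pushes each phoneme unit's net input toward its target. The particular form below (`semantic_strength` times ±1) is a schematic stand-in, not the paper's actual parameterization:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def phoneme_states(net_input, targets, semantic_strength):
    """Phoneme activations given the phonological pathway's net input
    plus external (semantic) input pushing toward the 0/1 targets:
    +g for target-1 units, -g for target-0 units."""
    external = semantic_strength * (2.0 * targets - 1.0)
    return sigmoid(net_input + external)

# With semantic support, even a weak phonological net input yields
# outputs near target, so the error driving phonological learning shrinks.
targets = np.array([1.0, 0.0, 1.0])
weak = np.zeros(3)   # phonological pathway contributes little on its own
err_alone = float(np.sum((sigmoid(weak) - targets) ** 2))
err_supported = float(np.sum((phoneme_states(weak, targets, 2.0) - targets) ** 2))
assert err_supported < err_alone
```

Because error-correcting learning is driven by the residual error, the semantic term directly reduces the pressure on the phonological pathway to master the item on its own, which is the division-of-labor mechanism at issue.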

A difficult issue arises immediately in the context of this approach, concerning the time-course of development of the semantic contribution during the training of the phonological pathway. Presumably, the mapping between semantics and phonology develops, in large part, prior to reading acquisition, as part of speech comprehension and production. By contrast, the orthography-to-semantics mapping, like the orthography-to-phonology mapping, obviously can develop only when learning to read. In fact, it is likely that the semantic pathway makes a substantial contribution to oral reading only once the phonological pathway has developed to some degree—in part because of the phonological nature of typical reading instruction, and in part because, in English, the orthography-to-phonology mapping is far more structured than the orthography-to-semantics mapping. The degree of learning within the semantic pathway is also likely to be sensitive to the frequency with which words are encountered. Accordingly, as a coarse approximation, we will assume that the strength of the semantic contribution to phonology in reading increases gradually over time and is stronger for high-frequency words.

It must be acknowledged that this characterization of semantics fails to capture a number of properties of the actual word reading system that are certainly important in some contexts: other lexical factors, such as imageability, that influence the contribution of semantics to phonology; interactivity between phonology and semantics; and the relative time-course of processing in the semantic and phonological pathways. Nonetheless, the manipulation of external input to the phoneme units allows us to investigate the central claim in the proposed explanation of surface dyslexia: that partial semantic support for word pronunciations alleviates the need for the phonological pathway to master all words, such that, when the support is eliminated by brain damage, the surface dyslexic reading pattern emerges.

Method. As will become apparent below, the necessary simulation requires 4–5 times more training epochs than the corresponding previous simulation. Thus, an attractor network trained on actual word frequencies could not be developed due to the limitations of available computational resources. Rather, the simulation involved training a feedforward network using a square-root compression of word frequencies. Such a network


produces a pattern of results in word and nonword reading that is quite similar to the attractor network (see Simulation 2). More importantly, there is nothing specific about the feedforward nature of the network that is necessary to produce the results reported below; an attractor network trained under analogous conditions would be expected to produce qualitatively equivalent results.

The network was trained with the same learning parameters as the corresponding network from Simulation 2 except for one change: a small amount of weight decay was reintroduced, such that each weight experiences a slight pressure to decay towards zero, proportional (with constant 0.001) to its current magnitude. As mentioned in the context of Simulation 1, this provides a bias towards small weights that prevents the network from overlearning and thereby encourages good generalization (see Hinton, 1989). As is demonstrated below, the introduction of weight decay does not alter the ability of the network to replicate the patterns of normal skilled performance on words and nonwords.
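As a concrete illustration, weight decay of this form simply adds a term to each weight update that shrinks the weight in proportion to its current magnitude. This is a minimal sketch, not the actual simulation code; the learning rate and gradient value are hypothetical, with only the decay constant (0.001) taken from the text:

```python
def update_weight(w, error_gradient, lr=0.1, decay=0.001):
    """One weight update with decay: in addition to the usual
    error-correcting step, the weight is pulled toward zero in
    proportion to its current magnitude (constant 0.001, as in
    the simulation described above). lr is a hypothetical value."""
    return w - lr * error_gradient - decay * w

# With no error signal opposing it, a weight shrinks geometrically
# toward zero, biasing the network toward small weights.
w = 2.0
for _ in range(100):
    w = update_weight(w, error_gradient=0.0)
print(round(w, 4))  # geometric decay: 2.0 * 0.999**100, about 1.81
```

Error on a word it still must pronounce produces gradients that oppose this shrinkage, which is the balance exploited in the division-of-labor argument below.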

Over the course of training, the magnitude S of the input to phoneme units from the (putative) semantic pathway for a given word was set to be

    S = g · log(f + 2) t / (log(f + 2) t + k)                       (16)

where f is the Kucera and Francis (1967) frequency of the word, and t is the training epoch. The parameters g and k determine the asymptotic level of input and the time to asymptote, respectively. Their values (g = 5, k = 2000 in the current simulation), and, more generally, the specific analytic function used to approximate the development of the semantic pathway, affect the quantitative but not the qualitative aspects of the results reported below. Figure 21 shows the mean values of this function over training epochs for the Taraban and McClelland (1987) high- and low-frequency words. If, for a given word, the correct state of a phoneme unit was 1.0, then its external input was positive; otherwise it was the same magnitude but negative.
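The growth function of Equation 16 can be sketched directly. The base of the logarithm is not specified in the text, so natural log is assumed here; g = 5 and k = 2000 are the values given above:

```python
import math

def semantic_input(f, t, g=5.0, k=2000.0):
    """Magnitude of the external input to phoneme units from the
    putative semantic pathway (Equation 16), for a word of written
    frequency f at training epoch t. Natural log is an assumption;
    the text does not state the base."""
    x = math.log(f + 2.0) * t
    return g * x / (x + k)

# The input grows with training epoch and with word frequency,
# asymptoting at g for all words.
hf = semantic_input(1222, 2000)  # high-frequency word (1222/million)
lf = semantic_input(20, 2000)    # low-frequency word (20/million)
assert 0.0 < lf < hf < 5.0
assert semantic_input(20, 100) < semantic_input(20, 1000)
```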

For the purposes of comparison, a second version of the network was trained without semantics, using exactly the same learning parameters and initial random weights.

Results and Discussion. Learning in the network trained without semantics reached asymptote by epoch 500, at which point it pronounced correctly all but 9 of the 2998 words in the training corpus (99.7% correct). Figure 22 shows the performance of the network on Taraban and McClelland's (1987) high- and low-frequency exception words and their regular consistent control words, and on Glushko's (1979) nonwords, over the course of training. Performance on regular words and on nonwords improves quite rapidly over the first 100 epochs, reaching 97.9% for the words and 96.5% for the nonwords at this point. Performance on high-frequency exception words improves somewhat more slowly. By contrast, performance on the low-frequency exception words improves far more slowly,

Figure 21. The magnitude of the additional external input supplied to phoneme units by the putative semantic pathway, as a function of training epoch, for the Taraban and McClelland (1987) high-frequency (1222/million) and low-frequency (20/million) words.

only becoming perfect at epoch 400. At this point, all of the words are read correctly. Even so, there are significant main effects of frequency (F(1,92) = 35.9, p < .001) and consistency (F(1,92) = 64.3, p < .001), and a significant interaction of frequency and consistency in the cross-entropy error produced by the words (means: HFR 0.031, LFR 0.057, HFE 0.120, LFE 0.465; F(1,92) = 26.4, p < .001). Thus, the network exhibits the standard pattern of normal skilled readers; the use of weight decay during training has not substantially altered the basic influences of frequency and consistency in the network.

In the current context, the network trained with a concurrently increasing contribution from semantics (as shown in Figure 21) is the more direct analogue of a normal reader. Not surprisingly, overall performance improves more rapidly in this case. All of the regulars and the high-frequency exceptions are pronounced correctly by epoch 110, and low-frequency exceptions are at 70.8% correct. By epoch 200, all of the low-frequency exceptions are correct, and nonword reading is 95.4% correct (where we assume nonwords receive no contribution from semantics). At this point, the network with semantics exhibits the standard effects of frequency and consistency in cross-entropy error (means: HFR 0.021, LFR 0.025, HFE 0.102, LFE 0.382; frequency: F(1,92) = 19.0, p < .001; consistency: F(1,92) = 45.0, p < .001; frequency-by-consistency: F(1,92) = 17.8, p < .001). Even after a considerable amount of additional training (epoch 2000), during which the division of labor between the semantic and phonological pathways changes considerably (as shown below), the overt behavior of the normal "combined" network shows the same pattern of effects (nonword reading: 97.7% correct; word cross-entropy error means: HFR 0.013, LFR 0.014, HFE 0.034, LFE 0.053; frequency: F(1,92) = 13.6, p < .001; consistency: F(1,92) = 125.1, p < .001; frequency-by-consistency: F(1,92) = 9.66, p = .003).

This last finding may help explain why, as in previous simulations, networks that are trained to be fully competent on their own replicate the effects of frequency and consistency in naming latency, even though, from the current perspective, such simulations are not fully adequate characterizations of the isolated phonological pathway in skilled readers. The reason is that, when performance is near asymptote—due either to extended training or to semantic support—word frequency and spelling-sound consistency affect the relative effectiveness of processing different words in the same way. This asymptotic behavior follows from the frequency-consistency equation (see Equation 12 and Figure 8). Increasing training (by increasing each frequency term F in the equation) or adding an additional semantic term to the sum serves equally to drive units further towards their extremal values (also see the General Discussion).

Figure 22. Correct performance of the network trained without semantics, as a function of training epoch, on Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and on Glushko's (1979) nonwords.

Figure 23 shows the performance of the network at each point in training when the contribution from semantics is eliminated—that is, after a complete semantic "lesion." These data reflect the underlying competence of the phonological pathway when trained in the context of a concurrently developing semantic pathway. First notice that the simulation involves training for 2000 epochs, even though the bulk of "overt" reading acquisition occurs in the first 100 epochs. Thus, the effects in the network should be thought of as reflecting the gradual improvement of skill from reading experience that, in the human system, spans perhaps many decades.

Initially, performance on nonwords and all types of words improves as the phonological pathway gains competence in the task, much as when the network is trained without semantics (see Figure 22). But as the semantic pathway increases in strength (as characterized by the curves in Figure 21), the

Figure 23. Performance of the network trained with semantics after a semantic "lesion," as a function of the training epoch at which semantics is eliminated, for Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and for Glushko's (1979) nonwords, and the approximate percentage of errors on the exception words that are regularizations.


accuracy of the combined network's pronunciations of words improves even faster (recall that the combined network is perfect on the Taraban and McClelland words by epoch 200). The pressure to continue to learn in the phonological pathway is thereby diminished. Eventually, at about epoch 400, this pressure is balanced by the bias for weights to remain small. At this point, most of the error that remains comes from low-frequency exception words. This error is reduced as the semantic pathway continues to increase its contribution to the pronunciation of these (and other) words. As a result, the pressure for weights to decay is no longer balanced by the error, and the weights become smaller. This causes a deterioration in the ability of the phonological pathway to pronounce low-frequency exception words by itself. With further semantic improvement, the processing of high-frequency exception words in the phonological pathway also begins to suffer. Virtually all of the errors on exception words that result from this process are regularizations (plotted as asterisks in Figure 23). Larger weights are particularly important for exception words because they must override the standard spelling-sound correspondences that are implemented by many smaller weights. Furthermore, high-frequency words are less susceptible to degradation because any decrement in overt performance induces much larger weight changes to compensate. By contrast, the processing of regular words and nonwords is relatively unaffected by the gradual reduction in weight magnitudes. Low-frequency regular words just begin to be affected at epoch 1900.

Thus, with extended reading experience, there is a redistribution of labor within the model between the semantic and phonological pathways. As the semantic pathway gains in competence, the phonological pathway increasingly specializes for consistent spelling-sound correspondences at the expense of exception words. Notice, however, that even with extended training, the phonological pathway continues to be able to read some exception words—particularly those of high frequency. In this way it is quite unlike the sublexical procedure in a traditional dual-route theory, which can read only regular words and no exception words. It is also important to keep in mind that normal overt performance—as supported by the combination of the phonological and semantic pathways—becomes fully accurate very early on and continues to improve in naming latency (as reflected indirectly by error).

On this interpretation of surface dyslexia, differences among patients in their ability to read exception words may not reflect differences in the severities of their brain damage. Rather, they may reflect differences in their premorbid division of labor between pathways, with the patients exhibiting the more severe impairment being those who had relied to a greater extent on semantic support. To illustrate this more directly, Figure 24 presents data from MP and KT as well as data from the network at two different points in training, when semantics was eliminated. Overall, the network at epoch 400 provides a close match to MP's performance, while the network at epoch 2000

Figure 24. Performance of two surface dyslexic patients (MP, Behrmann & Bub, 1992; Bub et al., 1985; and KT, McCarthy & Warrington, 1986) and the network at different points in training when semantics is eliminated. Correct performance is given for Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and for Glushko's (1979) nonwords. "Reg's" is the approximate percentage of errors on the exception words that are regularizations.

matches KT's performance. The only substantial discrepancy is that, in both conditions, the network's rate of regularizations is higher than that of the corresponding patient (although the patient data are only approximate; see Patterson, 1990).

Thus far, we have assumed that surface dyslexic patients, at least those of the fluent type, have a lesion that completely eliminates any contribution of the semantic pathway in reading. This assumption may be reasonable for MP and KT, as both patients had very severe impairments in written word comprehension. MP was at chance at selecting which of four written words was semantically related to a given word or picture (Bub et al., 1985; also see Bub, Black, Hampson, & Kertesz, 1988). KT's severe word comprehension deficit prevented him from scoring on either the Vocabulary or Similarities subtests of the Wechsler Adult Intelligence Scale (WAIS) (e.g., "bed, bed, I do not know what a bed is;" McCarthy & Warrington, 1986, p. 361).

However, some patients with fluent surface dyslexia appear to have only a partial impairment in the semantic pathway. In particular, among patients with semantic dementia whose reading has been tested in detail, the large majority also exhibit a surface dyslexic pattern such that severity of the reading disorder is correlated with the degree of semantic deterioration (Graham et al., 1994; Patterson & Hodges, 1992; but see Cipolotti & Warrington, 1995). A similar finding applies among patients with Alzheimer's type dementia (Patterson et al., 1994). Such cases have a natural interpretation in the current context in terms of the performance of the


Figure 25. The effect of gradual elimination of semantics on the correct performance of the network after 2000 epochs of training with semantics, for Taraban and McClelland's (1987) high-frequency (HF) and low-frequency (LF) regular consistent words (Reg) and exception words (Exc), and for Glushko's (1979) nonwords, and the approximate percentage of errors on the exception words that are regularizations.

network with partial rather than complete elimination of the contribution of the putative semantic pathway. To illustrate this effect, Figure 25 shows the performance of the network trained with semantics to epoch 2000, as the strength of the semantic contribution to the phoneme units—the parameter g in Equation 16—is gradually reduced. As semantics degrades, performance on the low-frequency exceptions is the first to be affected, followed by the high-frequency exceptions. By contrast, performance on regular words and nonwords is relatively unaffected by semantic deterioration, although performance on low-frequency regular words is somewhat impaired as semantics is completely eliminated (for g = 0.0, the data are identical to those in Figure 23 for epoch 2000). In fact, semantic dementia patients also exhibit a drop in performance on low-frequency regular words when their semantic impairment becomes very severe (Patterson & Hodges, 1992). Of course, a patient with progressive dementia may also have some amount of deterioration within the phonological pathway itself. As Figure 20 and Table 9 illustrate, such impairment would tend to degrade performance on exception words even further, but also would affect performance on regular words and nonwords to some degree.

The observation of surface dyslexic reading in association with either degraded semantics or a disrupted mapping from semantics to phonology (which, on our account, should have the same effect) is common, and indeed has been reported in several languages other than English, including Dutch (Diesfeldt, 1992), Italian (Miceli & Caramazza, 1993) and Japanese (Patterson, Suzuki, Wydell, & Sasanuma, in press). It is important to note, however, that there are cases that suggest there may be individual differences in the extent to which the pronunciation of low-frequency exceptions depends on contributions from semantics. The first is patient WLP (Schwartz, Saffran, & Marin, 1980), one of the most thoroughly studied cases of neurodegenerative disease in the history of cognitive neuropsychology. Although WLP began to make regularization errors on low-frequency exception words at a later stage of her disease, there was a period of testing at which her semantic disorder was already marked but her exception-word reading was still largely intact. Even more dramatically, Cipolotti and Warrington (1995) have recently reported a patient, DRN, with a substantial loss of meaning for low-frequency words, though his comprehension of high-frequency words (as measured by the difficult task of producing word definitions) was still intact. DRN's performance in reading low-frequency exception words was, however, almost perfectly intact, with only two or three reported regularization errors (CANOE → "kano", SHOE → "show"). On our account, these observations suggest that, in these individuals, the phonological pathway had developed a relatively high degree of competence without assistance from semantics; but this post-hoc interpretation clearly requires some future, independent source of evidence.

One final comment, with respect to phonological dyslexia, seems appropriate. Recall that phonological dyslexic patients are able to read words much better than nonwords. In the current simulation, the external input to the phoneme units that represents the contribution of the semantic pathway is sufficient, on its own, to support accurate word reading (but not nonword reading). On the other hand, severe damage to the phonological pathway certainly impairs nonword reading (see Figure 20 and Table 9). In the limit of a complete lesion between orthography and phonology, nonword reading would be impossible. Thus, a lesion to the network that severely impaired the phonological pathway while leaving the contribution of semantics to phonology (relatively) intact would replicate the basic characteristics of phonological dyslexia.

Summary

The detailed patterns of behavior of acquired dyslexic patients provide important constraints on the nature of the normal word reading system. The most relevant patients in the current context are those with (fluent) surface dyslexia, as, like the networks, they would seem to read without the aid of semantics. These patients read nonwords normally, but exhibit a frequency-by-consistency interaction in word reading accuracy, such that low-frequency exception words are particularly error-prone and typically produce regularization errors. Patterson et al. (1989; Patterson, 1990) were relatively unsuccessful in replicating the surface dyslexia reading pattern by damaging the SM89 model. Although the current simulations employ more appropriately structured representations, when damaged, they too fail to produce surface dyslexia—particularly the more severe form exhibited by KT (McCarthy & Warrington, 1986).


These findings call into question the interpretation of surface dyslexia as arising from a partial impairment of the phonological pathway in addition to extensive impairment of the semantic pathway. Rather, a better match to the surface dyslexic reading pattern—in both its mild and severe forms—is produced by the normal operation of an isolated phonological pathway that had developed in the context of support from the semantic pathway. This finding supports a view of the normal word reading system in which there is a division of labor between the phonological and semantic pathways, such that neither pathway alone is completely competent and the two must work together to support skilled word and nonword reading.

General Discussion

The current work develops a connectionist approach to processing in quasi-regular domains, as exemplified by English word reading. The approach derives from the general computational principles that processing is graded, random, adaptive, interactive, and nonlinear, and that representations and knowledge are distributed (McClelland, 1991, 1993). When instantiated in the specific domain of oral reading, these principles lead to a view in which the reading system learns gradually to be sensitive to the statistical structure among orthographic, phonological, and semantic representations, and in which these representations simultaneously constrain each other in interpreting a given input.

In support of this view, we have presented a series of connectionist simulations of normal and impaired word reading. A consideration of the shortcomings of a previous implementation (Seidenberg & McClelland, 1989) in reading nonwords led to the development of orthographic and phonological representations that capture better the relevant structure among the written and spoken forms of words. In Simulation 1, a feedforward network employing these representations learned to pronounce all of a large corpus of monosyllabic words, including the exception words, and yet also pronounced nonwords as well as skilled readers.

An analysis of the effects of word frequency and spelling-sound consistency in a related but simpler system formed the basis for understanding the empirical pattern of naming latencies as reflecting an appropriate balance between these factors. In Simulation 2, a feedforward network trained with actual word frequencies exhibited good word and nonword reading, and also replicated the frequency-by-consistency interaction in the amount of error it produced for words of various types.

In Simulation 3, a recurrent network replicated the effects of frequency and consistency on naming latency directly in the time required to settle on a stable pronunciation. More critically, the attractors that the network developed for words over the course of training had componential structure that also supported good nonword reading.

Finally, in Simulation 4, the role of the semantic pathway in oral reading was considered in the context of acquired surface dyslexia, in which patients read nonwords well but exhibit

a frequency-by-consistency interaction in naming accuracy, typically regularizing low-frequency exception words. The view that these symptoms—particularly in their most severe form—reflect the operation of a partially impaired phonological pathway was not supported by the behavior of the attractor network after a variety of types of damage. A further simulation supported an alternative interpretation of surface dyslexia: that it reflects the normal operation of a phonological pathway that is not fully competent on its own because it learned to rely on support from the semantic pathway (which is subsequently impaired by brain damage).

Alternative Perspectives on Word Reading

We can now raise, and then consider in the light of the results summarized above, several issues concerning the nature of the reading process. There is general agreement that (at least) two pathways contribute to reading words and nonwords aloud, but this still leaves open a number of fundamental questions. What are the underlying explanatory principles that determine the existence and the character of these different pathways? How does the operation of each arise from the fundamental principles, and what are the particular principles to which each pathway adheres? How do the different pathways combine to contribute to word and nonword reading? We consider here two very different approaches to these questions.

One view—the so-called dual-route view—holds that the fundamental explanatory principle in the domain of word reading is that distinctly different mechanisms are necessary for reading nonwords on the one hand and exception words on the other. The two mechanisms operate in fundamentally different ways. One assembles pronunciations from phonemes generated by the application of grapheme-phoneme correspondence rules. The other maps whole (orthographic) inputs to whole (phonological) outputs, using either a lexical lookup procedure or, in more recent formulations, an associative network (Pinker, 1991) or McClelland and Rumelhart's (1981) Interactive Activation model (Coltheart et al., 1993; Coltheart & Rastle, 1994).

The alternative view—our connectionist approach—holds that the fundamental explanatory principle in the domain of word reading is that the underlying mechanism employs a nonlinear, similarity-based activation process in conjunction with a frequency-sensitive connection weight adjustment process. Two pathways are necessary in reading, not because different principles apply to items of different types, but because different tasks must be performed. One pathway—here termed the phonological pathway—performs the task of transforming orthographic representations into phonological representations directly. The other—the semantic pathway—actually performs two tasks. The first is specific to reading; namely, the transformation of orthographic representations into semantic representations. The second is a more general aspect of language; namely, the transformation of semantic representations into phonological representations.


At first glance, these two views may appear so similar that deciding between them hardly seems worth the effort. After all, both the lexical procedure in the dual-route account and the semantic pathway in the connectionist account can read words but not nonwords, and both the sublexical procedure and the phonological pathway are critical for nonword reading and work better for regular words than for exception words. It is tempting to conclude that these two explanatory perspectives are converging on essentially the same processing system. Such a conclusion, however, neglects subtle but important differences in the theoretical and empirical consequences of the two approaches.

As a case in point, the sublexical GPC procedure in the dual-route account cannot be sensitive to whole-word frequency, as it eschews storage of whole lexical items. By contrast, in the connectionist approach, the phonological pathway maintains an intrinsic and incontrovertible sensitivity to both word frequency and spelling-sound consistency (also see Monsell, 1991). This sensitivity is captured in approximate form by the frequency-consistency equation (Equation 12), which expresses the strength of the response of a simple two-layer network to a given test pattern in terms of the frequency and overlap of the training patterns. The connectionist approach, as reflected by this equation, predicts that there can never be a complete dissociation of frequency and consistency effects; the phonological pathway must always exhibit sensitivity to both. This sensitivity takes a specific form, however: Items that are frequent, consistent, or both will have an advantage over items that are neither frequent nor consistent, but items that are frequent and consistent may not enjoy a large additional advantage over those that are only frequent or only consistent; as either frequency or consistency increases, sensitivity to differences in the other decreases.15

This relationship, as we have previously discussed, is approximately characterized by the frequency-consistency equation, which we reproduce here in a form that is elaborated to include a term for the contribution of the semantic pathway, and by separating out the contributions of training patterns whose outputs are consistent with that of the test pattern (i.e., so-called friends; Jared et al., 1990) from those whose outputs are inconsistent (i.e., enemies). Accordingly, the state s_j^[t] of an output (phoneme) unit j that should be on in test pattern t

15 Recently, Balota and Ferraro (1993) have reported an apparent dissociation of frequency and consistency in the naming latencies of patients with Alzheimer's type dementia, over increasing levels of severity of impairment. However, these patients make substantial numbers of errors, and the usual relationship of frequency and consistency holds in their accuracy data (also see Patterson et al., 1994). Furthermore, the dissociation was not found in naming latencies of younger or older normal subjects.

can be written as16

    s_j^[t] = σ( S^[t] + ε ( F^[t] + Σ_f F^[f] O^[ft] − Σ_e F^[e] O^[et] ) )      (17)

in which the logistic activation function σ(x) is applied to the contribution of the semantic pathway, S^[t], plus the contribution of the phonological pathway, which itself is the sum of three terms (scaled by the learning rate, ε): (1) the cumulative frequency of training on the pattern itself, F^[t]; (2) the sum of the frequencies of the friends (indexed by f) times their overlap with the test pattern; and (3) the sum of the frequencies of the enemies (indexed by e) times their overlap with the test pattern. It must be kept in mind, however, that this equation is only approximate for networks with hidden units and trained by error correction. These two aspects of the implemented networks are critical in that they help to overcome interference from enemies (i.e., the negative terms in Equation 17), thereby enabling the networks to achieve correct performance on exception words—that is, words with many enemies and few if any friends—as well as on regular words and nonwords.
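Equation 17 can be sketched in a few lines of code. The frequencies and overlaps below are hypothetical, chosen only to contrast an exception word (many enemies, few friends) with a regular word (many friends):

```python
import math

def unit_state(semantic, freq_self, friends, enemies, eps=0.05):
    """Approximate state of a phoneme unit that should be 'on'
    (Equation 17): the logistic of the semantic input S plus the
    learning-rate-scaled sum of the pattern's own cumulative
    training frequency, friend frequencies times overlap, minus
    enemy frequencies times overlap. eps is a hypothetical value."""
    phon = freq_self
    phon += sum(freq * overlap for freq, overlap in friends)
    phon -= sum(freq * overlap for freq, overlap in enemies)
    return 1.0 / (1.0 + math.exp(-(semantic + eps * phon)))

neighbors = [(50.0, 0.8), (40.0, 0.7)]  # hypothetical (frequency, overlap) pairs
regular = unit_state(0.0, 30.0, friends=neighbors, enemies=[])
exception = unit_state(0.0, 30.0, friends=[], enemies=neighbors)
assert regular > exception  # friends push the unit on; enemies push it off
# Semantic support compensates for an exception word's enemies.
assert unit_state(3.0, 30.0, [], neighbors) > exception
```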

Many of the basic phenomena in the domain of word reading can be seen as natural consequences of adherence to this frequency-consistency equation. In general, any factor that serves to increase the summed input to the activation function, σ(x) in Equation 17, improves performance, as measured by naming accuracy and/or latency. Thus, more frequent words are read better (e.g., Forster & Chambers, 1973; Frederiksen & Kroll, 1976) because they have higher values of F^[t], and words with greater spelling-sound consistency are read better (Glushko, 1979; Jared et al., 1990) because the positive sum from friends outweighs the negative sum from enemies. The nonlinear, asymptoting nature of the activation function, however, dictates that the contributions of these factors are subject to "diminishing returns" as performance improves. Thus, as reading experience accumulates—thereby increasing F^[t], F^[f], and F^[e] proportionally, or equivalently, increasing ε—the absolute magnitudes of the frequency and consistency effects diminish (see, e.g., Backman et al., 1984; Seidenberg, 1985). The same principle applies among different types of stimuli for a reader at a given skill level: performance on stimuli that are strong in one factor is relatively insensitive to variation in other factors. Thus, regular words show little effect of frequency, and high-frequency words show little effect of consistency (as shown in Figure 7). The result is the standard pattern of interaction between frequency and consistency, in which the naming of low-frequency exception words is disproportionately slow or inaccurate (Andrews, 1982; Seidenberg, 1985; Seidenberg et al., 1984; Taraban & McClelland, 1987; Waters & Seidenberg, 1985).
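The "diminishing returns" point can be checked numerically: with a logistic output, the same consistency difference produces a smaller difference in output as the frequency contribution grows. The input values here are toy numbers for illustration, not fitted to the model:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def state(freq_term, consistency_term):
    # Net input is just the sum of a frequency contribution and a
    # consistency contribution (arbitrary units for illustration).
    return logistic(freq_term + consistency_term)

# Consistency effect (consistent minus inconsistent) shrinks as the
# frequency term pushes the unit toward its asymptote.
low_freq_effect = state(0.5, 1.0) - state(0.5, -1.0)
high_freq_effect = state(3.0, 1.0) - state(3.0, -1.0)
assert high_freq_effect < low_freq_effect
```

This is exactly the frequency-by-consistency interaction: the cost of inconsistency is largest for the items that are furthest from asymptote, namely low-frequency exception words.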

16 For a unit with a target of −1, the signs would simply be reversed. Alternatively, the equation can be interpreted as reflecting the correlation of the activation of output unit j with its target, which may in that case be either +1 or −1.


The elaborated version of the frequency-consistency equation also provides a basis for understanding the effects of semantics on naming performance. In the approximation expressed by Equation 17, the contribution of the semantic pathway for a given word, S^[t], is simply another term in the summed input to each output (phoneme) unit. Just as with frequency and consistency, then, a stronger semantic contribution moves the overall input further along the asymptoting activation function, thereby diminishing the effects of other factors. As a result, words with a relatively weak semantic contribution (e.g., abstract or low-imageability words; Jones, 1985; Saffran, Bogyo, Schwartz, & Marin, 1980) exhibit a stronger frequency-by-consistency interaction—in particular, naming latencies and error rates are disproportionately high for items that are weak on all three dimensions: abstract, low-frequency exception words (Strain et al., in press).

Of course, as the simulations demonstrate, networks with hidden units and trained with error correction can learn to pronounce correctly all types of words without any help from semantics. In the context of the more general framework, however, full competence is required only from the combination of semantic and phonological influences. Thus, as the semantic pathway develops and S[t] increases, the contribution required from the other, phonological terms in Equation 17 to achieve the same level of performance is correspondingly reduced. With the additional assumption that the system has an intrinsic bias against unnecessary complexity (e.g., by limiting its effective degrees of freedom with weight decay), extended reading experience leads to a redistribution of labor. Specifically, as the semantic pathway improves, the phonological pathway gradually loses its ability to process the words it learned most weakly: those that are low in both frequency and consistency.

If, in this context, the contribution from semantics is severely weakened or eliminated (by brain damage), the summed input to each output unit will be reduced by as much as S[t]. For output units with significant negative terms in their summed input—that is, for those in words with many enemies—this manipulation may cause their summed input (and hence their output) to change sign. The result is an incorrect response. Such errors tend to be regularizations because the reduced summed input affects only those output units whose correct activations are inconsistent with those of the word's neighbors. Furthermore, as frequency makes an independent positive contribution to the summed inputs, errors are more likely for low- than for high-frequency exception words. By contrast, a reduction in the contribution from semantics has little if any effect on correct performance on regular words because the positive contribution from their friends is sufficient on its own to give output units the appropriately signed summed input. The resulting pattern of behavior, corresponding to fluent surface dyslexia (Bub et al., 1985; McCarthy & Warrington, 1986; Shallice et al., 1983), can thus be seen as an exaggerated manifestation of the same influences of frequency and consistency that give rise to the normal pattern of naming latencies.
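This sign-flip argument can be sketched numerically. The frequency values and the size of the semantic term S below are illustrative assumptions, not simulation parameters; the point is only that subtracting a common semantic contribution from the summed input of Equation 17 changes its sign exactly where enemies dominate and frequency is low.

```python
def summed_input(f_target, friends, enemies, semantic, eps=0.05):
    # Phonological-pathway terms (word frequency plus friends minus enemies)
    # plus a semantic contribution, as in the elaborated equation.
    return eps * (f_target + sum(friends) - sum(enemies)) + semantic

S = 5.0  # hypothetical intact semantic contribution
intact = {
    "HF regular":   summed_input(200, [90], [],  S),
    "LF regular":   summed_input(5,   [90], [],  S),
    "HF exception": summed_input(200, [5], [90], S),
    "LF exception": summed_input(5,   [5], [90], S),
}
damaged = {word: s - S for word, s in intact.items()}  # semantics eliminated

# Intact, every summed input is positive (correct response). After damage,
# only the low-frequency exception word's input changes sign -- an error on
# exactly the unit whose activation conflicts with its neighbors, i.e., a
# regularization: the surface dyslexic pattern.
```

High-frequency exception words survive the same lesion because the frequency term alone keeps their summed input positive, matching the prediction in the text that sparing of high-frequency exceptions must accompany normal regular-word reading.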

The pattern of joint, nonlinear sensitivity to the combined effects of frequency and consistency in the connectionist account, along with assumptions on the contribution of semantics, leads to a number of predictions not shared by traditional dual-route accounts. First, frequency and consistency can trade off against each other, so that the detrimental effects of spelling-sound inconsistency can always be overcome by sufficiently high word frequency. Consequently, the connectionist account makes a strong prediction: there cannot be an (English-language) surface dyslexic patient who reads no exception words; if regular words can be read normally, there must also be some sparing of performance on high-frequency exceptions. By contrast, a dual-route framework could account for such a patient quite easily, in terms of damage that eliminates the lexical route(s) while leaving the GPC route in operation. In fact, given the putative separation of these routes, the framework would seem to predict the existence of such patients. The connectionist account also differs from the dual-route account in claiming that consistency rather than regularity per se (i.e., adherence to GPC rules) is the determining variable in "regularization" errors (where, as formulated here, consistency depends on all types of orthographic overlap rather than solely on word bodies; cf. Glushko, 1979). Finally, the connectionist account predicts a close relationship between impairments in the contribution of semantics to phonology and the surface dyslexic reading pattern (Graham et al., 1994; Patterson & Hodges, 1992), although this relationship will be subject to premorbid individual differences in reading skill and division-of-labor between the semantic and phonological pathways. Thus, patients with highly developed phonological pathways may not exhibit the pattern unless the semantic impairment is very severe (Cipolotti & Warrington, 1995; Schwartz et al., 1980). By contrast, dual-route theories that include a lexical, nonsemantic pathway (e.g., Coltheart, 1978, 1985; Coltheart et al., 1993) predict that selective semantic damage should never affect naming accuracy.

Our connectionist account, we believe, also has an important advantage of simplicity over the dual-route approach. This advantage goes well beyond the basic point that it provides a single set of computational principles that can account for exception word and nonword reading, while the dual-route model must rely on separate sets of principles. The additional advantage lies in the fact that the boundary between regular and exception words is not clear, and all attempts to draw such boundaries lead to unfortunate consequences. First, the marking of items as exceptions which must be looked up as wholes in the lexicon ignores the fact that most of the letters in these items will take their standard grapheme-phoneme correspondences. Thus, in PINT, 3/4 of the letters take their regular correspondence. Second, the marking of such items as exceptions ignores the fact that even the parts that are exceptional admit of some regularity, so that, for example, the exceptional pronunciation of the I in PINT also occurs in many other words containing an I (e.g., most of those ending in I_E, IND or ILD, where the _ represents any consonant). Third, exceptions often come in clusters that share the same word body. Special word-body rules may be invoked to capture these clusters, but then any word that conforms to the more usual correspondence becomes exceptional. Thus, we could treat OO→/u/ when followed by K as regular, but this would make SPOOK, which takes the more common correspondence OO→/U/, an exception. The explicit treatment of virtually any word as an exception, then, neglects its partial regularity and prevents the word both from benefitting from this partial regularity and from contributing to patterns of consistency it enters into with other items. Our connectionist approach, by contrast, avoids the need to impose such unfortunate divisions, and leaves a mechanism that exhibits sensitivity to all these partially regular aspects of so-called exception words.

The fact that exceptions are subject to the same processes as all other items in our system allows us to explain why there are virtually no completely arbitrary exceptions. On the other hand, the dual-route approach leaves this fact of the spelling-sound system completely unexplained. Nor, in fact, do some dual-route models even provide a basis for accounting for effects of consistency in reading words and nonwords. Recent dual-route theorists (e.g., Coltheart et al., 1993; Coltheart & Rastle, 1994) have appealed to partial activation of other lexical items as a basis for such effects. Such an assumption moves part-way toward our view that consistency effects arise from the influence of all lexical items. We would only add that our connectionist model exhibits these effects as well as the requisite sensitivity to general grapheme-phoneme correspondences, without stipulating a separate rule system over and above the system that exhibits the broad range of consistency effects.

Additional Empirical Issues

Proponents of dual-route theories have raised a number of empirical issues that they believe challenge our connectionist account of normal and impaired word reading. For example, Coltheart et al. (1993, also see Besner et al., 1990) raise six questions concerning the reading process, all but one of which—exception word reading—they deem problematic for the SM89 framework. Two of the remaining five—nonword reading and acquired surface dyslexia—have been addressed extensively in the current work. Here we discuss how the remaining three issues—acquired phonological dyslexia, developmental dyslexia, and lexical decision—may be accounted for in light of these findings. We also consider three other empirical findings that have been interpreted as providing evidence against the current approach—pseudohomophone effects (Buchanan & Besner, 1993; Fera & Besner, 1992; McCann & Besner, 1987; Pugh, Rexer, & Katz, 1994), stimulus blocking effects (Baluch & Besner, 1991; Coltheart & Rastle, 1994; Monsell et al., 1992), and the recent finding that naming latencies for exception words are influenced by the position of the exceptional correspondence (Coltheart & Rastle, 1994).

Acquired Phonological Dyslexia. As mentioned earlier, it is straightforward within the SM89 framework to account for the central characteristic of acquired phonological dyslexia—substantially better word reading than nonword reading—in terms of a relatively selective impairment of the phonological pathway. The apparent difficulty arises when considering patients who (a) are virtually unable to read nonwords, suggesting a complete elimination of the phonological pathway, and (b) have an additional semantic impairment that seems to render the semantic pathway insufficient to account for the observed proficiency at word reading. Two such patients have been described in the literature: WB (Funnell, 1983) and WT (Coslett, 1991). To explain the word reading of these patients, dual-route theorists claim that it is necessary to introduce a third route that is lexical but nonsemantic.

In point of fact, Coltheart et al. (1993) explicitly considered an alternative explanation and (we think too hastily) rejected it.

Perhaps a patient with an impaired semantic system, who therefore makes semantic errors in reading comprehension and who also has a severely impaired nonsemantic reading system, could avoid making semantic errors in reading aloud by making use of even very poor information about the pronunciation of a word yielded by the nonsemantic reading system. The semantic system may no longer be able to distinguish the concept orange from the concept lemon; however, to avoid semantic errors in reading aloud, all the nonsemantic route needs to deliver is just the first phoneme of the written word, not a complete representation of its phonology. (p. 596)

Coltheart and colleagues argued against this account entirely on the basis of two findings of Funnell (1983): WB did not pronounce correctly any of a single list of 20 written nonwords, and he did not give the correct phonemic correspondence to any of 12 single printed letters. Thus, they claimed, "WB's nonsemantic reading route was not just severely impaired, it was completely abolished" (p. 596).

This argument is unconvincing. First of all, it would seem unwise to base such a strong theoretical claim on so few empirical observations, especially given how little information is required of the phonological pathway on the above account. To pronounce a nonword correctly, however, all of its phonemes must be derived accurately. Thus, WB's inability to read 20 nonwords cannot be taken as definitive evidence that his phonological pathway is completely inoperative. Furthermore, WB did, in fact, make semantic errors in oral reading (e.g., TRAIN → "plane", GIRL → "boy"; see Appendix 1 of Funnell, 1983). Although such errors were relatively rare, comprising only 7.5% (5/67) of all lexical error responses, there were no error responses that were completely unrelated to the stimulus. Thus, the effect of semantic relatedness in errors is difficult to ascribe to chance responding (see Ellis & Marshall, 1978; Shallice & McGill, 1978). More generally, fully 38.8% (26/67) of WB's lexical errors had a semantic component, typically in combination with visual/phonemic or morphological relatedness.

More critically, Coltheart and colleagues fail to take into account the fact that WB exhibited deficits on purely phonological tasks, such as nonword repetition (Funnell, 1983) and phoneme stripping and blending (Patterson & Marcel, 1992), suggesting an additional impairment within phonology itself. Funnell had argued that such a phonological impairment could not explain WB's nonword reading deficit, because (a) he repeated nonwords more successfully (10/20) than he read them (0/20), and (b) he achieved some success (6/10) in blending three-phoneme words from auditory presentation of their individual phonemes. We note, however, that the failure to repeat fully half of a set of simple, single-syllable, word-like nonwords (e.g., COBE, NUST) certainly represents a prominent phonological deficit. Moreover, since Funnell's auditory blending test used only words as target responses, WB's partial success on this task is not especially germane to the issue. Patterson and Marcel (1992) assessed WB's blending performance with nonword targets and found that he was unable to produce a single correct response, whether the auditory presentation consisted of the three individual phonemes of a simple nonword (such as COBE) or its onset and rime. Patterson and Marcel argued that this phonological deficit in a non-reading task was sufficient to account for WB's complete inability to read nonwords.

Thus, the pattern of performance exhibited by WB can be explained within the SM89 framework in terms of a mildly impaired semantic reading pathway, possibly an impaired phonological reading pathway but, in particular, an impairment within phonology itself. A similar explanation applies to WT (Coslett, 1991): although this patient's performance on phonological blending tasks is not reported, she was severely and equally impaired in her ability to read and to repeat the same set of 48 nonwords.

We point out in passing that deep dyslexia (Coltheart et al., 1980), the remaining major type of acquired central dyslexia and closely related to phonological dyslexia (see, e.g., Glosser & Friedman, 1990), can be accounted for in terms of the same computational principles that are employed in the current work (see Plaut & Shallice, 1993).

Developmental Dyslexia. Our focus in the current work has been on characterizing the computational principles governing normal skilled reading and acquired dyslexia following brain damage in premorbidly literate adults. Even so, we believe that the same principles provide insight into the nature of reading acquisition, both in its normal form and in developmental dyslexia, in which children fail to acquire age-appropriate reading skills.

There is general agreement that a number of distinct patterns of developmental dyslexia exist, although exactly what these patterns are and what gives rise to them is a matter of ongoing debate. A common viewpoint is that there are developmental analogues to the acquired forms of dyslexia (see, e.g., Baddeley, Ellis, Miles, & Lewis, 1982; Harris & Coltheart, 1986; Marshall, 1984). Perhaps the clearest evidence comes from Castles and Coltheart (1993), who compared 53 dyslexic children with 56 age-matched normal readers in their ability to pronounce exception words and nonwords. The majority (32) of the dyslexic children were abnormally poor on both sets of items. However, 10 were selectively impaired at exception word reading, corresponding to developmental surface dyslexia, and 8 were selectively impaired at nonword reading, corresponding to developmental phonological dyslexia. Castles and Coltheart interpret their findings as supporting a dual-route theory of word reading, in which either the lexical or the sublexical procedure can selectively fail to develop properly (although they offer no suggestion as to why this might be).

More recently, Manis, Seidenberg, Doi, McBride-Chang, and Peterson (in press) compared 51 dyslexic children with 51 controls matched for age and 27 matched for reading level. They confirmed the existence of separate surface and phonological dyslexic patterns although, again, most of the dyslexic children showed a general reading impairment. Critically, the performance of the developmental surface dyslexic children was remarkably similar to that of reading-level matched controls, suggesting a developmental delay. By contrast, the phonological dyslexic children performed unlike either set of controls, suggesting a deviant developmental pattern. While these findings are not incompatible with the dual-route account, Manis and colleagues contend that they are more naturally accounted for in terms of different impediments to the development of a single (phonological) pathway. Specifically, they suggest (following SM89) that the delayed acquisition in developmental surface dyslexia may arise from limitations in the available computational resources within the phonological route. Consistent with this interpretation, SM89 found that a version of their network, trained with only half the normal number of hidden units, showed a disproportionate impairment on exception words compared with regular words (although performance on all items was poorer, consistent with the finding that generalized deficits are most common). However, the nonword reading capability of the network was not tested, and Coltheart et al. (1993) point out that it was not likely to be very good, given that overall performance was worse than in the normal network, which itself was impaired on nonword reading.

Just as for normal skilled reading, this limitation of the SM89 model stems from its use of inappropriately structured orthographic and phonological representations. To demonstrate this, we trained a feedforward network with only 30 hidden units in an identical fashion to the one with 100 hidden units from Simulation 4 (without semantics). This network was chosen for comparison simply because it is the only one for which the relevant acquisition data has already been presented, in Figure 22—the other networks would be expected to show similar effects. The corresponding data for the version with 30 hidden units are given in Figure 26.

[Figure 26 about here]

Figure 26. Correct performance of a feedforward network with only 30 hidden units on Taraban and McClelland's (1987) high- and low-frequency exception words and their regular consistent control words, and on Glushko's (1979) nonwords, as a function of training epoch. The network was trained exactly as the one whose corresponding data are shown in Figure 22.

As a comparison of the figures reveals, limiting the number of hidden units selectively impairs performance on exception words, particularly those of low frequency. By contrast, nonword reading is affected only very slightly. Notice that the performance of the dyslexic network at epoch 500 is quite similar to that of the normal network at about epoch 150. Thus, limiting the computational resources that are available for learning the spelling-to-sound task reproduces the basic delayed pattern of developmental surface dyslexia. Other manipulations that impede learning, such as weak or noisy weight changes, would be expected to yield similar results.

With regard to developmental phonological dyslexia, Manis et al. (in press) suggest that a selective impairment in nonword reading may arise from the use of phonological representations that are poorly articulated, perhaps due to more peripheral disturbances (also see, e.g., Liberman & Shankweiler, 1985; Rack, Snowling, & Olson, 1992). A consideration of the normal SM89 model is instructive here. That network employed representations that, we have argued, poorly capture the relevant structure within and between orthography and phonology. As a result, the model was over 97% correct at reading words, both regular and exception, but only 75% correct on a subset of Glushko's (1979) nonwords (when scored appropriately; see Seidenberg & McClelland, 1990). Thus, in a sense, the model behaved like a mild phonological dyslexic (see Besner et al., 1990, for similar arguments). In this way, the performance of the model provides evidence that a system with adequate computational resources, but which fails to develop appropriately componential orthographic and (particularly) phonological representations, will also fail to acquire normal proficiency in sublexical spelling-sound translation. It should also be kept in mind that, to whatever extent the semantic pathway develops and contributes during reading acquisition, the dissociation between word and nonword reading would be exacerbated.

A final point of contention with regard to the implications of developmental reading disorders for the SM89 framework concerns the existence of children whose oral reading ability, even on exception words, far surpasses their comprehension—as in so-called hyperlexia (Huttenlocher & Huttenlocher, 1973; Mehegan & Dreifuss, 1972; Metsala & Siegel, 1992; Silverberg & Silverberg, 1967). Typically, these children are moderately to severely retarded on standardized intelligence tests, and may totally lack conversational speech. They also tend to devote a considerable amount of time and attention to reading, although this has not been studied thoroughly. We suggest that, perhaps due to abnormally poor development in the semantic pathway, such children may have phonological pathways that are like our networks trained without semantics. In the limit, such networks learn to pronounce all types of words and nonwords accurately with no comprehension.

Lexical Decision. The final of Coltheart et al.'s (1993) objections to the SM89 model concerns its ability to perform lexical decisions. While SM89 establish that, under some stimulus conditions, the model can discriminate words from nonwords on the basis of a measure of its accuracy in regenerating the orthographic input, Besner and colleagues (Besner et al., 1990; Fera & Besner, 1992) have demonstrated that its accuracy in doing so is worse than that of human subjects in many conditions. Coltheart et al. (1993) mistakenly claim that the SM89 orthographic error scores yield a false-positive rate of over 80% on Waters and Seidenberg's (1985) nonwords when word error rates are equated with subjects' at 6.1%—in fact, these numbers result from using phonological error scores (Besner et al., 1990), which SM89 do not employ (although they do suggest that learning phonological attractors for words might help). While the actual false-positive rate is much lower—Besner and colleagues report a rate of 28% when orthographic and phonological error scores are summed and orthographically strange words are excluded—it is still unsatisfactory.

Of course, SM89 never claimed that orthographic and phonological information are completely sufficient to account for lexical decision performance under all conditions, pointing out that "there may be other cases in which subjects must consult information provided by the computation from orthography to semantics" (p. 552). Semantics is a natural source of information on which to distinguish words from nonwords, given that, in fact, a string of letters or phonemes is defined to be a word by virtue of it having a meaning. Coltheart and colleagues raise the concern that, in a full implementation of the SM89 framework, the presentation of an orthographically regular nonword (e.g., SARE) would activate semantics to the same degree as a word (e.g., CARE), thereby precluding lexical decision.

While further simulation work is clearly required to address the full range of lexical decision data adequately, a few comments may serve to allay this specific concern. We imagine that the semantic representations for words are relatively sparse, meaning that each word activates very few of the possible semantic features, and each semantic feature participates in the meanings of a very small percentage of words. Connectionist networks of the sort we are investigating learn to set the base activation level of each output unit to the expected value of its correct activations across the entire training corpus, because these values minimize the total error in the absence of any information about the input. In the case of sparse semantic representations, this means that semantic features would be almost completely inactive without specific evidence from the orthographic input that they should be active. Notice that the nature of this evidence must be very specific in order to prevent the semantic features of a word like CARE from being activated by the presentation of orthographically similar words like ARE, SCARE, CAR, etc. This extreme sensitivity to small orthographic distinctions would also prevent semantic features from being activated by a nonword like SARE. Thus, on this account, the computational requirements of a connectionist system that maps orthography to semantics veritably entail the ability to perform lexical decision.
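The claim that error minimization drives base activations toward the targets' expected value, and hence sparse features toward inactivity, can be checked directly. The 3% sparsity figure below is an arbitrary illustration, not a value from the model.

```python
# A semantic feature active in 3 of 100 word meanings (sparsity chosen
# arbitrarily for illustration).
targets = [1.0] * 3 + [0.0] * 97

def total_error(base):
    # Summed squared error if the unit adopts a constant base activation
    # in the absence of any information from the input.
    return sum((t - base) ** 2 for t in targets)

mean_target = sum(targets) / len(targets)   # 0.03: nearly inactive
print(total_error(mean_target) <= total_error(0.5))
print(total_error(mean_target) <= total_error(0.0))
```

The mean target beats any other constant output under squared error, so the sparser the feature, the closer its default activation sits to zero, and the more specific the orthographic evidence must be to lift it.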

Pseudohomophone and Blocking Effects. Two other, somewhat overlapping sets of empirical findings have been viewed as problematic for the current approach: pseudohomophone effects (Buchanan & Besner, 1993; Fera & Besner, 1992; McCann & Besner, 1987; Pugh et al., 1994) and blocking effects (Baluch & Besner, 1991; Coltheart & Rastle, 1994; Monsell et al., 1992). The first set involves demonstrations that, under a variety of conditions, pseudohomophones (i.e., nonwords with pronunciations that match that of a word; e.g., BRANE) are processed differently than orthographically-matched nonpseudohomophonic nonwords (e.g., FRANE). For example, subjects are faster to name pseudohomophones and slower (and less accurate) to reject them in lexical decision (McCann & Besner, 1987). The second set of problematic findings involves demonstrations that subjects' performance is sensitive to the context in which orthographic stimuli occur, usually operationalized in terms of how stimuli are blocked together during an experiment. For example, subjects are slower and make more regularization errors when pronouncing exception words intermixed with nonwords than when pronouncing pure blocks of exception words (Monsell et al., 1992).

Neither of these sets of phenomena is handled particularly well by the SM89 implementation, but both have natural formulations within the more general framework that includes semantics. Pseudohomophone effects may stem from an articulatory advantage in initiating familiar pronunciations (Seidenberg, Petersen, MacDonald, & Plaut, in press) and/or from interactions between phonology and semantics that do not occur for control nonwords. Blocking effects may reflect adjustments—either stimulus-driven or under the strategic control of subjects—in the relative contribution of the semantic and phonological pathways in lexical tasks. These interpretations are supported by recent findings of Pugh et al. (1994), who investigated effects of spelling-sound consistency and semantic relatedness in lexical decision, as a function of whether or not the nonword foils include pseudohomophones. They found faster latencies for consistent words than for inconsistent words only in the context of purely nonpseudohomophonic nonwords; there was no effect of consistency when pseudohomophones were present. Similarly, in a dual lexical decision paradigm, they obtained facilitation for visually similar word pairs that are phonologically consistent (e.g., BRIBE–TRIBE) and inhibition for those that are inconsistent (e.g., COUCH–TOUCH; Meyer et al., 1974) only when no pseudohomophones were present; the introduction of pseudohomophones eliminated the consistency effect. However, semantic relatedness (e.g., OCEAN–WATER) yielded facilitation regardless of nonword context. These findings suggest that subjects normally use both the semantic and phonological pathways in lexical decision, but avoid the use of the phonological pathway when this would lead to inappropriate semantic activity, as when pseudohomophones are included as foils.

Effects of Position of Exceptional Correspondence. Coltheart and Rastle (1994) argue that one of the determinants of naming RT for exception words is the position—counting graphemes and phonemes from left to right—at which the word deviates from rule-governed correspondences. They claim that such an effect is incompatible with any parallel approach to word naming, whereas the Dual-Route Cascaded (DRC) model of Coltheart et al. (1993) both predicts and simulates this effect, because the GPC procedure of the DRC model operates serially across an input string. The three monosyllabic words for which they provide simulation data from the DRC model are CHEF, TOMB and GLOW. By their account, the critical factor is that CHEF—for which the model requires the largest number of processing cycles—is irregular at its first grapheme/phoneme; TOMB, requiring an intermediate number of cycles, breaks the rules at the second grapheme/phoneme; and GLOW, which yields the fastest time from the model, only becomes irregular at the third position.

By our account, the critical difference between these three words may not be the position of irregularity but rather the proportion of other known words with similar spelling patterns that agree or conflict with the target word's pronunciation (see Jared & Seidenberg, 1990, for an elaboration of this argument). The Concise Oxford Dictionary lists 72 monosyllabic words starting with CH; 63 of these have the pronunciation /tS/ as in CHAIR; 5 have the pronunciation /S/ as in CHEF; 4 are pronounced /k/ as in CHORD. CHEF is therefore a highly inconsistent word. For the word TOMB, it is somewhat difficult to know what neighborhood of words to choose for a similar analysis. If we take words beginning with TO, although the two most common pronunciations are /a/ as in TOP and /O/ as in TONE, the third most likely pronunciation, with 7 exemplars, is /U/ as in TO, TOO, and TOMB; other pronunciations (as in TON, TOOK, TOIL) are less common. At the body level, TOMB has one friend, WOMB, and two enemies, BOMB and COMB. TOMB is therefore a moderately inconsistent word. Finally, for words ending in OW, although the GPC procedure of Coltheart et al. (1993) considers OW→/W/ (as in NOW) regular and OW→/O/ as in GLOW irregular, in fact 17 of the 29 monosyllabic words in English ending in OW rhyme with GLOW, whereas only 12 have Coltheart and colleagues' "regular" pronunciation as in NOW. Thus, GLOW is inconsistent but has the more frequent correspondence. Consistent with this interpretation, the attractor network developed in Simulation 3 produces naming latencies of 2.00 for CHEF, 1.92 for TOMB, and 1.73 for GLOW.
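The body-level friends/enemies tally used in this argument is easy to make explicit. The toy neighborhood below contains just the words mentioned in the text for TOMB, with the text's vowel symbols used as illustrative pronunciation codes; a real analysis would draw on a full dictionary.

```python
# Sketch of a body-level consistency tally: friends share the target word's
# body pronunciation, enemies share the spelling body but not the sound.
def friends_and_enemies(body_pron, neighbors):
    """neighbors maps each word sharing the body to its body pronunciation."""
    friends = [w for w, p in neighbors.items() if p == body_pron]
    enemies = [w for w, p in neighbors.items() if p != body_pron]
    return friends, enemies

friends, enemies = friends_and_enemies(
    "/Um/", {"WOMB": "/Um/", "BOMB": "/am/", "COMB": "/Om/"})
print(friends)  # ['WOMB']: one friend
print(enemies)  # ['BOMB', 'COMB']: two enemies -- moderately inconsistent
```

On this kind of count, CHEF (5 friends, 67 enemies among CH- onsets) comes out far more inconsistent than GLOW (17 friends, 12 enemies among -OW bodies), matching the ordering of the attractor network's latencies without any appeal to serial position.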

The experiment with human readers performed by Coltheart and Rastle (1994) revealed their predicted relationship between position of irregularity and naming RT, with slowest RTs to words like CHAOS with an irregular first grapheme-phoneme correspondence and fastest RTs to words like BANDAGE which do not become irregular until position 5. All of the stimulus words had two syllables, which prevents us from evaluating the performance of our networks on their materials. Inspection of these words in their appendix, however, again suggests a confounding between position and degree of consistency. Take the items which, by their analysis, become irregular at position 5; almost half of these words (6/14) were two-syllable words with first-syllable stress and with second syllables ending in silent E (e.g., BANDAGE and FESTIVE). Since the GPC procedure of Coltheart et al. (1993) applies the same rules independent of syllable position, it assigns the vowel /A/ to the grapheme A_E in the second syllable of BANDAGE and the vowel /I/ to the grapheme I_E in the second syllable of FESTIVE. Despite the fact that our model is not yet able to treat multisyllabic words, the nature of its operation ensures that it would be sensitive to the fact that words with this sort of pattern do not have tense (long) vowels in the second syllable. The great majority of two-syllable words ending in IVE (e.g., ACTIVE, PASSIVE, MOTIVE, NATIVE) have the same final vowel as FESTIVE, making FESTIVE a relatively consistent word. Whether this reinterpretation of the Coltheart and Rastle effect turns out to give an adequate account of their results remains to be seen from future empirical and modeling work. Furthermore, even if a position effect is found using properly controlled stimuli, it may very well be consistent with a parallel computation of phonology from orthography in which the decision to initiate articulation depends only on the initial phoneme(s) (Kawamoto, Kello, & Jones, 1994, 1995). Thus, rather than being incompatible with our approach, Coltheart and Rastle's findings may in fact relate to simple properties of networks that develop representations over time.

Extensions of the Approach

The approach we have taken can be extended in a number of different directions. The most obvious and natural extension is to the reading of multisyllabic words. The pronunciation of these words exhibits the same kind of quasi-regular structure found at the level of monosyllables (Jared & Seidenberg, 1990), but these regularities now apply not just to grapheme-phoneme correspondences but to the assignment of stress as well, and they involve sensitivity to linguistic variables such as the form-class of the word, its derivational status, and several other factors (Smith & Baker, 1976).

One challenge that arises in extending our approach to multisyllabic words is finding a better method for condensing regularities across positions within a word. The representations we have used condense regularities within the onset or the coda of a monosyllabic word, but experience with particular correspondences in the onset does not affect processing of the same correspondence in the coda, or vice versa. Indeed, our model has two completely separate sets of weights for implementing these correspondences, and most of its failures (e.g., with the consonant J in the coda) are attributable to the fact that its knowledge cannot be transferred between onsets and codas.

Ultimately, it seems likely that the solution to the problem of condensing regularities will involve sequential processing at some level. The paradigm case of this is the approach used in NETtalk (Sejnowski & Rosenberg, 1987, also see Bullinaria, 1995), in which the letters are processed sequentially, proceeding through a text from left to right. The input is shifted through a window that is several slots wide and each letter is mapped to its corresponding phoneme when it falls in the central slot. This allows each successive letter to be processed by the same set of units, so the regularities extracted in processing letters in any position are available for processing letters in every other position. At the same time, the presence of other letters in the slots flanking the central slot allows the network to be context sensitive and to exhibit consistency effects.
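The windowing scheme described above can be sketched in a few lines. The seven-slot window matches NETtalk's; the padding symbol and example word are arbitrary choices for illustration, and the phoneme mapping itself (which the real model learns) is omitted.

```python
def letter_windows(word, width=7, pad="_"):
    # Slide a fixed-width window across the padded letter string; the letter
    # in the central slot is the one mapped to a phoneme on that step.
    half = width // 2
    padded = pad * half + word + pad * half
    return [padded[i:i + width] for i in range(len(word))]

for window in letter_windows("GAVE"):
    print(window)
# ___GAVE
# __GAVE_
# _GAVE__
# GAVE___
```

The central slot (index 3) holds G, A, V, E in turn, so the same units process every letter position, letting regularities transfer across positions while the flanking slots supply the context needed for consistency effects.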

One drawback of such a letter-by-letter approach is that the onset of pronunciation of a word is completely insensitive to the consistency of its vowel; consistency does affect the vowel correspondences, but these only come into play after the pronunciation of the onset has been completed. This presents a problem because the empirical finding of consistency effects in naming latencies is one of the main motivations of a connectionist approach to word reading. For this reason, and because there is a great deal of coarticulation of successive phonemes, we have taken the view that fluent, skilled reading involves a parallel construction of a pronunciation of at least several phonemes at a time. One possibility is that skilled readers attempt to process as much of the word as they can in parallel, then redirect attention to the remaining part and try again (see Plaut, McClelland, & Seidenberg, in press, for a simulation illustrating this approach). In this way, early on in learning, reading is strictly sequential, as in NETtalk, but as skill develops, it becomes much more parallel, as in the models we have presented here. The result is that the system can always fall back on a sequential approach, which allows knowledge of regularities acquired in reading units of any size to be applied across the entire length of the utterance (Skoyles, 1991). The approach extends naturally to words of any length, with the size of the window of parallel computation being completely dependent on experience.
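One way to picture the refixation idea is a loop that pronounces the longest piece a (hypothetical) parallel mapping can handle, then shifts attention to the remainder and tries again. The chunk inventory and its pronunciations below are invented for illustration:

```python
# Sketch of parallel-then-refixate reading: at each fixation, map the
# longest stretch of letters the "network" knows in parallel, then move
# attention past it. CHUNKS stands in for learned knowledge of reading
# units of various sizes; the entries are toy examples.
CHUNKS = {"bath": "/baT/", "tub": "/tub/", "ba": "/ba/"}

def read_word(word, chunks=CHUNKS):
    phonemes = []
    pos = 0
    while pos < len(word):
        # try the largest window of parallel computation first
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in chunks:
                phonemes.append(chunks[piece])
                pos = end
                break
        else:
            phonemes.append("?")  # no known unit covers this letter
            pos += 1
    return "".join(phonemes)
```

With more experience (a richer chunk inventory), fewer fixations are needed, so the same mechanism shades smoothly from strictly sequential to largely parallel reading.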

Moving beyond single word reading, the approach taken here is applicable, we believe, to a wide range of linguistic and cognitive domains—essentially, to all those with quasi-regular structure, in the sense that there is systematicity that coexists with some arbitrariness and many exceptions. The first domain to which the approach was applied was that of inflectional morphology (Rumelhart & McClelland, 1986). As stated in the Introduction, this application certainly remains controversial; Pinker and his colleagues (Marcus et al., 1992; Pinker, 1991; Pinker & Prince, 1988) continue to maintain that no single mechanism can fully capture the behavior of the regular inflectional process and the handling of exceptions. While we do not claim that the existing connectionist simulations have fully addressed all valid criticisms raised, at this point we see little in these criticisms that stands against the applicability of the connectionist approach in principle. Indeed, the arguments raised in these papers do not, in general, reflect a full appreciation of the capabilities of connectionist networks in quasi-regular domains. For example, Pinker (1991) does not acknowledge that connectionist models of both spelling-to-sound (as shown here and in SM89) and of inflectional morphology (Daugherty & Seidenberg, 1992) show the very frequency-by-regularity interaction that he takes as one of the key indicators of the operation of a (frequency insensitive) rule system and a (frequency sensitive) lexical lookup mechanism.

Indeed, there are several aspects of the empirical data in the domain of inflectional morphology that appear at this point to favor an interpretation in terms of a single, connectionist system that is sensitive to both frequency and consistency. We will consider here one such aspect, namely the historical evolution of the English past tense system. Hare and Elman (in press) have reviewed the pattern of change from the early Old English (EOE) period (circa 870) to the present. In EOE, there were two main types of verbs—strong and weak—each consisting of several subtypes. Over the period between 870 and the present, the different types of weak verbs coalesced into a single type: the current "regular" past. Many of the strong verbs "regularized," but several of them persist to this day as the various irregular verbs of modern English. The coalescence of the various types of weak verbs into a single type, the pattern of susceptibility to regularization among the strong verbs, and the occasional occurrence of "irregularization," in which a particular weak verb took on the characteristics of a cluster of strong verbs, are all traced to workings of a single connectionist system that is sensitive both to frequency and consistency. In Hare and Elman's approach, language change is cast as the iterative application of a new generation of learners (simulated by new, untrained networks) to the output of the previous generation of learners (simulated by old networks, trained on the output of even older networks). Each generation imposes its own distortions on the corpus: among these are the elimination of subtle differences between variations of the weak past that apply to similar forms, and the regularization of low-frequency irregular forms with few friends. Gradually, over the course of generations, the system is transformed from the highly complex system of circa 870 to the much simpler system that is in use today. The remaining irregular verbs are either highly consistent with their neighbors, highly frequent, or both; less frequent and less consistent strong verbs have been absorbed by the regular system. Crucially for our argument, both the "regular" (or weak) system and the "exception" (or strong) system show effects of frequency and consistency, as would be expected on a single-system account.
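The generational dynamic can be caricatured in a few lines. Here the "learner" is a crude frequency-and-friends filter standing in for a trained network, and the cutoffs are arbitrary; the point is only the iterative structure, in which each generation re-acquires the system from the previous generation's output:

```python
# Iterative-generation sketch: irregular pasts that are low in frequency
# and have few similar "friends" fail to be acquired and come out
# regularized; well-supported irregulars survive. (Historically, "help"
# really did have the strong past "holp" before regularizing.)

def next_generation(lexicon, freq_cutoff=5, min_friends=2):
    """One generation of learners re-acquires the past-tense system."""
    new = {}
    for verb, (past, freq, friends) in lexicon.items():
        irregular = past != verb + "ed"
        if irregular and freq < freq_cutoff and friends < min_friends:
            past = verb + "ed"  # weakly supported irregular regularizes
        new[verb] = (past, freq, friends)
    return new

lexicon = {
    "keep": ("kept", 9, 3),  # frequent and consistent: survives
    "weep": ("wept", 1, 3),  # rare but has friends (keep, sleep): survives
    "help": ("holp", 1, 0),  # rare and isolated: regularizes
}
for _ in range(3):  # several generations of learners
    lexicon = next_generation(lexicon)
```

Frequency and consistency thus jointly determine which irregulars persist, the signature of a single system sensitive to both factors.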

Derivational morphology presents another rich quasi-regular domain to which our approach would apply. First of all, there are many morphemes that are partially productive in ways that are similar to quasi-regular correspondences in inflectional morphology and spelling-to-sound: that is, they appear to be governed by a set of "soft" constraints. Second, the meaning of a morphologically complex word is related to, but not completely determined by, its constituent morphemes; thus, there is partial, but not complete, regularity in the mapping from meaning to sound (see Bybee, 1985, for a discussion of these points).

Graded influences of frequency and consistency appear to operate not just at the level of individual words but also at the level of sentences, as evidenced by recent findings of lexical, semantic, and contextual effects in syntactic ambiguity resolution (see, e.g., MacDonald, 1994; Taraban & McClelland, 1988; Trueswell, Tanenhaus, & Garnsey, 1994). For example, consider the temporary main verb/reduced relative ambiguity associated with the word EXAMINED in the sentence THE EVIDENCE EXAMINED BY THE LAWYER WAS USELESS (Ferreira & Clifton, 1986). The degree to which subjects are slowed in sentence comprehension when encountering such ambiguities is subject to a number of influences, including a previous disambiguating context (Trueswell et al., 1994), the semantic plausibility of the head noun in the main-verb reading (cf. EVIDENCE vs. an animate noun like WITNESS), and the relative frequency with which the verb is used as a simple past tense (e.g., THE PERSON EXAMINED THE OBJECT) as opposed to a passivized past participle (e.g., THE OBJECT WAS EXAMINED BY THE PERSON; MacDonald, 1994). Verbs that are consistently used in the simple past tense lead to much stronger garden path effects when a reduced relative interpretation is required than do verbs that are more ambiguous in their usage. These effects have a natural interpretation in terms of a constraint-satisfaction process in which a variety of sources of lexical knowledge conspire to produce a coherent sentence interpretation, including graded influences whose strength depends on the consistency of a word-form's usage (see Juliano & Tanenhaus, in press; MacDonald, Pearlmutter, & Seidenberg, 1994, for discussion, and Kawamoto, 1993; Pearlmutter, Daugherty, MacDonald, & Seidenberg, 1994; St. John & McClelland, 1990, for connectionist simulations illustrating some of these principles).
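The flavor of such a constraint-satisfaction process can be conveyed with a toy cue-combination sketch. The cue names, values, and weights below are illustrative inventions, not fitted to any data:

```python
# Graded cues (how often the verb appears as a passive participle, whether
# the head noun is inanimate, prior disambiguating context) jointly vote
# on the reduced-relative vs. main-verb reading of an ambiguous verb.

def reduced_relative_support(cues, weights):
    """Weighted average of cue values in [0, 1]; higher values favor the
    reduced-relative reading."""
    score = sum(weights[name] * value for name, value in cues.items())
    return score / sum(weights.values())

weights = {"passive_usage": 2.0, "inanimate_head": 1.5, "context": 1.0}

# "THE EVIDENCE EXAMINED ...": inanimate head noun adds support
evidence_cues = {"passive_usage": 0.7, "inanimate_head": 1.0, "context": 0.5}
# "THE WITNESS EXAMINED ...": animate head noun favors the main-verb reading
witness_cues = {"passive_usage": 0.7, "inanimate_head": 0.0, "context": 0.5}
```

A garden path arises when the cues strongly favor one reading and the later input demands the other; a verb with more ambiguous usage leaves the competition closer and the disruption smaller.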

Even more generally, the domains encompassed by semantic, episodic, and encyclopedic knowledge are all quasi-regular, in that facts and experiences are partially arbitrary, but also partially predictable from the characteristics of other, related facts and experiences (see McClelland, McNaughton, & O'Reilly, in press, for discussion). Consider the robin, for example. Its properties are largely predictable from the properties of other birds, but its color and exact size, the sound that it makes, the color of its eggs, and so on, are relatively arbitrary. Rumelhart (1990; Rumelhart & Todd, 1993) shows how a connectionist network can learn the contents of a semantic network, capturing the shared structure that is present in the set of concepts—so as to allow generalization to new examples—while at the same time mastering the idiosyncratic properties of particular examples. As another example, consider John F. Kennedy's assassination. There were several arbitrary aspects, such as the date and time of the event. But our understanding of what happened depends on knowledge derived from other events involving presidents, motorcades, rifles, spies, and the like. Our understanding of these things informs, indeed pervades, our memory of Kennedy's assassination. And our understanding of other similar events is ultimately influenced by what we learn about Kennedy's assassination. St. John (1992) provides an example of a connectionist network that learns the characteristics of events and applies them to other, similar events, using just the same learning mechanism, governed by the same principles of combined frequency and consistency sensitivity, as our spelling-to-sound simulations.

In summary, quasi-regular systems like that found in the English spelling-to-sound system appear to be pervasive, and there are several initial indications that connectionist networks sensitive to frequency and consistency will provide insight into the way such systems are learned and represented.

Conclusions

At the end of their paper, Coltheart et al. (1993) reach a conclusion that seems to them "inescapable."

Our ability to deal with linguistic stimuli we have not previously encountered . . . can only be explained by postulating that we have learned systems of general linguistic rules, and our ability at the same time to deal correctly with exceptions to these rules . . . can only be explained by postulating the existence of systems of word-specific lexical representations. (p. 606)

We have formulated a connectionist approach to knowledge and processing in quasi-regular domains, instantiated it in the specific domain of English word reading, and demonstrated that it can account for the basic abilities of skilled readers to handle correctly both regular and exception items while still generalizing well to novel items. Within the approach, the proficiency of humans in quasi-regular domains stems not from the existence of separate rule-based and item-specific mechanisms, but from the fact that the cognitive system adheres to certain general principles of computation in neural-like systems.

Our connectionist approach not only addresses these general reading abilities, but also provides insight into the detailed effects of frequency and consistency, both in the naming latency of normal readers and in the impaired naming accuracy of acquired and developmental dyslexic readers. A mathematical analysis of a simplified system, incorporating only some of the relevant principles, forms the basis for understanding the intimate relationship between these factors and, in particular, the inherently graded nature of spelling-sound consistency.

The more general lexical framework for word reading on which the current work is based contains a semantic pathway in addition to a phonological pathway. In contrast to the lexical and sublexical procedures in dual-route theories, which operate in fundamentally different ways, the two pathways in the current approach operate according to a common set of computational principles. As a result, the nature of processing in the two pathways is intimately related. In particular, a consideration of the pattern of impaired and preserved abilities in acquired surface dyslexia leads to a view in which there is a partial division of labor between the two pathways. The contribution of the phonological pathway is a graded function of frequency and consistency; items weak on both measures are processed particularly poorly. Overt accuracy on these items is not compromised, however, because the semantic pathway also contributes to the pronunciation of words (but not nonwords). The relative capabilities of the two pathways are open to individual differences, and these differences may become manifest in the pattern and severity of reading impairments following brain damage.

Needless to say, much remains to be done. The current simulations have specific limitations, such as the restriction to uninflected monosyllables and the lack of attention paid to the development of orthographic representations, that need to be remedied in future work. Furthermore, the nature of processing within the semantic pathway has been characterized only in the coarsest way. Finally, a wide range of related empirical issues, including phonological dyslexia, developmental dyslexia, lexical decision, and pseudohomophone and blocking effects, have been addressed only in very general terms. Nonetheless, the results reported here, along with those of others taking similar approaches, clearly suggest that the computational principles of connectionist modeling can lead to a deeper understanding of the central empirical phenomena in word reading in particular, and in quasi-regular domains more generally.


Appendix 1: Stimuli Used in Simulation Studies

Regular Consistent   Regular Inconsistent   Ambiguous   Exception   Nonword

High Frequency

BEST BASE BROWN ARE LARE

BIG BONE CLEAR BOTH FOTH

CAME BUT DEAD BREAK DEAK

CLASS CATCH DOWN CHOOSE BOOSE

DARK COOL FOUR COME POME

DID DAYS GONE DO MO

FACT DEAR GOOD DOES POES

GOT FIVE HEAD DONE RONE

GROUP FLAT HOW FOOT POOT

HIM FLEW KNOW GIVE MIVE

MAIN FORM KNOWN GREAT REAT

OUT GO LOVE HAVE MAVE

PAGE GOES LOW MOVE BOVE

PLACE GROW NEAR PULL RULL

SEE HERE NOW PUT SUT

SOON HOME ONE SAID HAID

STOP MEAT OUR SAYS TAYS

TELL PAID OWN SHALL NALL

WEEK PLANT SHOW WANT BANT

WHEN ROLL SHOWN WATCH NATCH

WHICH ROOT STOOD WERE LERE

WILL SAND TOWN WHAT DAT

WITH SMALL YEAR WORD TORD

WRITE SPEAK YOUR WORK BORK

Low Frequency

BEAM BROOD BLOWN BOWL NOWL

BROKE COOK BROW BROAD BOAD

BUS CORD CONE BUSH FUSH

DEED COVE CROWN DEAF MEAF

DOTS CRAMP DIVE DOLL FOLL

FADE DARE DREAD FLOOD BOOD

FLOAT FOWL FLOUR GROSS TROSS

GRAPE GULL GEAR LOSE MOSE

LUNCH HARM GLOVE PEAR LEAR

PEEL HOE GLOW PHASE DASE

PITCH LASH GOWN PINT PHINT

PUMP LEAF GROVE PLOW CLOW

RIPE LOSS HOOD ROUSE NOUSE

SANK MAD LONE SEW TEW

SLAM MOOSE PLEAD SHOE CHOE

SLIP MOTH POUR SPOOK STOOK

STUNT MOUSE PRONE SWAMP DRAMP

SWORE MUSH SHONE SWARM STARM

TRUNK PORK SPEAR TOUCH MOUCH

WAKE POSE STOVE WAD NAD

WAX POUCH STRIVE WAND MAND

WELD RAVE SWEAR WASH TASH

WING TINT THREAD WOOL BOOL

WIT TOAD ZONE WORM PORM

Note: The "Regular Consistent" words, "Regular Inconsistent" words, and "Exception" words are from Experiments 1 and 2 of Taraban and McClelland (1987). In those studies, the regular consistent words are the control words for the exception words. In addition, each regular inconsistent word shares a body with some exception word. The "Ambiguous" words contain bodies associated with two or more pronunciations, each of which occurs in many words. They were generated by Seidenberg and McClelland (1989) to be matched in frequency (Kucera & Francis, 1967) with the Taraban and McClelland high- and low-frequency regular consistent and exception words. The "Nonwords" were generated by altering the onsets of the exception words.


Appendix 2: Accepted Pronunciations of Glushko's (1979) Nonwords

Consistent Nonwords                 Inconsistent Nonwords

Nonword  Pronunciation(s)           Nonword  Pronunciation(s)
BEED     /bEd/                      BILD     /bIld/, /bild/
BELD     /beld/                     BINT     /bInt/, /bint/
BINK     /biNk/                     BLEAD    /blEd/, /bled/
BLEAM    /blEm/                     BOOD     /bUd/, /b{d/, /bud/
BORT     /bOrt/                     BOST     /bOst/, /b{st/, /bost/
BROBE    /brOb/                     BROVE    /brOv/, /brUv/, /br{v/
CATH     /k@T/, /kaT/               COSE     /kOs/, /kOz/, /kUz/
COBE     /kOb/                      COTH     /kOT/, /koT/
DOLD     /dOld/, /dald/             DERE     /dAr/, /dEr/, /dur/
DOON     /dUn/                      DOMB     /dOm/, /dUm/, /dam/, /damb/
DORE     /dOr/                      DOOT     /dUt/, /dut/
DREED    /drEd/                     DROOD    /drUd/, /dr{d/, /drud/
FEAL     /fEl/                      FEAD     /fEd/, /fed/
GODE     /gOd/                      GOME     /gOm/, /g{m/
GROOL    /grUl/, /grul/             GROOK    /grUk/, /gruk/
HEAN     /hEn/                      HAID     /h@d/, /hAd/, /hed/
HEEF     /hEf/                      HEAF     /hEf/, /hef/
HODE     /hOd/                      HEEN     /hEn/, /hin/
HOIL     /hYl/                      HOVE     /hOv/, /hUv/, /h{v/
LAIL     /lAl/                      LOME     /lOm/, /l{m/
LOLE     /lOl/                      LOOL     /lUl/, /lul/
MEAK     /mAk/, /mEk/               MEAR     /mAr/, /mEr/
MOOP     /mUp/                      MONE     /mOn/, /m{n/, /mon/
MUNE     /mUn/, /myUn/              MOOF     /mUf/, /muf/
NUST     /n{st/                     NUSH     /n{S/, /nuS/
PEET     /pEt/                      PILD     /pIld/, /pild/
PILT     /pilt/                     PLOVE    /plOv/, /plUv/, /pl{v/
PLORE    /plOr/                     POMB     /pOm/, /pUm/, /pam/, /pamb/
PODE     /pOd/                      POOT     /pUt/, /put/
POLD     /pOld/, /pald/             POVE     /pOv/, /pUv/, /p{v/
PRAIN    /prAn/                     PRAID    /pr@d/, /prAd/, /pred/
SHEED    /SEd/                      SHEAD    /SEd/, /Sed/
SOAD     /sOd/, /sod/               SOOD     /sUd/, /s{d/, /sud/
SPEET    /spEt/                     SOST     /sOst/, /s{st/, /sost/
STEET    /stEt/                     SPEAT    /spAt/, /spEt/, /spet/
SUFF     /s{f/                      STEAT    /stAt/, /stEt/, /stet/
SUST     /s{st/                     SULL     /s{l/, /sul/
SWEAL    /swEl/                     SWEAK    /swAk/, /swEk/
TAZE     /tAz/                      TAVE     /t@v/, /tAv/, /tav/
WEAT     /wAt/, /wEt/, /wet/        WEAD     /wEd/, /wed/
WOSH     /waS/                      WONE     /wOn/, /w{n/, /won/
WOTE     /wOt/                      WULL     /w{l/, /wul/
WUFF     /w{f/                      WUSH     /w{S/, /wuS/

Note: /a/ in POT, /@/ in CAT, /e/ in BED, /i/ in HIT, /o/ in DOG, /u/ in GOOD, /A/ in MAKE, /E/ in KEEP, /I/ in BIKE, /O/ in HOPE, /U/ in BOOT, /W/ in NOW, /Y/ in BOY, /{/ in CUP, /N/ in RING, /S/ in SHE, /C/ in CHIN, /Z/ in BEIGE, /T/ in THIN, /D/ in THIS. All other phonemes are represented in the conventional way (e.g., /b/ in BAT).


Appendix 3: Regularizations of Taraban and McClelland's (1987) Exception Words

High-Frequency Exceptions                Low-Frequency Exceptions

Word     Correct   Regularization(s)     Word     Correct   Regularization(s)
ARE      /ar/      /Ar/                  BOWL     /bOl/     /bWl/
BOTH     /bOT/     /boT/                 BROAD    /brod/    /brOd/
BREAK    /brAk/    /brEk/                BUSH     /buS/     /b{S/
CHOOSE   /CUz/     /CUs/                 DEAF     /def/     /dEf/
COME     /k{m/     /kOm/                 DOLL     /dal/     /dOl/
DO       /dU/      /dO/, /da/            FLOOD    /fl{d/    /flUd/, /flud/
DOES     /d{z/     /dOz/, /dOs/          GROSS    /grOs/    /gros/, /gras/
DONE     /d{n/     /dOn/                 LOSE     /lUz/     /lOs/, /lOz/
FOOT     /fut/     /fUt/                 PEAR     /pAr/     /pEr/
GIVE     /giv/     /gIv/                 PHASE    /fAz/     /fAs/
GREAT    /grAt/    /grEt/                PINT     /pInt/    /pint/
HAVE     /hav/     /hAv/                 PLOW     /plW/     /plO/
MOVE     /mUv/     /mOv/                 ROUSE    /rWz/     /rWs/
PULL     /pul/     /p{l/                 SEW      /sO/      /sU/
PUT      /put/     /p{t/                 SHOE     /SU/      /SO/
SAID     /sed/     /sAd/                 SPOOK    /spUk/    /spuk/
SAYS     /sez/     /sAz/, /sAs/          SWAMP    /swamp/   /sw@mp/
SHALL    /Sal/     /Sol/                 SWARM    /swOrm/   /swarm/
WANT     /want/    /w@nt/                TOUCH    /t{C/     /tWC/
WATCH    /waC/     /w@C/                 WAD      /wad/     /w@d/
WERE     /wur/     /wEr/                 WAND     /wand/    /w@nd/
WHAT     /w{t/     /w@t/                 WASH     /woS/     /w@S/
WORD     /wurd/    /wOrd/                WOOL     /wul/     /wUl/
WORK     /wurk/    /wOrk/                WORM     /wurm/    /wOrm/

Note: /a/ in POT, /@/ in CAT, /e/ in BED, /i/ in HIT, /o/ in DOG, /u/ in GOOD, /A/ in MAKE, /E/ in KEEP, /I/ in BIKE, /O/ in HOPE, /U/ in BOOT, /W/ in NOW, /Y/ in BOY, /{/ in CUP, /N/ in RING, /S/ in SHE, /C/ in CHIN, /Z/ in BEIGE, /T/ in THIN, /D/ in THIS. All other phonemes are represented in the conventional way (e.g., /b/ in BAT).


ReferencesAnderson,J.A., Silverstein,J.W., Ritz,S.A., & Jones,R. S. (1977).

Distinctive features,categoricalperception,andprobabilitylearn-ing: Someapplicationsof aneuralmodel.Psychological Review,84, 413–451.

Andrews, S. (1982). Phonologicalrecoding: Is the regularity effectconsistent?Memory and Cognition, 10, 565–575.

Backman,J.,Bruck,M., Hebert,M., & Seidenberg,M. S.(1984).Ac-quisitionanduseof spelling-soundinformationin reading.Journalof Experimental Child Psychology, 38, 114–133.

Baddeley, A. D., Ellis, N. C., Miles, T. C., & Lewis, V. J. (1982).Developmentalandacquireddyslexia: A comparison.Cognition,11, 185–199.

Balota,D., & Ferraro,R. (1993). A dissociationof frequency andregularityeffectsin pronunciationperformanceacrossyongadults,olderadults,andindividualswith seniledementiaof theAlzheimertype. Journal of Memory and Language,32, 573–592.

Baluch,B., & Besner, D. (1991).Visualwordrecognition:Evidencefor strategic controlof lexical andnonlexical routinesin oral read-ing. Journal of Experimental Psychology: Learning, Memory, andCognition, 17(4), 644–652.

Baron,J., & Strawson,C. (1976). Use of orthographicand word-specificknowledgein readingwordsaloud.Journal of Experimen-tal Psychology: Human Perception and Performance,4, 207–214.

Beauvois,M.-F., & Derouesne,J.(1979).Phonologicalalexia: Threedissociations.Journal of Neurology, Neurosurgery, and Psychia-try, 42, 1115–1124.

Behrmann,M., & Bub,D. (1992). Surfacedyslexia anddysgraphia:Dual routes,a single lexicon. Cognitive Neuropsychology, 9(3),209–258.

Besner, D., & Smith,M. C. (1992). Modelsof visualword recogni-tion: Whenobscuringthestimulusyieldsa clearerview. Journalof Experimental Psychology: Learning, Memory, and Cognition,18(3),468–482.

Besner, D., Twilley, L., McCann,R. S.,& Seergobin,K. (1990).Ontheconnectionbetweenconnectionismanddata:Are a few wordsnecessary?Psychological Review, 97(3), 432–446.

Brousse,O., & Smolensky, P. (1989).Virtual memoriesandmassivegeneralizationin connectionistcombinatoriallearning. Proceed-ings of the 11th Annual Conference of the Cognitive Science Society(pp.380–387).Hillsdale,NJ:LawrenceErlbaumAssociates.

Bub, D., Cancelliere,A., & Kertesz,A. (1985). Whole-wordandanalytictranslationof spelling-to-soundin anon-semanticreader.In K. Patterson,M. Coltheart,& J. C. Marshall (Eds.),Surfacedyslexia (pp.15–34).Hillsdale,NJ:LawrenceErlbaumAssociates.

Bub, D. N., Black, S., Hampson,E., & Kertesz,A. (1988). Se-manticencodingof picturesandwords:Someneuropsychologicalobservations.Cognitive Neuropsychology, 5(1), 27–66.

Buchanan,L., & Besner, D. (1993).Readingaloud:Evidencefor theuseof a wholeword nonsemanticpathway. Canadian Journal ofExperimental Psychology, 47(2), 133–152.

Bullinaria,J.A. (1995).Representation, learning, generalization anddamage in neural network models of reading aloud. Manuscriptsubmittedfor publication.

Butler, B., & Hains,S.(1979). Individualdifferencesin word recog-nition latency. Memory and Cognition, 7(2), 68–76.

Bybee,J. L. (1985). Morphology: A study of the relation betweenmeaning and form. Philadelphia,PA: JohnBenjamins.

Bybee, J. L., & Slobin, D. L. (1982). Rulesand schemasin thedevelopmentand useof the Englishpast tense. Language, 58,265–289.

Castles,A., & Coltheart,M. (1993). Varietiesof developmentaldyslexia. Cognition, 47, 149–180.

Cipolotti, L., & Warrington,E. K. (1995). Semanticmemoryandreadingabilities: A casereport.Journal of the International Neu-ropsychological Society, 1, 104–110.

Coltheart,M. (1978). Lexical accessin simplereadingtasks. In G.Underwood(Ed.),Strategies of information processing.New York:AcademicPress.

Coltheart,M. (1981).Disordersof readingandtheir implicationsformodelsof normalreading.Visible Language, 15, 245–286.

Coltheart,M. (1985). Cognitive neuropsychologyandthe studyofreading. In M. I. Posner, & O. S. M. Marin (Eds.), Attentionand performance XI (pp.3–37).Hillsdale,NJ:LawrenceErlbaumAssociates.

Coltheart,M., Curtis, B., Atkins, P., & Haller, M. (1993). Modelsof readingaloud: Dual-routeandparallel-distributed-processingapproaches.Psychological Review, 100(4),589–608.

Coltheart,M., Patterson,K., & Marshall,J. C. (Eds.).(1980). Deepdyslexia. London:Routledge& KeganPaul.

Coltheart,M., & Rastle,K. (1994). Serial processingin readingaloud: Evidencefor dual-routemodelsof reading. Journal ofExperimental Psychology: Human Perception and Performance,20(6), 1197–1211.

Coslett,H. B. (1991).Readbut notwrite “idea”: Evidencefor athirdreadingmechanism.Brain and Language, 40, 425–443.

Cottrell,G.W., & Plunkett,K. (1991).Learningthepasttensein are-currentnetwork:Acquiringthemappingfrom meaningto sounds.Proceedings of the 13th Annual Conference of the Cognitive Sci-ence Society (pp.328–333).Hillsdale,NJ:LawrenceErlbaumAs-sociates.

Daugherty, K., & Seidenberg, M. S. (1992). Rulesor connections?Thepasttenserevisited. Proceedings of the 14th Annual Confer-ence of the Cognitive Science Society (pp.259–264).Hillsdale,NJ:LawrenceErlbaumAssociates.

Derthick,M. (1990). Mundanereasoningby settlingon a plausiblemodel.Artificial Intelligence, 46, 107–157.

Diesfeldt,H. F. A. (1992).Impairedandpreservedsemanticmemoryfunctionsin dementia.In L. Backman(Ed.),Memory functioningin dementia. Amsterdam:Elsevier SciencePublishers.

Ellis, A. W., & Marshall, J. C. (1978). Semanticerrorsor statis-tical flukes? A noteon Allport’s “On knowing the meaningsofwordsweareunableto report”.Quarterly Journal of ExperimentalPsychology, 30, 569–575.

Fera,P., & Besner, D. (1992).Theprocessof lexical decision:Morewordsaboutaparalleldistributedprocessingmodel.Journal of Ex-perimental Psychology: Learning, Memory, and Cognition, 18(4),749–764.

Ferreira,F., & Clifton, C. (1986). The independenceof syntacticprocessing.Journal of Memory and Language,25, 348–368.

Fodor, J.A., & Pylyshyn,Z. W. (1988).Connectionismandcognitivearchitecture:A critical analysis.Cognition, 28, 3–71.

Forster, K. I. (1994). Computationalmodelingandelementarypro-cessanalysisin visualwordrecognition.Journal of ExperimentalPsychology: Human Perception and Performance, 20(6),1292.

Forster, K. I., & Chambers,S.(1973).Lexicalaccessandnamingtime.Journal of Verbal Learning and Verbal Behaviour, 12, 627–635.

UnderstandingNormalandImpairedWordReading 66

Frederiksen,J. R., & Kroll, J. F. (1976). Spellingandsound: Ap-proachesto theinternallexicon. Journal of Experimental Psychol-ogy: Human Perception and Performance,2, 361–379.

Friedman,R.B. (in press).Recoveryfromdeepalexiatophonologicalalexia. Brain and Language.

Funnell,E. (1983). Phonologicalprocessingin reading: New evi-dencefrom acquireddyslexia. British Journal of Psychology, 74,159–180.

Glosser, G., & Friedman, R. B. (1990). The continuum ofdeep/phonological alexia. Cortex, 26, 343–359.

Gluck,M. A., & Bower, G.H. (1988).Evaluatinganadaptivenetworkmodelof humanlearning.Journal of Memory and Language, 27,166–195.

Glushko,R.J.(1979).Theorganizationandactivationof orthographicknowledgein readingaloud.Journal of Experimental Psychology:Human Perception and Performance, 5(4), 674–691.

Graham,K. S.,Hodges,J. R., & Patterson,K. (1994). Therelation-shipbetweencomprehensionandoralreadingin progressivefluentaphasia.Neuropsychologia,32(3), 299–316.

Hanson,S.J.,& Burr, D. J.(1990).Whatconnectionistmodelslearn:Learningandrepresentationin connectionistnetworks.Behavioraland Brain Sciences, 13, 471–518.

Hare,M., & Elman,J.L. (1992).A connectionistaccountof Englishinflectionalmorphology: Evidencefrom languagechange.Pro-ceedings of the 14th Annual Conference of the Cognitive ScienceSociety (pp. 265–270).Hillsdale, NJ: LawrenceErlbaumAsso-ciates.

Hare, M., & Elman, J. L. (in press). Learningand morphologicalchange.Cognition.

Harris,M., & Coltheart,M. (1986).Language processing in childrenand adults. London:Routledge& KeganPaul.

Henderson,L. (1982).Orthography and word recognition in reading.London:AcademicPress.

Hinton, G. E. (1981). Implementingsemanticnetworksin parallelhardware.In G.E.Hinton,& J.A. Anderson(Eds.),Parallel mod-els of associative memory (pp.161–188).Hillsdale,NJ:LawrenceErlbaumAssociates.

Hinton, G. E. (1989). Connectionistlearningprocedures.ArtificialIntelligence, 40, 185–234.

Hinton, G. E. (1990). Mappingpart-wholehierarchiesinto connec-tionistnetworks.Artificial Intelligence, 46(1),47–76.

Hinton,G. E. (Ed.).(1991).Connectionist symbol processing. Cam-bridge,MA: MIT Press.

Hinton, G. E., McClelland,J. L., & Rumelhart,D. E. (1986). Dis-tributed representations.In D. E. Rumelhart,J. L. McClelland,& the PDP researchgroup (Eds.), Parallel distributed process-ing: Explorations in the microstructure of cognition. Volume 1:Foundations (pp.77–109).Cambridge,MA: MIT Press.

Hinton, G. E., & Sejnowski, T. J. (1986). Learningandrelearningin BoltzmannMachines. In D. E. Rumelhart,J. L. McClelland,& the PDP researchgroup (Eds.), Parallel distributed process-ing: Explorations in the microstructure of cognition. Volume 1:Foundations (pp.282–317).Cambridge,MA: MIT Press.

Hinton,G. E.,& Shallice,T. (1991).Lesioninganattractornetwork:Investigationsof acquireddyslexia. Psychological Review, 98(1),74–95.

Hoeffner, J. (1992). Are rulesa thing of the past? The acquisitionof verbal morphologyby an attractornetwork. Proceedings ofthe 14th Annual Conference of the Cognitive Science Society (pp.861–866).Hillsdale,NJ:LawrenceErlbaumAssociates.

Hoeffner, J. H., & McClelland,J. L. (1993). Cana perceptualpro-cessingdeficit explain theimpairmentof inflectionalmorphologyin developmentaldysphasia?A computationalinvestigation.Pro-ceedings of the 25th Annual Child Language Research Forum (pp.38–49).Stanford,CA.

Humphreys, G. W., & Evett, L. J. (1985). Are thereindependentlexical andnonlexical routesin word processing?An evaluationof thedual-routetheoryof reading.Behavioral and Brain Sciences,8, 689–740.

Huttenlocher, P., & Huttenlocher, J.(1973).A studyof childrenwithhyperlexia. Neurology, 26, 1107–1116.

Jacobs,R.A. (1988).Increasedratesof convergencethroughlearningrateadaptation.Neural Networks, 1, 295–307.

Jared,D., McRae,K., & Seidenberg, M. S. (1990). The basisofconsistency effectsin wordnaming.Journal of Memory and Lan-guage, 29, 687–715.

Jared,D., & Seidenberg,M. S.(1990).Namingmultisyllabicwords.Journal of Experimental Psychology: Human Perception and Per-formance, 16(1),92–105.

Jones,G. V. (1985). Deepdyslexia, imageability, andeaseof predi-cation.Brain and Language, 24, 1–19.

Jordan,M. I., & Rumelhart,D. E. (1992). Forward models: Su-pervisedlearningwith a distal teacher. Cognitive Science, 16(3),307–354.

Juliano,C., & Tanenhaus,M. K. (in press).A constraint-basedlexi-calistaccountof thesubject/objectattachmentpreference.Journalof Psycholinguistic Research.

Kawamoto,A. H. (1993).Nonlineardynamicsin theresolutionof lex-ical ambiguity: A paralleldistributedprocessingapproach.Jour-nal of Memory and Language,32, 474–516.

Kawamoto,A. H., Kello, C., & Jones,R. (1994,November).Locusof theexceptioneffect in naming.Proceedings of the 35th AnnualMeeting of the Psychonomic Society (p. 51).St.Louis,MO.

Kawamoto,A. H.,Kello,C.,& Jones,R.(1995).Temporal and spatialloci of the exception and consistency effects in naming. Manuscriptsubmittedfor publication.

Kucera, H., & Francis,W. N. (1967). Computational analysis ofpresent-day American English. Providence,RI: Brown UniversityPress.

Kullback,S.,& Leibler, R.A. (1951).Oninformationandsufficiency.Annals of Mathmatical Statistics, 22, 79–86.

Lachter, J., & Bever, T. G. (1988). The relationbetweenlinguisticstructureandtheoriesof languagelearning:A constructivecritiqueof someconnectionistlearningmodels.Cognition, 28, 195–247.

Lacouture,Y. (1989). From meansquarederror to reactiontime:A connectionistmodelof word recognition. In D. S. Touretzky,G. E. Hinton,& T. J.Sejnowski (Eds.),Proceeedings of the 1988connectionist models summer school (pp. 371–378).SanMateo,CA: MorganKauffman.

Lass, R. (1984). Phonology: An introduction to basic concepts.Cambridge,U.K.: CambridgeUniversityPress.

Liberman,I. Y., & Shankweiler, D. (1985).Phonologyandtheprob-lemsof learningto readandwrite. Remedial and Special Educa-tion, 6, 8–17.

MacDonald,M. C. (1994). Probabilisticconstraintsand syntacticambiguityresoluation.Language and Cognitive Processes,9,157–201.

MacDonald,M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994).Thelexicalnatureof syntacticambiguityresolution.PsychologicalReview, 101(4),676–703.

UnderstandingNormalandImpairedWordReading 67

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–153.

Maddieson, I. (1984). Patterns of sounds. Cambridge, U.K.: Cambridge University Press.

Manis, F. R., Seidenberg, M. S., Doi, L. M., McBride-Chang, C., & Peterson, A. (in press). On the bases of two subtypes of developmental dyslexia. Cognition.

Marcel, T. (1980). Surface dyslexia and beginning reading: A revised hypothesis of the pronunciation of print and its impairments. In M. Coltheart, K. Patterson, & J. C. Marshall (Eds.), Deep dyslexia (pp. 227–258). London: Routledge & Kegan Paul.

Marchman, V. A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5(2), 215–234.

Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, J. T., & Xu, F. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57(4), 1–165.

Marr, D. (1969). A theory of cerebellar cortex. Journal of Physiology, 202, 437–470.

Marshall, J. C. (1984). Toward a rational taxonomy of the developmental dyslexias. In R. N. Malatesha, & H. A. Whitaker (Eds.), Dyslexia: A global issue (pp. 211–232). The Hague: Martinus Nijhoff.

Marshall, J. C., & Newcombe, F. (1966). Syntactic and semantic errors in paralexia. Neuropsychologia, 4, 169–176.

Marshall, J. C., & Newcombe, F. (1973). Patterns of paralexia: A psycholinguistic approach. Journal of Psycholinguistic Research, 2, 175–199.

Massaro, D. W. (1988). Some criticisms of connectionist models of human performance. Journal of Memory and Language, 27, 213–234.

Massaro, D. W. (1989). Testing between the TRACE model and the fuzzy logical model of speech perception. Cognitive Psychology, 21, 398–421.

Masterson, J. (1985). On how we read non-words: Data from different populations. In K. Patterson, M. Coltheart, & J. C. Marshall (Eds.), Surface dyslexia (pp. 289–299). Hillsdale, NJ: Lawrence Erlbaum Associates.

McCann, R. S., & Besner, D. (1987). Reading pseudohomophones: Implications for models of pronunciation and the locus of the word-frequency effects in word naming. Journal of Experimental Psychology: Human Perception and Performance, 13, 14–24.

McCarthy, R., & Warrington, E. K. (1986). Phonological reading: Phenomena and paradoxes. Cortex, 22, 359–380.

McClelland, J. L. (1987). The case for interactionism in language processing. In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 3–36). Hillsdale, NJ: Lawrence Erlbaum Associates.

McClelland, J. L. (1991). Stochastic interactive processes and the effect of context on perception. Cognitive Psychology, 23, 1–44.

McClelland, J. L. (1993). The GRAIN model: A framework for modeling the dynamics of information processing. In D. E. Meyer, & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience (pp. 655–688). Hillsdale, NJ: Lawrence Erlbaum Associates.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (in press). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review.

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88(5), 375–407.

McClelland, J. L., Rumelhart, D. E., & the PDP research group (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: MIT Press.

McClelland, J. L., St. John, M., & Taraban, R. (1989). Sentence comprehension: A parallel distributed processing approach. Language and Cognitive Processes, 4, 287–335.

McCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive science. Psychological Science, 2(6), 387–395.

Mehegan, C. C., & Dreifuss, F. E. (1972). Hyperlexia: Exceptional reading in brain-damaged children. Neurology, 22, 1105–1111.

Metsala, J. L., & Siegel, L. S. (1992). Patterns of atypical reading development: Attributes and underlying reading processes. In S. J. Segalowitz, & I. Rapin (Eds.), Handbook of neuropsychology, Vol. 7. Amsterdam: Elsevier Science Publishers.

Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. G. (1974). Functions of graphemic and phonemic codes in visual word recognition. Memory and Cognition, 2, 309–321.

Miceli, G., & Caramazza, A. (1993). The assignment of word stress in oral reading: Evidence from a case of acquired dyslexia. Cognitive Neuropsychology, 10(3), 273–296.

Minsky, M. (1975). A framework for representing knowledge. In P. H. Winston (Ed.), The psychology of computer vision (pp. 211–277). New York: McGraw-Hill.

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. Cambridge, MA: MIT Press.

Monsell, S. (1991). The nature and locus of word frequency effects in reading. In D. Besner, & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 148–197). Hillsdale, NJ: Lawrence Erlbaum Associates.

Monsell, S., Patterson, K., Graham, A., Hughes, C. H., & Milroy, R. (1992). Lexical and sublexical translation of spelling to sound: Strategic anticipation of lexical status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(3), 452–467.

Morais, J., Bertelson, P., Cary, L., & Alegria, J. (1986). Literacy training and speech segmentation. Cognition, 24, 45–64.

Morais, J., Cary, L., Alegria, J., & Bertelson, P. (1979). Does awareness of speech as a sequence of phones arise spontaneously? Cognition, 7, 323–331.

Morton, J. (1969). The interaction of information in word recognition. Psychological Review, 76, 165–178.

Morton, J., & Patterson, K. (1980). A new attempt at an interpretation, Or, an attempt at a new interpretation. In M. Coltheart, K. Patterson, & J. C. Marshall (Eds.), Deep dyslexia (pp. 91–118). London: Routledge & Kegan Paul.

Movellan, J. R., & McClelland, J. L. (1993). Learning continuous probability distributions with symmetric diffusion networks. Cognitive Science, 17(4), 463–496.

Mozer, M. C. (1990). Discovering faithful "Wickelfeature" representations in a connectionist network. Proceedings of the 12th Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum Associates.

Mozer, M. C. (1991). The perception of multiple objects: A connectionist approach. Cambridge, MA: MIT Press.

Norris, D. (1994). A quantitative multiple-levels model of reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 20(6), 1212–1232.

Olsen, A., & Caramazza, A. (1991). The role of cognitive theory in neuropsychological research. In S. Corkin, J. Grafman, & F. Boller (Eds.), Handbook of neuropsychology (pp. 287–309). Amsterdam: Elsevier.

Paap, K. R., & Noel, R. W. (1991). Dual route models of print to sound: Still a good horse race. Psychological Research, 53, 13–24.

Parkin, A. J. (1982). Phonological recoding in lexical decision: Effects of spelling-to-sound regularity depend on how regularity is defined. Memory and Cognition, 10, 43–53.

Patterson, K. (1990). Alexia and neural nets. Japanese Journal of Neuropsychology, 6, 90–99.

Patterson, K., Coltheart, M., & Marshall, J. C. (Eds.). (1985). Surface dyslexia. Hillsdale, NJ: Lawrence Erlbaum Associates.

Patterson, K., Graham, N., & Hodges, J. R. (1994). Reading in Alzheimer's type dementia: A preserved ability? Neuropsychology, 8(3), 395–412.

Patterson, K., & Hodges, J. R. (1992). Deterioration of word meaning: Implications for reading. Neuropsychologia, 30(12), 1025–1040.

Patterson, K., & Marcel, A. J. (1992). Phonological ALEXIA or PHONOLOGICAL alexia? In J. Alegria, D. Holender, J. Junca de Morais, & M. Radeau (Eds.), Analytic approaches to human cognition (pp. 259–274). New York: Elsevier.

Patterson, K., & Morton, J. (1985). From orthography to phonology: An attempt at an old interpretation. In K. Patterson, M. Coltheart, & J. C. Marshall (Eds.), Surface dyslexia (pp. 335–359). Hillsdale, NJ: Lawrence Erlbaum Associates.

Patterson, K., Seidenberg, M. S., & McClelland, J. L. (1989). Connections and disconnections: Acquired dyslexia in a computational model of reading processes. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neuroscience (pp. 131–181). London: Oxford University Press.

Patterson, K., Suzuki, T., Wydell, T., & Sasanuma, S. (in press). Progressive aphasia and surface alexia in Japanese. Neurocase.

Pearlmutter, B. A. (1989). Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2), 263–269.

Pearlmutter, N. J., Daugherty, K. G., MacDonald, M. C., & Seidenberg, M. S. (1994). Modeling the use of frequency and contextual biases in sentence processing. Proceedings of the 16th Annual Conference of the Cognitive Science Society (pp. 699–704). Hillsdale, NJ: Lawrence Erlbaum Associates.

Phillips, W. A., Hay, I. M., & Smith, L. S. (1993). Lexicality and pronunciation in a simulated neural net (Technical Report CCCN-14). Stirling, Scotland: Centre for Cognitive and Computational Neuroscience, University of Stirling.

Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.

Plaut, D. C. (in press). Double dissociation without modularity: Evidence from connectionist neuropsychology. Journal of Clinical and Experimental Neuropsychology.

Plaut, D. C., Behrmann, M., Patterson, K., & McClelland, J. L. (1993, November). Impaired oral reading in surface dyslexia: Detailed comparison of a patient and a connectionist network [Abstract]. Proceedings of the 34th Annual Meeting of the Psychonomic Society (p. 48). Washington, DC.

Plaut, D. C., & Hinton, G. E. (1987). Learning sets of filters using back-propagation. Computer Speech and Language, 2, 35–61.

Plaut, D. C., & McClelland, J. L. (1993). Generalization with componential attractors: Word and nonword reading in an attractor network. Proceedings of the 15th Annual Conference of the Cognitive Science Society (pp. 824–829). Hillsdale, NJ: Lawrence Erlbaum Associates.

Plaut, D. C., McClelland, J. L., & Seidenberg, M. S. (in press). Reading exception words and pseudowords: Are two routes really necessary? In J. A. Bullinaria (Ed.), Proceedings of the 2nd neural computation and psychology workshop. Edinburgh, Scotland.

Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10(5), 377–500.

Plunkett, K., & Marchman, V. A. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43–102.

Plunkett, K., & Marchman, V. A. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48(1), 21–69.

Pugh, K. R., Rexer, K., & Katz, L. (1994). Evidence of flexible coding in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 20(4), 807–825.

Quinlan, P. (1991). Connectionism and psychology: A psychological perspective on new connectionist research. Chicago: University of Chicago Press.

Rack, J. P., Snowling, M. J., & Olson, R. K. (1992). The nonword reading deficit in developmental dyslexia: A review. Reading Research Quarterly, 27, 29–53.

Reggia, J. A., Marsland, P. M., & Berndt, R. S. (1988). Competitive dynamics in a dual-route connectionist model of print-to-sound transformation. Complex Systems, 2, 509–547.

Richardson, J. T. E. (1976). The effects of stimulus attributes upon latency of word recognition. British Journal of Psychology, 67, 315–325.

Rumelhart, D. E. (1990). Brain style computation: Learning and generalization. In S. F. Zornetzer, J. L. Davis, & C. Lau (Eds.), An introduction to neural and electronic networks (Chap. 21). New York: Academic Press.

Rumelhart, D. E., Durbin, R., Golden, R., & Chauvin, Y. (in press). Backpropagation: The basic theory. In D. E. Rumelhart, & Y. Chauvin (Eds.), Backpropagation: Theory and practice. Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP research group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986b). Learning representations by back-propagating errors. Nature, 323(9), 533–536.

Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60–94.

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP research group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models (pp. 216–271). Cambridge, MA: MIT Press.

Rumelhart, D. E., McClelland, J. L., & the PDP research group (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations. Cambridge, MA: MIT Press.

Rumelhart, D. E., & Todd, P. M. (1993). Learning and connectionist representations. In D. E. Meyer, & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience (pp. 3–30). Cambridge, MA: MIT Press.

Saffran, E. M., Bogyo, L. C., Schwartz, M. F., & Marin, O. S. M. (1980). Does deep dyslexia reflect right-hemisphere reading? In M. Coltheart, K. Patterson, & J. C. Marshall (Eds.), Deep dyslexia (pp. 381–406). London: Routledge & Kegan Paul.

Schwartz, M. F., Marin, O. M., & Saffran, E. M. (1979). Dissociations of language function in dementia: A case study. Brain and Language, 7, 277–306.

Schwartz, M. F., Saffran, E. M., & Marin, O. S. M. (1980). Fractioning the reading process in dementia: Evidence for word-specific print-to-sound associations. In M. Coltheart, K. Patterson, & J. C. Marshall (Eds.), Deep dyslexia (pp. 259–269). London: Routledge & Kegan Paul.

Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19, 1–10.

Seidenberg, M. S. (1993). Connectionist models and cognitive theory. Psychological Science, 4(4), 228–235.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.

Seidenberg, M. S., & McClelland, J. L. (1990). More words but still no lexicon: Reply to Besner et al. (1990). Psychological Review, 97(3), 447–452.

Seidenberg, M. S., & McClelland, J. L. (1992). Connectionist models and explanatory theories in cognition (Technical Report PDP.CNS.92.4). Pittsburgh, PA: Carnegie Mellon University, Department of Psychology.

Seidenberg, M. S., Petersen, A., MacDonald, M. C., & Plaut, D. C. (in press). Pseudohomophone effects and models of word recognition. Journal of Experimental Psychology: Human Perception and Performance.

Seidenberg, M. S., Plaut, D. C., Petersen, A. S., McClelland, J. L., & McRae, K. (1994). Nonword pronunciation and models of word recognition. Journal of Experimental Psychology: Human Perception and Performance, 20(6), 1177–1196.

Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behaviour, 23, 383–404.

Sejnowski, T. J., Koch, C., & Churchland, P. S. (1989). Computational neuroscience. Science, 241, 1299–1306.

Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145–168.

Shallice, T. (1988). From neuropsychology to mental structure. Cambridge: Cambridge University Press.

Shallice, T., & McCarthy, R. (1985). Phonological reading: From patterns of impairment to possible procedures. In K. Patterson, M. Coltheart, & J. C. Marshall (Eds.), Surface dyslexia (pp. 361–398). Hillsdale, NJ: Lawrence Erlbaum Associates.

Shallice, T., & McGill, J. (1978). The origins of mixed errors. In J. Requin (Ed.), Attention and performance VII (pp. 193–208). Hillsdale, NJ: Lawrence Erlbaum Associates.

Shallice, T., Warrington, E. K., & McCarthy, R. (1983). Reading without semantics. Quarterly Journal of Experimental Psychology, 35A, 111–138.

Silverberg, N. E., & Silverberg, M. C. (1967). Hyperlexia—Specific word recognition skills in young children. Exceptional Children, 34, 41–42.

Skoyles, J. (1991). Connectionism, reading and the limits of cognition. Psycoloquy, 2(8), 4.

Smith, P. T., & Baker, R. G. (1976). The influence of English spelling patterns on pronunciation. Journal of Verbal Learning and Verbal Behaviour, 15, 267–285.

St. John, M. F. (1992). The story Gestalt: A model of knowledge-intensive processes in text comprehension. Cognitive Science, 16, 271–306.

St. John, M. F., & McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46, 217–257.

Stanhope, N., & Parkin, A. J. (1987). Further exploration of the consistency effect in word and nonword pronunciation. Memory and Cognition, 15, 169–179.

Stone, G. O. (1986). An analysis of the delta rule and the learning of statistical associations. In D. E. Rumelhart, J. L. McClelland, & the PDP research group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 444–459). Cambridge, MA: MIT Press.

Stone, G. O., & Van Orden, G. C. (1989). Are words represented by nodes? Memory and Cognition, 17(5), 511–524.

Stone, G. O., & Van Orden, G. C. (1994). Building a resonance framework for word recognition using design and system principles. Journal of Experimental Psychology: Human Perception and Performance, 20(6), 1248.

Strain, E., Patterson, K., & Seidenberg, M. S. (in press). Semantic effects in single word naming. Journal of Experimental Psychology: Learning, Memory, and Cognition.

Sutton, R. S. (1992). Adapting bias by gradient descent: An incremental version of Delta-Bar-Delta. Proceedings of the 10th National Conference on Artificial Intelligence (pp. 171–176). Cambridge, MA: MIT Press.

Taraban, R., & McClelland, J. L. (1987). Conspiracy effects in word recognition. Journal of Memory and Language, 26, 608–631.

Taraban, R., & McClelland, J. L. (1988). Constituent attachment and thematic role assignment in sentence processing: Influences of content-based expectations. Journal of Memory and Language, 27, 597–632.

Treiman, R., & Chafetz, J. (1987). Are there onset- and rime-like units in printed words? In M. Coltheart (Ed.), Attention and performance XII: The psychology of reading (pp. 281–327). Hillsdale, NJ: Lawrence Erlbaum Associates.

Treiman, R., Mullennix, J., Bijeljac-Babic, R., & Richmond-Welty, E. D. (in press). The special role of rimes in the description, use, and acquisition of English orthography. Journal of Experimental Psychology: General.

Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic disambiguation. Journal of Memory and Language, 33, 285–318.

Van Orden, G. C. (1987). A ROWS is a ROSE: Spelling, sound and reading. Memory and Cognition, 15, 181–198.

Van Orden, G. C., & Goldinger, S. D. (1994). Interdependence of form and function in cognitive systems explains perception of printed words. Journal of Experimental Psychology: Human Perception and Performance, 20(6), 1269.

Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97(4), 488–522.

Vellutino, F. (1979). Dyslexia. Cambridge, MA: MIT Press.

Venezky, R. L. (1970). The structure of English orthography. The Hague: Mouton.

Waters, G. S., & Seidenberg, M. S. (1985). Spelling-sound effects in reading: Time course and decision criteria. Memory and Cognition, 13, 557–572.

Wickelgren, W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychological Review, 76, 1–15.

Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. Institute of Radio Engineers, Western Electronic Show and Convention, Convention Record, Part 4 (pp. 96–104).

Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2(4), 490–501.

Zorzi, M., Houghton, G., & Butterworth, B. (1995). Two routes or one in reading aloud? A connectionist "dual-process" model (Technical Report UCL.RRG.95.1). London, UK: Department of Psychology, University College, University of London.