LANGUAGE AND COGNITIVE PROCESSES, 1998, 13 (2/3), 307–336

Rules or Associations in the Acquisition of Morphology? The Frequency by Regularity Interaction in Human and PDP Learning of Morphosyntax

Nick C. Ellis
University of Wales, Bangor, UK

Richard Schmidt
University of Hawai’i at Manoa

When fluent English speakers are asked to produce past tense forms, their latencies are affected by frequency of past tense forms when generating irregular inflections, but not when generating regular ones. This interaction has been used to support hybrid accounts of morphosyntax where regular inflections are computed by an affixation rule in a neurally based symbol manipulating syntactic system, while irregular verbs are retrieved from an associative memory. This article describes adult learning of morphosyntax in a novel language where frequency and regularity are factorially combined. The accuracy and latency data demonstrate frequency effects for both regular and irregular forms early in the acquisition process. However, as learning progresses, the frequency effect on regular items diminishes whereas it remains for irregular items. The regularity by frequency interaction is a natural consequence of the power law of practice and is thus entirely consistent with associative learning processes: Regularity is frequency by another name. Performance of a simple connectionist system, when trained on the same materials, shows a very close correspondence to the human acquisition data.

Requests for reprints should be addressed to N. Ellis, School of Psychology, University of Wales, Bangor, Gwynedd, Wales, LL57 2DG, UK.

We thank Martin Wilson for help with administering the experiment and Gordon Brown and Ernest Lee for useful discussions. This work was assisted by grant R000236099 from the Economic and Social Research Council (UK) to the first author.

A preliminary report of some of these data appeared in Ellis, N.C. & Schmidt, R. (1997). Morphology and longer distance dependencies. Studies in Second Language Acquisition, 19, 145–171.

© 1998 Psychology Press Ltd



INTRODUCTION

Can human morphological abilities be understood in terms of associative processes, or is it necessary to postulate rule-based symbol processing systems underlying these grammatical skills? This question has generated considerable debate in the literature over the past decade, much of it focusing on the behaviour of “regular” and “irregular” inflectional morphology. There are broadly two contrasting accounts. Dual-processing models (for example, Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995; Pinker & Prince, 1988; Prasada, Pinker, & Snyder, 1990) take the differences in behaviour of regular and irregular inflections to represent the separate underlying processes by which they are produced: Regular inflections are produced by rules (for example, for the past tense, “add -ed to a Verb”), while irregular inflections are listed in memory. Associative accounts, whether connectionist (e.g. MacWhinney & Leinbach, 1991; Plunkett & Marchman, 1993; Rumelhart & McClelland, 1986) or schema-network (Bybee, 1995) models, assume that both regular and irregular inflections arise from the same mechanism, a single distributed associative network, with the differences in behaviour being due to statistical distributional factors.

This debate often makes reference to one key behavioural difference between regular and irregular inflections: When people are asked to produce past tense forms, their latencies are affected by frequency of past tense forms when generating irregular inflections, but not when generating regular ones. Prasada et al. (1990) and Seidenberg and Bruck (1990) showed that when fluent native English speakers see verb stems on a screen and are required to produce the past tense form as quickly as possible, they take significantly less time (16–29 msec in three experiments) for irregular verbs with high past tense frequencies (like went) than for irregular verbs with low past tense frequencies (like slung), even when stem frequencies are equated. However, there is no effect on latency of the past tense frequency of regular verbs whose past tense is generated by adding -ed.

This lack of frequency effect on regular forms has been taken as evidence that grammar cannot be understood solely in terms of associative mechanisms. Pinker (1991) uses it in support of a hybrid account of morphological inflection: Regular verbs (walk–walked) are computed by a suffixation rule in a neurally based symbol manipulating syntactic system, while irregular verbs (run–ran) are retrieved from an associative memory. Briefly, his explanation is as follows: (1) Irregular inflected forms must be memorised since they do not conform to a rule. A general property of associative memory systems is that there are robust frequency effects. Frequently encountered items are better remembered and faster accessed. Thus low frequency irregular forms take longer to access than high frequency ones. (2) Regular inflections are not stored in associative memory but are generated by a rule-based symbolic system, the time to produce the inflected form simply reflecting (a) the time to access the lemma form, and (b) the time to bind procedurally the regular inflectional affix. Thus there are no frequency effects on their production latencies. For example, walk and afford are both quite common in their stem forms, but the past tense form walked is much more common than is afforded. Nevertheless, a rule-generated account predicts that afforded will be produced as quickly as walked, since the stem forms, being equally frequent, are equally readily accessed and it takes a constant amount of time to add an -ed ending.

Beck (1995) reports similar regularity by frequency interactions in the latencies of productions of non-native speakers and thus broadens the application of this account to second language learning. Indeed, the effect is generally cited as key evidence for the existence of symbol-manipulating rules in a specifically linguistic mental module underpinning both first and second language acquisition (Eubank & Gregg, 1995; Pinker, 1991).

It is an elegant and attractive argument, and the latency of production data are indeed consistent with such an account. But there are two problems. The first is that there is a simpler, more parsimonious explanation. In this article we will show that a basic principle of learning, the power law of practice, also generates frequency by regularity interactions. Thus these behavioural dissociations between “regular” and “irregular” forms are equally consistent with connectionist accounts of morphosyntax. The second problem is that although these theories are trying to explain both language processing and language acquisition, these particular data come from highly fluent language users. It is difficult to gain an understanding of learning and development from observations of the final state, when we have no record of the content of the learners’ years of exposure to language nor of the developmental course of their proficiencies. If we want to understand learning, we must study it directly.

The present report therefore describes adult acquisition of second language morphology using a miniature artificial language (MAL) where frequency and regularity are factorially combined. The accuracy and latency data demonstrate frequency effects for both regular and irregular forms early on in the acquisition process. However, as learning progresses, so the frequency effect for regular items diminishes, although it remains for irregular items. The results thus converge on the end-point described by Prasada et al. (1990). However, they also show how subjects reach this endpoint, with the convergence of performance on high and low frequency regular plurals indexing the rate of acquisition of the regular pattern. We next describe a simple connectionist model which was exposed to the same exemplars in the same order as the human subjects. The results of these simulations closely parallel those of the human learners: there are initially frequency effects for both the regular and irregular forms, but with increased exposure the frequency effect for regular forms is attenuated. Thus a connectionist system, which has no “rules”, can duplicate this “rule-like” behaviour. Rather, as shown by Plaut, McClelland, Seidenberg, and Patterson (1996) for the case of reading, the frequency by regularity interaction is a natural and necessary result of associative learning processes.

HUMAN LEARNING

If we wish to investigate the effects of input and practice on the acquisition of language structure, then we need a proper record of learner input. Yet it is virtually impossible to gather a complete corpus of learners’ exposure and production of natural language. How can we ascertain how many types and tokens of regular and irregular inflections have been processed by, for example, learners of English or of German? At best, for natural language, we can only guess by extrapolation of frequency counts from language corpora and unverifiable assumptions about registers. Much of the dispute about the implications of the regularity by frequency effect in morphosyntax centres on such assumptions (Bybee, 1995; Marcus et al., 1995; Plunkett & Marchman, 1991; Prasada & Pinker, 1993; Rumelhart & McClelland, 1986). One way around this is to have people learn a miniature artificial language (MAL) under laboratory conditions.

There is a rich tradition of using MALs to investigate processes of acquisition of native language (Braine, Brody, Brooks, Sudhalter, Ross, Catalano, & Fisch, 1990; Moeser & Bregman, 1972; Morgan, Meier, & Newport, 1987; Morgan & Newport, 1981; Palermo & Howe, 1970; Winter & Reber, 1994) and second- and foreign-languages (MacWhinney, 1983; McLaughlin, 1980; Yang & Givon, 1997). The number of published studies is at least in the hundreds, if not more. This is because MAL experiments have many advantages. They allow: (a) a complete log of exposure to be recorded, (b) accuracy to be monitored at each point, (c) factorial manipulation of the potential independent variables of interest and the teasing apart of naturally confounded effects, and (d) relatively rapid collection of data. But these advantages are bought at the cost of reduced ecological validity: (1) MALs are toy languages when compared to the true complexity of natural language; (2) the period of study falls far short of lifespan practice; (3) laboratory learning exposure conditions are far from naturalistic; and (4) volunteer learners are often atypical in their motivations and demographics. All of these very real problems of laboratory research stem from the sacrifices made necessary by the goals of experimental control and microanalysis of learning in real time. This is the classic “experimenter’s dilemma”: Naturalistic situations limit experimental control and thus the internal logical validity of research; laboratory research limits ecological validity (Jung, 1971).

In adopting MAL research we are not denying naturalistic field studies. We might caricature the former as providing valid descriptions of artificial language learning, and the latter as providing tentative descriptions of natural language learning. However, the use of a MAL in this study avoids at least three problems that have plagued similar experiments using natural languages (Beck, 1995): (1) Uncertainty whether frequencies derived from corpora accurately represent input to learners; (2) problems attributed to interference from phonologically similar items in regular and irregular sets (e.g. lean, lend or fly, flow) or derived forms (e.g. head as a verb derived from a noun); and (3) evidence from only an advanced stage of learning, forcing reliance on logical argumentation rather than empirical evidence to describe acquisition.

Subjects

Seven monolingual English volunteers from the School of Psychology volunteer panel served as subjects. There were three males and four females. They were aged between 18 and 40 years. They were paid £2.50 per hour for their involvement. They usually worked an hour a day at the experiment.

The Miniature Artificial Language

Moeser and Bregman (1972) criticised the generalisability of MAL experiments which involved subjects listening to strings of words from semantically empty languages, because some syntactic rules that were easily acquired when the MAL referred to a stimulus world were not acquired when it did not. The MAL in the present study therefore incorporated reference. The subjects’ initial task was to learn MAL names for 20 picture stimuli. They were told that they were learning vocabulary in a new language. The pictures, drawn from Snodgrass and Vanderwart (1980), are described in the Appendix along with the stem form of their MAL names and their corresponding plural forms. Like Braine et al. (1990), we chose MAL names which were suggestive of English cognates in order to make them readily learnable; thus, for example, the MAL words for umbrella and fish are, respectively, “brol” and “pisc”. To the degree that the task only involves ostensive definition and is not embedded in a larger goal-directed setting, it is admittedly limited as an analogue of natural language vocabulary acquisition. However, it allows clean and precise experimental control whilst providing a reasonable model of the ostensive vocabulary learning that occurs to some considerable degree in L1 and even more so in intentional foreign language learning.

Subjects learned the stem forms before studying the plural forms. In the stem learning phase all items appeared equally often. In the subsequent plural learning phase, in order to maximise the sensitivity of the reaction time (RT) measure, plurality in the MAL was marked by a prefix. Half of the items had a regular plural marker (“bu-”); the remaining 10 items had idiosyncratic affixes, as shown in the Appendix. The use of a prefix inflectional system afforded the additional advantage of minimising transfer effects from the subjects’ first language since, although it is found in natural languages like Ndebele, it is quite different from English plural formation. Thus the MAL was designed with English cognates in order to promote positive transfer of learning of the stem forms, and a very different inflection system in order to exclude any morphological transfer. Frequency was factorially crossed with regularity, with half of each set being presented five times more often. The high and low frequency irregular items were matched for initial phoneme to control voice onset time.

Method

The experiment was controlled by a Macintosh LCIII computer programmed with PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Model pronunciations of the MAL lexis, spoken by the first author, were recorded using MacRecorder. Subjects’ vocal reaction times were measured using a voice key.

Stem Learning. Subjects first learned the stem forms of the MAL. This phase consisted of blocks of 20 trials. In each block every picture appeared once in a randomly chosen order, so the subjects’ frequency of exposure to all of the stem forms was the same. Each trial consisted of the following sequence: (1) one of the pictures appeared mid-screen for 2 sec; (2) if the subject thought they knew the picture name, they spoke it into the microphone as quickly as possible; (3) 2 sec after picture onset the computer spoke the correct name for the picture; (4) the experimenter marked the subject’s utterance as correct or not by pressing one of two keys. The dependent variables were thus correctness and RT. These blocks of trials were repeated until the subjects knew the MAL names for the pictures and could begin uttering them within 2 sec of stimulus-onset, to a criterion of 100% correct on two successive blocks. At this point they graduated to the plural learning phase.

Plural Learning. This phase used the same procedures except that each block consisted of 80 trials presented in random order: (1) One presentation of each of the 20 singular forms as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials just one picture appeared midscreen; on the plural trials a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3; range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
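The composition of one plural-learning block can be checked with a short sketch. This is only an illustration of the design as described above, not the PsyScope script used in the experiment; the item labels (`item00` and so on) are hypothetical placeholders for the MAL words listed in the Appendix, and the function name is our own.

```python
import random

# Plural presentations per item in each cell of the 2 x 2 design (from the text).
CONDITIONS = {"HiFreqReg": 5, "HiFreqIrreg": 5, "LoFreqReg": 1, "LoFreqIrreg": 1}
ITEMS_PER_CONDITION = 5  # five items in each cell, 20 items in all


def build_plural_block(rng):
    """Return one randomly ordered block of (item, number, condition) trials."""
    trials = []
    item_id = 0
    for condition, plural_reps in CONDITIONS.items():
        for _ in range(ITEMS_PER_CONDITION):
            item = f"item{item_id:02d}"  # hypothetical label, not a real MAL word
            trials.append((item, "singular", condition))  # one singular trial per item
            trials.extend([(item, "plural", condition)] * plural_reps)
            item_id += 1
    rng.shuffle(trials)  # trials were presented in random order
    return trials


block = build_plural_block(random.Random(0))
print(len(block))  # 20 singular + 25 + 25 + 5 + 5 plural trials = 80
```

Counting trials this way makes the 5:1 exposure ratio between high and low frequency plurals explicit, and shows that the regular and irregular sets receive the same number of plural trials per block.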

Results

Stem Learning. The stem learning data will only be presented in summary since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns]; Frequency [F(1, 16) = 0.569, ns]; Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects completed between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis F(1, 5939) = 73.52, P < 0.001; by subjects F(1, 6) = 12.41, P < 0.02; by words F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis F(1, 5123) = 650.74, P < 0.001; by subjects F(1, 6) = 63.08, P < 0.001; by words F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis F(1, 5123) = 10.62, P < 0.001; by subjects F(1, 6) = 3.26, ns; by words F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis F(1, 5123) = 20.92, P < 0.001; by subjects F(1, 6) = 10.15, P < 0.02; by words F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function better fits the data than does a linear function, the R²s for the power function fits being respectively: HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency “floor” governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
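A comparison of power and linear fits of the kind described above can be sketched as follows. The text does not say how the fits were computed, so this is one common approach (ordinary least squares on the raw scale for the linear function, and on log-log axes for the power function), offered as an assumption rather than as the authors' procedure. The RT values are made up for illustration and are not the data in Fig. 1.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, predictions):
    """Proportion of variance in ys accounted for by predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


blocks = list(range(1, 16))
# Illustrative latencies (msec) generated from a power function; NOT experimental data.
rts = [1400 * n ** -0.4 for n in blocks]

# Linear function: RT = intercept + slope * block.
intercept, slope = linear_fit(blocks, rts)
linear_pred = [intercept + slope * n for n in blocks]

# Power function: log RT = log B - a * log block, fitted on log-log axes.
log_b, neg_a = linear_fit([math.log(n) for n in blocks], [math.log(t) for t in rts])
power_pred = [math.exp(log_b) * n ** neg_a for n in blocks]

r2_linear = r_squared(rts, linear_pred)
r2_power = r_squared(rts, power_pred)
print(round(r2_linear, 3), round(r2_power, 3))
```

With real, noisy latencies both R² values would fall below 1; the comparison of interest is simply which functional form accounts for more of the variance in each condition.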

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate: (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition; (b) the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.); and (c) the size of the frequency effect on irregular items similarly diminishes with learning, but it does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling whatever our performance measure.
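The convergence argument can be made concrete with a small numerical sketch of the relationship T = BN^{-a}. The parameter values B and a below are arbitrary assumptions chosen only for illustration; the 5:1 exposure ratio is the one used in the plural learning phase. Under these assumptions the latency difference between a low and a high frequency item is B·N^{-a}(1 − 5^{-a}), which shrinks steadily as the number of blocks N grows.

```python
# Hypothetical power-law parameters (assumptions, not estimates from the data).
B = 1500.0  # latency in msec after a single exposure
A = 0.4     # learning-rate exponent

HIGH, LOW = 5, 1  # plural exposures per block for high and low frequency items


def latency(n_exposures):
    """Power law of practice: T = B * N ** -a."""
    return B * n_exposures ** -A


def frequency_effect(block):
    """Latency cost of a low frequency item relative to a high frequency one."""
    return latency(LOW * block) - latency(HIGH * block)


effects = [frequency_effect(block) for block in range(1, 16)]
for block in (1, 5, 15):
    print(block, round(effects[block - 1], 1))
```

The same 5:1 difference in exposure therefore produces a progressively smaller difference in latency as practice accumulates, which is the sense in which a fixed manipulation becomes harder to detect in practised performance.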

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effects among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners’ acquisition curves.

CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that: (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no “rule”: “it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by ‘on-line’ generalisations from the stored exemplars” (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when in simulation 2 the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375) and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned: (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens in representative ratios of regular and irregular forms in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently concerned neither with low-level feature perception nor with the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or “imagen” (Paivio, 1986), each output unit to some speech output “logogen” (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others (other, that is, than is determined by the frequency of the input–output mappings). This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects’ learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is “back propagation” (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary, and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
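The localist coding described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the fixed unit numbering follows the description in the text rather than the randomised allocation used in the actual simulations.

```python
N_INPUT = 22    # I-units 1-20: trained pictures; 21: TesterP; 22: plurality
N_OUTPUT = 32   # O-units 1-21: stems; 22-31: irregular affixes; 32: regular affix
REGULAR_AFFIX = 32


def encode_input(picture, plural):
    """Return the 22-element input vector for a picture (I-unit number 1-21)."""
    if not 1 <= picture <= 21:
        raise ValueError("picture must be an I-unit number from 1 to 21")
    units = [0.0] * N_INPUT
    units[picture - 1] = 1.0                      # exactly one picture unit is on
    units[N_INPUT - 1] = 1.0 if plural else 0.0   # I-unit 22 codes plurality
    return units


def encode_target(stem, affix=None):
    """Return the 32-element target: one stem unit plus, for plurals, one affix unit."""
    if not 1 <= stem <= 21:
        raise ValueError("stem must be an O-unit number from 1 to 21")
    units = [0.0] * N_OUTPUT
    units[stem - 1] = 1.0
    if affix is not None:
        if not 22 <= affix <= 32:
            raise ValueError("affix must be an O-unit number from 22 to 32")
        units[affix - 1] = 1.0
    return units
```

Because every picture and every morpheme has exactly one unit, any two input patterns differ in the same way, so similarity between items cannot arise from the coding itself.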

We investigated four different classes of model, which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus, we compared models with 3, 5, 8, and 15 HUs.
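To make the relation between HUs and connections concrete, the count can be worked out under two assumptions that the text does not state: that adjacent layers are fully connected, and that bias weights and direct input-to-output links are not counted.

```python
def n_connections(n_hidden, n_input=22, n_output=32):
    """Weights in a fully connected three-layer net (assumption: no bias weights)."""
    return n_input * n_hidden + n_hidden * n_output


for n_hidden in (3, 5, 8, 15):
    print(n_hidden, "HUs:", n_connections(n_hidden), "connections")
# 3 HUs: 162, 5 HUs: 270, 8 HUs: 432, 15 HUs: 810
```

On these assumptions each additional hidden unit adds 54 connections, so the largest model has five times the weight capacity of the smallest.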

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus, just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially, this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model, we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
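The stem training phase can be sketched as a small back-propagation network in plain Python. This is not the simulator used for the reported results: the logistic activation function, the bias weights, the learning rate of 0.5, the initial weight range and the shuffling of trials within an epoch are all assumptions made for illustration, and the class and function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input -> hidden -> output net trained by back propagation (illustrative)."""

    def __init__(self, n_in, n_hid, n_out, rng, lr=0.5):
        self.lr = lr  # assumed learning rate
        # One row of incoming weights per unit; the last entry is a bias weight.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hid)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb = hid + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return hid, out

    def train_trial(self, x, target):
        """One learning trial; returns the summed squared error before the update."""
        hid, out = self.forward(x)
        xb, hb = x + [1.0], hid + [1.0]
        # Output deltas: (target - output) scaled by the logistic derivative.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Hidden deltas: output deltas propagated back through the old weights.
        d_hid = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hid)]
        for row, d in zip(self.w_out, d_out):
            for j, v in enumerate(hb):
                row[j] += self.lr * d * v
        for row, d in zip(self.w_hid, d_hid):
            for j, v in enumerate(xb):
                row[j] += self.lr * d * v
        return sum((t - o) ** 2 for t, o in zip(target, out))


def singular_patterns():
    """The 21 singular trials: one picture unit on, I-unit 22 off, one stem target."""
    patterns = []
    for pic in range(21):
        x = [0.0] * 22
        x[pic] = 1.0
        t = [0.0] * 32
        t[pic] = 1.0
        patterns.append((x, t))
    return patterns


def train_stems(n_hidden=8, epochs=500, seed=0):
    """Train on singulars only; returns the net and the summed error per epoch."""
    rng = random.Random(seed)
    net = ThreeLayerNet(22, n_hidden, 32, rng)
    patterns = singular_patterns()
    errors = []
    for _ in range(epochs):
        rng.shuffle(patterns)  # assumed: random trial order within an epoch
        errors.append(sum(net.train_trial(x, t) for x, t in patterns))
    return net, errors
```

Calling `train_stems(n_hidden=8)` with five different seeds and averaging would mirror the five examples per model size described above, although the resulting error values depend on the assumed parameters and should not be expected to match the published figures.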

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials, presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form), along with one of O-units 22–32 (the corresponding plural affix), were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals, we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
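The composition of one plural training epoch, and the number of times each affix unit is reinforced within it, can be checked with a short sketch. The assignment of picture numbers to the four classes below is hypothetical and purely illustrative (the actual items are those of the Appendix); only the class sizes and presentation counts come from the text.

```python
import random
from collections import Counter

REGULAR_AFFIX = 32

# Hypothetical allocation of pictures 1-20 to the four classes.
HI_REG = [1, 2, 3, 4, 5]
HI_IRREG = [6, 7, 8, 9, 10]
LO_REG = [11, 12, 13, 14, 15]
LO_IRREG = [16, 17, 18, 19, 20]

# Each irregular picture has its own affix unit among O-units 22-31.
IRREG_AFFIX = {pic: 22 + i for i, pic in enumerate(HI_IRREG + LO_IRREG)}


def build_epoch(rng):
    """Return one epoch as a shuffled list of (picture, plural, affix) trials."""
    trials = [(pic, False, None) for pic in range(1, 22)]   # (a) 21 singulars
    for pic in HI_REG:
        trials += [(pic, True, REGULAR_AFFIX)] * 5          # (b)
    for pic in HI_IRREG:
        trials += [(pic, True, IRREG_AFFIX[pic])] * 5       # (c)
    for pic in LO_REG:
        trials.append((pic, True, REGULAR_AFFIX))           # (d)
    for pic in LO_IRREG:
        trials.append((pic, True, IRREG_AFFIX[pic]))        # (e)
    rng.shuffle(trials)
    return trials


epoch = build_epoch(random.Random(0))
affix_tokens = Counter(affix for _, plural, affix in epoch if plural)
print(len(epoch))                      # 81 trials
print(affix_tokens[REGULAR_AFFIX])     # 30 reinforcements of the regular affix
print(affix_tokens[IRREG_AFFIX[6]])    # 5 for a high frequency irregular affix
print(affix_tokens[IRREG_AFFIX[16]])   # 1 for a low frequency irregular affix
```

The counts make the token imbalance explicit: within each epoch the regular affix unit is reinforced 30 times, whereas even a high frequency irregular affix is reinforced only 5 times.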

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error, calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus, the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
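The two performance measures can be contrasted in a short sketch. It assumes the textbook definition of RMS error (the square root of the mean squared difference over the 11 affix units); the simulator used for the reported results may normalise differently, so absolute values need not match those in Table 1. The function names are hypothetical.

```python
import math

AFFIX_UNITS = range(21, 32)  # zero-based indices of O-units 22-32


def affix_rms_error(output, target):
    """RMS difference between output and target over the plural affix units."""
    squared = [(output[i] - target[i]) ** 2 for i in AFFIX_UNITS]
    return math.sqrt(sum(squared) / len(squared))


def affix_accuracy(output, target_affix):
    """Activation on the single target affix unit (O-unit number 22-32)."""
    return output[target_affix - 1]


# Illustrative output: regular affix (O-unit 32) at 0.9, all other units silent.
output = [0.0] * 31 + [0.9]
target = [0.0] * 31 + [1.0]
print(affix_rms_error(output, target))
print(affix_accuracy(output, 32))
```

The RMS measure penalises spurious activation on the ten non-target affix units as well as weak activation on the target, whereas the accuracy score looks at the target unit alone, which is why it maps more directly onto human proportion-correct data.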

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

When the 8HU model and the human data are aligned, as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items, and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                              Model Size
Measure                  3HU(a)   5HU    8HU    15HU
RMS error(b)
  M                      0.81     0.79   0.53   0.45
  SD                     0.43     0.50   0.45   0.32
Activation weight(c)
  M                      0.20     0.28   0.57   0.60
  SD                     0.44     0.44   0.52   0.35
N hits (/5)(d)           1        2      3      4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
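The closest-pattern criterion behind the hit counts can be sketched as follows. The function names are hypothetical, and the ordering of units (regular affix last, as O-unit 32) follows the numbering given in the Architecture section.

```python
def affix_targets():
    """One-hot target patterns over O-units 22-32, keyed by O-unit number."""
    return {unit: [1.0 if u == unit else 0.0 for u in range(22, 33)]
            for unit in range(22, 33)}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chosen_affix(affix_activations):
    """O-unit number whose target pattern lies closest to the observed output."""
    targets = affix_targets()
    return min(targets,
               key=lambda unit: squared_euclidean(affix_activations, targets[unit]))


# Illustrative activations over O-units 22-32 for the wug-test item.
observed = [0.1] * 10 + [0.6]
print(chosen_affix(observed))  # 32, the regular plural affix
```

With one-hot targets the squared distance to target k is the sum of squared activations minus twice the activation on unit k, plus one, so the criterion reduces to picking the most active affix unit.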

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of default case, the presence or absence of frequency effects for regular items, and ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus, we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

Although it is a general principle, the degree to which it applies depends on a range of factors, including: (a) the exponent of the power function; (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve; and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus, frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus, all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
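The asymptoting behaviour of these functions can be checked numerically. The sketch below evaluates the gain in y produced by the same increment in x at a low and at a high starting point; the particular values of x, the increment of 1 and the rescaling n = 2 are arbitrary choices for illustration, not quantities from the experiment.

```python
def performance(x, n=1.0):
    """The power function y = 1 - (x * n) ** -2, defined for x > 0."""
    return 1.0 - (x * n) ** -2


def increment_effect(x, delta=1.0, n=1.0):
    """Gain in y when x is raised by a constant increment delta."""
    return performance(x + delta, n) - performance(x, n)


low = increment_effect(1.0)    # from x = 1 to x = 2
high = increment_effect(5.0)   # from x = 5 to x = 6
print(low, high)               # 0.75 versus roughly 0.012

# Rescaling the horizontal axis (n = 2) changes the sizes but not the ordering.
print(increment_effect(1.0, n=2.0) > increment_effect(5.0, n=2.0))  # True
```

The same increment buys far more at the low end of the curve than at the high end, which is the pattern that appears as a large regularity effect for low frequency items and a small one for high frequency items.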

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus, word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus, frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases to the summed input to units that should be inactive. However, due to the non-linearity of the input-output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
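The ceiling effect described here can be illustrated with a single output unit. The sketch assumes a logistic input-output function, which is the conventional choice for back-propagation networks but is not specified in the text; the step size and number of steps are arbitrary.

```python
import math


def logistic(net_input):
    return 1.0 / (1.0 + math.exp(-net_input))


def error_decrements(step=1.0, n_steps=5):
    """Reductions in error (1 - activation) for successive equal rises in
    summed input, starting from a summed input of zero."""
    errors = [1.0 - logistic(k * step) for k in range(n_steps + 1)]
    return [errors[k] - errors[k + 1] for k in range(n_steps)]


decrements = error_decrements()
for k, d in enumerate(decrements, start=1):
    print("increment", k, "reduces error by", round(d, 4))
```

Each equal addition to the summed input, whether it comes from the item's own frequency or from consistent neighbours, removes less error than the one before, so contributions that add at the level of the weights interact at the level of performance.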

Thus, a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that, while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R. & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P. & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G. & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G. & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G. & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L. & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J. & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B. & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D. & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L. & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R. & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A. & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B. & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S. & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S. & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S. & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K. & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K. & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K. & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S. & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D. & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S. & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G. & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P. & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B. & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R. & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture    Stem    Plural Form  Frequency  Regularity
car        garth   bugarth      5          R
bed        pid     bupid        1          R
lamp       lant    bulant       5          R
table      tib     butid        1          R
plane      poon    bupoon       5          R
ball       prill   buprill      1          R
train      dram    budram       5          R
house      hize    buhize       1          R
book       bisk    bubisk       5          R
broom      breen   bubreen      1          R
phone      feem    gofeem       5          I
umbrella   brol    gubrol       1          I
chair      charp   zecharp      5          I
horse      naig    zonaig       1          I
monkey     chonk   nuchonk      5          I
dog        woop    niwoop       1          I
elephant   fant    vefant       5          I
scissors   zoze    vuzoze       1          I
kite       kag     rekag        5          I
fish       pisc    ropisc       1          I


INTRODUCTION

Can human morphological abilities be understood in terms of associative processes, or is it necessary to postulate rule-based symbol processing systems underlying these grammatical skills? This question has generated considerable debate in the literature over the past decade, much of it focusing on the behaviour of “regular” and “irregular” inflectional morphology. There are broadly two contrasting accounts. Dual-processing models (for example, Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995; Pinker & Prince, 1988; Prasada, Pinker, & Snyder, 1990) take the differences in behaviour of regular and irregular inflections to represent the separate underlying processes by which they are produced. Regular inflections are produced by rules (for example, for the past tense, “add -ed to a Verb”), while irregular inflections are listed in memory. Associative accounts, whether connectionist (e.g. MacWhinney & Leinbach, 1991; Plunkett & Marchman, 1993; Rumelhart & McClelland, 1986) or schema-network (Bybee, 1995) models, assume that both regular and irregular inflections arise from the same mechanism, a single distributed associative network, with the differences in behaviour being due to statistical distributional factors.

This debate often makes reference to one key behavioural difference between regular and irregular inflections. When people are asked to produce past tense forms, their latencies are affected by frequency of past tense forms when generating irregular inflections, but not when generating regular ones. Prasada et al. (1990) and Seidenberg and Bruck (1990) showed that when fluent native English speakers see verb stems on a screen and are required to produce the past tense form as quickly as possible, they take significantly less time (16–29 msec in three experiments) for irregular verbs with high past tense frequencies (like went) than for irregular verbs with low past tense frequencies (like slung), even when stem frequencies are equated. However, there is no effect on latency of the past tense frequency of regular verbs whose past tense is generated by adding -ed.

This lack of frequency effect on regular forms has been taken as evidence that grammar cannot be understood solely in terms of associative mechanisms. Pinker (1991) uses it in support of a hybrid account of morphological inflection: Regular verbs (walk–walked) are computed by a suffixation rule in a neurally based symbol manipulating syntactic system, while irregular verbs (run–ran) are retrieved from an associative memory. Briefly, his explanation is as follows: (1) Irregular inflected forms must be memorised, since they do not conform to a rule. A general property of associative memory systems is that there are robust frequency effects: Frequently encountered items are better remembered and faster accessed. Thus, low frequency irregular forms take longer to access than high frequency ones. (2) Regular inflections are not stored in associative memory but are generated by a rule-based symbolic system, the time to produce the inflected form simply reflecting (a) the time to access the lemma form, and (b) the time to bind procedurally the regular inflectional affix. Thus, there are no frequency effects on their production latencies. For example, walk and afford are both quite common in their stem forms, but the past tense form walked is much more common than is afforded. Nevertheless, a rule-generated account predicts that afforded will be produced as quickly as walked, since the stem forms, being equally frequent, are equally readily accessed and it takes a constant amount of time to add an -ed ending.

Beck (1995) reports similar regularity by frequency interactions in the latencies of productions of non-native speakers, and thus broadens the application of this account to second language learning. Indeed, the effect is generally cited as key evidence for the existence of symbol-manipulating rules in a specifically linguistic mental module underpinning both first and second language acquisition (Eubank & Gregg, 1995; Pinker, 1991).

It is an elegant and attractive argument, and the latency of production data are indeed consistent with such an account. But there are two problems. The first is that there is a simpler, more parsimonious explanation. In this article we will show that a basic principle of learning, the power law of practice, also generates frequency by regularity interactions. Thus, these behavioural dissociations between “regular” and “irregular” forms are equally consistent with connectionist accounts of morphosyntax. The second problem is that although these theories are trying to explain both language processing and language acquisition, these particular data come from highly fluent language users. It is difficult to gain an understanding of learning and development from observations of the final state, when we have no record of the content of the learners’ years of exposure to language, nor of the developmental course of their proficiencies. If we want to understand learning, we must study it directly.

The present report therefore describes adult acquisition of second language morphology using a miniature artificial language (MAL) where frequency and regularity are factorially combined. The accuracy and latency data demonstrate frequency effects for both regular and irregular forms early on in the acquisition process. However, as learning progresses, the frequency effect for regular items diminishes, although it remains for irregular items. The results thus converge on the end-point described by Prasada et al. (1990). However, they also show how subjects reach this end-point, with the convergence of performance on high and low frequency regular plurals indexing the rate of acquisition of the regular pattern. We next describe a simple connectionist model which was exposed to the same exemplars in the same order as the human subjects. The results of these simulations closely parallel those of the human learners: there are initially frequency effects for both the regular and irregular forms, but with increased exposure the frequency effect for regular forms is attenuated. Thus, a connectionist system which has no “rules” can duplicate this “rule-like” behaviour. Rather as shown by Plaut, McClelland, Seidenberg, and Patterson (1996) for the case of reading, the frequency by regularity interaction is a natural and necessary result of associative learning processes.

HUMAN LEARNING

If we wish to investigate the effects of input and practice on the acquisition of language structure, then we need a proper record of learner input. Yet it is virtually impossible to gather a complete corpus of learners’ exposure and production of natural language. How can we ascertain how many types and tokens of regular and irregular inflections have been processed by, for example, learners of English or of German? At best, for natural language, we can only guess by extrapolation of frequency counts from language corpora and unverifiable assumptions about registers. Much of the dispute about the implications of the regularity by frequency effect in morphosyntax centres on such assumptions (Bybee, 1995; Marcus et al., 1995; Plunkett & Marchman, 1991; Prasada & Pinker, 1993; Rumelhart & McClelland, 1986). One way around this is to have people learn a miniature artificial language (MAL) under laboratory conditions.

There is a rich tradition of using MALs to investigate processes of acquisition of native language (Braine, Brody, Brooks, Sudhalter, Ross, Catalano, & Fisch, 1990; Moeser & Bregman, 1972; Morgan, Meier, & Newport, 1987; Morgan & Newport, 1981; Palermo & Howe, 1970; Winter & Reber, 1994) and second and foreign languages (MacWhinney, 1983; McLaughlin, 1980; Yang & Givon, 1997). The number of published studies is at least in the hundreds, if not more. This is because MAL experiments have many advantages. They allow: (a) a complete log of exposure to be recorded; (b) accuracy to be monitored at each point; (c) factorial manipulation of the potential independent variables of interest and the teasing apart of naturally confounded effects; and (d) relatively rapid collection of data. But these advantages are bought at the cost of reduced ecological validity: (1) MALs are toy languages when compared to the true complexity of natural language; (2) the period of study falls far short of lifespan practice; (3) laboratory learning exposure conditions are far from naturalistic; and (4) volunteer learners are often atypical in their motivations and demographics. All of these very real problems of laboratory research stem from the sacrifices made necessary by the goals of experimental control and microanalysis of learning in real time. This is the classic “experimenter’s dilemma”: Naturalistic situations limit experimental control and thus the internal logical validity of research; laboratory research limits ecological validity (Jung, 1971).

In adopting MAL research, we are not denying the value of naturalistic field studies. We might caricature the former as providing valid descriptions of artificial language learning and the latter as providing tentative descriptions of natural language learning. However, the use of a MAL in this study avoids at least three problems that have plagued similar experiments using natural languages (Beck, 1995): (1) uncertainty whether frequencies derived from corpora accurately represent input to learners; (2) problems attributed to interference from phonologically similar items in regular and irregular sets (e.g. lean, lend or fly, flow) or derived forms (e.g. head as a verb derived from a noun); and (3) evidence from only an advanced stage of learning, forcing reliance on logical argumentation rather than empirical evidence to describe acquisition.

Subjects

Seven monolingual English volunteers from the School of Psychology volunteer panel served as subjects. There were three males and four females, aged between 18 and 40 years. They were paid £2.50 per hour for their involvement. They usually worked an hour a day at the experiment.

The Miniature Artificial Language

Moeser and Bregman (1972) criticised the generalisability of MAL experiments which involved subjects listening to strings of words from semantically empty languages, because some syntactic rules that were easily acquired when the MAL referred to a stimulus world were not acquired when it did not. The MAL in the present study therefore incorporated reference. The subjects’ initial task was to learn MAL names for 20 picture stimuli. They were told that they were learning vocabulary in a new language. The pictures, drawn from Snodgrass and Vanderwart (1980), are described in the Appendix along with the stem form of their MAL names and their corresponding plural forms. Like Braine et al. (1990), we chose MAL names which were suggestive of English cognates in order to make them readily learnable; thus, for example, the MAL words for umbrella and fish are, respectively, “brol” and “pisc”. To the degree that the task only involves ostensive definition and is not embedded in a larger goal-directed setting, it is admittedly limited as an analogue of natural language vocabulary acquisition. However, it allows clean and precise experimental control whilst providing a reasonable model of ostensive vocabulary learning that occurs to some considerable degree in L1 and even more so in intentional foreign language learning.

Subjects learned the stem forms before studying the plural forms. In the stem learning phase, all items appeared equally often. In the subsequent plural learning phase, in order to maximise the sensitivity of the reaction time (RT) measure, plurality in the MAL was marked by a prefix. Half of the items had a regular plural marker (“bu-”); the remaining 10 items had idiosyncratic affixes, as shown in the Appendix. The use of a prefix inflectional system afforded the additional advantage of minimising transfer effects from the subjects’ first language since, although it is found in natural languages like Ndebele, it is quite different from English plural formation. Thus, the MAL was designed with English cognates in order to promote positive transfer of learning of the stem forms, and a very different inflection system in order to exclude any morphological transfer. Frequency was factorially crossed with regularity, with half of each set being presented five times as often as the other half. The high and low frequency irregular items were matched for initial phoneme to control voice onset time.

Method

The experiment was controlled by a Macintosh LCIII computer programmed with PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Model pronunciations of the MAL lexis, spoken by the first author, were recorded using MacRecorder. Subjects’ vocal reaction times were measured using a voice key.

Stem Learning. Subjects first learned the stem forms of the MAL. This phase consisted of blocks of 20 trials. In each block, every picture appeared once in a randomly chosen order, so the subjects’ frequency of exposure to all of the stem forms was the same. Each trial consisted of the following sequence: (1) one of the pictures appeared mid-screen for 2 sec; (2) if the subject thought they knew the picture name, they spoke it into the microphone as quickly as possible; (3) 2 sec after picture onset, the computer spoke the correct name for the picture; (4) the experimenter marked the subject’s utterance as correct or not by pressing one of two keys. The dependent variables were thus correctness and RT. These blocks of trials were repeated until the subjects knew the MAL names for the pictures and could begin uttering them within 2 sec of stimulus onset, to a criterion of 100% correct on two successive blocks. At this point they graduated to the plural learning phase.
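The stopping rule for the stem learning phase can be stated compactly in code. The following Python sketch is illustrative only: the function name `blocks_to_criterion` and the representation of each block as a count of correct responses out of 20 are assumptions made here, not a description of the PsyScope script used in the experiment.

```python
def blocks_to_criterion(correct_per_block, n_items=20, run_length=2):
    """Return the 1-based block number at which criterion is reached, or None.

    Criterion, as described in the text: every item correct on `run_length`
    successive blocks. The input format (one count of correct responses per
    block) is an assumption made for this sketch.
    """
    streak = 0
    for block_number, n_correct in enumerate(correct_per_block, start=1):
        # A block that falls short of 100% resets the run of perfect blocks.
        streak = streak + 1 if n_correct == n_items else 0
        if streak == run_length:
            return block_number
    return None


# Hypothetical learner: the perfect block 3 is followed by a lapse, so the
# criterion is only met once blocks 5 and 6 are both perfect.
example = [9, 15, 20, 18, 20, 20]
print(blocks_to_criterion(example))
```

A learner who never produces two perfect blocks in a row yields `None`, which corresponds to the phase simply continuing.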

Plural Learning. This phase used the same procedures, except that each block consisted of 80 trials presented in random order: (1) one presentation of each of the 20 singular forms, as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials, just one picture appeared mid-screen; on the plural trials, a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3, range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
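The composition of one plural learning block can be checked with a short sketch. The item list is copied from the Appendix (including the plural of “tib” exactly as printed there); the function name `build_plural_block`, the tuple layout, and the use of Python's `random` module are assumptions made for illustration and say nothing about how PsyScope randomised the trials.

```python
import random
from collections import Counter

# (picture, stem, plural, presentations per block, regularity), copied from
# the Appendix. R = regular "bu-" prefix, I = idiosyncratic prefix.
ITEMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def build_plural_block(items, rng):
    """Return one shuffled block of (number, picture, target form) trials."""
    trials = []
    for picture, stem, plural, per_block, _regularity in items:
        trials.append(("singular", picture, stem))  # one singular trial each
        trials.extend([("plural", picture, plural)] * per_block)
    rng.shuffle(trials)
    return trials


block = build_plural_block(ITEMS, random.Random(0))
kinds = Counter(kind for kind, _, _ in block)
print(len(block), kinds["singular"], kinds["plural"])
```

The counts follow directly from the design: 20 singular trials plus 25 + 25 + 5 + 5 plural trials gives the 80 trials per block stated above.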

Results

Stem Learning. The stem learning data will only be presented in summary, since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns], Frequency [F(1, 16) = 0.569, ns], Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects completed between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency, and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs. 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis F(1, 5939) = 73.52, P < 0.001; by subjects F(1, 6) = 12.41, P < 0.02; by words F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis F(1, 5123) = 650.74, P < 0.001; by subjects F(1, 6) = 63.08, P < 0.001; by words F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis F(1, 5123) = 10.62, P < 0.001; by subjects F(1, 6) = 3.26, ns; by words F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis F(1, 5123) = 20.92, P < 0.001; by subjects F(1, 6) = 10.15, P < 0.02; by words F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function better fits the data than does a linear function, the R² values for the power function fits being respectively: HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus, the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency “floor” governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
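The comparison of linear and power function fits can be illustrated as follows. The text does not say how the fits were computed; this sketch assumes one common approach, fitting the power function as a straight line in log-log coordinates and comparing squared correlations. The data are synthetic and noise-free, generated from a hypothetical curve, and are not the experimental latencies.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def compare_fits(blocks, rts):
    """Return (R^2 of a linear fit, R^2 of a power fit in log-log space)."""
    linear = r_squared(blocks, rts)
    power = r_squared([math.log(b) for b in blocks],
                      [math.log(t) for t in rts])
    return linear, power


# Synthetic illustration only: latencies generated from the hypothetical
# curve RT = 1500 * block ** -0.3 over 15 blocks.
blocks = list(range(1, 16))
rts = [1500.0 * b ** -0.3 for b in blocks]
linear_r2, power_r2 = compare_fits(blocks, rts)
print(f"linear R^2 = {linear_r2:.3f}, power R^2 = {power_r2:.3f}")
```

Note that an R² computed in log-log space is not directly comparable with one computed on raw latencies, so values from such a sketch should not be set against those reported above without knowing which method was used.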

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate: (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition; (b) that the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.); and (c) that the size of the frequency effect on irregular items similarly diminishes with learning, but it does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response, N is the number of trials of practice, and B and a are constants. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follows independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling whatever our performance measure.

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effects among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners' acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements, and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that: (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no "rule": "it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by 'on-line' generalisations from the stored exemplars" (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned: (a) the adequacy of the adopted low-level phonological representations, whether these might serve as TRICS (The Representations It Crucially Supposes) which crypto-embody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios, of regular and irregular forms in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary, and was randomised across models; what mattered and remained constant was that the same O-unit was always reinforced whenever a particular I-unit was activated.
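The localist coding just described can be sketched as follows. This is an illustration only: the function names are hypothetical, and the unit numbering follows the arbitrary scheme given in the text rather than the randomised allocation used in the actual runs.

```python
N_INPUT = 22        # I-units 1-21 code pictures, I-unit 22 codes plurality
N_OUTPUT = 32       # O-units 1-21 stems, 22-31 irregular affixes, 32 regular affix
PLURAL_UNIT = 22
REGULAR_AFFIX = 32

def input_pattern(picture, plural):
    """One unit on for the picture, plus I-unit 22 when a pair is shown."""
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0            # units are numbered from 1 in the text
    if plural:
        vec[PLURAL_UNIT - 1] = 1.0
    return vec

def target_pattern(stem, affix=None):
    """One unit on for the stem, plus one affix unit for plural targets."""
    vec = [0.0] * N_OUTPUT
    vec[stem - 1] = 1.0
    if affix is not None:
        vec[affix - 1] = 1.0
    return vec

# A plural trial for picture 3, assuming (hypothetically) that it is a regular item.
x = input_pattern(3, plural=True)
t = target_pattern(3, REGULAR_AFFIX)
print(sum(x), sum(t))
```

Because every picture and every morpheme is a single unit of identical form, no two patterns overlap except through the shared plurality unit and, for regular items, the shared affix unit.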

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.
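The relation between the number of HUs and the number of connections can be made concrete with a small count. The sketch assumes a fully connected 22-input, 32-output three-layer network and leaves out bias weights, since the text does not say whether bias units were used.

```python
def n_connections(n_hidden, n_in=22, n_out=32):
    """Input-to-hidden plus hidden-to-output weights (bias weights excluded)."""
    return n_in * n_hidden + n_hidden * n_out

for n_hidden in (3, 5, 8, 15):
    print(n_hidden, "HUs:", n_connections(n_hidden), "connections")
```

On these assumptions the count grows linearly with the number of HUs, from 162 connections for the 3HU model to 810 for the 15HU model.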

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
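The stem training procedure can be sketched with a minimal three-layer back-propagation network. This is not the simulator used for the reported results: the sigmoid units, bias weights, learning rate of 0.5, initial weight range, shuffled trial order, and the short 30-epoch run are all assumptions made so that the example is small and quick to execute.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class Net:
    """Three-layer feed-forward network trained by on-line back propagation."""

    def __init__(self, n_in, n_hid, n_out, rng):
        # Each row holds one unit's incoming weights; the last entry is a bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]                                   # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, out

    def train(self, x, target, lr=0.5):
        xb, hb, out = self.forward(x)
        # Output error signal: difference from target times the sigmoid slope.
        d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, out)]
        # Hidden error signal, propagated back through the current output weights.
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(out)))
                 for j in range(len(hb) - 1)]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]
        return sum((t - y) ** 2 for t, y in zip(target, out))

def stem_patterns():
    """21 singular trials: picture unit p on, I-unit 22 off, stem unit p as target."""
    patterns = []
    for p in range(21):
        x = [0.0] * 22
        x[p] = 1.0
        t = [0.0] * 32
        t[p] = 1.0
        patterns.append((x, t))
    return patterns

def train_stems(n_hidden=8, epochs=30, seed=0):
    rng = random.Random(seed)
    net = Net(22, n_hidden, 32, rng)
    patterns = stem_patterns()
    history = []
    for _ in range(epochs):
        rng.shuffle(patterns)
        history.append(sum(net.train(x, t) for x, t in patterns))
    return history

history = train_stems()
print(f"summed squared error, first epoch: {history[0]:.2f}, last epoch: {history[-1]:.2f}")
```

Running further epochs, or averaging over several seeds as the five exemplar runs do, only requires changing the `epochs` and `seed` arguments.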

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
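The composition of one plural training epoch can be checked with a short sketch. The assignment of picture numbers to the four classes below is hypothetical (the unit allocation was randomised across models); only the class sizes and presentation counts are taken from the text.

```python
import random

# Hypothetical picture numbers for each class of item.
HI_FREQ_REG = [1, 2, 3, 4, 5]
LO_FREQ_REG = [6, 7, 8, 9, 10]
HI_FREQ_IRREG = [11, 12, 13, 14, 15]
LO_FREQ_IRREG = [16, 17, 18, 19, 20]

def plural_epoch(rng):
    """Return one epoch as a shuffled list of (picture, is_plural) trials."""
    trials = [(p, False) for p in range(1, 22)]                      # (a) 21 singulars
    trials += [(p, True) for p in HI_FREQ_REG for _ in range(5)]     # (b) 5 x 5
    trials += [(p, True) for p in HI_FREQ_IRREG for _ in range(5)]   # (c) 5 x 5
    trials += [(p, True) for p in LO_FREQ_REG]                       # (d) 5 x 1
    trials += [(p, True) for p in LO_FREQ_IRREG]                     # (e) 5 x 1
    rng.shuffle(trials)
    return trials

epoch = plural_epoch(random.Random(0))
regular_affix_uses = sum(
    1 for picture, is_plural in epoch
    if is_plural and picture in HI_FREQ_REG + LO_FREQ_REG
)
print(len(epoch), regular_affix_uses)
```

The count makes explicit why regularity behaves like frequency here: the single regular affix is reinforced on 30 trials per epoch, whereas each high frequency irregular affix is reinforced on 5 and each low frequency irregular affix on 1.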

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, that the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations F(1, 16) = 20.80, P < 0.0005; by words F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations F(1, 16) = 9.07, P < 0.01; by words F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations F(1, 16) = 4.85, P < 0.05; by words F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations F(14, 224) = 68.03, P < 0.0001; by words F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations F(14, 224) = 36.75, P < 0.0001; by words F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations F(14, 224) = 18.93, P < 0.0001; by words F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations F(14, 224) = 16.11, P < 0.0001; by words F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, $r = -0.988$ for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
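The two performance measures can be contrasted in a short sketch. The normalisation used for RMS error here (mean over the 11 affix units before taking the root) is an assumption, since the text does not give the exact formula, and the activation values are invented purely for illustration.

```python
import math

def rms_error(observed, target):
    """Root mean square difference across the plural affix units."""
    squared = [(o - t) ** 2 for o, t in zip(observed, target)]
    return math.sqrt(sum(squared) / len(squared))

def accuracy(observed, target):
    """Activation on the single unit that the target pattern marks as correct."""
    return observed[target.index(1.0)]

# Illustrative activations over O-units 22-32 (11 units), with the correct
# affix placed last; these numbers are not taken from the simulations.
target = [0.0] * 10 + [1.0]
observed = [0.1] * 10 + [0.8]

print(f"RMS error: {rms_error(observed, target):.4f}")
print(f"accuracy:  {accuracy(observed, target):.2f}")
```

RMS error penalises residual activation on the ten non-target units as well as under-activation of the target unit, whereas the accuracy score reads off the target unit alone.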

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data ($R^2 = 0.78$). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items, and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                   3HU(a)    5HU     8HU     15HU
RMS error(b)
  M                       0.81      0.79    0.53    0.45
  SD                      0.43      0.50    0.45    0.32
Activation weight(c)
  M                       0.20      0.28    0.57    0.60
  SD                      0.44      0.44    0.52    0.35
N hits (/5)(d)            1         2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
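The "closest target pattern" criterion used for the N hits measure can be sketched as follows. The helper names are hypothetical, the position of the regular affix within the 11-unit vector is arbitrary, and the activation values are invented for illustration.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def affix_targets(n_affixes=11):
    """One localist target pattern per plural affix unit (O-units 22-32)."""
    return [[1.0 if i == k else 0.0 for i in range(n_affixes)]
            for k in range(n_affixes)]

def chosen_affix(observed):
    """Index of the affix target closest to the observed activations."""
    distances = [squared_euclidean(observed, t) for t in affix_targets(len(observed))]
    return distances.index(min(distances))

# Illustrative output for the wug-test item, with the regular affix placed
# last in the vector (a hypothetical ordering).
observed = [0.1] * 10 + [0.6]
print(chosen_affix(observed))
```

Because each target has exactly one unit on, the nearest target by squared Euclidean distance is simply the affix unit with the highest activation; the distance formulation would matter only if the targets were distributed patterns.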

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of default case, the presence or absence of frequency effects for regular items, and ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of "regular" in the Pocket Oxford Dictionary involves "habitual, constant" acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses "conforming to a rule or principle". We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of "rules of language"). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle the degree to which it applies dependson a range of factors including (a) the exponent of the power function (b)the particular level of experience attained and thus the placement ofcomparison points on the learning curve and (c) the degree to whichfrequency and regularity are additive or multiplicative In the presentexperiment a vefold increase in the frequency of the regular items resultsin a (5 3 the number of regular items) increase in use of the regular afx avefold increase in the frequency of an irregular item results in merely avefold increase in the use of the irregular afx Thus frequency andregularity are interactive rather than additive But even if we allow forinteraction the function still results in greater regularity effects for lowfrequency itemsmdashjust as for example the power function


FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$: the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
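The argument can be checked numerically. The following is a minimal sketch, assuming the example curve y = 1 - (xn)^(-2) from the text; the function names, the two practice levels, and the size of the regularity increment are illustrative choices of ours, not values from the experiment.

```python
def performance(practice: float, n: float = 1.0) -> float:
    """Asymptoting power function y = 1 - (x * n) ** -2 (example curve from the text)."""
    return 1.0 - (practice * n) ** -2


def regularity_effect(frequency: float, increment: float, n: float = 1.0) -> float:
    """Gain in y when a constant 'regularity' increment is added to practice."""
    return performance(frequency + increment, n) - performance(frequency, n)


# Illustrative values only: low and high practice levels and a constant increment.
LOW, HIGH, INCREMENT = 1.0, 5.0, 1.0

low_effect = regularity_effect(LOW, INCREMENT)
high_effect = regularity_effect(HIGH, INCREMENT)
print(f"regularity effect at low frequency:  {low_effect:.4f}")
print(f"regularity effect at high frequency: {high_effect:.4f}")

# Rescaling the horizontal axis (n = 2) changes the sizes but not the ordering.
low_effect_scaled = regularity_effect(LOW, INCREMENT, n=2.0)
high_effect_scaled = regularity_effect(HIGH, INCREMENT, n=2.0)
```

The same additive increment buys a large gain near the start of the curve and a very small one near the asymptote, which is the interaction described above.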

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input–output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input–output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum, because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
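The ceiling effect in this analysis can be illustrated with a single output unit. The sketch below is not the network reported in this article: it assumes a logistic input–output function and hypothetical, arbitrary-unit contributions of frequency and consistency that simply add in the summed input, as the analysis above describes.

```python
import math


def unit_state(summed_input: float) -> float:
    """Logistic activation: the state asymptotes towards 1.0 as input grows."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error(frequency: float, consistency: float) -> float:
    """Error of an output unit with target 1.0; the two contributions add."""
    return 1.0 - unit_state(frequency + consistency)


# Hypothetical contributions to the summed input (arbitrary units).
LOW_FREQ, HIGH_FREQ = 1.0, 3.0
IRREGULAR, REGULAR = 0.0, 2.0

# Consistency (regularity) effect on error at each frequency level.
consistency_effect_low = error(LOW_FREQ, IRREGULAR) - error(LOW_FREQ, REGULAR)
consistency_effect_high = error(HIGH_FREQ, IRREGULAR) - error(HIGH_FREQ, REGULAR)

# Frequency effect on error for each item type.
frequency_effect_regular = error(LOW_FREQ, REGULAR) - error(HIGH_FREQ, REGULAR)
frequency_effect_irregular = error(LOW_FREQ, IRREGULAR) - error(HIGH_FREQ, IRREGULAR)

print(f"consistency effect, low vs high frequency: "
      f"{consistency_effect_low:.3f} vs {consistency_effect_high:.3f}")
print(f"frequency effect, irregular vs regular:    "
      f"{frequency_effect_irregular:.3f} vs {frequency_effect_regular:.3f}")
```

Although the inputs are strictly additive, the non-linear squashing function yields a larger frequency effect for the irregular item than for the regular one.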

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I–O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
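The type and token frequency of each plural affix follows directly from this table. The sketch below copies the stem, plural form, and relative frequency of each item from the Appendix and tallies, per affix, how many different stems it attaches to (types) and how often it is presented per block of plural trials (tokens). The variable and function names are ours, and the affix is taken to be whatever precedes the final stem-length portion of the plural form.

```python
from collections import Counter

# (stem, plural form, relative frequency, regularity), copied from the Appendix.
WORD_FORMS = [
    ("garth", "bugarth", 5, "R"), ("pid", "bupid", 1, "R"),
    ("lant", "bulant", 5, "R"), ("tib", "butid", 1, "R"),
    ("poon", "bupoon", 5, "R"), ("prill", "buprill", 1, "R"),
    ("dram", "budram", 5, "R"), ("hize", "buhize", 1, "R"),
    ("bisk", "bubisk", 5, "R"), ("breen", "bubreen", 1, "R"),
    ("feem", "gofeem", 5, "I"), ("brol", "gubrol", 1, "I"),
    ("charp", "zecharp", 5, "I"), ("naig", "zonaig", 1, "I"),
    ("chonk", "nuchonk", 5, "I"), ("woop", "niwoop", 1, "I"),
    ("fant", "vefant", 5, "I"), ("zoze", "vuzoze", 1, "I"),
    ("kag", "rekag", 5, "I"), ("pisc", "ropisc", 1, "I"),
]


def prefix(stem: str, plural: str) -> str:
    """Return the characters of the plural that precede the stem-length tail."""
    return plural[: len(plural) - len(stem)]


type_counts: Counter = Counter()   # number of different stems per affix
token_counts: Counter = Counter()  # presentations per block per affix
for stem, plural, frequency, _regularity in WORD_FORMS:
    affix = prefix(stem, plural)
    type_counts[affix] += 1
    token_counts[affix] += frequency

for affix in sorted(token_counts, key=token_counts.get, reverse=True):
    print(f"{affix}-: types = {type_counts[affix]}, tokens = {token_counts[affix]}")
```

On this count the regular affix is experienced far more often than any single irregular affix, which is the sense in which regularity and frequency are confounded in the materials.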


frequency ones. (2) Regular inflections are not stored in associative memory but are generated by a rule-based symbolic system, the time to produce the inflected form simply reflecting (a) the time to access the lemma form, and (b) the time to bind procedurally the regular inflectional affix. Thus there are no frequency effects on their production latencies. For example, walk and afford are both quite common in their stem forms, but the past tense form walked is much more common than is afforded. Nevertheless, a rule-generated account predicts that afforded will be produced as quickly as walked, since the stem forms, being equally frequent, are equally readily accessed, and it takes a constant amount of time to add an -ed ending.

Beck (1995) reports similar regularity by frequency interactions in the latencies of productions of non-native speakers, and thus broadens the application of this account to second language learning. Indeed, the effect is generally cited as key evidence for the existence of symbol-manipulating rules in a specifically linguistic mental module underpinning both first and second language acquisition (Eubank & Gregg, 1995; Pinker, 1991).

It is an elegant and attractive argument, and the latency of production data are indeed consistent with such an account. But there are two problems. The first is that there is a simpler, more parsimonious explanation. In this article we will show that a basic principle of learning, the power law of practice, also generates frequency by regularity interactions. Thus these behavioural dissociations between “regular” and “irregular” forms are equally consistent with connectionist accounts of morphosyntax. The second problem is that, although these theories are trying to explain both language processing and language acquisition, these particular data come from highly fluent language users. It is difficult to gain an understanding of learning and development from observations of the final state, when we have no record of the content of the learners’ years of exposure to language, nor of the developmental course of their proficiencies. If we want to understand learning, we must study it directly.

The present report therefore describes adult acquisition of second language morphology using a miniature artificial language (MAL) where frequency and regularity are factorially combined. The accuracy and latency data demonstrate frequency effects for both regular and irregular forms early on in the acquisition process. However, as learning progresses, the frequency effect for regular items diminishes, although it remains for irregular items. The results thus converge on the end-point described by Prasada et al. (1990). However, they also show how subjects reach this end-point, with the convergence of performance on high and low frequency regular plurals indexing the rate of acquisition of the regular pattern. We next describe a simple connectionist model which was exposed to the same exemplars in the same order as the human subjects. The results of these simulations closely parallel those of the human learners: there are initially frequency effects for both the regular and irregular forms, but with increased exposure the frequency effect for regular forms is attenuated. Thus a connectionist system which has no “rules” can duplicate this “rule-like” behaviour. Rather, as shown by Plaut, McClelland, Seidenberg, and Patterson (1996) for the case of reading, the frequency by regularity interaction is a natural and necessary result of associative learning processes.

HUMAN LEARNING

If we wish to investigate the effects of input and practice on the acquisition of language structure, then we need a proper record of learner input. Yet it is virtually impossible to gather a complete corpus of learners’ exposure and production of natural language. How can we ascertain how many types and tokens of regular and irregular inflections have been processed by, for example, learners of English or of German? At best, for natural language, we can only guess by extrapolation of frequency counts from language corpora and unverifiable assumptions about registers. Much of the dispute about the implications of the regularity by frequency effect in morphosyntax centres on such assumptions (Bybee, 1995; Marcus et al., 1995; Plunkett & Marchman, 1991; Prasada & Pinker, 1993; Rumelhart & McClelland, 1986). One way around this is to have people learn a miniature artificial language (MAL) under laboratory conditions.

There is a rich tradition of using MALs to investigate processes of acquisition of native language (Braine, Brody, Brooks, Sudhalter, Ross, Catalano, & Fisch, 1990; Moeser & Bregman, 1972; Morgan, Meier, & Newport, 1987; Morgan & Newport, 1981; Palermo & Howe, 1970; Winter & Reber, 1994) and second and foreign languages (MacWhinney, 1983; McLaughlin, 1980; Yang & Givon, 1997). The number of published studies is at least in the hundreds, if not more. This is because MAL experiments have many advantages. They allow (a) a complete log of exposure to be recorded, (b) accuracy to be monitored at each point, (c) factorial manipulation of the potential independent variables of interest and the teasing apart of naturally confounded effects, and (d) relatively rapid collection of data. But these advantages are bought at the cost of reduced ecological validity: (1) MALs are toy languages when compared to the true complexity of natural language, (2) the period of study falls far short of lifespan practice, (3) laboratory learning exposure conditions are far from naturalistic, and (4) volunteer learners are often atypical in their motivations and demographics. All of these very real problems of laboratory research stem from the sacrifices made necessary by the goals of experimental control and microanalysis of learning in real time. This is the classic “experimenter’s dilemma”. Naturalistic situations limit experimental control and thus the internal logical validity of research; laboratory research limits ecological validity (Jung, 1971).

In adopting MAL research we are not denying naturalistic field studies. We might caricature the former as providing valid descriptions of artificial language learning and the latter as providing tentative descriptions of natural language learning. However, the use of a MAL in this study avoids at least three problems that have plagued similar experiments using natural languages (Beck, 1995): (1) uncertainty whether frequencies derived from corpora accurately represent input to learners, (2) problems attributed to interference from phonologically similar items in regular and irregular sets (e.g. lean, lend or fly, flow) or derived forms (e.g. head as a verb derived from a noun), and (3) evidence from only an advanced stage of learning, forcing reliance on logical argumentation rather than empirical evidence to describe acquisition.

Subjects

Seven monolingual English volunteers from the School of Psychology volunteer panel served as subjects. There were three males and four females. They were aged between 18 and 40 years old. They were paid £2.50 per hour for their involvement. They usually worked an hour a day at the experiment.

The Miniature Artificial Language

Moeser and Bregman (1972) criticised the generalisability of MAL experiments which involved subjects listening to strings of words from semantically empty languages, because some syntactic rules that were easily acquired when the MAL referred to a stimulus world were not acquired when it did not. The MAL in the present study therefore incorporated reference. The subjects’ initial task was to learn MAL names for 20 picture stimuli. They were told that they were learning vocabulary in a new language. The pictures, drawn from Snodgrass and Vanderwart (1980), are described in the Appendix, along with the stem form of their MAL names and their corresponding plural forms. Like Braine et al. (1990), we chose MAL names which were suggestive of English cognates in order to make them readily learnable; thus, for example, the MAL words for umbrella and fish are, respectively, “brol” and “pisc”. To the degree that the task only involves ostensive definition and is not embedded in a larger goal-directed setting, it is acknowledgedly limited as an analogue of natural language vocabulary acquisition. However, it allows clean and precise experimental control whilst providing a reasonable model of ostensive vocabulary learning that occurs to some considerable degree in L1 and even more so in intentional foreign language learning.

Subjects learned the stem forms before studying the plural forms. In the stem learning phase, all items appeared equally often. In the subsequent plural learning phase, in order to maximise the sensitivity of the reaction time (RT) measure, plurality in the MAL was marked by a prefix. Half of the items had a regular plural marker (“bu-”); the remaining 10 items had idiosyncratic affixes, as shown in the Appendix. The use of a prefix inflectional system afforded the additional advantage of minimising transfer effects from the subjects’ first language since, although it is found in natural languages like Ndebele, it is quite different from English plural formation. Thus the MAL was designed with English cognates in order to promote positive transfer of learning of the stem forms, and a very different inflection system in order to exclude any morphological transfer. Frequency was factorially crossed with regularity, with half of each set being presented five times more often. The high and low frequency irregular items were matched for initial phoneme to control voice onset time.

Method

The experiment was controlled by a Macintosh LCIII computer programmed with PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Model pronunciations of the MAL lexis, spoken by the first author, were recorded using MacRecorder. Subjects’ vocal reaction times were measured using a voice key.

Stem Learning. Subjects first learned the stem forms of the MAL. This phase consisted of blocks of 20 trials. In each block every picture appeared once in a randomly chosen order, so the subjects’ frequency of exposure to all of the stem forms was the same. Each trial consisted of the following sequence: (1) one of the pictures appeared mid-screen for 2 sec; (2) if the subject thought they knew the picture name, they spoke it into the microphone as quickly as possible; (3) 2 sec after picture onset, the computer spoke the correct name for the picture; (4) the experimenter marked the subject’s utterance as correct or not by pressing one of two keys. The dependent variables were thus correctness and RT. These blocks of trials were repeated until the subjects knew the MAL names for the pictures and could begin uttering them within 2 sec of stimulus onset to a criterion of 100% correct on two successive blocks. At this point they graduated to the plural learning phase.
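The stopping rule for this phase can be stated compactly. The following sketch is our own illustration of the criterion of 100% correct on two successive blocks; the function name and the example accuracy sequence are hypothetical and are not data from the experiment.

```python
from typing import Optional, Sequence


def blocks_to_criterion(block_accuracy: Sequence[float],
                        criterion: float = 1.0,
                        run_length: int = 2) -> Optional[int]:
    """Return how many blocks had been completed when the criterion was first
    met on `run_length` successive blocks, or None if it was never met."""
    run = 0
    for blocks_done, accuracy in enumerate(block_accuracy, start=1):
        # Extend the current run of criterion-level blocks, or reset it.
        run = run + 1 if accuracy >= criterion else 0
        if run == run_length:
            return blocks_done
    return None


# Hypothetical proportion-correct scores for one learner over six blocks.
example = [0.40, 0.80, 1.00, 0.95, 1.00, 1.00]
print(blocks_to_criterion(example))
```

A single perfect block followed by an error resets the count, so the learner in this example graduates only after the sixth block.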

Plural Learning. This phase used the same procedures, except that each block consisted of 80 trials presented in random order: (1) one presentation of each of the 20 singular forms as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials just one picture appeared mid-screen; on the plural trials a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3, range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
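The composition of a plural learning block can be written out as follows. This is a sketch of the design just described, not the PsyScope script used in the experiment; the item labels are placeholders, and the seeded random generator is our own choice so that the example is reproducible.

```python
import random
from collections import Counter

# (condition, number of items, presentations per item per block), as in the text.
BLOCK_DESIGN = [
    ("singular", 20, 1),
    ("HiFreqReg", 5, 5),
    ("HiFreqIrreg", 5, 5),
    ("LoFreqReg", 5, 1),
    ("LoFreqIrreg", 5, 1),
]


def build_plural_block(rng: random.Random) -> list:
    """Return one block of (condition, item label) trials in random order."""
    trials = []
    for condition, n_items, presentations in BLOCK_DESIGN:
        for item in range(n_items):
            # Placeholder item labels; the real items are listed in the Appendix.
            trials.extend([(condition, f"{condition}-{item}")] * presentations)
    rng.shuffle(trials)
    return trials


block = build_plural_block(random.Random(0))
counts = Counter(condition for condition, _item in block)
print(len(block), dict(counts))
```

Each block therefore contains 20 singular trials and 60 plural trials, of which 50 involve high frequency items and 10 involve low frequency items.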

Results

Stem Learning. The stem learning data will only be presented in summary, since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns], Frequency [F(1, 16) = 0.569, ns], Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects partook of between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis, F(1, 5939) = 73.52, P < 0.001; by subjects, F(1, 6) = 12.41, P < 0.02; by words, F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis, F(1, 5123) = 650.74, P < 0.001; by subjects, F(1, 6) = 63.08, P < 0.001; by words, F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis, F(1, 5123) = 10.62, P < 0.001; by subjects, F(1, 6) = 3.26, ns; by words, F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis, F(1, 5123) = 20.92, P < 0.001; by subjects, F(1, 6) = 10.15, P < 0.02; by words, F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function fits the data better than does a linear function, the R²s for the power function fits being, respectively: HiFreqReg, 0.94; HiFreqIrreg, 0.97; LoFreqReg, 0.74; LoFreqIrreg, 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency "floor" governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
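The comparison of power and linear fits can be illustrated with a short sketch. The fitting procedure used for the human data is not described in the text, so the code below makes its own assumptions: the linear function is fitted by ordinary least squares, the power function by least squares on log-transformed values, and R² is computed on the untransformed latencies. The latencies are invented and noise-free; they are not the experimental data, and the names compare_fits, linear_fit, and r_squared are hypothetical.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(ys, predicted):
    """Proportion of variance in ys accounted for by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def compare_fits(blocks, rts):
    """Return (R2 of linear fit, R2 of power fit) for latency by block."""
    # Linear function: RT = c + m * block
    c, m = linear_fit(blocks, rts)
    linear_pred = [c + m * b for b in blocks]
    # Power function: RT = B * block ** slope, fitted as a straight line
    # in log-log coordinates (log RT = log B + slope * log block).
    log_b, slope = linear_fit([math.log(b) for b in blocks],
                              [math.log(t) for t in rts])
    power_pred = [math.exp(log_b) * b ** slope for b in blocks]
    return r_squared(rts, linear_pred), r_squared(rts, power_pred)

# Invented latencies (ms) over 15 blocks, for illustration only.
blocks = list(range(1, 16))
rts = [2000.0 * b ** -0.4 for b in blocks]
r2_linear, r2_power = compare_fits(blocks, rts)
```

With real, noisy latencies both R² values would fall below 1; the point of the sketch is only the form of the comparison.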

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition, (b) that the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.), and (c) that the size of the frequency effect on irregular items similarly diminishes with learning, but does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law, in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling, whatever our performance measure.
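The convergence argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that latency follows a power function of the number of exposures and that regularity acts like a fixed number of additional exposures (here 30); the values of B, a, the exposure counts, and the function names are all invented and are not estimates from the experiment.

```python
def latency(n_exposures, b=2000.0, a=0.5):
    """Power law of practice, T = B * N ** (-a); B and a are invented values."""
    return b * n_exposures ** (-a)

def regularity_effect(item_exposures, extra_exposures=30):
    """Latency advantage of a regular over an irregular item.

    Assumption for illustration: a regular item behaves as if it had
    received `extra_exposures` additional trials of practice.
    """
    irregular = latency(item_exposures)
    regular = latency(item_exposures + extra_exposures)
    return irregular - regular

# Low and high frequency items after the same amount of training,
# assuming a 1:5 ratio of exposures.
effect_low = regularity_effect(item_exposures=10)
effect_high = regularity_effect(item_exposures=50)
```

Under these assumptions the same increment of practice yields a much larger latency difference for the low frequency item than for the high frequency one, which is the form of the regularity by frequency interaction.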

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effect among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners' acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that (a) they are neurally inspired, (b) they incorporate distributed representation and control of information, (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts, (d) they show graceful degradation, as do humans with language disorder, and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no "rule": "it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by 'on-line' generalisations from the stored exemplars" (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently concerned neither with low-level feature perception nor with the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input–output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered and remained constant was that the same O-unit was always reinforced whenever a particular I-unit was activated.
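The localist coding just described can be sketched as follows. This is an illustration only: it uses the unit numbering given in the text as a fixed scheme, whereas the simulations randomised the numbering across models, and the function names encode_input and encode_target are hypothetical.

```python
N_INPUT_UNITS = 22        # I-units 1-21: pictures; I-unit 22: plurality
N_OUTPUT_UNITS = 32       # O-units 1-21: stems; O-units 22-32: plural affixes
PLURAL_UNIT = 22          # I-unit coding singular (off) versus plural (on)
REGULAR_AFFIX_UNIT = 32   # O-unit for the regular plural affix

def encode_input(picture, plural):
    """Input pattern for a picture (1-21), with the plurality unit on or off."""
    if not 1 <= picture <= 21:
        raise ValueError("picture must be between 1 and 21")
    pattern = [0.0] * N_INPUT_UNITS
    pattern[picture - 1] = 1.0          # units are numbered from 1 in the text
    if plural:
        pattern[PLURAL_UNIT - 1] = 1.0
    return pattern

def encode_target(stem, affix=None):
    """Target pattern: one stem unit (1-21) plus, for plurals, one affix unit (22-32)."""
    if not 1 <= stem <= 21:
        raise ValueError("stem must be between 1 and 21")
    pattern = [0.0] * N_OUTPUT_UNITS
    pattern[stem - 1] = 1.0
    if affix is not None:
        if not 22 <= affix <= 32:
            raise ValueError("affix must be between 22 and 32")
        pattern[affix - 1] = 1.0
    return pattern

# A regular plural trial for picture 3, and the singular trial for TesterP.
plural_input = encode_input(3, plural=True)
plural_target = encode_target(3, affix=REGULAR_AFFIX_UNIT)
tester_input = encode_input(21, plural=False)
```

Because every pattern has exactly one picture unit active, no two items are more similar to each other than any other two, which is the property the text relies on.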

We investigated four different classes of model, which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we report for each model are the average performance of these five examples.
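The trial-by-trial procedure described above can be sketched in code. The following is a minimal illustration rather than the simulator actually used: it assumes logistic (sigmoid) units, a bias weight on each unit, a learning rate of 0.5, initial weights drawn uniformly from [-0.5, 0.5], a fixed order of presentation, and a fixed rather than randomised unit numbering, none of which is specified in the text. ThreeLayerNet and train_stems are hypothetical names, and only 30 epochs are run to keep the example quick.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    """Feed-forward network with one hidden layer, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, rate=0.5, seed=0):
        rng = random.Random(seed)
        # Row j holds the weights into unit j; the last entry is its bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]
        self.rate = rate

    def forward(self, inputs):
        x = list(inputs) + [1.0]                       # append bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)))
                  for row in self.w_hidden]
        h = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, h)))
                  for row in self.w_out]
        return hidden, output

    def train_trial(self, inputs, target):
        """One learning trial; returns the squared error before the update."""
        hidden, output = self.forward(inputs)
        x = list(inputs) + [1.0]
        h = hidden + [1.0]
        # Error signal at each output unit (difference times sigmoid slope).
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
        # Error propagated back to each hidden unit via the current weights.
        d_hid = [hj * (1.0 - hj) *
                 sum(d_out[k] * self.w_out[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(hidden)]
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] += self.rate * d_out[k] * h[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] += self.rate * d_hid[j] * x[i]
        return sum((t - o) ** 2 for t, o in zip(target, output))

def one_hot(index, size):
    pattern = [0.0] * size
    pattern[index] = 1.0
    return pattern

def train_stems(net, n_epochs):
    """Singular training: 21 trials per epoch, plurality unit always off."""
    trials = [(one_hot(i, 22), one_hot(i, 32)) for i in range(21)]
    epoch_errors = []
    for _ in range(n_epochs):
        epoch_errors.append(sum(net.train_trial(x, t) for x, t in trials))
    return epoch_errors

net = ThreeLayerNet(n_in=22, n_hidden=8, n_out=32)
errors = train_stems(net, n_epochs=30)
```

The summed squared error per epoch falls as training proceeds; how quickly it falls depends on the assumed learning rate and initial weights.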

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
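The composition of a plural-training epoch can be checked with a short sketch. The allocation of picture numbers to the four conditions below is hypothetical (the real allocation is given in the Appendix and was tied to particular items), and build_epoch is an invented name; only the counts are taken from the text.

```python
import random

# Hypothetical allocation of pictures 1-20 to the four conditions.
HI_FREQ_REG = [1, 2, 3, 4, 5]
HI_FREQ_IRREG = [6, 7, 8, 9, 10]
LO_FREQ_REG = [11, 12, 13, 14, 15]
LO_FREQ_IRREG = [16, 17, 18, 19, 20]

def build_epoch(rng):
    """Return one epoch as a shuffled list of (picture, is_plural) trials."""
    # (a) one singular presentation of each of the 21 pictures
    trials = [(picture, False) for picture in range(1, 22)]
    # (b)-(e) plural presentations: five per high frequency item, one per low
    schedule = [(HI_FREQ_REG, 5), (HI_FREQ_IRREG, 5),
                (LO_FREQ_REG, 1), (LO_FREQ_IRREG, 1)]
    for pictures, repetitions in schedule:
        for picture in pictures:
            trials.extend([(picture, True)] * repetitions)
    rng.shuffle(trials)
    return trials

epoch = build_epoch(random.Random(0))

# Tokens of the single regular affix per epoch, summed over all regular items.
regular_affix_tokens = sum(1 for picture, is_plural in epoch
                           if is_plural and picture in HI_FREQ_REG + LO_FREQ_REG)
# Tokens of one high frequency irregular affix (each irregular affix is unique).
one_irregular_affix_tokens = sum(1 for picture, is_plural in epoch
                                 if is_plural and picture == HI_FREQ_IRREG[0])
```

The counts show why regularity can be read as frequency at the level of the affix: the regular affix is reinforced on 30 trials per epoch, whereas any one irregular affix is reinforced on at most five.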

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations correct activation is almost perfectly correlated with MSE (for example, r = -0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, ns; (b) human/model by frequency, F(1, 32) = 0.47, ns; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-), in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                              Model Size
Measure                  3HU(a)   5HU     8HU     15HU

RMS error(b)
  M                      0.81     0.79    0.53    0.45
  SD                     0.43     0.50    0.45    0.32

Activation weight(c)
  M                      0.20     0.28    0.57    0.60
  SD                     0.44     0.44    0.52    0.35

N hits (/5)(d)           1        2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
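The two generalisation measures can be sketched as follows. The exact normalisation of the RMS error is not given in the text; this sketch assumes the root of the mean squared difference over the 11 affix units. The observed activations are invented, the regular affix is placed first to match the target pattern quoted above, and the function names are hypothetical.

```python
import math

N_AFFIX_UNITS = 11   # O-units 22-32
REGULAR = 0          # position of the regular affix in the pattern quoted in the text

def rms_error(observed, target):
    """Root mean square difference between observed and target activations."""
    squared = [(o - t) ** 2 for o, t in zip(observed, target)]
    return math.sqrt(sum(squared) / len(squared))

def squared_distance(a, b):
    """Squared Euclidean distance between two activation patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_affix(observed):
    """Position of the one-unit affix target pattern nearest to the observed output."""
    targets = [[1.0 if i == k else 0.0 for i in range(N_AFFIX_UNITS)]
               for k in range(N_AFFIX_UNITS)]
    return min(range(N_AFFIX_UNITS),
               key=lambda k: squared_distance(observed, targets[k]))

# Invented output for the generalisation item presented as a plural.
observed = [0.6, 0.1, 0.2, 0.0, 0.05, 0.0, 0.1, 0.0, 0.0, 0.15, 0.0]
regular_target = [1.0] + [0.0] * (N_AFFIX_UNITS - 1)

error = rms_error(observed, regular_target)
choice = closest_affix(observed)
is_hit = (choice == REGULAR)
```

Because each target pattern has a single active unit, the nearest pattern under this metric is always the one whose unit carries the highest observed activation.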

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach (a frequency effect for irregulars and the absence of such an effect for regulars) is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

Although it is a general principle, the degree to which it applies depends on a range of factors including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
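The asymptote argument can be checked numerically. The following is a minimal sketch, assuming (as in Fig. 5) that regularity and frequency make additive contributions to the input x of the power function; the function names, the unit regularity increment, and the two comparison points are illustrative assumptions, not values taken from the experiment.

```python
def performance(x: float, n: float = 1.0) -> float:
    """Asymptoting power function y = 1 - (x * n) ** -2, used here for x * n >= 1."""
    return 1.0 - (x * n) ** -2


def regularity_effect(frequency: float, increment: float = 1.0, n: float = 1.0) -> float:
    """Gain in performance from adding a constant regularity increment
    to the input at a given level of frequency (additive assumption)."""
    return performance(frequency + increment, n) - performance(frequency, n)


# Hypothetical comparison points: low (1) and high (5) frequency of practice.
for n in (1.0, 2.0):
    low = regularity_effect(1.0, n=n)
    high = regularity_effect(5.0, n=n)
    print(f"n={n}: regularity effect at low frequency {low:.4f}, at high frequency {high:.4f}")
```

For either value of n tried here, the same increment of regularity buys a much larger gain at the low-frequency point than at the high-frequency point; changing n rescales the horizontal axis without altering that ordering.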

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input–output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input–output mappings are consistent with its own and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
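The ceiling effect described above can be illustrated with a single output unit. This is a minimal sketch, assuming a logistic input–output function (the text specifies only that the function is non-linear and that unit states asymptote); the function names and the numerical contributions assigned to frequency and consistency are hypothetical.

```python
import math


def unit_state(summed_input: float) -> float:
    """Logistic activation (an assumption): the state rises towards 1.0 as input grows."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error(summed_input: float) -> float:
    """Error for an output unit that should be active (target state 1.0)."""
    return 1.0 - unit_state(summed_input)


def consistency_benefit(frequency_input: float, consistency_input: float = 1.0) -> float:
    """Error reduction gained by adding the consistency contribution to the
    summed input that frequency alone would provide (the contributions sum)."""
    return error(frequency_input) - error(frequency_input + consistency_input)


# Hypothetical summed inputs contributed by low- and high-frequency training.
low_freq, high_freq = 0.5, 3.0
print(f"benefit of consistency at low frequency:  {consistency_benefit(low_freq):.3f}")
print(f"benefit of consistency at high frequency: {consistency_benefit(high_freq):.3f}")
```

Because the two contributions are simply added before passing through the non-linear function, the same consistency contribution reduces error far more when the frequency contribution is small than when it is already large.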

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate input–output associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.

Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.

APPENDIX

The Word-forms of the Artificial Language

Picture    Stem    Plural Form  Frequency  Regularity

car        garth   bugarth      5          R
bed        pid     bupid        1          R
lamp       lant    bulant       5          R
table      tib     butid        1          R
plane      poon    bupoon       5          R
ball       prill   buprill      1          R
train      dram    budram       5          R
house      hize    buhize       1          R
book       bisk    bubisk       5          R
broom      breen   bubreen      1          R
phone      feem    gofeem       5          I
umbrella   brol    gubrol       1          I
chair      charp   zecharp      5          I
horse      naig    zonaig       1          I
monkey     chonk   nuchonk      5          I
dog        woop    niwoop       1          I
elephant   fant    vefant       5          I
scissors   zoze    vuzoze       1          I
kite       kag     rekag        5          I
fish       pisc    ropisc       1          I

R = regular; I = irregular.

frequency effects for both the regular and irregular forms, but with increased exposure so the frequency effect for regular forms is attenuated. Thus a connectionist system which has no “rules” can duplicate this “rule-like” behaviour. Rather, as shown by Plaut, McClelland, Seidenberg, and Patterson (1996) for the case of reading, the frequency by regularity interaction is a natural and necessary result of associative learning processes.

HUMAN LEARNING

If we wish to investigate the effects of input and practice on the acquisition of language structure, then we need a proper record of learner input. Yet it is virtually impossible to gather a complete corpus of learners’ exposure and production of natural language. How can we ascertain how many types and tokens of regular and irregular inflections have been processed by, for example, learners of English or of German? At best, for natural language, we can only guess by extrapolation of frequency counts from language corpora and unverifiable assumptions about registers. Much of the dispute about the implications of the regularity by frequency effect in morphosyntax centres on such assumptions (Bybee, 1995; Marcus et al., 1995; Plunkett & Marchman, 1991; Prasada & Pinker, 1993; Rumelhart & McClelland, 1986). One way around this is to have people learn a miniature artificial language (MAL) under laboratory conditions.

There is a rich tradition of using MALs to investigate processes of acquisition of native language (Braine, Brody, Brooks, Sudhalter, Ross, Catalano, & Fisch, 1990; Moeser & Bregman, 1972; Morgan, Meier, & Newport, 1987; Morgan & Newport, 1981; Palermo & Howe, 1970; Winter & Reber, 1994) and of second and foreign languages (MacWhinney, 1983; McLaughlin, 1980; Yang & Givon, 1997). The published studies number at least in the hundreds. This is because MAL experiments have many advantages. They allow (a) a complete log of exposure to be recorded, (b) accuracy to be monitored at each point, (c) factorial manipulation of the potential independent variables of interest and the teasing apart of naturally confounded effects, and (d) relatively rapid collection of data. But these advantages are bought at the cost of reduced ecological validity: (1) MALs are toy languages when compared to the true complexity of natural language, (2) the period of study falls far short of lifespan practice, (3) laboratory learning exposure conditions are far from naturalistic, and (4) volunteer learners are often atypical in their motivations and demographics. All of these very real problems of laboratory research stem from the sacrifices made necessary by the goals of experimental control and microanalysis of learning in real time. This is the classic “experimenter’s dilemma”: naturalistic situations limit experimental control, and thus the internal logical validity of research; laboratory research limits ecological validity (Jung, 1971).

In adopting MAL research we are not denying naturalistic field studies. We might caricature the former as providing valid descriptions of artificial language learning and the latter as providing tentative descriptions of natural language learning. However, the use of a MAL in this study avoids at least three problems that have plagued similar experiments using natural languages (Beck, 1995): (1) uncertainty whether frequencies derived from corpora accurately represent input to learners; (2) problems attributed to interference from phonologically similar items in regular and irregular sets (e.g. lean, lend or fly, flow) or derived forms (e.g. head as a verb derived from a noun); and (3) evidence from only an advanced stage of learning, forcing reliance on logical argumentation rather than empirical evidence to describe acquisition.

Subjects

Seven monolingual English volunteers from the School of Psychology volunteer panel served as subjects. There were three males and four females, aged between 18 and 40 years. They were paid £2.50 per hour for their involvement. They usually worked an hour a day at the experiment.

The Miniature Artificial Language

Moeser and Bregman (1972) criticised the generalisability of MAL experiments which involved subjects listening to strings of words from semantically empty languages, because some syntactic rules that were easily acquired when the MAL referred to a stimulus world were not acquired when it did not. The MAL in the present study therefore incorporated reference. The subjects’ initial task was to learn MAL names for 20 picture stimuli. They were told that they were learning vocabulary in a new language. The pictures, drawn from Snodgrass and Vanderwart (1980), are described in the Appendix along with the stem form of their MAL names and their corresponding plural forms. Like Braine et al. (1990), we chose MAL names which were suggestive of English cognates in order to make them readily learnable; thus, for example, the MAL words for umbrella and fish are, respectively, “brol” and “pisc”. To the degree that the task only involves ostensive definition and is not embedded in a larger goal-directed setting, it is admittedly limited as an analogue of natural language vocabulary acquisition. However, it allows clean and precise experimental control whilst providing a reasonable model of ostensive vocabulary learning that occurs to some considerable degree in L1 and even more so in intentional foreign language learning.

Subjects learned the stem forms before studying the plural forms. In the stem learning phase, all items appeared equally often. In the subsequent plural learning phase, in order to maximise the sensitivity of the reaction time (RT) measure, plurality in the MAL was marked by a prefix. Half of the items had a regular plural marker (“bu-”); the remaining 10 items had idiosyncratic affixes, as shown in the Appendix. The use of a prefix inflectional system afforded the additional advantage of minimising transfer effects from the subjects’ first language since, although it is found in natural languages like Ndebele, it is quite different from English plural formation. Thus the MAL was designed with English cognates in order to promote positive transfer of learning of the stem forms, and a very different inflection system in order to exclude any morphological transfer. Frequency was factorially crossed with regularity, with half of each set being presented five times more often. The high and low frequency irregular items were matched for initial phoneme to control voice onset time.
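The sense in which regularity amounts to a kind of frequency in this design can be made concrete by tallying the prefix tokens in a single plural learning block from the Appendix. The following is a minimal sketch; the helper names are hypothetical, the word-forms are copied as printed in the Appendix, and the prefix is simply taken to be whatever precedes a stem-length tail of the plural form.

```python
from collections import Counter

# (picture, stem, plural, relative frequency, regularity), as printed in the Appendix.
LEXICON = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def prefix_of(stem: str, plural: str) -> str:
    """Plural prefix: the part of the plural preceding its stem-length tail."""
    return plural[: len(plural) - len(stem)]


def affix_tokens_per_block(lexicon) -> Counter:
    """Count how many plural tokens carry each prefix in one plural learning block."""
    counts = Counter()
    for _picture, stem, plural, frequency, _regularity in lexicon:
        counts[prefix_of(stem, plural)] += frequency
    return counts


counts = affix_tokens_per_block(LEXICON)
for prefix, tokens in counts.most_common():
    print(f"{prefix}-: {tokens} tokens per block")
```

On this tally the regular prefix bu- is met 30 times per block, whereas each idiosyncratic prefix is met only 5 times or once, so even a low frequency regular item shares in far more practice of its affix than a high frequency irregular item does.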

Method

The experiment was controlled by a Macintosh LCIII computer programmed with PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Model pronunciations of the MAL lexis, spoken by the first author, were recorded using MacRecorder. Subjects’ vocal reaction times were measured using a voice key.

Stem Learning. Subjects first learned the stem forms of the MAL. This phase consisted of blocks of 20 trials. In each block every picture appeared once in a randomly chosen order, so that the subjects’ frequency of exposure to all of the stem forms was the same. Each trial consisted of the following sequence: (1) one of the pictures appeared mid-screen for 2 sec; (2) if the subject thought they knew the picture name, they spoke it into the microphone as quickly as possible; (3) 2 sec after picture onset, the computer spoke the correct name for the picture; (4) the experimenter marked the subject’s utterance as correct or not by pressing one of two keys. The dependent variables were thus correctness and RT. These blocks of trials were repeated until the subjects knew the MAL names for the pictures and could begin uttering them within 2 sec of stimulus onset, to a criterion of 100% correct on two successive blocks. At this point they graduated to the plural learning phase.
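The stopping rule for this phase can be stated compactly. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each block is summarised by its number of correct responses out of 20 trials.

```python
TRIALS_PER_BLOCK = 20  # one trial per picture in the stem learning phase


def reached_criterion(correct_per_block, required_run=2):
    """True if the most recent `required_run` blocks were all 100% correct."""
    if len(correct_per_block) < required_run:
        return False
    return all(c == TRIALS_PER_BLOCK for c in correct_per_block[-required_run:])


def blocks_to_criterion(correct_per_block, required_run=2):
    """Number of blocks completed when the criterion is first met, else None."""
    for n_blocks in range(required_run, len(correct_per_block) + 1):
        if reached_criterion(correct_per_block[:n_blocks], required_run):
            return n_blocks
    return None


# Hypothetical record of correct responses per block for one learner.
record = [8, 14, 20, 19, 20, 20]
print(blocks_to_criterion(record))
```

A single perfect block does not end the phase; in the hypothetical record above the perfect third block is followed by an error, so the count continues until two perfect blocks occur in succession.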

Plural Learning. This phase used the same procedures except that each block consisted of 80 trials presented in random order: (1) one presentation of each of the 20 singular forms, as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials just one picture appeared mid-screen; on the plural trials a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3, range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
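The composition of a plural learning block can be sketched as follows. This is an illustration under stated assumptions, not the PsyScope script used in the experiment: items are represented by hypothetical (condition, index) labels rather than by their word-forms, and Python's standard random module stands in for whatever randomisation the experiment software applied.

```python
import random
from collections import Counter

ITEMS_PER_CONDITION = 5
# Presentations of each plural form per block, by condition.
PLURAL_PRESENTATIONS = {
    "HiFreqReg": 5,
    "HiFreqIrreg": 5,
    "LoFreqReg": 1,
    "LoFreqIrreg": 1,
}


def build_plural_block(rng: random.Random) -> list:
    """Return one shuffled block of (number, condition, item_index) trials."""
    trials = []
    for condition, presentations in PLURAL_PRESENTATIONS.items():
        for item in range(ITEMS_PER_CONDITION):
            # Each of the 20 singular forms is presented once per block.
            trials.append(("singular", condition, item))
            # Plural forms are presented once or five times, by condition.
            trials.extend([("plural", condition, item)] * presentations)
    rng.shuffle(trials)
    return trials


block = build_plural_block(random.Random(0))
print(len(block))  # 20 singular + 25 + 25 + 5 + 5 plural = 80 trials
print(Counter((number, condition) for number, condition, _ in block))
```

Each block therefore contains 60 plural trials, 50 of them on high frequency items, which is what gives the fivefold frequency manipulation within both the regular and the irregular sets.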

Results

Stem Learning. The stem learning data will only be presented in summary, since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns]; Frequency [F(1, 16) = 0.569, ns]; Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects completed between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs. 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis F(1, 5939) = 73.52, P < 0.001; by subjects F(1, 6) = 12.41, P < 0.02; by words F(1, 16) = 27.73, P < 0.001]. A significant interaction of regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis F(1, 5123) = 650.74, P < 0.001; by subjects F(1, 6) = 63.08, P < 0.001; by words F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis F(1, 5123) = 10.62, P < 0.001; by subjects F(1, 6) = 3.26, ns; by words F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis F(1, 5123) = 20.92, P < 0.001; by subjects F(1, 6) = 10.15, P < 0.02; by words F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function fits the data better than does a linear function, the R²s for the power function fits being respectively: HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency "floor" governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
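The comparison of power and linear fits can be illustrated with a minimal sketch. The RT values below are synthetic (a power curve plus a small deterministic wobble), not the data of Fig. 1, and fitting the power function by least squares in log-log space is an assumption; the fitting procedure actually used is not stated in the text.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(ys, preds):
    """Proportion of variance in ys accounted for by preds."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


blocks = list(range(1, 16))
# Synthetic RTs in ms (illustrative only, NOT the paper's data).
rts = [2000.0 * n ** -0.4 + 20.0 * math.sin(3 * n) for n in blocks]

# Linear model: RT = c + m * block
c, m = linear_fit(blocks, rts)
r2_linear = r_squared(rts, [c + m * n for n in blocks])

# Power model: RT = B * block ** (-a), a straight line in log-log space
log_b, neg_a = linear_fit([math.log(n) for n in blocks],
                          [math.log(t) for t in rts])
r2_power = r_squared(rts, [math.exp(log_b) * n ** neg_a for n in blocks])

print(f"linear R^2 = {r2_linear:.3f}, power R^2 = {r2_power:.3f}")
```

For a curve that decelerates in this way, the power function accounts for more variance than the straight line, which is the pattern reported for all four RT curves.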

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition; (b) that the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.); and (c) that the size of the frequency effect on irregular items similarly diminishes with learning, but does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law, in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling, whatever our performance measure.
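The convergence argument can be made concrete with a small numerical sketch of the power function, in which latency is a constant B multiplied by the number of practice trials N raised to a negative exponent. The parameter values (B = 1000, exponent 0.5) are arbitrary illustrations, not estimates from the present data, and equating "trials of practice" with one or five exposures per block is a simplifying assumption.

```python
def latency(n_trials, b=1000.0, a=0.5):
    """Power law of practice: latency = b * n_trials ** (-a)."""
    return b * n_trials ** -a


def frequency_effect(block, ratio=5):
    """Latency gap between an item seen once per block and one seen
    `ratio` times per block, after `block` blocks of training."""
    return latency(block) - latency(ratio * block)


# The same 5:1 exposure ratio yields an ever smaller latency difference.
effects = [frequency_effect(b) for b in (1, 5, 15)]
for b, e in zip((1, 5, 15), effects):
    print(f"block {b:2d}: frequency effect = {e:6.1f} ms")
```

Under these assumptions the gap between the two curves never reaches zero, but it shrinks steadily with practice, so any fixed amount of measurement noise will eventually swamp it.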

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effect among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language learners' acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers, as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no "rule": "it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by 'on-line' generalisations from the stored exemplars" (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and in turn it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when in simulation 2 the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently concerned neither with low-level feature perception nor with the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others (other, that is, than is determined by the frequency of the input–output mappings). This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input–output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered and remained constant was that the same O-unit was always reinforced whenever a particular I-unit was activated.
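The localist coding just described can be sketched as follows. The unit numbering follows the text (1-based I- and O-units); the function names `encode_input` and `encode_target` are hypothetical, and the use of 1.0 and 0.0 as "on" and "off" values is an assumption consistent with the target values mentioned in the training description.

```python
N_INPUT, N_OUTPUT = 22, 32
PLURAL_I_UNIT = 22        # I-unit 22 codes plurality
REGULAR_AFFIX_O_UNIT = 32  # O-unit 32 is the regular plural affix


def encode_input(picture, plural):
    """One unit per picture (I-units 1..21) plus the plurality unit."""
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0
    if plural:
        vec[PLURAL_I_UNIT - 1] = 1.0
    return vec


def encode_target(picture, affix_unit=None):
    """Stem O-unit on; for plurals, one affix O-unit (22..32) as well."""
    vec = [0.0] * N_OUTPUT
    vec[picture - 1] = 1.0
    if affix_unit is not None:
        vec[affix_unit - 1] = 1.0
    return vec


# Singular trial for picture 3, and a regular plural trial for the same picture.
singular_in, singular_out = encode_input(3, False), encode_target(3)
plural_in, plural_out = encode_input(3, True), encode_target(3, REGULAR_AFFIX_O_UNIT)
```

Because every picture and every morpheme has exactly one unit, no two input patterns are more similar to each other than any other two, apart from sharing or not sharing the plurality unit.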

We investigated four different classes of model, which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus, we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model are the average performance of these five examples.
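The stem training procedure can be sketched with a minimal three-layer back-propagation network. This is not the simulator used for the reported results: the sigmoid activation function, the learning rate of 0.5, the initial weight range, the bias weights, the absence of momentum, and the shuffling of trials within an epoch are all assumptions, and the class name `ThreeLayerNet` is hypothetical. The demonstration run uses 50 epochs for speed, where the text reports 500.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v


class ThreeLayerNet:
    def __init__(self, n_in, n_hid, n_out, rng, lr=0.5):
        self.lr = lr
        # One row of weights per receiving unit; the last weight is a bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hid)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb = hid + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return hid, out

    def train(self, x, target):
        """One back-propagation step; returns the summed squared error."""
        hid, out = self.forward(x)
        xb, hb = x + [1.0], hid + [1.0]
        # Output deltas: error times the derivative of the sigmoid.
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
        # Hidden deltas: error propagated back through the output weights
        # (computed before those weights are changed).
        d_hid = [h * (1 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hid)]
        for row, d in zip(self.w_out, d_out):
            for j, v in enumerate(hb):
                row[j] += self.lr * d * v
        for row, d in zip(self.w_hid, d_hid):
            for j, v in enumerate(xb):
                row[j] += self.lr * d * v
        return sum((t - o) ** 2 for t, o in zip(target, out))


def stem_training(n_hid=8, epochs=500, seed=0):
    rng = random.Random(seed)
    net = ThreeLayerNet(22, n_hid, 32, rng)
    # 21 singular trials: I-unit k on, I-unit 22 off, target O-unit k on.
    patterns = [(one_hot(i, 22), one_hot(i, 32)) for i in range(21)]
    history = []
    for _ in range(epochs):
        rng.shuffle(patterns)
        history.append(sum(net.train(x, t) for x, t in patterns))
    return net, history


net, history = stem_training(epochs=50)
print(f"epoch 1 error {history[0]:.1f}, epoch 50 error {history[-1]:.1f}")
```

Varying `n_hid` over 3, 5, 8 and 15 and repeating with five seeds would correspond to the four model sizes and five examples per size described above.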

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction, whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.
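The RMS error measure over the plural affix units can be sketched as below. The formula (square root of the mean squared difference over the 11 affix units) is the conventional definition and is assumed here; the text does not spell out the exact computation, and the function names are hypothetical.

```python
import math

AFFIX_O_UNITS = range(22, 33)  # O-units 22..32, numbered from 1 as in the text


def affix_rms_error(output, target):
    """RMS difference between observed and target activation on O-units 22..32."""
    sq = [(output[u - 1] - target[u - 1]) ** 2 for u in AFFIX_O_UNITS]
    return math.sqrt(sum(sq) / len(sq))


def class_mean_rms(pairs):
    """Average the per-item RMS error over the items of one class."""
    errors = [affix_rms_error(o, t) for o, t in pairs]
    return sum(errors) / len(errors)


# Illustrative 32-unit patterns: target has stem unit 1 and affix unit 32 on.
target = [0.0] * 32
target[0] = target[31] = 1.0
silent = [0.0] * 32  # a network that activates nothing at all
print(affix_rms_error(target, target), affix_rms_error(silent, target))
```

Because only O-units 22–32 enter the calculation, the measure is insensitive to how well the stem itself is produced and reflects affix selection alone.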

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations F(1, 16) = 20.80, P < 0.0005; by words F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations F(1, 16) = 9.07, P < 0.01; by words F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations F(1, 16) = 4.85, P < 0.05; by words F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations F(14, 224) = 68.03, P < 0.0001; by words F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations F(14, 224) = 36.75, P < 0.0001; by words F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations F(14, 224) = 18.93, P < 0.0001; by words F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations F(14, 224) = 16.11, P < 0.0001; by words F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, ns; (b) human/model by frequency, F(1, 32) = 0.47, ns; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-), in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                   3HU(a)    5HU     8HU     15HU
RMS error(b)
   M                      0.81      0.79    0.53    0.45
   SD                     0.43      0.50    0.45    0.32
Activation weight(c)
   M                      0.20      0.28    0.57    0.60
   SD                     0.44      0.44    0.52    0.35
N hits (/5)(d)            1         2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation, using a squared Euclidean distance metric.
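The "hit" criterion, choosing the affix pattern closest to the observed output by squared Euclidean distance, can be sketched as follows. The activation values are invented for illustration, and placing the regular affix in the first position of the 11-unit vector simply mirrors the target pattern as written in the text (10000000000); the function names are hypothetical.

```python
REGULAR = 0  # position of the regular affix in the 11-unit affix vector (assumed)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def affix_patterns(n_units=11):
    """One candidate target per affix: that unit fully on, all others off."""
    return [[1.0 if i == j else 0.0 for i in range(n_units)]
            for j in range(n_units)]


def closest_affix(activations):
    """Index of the affix target pattern nearest to the observed activations."""
    dists = [squared_distance(activations, p)
             for p in affix_patterns(len(activations))]
    return dists.index(min(dists))


# Hypothetical output over the affix units for the plural of TesterP.
observed = [0.60, 0.05, 0.10, 0.02, 0.00, 0.03, 0.01, 0.00, 0.20, 0.04, 0.00]
choice = closest_affix(observed)
print("regular affix chosen" if choice == REGULAR else f"irregular affix {choice} chosen")
```

With one-unit target patterns, the nearest pattern is always the one whose unit is most active, so a hit means that the regular affix unit received more activation than any irregular affix unit.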

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction, where the frequency effect was less for regular items than for irregular ones; and this is qualified by the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors including: (a) the exponent of the power function; (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve; and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items, just as, for example, the power function


FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$: the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
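The asymptote argument can be checked numerically. The following is only an illustrative sketch, not part of the reported analyses: the practice levels (1 and 5 units) and the size of the regularity increment (1 unit) are assumptions chosen for the example, and the function names are hypothetical.

```python
def power_curve(x, n=1.0):
    """Asymptoting power function y = 1 - (x * n) ** -2, for x * n > 0."""
    return 1.0 - (x * n) ** -2


def regularity_effect(frequency, increment=1.0, n=1.0):
    """Gain in y from adding a constant regularity increment to practice."""
    return power_curve(frequency + increment, n) - power_curve(frequency, n)


# Assumed practice levels: 1 unit (low frequency) versus 5 units (high frequency).
low_frequency_effect = regularity_effect(1.0)
high_frequency_effect = regularity_effect(5.0)

# The same increment buys far more at the low-frequency end of the curve.
print(low_frequency_effect, high_frequency_effect)
```

Changing n rescales the horizontal axis but leaves the ordering of the two effects unchanged, which is the sense in which any such power function implies the interaction.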

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
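The ceiling effect described here can be illustrated with a single output unit. This is a minimal sketch under stated assumptions: the logistic function is assumed as the unit's input–output function (the text specifies only that it is non-linear), and the summed-input values standing for frequency and consistency contributions are arbitrary illustrative numbers, not values from the simulations.

```python
import math


def unit_state(summed_input):
    """Logistic input-output function (assumed); states asymptote towards 1.0."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error(summed_input, target=1.0):
    """Error of a unit that should be active (target state 1.0)."""
    return target - unit_state(summed_input)


def error_reduction(frequency_input, consistency_input):
    """Drop in error when a consistency contribution is added to the summed input."""
    return error(frequency_input) - error(frequency_input + consistency_input)


# Frequency and consistency contributions simply add in the summed input,
# but the same consistency contribution reduces error less when frequency is high.
low_frequency_gain = error_reduction(frequency_input=1.0, consistency_input=1.0)
high_frequency_gain = error_reduction(frequency_input=3.0, consistency_input=1.0)
print(low_frequency_gain, high_frequency_gain)
```

Additive inputs thus yield an interaction at the level of error, without any separate mechanism for regular items.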

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I


microanalysis of learning in real time. This is the classic “experimenter’s dilemma”: naturalistic situations limit experimental control and thus the internal logical validity of research; laboratory research limits ecological validity (Jung, 1971).

In adopting MAL research, we are not denying naturalistic field studies. We might caricature the first as providing valid descriptions of artificial language learning and the latter as providing tentative descriptions of natural language learning. However, the use of a MAL in this study avoids at least three problems that have plagued similar experiments using natural languages (Beck, 1995): (1) uncertainty whether frequencies derived from corpora accurately represent input to learners; (2) problems attributed to interference from phonologically similar items in regular and irregular sets (e.g. lean, lend or fly, flow) or derived forms (e.g. head as a verb derived from a noun); and (3) evidence from only an advanced stage of learning, forcing reliance on logical argumentation rather than empirical evidence to describe acquisition.

Subjects

Seven monolingual English volunteers from the School of Psychology volunteer panel served as subjects. There were three males and four females. They were aged between 18 and 40 years. They were paid £2.50 per hour for their involvement. They usually worked an hour a day at the experiment.

The Miniature Artificial Language

Moeser and Bregman (1972) criticised the generalisability of MAL experiments which involved subjects listening to strings of words from semantically empty languages, because some syntactic rules that were easily acquired when the MAL referred to a stimulus world were not acquired when it did not. The MAL in the present study therefore incorporated reference. The subjects’ initial task was to learn MAL names for 20 picture stimuli. They were told that they were learning vocabulary in a new language. The pictures, drawn from Snodgrass and Vanderwart (1980), are described in the Appendix along with the stem form of their MAL names and their corresponding plural forms. Like Braine et al. (1990), we chose MAL names which were suggestive of English cognates in order to make them readily learnable; thus, for example, the MAL words for umbrella and fish are, respectively, “brol” and “pisc”. To the degree that the task only involves ostensive definition and is not embedded in a larger goal-directed setting, it is admittedly limited as an analogue of natural language vocabulary acquisition. However, it allows clean and precise experimental control whilst providing a reasonable model of ostensive vocabulary learning that occurs to some considerable degree in L1 and even more so in intentional foreign language learning.

Subjects learned the stem forms before studying the plural forms. In the stem learning phase, all items appeared equally often. In the subsequent plural learning phase, in order to maximise the sensitivity of the reaction time (RT) measure, plurality in the MAL was marked by a prefix. Half of the items had a regular plural marker (“bu-”); the remaining 10 items had idiosyncratic affixes, as shown in the Appendix. The use of a prefix inflectional system afforded the additional advantage of minimising transfer effects from the subjects’ first language since, although it is found in natural languages like Ndebele, it is quite different from English plural formation. Thus the MAL was designed with English cognates in order to promote positive transfer of learning of the stem forms, and a very different inflection system in order to exclude any morphological transfer. Frequency was factorially crossed with regularity, with half of each set being presented five times more often. The high and low frequency irregular items were matched for initial phoneme to control voice onset time.

Method

The experiment was controlled by a Macintosh LCIII computer programmed with PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Model pronunciations of the MAL lexis, spoken by the first author, were recorded using MacRecorder. Subjects’ vocal reaction times were measured using a voice key.

Stem Learning. Subjects first learned the stem forms of the MAL. This phase consisted of blocks of 20 trials. In each block, every picture appeared once in a randomly chosen order, so the subjects’ frequency of exposure to all of the stem forms was the same. Each trial consisted of the following sequence: (1) one of the pictures appeared mid-screen for 2 sec; (2) if the subject thought they knew the picture name, they spoke it into the microphone as quickly as possible; (3) 2 sec after picture onset, the computer spoke the correct name for the picture; (4) the experimenter marked the subject’s utterance as correct or not by pressing one of two keys. The dependent variables were thus correctness and RT. These blocks of trials were repeated until the subjects knew the MAL names for the pictures and could begin uttering them within 2 sec of stimulus onset, to a criterion of 100% correct on two successive blocks. At this point they graduated to the plural learning phase.

Plural Learning. This phase used the same procedures except that each block consisted of 80 trials presented in random order: (1) one presentation of each of the 20 singular forms, as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials just one picture appeared mid-screen; on the plural trials a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3, range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
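The composition of a plural-learning block, and what it implies for how often each affix is encountered, can be tallied directly from the Appendix. The sketch below is illustrative only: the item list is transcribed from the Appendix as printed, while the function and variable names are hypothetical, and the prefix is simply taken to be whatever precedes the stem-length tail of the plural form.

```python
from collections import Counter

# (stem, plural form, relative frequency, regularity), as listed in the Appendix.
ITEMS = [
    ("garth", "bugarth", 5, "R"), ("pid", "bupid", 1, "R"),
    ("lant", "bulant", 5, "R"), ("tib", "butid", 1, "R"),
    ("poon", "bupoon", 5, "R"), ("prill", "buprill", 1, "R"),
    ("dram", "budram", 5, "R"), ("hize", "buhize", 1, "R"),
    ("bisk", "bubisk", 5, "R"), ("breen", "bubreen", 1, "R"),
    ("feem", "gofeem", 5, "I"), ("brol", "gubrol", 1, "I"),
    ("charp", "zecharp", 5, "I"), ("naig", "zonaig", 1, "I"),
    ("chonk", "nuchonk", 5, "I"), ("woop", "niwoop", 1, "I"),
    ("fant", "vefant", 5, "I"), ("zoze", "vuzoze", 1, "I"),
    ("kag", "rekag", 5, "I"), ("pisc", "ropisc", 1, "I"),
]


def prefix_tokens_per_block(items):
    """Count how often each plural prefix occurs in one plural-learning block."""
    counts = Counter()
    for stem, plural, freq, _regularity in items:
        prefix = plural[: len(plural) - len(stem)]  # material before the stem
        counts[prefix] += freq
    return counts


prefix_counts = prefix_tokens_per_block(ITEMS)
singular_trials = len(ITEMS)  # one singular presentation per picture
plural_trials = sum(freq for _stem, _plural, freq, _reg in ITEMS)
trials_per_block = singular_trials + plural_trials

print(trials_per_block, prefix_counts.most_common(3))
```

On this tally the regular prefix accumulates tokens across all ten regular items in every block, whereas each idiosyncratic prefix is supported only by its own item, which is the sense in which regularity amounts to affix frequency in this design.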

Results

Stem Learning. The stem learning data will only be presented in summary, since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns]; Frequency [F(1, 16) = 0.569, ns]; Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects completed between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis, F(1, 5939) = 73.52, P < 0.001; by subjects, F(1, 6) = 12.41, P < 0.02; by words, F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis, F(1, 5123) = 650.74, P < 0.001; by subjects, F(1, 6) = 63.08, P < 0.001; by words, F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis, F(1, 5123) = 10.62, P < 0.001; by subjects, F(1, 6) = 3.26, ns; by words, F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis, F(1, 5123) = 20.92, P < 0.001; by subjects, F(1, 6) = 10.15, P < 0.02; by words, F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function better fits the data than does a linear function, the R²s for the power function fits being respectively: HiFreqReg, 0.94; HiFreqIrreg, 0.97; LoFreqReg, 0.74; LoFreqIrreg, 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency "floor" governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
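The comparison of power and linear fits can be illustrated with a minimal sketch. It is not the authors' analysis: the RT values below are hypothetical, generated from a power curve with a small fixed wobble, and the power function is fitted by ordinary least squares in log-log space, which is only one of several possible fitting methods.

```python
import math

def r_squared(ys, preds):
    """Proportion of variance in ys accounted for by preds."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_linear(blocks, rts):
    """Predicted RTs from a straight line RT = slope * block + intercept."""
    slope, intercept = least_squares(blocks, rts)
    return [slope * b + intercept for b in blocks]

def fit_power(blocks, rts):
    """Predicted RTs from RT = B * block**(-alpha), fitted in log-log space."""
    slope, intercept = least_squares([math.log(b) for b in blocks],
                                     [math.log(t) for t in rts])
    return [math.exp(intercept) * b ** slope for b in blocks]

# Hypothetical RTs (ms) over 15 blocks: a power curve plus a small fixed wobble.
blocks = list(range(1, 16))
wobble = [12, -9, 6, -14, 10, -5, 8, -11, 4, -7, 9, -3, 6, -8, 5]
rts = [2400 * b ** -0.4 + w for b, w in zip(blocks, wobble)]

r2_linear = r_squared(rts, fit_linear(blocks, rts))
r2_power = r_squared(rts, fit_power(blocks, rts))
print(f"linear R^2 = {r2_linear:.3f}, power R^2 = {r2_power:.3f}")
```

With data of this shape the power fit accounts for more variance than the linear fit, which is the pattern reported for all four item classes.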

Discussion of Human Data

Like those of Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate that: (a) there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition; (b) the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.); and (c) the size of the frequency effect on irregular items similarly diminishes with learning, but it does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law, in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-\alpha}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling, whatever our performance measure.
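The convergence argument can be made concrete with a small sketch of the power law of practice. The constants `b` and `alpha` are illustrative assumptions rather than values estimated from the data, and `frequency_gap` is a hypothetical helper that compares an item practised once per block with one practised five times per block.

```python
def latency(n_trials, b=2000.0, alpha=0.5):
    """Power law of practice, T = B * N**(-alpha); b and alpha are illustrative."""
    return b * n_trials ** (-alpha)

def frequency_gap(block, ratio=5):
    """Latency difference between a low frequency item (one trial per block)
    and a high frequency item (`ratio` trials per block) after `block` blocks."""
    return latency(block) - latency(ratio * block)

for block in (1, 5, 15):
    print(block, round(frequency_gap(block), 1))
```

Because the gap equals B * N**(-alpha) * (1 - ratio**(-alpha)), it shrinks steadily as practice accumulates but never reaches zero, which is the sense in which effects near asymptote become small enough to be swamped by error.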

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effects among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners' acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that: (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers, as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no "rule": "it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by 'on-line' generalisations from the stored exemplars" (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and in turn it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned: (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which crypto-embody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently concerned neither with low-level feature perception nor with the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input–output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP), which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair was presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
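The localist coding just described can be sketched as two small encoding functions. The function names and the use of Python lists are illustrative assumptions; unit numbers follow the 1-based numbering of the text (which the text itself notes is arbitrary).

```python
N_INPUT, N_OUTPUT = 22, 32
PLURAL_UNIT = 22          # I-unit 22 codes plurality (1-based, as in the text)
REGULAR_AFFIX_UNIT = 32   # O-unit 32 is the regular plural affix

def encode_input(picture, plural):
    """Input vector: one picture unit (1-21) on, plus the plurality unit if plural."""
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0
    if plural:
        vec[PLURAL_UNIT - 1] = 1.0
    return vec

def encode_target(picture, affix_unit=None):
    """Target vector: the stem unit (1-21), plus one affix unit (22-32) on plural trials."""
    vec = [0.0] * N_OUTPUT
    vec[picture - 1] = 1.0
    if affix_unit is not None:
        vec[affix_unit - 1] = 1.0
    return vec

# A regular plural trial for picture 3: two input units on, two output units reinforced.
x = encode_input(3, plural=True)
t = encode_target(3, affix_unit=REGULAR_AFFIX_UNIT)
print(sum(x), sum(t))
```

Because every picture and every morpheme gets exactly one unit, no two items are more similar than any other two at the level of representation, so any differences in learning can only come from the exposure regime.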

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we report for each model are the average performance of these five examples.
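The stem training procedure can be sketched as a generic three-layer network trained by online back-propagation. This is an illustrative reconstruction, not the authors' simulator: the sigmoid units, squared-error deltas, bias weights, learning rate, initial weight range, and the shortened run of 50 epochs (the text reports 500) are all assumptions, and the class and function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    """Input -> hidden -> output network of sigmoid units, trained trial by trial."""

    def __init__(self, n_in, n_hidden, n_out, rng, lr=0.5):
        self.lr = lr
        # Small random initial weights; the last entry of each row is a bias weight.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hid = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w_hid]
        out = [sigmoid(sum(w * v for w, v in zip(row, hid + [1.0]))) for row in self.w_out]
        return hid, out

    def train_trial(self, x, target):
        """One learning trial; returns the summed squared error before the update."""
        hid, out = self.forward(x)
        # Output deltas: target minus output, scaled by the sigmoid derivative.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Hidden deltas: output deltas propagated back through the output weights.
        d_hid = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hid)]
        for row, d in zip(self.w_out, d_out):
            for j, v in enumerate(hid + [1.0]):
                row[j] += self.lr * d * v
        for row, d in zip(self.w_hid, d_hid):
            for j, v in enumerate(x + [1.0]):
                row[j] += self.lr * d * v
        return sum((t - o) ** 2 for t, o in zip(target, out))

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

rng = random.Random(1)
net = ThreeLayerNet(n_in=22, n_hidden=8, n_out=32, rng=rng)
# 21 singular stimuli: picture unit i on, plurality unit off, stem unit i as target.
patterns = [(one_hot(i, 22), one_hot(i, 32)) for i in range(21)]
errors = []
for epoch in range(50):
    rng.shuffle(patterns)
    errors.append(sum(net.train_trial(x, t) for x, t in patterns))
print(round(errors[0], 2), round(errors[-1], 2))
```

Changing `n_hidden` to 3, 5, 8, or 15 gives the four capacity classes compared in the simulations; averaging over several seeds would mirror the five examples run per model size.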

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
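The composition of one plural training epoch can be sketched as follows. The allocation of pictures to the four classes (1–5 HiFreqReg, 6–10 HiFreqIrreg, 11–15 LoFreqReg, 16–20 LoFreqIrreg) and of irregular affixes to O-units 22–31 is a hypothetical stand-in for the assignments in the Appendix, and `build_plural_epoch` is an illustrative name.

```python
import random
from collections import Counter

def build_plural_epoch(rng):
    """Return one epoch as a shuffled list of (picture, plural, affix_unit) trials."""
    trials = [(p, False, None) for p in range(1, 22)]            # (a) 21 singular trials
    irregular_affix = {p: 16 + p for p in range(6, 11)}          # pictures 6-10 -> O-units 22-26
    irregular_affix.update({p: 11 + p for p in range(16, 21)})   # pictures 16-20 -> O-units 27-31
    for p in range(1, 6):
        trials += [(p, True, 32)] * 5                            # (b) HiFreqReg, five each
    for p in range(6, 11):
        trials += [(p, True, irregular_affix[p])] * 5            # (c) HiFreqIrreg, five each
    for p in range(11, 16):
        trials.append((p, True, 32))                             # (d) LoFreqReg, one each
    for p in range(16, 21):
        trials.append((p, True, irregular_affix[p]))             # (e) LoFreqIrreg, one each
    rng.shuffle(trials)
    return trials

epoch = build_plural_epoch(random.Random(0))
affix_counts = Counter(affix for _, plural, affix in epoch if plural)
print(len(epoch), affix_counts[32], affix_counts[22], affix_counts[27])
```

Counting targets per affix unit makes the exposure asymmetry explicit: the regular affix unit is reinforced on 30 trials per epoch (25 high frequency plus 5 low frequency regular trials), whereas each irregular affix unit is reinforced on only 5 or 1.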

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations correct activation is almost perfectly correlated with MSE (for example, r = -0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
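The two performance measures can be sketched side by side. This assumes RMS error is the square root of the mean squared difference over the 11 affix units for a single stimulus (the exact averaging used in the simulations is not specified), and the output vector below is hypothetical.

```python
import math

AFFIX_UNITS = range(21, 32)  # zero-based indices of O-units 22-32

def rms_error(output, target):
    """Root mean square error over the 11 plural affix units."""
    sq = [(output[i] - target[i]) ** 2 for i in AFFIX_UNITS]
    return math.sqrt(sum(sq) / len(sq))

def target_activation(output, target):
    """Activation on the single affix unit that the target marks as correct."""
    correct = next(i for i in AFFIX_UNITS if target[i] == 1.0)
    return output[correct]

# Hypothetical output: regular affix (O-unit 32) at 0.8, one competing affix at 0.3.
output = [0.0] * 32
output[31], output[25] = 0.8, 0.3
target = [0.0] * 32
target[31] = 1.0
print(round(rms_error(output, target), 3), target_activation(output, target))
```

RMS error penalises both the shortfall on the correct unit and any residual activation on competitors, whereas the activation score looks only at the correct unit; when competitors are largely suppressed the two measures move together, which is consistent with the strong negative correlation reported.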

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items, and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, ns; (b) human/model by frequency, F(1, 32) = 0.47, ns; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                   3HU(a)   5HU     8HU     15HU

RMS error(b)
  M                       0.81     0.79    0.53    0.45
  SD                      0.43     0.50    0.45    0.32

Activation weight(c)
  M                       0.20     0.28    0.57    0.60
  SD                      0.44     0.44    0.52    0.35

N hits (/5)(d)            1        2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (out of 5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation, using a squared Euclidean distance metric.
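The "N hits" criterion can be sketched as a nearest-pattern classification. The activation vectors below are hypothetical, the candidate targets are assumed to be the 11 one-hot affix patterns, and placing the regular affix at the last position is an indexing assumption (the unit numbering is arbitrary in the simulations).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_affix(affix_output):
    """Position (0-10) of the one-hot affix pattern nearest to the observed output."""
    n = len(affix_output)
    candidates = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    return min(range(n), key=lambda k: squared_distance(affix_output, candidates[k]))

REGULAR = 10  # position of O-unit 32 within O-units 22-32 (an indexing assumption)

# Hypothetical affix activations for the plural of TesterP from three runs of one model.
runs = [
    [0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9],
]
hits = sum(closest_affix(r) == REGULAR for r in runs)
print(hits, "of", len(runs), "runs chose the regular affix")
```

With one-hot candidates, the nearest pattern is simply the one whose unit carries the highest activation, so a "hit" means the regular affix unit out-competed every irregular affix unit for the novel item.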

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified by the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items; they were worse at generating regular plurals; and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items. Just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
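The interaction depicted in Fig. 5 can be illustrated numerically. The following is a minimal sketch, not the authors' analysis: it assumes the example function y = 1 − x^(−2) from the text, treats regularity as a constant additive increment to the input, and uses arbitrary illustrative practice levels of 1 and 5. The names `performance` and `regularity_effect` are hypothetical.

```python
def performance(x, exponent=2.0):
    """Asymptoting power function y = 1 - x**(-exponent), defined for x >= 1."""
    return 1.0 - x ** (-exponent)


def regularity_effect(frequency, regularity_bonus=1.0):
    """Gain in performance from adding a constant regularity increment.

    Assumption: regularity and frequency contribute additively to the
    input of the power function, as in the caption of Fig. 5.
    """
    return performance(frequency + regularity_bonus) - performance(frequency)


# Illustrative practice levels only; they are not taken from the experiment.
low_freq, high_freq = 1.0, 5.0

effect_low = regularity_effect(low_freq)
effect_high = regularity_effect(high_freq)

print(f"regularity effect at low frequency:  {effect_low:.3f}")   # 0.750
print(f"regularity effect at high frequency: {effect_high:.3f}")  # 0.012
```

The same increment buys a large gain near the start of the curve and almost nothing near asymptote, which is the frequency by regularity interaction in miniature.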

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases to the summed input to units that should be inactive. However, due to the non-linearity of the input-output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
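The ceiling effect described here can be made concrete with a single output unit. The sketch below assumes a logistic activation function (the text says only that the input-output function is non-linear) and a target state of 1.0; `logistic` and `error_reduction` are hypothetical names chosen for illustration.

```python
import math


def logistic(net_input):
    """Logistic activation, assumed here as the unit's input-output function."""
    return 1.0 / (1.0 + math.exp(-net_input))


def error_reduction(net_input, increment=1.0, target=1.0):
    """Drop in error (target minus state) caused by a fixed rise in summed input."""
    error_before = target - logistic(net_input)
    error_after = target - logistic(net_input + increment)
    return error_before - error_after


# The same unit-sized increment applied at progressively larger summed inputs.
reductions = [error_reduction(float(net)) for net in range(5)]

for net, drop in enumerate(reductions):
    print(f"summed input {net} -> {net + 1}: error falls by {drop:.3f}")
```

Each successive increment yields a smaller decrement in error, so contributions from frequency and from consistency that add in the summed input no longer add in the error measure once the unit nears its extremal state.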

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate IO associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.

Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.

APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I

learning that occurs to some considerable degree in L1 and even more so in intentional foreign language learning.

Subjects learned the stem forms before studying the plural forms. In the stem learning phase, all items appeared equally often. In the subsequent plural learning phase, in order to maximise the sensitivity of the reaction time (RT) measure, plurality in the MAL was marked by a prefix. Half of the items had a regular plural marker (“bu-”); the remaining 10 items had idiosyncratic affixes, as shown in the Appendix. The use of a prefix inflectional system afforded the additional advantage of minimising transfer effects from the subjects’ first language since, although it is found in natural languages like Ndebele, it is quite different from English plural formation. Thus the MAL was designed with English cognates in order to promote positive transfer of learning of the stem forms, and a very different inflection system in order to exclude any morphological transfer. Frequency was factorially crossed with regularity, with half of each set being presented five times more often. The high and low frequency irregular items were matched for initial phoneme to control voice onset time.

Method

The experiment was controlled by a Macintosh LCIII computer programmed with PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Model pronunciations of the MAL lexis, spoken by the first author, were recorded using MacRecorder. Subjects’ vocal reaction times were measured using a voice key.

Stem Learning. Subjects first learned the stem forms of the MAL. This phase consisted of blocks of 20 trials. In each block every picture appeared once in a randomly chosen order, so the subjects’ frequency of exposure to all of the stem forms was the same. Each trial consisted of the following sequence: (1) one of the pictures appeared mid-screen for 2 sec; (2) if the subject thought they knew the picture name, they spoke it into the microphone as quickly as possible; (3) 2 sec after picture onset the computer spoke the correct name for the picture; (4) the experimenter marked the subject’s utterance as correct or not by pressing one of two keys. The dependent variables were thus correctness and RT. These blocks of trials were repeated until the subjects knew the MAL names for the pictures and could begin uttering them within 2 sec of stimulus onset, to a criterion of 100% correct on two successive blocks. At this point they graduated to the plural learning phase.

Plural Learning. This phase used the same procedures except that each block consisted of 80 trials presented in random order: (1) one presentation of each of the 20 singular forms as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials just one picture appeared mid-screen; on the plural trials a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3, range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
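The composition of a plural learning block can be checked with a short sketch. It is illustrative only (the experiment itself was run in PsyScope): the stems, prefixes, and frequencies are copied from the Appendix, while `build_block` and the fixed random seed are assumptions made for this example.

```python
import random
from collections import Counter

# (stem, plural prefix, presentations per block), taken from the Appendix.
ITEMS = [
    ("garth", "bu", 5), ("pid", "bu", 1), ("lant", "bu", 5), ("tib", "bu", 1),
    ("poon", "bu", 5), ("prill", "bu", 1), ("dram", "bu", 5), ("hize", "bu", 1),
    ("bisk", "bu", 5), ("breen", "bu", 1),
    ("feem", "go", 5), ("brol", "gu", 1), ("charp", "ze", 5), ("naig", "zo", 1),
    ("chonk", "nu", 5), ("woop", "ni", 1), ("fant", "ve", 5), ("zoze", "vu", 1),
    ("kag", "re", 5), ("pisc", "ro", 1),
]


def build_block(items, rng):
    """Return one shuffled block: every singular once, plurals by frequency."""
    trials = [("singular", stem, "") for stem, _prefix, _freq in items]
    for stem, prefix, freq in items:
        trials.extend(("plural", stem, prefix) for _ in range(freq))
    rng.shuffle(trials)
    return trials


block = build_block(ITEMS, random.Random(0))  # seed chosen arbitrarily
prefix_counts = Counter(prefix for kind, _stem, prefix in block if kind == "plural")

print(len(block))                                 # 80 trials per block
print(prefix_counts["bu"])                        # 30 tokens of the regular prefix
print(prefix_counts["go"], prefix_counts["gu"])   # 5 and 1 for two irregular prefixes
```

Counting tokens per prefix shows why regularity behaves like frequency: the regular prefix is experienced 30 times per block, against 5 or 1 for any single irregular prefix.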

Results

Stem Learning. The stem learning data will only be presented in summary since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns]; Frequency [F(1, 16) = 0.569, ns]; Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects partook of between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].

FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis F(1, 5939) = 73.52, P < 0.001; by subjects F(1, 6) = 12.41, P < 0.02; by words F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis F(1, 5123) = 650.74, P < 0.001; by subjects F(1, 6) = 63.08, P < 0.001; by words F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis F(1, 5123) = 10.62, P < 0.001; by subjects F(1, 6) = 3.26, ns; by words F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis F(1, 5123) = 20.92, P < 0.001; by subjects F(1, 6) = 10.15, P < 0.02; by words F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function fits the data better than does a linear function, the R²s for the power function fits being, respectively, HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency “floor” governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
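The comparison between power and linear fits can be sketched as follows. This is not the authors' analysis: the latencies are invented and noise-free, the power fit is obtained by ordinary least squares on log-transformed values (one common convention, assumed here because the fitting method is not described), and `linear_fit_r2` is a hypothetical helper.

```python
import math


def linear_fit_r2(xs, ys):
    """R squared of the ordinary least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy ** 2) / (sxx * syy)


# Invented latencies (ms) over 15 blocks following T = B * N**(-a);
# B = 1500 and a = 0.4 are arbitrary and are not the experimental values.
blocks = list(range(1, 16))
latencies = [1500.0 * n ** -0.4 for n in blocks]

# Linear model: latency against block number.
r2_linear = linear_fit_r2(blocks, latencies)

# Power model: a straight line in log-log coordinates.
r2_power = linear_fit_r2([math.log(n) for n in blocks],
                         [math.log(t) for t in latencies])

print(f"linear fit R^2: {r2_linear:.3f}")
print(f"power fit R^2:  {r2_power:.3f}")
```

With real data both values would fall below 1 because of noise, but a curve that flattens towards a floor will still favour the power fit, as reported for all four item types.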

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition, (b) that the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.), and (c) that the size of the frequency effect on irregular items similarly diminishes with learning, but does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling, whatever our performance measure.

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effects among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners' acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract, internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that: (a) they are neurally inspired, (b) they incorporate distributed representation and control of information, (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts, (d) they show graceful degradation, as do humans with language disorder, and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms, it exhibited a tendency to overgeneralise, and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no "rule": "it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by 'on-line' generalisations from the stored exemplars" (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned: (a) the adequacy of the adopted low-level phonological representations, whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988), (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991), and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others (other, that is, than is determined by the frequency of the input–output mappings). This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary, and was randomised across models; what mattered and remained constant was that the same O-unit was always reinforced whenever a particular I-unit was activated.
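The localist coding just described can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code: unit indices are zero-based (so I-unit 22 is index 21 and O-unit 32 is index 31), and the assignment of particular pictures to the regular and irregular classes is hypothetical, since the actual items are given only in the Appendix.

```python
N_INPUT = 22               # I-units 1-22
N_OUTPUT = 32              # O-units 1-32
PLURAL_UNIT = 21           # zero-based index of I-unit 22 (plurality)
FIRST_IRREGULAR_AFFIX = 21 # zero-based index of O-unit 22
REGULAR_AFFIX = 31         # zero-based index of O-unit 32

# Hypothetical item assignment: pictures 0-4 and 10-14 take the regular
# affix; the remaining ten trained pictures each take their own affix.
REGULAR_PICTURES = set(range(0, 5)) | set(range(10, 15))
IRREGULAR_PICTURES = [5, 6, 7, 8, 9, 15, 16, 17, 18, 19]

def affix_unit_for(picture):
    """Output unit of the plural affix for a trained picture (0-19)."""
    if picture in REGULAR_PICTURES:
        return REGULAR_AFFIX
    return FIRST_IRREGULAR_AFFIX + IRREGULAR_PICTURES.index(picture)

def encode_input(picture, plural):
    """One unit per picture (0-20), plus the plurality unit when plural."""
    vec = [0.0] * N_INPUT
    vec[picture] = 1.0
    if plural:
        vec[PLURAL_UNIT] = 1.0
    return vec

def encode_target(picture, affix_unit=None):
    """Stem unit for the picture, plus one affix unit for plural trials."""
    vec = [0.0] * N_OUTPUT
    vec[picture] = 1.0
    if affix_unit is not None:
        vec[affix_unit] = 1.0
    return vec

# A plural trial for a regular item activates two input units and
# reinforces two output units (stem and regular affix).
plural_input = encode_input(0, plural=True)
plural_target = encode_target(0, affix_unit_for(0))
```

Because every picture and every morpheme has exactly one unit, no two patterns overlap except through the shared plurality unit and the shared regular affix unit.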

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
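A minimal sketch of such a three-layer network with trial-by-trial back propagation is given below. It is not the simulator used for the reported models. The logistic activation function, the bias units, the learning rate of 0.5, the initial weight range of ±0.5, and the shuffled trial order are all assumptions made for the illustration, and the names `ThreeLayerNet` and `train_singulars` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

class ThreeLayerNet:
    """Input -> hidden -> output network trained by back propagation."""

    def __init__(self, n_in, n_hidden, n_out, rng, rate=0.5):
        self.rate = rate  # assumed learning rate
        # One extra weight per row acts as a bias (an assumption).
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hidden)]
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                     for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_ih]
        hb = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, hb)))
                  for row in self.w_ho]
        return hidden, output

    def train(self, x, target):
        """One learning trial; returns the squared error before the update."""
        hidden, output = self.forward(x)
        xb, hb = x + [1.0], hidden + [1.0]
        # Error signal at each output unit, then propagated back to hidden units.
        delta_o = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
        delta_h = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(delta_o, self.w_ho))
                   for j, h in enumerate(hidden)]
        for row, d in zip(self.w_ho, delta_o):
            for j, v in enumerate(hb):
                row[j] += self.rate * d * v
        for row, d in zip(self.w_ih, delta_h):
            for i, v in enumerate(xb):
                row[i] += self.rate * d * v
        return sum((t - o) ** 2 for t, o in zip(target, output))

def train_singulars(n_hidden=8, epochs=20, seed=0):
    """Stem training: 21 singular trials per epoch, plurality unit off."""
    rng = random.Random(seed)
    net = ThreeLayerNet(22, n_hidden, 32, rng)
    epoch_errors = []
    for _ in range(epochs):
        order = list(range(21))
        rng.shuffle(order)
        epoch_errors.append(sum(net.train(one_hot(p, 22), one_hot(p, 32))
                                for p in order))
    return net, epoch_errors
```

Calling `train_singulars(n_hidden=3)` or `train_singulars(n_hidden=15)` changes only the number of hidden units, which is the capacity manipulation compared across the four classes of model.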

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase, (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms, (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms, (d) one presentation of each of the five low frequency regular (LoFreqReg) forms, and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
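The composition of one plural-training epoch can be checked with a short sketch. The picture numbering and the affix labels are hypothetical placeholders (the real items are listed in the Appendix); only the counts are taken from the description above.

```python
import random
from collections import Counter

# Hypothetical picture indices for the four item classes.
HI_REG = range(0, 5)
HI_IRREG = range(5, 10)
LO_REG = range(10, 15)
LO_IRREG = range(15, 20)
N_PICTURES = 21            # 20 trained pictures plus the generalisation item

def affix_for(picture):
    """Regular items share one affix; each irregular item has its own."""
    if picture in HI_REG or picture in LO_REG:
        return "regular"
    return f"irregular-{picture}"

def build_plural_epoch(rng):
    """Return one shuffled epoch of (picture, is_plural, affix) trials."""
    trials = [(p, False, None) for p in range(N_PICTURES)]   # (a) singulars
    for group, presentations in ((HI_REG, 5), (HI_IRREG, 5),
                                 (LO_REG, 1), (LO_IRREG, 1)):  # (b)-(e)
        for p in group:
            trials.extend([(p, True, affix_for(p))] * presentations)
    rng.shuffle(trials)
    return trials

epoch = build_plural_epoch(random.Random(0))
affix_counts = Counter(affix for _, is_plural, affix in epoch if is_plural)
print(len(epoch), affix_counts["regular"],
      affix_counts["irregular-5"], affix_counts["irregular-15"])
```

The counts make the type and token structure explicit: per epoch the single regular affix is reinforced 30 times (25 high frequency plus 5 low frequency tokens), each high frequency irregular affix 5 times, and each low frequency irregular affix once.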

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations: F(1, 16) = 20.80, P < 0.0005; by words: F(1, 16) = 56.65, P < 0.0001], (b) regularity [by simulations: F(1, 16) = 9.07, P < 0.01; by words: F(1, 16) = 39.57, P < 0.0001], (c) regularity by frequency [by simulations: F(1, 16) = 4.85, P < 0.05; by words: F(1, 16) = 15.61, P < 0.005], (d) block [by simulations: F(14, 224) = 68.03, P < 0.0001; by words: F(14, 224) = 149.14, P < 0.0001], (e) block by regularity [by simulations: F(14, 224) = 36.75, P < 0.0001; by words: F(14, 224) = 29.29, P < 0.0001], (f) block by frequency [by simulations: F(14, 224) = 18.93, P < 0.0001; by words: F(14, 224) = 11.84, P < 0.0001], and (g) block by regularity by frequency [by simulations: F(14, 224) = 16.11, P < 0.0001; by words: F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
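The two performance measures can be stated compactly. The sketch below assumes zero-based indexing (O-units 22–32 are indices 21–31) and the conventional definition of RMS error as the square root of the mean squared difference; the output vector is invented for the illustration.

```python
import math

AFFIX_UNITS = range(21, 32)   # zero-based indices of O-units 22-32

def rms_error(output, target):
    """RMS difference between output and target over the plural affix units."""
    squared = [(output[i] - target[i]) ** 2 for i in AFFIX_UNITS]
    return math.sqrt(sum(squared) / len(squared))

def accuracy_score(output, target_affix_unit):
    """Activation (0 to 1) on the single unit coding the correct affix."""
    return output[target_affix_unit]

# Invented test output for a regular plural: the correct affix unit
# (index 31) is strongly active, the ten competitors are weakly active.
output = [0.0] * 21 + [0.1] * 10 + [0.9]
target = [0.0] * 21 + [0.0] * 10 + [1.0]
print(rms_error(output, target), accuracy_score(output, 31))
```

RMS error penalises residual activation on the wrong affix units as well as a shortfall on the right one, whereas the accuracy score looks only at the right one; this is why the two measures mirror each other so closely when competitors are well inhibited.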

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output as measured by minimum squared Euclidean distance. Thus when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                          Model Size
Measure                   3HU(a)   5HU    8HU    15HU
RMS error(b)
  M                       0.81     0.79   0.53   0.45
  SD                      0.43     0.50   0.45   0.32
Activation weight(c)
  M                       0.20     0.28   0.57   0.60
  SD                      0.44     0.44   0.52   0.35
N hits (/5)(d)            1        2      3      4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
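The "closest pattern" criterion behind the hit counts can be sketched as follows. The observed activations are invented for the illustration, and placing the regular affix at position 0 of the eleven-element pattern simply follows the way the target pattern is written in the text.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two activation patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def affix_targets(n_affixes=11):
    """One target pattern per affix: that affix's unit on, all others off."""
    return [[1.0 if i == k else 0.0 for i in range(n_affixes)]
            for k in range(n_affixes)]

def closest_affix(observed, targets):
    """Index of the target pattern nearest to the observed activations."""
    return min(range(len(targets)),
               key=lambda k: squared_euclidean(observed, targets[k]))

# Invented activations over the eleven affix units for the generalisation
# item; position 0 stands for the regular affix.
observed = [0.60, 0.05, 0.10, 0.00, 0.02, 0.00, 0.08, 0.00, 0.01, 0.00, 0.03]
chosen = closest_affix(observed, affix_targets())
is_regular_hit = (chosen == 0)
```

With one-unit-per-affix targets the nearest pattern is always the one whose unit is most active, so under this coding a "hit" amounts to the regular affix unit winning the competition among the eleven affix units.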

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones, regular items were learned significantly faster than irregular ones, there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones, and this is qualified as the higher level interaction with block whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of "regular" in the Pocket Oxford Dictionary involves "habitual, constant" acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses "conforming to a rule or principle". We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of "rules of language"). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default if not the universal case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

Although it is a general principle, the degree to which it applies depends on a range of factors, including: (a) the exponent of the power function; (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve; and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items. Just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$: the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
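The asymptote argument can be checked numerically. The following is a minimal sketch, not part of the original analysis: it treats "regularity" as a constant increment added to the practice variable x of the example function, and the function names and the particular values of x are illustrative assumptions.

```python
def asymptoting_power(x: float, n: float = 1.0) -> float:
    """Example function from the text: y = 1 - (x * n) ** -2."""
    return 1.0 - (x * n) ** -2


def regularity_effect(frequency: float, increment: float = 1.0, n: float = 1.0) -> float:
    """Gain in y when a constant increment is added at a given level of practice.

    Treating regularity as a fixed additive increment is an assumption
    made for illustration only.
    """
    return asymptoting_power(frequency + increment, n) - asymptoting_power(frequency, n)


# The same increment is evaluated at a low and at a high level of practice.
effect_at_low_frequency = regularity_effect(1.0)
effect_at_high_frequency = regularity_effect(5.0)

# Stretching the horizontal axis (n = 2) changes the sizes but not the ordering.
stretched_low = regularity_effect(1.0, n=2.0)
stretched_high = regularity_effect(5.0, n=2.0)

print(effect_at_low_frequency, effect_at_high_frequency)
print(stretched_low, stretched_high)
```

In both cases the increment buys far more at the low-practice end of the curve, which is the pattern the vertical bars in Fig. 5 mark out.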

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum, because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
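The ceiling effect described above can be illustrated with a single output unit. This is a minimal sketch under stated assumptions: the logistic function stands in for the unspecified non-linear input–output function, and the numeric contributions of frequency and consistency to the summed input are arbitrary values chosen for illustration, not quantities from the simulations.

```python
import math


def unit_state(summed_input: float) -> float:
    """Logistic input-output function (assumed; the text only requires non-linearity)."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error(frequency_input: float, consistency_input: float) -> float:
    """Distance from a target state of 1.0 when the two contributions simply add."""
    return 1.0 - unit_state(frequency_input + consistency_input)


# Hypothetical contributions to the summed input (illustrative values only).
LOW_FREQ, HIGH_FREQ = 0.0, 2.0
IRREGULAR, REGULAR = 0.0, 2.0

# Frequency effect = error reduction gained by moving from low to high frequency.
freq_effect_irregular = error(LOW_FREQ, IRREGULAR) - error(HIGH_FREQ, IRREGULAR)
freq_effect_regular = error(LOW_FREQ, REGULAR) - error(HIGH_FREQ, REGULAR)

print(freq_effect_irregular, freq_effect_regular)
```

Although the two factors are strictly additive in the summed input, the frequency effect on error is smaller for the regular (consistent) item, because that unit is already closer to its asymptote.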

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate IO associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined, so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butib         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I


of each of the 20 singular forms as in the preceding phase; (2) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (3) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (4) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (5) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. On the singular trials, just one picture appeared midscreen; on the plural trials, a pair of adjacent identical pictures appeared. This phase continued for several (mean = 4.3, range = 0 to 9) blocks beyond the point at which the learners had achieved 100% accuracy on all plural forms, in order to monitor increasing fluency as indexed by RT improvement.
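The composition of one training block can be sketched from the Appendix as follows. This is an illustrative reconstruction, not the presentation software used in the experiment: the number of singular presentations per block (`singular_reps`) and the random ordering of trials within a block are assumptions, since neither is stated in this part of the text. The regular plural is built as the prefix "bu" plus the stem.

```python
import random

# (stem, plural prefix, plural presentations per block), transcribed from the Appendix.
ITEMS = [
    ("garth", "bu", 5), ("pid", "bu", 1), ("lant", "bu", 5), ("tib", "bu", 1),
    ("poon", "bu", 5), ("prill", "bu", 1), ("dram", "bu", 5), ("hize", "bu", 1),
    ("bisk", "bu", 5), ("breen", "bu", 1), ("feem", "go", 5), ("brol", "gu", 1),
    ("charp", "ze", 5), ("naig", "zo", 1), ("chonk", "nu", 5), ("woop", "ni", 1),
    ("fant", "ve", 5), ("zoze", "vu", 1), ("kag", "re", 5), ("pisc", "ro", 1),
]
REGULAR_PREFIX = "bu"


def condition(prefix: str, freq: int) -> str:
    """Label a plural form with the condition names used in the text."""
    frequency = "HiFreq" if freq == 5 else "LoFreq"
    regularity = "Reg" if prefix == REGULAR_PREFIX else "Irreg"
    return frequency + regularity


def build_block(rng: random.Random, singular_reps: int = 1) -> list:
    """Return one shuffled block of (condition, stem, target form) trials.

    singular_reps and the shuffling are assumptions made for this sketch.
    """
    trials = []
    for stem, prefix, freq in ITEMS:
        trials += [("singular", stem, stem)] * singular_reps
        trials += [(condition(prefix, freq), stem, prefix + stem)] * freq
    rng.shuffle(trials)
    return trials


block = build_block(random.Random(0))
counts = {}
for cond, _stem, _form in block:
    counts[cond] = counts.get(cond, 0) + 1
print(counts)
```

Under these assumptions each block contains 60 plural trials, with five times as many tokens of each high frequency form as of each low frequency form, and with every regular token adding to the evidence for the single shared affix.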

Results

Stem Learning. The stem learning data will only be presented in summary, since the major focus of the experiment lies with the plural forms. Subjects took an average of 9.17 (SD 5.93) blocks to achieve the criterion of correctness. Some stem forms were easier to learn than others [F(19, 2161) = 23.07, P < 0.001]. Particularly easy words included “fant” (92% correct over all trials), “pisc” (85%), and “lant” (78%). Particularly difficult words included “prill” (32%), “charp” (43%), and “breen” (46%). However, for purposes of control, it is important to note that the stem forms of the items that were later allocated in the Plural Learning phase to regular/irregular plural morphology or high/low frequency of exposure did not significantly differ in difficulty of learning at this stage: Regularity [F(1, 16) = 0.703, ns]; Frequency [F(1, 16) = 0.569, ns]; Regularity × Frequency [F(1, 16) = 0.029, ns].

Plural Learning. Subjects partook of between 13 and 15 blocks of this phase.

The key interest lies with the rate of acquisition of the plural forms. We will first describe analyses of accuracy and then RT. These data are shown in Fig. 1.

ANOVA was used to assess the effects of frequency, regularity, and block. For the main effects of regularity and frequency and their interaction, we report additional analyses which determine the robustness of these effects when separately analysed by subjects and by words. There was a significant effect of frequency on accuracy, with the advantage going to the high frequency items [overall analysis F(1, 5939) = 431.17, P < 0.001; by subjects F(1, 6) = 56.31, P < 0.005; by words F(1, 16) = 172.00, P < 0.001]. There was a significant effect of regularity, with the regular plurals being learned better than the irregulars [overall analysis F(1, 5939) = 81.52, P < 0.001; by subjects F(1, 6) = 6.64, P < 0.05; by words F(1, 16) = 30.50, P < 0.001].


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis F(1, 5939) = 73.52, P < 0.001; by subjects F(1, 6) = 12.41, P < 0.02; by words F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis F(1, 5123) = 650.74, P < 0.001; by subjects F(1, 6) = 63.08, P < 0.001; by words F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis F(1, 5123) = 10.62, P < 0.001; by subjects F(1, 6) = 3.26, ns; by words F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis F(1, 5123) = 20.92, P < 0.001; by subjects F(1, 6) = 10.15, P < 0.02; by words F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items, and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function fits the data better than does a linear function, the R²s for the power function fits being respectively: HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency “floor” governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
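The comparison of power and linear fits can be sketched as below. This is only an illustration of the general procedure, not the analysis reported here: the RT values are synthetic (generated from an arbitrary power curve with a small deterministic perturbation), and fitting the power function by least squares on log-log axes is one common choice that the text does not confirm was the method used.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Proportion of variance in ys accounted for by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic RTs (ms) over 15 blocks: made-up values, NOT the experimental data.
blocks = list(range(1, 16))
rts = [1800.0 * b ** -0.5 + 15.0 * math.sin(b) for b in blocks]

# Linear function: RT = intercept + slope * block.
slope, intercept = linear_fit(blocks, rts)
r2_linear = r_squared(rts, [intercept + slope * b for b in blocks])

# Power function RT = B * block ** exponent, fitted as a line on log-log axes.
exponent, log_b = linear_fit([math.log(b) for b in blocks], [math.log(r) for r in rts])
r2_power = r_squared(rts, [math.exp(log_b) * b ** exponent for b in blocks])

print(round(r2_linear, 3), round(r2_power, 3), round(exponent, 3))
```

For a curve that decelerates towards a floor, the power function accounts for more of the variance than the straight line, which is the comparison summarised by the R² values above.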

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate: (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition; (b) that the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.); and (c) that the size of the frequency effect on irregular items similarly diminishes with learning, but does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law, in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.
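As a brief worked note on this relationship (standard algebra rather than anything taken from the cited studies, and assuming that B and a are positive constants), taking logarithms shows why practice curves appear linear on log-log axes, and differentiating shows why each additional trial yields a smaller improvement than the last:

```latex
\begin{align*}
T &= B N^{-a}, \qquad B > 0,\ a > 0 \\
\log T &= \log B - a \log N \\
\frac{dT}{dN} &= -a B N^{-a-1}, \qquad
\left|\frac{dT}{dN}\right| \to 0 \ \text{as}\ N \to \infty
\end{align*}
```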

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling, whatever our performance measure.

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effects among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote, and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners’ acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that: (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers, as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no “rule”: “it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms, with novel responses generated by ‘on-line’ generalisations from the stored exemplars” (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned: (a) the adequacy of the adopted low-level phonological representations, whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or “imagen” (Paivio, 1986), each output unit to some speech output “logogen” (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others (other, that is, than is determined by the frequency of the input–output mappings). This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects’ learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is “back propagation” (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item TesterP), which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.
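The localist coding and the four model sizes described above can be sketched as follows. This is an illustrative sketch only: the function names, the 1-based indexing convention, and the optional bias weights are assumptions made here for exposition and are not details reported for the simulations.

```python
N_INPUT = 22        # I-units 1-21 code pictures, I-unit 22 codes plurality
N_OUTPUT = 32       # O-units 1-21 stems, 22-31 irregular affixes, 32 regular
REGULAR_AFFIX = 32  # 1-based unit number, following the text


def encode_input(picture, plural):
    """One-hot input vector for a 1-based picture number (1-21)."""
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0
    vec[N_INPUT - 1] = 1.0 if plural else 0.0
    return vec


def encode_target(stem, affix=None):
    """Target vector: the stem unit, plus an affix unit for plural trials."""
    vec = [0.0] * N_OUTPUT
    vec[stem - 1] = 1.0
    if affix is not None:
        vec[affix - 1] = 1.0
    return vec


def n_weights(n_hidden, bias=True):
    """Connection count of a fully connected 22-H-32 network.

    Bias weights are an assumption; the text does not mention them.
    """
    count = N_INPUT * n_hidden + n_hidden * N_OUTPUT
    if bias:
        count += n_hidden + N_OUTPUT
    return count


for hidden in (3, 5, 8, 15):
    print(hidden, n_weights(hidden, bias=False))
```

Under these assumptions a 3HU network has 162 connections and a 15HU network 810 (ignoring biases), which gives a sense of how capacity grows with the number of HUs.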

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
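A minimal sketch of this stem training regime is given below, assuming a logistic activation function, plain online gradient descent, a learning rate of 0.5, bias weights, initial weights drawn uniformly from [-0.5, 0.5], and random trial order within each epoch. None of these settings is reported in the text, and the names `TinyNet` and `train_singulars` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Three-layer feed-forward network trained by back-propagation."""

    def __init__(self, n_in, n_hid, n_out, rng):
        # The extra weight in each row acts as a bias (an assumption).
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hid)]
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                     for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, xb)))
               for row in self.w_ih]
        hb = hid + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb)))
               for row in self.w_ho]
        return hid, out

    def train_step(self, x, target, lr=0.5):
        """One trial: forward pass, error back-propagation, weight update."""
        hid, out = self.forward(x)
        xb, hb = x + [1.0], hid + [1.0]
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Hidden deltas use the output weights as they were before updating.
        d_hid = [h * (1.0 - h) *
                 sum(d_out[k] * self.w_ho[k][j] for k in range(len(out)))
                 for j, h in enumerate(hid)]
        for k, row in enumerate(self.w_ho):
            for j in range(len(row)):
                row[j] += lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_ih):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]
        return sum((t - o) ** 2 for t, o in zip(target, out))


def train_singulars(n_hid=8, epochs=500, seed=0):
    """Train on the 21 singular forms; I-unit 22 stays off throughout."""
    rng = random.Random(seed)
    net = TinyNet(22, n_hid, 32, rng)
    trials = []
    for p in range(21):
        x = [0.0] * 22
        t = [0.0] * 32
        x[p] = 1.0
        t[p] = 1.0
        trials.append((x, t))
    history = []  # summed squared error per epoch
    for _ in range(epochs):
        rng.shuffle(trials)
        history.append(sum(net.train_step(x, t) for x, t in trials))
    return net, history
```

Calling `train_singulars(n_hid=8)` with five different seeds and averaging the results would loosely mirror the five examples per model size described above.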

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
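The composition of one plural training epoch can be sketched as below. The assignment of picture numbers to the four classes, and of irregular items to O-units 22–31, is hypothetical (the actual items are listed in the Appendix); only the trial counts follow the description above.

```python
import random
from collections import Counter

REGULAR_AFFIX = 32
# Hypothetical allocation of pictures 1-20 to the four item classes.
HI_REG = list(range(1, 6))
HI_IRREG = list(range(6, 11))
LO_REG = list(range(11, 16))
LO_IRREG = list(range(16, 21))
# Each irregular item has its own affix unit among O-units 22-31.
IRREG_AFFIX = {pic: 22 + i for i, pic in enumerate(HI_IRREG + LO_IRREG)}


def build_epoch(rng):
    """Return 81 shuffled trials as (picture, plural_flag, affix_unit)."""
    trials = [(pic, False, None) for pic in range(1, 22)]  # (a) singulars
    for pic in HI_REG:                                     # (b)
        trials += [(pic, True, REGULAR_AFFIX)] * 5
    for pic in HI_IRREG:                                   # (c)
        trials += [(pic, True, IRREG_AFFIX[pic])] * 5
    for pic in LO_REG:                                     # (d)
        trials.append((pic, True, REGULAR_AFFIX))
    for pic in LO_IRREG:                                   # (e)
        trials.append((pic, True, IRREG_AFFIX[pic]))
    rng.shuffle(trials)
    return trials


epoch = build_epoch(random.Random(0))
affix_tokens = Counter(affix for _, plural, affix in epoch if plural)
print(len(epoch), affix_tokens[REGULAR_AFFIX])
```

Counting affix tokens in this way makes the asymmetry in the design explicit: the regular affix unit is reinforced on 30 trials per epoch, whereas each irregular affix unit is reinforced on either five trials or one.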

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
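The two performance measures can be illustrated with a short sketch. It assumes that RMS error is the square root of the mean squared difference over the 11 affix units; the exact normalisation used in the simulations is not stated, and the activation values below are invented for illustration.

```python
import math


def affix_rms(output, target):
    """RMS error over the plural affix units (O-units 22-32).

    Assumes root of the MEAN squared difference across units.
    """
    squared = [(o - t) ** 2 for o, t in zip(output, target)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(output, target_index):
    """Activation on the single unit coding the correct affix."""
    return output[target_index]


# Hypothetical test response: the regular affix (last position) is the target.
target = [0.0] * 10 + [1.0]
output = [0.05] * 10 + [0.80]
print(affix_rms(output, target), accuracy(output, 10))
```

RMS error penalises residual activation on the ten inappropriate affix units as well as any shortfall on the target unit, whereas the accuracy score looks at the target unit alone. When off-target activation is low, the two measures move in opposite directions almost in lockstep.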

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-), in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                  3HUᵃ     5HU      8HU      15HU

RMS errorᵇ
  M                      0.81     0.79     0.53     0.45
  SD                     0.43     0.50     0.45     0.32

Activation weightᶜ
  M                      0.20     0.28     0.57     0.60
  SD                     0.44     0.44     0.52     0.35

N hits (out of 5)ᵈ       1        2        3        4

There were five exemplars of each size of model. ᵃHU = hidden units. ᵇRMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. ᶜActivation weight on the regular plural affix O-unit. ᵈNumber of exemplar models (out of 5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
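The “hit” criterion of Table 1 (footnote d) can be sketched as a nearest-target search. The sketch assumes that each candidate target is a one-hot pattern over the 11 affix units; the function name and the example activations are hypothetical.

```python
def closest_affix(output):
    """Index of the one-hot affix target nearest to the observed output.

    `output` holds activations over the 11 plural affix units
    (O-units 22-32); distance is squared Euclidean.
    """
    best_index, best_distance = None, float("inf")
    for k in range(len(output)):
        distance = sum(
            (o - (1.0 if i == k else 0.0)) ** 2 for i, o in enumerate(output)
        )
        if distance < best_distance:
            best_index, best_distance = k, distance
    return best_index


# Hypothetical response to TesterP plus the plural unit: the regular
# affix is placed in the last position here.
response = [0.1] * 10 + [0.6]
print(closest_affix(response) == len(response) - 1)
```

For one-hot targets the squared distance to target k equals the sum of squared activations plus 1 minus twice the activation on unit k, so this criterion selects the most active affix unit. A run therefore counts as a hit whenever the regular affix unit is the most strongly activated of the eleven.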

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) the greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
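The way an asymptoting power function turns additive inputs into an interaction can be checked numerically. The sketch below uses the example function y = 1 − x⁻² from the text; the two practice levels (1.5 and 3.0) and the size of the regularity increment (0.5) are hypothetical values chosen only for illustration.

```python
def performance(x):
    """The example power function y = 1 - x**-2 (used here for x >= 1)."""
    return 1.0 - x ** -2


def regularity_effect(frequency, increment=0.5):
    """Gain in performance from adding a constant regularity increment."""
    return performance(frequency + increment) - performance(frequency)


low = regularity_effect(1.5)   # low frequency of practice
high = regularity_effect(3.0)  # high frequency of practice
print(f"regularity effect at low frequency:  {low:.3f}")
print(f"regularity effect at high frequency: {high:.3f}")
```

With these values the same increment of regularity yields a gain of roughly 0.19 at the lower level of practice but only about 0.03 at the higher level, which is the pattern marked by the vertical bars of Fig. 5.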

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum, because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases to the summed units that should be inactive. However, due to the non-linearity of the input-output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
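The ceiling effect described here can be illustrated with a single output unit. The sketch assumes a logistic activation function, which is standard for back-propagation networks but is not specified in the text, and it uses arbitrary equal steps of summed input.

```python
import math


def logistic(net_input):
    """Assumed unit input-output function."""
    return 1.0 / (1.0 + math.exp(-net_input))


def error_reductions(net_inputs):
    """Drop in error (1 - activation) between successive summed inputs,
    for a unit whose target activation is 1.0."""
    errors = [1.0 - logistic(n) for n in net_inputs]
    return [before - after for before, after in zip(errors, errors[1:])]


# Equal increments of summed input, as if frequency and consistency
# each added the same amount of weight change.
steps = error_reductions([0.0, 1.0, 2.0, 3.0, 4.0])
print([round(s, 3) for s in steps])
```

Each successive unit of summed input removes less error than the one before it, so an item that already receives strong support (for instance from high frequency) gains little from additional support (for instance from consistency), and vice versa.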

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I


FIG. 1. Acquisition data for human learners of the MAL morphology. The four curves illustrate the interactions of regularity and frequency. The left-hand panel shows accuracy improving with practice. The right-hand panel shows vocal reaction time diminishing with practice. In this graph, as in Figs. 2 and 3, the frequency effect for regular items is assessed by comparing the two solid lines, and the frequency effect for irregular items lies in the difference between the two dotted lines.

There was significant improvement over blocks [F(14, 5939) = 132.00, P < 0.001]. The interaction of regularity by frequency was significant, with the frequency effect being larger for the irregular items [overall analysis: F(1, 5939) = 73.52, P < 0.001; by subjects: F(1, 6) = 12.41, P < 0.02; by words: F(1, 16) = 27.73, P < 0.001]. A significant interaction between regularity by frequency by block [F(14, 5939) = 2.22, P < 0.005] shows that the larger frequency effect for irregular items is maximal in the mid-order blocks; it is a lesser effect at early and later stages of learning (Fig. 1).

These patterns are confirmed in the somewhat noisier RT data, where the following sources of variation were significant, at least in the overall analysis: (a) frequency [overall analysis: F(1, 5123) = 650.74, P < 0.001; by subjects: F(1, 6) = 63.08, P < 0.001; by words: F(1, 16) = 73.96, P < 0.001]; (b) regularity [overall analysis: F(1, 5123) = 10.62, P < 0.001; by subjects: F(1, 6) = 3.26, ns; by words: F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis: F(1, 5123) = 20.92, P < 0.001; by subjects: F(1, 6) = 10.15, P < 0.02; by words: F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function better fits the data than does a linear function, the R² values for the power function fits being, respectively, HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency “floor” governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
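The comparison of power and linear fits can be illustrated with a minimal sketch. It uses synthetic latencies, not the experimental data: the scale (2000 ms), exponent (0.4), and noise level are arbitrary assumptions, and the function names are hypothetical. The power function is fitted by least squares on log-transformed values, which is one common choice and not necessarily the procedure used for Fig. 1.

```python
import numpy as np

def r_squared(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted`."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_linear(n, t):
    """Least-squares line T = slope * N + intercept; returns fitted values."""
    slope, intercept = np.polyfit(n, t, 1)
    return slope * n + intercept

def fit_power(n, t):
    """Power function T = B * N**(-alpha), fitted as a line in log-log space."""
    slope, intercept = np.polyfit(np.log(n), np.log(t), 1)
    return np.exp(intercept) * n ** slope

# Synthetic RTs over 15 blocks (assumed values, for illustration only).
blocks = np.arange(1, 16, dtype=float)
rng = np.random.default_rng(0)
rt = 2000.0 * blocks ** -0.4 * np.exp(rng.normal(0.0, 0.03, blocks.size))

r2_linear = r_squared(rt, fit_linear(blocks, rt))
r2_power = r_squared(rt, fit_power(blocks, rt))
print(f"linear R2 = {r2_linear:.2f}, power R2 = {r2_power:.2f}")
```

When the underlying curve decelerates with practice, the power fit accounts for more variance than the straight line, which is the pattern reported for all four item classes.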

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate (a) that there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition, (b) the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.), and (c) the size of the frequency effect on irregular items similarly diminishes with learning, but it does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship T = BN^(-α), where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling, whatever our performance measure.
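The convergence argument can be made concrete with a small worked sketch of the power function T = BN^(-α). The values of B and α below are arbitrary assumptions, as is the simplification that a high frequency item has simply received five times as many trials as a low frequency one; the function names are hypothetical.

```python
def latency(n_trials, b=2000.0, alpha=0.5):
    """Power law of practice: T = B * N**(-alpha). B and alpha are illustrative."""
    return b * n_trials ** -alpha

def frequency_effect(low_freq_trials, ratio=5, b=2000.0, alpha=0.5):
    """Latency gap between an item seen N times and one seen ratio * N times."""
    slow = latency(low_freq_trials, b, alpha)
    fast = latency(ratio * low_freq_trials, b, alpha)
    return slow - fast

# The same 5:1 exposure ratio produces a much smaller gap late in practice.
early_gap = frequency_effect(2)     # low frequency item seen twice
late_gap = frequency_effect(200)    # low frequency item seen 200 times
print(f"early gap = {early_gap:.1f}, late gap = {late_gap:.1f}")
```

Under these assumptions a hundredfold increase in practice shrinks the gap by a factor of 100**α (here 10), so any variable whose effect is tied to amount of practice becomes harder to detect as learners approach asymptote.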

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effects among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners’ acquisition curves.

CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that (a) they are neurally inspired, (b) they incorporate distributed representation and control of information, (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts, (d) they show graceful degradation, as do humans with language disorder, and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms, it exhibited a tendency to overgeneralise, and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no “rule”: “it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by ‘on-line’ generalisations from the stored exemplars” (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned (a) the adequacy of the adopted low-level phonological representations, whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low-level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or “imagen” (Paivio, 1986), each output unit to some speech output “logogen” (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects’ learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is “back propagation” (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP), which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
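The localist coding just described can be sketched as follows. This is a minimal illustration, not the authors' code: it fixes the unit numbering to the Appendix order (pictures 1–10 regular, 11–20 irregular), whereas the text states that the numbering was randomised across models, and all constant and function names are hypothetical. Indices are zero-based, so I-unit 22 is index 21 and O-unit 32 is index 31.

```python
import numpy as np

N_INPUT_UNITS = 22
N_OUTPUT_UNITS = 32
PLURAL_INPUT = 21                   # I-unit 22: plurality marker
TESTER_PICTURE = 20                 # I-unit 21: generalisation item (TesterP)
FIRST_IRREGULAR_AFFIX_OUTPUT = 21   # O-units 22-31: ten irregular affixes
REGULAR_AFFIX_OUTPUT = 31           # O-unit 32: the regular affix

def affix_output_for(picture):
    """Affix unit trained for a picture (assumes Appendix order: 0-9 regular)."""
    if 0 <= picture < 10:
        return REGULAR_AFFIX_OUTPUT
    if 10 <= picture < 20:
        return FIRST_IRREGULAR_AFFIX_OUTPUT + (picture - 10)
    raise ValueError("the generalisation item has no trained plural affix")

def encode_input(picture, plural):
    """One unit on for the picture, plus the plurality unit for plural trials."""
    pattern = np.zeros(N_INPUT_UNITS)
    pattern[picture] = 1.0
    if plural:
        pattern[PLURAL_INPUT] = 1.0
    return pattern

def encode_target(picture, plural):
    """Stem unit on; for plural trials the corresponding affix unit as well."""
    pattern = np.zeros(N_OUTPUT_UNITS)
    pattern[picture] = 1.0
    if plural:
        pattern[affix_output_for(picture)] = 1.0
    return pattern
```

Because every picture and every morpheme has exactly one unit, no two patterns overlap more than any other two, which is the property the encoding is meant to guarantee.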

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model are the average performance of these five examples.
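The stem-training procedure can be sketched with a minimal three-layer back-propagation network. The text specifies the layer sizes, the one-unit-per-item coding, and 500 epochs of 21 trials; everything else here is an assumption made for illustration: sigmoid units, bias terms, the initial weight range, the learning rate, squared-error gradients, and random trial order within an epoch. Class and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNet:
    """Input -> hidden -> output network trained by back propagation."""

    def __init__(self, n_in=22, n_hidden=8, n_out=32, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights (range is an assumption).
        self.w1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.uniform(-0.5, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        hidden = sigmoid(x @ self.w1 + self.b1)
        output = sigmoid(hidden @ self.w2 + self.b2)
        return hidden, output

    def train_trial(self, x, target):
        """One learning trial; returns the RMS error before the weight update."""
        hidden, output = self.forward(x)
        # Error signals for a squared-error loss with sigmoid units.
        delta_out = (output - target) * output * (1.0 - output)
        delta_hid = (delta_out @ self.w2.T) * hidden * (1.0 - hidden)
        self.w2 -= self.lr * np.outer(hidden, delta_out)
        self.b2 -= self.lr * delta_out
        self.w1 -= self.lr * np.outer(x, delta_hid)
        self.b1 -= self.lr * delta_hid
        return float(np.sqrt(np.mean((output - target) ** 2)))

def singular_patterns():
    """21 singular trials: picture i on, plurality unit off, stem i as target."""
    inputs = np.zeros((21, 22))
    targets = np.zeros((21, 32))
    for i in range(21):
        inputs[i, i] = 1.0
        targets[i, i] = 1.0
    return inputs, targets

def train_stems(net, epochs=500, seed=0):
    """Train on the singular forms; returns mean RMS error per epoch."""
    rng = np.random.default_rng(seed)
    inputs, targets = singular_patterns()
    history = []
    for _ in range(epochs):
        errors = [net.train_trial(inputs[i], targets[i])
                  for i in rng.permutation(21)]
        history.append(float(np.mean(errors)))
    return history

net = ThreeLayerNet(n_hidden=8)
history = train_stems(net)
print(f"mean RMS error: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
```

Varying `n_hidden` over 3, 5, 8, and 15 and averaging over five seeds would correspond to the four model classes and five examples described in the text.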

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
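The composition of one plural-training epoch can be checked with a short sketch. It assumes the Appendix order, in which frequencies alternate 5, 1, 5, 1 down the list, and represents each trial as a hypothetical (picture, is_plural) pair with zero-based picture indices; picture 20 is the generalisation item, which is only ever trained in the singular.

```python
import random

def plural_frequency(picture):
    """Presentations per epoch of a picture's plural (Appendix order assumed)."""
    return 5 if picture % 2 == 0 else 1

def build_plural_epoch(rng):
    """Return the 81 trials of one epoch as (picture, is_plural), shuffled."""
    trials = [(picture, False) for picture in range(21)]      # (a) singulars
    for picture in range(20):                                 # (b)-(e) plurals
        trials.extend([(picture, True)] * plural_frequency(picture))
    rng.shuffle(trials)
    return trials

epoch = build_plural_epoch(random.Random(0))
n_plural = sum(1 for _, is_plural in epoch if is_plural)
print(len(epoch), n_plural)  # 21 singular + 60 plural trials
```

The count works out as 21 + (10 × 5) + (10 × 1) = 81, matching the epoch size given in the text.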

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, that the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.
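The RMS error measure across the eleven plural affix units can be sketched as below. This is a minimal illustration only: the text does not say whether the squared errors were averaged over units before taking the root, so the division by the number of units is an assumption, as are the function names.

```python
import math


def rms_error(observed, target):
    """Root mean square difference between observed and target activations.

    Assumption: squared errors are averaged over the output units
    (here the 11 affix O-units 22-32) before the square root is taken.
    """
    if len(observed) != len(target):
        raise ValueError("observed and target must have the same length")
    squared = [(o - t) ** 2 for o, t in zip(observed, target)]
    return math.sqrt(sum(squared) / len(squared))


def class_mean_rms(observed_by_item, target_by_item, items):
    """Average the per-item RMS error over the items of one class."""
    errors = [rms_error(observed_by_item[i], target_by_item[i]) for i in items]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    regular_target = [1.0] + [0.0] * 10           # regular affix unit on, others off
    partly_learned = [0.6] + [0.0] * 10           # hypothetical network output
    print(rms_error(partly_learned, regular_target))
```

Because the measure covers all eleven units, it penalises unwanted activation on non-target affix units as well as weak activation on the target unit.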

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail. As is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.
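The pairwise correlation used to compare the two learning curves can be sketched as follows. The data points in the example are invented purely for illustration and are not the values plotted in Fig. 3; the function names are likewise hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(xs, ys):
    """Proportion of variance in ys predicted by xs (R squared)."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Hypothetical accuracy per block for one item class (not the paper's data).
    model_accuracy = [0.10, 0.35, 0.55, 0.70, 0.80, 0.86]
    human_accuracy = [0.20, 0.30, 0.60, 0.65, 0.85, 0.80]
    print(variance_explained(model_accuracy, human_accuracy))
```

In the comparison reported here the same computation would be applied to the matched model and human points across all blocks and item classes.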


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                Model Size
Measure                  3HU(a)    5HU     8HU     15HU
RMS error(b)
    M                    0.81      0.79    0.53    0.45
    SD                   0.43      0.50    0.45    0.32
Activation weight(c)
    M                    0.20      0.28    0.57    0.60
    SD                   0.44      0.44    0.52    0.35
N hits (/5)(d)           1         2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation, using a squared Euclidean distance metric.
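The “hit” criterion for the generalisation item, choosing the affix target pattern closest to the observed output by squared Euclidean distance, can be sketched as below. This is an illustrative sketch only: it assumes that each of the eleven affix targets is a pattern with a single active unit, with the regular affix in first position, and the example output vectors are hypothetical.

```python
N_AFFIX_UNITS = 11   # O-units 22-32
REGULAR_INDEX = 0    # assumption: first affix unit carries the regular plural


def one_hot(index, size=N_AFFIX_UNITS):
    """Target pattern with a single active unit."""
    return [1.0 if i == index else 0.0 for i in range(size)]


AFFIX_TARGETS = [one_hot(i) for i in range(N_AFFIX_UNITS)]


def squared_euclidean(a, b):
    """Sum of squared differences between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_affix(observed):
    """Index of the affix target pattern nearest to the observed output."""
    return min(
        range(N_AFFIX_UNITS),
        key=lambda i: squared_euclidean(observed, AFFIX_TARGETS[i]),
    )


def is_regular_hit(observed):
    """True if the regular plural pattern is the nearest target."""
    return closest_affix(observed) == REGULAR_INDEX


if __name__ == "__main__":
    # Hypothetical outputs for the novel item with the plural marker on.
    generalising = [0.6, 0.1, 0.0, 0.2, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0]
    not_generalising = [0.2, 0.0, 0.0, 0.7, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0]
    print(is_regular_hit(generalising), is_regular_hit(not_generalising))
```

Counting the runs for which `is_regular_hit` holds, out of the five exemplars of a given size, would give a measure of the kind reported as N hits in Table 1.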

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function; (b) the particular level of experience attained and thus the placement of comparison points on the learning curve; and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
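The way an asymptoting power function turns additive contributions into an interaction can be checked numerically. The sketch below is illustrative only: the frequency values (1 and 5 units of practice) and the size of the regularity increment (1 unit) are arbitrary assumptions, and performance is taken to be the power function above applied to their sum.

```python
def power_curve(x, n=1.0):
    """y = 1 - (x * n)^(-2): rises steeply at first, then asymptotes towards 1."""
    return 1.0 - (x * n) ** -2


def performance(frequency, regularity_increment, n=1.0):
    """Additive frequency and regularity contributions fed into the power curve."""
    return power_curve(frequency + regularity_increment, n)


def regularity_effect(frequency, n=1.0, increment=1.0):
    """Gain in performance from the same regularity increment at a given frequency."""
    return performance(frequency, increment, n) - performance(frequency, 0.0, n)


def frequency_effect(regularity_increment, low=1.0, high=5.0, n=1.0):
    """Gain in performance from low to high frequency for one regularity level."""
    return performance(high, regularity_increment, n) - performance(low, regularity_increment, n)


if __name__ == "__main__":
    for n in (1.0, 2.0):   # rescaling the horizontal axis leaves the pattern intact
        print(
            "n =", n,
            "regularity effect at low frequency:", round(regularity_effect(1.0, n), 3),
            "at high frequency:", round(regularity_effect(5.0, n), 3),
        )
    print("frequency effect, irregular:", round(frequency_effect(0.0), 3))
    print("frequency effect, regular:", round(frequency_effect(1.0), 3))
```

With these assumed values the same regularity increment produces a large gain at low frequency and a very small one at high frequency, and the frequency effect is correspondingly smaller for regular than for irregular items.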

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
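The ceiling effect described above can be illustrated with a single output unit. The sketch assumes a logistic input–output function, which the text does not specify, and the bias and the sizes of the frequency and consistency contributions to the summed input are arbitrary values chosen for the example.

```python
import math


def logistic(net_input):
    """Assumed unit input-output function: squashes summed input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net_input))


def unit_error(frequency_input, consistency_input, bias=-2.0):
    """Error on a unit whose target state is 1.0.

    Frequency and consistency contributions simply add in the summed input;
    the non-linearity is applied only afterwards.
    """
    return 1.0 - logistic(bias + frequency_input + consistency_input)


# Hypothetical contributions to the summed input.
FREQUENCY = {"low": 1.0, "high": 3.0}
CONSISTENCY = {"irregular": 0.0, "regular": 2.0}


def regularity_effect(frequency_level):
    """Error reduction due to consistency at one level of frequency."""
    f = FREQUENCY[frequency_level]
    return unit_error(f, CONSISTENCY["irregular"]) - unit_error(f, CONSISTENCY["regular"])


if __name__ == "__main__":
    for level in ("low", "high"):
        print(level, "frequency: regularity effect on error =", round(regularity_effect(level), 3))
```

Although the two factors are strictly additive in the summed input, the same consistency contribution removes more error for the low frequency unit than for the high frequency one, because the latter is already closer to its asymptote.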

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate IO associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined, so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter's dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don't help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting JB amp Rice ML (1993) Plural acquisition in children with specic languageimpairment Journal of Speech and Hearing Research 36 1236ndash1248

Paivio A (1986) Mental representations A dual coding approach Oxford UK OxfordUniversity Press

Palermo DS amp Howe HE (1970) An experimental analogy to the learning of past-tenseinection rules Journal of Verbal Learning and Verbal Behavior 9 410ndash416

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S amp Prince A (1988) On language and connectionism Analysis of a parallel

distributed processing model of language acquisition Cognition 29 195ndash247Pinker S amp Prince A (1994) Regular and irregular morphology and the psychological

status of rules of grammar In SD Lima RL Corrigan amp GK Iverson (Eds) The reality oflinguistic rules (pp 321ndash351) Amsterdam John Benjamins

Plaut DC McClelland JL Seidenberg MS amp Patterson KE (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plunkett K amp Marchman V (1991) U-shaped learning and frequency effects in amulti-layered perceptron Implications for child language acquisition Cognition 38 3ndash102

Plunkett K amp Marchman V (1993) From rote learning to system building Acquiring verbmorphology in children and connectionist nets Cognition 48 21ndash69

Plunkett K amp Nakisa RC (in press) A connectionist model of Arabic plural systemLanguage and Cognitive Processes

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture    Stem   Plural Form  Frequency  Regularity
car        garth  bugarth      5          R
bed        pid    bupid        1          R
lamp       lant   bulant       5          R
table      tib    butid        1          R
plane      poon   bupoon       5          R
ball       prill  buprill      1          R
train      dram   budram       5          R
house      hize   buhize       1          R
book       bisk   bubisk       5          R
broom      breen  bubreen      1          R
phone      feem   gofeem       5          I
umbrella   brol   gubrol       1          I
chair      charp  zecharp      5          I
horse      naig   zonaig       1          I
monkey     chonk  nuchonk      5          I
dog        woop   niwoop       1          I
elephant   fant   vefant       5          I
scissors   zoze   vuzoze       1          I
kite       kag    rekag        5          I
fish       pisc   ropisc       1          I


regularity [overall analysis: F(1, 5123) = 10.62, P < 0.001; by subjects: F(1, 6) = 3.26, ns; by words: F(1, 16) < 1, ns]; (c) block [F(14, 5123) = 28.72, P < 0.001]; (d) regularity by frequency [overall analysis: F(1, 5123) = 20.92, P < 0.001; by subjects: F(1, 6) = 10.15, P < 0.02; by words: F(1, 16) = 2.15, ns]; (e) regularity by frequency by block [F(14, 5123) = 1.95, P < 0.05].

It is clear from both panels of Fig. 1 that there was much less regularity effect for high frequency items than for low frequency items and, in counterpart, that the frequency effect was less for regular items. In particular, if the last four blocks of training are taken as being typical of more fluent performance, they demonstrate that ceiling effects on the accuracy data allow no frequency effect for the regular items, whereas the effect of frequency is maintained for the irregular ones. The RT curves in the right-hand panel of Fig. 1 are clearly non-linear. In each case a power function fits the data better than does a linear function, the R² values for the power function fits being, respectively: HiFreqReg 0.94, HiFreqIrreg 0.97, LoFreqReg 0.74, LoFreqIrreg 0.76. Thus the frequency by regularity interaction seems a natural result of asymptotic performance limits: for correctness, the 100% accuracy ceiling; for RT, the latency “floor” governed by the power law of practice. The curves in Fig. 1 give no hint of a sudden step in performance whereafter all regular items are produced with similar efficiency.
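The comparison of power and linear fits can be sketched as follows. This is a minimal illustration rather than the analysis reported here: the latencies are hypothetical values generated from an exact power function, and the power fit is obtained by least squares on log-transformed data, which is one common approach (the fitting method actually used is not stated). All function names are illustrative.

```python
import math

def least_squares(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total

def compare_fits(blocks, latencies):
    """Return (R2 of a linear fit, R2 of a power fit T = B * N**(-a))."""
    intercept, slope = least_squares(blocks, latencies)
    linear_pred = [intercept + slope * n for n in blocks]
    # Power fit: log T = log B - a * log N is linear in log-log space.
    log_b, exponent = least_squares([math.log(n) for n in blocks],
                                    [math.log(t) for t in latencies])
    power_pred = [math.exp(log_b) * n ** exponent for n in blocks]
    return r_squared(latencies, linear_pred), r_squared(latencies, power_pred)

# Hypothetical RTs (ms) over 15 blocks; NOT the data plotted in Fig. 1.
blocks = list(range(1, 16))
latencies = [3000.0 * n ** -0.4 for n in blocks]
r2_lin, r2_pow = compare_fits(blocks, latencies)
print(f"linear R2 = {r2_lin:.3f}, power R2 = {r2_pow:.3f}")
```

With real, noisy latencies neither fit would be perfect; the point of the sketch is only that a decelerating learning curve is captured better by the power function than by a straight line.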

Discussion of Human Data

Like Prasada et al. (1990), these data show a regularity by frequency interaction in the processing of morphology. However, contra Prasada et al., the present data, which concern the learning of morphology, demonstrate that: (a) there are frequency effects (both on accuracy and RT) for regular items in the early stages of acquisition; (b) the sizes of these effects diminish with learning (converging on a position at fluency as described by Prasada et al.); and (c) the size of the frequency effect on irregular items similarly diminishes with learning, but it does so more slowly.

These effects are readily explained by simple associative theories of learning. It is not necessary to invoke hybrid systems separating rule-governed regular morphosyntax from associatively stored irregulars. If there is one ubiquitous quantitative law of human learning, it is the power law of practice (Anderson, 1982). The critical feature in this relationship is not just that performance, typically time, improves with practice, but that the relationship involves the power law in which the amount of improvement decreases as a function of increasing practice or frequency. Anderson (1982) showed that this function applies to a variety of tasks, including, for example, cigar rolling, syllogistic reasoning, book writing, industrial production, reading inverted text, and lexical decision. For the case of language acquisition, Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship T = BN^{-a}, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, so previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling whatever our performance measure.
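The convergence of separated functions under the power law can be illustrated numerically. The sketch below assumes the form T = B * N**(-a) with arbitrary, hypothetical constants (B = 2000, a = 0.5) and the 5:1 exposure ratio of the training schedule; it is not fitted to the data in Fig. 1.

```python
def latency(n_trials, b=2000.0, a=0.5):
    """Power law of practice, T = B * N**(-a); B and a are illustrative."""
    return b * n_trials ** (-a)

def frequency_effect(block, hi_rate=5, lo_rate=1):
    """Latency cost of a low-frequency item relative to a high-frequency
    item after `block` blocks, given exposures per block for each."""
    return latency(lo_rate * block) - latency(hi_rate * block)

# The same fivefold difference in exposure buys less and less as
# practice accumulates, so the two curves converge.
effects = [frequency_effect(block) for block in range(1, 16)]
for block in (1, 5, 15):
    print(block, round(effects[block - 1], 1))
```

The absolute size of the frequency effect shrinks monotonically across blocks even though the ratio of exposures stays constant, which is the sense in which effects become harder to detect near asymptote.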

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effect among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners’ acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements, and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that: (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no “rule”: “it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by ‘on-line’ generalisations from the stored exemplars” (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when in simulation 2 the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned: (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens, in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or “imagen” (Paivio, 1986), each output unit to some speech output “logogen” (Morton, 1979). We acknowledge that these parts of the model are grossly simplified and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects’ learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is “back propagation” (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP), which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
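The localist coding just described can be sketched as a pair of helper functions. This is an illustrative sketch only: the function and constant names are hypothetical, and it uses the fixed 1-based unit numbering given in the text, whereas the actual simulations randomised unit allocation across models.

```python
N_PICTURES = 21          # I-units 1-20: Appendix pictures; I-unit 21: TesterP
N_INPUTS = 22            # I-unit 22 codes plurality
N_OUTPUTS = 32           # O-units 1-21: stems; 22-31: irregular affixes; 32: regular
REGULAR_AFFIX_UNIT = 32

def input_pattern(picture, plural):
    """Input vector with one picture unit on, plus the plurality unit."""
    if not 1 <= picture <= N_PICTURES:
        raise ValueError("picture must be in 1..21")
    vec = [0.0] * N_INPUTS
    vec[picture - 1] = 1.0
    vec[N_INPUTS - 1] = 1.0 if plural else 0.0
    return vec

def target_pattern(picture, affix_unit=None):
    """Target vector: the stem unit, plus one affix unit for plural trials."""
    if not 1 <= picture <= N_PICTURES:
        raise ValueError("picture must be in 1..21")
    vec = [0.0] * N_OUTPUTS
    vec[picture - 1] = 1.0
    if affix_unit is not None:
        if not 22 <= affix_unit <= N_OUTPUTS:
            raise ValueError("affix unit must be in 22..32")
        vec[affix_unit - 1] = 1.0
    return vec

# Plural of picture 1 with the regular affix: two input units on,
# two output units reinforced.
x = input_pattern(1, plural=True)
t = target_pattern(1, affix_unit=REGULAR_AFFIX_UNIT)
```

Because every picture and every morpheme has exactly one unit, no two items overlap in their representation; any similarity structure that emerges must come from the input–output mappings and their frequencies.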

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
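A minimal sketch of this kind of stem training is given below, using only the Python standard library. It is not the simulator used for the reported results. Several details are assumptions made for illustration because the text does not specify them: logistic (sigmoid) units, a bias weight per unit, initial weights drawn uniformly from [-0.5, 0.5], a learning rate of 0.5, no momentum, shuffled trial order, and 200 rather than 500 epochs to keep the run short. The class name `ThreeLayerNet` is hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    """Input -> hidden -> output network trained by back propagation."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # One extra weight per unit acts as a bias (an assumption).
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hidden)]
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                     for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb)))
              for row in self.w_ih] + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb)))
               for row in self.w_ho]
        return xb, hb, out

    def train_trial(self, x, target, lr):
        xb, hb, out = self.forward(x)
        # Output error signal: (target - output) times the sigmoid slope.
        d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, out)]
        # Hidden error signal: output errors propagated back through the
        # hidden-to-output weights (computed before those weights change).
        d_hid = [hb[j] * (1.0 - hb[j]) *
                 sum(d_out[k] * self.w_ho[k][j] for k in range(len(out)))
                 for j in range(len(hb) - 1)]
        for k, row in enumerate(self.w_ho):
            for j in range(len(row)):
                row[j] += lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_ih):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]

def rms_error(net, patterns):
    """Root mean square error over all output units and patterns."""
    total, count = 0.0, 0
    for x, target in patterns:
        out = net.forward(x)[2]
        total += sum((t - y) ** 2 for t, y in zip(target, out))
        count += len(target)
    return math.sqrt(total / count)

def singular_patterns():
    """21 singular trials: one picture unit on, I-unit 22 off, stem unit target."""
    patterns = []
    for picture in range(21):
        x = [0.0] * 22
        x[picture] = 1.0
        target = [0.0] * 32
        target[picture] = 1.0
        patterns.append((x, target))
    return patterns

rng = random.Random(0)
net = ThreeLayerNet(22, 8, 32, rng)      # the 8HU size of model
patterns = singular_patterns()
rms_before = rms_error(net, patterns)
for epoch in range(200):
    rng.shuffle(patterns)
    for x, target in patterns:
        net.train_trial(x, target, lr=0.5)
rms_after = rms_error(net, patterns)
print(round(rms_before, 3), round(rms_after, 3))
```

Changing the second argument of `ThreeLayerNet` to 3, 5, or 15 gives the other sizes of model; repeating the run with different seeds corresponds to the five examples averaged for each size.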

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
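The composition of one plural-training epoch can be checked with a short sketch. The item list follows the order of the Appendix (ten regular items then ten irregular items, alternating frequency 5 and 1); the assignment of irregular affixes to O-units 22–31 in that order is an arbitrary choice made here for illustration, and the function names are hypothetical.

```python
import random

REGULAR_AFFIX_UNIT = 32

def build_items():
    """Pictures 1-20 in Appendix order with frequency, regularity, affix unit."""
    items = []
    next_irregular_unit = 22
    for picture in range(1, 21):
        regular = picture <= 10
        if regular:
            affix_unit = REGULAR_AFFIX_UNIT
        else:
            affix_unit = next_irregular_unit   # one unique affix per irregular item
            next_irregular_unit += 1
        items.append({"picture": picture,
                      "frequency": 5 if picture % 2 == 1 else 1,
                      "regular": regular,
                      "affix_unit": affix_unit})
    return items

def plural_epoch(items, rng):
    """Return the 81 (picture, affix_unit) trials of one epoch in random order.
    An affix_unit of None marks a singular trial (I-unit 22 off)."""
    trials = [(picture, None) for picture in range(1, 22)]   # (a) 21 singulars
    for item in items:                                       # (b)-(e) plurals
        trials += [(item["picture"], item["affix_unit"])] * item["frequency"]
    rng.shuffle(trials)
    return trials

items = build_items()
epoch = plural_epoch(items, random.Random(0))
n_regular_affix = sum(1 for _, affix in epoch if affix == REGULAR_AFFIX_UNIT)
print(len(epoch), n_regular_affix)
```

Counting trials by affix makes one property of the schedule explicit: the regular affix is reinforced on 30 trials per epoch, whereas each irregular affix is reinforced on only 5 or 1.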

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations: F(1, 16) = 20.80, P < 0.0005; by words: F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations: F(1, 16) = 9.07, P < 0.01; by words: F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations: F(1, 16) = 4.85, P < 0.05; by words: F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations: F(14, 224) = 68.03, P < 0.0001; by words: F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations: F(14, 224) = 36.75, P < 0.0001; by words: F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations: F(14, 224) = 18.93, P < 0.0001; by words: F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations: F(14, 224) = 16.11, P < 0.0001; by words: F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
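The two performance measures can be stated compactly in code. The sketch below is illustrative: it assumes a 32-element output vector with the 1-based unit numbering used in the text, and it assumes that the RMS error is the square root of the mean squared difference over the 11 affix units (the normalisation is not spelled out in the text). The example output vector is hypothetical.

```python
import math

AFFIX_UNITS = range(22, 33)      # O-units 22-32 (1-based)

def affix_rms_error(output, target_affix_unit):
    """RMS difference between observed and target activation over the affix units."""
    squared = [((1.0 if unit == target_affix_unit else 0.0) - output[unit - 1]) ** 2
               for unit in AFFIX_UNITS]
    return math.sqrt(sum(squared) / len(squared))

def affix_accuracy(output, target_affix_unit):
    """Activation (0 to 1) on the single O-unit for the correct affix."""
    return output[target_affix_unit - 1]

# Hypothetical test output: strong activation on the regular affix
# (O-unit 32) and weak spurious activation on one irregular affix (O-unit 22).
output = [0.0] * 32
output[31] = 0.8
output[21] = 0.1
rms = affix_rms_error(output, 32)
acc = affix_accuracy(output, 32)
print(round(rms, 4), acc)
```

The RMS measure penalises both the shortfall on the correct unit and the spurious activation elsewhere, whereas the accuracy measure looks only at the correct unit, which is why the two are closely but not perfectly related.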

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency], with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, ns; (b) human/model by frequency, F(1, 32) = 0.47, ns; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                   Model Size
Measure                   3HU(a)   5HU    8HU    15HU
RMS error(b)
  M                       0.81     0.79   0.53   0.45
  SD                      0.43     0.50   0.45   0.32
Activation weight(c)
  M                       0.20     0.28   0.57   0.60
  SD                      0.44     0.44   0.52   0.35
N hits (/5)(d)            1        2      3      4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
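The “hit” criterion, choosing the affix whose target pattern is closest to the observed output by squared Euclidean distance, can be sketched as follows. The observed activations are hypothetical, and the regular affix is placed in the first position only to match the target pattern (10000000000) quoted in the text; the function names are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_affix(affix_activations):
    """Position (0-based) of the one-unit-on target pattern nearest to the
    observed activations across the 11 affix units."""
    n = len(affix_activations)
    best_index, best_distance = None, float("inf")
    for i in range(n):
        target = [1.0 if j == i else 0.0 for j in range(n)]
        distance = squared_distance(affix_activations, target)
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index

REGULAR_POSITION = 0   # regular affix first, as in the pattern 10000000000

# Hypothetical activations across the affix units for TesterP as a plural.
observed = [0.55, 0.10, 0.05, 0.30, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.02]
is_hit = closest_affix(observed) == REGULAR_POSITION
print(closest_affix(observed), is_hit)
```

Because every candidate target has exactly one unit on, the squared distance to target i equals the summed squares of the observed activations minus twice the activation on unit i plus one, so the nearest target is always the one whose unit is most active; the criterion therefore amounts to asking which affix unit wins.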

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, namely a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
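As a rough numerical illustration of the asymptoting power functions above (it is not part of the original analysis), the following Python sketch evaluates y = 1 − (xn)^−2 at a low and a high level of practice and measures the gain produced by one constant increment. The practice levels, the size of the increment, the values of n, and the function names are all illustrative assumptions.

```python
def performance(x, n=1.0):
    """Asymptoting power function y = 1 - (x * n) ** -2 from the text (n > 0)."""
    return 1.0 - (x * n) ** -2


def regularity_effect(x, increment=0.5, n=1.0):
    """Gain in performance from adding a constant increment to practice level x."""
    return performance(x + increment, n) - performance(x, n)


# Assumed practice levels standing in for low and high frequency items.
LOW, HIGH = 1.5, 4.0

for n in (1.0, 2.0, 4.0):
    low_gain = regularity_effect(LOW, n=n)
    high_gain = regularity_effect(HIGH, n=n)
    print(f"n={n}: gain at low frequency {low_gain:.4f}, at high frequency {high_gain:.4f}")
```

Under these assumptions the same increment buys a larger gain at the low practice level than at the high one for every n tried, which is the sense in which rescaling the horizontal axis leaves the shape of the interaction unchanged.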

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input–output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input–output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum, because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
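The ceiling effect described in this analysis can be sketched with a single output unit. The example below is only an illustration under stated assumptions: it uses the standard logistic function as the unit's input–output function (the text does not say which function was used), and the numerical contributions of frequency and consistency to the summed input are arbitrary.

```python
import math


def logistic(net_input):
    """Standard logistic activation, assumed here as the unit's input-output function."""
    return 1.0 / (1.0 + math.exp(-net_input))


def error(frequency, consistency):
    """Error against a target of 1.0 when both factors simply add in the summed input."""
    return 1.0 - logistic(frequency + consistency)


# Assumed contributions to the summed input, in arbitrary units.
LOW_FREQ, HIGH_FREQ = 0.5, 3.0
IRREGULAR, REGULAR = 0.0, 2.0

# Regularity effect = reduction in error gained from the consistency contribution.
effect_low = error(LOW_FREQ, IRREGULAR) - error(LOW_FREQ, REGULAR)
effect_high = error(HIGH_FREQ, IRREGULAR) - error(HIGH_FREQ, REGULAR)

print(f"regularity effect at low frequency:  {effect_low:.3f}")
print(f"regularity effect at high frequency: {effect_high:.3f}")
```

Although the two factors are purely additive in the summed input, the regularity effect on error is larger for the low frequency item, because the high frequency item is already close to the unit's asymptote.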

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I–O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined, so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity
car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
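To make the token frequencies implied by this table concrete, the following Python sketch tallies how often each plural affix would be used if every item were presented at its listed frequency. It is an illustration only: treating the Frequency column as a relative number of presentations per cycle is an assumption, and the names ITEMS, plural_affix, and affix_tokens are introduced here for the example.

```python
from collections import Counter

# Rows of the appendix table: (picture, stem, plural form, frequency, regularity).
ITEMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def plural_affix(stem, plural):
    """Prefix of the plural form, taken by length rather than by matching the stem."""
    return plural[: len(plural) - len(stem)]


# Sum the presentation frequency of every item that uses a given affix.
affix_tokens = Counter()
for _picture, stem, plural, frequency, _regularity in ITEMS:
    affix_tokens[plural_affix(stem, plural)] += frequency

for affix, count in affix_tokens.most_common():
    print(affix, count)
```

On this reading the regular affix bu- accumulates tokens across all ten regular items, whereas each irregular affix is tied to the frequency of a single item.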


acquisition. Kirsner (1994) has shown that lexical recognition processes (both for speech perception and reading) and lexical production processes (articulation and writing) are independently governed by the relationship $T = BN^{-a}$, where T is some measure of latency of response and N the number of trials of practice. DeKeyser (1997) shows that automatisation of comprehension and production performance involving explicitly learned second-language morphosyntax separately follow independent skill-specific power functions. Ellis (1996) describes the general implications of the power law for second-language acquisition.
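A practical consequence of the relationship T = BN^−a is that latency plotted against practice is a straight line in log–log coordinates, so B and a can be estimated by ordinary least squares on the logged values. The Python sketch below illustrates this with synthetic, noise-free latencies; the parameter values (B = 2000, a = 0.4), the trial counts, and the function names are assumptions made for the example, not values from any of the studies cited.

```python
import math


def latency(n_trials, b=2000.0, a=0.4):
    """Power law of practice T = B * N ** -a (B and a are assumed values)."""
    return b * n_trials ** -a


def fit_power_law(trials, latencies):
    """Least-squares line in log-log space; returns the estimates (B, a)."""
    xs = [math.log(n) for n in trials]
    ys = [math.log(t) for t in latencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # log T = log B - a * log N, so B = exp(intercept) and a = -slope.
    return math.exp(intercept), -slope


trials = [1, 2, 4, 8, 16, 32]
observed = [latency(n) for n in trials]
b_hat, a_hat = fit_power_law(trials, observed)
print(f"B = {b_hat:.1f}, a = {a_hat:.3f}")
```

With real latency data the fit would of course be noisy; the sketch only shows the form of the calculation.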

The human acquisition data in Fig. 1 clearly follow the power law of learning. Thus, as performance approaches asymptote, previously separated functions tend to converge. High frequency items are closer to asymptote. Therefore, whereas performance levels for regular and irregular items are clearly distinguishable at low frequencies, they are much less distinct at high frequencies. This comes as no surprise to us when we consider the ceiling imposed by 100% accuracy. But the power law of practice equally implies an asymptotic ceiling whatever our performance measure.

The power law entails that the contribution of any potential independent variable affecting performance will be more difficult to demonstrate with high-frequency items in practised individuals. This is certainly the case in reading. For example, while spelling and grapheme–phoneme regularity have clear effects on low frequency items, they show little or no effect among high frequency words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Our learning data illustrate the same principle operating in the acquisition of morphology. It is not the case that there is no regularity effect on high frequency items (or, concomitantly, no frequency effect on regular items); it is simply that such effects are much smaller closer to asymptote and thus are likely to be swamped by random error. Indeed, high frequency regular inflected forms do exhibit a small (but non-significant) advantage over low frequency forms in naturally occurring errors, and they can be shown to have a larger (significant) advantage in a more controlled experimental task in which subjects produced the past-tense forms of regular English verbs (Stemberger & MacWhinney, 1986).

We have shown that the interaction of frequency and regularity results from developmental trends that are consistent with the ubiquitous descriptive law of associative learning. In the next section we will demonstrate how such data can be generated by a very general mechanism of associative learning. When presented with the same materials at the same relative frequencies of exposure, a standard three-layer feed-forward connectionist model closely simulates our language-learners’ acquisition curves.


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that (a) they are neurally inspired, (b) they incorporate distributed representation and control of information, (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts, (d) they show graceful degradation, as do humans with language disorder, and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms, it exhibited a tendency to overgeneralise, and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no “rule”: “it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by ‘on-line’ generalisations from the stored exemplars” (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research to which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and in turn it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies > 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when, in simulation 2, the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens in representative ratios of regular and irregular forms in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently concerned neither with low-level feature perception nor with the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or “imagen” (Paivio, 1986), each output unit to some speech output “logogen” (Morton, 1979). We acknowledge that these parts of the model are grossly simplified and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others (other, that is, than is determined by the frequency of the input–output mappings). This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects’ learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is “back propagation” (Rumelhart, Hinton, & Williams, 1986) where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered and remained constant was that the same O-unit was always reinforced whenever a particular I-unit was activated.
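As an illustration only, the localist coding just described might be sketched as follows. The unit numbers follow the text; the assignment of particular pictures to the regular and irregular classes is hypothetical (the real assignment is given in the Appendix), as are the function names.

```python
# Sketch of the one-unit-per-item coding described in the text.
# Unit numbers are 1-based as in the text; list indices are 0-based.
N_INPUT, N_OUTPUT = 22, 32
PLURAL_UNIT = 22          # I-unit 22 codes plurality
REGULAR_AFFIX_UNIT = 32   # O-unit 32 is the regular plural affix

# HYPOTHETICAL class membership: pictures 6-10 and 16-20 are treated as the
# irregular items, each paired with its own affix unit among O-units 22-31.
IRREGULAR_PICTURES = list(range(6, 11)) + list(range(16, 21))
IRREGULAR_AFFIX = {p: 22 + i for i, p in enumerate(IRREGULAR_PICTURES)}


def encode_input(picture, plural):
    """Return the 22-element input vector for picture 1-21."""
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0
    if plural:
        vec[PLURAL_UNIT - 1] = 1.0
    return vec


def encode_target(picture, plural):
    """Return the 32-element target: the stem unit plus, for plurals, one affix unit.

    Picture 21 (TesterP) was never trained as a plural; its plural target here
    is the regular pattern against which the test output is compared.
    """
    vec = [0.0] * N_OUTPUT
    vec[picture - 1] = 1.0  # the stem O-unit shares the picture's number
    if plural:
        affix = IRREGULAR_AFFIX.get(picture, REGULAR_AFFIX_UNIT)
        vec[affix - 1] = 1.0
    return vec
```

Because every item is a single unit of the same form, no two input patterns overlap except through the shared plurality unit, which is the property the text relies on.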

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.
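To make the link between HUs and connections concrete, the following sketch counts the weights in each size of model. It assumes full connectivity between adjacent layers and leaves out bias weights, neither of which is specified in the text.

```python
# Connection counts for the four model sizes, ASSUMING a fully connected
# three-layer network (22 input, H hidden, 32 output units) and no biases.
N_INPUT, N_OUTPUT = 22, 32


def n_connections(n_hidden):
    """Input-to-hidden plus hidden-to-output weights."""
    return N_INPUT * n_hidden + n_hidden * N_OUTPUT


counts = {h: n_connections(h) for h in (3, 5, 8, 15)}
print(counts)  # prints {3: 162, 5: 270, 8: 432, 15: 810}
```

Under these assumptions each additional hidden unit adds 54 connections.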

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
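The stem training procedure can be sketched, under stated assumptions, as a small back-propagation network. The text specifies only the unit counts, the 21 trials per epoch, and the 500 epochs; the logistic units, bias weights, learning rate, initial weight range, and shuffled trial order below are all assumptions made for illustration.

```python
import math
import random

N_IN, N_OUT = 22, 32


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_net(n_hidden, rng):
    """Random weights in [-0.5, 0.5]; the last entry of each row is a bias (assumed)."""
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(N_OUT)]
    return w_ih, w_ho


def forward(net, x):
    """Return (hidden activations, output activations) for one input pattern."""
    w_ih, w_ho = net
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_ih]
    hb = hidden + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_ho]
    return hidden, out


def train_trial(net, x, target, rate=0.5):
    """One back-propagation step on the squared error for a single pattern."""
    w_ih, w_ho = net
    hidden, out = forward(net, x)
    xb, hb = x + [1.0], hidden + [1.0]
    # Error signals are computed before any weight is changed.
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
    d_hid = [h * (1.0 - h) * sum(d_out[k] * w_ho[k][j] for k in range(N_OUT))
             for j, h in enumerate(hidden)]
    for k, row in enumerate(w_ho):
        for j in range(len(row)):
            row[j] += rate * d_out[k] * hb[j]
    for j, row in enumerate(w_ih):
        for i in range(len(row)):
            row[i] += rate * d_hid[j] * xb[i]


def singular_patterns():
    """The 21 singular trials: I-unit i on, I-unit 22 off, target O-unit i on."""
    patterns = []
    for i in range(21):
        x = [0.0] * N_IN
        x[i] = 1.0
        t = [0.0] * N_OUT
        t[i] = 1.0
        patterns.append((x, t))
    return patterns


def mean_rms(net, patterns):
    """RMS error over all 32 O-units, averaged over patterns."""
    total = 0.0
    for x, t in patterns:
        _, out = forward(net, x)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(t, out)) / N_OUT)
    return total / len(patterns)


def train_stems(n_hidden=8, epochs=500, seed=0):
    """Train one exemplar model on the singular forms and return its weights."""
    rng = random.Random(seed)
    net = init_net(n_hidden, rng)
    patterns = singular_patterns()
    for _ in range(epochs):
        rng.shuffle(patterns)  # trial order within an epoch is an assumption
        for x, t in patterns:
            train_trial(net, x, t)
    return net
```

Running `train_stems` with five different seeds for each of 3, 5, 8, and 15 hidden units would correspond to the five exemplars per model size, although the weights produced here should not be expected to reproduce the reported values.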

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
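The composition of one plural training epoch can be checked with a short sketch. The trial counts come from the text; which picture numbers fall in which class is a hypothetical assignment (the real one is in the Appendix), and the helper names are illustrative.

```python
import random
from collections import Counter

# HYPOTHETICAL assignment of picture numbers to the four classes.
CLASSES = {
    "HiFreqReg": range(1, 6),
    "HiFreqIrreg": range(6, 11),
    "LoFreqReg": range(11, 16),
    "LoFreqIrreg": range(16, 21),
}
PRESENTATIONS = {"HiFreqReg": 5, "HiFreqIrreg": 5, "LoFreqReg": 1, "LoFreqIrreg": 1}


def plural_epoch(rng):
    """Return one shuffled epoch as (picture, is_plural) trials."""
    trials = [(picture, False) for picture in range(1, 22)]  # (a) 21 singulars
    for name, pictures in CLASSES.items():                    # (b)-(e) plurals
        for picture in pictures:
            trials.extend([(picture, True)] * PRESENTATIONS[name])
    rng.shuffle(trials)
    return trials


def affix_tokens(trials):
    """Count plural-affix tokens per epoch: 'regular' or one label per irregular picture."""
    regular = set(CLASSES["HiFreqReg"]) | set(CLASSES["LoFreqReg"])
    return Counter("regular" if p in regular else f"irregular-{p}"
                   for p, plural in trials if plural)


epoch = plural_epoch(random.Random(0))
tokens = affix_tokens(epoch)
print(len(epoch), tokens["regular"], tokens["irregular-6"], tokens["irregular-16"])
# prints: 81 30 5 1
```

The counts show why regularity behaves like frequency in this design: the single regular affix is reinforced 30 times per epoch, whereas each irregular affix is reinforced only 5 times (high frequency) or once (low frequency).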

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error, calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.
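The RMS measure reported for Fig. 2 might be computed along the following lines. The normalisation (mean over the 11 affix units before taking the square root) is an assumption, since the text does not spell out the formula, and the function names are illustrative.

```python
import math

AFFIX_UNITS = range(21, 32)  # zero-based indices of O-units 22-32


def affix_rms(output, target):
    """RMS error over the 11 plural-affix units of two 32-element vectors."""
    squared = [(output[i] - target[i]) ** 2 for i in AFFIX_UNITS]
    return math.sqrt(sum(squared) / len(squared))


def class_mean_rms(outputs, targets):
    """Average affix_rms over the items of one class (e.g. the five HiFreqReg items)."""
    errors = [affix_rms(o, t) for o, t in zip(outputs, targets)]
    return sum(errors) / len(errors)
```

Because the measure is taken over all affix units, it penalises both a weak activation on the correct affix and spurious activation on the incorrect ones.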

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency, and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = -0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.
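The accuracy metric and the pairwise correlation between model and human curves can be sketched as follows. The function names are illustrative, and no data from the study are reproduced here; the two point lists would be the matched class-by-block values from the two graphs.

```python
import math


def accuracy(output, target_affix_unit):
    """Activation (0-1) on the O-unit (1-based number) of the correct plural affix."""
    return output[target_affix_unit - 1]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(model_points, human_points):
    """Proportion of variance in one series predicted linearly from the other."""
    return pearson_r(model_points, human_points) ** 2
```

A value such as the reported R² = 0.78 would be obtained by passing the aligned model and human accuracy points to `r_squared`.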

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                   3HU(a)    5HU     8HU     15HU

RMS error(b)
  M                       0.81      0.79    0.53    0.45
  SD                      0.43      0.50    0.45    0.32

Activation weight(c)
  M                       0.20      0.28    0.57    0.60
  SD                      0.44      0.44    0.52    0.35

N hits (/5)(d)            1         2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
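The “hit” criterion used for the generalisation item, choosing the affix whose target pattern lies closest to the observed output by squared Euclidean distance, can be sketched as follows. The ordering of units within the 11-element affix slice and the function names are assumptions made for illustration.

```python
N_AFFIX = 11            # O-units 22-32
REGULAR = N_AFFIX - 1   # zero-based position of O-unit 32 within that slice


def affix_patterns():
    """The 11 candidate target patterns: one affix unit on, the rest off."""
    return [[1.0 if i == k else 0.0 for i in range(N_AFFIX)] for k in range(N_AFFIX)]


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chosen_affix(affix_output):
    """Index of the target pattern closest to the observed affix activations."""
    patterns = affix_patterns()
    return min(range(N_AFFIX), key=lambda k: sq_euclidean(affix_output, patterns[k]))


def is_regular_hit(affix_output):
    """True when the regular plural pattern is the closest target pattern."""
    return chosen_affix(affix_output) == REGULAR
```

With one-of-eleven target patterns this criterion amounts to picking the most active affix unit, so a model can score a hit even when its activation on the regular unit is well below 1.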

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones, regular items were learned significantly faster than irregular ones, and there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones. This is qualified by the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
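A small worked example may make the asymptote argument concrete. It uses the first power function given above, y = 1 - x^(-2); the particular frequencies (1 and 5) and the unit regularity increment are illustrative choices, not values from the experiment.

```python
def performance(x):
    """The asymptoting power function from the text: y = 1 - x**-2, for x >= 1."""
    return 1.0 - x ** -2


def regularity_effect(frequency, increment=1.0):
    """Gain from adding a constant regularity increment to a given frequency."""
    return performance(frequency + increment) - performance(frequency)


low = regularity_effect(1.0)   # 0.75: large effect at a low frequency of practice
high = regularity_effect(5.0)  # 11/900, about 0.012: small effect at a high frequency
print(low, high)
```

The same additive increment therefore produces an effect roughly sixty times larger at the low-frequency point than at the high-frequency point, which is the frequency by regularity interaction pictured in Fig. 5.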

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active and decreases to the summed units that should be inactive. However, due to the non-linearity of the input-output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
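The ceiling effect described here can be illustrated with a single output unit. The sketch assumes the standard logistic activation function, which the text does not name explicitly, and treats equal steps of summed input as stand-ins for equal amounts of extra frequency or consistency.

```python
import math


def logistic(net):
    """Logistic (sigmoid) activation, ASSUMED here as the unit input-output function."""
    return 1.0 / (1.0 + math.exp(-net))


def error(net, target=1.0):
    """Distance of a unit's state from its target for a given summed input."""
    return abs(target - logistic(net))


# Equal increments in summed input yield progressively smaller error reductions.
nets = [0.0, 1.0, 2.0, 3.0, 4.0]
decrements = [error(a) - error(b) for a, b in zip(nets, nets[1:])]
print([round(d, 3) for d in decrements])
```

Each successive unit of summed input reduces error by less than the one before, so an item that already receives strong support (for instance, a high frequency item) gains little from the additional support that consistency provides.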

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that, while the German particle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German particle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). "Et in Amygdala Ego"?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter's dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don't help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I


CONNECTIONIST SIMULATIONS

Connectionist models allow the assessment of just how much of language acquisition can be done by extraction of probabilistic patterns of grammatical and morphological regularities. Since the only relation in connectionist models is strength of association between nodes, they are excellent modelling media in which to investigate the formation of associations (both between surface-form elements and between these and emergent, more abstract internal representations) as a result of exposure to language. The advantages of connectionist models over traditional symbolic models are that (a) they are neurally inspired; (b) they incorporate distributed representation and control of information; (c) they are data-driven, with prototypical representations emerging as a natural outcome of the learning process rather than being prespecified and innately given by the modellers as in more nativist cognitive accounts; (d) they show graceful degradation, as do humans with language disorder; and (e) they are in essence models of learning and acquisition rather than static descriptions.

There have been a number of compelling PDP models of the acquisition of morphology. The pioneers were Rumelhart and McClelland (1986), who showed that a simple learning model reproduced, to a remarkable degree, the characteristics of young children learning the morphology of the past tense in English: the model generated the so-called U-shaped learning curve for irregular forms; it exhibited a tendency to overgeneralise; and, in the model as in children, different past-tense forms for the same word could co-exist at the same time. Yet there was no "rule": "it is possible to imagine that the system simply stores a set of rote-associations between base and past-tense forms with novel responses generated by 'on-line' generalisations from the stored exemplars" (Rumelhart & McClelland, 1986, p. 267). This original past-tense model was very influential. It laid the foundations for the connectionist approach to language research which this special issue attests; it generated a large number of criticisms (Lachter & Bever, 1988; Pinker & Prince, 1988), some of which are undeniably valid; and, in turn, it thus spawned a number of revised and improved PDP models of different aspects of the acquisition of the English past tense (e.g. Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1991).

Of these newer models, only that of Daugherty and Seidenberg (1992, 1994) addressed the regularity by frequency interaction. Their model was a three-layer feed-forward network mapping the input of phonological structure of present tense, encoded over 120 phonological units representing a CCCVVCCC template for English monosyllables, onto an output of similarly coded phonological structure of past tense form. Simulation 1, where the model was trained on all present–past tense pairs with Francis and Kucera frequencies 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when in simulation 2 the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which cryptoembody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens in representative ratios of regular and irregular forms, in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous, since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP), which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
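The localist coding just described can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the simulations: the function names `encode_input` and `encode_target` are hypothetical, and the fixed unit numbering simply follows the description in the text, whereas the reported simulations randomised it across models.

```python
N_INPUT = 22             # I-units 1-21 are pictures, I-unit 22 codes plurality
N_OUTPUT = 32            # O-units 1-21 are stems, 22-31 irregular affixes, 32 regular affix
PLURAL_UNIT = 22
REGULAR_AFFIX_UNIT = 32


def encode_input(picture, plural):
    """Localist input vector: one picture unit on, plus I-unit 22 for plurals."""
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0
    if plural:
        vec[PLURAL_UNIT - 1] = 1.0
    return vec


def encode_target(picture, affix_unit=None):
    """Target vector: the stem unit, plus one affix unit (22-32) for plurals."""
    vec = [0.0] * N_OUTPUT
    vec[picture - 1] = 1.0
    if affix_unit is not None:
        vec[affix_unit - 1] = 1.0
    return vec


# Plural of picture 1 (assumed here to be a regular item):
# stem unit 1 and the regular affix unit 32 are the targets.
x = encode_input(1, plural=True)
t = encode_target(1, affix_unit=REGULAR_AFFIX_UNIT)
```

Because every picture and every morpheme gets exactly one unit, no two patterns overlap except through the shared plurality unit and the shared regular affix unit, which is the property the text relies on.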

We investigated four different classes of model, which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model, we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model are the average performance of these five examples.
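The trial-by-trial weight adjustment described above can be sketched as a small three-layer network trained by online back propagation. This is a hedged illustration under stated assumptions: the class name `ThreeLayerNet`, the sigmoid activation, the learning rate of 0.5, and the initial weight range are all choices made for this sketch and are not reported in this section. The demonstration uses a toy four-unit problem so that it runs quickly.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input -> hidden -> output network with online back propagation."""

    def __init__(self, n_in, n_hidden, n_out, rng, lr=0.5):
        self.lr = lr  # assumed learning rate, not taken from the paper
        # Each row holds the incoming weights of one unit; the last entry is a bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb = hid + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return hid, out

    def train_step(self, x, target):
        """One learning trial; returns the summed squared error before the update."""
        hid, out = self.forward(x)
        xb, hb = x + [1.0], hid + [1.0]
        # Error signal at each output unit.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Error propagated back to each hidden unit (using the pre-update weights).
        d_hid = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hid)]
        for row, d in zip(self.w_out, d_out):
            for j in range(len(row)):
                row[j] += self.lr * d * hb[j]
        for row, d in zip(self.w_hid, d_hid):
            for j in range(len(row)):
                row[j] += self.lr * d * xb[j]
        return sum((t - o) ** 2 for t, o in zip(target, out))


# Toy demonstration: four localist patterns, each mapped to its own output unit.
rng = random.Random(0)
net = ThreeLayerNet(n_in=4, n_hidden=3, n_out=4, rng=rng)
patterns = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
first = sum(net.train_step(p, p) for p in patterns)
for _ in range(200):
    last = sum(net.train_step(p, p) for p in patterns)
```

Scaling the same class to 22 input units, 32 output units, and 3, 5, 8, or 15 hidden units would give networks of the sizes compared in the text, though the resulting learning curves would depend on the assumed parameters.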

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals, we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
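The composition of a plural training epoch can be checked with a short sketch: 21 + 25 + 25 + 5 + 5 = 81 trials. The function name `build_plural_epoch` is hypothetical, and the picture numbering is an assumption made for illustration: items are numbered in Appendix order, so that pictures 1–10 are regular, 11–20 irregular, and the odd-numbered items are the high frequency ones.

```python
import random

# Assumed numbering (Appendix order): 1-10 regular, 11-20 irregular,
# odd-numbered items high frequency, even-numbered items low frequency.
HI_FREQ_REG = [1, 3, 5, 7, 9]
LO_FREQ_REG = [2, 4, 6, 8, 10]
HI_FREQ_IRREG = [11, 13, 15, 17, 19]
LO_FREQ_IRREG = [12, 14, 16, 18, 20]


def build_plural_epoch(rng):
    """Return one shuffled epoch of (picture, is_plural) trials."""
    trials = [(p, False) for p in range(1, 22)]   # (a) each of the 21 singulars once
    for p in HI_FREQ_REG + HI_FREQ_IRREG:         # (b), (c) five presentations each
        trials += [(p, True)] * 5
    for p in LO_FREQ_REG + LO_FREQ_IRREG:         # (d), (e) one presentation each
        trials.append((p, True))
    rng.shuffle(trials)
    return trials


epoch = build_plural_epoch(random.Random(0))
```

Note that picture 21 (TesterP) appears only among the singular trials, so its plural input pattern is never trained.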

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction, whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.
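The RMS error measure over the plural affix units can be sketched as follows. The helper names are hypothetical, and the sketch assumes the conventional definition in which the squared differences are averaged over the 11 affix units before taking the square root; the text does not spell out the exact normalisation.

```python
import math


def rms_error(observed, target):
    """Root mean square difference between observed and target activations."""
    squared = [(o - t) ** 2 for o, t in zip(observed, target)]
    return math.sqrt(sum(squared) / len(target))


def mean_class_rms(observed_by_item, target_by_item):
    """Average the per-item RMS error over the items of one class (e.g. HiFreqReg)."""
    errors = [rms_error(observed_by_item[k], target_by_item[k]) for k in target_by_item]
    return sum(errors) / len(errors)


# Illustrative values only: 11 affix units with the regular affix as the last one.
target = [0.0] * 10 + [1.0]
untrained = [0.5] * 11
print(rms_error(untrained, target))
```

A network whose affix units all sit at 0.5 scores an RMS error of 0.5 on any one-hot target, which gives a rough reference point when reading the error curves.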

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.

When the 8HU model and the human data are aligned, as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.
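The pairwise comparison by correlation can be illustrated with a plain Pearson correlation, whose square gives the proportion of variance accounted for. The function name `pearson_r` is hypothetical, and the two short lists below are made-up numbers for illustration only; they are not the human or model data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up accuracy values at matched training blocks (illustration only).
human = [0.20, 0.35, 0.50, 0.70, 0.85]
model = [0.15, 0.40, 0.55, 0.65, 0.90]
r = pearson_r(human, model)
r_squared = r ** 2   # proportion of variance in one series predicted by the other
```

In the analysis reported above, each point would be one item class at one training block, paired across the human and model curves.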


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                   3HU(a)   5HU     8HU     15HU

RMS error(b)
  M                       0.81     0.79    0.53    0.45
  SD                      0.43     0.50    0.45    0.32

Activation weight(c)
  M                       0.20     0.28    0.57    0.60
  SD                      0.44     0.44    0.52    0.35

N hits (/5)(d)            1        2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
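The "N hits" criterion, choosing the affix pattern closest to the observed output by squared Euclidean distance, can be sketched as below. The function names are hypothetical, the observed activations are invented for illustration, and placing the regular affix in the last position is an arbitrary choice for this sketch.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_affix(observed):
    """0-based index of the one-hot affix pattern nearest to `observed`."""
    n = len(observed)
    candidates = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    distances = [squared_distance(observed, c) for c in candidates]
    return distances.index(min(distances))


# Invented activations over the 11 affix units; the regular affix is placed last here.
REGULAR_INDEX = 10
observed = [0.05] * 10 + [0.60]
is_hit = closest_affix(observed) == REGULAR_INDEX
```

With one-hot targets, the nearest pattern is always the one whose unit carries the highest activation, so this criterion amounts to asking which affix unit is most active for the novel plural input.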

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items: this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus, we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (x/n)^{-2}$$

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
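The asymptote argument can be checked numerically. The sketch below assumes the power function takes the form y = 1 − (x/n)^(−2) with n > 0, and treats a regularity advantage as a fixed increment added to the amount of practice; the function names and the particular practice values are illustrative choices, not part of the reported analyses.

```python
def power_curve(x, n=1.0, exponent=2.0):
    """Performance after x units of practice under an asymptoting power function."""
    return 1.0 - (x / n) ** (-exponent)


def regularity_effect(frequency, increment, n=1.0):
    """Gain in performance from a constant increment of regularity
    (in its frequency sense) at a given level of practice."""
    return power_curve(frequency + increment, n) - power_curve(frequency, n)


# Same increment, evaluated at a low and a high level of practice.
low = regularity_effect(2.0, 1.0)
high = regularity_effect(10.0, 1.0)
print(low, high)

# Rescaling the horizontal axis (n = 3) changes the sizes, not the ordering.
low_scaled = regularity_effect(2.0, 1.0, n=3.0)
high_scaled = regularity_effect(10.0, 1.0, n=3.0)
print(low_scaled, high_scaled)
```

In both cases the same increment buys a larger gain at the low-practice point than at the high-practice point, which is the frequency by regularity interaction in miniature.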

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases to the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
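A small numerical sketch of this ceiling effect follows. It assumes a standard logistic unit and simply adds a hypothetical "frequency" contribution and a hypothetical "consistency" contribution to the summed input; the specific values are invented for illustration.

```python
import math


def logistic(net):
    """Standard logistic input-output function of a unit."""
    return 1.0 / (1.0 + math.exp(-net))


def error(net, target=1.0):
    """Error on a unit that should be fully active."""
    return target - logistic(net)


def consistency_benefit(freq_input, consistency_input):
    """Reduction in error from adding a consistency contribution
    to the summed input already supplied by frequency."""
    return error(freq_input) - error(freq_input + consistency_input)


# The same additive consistency contribution at low and high frequency.
low_freq = consistency_benefit(1.0, 1.0)
high_freq = consistency_benefit(3.0, 1.0)
print(low_freq, high_freq)
```

The contributions are additive in the summed input, yet the error reduction is larger for the low frequency case because the high frequency unit is already close to its asymptote.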

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture    Stem    Plural Form  Frequency  Regularity
car        garth   bugarth      5          R
bed        pid     bupid        1          R
lamp       lant    bulant       5          R
table      tib     butid        1          R
plane      poon    bupoon       5          R
ball       prill   buprill      1          R
train      dram    budram       5          R
house      hize    buhize       1          R
book       bisk    bubisk       5          R
broom      breen   bubreen      1          R
phone      feem    gofeem       5          I
umbrella   brol    gubrol       1          I
chair      charp   zecharp      5          I
horse      naig    zonaig       1          I
monkey     chonk   nuchonk      5          I
dog        woop    niwoop       1          I
elephant   fant    vefant       5          I
scissors   zoze    vuzoze       1          I
kite       kag     rekag        5          I
fish       pisc    ropisc       1          I
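The sense in which regularity is frequency by another name can be read directly off this word list. The sketch below tallies relative token frequency per plural affix; it assumes that each affix is the first two letters of the plural form, which holds for every entry in the list, and the variable names are illustrative.

```python
from collections import Counter

# (plural form, relative frequency, regularity) for the 20 appendix items.
LEXICON = [
    ("bugarth", 5, "R"), ("bupid", 1, "R"), ("bulant", 5, "R"),
    ("butid", 1, "R"), ("bupoon", 5, "R"), ("buprill", 1, "R"),
    ("budram", 5, "R"), ("buhize", 1, "R"), ("bubisk", 5, "R"),
    ("bubreen", 1, "R"), ("gofeem", 5, "I"), ("gubrol", 1, "I"),
    ("zecharp", 5, "I"), ("zonaig", 1, "I"), ("nuchonk", 5, "I"),
    ("niwoop", 1, "I"), ("vefant", 5, "I"), ("vuzoze", 1, "I"),
    ("rekag", 5, "I"), ("ropisc", 1, "I"),
]

# Assumption: the affix is the two-letter prefix of the plural form.
affix_tokens = Counter()
for plural, frequency, _regularity in LEXICON:
    affix_tokens[plural[:2]] += frequency

for affix, tokens in affix_tokens.most_common():
    print(affix, tokens)
```

The regular affix bu- accumulates 30 relative tokens across its ten stems, whereas no irregular affix accumulates more than 5, so even the low frequency regular items share a heavily practised affix.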


where the model was trained on all present–past tense pairs with Francis and Kucera frequencies 1 (309 verbs with regular past tenses, 104 verbs with irregular past tenses), failed to generate any frequency by regularity interaction in error score. However, when in simulation 2 the number of irregular verbs in the training set was reduced to just 24, this resulted in there being little effect of frequency on performance with the regular items, whereas performance was better for high frequency irregular verbs than for low frequency ones. This is an important demonstration that the frequency by regularity interaction can be simulated by a connectionist system. However, this model concerned mappings between present- and past-tense forms, not direct access from semantics as in our human data. Furthermore, it is unclear from these simulations how much the results are due to regularity per se, how much to phonological factors (for example, in simulation 1 the error scores for regulars in generalisation tests were inflated by there being a high proportion of phonologically similar irregular past tense false friends in the training corpus; 1994, p. 375), and, given the contrasting results of simulations 1 and 2, how much to the particular choice of training items and the relative proportions of regular and irregular items.

Indeed, much of the debate over the validity of all of these models has concerned (a) the adequacy of the adopted low-level phonological representations, and whether these might serve as TRICS (The Representations It Crucially Supposes) which crypto-embody rules within the connectionist network (Lachter & Bever, 1988); (b) over-reliance on phonological cues in models that used sound-to-sound conversions to link base forms with past tense forms (Daugherty & Seidenberg, 1992; MacWhinney, 1994; MacWhinney & Leinbach, 1991); and (c) the appropriateness of the training sets that are used in exposing the models to the evidence of language, and whether they properly reflect the types and tokens in representative ratios of regular and irregular forms in a sequence that plausibly mirrors learner language exposure at different stages of development (Daugherty & Seidenberg, 1992; Plunkett & Marchman, 1991). The models are usually concerned with child learner language exposure, yet here the extrapolation is particularly tenuous since adult language frequency norms are typically the only available reference database.

In our simple demonstration, with its intended focus on the frequency by regularity interaction in the acquisition of morphology, we circumvented these problems by the following means:

1. We eliminated TRICS from our input and output representations by entirely ignoring the low level representations and instead simply having one input unit for each picture and one output unit for each morpheme. We make no pretence of plausibility of these models for low levels of representation in either input or output processing, but we are presently neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or “imagen” (Paivio, 1986), each output unit to some speech output “logogen” (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects’ learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is “back propagation” (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
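The localist coding just described can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function and constant names are hypothetical, and the fixed unit numbering is a simplification, since the numbering was randomised across models.

```python
N_INPUT = 22    # I-units 1-20: trained pictures; 21: TesterP; 22: plurality
N_OUTPUT = 32   # O-units 1-21: stems; 22-31: irregular affixes; 32: regular affix
PLURAL_I_UNIT = 22
REGULAR_AFFIX_O_UNIT = 32


def encode_input(picture, plural):
    """Return the 22-element input pattern for a picture numbered 1-21."""
    if not 1 <= picture <= 21:
        raise ValueError("picture must be in 1..21")
    pattern = [0.0] * N_INPUT
    pattern[picture - 1] = 1.0              # one picture detector is on
    if plural:
        pattern[PLURAL_I_UNIT - 1] = 1.0    # plurality unit is on for pairs
    return pattern


def encode_target(picture, affix_unit=None):
    """Return the 32-element target: the stem unit, plus an affix unit for plurals."""
    target = [0.0] * N_OUTPUT
    target[picture - 1] = 1.0
    if affix_unit is not None:
        if not 22 <= affix_unit <= 32:
            raise ValueError("affix_unit must be an O-unit in 22..32")
        target[affix_unit - 1] = 1.0
    return target


singular = encode_input(1, plural=False)
plural_regular = encode_target(1, REGULAR_AFFIX_O_UNIT)
```

Because every pattern has the same form (one picture unit, optionally the plurality unit), no two inputs are more similar than any other two, which is the property the encoding is chosen for.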

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.
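How the number of connections grows with the number of HUs can be made concrete with a short calculation. The sketch assumes full connectivity between adjacent layers and ignores any bias weights; neither detail is stated in the text, and the function name is hypothetical.

```python
def n_connections(n_hidden, n_input=22, n_output=32):
    """Weights in a fully connected three-layer network, bias terms ignored."""
    return n_input * n_hidden + n_hidden * n_output


# One entry per model size compared in the text.
connection_counts = {hu: n_connections(hu) for hu in (3, 5, 8, 15)}
for hu, count in connection_counts.items():
    print(f"{hu:2d} HUs: {count} connections")
```

Under these assumptions each extra hidden unit adds 54 weights, so the 15HU model has five times as many connections as the 3HU model.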

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that, when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model, we ran five examples starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
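The stem training procedure can be sketched as a small pure-Python back-propagation network. This is a minimal sketch under stated assumptions, not the simulator used for the reported results: logistic units, a squared-error gradient, bias weights, the learning rate, the initial weight range, trial-by-trial updates, and shuffling of trials within an epoch are all assumptions, and every function name is hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))          # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))


def init_net(n_in, n_hid, n_out, rng):
    """Random initial weights; the last weight in each row is a bias (assumed)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
    return w1, w2


def forward(net, x):
    w1, w2 = net
    xb = x + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
    hb = h + [1.0]
    o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
    return h, o


def train_trial(net, x, t, lr=0.5):
    """One back-propagation step on a single input/target pair."""
    w1, w2 = net
    h, o = forward(net, x)
    # Output error signal: (target - output) times the logistic derivative.
    d_out = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, o)]
    # Propagate the error back through the (not yet updated) output weights.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w2[k][j] for k in range(len(w2)))
             for j, hj in enumerate(h)]
    hb, xb = h + [1.0], x + [1.0]
    for k, row in enumerate(w2):
        for j in range(len(row)):
            row[j] += lr * d_out[k] * hb[j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            row[i] += lr * d_hid[j] * xb[i]


def rms(o, t):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o, t)) / len(t))


def stem_patterns():
    """21 singular trials: I-unit k on, I-unit 22 off; target is O-unit k."""
    patterns = []
    for k in range(21):
        x, t = [0.0] * 22, [0.0] * 32
        x[k] = 1.0
        t[k] = 1.0
        patterns.append((x, t))
    return patterns


def mean_rms(net, patterns):
    return sum(rms(forward(net, x)[1], t) for x, t in patterns) / len(patterns)


def train_stems(n_hidden=8, epochs=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    net = init_net(22, n_hidden, 32, rng)
    patterns = stem_patterns()
    before = mean_rms(net, patterns)
    for _ in range(epochs):
        rng.shuffle(patterns)             # trial order within an epoch (assumed)
        for x, t in patterns:
            train_trial(net, x, t, lr)
    return net, before, mean_rms(net, patterns)


# Fewer epochs than the 500 described in the text, to keep the demo quick.
net, rms_before, rms_after = train_stems(epochs=50)
```

Running five such networks with different seeds and averaging their test scores would mirror the five examples per model size described above.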

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus, we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
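The composition of one plural training epoch can be checked with a short sketch. The assignment of picture numbers to the four item classes below is hypothetical (the actual allocation was arbitrary), as are the names used.

```python
import random
from collections import Counter

# Hypothetical allocation: (picture numbers, plural presentations per epoch).
ITEM_CLASSES = {
    "HiFreqReg": (range(1, 6), 5),
    "HiFreqIrreg": (range(6, 11), 5),
    "LoFreqReg": (range(11, 16), 1),
    "LoFreqIrreg": (range(16, 21), 1),
}


def build_plural_epoch(rng):
    """Return one epoch of (picture, is_plural) trials in random order."""
    # (a) one singular presentation of each of the 21 pictures
    trials = [(picture, False) for picture in range(1, 22)]
    # (b)-(e) plural presentations at the class-specific frequency
    for pictures, presentations in ITEM_CLASSES.values():
        for picture in pictures:
            trials.extend([(picture, True)] * presentations)
    rng.shuffle(trials)
    return trials


epoch = build_plural_epoch(random.Random(0))
plural_counts = Counter(picture for picture, is_plural in epoch if is_plural)
print(len(epoch))  # 21 + 25 + 25 + 5 + 5 = 81 trials
```

Note that picture 21 (TesterP) appears only as a singular trial, so the plural input for it is never trained.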

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items, and equally that the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
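The two performance measures can be contrasted in a short sketch. The activation values are made up for illustration, the function names are hypothetical, and taking the mean over the 11 affix units before the square root is an assumption about how the RMS error was computed.

```python
import math

AFFIX_UNITS = range(22, 33)   # O-units 22-32


def affix_rms(output, target):
    """RMS error over the affix units; both arguments are 32-element lists."""
    squared = [(output[u - 1] - target[u - 1]) ** 2 for u in AFFIX_UNITS]
    return math.sqrt(sum(squared) / len(squared))


def affix_accuracy(output, target_affix_unit):
    """Activation (0-1) on the single O-unit for the correct affix."""
    return output[target_affix_unit - 1]


# Made-up test response to a regular plural: 0.8 on O-unit 32, 0.1 elsewhere.
output = [0.0] * 32
for unit in AFFIX_UNITS:
    output[unit - 1] = 0.1
output[32 - 1] = 0.8
target = [0.0] * 32
target[32 - 1] = 1.0

error_score = affix_rms(output, target)
accuracy_score = affix_accuracy(output, 32)
```

RMS error penalises residual activation on the ten wrong affix units as well as a shortfall on the right one, whereas the accuracy score looks only at the target unit.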

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (1,0,0,0,0,0,0,0,0,0,0) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item, TesterP, at End of Training

                                 Model Size
Measure                  3HUᵃ     5HU      8HU      15HU

RMS errorᵇ
  M                      0.81     0.79     0.53     0.45
  SD                     0.43     0.50     0.45     0.32

Activation weightᶜ
  M                      0.20     0.28     0.57     0.60
  SD                     0.44     0.44     0.52     0.35

N hits (/5)ᵈ             1        2        3        4

There were five exemplars of each size of model. ᵃHU = hidden units. ᵇRMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. ᶜActivation weight on the regular plural affix O-unit. ᵈNumber of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
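The criterion for deciding which plural pattern a model “chose” for TesterP, minimum squared Euclidean distance to a target affix pattern, can be sketched as follows. The activation values are invented for illustration and the function names are hypothetical.

```python
AFFIX_UNITS = list(range(22, 33))   # O-units 22-32; unit 32 is the regular affix


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_affix(affix_activations):
    """Return the O-unit whose one-unit-on target pattern is closest.

    affix_activations holds the 11 observed values for O-units 22..32, in order.
    """
    best_unit, best_distance = None, float("inf")
    for index, unit in enumerate(AFFIX_UNITS):
        target = [1.0 if i == index else 0.0 for i in range(len(AFFIX_UNITS))]
        distance = squared_distance(affix_activations, target)
        if distance < best_distance:
            best_unit, best_distance = unit, distance
    return best_unit


# Invented response to the untrained plural TesterP input.
activations = [0.05, 0.10, 0.02, 0.30, 0.04, 0.01, 0.08, 0.03, 0.06, 0.02, 0.60]
choice = nearest_affix(activations)
is_hit = choice == 32   # a "hit" means the regular affix pattern was closest
```

Because each target pattern has exactly one unit on, the closest target is always the one for the most active affix unit, so this criterion amounts to picking the strongest affix response.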

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects both the rate of acquisition of default case, the presence or absence of frequency effects for regular items, and ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
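The way a constant regularity increment shrinks as practice accumulates can be checked numerically with the power function given above. The practice levels and the size of the increment are hypothetical values chosen only for illustration, as are the function names.

```python
def learning_curve(x, n=1.0):
    """y = 1 - (x * n) ** -2, the asymptoting power function (x > 0, n > 0)."""
    return 1.0 - (x * n) ** -2


def regularity_effect(frequency, increment, n=1.0):
    """Gain in y from adding a constant regularity increment to practice."""
    return learning_curve(frequency + increment, n) - learning_curve(frequency, n)


LOW_FREQUENCY, HIGH_FREQUENCY, INCREMENT = 1.0, 5.0, 1.0   # hypothetical values

effect_low = regularity_effect(LOW_FREQUENCY, INCREMENT)    # 0.75 - 0.0
effect_high = regularity_effect(HIGH_FREQUENCY, INCREMENT)  # 1/25 - 1/36

# Rescaling the horizontal axis (n = 2) changes the sizes but not the ordering.
effect_low_scaled = regularity_effect(LOW_FREQUENCY, INCREMENT, n=2.0)
effect_high_scaled = regularity_effect(HIGH_FREQUENCY, INCREMENT, n=2.0)
```

In both cases the same increment buys far more at the low practice level than at the high one, which is the frequency by regularity interaction in miniature.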

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active and decreases to the summed units that should be inactive. However, due to the non-linearity of the input-output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0; a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
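The ceiling effect described in this analysis can be illustrated for a single output unit. The sketch assumes a logistic input–output function and treats the contributions of frequency and consistency as simple additive terms in the summed input; the numerical values and the names are hypothetical.

```python
import math


def unit_state(summed_input):
    """Logistic output function, assumed here as the unit non-linearity."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error(frequency_input, consistency_input):
    """Shortfall from the target state of 1.0 for a unit that should be on."""
    return 1.0 - unit_state(frequency_input + consistency_input)


LOW_FREQ, HIGH_FREQ = 1.0, 3.0        # hypothetical summed-input contributions
CONSISTENT, INCONSISTENT = 1.0, 0.0   # hypothetical consistency contributions

# Advantage of consistent over inconsistent items at each frequency level.
gap_low = error(LOW_FREQ, INCONSISTENT) - error(LOW_FREQ, CONSISTENT)
gap_high = error(HIGH_FREQ, INCONSISTENT) - error(HIGH_FREQ, CONSISTENT)
```

The same additive consistency contribution reduces error by a larger amount when the frequency contribution is small, because the unit is then further from its asymptote.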

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). "Et in Amygdala Ego"? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter's dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don't help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem     Plural Form   Frequency   Regularity

car         garth    bugarth       5           R
bed         pid      bupid         1           R
lamp        lant     bulant        5           R
table       tib      butid         1           R
plane       poon     bupoon        5           R
ball        prill    buprill       1           R
train       dram     budram        5           R
house       hize     buhize        1           R
book        bisk     bubisk        5           R
broom       breen    bubreen       1           R
phone       feem     gofeem        5           I
umbrella    brol     gubrol        1           I
chair       charp    zecharp       5           I
horse       naig     zonaig        1           I
monkey      chonk    nuchonk       5           I
dog         woop     niwoop        1           I
elephant    fant     vefant        5           I
scissors    zoze     vuzoze        1           I
kite        kag      rekag         5           I
fish        pisc     ropisc        1           I


neither concerned with low-level feature perception nor the details of motor programming for pronunciation. Each input unit is supposed broadly to correspond to activation of some picture detector or "imagen" (Paivio, 1986), each output unit to some speech output "logogen" (Morton, 1979). We acknowledge that these parts of the model are grossly simplified, and we believe that these aspects ultimately involve distributed representations as well. However, there is one advantage to this simplicity: where, as here, each input detector or output logogen is represented by just one unit, with all units having the same form, there is no scope for making some more similar than others, other, that is, than is determined by the frequency of the input–output mappings. This encoding scheme allows the most hygienic investigation of frequency and regularity, uncontaminated by other factors.

2. Like Cottrell and Plunkett (1994), we are modelling direct access from semantics rather than generating past tense from stem form phonology. Because there are no phonological representations in our model, there is no chance of the results reflecting any confound with phonology. As usual, costs accompany the benefits. Our simulations can have no bearing on phonological aspects of inflection, and thus, while they might generate quantitatively clean data, unlike the elegant error analyses performed by, for example, Daugherty and Seidenberg (1994) and MacWhinney and Leinbach (1991), the error responses in the present simulations will be qualitatively uninteresting.

3. We eliminated uncertainty about the detailed content of the complex evidence which human learners are exposed to during their early years of hearing natural language by modelling adult subjects' learning of the MAL that was reported in the preceding section. Because we determined the exposure sequence of types and tokens of regular and irregular items in this language learning task, we could train the models ensuring the identical history of exposure.

The most common architecture of connectionist model has three layers: the input layer of units, the output layer, and an intervening layer of hidden units (HUs). The presence of HUs enables more difficult input/output mappings to be learned than would be possible if the input units were directly connected to the output units (Broeder & Plunkett, 1994; Rumelhart & McClelland, 1986). The most common learning algorithm is "back propagation" (Rumelhart, Hinton, & Williams, 1986), where, on each learning trial, the network compares its output with the target output, and any difference is propagated back to the hidden unit weights, and in turn to the input weights, in a way that reduces the error. Our simulations adopted this standard architecture. Thus, whatever the pattern of results, they are generated by a very general learning system whose processes were not tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP), which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like "wug"). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
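The localist coding just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, and because the unit numbering was arbitrary, we simply assume that pictures 1–10 are the regular items and pictures 11–20 the irregular ones (the row order of the Appendix), with irregular picture k paired with O-unit k + 11.

```python
N_INPUT = 22    # I-units 1-20: trained pictures; 21: TesterP; 22: plurality
N_OUTPUT = 32   # O-units 1-21: stems; 22-31: irregular affixes; 32: regular affix


def encode_input(picture, plural):
    """Return the 22-element input vector for picture 1..21 (hypothetical helper)."""
    if not 1 <= picture <= 21:
        raise ValueError("picture must be in 1..21")
    vec = [0.0] * N_INPUT
    vec[picture - 1] = 1.0               # exactly one picture detector is on
    vec[21] = 1.0 if plural else 0.0     # I-unit 22 codes plurality
    return vec


def encode_target(picture, plural):
    """Return the 32-element target vector (hypothetical helper).

    Assumption: pictures 1-10 are regular and 11-20 irregular, following the
    Appendix row order; irregular picture k is paired with O-unit k + 11.
    """
    if not 1 <= picture <= 21:
        raise ValueError("picture must be in 1..21")
    vec = [0.0] * N_OUTPUT
    vec[picture - 1] = 1.0               # the stem unit is always reinforced
    if plural:
        if picture == 21:
            raise ValueError("TesterP was never trained in the plural")
        if picture <= 10:
            vec[31] = 1.0                # O-unit 32: the regular affix
        else:
            vec[picture + 10] = 1.0      # O-units 22-31: one affix per item
    return vec
```

Under these assumptions every plural target has exactly two active units (one stem and one affix), and every singular target has one.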

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of HUs in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be "on" on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to "off". For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1 was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the "correct" level of activity, and a small adjustment was made to the connection strength to that unit in such a way that when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
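For readers who want to see the stem-training procedure in concrete form, the following is a minimal sketch of a three-layer network trained by back propagation on the 21 singular patterns. It is not the simulator used for the reported results. The sigmoid units, the bias weights, the learning rate of 0.5, the initial weight range, the shuffled trial order within an epoch, and the class and function names are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input -> hidden -> output network trained by plain back propagation."""

    def __init__(self, n_in, n_hidden, n_out, rate=0.5, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's incoming weights; the last entry is a bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]
        self.rate = rate  # assumed value; the step size is not reported

    def forward(self, x):
        xb = x + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb = hid + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return hid, out

    def train_trial(self, x, target):
        """One learning trial; returns the summed squared error before the update."""
        hid, out = self.forward(x)
        xb, hb = x + [1.0], hid + [1.0]
        # Error signal at each output unit (squared error, sigmoid derivative).
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Propagate the error back to the hidden units before changing weights.
        d_hid = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hid)]
        for row, d in zip(self.w_out, d_out):
            for j, v in enumerate(hb):
                row[j] += self.rate * d * v
        for row, d in zip(self.w_hid, d_hid):
            for i, v in enumerate(xb):
                row[i] += self.rate * d * v
        return sum((t - o) ** 2 for t, o in zip(target, out))


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def stem_training(n_hidden=8, epochs=50, seed=0):
    """Train on the 21 singular patterns; return the net and per-epoch error."""
    net = ThreeLayerNet(22, n_hidden, 32, seed=seed)
    # Picture i switches on I-unit i (plurality unit off) and stem O-unit i.
    trials = [(one_hot(i, 22), one_hot(i, 32)) for i in range(21)]
    order = random.Random(seed)
    history = []
    for _ in range(epochs):
        order.shuffle(trials)
        history.append(sum(net.train_trial(x, t) for x, t in trials))
    return net, history
```

Calling `stem_training(n_hidden=3)`, `stem_training(n_hidden=15)` and so on gives networks of the four sizes compared here; running several seeds per size and averaging mirrors the five examples per model size.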

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms, as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
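The composition of one plural-training epoch can be checked with a short sketch. The function name and the picture numbering are assumptions: we number the pictures in Appendix row order, so that pictures 1–10 are regular, 11–20 irregular, and odd-numbered pictures are the high frequency items.

```python
import random


def build_plural_epoch(rng):
    """Return one epoch as a shuffled list of (picture, is_plural) trials.

    Assumption: pictures are numbered in Appendix row order, so odd-numbered
    pictures are high frequency (5 plural trials per epoch) and even-numbered
    pictures are low frequency (1 plural trial per epoch).
    """
    trials = [(p, False) for p in range(1, 22)]   # (a) the 21 singular forms
    for p in range(1, 21):                        # (b)-(e) the plural forms
        repeats = 5 if p % 2 == 1 else 1
        trials.extend([(p, True)] * repeats)
    rng.shuffle(trials)                           # random order within the epoch
    return trials


epoch = build_plural_epoch(random.Random(0))
print(len(epoch))  # 81
```

The count follows from 21 + (5 × 5) + (5 × 5) + 5 + 5 = 81; TesterP (picture 21) appears only as a singular trial.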

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, that the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations F(1, 16) = 20.80, P < 0.0005; by words F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations F(1, 16) = 9.07, P < 0.01; by words F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations F(1, 16) = 4.85, P < 0.05; by words F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations F(14, 224) = 68.03, P < 0.0001; by words F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations F(14, 224) = 36.75, P < 0.0001; by words F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations F(14, 224) = 18.93, P < 0.0001; by words F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations F(14, 224) = 16.11, P < 0.0001; by words F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = -0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
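The two performance measures can be written out as follows. This is a sketch only: the helper names are ours, output and target are taken to be 32-element lists ordered as O-units 1–32, and we assume the RMS error is the square root of the mean squared difference over the 11 affix units (the exact normalisation is not stated).

```python
import math

AFFIX_UNITS = slice(21, 32)   # O-units 22-32 as zero-based list positions


def rms_error(output, target):
    """Root mean square error over the 11 plural-affix units."""
    o, t = output[AFFIX_UNITS], target[AFFIX_UNITS]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o, t)) / len(o))


def target_activation(output, target):
    """Activation on the single affix unit that the target marks as on."""
    affix_target = target[AFFIX_UNITS]
    return output[AFFIX_UNITS][affix_target.index(1.0)]
```

The RMS measure penalises activation on the ten wrong affix units as well as a shortfall on the right one, whereas the accuracy measure looks only at the target unit, which is why the two graphs mirror each other so closely when the non-target units are mostly quiet.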

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R^2 = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item, TesterP, at End of Training

                                  Model Size
Measure                   3HU^a    5HU     8HU     15HU

RMS error^b
  M                       0.81     0.79    0.53    0.45
  SD                      0.43     0.50    0.45    0.32

Activation weight^c
  M                       0.20     0.28    0.57    0.60
  SD                      0.44     0.44    0.52    0.35

N hits (/5)^d             1        2       3       4

There were five exemplars of each size of model. ^a HU = hidden units. ^b RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. ^c Activation weight on the regular plural affix O-unit. ^d Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
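The "hit" criterion used for the generalisation item can be sketched as a nearest-pattern search. The function names are ours, and we assume the candidate patterns are the 11 one-hot affix targets over O-units 22–32.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_affix(affix_output):
    """Position (0..n-1) of the one-hot affix pattern nearest to the output."""
    n = len(affix_output)
    patterns = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    distances = [squared_distance(affix_output, p) for p in patterns]
    return distances.index(min(distances))


# Hypothetical activations on O-units 22-32, with the regular affix unit last.
observed = [0.1] * 10 + [0.6]
print(closest_affix(observed))  # 10, i.e. the regular affix pattern
```

Because every candidate pattern is one-hot, the nearest pattern is simply the one whose unit carries the highest activation; a run counts as a hit when that unit is the regular plural affix.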

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating the computational capacity of the model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction, where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of "regular" in the Pocket Oxford Dictionary involves "habitual, constant" acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses "conforming to a rule or principle". We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of "rules of language"). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although this interaction is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

330 ELLIS AND SCHMIDT

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$: the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
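The asymptote argument can be checked numerically. The following is a minimal sketch, not part of the original analysis: it feeds additive frequency and regularity contributions into the power function above and compares the resulting regularity effect at a low and at a high frequency. The function names and the particular values (frequencies of 1 and 5, a regularity increment of 1) are illustrative assumptions.

```python
def power_curve(x: float, n: float = 1.0) -> float:
    """Asymptoting power function y = 1 - (x * n) ** -2."""
    return 1.0 - (x * n) ** -2


def regularity_effect(frequency: float, increment: float, n: float = 1.0) -> float:
    """Gain in y when a constant regularity increment is added to the frequency input."""
    return power_curve(frequency + increment, n) - power_curve(frequency, n)


# Illustrative values only (assumptions, not the experiment's parameters).
low_freq_effect = regularity_effect(frequency=1.0, increment=1.0)
high_freq_effect = regularity_effect(frequency=5.0, increment=1.0)

print(f"regularity effect at low frequency:  {low_freq_effect:.4f}")
print(f"regularity effect at high frequency: {high_freq_effect:.4f}")
```

Under these assumptions the same increment buys a much larger gain at the low frequency than at the high one, and rescaling the horizontal axis with a different n > 0 leaves that ordering intact.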

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 331

consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
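The ceiling effect described in this paragraph can be illustrated with a single output unit. This is a minimal sketch under stated assumptions: the unit uses the standard logistic activation (the text specifies only a non-linear function whose states asymptote towards 1.0), and the frequency and consistency contributions to the summed input are hypothetical values chosen for illustration.

```python
import math


def logistic(net_input: float) -> float:
    """Standard logistic activation; unit states asymptote towards 1.0."""
    return 1.0 / (1.0 + math.exp(-net_input))


def error(net_input: float) -> float:
    """Error on an output unit whose target state is 1.0."""
    return 1.0 - logistic(net_input)


# Hypothetical contributions to the summed input of the correct affix unit.
FREQUENCY_INPUT = {"low": 1.0, "high": 3.0}
CONSISTENCY_INPUT = {"irregular": 0.0, "regular": 2.0}

# Frequency and consistency simply add in the summed input.
errors = {
    (freq, reg): error(FREQUENCY_INPUT[freq] + CONSISTENCY_INPUT[reg])
    for freq in FREQUENCY_INPUT
    for reg in CONSISTENCY_INPUT
}

regularity_effect_low = errors[("low", "irregular")] - errors[("low", "regular")]
regularity_effect_high = errors[("high", "irregular")] - errors[("high", "regular")]

print(f"regularity effect, low frequency:  {regularity_effect_low:.4f}")
print(f"regularity effect, high frequency: {regularity_effect_high:.4f}")
```

Although the two contributions are strictly additive at the level of summed input, the error reduction attributable to regularity is larger for the low frequency item, which is the interaction at issue.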

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human


acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the


model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I


tweaked in any way to make it particular as a Language Acquisition Device. So what are the emergent patterns of language acquisition that result when this general associative learning mechanism is applied to the particular content of picture stimuli with their corresponding singular and plural lexical responses, as experienced at the same relative frequencies of exposure as our human learners?

The Models

Architecture. Every model had 22 input (I-) units. Each of I-units 1–20 represented one of the pictures used in the training set of the Appendix. I-unit 21 represented another picture (the generalisation test item, TesterP) which was only ever presented for training to the model in the singular; later it was presented as a plural test item to see which plural affix the model would choose for this generalisation item (akin to asking you what is the plural of a novel word like “wug”). I-unit 22 coded plurality, that is, whether a singular stimulus item or a pair were presented. Every model had 32 output (O-) units. O-units 1–20 represented the stem forms of the lexis shown in the Appendix. O-unit 21 represented the stem form corresponding to I-unit 21. O-units 22–31 represented each of the other 10 unique plural affixes for irregular items. O-unit 32 represented the regular plural affix. This numbering of I- and O-units is, of course, arbitrary and was randomised across models; what mattered, and remained constant, was that the same O-unit was always reinforced whenever a particular I-unit was activated.
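The coding scheme just described can be written down directly. The sketch below is illustrative only: it uses the fixed unit numbering given in the text (whereas the numbering was randomised across models), and it assumes that pictures 1–10 are the regular items and pictures 11–20 the irregular ones, following the row order of the Appendix. The function names are hypothetical.

```python
from typing import List, Tuple

N_INPUT_UNITS = 22
N_OUTPUT_UNITS = 32
PLURALITY_I_UNIT = 22      # I-unit 22 codes plurality
REGULAR_AFFIX_O_UNIT = 32  # O-unit 32 is the regular plural affix


def affix_o_unit(picture: int) -> int:
    """Affix O-unit for a trained picture (1-based); this mapping is an assumption."""
    if 1 <= picture <= 10:
        return REGULAR_AFFIX_O_UNIT
    if 11 <= picture <= 20:
        return picture + 11  # pictures 11..20 -> O-units 22..31
    raise ValueError("only pictures 1-20 have a trained plural affix")


def encode_input(picture: int, plural: bool) -> List[float]:
    """Input pattern: one picture unit on, plus I-unit 22 for plural stimuli."""
    if not 1 <= picture <= 21:
        raise ValueError("picture must be in 1..21")
    inputs = [0.0] * N_INPUT_UNITS
    inputs[picture - 1] = 1.0
    if plural:
        inputs[PLURALITY_I_UNIT - 1] = 1.0
    return inputs


def encode(picture: int, plural: bool) -> Tuple[List[float], List[float]]:
    """Input and target patterns for one training trial."""
    inputs = encode_input(picture, plural)
    targets = [0.0] * N_OUTPUT_UNITS
    targets[picture - 1] = 1.0  # stem O-unit matches the picture I-unit
    if plural:
        targets[affix_o_unit(picture) - 1] = 1.0
    return inputs, targets
```

For example, `encode(1, True)` switches on I-units 1 and 22 and targets O-units 1 and 32, while `encode_input(21, True)` gives the generalisation test pattern, for which no plural target was ever trained.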

We investigated four different classes of model which differed in their computational capacity or resources. The larger the number of hidden units (HUs) in a model, the larger the number of connections in the network and the greater its capacity to learn new associations and abstractions. Thus we compared models with 3, 5, 8, and 15 HUs.

Stem Training. At the outset, the connection weights of the models were randomised. Then, just like our human learners, the models were first trained on the singular forms. Each epoch of training consisted of 21 trials. Each trial consisted of presentation of a unique input pattern, one for each of the input pictures. Thus just one of I-units 1–21 would be “on” on any trial. Throughout the singular training phase, I-unit 22 (representing single/plural stimuli) was set to “off”. For each input pattern, the model responded with a pattern of output over its 32 O-units. Initially this was the random result of the random connection weights. But the model was also presented with the correct pattern of output for that corresponding input pattern (e.g. if I-unit 1


was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model, we ran five examples starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
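The stem training procedure can be sketched as a small one-hidden-layer network trained by online back-propagation. This is not the simulator used for the reported results. The logistic units, bias weights, learning rate of 0.5, initial weight range of ±0.5, and shuffled trial order are all assumptions made for illustration, and the names `TinyNet` and `train_singular` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One-hidden-layer network trained by online back-propagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, rng: random.Random):
        # Each row holds the incoming weights of one unit; the last weight is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        xb = x + [1.0]
        hidden = [logistic(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        hb = hidden + [1.0]
        out = [logistic(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return hidden, out

    def train_trial(self, x: List[float], target: List[float], rate: float = 0.5) -> float:
        """One weight update; returns the summed squared error before the update."""
        hidden, out = self.forward(x)
        delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Hidden deltas use the output weights as they stood before this update.
        delta_hidden = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(delta_out, self.w_out))
                        for j, h in enumerate(hidden)]
        for d, row in zip(delta_out, self.w_out):
            for j, v in enumerate(hidden + [1.0]):
                row[j] += rate * d * v
        for d, row in zip(delta_hidden, self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] += rate * d * v
        return sum((t - o) ** 2 for t, o in zip(target, out))


def train_singular(n_hidden: int, epochs: int, seed: int = 0) -> List[float]:
    """Train on the 21 singular patterns; returns summed squared error per epoch."""
    rng = random.Random(seed)
    net = TinyNet(22, n_hidden, 32, rng)
    patterns = []
    for picture in range(21):
        x = [0.0] * 22
        t = [0.0] * 32
        x[picture] = 1.0  # I-unit 22 stays off throughout singular training
        t[picture] = 1.0
        patterns.append((x, t))
    history = []
    for _ in range(epochs):
        rng.shuffle(patterns)
        history.append(sum(net.train_trial(x, t) for x, t in patterns))
    return history


if __name__ == "__main__":
    errors = train_singular(n_hidden=8, epochs=30)
    print(f"epoch 1 error: {errors[0]:.2f}, epoch 30 error: {errors[-1]:.2f}")
```

Calling `train_singular` with a different `seed` plays the role of one of the five example runs per model size; the 30 epochs in the usage line are chosen for speed and are far fewer than the 500 reported.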

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase; (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms; (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms; (d) one presentation of each of the five low frequency regular (LoFreqReg) forms; and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
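The composition of one plural training epoch can be made concrete as follows. This is a sketch rather than the original training script: picture numbers are assumed to follow the row order of the Appendix (rows 1–10 regular, rows 11–20 irregular, odd rows at frequency 5), and the names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

# Picture numbers follow the row order of the Appendix (an assumption).
HI_FREQ_REG = [1, 3, 5, 7, 9]
LO_FREQ_REG = [2, 4, 6, 8, 10]
HI_FREQ_IRREG = [11, 13, 15, 17, 19]
LO_FREQ_IRREG = [12, 14, 16, 18, 20]

Trial = Tuple[int, bool]  # (picture number, plural?)


def plural_epoch(rng: random.Random) -> List[Trial]:
    """One epoch: 21 singular trials plus the plural trials, in random order."""
    trials: List[Trial] = [(picture, False) for picture in range(1, 22)]  # type (a)
    for picture in HI_FREQ_REG + HI_FREQ_IRREG:                           # types (b), (c)
        trials += [(picture, True)] * 5
    for picture in LO_FREQ_REG + LO_FREQ_IRREG:                           # types (d), (e)
        trials.append((picture, True))
    rng.shuffle(trials)
    return trials


epoch = plural_epoch(random.Random(0))
plural_counts = Counter(picture for picture, plural in epoch if plural)
regular_affix_uses = sum(plural_counts[p] for p in HI_FREQ_REG + LO_FREQ_REG)

print(len(epoch), regular_affix_uses)
```

Counted this way, the single regular affix is reinforced 30 times per epoch (5 × 5 + 5 × 1), whereas any one irregular affix is reinforced either 5 times or once.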

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.


frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items, and equally that the frequency effect was less for regular items than for irregular ones.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations F(1, 16) = 20.80, P < 0.0005; by words F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations F(1, 16) = 9.07, P < 0.01; by words F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations F(1, 16) = 4.85, P < 0.05; by words F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations F(14, 224) = 68.03, P < 0.0001; by words F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations F(14, 224) = 36.75, P < 0.0001; by words F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations F(14, 224) = 18.93, P < 0.0001; by words F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations F(14, 224) = 16.11, P < 0.0001; by words F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
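The two performance measures contrasted here can be stated compactly. The sketch below is an illustration, not the simulator's own code: the text does not spell out how the RMS error is normalised, so taking the mean over the 11 affix units is an assumption, and the example activation values are invented.

```python
import math
from typing import Sequence


def rms_error(output: Sequence[float], target: Sequence[float]) -> float:
    """Root mean square difference between observed and target activations."""
    if len(output) != len(target):
        raise ValueError("output and target must have the same length")
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(output, target)) / len(target))


def accuracy(output: Sequence[float], target_index: int) -> float:
    """Activation (0 to 1) on the single unit for the appropriate target affix."""
    return output[target_index]


# Invented activations across the 11 affix O-units (22-32); the target affix
# is placed last here purely for illustration.
affix_output = [0.05] * 10 + [0.60]
affix_target = [0.0] * 10 + [1.0]

print(f"RMS error: {rms_error(affix_output, affix_target):.3f}")
print(f"accuracy:  {accuracy(affix_output, 10):.2f}")
```

RMS error penalises residual activation on the ten non-target units as well as a shortfall on the target unit, whereas the accuracy score reads off the target unit alone, which is why the two measures move in opposite directions as learning proceeds.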

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points


FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail; as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum


TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                               Model Size
Measure                  3HU(a)   5HU    8HU    15HU

RMS error(b)
  M                      0.81     0.79   0.53   0.45
  SD                     0.43     0.50   0.45   0.32

Activation weight(c)
  M                      0.20     0.28   0.57   0.60
  SD                     0.44     0.44   0.52   0.35

N hits (/5)(d)           1        2      3      4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.

squared Euclidean distance. Thus when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.
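The "closest pattern" criterion used for the hit counts can be sketched as a nearest-target search. This is an illustrative reconstruction under the assumption that each candidate target is a one-of-eleven pattern across the affix O-units; the function names and example activations are hypothetical.

```python
from typing import List, Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_affix_pattern(affix_output: Sequence[float]) -> int:
    """Index of the one-hot affix target nearest to the observed activations."""
    n = len(affix_output)

    def one_hot(k: int) -> List[float]:
        return [1.0 if i == k else 0.0 for i in range(n)]

    return min(range(n), key=lambda k: squared_euclidean(affix_output, one_hot(k)))


# Invented activations across the 11 affix units for the untrained plural of
# TesterP; index 10 stands for the regular affix in this illustration.
tester_output = [0.02, 0.10, 0.05, 0.01, 0.03, 0.08, 0.02, 0.04, 0.01, 0.06, 0.60]
chosen = closest_affix_pattern(tester_output)
print("regular affix chosen" if chosen == 10 else f"irregular affix {chosen} chosen")
```

With one-hot targets, the nearest pattern by squared Euclidean distance is simply the affix unit carrying the highest activation, so a run counts as a hit whenever the regular affix unit is the most active one.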

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 327

regular items are relatively poor on ldquowug testsrdquo and have particulardifculty on low frequency irregular items

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones, regular items were learned significantly faster than irregular ones, there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones, and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects both the rate of acquisition of default case, the presence or absence of frequency effects for regular items, and ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
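A small numerical sketch makes the asymptote argument concrete. It assumes, purely for illustration, that x stands for amount of practice and that regularity contributes a constant increment of 1.0 to x; neither value comes from the study, and the function names are our own.

```python
def power_curve(x, n=1.0):
    """Asymptoting power function y = 1 - (x * n) ** -2, with n > 0."""
    return 1.0 - (n * x) ** -2


def gain_from_increment(x, increment=1.0, n=1.0):
    """Improvement in y produced by adding a constant increment to x."""
    return power_curve(x + increment, n) - power_curve(x, n)


# The same increment (standing in for regularity) at low and high practice.
low_frequency_gain = gain_from_increment(1.0)
high_frequency_gain = gain_from_increment(5.0)

# Rescaling the horizontal axis (n = 2) changes the sizes but not the ordering.
stretched_low = gain_from_increment(1.0, n=2.0)
stretched_high = gain_from_increment(5.0, n=2.0)

print(low_frequency_gain, high_frequency_gain)
print(stretched_low, stretched_high)
```

In both cases the increment buys far more at the low end of the curve than near the asymptote, which is the pattern read as a frequency by regularity interaction.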

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
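The ceiling effect described here can be illustrated with a single output unit. The sketch below assumes a logistic input–output function, the usual choice in back-propagation networks; the form of the function used in the simulations is not stated in this passage, so that choice is an assumption.

```python
import math


def logistic(summed_input):
    """Assumed non-linear input-output function of a unit."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error_for_active_target(summed_input):
    """Error on a unit whose target state is 1.0."""
    return 1.0 - logistic(summed_input)


# Add the same amount (1.0) to the summed input at successive points in
# training and record how much the error falls each time.
error_reductions = [
    error_for_active_target(z) - error_for_active_target(z + 1.0)
    for z in range(0, 5)
]

for z, reduction in enumerate(error_reductions):
    print(f"summed input {z} -> {z + 1}: error falls by {reduction:.4f}")
```

Each equal step in summed input, whether it comes from frequency or from consistency, reduces the error by less than the one before it.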

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.

Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture    Stem    Plural Form  Frequency  Regularity
car        garth   bugarth      5          R
bed        pid     bupid        1          R
lamp       lant    bulant       5          R
table      tib     butid        1          R
plane      poon    bupoon       5          R
ball       prill   buprill      1          R
train      dram    budram       5          R
house      hize    buhize       1          R
book       bisk    bubisk       5          R
broom      breen   bubreen      1          R
phone      feem    gofeem       5          I
umbrella   brol    gubrol       1          I
chair      charp   zecharp      5          I
horse      naig    zonaig       1          I
monkey     chonk   nuchonk      5          I
dog        woop    niwoop       1          I
elephant   fant    vefant       5          I
scissors   zoze    vuzoze       1          I
kite       kag     rekag        5          I
fish       pisc    ropisc       1          I

was on and all others off, O-unit 1 should have had value 1.0 and all others zero). On each trial, the back-propagation algorithm calculated the difference between the level of activity that was produced on each O-unit and the “correct” level of activity, and a small adjustment was made to the connection strength to that unit in such a way that when the same process occurred again, a closer approximation to the correct pattern of output activation would be achieved. The models were trained for 500 epochs of singular experience. For each size of model we ran five examples, starting with different arbitrary unit allocation and different initial random connection strengths. The data we produce for each model is the average performance of these five examples.
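The error-correction step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the simulator used in the study: the logistic activation function, the learning rate of 0.5, the initial weight range, and the omission of bias and momentum terms are all our own choices, and only the unit counts (22 I-units, 32 O-units, 8 hidden units as one of the reported sizes) come from the text.

```python
import math
import random

N_IN, N_HID, N_OUT = 22, 8, 32  # unit counts taken from the text


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_weights(rows, cols, rng):
    # Assumed initial range; the study only says "initial random connection strengths".
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(x, w_ih, w_ho):
    hidden = [logistic(sum(x[i] * w_ih[i][j] for i in range(N_IN))) for j in range(N_HID)]
    output = [logistic(sum(hidden[j] * w_ho[j][k] for j in range(N_HID))) for k in range(N_OUT)]
    return hidden, output


def train_step(x, target, w_ih, w_ho, rate=0.5):
    """One back-propagation update; returns the summed squared error before the update."""
    hidden, output = forward(x, w_ih, w_ho)
    # Difference between "correct" and produced activity, scaled by the unit's slope.
    delta_out = [(target[k] - output[k]) * output[k] * (1.0 - output[k]) for k in range(N_OUT)]
    # Error passed back to the hidden units (computed before any weight changes).
    delta_hid = [
        hidden[j] * (1.0 - hidden[j]) * sum(delta_out[k] * w_ho[j][k] for k in range(N_OUT))
        for j in range(N_HID)
    ]
    for j in range(N_HID):
        for k in range(N_OUT):
            w_ho[j][k] += rate * delta_out[k] * hidden[j]
    for i in range(N_IN):
        for j in range(N_HID):
            w_ih[i][j] += rate * delta_hid[j] * x[i]
    return sum((target[k] - output[k]) ** 2 for k in range(N_OUT))


rng = random.Random(0)
w_ih = random_weights(N_IN, N_HID, rng)
w_ho = random_weights(N_HID, N_OUT, rng)

# A singular trial: I-unit 1 on, all others off; O-unit 1 should be 1.0, all others zero.
x = [1.0 if i == 0 else 0.0 for i in range(N_IN)]
target = [1.0 if k == 0 else 0.0 for k in range(N_OUT)]

errors = [train_step(x, target, w_ih, w_ho) for _ in range(200)]
print(round(errors[0], 3), round(errors[-1], 3))
```

Repeating the same trial shrinks the error, which is the sense in which each small adjustment yields "a closer approximation to the correct pattern of output activation".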

Plural Training. The model weights that resulted from this singular training then served as the starting point for another 700 epochs of training on plurals. The trials constituting each epoch were very similar in nature to those used with the human learners. Each epoch consisted of 81 trials presented in random order: (a) one presentation of each of the 21 singular forms as in the preceding phase, (b) five presentations of each of the five high frequency regular (HiFreqReg) plural forms, (c) five presentations of each of the five high frequency irregular (HiFreqIrreg) plural forms, (d) one presentation of each of the five low frequency regular (LoFreqReg) forms, and (e) one presentation of each of the five low frequency irregular (LoFreqIrreg) forms. For training trials of type (a), just one of I-units 1–21 was activated, I-unit 22 was off, and just the corresponding one of O-units 1–21 was reinforced. For the other training types (b–e), one of I-units 1–20 was activated, I-unit 22 was on, and one of O-units 1–20 (the corresponding stem form) along with one of O-units 22–32 (the corresponding plural affix) were reinforced. The learning algorithm operated as it did in the stem training phase. At regular intervals we tested the state of learning of the model by presenting it, without feedback, with test input patterns that represented the plural cases of all 21 pictures. At these tests, for each stimulus we measured the pattern of activation (between 0 [no activation] and 1 [full on]) across O-units 22–32 and compared it against the target plural activation for that input pattern.
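The composition of one plural-training epoch can be checked with a short sketch. The numbering of items follows the order of the Appendix (items 1–10 regular, 11–20 irregular, frequencies alternating 5, 1), which is an assumption made for illustration since the simulations used arbitrary unit allocation; the affix labels are likewise hypothetical names.

```python
import random
from collections import Counter

REGULAR_ITEMS = range(1, 11)     # assumed numbering: Appendix rows 1-10
IRREGULAR_ITEMS = range(11, 21)  # assumed numbering: Appendix rows 11-20
TESTER_P = 21                    # generalisation item, trained in the singular only


def presentations(item):
    """Plural presentations per epoch: the Appendix alternates frequency 5, 1."""
    return 5 if item % 2 == 1 else 1


def affix_for(item):
    """All regular items share one affix; each irregular item has its own."""
    return "regular" if item in REGULAR_ITEMS else f"irregular-{item}"


def build_epoch(rng):
    # (a) one singular trial for each of the 21 pictures (no affix reinforced)
    trials = [(item, None) for item in range(1, TESTER_P + 1)]
    # (b)-(e) plural trials for the 20 trained items
    for item in range(1, 21):
        trials.extend([(item, affix_for(item))] * presentations(item))
    rng.shuffle(trials)
    return trials


epoch = build_epoch(random.Random(0))
affix_counts = Counter(affix for _, affix in epoch if affix is not None)

print(len(epoch))
print(affix_counts["regular"], affix_counts["irregular-11"], affix_counts["irregular-12"])
```

Under these assumptions the regular affix is reinforced 30 times per epoch, against 5 times for a high frequency irregular affix and once for a low frequency one, which is why consistency behaves like additional frequency.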

Results

Regularity by Frequency. Figure 2 shows the Root Mean Square (RMS) error calculated across the plural affix O-units (22–32), averaged over the five items in each of the following classes: HiFreqReg, HiFreqIrreg, LoFreqReg, LoFreqIrreg, at each point in testing of the model. These graphs illustrate that learning in all of the models showed clear effects of frequency (high frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations F(1, 16) = 20.80, P < 0.0005; by words F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations F(1, 16) = 9.07, P < 0.01; by words F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations F(1, 16) = 4.85, P < 0.05; by words F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations F(14, 224) = 68.03, P < 0.0001; by words F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations F(14, 224) = 36.75, P < 0.0001; by words F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations F(14, 224) = 18.93, P < 0.0001; by words F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations F(14, 224) = 16.11, P < 0.0001; by words F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency, and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
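The two performance measures can be set side by side in a brief sketch. The activation vectors are hypothetical, and normalising the RMS error by the number of affix units is an assumption: the passage does not state how the simulator scaled it.

```python
import math


def rms_error(activation, target):
    """Root mean square error across the affix units (normalisation is assumed)."""
    squared = [(a - t) ** 2 for a, t in zip(activation, target)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(activation, target):
    """Activation on the single unit that codes the correct affix."""
    return activation[target.index(1.0)]


# Target pattern across O-units 22-32 for a regular plural.
target = [1.0] + [0.0] * 10

# Hypothetical outputs early and late in training.
early = [0.30] + [0.20] * 10
late = [0.90] + [0.03] * 10

for label, activation in (("early", early), ("late", late)):
    print(label, round(rms_error(activation, target), 3), accuracy(activation, target))
```

RMS error also penalises residual activation on the wrong affix units, whereas the accuracy score reads off the correct unit alone; as learning proceeds the two move in opposite directions, which is why one graph mirrors the other.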

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, ns; (b) human/model by frequency, F(1, 32) = 0.47, ns; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.


FIG. 4. The regularity by frequency interaction, averaged over blocks, in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                Model Size
Measure                 3HU(a)    5HU     8HU     15HU

RMS error(b)
  M                     0.81      0.79    0.53    0.45
  SD                    0.43      0.50    0.45    0.32

Activation weight(c)
  M                     0.20      0.28    0.57    0.60
  SD                    0.44      0.44    0.52    0.35

N hits (/5)(d)          1         2       3       4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
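The “hit” criterion, choosing the target pattern with the minimum squared Euclidean distance from the observed output, can be sketched as follows. This is an illustrative reconstruction rather than the code used for the simulations: the text specifies only the regular target pattern (1 followed by ten 0s across eleven O-units), so the one-hot irregular patterns, their labels, and the observed output vector are hypothetical assumptions.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two equal-length activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pattern(output, candidates):
    """Return the name of the candidate target pattern nearest to `output`."""
    return min(candidates, key=lambda name: squared_euclidean(output, candidates[name]))


# Eleven affix output units (O-units 22-32 in the text). Unit 0 stands for the
# regular plural affix; the remaining one-hot patterns are assumed placeholders
# for the idiosyncratic irregular affixes.
N_UNITS = 11
candidates = {}
for i in range(N_UNITS):
    name = "regular" if i == 0 else f"irregular_{i}"
    candidates[name] = [1.0 if j == i else 0.0 for j in range(N_UNITS)]

# Hypothetical output for a never-trained plural input.
observed = [0.60, 0.10, 0.05, 0.20, 0.0, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0]
print(closest_pattern(observed, candidates))
```

With one-hot targets, the nearest pattern is simply the one whose unit carries the highest activation, so a model scores a hit whenever the regular affix unit is the most active.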

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified by the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
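The power-function argument can be checked numerically with a small sketch. It takes the two example functions from the text and feeds them additive contributions of frequency and regularity; the particular contribution sizes (1 and 5 for low and high frequency, 0 and 2 for irregular and regular) are assumed values chosen only for illustration.

```python
def performance(x, n=1.0):
    """Asymptoting power function y = 1 - (x * n) ** -2 from the text (n > 0)."""
    return 1.0 - (x * n) ** -2


# Hypothetical additive inputs to the function: a frequency contribution
# plus a constant increment for regularity.
FREQ = {"low": 1.0, "high": 5.0}
REG = {"irregular": 0.0, "regular": 2.0}


def regularity_effect(freq_level, n=1.0):
    """Gain in performance from the regularity increment at a given frequency."""
    x = FREQ[freq_level]
    return performance(x + REG["regular"], n) - performance(x + REG["irregular"], n)


for n in (0.5, 1.0, 2.0):
    low = regularity_effect("low", n)
    high = regularity_effect("high", n)
    print(f"n={n}: regularity effect at low frequency={low:.3f}, at high frequency={high:.3f}")
```

For each value of n the same increment of regularity buys far more at low frequency than at high frequency, so rescaling the horizontal axis changes the size of the interaction but not its direction.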

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
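The ceiling effect described above can be made concrete with a short sketch. It assumes a standard logistic activation function for the output unit, which the text does not specify, and uses arbitrary equal steps of summed input to stand in for the accumulated effect of weight changes.

```python
import math


def logistic(net):
    """Logistic activation (an assumed unit function): squashes summed input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def error_for(net, target=1.0):
    """Absolute error of a unit whose target state is `target`."""
    return abs(target - logistic(net))


# Equal increments of summed input to a unit that should be active.
nets = [0.0, 1.0, 2.0, 3.0, 4.0]
errors = [error_for(net) for net in nets]

# Error reduction bought by each successive, equal-sized increment.
reductions = [errors[i] - errors[i + 1] for i in range(len(errors) - 1)]

for net, reduction in zip(nets[1:], reductions):
    print(f"summed input -> {net:.1f}: error falls by {reduction:.3f}")
```

Each equal step in summed input removes less error than the one before it, so a contribution from regularity matters most for items that have received little input from frequency.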

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally amenable to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.
Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.
Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.
Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.
Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.
Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.
Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.
Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.
Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.
Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.
Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.
Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.
DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.
Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.
Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.
Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.
Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.
Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.
Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.
Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.
MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.
MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.
MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.
Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.
Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.
McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.
Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.
Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.
Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.
Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.
Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.
Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.
Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.
Pinker, S. (1991). Rules of language. Science, 253, 530–535.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.
Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.
Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.
Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.
Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.
Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.
Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.
Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.
Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.
Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.
Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.
Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.
Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.
Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I

FIG. 2. Acquisition data for four connectionist models with increasing computational power trained on the MAL morphology. There are clear regularity by frequency interactions in all models.


frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction whereby there was much less regularity effect for high frequency items than for low frequency items, and equally that the frequency effect was less for regular items than for irregular ones.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
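The two performance measures contrasted above can be sketched in a few lines. This is an illustrative sketch only: the function names and the activation values are hypothetical, and the layout of one regular affix unit followed by ten irregular affix units is an assumption made for the example, not a description of the simulations' actual code.

```python
import math

def rms_error(output, target):
    """Root mean squared difference between output and target activations."""
    squared = sum((o - t) ** 2 for o, t in zip(output, target))
    return math.sqrt(squared / len(target))

def target_activation(output, target):
    """Activation on the single unit that the target pattern switches on."""
    return output[target.index(1.0)]

# Hypothetical activations over 11 affix O-units (unit 0 = regular affix).
target = [1.0] + [0.0] * 10
early = [0.30] + [0.10] * 10   # early in training: weak target, noisy others
late = [0.90] + [0.02] * 10    # late in training: strong target, others inhibited

for label, output in (("early", early), ("late", late)):
    print(label, round(rms_error(output, target), 3),
          target_activation(output, target))
```

RMS error penalises residual activation on the non-target units as well as a weak target unit, whereas the accuracy score looks only at the target unit, which is why the two measures move in opposite directions as training proceeds.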

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, ns; (b) human/model by frequency, F(1, 32) = 0.47, ns; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (1,0,0,0,0,0,0,0,0,0,0) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of "wug" is "wugs".

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                          Model Size
Measure                 3HUᵃ    5HU     8HU     15HU

RMS errorᵇ
  M                     0.81    0.79    0.53    0.45
  SD                    0.43    0.50    0.45    0.32

Activation weightᶜ
  M                     0.20    0.28    0.57    0.60
  SD                    0.44    0.44    0.52    0.35

N hits (/5)ᵈ            1       2       3       4

There were five exemplars of each size of model. ᵃ HU = hidden units. ᵇ RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. ᶜ Activation weight on the regular plural affix O-unit. ᵈ Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
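The "N hits" criterion, choosing the target pattern closest to the observed output by squared Euclidean distance, can be sketched as follows. The sketch assumes one-hot target patterns over the 11 affix O-units, with index 0 standing for the regular plural affix; the function names and the observed activations are hypothetical and serve only to illustrate the metric.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def one_hot(index, size=11):
    """Target pattern with a single active unit (assumed coding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def closest_target(output, size=11):
    """Index of the one-hot target pattern nearest to the observed output."""
    return min(range(size),
               key=lambda i: squared_euclidean(output, one_hot(i, size)))

REGULAR = 0  # assumed position of the regular plural affix unit

# Hypothetical output for the generalisation item from one exemplar model.
observed = [0.55, 0.05, 0.10, 0.00, 0.20, 0.05, 0.00, 0.10, 0.00, 0.05, 0.15]

is_hit = closest_target(observed) == REGULAR
print("regular pattern chosen:", is_hit)
```

With one-hot targets the nearest pattern is simply the one whose unit carries the highest activation, so a model scores a hit whenever the regular affix unit is the most active unit for the novel item.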

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models but, none the less, the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a "rule-like" way by abstracting a "regular" plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on "wug tests", and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar "wug test" performance by a human learner would be taken as an operationalisation that they had acquired the "regular" morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars: Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of "regular" in the Pocket Oxford Dictionary involves "habitual, constant" acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses "conforming to a rule or principle". We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of "rules of language"). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors including: (a) the exponent of the power function; (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve; and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items. Just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$: the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
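The asymptote argument can be checked numerically. The following is a minimal sketch, assuming the illustrative function y = 1 − (xn)^−2 with (xn) read as the product of x and n, and treating regularity as a constant additive increment to practice; the function names and the particular input values are hypothetical.

```python
def power_curve(x, n=1.0):
    """Asymptoting power function y = 1 - (x * n) ** -2."""
    return 1.0 - (x * n) ** -2

def regularity_effect(frequency, increment=1.0, n=1.0):
    """Gain in y from adding a constant regularity increment to practice."""
    return power_curve(frequency + increment, n) - power_curve(frequency, n)

# The same increment of regularity is applied at low and high frequency.
for n in (0.5, 1.0, 2.0):
    low = regularity_effect(1.0, n=n)
    high = regularity_effect(5.0, n=n)
    print(f"n={n}: low-frequency effect={low:.4f}, high-frequency effect={high:.4f}")
```

For every positive n tried here, the increment buys far more at low frequency than at high frequency, which is the frequency by regularity interaction; changing n only rescales the horizontal axis.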

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum, because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0; a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
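The ceiling effect described in this analysis can be illustrated with a single output unit. The sketch below assumes a logistic input–output function, which is a common choice in back-propagation networks but is an assumption here, and it uses hypothetical function names and input magnitudes: contributions from frequency and from consistency are simply added into the unit's summed input, and error is measured as the shortfall of the unit's state from its target of 1.0.

```python
import math

def logistic(net):
    """Non-linear unit input-output function (assumed logistic)."""
    return 1.0 / (1.0 + math.exp(-net))

def unit_error(frequency_input, consistency_input):
    """Shortfall from a target of 1.0 when the two contributions sum."""
    net = frequency_input + consistency_input
    return 1.0 - logistic(net)

def consistency_benefit(frequency_input, consistency_input=1.0):
    """Error reduction bought by a fixed consistency contribution."""
    return (unit_error(frequency_input, 0.0)
            - unit_error(frequency_input, consistency_input))

# The same consistency contribution at increasing levels of frequency input.
for frequency_input in (0.0, 2.0, 4.0):
    print(frequency_input, round(consistency_benefit(frequency_input), 4))
```

Although the two contributions are purely additive at the level of summed input, the benefit of consistency shrinks as the frequency contribution grows, because the unit's state is already close to its asymptote.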

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate IO associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the "regular" or "default" case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or "productive") process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCESAnderson JR (1982) Acquisition of cognitive skill Psychological Review 89 369ndash406Anderson JR (1993) Rules of the mind Hillsdale NJ Lawrence Erlbaum Associates IncAnderson JR amp Schooler LJ (1991) Reections of the environment in memory

Psychological Science 2 396ndash408Beck M (1995) Tracking down the source of NSndashNNS differences in syntactic competence

Unpublished manuscript University of North TexasBellugi U Bihrle A Jernigan D Trauner D amp Dougherty S (1990)

Neuropsychological neurological and neuroanatomical prole of Williams SyndromeAmerican Journal of Medical Genetics 6 115ndash125

Braine MDS Brody RE Brooks PJ Sudhalter V Ross JA Catalano L amp FischSM (1990) Exploring language acquisition in children with a miniature articiallanguage Effects of item and pattern frequency arbitrary subclasses and correctionJournal of Memory and Language 29 591ndash610

Broeder P amp Plunkett K (1994) Connectionism and second language acquisition In NEllis (Ed) Implicit and explicit learning of languages (pp 421ndash454) London AcademicPress

Bybee J (1995) Regular morphology and the lexicon Language and Cognitive Processes10 425ndash455

Chater N (1995) Neural networks The new statistical models of mind In JP Levy DBairaktaris JA Bullinaria amp P Cairns (Eds) Connectionist models of memory andlanguage London UCL Press

Cohen JD MacWhinney B Flatt M amp Provost J (1993) PsyScope A new graphicinteractive environment for designing psychology experiments Behavioral ResearchMethods Instruments and Computers 25 257ndash271

Cottrell G amp Plunkett K (1994) Acquiring the mapping from meaning to soundsConnection Science 6 379ndash412

334 ELLIS AND SCHMIDT

Daugherty KG amp Seidenberg MS (1992) Rules or connections The past tense revisitedIn Proceedings of the 14th annual conference of the Cognitive Science Society (pp 259ndash264)Pittsburgh PA Cognitive Science Society

Daugherty KG amp Seidenberg MS (1994) Beyond rules and exceptions A connectionistapproach to inectional morphology In SD Lima RL Corrigan amp GK Iverson (Eds)The reality of linguistic rules (pp 353ndash388) Amsterdam John Benjamins

DeKeyser R (1997) Beyond explicit rule learning Automatizing second languagemorphosyntax Studies in Second Language Acquisition 19 195ndash222

Ellis NC (1996) Sequencing in SLA Phonological memory chunking and points of orderStudies in Second Language Acquisition 18 91ndash126

Eubank L amp Gregg KR (1995) ldquoEt in Amygdala Egordquo UG (S)LA and neurobiologyStudies in Second Language Acquisition 17 35ndash58

Hare M Elman JL amp Daugherty KG (1995) Default generalisation in connectionistnetworks Language and Cognitive Processes 10 601ndash630

Jung J (1971) The experimenterrsquos dilemma New York Harper amp RowKirsner K (1994) Implicit processes in second language learning In N Ellis (Ed) Implicit

and explicit learning of languages (pp 283ndash312) London Academic PressLachter J amp Bever T (1988) The relation between linguistic structure and associative

theories of language learning A constructive critique of some connectionist learningmodels Cognition 28 195ndash247

Lima SD Corrigan RL amp Iverson GK (Eds) (1994) The reality of linguistic rulesAmsterdam John Benjamins

MacWhinney B (1983) Miniature language systems as tests of use of universal operatingprinciples in second-language learning by children and adults Journal of PsycholinguisticResearch 12 467ndash478

MacWhinney B (1994) The dinosaurs and the ring In SD Lima RL Corrigan amp GKIverson (Eds) The reality of linguistic rules (pp 283ndash320) Amsterdam John Benjamins

MacWhinney B amp Leinbach J (1991) Implementations are not conceptualizationsRevising the verb learning model Cognition 40 121ndash157

Marchman VA (1993) Constraints on plasticity in a connectionist model of the Englishpast tense Journal of Cognitive Neuroscience 5 215ndash234

Marcus GF Brinkmann U Clahsen H Wiese R amp Pinker S (1995) Germaninection The exception that proves the rule Cognitive Psychology 29 198ndash256

McLaughlin B (1980) On the use of miniature articial languages in second-languageresearch Applied Psycholinguistics 1 357ndash369

Moeser SD amp Bregman AS (1972) The role of reference in the acquisition of a miniaturearticial language Journal of Verbal Learning and Verbal Behavior 11 759ndash769

Morgan JL Meier RP amp Newport EL (1987) Structural packaging in the input tolanguage learning Contributions of prosodic and morphological marking of phrases to theacquisition of language Cognitive Psychology 19 498ndash550

Morgan JL amp Newport EL (1981) The role of constituent structure in the induction of anarticial language Journal of Verbal Learning and Verbal Behavior 20 67ndash85

Morton J (1979) Facilitation in word recognition Experiments causing change in thelogogen model In PA Kolers ME Wrolstad amp M Bouma (Eds) Processing of visiblelanguage (pp 259ndash268) New York Plenum

Nakisa R amp Hahn U (1996) Where defaults donrsquot help The case of the German pluralsystem In Proceedings of the 18th annual conference of the Cognitive Science Society (pp177ndash182) Hillsdale NJ Lawrence Erlbaum Associates Inc

Newell A (1990) Unied theories of cognition Cambridge MA Harvard University PressNewell A amp Rosenbloom P (1981) Mechanisms of skill acquisition and the law of

practice In JR Anderson (Ed) Cognitive skills and their acquisition Hillsdale NJLawrence Erlbaum Associates Inc

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 335

Oetting JB amp Rice ML (1993) Plural acquisition in children with specic languageimpairment Journal of Speech and Hearing Research 36 1236ndash1248

Paivio A (1986) Mental representations A dual coding approach Oxford UK OxfordUniversity Press

Palermo DS amp Howe HE (1970) An experimental analogy to the learning of past-tenseinection rules Journal of Verbal Learning and Verbal Behavior 9 410ndash416

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S amp Prince A (1988) On language and connectionism Analysis of a parallel

distributed processing model of language acquisition Cognition 29 195ndash247Pinker S amp Prince A (1994) Regular and irregular morphology and the psychological

status of rules of grammar In SD Lima RL Corrigan amp GK Iverson (Eds) The reality oflinguistic rules (pp 321ndash351) Amsterdam John Benjamins

Plaut DC McClelland JL Seidenberg MS amp Patterson KE (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plunkett K amp Marchman V (1991) U-shaped learning and frequency effects in amulti-layered perceptron Implications for child language acquisition Cognition 38 3ndash102

Plunkett K amp Marchman V (1993) From rote learning to system building Acquiring verbmorphology in children and connectionist nets Cognition 48 21ndash69

Plunkett K amp Nakisa RC (in press) A connectionist model of Arabic plural systemLanguage and Cognitive Processes

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Prasada S Pinker S amp Snyder W (1990) Some evidence that irregular forms are retrievedfrom memory but regular forms are rule-governed Paper presented at the 31st meeting ofthe Psychonomic Society New Orleans November

Rumelhart D Hinton G amp Williams R (1986) Learning internal representations by backpropagation In DE Rumelhart amp JL McClelland (Ed) Parallel distributed processingExplorations in the microstructure of cognition Cambridge MA MIT Press

Rumelhart D amp McClelland J (1986) On learning the past tense of English verbs In DERumelhart amp JL McClelland (Eds) Parallel distributed processing Explorations in themicrostructure of cognition Vol 2 Psychological and biological models (pp 272ndash326)Cambridge MA MIT Press

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
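One way to see how regularity translates into frequency in these materials is to tally how often each plural prefix would be encountered. The sketch below is illustrative only. It assumes that the Frequency column gives relative presentation frequency and that each plural prefix is the first two letters of the plural form, which holds for the forms as listed; the names `ITEMS` and `affix_token_counts` are hypothetical and not part of the original study.

```python
from collections import Counter

# (picture, stem, plural form, frequency, regularity), copied from the Appendix.
ITEMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def affix_token_counts(items):
    """Sum item frequencies per plural prefix (assumed to be two letters)."""
    counts = Counter()
    for _picture, _stem, plural, frequency, _regularity in items:
        counts[plural[:2]] += frequency
    return counts


counts = affix_token_counts(ITEMS)
for affix, tokens in counts.most_common():
    print(f"{affix}-  {tokens}")
```

Under these assumptions the regular prefix bu- accumulates 30 tokens per pass through the materials, against 5 or 1 for each irregular prefix.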


frequency items were learned faster than low frequency ones), regularity (regular items were learned faster than irregular ones), and a frequency by regularity interaction, whereby there was much less regularity effect for high frequency items than for low frequency items and, equally, the frequency effect was less for regular items than for irregular ones.

ANOVAs on these RMS data for each size of model demonstrated that there was high consistency of response across items and example simulations. For example, when the 8HU model was analysed as a repeated measures ANOVA across 15 roughly equally spaced blocks of training (to parallel the human data analysis), the following significant effects were observed: (a) frequency [by simulations, F(1, 16) = 20.80, P < 0.0005; by words, F(1, 16) = 56.65, P < 0.0001]; (b) regularity [by simulations, F(1, 16) = 9.07, P < 0.01; by words, F(1, 16) = 39.57, P < 0.0001]; (c) regularity by frequency [by simulations, F(1, 16) = 4.85, P < 0.05; by words, F(1, 16) = 15.61, P < 0.005]; (d) block [by simulations, F(14, 224) = 68.03, P < 0.0001; by words, F(14, 224) = 149.14, P < 0.0001]; (e) block by regularity [by simulations, F(14, 224) = 36.75, P < 0.0001; by words, F(14, 224) = 29.29, P < 0.0001]; (f) block by frequency [by simulations, F(14, 224) = 18.93, P < 0.0001; by words, F(14, 224) = 11.84, P < 0.0001]; and (g) block by regularity by frequency [by simulations, F(14, 224) = 16.11, P < 0.0001; by words, F(14, 224) = 13.06, P < 0.0001].

Comparison of this pattern of ANOVA effects with that reported earlier for the human data shows important similarities: in both cases there are significant main effects of frequency, regularity, and blocks, and there are significant interactions involving regularity by frequency and regularity by frequency by block. Thus the connectionist models demonstrate effects which broadly parallel those found in humans.

Comparison with Human Data. More detailed comparison is also possible. Although RMS error is the usual measure of model performance, because it assesses how well the network learns to inhibit non-relevant units as well as to excite relevant ones, we also extracted simple accuracy data for the 8HU model. This accuracy score is the amount of activation (between 0 and 1) on the single O-unit which corresponds to the appropriate target affix for that input pattern. Figure 3 shows the performance of the 8HU model using this metric. It is clear that accuracy scores generate a graph which is effectively a reflection in a horizontal plane of the RMS data shown in the third panel of Fig. 2. In fact, in the current simulations, correct activation is almost perfectly correlated with MSE (for example, r = −0.988 for the 8HU model). However, the activation metric has the advantage of more ready interpretation and direct comparison with the human data.
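The two performance measures contrasted here can be sketched as follows. This is a minimal illustration rather than the authors' code: the 11-unit output vectors are invented values standing in for the affix O-units, and the function names are hypothetical.

```python
import math


def rms_error(outputs, targets):
    """Root mean squared difference between observed and target activations."""
    squared = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(squared) / len(squared))


def target_activation(outputs, targets):
    """Activation on the single output unit whose target value is 1."""
    return outputs[targets.index(1)]


# Hypothetical activations over 11 affix units, early and late in training.
target = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
early = [0.30, 0.20, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
late = [0.90, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]

for label, out in (("early", early), ("late", late)):
    print(label, round(rms_error(out, target), 3), target_activation(out, target))
```

As training moves the output towards the target, RMS error falls while activation on the target unit rises, which is why the two measures are so strongly negatively correlated.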

When the 8HU model and the human data are aligned as in Fig. 3, these correspondences become clear. Pairwise comparison of individual points across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.

FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                              Model Size
Measure                  3HU [a]   5HU     8HU     15HU

RMS error [b]
  M                      0.81      0.79    0.53    0.45
  SD                     0.43      0.50    0.45    0.32

Activation weight [c]
  M                      0.20      0.28    0.57    0.60
  SD                     0.44      0.44    0.52    0.35

N hits (/5) [d]          1         2       3       4

There were five exemplars of each size of model. [a] HU = hidden units. [b] RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. [c] Activation weight on the regular plural affix O-unit. [d] Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
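The nearest-pattern criterion behind the "N hits" measure can be sketched as below. This is a hedged illustration, not the authors' procedure in detail: it assumes that each affix target is a one-hot pattern over the 11 affix O-units (the text gives 10000000000 for the regular plural), and the output vectors and function names are hypothetical.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_hot(index, size):
    """Target pattern with a single active unit (assumed form of affix targets)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def closest_affix(output):
    """Index of the one-hot affix target nearest to the observed output."""
    size = len(output)
    distances = [squared_euclidean(output, one_hot(i, size)) for i in range(size)]
    return min(range(size), key=distances.__getitem__)


REGULAR = 0  # position of the regular plural unit in the pattern 10000000000

# Hypothetical end-of-training outputs for the generalisation item.
hit = [0.60, 0.10, 0.05, 0.00, 0.20, 0.00, 0.00, 0.10, 0.00, 0.00, 0.05]
miss = [0.20, 0.10, 0.05, 0.00, 0.70, 0.00, 0.00, 0.10, 0.00, 0.00, 0.05]

print(closest_affix(hit) == REGULAR)   # counts as a hit
print(closest_affix(miss) == REGULAR)  # does not count as a hit
```

With one-hot targets this criterion reduces to choosing the most active affix unit, so a run counts as a hit whenever the regular plural unit is the most strongly activated.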

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction, where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children's ability to form irregular plurals. However, their pattern of differences between SLI and control children's performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children's showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners' experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners' language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where n > 0: the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
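The claim that a constant regularity increment matters more at low than at high frequencies can be checked numerically for the example power function. The sketch below is only an illustration: it treats regularity as a fixed additive increment to the amount of practice x, writes the rescaling constant as a generic positive multiplier `scale`, and uses hypothetical function names.

```python
def performance(x, scale=1.0):
    """Power function y = 1 - (scale * x) ** -2, for x > 0 and scale > 0."""
    return 1.0 - (scale * x) ** -2


def regularity_effect(frequency, increment=1.0, scale=1.0):
    """Gain in y from adding a constant increment to the amount of practice."""
    return performance(frequency + increment, scale) - performance(frequency, scale)


low = regularity_effect(1.0)   # effect of the increment at low frequency
high = regularity_effect(5.0)  # effect of the same increment at high frequency
print(low, high)

# Rescaling x by a positive constant stretches the curve but keeps the pattern.
for scale in (0.5, 2.0):
    print(scale, regularity_effect(1.0, scale=scale), regularity_effect(5.0, scale=scale))
```

In each case the same increment produces a much larger gain at the low frequency point than at the high one, which is the frequency by regularity interaction pictured in Fig. 5.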

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce a directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
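The ceiling effect described in this analysis can be illustrated with a single output unit. The sketch assumes a logistic input–output function, a common choice that the text does not specify, and measures error as the shortfall of the unit's state from its target of 1.0; the function names are hypothetical.

```python
import math


def logistic(net_input):
    """Assumed non-linear unit function mapping summed input to a state in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net_input))


def error_drop(net_input, increment=1.0):
    """Reduction in (1 - state) when the summed input grows by `increment`."""
    before = 1.0 - logistic(net_input)
    after = 1.0 - logistic(net_input + increment)
    return before - after


# The same increment in summed input at three successively later points in training.
drops = [error_drop(net) for net in (0.0, 2.0, 4.0)]
print([round(d, 4) for d in drops])
```

Each equal increment in summed input buys a smaller reduction in error than the one before, so contributions from frequency and from consistency, although they add in the weights, yield diminishing returns in performance.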

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter's dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don't help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo DS amp Howe HE (1970) An experimental analogy to the learning of past-tenseinection rules Journal of Verbal Learning and Verbal Behavior 9 410ndash416

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S amp Prince A (1988) On language and connectionism Analysis of a parallel

distributed processing model of language acquisition Cognition 29 195ndash247Pinker S amp Prince A (1994) Regular and irregular morphology and the psychological

status of rules of grammar In SD Lima RL Corrigan amp GK Iverson (Eds) The reality oflinguistic rules (pp 321ndash351) Amsterdam John Benjamins

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity
car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
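As a rough illustration of why regularity behaves like frequency in this language, the sketch below tallies how often each plural prefix would be encountered if every item were presented as many times as its Frequency value. Reading the Frequency column as presentations per training cycle is an assumption, as are all names in the code (`ITEMS`, `prefix_token_counts`); prefixes are simply read off as the first two letters of each plural form.

```python
from collections import Counter

# (picture, stem, plural form, relative frequency, regularity), copied from the Appendix.
ITEMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def prefix_token_counts(items):
    """Sum the presentation frequencies of all items sharing a plural prefix."""
    counts = Counter()
    for _picture, _stem, plural, freq, _regularity in items:
        counts[plural[:2]] += freq  # assumption: every prefix is two letters long
    return counts


counts = prefix_token_counts(ITEMS)
regular_tokens = counts["bu"]
max_irregular_tokens = max(n for prefix, n in counts.items() if prefix != "bu")
print(regular_tokens, max_irregular_tokens)  # prints "30 5"
```

Under these assumptions the regular prefix is met 30 times per cycle, whereas even the most frequent irregular prefix is met only 5 times.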


FIG. 3. A comparison of human accuracy performance and that of the eight hidden unit connectionist simulation.

across these two graphs by correlation shows that the simulation predicts a large proportion of the variance in the human data (R² = 0.78). There are some differences in detail: as is clarified in Fig. 4, where performance is averaged over blocks, the model performs somewhat better on the regular items and worse on the irregular items, particularly the low frequency irregular items, than do the humans. ANOVA (three factor [human/model, regularity, and frequency] with 15 blocks as repeated measures, by words analysis) comparing the human and 8HU model data confirms these interactions: (a) human/model, F(1, 32) = 1.36, n.s.; (b) human/model by frequency, F(1, 32) = 0.47, n.s.; (c) human/model by regularity, F(1, 32) = 30.28, P < 0.0001; (d) human/model by regularity by frequency, F(1, 32) = 5.01, P < 0.05.
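A proportion-of-variance figure of this kind can be obtained as the squared Pearson correlation between matched model and human scores. The following is a minimal sketch of that calculation only; the function names and the two accuracy lists are hypothetical and invented for illustration, and they are not the data behind the reported value.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(model_scores, human_scores):
    """Proportion of variance in one set of scores predicted by the other."""
    return pearson_r(model_scores, human_scores) ** 2


# Hypothetical accuracies per block, invented for illustration only.
model_accuracy = [0.20, 0.45, 0.60, 0.80, 0.90]
human_accuracy = [0.25, 0.40, 0.65, 0.70, 0.85]
print(round(r_squared(model_accuracy, human_accuracy), 2))
```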


FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60, (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45, and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                           Model Size
Measure                    3HU(a)   5HU    8HU    15HU

RMS error(b)
  M                        0.81     0.79   0.53   0.45
  SD                       0.43     0.50   0.45   0.32

Activation weight(c)
  M                        0.20     0.28   0.57   0.60
  SD                       0.44     0.44   0.52   0.35

N hits (/5)(d)             1        2      3      4

There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (/5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
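The two scoring measures used for the generalisation item, RMS error against the regular target and choice of the closest target by squared Euclidean distance, can be sketched as follows. This is an illustrative reconstruction rather than the scoring code used in the simulations: it assumes that each of the eleven affix O-units codes one affix as a one-hot target, that RMS error is the root of the mean squared difference, and the `observed` activations are hypothetical values invented for the example.

```python
import math

N_AFFIX_UNITS = 11  # O-units 22-32


def one_hot(index, size=N_AFFIX_UNITS):
    """Target vector with a single active unit."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Assumption: index 0 codes the regular plural affix (target 10000000000),
# and each remaining unit codes one irregular affix.
TARGETS = {
    ("regular" if i == 0 else f"irregular_{i}"): one_hot(i)
    for i in range(N_AFFIX_UNITS)
}


def squared_euclidean(observed, target):
    """Sum of squared differences between observed and target activations."""
    return sum((o - t) ** 2 for o, t in zip(observed, target))


def rms_error(observed, target):
    """Root of the mean squared difference across units."""
    return math.sqrt(squared_euclidean(observed, target) / len(target))


def closest_pattern(observed, targets=TARGETS):
    """Name of the target pattern nearest to the observed output."""
    return min(targets, key=lambda name: squared_euclidean(observed, targets[name]))


# Hypothetical output activations, not taken from the simulations.
observed = [0.60, 0.10, 0.05, 0.20, 0.0, 0.0, 0.15, 0.0, 0.05, 0.0, 0.10]
print(closest_pattern(observed))
print(round(rms_error(observed, TARGETS["regular"]), 3))
```

A run counts as a "hit" in this sketch when `closest_pattern` returns `"regular"` for the output produced by the untrained plural input.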

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but nonetheless the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the variance in the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction where the frequency effect was less for regular items than for irregular ones; and this is qualified by the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1996.)

Although it is a general principle, the degree to which it applies depends on a range of factors including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (x/n)^{-2}$$

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
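The argument can be checked numerically with the power function given above. The sketch below treats regularity as a constant increment added to the amount of practice and compares the resulting gain at a low and at a high practice level; the function names, the practice levels (1 and 5), and the increment (1) are assumptions chosen only for illustration.

```python
def performance(x, n=1.0):
    """Asymptoting power function y = 1 - (x / n) ** -2, defined here for x >= n > 0."""
    return 1.0 - (x / n) ** -2


def regularity_effect(frequency, increment, n=1.0):
    """Gain in performance from adding a constant regularity increment to practice."""
    return performance(frequency + increment, n) - performance(frequency, n)


# Hypothetical practice levels and regularity increment.
low_frequency, high_frequency, bonus = 1.0, 5.0, 1.0

effect_low = regularity_effect(low_frequency, bonus)
effect_high = regularity_effect(high_frequency, bonus)
print(effect_low, effect_high)

# Stretching the horizontal axis (n = 2) leaves the ordering of the effects intact.
stretched_low = regularity_effect(2.0, 2.0, n=2.0)
stretched_high = regularity_effect(10.0, 2.0, n=2.0)
print(stretched_low, stretched_high)
```

With these values the same increment is worth 0.75 at the low practice level but only about 0.012 at the high one, which is the frequency by regularity interaction in miniature.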

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input-output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
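The ceiling effect described in this analysis can be illustrated with a single output unit. The sketch assumes a logistic input-output function, which is a common choice but is not specified in the text, and uses hypothetical summed-input values; it shows that equal steps in summed input buy progressively smaller reductions in error, and that a fixed contribution from consistency therefore helps more when the contribution from frequency is small.

```python
import math


def logistic(net_input):
    """Assumed non-linear input-output function of a unit."""
    return 1.0 / (1.0 + math.exp(-net_input))


def error_reductions(start=0.0, step=1.0, n_steps=5, target=1.0):
    """Error reduction obtained from each successive equal step in summed input."""
    reductions = []
    net_input = start
    for _ in range(n_steps):
        before = abs(target - logistic(net_input))
        net_input += step
        after = abs(target - logistic(net_input))
        reductions.append(before - after)
    return reductions


def consistency_benefit(frequency_input, consistency_input):
    """Extra activation gained when consistency adds to the frequency-driven input."""
    return logistic(frequency_input + consistency_input) - logistic(frequency_input)


reductions = error_reductions()
print([round(r, 3) for r in reductions])

# Hypothetical summed inputs: the same consistency contribution at low and high frequency.
benefit_low = consistency_benefit(0.5, 1.0)
benefit_high = consistency_benefit(3.0, 1.0)
print(round(benefit_low, 3), round(benefit_high, 3))
```

Because the two contributions are summed before the non-linearity is applied, no separate mechanism is needed in this sketch to produce a smaller consistency benefit for high frequency items.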

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.

Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.


Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Human Learning and Memory 6 174ndash215

Stemberger JP amp MacWhinney B (1986) Frequency and the lexical storage of regularlyinected forms Memory and Cognition 14 17ndash26

Winter B amp Reber AS (1994) Implicit learning and the acquisition of natural languagesIn N Ellis (Ed) Implicit and explicit learning of languages (pp 115ndash146) LondonAcademic Press

Yang LR amp Givon T (1997) Benets and drawbacks of controlled laboratory studies ofsecond language acquisition The Keck second language learning project Studies in SecondLanguage Acquisition 19 173ndash194

336 ELLIS AND SCHMIDT

APPENDIX

The Word-forms of the Articial Language

Picture Stem Plural Form Frequency Regularity

car garth bugarth 5 Rbed pid bupid 1 Rlamp lant bulant 5 Rtable tib butid 1 Rplane poon bupoon 5 Rball prill buprill 1 Rtrain dram budram 5 Rhouse hize buhize 1 Rbook bisk bubisk 5 Rbroom breen bubreen 1 Rphone feem gofeem 5 Iumbrella brol gubrol 1 Ichair charp zecharp 5 Ihorse naig zonaig 1 Imonkey chonk nuchonk 5 Idog woop niwoop 1 Ielephant fant vefant 5 Iscissors zoze vuzoze 1 Ikite kag rekag 5 Ish pisc ropisc 1 I

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 325

FIG. 4. The regularity by frequency interaction averaged over blocks in humans and the eight hidden unit model. Error bars reflect 95% confidence intervals.

Generalisation. So far we have described performance with trained items. However, we also tested model output when the stimulus was the pattern for the generalisation item (TesterP) along with activation of the plural marking I-unit 22, a state of input on which the models had never been trained. Table 1 shows performance of the different models at the end of training. It is clear that the larger models have abstracted the regular plural pattern and tend to apply it by default to the generalisation test item: for the 15HU model, (a) average activation on the regular plural O-unit is 0.60; (b) mean RMS error comparing observed activation across O-units 22–32 and the target regular plural pattern (10000000000) is just 0.45; and (c) four out of the five exemplar runs of this size of model chose the regular plural pattern as being the closest to observed output, as measured by minimum squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-), in the same way that humans might generalise that the plural of “wug” is “wugs”.

TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                                  Model Size
Measure                    3HU(a)    5HU     8HU     15HU

RMS error(b)
  M                        0.81      0.79    0.53    0.45
  SD                       0.43      0.50    0.45    0.32

Activation weight(c)
  M                        0.20      0.28    0.57    0.60
  SD                       0.44      0.44    0.52    0.35

N hits (out of 5)(d)       1         2       3       4

Note. There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (out of 5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.
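The two measures used in the generalisation test, RMS error against the regular target pattern and choice of the closest affix pattern by minimum squared Euclidean distance, can be sketched as follows. This is a minimal illustration rather than the simulation code used in the study: only the regular target pattern (10000000000) is given in the text, so the one-hot layout of the ten irregular targets, the pattern names, and the example output vector are all assumptions introduced here.

```python
import math

# Eleven affix output units (O-units 22-32). The regular target pattern is
# taken from the text; the one-hot irregular targets are an ASSUMED layout.
N_UNITS = 11
TARGETS = {"regular": [1.0] + [0.0] * (N_UNITS - 1)}
for i in range(1, N_UNITS):
    TARGETS[f"irregular_{i}"] = [1.0 if j == i else 0.0 for j in range(N_UNITS)]


def squared_euclidean(observed, target):
    """Sum of squared differences between two activation vectors."""
    return sum((o - t) ** 2 for o, t in zip(observed, target))


def rms_error(observed, target):
    """Root mean squared difference between observed and target activations."""
    return math.sqrt(squared_euclidean(observed, target) / len(target))


def closest_pattern(observed, targets=TARGETS):
    """Name of the target pattern at minimum squared Euclidean distance."""
    return min(targets, key=lambda name: squared_euclidean(observed, targets[name]))


# Made-up output vector for a single hypothetical model run (not study data).
observed = [0.60, 0.10, 0.05, 0.20, 0.00, 0.15, 0.05, 0.00, 0.10, 0.05, 0.00]

chosen = closest_pattern(observed)
error_vs_regular = rms_error(observed, TARGETS["regular"])
```

A run counts as a "hit" in the sense of Table 1 when `chosen` is the regular pattern; averaging `error_vs_regular` over the five exemplar runs of a given model size would give the kind of mean RMS error reported there.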

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating the computational capacity of the model: (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size: (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects and of first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction, where the frequency effect was less for regular items than for irregular ones; and this is qualified by the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1996.)

Although it is a general principle, the degree to which it applies depends on a range of factors including (a) the exponent of the power function, (b) the particular level of experience attained, and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
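The argument can be checked numerically with a small sketch. It uses the power function given in the text; the particular practice values (1 and 5 units) and the size of the regularity increment (1 unit) are illustrative assumptions, not estimates from the experiment.

```python
def performance(x, n=1.0):
    """Asymptoting power function from the text: y = 1 - (x * n) ** -2."""
    return 1.0 - (x * n) ** -2


# ASSUMED illustrative values: low and high frequency of practice, and a
# constant additive increment contributed by regularity (consistency).
LOW, HIGH = 1.0, 5.0
REGULARITY_INCREMENT = 1.0


def regularity_effect(freq, n=1.0):
    """Gain in performance from the same regularity increment at a given frequency."""
    return performance(freq + REGULARITY_INCREMENT, n) - performance(freq, n)


def frequency_effect(regular, n=1.0):
    """Gain from high over low frequency, for regular or irregular items."""
    bonus = REGULARITY_INCREMENT if regular else 0.0
    return performance(HIGH + bonus, n) - performance(LOW + bonus, n)


low_effect = regularity_effect(LOW)    # 0.75 with these values
high_effect = regularity_effect(HIGH)  # roughly 0.012 with these values
```

With these values the same increment of regularity buys far more at low than at high frequency, and the frequency effect is correspondingly smaller for regular items (`frequency_effect(True)`) than for irregular ones (`frequency_effect(False)`). Changing `n` rescales the horizontal axis but leaves the ordering of the effects unchanged for the values tried here.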

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
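The ceiling effect described here can be illustrated with a single output unit. The sketch below assumes a logistic input–output function, which is the usual choice in back-propagation networks but is not stated in the passage; the step size and starting values of the summed input are likewise arbitrary illustrative choices.

```python
import math


def logistic(net):
    """ASSUMED unit input-output function: the standard logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-net))


def error(net, target=1.0):
    """Error on a unit that should be fully active (target state 1.0)."""
    return target - logistic(net)


# Apply the same increment in summed input at successively higher starting
# points, as would happen over the course of training.
STEP = 1.0
START_POINTS = (0.0, 1.0, 2.0, 3.0, 4.0)

# Reduction in error produced by each identical increment in summed input.
reductions = [error(net) - error(net + STEP) for net in START_POINTS]
```

Each entry of `reductions` is smaller than the one before it: the first increment reduces error by about 0.23, the fifth by about 0.01. Equal additive contributions from frequency and from consistency therefore have a shrinking effect on error as the unit approaches its extremal state, which is the source of the interaction in this account.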

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.

336 ELLIS AND SCHMIDT

APPENDIX

The Word-forms of the Articial Language

Picture Stem Plural Form Frequency Regularity

car garth bugarth 5 Rbed pid bupid 1 Rlamp lant bulant 5 Rtable tib butid 1 Rplane poon bupoon 5 Rball prill buprill 1 Rtrain dram budram 5 Rhouse hize buhize 1 Rbook bisk bubisk 5 Rbroom breen bubreen 1 Rphone feem gofeem 5 Iumbrella brol gubrol 1 Ichair charp zecharp 5 Ihorse naig zonaig 1 Imonkey chonk nuchonk 5 Idog woop niwoop 1 Ielephant fant vefant 5 Iscissors zoze vuzoze 1 Ikite kag rekag 5 Ish pisc ropisc 1 I


TABLE 1
Performance on the Target Regular Plural Pattern for the Four Sizes of Model When Presented with the Generalisation Wug-test Item TesterP at End of Training

                              Model Size
Measure                  3HU(a)   5HU     8HU     15HU

RMS error(b)
  M                      0.81     0.79    0.53    0.45
  SD                     0.43     0.50    0.45    0.32

Activation weight(c)
  M                      0.20     0.28    0.57    0.60
  SD                     0.44     0.44    0.52    0.35

N hits (out of 5)(d)     1        2       3       4

Note. There were five exemplars of each size of model. (a) HU = hidden units. (b) RMS error calculated against the target activation pattern across O-units 22–32 for the regular plural affix. (c) Activation weight on the regular plural affix O-unit. (d) Number of exemplar models (out of 5) which chose the regular plural affix pattern for TesterP, as indexed by output weights on O-units 22–32 being closest to the regular plural affix target pattern activation using a squared Euclidean distance metric.

squared Euclidean distance. Thus, when the larger models are presented with a plural stimulus which they have only ever previously experienced as a single form, there is a tendency for them to generalise and apply the regular plural morpheme (bu-) in the same way that humans might generalise that the plural of “wug” is “wugs”.
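The scoring described in the note to Table 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the simulation code: the three-unit affix codes and the function names `rms_error`, `squared_euclidean`, and `closest_affix` are hypothetical, whereas the models were scored on the activation pattern across O-units 22–32.

```python
import math


def rms_error(output, target):
    """Root-mean-square difference between two equal-length activation vectors."""
    squared = [(o - t) ** 2 for o, t in zip(output, target)]
    return math.sqrt(sum(squared) / len(squared))


def squared_euclidean(a, b):
    """Squared Euclidean distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_affix(output, targets):
    """Label of the target affix pattern nearest to the output pattern."""
    return min(targets, key=lambda label: squared_euclidean(output, targets[label]))


# Hypothetical three-unit affix codes (the real models used more output units).
targets = {
    "bu-": [1.0, 0.0, 0.0],  # regular plural affix
    "go-": [0.0, 1.0, 0.0],  # an irregular affix
    "ze-": [0.0, 0.0, 1.0],  # another irregular affix
}

# Hypothetical output of a trained model for a novel (wug-test) plural item.
wug_output = [0.6, 0.3, 0.1]

chosen = closest_affix(wug_output, targets)
error_vs_regular = rms_error(wug_output, targets["bu-"])
```

Here a “hit” in the sense of Table 1 corresponds to `chosen` being the regular affix, and `error_vs_regular` plays the role of the RMS error against the regular target pattern.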

Effects of Different Sizes of Model. Figure 2 also illustrates the effects of manipulating computational capacity of model. (1) Models with lower computational power (= a smaller number of HUs) learn the high frequency items quite well, almost as well as the largest model. (2) The most striking effect of varying the computational power of the models lies in their abilities to learn low frequency irregular items; this is by far the most sensitive index of morphological learning ability. The 3HU model hardly manages to learn these forms at all. The 15HU model eventually learns them rather well. (3) There is essentially no frequency effect for regular items in the higher computational power models, but none the less the frequency effect for irregular items remains strong. (4) The smaller models continue to show a frequency effect for regular items at the end of training. Table 1 provides one additional effect of model size. (5) The greater the computational power of the models, the more they operate in a “rule-like” way by abstracting a “regular” plural form which is applied by default to novel items. In sum, while lower computational power models are reasonably good on high frequency regular items, they show frequency effects for irregular and regular items, are relatively poor on “wug tests”, and have particular difficulty on low frequency irregular items.

Discussion of Simulations

We believe that, at least for the issue of regularity and frequency effects in morphosyntax, this is to date the most complete quantitative analysis of the adequacy of fit of simulation to human data. We are not simply making predictions about how an underspecified model might behave (the Daugherty & Seidenberg, 1994, criticisms of the Pinker & Prince, 1988, and Pinker, 1991, theories). We are not simply demonstrating that simulation and human data alike exhibit first order interactions of frequency and regularity (Daugherty & Seidenberg, 1994). Instead, we are showing the parallel patterns of significance of main effects, first and second order interactions in ANOVAs of simulation and human data, and we are showing that the simulations explain close to 80% of the relevant human data. When we go as far as actually comparing human and model performance in a multifactorial ANOVA, we find some differences of detail in the size of interactions that are qualified by the human/model factor. But these differences of detail do not detract from the general success of the models in simulating the human pattern of development of the frequency by regularity interaction. In humans and models alike, high frequency items were learned significantly faster than low frequency ones; regular items were learned significantly faster than irregular ones; there was a significant frequency by regularity interaction, where the frequency effect was less for regular items than for irregular ones; and this is qualified as the higher level interaction with block, whereby there is a developmental trend: the frequency effect for regular items attenuates faster than that for irregular items.

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus, we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood-Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

    y = 1 - x^{-2}

asymptotes, so does any power function

    y = 1 - (x/n)^{-2}

where n > 0; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
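The asymptoting argument can be checked numerically. The sketch below is illustrative only: it assumes the power function y = 1 - (x/n)^{-2} from the text, treats frequency and regularity as additive contributions to x, and uses hypothetical names (`performance`, `regularity_effect`) and a hypothetical regularity increment of 1.

```python
def performance(x, n=1.0):
    """Asymptoting power function y = 1 - (x / n) ** -2, for x > 0 and n > 0."""
    return 1.0 - (x / n) ** -2


def regularity_effect(frequency, regularity_bonus=1.0, n=1.0):
    """Gain in performance from adding a constant regularity increment.

    Assumption: frequency and regularity contribute additively to x.
    """
    return performance(frequency + regularity_bonus, n) - performance(frequency, n)


# The same regularity increment, applied at low and at high frequency.
low_frequency_gain = regularity_effect(1.0)   # 0.75
high_frequency_gain = regularity_effect(5.0)  # 1/25 - 1/36

# Rescaling the horizontal axis (n = 2) changes the sizes but not the ordering.
low_rescaled = regularity_effect(1.0, n=2.0)
high_rescaled = regularity_effect(5.0, n=2.0)
```

The constant increment buys far more at the low-frequency end of the curve than at the high-frequency end, which is the frequency by regularity interaction in miniature.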

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus, word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus, frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
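The ceiling effect in this argument can be illustrated with a deliberately stripped-down sketch. It is not the trained network reported here: it assumes a logistic unit, replaces error-driven learning with a fixed increment `step` per presentation, and lets each consistent neighbour add one such increment to the summed input. The names `unit_state` and `error` and the values used are hypothetical.

```python
import math


def unit_state(summed_input):
    """Logistic input-output function of a unit; asymptotes towards 1.0."""
    return 1.0 / (1.0 + math.exp(-summed_input))


def error(frequency, consistent_neighbours, step=0.5):
    """Error on the correct affix unit for one IOP.

    Assumption: each presentation of the IOP adds `step` to the summed input
    of its affix unit, and each consistent neighbour adds `step` once, because
    its weight changes are superimposed on the same affix unit.
    """
    summed_input = step * (frequency + consistent_neighbours)
    return 1.0 - unit_state(summed_input)


# Regular items share their affix with 9 other items; irregular items with none.
frequency_effect_regular = error(1, 9) - error(5, 9)
frequency_effect_irregular = error(1, 0) - error(5, 0)
```

Because the summed input for regular items already sits on the flat part of the logistic curve, the same fivefold change in frequency reduces error far less for them than for irregular items.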

Thus, a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity

car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
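The claim that regularity is frequency by another name can be made concrete by counting affix tokens in the design above. The sketch below is illustrative only: it assumes that the Frequency column gives relative presentation counts per block of training, and the names `ITEMS` and `affix_tokens` are hypothetical.

```python
from collections import Counter

# (plural affix, relative frequency, regularity) for the 20 items of the appendix.
# Assumption: the Frequency column gives presentations per block of training.
ITEMS = [("bu", 5, "R"), ("bu", 1, "R")] * 5 + [
    ("go", 5, "I"), ("gu", 1, "I"),
    ("ze", 5, "I"), ("zo", 1, "I"),
    ("nu", 5, "I"), ("ni", 1, "I"),
    ("ve", 5, "I"), ("vu", 1, "I"),
    ("re", 5, "I"), ("ro", 1, "I"),
]


def affix_tokens(items):
    """Number of presentations of each plural affix per block."""
    counts = Counter()
    for affix, frequency, _regularity in items:
        counts[affix] += frequency
    return counts


counts = affix_tokens(ITEMS)
regular_tokens = counts["bu"]            # summed over all ten regular items
high_irregular_tokens = counts["go"]     # a single high frequency irregular item
low_irregular_tokens = counts["gu"]      # a single low frequency irregular item
```

On these assumptions the regular affix is experienced many times more often than any single irregular affix, even for a low frequency regular stem, which is why its contribution to performance sits closer to asymptote.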

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 327

regular items are relatively poor on ldquowug testsrdquo and have particulardifculty on low frequency irregular items

Discussion of Simulations

We believe that at least for the issue of regularity and frequency effects inmorphosyntax this is to date the most complete quantitative analysis of theadequacy of t of simulation to human data We are not simply makingpredictions about how an underspecied model might behave (theDaugherty amp Seidenberg 1994 criticisms of the Pinker amp Prince 1988 andPinker 1991 theories) We are not simply demonstrating that simulation andhuman data alike exhibit rst order interactions of frequency and regularity(Daugherty amp Seidenberg 1994) Instead we are showing the parallelpatterns of signicance of main effects rst and second order interactions inANOVAs of simulation and human data and we are showing that thesimulations explain close to 80 of the relevant human data When we go asfar as actually comparing human and model performance in a multifactorialANOVA we nd some differences of detail in the size of interactions thatare qualied by the humanmodel factor But these differences of detail donot detract from the general success of the models in simulating the humanpattern of development of the frequency by regularity interaction Inhumans and models alike high frequency items were learned signicantlyfaster than low frequency ones regular items were learned signicantlyfaster than irregular ones there was a signicant frequency by regularityinteraction where the frequency effect was less for regular items than forirregular ones and this is qualied as the higher level interaction with blockwhereby there is a developmental trendmdashthe frequency effect for regularitems attenuates faster than that for irregular items

We have demonstrated that the models can generalise and produce the default plural affix for a novel stimulus. Similar “wug test” performance by a human learner would be taken as an operationalisation that they had acquired the “regular” morphological systematicity.

Finally, we have shown how varying the computational capacity of the models affects the rate of acquisition of the default case, the presence or absence of frequency effects for regular items, and the ability to acquire irregular items. This is compatible with existing data for children with specific language impairment (SLI). Oetting and Rice (1993) compared five-year-old SLI children with age-matched controls on their ability to form plurals. The SLI children were significantly worse at generating regular plurals for nonce (= wug) items, they were worse at generating regular plurals, and they showed an effect of frequency on the regular items which the control children, because of ceiling effects, did not. Unfortunately, Oetting and Rice (1993) do not provide clear data on the children’s ability to form irregular plurals. However, their pattern of differences between SLI and control children’s performance on regular items is sufficiently close to that between the present low-capacity and high-capacity simulations to suggest that morphosyntactic impairments in individuals with SLI might be explained by reduced language processing capacity in a general associative memory network rather than by a hybrid account. The SLI children’s showing frequency effects for regular items is particularly compelling in this respect. However, further assessment of regularity by frequency effects and default abstraction in individuals with SLI and with Williams syndrome (whose ability on regular forms is said to outstrip their performance on irregulars; Bellugi, Bihrle, Jernigan, Trauner, & Dougherty, 1990) is necessary to test these parallels further (see Marchman, 1993, for other simulations of different types of language dysfunction).

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involving tens of thousands of types presented as innumerable tokens. It should come as no surprise either that they demonstrate such effortless and complex skill as a result of this mass of practice, or that researchers, lacking any true record of the learners’ experience, are awed and confused by these sophisticated grammatical abilities. While we have no wish to deny any of the complexity of the final fluent state, we suspect that much of the mystery of morphology can be clarified by focusing on the acquisition process rather than the end-point. This has been our aim in this paper. Our MAL is a travesty of natural language, but at least we know the types and tokens in the learners’ language evidence, and there is no need to speculate or argue about extrapolations from corpus data or assumptions about registers.

Human learning of this MAL inflectional morphology quickly culminates in a state where, as with natural language, frequency and regularity have interactive effects on performance. But as we chart acquisition, it is clear that this interaction need not imply complex dual-mechanisms of processing. Rather, it simply reflects the asymptotes expected from the power law of practice, a simple associative law of learning. Thus, we have shown that one of the most frequently introduced arguments for the necessity of a dual-mechanism approach, a frequency effect for irregulars and the absence of such an effect for regulars, is not a good argument at all. Furthermore, we have demonstrated that a simple connectionist model, as an implementation of associative learning, provided with the same language evidence, accurately simulates the human acquisition data.
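The attenuation argument can be illustrated numerically. The sketch below is not the analysis reported in this paper: it assumes a generic power law of practice, latency = a · N^(−b), with hypothetical constants `a` and `b`, and a hypothetical `shared_tokens` parameter standing in for the extra practice a regular item might receive through the affix it shares with other items. Under those assumptions, the latency gap between an item seen once per block and one seen five times per block shrinks as blocks accumulate, and it is smaller throughout for items that receive shared practice.

```python
def latency(trials: float, a: float = 1000.0, b: float = 0.5) -> float:
    """Power law of practice: latency falls as a power function of trials.

    a and b are illustrative constants, not values fitted to the experiment.
    """
    return a * trials ** -b


def frequency_effect(block: int, shared_tokens: float = 0.0) -> float:
    """Latency gap between a 1x-per-block item and a 5x-per-block item.

    shared_tokens is a hypothetical per-block amount of extra practice that an
    item inherits from other items using the same affix (0 for irregulars).
    """
    low_frequency = latency(block * (1 + shared_tokens))
    high_frequency = latency(block * (5 + shared_tokens))
    return low_frequency - high_frequency


if __name__ == "__main__":
    for block in (1, 5, 15):
        irregular = frequency_effect(block)
        regular = frequency_effect(block, shared_tokens=6.0)
        print(f"block {block:2d}: irregular gap {irregular:7.1f}, regular gap {regular:7.1f}")
```

The value 6.0 for `shared_tokens` is arbitrary; any positive value moves the regular items further along the curve and so reduces their frequency gap relative to the irregular items at the same block.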

But how is the power law instantiated in human and connectionist systems, and what is being associated in the acquisition of inflectional morphosyntax? The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency, and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplified account of inflectional morphology, where phonological factors are put to one side, the relevant units for chunking are the stem forms and the plural affixes. From an associative perspective, regularity and frequency are essentially the same factor under different names. The first meaning of “regular” in the Pocket Oxford Dictionary involves “habitual, constant” acts, a definition in terms of statistical frequencies, consistency, and descriptive generalisation; the second stresses “conforming to a rule or principle”. We need to disentangle these senses (see Sharwood Smith, 1994, and Lima, Corrigan, & Iverson, 1994, for conceptual analysis of “rules of language”). Whether regular morphology is generated according to a rule or not, it is certainly the case for English and the MAL under study here (and generally it is the default, if not the universal, case; we will return to this matter later) that regular affixes are more habitual or frequent. And, as demonstrated in Fig. 5, the power law of practice entails that an effect of a constant increment of regularity (in its frequency sense) is much more apparent at low than at high frequencies of practice.

Although it is a general principle, the degree to which it applies depends on a range of factors, including (a) the exponent of the power function, (b) the particular level of experience attained and thus the placement of comparison points on the learning curve, and (c) the degree to which frequency and regularity are additive or multiplicative. In the present experiment, a fivefold increase in the frequency of the regular items results in a (5 × the number of regular items) increase in use of the regular affix; a fivefold increase in the frequency of an irregular item results in merely a fivefold increase in the use of the irregular affix. Thus, frequency and regularity are interactive rather than additive. But even if we allow for interaction, the function still results in greater regularity effects for low frequency items: just as, for example, the power function

$$y = 1 - x^{-2}$$

asymptotes, so does any power function

$$y = 1 - (xn)^{-2}$$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.

FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1994.)
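The logic of Fig. 5 can be checked with a few lines of arithmetic. The sketch below feeds additive frequency and regularity contributions into the example function y = 1 − x^(−2) given above. The particular input values (1.0 and 3.0 for low and high frequency, 0.5 for the regularity increment) are arbitrary choices made for illustration and do not come from the experiment.

```python
def performance(x: float) -> float:
    """Asymptoting power function from the text, y = 1 - x**-2 (used here for x >= 1)."""
    return 1.0 - x ** -2


def regularity_effect(frequency_input: float, regularity_increment: float = 0.5) -> float:
    """Gain in y from adding a constant regularity increment to the frequency input."""
    return performance(frequency_input + regularity_increment) - performance(frequency_input)


LOW_FREQUENCY, HIGH_FREQUENCY = 1.0, 3.0  # arbitrary illustrative inputs

effect_at_low = regularity_effect(LOW_FREQUENCY)
effect_at_high = regularity_effect(HIGH_FREQUENCY)

if __name__ == "__main__":
    print(f"regularity effect at low frequency:  {effect_at_low:.3f}")
    print(f"regularity effect at high frequency: {effect_at_high:.3f}")
```

The same increment buys a far larger gain near the bottom of the curve than near its asymptote, which is the interaction the figure depicts.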

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows, we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus, word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own, and hindered by the weight changes for inconsistent IOPs. Thus, frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0; a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
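The superposition argument can be made concrete with a deliberately tiny network. The sketch below is an illustration under stated assumptions, not the simulation reported in this paper: it uses a single layer of logistic output units trained by a simple error-correcting (delta-rule) update, three hypothetical items (`A` and `B` share an affix, `C` takes a different one), one shared `plural` input unit, zero initial weights, and an arbitrary learning rate. It shows (i) that one weight update for `A` lowers the error on the consistent item `B` and raises it on the inconsistent item `C`, and (ii) that repeated updates for `A` yield progressively smaller reductions in its own error.

```python
import math

INPUTS = {            # input units: item A, item B, item C, shared "plural" unit
    "A": [1, 0, 0, 1],
    "B": [0, 1, 0, 1],
    "C": [0, 0, 1, 1],
}
TARGETS = {           # output units: shared affix, idiosyncratic affix
    "A": [1, 0],
    "B": [1, 0],      # consistent with A
    "C": [0, 1],      # inconsistent with A
}
LEARNING_RATE = 0.5   # arbitrary illustrative value


def logistic(net: float) -> float:
    """Non-linear input-output function of a unit."""
    return 1.0 / (1.0 + math.exp(-net))


def outputs(weights, item):
    """Activation of each output unit for one item's input pattern."""
    x = INPUTS[item]
    return [logistic(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def error(weights, item):
    """Summed squared error over the output units for one input/output pattern."""
    return sum((t - o) ** 2 for t, o in zip(TARGETS[item], outputs(weights, item)))


def train_once(weights, item):
    """One error-correcting update; the change is superimposed on shared weights."""
    x = INPUTS[item]
    o = outputs(weights, item)
    for j, row in enumerate(weights):
        delta = LEARNING_RATE * (TARGETS[item][j] - o[j])
        for i, xi in enumerate(x):
            row[i] += delta * xi


weights = [[0.0] * 4 for _ in range(2)]
before = {item: error(weights, item) for item in INPUTS}
train_once(weights, "A")
after = {item: error(weights, item) for item in INPUTS}

# Keep training on A alone and record its error after each further update.
errors_on_a = [before["A"], after["A"]]
for _ in range(4):
    train_once(weights, "A")
    errors_on_a.append(error(weights, "A"))
decrements = [e0 - e1 for e0, e1 in zip(errors_on_a, errors_on_a[1:])]
```

Comparing `before` and `after` shows the help to `B` and the hindrance to `C`, both of which travel through the shared `plural` unit; the shrinking values in `decrements` show the ceiling effect imposed by the logistic function.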

Thus, a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances, and to demonstrate just how well a connectionist associative account can simulate these data.

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity
car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
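The token counts implied by this table can be tallied directly, which makes the “regularity is frequency by another name” point of the General Discussion concrete. The sketch below is illustrative only: it treats the Frequency column as tokens per exposure cycle and takes each plural marker to be whatever precedes the final stem-length portion of the plural form. The names `WORD_FORMS`, `prefix_of`, and `affix_tokens` are hypothetical.

```python
from collections import Counter

# (picture, stem, plural form, frequency, regularity), copied from the table above
WORD_FORMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def prefix_of(stem: str, plural: str) -> str:
    """Plural marker, taken as whatever precedes the final len(stem) characters."""
    return plural[: len(plural) - len(stem)]


# Tokens of each plural marker encountered in one exposure cycle.
affix_tokens = Counter()
for _picture, stem, plural, frequency, _regularity in WORD_FORMS:
    affix_tokens[prefix_of(stem, plural)] += frequency

if __name__ == "__main__":
    for affix, tokens in affix_tokens.most_common():
        print(f"{affix}: {tokens}")
```

Under these assumptions the regular prefix bu- is met 30 times per cycle, against 5 or 1 for each of the ten irregular prefixes.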

328 ELLIS AND SCHMIDT

form irregular plurals However their pattern of differences between SLIand control childrenrsquos performance on regular items is sufciently close tothat between the present low-capacity and high-capacity simulations tosuggest that morphosyntactic impairments in individuals with SLI might beexplained by reduced language processing capacity in a general associativememory network rather than by a hybrid account The SLI childrenrsquosshowing frequency effects for regular items is particularly compelling in thisrespect However further assessment of regularity by frequency effects anddefault abstraction in individuals with SLI and with Williams syndrome(whose ability on regular forms is said to outstrip their performance onirregularsmdashBellugi Bihrle Jernigan Trauner amp Dougherty 1990) isnecessary to test these parallels further (see Marchman 1993 for othersimulations of different types of language dysfunction)

GENERAL DISCUSSION

Fluent language users have processed many millions of utterances involvingtens of thousands of types presented as innumerable tokens It should comeas no surprise either that they demonstrate such effortless and complex skillas a result of this mass of practice or that researchers lacking any truerecord of the learnersrsquo experience are awed and confused by thesesophisticated grammatical abilities While we have no wish to deny any ofthe complexity of the nal uent state we suspect that much of the mysteryof morphology can be claried by focusing on the acquisition process ratherthan the end-point This has been our aim in this paper Our MAL is atravesty of natural language but at least we know the types and tokens in thelearnersrsquo language evidence and there is no need to speculate or argue aboutextrapolations from corpus data or assumptions about registers

Human learning of this MAL inectional morphology quickly culminatesin a state where as with natural language frequency and regularity haveinteractive effects on performance But as we chart acquisition it is clearthat this interaction need not imply complex dual-mechanisms of processingRather it simply reects the asymptotes expected from the power law ofpractice a simple associative law of learning Thus we have shown that oneof the most frequently introduced arguments for the necessity of adual-mechanism approach a frequency effect for irregulars and the absenceof such an effect for regulars is not a good argument at all Furthermore wehave demonstrated that a simple connectionist model as an implementationof associative learning provided with the same language evidenceaccurately simulates the human acquisition data

But how is the power law instantiated in human and connectionistsystems and what is being associated in the acquisition of inectional

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 329

morphosyntax The power law of learning in human performance has beeninterpreted as resulting from basic associative mechanisms involving theformation of new chunks and the effects of frequency on the accessibility ofthese representations (Newell 1990 Newell amp Rosenbloom 1981)Anderson and Schooler (1991) suggest that memory (both as its behaviouralexpression in error rate and latency and as its neural expression in LTP)displays properties such as the power law of learning because theseproperties reect an optimal response to the environment where theprobability of an item occurring at any particular time is a power function ofits past frequency of occurrence Neural activation which controlsbehaviour reects the probability of an item occurring in the environmentthus the neural processes are designed to adapt behaviour to the statisticalproperties of the environment (Anderson 1993) Connectionist systems aredesigned to do the same thing (Chater 1995)

In our simplied account of inectional morphology where phonologicalfactors are put to one side the relevant units for chunking are the stem formsand the plural afxes From an associative perspective regularity andfrequency are essentially the same factor under different names The rstmeaning of ldquoregularrdquo in the Pocket Oxford Dictionary involves ldquohabitualconstantrdquo acts a denition in terms of statistical frequencies consistencyand descriptive generalisation the second stresses ldquoconforming to a rule orprinciplerdquo We need to disentangle these senses (see Sharwood-Smith 1994and Lima Corrigan amp Iverson 1994 for conceptual analysis of ldquorules oflanguagerdquo) Whether regular morphology is generated according to a rule ornot it is certainly the case for English and the MAL under study here (andgenerally it is the default if not the universal casemdashwe will return to thismatter later) that regular afxes are more habitual or frequent And asdemonstrated in Fig 5 the power law of practice entails that an effect of aconstant increment of regularity (in its frequency sense) is much moreapparent at low than at high frequencies of practice

Although it is a general principle the degree to which it applies dependson a range of factors including (a) the exponent of the power function (b)the particular level of experience attained and thus the placement ofcomparison points on the learning curve and (c) the degree to whichfrequency and regularity are additive or multiplicative In the presentexperiment a vefold increase in the frequency of the regular items resultsin a (5 3 the number of regular items) increase in use of the regular afx avefold increase in the frequency of an irregular item results in merely avefold increase in the use of the irregular afx Thus frequency andregularity are interactive rather than additive But even if we allow forinteraction the function still results in greater regularity effects for lowfrequency itemsmdashjust as for example the power function

330 ELLIS AND SCHMIDT

FIG 5 A frequency by regularity interaction arising from additive contributions of regularity(solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into anasymptoting power function Notice in particular the solid vertical bars measuring out the largeregularity effect at low frequencies and the much smaller one at high frequencies (Adaptedfrom Plaut McClelland Seidenberg amp Patterson 1994)

y 5 1 2 x2 2

asymptotes so does any power function

y 5 1 2 (xn)2 2

where n 0 the shape remains the same albeit stretched or condensedalong the horizontal axis Thus all associative accounts of morphologywhether they stress the importance of type or token frequency (Bybee 1995)in the determination of statistical regularity imply a frequency by regularityinteraction in performance

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active, and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce a directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
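The ceiling effect described in this analysis can be illustrated with a single output unit. This is a sketch under stated assumptions, not a reimplementation of the analysis of Plaut et al. (1996): the logistic activation function, the squared-error measure, and the numerical contributions assigned to frequency and consistency are all assumed here for illustration.

```python
import math


def sigmoid(net_input: float) -> float:
    """Logistic activation: the unit's state asymptotes towards 1.0."""
    return 1.0 / (1.0 + math.exp(-net_input))


def unit_error(net_input: float) -> float:
    """Squared error for an output unit whose target state is 1.0."""
    return (1.0 - sigmoid(net_input)) ** 2


# Hypothetical additive contributions to the summed input of the correct affix unit.
FREQUENCY_INPUT = {"low": 1.0, "high": 3.0}
CONSISTENCY_INPUT = 1.5  # extra summed input assumed to come from consistent items

consistency_effect = {}
for level, frequency_input in FREQUENCY_INPUT.items():
    error_without = unit_error(frequency_input)
    error_with = unit_error(frequency_input + CONSISTENCY_INPUT)
    consistency_effect[level] = error_without - error_with

for level, effect in consistency_effect.items():
    print(f"{level}-frequency item: consistency reduces error by {effect:.4f}")
```

Although consistency adds the same amount to the summed input in both cases, the resulting reduction in error is much smaller for the high-frequency item, whose unit state is already close to its asymptote.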

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing single or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate IO associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.
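The way shared weight changes produce the interaction can be sketched with a toy network. The code below is not the simulation reported in this article: it assumes a single layer of logistic units trained by a batch delta rule, localist stem units, a bias weight standing in for the plural concept, and only 5 training epochs at a learning rate of 0.1. All names and parameter values are hypothetical.

```python
import math

N_STEMS, N_AFFIXES = 20, 11

# Hypothetical miniature language: 10 regular stems share affix 0, and each of
# 10 irregular stems takes its own affix (1..10). Frequencies alternate 5, 1.
ITEMS = [(i, 0, 5 if i % 2 == 0 else 1, "R") for i in range(10)]
ITEMS += [(10 + i, 1 + i, 5 if i % 2 == 0 else 1, "I") for i in range(10)]


def sigmoid(net_input):
    return 1.0 / (1.0 + math.exp(-net_input))


def forward(weights, bias, stem):
    """State of every affix unit given one active stem unit plus the bias."""
    return [sigmoid(bias[a] + weights[stem][a]) for a in range(N_AFFIXES)]


def item_error(weights, bias, stem, affix):
    """Summed squared error over all affix units for one stem."""
    out = forward(weights, bias, stem)
    return sum(((1.0 if a == affix else 0.0) - out[a]) ** 2 for a in range(N_AFFIXES))


def train(epochs, rate=0.1):
    weights = [[0.0] * N_AFFIXES for _ in range(N_STEMS)]
    bias = [0.0] * N_AFFIXES
    for _ in range(epochs):
        grad_w = [[0.0] * N_AFFIXES for _ in range(N_STEMS)]
        grad_b = [0.0] * N_AFFIXES
        for stem, affix, frequency, _ in ITEMS:
            out = forward(weights, bias, stem)
            for a in range(N_AFFIXES):
                # Each item's weight change is scaled by its frequency and
                # superimposed on the changes induced by every other item.
                delta = frequency * ((1.0 if a == affix else 0.0) - out[a])
                grad_w[stem][a] += delta
                grad_b[a] += delta
        for s in range(N_STEMS):
            for a in range(N_AFFIXES):
                weights[s][a] += rate * grad_w[s][a]
        for a in range(N_AFFIXES):
            bias[a] += rate * grad_b[a]
    return weights, bias


def mean_errors(weights, bias):
    """Mean item error for each (regularity, frequency) cell of the design."""
    cells = {}
    for stem, affix, frequency, regularity in ITEMS:
        cells.setdefault((regularity, frequency), []).append(
            item_error(weights, bias, stem, affix))
    return {cell: sum(errs) / len(errs) for cell, errs in cells.items()}


errors = mean_errors(*train(epochs=5))
for cell in sorted(errors):
    print(cell, round(errors[cell], 3))
```

In this sketch the shared bias weights carry the base rate of each affix, so the regular affix benefits from every regular item while each irregular affix is supported by a single stem. Early in training the regularity advantage is therefore larger for low-frequency than for high-frequency items.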

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases, the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet, to account for data such as the frequency by regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain “false friends” effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity by frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.


Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”?: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.) (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of the Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity
car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I
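The word-forms above make the sense in which regularity is frequency by another name easy to verify. The following short Python sketch tallies how often each plural affix would be encountered, on the assumption that the Frequency column gives the relative number of presentations of each item. The helper name `affix_of` is hypothetical, and the affix is taken simply as whatever precedes the final stem-length portion of the plural form.

```python
from collections import Counter

# (picture, stem, plural form, frequency, regularity), transcribed from the Appendix.
WORD_FORMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def affix_of(stem: str, plural: str) -> str:
    """Return the plural prefix: the part of the plural before its stem-length tail."""
    return plural[: len(plural) - len(stem)]


# Token frequency of each affix: item frequencies summed over every item using it.
affix_tokens = Counter()
for _picture, stem, plural, frequency, _regularity in WORD_FORMS:
    affix_tokens[affix_of(stem, plural)] += frequency

for affix, tokens in affix_tokens.most_common():
    print(f"{affix}-: {tokens}")
```

On this reading the regular affix bu- accumulates the presentations of all ten regular items, whereas each irregular affix is supported only by the presentations of its single item.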


morphosyntax. The power law of learning in human performance has been interpreted as resulting from basic associative mechanisms involving the formation of new chunks and the effects of frequency on the accessibility of these representations (Newell, 1990; Newell & Rosenbloom, 1981). Anderson and Schooler (1991) suggest that memory (both as its behavioural expression in error rate and latency and as its neural expression in LTP) displays properties such as the power law of learning because these properties reflect an optimal response to the environment, where the probability of an item occurring at any particular time is a power function of its past frequency of occurrence. Neural activation, which controls behaviour, reflects the probability of an item occurring in the environment; thus the neural processes are designed to adapt behaviour to the statistical properties of the environment (Anderson, 1993). Connectionist systems are designed to do the same thing (Chater, 1995).

In our simplied account of inectional morphology where phonologicalfactors are put to one side the relevant units for chunking are the stem formsand the plural afxes From an associative perspective regularity andfrequency are essentially the same factor under different names The rstmeaning of ldquoregularrdquo in the Pocket Oxford Dictionary involves ldquohabitualconstantrdquo acts a denition in terms of statistical frequencies consistencyand descriptive generalisation the second stresses ldquoconforming to a rule orprinciplerdquo We need to disentangle these senses (see Sharwood-Smith 1994and Lima Corrigan amp Iverson 1994 for conceptual analysis of ldquorules oflanguagerdquo) Whether regular morphology is generated according to a rule ornot it is certainly the case for English and the MAL under study here (andgenerally it is the default if not the universal casemdashwe will return to thismatter later) that regular afxes are more habitual or frequent And asdemonstrated in Fig 5 the power law of practice entails that an effect of aconstant increment of regularity (in its frequency sense) is much moreapparent at low than at high frequencies of practice

Although it is a general principle the degree to which it applies dependson a range of factors including (a) the exponent of the power function (b)the particular level of experience attained and thus the placement ofcomparison points on the learning curve and (c) the degree to whichfrequency and regularity are additive or multiplicative In the presentexperiment a vefold increase in the frequency of the regular items resultsin a (5 3 the number of regular items) increase in use of the regular afx avefold increase in the frequency of an irregular item results in merely avefold increase in the use of the irregular afx Thus frequency andregularity are interactive rather than additive But even if we allow forinteraction the function still results in greater regularity effects for lowfrequency itemsmdashjust as for example the power function

330 ELLIS AND SCHMIDT

FIG 5 A frequency by regularity interaction arising from additive contributions of regularity(solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into anasymptoting power function Notice in particular the solid vertical bars measuring out the largeregularity effect at low frequencies and the much smaller one at high frequencies (Adaptedfrom Plaut McClelland Seidenberg amp Patterson 1994)

y 5 1 2 x2 2

asymptotes so does any power function

y 5 1 2 (xn)2 2

where n 0 the shape remains the same albeit stretched or condensedalong the horizontal axis Thus all associative accounts of morphologywhether they stress the importance of type or token frequency (Bybee 1995)in the determination of statistical regularity imply a frequency by regularityinteraction in performance

Plaut et al (1996) analyse the operation of connectionist networks in theparticular quasi-regular domain of spellingndashsound consistency in reading todemonstrate how the frequency by regularity interaction is a direct

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 331

consequence of the nonlinearity adaptivity and distributed representationproperties of learning and representation in PDP networks In what followswe will minimally rephrase their analysis as it applies to the quasi-regulardomain of inectional morphology In a connectionist network the weightchanges induced by an inputoutput pattern (IOP) on any training epochserve to reduce the error on that IOP The frequency of the IOP (and theunits it involves) is reected in how often it is presented to the network Thusword frequency directly amplies weight changes that are helpful to theIOP itself Consistency of the morphological inections of two stems isreected in the similarity of afx units that are co-activated in their IOPsFurthermore two inputs will induce similar weight changes to the extentthat they activate similar units In our MAL as an extreme case consistentforms all activate the same afx unit irregular ones each activate a differentidiosyncratic afx Given that the weight changes that are induced by eachIOP are superimposed on the weight changes for all other IOPs an IOPwill tend to be helped by the weight changes for IOPs whose inputoutputmappings are consistent with its own and hindered by the weight changesfor inconsistent IOPs Thus frequency and consistency sum because theyboth arise from similar weight changes that are simply added together duringtraining The weight changes result in corresponding increases in thesummed input to output units that should be active and decreases to thesummed units that should be inactive However due to the non-linearity ofthe input-output function of units these changes do not produce directlyproportionate reduction of error Rather as the magnitude of the summedinput to output units increases their states gradually asymptote towards10mdasha given increase in the summed input to a unit yields progressivelysmaller decrements in error over the course of training Thus althoughfrequency and regularity-as-consistency each contribute to 
the weights andhence to the summed input to units their effect on error is subjected to agradual ceiling effect as unit states are driven towards extremal values

Thus a connectionist associative account of simple morphosyntax as it isembodied in our MAL holds that learning involves associating inputpatterns representing single or plural concepts with stem and afx lemmasacross a large distributed network Frequency of experience increases thestrength of the appropriate IO associations Regularity effects stem fromconsistency the consistent items all involve pairings between plurality andthe regular lemma and thus regularity is frequency by another name Thenetwork sums and abstracts these consistencies but it does so usingnon-linear unit inputndashoutput functions thereby resulting in the frequency byregularity interaction Networks are not simple competitive chunking orMarkov chaining mechanisms working on surface form Their massivelydistributed nature allows the emergence of more abstract internalrepresentations We have argued that this analysis accounts for the human

332 ELLIS AND SCHMIDT

acquisition data of simple MAL morphosyntax quite well We believe thatthe acquisition of natural language morphosyntax where there are manyadditional factors of different phonological consistencies (of the type forexample where the neighbours sink drink and stink are irregular in theirpast tenses but all behave in the same -ankway) are equally conducive to theprinciples of this type of account although as illustrated in grandersimulation enterprises (Cottrell amp Plunkett 1994 Daugherty amp Seidenberg1994 MacWhinney amp Leinbach 1991 Marchman 1993 Plunkett ampMarchman 1993) the complexity of interaction of the factors that are therein the language evidence leads to much more complex developmentaloutcomes Our role here has been to study human acquisition underprecisely known circumstances and to demonstrate just how well aconnectionist associative account can simulate these data

A simple regularity5 consistency account of this type will have difculty ifthe ldquoregularrdquo or ldquodefaultrdquo case is not the most frequent case in a naturallanguage Although there is agreement for English past tense and formorphology more generally that the default case is more frequent theremay be exceptions Marcus et al (1995) argue that while the German particle-t applies to a much smaller percentage of verbs than its English counterpartand the German plural -s applies only to a small percentage of nounsnevertheless these afxes behave as defaults in the language These defaultsufxations in German could thus pose a problem for statistical orconnectionist accounts of the acquisition of the more frequent patterns asdefault since they may not be due to a large number of regular wordsreinforcing a pattern in associative memory (Prasada amp Pinker 1993)However this is still a matter of some debate Bybee (1995) suggests that amore reasonable method of counting German particle type frequency doesshow the default (or ldquoproductiverdquo) process to have the highest typefrequency She also argues that to a large extent the productivity patterns ofGerman plurals also reect their type frequency Nakisa and Hahn (1996)and Plunkett and Nakisa (in press) show that generalisation to unseen ornovel forms in German and Arabic (where there have also been claims for aminority default) is more accurately predicted by their phonologicalsimilarity to existing forms in the language (properly represented for typeand token frequency) rather than by the operation of a default rule FinallyHare Elman and Daugherty (1995) demonstrate that multilayerednetworks can develop a default category even in the absence of superior typefrequency as long as the non-default classes are well dened and narrowlydened so that they serve as strong prototypes for analogising to novelforms In such cases the area outside these well-dened attractor basins canconstitute a potential default (see also 
Plunkett amp Marchman 1991)

In the original hybrid model irregulars were stored and accessed fromrote memory Pinker and Prince (1994 p 326) modied this part of the

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 333

model arguing that since rote memory could not account (a) for similaritiesbetween the morphological base and irregular forms (eg swingndashswung) (b)for similarity within sets of base forms undergoing similar processes (egsingndashsang ringndashrang springndashsprang) or (c) for the kind of semi-productivityshown when children produce errors such as bringndashbrang or swingndashswangthe memory system underlying such productions must be associative anddynamic somewhat as connectionism portrays it Yet to account for datasuch as the frequencyregularity interaction this revised hybrid model stillholds that regular forms are rule-governed But a purely rule-based accountof regulars cannot explain false friends effects where regular inconsistentitems (eg bakendashbaked is similar in rhyme to neighbours makendashmade andtakendashtook which have inconsistent past tenses) are produced more slowlythat entirely regular ones (Daugherty amp Seidenberg 1994 Seidenberg ampBruck 1990) or frequency effects on regular forms (Oetting amp Rice 1993Stemberger amp MacWhinney 1986) Unlike connectionist models a rule-based account of regulars cannot explain these aspects of the human dataNor is the regularityfrequency interaction any reason to reject connectionistaccounts of morphosyntax in favour of a hybrid model

REFERENCESAnderson JR (1982) Acquisition of cognitive skill Psychological Review 89 369ndash406Anderson JR (1993) Rules of the mind Hillsdale NJ Lawrence Erlbaum Associates IncAnderson JR amp Schooler LJ (1991) Reections of the environment in memory

Psychological Science 2 396ndash408Beck M (1995) Tracking down the source of NSndashNNS differences in syntactic competence

Unpublished manuscript University of North TexasBellugi U Bihrle A Jernigan D Trauner D amp Dougherty S (1990)

Neuropsychological neurological and neuroanatomical prole of Williams SyndromeAmerican Journal of Medical Genetics 6 115ndash125

Braine MDS Brody RE Brooks PJ Sudhalter V Ross JA Catalano L amp FischSM (1990) Exploring language acquisition in children with a miniature articiallanguage Effects of item and pattern frequency arbitrary subclasses and correctionJournal of Memory and Language 29 591ndash610

Broeder P amp Plunkett K (1994) Connectionism and second language acquisition In NEllis (Ed) Implicit and explicit learning of languages (pp 421ndash454) London AcademicPress

Bybee J (1995) Regular morphology and the lexicon Language and Cognitive Processes10 425ndash455

Chater N (1995) Neural networks The new statistical models of mind In JP Levy DBairaktaris JA Bullinaria amp P Cairns (Eds) Connectionist models of memory andlanguage London UCL Press

Cohen JD MacWhinney B Flatt M amp Provost J (1993) PsyScope A new graphicinteractive environment for designing psychology experiments Behavioral ResearchMethods Instruments and Computers 25 257ndash271

Cottrell G amp Plunkett K (1994) Acquiring the mapping from meaning to soundsConnection Science 6 379ndash412

334 ELLIS AND SCHMIDT

Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”: UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.


Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.


APPENDIX

The Word-forms of the Artificial Language

Picture    Stem    Plural Form  Frequency  Regularity
car        garth   bugarth      5          R
bed        pid     bupid        1          R
lamp       lant    bulant       5          R
table      tib     butid        1          R
plane      poon    bupoon       5          R
ball       prill   buprill      1          R
train      dram    budram       5          R
house      hize    buhize       1          R
book       bisk    bubisk       5          R
broom      breen   bubreen      1          R
phone      feem    gofeem       5          I
umbrella   brol    gubrol       1          I
chair      charp   zecharp      5          I
horse      naig    zonaig       1          I
monkey     chonk   nuchonk      5          I
dog        woop    niwoop       1          I
elephant   fant    vefant       5          I
scissors   zoze    vuzoze       1          I
kite       kag     rekag        5          I
fish       pisc    ropisc       1          I


FIG. 5. A frequency by regularity interaction arising from additive contributions of regularity (solid horizontal arrows) and frequency (dotted horizontal arrows) inputting into an asymptoting power function. Notice in particular the solid vertical bars measuring out the large regularity effect at low frequencies and the much smaller one at high frequencies. (Adapted from Plaut, McClelland, Seidenberg, & Patterson, 1996.)

$y = 1 - x^{-2}$

asymptotes, so does any power function

$y = 1 - (xn)^{-2}$

where $n > 0$; the shape remains the same, albeit stretched or condensed along the horizontal axis. Thus all associative accounts of morphology, whether they stress the importance of type or token frequency (Bybee, 1995) in the determination of statistical regularity, imply a frequency by regularity interaction in performance.
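The way an asymptoting power function turns additive inputs into an interaction can be checked numerically. The following is a minimal sketch rather than the authors' analysis: it assumes that frequency and regularity each add one arbitrary unit of associative strength (the names `BASE`, `FREQUENCY_BOOST`, and `REGULARITY_BOOST` and their values are hypothetical) before the total is passed through y = 1 - x^-2.

```python
def performance(strength: float) -> float:
    """Asymptoting power function y = 1 - x**-2 (defined here for x >= 1)."""
    return 1.0 - strength ** -2


# Hypothetical additive contributions, in arbitrary units of associative strength.
BASE = 1.0
FREQUENCY_BOOST = 1.0   # added for high-frequency items
REGULARITY_BOOST = 1.0  # added for regular items


def cell(high_frequency: bool, regular: bool) -> float:
    """Performance for one cell of the frequency x regularity design."""
    strength = BASE + FREQUENCY_BOOST * high_frequency + REGULARITY_BOOST * regular
    return performance(strength)


# Regularity effect = regular minus irregular, at each frequency level.
regularity_effect_low = cell(False, True) - cell(False, False)
regularity_effect_high = cell(True, True) - cell(True, False)

print(f"regularity effect at low frequency:  {regularity_effect_low:.3f}")   # 0.750
print(f"regularity effect at high frequency: {regularity_effect_high:.3f}")  # 0.139
```

Although the two contributions are strictly additive on the input side, the same one-unit regularity boost buys far less performance at high frequency because the curve has already begun to flatten there.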

Plaut et al. (1996) analyse the operation of connectionist networks in the particular quasi-regular domain of spelling–sound consistency in reading to demonstrate how the frequency by regularity interaction is a direct consequence of the nonlinearity, adaptivity, and distributed representation properties of learning and representation in PDP networks. In what follows we will minimally rephrase their analysis as it applies to the quasi-regular domain of inflectional morphology. In a connectionist network, the weight changes induced by an input/output pattern (IOP) on any training epoch serve to reduce the error on that IOP. The frequency of the IOP (and the units it involves) is reflected in how often it is presented to the network. Thus word frequency directly amplifies weight changes that are helpful to the IOP itself. Consistency of the morphological inflections of two stems is reflected in the similarity of affix units that are co-activated in their IOPs. Furthermore, two inputs will induce similar weight changes to the extent that they activate similar units. In our MAL, as an extreme case, consistent forms all activate the same affix unit; irregular ones each activate a different, idiosyncratic affix. Given that the weight changes that are induced by each IOP are superimposed on the weight changes for all other IOPs, an IOP will tend to be helped by the weight changes for IOPs whose input/output mappings are consistent with its own and hindered by the weight changes for inconsistent IOPs. Thus frequency and consistency sum because they both arise from similar weight changes that are simply added together during training. The weight changes result in corresponding increases in the summed input to output units that should be active and decreases in the summed input to units that should be inactive. However, due to the non-linearity of the input–output function of units, these changes do not produce directly proportionate reduction of error. Rather, as the magnitude of the summed input to output units increases, their states gradually asymptote towards 1.0: a given increase in the summed input to a unit yields progressively smaller decrements in error over the course of training. Thus, although frequency and regularity-as-consistency each contribute to the weights, and hence to the summed input to units, their effect on error is subjected to a gradual ceiling effect as unit states are driven towards extremal values.
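The ceiling effect can be made concrete with a short worked example. It is only a sketch under stated assumptions: a single output unit whose target is 1, a logistic activation function, and illustrative net-input contributions $f$ (frequency) and $c$ (consistency). The symbols and the values 0 and 2 are chosen here for illustration and are not taken from the simulations.

```latex
\begin{align*}
a(x) &= \frac{1}{1 + e^{-x}}, \qquad x = f + c, \qquad E(x) = 1 - a(x) \\
\frac{dE}{dx} &= -\,a(x)\bigl(1 - a(x)\bigr) \;\longrightarrow\; 0
  \quad \text{as } a(x) \to 1 \\
\text{low frequency } (f = 0,\ c = 2):\quad
  E(0) - E(2) &= a(2) - a(0) \approx 0.881 - 0.500 = 0.381 \\
\text{high frequency } (f = 2,\ c = 2):\quad
  E(2) - E(4) &= a(4) - a(2) \approx 0.982 - 0.881 = 0.101
\end{align*}
```

The same consistency contribution of 2 to the summed input removes roughly 0.38 of error when little frequency-driven input is present but only about 0.10 once frequency has already pushed the unit towards its asymptote.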

Thus a connectionist associative account of simple morphosyntax, as it is embodied in our MAL, holds that learning involves associating input patterns representing singular or plural concepts with stem and affix lemmas across a large distributed network. Frequency of experience increases the strength of the appropriate I/O associations. Regularity effects stem from consistency: the consistent items all involve pairings between plurality and the regular lemma, and thus regularity is frequency by another name. The network sums and abstracts these consistencies, but it does so using non-linear unit input–output functions, thereby resulting in the frequency by regularity interaction. Networks are not simple competitive chunking or Markov chaining mechanisms working on surface form. Their massively distributed nature allows the emergence of more abstract internal representations. We have argued that this analysis accounts for the human acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally conducive to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.
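The account summarised above can be explored with a toy simulation. What follows is a deliberately simplified sketch and not the network reported in this article. It assumes one localist input unit per picture plus a single shared "plural" unit, a softmax over the 11 affix units (swapped in for the independent logistic units discussed in the text, to keep the example short), and plain delta-rule updates. The affix labels are read off the Appendix word-forms; the function names `train` and `cell_errors`, the learning rate, the seed, and the epoch counts are all hypothetical choices.

```python
import math
import random

# (picture, plural affix, relative frequency, regularity), after the Appendix.
ITEMS = [
    ("car", "bu", 5, "R"), ("bed", "bu", 1, "R"), ("lamp", "bu", 5, "R"),
    ("table", "bu", 1, "R"), ("plane", "bu", 5, "R"), ("ball", "bu", 1, "R"),
    ("train", "bu", 5, "R"), ("house", "bu", 1, "R"), ("book", "bu", 5, "R"),
    ("broom", "bu", 1, "R"), ("phone", "go", 5, "I"), ("umbrella", "gu", 1, "I"),
    ("chair", "ze", 5, "I"), ("horse", "zo", 1, "I"), ("monkey", "nu", 5, "I"),
    ("dog", "ni", 1, "I"), ("elephant", "ve", 5, "I"), ("scissors", "vu", 1, "I"),
    ("kite", "re", 5, "I"), ("fish", "ro", 1, "I"),
]
AFFIXES = sorted({affix for _, affix, _, _ in ITEMS})


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def train(epochs, rate=0.1, seed=0):
    """Train on plural trials; each item appears `frequency` times per epoch."""
    rng = random.Random(seed)
    n = len(AFFIXES)
    item_w = [[0.0] * n for _ in ITEMS]  # weights from each item-specific unit
    shared_w = [0.0] * n                 # weights from the shared plural unit
    tokens = [i for i, item in enumerate(ITEMS) for _ in range(item[2])]
    for _ in range(epochs):
        rng.shuffle(tokens)
        for i in tokens:
            target = AFFIXES.index(ITEMS[i][1])
            probs = softmax([item_w[i][k] + shared_w[k] for k in range(n)])
            for k in range(n):
                delta = rate * ((1.0 if k == target else 0.0) - probs[k])
                item_w[i][k] += delta    # change specific to this item
                shared_w[k] += delta     # change superimposed across all items
    return item_w, shared_w


def cell_errors(item_w, shared_w):
    """Mean probability of an incorrect affix per (regularity, frequency) cell."""
    sums, counts = {}, {}
    for i, (_, affix, freq, reg) in enumerate(ITEMS):
        probs = softmax([item_w[i][k] + shared_w[k] for k in range(len(AFFIXES))])
        key = (reg, freq)
        sums[key] = sums.get(key, 0.0) + 1.0 - probs[AFFIXES.index(affix)]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


for epochs in (2, 15, 60):
    errors = cell_errors(*train(epochs))
    print(epochs, {key: round(value, 3) for key, value in sorted(errors.items())})
```

Because every regular plural pushes the same shared weight towards the regular affix, low-frequency regular items inherit support from their neighbours, whereas each irregular item has to overcome that shared weight with its own item-specific connections. Under these assumptions one would expect the frequency effect to be larger for irregular than for regular items partway through training; how closely such a toy run tracks the human data is not something this sketch can establish.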

A simple regularity = consistency account of this type will have difficulty if the “regular” or “default” case is not the most frequent case in a natural language. Although there is agreement for English past tense, and for morphology more generally, that the default case is more frequent, there may be exceptions. Marcus et al. (1995) argue that while the German participle -t applies to a much smaller percentage of verbs than its English counterpart, and the German plural -s applies only to a small percentage of nouns, nevertheless these affixes behave as defaults in the language. These default suffixations in German could thus pose a problem for statistical or connectionist accounts of the acquisition of the more frequent patterns as default, since they may not be due to a large number of regular words reinforcing a pattern in associative memory (Prasada & Pinker, 1993). However, this is still a matter of some debate. Bybee (1995) suggests that a more reasonable method of counting German participle type frequency does show the default (or “productive”) process to have the highest type frequency. She also argues that, to a large extent, the productivity patterns of German plurals also reflect their type frequency. Nakisa and Hahn (1996) and Plunkett and Nakisa (in press) show that generalisation to unseen or novel forms in German and Arabic (where there have also been claims for a minority default) is more accurately predicted by their phonological similarity to existing forms in the language (properly represented for type and token frequency) rather than by the operation of a default rule. Finally, Hare, Elman, and Daugherty (1995) demonstrate that multilayered networks can develop a default category even in the absence of superior type frequency, as long as the non-default classes are well defined and narrowly defined so that they serve as strong prototypes for analogising to novel forms. In such cases the area outside these well-defined attractor basins can constitute a potential default (see also Plunkett & Marchman, 1991).

In the original hybrid model, irregulars were stored and accessed from rote memory. Pinker and Prince (1994, p. 326) modified this part of the model, arguing that since rote memory could not account (a) for similarities between the morphological base and irregular forms (e.g. swing–swung), (b) for similarity within sets of base forms undergoing similar processes (e.g. sing–sang, ring–rang, spring–sprang), or (c) for the kind of semi-productivity shown when children produce errors such as bring–brang or swing–swang, the memory system underlying such productions must be associative and dynamic, somewhat as connectionism portrays it. Yet to account for data such as the frequency/regularity interaction, this revised hybrid model still holds that regular forms are rule-governed. But a purely rule-based account of regulars cannot explain false friends effects, where regular inconsistent items (e.g. bake–baked is similar in rhyme to neighbours make–made and take–took, which have inconsistent past tenses) are produced more slowly than entirely regular ones (Daugherty & Seidenberg, 1994; Seidenberg & Bruck, 1990), or frequency effects on regular forms (Oetting & Rice, 1993; Stemberger & MacWhinney, 1986). Unlike connectionist models, a rule-based account of regulars cannot explain these aspects of the human data. Nor is the regularity/frequency interaction any reason to reject connectionist accounts of morphosyntax in favour of a hybrid model.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee J (1995) Regular morphology and the lexicon Language and Cognitive Processes10 425ndash455

Chater N (1995) Neural networks The new statistical models of mind In JP Levy DBairaktaris JA Bullinaria amp P Cairns (Eds) Connectionist models of memory andlanguage London UCL Press

Cohen JD MacWhinney B Flatt M amp Provost J (1993) PsyScope A new graphicinteractive environment for designing psychology experiments Behavioral ResearchMethods Instruments and Computers 25 257ndash271

Cottrell G amp Plunkett K (1994) Acquiring the mapping from meaning to soundsConnection Science 6 379ndash412

334 ELLIS AND SCHMIDT

Daugherty KG amp Seidenberg MS (1992) Rules or connections The past tense revisitedIn Proceedings of the 14th annual conference of the Cognitive Science Society (pp 259ndash264)Pittsburgh PA Cognitive Science Society

Daugherty KG amp Seidenberg MS (1994) Beyond rules and exceptions A connectionistapproach to inectional morphology In SD Lima RL Corrigan amp GK Iverson (Eds)The reality of linguistic rules (pp 353ndash388) Amsterdam John Benjamins

DeKeyser R (1997) Beyond explicit rule learning Automatizing second languagemorphosyntax Studies in Second Language Acquisition 19 195ndash222

Ellis NC (1996) Sequencing in SLA Phonological memory chunking and points of orderStudies in Second Language Acquisition 18 91ndash126

Eubank L amp Gregg KR (1995) ldquoEt in Amygdala Egordquo UG (S)LA and neurobiologyStudies in Second Language Acquisition 17 35ndash58

Hare M Elman JL amp Daugherty KG (1995) Default generalisation in connectionistnetworks Language and Cognitive Processes 10 601ndash630

Jung J (1971) The experimenterrsquos dilemma New York Harper amp RowKirsner K (1994) Implicit processes in second language learning In N Ellis (Ed) Implicit

and explicit learning of languages (pp 283ndash312) London Academic PressLachter J amp Bever T (1988) The relation between linguistic structure and associative

theories of language learning A constructive critique of some connectionist learningmodels Cognition 28 195ndash247

Lima SD Corrigan RL amp Iverson GK (Eds) (1994) The reality of linguistic rulesAmsterdam John Benjamins

MacWhinney B (1983) Miniature language systems as tests of use of universal operatingprinciples in second-language learning by children and adults Journal of PsycholinguisticResearch 12 467ndash478

MacWhinney B (1994) The dinosaurs and the ring In SD Lima RL Corrigan amp GKIverson (Eds) The reality of linguistic rules (pp 283ndash320) Amsterdam John Benjamins

MacWhinney B amp Leinbach J (1991) Implementations are not conceptualizationsRevising the verb learning model Cognition 40 121ndash157

Marchman VA (1993) Constraints on plasticity in a connectionist model of the Englishpast tense Journal of Cognitive Neuroscience 5 215ndash234

Marcus GF Brinkmann U Clahsen H Wiese R amp Pinker S (1995) Germaninection The exception that proves the rule Cognitive Psychology 29 198ndash256

McLaughlin B (1980) On the use of miniature articial languages in second-languageresearch Applied Psycholinguistics 1 357ndash369

Moeser SD amp Bregman AS (1972) The role of reference in the acquisition of a miniaturearticial language Journal of Verbal Learning and Verbal Behavior 11 759ndash769

Morgan JL Meier RP amp Newport EL (1987) Structural packaging in the input tolanguage learning Contributions of prosodic and morphological marking of phrases to theacquisition of language Cognitive Psychology 19 498ndash550

Morgan JL amp Newport EL (1981) The role of constituent structure in the induction of anarticial language Journal of Verbal Learning and Verbal Behavior 20 67ndash85

Morton J (1979) Facilitation in word recognition Experiments causing change in thelogogen model In PA Kolers ME Wrolstad amp M Bouma (Eds) Processing of visiblelanguage (pp 259ndash268) New York Plenum

Nakisa R amp Hahn U (1996) Where defaults donrsquot help The case of the German pluralsystem In Proceedings of the 18th annual conference of the Cognitive Science Society (pp177ndash182) Hillsdale NJ Lawrence Erlbaum Associates Inc

Newell A (1990) Unied theories of cognition Cambridge MA Harvard University PressNewell A amp Rosenbloom P (1981) Mechanisms of skill acquisition and the law of

practice In JR Anderson (Ed) Cognitive skills and their acquisition Hillsdale NJLawrence Erlbaum Associates Inc

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 335

Oetting JB amp Rice ML (1993) Plural acquisition in children with specic languageimpairment Journal of Speech and Hearing Research 36 1236ndash1248

Paivio A (1986) Mental representations A dual coding approach Oxford UK OxfordUniversity Press

Palermo DS amp Howe HE (1970) An experimental analogy to the learning of past-tenseinection rules Journal of Verbal Learning and Verbal Behavior 9 410ndash416

Pinker S (1991) Rules of language Science 253 530ndash535Pinker S amp Prince A (1988) On language and connectionism Analysis of a parallel

distributed processing model of language acquisition Cognition 29 195ndash247Pinker S amp Prince A (1994) Regular and irregular morphology and the psychological

status of rules of grammar In SD Lima RL Corrigan amp GK Iverson (Eds) The reality oflinguistic rules (pp 321ndash351) Amsterdam John Benjamins

Plaut DC McClelland JL Seidenberg MS amp Patterson KE (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Plunkett K amp Marchman V (1991) U-shaped learning and frequency effects in amulti-layered perceptron Implications for child language acquisition Cognition 38 3ndash102

Plunkett K amp Marchman V (1993) From rote learning to system building Acquiring verbmorphology in children and connectionist nets Cognition 48 21ndash69

Plunkett K amp Nakisa RC (in press) A connectionist model of Arabic plural systemLanguage and Cognitive Processes

Prasada S amp Pinker S (1993) Generalization of regular and irregular morphologicalpatterns Language and Cognitive Processes 8 1ndash56

Prasada S Pinker S amp Snyder W (1990) Some evidence that irregular forms are retrievedfrom memory but regular forms are rule-governed Paper presented at the 31st meeting ofthe Psychonomic Society New Orleans November

Rumelhart D Hinton G amp Williams R (1986) Learning internal representations by backpropagation In DE Rumelhart amp JL McClelland (Ed) Parallel distributed processingExplorations in the microstructure of cognition Cambridge MA MIT Press

Rumelhart D amp McClelland J (1986) On learning the past tense of English verbs In DERumelhart amp JL McClelland (Eds) Parallel distributed processing Explorations in themicrostructure of cognition Vol 2 Psychological and biological models (pp 272ndash326)Cambridge MA MIT Press

Seidenberg MS amp Bruck M (1990) Consistency effects in the generation of past tensemorphology Paper presented at the 31st meeting of the Psychonomic Society New OrleansNovember

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behavior 23 383ndash404

Sharwood Smith MA (1994) The unruly world of language In N Ellis (Ed) Implicit andexplicit learning of languages (pp 33ndash44) London Academic Press

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Human Learning and Memory 6 174ndash215

Stemberger JP amp MacWhinney B (1986) Frequency and the lexical storage of regularlyinected forms Memory and Cognition 14 17ndash26

Winter B amp Reber AS (1994) Implicit learning and the acquisition of natural languagesIn N Ellis (Ed) Implicit and explicit learning of languages (pp 115ndash146) LondonAcademic Press

Yang LR amp Givon T (1997) Benets and drawbacks of controlled laboratory studies ofsecond language acquisition The Keck second language learning project Studies in SecondLanguage Acquisition 19 173ndash194

336 ELLIS AND SCHMIDT

APPENDIX

The Word-forms of the Articial Language

Picture Stem Plural Form Frequency Regularity

car garth bugarth 5 Rbed pid bupid 1 Rlamp lant bulant 5 Rtable tib butid 1 Rplane poon bupoon 5 Rball prill buprill 1 Rtrain dram budram 5 Rhouse hize buhize 1 Rbook bisk bubisk 5 Rbroom breen bubreen 1 Rphone feem gofeem 5 Iumbrella brol gubrol 1 Ichair charp zecharp 5 Ihorse naig zonaig 1 Imonkey chonk nuchonk 5 Idog woop niwoop 1 Ielephant fant vefant 5 Iscissors zoze vuzoze 1 Ikite kag rekag 5 Ish pisc ropisc 1 I

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 331

consequence of the nonlinearity adaptivity and distributed representationproperties of learning and representation in PDP networks In what followswe will minimally rephrase their analysis as it applies to the quasi-regulardomain of inectional morphology In a connectionist network the weightchanges induced by an inputoutput pattern (IOP) on any training epochserve to reduce the error on that IOP The frequency of the IOP (and theunits it involves) is reected in how often it is presented to the network Thusword frequency directly amplies weight changes that are helpful to theIOP itself Consistency of the morphological inections of two stems isreected in the similarity of afx units that are co-activated in their IOPsFurthermore two inputs will induce similar weight changes to the extentthat they activate similar units In our MAL as an extreme case consistentforms all activate the same afx unit irregular ones each activate a differentidiosyncratic afx Given that the weight changes that are induced by eachIOP are superimposed on the weight changes for all other IOPs an IOPwill tend to be helped by the weight changes for IOPs whose inputoutputmappings are consistent with its own and hindered by the weight changesfor inconsistent IOPs Thus frequency and consistency sum because theyboth arise from similar weight changes that are simply added together duringtraining The weight changes result in corresponding increases in thesummed input to output units that should be active and decreases to thesummed units that should be inactive However due to the non-linearity ofthe input-output function of units these changes do not produce directlyproportionate reduction of error Rather as the magnitude of the summedinput to output units increases their states gradually asymptote towards10mdasha given increase in the summed input to a unit yields progressivelysmaller decrements in error over the course of training Thus althoughfrequency and regularity-as-consistency each contribute to 
the weights andhence to the summed input to units their effect on error is subjected to agradual ceiling effect as unit states are driven towards extremal values

Thus a connectionist associative account of simple morphosyntax as it isembodied in our MAL holds that learning involves associating inputpatterns representing single or plural concepts with stem and afx lemmasacross a large distributed network Frequency of experience increases thestrength of the appropriate IO associations Regularity effects stem fromconsistency the consistent items all involve pairings between plurality andthe regular lemma and thus regularity is frequency by another name Thenetwork sums and abstracts these consistencies but it does so usingnon-linear unit inputndashoutput functions thereby resulting in the frequency byregularity interaction Networks are not simple competitive chunking orMarkov chaining mechanisms working on surface form Their massivelydistributed nature allows the emergence of more abstract internalrepresentations We have argued that this analysis accounts for the human

332 ELLIS AND SCHMIDT

acquisition data of simple MAL morphosyntax quite well We believe thatthe acquisition of natural language morphosyntax where there are manyadditional factors of different phonological consistencies (of the type forexample where the neighbours sink drink and stink are irregular in theirpast tenses but all behave in the same -ankway) are equally conducive to theprinciples of this type of account although as illustrated in grandersimulation enterprises (Cottrell amp Plunkett 1994 Daugherty amp Seidenberg1994 MacWhinney amp Leinbach 1991 Marchman 1993 Plunkett ampMarchman 1993) the complexity of interaction of the factors that are therein the language evidence leads to much more complex developmentaloutcomes Our role here has been to study human acquisition underprecisely known circumstances and to demonstrate just how well aconnectionist associative account can simulate these data

A simple regularity5 consistency account of this type will have difculty ifthe ldquoregularrdquo or ldquodefaultrdquo case is not the most frequent case in a naturallanguage Although there is agreement for English past tense and formorphology more generally that the default case is more frequent theremay be exceptions Marcus et al (1995) argue that while the German particle-t applies to a much smaller percentage of verbs than its English counterpartand the German plural -s applies only to a small percentage of nounsnevertheless these afxes behave as defaults in the language These defaultsufxations in German could thus pose a problem for statistical orconnectionist accounts of the acquisition of the more frequent patterns asdefault since they may not be due to a large number of regular wordsreinforcing a pattern in associative memory (Prasada amp Pinker 1993)However this is still a matter of some debate Bybee (1995) suggests that amore reasonable method of counting German particle type frequency doesshow the default (or ldquoproductiverdquo) process to have the highest typefrequency She also argues that to a large extent the productivity patterns ofGerman plurals also reect their type frequency Nakisa and Hahn (1996)and Plunkett and Nakisa (in press) show that generalisation to unseen ornovel forms in German and Arabic (where there have also been claims for aminority default) is more accurately predicted by their phonologicalsimilarity to existing forms in the language (properly represented for typeand token frequency) rather than by the operation of a default rule FinallyHare Elman and Daugherty (1995) demonstrate that multilayerednetworks can develop a default category even in the absence of superior typefrequency as long as the non-default classes are well dened and narrowlydened so that they serve as strong prototypes for analogising to novelforms In such cases the area outside these well-dened attractor basins canconstitute a potential default (see also 
Plunkett amp Marchman 1991)

In the original hybrid model irregulars were stored and accessed fromrote memory Pinker and Prince (1994 p 326) modied this part of the

FREQUENCY AND REGULARITY IN MORPHOSYNTAX ACQUISITION 333

model arguing that since rote memory could not account (a) for similaritiesbetween the morphological base and irregular forms (eg swingndashswung) (b)for similarity within sets of base forms undergoing similar processes (egsingndashsang ringndashrang springndashsprang) or (c) for the kind of semi-productivityshown when children produce errors such as bringndashbrang or swingndashswangthe memory system underlying such productions must be associative anddynamic somewhat as connectionism portrays it Yet to account for datasuch as the frequencyregularity interaction this revised hybrid model stillholds that regular forms are rule-governed But a purely rule-based accountof regulars cannot explain false friends effects where regular inconsistentitems (eg bakendashbaked is similar in rhyme to neighbours makendashmade andtakendashtook which have inconsistent past tenses) are produced more slowlythat entirely regular ones (Daugherty amp Seidenberg 1994 Seidenberg ampBruck 1990) or frequency effects on regular forms (Oetting amp Rice 1993Stemberger amp MacWhinney 1986) Unlike connectionist models a rule-based account of regulars cannot explain these aspects of the human dataNor is the regularityfrequency interaction any reason to reject connectionistaccounts of morphosyntax in favour of a hybrid model

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.

Anderson, J.R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Anderson, J.R., & Schooler, L.J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408.

Beck, M. (1995). Tracking down the source of NS–NNS differences in syntactic competence. Unpublished manuscript, University of North Texas.

Bellugi, U., Bihrle, A., Jernigan, D., Trauner, D., & Dougherty, S. (1990). Neuropsychological, neurological, and neuroanatomical profile of Williams Syndrome. American Journal of Medical Genetics, 6, 115–125.

Braine, M.D.S., Brody, R.E., Brooks, P.J., Sudhalter, V., Ross, J.A., Catalano, L., & Fisch, S.M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29, 591–610.

Broeder, P., & Plunkett, K. (1994). Connectionism and second language acquisition. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 421–454). London: Academic Press.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10, 425–455.

Chater, N. (1995). Neural networks: The new statistical models of mind. In J.P. Levy, D. Bairaktaris, J.A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. London: UCL Press.

Cohen, J.D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavioral Research Methods, Instruments, and Computers, 25, 257–271.

Cottrell, G., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6, 379–412.

Daugherty, K.G., & Seidenberg, M.S. (1992). Rules or connections? The past tense revisited. In Proceedings of the 14th annual conference of the Cognitive Science Society (pp. 259–264). Pittsburgh, PA: Cognitive Science Society.

Daugherty, K.G., & Seidenberg, M.S. (1994). Beyond rules and exceptions: A connectionist approach to inflectional morphology. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 353–388). Amsterdam: John Benjamins.

DeKeyser, R. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195–222.

Ellis, N.C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.

Eubank, L., & Gregg, K.R. (1995). “Et in Amygdala Ego”? UG, (S)LA, and neurobiology. Studies in Second Language Acquisition, 17, 35–58.

Hare, M., Elman, J.L., & Daugherty, K.G. (1995). Default generalisation in connectionist networks. Language and Cognitive Processes, 10, 601–630.

Jung, J. (1971). The experimenter’s dilemma. New York: Harper & Row.

Kirsner, K. (1994). Implicit processes in second language learning. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 283–312). London: Academic Press.

Lachter, J., & Bever, T. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195–247.

Lima, S.D., Corrigan, R.L., & Iverson, G.K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

MacWhinney, B. (1983). Miniature language systems as tests of use of universal operating principles in second-language learning by children and adults. Journal of Psycholinguistic Research, 12, 467–478.

MacWhinney, B. (1994). The dinosaurs and the ring. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 283–320). Amsterdam: John Benjamins.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121–157.

Marchman, V.A. (1993). Constraints on plasticity in a connectionist model of the English past tense. Journal of Cognitive Neuroscience, 5, 215–234.

Marcus, G.F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 198–256.

McLaughlin, B. (1980). On the use of miniature artificial languages in second-language research. Applied Psycholinguistics, 1, 357–369.

Moeser, S.D., & Bregman, A.S. (1972). The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759–769.

Morgan, J.L., Meier, R.P., & Newport, E.L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Morgan, J.L., & Newport, E.L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67–85.

Morton, J. (1979). Facilitation in word recognition: Experiments causing change in the logogen model. In P.A. Kolers, M.E. Wrolstad, & M. Bouma (Eds.), Processing of visible language (pp. 259–268). New York: Plenum.

Nakisa, R., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th annual conference of the Cognitive Science Society (pp. 177–182). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Oetting, J.B., & Rice, M.L. (1993). Plural acquisition in children with specific language impairment. Journal of Speech and Hearing Research, 36, 1236–1248.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford, UK: Oxford University Press.

Palermo, D.S., & Howe, H.E. (1970). An experimental analogy to the learning of past-tense inflection rules. Journal of Verbal Learning and Verbal Behavior, 9, 410–416.

Pinker, S. (1991). Rules of language. Science, 253, 530–535.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 29, 195–247.

Pinker, S., & Prince, A. (1994). Regular and irregular morphology and the psychological status of rules of grammar. In S.D. Lima, R.L. Corrigan, & G.K. Iverson (Eds.), The reality of linguistic rules (pp. 321–351). Amsterdam: John Benjamins.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K.E. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 3–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48, 21–69.

Plunkett, K., & Nakisa, R.C. (in press). A connectionist model of Arabic plural system. Language and Cognitive Processes.

Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1–56.

Prasada, S., Pinker, S., & Snyder, W. (1990). Some evidence that irregular forms are retrieved from memory but regular forms are rule-governed. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by back propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 2: Psychological and biological models (pp. 272–326). Cambridge, MA: MIT Press.

Seidenberg, M.S., & Bruck, M. (1990). Consistency effects in the generation of past tense morphology. Paper presented at the 31st meeting of the Psychonomic Society, New Orleans, November.

Seidenberg, M.S., Waters, G.S., Barnes, M.A., & Tanenhaus, M.K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383–404.

Sharwood Smith, M.A. (1994). The unruly world of language. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 33–44). London: Academic Press.

Snodgrass, J.G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.

Stemberger, J.P., & MacWhinney, B. (1986). Frequency and the lexical storage of regularly inflected forms. Memory and Cognition, 14, 17–26.

Winter, B., & Reber, A.S. (1994). Implicit learning and the acquisition of natural languages. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 115–146). London: Academic Press.

Yang, L.R., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project. Studies in Second Language Acquisition, 19, 173–194.

APPENDIX

The Word-forms of the Artificial Language

Picture    Stem    Plural Form   Frequency   Regularity

car        garth   bugarth       5           R
bed        pid     bupid         1           R
lamp       lant    bulant        5           R
table      tib     butid         1           R
plane      poon    bupoon        5           R
ball       prill   buprill       1           R
train      dram    budram        5           R
house      hize    buhize        1           R
book       bisk    bubisk        5           R
broom      breen   bubreen       1           R
phone      feem    gofeem        5           I
umbrella   brol    gubrol        1           I
chair      charp   zecharp       5           I
horse      naig    zonaig        1           I
monkey     chonk   nuchonk       5           I
dog        woop    niwoop        1           I
elephant   fant    vefant        5           I
scissors   zoze    vuzoze        1           I
kite       kag     rekag         5           I
fish       pisc    ropisc        1           I
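
The design of these materials can be summarised with a short sketch. The Python below copies the Appendix rows as printed (including the plural "butid" for the stem "tib") and tallies, for each plural prefix, its type frequency (how many words carry it) and its token frequency (how often it would be met in one exposure cycle). It assumes that the Frequency column gives relative exposure counts per cycle and that R and I mark regular and irregular items; the function names and the notion of a "cycle" are hypothetical and used only for illustration.

```python
from collections import Counter

# (picture, stem, plural form, frequency, regularity), copied from the Appendix.
# "butid" is reproduced as printed, although the stem is "tib".
ITEMS = [
    ("car", "garth", "bugarth", 5, "R"), ("bed", "pid", "bupid", 1, "R"),
    ("lamp", "lant", "bulant", 5, "R"), ("table", "tib", "butid", 1, "R"),
    ("plane", "poon", "bupoon", 5, "R"), ("ball", "prill", "buprill", 1, "R"),
    ("train", "dram", "budram", 5, "R"), ("house", "hize", "buhize", 1, "R"),
    ("book", "bisk", "bubisk", 5, "R"), ("broom", "breen", "bubreen", 1, "R"),
    ("phone", "feem", "gofeem", 5, "I"), ("umbrella", "brol", "gubrol", 1, "I"),
    ("chair", "charp", "zecharp", 5, "I"), ("horse", "naig", "zonaig", 1, "I"),
    ("monkey", "chonk", "nuchonk", 5, "I"), ("dog", "woop", "niwoop", 1, "I"),
    ("elephant", "fant", "vefant", 5, "I"), ("scissors", "zoze", "vuzoze", 1, "I"),
    ("kite", "kag", "rekag", 5, "I"), ("fish", "pisc", "ropisc", 1, "I"),
]


def prefix_of(stem, plural):
    """Return the characters of the plural that precede its stem-length tail."""
    return plural[: len(plural) - len(stem)]


def prefix_type_counts(items):
    """Type frequency: number of different words carrying each prefix."""
    return Counter(prefix_of(stem, plural) for _pic, stem, plural, _f, _r in items)


def prefix_token_counts(items):
    """Token frequency: summed exposure counts per prefix in one assumed cycle."""
    counts = Counter()
    for _pic, stem, plural, freq, _reg in items:
        counts[prefix_of(stem, plural)] += freq
    return counts


if __name__ == "__main__":
    types = prefix_type_counts(ITEMS)
    tokens = prefix_token_counts(ITEMS)
    for prefix in sorted(types, key=lambda p: -tokens[p]):
        print(f"{prefix}-  types={types[prefix]:2d}  tokens={tokens[prefix]:2d}")
```

On this reading, the regular prefix bu- is shared by ten words and so accumulates far more exposures than any single irregular prefix, even for the low-frequency regular items. That is the sense in which a regular pattern can be treated as a high-frequency pattern.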

acquisition data of simple MAL morphosyntax quite well. We believe that the acquisition of natural language morphosyntax, where there are many additional factors of different phonological consistencies (of the type, for example, where the neighbours sink, drink, and stink are irregular in their past tenses but all behave in the same -ank way), is equally amenable to the principles of this type of account, although, as illustrated in grander simulation enterprises (Cottrell & Plunkett, 1994; Daugherty & Seidenberg, 1994; MacWhinney & Leinbach, 1991; Marchman, 1993; Plunkett & Marchman, 1993), the complexity of interaction of the factors that are there in the language evidence leads to much more complex developmental outcomes. Our role here has been to study human acquisition under precisely known circumstances and to demonstrate just how well a connectionist associative account can simulate these data.


336 ELLIS AND SCHMIDT

APPENDIX

The Word-forms of the Artificial Language

Picture     Stem    Plural Form   Frequency   Regularity
car         garth   bugarth       5           R
bed         pid     bupid         1           R
lamp        lant    bulant        5           R
table       tib     butid         1           R
plane       poon    bupoon        5           R
ball        prill   buprill       1           R
train       dram    budram        5           R
house       hize    buhize        1           R
book        bisk    bubisk        5           R
broom       breen   bubreen       1           R
phone       feem    gofeem        5           I
umbrella    brol    gubrol        1           I
chair       charp   zecharp       5           I
horse       naig    zonaig        1           I
monkey      chonk   nuchonk       5           I
dog         woop    niwoop        1           I
elephant    fant    vefant        5           I
scissors    zoze    vuzoze        1           I
kite        kag     rekag         5           I
fish        pisc    ropisc        1           I

